[2024-08-04 09:37:42 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-04 09:37:42 vssm_base_ms_e300] (main_hfai_mnodes.py 533): INFO AMP_ENABLE: true
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 64
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /dataset/ImageNet_ILSVRC2012
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  MASK_PATCH_SIZE: 32
  MASK_RATIO: 0.6
  NUM_WORKERS: 8
  PERSISTENT_WORKERS: true
  PIN_MEMORY: true
  ZIP_MODE: false
ENABLE_AMP: false
EVAL_MODE: false
FUSED_LAYERNORM: false
MODEL:
  DDP: hfai
  DROP_PATH_RATE: 0.5
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  MLLA:
    APE: false
    DEPTHS:
    - 2
    - 4
    - 8
    - 4
    DROP_PATH_RATE: 0.1
    DROP_RATE: 0.0
    EMBED_DIM: 64
    IMAGE_SIZE: 224
    IN_CHANS: 3
    MLP_RATIO: 4.0
    NUM_HEADS:
    - 2
    - 4
    - 8
    - 16
    PATCH_SIZE: 4
    SIMPLE_DOWNSAMPLE: false
    SIMPLE_PATCH_EMBED: false
  MMCKPT: false
  NAME: vssm_base_ms_e300
  NUM_CLASSES: 1000
  PRETRAINED: ''
  RESUME: ''
  RMT:
    CHUNKWISE_RECURRENTS:
    - true
    - true
    - false
    - false
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    DROP_PATH_RATE: 0.1
    EMBED_DIMS:
    - 64
    - 128
    - 256
    - 512
    HEADS_RANGES:
    - 3
    - 3
    - 3
    - 3
    INIT_VALUES:
    - 1
    - 1
    - 1
    - 1
    LAYERSCALES:
    - false
    - false
    - false
    - false
    MLP_RATIOS:
    - 3
    - 3
    - 3
    - 3
    NUM_HEADS:
    - 3
    - 6
    - 12
    - 24
    PATCH_NORM: true
  TYPE: vssm
  VMAMBA2:
    APE: false
    ATTN_TYPES:
    - mamba2
    - mamba2
    - mamba2
    - mamba2
    BIDIRECTION: false
    DEPTHS:
    - 2
    - 4
    - 8
    - 4
    DROP_PATH_RATE: 0.2
    DROP_RATE: 0.0
    D_STATE: 64
    EMBED_DIM: 64
    IMAGE_SIZE: 224
    IN_CHANS: 3
    LEPE: false
    LINEAR_ATTN_DUALITY: false
    MLP_RATIO: 4.0
    NUM_HEADS:
    - 2
    - 4
    - 8
    - 16
    PARTIAL_WIN_SIZE: -1
    PATCH_SIZE: 4
    SIMPLE_DOWNSAMPLE: true
    SIMPLE_PATCH_EMBED: true
    SSD_AEXP: false
    SSD_CHUNK_SIZE: 256
    SSD_EXPANSION: 2
    SSD_NGROUPS: 1
    SSD_POSITIVE_DA: false
  VSSM:
    ADD_SE: true
    AXIS_STAGE: []
    CONVFFN: true
    CONV_FFN_RATIO: 4
    DEPTHS:
    - 2
    - 4
    - 18
    - 2
    DOWNSAMPLE: v3
    EMBED_DIM: 128
    FULL_RES_INDEX: 1
    GMLP: false
    IN_CHANS: 3
    MLP_ACT_LAYER: gelu
    MLP_DROP_RATE: 0.0
    MLP_RATIO: 4.0
    NORM_LAYER: ln2d
    NUM_HEADS:
    - 1
    - 2
    - 4
    - 8
    PATCHEMBED: v2
    PATCH_NORM: true
    PATCH_SIZE: 4
    POSEMBED: false
    PRE_NORM: false
    SSM_ACT_LAYER: silu
    SSM_CONV: 3
    SSM_CONV_BIAS: true
    SSM_DROP_RATE: 0.0
    SSM_DT_RANK: auto
    SSM_D_STATE: 1
    SSM_FORWARDTYPE: vms
    SSM_INIT: v0
    SSM_RANK_RATIO: 2.0
    SSM_RATIO: 1.0
OUTPUT: ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509
PRINT_FREQ: 10
SAVE_FREQ: 1
SEED: 0
TAG: '20240804093509'
TEST:
  CROP: true
  SEQUENTIAL: false
  SHUFFLE: false
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 1
  AUTO_RESUME: true
  BASE_LR: 0.0012
  CLIP_GRAD: 5.0
  EPOCHS: 300
  LAYER_DECAY: 1.0
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    GAMMA: 0.1
    MULTISTEPS: []
    NAME: cosine
    WARMUP_PREFIX: true
  MIN_LR: 1.2e-05
  MOE:
    SAVE_MASTER: false
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 2.0e-06
  WEIGHT_DECAY: 0.05
TRAINCOST_MODE: false

[2024-08-04 09:37:42 vssm_base_ms_e300] (main_hfai_mnodes.py 534): INFO {"cfg": "./configs/msvssm/msvmamba_base_224.yaml", "opts": null, "batch_size": 64, "data_path": "/dataset/ImageNet_ILSVRC2012", "zip": false, "cache_mode": "part", "pretrained": null, "resume": null, "accumulation_steps": null, "use_checkpoint": false, "disable_amp": false, "output": "./exclude/output_msvmamba", "tag": "20240804093509", "eval": false, "throughput": false, "fused_layernorm": false, "optim": null, "model_ema": true, "model_ema_decay": 0.99984, "model_ema_force_cpu": false, "memory_limit_rate": -1, "ddp": "hfai", "enable_preload": true, "enable_persistance": true, "mesa": false, "mesa_value": 1.0, "mute_repeat": false}
[2024-08-04 09:37:43 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-04 09:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 135): INFO 
VSSM( (patch_embed): Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (1): Identity() (2): LayerNorm2d((64,), eps=1e-05, elementwise_affine=True) (3): Identity() (4): GELU(approximate='none') (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (6): Identity() (7): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) ) (layers): ModuleList( (0): Sequential( (blocks): Sequential( (0): VSSBlock( (norm): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=128, out_features=16, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=16, out_features=128, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(128, 128, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=128) (in_proj): Linear2d(in_features=128, out_features=256, bias=False) (act): SiLU() (conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (out_act): Identity() (out_proj): Linear2d(in_features=128, out_features=128, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.0) (convFFN): ConvFFN( (linear1): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) ) (norm2): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) ) (1): VSSBlock( (norm): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=128, out_features=16, bias=False) (1): ReLU(inplace=True) (2): 
Linear(in_features=16, out_features=128, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(128, 128, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=128) (in_proj): Linear2d(in_features=128, out_features=256, bias=False) (act): SiLU() (conv2d): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128) (out_act): Identity() (out_proj): Linear2d(in_features=128, out_features=128, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.019999999552965164) (convFFN): ConvFFN( (linear1): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) ) (norm2): LayerNorm2d((128,), eps=1e-05, elementwise_affine=True) ) ) (downsample): Sequential( (0): Identity() (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (2): Identity() (3): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) ) ) (1): Sequential( (blocks): Sequential( (0): VSSBlock( (norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=256, out_features=32, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=32, out_features=256, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(256, 256, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=256) (in_proj): Linear2d(in_features=256, out_features=512, bias=False) (act): SiLU() (conv2d): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (out_act): Identity() (out_proj): Linear2d(in_features=256, out_features=256, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.03999999910593033) (convFFN): ConvFFN( (linear1): 
Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1024) ) (norm2): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) ) (1): VSSBlock( (norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=256, out_features=32, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=32, out_features=256, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(256, 256, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=256) (in_proj): Linear2d(in_features=256, out_features=512, bias=False) (act): SiLU() (conv2d): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (out_act): Identity() (out_proj): Linear2d(in_features=256, out_features=256, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.05999999865889549) (convFFN): ConvFFN( (linear1): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1024) ) (norm2): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) ) (2): VSSBlock( (norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=256, out_features=32, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=32, out_features=256, bias=False) 
(3): Sigmoid() ) ) (conv2d_b1): Conv2d(256, 256, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=256) (in_proj): Linear2d(in_features=256, out_features=512, bias=False) (act): SiLU() (conv2d): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (out_act): Identity() (out_proj): Linear2d(in_features=256, out_features=256, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.07999999821186066) (convFFN): ConvFFN( (linear1): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1024) ) (norm2): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) ) (3): VSSBlock( (norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=256, out_features=32, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=32, out_features=256, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(256, 256, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=256) (in_proj): Linear2d(in_features=256, out_features=512, bias=False) (act): SiLU() (conv2d): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256) (out_act): Identity() (out_proj): Linear2d(in_features=256, out_features=256, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.09999999403953552) (convFFN): ConvFFN( (linear1): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), 
padding=(1, 1), groups=1024) ) (norm2): LayerNorm2d((256,), eps=1e-05, elementwise_affine=True) ) ) (downsample): Sequential( (0): Identity() (1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (2): Identity() (3): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) ) (2): Sequential( (blocks): Sequential( (0): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.11999999731779099) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (1): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 
512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.14000000059604645) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (2): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.1599999964237213) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): 
LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (3): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.17999999225139618) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (4): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): 
Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.19999998807907104) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (5): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.2199999988079071) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (6): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): 
AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.23999999463558197) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (7): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.25999999046325684) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) 
(act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (8): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.2800000011920929) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (9): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), 
padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.30000001192092896) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (10): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.3199999928474426) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, 
elementwise_affine=True) ) (11): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.3400000035762787) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (12): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, 
out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.36000001430511475) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (13): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.3799999952316284) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (14): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): 
Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.4000000059604645) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (15): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.42000001668930054) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 
512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (16): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.4399999976158142) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) (17): VSSBlock( (norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=512, out_features=64, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=64, out_features=512, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(512, 512, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=512) (in_proj): 
Linear2d(in_features=512, out_features=1024, bias=False) (act): SiLU() (conv2d): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512) (out_act): Identity() (out_proj): Linear2d(in_features=512, out_features=512, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.46000000834465027) (convFFN): ConvFFN( (linear1): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048) ) (norm2): LayerNorm2d((512,), eps=1e-05, elementwise_affine=True) ) ) (downsample): Sequential( (0): Identity() (1): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (2): Identity() (3): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) ) ) (3): Sequential( (blocks): Sequential( (0): VSSBlock( (norm): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=1024, out_features=128, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=128, out_features=1024, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(1024, 1024, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=1024) (in_proj): Linear2d(in_features=1024, out_features=2048, bias=False) (act): SiLU() (conv2d): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1024) (out_act): Identity() (out_proj): Linear2d(in_features=1024, out_features=1024, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.47999998927116394) (convFFN): ConvFFN( (linear1): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(4096, 
1024, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4096) ) (norm2): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) ) (1): VSSBlock( (norm): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) (op): SS2D( (out_norm): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) (se): SEModule( (avg_pool): AdaptiveAvgPool2d(output_size=1) (fc): Sequential( (0): Linear(in_features=1024, out_features=128, bias=False) (1): ReLU(inplace=True) (2): Linear(in_features=128, out_features=1024, bias=False) (3): Sigmoid() ) ) (conv2d_b1): Conv2d(1024, 1024, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), groups=1024) (in_proj): Linear2d(in_features=1024, out_features=2048, bias=False) (act): SiLU() (conv2d): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1024) (out_act): Identity() (out_proj): Linear2d(in_features=1024, out_features=1024, bias=False) (dropout): Identity() ) (drop_path): timm.DropPath(0.5) (convFFN): ConvFFN( (linear1): Conv2d(1024, 4096, kernel_size=(1, 1), stride=(1, 1)) (drop1): Dropout(p=0.0, inplace=True) (act): GELU(approximate='none') (linear2): Conv2d(4096, 1024, kernel_size=(1, 1), stride=(1, 1)) (drop2): Dropout(p=0.0, inplace=True) (dwc): Conv2d(4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4096) ) (norm2): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) ) ) (downsample): Identity() ) ) (classifier): Sequential( (norm): LayerNorm2d((1024,), eps=1e-05, elementwise_affine=True) (permute): Identity() (avgpool): AdaptiveAvgPool2d(output_size=1) (flatten): Flatten(start_dim=1, end_dim=-1) (head): Linear(in_features=1024, out_features=1000, bias=True) ) ) [2024-08-04 09:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 137): INFO number of params: 90591976 [2024-08-04 09:38:02 vssm_base_ms_e300] (main_hfai_mnodes.py 139): INFO number of GFLOPs: 16.282519552 [2024-08-04 
09:38:02 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 09:38:02 vssm_base_ms_e300] (optimizer.py 27): INFO No weight decay list: ['patch_embed.0.bias', 'patch_embed.2.weight', 'patch_embed.2.bias', 'patch_embed.5.bias', 'patch_embed.7.weight', 'patch_embed.7.bias', 'layers.0.blocks.0.norm.weight', 'layers.0.blocks.0.norm.bias', 'layers.0.blocks.0.op.Ds', 'layers.0.blocks.0.op.out_norm.weight', 'layers.0.blocks.0.op.out_norm.bias', 'layers.0.blocks.0.op.conv2d_b1.bias', 'layers.0.blocks.0.op.conv2d.bias', 'layers.0.blocks.0.convFFN.linear1.bias', 'layers.0.blocks.0.convFFN.linear2.bias', 'layers.0.blocks.0.convFFN.dwc.bias', 'layers.0.blocks.0.norm2.weight', 'layers.0.blocks.0.norm2.bias', 'layers.0.blocks.1.norm.weight', 'layers.0.blocks.1.norm.bias', 'layers.0.blocks.1.op.Ds', 'layers.0.blocks.1.op.out_norm.weight', 'layers.0.blocks.1.op.out_norm.bias', 'layers.0.blocks.1.op.conv2d_b1.bias', 'layers.0.blocks.1.op.conv2d.bias', 'layers.0.blocks.1.convFFN.linear1.bias', 'layers.0.blocks.1.convFFN.linear2.bias', 'layers.0.blocks.1.convFFN.dwc.bias', 'layers.0.blocks.1.norm2.weight', 'layers.0.blocks.1.norm2.bias', 'layers.0.downsample.1.bias', 'layers.0.downsample.3.weight', 'layers.0.downsample.3.bias', 'layers.1.blocks.0.norm.weight', 'layers.1.blocks.0.norm.bias', 'layers.1.blocks.0.op.Ds', 'layers.1.blocks.0.op.out_norm.weight', 'layers.1.blocks.0.op.out_norm.bias', 'layers.1.blocks.0.op.conv2d_b1.bias', 'layers.1.blocks.0.op.conv2d.bias', 'layers.1.blocks.0.convFFN.linear1.bias', 'layers.1.blocks.0.convFFN.linear2.bias', 'layers.1.blocks.0.convFFN.dwc.bias', 'layers.1.blocks.0.norm2.weight', 'layers.1.blocks.0.norm2.bias', 'layers.1.blocks.1.norm.weight', 'layers.1.blocks.1.norm.bias', 'layers.1.blocks.1.op.Ds', 'layers.1.blocks.1.op.out_norm.weight', 'layers.1.blocks.1.op.out_norm.bias', 'layers.1.blocks.1.op.conv2d_b1.bias', 'layers.1.blocks.1.op.conv2d.bias', 
'layers.1.blocks.1.convFFN.linear1.bias', 'layers.1.blocks.1.convFFN.linear2.bias', 'layers.1.blocks.1.convFFN.dwc.bias', 'layers.1.blocks.1.norm2.weight', 'layers.1.blocks.1.norm2.bias', 'layers.1.blocks.2.norm.weight', 'layers.1.blocks.2.norm.bias', 'layers.1.blocks.2.op.Ds', 'layers.1.blocks.2.op.out_norm.weight', 'layers.1.blocks.2.op.out_norm.bias', 'layers.1.blocks.2.op.conv2d_b1.bias', 'layers.1.blocks.2.op.conv2d.bias', 'layers.1.blocks.2.convFFN.linear1.bias', 'layers.1.blocks.2.convFFN.linear2.bias', 'layers.1.blocks.2.convFFN.dwc.bias', 'layers.1.blocks.2.norm2.weight', 'layers.1.blocks.2.norm2.bias', 'layers.1.blocks.3.norm.weight', 'layers.1.blocks.3.norm.bias', 'layers.1.blocks.3.op.Ds', 'layers.1.blocks.3.op.out_norm.weight', 'layers.1.blocks.3.op.out_norm.bias', 'layers.1.blocks.3.op.conv2d_b1.bias', 'layers.1.blocks.3.op.conv2d.bias', 'layers.1.blocks.3.convFFN.linear1.bias', 'layers.1.blocks.3.convFFN.linear2.bias', 'layers.1.blocks.3.convFFN.dwc.bias', 'layers.1.blocks.3.norm2.weight', 'layers.1.blocks.3.norm2.bias', 'layers.1.downsample.1.bias', 'layers.1.downsample.3.weight', 'layers.1.downsample.3.bias', 'layers.2.blocks.0.norm.weight', 'layers.2.blocks.0.norm.bias', 'layers.2.blocks.0.op.Ds', 'layers.2.blocks.0.op.out_norm.weight', 'layers.2.blocks.0.op.out_norm.bias', 'layers.2.blocks.0.op.conv2d_b1.bias', 'layers.2.blocks.0.op.conv2d.bias', 'layers.2.blocks.0.convFFN.linear1.bias', 'layers.2.blocks.0.convFFN.linear2.bias', 'layers.2.blocks.0.convFFN.dwc.bias', 'layers.2.blocks.0.norm2.weight', 'layers.2.blocks.0.norm2.bias', 'layers.2.blocks.1.norm.weight', 'layers.2.blocks.1.norm.bias', 'layers.2.blocks.1.op.Ds', 'layers.2.blocks.1.op.out_norm.weight', 'layers.2.blocks.1.op.out_norm.bias', 'layers.2.blocks.1.op.conv2d_b1.bias', 'layers.2.blocks.1.op.conv2d.bias', 'layers.2.blocks.1.convFFN.linear1.bias', 'layers.2.blocks.1.convFFN.linear2.bias', 'layers.2.blocks.1.convFFN.dwc.bias', 'layers.2.blocks.1.norm2.weight', 
'layers.2.blocks.1.norm2.bias', 'layers.2.blocks.2.norm.weight', 'layers.2.blocks.2.norm.bias', 'layers.2.blocks.2.op.Ds', 'layers.2.blocks.2.op.out_norm.weight', 'layers.2.blocks.2.op.out_norm.bias', 'layers.2.blocks.2.op.conv2d_b1.bias', 'layers.2.blocks.2.op.conv2d.bias', 'layers.2.blocks.2.convFFN.linear1.bias', 'layers.2.blocks.2.convFFN.linear2.bias', 'layers.2.blocks.2.convFFN.dwc.bias', 'layers.2.blocks.2.norm2.weight', 'layers.2.blocks.2.norm2.bias', 'layers.2.blocks.3.norm.weight', 'layers.2.blocks.3.norm.bias', 'layers.2.blocks.3.op.Ds', 'layers.2.blocks.3.op.out_norm.weight', 'layers.2.blocks.3.op.out_norm.bias', 'layers.2.blocks.3.op.conv2d_b1.bias', 'layers.2.blocks.3.op.conv2d.bias', 'layers.2.blocks.3.convFFN.linear1.bias', 'layers.2.blocks.3.convFFN.linear2.bias', 'layers.2.blocks.3.convFFN.dwc.bias', 'layers.2.blocks.3.norm2.weight', 'layers.2.blocks.3.norm2.bias', 'layers.2.blocks.4.norm.weight', 'layers.2.blocks.4.norm.bias', 'layers.2.blocks.4.op.Ds', 'layers.2.blocks.4.op.out_norm.weight', 'layers.2.blocks.4.op.out_norm.bias', 'layers.2.blocks.4.op.conv2d_b1.bias', 'layers.2.blocks.4.op.conv2d.bias', 'layers.2.blocks.4.convFFN.linear1.bias', 'layers.2.blocks.4.convFFN.linear2.bias', 'layers.2.blocks.4.convFFN.dwc.bias', 'layers.2.blocks.4.norm2.weight', 'layers.2.blocks.4.norm2.bias', 'layers.2.blocks.5.norm.weight', 'layers.2.blocks.5.norm.bias', 'layers.2.blocks.5.op.Ds', 'layers.2.blocks.5.op.out_norm.weight', 'layers.2.blocks.5.op.out_norm.bias', 'layers.2.blocks.5.op.conv2d_b1.bias', 'layers.2.blocks.5.op.conv2d.bias', 'layers.2.blocks.5.convFFN.linear1.bias', 'layers.2.blocks.5.convFFN.linear2.bias', 'layers.2.blocks.5.convFFN.dwc.bias', 'layers.2.blocks.5.norm2.weight', 'layers.2.blocks.5.norm2.bias', 'layers.2.blocks.6.norm.weight', 'layers.2.blocks.6.norm.bias', 'layers.2.blocks.6.op.Ds', 'layers.2.blocks.6.op.out_norm.weight', 'layers.2.blocks.6.op.out_norm.bias', 'layers.2.blocks.6.op.conv2d_b1.bias', 
'layers.2.blocks.6.op.conv2d.bias', 'layers.2.blocks.6.convFFN.linear1.bias', 'layers.2.blocks.6.convFFN.linear2.bias', 'layers.2.blocks.6.convFFN.dwc.bias', 'layers.2.blocks.6.norm2.weight', 'layers.2.blocks.6.norm2.bias', 'layers.2.blocks.7.norm.weight', 'layers.2.blocks.7.norm.bias', 'layers.2.blocks.7.op.Ds', 'layers.2.blocks.7.op.out_norm.weight', 'layers.2.blocks.7.op.out_norm.bias', 'layers.2.blocks.7.op.conv2d_b1.bias', 'layers.2.blocks.7.op.conv2d.bias', 'layers.2.blocks.7.convFFN.linear1.bias', 'layers.2.blocks.7.convFFN.linear2.bias', 'layers.2.blocks.7.convFFN.dwc.bias', 'layers.2.blocks.7.norm2.weight', 'layers.2.blocks.7.norm2.bias', 'layers.2.blocks.8.norm.weight', 'layers.2.blocks.8.norm.bias', 'layers.2.blocks.8.op.Ds', 'layers.2.blocks.8.op.out_norm.weight', 'layers.2.blocks.8.op.out_norm.bias', 'layers.2.blocks.8.op.conv2d_b1.bias', 'layers.2.blocks.8.op.conv2d.bias', 'layers.2.blocks.8.convFFN.linear1.bias', 'layers.2.blocks.8.convFFN.linear2.bias', 'layers.2.blocks.8.convFFN.dwc.bias', 'layers.2.blocks.8.norm2.weight', 'layers.2.blocks.8.norm2.bias', 'layers.2.blocks.9.norm.weight', 'layers.2.blocks.9.norm.bias', 'layers.2.blocks.9.op.Ds', 'layers.2.blocks.9.op.out_norm.weight', 'layers.2.blocks.9.op.out_norm.bias', 'layers.2.blocks.9.op.conv2d_b1.bias', 'layers.2.blocks.9.op.conv2d.bias', 'layers.2.blocks.9.convFFN.linear1.bias', 'layers.2.blocks.9.convFFN.linear2.bias', 'layers.2.blocks.9.convFFN.dwc.bias', 'layers.2.blocks.9.norm2.weight', 'layers.2.blocks.9.norm2.bias', 'layers.2.blocks.10.norm.weight', 'layers.2.blocks.10.norm.bias', 'layers.2.blocks.10.op.Ds', 'layers.2.blocks.10.op.out_norm.weight', 'layers.2.blocks.10.op.out_norm.bias', 'layers.2.blocks.10.op.conv2d_b1.bias', 'layers.2.blocks.10.op.conv2d.bias', 'layers.2.blocks.10.convFFN.linear1.bias', 'layers.2.blocks.10.convFFN.linear2.bias', 'layers.2.blocks.10.convFFN.dwc.bias', 'layers.2.blocks.10.norm2.weight', 'layers.2.blocks.10.norm2.bias', 'layers.2.blocks.11.norm.weight', 
'layers.2.blocks.11.norm.bias', 'layers.2.blocks.11.op.Ds', 'layers.2.blocks.11.op.out_norm.weight', 'layers.2.blocks.11.op.out_norm.bias', 'layers.2.blocks.11.op.conv2d_b1.bias', 'layers.2.blocks.11.op.conv2d.bias', 'layers.2.blocks.11.convFFN.linear1.bias', 'layers.2.blocks.11.convFFN.linear2.bias', 'layers.2.blocks.11.convFFN.dwc.bias', 'layers.2.blocks.11.norm2.weight', 'layers.2.blocks.11.norm2.bias', 'layers.2.blocks.12.norm.weight', 'layers.2.blocks.12.norm.bias', 'layers.2.blocks.12.op.Ds', 'layers.2.blocks.12.op.out_norm.weight', 'layers.2.blocks.12.op.out_norm.bias', 'layers.2.blocks.12.op.conv2d_b1.bias', 'layers.2.blocks.12.op.conv2d.bias', 'layers.2.blocks.12.convFFN.linear1.bias', 'layers.2.blocks.12.convFFN.linear2.bias', 'layers.2.blocks.12.convFFN.dwc.bias', 'layers.2.blocks.12.norm2.weight', 'layers.2.blocks.12.norm2.bias', 'layers.2.blocks.13.norm.weight', 'layers.2.blocks.13.norm.bias', 'layers.2.blocks.13.op.Ds', 'layers.2.blocks.13.op.out_norm.weight', 'layers.2.blocks.13.op.out_norm.bias', 'layers.2.blocks.13.op.conv2d_b1.bias', 'layers.2.blocks.13.op.conv2d.bias', 'layers.2.blocks.13.convFFN.linear1.bias', 'layers.2.blocks.13.convFFN.linear2.bias', 'layers.2.blocks.13.convFFN.dwc.bias', 'layers.2.blocks.13.norm2.weight', 'layers.2.blocks.13.norm2.bias', 'layers.2.blocks.14.norm.weight', 'layers.2.blocks.14.norm.bias', 'layers.2.blocks.14.op.Ds', 'layers.2.blocks.14.op.out_norm.weight', 'layers.2.blocks.14.op.out_norm.bias', 'layers.2.blocks.14.op.conv2d_b1.bias', 'layers.2.blocks.14.op.conv2d.bias', 'layers.2.blocks.14.convFFN.linear1.bias', 'layers.2.blocks.14.convFFN.linear2.bias', 'layers.2.blocks.14.convFFN.dwc.bias', 'layers.2.blocks.14.norm2.weight', 'layers.2.blocks.14.norm2.bias', 'layers.2.blocks.15.norm.weight', 'layers.2.blocks.15.norm.bias', 'layers.2.blocks.15.op.Ds', 'layers.2.blocks.15.op.out_norm.weight', 'layers.2.blocks.15.op.out_norm.bias', 'layers.2.blocks.15.op.conv2d_b1.bias', 'layers.2.blocks.15.op.conv2d.bias', 
'layers.2.blocks.15.convFFN.linear1.bias', 'layers.2.blocks.15.convFFN.linear2.bias', 'layers.2.blocks.15.convFFN.dwc.bias', 'layers.2.blocks.15.norm2.weight', 'layers.2.blocks.15.norm2.bias', 'layers.2.blocks.16.norm.weight', 'layers.2.blocks.16.norm.bias', 'layers.2.blocks.16.op.Ds', 'layers.2.blocks.16.op.out_norm.weight', 'layers.2.blocks.16.op.out_norm.bias', 'layers.2.blocks.16.op.conv2d_b1.bias', 'layers.2.blocks.16.op.conv2d.bias', 'layers.2.blocks.16.convFFN.linear1.bias', 'layers.2.blocks.16.convFFN.linear2.bias', 'layers.2.blocks.16.convFFN.dwc.bias', 'layers.2.blocks.16.norm2.weight', 'layers.2.blocks.16.norm2.bias', 'layers.2.blocks.17.norm.weight', 'layers.2.blocks.17.norm.bias', 'layers.2.blocks.17.op.Ds', 'layers.2.blocks.17.op.out_norm.weight', 'layers.2.blocks.17.op.out_norm.bias', 'layers.2.blocks.17.op.conv2d_b1.bias', 'layers.2.blocks.17.op.conv2d.bias', 'layers.2.blocks.17.convFFN.linear1.bias', 'layers.2.blocks.17.convFFN.linear2.bias', 'layers.2.blocks.17.convFFN.dwc.bias', 'layers.2.blocks.17.norm2.weight', 'layers.2.blocks.17.norm2.bias', 'layers.2.downsample.1.bias', 'layers.2.downsample.3.weight', 'layers.2.downsample.3.bias', 'layers.3.blocks.0.norm.weight', 'layers.3.blocks.0.norm.bias', 'layers.3.blocks.0.op.Ds', 'layers.3.blocks.0.op.out_norm.weight', 'layers.3.blocks.0.op.out_norm.bias', 'layers.3.blocks.0.op.conv2d_b1.bias', 'layers.3.blocks.0.op.conv2d.bias', 'layers.3.blocks.0.convFFN.linear1.bias', 'layers.3.blocks.0.convFFN.linear2.bias', 'layers.3.blocks.0.convFFN.dwc.bias', 'layers.3.blocks.0.norm2.weight', 'layers.3.blocks.0.norm2.bias', 'layers.3.blocks.1.norm.weight', 'layers.3.blocks.1.norm.bias', 'layers.3.blocks.1.op.Ds', 'layers.3.blocks.1.op.out_norm.weight', 'layers.3.blocks.1.op.out_norm.bias', 'layers.3.blocks.1.op.conv2d_b1.bias', 'layers.3.blocks.1.op.conv2d.bias', 'layers.3.blocks.1.convFFN.linear1.bias', 'layers.3.blocks.1.convFFN.linear2.bias', 'layers.3.blocks.1.convFFN.dwc.bias', 
'layers.3.blocks.1.norm2.weight', 'layers.3.blocks.1.norm2.bias', 'classifier.norm.weight', 'classifier.norm.bias', 'classifier.head.bias'] [2024-08-04 09:38:05 vssm_base_ms_e300] (main_hfai_mnodes.py 195): INFO no checkpoint found in ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509, ignoring auto resume [2024-08-04 09:38:05 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 09:38:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][0/625] eta 3:14:51 lr 0.000002 wd 0.0500 time 18.7060 (18.7060) data time 0.5813 (0.5813) model time 0.0000 (0.0000) loss 7.0123 (7.0123) grad_norm 2.1798 (2.1798) loss_scale 65536.0000 (65536.0000) mem 32970MB [2024-08-04 09:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][10/625] eta 0:22:08 lr 0.000003 wd 0.0500 time 0.4425 (2.1602) data time 0.0008 (0.0537) model time 0.0000 (0.0000) loss 7.0875 (7.0315) grad_norm 2.0420 (2.0928) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:38:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][20/625] eta 0:13:32 lr 0.000004 wd 0.0500 time 0.4416 (1.3422) data time 0.0008 (0.0285) model time 0.0000 (0.0000) loss 7.0305 (7.0143) grad_norm 1.9103 (2.0344) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:38:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][30/625] eta 0:10:25 lr 0.000005 wd 0.0500 time 0.4411 (1.0518) data time 0.0006 (0.0200) model time 0.0000 (0.0000) loss 6.8915 (6.9997) grad_norm 1.9347 (1.9987) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:38:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][40/625] eta 0:08:48 lr 0.000006 wd 0.0500 time 0.4487 (0.9037) data time 0.0009 (0.0153) model time 0.0000 (0.0000) loss 6.9100 (6.9947) grad_norm 1.8019 (1.9661) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:38:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][50/625] eta 0:07:47 lr 
0.000007 wd 0.0500 time 0.4388 (0.8133) data time 0.0008 (0.0125) model time 0.0000 (0.0000) loss 6.9597 (6.9915) grad_norm 1.7760 (1.9313) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][60/625] eta 0:07:05 lr 0.000008 wd 0.0500 time 0.4432 (0.7527) data time 0.0010 (0.0106) model time 0.4422 (0.4431) loss 7.0075 (6.9878) grad_norm 1.7483 (1.9045) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][70/625] eta 0:06:33 lr 0.000009 wd 0.0500 time 0.4445 (0.7093) data time 0.0009 (0.0093) model time 0.4435 (0.4432) loss 6.9857 (6.9849) grad_norm 1.7131 (1.8818) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][80/625] eta 0:06:08 lr 0.000010 wd 0.0500 time 0.4509 (0.6766) data time 0.0009 (0.0082) model time 0.4500 (0.4435) loss 6.9312 (6.9801) grad_norm 1.6533 (1.8584) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][90/625] eta 0:05:48 lr 0.000011 wd 0.0500 time 0.4467 (0.6511) data time 0.0007 (0.0074) model time 0.4460 (0.4435) loss 6.9353 (6.9767) grad_norm 1.6917 (1.8371) loss_scale 65536.0000 (65536.0000) mem 16787MB [2024-08-04 09:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][100/625] eta 0:05:31 lr 0.000012 wd 0.0500 time 0.4431 (0.6321) data time 0.0007 (0.0068) model time 0.4424 (0.4463) loss 6.9473 (6.9737) grad_norm 1.6191 (1.8165) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][110/625] eta 0:05:16 lr 0.000013 wd 0.0500 time 0.4404 (0.6150) data time 0.0009 (0.0063) model time 0.4395 (0.4456) loss 6.9052 (6.9716) grad_norm 1.6608 (1.7991) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:22 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][120/625] eta 0:05:03 lr 0.000014 wd 0.0500 time 0.4475 (0.6009) data time 0.0006 (0.0058) model time 0.4469 (0.4453) loss 6.9461 (6.9683) grad_norm 1.6030 (1.7821) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][130/625] eta 0:04:51 lr 0.000014 wd 0.0500 time 0.4436 (0.5889) data time 0.0008 (0.0054) model time 0.4428 (0.4450) loss 6.8990 (6.9648) grad_norm 1.5614 (1.7645) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][140/625] eta 0:04:40 lr 0.000015 wd 0.0500 time 0.4437 (0.5787) data time 0.0007 (0.0051) model time 0.4429 (0.4449) loss 6.9356 (6.9620) grad_norm 1.5591 (1.7470) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][150/625] eta 0:04:30 lr 0.000016 wd 0.0500 time 0.4453 (0.5699) data time 0.0008 (0.0048) model time 0.4446 (0.4449) loss 6.8751 (6.9598) grad_norm 1.4766 (1.7292) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][160/625] eta 0:04:21 lr 0.000017 wd 0.0500 time 0.4475 (0.5621) data time 0.0008 (0.0046) model time 0.4467 (0.4448) loss 6.8781 (6.9576) grad_norm 1.5154 (1.7125) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][170/625] eta 0:04:12 lr 0.000018 wd 0.0500 time 0.4677 (0.5554) data time 0.0008 (0.0044) model time 0.4669 (0.4449) loss 6.9026 (6.9553) grad_norm 1.5095 (1.6946) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][180/625] eta 0:04:04 lr 0.000019 wd 0.0500 time 0.4433 (0.5492) data time 0.0008 (0.0042) model time 0.4425 (0.4446) loss 6.9117 (6.9521) 
grad_norm 1.3928 (1.6788) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][190/625] eta 0:03:56 lr 0.000020 wd 0.0500 time 0.4448 (0.5436) data time 0.0008 (0.0040) model time 0.4440 (0.4445) loss 6.8932 (6.9489) grad_norm 1.4271 (1.6641) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][200/625] eta 0:03:48 lr 0.000021 wd 0.0500 time 0.4488 (0.5387) data time 0.0008 (0.0039) model time 0.4479 (0.4444) loss 6.8541 (6.9452) grad_norm 1.3065 (1.6504) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][210/625] eta 0:03:41 lr 0.000022 wd 0.0500 time 0.4444 (0.5342) data time 0.0010 (0.0037) model time 0.4434 (0.4443) loss 6.9086 (6.9425) grad_norm 1.6494 (1.6386) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][220/625] eta 0:03:34 lr 0.000023 wd 0.0500 time 0.4388 (0.5301) data time 0.0007 (0.0036) model time 0.4381 (0.4442) loss 6.8773 (6.9389) grad_norm 1.5301 (1.6283) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][230/625] eta 0:03:28 lr 0.000024 wd 0.0500 time 0.4437 (0.5274) data time 0.0007 (0.0035) model time 0.4430 (0.4455) loss 6.8006 (6.9362) grad_norm 1.7785 (1.6192) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][240/625] eta 0:03:21 lr 0.000025 wd 0.0500 time 0.4475 (0.5239) data time 0.0008 (0.0034) model time 0.4467 (0.4454) loss 6.8331 (6.9331) grad_norm 1.4013 (1.6129) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][250/625] eta 0:03:15 lr 0.000026 wd 0.0500 time 
0.4445 (0.5208) data time 0.0007 (0.0033) model time 0.4438 (0.4453) loss 6.7982 (6.9299) grad_norm 2.5517 (1.6117) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][260/625] eta 0:03:09 lr 0.000027 wd 0.0500 time 0.4432 (0.5180) data time 0.0007 (0.0032) model time 0.4425 (0.4454) loss 6.8693 (6.9263) grad_norm 1.2334 (1.6102) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][270/625] eta 0:03:02 lr 0.000028 wd 0.0500 time 0.4449 (0.5152) data time 0.0006 (0.0031) model time 0.4443 (0.4452) loss 6.9025 (6.9225) grad_norm 1.2608 (1.6059) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][280/625] eta 0:02:56 lr 0.000029 wd 0.0500 time 0.4449 (0.5127) data time 0.0008 (0.0030) model time 0.4441 (0.4451) loss 6.9496 (6.9203) grad_norm 1.6422 (1.5979) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][290/625] eta 0:02:50 lr 0.000030 wd 0.0500 time 0.4452 (0.5104) data time 0.0006 (0.0030) model time 0.4445 (0.4451) loss 6.7891 (6.9163) grad_norm 1.0855 (1.5893) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][300/625] eta 0:02:45 lr 0.000031 wd 0.0500 time 0.4464 (0.5082) data time 0.0008 (0.0029) model time 0.4456 (0.4451) loss 6.8307 (6.9117) grad_norm 1.4944 (1.5853) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][310/625] eta 0:02:39 lr 0.000032 wd 0.0500 time 0.4415 (0.5062) data time 0.0009 (0.0028) model time 0.4406 (0.4450) loss 6.8210 (6.9083) grad_norm 1.5175 (1.5862) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:52 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [0/300][320/625] eta 0:02:33 lr 0.000033 wd 0.0500 time 0.4432 (0.5046) data time 0.0007 (0.0028) model time 0.4425 (0.4454) loss 6.8309 (6.9061) grad_norm 2.2369 (1.5946) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][330/625] eta 0:02:28 lr 0.000034 wd 0.0500 time 0.4429 (0.5027) data time 0.0007 (0.0027) model time 0.4422 (0.4453) loss 6.7417 (6.9026) grad_norm 1.7179 (1.6003) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][340/625] eta 0:02:22 lr 0.000035 wd 0.0500 time 0.4479 (0.5010) data time 0.0008 (0.0027) model time 0.4472 (0.4452) loss 6.7509 (6.8998) grad_norm 1.5418 (1.6152) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][350/625] eta 0:02:17 lr 0.000036 wd 0.0500 time 0.4453 (0.4994) data time 0.0007 (0.0026) model time 0.4445 (0.4451) loss 6.8358 (6.8969) grad_norm 1.1476 (1.6178) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][360/625] eta 0:02:11 lr 0.000037 wd 0.0500 time 0.4450 (0.4979) data time 0.0007 (0.0026) model time 0.4443 (0.4451) loss 6.7418 (6.8941) grad_norm 2.0009 (1.6205) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][370/625] eta 0:02:06 lr 0.000037 wd 0.0500 time 0.4432 (0.4964) data time 0.0009 (0.0025) model time 0.4424 (0.4450) loss 6.8333 (6.8909) grad_norm 1.7251 (1.6303) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][380/625] eta 0:02:01 lr 0.000038 wd 0.0500 time 0.4450 (0.4951) data time 0.0007 (0.0025) model time 0.4443 (0.4450) loss 6.7856 (6.8883) grad_norm 1.6718 (1.6400) 
loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][390/625] eta 0:01:56 lr 0.000039 wd 0.0500 time 0.4485 (0.4938) data time 0.0006 (0.0024) model time 0.4478 (0.4450) loss 6.8470 (6.8857) grad_norm 2.1060 (1.6551) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][400/625] eta 0:01:50 lr 0.000040 wd 0.0500 time 0.4440 (0.4926) data time 0.0009 (0.0024) model time 0.4431 (0.4449) loss 6.8694 (6.8841) grad_norm 2.2152 (1.6622) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][410/625] eta 0:01:45 lr 0.000041 wd 0.0500 time 0.4438 (0.4914) data time 0.0009 (0.0023) model time 0.4429 (0.4449) loss 6.8022 (6.8819) grad_norm 2.6150 (1.6735) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][420/625] eta 0:01:40 lr 0.000042 wd 0.0500 time 0.4417 (0.4907) data time 0.0006 (0.0023) model time 0.4410 (0.4453) loss 6.7974 (6.8794) grad_norm 3.0264 (1.6979) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][430/625] eta 0:01:35 lr 0.000043 wd 0.0500 time 0.4463 (0.4896) data time 0.0007 (0.0023) model time 0.4456 (0.4453) loss 6.7387 (6.8776) grad_norm 1.6502 (1.7244) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][440/625] eta 0:01:30 lr 0.000044 wd 0.0500 time 0.4437 (0.4886) data time 0.0007 (0.0022) model time 0.4430 (0.4452) loss 6.6858 (6.8749) grad_norm 1.8922 (1.7388) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][450/625] eta 0:01:25 lr 0.000045 wd 0.0500 time 0.4433 (0.4877) data time 
0.0006 (0.0022) model time 0.4427 (0.4452) loss 6.6780 (6.8716) grad_norm 2.3773 (1.7603) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][460/625] eta 0:01:20 lr 0.000046 wd 0.0500 time 0.6606 (0.4872) data time 0.0008 (0.0022) model time 0.6598 (0.4457) loss 6.7805 (6.8689) grad_norm 1.7431 (1.7633) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:41:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][470/625] eta 0:01:15 lr 0.000047 wd 0.0500 time 0.4478 (0.4862) data time 0.0008 (0.0022) model time 0.4469 (0.4456) loss 6.7648 (6.8665) grad_norm 2.7310 (1.7932) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][480/625] eta 0:01:10 lr 0.000048 wd 0.0500 time 0.4438 (0.4853) data time 0.0007 (0.0021) model time 0.4432 (0.4455) loss 6.7612 (6.8631) grad_norm 1.8716 (1.8094) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][490/625] eta 0:01:05 lr 0.000049 wd 0.0500 time 0.4423 (0.4845) data time 0.0009 (0.0021) model time 0.4415 (0.4455) loss 6.7910 (6.8616) grad_norm 2.4011 (1.8384) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][500/625] eta 0:01:00 lr 0.000050 wd 0.0500 time 0.4429 (0.4837) data time 0.0008 (0.0021) model time 0.4421 (0.4455) loss 6.7749 (6.8593) grad_norm 5.3194 (1.8604) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][510/625] eta 0:00:55 lr 0.000051 wd 0.0500 time 0.4418 (0.4829) data time 0.0007 (0.0021) model time 0.4411 (0.4454) loss 6.7720 (6.8573) grad_norm 5.8164 (1.9006) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [0/300][520/625] eta 0:00:50 lr 0.000052 wd 0.0500 time 0.4431 (0.4821) data time 0.0007 (0.0020) model time 0.4424 (0.4453) loss 6.7416 (6.8556) grad_norm 2.7942 (1.9294) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][530/625] eta 0:00:45 lr 0.000053 wd 0.0500 time 0.4435 (0.4815) data time 0.0009 (0.0020) model time 0.4426 (0.4453) loss 6.7127 (6.8530) grad_norm 2.2100 (1.9607) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][540/625] eta 0:00:40 lr 0.000054 wd 0.0500 time 0.4464 (0.4808) data time 0.0009 (0.0020) model time 0.4455 (0.4453) loss 6.7965 (6.8507) grad_norm 3.5112 (1.9888) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][550/625] eta 0:00:36 lr 0.000055 wd 0.0500 time 0.4432 (0.4802) data time 0.0006 (0.0020) model time 0.4426 (0.4453) loss 6.6740 (6.8478) grad_norm 2.1419 (2.0090) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][560/625] eta 0:00:31 lr 0.000056 wd 0.0500 time 0.4444 (0.4795) data time 0.0009 (0.0020) model time 0.4435 (0.4452) loss 6.7780 (6.8463) grad_norm 3.8842 (2.0219) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][570/625] eta 0:00:26 lr 0.000057 wd 0.0500 time 0.4419 (0.4789) data time 0.0008 (0.0019) model time 0.4411 (0.4452) loss 6.7944 (6.8443) grad_norm 4.7953 (2.0414) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][580/625] eta 0:00:21 lr 0.000058 wd 0.0500 time 0.4439 (0.4783) data time 0.0007 (0.0019) model time 0.4432 (0.4452) loss 6.8201 (6.8420) grad_norm 5.3549 (2.0687) loss_scale 65536.0000 
(65536.0000) mem 16690MB [2024-08-04 09:42:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][590/625] eta 0:00:16 lr 0.000059 wd 0.0500 time 0.4515 (0.4778) data time 0.0008 (0.0019) model time 0.4507 (0.4452) loss 6.8618 (6.8399) grad_norm 5.3037 (2.1040) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:42:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][600/625] eta 0:00:11 lr 0.000060 wd 0.0500 time 0.4470 (0.4772) data time 0.0009 (0.0019) model time 0.4461 (0.4452) loss 6.6440 (6.8377) grad_norm 2.1698 (2.1286) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:43:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][610/625] eta 0:00:07 lr 0.000060 wd 0.0500 time 0.4391 (0.4767) data time 0.0006 (0.0019) model time 0.4385 (0.4451) loss 6.6470 (6.8351) grad_norm 3.7015 (2.1548) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [0/300][620/625] eta 0:00:02 lr 0.000061 wd 0.0500 time 0.4379 (0.4761) data time 0.0005 (0.0019) model time 0.4375 (0.4450) loss 6.7516 (6.8330) grad_norm 2.5914 (2.1624) loss_scale 65536.0000 (65536.0000) mem 16690MB [2024-08-04 09:43:07 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 0 training takes 0:04:57 [2024-08-04 09:43:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 09:43:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 09:43:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 6.0547 (6.0547) Acc@1 1.514 (1.514) Acc@5 9.277 (9.277) Mem 16690MB [2024-08-04 09:43:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 6.4883 (6.2667) Acc@1 0.146 (1.221) Acc@5 1.465 (5.376) Mem 16690MB [2024-08-04 09:43:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.4141 (6.3118) Acc@1 1.465 (1.551) Acc@5 4.248 (5.636) Mem 16690MB [2024-08-04 09:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 1.947 Acc@5 6.804 [2024-08-04 09:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 1.9% [2024-08-04 09:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 1.95% [2024-08-04 09:43:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 09:43:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 09:43:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 7.0234 (7.0234) Acc@1 0.098 (0.098) Acc@5 0.391 (0.391) Mem 16690MB [2024-08-04 09:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 7.0039 (7.0092) Acc@1 0.098 (0.102) Acc@5 0.439 (0.524) Mem 16690MB [2024-08-04 09:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 7.0078 (7.0141) Acc@1 0.049 (0.095) Acc@5 0.439 (0.463) Mem 16690MB [2024-08-04 09:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.092 Acc@5 0.474 [2024-08-04 09:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-08-04 09:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.09% [2024-08-04 09:43:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 09:43:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 09:43:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][0/625] eta 0:08:10 lr 0.000062 wd 0.0500 time 0.7846 (0.7846) data time 0.3335 (0.3335) model time 0.0000 (0.0000) loss 6.4716 (6.4716) grad_norm 2.0641 (2.0641) loss_scale 65536.0000 (65536.0000) mem 16704MB [2024-08-04 09:43:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][10/625] eta 0:05:11 lr 0.000063 wd 0.0500 time 0.4455 (0.5062) data time 0.0006 (0.0312) model time 0.0000 (0.0000) loss 6.6192 (6.6653) grad_norm 4.3638 (4.4214) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][20/625] eta 0:04:51 lr 0.000064 wd 0.0500 time 0.4404 (0.4824) data time 0.0008 (0.0167) model time 0.0000 (0.0000) loss 6.7769 (6.6708) grad_norm 2.7123 (4.4573) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][30/625] eta 0:04:39 lr 0.000065 wd 0.0500 time 0.4446 (0.4704) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 6.6200 (6.6806) grad_norm 3.2064 (4.0325) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][40/625] eta 0:04:31 lr 0.000066 wd 0.0500 time 0.4444 (0.4645) data time 0.0006 (0.0090) model time 0.0000 (0.0000) loss 6.7707 (6.6692) grad_norm 2.9200 (3.9042) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][50/625] eta 0:04:24 lr 0.000067 wd 0.0500 time 0.4505 (0.4606) data time 0.0007 (0.0074) model time 0.0000 (0.0000) loss 6.5484 (6.6785) grad_norm 3.9182 (3.8901) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][60/625] eta 0:04:18 lr 0.000068 wd 0.0500 time 0.4413 (0.4577) data time 0.0006 (0.0063) model time 0.4408 (0.4423) loss 6.5259 
(6.6728) grad_norm 5.1270 (3.9863) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][70/625] eta 0:04:13 lr 0.000069 wd 0.0500 time 0.4461 (0.4559) data time 0.0006 (0.0055) model time 0.4455 (0.4431) loss 6.5094 (6.6614) grad_norm 2.7715 (3.8215) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:43:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][80/625] eta 0:04:07 lr 0.000070 wd 0.0500 time 0.4417 (0.4544) data time 0.0006 (0.0050) model time 0.4411 (0.4430) loss 6.7810 (6.6613) grad_norm 4.1452 (3.7790) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][90/625] eta 0:04:02 lr 0.000071 wd 0.0500 time 0.4434 (0.4534) data time 0.0008 (0.0045) model time 0.4426 (0.4433) loss 6.6482 (6.6561) grad_norm 2.2038 (3.7625) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][100/625] eta 0:03:57 lr 0.000071 wd 0.0500 time 0.4454 (0.4525) data time 0.0006 (0.0041) model time 0.4448 (0.4433) loss 6.6710 (6.6484) grad_norm 2.9721 (3.6942) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][110/625] eta 0:03:52 lr 0.000072 wd 0.0500 time 0.4471 (0.4517) data time 0.0006 (0.0038) model time 0.4465 (0.4433) loss 6.7155 (6.6489) grad_norm 4.1959 (3.7246) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][120/625] eta 0:03:47 lr 0.000073 wd 0.0500 time 0.4447 (0.4512) data time 0.0007 (0.0036) model time 0.4440 (0.4434) loss 6.7660 (6.6532) grad_norm 1.4432 (3.6938) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][130/625] eta 0:03:43 lr 0.000074 wd 0.0500 
time 0.4445 (0.4506) data time 0.0006 (0.0034) model time 0.4439 (0.4434) loss 6.7013 (6.6527) grad_norm 3.6094 (3.7211) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][140/625] eta 0:03:38 lr 0.000075 wd 0.0500 time 0.4389 (0.4502) data time 0.0009 (0.0033) model time 0.4380 (0.4433) loss 6.6881 (6.6539) grad_norm 1.8723 (3.6903) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][150/625] eta 0:03:33 lr 0.000076 wd 0.0500 time 0.4408 (0.4499) data time 0.0008 (0.0031) model time 0.4399 (0.4434) loss 6.7056 (6.6521) grad_norm 3.0532 (3.6513) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][160/625] eta 0:03:29 lr 0.000077 wd 0.0500 time 0.4460 (0.4496) data time 0.0009 (0.0030) model time 0.4451 (0.4436) loss 6.5429 (6.6481) grad_norm 2.6623 (3.6600) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][170/625] eta 0:03:24 lr 0.000078 wd 0.0500 time 0.4434 (0.4493) data time 0.0006 (0.0028) model time 0.4427 (0.4436) loss 6.4889 (6.6468) grad_norm 2.7077 (3.6533) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][180/625] eta 0:03:19 lr 0.000079 wd 0.0500 time 0.4434 (0.4490) data time 0.0008 (0.0027) model time 0.4426 (0.4436) loss 6.4566 (6.6428) grad_norm 3.1705 (3.5914) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][190/625] eta 0:03:15 lr 0.000080 wd 0.0500 time 0.4435 (0.4488) data time 0.0006 (0.0026) model time 0.4429 (0.4436) loss 6.7871 (6.6401) grad_norm 3.8640 (3.6396) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:50 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [1/300][200/625] eta 0:03:10 lr 0.000081 wd 0.0500 time 0.4399 (0.4486) data time 0.0009 (0.0026) model time 0.4390 (0.4436) loss 6.3735 (6.6365) grad_norm 6.6518 (3.6755) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][210/625] eta 0:03:06 lr 0.000082 wd 0.0500 time 0.4453 (0.4484) data time 0.0007 (0.0025) model time 0.4446 (0.4435) loss 6.7404 (6.6323) grad_norm 2.8918 (3.6598) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][220/625] eta 0:03:01 lr 0.000083 wd 0.0500 time 0.4439 (0.4481) data time 0.0006 (0.0024) model time 0.4433 (0.4435) loss 6.7463 (6.6295) grad_norm 3.1720 (3.6516) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:45:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][230/625] eta 0:02:57 lr 0.000084 wd 0.0500 time 0.4410 (0.4492) data time 0.0008 (0.0023) model time 0.4402 (0.4451) loss 6.6888 (6.6279) grad_norm 3.6143 (3.6243) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:45:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][240/625] eta 0:02:52 lr 0.000085 wd 0.0500 time 0.4450 (0.4490) data time 0.0009 (0.0023) model time 0.4441 (0.4450) loss 6.6665 (6.6249) grad_norm 2.4538 (3.6300) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 09:45:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][250/625] eta 0:02:48 lr 0.000086 wd 0.0500 time 0.4360 (0.4488) data time 0.0009 (0.0022) model time 0.4352 (0.4448) loss 6.5622 (6.6232) grad_norm inf (inf) loss_scale 32768.0000 (65405.4502) mem 16696MB [2024-08-04 09:45:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][260/625] eta 0:02:43 lr 0.000087 wd 0.0500 time 0.4432 (0.4486) data time 0.0007 (0.0022) model time 0.4424 (0.4447) loss 6.6329 (6.6191) grad_norm 3.9006 (inf) 
loss_scale 32768.0000 (64154.9732) mem 16696MB [2024-08-04 09:45:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][270/625] eta 0:02:39 lr 0.000088 wd 0.0500 time 0.4452 (0.4484) data time 0.0008 (0.0021) model time 0.4444 (0.4447) loss 6.6537 (6.6180) grad_norm 1.9424 (inf) loss_scale 32768.0000 (62996.7823) mem 16696MB [2024-08-04 09:45:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][280/625] eta 0:02:34 lr 0.000089 wd 0.0500 time 0.4434 (0.4483) data time 0.0009 (0.0021) model time 0.4425 (0.4447) loss 6.6363 (6.6156) grad_norm 2.4797 (inf) loss_scale 32768.0000 (61921.0249) mem 16696MB [2024-08-04 09:45:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][290/625] eta 0:02:30 lr 0.000090 wd 0.0500 time 0.4524 (0.4483) data time 0.0006 (0.0020) model time 0.4517 (0.4447) loss 6.5687 (6.6127) grad_norm 4.3786 (inf) loss_scale 32768.0000 (60919.2027) mem 16696MB [2024-08-04 09:45:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][300/625] eta 0:02:25 lr 0.000091 wd 0.0500 time 0.4420 (0.4482) data time 0.0008 (0.0020) model time 0.4412 (0.4447) loss 6.6888 (6.6128) grad_norm 4.8666 (inf) loss_scale 32768.0000 (59983.9468) mem 16696MB [2024-08-04 09:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][310/625] eta 0:02:21 lr 0.000092 wd 0.0500 time 0.4411 (0.4481) data time 0.0006 (0.0020) model time 0.4405 (0.4447) loss 6.3558 (6.6123) grad_norm 3.5183 (inf) loss_scale 32768.0000 (59108.8360) mem 16696MB [2024-08-04 09:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][320/625] eta 0:02:16 lr 0.000093 wd 0.0500 time 0.4407 (0.4480) data time 0.0009 (0.0019) model time 0.4399 (0.4447) loss 6.6140 (6.6109) grad_norm 2.8587 (inf) loss_scale 32768.0000 (58288.2492) mem 16696MB [2024-08-04 09:45:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][330/625] eta 0:02:12 lr 0.000094 wd 0.0500 time 0.4443 (0.4479) data time 0.0007 (0.0019) model 
time 0.4436 (0.4446) loss 6.7644 (6.6081) grad_norm 3.4594 (inf) loss_scale 32768.0000 (57517.2447) mem 16696MB [2024-08-04 09:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][340/625] eta 0:02:07 lr 0.000094 wd 0.0500 time 0.4415 (0.4478) data time 0.0008 (0.0019) model time 0.4407 (0.4446) loss 6.6719 (6.6070) grad_norm 3.6314 (inf) loss_scale 32768.0000 (56791.4604) mem 16696MB [2024-08-04 09:45:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][350/625] eta 0:02:03 lr 0.000095 wd 0.0500 time 0.4425 (0.4476) data time 0.0008 (0.0018) model time 0.4417 (0.4445) loss 6.6836 (6.6076) grad_norm 3.6256 (inf) loss_scale 32768.0000 (56107.0313) mem 16696MB [2024-08-04 09:46:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][360/625] eta 0:01:58 lr 0.000096 wd 0.0500 time 0.4402 (0.4479) data time 0.0009 (0.0018) model time 0.4393 (0.4450) loss 6.3092 (6.6036) grad_norm 3.9585 (inf) loss_scale 32768.0000 (55460.5208) mem 16696MB [2024-08-04 09:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][370/625] eta 0:01:54 lr 0.000097 wd 0.0500 time 0.4444 (0.4478) data time 0.0009 (0.0018) model time 0.4435 (0.4449) loss 6.4738 (6.6033) grad_norm 2.6942 (inf) loss_scale 32768.0000 (54848.8625) mem 16696MB [2024-08-04 09:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][380/625] eta 0:01:49 lr 0.000098 wd 0.0500 time 0.4439 (0.4488) data time 0.0008 (0.0018) model time 0.4431 (0.4460) loss 6.6683 (6.6008) grad_norm 3.6002 (inf) loss_scale 32768.0000 (54269.3123) mem 16696MB [2024-08-04 09:46:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][390/625] eta 0:01:45 lr 0.000099 wd 0.0500 time 0.4406 (0.4486) data time 0.0007 (0.0017) model time 0.4399 (0.4459) loss 6.3550 (6.6007) grad_norm 3.6254 (inf) loss_scale 32768.0000 (53719.4066) mem 16696MB [2024-08-04 09:46:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][400/625] eta 0:01:40 lr 
0.000100 wd 0.0500 time 0.4434 (0.4485) data time 0.0009 (0.0017) model time 0.4425 (0.4459) loss 6.4220 (6.5978) grad_norm 3.1057 (inf) loss_scale 32768.0000 (53196.9277) mem 16696MB [2024-08-04 09:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][410/625] eta 0:01:36 lr 0.000101 wd 0.0500 time 0.4475 (0.4484) data time 0.0008 (0.0017) model time 0.4467 (0.4458) loss 6.5804 (6.5960) grad_norm 3.5037 (inf) loss_scale 32768.0000 (52699.8735) mem 16696MB [2024-08-04 09:46:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][420/625] eta 0:01:31 lr 0.000102 wd 0.0500 time 0.4449 (0.4483) data time 0.0008 (0.0017) model time 0.4441 (0.4457) loss 6.4117 (6.5931) grad_norm 2.1791 (inf) loss_scale 32768.0000 (52226.4323) mem 16696MB [2024-08-04 09:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][430/625] eta 0:01:27 lr 0.000103 wd 0.0500 time 0.4430 (0.4482) data time 0.0009 (0.0017) model time 0.4422 (0.4456) loss 6.6248 (6.5915) grad_norm 2.2682 (inf) loss_scale 32768.0000 (51774.9606) mem 16696MB [2024-08-04 09:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][440/625] eta 0:01:22 lr 0.000104 wd 0.0500 time 0.4379 (0.4481) data time 0.0006 (0.0017) model time 0.4373 (0.4455) loss 6.1779 (6.5903) grad_norm 3.9457 (inf) loss_scale 32768.0000 (51343.9637) mem 16696MB [2024-08-04 09:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][450/625] eta 0:01:18 lr 0.000105 wd 0.0500 time 0.4483 (0.4480) data time 0.0009 (0.0016) model time 0.4475 (0.4455) loss 6.3888 (6.5875) grad_norm 2.0975 (inf) loss_scale 32768.0000 (50932.0798) mem 16696MB [2024-08-04 09:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][460/625] eta 0:01:13 lr 0.000106 wd 0.0500 time 0.4424 (0.4479) data time 0.0005 (0.0016) model time 0.4418 (0.4454) loss 6.4117 (6.5854) grad_norm 3.6978 (inf) loss_scale 32768.0000 (50538.0651) mem 16696MB [2024-08-04 09:46:51 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [1/300][470/625] eta 0:01:09 lr 0.000107 wd 0.0500 time 0.4444 (0.4479) data time 0.0006 (0.0016) model time 0.4438 (0.4454) loss 6.6619 (6.5823) grad_norm 3.3569 (inf) loss_scale 32768.0000 (50160.7813) mem 16696MB [2024-08-04 09:46:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][480/625] eta 0:01:04 lr 0.000108 wd 0.0500 time 0.4446 (0.4478) data time 0.0008 (0.0016) model time 0.4438 (0.4454) loss 6.4810 (6.5810) grad_norm 2.7158 (inf) loss_scale 32768.0000 (49799.1850) mem 16696MB [2024-08-04 09:47:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][490/625] eta 0:01:00 lr 0.000109 wd 0.0500 time 0.4489 (0.4477) data time 0.0009 (0.0016) model time 0.4479 (0.4453) loss 6.6234 (6.5804) grad_norm 3.8293 (inf) loss_scale 32768.0000 (49452.3177) mem 16696MB [2024-08-04 09:47:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][500/625] eta 0:00:55 lr 0.000110 wd 0.0500 time 0.4429 (0.4476) data time 0.0008 (0.0016) model time 0.4421 (0.4452) loss 6.8191 (6.5790) grad_norm 2.9226 (inf) loss_scale 32768.0000 (49119.2974) mem 16696MB [2024-08-04 09:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][510/625] eta 0:00:51 lr 0.000111 wd 0.0500 time 0.4425 (0.4475) data time 0.0009 (0.0016) model time 0.4415 (0.4452) loss 6.7195 (6.5776) grad_norm 4.3255 (inf) loss_scale 32768.0000 (48799.3112) mem 16696MB [2024-08-04 09:47:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][520/625] eta 0:00:46 lr 0.000112 wd 0.0500 time 0.4409 (0.4475) data time 0.0007 (0.0015) model time 0.4403 (0.4451) loss 6.1561 (6.5751) grad_norm 3.0653 (inf) loss_scale 32768.0000 (48491.6084) mem 16696MB [2024-08-04 09:47:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][530/625] eta 0:00:42 lr 0.000113 wd 0.0500 time 0.4428 (0.4474) data time 0.0007 (0.0015) model time 0.4421 (0.4451) loss 6.7873 (6.5737) grad_norm 3.7133 (inf) loss_scale 32768.0000 
(48195.4953) mem 16696MB [2024-08-04 09:47:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][540/625] eta 0:00:38 lr 0.000114 wd 0.0500 time 0.4461 (0.4473) data time 0.0006 (0.0015) model time 0.4455 (0.4450) loss 6.6450 (6.5706) grad_norm 3.0861 (inf) loss_scale 32768.0000 (47910.3290) mem 16696MB [2024-08-04 09:47:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][550/625] eta 0:00:33 lr 0.000115 wd 0.0500 time 0.4426 (0.4474) data time 0.0009 (0.0015) model time 0.4417 (0.4451) loss 6.6594 (6.5694) grad_norm 2.8821 (inf) loss_scale 32768.0000 (47635.5136) mem 16696MB [2024-08-04 09:47:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][560/625] eta 0:00:29 lr 0.000116 wd 0.0500 time 0.4430 (0.4473) data time 0.0008 (0.0015) model time 0.4421 (0.4450) loss 6.5308 (6.5697) grad_norm 3.0368 (inf) loss_scale 32768.0000 (47370.4955) mem 16696MB [2024-08-04 09:47:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][570/625] eta 0:00:24 lr 0.000117 wd 0.0500 time 0.4421 (0.4472) data time 0.0009 (0.0015) model time 0.4412 (0.4450) loss 6.5551 (6.5684) grad_norm 3.5927 (inf) loss_scale 32768.0000 (47114.7601) mem 16696MB [2024-08-04 09:47:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][580/625] eta 0:00:20 lr 0.000117 wd 0.0500 time 0.4437 (0.4471) data time 0.0006 (0.0015) model time 0.4432 (0.4449) loss 6.2781 (6.5663) grad_norm 3.3838 (inf) loss_scale 32768.0000 (46867.8279) mem 16696MB [2024-08-04 09:47:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][590/625] eta 0:00:15 lr 0.000118 wd 0.0500 time 0.4421 (0.4471) data time 0.0008 (0.0015) model time 0.4414 (0.4449) loss 6.2862 (6.5636) grad_norm 4.5599 (inf) loss_scale 32768.0000 (46629.2521) mem 16696MB [2024-08-04 09:47:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][600/625] eta 0:00:11 lr 0.000119 wd 0.0500 time 0.4418 (0.4471) data time 0.0006 (0.0015) model time 0.4412 (0.4449) 
loss 6.1435 (6.5610) grad_norm 3.0333 (inf) loss_scale 32768.0000 (46398.6156) mem 16696MB [2024-08-04 09:47:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][610/625] eta 0:00:06 lr 0.000120 wd 0.0500 time 0.4436 (0.4470) data time 0.0004 (0.0015) model time 0.4432 (0.4449) loss 6.3744 (6.5589) grad_norm 8.8013 (inf) loss_scale 32768.0000 (46175.5286) mem 16696MB [2024-08-04 09:47:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [1/300][620/625] eta 0:00:02 lr 0.000121 wd 0.0500 time 0.4406 (0.4469) data time 0.0006 (0.0014) model time 0.4400 (0.4448) loss 6.5596 (6.5571) grad_norm 3.3051 (inf) loss_scale 32768.0000 (45959.6264) mem 16696MB [2024-08-04 09:47:59 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 1 training takes 0:04:39 [2024-08-04 09:47:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 09:48:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 09:48:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 5.5273 (5.5273) Acc@1 3.906 (3.906) Acc@5 15.869 (15.869) Mem 16696MB [2024-08-04 09:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 5.8789 (5.5604) Acc@1 1.855 (4.341) Acc@5 9.570 (15.217) Mem 16696MB [2024-08-04 09:48:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 5.7578 (5.5986) Acc@1 2.295 (4.785) Acc@5 8.789 (15.227) Mem 16696MB [2024-08-04 09:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 5.414 Acc@5 16.477 [2024-08-04 09:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 5.4% [2024-08-04 09:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 5.41% [2024-08-04 09:48:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 09:48:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 09:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 6.9922 (6.9922) Acc@1 0.049 (0.049) Acc@5 0.684 (0.684) Mem 16696MB [2024-08-04 09:48:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.151) Loss 6.9844 (6.9890) Acc@1 0.244 (0.133) Acc@5 0.586 (0.506) Mem 16696MB [2024-08-04 09:48:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.9961 (6.9950) Acc@1 0.146 (0.114) Acc@5 0.537 (0.472) Mem 16696MB [2024-08-04 09:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.112 Acc@5 0.506 [2024-08-04 09:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-08-04 09:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.11% [2024-08-04 09:48:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 09:48:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 09:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][0/625] eta 0:07:53 lr 0.000122 wd 0.0500 time 0.7578 (0.7578) data time 0.3656 (0.3656) model time 0.0000 (0.0000) loss 6.4731 (6.4731) grad_norm 2.5088 (2.5088) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][10/625] eta 0:05:01 lr 0.000123 wd 0.0500 time 0.4427 (0.4904) data time 0.0009 (0.0341) model time 0.0000 (0.0000) loss 6.3678 (6.4627) grad_norm 3.7866 (3.2158) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][20/625] eta 0:04:43 lr 0.000124 wd 0.0500 time 0.4470 (0.4686) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 6.0816 (6.4738) grad_norm 2.3027 (3.4095) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][30/625] eta 0:04:33 lr 0.000125 wd 0.0500 time 0.4431 (0.4601) data time 0.0007 (0.0127) model time 0.0000 (0.0000) loss 6.5940 (6.5114) grad_norm 3.2015 (3.3922) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][40/625] eta 0:04:34 lr 0.000126 wd 0.0500 time 0.4419 (0.4689) data time 0.0007 (0.0098) model time 0.0000 (0.0000) loss 6.5439 (6.5120) grad_norm 2.7677 (3.3734) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][50/625] eta 0:04:26 lr 0.000127 wd 0.0500 time 0.4522 (0.4641) data time 0.0007 (0.0081) model time 0.0000 (0.0000) loss 6.6928 (6.5001) grad_norm 5.9343 (3.3264) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][60/625] eta 0:04:20 lr 0.000128 wd 0.0500 time 0.4462 (0.4611) data time 0.0006 (0.0069) model time 0.4456 (0.4450) loss 5.9325 
(6.4799) grad_norm 2.7914 (3.3745) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][70/625] eta 0:04:14 lr 0.000129 wd 0.0500 time 0.4462 (0.4588) data time 0.0010 (0.0060) model time 0.4452 (0.4445) loss 6.0556 (6.4532) grad_norm 2.7614 (3.3403) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][80/625] eta 0:04:09 lr 0.000129 wd 0.0500 time 0.4531 (0.4570) data time 0.0006 (0.0054) model time 0.4525 (0.4442) loss 6.4554 (6.4511) grad_norm 3.2317 (3.3009) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][90/625] eta 0:04:03 lr 0.000130 wd 0.0500 time 0.4464 (0.4557) data time 0.0006 (0.0049) model time 0.4458 (0.4441) loss 6.5342 (6.4497) grad_norm 3.7595 (3.3227) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][100/625] eta 0:03:58 lr 0.000131 wd 0.0500 time 0.4430 (0.4544) data time 0.0008 (0.0045) model time 0.4422 (0.4437) loss 6.5773 (6.4491) grad_norm 2.4083 (3.3230) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][110/625] eta 0:03:53 lr 0.000132 wd 0.0500 time 0.4427 (0.4534) data time 0.0009 (0.0042) model time 0.4419 (0.4435) loss 6.5583 (6.4488) grad_norm 4.7016 (3.3047) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][120/625] eta 0:03:48 lr 0.000133 wd 0.0500 time 0.4424 (0.4526) data time 0.0009 (0.0039) model time 0.4415 (0.4434) loss 6.5297 (6.4481) grad_norm 3.4587 (3.3085) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][130/625] eta 0:03:43 lr 0.000134 wd 0.0500 
time 0.4429 (0.4519) data time 0.0009 (0.0037) model time 0.4420 (0.4433) loss 6.6240 (6.4490) grad_norm 2.0446 (3.2809) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][140/625] eta 0:03:38 lr 0.000135 wd 0.0500 time 0.4446 (0.4513) data time 0.0007 (0.0035) model time 0.4438 (0.4432) loss 6.4743 (6.4480) grad_norm 4.1127 (3.2568) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][150/625] eta 0:03:34 lr 0.000136 wd 0.0500 time 0.4435 (0.4508) data time 0.0008 (0.0033) model time 0.4428 (0.4431) loss 6.4965 (6.4439) grad_norm 4.3065 (3.2484) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][160/625] eta 0:03:29 lr 0.000137 wd 0.0500 time 0.4453 (0.4504) data time 0.0007 (0.0032) model time 0.4446 (0.4431) loss 6.3413 (6.4436) grad_norm 3.0437 (3.2430) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][170/625] eta 0:03:24 lr 0.000138 wd 0.0500 time 0.4469 (0.4501) data time 0.0008 (0.0030) model time 0.4461 (0.4433) loss 6.5766 (6.4407) grad_norm 3.7910 (3.2296) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][180/625] eta 0:03:20 lr 0.000139 wd 0.0500 time 0.4470 (0.4498) data time 0.0006 (0.0029) model time 0.4464 (0.4434) loss 6.4074 (6.4349) grad_norm 2.1892 (3.2117) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][190/625] eta 0:03:15 lr 0.000140 wd 0.0500 time 0.4424 (0.4496) data time 0.0006 (0.0028) model time 0.4418 (0.4434) loss 6.3323 (6.4281) grad_norm 3.1333 (3.1804) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [2/300][200/625] eta 0:03:10 lr 0.000141 wd 0.0500 time 0.4465 (0.4494) data time 0.0007 (0.0027) model time 0.4458 (0.4435) loss 6.3297 (6.4266) grad_norm 3.1349 (3.1888) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][210/625] eta 0:03:06 lr 0.000142 wd 0.0500 time 0.4414 (0.4491) data time 0.0008 (0.0026) model time 0.4406 (0.4435) loss 6.4324 (6.4276) grad_norm 4.0323 (3.1823) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][220/625] eta 0:03:01 lr 0.000143 wd 0.0500 time 0.4417 (0.4489) data time 0.0007 (0.0025) model time 0.4410 (0.4435) loss 6.0888 (6.4238) grad_norm 2.4277 (3.1590) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][230/625] eta 0:02:57 lr 0.000144 wd 0.0500 time 0.4442 (0.4487) data time 0.0007 (0.0025) model time 0.4435 (0.4435) loss 6.3606 (6.4144) grad_norm 2.5979 (3.1723) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:49:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][240/625] eta 0:02:52 lr 0.000145 wd 0.0500 time 0.4429 (0.4485) data time 0.0008 (0.0024) model time 0.4422 (0.4434) loss 6.1637 (6.4029) grad_norm 3.5460 (3.1506) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][250/625] eta 0:02:48 lr 0.000146 wd 0.0500 time 0.4417 (0.4483) data time 0.0008 (0.0023) model time 0.4409 (0.4434) loss 6.2743 (6.3986) grad_norm 2.6301 (3.1465) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][260/625] eta 0:02:43 lr 0.000147 wd 0.0500 time 0.4447 (0.4482) data time 0.0008 (0.0023) model time 0.4439 (0.4434) loss 6.4557 (6.4000) grad_norm 3.1379 (3.1511) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][270/625] eta 0:02:39 lr 0.000148 wd 0.0500 time 0.4447 (0.4480) data time 0.0007 (0.0022) model time 0.4440 (0.4434) loss 6.3859 (6.3993) grad_norm 3.5908 (3.1682) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][280/625] eta 0:02:34 lr 0.000149 wd 0.0500 time 0.4436 (0.4479) data time 0.0006 (0.0022) model time 0.4430 (0.4435) loss 6.1154 (6.3959) grad_norm 2.8516 (3.1497) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][290/625] eta 0:02:30 lr 0.000150 wd 0.0500 time 0.4453 (0.4478) data time 0.0007 (0.0022) model time 0.4445 (0.4435) loss 5.9158 (6.3925) grad_norm 2.5916 (3.1409) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][300/625] eta 0:02:25 lr 0.000151 wd 0.0500 time 0.4395 (0.4477) data time 0.0009 (0.0021) model time 0.4386 (0.4435) loss 6.1085 (6.3877) grad_norm 3.1686 (3.1252) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][310/625] eta 0:02:20 lr 0.000152 wd 0.0500 time 0.4450 (0.4476) data time 0.0007 (0.0021) model time 0.4444 (0.4435) loss 5.9621 (6.3837) grad_norm 3.5549 (3.1265) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][320/625] eta 0:02:16 lr 0.000152 wd 0.0500 time 0.4478 (0.4475) data time 0.0007 (0.0020) model time 0.4471 (0.4435) loss 6.1397 (6.3769) grad_norm 3.6039 (3.1291) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][330/625] eta 0:02:11 lr 0.000153 wd 0.0500 time 0.4452 (0.4474) data time 
0.0006 (0.0020) model time 0.4446 (0.4435) loss 6.0723 (6.3727) grad_norm 2.6893 (3.1231) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][340/625] eta 0:02:07 lr 0.000154 wd 0.0500 time 0.4458 (0.4474) data time 0.0010 (0.0020) model time 0.4448 (0.4435) loss 6.4202 (6.3714) grad_norm 2.4245 (3.1198) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][350/625] eta 0:02:03 lr 0.000155 wd 0.0500 time 0.4443 (0.4473) data time 0.0007 (0.0019) model time 0.4436 (0.4435) loss 6.4535 (6.3735) grad_norm 2.2186 (3.1116) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][360/625] eta 0:01:58 lr 0.000156 wd 0.0500 time 0.4440 (0.4472) data time 0.0008 (0.0019) model time 0.4432 (0.4435) loss 6.3041 (6.3717) grad_norm 2.5991 (3.0988) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][370/625] eta 0:01:54 lr 0.000157 wd 0.0500 time 0.4464 (0.4475) data time 0.0008 (0.0019) model time 0.4456 (0.4440) loss 5.9263 (6.3678) grad_norm 3.0256 (3.0921) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][380/625] eta 0:01:49 lr 0.000158 wd 0.0500 time 0.4445 (0.4480) data time 0.0008 (0.0018) model time 0.4437 (0.4446) loss 6.5443 (6.3663) grad_norm 3.2540 (3.0969) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][390/625] eta 0:01:45 lr 0.000159 wd 0.0500 time 0.4445 (0.4480) data time 0.0006 (0.0018) model time 0.4439 (0.4447) loss 6.1976 (6.3640) grad_norm 2.5458 (3.0893) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [2/300][400/625] eta 0:01:40 lr 0.000160 wd 0.0500 time 0.4418 (0.4479) data time 0.0008 (0.0018) model time 0.4410 (0.4446) loss 5.9454 (6.3559) grad_norm 3.6042 (3.0876) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][410/625] eta 0:01:36 lr 0.000161 wd 0.0500 time 0.4435 (0.4478) data time 0.0009 (0.0018) model time 0.4426 (0.4446) loss 5.9908 (6.3492) grad_norm 4.0542 (3.0959) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][420/625] eta 0:01:31 lr 0.000162 wd 0.0500 time 0.4464 (0.4477) data time 0.0009 (0.0018) model time 0.4456 (0.4445) loss 6.3820 (6.3480) grad_norm 2.9666 (3.0888) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][430/625] eta 0:01:27 lr 0.000163 wd 0.0500 time 0.4390 (0.4476) data time 0.0008 (0.0017) model time 0.4381 (0.4445) loss 6.5990 (6.3453) grad_norm 2.9767 (3.0815) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][440/625] eta 0:01:22 lr 0.000164 wd 0.0500 time 0.4470 (0.4475) data time 0.0006 (0.0017) model time 0.4463 (0.4445) loss 6.0990 (6.3428) grad_norm 3.3646 (3.0759) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][450/625] eta 0:01:18 lr 0.000165 wd 0.0500 time 0.4457 (0.4475) data time 0.0006 (0.0017) model time 0.4451 (0.4445) loss 6.4933 (6.3448) grad_norm 2.9703 (3.0727) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][460/625] eta 0:01:13 lr 0.000166 wd 0.0500 time 0.4471 (0.4474) data time 0.0006 (0.0017) model time 0.4464 (0.4445) loss 5.9858 (6.3404) grad_norm 2.9435 (3.0674) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 09:51:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][470/625] eta 0:01:09 lr 0.000167 wd 0.0500 time 0.4455 (0.4474) data time 0.0007 (0.0017) model time 0.4448 (0.4445) loss 6.1831 (6.3378) grad_norm 3.5663 (3.0708) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][480/625] eta 0:01:04 lr 0.000168 wd 0.0500 time 0.4442 (0.4473) data time 0.0008 (0.0016) model time 0.4433 (0.4444) loss 6.4619 (6.3390) grad_norm 2.4257 (3.0687) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][490/625] eta 0:01:00 lr 0.000169 wd 0.0500 time 0.4454 (0.4473) data time 0.0008 (0.0016) model time 0.4445 (0.4445) loss 6.1167 (6.3357) grad_norm 3.5999 (3.0622) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][500/625] eta 0:00:55 lr 0.000170 wd 0.0500 time 0.4486 (0.4472) data time 0.0008 (0.0016) model time 0.4478 (0.4445) loss 5.9298 (6.3318) grad_norm 2.6128 (3.0506) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:51:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][510/625] eta 0:00:51 lr 0.000171 wd 0.0500 time 0.4449 (0.4472) data time 0.0007 (0.0016) model time 0.4442 (0.4445) loss 6.4766 (6.3303) grad_norm 2.9859 (3.0444) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][520/625] eta 0:00:46 lr 0.000172 wd 0.0500 time 0.4441 (0.4472) data time 0.0008 (0.0016) model time 0.4433 (0.4445) loss 6.0764 (6.3286) grad_norm 2.5151 (3.0368) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][530/625] eta 0:00:42 lr 0.000173 wd 0.0500 time 0.4449 (0.4471) data time 0.0007 (0.0016) model 
time 0.4442 (0.4444) loss 6.3312 (6.3265) grad_norm 2.0575 (3.0305) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][540/625] eta 0:00:38 lr 0.000174 wd 0.0500 time 0.4449 (0.4471) data time 0.0006 (0.0016) model time 0.4443 (0.4444) loss 6.1043 (6.3228) grad_norm 2.8909 (3.0338) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][550/625] eta 0:00:33 lr 0.000175 wd 0.0500 time 0.4537 (0.4471) data time 0.0008 (0.0016) model time 0.4529 (0.4445) loss 5.9933 (6.3224) grad_norm 2.6729 (3.0332) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][560/625] eta 0:00:29 lr 0.000175 wd 0.0500 time 0.4445 (0.4471) data time 0.0008 (0.0015) model time 0.4437 (0.4445) loss 5.8372 (6.3187) grad_norm 2.8719 (3.0297) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][570/625] eta 0:00:24 lr 0.000176 wd 0.0500 time 0.4449 (0.4473) data time 0.0008 (0.0015) model time 0.4441 (0.4448) loss 6.3419 (6.3166) grad_norm 3.7232 (3.0220) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][580/625] eta 0:00:20 lr 0.000177 wd 0.0500 time 0.4445 (0.4473) data time 0.0008 (0.0015) model time 0.4437 (0.4448) loss 6.3759 (6.3161) grad_norm 4.4453 (3.0270) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][590/625] eta 0:00:15 lr 0.000178 wd 0.0500 time 0.4544 (0.4472) data time 0.0007 (0.0015) model time 0.4536 (0.4448) loss 6.2219 (6.3145) grad_norm 2.4692 (3.0265) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][600/625] 
eta 0:00:11 lr 0.000179 wd 0.0500 time 0.4402 (0.4472) data time 0.0006 (0.0015) model time 0.4396 (0.4447) loss 5.7199 (6.3103) grad_norm 2.6786 (3.0266) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][610/625] eta 0:00:06 lr 0.000180 wd 0.0500 time 0.4369 (0.4471) data time 0.0004 (0.0015) model time 0.4365 (0.4447) loss 6.1449 (6.3079) grad_norm 3.7283 (3.0228) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [2/300][620/625] eta 0:00:02 lr 0.000181 wd 0.0500 time 0.4404 (0.4471) data time 0.0006 (0.0015) model time 0.4398 (0.4446) loss 6.3276 (6.3041) grad_norm 2.6477 (3.0301) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 2 training takes 0:04:39 [2024-08-04 09:52:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 09:52:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 09:52:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 4.6992 (4.6992) Acc@1 10.742 (10.742) Acc@5 30.762 (30.762) Mem 16696MB [2024-08-04 09:52:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 5.3789 (4.7248) Acc@1 6.689 (10.485) Acc@5 18.799 (28.693) Mem 16696MB [2024-08-04 09:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 5.3242 (4.9299) Acc@1 5.176 (9.787) Acc@5 17.578 (26.179) Mem 16696MB [2024-08-04 09:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 10.729 Acc@5 27.529 [2024-08-04 09:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 10.7% [2024-08-04 09:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 10.73% [2024-08-04 09:52:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 09:52:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 09:52:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 6.9883 (6.9883) Acc@1 0.098 (0.098) Acc@5 0.537 (0.537) Mem 16696MB [2024-08-04 09:52:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 6.9492 (6.9737) Acc@1 0.244 (0.142) Acc@5 0.439 (0.528) Mem 16696MB [2024-08-04 09:52:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.9766 (6.9779) Acc@1 0.195 (0.126) Acc@5 0.488 (0.514) Mem 16696MB [2024-08-04 09:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.142 Acc@5 0.546 [2024-08-04 09:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-08-04 09:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.14% [2024-08-04 09:52:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 09:53:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 09:53:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][0/625] eta 0:07:46 lr 0.000182 wd 0.0500 time 0.7467 (0.7467) data time 0.3565 (0.3565) model time 0.0000 (0.0000) loss 6.3439 (6.3439) grad_norm 3.2797 (3.2797) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][10/625] eta 0:04:50 lr 0.000183 wd 0.0500 time 0.4488 (0.4722) data time 0.0007 (0.0333) model time 0.0000 (0.0000) loss 6.0379 (6.1364) grad_norm 2.8791 (3.3206) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][20/625] eta 0:04:37 lr 0.000184 wd 0.0500 time 0.4400 (0.4585) data time 0.0008 (0.0179) model time 0.0000 (0.0000) loss 6.3584 (6.0908) grad_norm 2.7119 (3.0469) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][30/625] eta 0:04:29 lr 0.000185 wd 0.0500 time 0.4404 (0.4533) data time 0.0009 (0.0124) model time 0.0000 (0.0000) loss 6.4498 (6.1194) grad_norm 2.4267 (2.9025) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][40/625] eta 0:04:23 lr 0.000186 wd 0.0500 time 0.4419 (0.4509) data time 0.0007 (0.0096) model time 0.0000 (0.0000) loss 6.2481 (6.0980) grad_norm 2.7599 (2.8628) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][50/625] eta 0:04:20 lr 0.000186 wd 0.0500 time 0.4405 (0.4528) data time 0.0007 (0.0079) model time 0.0000 (0.0000) loss 5.5629 (6.0786) grad_norm 2.7815 (2.8406) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][60/625] eta 0:04:15 lr 0.000187 wd 0.0500 time 0.4404 (0.4514) data time 0.0006 (0.0067) model time 0.4398 (0.4435) loss 6.1476 
(6.0910) grad_norm 2.7742 (2.8565) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][70/625] eta 0:04:10 lr 0.000188 wd 0.0500 time 0.4446 (0.4505) data time 0.0008 (0.0059) model time 0.4438 (0.4438) loss 6.0935 (6.0936) grad_norm 3.2837 (2.8869) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][80/625] eta 0:04:05 lr 0.000189 wd 0.0500 time 0.4441 (0.4498) data time 0.0007 (0.0053) model time 0.4434 (0.4438) loss 6.2701 (6.0919) grad_norm 2.4643 (2.8943) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][90/625] eta 0:04:00 lr 0.000190 wd 0.0500 time 0.4431 (0.4491) data time 0.0009 (0.0048) model time 0.4423 (0.4436) loss 5.7184 (6.0938) grad_norm 3.4354 (2.8701) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][100/625] eta 0:03:55 lr 0.000191 wd 0.0500 time 0.4469 (0.4487) data time 0.0008 (0.0044) model time 0.4460 (0.4437) loss 6.4890 (6.1050) grad_norm 2.1966 (2.8115) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][110/625] eta 0:03:50 lr 0.000192 wd 0.0500 time 0.4453 (0.4483) data time 0.0009 (0.0041) model time 0.4444 (0.4436) loss 5.8478 (6.0951) grad_norm 2.5269 (2.7930) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][120/625] eta 0:03:46 lr 0.000193 wd 0.0500 time 0.4440 (0.4480) data time 0.0007 (0.0039) model time 0.4432 (0.4436) loss 6.2305 (6.0871) grad_norm 3.1651 (2.7696) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:53:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][130/625] eta 0:03:41 lr 0.000194 wd 0.0500 
time 0.4472 (0.4477) data time 0.0006 (0.0036) model time 0.4466 (0.4436) loss 6.2577 (6.0949) grad_norm 2.6241 (2.7987) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][140/625] eta 0:03:37 lr 0.000195 wd 0.0500 time 0.4457 (0.4487) data time 0.0008 (0.0035) model time 0.4449 (0.4454) loss 5.9273 (6.0876) grad_norm 4.0020 (2.8202) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][150/625] eta 0:03:32 lr 0.000196 wd 0.0500 time 0.4458 (0.4483) data time 0.0009 (0.0033) model time 0.4449 (0.4451) loss 6.2415 (6.0983) grad_norm 2.2429 (2.8379) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][160/625] eta 0:03:28 lr 0.000197 wd 0.0500 time 0.4452 (0.4481) data time 0.0008 (0.0031) model time 0.4444 (0.4450) loss 6.3730 (6.1033) grad_norm 2.8829 (2.8321) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][170/625] eta 0:03:23 lr 0.000198 wd 0.0500 time 0.4426 (0.4479) data time 0.0007 (0.0030) model time 0.4419 (0.4448) loss 5.7280 (6.1045) grad_norm 2.4334 (2.8290) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][180/625] eta 0:03:19 lr 0.000199 wd 0.0500 time 0.4417 (0.4477) data time 0.0007 (0.0029) model time 0.4410 (0.4447) loss 5.7071 (6.1072) grad_norm 2.7507 (2.8228) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][190/625] eta 0:03:14 lr 0.000200 wd 0.0500 time 0.4406 (0.4474) data time 0.0008 (0.0028) model time 0.4398 (0.4445) loss 5.9893 (6.1016) grad_norm 3.0958 (2.8232) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:30 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [3/300][200/625] eta 0:03:10 lr 0.000201 wd 0.0500 time 0.4429 (0.4472) data time 0.0007 (0.0027) model time 0.4421 (0.4444) loss 6.2172 (6.1061) grad_norm 2.6953 (2.8234) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][210/625] eta 0:03:05 lr 0.000202 wd 0.0500 time 0.4475 (0.4471) data time 0.0007 (0.0026) model time 0.4469 (0.4444) loss 6.2435 (6.1036) grad_norm 3.2335 (2.8202) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][220/625] eta 0:03:01 lr 0.000203 wd 0.0500 time 0.4423 (0.4470) data time 0.0006 (0.0026) model time 0.4417 (0.4443) loss 6.3871 (6.1074) grad_norm 2.3922 (2.7978) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][230/625] eta 0:02:56 lr 0.000204 wd 0.0500 time 0.4432 (0.4469) data time 0.0007 (0.0025) model time 0.4425 (0.4442) loss 6.3953 (6.1075) grad_norm 2.0167 (2.7871) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][240/625] eta 0:02:52 lr 0.000205 wd 0.0500 time 0.4492 (0.4468) data time 0.0007 (0.0024) model time 0.4485 (0.4443) loss 6.2572 (6.1078) grad_norm 2.7368 (2.7951) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][250/625] eta 0:02:47 lr 0.000206 wd 0.0500 time 0.4417 (0.4467) data time 0.0008 (0.0024) model time 0.4409 (0.4442) loss 6.3036 (6.1050) grad_norm 3.0814 (2.7846) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][260/625] eta 0:02:42 lr 0.000207 wd 0.0500 time 0.4425 (0.4466) data time 0.0009 (0.0023) model time 0.4417 (0.4441) loss 6.4528 (6.1084) grad_norm 2.6767 (2.7747) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][270/625] eta 0:02:38 lr 0.000208 wd 0.0500 time 0.4440 (0.4465) data time 0.0009 (0.0023) model time 0.4431 (0.4441) loss 6.4521 (6.1069) grad_norm 2.8947 (2.7663) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][280/625] eta 0:02:33 lr 0.000209 wd 0.0500 time 0.4452 (0.4464) data time 0.0008 (0.0022) model time 0.4444 (0.4440) loss 6.3206 (6.1061) grad_norm 2.7879 (2.7559) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][290/625] eta 0:02:29 lr 0.000209 wd 0.0500 time 0.4340 (0.4463) data time 0.0007 (0.0022) model time 0.4333 (0.4440) loss 6.2580 (6.1062) grad_norm 2.7084 (2.7665) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][300/625] eta 0:02:25 lr 0.000210 wd 0.0500 time 0.4414 (0.4462) data time 0.0009 (0.0021) model time 0.4405 (0.4440) loss 6.2670 (6.1023) grad_norm 2.6233 (2.7737) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][310/625] eta 0:02:20 lr 0.000211 wd 0.0500 time 0.4433 (0.4462) data time 0.0007 (0.0021) model time 0.4427 (0.4439) loss 6.4255 (6.1023) grad_norm 2.0560 (2.7678) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][320/625] eta 0:02:16 lr 0.000212 wd 0.0500 time 0.4427 (0.4461) data time 0.0007 (0.0020) model time 0.4421 (0.4438) loss 5.4207 (6.1031) grad_norm 3.3955 (2.7693) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][330/625] eta 0:02:11 lr 0.000213 wd 0.0500 time 0.4477 (0.4460) data time 
0.0008 (0.0020) model time 0.4469 (0.4439) loss 5.7377 (6.0977) grad_norm 2.7314 (2.7785) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][340/625] eta 0:02:07 lr 0.000214 wd 0.0500 time 0.4409 (0.4459) data time 0.0008 (0.0020) model time 0.4401 (0.4438) loss 5.9161 (6.0951) grad_norm 4.0566 (2.7731) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][350/625] eta 0:02:02 lr 0.000215 wd 0.0500 time 0.4431 (0.4459) data time 0.0007 (0.0019) model time 0.4424 (0.4438) loss 6.4695 (6.0912) grad_norm 1.8346 (2.7684) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][360/625] eta 0:01:58 lr 0.000216 wd 0.0500 time 0.4451 (0.4464) data time 0.0009 (0.0019) model time 0.4442 (0.4444) loss 6.2139 (6.0853) grad_norm 2.9225 (2.7648) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][370/625] eta 0:01:53 lr 0.000217 wd 0.0500 time 0.4421 (0.4463) data time 0.0007 (0.0019) model time 0.4414 (0.4444) loss 6.0263 (6.0831) grad_norm 2.3841 (2.7532) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][380/625] eta 0:01:49 lr 0.000218 wd 0.0500 time 0.4441 (0.4467) data time 0.0006 (0.0019) model time 0.4435 (0.4448) loss 6.3655 (6.0806) grad_norm 2.1267 (2.7414) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][390/625] eta 0:01:44 lr 0.000219 wd 0.0500 time 0.4405 (0.4467) data time 0.0008 (0.0018) model time 0.4397 (0.4448) loss 6.0143 (6.0795) grad_norm 2.8524 (2.7341) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [3/300][400/625] eta 0:01:40 lr 0.000220 wd 0.0500 time 0.4514 (0.4466) data time 0.0006 (0.0018) model time 0.4509 (0.4448) loss 6.0327 (6.0794) grad_norm 2.2861 (2.7273) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][410/625] eta 0:01:36 lr 0.000221 wd 0.0500 time 0.4434 (0.4466) data time 0.0007 (0.0018) model time 0.4427 (0.4448) loss 6.1586 (6.0759) grad_norm 3.1344 (2.7234) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][420/625] eta 0:01:31 lr 0.000222 wd 0.0500 time 0.4433 (0.4465) data time 0.0008 (0.0018) model time 0.4425 (0.4448) loss 6.2725 (6.0732) grad_norm 3.0278 (2.7168) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][430/625] eta 0:01:27 lr 0.000223 wd 0.0500 time 0.4451 (0.4465) data time 0.0008 (0.0017) model time 0.4444 (0.4447) loss 6.1872 (6.0710) grad_norm 4.3521 (2.7269) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][440/625] eta 0:01:22 lr 0.000224 wd 0.0500 time 0.4459 (0.4464) data time 0.0006 (0.0017) model time 0.4453 (0.4447) loss 5.9782 (6.0688) grad_norm 3.4508 (2.7310) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][450/625] eta 0:01:18 lr 0.000225 wd 0.0500 time 0.4466 (0.4464) data time 0.0009 (0.0017) model time 0.4458 (0.4447) loss 6.0691 (6.0707) grad_norm 1.9054 (2.7271) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][460/625] eta 0:01:13 lr 0.000226 wd 0.0500 time 0.4436 (0.4464) data time 0.0007 (0.0017) model time 0.4429 (0.4446) loss 6.3101 (6.0713) grad_norm 3.3327 (2.7242) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 09:56:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][470/625] eta 0:01:09 lr 0.000227 wd 0.0500 time 0.4463 (0.4463) data time 0.0008 (0.0017) model time 0.4455 (0.4446) loss 5.9205 (6.0683) grad_norm 2.9701 (2.7340) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][480/625] eta 0:01:04 lr 0.000228 wd 0.0500 time 0.4427 (0.4463) data time 0.0008 (0.0017) model time 0.4419 (0.4446) loss 5.7072 (6.0656) grad_norm 2.2898 (2.7319) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][490/625] eta 0:01:00 lr 0.000229 wd 0.0500 time 0.4424 (0.4462) data time 0.0008 (0.0016) model time 0.4416 (0.4445) loss 6.4096 (6.0644) grad_norm 4.9552 (2.7317) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][500/625] eta 0:00:55 lr 0.000230 wd 0.0500 time 0.4433 (0.4462) data time 0.0007 (0.0016) model time 0.4426 (0.4445) loss 6.1518 (6.0601) grad_norm 2.3258 (2.7321) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][510/625] eta 0:00:51 lr 0.000231 wd 0.0500 time 0.4459 (0.4465) data time 0.0007 (0.0016) model time 0.4452 (0.4449) loss 6.0920 (6.0582) grad_norm 2.2195 (2.7264) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][520/625] eta 0:00:46 lr 0.000232 wd 0.0500 time 0.4399 (0.4465) data time 0.0008 (0.0016) model time 0.4391 (0.4449) loss 6.1618 (6.0542) grad_norm 2.9930 (2.7187) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:56:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][530/625] eta 0:00:42 lr 0.000232 wd 0.0500 time 0.4413 (0.4464) data time 0.0007 (0.0016) model 
time 0.4406 (0.4448) loss 6.3272 (6.0533) grad_norm 2.6716 (2.7168) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][540/625] eta 0:00:37 lr 0.000233 wd 0.0500 time 0.4438 (0.4464) data time 0.0007 (0.0016) model time 0.4431 (0.4448) loss 5.3096 (6.0489) grad_norm 2.9916 (2.7089) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][550/625] eta 0:00:33 lr 0.000234 wd 0.0500 time 0.4456 (0.4463) data time 0.0006 (0.0016) model time 0.4449 (0.4448) loss 5.2176 (6.0446) grad_norm 3.1972 (2.7070) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][560/625] eta 0:00:29 lr 0.000235 wd 0.0500 time 0.4425 (0.4463) data time 0.0008 (0.0016) model time 0.4417 (0.4447) loss 6.1792 (6.0460) grad_norm 2.9714 (2.7066) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][570/625] eta 0:00:24 lr 0.000236 wd 0.0500 time 0.4404 (0.4463) data time 0.0007 (0.0015) model time 0.4397 (0.4448) loss 5.6586 (6.0446) grad_norm 2.1957 (2.6977) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][580/625] eta 0:00:20 lr 0.000237 wd 0.0500 time 0.4397 (0.4463) data time 0.0006 (0.0015) model time 0.4391 (0.4447) loss 6.0725 (6.0466) grad_norm 2.4234 (2.7021) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][590/625] eta 0:00:15 lr 0.000238 wd 0.0500 time 0.4426 (0.4462) data time 0.0006 (0.0015) model time 0.4420 (0.4447) loss 5.5808 (6.0398) grad_norm 2.3074 (2.6943) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][600/625] 
eta 0:00:11 lr 0.000239 wd 0.0500 time 0.4514 (0.4462) data time 0.0006 (0.0015) model time 0.4507 (0.4446) loss 5.6154 (6.0392) grad_norm 2.1299 (2.6888) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][610/625] eta 0:00:06 lr 0.000240 wd 0.0500 time 0.4458 (0.4461) data time 0.0004 (0.0015) model time 0.4454 (0.4446) loss 6.3454 (6.0388) grad_norm 2.2246 (2.6864) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [3/300][620/625] eta 0:00:02 lr 0.000241 wd 0.0500 time 0.4381 (0.4460) data time 0.0006 (0.0015) model time 0.4374 (0.4445) loss 5.6039 (6.0362) grad_norm 1.9903 (2.6817) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 3 training takes 0:04:38 [2024-08-04 09:57:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 09:57:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 09:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 3.7598 (3.7598) Acc@1 25.342 (25.342) Acc@5 50.439 (50.439) Mem 16696MB
[2024-08-04 09:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 4.6758 (3.9714) Acc@1 12.842 (19.922) Acc@5 31.885 (44.234) Mem 16696MB
[2024-08-04 09:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 4.5469 (4.1855) Acc@1 11.279 (18.092) Acc@5 31.104 (40.204) Mem 16696MB
[2024-08-04 09:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 18.816 Acc@5 41.131
[2024-08-04 09:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 18.8%
[2024-08-04 09:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 18.82%
[2024-08-04 09:57:44 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 09:57:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 09:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 6.9805 (6.9805) Acc@1 0.049 (0.049) Acc@5 0.635 (0.635) Mem 16696MB [2024-08-04 09:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 6.8984 (6.9684) Acc@1 0.146 (0.115) Acc@5 0.635 (0.688) Mem 16696MB [2024-08-04 09:57:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.9648 (6.9702) Acc@1 0.146 (0.123) Acc@5 0.488 (0.625) Mem 16696MB [2024-08-04 09:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.136 Acc@5 0.660 [2024-08-04 09:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-08-04 09:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][0/625] eta 0:12:52 lr 0.000242 wd 0.0500 time 1.2357 (1.2357) data time 0.4972 (0.4972) model time 0.0000 (0.0000) loss 6.2970 (6.2970) grad_norm 2.7726 (2.7726) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][10/625] eta 0:05:17 lr 0.000243 wd 0.0500 time 0.4410 (0.5156) data time 0.0006 (0.0460) model time 0.0000 (0.0000) loss 6.2433 (5.7769) grad_norm 2.0523 (2.4009) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][20/625] eta 0:04:51 lr 0.000244 wd 0.0500 time 0.4439 (0.4819) data time 0.0008 (0.0245) model time 0.0000 (0.0000) loss 5.8505 (5.8918) grad_norm 3.2041 (2.5661) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][30/625] eta 0:04:39 lr 0.000244 wd 0.0500 time 0.4453 (0.4705) data time 0.0008 (0.0169) model time 0.0000 (0.0000) loss 6.2631 (5.9305) grad_norm 2.1310 (2.4439) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:08 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [4/300][40/625] eta 0:04:31 lr 0.000245 wd 0.0500 time 0.4443 (0.4642) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 5.4839 (5.9015) grad_norm 1.9685 (2.3981) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][50/625] eta 0:04:24 lr 0.000246 wd 0.0500 time 0.4404 (0.4602) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 6.1110 (5.9288) grad_norm 2.1104 (2.3654) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][60/625] eta 0:04:18 lr 0.000247 wd 0.0500 time 0.4465 (0.4576) data time 0.0006 (0.0090) model time 0.4458 (0.4432) loss 6.2270 (5.9728) grad_norm 2.4590 (2.3873) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][70/625] eta 0:04:12 lr 0.000248 wd 0.0500 time 0.4432 (0.4557) data time 0.0007 (0.0079) model time 0.4425 (0.4432) loss 6.2579 (5.9872) grad_norm 2.2317 (2.4170) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][80/625] eta 0:04:07 lr 0.000249 wd 0.0500 time 0.4411 (0.4543) data time 0.0009 (0.0070) model time 0.4402 (0.4433) loss 5.7841 (5.9770) grad_norm 1.9451 (2.4013) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][90/625] eta 0:04:02 lr 0.000250 wd 0.0500 time 0.4482 (0.4533) data time 0.0006 (0.0063) model time 0.4475 (0.4435) loss 6.2171 (5.9562) grad_norm 2.3514 (2.4249) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][100/625] eta 0:03:58 lr 0.000251 wd 0.0500 time 0.4518 (0.4548) data time 0.0006 (0.0058) model time 0.4512 (0.4483) loss 6.2774 (5.9574) grad_norm 2.5981 (2.4122) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][110/625] eta 0:03:53 lr 0.000252 wd 0.0500 time 0.4459 (0.4539) data time 0.0007 (0.0054) model time 0.4452 (0.4476) loss 6.1269 (5.9312) grad_norm 2.3316 (2.4227) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][120/625] eta 0:03:48 lr 0.000253 wd 0.0500 time 0.4433 (0.4531) data time 0.0007 (0.0050) model time 0.4427 (0.4470) loss 5.8028 (5.9326) grad_norm 2.7217 (2.4234) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][130/625] eta 0:03:43 lr 0.000254 wd 0.0500 time 0.4407 (0.4524) data time 0.0010 (0.0047) model time 0.4397 (0.4466) loss 5.9286 (5.9241) grad_norm 2.3016 (2.4226) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][140/625] eta 0:03:39 lr 0.000255 wd 0.0500 time 0.4447 (0.4519) data time 0.0008 (0.0044) model time 0.4440 (0.4463) loss 6.2656 (5.9272) grad_norm 1.7940 (2.4041) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][150/625] eta 0:03:34 lr 0.000256 wd 0.0500 time 0.4420 (0.4515) data time 0.0008 (0.0042) model time 0.4411 (0.4461) loss 5.3649 (5.9256) grad_norm 2.4922 (2.3963) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][160/625] eta 0:03:29 lr 0.000257 wd 0.0500 time 0.4423 (0.4510) data time 0.0008 (0.0040) model time 0.4415 (0.4458) loss 5.5872 (5.9243) grad_norm 2.8404 (2.3840) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][170/625] eta 0:03:25 lr 0.000258 wd 0.0500 time 0.4430 (0.4506) data time 
0.0008 (0.0038) model time 0.4422 (0.4456) loss 6.2031 (5.9254) grad_norm 2.4487 (2.3887) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][180/625] eta 0:03:20 lr 0.000259 wd 0.0500 time 0.4437 (0.4503) data time 0.0007 (0.0036) model time 0.4430 (0.4455) loss 6.0167 (5.9322) grad_norm 2.8208 (2.4308) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][190/625] eta 0:03:15 lr 0.000260 wd 0.0500 time 0.4446 (0.4500) data time 0.0008 (0.0035) model time 0.4438 (0.4454) loss 6.2487 (5.9319) grad_norm 2.3158 (2.4203) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][200/625] eta 0:03:11 lr 0.000261 wd 0.0500 time 0.4456 (0.4498) data time 0.0008 (0.0034) model time 0.4448 (0.4453) loss 5.9681 (5.9327) grad_norm 2.5011 (2.4312) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][210/625] eta 0:03:06 lr 0.000262 wd 0.0500 time 0.4445 (0.4495) data time 0.0008 (0.0032) model time 0.4436 (0.4452) loss 5.8318 (5.9377) grad_norm 2.4425 (2.4395) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][220/625] eta 0:03:01 lr 0.000263 wd 0.0500 time 0.4440 (0.4493) data time 0.0008 (0.0031) model time 0.4432 (0.4451) loss 5.9076 (5.9338) grad_norm 2.0123 (2.4266) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][230/625] eta 0:02:57 lr 0.000264 wd 0.0500 time 0.4485 (0.4491) data time 0.0008 (0.0030) model time 0.4477 (0.4451) loss 5.7949 (5.9360) grad_norm 2.0218 (2.4143) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [4/300][240/625] eta 0:02:52 lr 0.000265 wd 0.0500 time 0.4443 (0.4489) data time 0.0008 (0.0029) model time 0.4435 (0.4450) loss 5.4833 (5.9232) grad_norm 3.6567 (2.4289) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][250/625] eta 0:02:48 lr 0.000266 wd 0.0500 time 0.4417 (0.4487) data time 0.0007 (0.0029) model time 0.4411 (0.4449) loss 5.6798 (5.9192) grad_norm 2.9317 (2.4432) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][260/625] eta 0:02:43 lr 0.000267 wd 0.0500 time 0.4423 (0.4485) data time 0.0008 (0.0028) model time 0.4414 (0.4448) loss 5.4432 (5.9112) grad_norm 2.1141 (2.4347) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][270/625] eta 0:02:39 lr 0.000267 wd 0.0500 time 0.4422 (0.4483) data time 0.0008 (0.0027) model time 0.4415 (0.4447) loss 6.0646 (5.9168) grad_norm 2.6501 (2.4242) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][280/625] eta 0:02:34 lr 0.000268 wd 0.0500 time 0.4478 (0.4482) data time 0.0009 (0.0027) model time 0.4469 (0.4446) loss 6.1895 (5.9113) grad_norm 2.1570 (2.4202) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 09:59:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][290/625] eta 0:02:30 lr 0.000269 wd 0.0500 time 0.4404 (0.4480) data time 0.0009 (0.0026) model time 0.4396 (0.4446) loss 6.1680 (5.9040) grad_norm 2.1806 (2.4090) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][300/625] eta 0:02:25 lr 0.000270 wd 0.0500 time 0.4426 (0.4479) data time 0.0008 (0.0025) model time 0.4418 (0.4445) loss 6.1249 (5.8994) grad_norm 2.9698 (2.4005) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 10:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][310/625] eta 0:02:21 lr 0.000271 wd 0.0500 time 0.4415 (0.4478) data time 0.0007 (0.0025) model time 0.4408 (0.4444) loss 5.9166 (5.8998) grad_norm 2.8499 (2.4023) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][320/625] eta 0:02:16 lr 0.000272 wd 0.0500 time 0.4453 (0.4481) data time 0.0006 (0.0024) model time 0.4447 (0.4449) loss 5.8164 (5.8971) grad_norm 3.3485 (2.4041) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][330/625] eta 0:02:12 lr 0.000273 wd 0.0500 time 0.4426 (0.4479) data time 0.0006 (0.0024) model time 0.4420 (0.4448) loss 5.4513 (5.8894) grad_norm 2.2867 (2.4024) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][340/625] eta 0:02:07 lr 0.000274 wd 0.0500 time 0.4416 (0.4478) data time 0.0010 (0.0023) model time 0.4406 (0.4447) loss 5.1595 (5.8852) grad_norm 2.3190 (2.4010) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][350/625] eta 0:02:03 lr 0.000275 wd 0.0500 time 0.4405 (0.4477) data time 0.0009 (0.0023) model time 0.4396 (0.4446) loss 5.6258 (5.8762) grad_norm 2.2475 (2.4004) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][360/625] eta 0:01:58 lr 0.000276 wd 0.0500 time 0.4514 (0.4476) data time 0.0007 (0.0023) model time 0.4507 (0.4446) loss 5.0978 (5.8620) grad_norm 2.1356 (2.4019) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][370/625] eta 0:01:54 lr 0.000277 wd 0.0500 time 0.4461 (0.4475) data time 0.0006 (0.0022) model 
time 0.4455 (0.4445) loss 5.2918 (5.8567) grad_norm 2.0678 (2.4051) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][380/625] eta 0:01:49 lr 0.000278 wd 0.0500 time 0.4476 (0.4474) data time 0.0009 (0.0022) model time 0.4467 (0.4445) loss 5.7643 (5.8518) grad_norm 1.7427 (2.3988) loss_scale 65536.0000 (33284.0315) mem 16696MB [2024-08-04 10:00:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][390/625] eta 0:01:45 lr 0.000279 wd 0.0500 time 0.4437 (0.4473) data time 0.0009 (0.0022) model time 0.4428 (0.4444) loss 6.0681 (5.8443) grad_norm 1.8023 (2.3913) loss_scale 65536.0000 (34108.8900) mem 16696MB [2024-08-04 10:00:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][400/625] eta 0:01:40 lr 0.000280 wd 0.0500 time 0.4446 (0.4472) data time 0.0008 (0.0021) model time 0.4438 (0.4444) loss 5.4080 (5.8432) grad_norm 2.2324 (2.3880) loss_scale 65536.0000 (34892.6085) mem 16696MB [2024-08-04 10:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][410/625] eta 0:01:36 lr 0.000281 wd 0.0500 time 0.4424 (0.4471) data time 0.0010 (0.0021) model time 0.4414 (0.4443) loss 5.9693 (5.8356) grad_norm 2.4380 (2.3873) loss_scale 65536.0000 (35638.1898) mem 16696MB [2024-08-04 10:00:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][420/625] eta 0:01:31 lr 0.000282 wd 0.0500 time 0.4443 (0.4470) data time 0.0007 (0.0021) model time 0.4437 (0.4443) loss 5.2402 (5.8347) grad_norm 1.6298 (2.3787) loss_scale 65536.0000 (36348.3515) mem 16696MB [2024-08-04 10:01:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][430/625] eta 0:01:27 lr 0.000283 wd 0.0500 time 0.4393 (0.4475) data time 0.0007 (0.0020) model time 0.4386 (0.4448) loss 5.6093 (5.8363) grad_norm 1.8082 (2.3679) loss_scale 65536.0000 (37025.5592) mem 16696MB [2024-08-04 10:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][440/625] 
eta 0:01:22 lr 0.000284 wd 0.0500 time 0.4434 (0.4474) data time 0.0007 (0.0020) model time 0.4428 (0.4448) loss 5.9452 (5.8378) grad_norm 2.4137 (2.3626) loss_scale 65536.0000 (37672.0544) mem 16696MB [2024-08-04 10:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][450/625] eta 0:01:18 lr 0.000285 wd 0.0500 time 0.4436 (0.4473) data time 0.0007 (0.0020) model time 0.4430 (0.4447) loss 5.8390 (5.8344) grad_norm 2.2080 (2.3620) loss_scale 65536.0000 (38289.8803) mem 16696MB [2024-08-04 10:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][460/625] eta 0:01:13 lr 0.000286 wd 0.0500 time 0.4427 (0.4475) data time 0.0008 (0.0020) model time 0.4418 (0.4451) loss 5.7321 (5.8325) grad_norm 2.1606 (2.3580) loss_scale 65536.0000 (38880.9024) mem 16696MB [2024-08-04 10:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][470/625] eta 0:01:09 lr 0.000287 wd 0.0500 time 0.4437 (0.4474) data time 0.0007 (0.0019) model time 0.4430 (0.4450) loss 5.9261 (5.8318) grad_norm 2.4528 (2.3549) loss_scale 65536.0000 (39446.8280) mem 16696MB [2024-08-04 10:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][480/625] eta 0:01:04 lr 0.000288 wd 0.0500 time 0.4434 (0.4474) data time 0.0007 (0.0019) model time 0.4428 (0.4450) loss 5.0394 (5.8300) grad_norm 1.9635 (2.3531) loss_scale 65536.0000 (39989.2225) mem 16696MB [2024-08-04 10:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][490/625] eta 0:01:00 lr 0.000289 wd 0.0500 time 0.4427 (0.4473) data time 0.0009 (0.0019) model time 0.4418 (0.4449) loss 6.0846 (5.8270) grad_norm 2.3651 (2.3508) loss_scale 65536.0000 (40509.5234) mem 16696MB [2024-08-04 10:01:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][500/625] eta 0:00:55 lr 0.000290 wd 0.0500 time 0.4429 (0.4473) data time 0.0006 (0.0019) model time 0.4423 (0.4449) loss 5.4446 (5.8250) grad_norm 2.8464 (2.3527) loss_scale 65536.0000 (41009.0539) mem 16696MB 
[2024-08-04 10:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][510/625] eta 0:00:51 lr 0.000290 wd 0.0500 time 0.4457 (0.4472) data time 0.0007 (0.0019) model time 0.4450 (0.4449) loss 6.0272 (5.8217) grad_norm 2.4902 (2.3504) loss_scale 65536.0000 (41489.0333) mem 16696MB [2024-08-04 10:01:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][520/625] eta 0:00:46 lr 0.000291 wd 0.0500 time 0.4421 (0.4472) data time 0.0007 (0.0018) model time 0.4414 (0.4449) loss 6.0326 (5.8197) grad_norm 1.7744 (2.3432) loss_scale 65536.0000 (41950.5873) mem 16696MB [2024-08-04 10:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][530/625] eta 0:00:42 lr 0.000292 wd 0.0500 time 0.4463 (0.4472) data time 0.0007 (0.0018) model time 0.4457 (0.4449) loss 5.8267 (5.8119) grad_norm 2.7667 (2.3432) loss_scale 65536.0000 (42394.7571) mem 16696MB [2024-08-04 10:01:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][540/625] eta 0:00:38 lr 0.000293 wd 0.0500 time 0.4446 (0.4471) data time 0.0006 (0.0018) model time 0.4440 (0.4449) loss 4.8123 (5.8118) grad_norm 2.1599 (2.3410) loss_scale 65536.0000 (42822.5065) mem 16696MB [2024-08-04 10:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][550/625] eta 0:00:33 lr 0.000294 wd 0.0500 time 0.4397 (0.4471) data time 0.0009 (0.0018) model time 0.4389 (0.4449) loss 6.0304 (5.8100) grad_norm 2.7128 (2.3376) loss_scale 65536.0000 (43234.7296) mem 16696MB [2024-08-04 10:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][560/625] eta 0:00:29 lr 0.000295 wd 0.0500 time 0.4439 (0.4471) data time 0.0007 (0.0018) model time 0.4433 (0.4449) loss 5.9704 (5.8062) grad_norm 1.9532 (2.3369) loss_scale 65536.0000 (43632.2567) mem 16696MB [2024-08-04 10:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][570/625] eta 0:00:24 lr 0.000296 wd 0.0500 time 0.4505 (0.4470) data time 0.0008 (0.0018) model time 0.4497 (0.4448) loss 
5.6833 (5.8023) grad_norm 2.3972 (2.3330) loss_scale 65536.0000 (44015.8599) mem 16696MB [2024-08-04 10:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][580/625] eta 0:00:20 lr 0.000297 wd 0.0500 time 0.4406 (0.4470) data time 0.0006 (0.0017) model time 0.4400 (0.4448) loss 5.2241 (5.7990) grad_norm 1.8964 (2.3279) loss_scale 65536.0000 (44386.2582) mem 16696MB [2024-08-04 10:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][590/625] eta 0:00:15 lr 0.000298 wd 0.0500 time 0.4450 (0.4470) data time 0.0008 (0.0017) model time 0.4443 (0.4448) loss 5.6682 (5.7958) grad_norm 1.9592 (2.3235) loss_scale 65536.0000 (44744.1218) mem 16696MB [2024-08-04 10:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][600/625] eta 0:00:11 lr 0.000299 wd 0.0500 time 0.4445 (0.4469) data time 0.0007 (0.0017) model time 0.4438 (0.4448) loss 5.9313 (5.7959) grad_norm 1.8574 (2.3223) loss_scale 65536.0000 (45090.0765) mem 16696MB [2024-08-04 10:02:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][610/625] eta 0:00:06 lr 0.000300 wd 0.0500 time 0.4389 (0.4469) data time 0.0006 (0.0017) model time 0.4383 (0.4448) loss 5.7495 (5.7938) grad_norm 2.2196 (2.3161) loss_scale 65536.0000 (45424.7070) mem 16696MB [2024-08-04 10:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [4/300][620/625] eta 0:00:02 lr 0.000301 wd 0.0500 time 0.4399 (0.4471) data time 0.0003 (0.0017) model time 0.4396 (0.4450) loss 5.5333 (5.7935) grad_norm 1.9347 (2.3125) loss_scale 65536.0000 (45748.5604) mem 16696MB [2024-08-04 10:02:28 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 4 training takes 0:04:39 [2024-08-04 10:02:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:02:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 2.9746 (2.9746) Acc@1 37.061 (37.061) Acc@5 65.088 (65.088) Mem 16696MB
[2024-08-04 10:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 4.0820 (3.3580) Acc@1 21.045 (28.729) Acc@5 42.041 (55.961) Mem 16696MB
[2024-08-04 10:02:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 4.2148 (3.6526) Acc@1 15.576 (25.379) Acc@5 36.816 (50.228) Mem 16696MB
[2024-08-04 10:02:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 26.176 Acc@5 50.998
[2024-08-04 10:02:33 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 26.2%
[2024-08-04 10:02:33 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 26.18%
[2024-08-04 10:02:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:02:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 6.9570 (6.9570) Acc@1 0.244 (0.244) Acc@5 0.879 (0.879) Mem 16696MB [2024-08-04 10:02:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.152) Loss 6.9336 (6.9918) Acc@1 0.049 (0.107) Acc@5 0.586 (0.599) Mem 16696MB [2024-08-04 10:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.9648 (6.9907) Acc@1 0.244 (0.112) Acc@5 0.977 (0.660) Mem 16696MB [2024-08-04 10:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.126 Acc@5 0.734 [2024-08-04 10:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.1% [2024-08-04 10:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][0/625] eta 0:12:43 lr 0.000301 wd 0.0500 time 1.2209 (1.2209) data time 0.7038 (0.7038) model time 0.0000 (0.0000) loss 5.1908 (5.1908) grad_norm 2.5978 (2.5978) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][10/625] eta 0:05:16 lr 0.000302 wd 0.0500 time 0.4434 (0.5145) data time 0.0008 (0.0648) model time 0.0000 (0.0000) loss 5.9476 (5.6617) grad_norm 2.4595 (2.4043) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][20/625] eta 0:04:50 lr 0.000303 wd 0.0500 time 0.4462 (0.4808) data time 0.0007 (0.0343) model time 0.0000 (0.0000) loss 5.4209 (5.5199) grad_norm 1.7398 (2.3203) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][30/625] eta 0:04:39 lr 0.000304 wd 0.0500 time 0.4434 (0.4691) data time 0.0008 (0.0236) model time 0.0000 (0.0000) loss 6.1718 (5.5973) grad_norm 1.7766 (2.2609) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:02:57 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [5/300][40/625] eta 0:04:32 lr 0.000305 wd 0.0500 time 0.4444 (0.4661) data time 0.0006 (0.0180) model time 0.0000 (0.0000) loss 5.3686 (5.5860) grad_norm 2.3983 (2.2236) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][50/625] eta 0:04:25 lr 0.000306 wd 0.0500 time 0.4435 (0.4618) data time 0.0008 (0.0147) model time 0.0000 (0.0000) loss 5.5867 (5.5682) grad_norm 3.0416 (2.2389) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][60/625] eta 0:04:19 lr 0.000307 wd 0.0500 time 0.4450 (0.4589) data time 0.0006 (0.0124) model time 0.4444 (0.4434) loss 6.3458 (5.5931) grad_norm 1.9910 (2.2252) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][70/625] eta 0:04:13 lr 0.000308 wd 0.0500 time 0.4447 (0.4569) data time 0.0007 (0.0108) model time 0.4440 (0.4436) loss 5.3672 (5.5841) grad_norm 2.2427 (2.2379) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][80/625] eta 0:04:08 lr 0.000309 wd 0.0500 time 0.4436 (0.4554) data time 0.0007 (0.0096) model time 0.4429 (0.4436) loss 6.1722 (5.6197) grad_norm 2.5892 (2.2595) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][90/625] eta 0:04:02 lr 0.000310 wd 0.0500 time 0.4454 (0.4541) data time 0.0009 (0.0086) model time 0.4445 (0.4434) loss 4.8991 (5.6064) grad_norm 2.5878 (2.2792) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][100/625] eta 0:03:57 lr 0.000311 wd 0.0500 time 0.4435 (0.4531) data time 0.0007 (0.0079) model time 0.4428 (0.4432) loss 6.3782 (5.6096) grad_norm 2.0051 (2.2698) 
loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][110/625] eta 0:03:52 lr 0.000312 wd 0.0500 time 0.4422 (0.4522) data time 0.0008 (0.0073) model time 0.4414 (0.4432) loss 5.9299 (5.6417) grad_norm 1.7349 (2.2371) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][120/625] eta 0:03:48 lr 0.000313 wd 0.0500 time 0.4488 (0.4516) data time 0.0007 (0.0067) model time 0.4481 (0.4432) loss 4.8368 (5.6484) grad_norm 1.8836 (2.2140) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][130/625] eta 0:03:43 lr 0.000314 wd 0.0500 time 0.4424 (0.4511) data time 0.0009 (0.0063) model time 0.4415 (0.4433) loss 5.5809 (5.6497) grad_norm 1.9725 (2.2228) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][140/625] eta 0:03:39 lr 0.000315 wd 0.0500 time 0.4551 (0.4521) data time 0.0008 (0.0059) model time 0.4544 (0.4457) loss 5.5415 (5.6481) grad_norm 2.1298 (2.2348) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][150/625] eta 0:03:34 lr 0.000316 wd 0.0500 time 0.4432 (0.4515) data time 0.0007 (0.0056) model time 0.4425 (0.4452) loss 5.1522 (5.6364) grad_norm 1.8765 (2.2247) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][160/625] eta 0:03:29 lr 0.000317 wd 0.0500 time 0.4448 (0.4510) data time 0.0006 (0.0053) model time 0.4442 (0.4451) loss 5.6232 (5.6301) grad_norm 2.2532 (2.2318) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][170/625] eta 0:03:25 lr 0.000318 wd 0.0500 time 0.4456 (0.4507) data time 
0.0006 (0.0050) model time 0.4450 (0.4450) loss 6.1489 (5.6410) grad_norm 1.8379 (2.2315) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][180/625] eta 0:03:20 lr 0.000319 wd 0.0500 time 0.4458 (0.4503) data time 0.0009 (0.0048) model time 0.4449 (0.4448) loss 6.0747 (5.6368) grad_norm 1.6747 (2.2205) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][190/625] eta 0:03:15 lr 0.000320 wd 0.0500 time 0.4394 (0.4500) data time 0.0007 (0.0046) model time 0.4387 (0.4447) loss 4.9202 (5.6349) grad_norm 2.5431 (2.2229) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][200/625] eta 0:03:11 lr 0.000321 wd 0.0500 time 0.4431 (0.4497) data time 0.0008 (0.0044) model time 0.4423 (0.4446) loss 5.1013 (5.6260) grad_norm 1.9206 (2.2146) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][210/625] eta 0:03:06 lr 0.000322 wd 0.0500 time 0.4426 (0.4495) data time 0.0008 (0.0043) model time 0.4417 (0.4447) loss 5.6368 (5.6192) grad_norm 1.7818 (2.2117) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][220/625] eta 0:03:01 lr 0.000323 wd 0.0500 time 0.4430 (0.4493) data time 0.0007 (0.0041) model time 0.4423 (0.4446) loss 4.8664 (5.6161) grad_norm 2.3330 (2.2173) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][230/625] eta 0:02:57 lr 0.000324 wd 0.0500 time 0.4468 (0.4490) data time 0.0006 (0.0040) model time 0.4462 (0.4445) loss 5.2239 (5.6143) grad_norm 1.9229 (2.2149) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [5/300][240/625] eta 0:02:52 lr 0.000325 wd 0.0500 time 0.4428 (0.4488) data time 0.0007 (0.0038) model time 0.4421 (0.4444) loss 5.9810 (5.6118) grad_norm 1.9054 (2.2021) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][250/625] eta 0:02:48 lr 0.000325 wd 0.0500 time 0.4409 (0.4486) data time 0.0008 (0.0037) model time 0.4401 (0.4443) loss 5.7947 (5.6095) grad_norm 1.8230 (2.1940) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][260/625] eta 0:02:43 lr 0.000326 wd 0.0500 time 0.4425 (0.4484) data time 0.0008 (0.0036) model time 0.4417 (0.4442) loss 6.0804 (5.6170) grad_norm 2.1623 (2.1906) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][270/625] eta 0:02:39 lr 0.000327 wd 0.0500 time 0.4473 (0.4483) data time 0.0008 (0.0035) model time 0.4464 (0.4443) loss 5.3176 (5.6141) grad_norm 1.9991 (2.1784) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][280/625] eta 0:02:34 lr 0.000328 wd 0.0500 time 0.4438 (0.4482) data time 0.0008 (0.0034) model time 0.4430 (0.4443) loss 5.7477 (5.6117) grad_norm 2.2823 (2.1770) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][290/625] eta 0:02:30 lr 0.000329 wd 0.0500 time 0.4456 (0.4481) data time 0.0009 (0.0033) model time 0.4448 (0.4442) loss 6.0281 (5.6111) grad_norm 1.9155 (2.1772) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][300/625] eta 0:02:25 lr 0.000330 wd 0.0500 time 0.4427 (0.4479) data time 0.0006 (0.0033) model time 0.4421 (0.4441) loss 5.2704 (5.6102) grad_norm 1.9828 (2.1830) loss_scale 65536.0000 
(65536.0000) mem 16696MB [2024-08-04 10:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][310/625] eta 0:02:21 lr 0.000331 wd 0.0500 time 0.4436 (0.4477) data time 0.0006 (0.0032) model time 0.4430 (0.4440) loss 5.1613 (5.6055) grad_norm 2.3758 (2.1873) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][320/625] eta 0:02:16 lr 0.000332 wd 0.0500 time 0.4468 (0.4476) data time 0.0008 (0.0031) model time 0.4460 (0.4439) loss 5.6734 (5.6023) grad_norm 3.3072 (2.1975) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][330/625] eta 0:02:12 lr 0.000333 wd 0.0500 time 0.4421 (0.4475) data time 0.0007 (0.0030) model time 0.4414 (0.4439) loss 6.2982 (5.6016) grad_norm 1.9905 (2.1903) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][340/625] eta 0:02:07 lr 0.000334 wd 0.0500 time 0.4515 (0.4474) data time 0.0009 (0.0030) model time 0.4506 (0.4439) loss 4.7997 (5.5935) grad_norm 2.2596 (2.1815) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][350/625] eta 0:02:03 lr 0.000335 wd 0.0500 time 0.4452 (0.4473) data time 0.0008 (0.0029) model time 0.4443 (0.4439) loss 5.5909 (5.5892) grad_norm 1.8172 (2.1761) loss_scale 65536.0000 (65536.0000) mem 16696MB [2024-08-04 10:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][360/625] eta 0:01:58 lr 0.000336 wd 0.0500 time 0.4426 (0.4472) data time 0.0008 (0.0029) model time 0.4418 (0.4439) loss 5.3626 (5.5819) grad_norm 2.5121 (inf) loss_scale 32768.0000 (65082.1496) mem 16696MB [2024-08-04 10:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][370/625] eta 0:01:54 lr 0.000337 wd 0.0500 time 0.4438 (0.4471) data time 0.0008 (0.0028) model time 
0.4431 (0.4438) loss 5.9540 (5.5793) grad_norm 1.8424 (inf) loss_scale 32768.0000 (64211.1482) mem 16696MB [2024-08-04 10:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][380/625] eta 0:01:49 lr 0.000338 wd 0.0500 time 0.4442 (0.4474) data time 0.0006 (0.0028) model time 0.4436 (0.4443) loss 6.1522 (5.5787) grad_norm 1.8261 (inf) loss_scale 32768.0000 (63385.8688) mem 16696MB [2024-08-04 10:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][390/625] eta 0:01:45 lr 0.000339 wd 0.0500 time 0.4434 (0.4474) data time 0.0006 (0.0027) model time 0.4428 (0.4443) loss 5.8078 (5.5741) grad_norm 1.9839 (inf) loss_scale 32768.0000 (62602.8031) mem 16696MB [2024-08-04 10:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][400/625] eta 0:01:40 lr 0.000340 wd 0.0500 time 0.4439 (0.4473) data time 0.0006 (0.0027) model time 0.4433 (0.4442) loss 4.8047 (5.5693) grad_norm 2.7240 (inf) loss_scale 32768.0000 (61858.7930) mem 16696MB [2024-08-04 10:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][410/625] eta 0:01:36 lr 0.000341 wd 0.0500 time 0.4455 (0.4472) data time 0.0007 (0.0026) model time 0.4448 (0.4443) loss 5.2354 (5.5666) grad_norm 2.2363 (inf) loss_scale 32768.0000 (61150.9878) mem 16696MB [2024-08-04 10:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][420/625] eta 0:01:31 lr 0.000342 wd 0.0500 time 0.4423 (0.4472) data time 0.0008 (0.0026) model time 0.4415 (0.4443) loss 5.6794 (5.5616) grad_norm 3.2258 (inf) loss_scale 32768.0000 (60476.8076) mem 16696MB [2024-08-04 10:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][430/625] eta 0:01:27 lr 0.000343 wd 0.0500 time 0.4457 (0.4472) data time 0.0009 (0.0025) model time 0.4448 (0.4443) loss 6.0127 (5.5605) grad_norm 1.8910 (inf) loss_scale 32768.0000 (59833.9118) mem 16696MB [2024-08-04 10:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][440/625] eta 0:01:22 lr 0.000344 wd 
0.0500 time 0.4477 (0.4471) data time 0.0006 (0.0025) model time 0.4471 (0.4443) loss 4.6627 (5.5653) grad_norm 1.6804 (inf) loss_scale 32768.0000 (59220.1723) mem 16696MB [2024-08-04 10:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][450/625] eta 0:01:18 lr 0.000345 wd 0.0500 time 0.4448 (0.4470) data time 0.0008 (0.0025) model time 0.4441 (0.4442) loss 5.7263 (5.5642) grad_norm 1.8510 (inf) loss_scale 32768.0000 (58633.6497) mem 16696MB [2024-08-04 10:06:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][460/625] eta 0:01:13 lr 0.000346 wd 0.0500 time 0.4455 (0.4470) data time 0.0008 (0.0024) model time 0.4447 (0.4443) loss 5.5435 (5.5681) grad_norm 1.7716 (inf) loss_scale 32768.0000 (58072.5727) mem 16696MB [2024-08-04 10:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][470/625] eta 0:01:09 lr 0.000347 wd 0.0500 time 0.4449 (0.4470) data time 0.0007 (0.0024) model time 0.4442 (0.4443) loss 5.0076 (5.5669) grad_norm 1.6517 (inf) loss_scale 32768.0000 (57535.3206) mem 16696MB [2024-08-04 10:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][480/625] eta 0:01:04 lr 0.000348 wd 0.0500 time 0.4459 (0.4478) data time 0.0007 (0.0024) model time 0.4452 (0.4453) loss 5.5508 (5.5641) grad_norm 1.7865 (inf) loss_scale 32768.0000 (57020.4075) mem 16696MB [2024-08-04 10:06:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][490/625] eta 0:01:00 lr 0.000348 wd 0.0500 time 0.4450 (0.4477) data time 0.0008 (0.0023) model time 0.4441 (0.4452) loss 6.0275 (5.5689) grad_norm 3.1657 (inf) loss_scale 32768.0000 (56526.4684) mem 16696MB [2024-08-04 10:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][500/625] eta 0:00:55 lr 0.000349 wd 0.0500 time 0.4440 (0.4477) data time 0.0007 (0.0023) model time 0.4433 (0.4452) loss 5.9679 (5.5688) grad_norm 2.2011 (inf) loss_scale 32768.0000 (56052.2475) mem 16696MB [2024-08-04 10:06:27 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [5/300][510/625] eta 0:00:51 lr 0.000350 wd 0.0500 time 0.4457 (0.4476) data time 0.0009 (0.0023) model time 0.4448 (0.4451) loss 5.1053 (5.5711) grad_norm 1.9819 (inf) loss_scale 32768.0000 (55596.5871) mem 16696MB [2024-08-04 10:06:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][520/625] eta 0:00:46 lr 0.000351 wd 0.0500 time 0.4443 (0.4475) data time 0.0008 (0.0023) model time 0.4435 (0.4451) loss 5.4703 (5.5672) grad_norm 2.4255 (inf) loss_scale 32768.0000 (55158.4184) mem 16696MB [2024-08-04 10:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][530/625] eta 0:00:42 lr 0.000352 wd 0.0500 time 0.4438 (0.4475) data time 0.0008 (0.0022) model time 0.4430 (0.4451) loss 5.8317 (5.5658) grad_norm 2.4177 (inf) loss_scale 32768.0000 (54736.7533) mem 16696MB [2024-08-04 10:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][540/625] eta 0:00:38 lr 0.000353 wd 0.0500 time 0.4488 (0.4474) data time 0.0006 (0.0022) model time 0.4481 (0.4450) loss 5.9729 (5.5659) grad_norm 2.6200 (inf) loss_scale 32768.0000 (54330.6765) mem 16696MB [2024-08-04 10:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][550/625] eta 0:00:33 lr 0.000354 wd 0.0500 time 0.4452 (0.4474) data time 0.0007 (0.0022) model time 0.4445 (0.4450) loss 5.2268 (5.5659) grad_norm 2.1124 (inf) loss_scale 32768.0000 (53939.3394) mem 16696MB [2024-08-04 10:06:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][560/625] eta 0:00:29 lr 0.000355 wd 0.0500 time 0.5472 (0.4475) data time 0.0007 (0.0022) model time 0.5466 (0.4452) loss 4.9549 (5.5636) grad_norm 2.2711 (inf) loss_scale 32768.0000 (53561.9537) mem 16696MB [2024-08-04 10:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][570/625] eta 0:00:24 lr 0.000356 wd 0.0500 time 0.4456 (0.4473) data time 0.0007 (0.0021) model time 0.4449 (0.4450) loss 4.4655 (5.5619) grad_norm 2.1615 (inf) loss_scale 32768.0000 
(53197.7863) mem 16696MB [2024-08-04 10:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][580/625] eta 0:00:20 lr 0.000357 wd 0.0500 time 0.4411 (0.4473) data time 0.0007 (0.0021) model time 0.4404 (0.4450) loss 4.9531 (5.5581) grad_norm 1.8323 (inf) loss_scale 32768.0000 (52846.1549) mem 16696MB [2024-08-04 10:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][590/625] eta 0:00:15 lr 0.000358 wd 0.0500 time 0.4447 (0.4472) data time 0.0009 (0.0021) model time 0.4439 (0.4449) loss 5.6440 (5.5596) grad_norm 1.8446 (inf) loss_scale 32768.0000 (52506.4230) mem 16696MB [2024-08-04 10:07:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][600/625] eta 0:00:11 lr 0.000359 wd 0.0500 time 0.4444 (0.4471) data time 0.0009 (0.0021) model time 0.4435 (0.4449) loss 5.6795 (5.5599) grad_norm 1.4694 (inf) loss_scale 32768.0000 (52177.9967) mem 16696MB [2024-08-04 10:07:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][610/625] eta 0:00:06 lr 0.000360 wd 0.0500 time 0.4379 (0.4471) data time 0.0006 (0.0021) model time 0.4373 (0.4448) loss 4.9645 (5.5588) grad_norm 1.7831 (inf) loss_scale 32768.0000 (51860.3208) mem 16696MB [2024-08-04 10:07:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [5/300][620/625] eta 0:00:02 lr 0.000361 wd 0.0500 time 0.4404 (0.4470) data time 0.0006 (0.0020) model time 0.4398 (0.4448) loss 4.6428 (5.5593) grad_norm 1.7263 (inf) loss_scale 32768.0000 (51552.8760) mem 16696MB [2024-08-04 10:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 5 training takes 0:04:39 [2024-08-04 10:07:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:07:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 2.5742 (2.5742) Acc@1 42.920 (42.920) Acc@5 70.947 (70.947) Mem 16696MB
[2024-08-04 10:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 3.5742 (2.7411) Acc@1 26.465 (38.326) Acc@5 51.709 (66.983) Mem 16696MB
[2024-08-04 10:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 3.7324 (3.1153) Acc@1 23.096 (33.684) Acc@5 47.656 (60.212) Mem 16696MB
[2024-08-04 10:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 34.425 Acc@5 60.869
[2024-08-04 10:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 34.4%
[2024-08-04 10:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 34.42%
[2024-08-04 10:07:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:07:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 6.9727 (6.9727) Acc@1 0.195 (0.195) Acc@5 1.270 (1.270) Mem 16696MB
[2024-08-04 10:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.150) Loss 7.0117 (7.0540) Acc@1 0.000 (0.138) Acc@5 0.439 (0.661) Mem 16696MB
[2024-08-04 10:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.9375 (7.0324) Acc@1 0.439 (0.186) Acc@5 1.367 (0.795) Mem 16696MB
[2024-08-04 10:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.202 Acc@5 0.874
[2024-08-04 10:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.2%
[2024-08-04 10:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.20%
[2024-08-04 10:07:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:07:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:07:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][0/625] eta 0:08:05 lr 0.000361 wd 0.0500 time 0.7775 (0.7775) data time 0.3903 (0.3903) model time 0.0000 (0.0000) loss 5.1156 (5.1156) grad_norm 2.0639 (2.0639) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:07:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][10/625] eta 0:04:52 lr 0.000362 wd 0.0500 time 0.4462 (0.4761) data time 0.0006 (0.0363) model time 0.0000 (0.0000) loss 6.0080 (5.4413) grad_norm 1.8126 (2.0158) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:07:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][20/625] eta 0:04:39 lr 0.000363 wd 0.0500 time 0.4474 (0.4615) data time 0.0006 (0.0195) model time 0.0000 (0.0000) loss 5.5405 (5.5107) grad_norm 2.1329 (2.0265) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][30/625] eta 0:04:32 lr 0.000364 wd 0.0500 time 0.4546 (0.4572) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 5.0101 (5.4085) grad_norm 2.2073 (2.0360) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:07:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][40/625] eta 0:04:25 lr 0.000365 wd 0.0500 time 0.4447 (0.4545) data time 0.0007 (0.0104) model time 0.0000 (0.0000) loss 4.8511 (5.4734) grad_norm 2.5136 (2.0831) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][50/625] eta 0:04:21 lr 0.000366 wd 0.0500 time 0.4412 (0.4556) data time 0.0008 (0.0086) model time 0.0000 (0.0000) loss 5.3855 (5.4949) grad_norm 2.2176 (2.1253) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:07:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][60/625] eta 0:04:16 lr 0.000367 wd 0.0500 time 0.4456 (0.4538) data time 0.0007 (0.0073) model time 0.4450 (0.4434) loss 5.6665 
(5.4741) grad_norm 1.8863 (2.0882) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][70/625] eta 0:04:14 lr 0.000368 wd 0.0500 time 0.4471 (0.4583) data time 0.0009 (0.0064) model time 0.4462 (0.4642) loss 5.6065 (5.4428) grad_norm 1.5848 (2.0644) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][80/625] eta 0:04:08 lr 0.000369 wd 0.0500 time 0.4464 (0.4565) data time 0.0006 (0.0057) model time 0.4458 (0.4572) loss 4.5215 (5.4630) grad_norm 1.9977 (2.0454) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][90/625] eta 0:04:03 lr 0.000370 wd 0.0500 time 0.4494 (0.4553) data time 0.0006 (0.0052) model time 0.4488 (0.4540) loss 5.1094 (5.4278) grad_norm 1.8829 (2.0450) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][100/625] eta 0:03:58 lr 0.000371 wd 0.0500 time 0.4413 (0.4540) data time 0.0007 (0.0048) model time 0.4407 (0.4515) loss 6.2423 (5.4457) grad_norm 2.2293 (2.0431) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][110/625] eta 0:03:53 lr 0.000372 wd 0.0500 time 0.4428 (0.4530) data time 0.0006 (0.0044) model time 0.4422 (0.4499) loss 4.8131 (5.4209) grad_norm 2.3263 (2.0503) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][120/625] eta 0:03:48 lr 0.000373 wd 0.0500 time 0.4440 (0.4522) data time 0.0008 (0.0041) model time 0.4432 (0.4487) loss 5.8985 (5.4206) grad_norm 2.1677 (2.0563) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][130/625] eta 0:03:43 lr 0.000374 wd 0.0500 
time 0.4453 (0.4516) data time 0.0007 (0.0039) model time 0.4446 (0.4481) loss 6.0030 (5.4079) grad_norm 2.2346 (2.0502) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][140/625] eta 0:03:38 lr 0.000375 wd 0.0500 time 0.4429 (0.4511) data time 0.0008 (0.0037) model time 0.4421 (0.4476) loss 5.5226 (5.4056) grad_norm 2.2348 (2.0543) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][150/625] eta 0:03:34 lr 0.000376 wd 0.0500 time 0.4384 (0.4506) data time 0.0006 (0.0035) model time 0.4379 (0.4471) loss 5.6902 (5.3974) grad_norm 2.2063 (2.0656) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][160/625] eta 0:03:29 lr 0.000377 wd 0.0500 time 0.4439 (0.4501) data time 0.0006 (0.0033) model time 0.4432 (0.4467) loss 5.6707 (5.3929) grad_norm 2.5485 (2.0807) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][170/625] eta 0:03:24 lr 0.000378 wd 0.0500 time 0.4467 (0.4498) data time 0.0006 (0.0032) model time 0.4461 (0.4465) loss 5.6682 (5.3965) grad_norm 1.6917 (2.0738) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][180/625] eta 0:03:20 lr 0.000379 wd 0.0500 time 0.4411 (0.4495) data time 0.0009 (0.0030) model time 0.4402 (0.4462) loss 4.9288 (5.3790) grad_norm 1.9407 (2.0645) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][190/625] eta 0:03:15 lr 0.000380 wd 0.0500 time 0.4442 (0.4492) data time 0.0006 (0.0029) model time 0.4436 (0.4460) loss 5.6699 (5.3749) grad_norm 2.2664 (2.0536) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:08:59 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [6/300][200/625] eta 0:03:10 lr 0.000381 wd 0.0500 time 0.4438 (0.4490) data time 0.0008 (0.0028) model time 0.4430 (0.4459) loss 5.5660 (5.3660) grad_norm 1.8990 (2.0468) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][210/625] eta 0:03:06 lr 0.000382 wd 0.0500 time 0.4413 (0.4488) data time 0.0008 (0.0027) model time 0.4405 (0.4457) loss 5.2738 (5.3660) grad_norm 2.8308 (2.0530) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][220/625] eta 0:03:01 lr 0.000382 wd 0.0500 time 0.4457 (0.4486) data time 0.0007 (0.0026) model time 0.4450 (0.4456) loss 5.7981 (5.3794) grad_norm 2.1368 (2.0540) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][230/625] eta 0:02:57 lr 0.000383 wd 0.0500 time 0.4452 (0.4485) data time 0.0007 (0.0026) model time 0.4445 (0.4456) loss 4.6580 (5.3783) grad_norm 1.9991 (2.0514) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][240/625] eta 0:02:52 lr 0.000384 wd 0.0500 time 0.4401 (0.4484) data time 0.0008 (0.0025) model time 0.4393 (0.4455) loss 5.6306 (5.3738) grad_norm 1.9026 (2.0464) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][250/625] eta 0:02:48 lr 0.000385 wd 0.0500 time 0.4559 (0.4483) data time 0.0006 (0.0024) model time 0.4553 (0.4455) loss 6.2380 (5.3689) grad_norm 2.0663 (2.0430) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][260/625] eta 0:02:44 lr 0.000386 wd 0.0500 time 0.4455 (0.4495) data time 0.0006 (0.0024) model time 0.4449 (0.4471) loss 5.8390 (5.3636) grad_norm 1.7857 (2.0467) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][270/625] eta 0:02:39 lr 0.000387 wd 0.0500 time 0.4457 (0.4493) data time 0.0008 (0.0023) model time 0.4449 (0.4470) loss 5.1234 (5.3638) grad_norm 1.7603 (2.0441) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][280/625] eta 0:02:34 lr 0.000388 wd 0.0500 time 0.4441 (0.4492) data time 0.0007 (0.0023) model time 0.4434 (0.4468) loss 5.5464 (5.3642) grad_norm 1.7017 (2.0355) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][290/625] eta 0:02:30 lr 0.000389 wd 0.0500 time 0.4425 (0.4490) data time 0.0009 (0.0022) model time 0.4416 (0.4467) loss 5.5482 (5.3704) grad_norm 1.8407 (2.0319) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][300/625] eta 0:02:25 lr 0.000390 wd 0.0500 time 0.4452 (0.4488) data time 0.0008 (0.0022) model time 0.4444 (0.4465) loss 5.5777 (5.3806) grad_norm 1.7694 (2.0250) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][310/625] eta 0:02:21 lr 0.000391 wd 0.0500 time 0.4442 (0.4487) data time 0.0008 (0.0021) model time 0.4434 (0.4464) loss 5.4446 (5.3831) grad_norm 2.5817 (2.0286) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][320/625] eta 0:02:16 lr 0.000392 wd 0.0500 time 0.4421 (0.4485) data time 0.0009 (0.0021) model time 0.4412 (0.4463) loss 4.9905 (5.3707) grad_norm 1.6856 (2.0230) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:09:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][330/625] eta 0:02:12 lr 0.000393 wd 0.0500 time 0.4437 (0.4484) data time 
0.0009 (0.0021) model time 0.4428 (0.4462) loss 5.8681 (5.3661) grad_norm 2.2810 (2.0229) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][340/625] eta 0:02:07 lr 0.000394 wd 0.0500 time 0.4415 (0.4482) data time 0.0006 (0.0020) model time 0.4409 (0.4460) loss 5.5657 (5.3538) grad_norm 1.8655 (2.0219) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][350/625] eta 0:02:03 lr 0.000395 wd 0.0500 time 0.4434 (0.4481) data time 0.0014 (0.0020) model time 0.4421 (0.4459) loss 5.4671 (5.3482) grad_norm 2.1094 (2.0226) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][360/625] eta 0:01:58 lr 0.000396 wd 0.0500 time 0.4451 (0.4480) data time 0.0007 (0.0020) model time 0.4444 (0.4459) loss 5.7045 (5.3439) grad_norm 2.0547 (2.0292) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][370/625] eta 0:01:54 lr 0.000397 wd 0.0500 time 0.4432 (0.4479) data time 0.0007 (0.0019) model time 0.4425 (0.4458) loss 4.8651 (5.3364) grad_norm 2.0932 (2.0279) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][380/625] eta 0:01:49 lr 0.000398 wd 0.0500 time 0.6043 (0.4483) data time 0.0006 (0.0019) model time 0.6037 (0.4463) loss 5.1713 (5.3385) grad_norm 1.6111 (2.0288) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][390/625] eta 0:01:45 lr 0.000399 wd 0.0500 time 0.4466 (0.4482) data time 0.0008 (0.0019) model time 0.4458 (0.4462) loss 4.5509 (5.3383) grad_norm 2.2864 (2.0246) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [6/300][400/625] eta 0:01:40 lr 0.000400 wd 0.0500 time 0.4433 (0.4481) data time 0.0007 (0.0019) model time 0.4426 (0.4462) loss 5.7850 (5.3348) grad_norm 1.6430 (2.0217) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][410/625] eta 0:01:36 lr 0.000401 wd 0.0500 time 0.4459 (0.4481) data time 0.0009 (0.0018) model time 0.4449 (0.4461) loss 4.2375 (5.3302) grad_norm 1.9057 (2.0208) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][420/625] eta 0:01:31 lr 0.000402 wd 0.0500 time 0.4457 (0.4480) data time 0.0006 (0.0018) model time 0.4451 (0.4461) loss 5.9347 (5.3320) grad_norm 1.9243 (2.0222) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][430/625] eta 0:01:27 lr 0.000403 wd 0.0500 time 0.4424 (0.4479) data time 0.0008 (0.0018) model time 0.4416 (0.4460) loss 5.7079 (5.3356) grad_norm 1.6089 (2.0175) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][440/625] eta 0:01:22 lr 0.000404 wd 0.0500 time 0.4441 (0.4478) data time 0.0006 (0.0018) model time 0.4436 (0.4459) loss 5.5746 (5.3342) grad_norm 2.0278 (2.0166) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][450/625] eta 0:01:18 lr 0.000405 wd 0.0500 time 0.4434 (0.4478) data time 0.0007 (0.0018) model time 0.4427 (0.4459) loss 4.8876 (5.3260) grad_norm 2.1384 (2.0116) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:10:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][460/625] eta 0:01:13 lr 0.000405 wd 0.0500 time 0.4427 (0.4477) data time 0.0008 (0.0017) model time 0.4418 (0.4458) loss 5.7482 (5.3280) grad_norm 1.7573 (2.0066) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 10:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][470/625] eta 0:01:09 lr 0.000406 wd 0.0500 time 0.4435 (0.4476) data time 0.0006 (0.0017) model time 0.4428 (0.4457) loss 6.2496 (5.3299) grad_norm 1.8518 (2.0045) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][480/625] eta 0:01:05 lr 0.000407 wd 0.0500 time 0.4435 (0.4483) data time 0.0007 (0.0017) model time 0.4429 (0.4466) loss 4.8694 (5.3241) grad_norm 1.9160 (2.0019) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][490/625] eta 0:01:00 lr 0.000408 wd 0.0500 time 0.4475 (0.4483) data time 0.0009 (0.0017) model time 0.4467 (0.4465) loss 5.6721 (5.3243) grad_norm 2.6030 (2.0040) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][500/625] eta 0:00:56 lr 0.000409 wd 0.0500 time 0.4438 (0.4482) data time 0.0007 (0.0017) model time 0.4431 (0.4465) loss 5.7269 (5.3217) grad_norm 1.9841 (2.0040) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][510/625] eta 0:00:51 lr 0.000410 wd 0.0500 time 0.4444 (0.4481) data time 0.0008 (0.0017) model time 0.4436 (0.4464) loss 4.4402 (5.3218) grad_norm 1.4522 (2.0000) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][520/625] eta 0:00:47 lr 0.000411 wd 0.0500 time 0.4429 (0.4481) data time 0.0007 (0.0016) model time 0.4422 (0.4464) loss 4.9319 (5.3150) grad_norm 1.7549 (1.9955) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][530/625] eta 0:00:42 lr 0.000412 wd 0.0500 time 0.4450 (0.4480) data time 0.0008 (0.0016) model 
time 0.4442 (0.4463) loss 5.2353 (5.3151) grad_norm 1.9435 (1.9964) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][540/625] eta 0:00:38 lr 0.000413 wd 0.0500 time 0.4424 (0.4479) data time 0.0009 (0.0016) model time 0.4414 (0.4462) loss 5.8318 (5.3136) grad_norm 2.0109 (1.9984) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][550/625] eta 0:00:33 lr 0.000414 wd 0.0500 time 0.4425 (0.4479) data time 0.0006 (0.0016) model time 0.4419 (0.4462) loss 4.6106 (5.3113) grad_norm 2.6062 (1.9983) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][560/625] eta 0:00:29 lr 0.000415 wd 0.0500 time 0.4416 (0.4478) data time 0.0008 (0.0016) model time 0.4408 (0.4461) loss 5.7672 (5.3101) grad_norm 2.6058 (2.0005) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][570/625] eta 0:00:24 lr 0.000416 wd 0.0500 time 0.3864 (0.4479) data time 0.0007 (0.0016) model time 0.3858 (0.4462) loss 5.8447 (5.3116) grad_norm 2.1123 (2.0007) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][580/625] eta 0:00:20 lr 0.000417 wd 0.0500 time 0.4482 (0.4478) data time 0.0007 (0.0016) model time 0.4475 (0.4461) loss 5.1096 (5.3087) grad_norm 2.0985 (1.9992) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][590/625] eta 0:00:15 lr 0.000418 wd 0.0500 time 0.4428 (0.4477) data time 0.0009 (0.0016) model time 0.4420 (0.4461) loss 5.3095 (5.3048) grad_norm 1.8582 (1.9969) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][600/625] 
eta 0:00:11 lr 0.000419 wd 0.0500 time 0.4432 (0.4477) data time 0.0009 (0.0015) model time 0.4423 (0.4460) loss 5.4629 (5.2999) grad_norm 2.0787 (1.9960) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][610/625] eta 0:00:06 lr 0.000420 wd 0.0500 time 0.4417 (0.4476) data time 0.0004 (0.0015) model time 0.4413 (0.4460) loss 4.2145 (5.3006) grad_norm 1.5356 (1.9913) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [6/300][620/625] eta 0:00:02 lr 0.000421 wd 0.0500 time 0.4381 (0.4478) data time 0.0006 (0.0015) model time 0.4375 (0.4462) loss 5.6072 (5.2997) grad_norm 1.9832 (1.9893) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 6 training takes 0:04:40 [2024-08-04 10:12:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:12:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:12:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 2.1016 (2.1016) Acc@1 52.783 (52.783) Acc@5 78.467 (78.467) Mem 16696MB
[2024-08-04 10:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.150) Loss 3.1113 (2.3132) Acc@1 33.691 (46.214) Acc@5 60.693 (74.871) Mem 16696MB
[2024-08-04 10:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 3.3223 (2.7094) Acc@1 29.102 (40.555) Acc@5 57.324 (67.869) Mem 16696MB
[2024-08-04 10:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 40.913 Acc@5 68.122
[2024-08-04 10:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 40.9%
[2024-08-04 10:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 40.91%
[2024-08-04 10:12:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:12:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:12:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 7.0547 (7.0547) Acc@1 0.439 (0.439) Acc@5 1.465 (1.465) Mem 16696MB
[2024-08-04 10:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.151) Loss 7.0508 (7.1168) Acc@1 0.049 (0.164) Acc@5 0.635 (0.772) Mem 16696MB
[2024-08-04 10:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.8867 (7.0512) Acc@1 0.635 (0.205) Acc@5 1.904 (0.960) Mem 16696MB
[2024-08-04 10:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.246 Acc@5 1.094
[2024-08-04 10:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.2%
[2024-08-04 10:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.25%
[2024-08-04 10:12:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:12:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:12:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][0/625] eta 0:08:01 lr 0.000421 wd 0.0500 time 0.7702 (0.7702) data time 0.3865 (0.3865) model time 0.0000 (0.0000) loss 5.2487 (5.2487) grad_norm 1.5508 (1.5508) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][10/625] eta 0:04:50 lr 0.000422 wd 0.0500 time 0.4489 (0.4730) data time 0.0006 (0.0359) model time 0.0000 (0.0000) loss 4.6619 (4.9694) grad_norm 1.6315 (1.8878) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][20/625] eta 0:04:38 lr 0.000423 wd 0.0500 time 0.4510 (0.4596) data time 0.0009 (0.0192) model time 0.0000 (0.0000) loss 5.5127 (5.1297) grad_norm 2.4780 (1.9451) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][30/625] eta 0:04:30 lr 0.000424 wd 0.0500 time 0.4436 (0.4552) data time 0.0006 (0.0133) model time 0.0000 (0.0000) loss 4.9658 (5.2027) grad_norm 2.1862 (1.9809) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][40/625] eta 0:04:24 lr 0.000425 wd 0.0500 time 0.4431 (0.4522) data time 0.0009 (0.0103) model time 0.0000 (0.0000) loss 4.7241 (5.1738) grad_norm 2.3486 (2.0245) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][50/625] eta 0:04:20 lr 0.000426 wd 0.0500 time 0.6017 (0.4534) data time 0.0006 (0.0084) model time 0.0000 (0.0000) loss 4.7291 (5.1664) grad_norm 1.8788 (1.9974) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][60/625] eta 0:04:15 lr 0.000427 wd 0.0500 time 0.4453 (0.4522) data time 0.0008 (0.0071) model time 0.4445 (0.4450) loss 4.5077 
(5.1326) grad_norm 1.4732 (1.9535) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][70/625] eta 0:04:10 lr 0.000428 wd 0.0500 time 0.4416 (0.4510) data time 0.0008 (0.0063) model time 0.4408 (0.4440) loss 5.4137 (5.1738) grad_norm 1.6516 (1.9314) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][80/625] eta 0:04:05 lr 0.000429 wd 0.0500 time 0.4437 (0.4502) data time 0.0006 (0.0056) model time 0.4431 (0.4439) loss 4.2945 (5.1468) grad_norm 1.6524 (1.9189) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][90/625] eta 0:04:00 lr 0.000430 wd 0.0500 time 0.4479 (0.4499) data time 0.0006 (0.0051) model time 0.4472 (0.4445) loss 5.5265 (5.1742) grad_norm 2.0317 (1.9337) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][100/625] eta 0:03:55 lr 0.000431 wd 0.0500 time 0.4444 (0.4493) data time 0.0008 (0.0046) model time 0.4437 (0.4442) loss 4.4205 (5.1961) grad_norm 1.9367 (1.9223) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][110/625] eta 0:03:51 lr 0.000432 wd 0.0500 time 0.4427 (0.4487) data time 0.0006 (0.0043) model time 0.4421 (0.4439) loss 4.1546 (5.1994) grad_norm 1.4433 (1.9054) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][120/625] eta 0:03:46 lr 0.000433 wd 0.0500 time 0.4489 (0.4484) data time 0.0008 (0.0040) model time 0.4481 (0.4439) loss 5.4685 (5.1763) grad_norm 1.7380 (1.8911) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][130/625] eta 0:03:41 lr 0.000434 wd 0.0500 
time 0.4435 (0.4480) data time 0.0008 (0.0037) model time 0.4427 (0.4438) loss 5.4086 (5.1747) grad_norm 1.9061 (1.8887) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][140/625] eta 0:03:37 lr 0.000435 wd 0.0500 time 0.4450 (0.4479) data time 0.0007 (0.0035) model time 0.4443 (0.4439) loss 4.7587 (5.1678) grad_norm 1.9595 (1.8944) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][150/625] eta 0:03:32 lr 0.000436 wd 0.0500 time 0.4440 (0.4477) data time 0.0006 (0.0034) model time 0.4434 (0.4440) loss 5.5547 (5.1684) grad_norm 1.6903 (1.9026) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][160/625] eta 0:03:28 lr 0.000437 wd 0.0500 time 0.4427 (0.4474) data time 0.0006 (0.0032) model time 0.4422 (0.4439) loss 4.3515 (5.1728) grad_norm 1.4289 (1.8956) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][170/625] eta 0:03:23 lr 0.000438 wd 0.0500 time 0.4387 (0.4472) data time 0.0009 (0.0031) model time 0.4378 (0.4437) loss 5.5181 (5.1668) grad_norm 1.7781 (1.8859) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][180/625] eta 0:03:18 lr 0.000439 wd 0.0500 time 0.4442 (0.4470) data time 0.0008 (0.0029) model time 0.4435 (0.4437) loss 4.9238 (5.1758) grad_norm 1.6147 (1.8788) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][190/625] eta 0:03:15 lr 0.000440 wd 0.0500 time 0.4415 (0.4485) data time 0.0008 (0.0028) model time 0.4407 (0.4460) loss 5.6662 (5.1795) grad_norm 1.9229 (1.8892) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:50 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [7/300][200/625] eta 0:03:10 lr 0.000440 wd 0.0500 time 0.4434 (0.4483) data time 0.0006 (0.0027) model time 0.4428 (0.4458) loss 5.3918 (5.1738) grad_norm 1.6434 (1.8906) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][210/625] eta 0:03:05 lr 0.000441 wd 0.0500 time 0.4397 (0.4481) data time 0.0007 (0.0026) model time 0.4390 (0.4456) loss 5.7285 (5.1875) grad_norm 2.1494 (1.9011) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:13:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][220/625] eta 0:03:01 lr 0.000442 wd 0.0500 time 0.4453 (0.4480) data time 0.0008 (0.0025) model time 0.4445 (0.4455) loss 5.3496 (5.1883) grad_norm 1.8443 (1.8977) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][230/625] eta 0:02:56 lr 0.000443 wd 0.0500 time 0.4426 (0.4477) data time 0.0006 (0.0025) model time 0.4420 (0.4453) loss 3.9078 (5.1829) grad_norm 2.2409 (1.9006) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][240/625] eta 0:02:52 lr 0.000444 wd 0.0500 time 0.4417 (0.4475) data time 0.0006 (0.0024) model time 0.4411 (0.4452) loss 4.3749 (5.1752) grad_norm 2.1037 (1.9042) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][250/625] eta 0:02:47 lr 0.000445 wd 0.0500 time 0.4429 (0.4474) data time 0.0008 (0.0023) model time 0.4420 (0.4451) loss 5.1992 (5.1763) grad_norm 2.2609 (1.9120) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][260/625] eta 0:02:43 lr 0.000446 wd 0.0500 time 0.4428 (0.4472) data time 0.0007 (0.0023) model time 0.4421 (0.4450) loss 5.1476 (5.1755) grad_norm 1.6666 (1.9078) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][270/625] eta 0:02:38 lr 0.000447 wd 0.0500 time 0.4412 (0.4471) data time 0.0007 (0.0022) model time 0.4405 (0.4449) loss 5.8590 (5.1761) grad_norm 2.3082 (1.9076) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][280/625] eta 0:02:34 lr 0.000448 wd 0.0500 time 0.4455 (0.4470) data time 0.0009 (0.0022) model time 0.4446 (0.4448) loss 5.4844 (5.1775) grad_norm 1.4468 (1.9034) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][290/625] eta 0:02:29 lr 0.000449 wd 0.0500 time 0.4493 (0.4469) data time 0.0007 (0.0021) model time 0.4486 (0.4448) loss 5.9405 (5.1830) grad_norm 2.0466 (1.8979) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][300/625] eta 0:02:25 lr 0.000450 wd 0.0500 time 0.4442 (0.4469) data time 0.0007 (0.0021) model time 0.4436 (0.4447) loss 3.7308 (5.1795) grad_norm 2.1407 (1.8993) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][310/625] eta 0:02:20 lr 0.000451 wd 0.0500 time 0.4433 (0.4468) data time 0.0008 (0.0020) model time 0.4425 (0.4447) loss 5.1910 (5.1837) grad_norm 1.9354 (1.8968) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][320/625] eta 0:02:16 lr 0.000452 wd 0.0500 time 0.4434 (0.4467) data time 0.0008 (0.0020) model time 0.4426 (0.4446) loss 5.1179 (5.1867) grad_norm 2.0914 (1.8985) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][330/625] eta 0:02:11 lr 0.000453 wd 0.0500 time 0.4431 (0.4466) data time 
0.0007 (0.0020) model time 0.4424 (0.4446) loss 5.1321 (5.1956) grad_norm 2.0601 (1.9043) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][340/625] eta 0:02:07 lr 0.000454 wd 0.0500 time 0.4419 (0.4465) data time 0.0010 (0.0019) model time 0.4409 (0.4445) loss 5.5482 (5.1947) grad_norm 1.7592 (1.9015) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:14:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][350/625] eta 0:02:02 lr 0.000455 wd 0.0500 time 0.4380 (0.4465) data time 0.0006 (0.0019) model time 0.4374 (0.4445) loss 6.0801 (5.1981) grad_norm 1.9569 (1.8968) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][360/625] eta 0:01:58 lr 0.000456 wd 0.0500 time 0.4426 (0.4464) data time 0.0008 (0.0019) model time 0.4418 (0.4445) loss 5.5777 (5.1954) grad_norm 1.8275 (1.8917) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][370/625] eta 0:01:53 lr 0.000457 wd 0.0500 time 0.4505 (0.4464) data time 0.0008 (0.0018) model time 0.4497 (0.4445) loss 5.5212 (5.1924) grad_norm 1.6375 (1.8888) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][380/625] eta 0:01:49 lr 0.000458 wd 0.0500 time 0.4482 (0.4463) data time 0.0006 (0.0018) model time 0.4476 (0.4444) loss 3.9308 (5.1903) grad_norm 1.9862 (1.8857) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][390/625] eta 0:01:44 lr 0.000459 wd 0.0500 time 0.4413 (0.4467) data time 0.0008 (0.0018) model time 0.4405 (0.4449) loss 4.7921 (5.1911) grad_norm 1.9641 (1.8849) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [7/300][400/625] eta 0:01:40 lr 0.000460 wd 0.0500 time 0.4410 (0.4466) data time 0.0006 (0.0018) model time 0.4404 (0.4448) loss 5.7370 (5.1829) grad_norm 1.8744 (1.8923) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][410/625] eta 0:01:36 lr 0.000461 wd 0.0500 time 0.4412 (0.4475) data time 0.0009 (0.0017) model time 0.4404 (0.4459) loss 5.1039 (5.1799) grad_norm 2.1423 (1.8960) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][420/625] eta 0:01:31 lr 0.000462 wd 0.0500 time 0.4426 (0.4474) data time 0.0007 (0.0017) model time 0.4420 (0.4458) loss 4.0566 (5.1807) grad_norm 1.7199 (1.8969) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][430/625] eta 0:01:27 lr 0.000463 wd 0.0500 time 0.4426 (0.4474) data time 0.0006 (0.0017) model time 0.4421 (0.4458) loss 4.9575 (5.1833) grad_norm 1.7835 (1.8929) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][440/625] eta 0:01:22 lr 0.000463 wd 0.0500 time 0.4437 (0.4473) data time 0.0008 (0.0017) model time 0.4428 (0.4457) loss 5.6409 (5.1838) grad_norm 1.6126 (1.8924) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][450/625] eta 0:01:18 lr 0.000464 wd 0.0500 time 0.4430 (0.4473) data time 0.0008 (0.0016) model time 0.4423 (0.4457) loss 5.4710 (5.1762) grad_norm 2.3644 (1.8911) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][460/625] eta 0:01:13 lr 0.000465 wd 0.0500 time 0.4409 (0.4472) data time 0.0008 (0.0016) model time 0.4400 (0.4457) loss 5.2442 (5.1757) grad_norm 1.7994 (1.8898) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 10:15:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][470/625] eta 0:01:09 lr 0.000466 wd 0.0500 time 0.4429 (0.4471) data time 0.0006 (0.0016) model time 0.4423 (0.4456) loss 4.0913 (5.1755) grad_norm 2.4369 (1.8954) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:15:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][480/625] eta 0:01:04 lr 0.000467 wd 0.0500 time 0.4428 (0.4471) data time 0.0006 (0.0016) model time 0.4421 (0.4456) loss 4.3179 (5.1705) grad_norm 1.4499 (1.8932) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][490/625] eta 0:01:00 lr 0.000468 wd 0.0500 time 0.4439 (0.4471) data time 0.0009 (0.0016) model time 0.4430 (0.4456) loss 4.3636 (5.1606) grad_norm 1.5790 (1.8910) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][500/625] eta 0:00:55 lr 0.000469 wd 0.0500 time 0.4423 (0.4471) data time 0.0009 (0.0016) model time 0.4414 (0.4456) loss 4.8276 (5.1531) grad_norm 1.9436 (1.8917) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][510/625] eta 0:00:51 lr 0.000470 wd 0.0500 time 0.4435 (0.4471) data time 0.0008 (0.0016) model time 0.4426 (0.4456) loss 5.1867 (5.1518) grad_norm 1.7871 (1.8905) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][520/625] eta 0:00:46 lr 0.000471 wd 0.0500 time 0.4427 (0.4471) data time 0.0006 (0.0015) model time 0.4421 (0.4456) loss 4.6876 (5.1479) grad_norm 1.4829 (1.8902) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][530/625] eta 0:00:42 lr 0.000472 wd 0.0500 time 0.4456 (0.4470) data time 0.0006 (0.0015) model 
time 0.4450 (0.4456) loss 4.9858 (5.1457) grad_norm 1.5993 (1.8874) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][540/625] eta 0:00:37 lr 0.000473 wd 0.0500 time 0.4475 (0.4470) data time 0.0009 (0.0015) model time 0.4466 (0.4456) loss 5.5461 (5.1445) grad_norm 1.8675 (1.8835) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][550/625] eta 0:00:33 lr 0.000474 wd 0.0500 time 0.6506 (0.4473) data time 0.0007 (0.0015) model time 0.6499 (0.4459) loss 4.5021 (5.1402) grad_norm 1.9101 (1.8834) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][560/625] eta 0:00:29 lr 0.000475 wd 0.0500 time 0.4438 (0.4476) data time 0.0006 (0.0015) model time 0.4431 (0.4462) loss 6.0148 (5.1461) grad_norm 1.7294 (1.8836) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][570/625] eta 0:00:24 lr 0.000476 wd 0.0500 time 0.4458 (0.4475) data time 0.0007 (0.0015) model time 0.4451 (0.4461) loss 5.8857 (5.1475) grad_norm 1.7248 (1.8797) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][580/625] eta 0:00:20 lr 0.000477 wd 0.0500 time 0.4444 (0.4476) data time 0.0009 (0.0015) model time 0.4435 (0.4462) loss 5.1445 (5.1449) grad_norm 1.8118 (1.8766) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][590/625] eta 0:00:15 lr 0.000478 wd 0.0500 time 0.4454 (0.4475) data time 0.0008 (0.0015) model time 0.4446 (0.4462) loss 5.0209 (5.1415) grad_norm 1.4312 (1.8750) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][600/625] 
eta 0:00:11 lr 0.000479 wd 0.0500 time 0.4418 (0.4475) data time 0.0009 (0.0014) model time 0.4410 (0.4461) loss 5.3077 (5.1407) grad_norm 1.6798 (1.8726) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][610/625] eta 0:00:06 lr 0.000480 wd 0.0500 time 0.4388 (0.4474) data time 0.0004 (0.0014) model time 0.4384 (0.4461) loss 4.7574 (5.1431) grad_norm 2.0538 (1.8711) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [7/300][620/625] eta 0:00:02 lr 0.000481 wd 0.0500 time 0.4353 (0.4473) data time 0.0005 (0.0014) model time 0.4348 (0.4459) loss 6.0603 (5.1429) grad_norm 1.5830 (1.8680) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:00 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 7 training takes 0:04:39 [2024-08-04 10:17:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:17:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:17:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 1.8115 (1.8115) Acc@1 59.473 (59.473) Acc@5 82.373 (82.373) Mem 16696MB
[2024-08-04 10:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 2.8398 (2.0481) Acc@1 39.307 (51.669) Acc@5 65.869 (78.960) Mem 16696MB
[2024-08-04 10:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 3.0156 (2.4342) Acc@1 35.547 (45.603) Acc@5 61.914 (72.133) Mem 16696MB
[2024-08-04 10:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 45.805 Acc@5 72.177
[2024-08-04 10:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 45.8%
[2024-08-04 10:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 45.81%
[2024-08-04 10:17:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:17:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:17:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 7.0625 (7.0625) Acc@1 0.439 (0.439) Acc@5 2.148 (2.148) Mem 16696MB
[2024-08-04 10:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 7.0547 (7.1254) Acc@1 0.098 (0.262) Acc@5 0.830 (1.083) Mem 16696MB
[2024-08-04 10:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.8398 (7.0333) Acc@1 0.879 (0.291) Acc@5 2.246 (1.307) Mem 16696MB
[2024-08-04 10:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.370 Acc@5 1.530
[2024-08-04 10:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.4%
[2024-08-04 10:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.37%
[2024-08-04 10:17:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:17:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:17:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][0/625] eta 0:07:53 lr 0.000481 wd 0.0500 time 0.7581 (0.7581) data time 0.3776 (0.3776) model time 0.0000 (0.0000) loss 4.9853 (4.9853) grad_norm 1.8679 (1.8679) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][10/625] eta 0:04:50 lr 0.000482 wd 0.0500 time 0.4457 (0.4728) data time 0.0009 (0.0351) model time 0.0000 (0.0000) loss 5.0732 (4.9686) grad_norm 1.6406 (1.8063) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][20/625] eta 0:04:37 lr 0.000483 wd 0.0500 time 0.4431 (0.4588) data time 0.0007 (0.0187) model time 0.0000 (0.0000) loss 3.8403 (5.0888) grad_norm 1.9842 (1.9124) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][30/625] eta 0:04:30 lr 0.000484 wd 0.0500 time 0.4457 (0.4543) data time 0.0006 (0.0130) model time 0.0000 (0.0000) loss 4.5480 (5.0189) grad_norm 1.7943 (1.9349) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][40/625] eta 0:04:24 lr 0.000485 wd 0.0500 time 0.4464 (0.4521) data time 0.0006 (0.0100) model time 0.0000 (0.0000) loss 5.0484 (5.0302) grad_norm 1.8872 (1.9014) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][50/625] eta 0:04:19 lr 0.000486 wd 0.0500 time 0.4489 (0.4507) data time 0.0006 (0.0082) model time 0.0000 (0.0000) loss 3.8897 (5.0190) grad_norm 1.8924 (1.8816) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][60/625] eta 0:04:15 lr 0.000487 wd 0.0500 time 0.4436 (0.4523) data time 0.0008 (0.0070) model time 0.4429 (0.4597) loss 5.4927 
(5.0581) grad_norm 2.0353 (1.8674) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][70/625] eta 0:04:10 lr 0.000488 wd 0.0500 time 0.4428 (0.4511) data time 0.0006 (0.0061) model time 0.4421 (0.4515) loss 4.4814 (5.0228) grad_norm 1.7870 (1.8792) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][80/625] eta 0:04:05 lr 0.000489 wd 0.0500 time 0.4460 (0.4504) data time 0.0009 (0.0054) model time 0.4452 (0.4491) loss 5.6678 (5.0320) grad_norm 2.1095 (1.8983) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][90/625] eta 0:04:00 lr 0.000490 wd 0.0500 time 0.4480 (0.4499) data time 0.0009 (0.0049) model time 0.4471 (0.4481) loss 5.4762 (5.0258) grad_norm 1.6345 (1.8928) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][100/625] eta 0:03:55 lr 0.000491 wd 0.0500 time 0.4406 (0.4494) data time 0.0008 (0.0045) model time 0.4398 (0.4473) loss 4.7107 (4.9985) grad_norm 1.7367 (1.8826) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][110/625] eta 0:03:51 lr 0.000492 wd 0.0500 time 0.4389 (0.4490) data time 0.0006 (0.0042) model time 0.4382 (0.4468) loss 4.1475 (5.0067) grad_norm 1.8056 (1.8725) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][120/625] eta 0:03:48 lr 0.000493 wd 0.0500 time 0.4424 (0.4520) data time 0.0007 (0.0039) model time 0.4417 (0.4522) loss 5.0701 (4.9991) grad_norm 1.9356 (1.8489) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][130/625] eta 0:03:43 lr 0.000494 wd 0.0500 
time 0.4403 (0.4514) data time 0.0008 (0.0037) model time 0.4394 (0.4511) loss 5.0772 (5.0094) grad_norm 1.7228 (1.8457) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][140/625] eta 0:03:38 lr 0.000495 wd 0.0500 time 0.4424 (0.4509) data time 0.0010 (0.0035) model time 0.4415 (0.4502) loss 5.5436 (5.0177) grad_norm 1.5896 (1.8450) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][150/625] eta 0:03:33 lr 0.000496 wd 0.0500 time 0.4456 (0.4505) data time 0.0008 (0.0033) model time 0.4448 (0.4496) loss 5.2520 (5.0184) grad_norm 1.7268 (1.8409) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][160/625] eta 0:03:29 lr 0.000497 wd 0.0500 time 0.4414 (0.4501) data time 0.0008 (0.0031) model time 0.4405 (0.4490) loss 5.2350 (5.0273) grad_norm 1.8373 (1.8338) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][170/625] eta 0:03:24 lr 0.000497 wd 0.0500 time 0.4438 (0.4497) data time 0.0006 (0.0030) model time 0.4433 (0.4486) loss 4.4687 (5.0229) grad_norm 1.7897 (1.8376) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][180/625] eta 0:03:20 lr 0.000498 wd 0.0500 time 0.4465 (0.4495) data time 0.0006 (0.0029) model time 0.4459 (0.4482) loss 5.5462 (5.0371) grad_norm 2.0510 (1.8486) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][190/625] eta 0:03:15 lr 0.000499 wd 0.0500 time 0.4451 (0.4492) data time 0.0008 (0.0028) model time 0.4443 (0.4479) loss 5.2495 (5.0243) grad_norm 1.6934 (1.8394) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [8/300][200/625] eta 0:03:10 lr 0.000500 wd 0.0500 time 0.4431 (0.4490) data time 0.0007 (0.0027) model time 0.4423 (0.4476) loss 5.5378 (5.0260) grad_norm 1.7191 (1.8308) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][210/625] eta 0:03:06 lr 0.000501 wd 0.0500 time 0.4414 (0.4487) data time 0.0008 (0.0026) model time 0.4406 (0.4473) loss 5.2776 (5.0380) grad_norm 2.1965 (1.8549) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][220/625] eta 0:03:01 lr 0.000502 wd 0.0500 time 0.4424 (0.4485) data time 0.0007 (0.0025) model time 0.4417 (0.4470) loss 4.7185 (5.0390) grad_norm 1.6907 (1.8526) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][230/625] eta 0:02:57 lr 0.000503 wd 0.0500 time 0.4453 (0.4483) data time 0.0008 (0.0024) model time 0.4445 (0.4468) loss 5.3817 (5.0314) grad_norm 1.5600 (1.8469) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:18:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][240/625] eta 0:02:52 lr 0.000504 wd 0.0500 time 0.4456 (0.4481) data time 0.0008 (0.0024) model time 0.4448 (0.4466) loss 5.3021 (5.0444) grad_norm 2.3315 (1.8468) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][250/625] eta 0:02:47 lr 0.000505 wd 0.0500 time 0.4425 (0.4479) data time 0.0006 (0.0023) model time 0.4419 (0.4464) loss 5.3748 (5.0440) grad_norm 1.6922 (1.8454) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][260/625] eta 0:02:43 lr 0.000506 wd 0.0500 time 0.4419 (0.4477) data time 0.0006 (0.0022) model time 0.4413 (0.4462) loss 5.7231 (5.0467) grad_norm 1.7986 (1.8476) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][270/625] eta 0:02:38 lr 0.000507 wd 0.0500 time 0.4429 (0.4476) data time 0.0011 (0.0022) model time 0.4418 (0.4461) loss 5.1333 (5.0468) grad_norm 1.7168 (1.8447) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][280/625] eta 0:02:34 lr 0.000508 wd 0.0500 time 0.4445 (0.4475) data time 0.0008 (0.0021) model time 0.4437 (0.4460) loss 5.3352 (5.0464) grad_norm 1.9588 (1.8392) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][290/625] eta 0:02:29 lr 0.000509 wd 0.0500 time 0.4418 (0.4474) data time 0.0008 (0.0021) model time 0.4409 (0.4459) loss 5.0410 (5.0495) grad_norm 2.1481 (1.8346) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][300/625] eta 0:02:25 lr 0.000510 wd 0.0500 time 0.4442 (0.4472) data time 0.0009 (0.0020) model time 0.4432 (0.4458) loss 5.6942 (5.0523) grad_norm 2.0203 (1.8357) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][310/625] eta 0:02:20 lr 0.000511 wd 0.0500 time 0.4389 (0.4471) data time 0.0008 (0.0020) model time 0.4380 (0.4456) loss 4.5563 (5.0501) grad_norm 2.9010 (1.8409) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][320/625] eta 0:02:16 lr 0.000512 wd 0.0500 time 0.4469 (0.4470) data time 0.0008 (0.0020) model time 0.4460 (0.4455) loss 4.0826 (5.0361) grad_norm 1.6619 (1.8472) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][330/625] eta 0:02:11 lr 0.000513 wd 0.0500 time 0.4444 (0.4469) data time 
0.0007 (0.0019) model time 0.4438 (0.4454) loss 4.9852 (5.0378) grad_norm 1.2133 (1.8467) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][340/625] eta 0:02:07 lr 0.000514 wd 0.0500 time 0.4476 (0.4468) data time 0.0008 (0.0019) model time 0.4468 (0.4453) loss 5.7957 (5.0413) grad_norm 1.9176 (1.8454) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][350/625] eta 0:02:02 lr 0.000515 wd 0.0500 time 0.4420 (0.4467) data time 0.0009 (0.0019) model time 0.4411 (0.4452) loss 5.3633 (5.0468) grad_norm 1.5050 (1.8459) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][360/625] eta 0:01:58 lr 0.000516 wd 0.0500 time 0.4470 (0.4466) data time 0.0006 (0.0018) model time 0.4464 (0.4451) loss 4.5296 (5.0356) grad_norm 1.6688 (1.8414) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][370/625] eta 0:01:53 lr 0.000517 wd 0.0500 time 0.4438 (0.4465) data time 0.0008 (0.0018) model time 0.4430 (0.4451) loss 5.3198 (5.0316) grad_norm 1.5587 (1.8345) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][380/625] eta 0:01:49 lr 0.000518 wd 0.0500 time 0.4425 (0.4464) data time 0.0009 (0.0018) model time 0.4416 (0.4450) loss 4.4350 (5.0340) grad_norm 1.4187 (1.8306) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][390/625] eta 0:01:44 lr 0.000519 wd 0.0500 time 0.4448 (0.4468) data time 0.0008 (0.0018) model time 0.4440 (0.4454) loss 4.4751 (5.0277) grad_norm 1.9546 (1.8385) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [8/300][400/625] eta 0:01:40 lr 0.000520 wd 0.0500 time 0.4427 (0.4467) data time 0.0006 (0.0017) model time 0.4421 (0.4453) loss 5.7508 (5.0361) grad_norm 1.9049 (1.8418) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][410/625] eta 0:01:36 lr 0.000520 wd 0.0500 time 0.4484 (0.4466) data time 0.0009 (0.0017) model time 0.4475 (0.4453) loss 4.5846 (5.0328) grad_norm 1.7130 (1.8374) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][420/625] eta 0:01:31 lr 0.000521 wd 0.0500 time 0.4418 (0.4465) data time 0.0006 (0.0017) model time 0.4412 (0.4452) loss 4.4870 (5.0312) grad_norm 2.3962 (1.8338) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][430/625] eta 0:01:27 lr 0.000522 wd 0.0500 time 0.4463 (0.4465) data time 0.0006 (0.0017) model time 0.4457 (0.4451) loss 5.3079 (5.0233) grad_norm 1.8754 (1.8330) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][440/625] eta 0:01:22 lr 0.000523 wd 0.0500 time 0.4457 (0.4464) data time 0.0009 (0.0017) model time 0.4448 (0.4451) loss 5.5391 (5.0217) grad_norm 1.4235 (1.8302) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][450/625] eta 0:01:18 lr 0.000524 wd 0.0500 time 0.4425 (0.4468) data time 0.0006 (0.0016) model time 0.4418 (0.4455) loss 4.6304 (5.0231) grad_norm 1.5731 (1.8267) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][460/625] eta 0:01:13 lr 0.000525 wd 0.0500 time 0.4435 (0.4468) data time 0.0006 (0.0016) model time 0.4429 (0.4455) loss 5.4879 (5.0231) grad_norm 1.2414 (1.8220) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 10:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][470/625] eta 0:01:09 lr 0.000526 wd 0.0500 time 0.4425 (0.4467) data time 0.0008 (0.0016) model time 0.4417 (0.4454) loss 4.6542 (5.0257) grad_norm 2.1628 (1.8208) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][480/625] eta 0:01:04 lr 0.000527 wd 0.0500 time 0.4435 (0.4467) data time 0.0006 (0.0016) model time 0.4429 (0.4454) loss 4.9293 (5.0201) grad_norm 1.6089 (1.8152) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:20:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][490/625] eta 0:01:00 lr 0.000528 wd 0.0500 time 0.4490 (0.4466) data time 0.0007 (0.0016) model time 0.4483 (0.4454) loss 5.1725 (5.0163) grad_norm 1.6562 (1.8128) loss_scale 65536.0000 (33435.3727) mem 16696MB [2024-08-04 10:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][500/625] eta 0:00:55 lr 0.000529 wd 0.0500 time 0.4418 (0.4466) data time 0.0006 (0.0016) model time 0.4411 (0.4453) loss 4.4989 (5.0135) grad_norm 1.8079 (inf) loss_scale 32768.0000 (33814.4830) mem 16696MB [2024-08-04 10:20:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][510/625] eta 0:00:51 lr 0.000530 wd 0.0500 time 0.4433 (0.4465) data time 0.0007 (0.0015) model time 0.4426 (0.4453) loss 4.9451 (5.0141) grad_norm 2.7048 (inf) loss_scale 32768.0000 (33794.0039) mem 16696MB [2024-08-04 10:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][520/625] eta 0:00:46 lr 0.000531 wd 0.0500 time 0.4447 (0.4465) data time 0.0008 (0.0015) model time 0.4439 (0.4452) loss 5.1334 (5.0144) grad_norm 1.5844 (inf) loss_scale 32768.0000 (33774.3109) mem 16696MB [2024-08-04 10:21:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][530/625] eta 0:00:42 lr 0.000532 wd 0.0500 time 0.4447 (0.4464) data time 0.0007 (0.0015) model time 0.4440 
(0.4452) loss 4.8012 (5.0132) grad_norm 1.8604 (inf) loss_scale 32768.0000 (33755.3597) mem 16696MB [2024-08-04 10:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][540/625] eta 0:00:37 lr 0.000533 wd 0.0500 time 0.4441 (0.4464) data time 0.0006 (0.0015) model time 0.4435 (0.4451) loss 4.9862 (5.0088) grad_norm 1.6731 (inf) loss_scale 32768.0000 (33737.1091) mem 16696MB [2024-08-04 10:21:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][550/625] eta 0:00:33 lr 0.000534 wd 0.0500 time 0.4446 (0.4463) data time 0.0008 (0.0015) model time 0.4437 (0.4451) loss 5.2993 (5.0034) grad_norm 1.4942 (inf) loss_scale 32768.0000 (33719.5209) mem 16696MB [2024-08-04 10:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][560/625] eta 0:00:29 lr 0.000535 wd 0.0500 time 0.4454 (0.4463) data time 0.0006 (0.0015) model time 0.4449 (0.4450) loss 4.3474 (5.0005) grad_norm 1.5129 (inf) loss_scale 32768.0000 (33702.5597) mem 16696MB [2024-08-04 10:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][570/625] eta 0:00:24 lr 0.000536 wd 0.0500 time 0.4437 (0.4462) data time 0.0005 (0.0015) model time 0.4431 (0.4450) loss 4.0600 (5.0031) grad_norm 1.4655 (inf) loss_scale 32768.0000 (33686.1926) mem 16696MB [2024-08-04 10:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][580/625] eta 0:00:20 lr 0.000537 wd 0.0500 time 0.4443 (0.4463) data time 0.0007 (0.0014) model time 0.4436 (0.4451) loss 5.6090 (4.9989) grad_norm 1.5736 (inf) loss_scale 32768.0000 (33670.3890) mem 16696MB [2024-08-04 10:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][590/625] eta 0:00:15 lr 0.000538 wd 0.0500 time 0.4433 (0.4462) data time 0.0006 (0.0014) model time 0.4427 (0.4450) loss 5.3296 (5.0003) grad_norm 1.4781 (inf) loss_scale 32768.0000 (33655.1201) mem 16696MB [2024-08-04 10:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][600/625] eta 0:00:11 lr 0.000539 wd 0.0500 
time 0.4427 (0.4462) data time 0.0007 (0.0014) model time 0.4421 (0.4450) loss 3.8304 (4.9994) grad_norm 1.7931 (inf) loss_scale 32768.0000 (33640.3594) mem 16696MB [2024-08-04 10:21:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][610/625] eta 0:00:06 lr 0.000540 wd 0.0500 time 0.4377 (0.4461) data time 0.0006 (0.0014) model time 0.4371 (0.4449) loss 4.0380 (4.9975) grad_norm 1.7729 (inf) loss_scale 32768.0000 (33626.0818) mem 16696MB [2024-08-04 10:21:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [8/300][620/625] eta 0:00:02 lr 0.000541 wd 0.0500 time 0.4369 (0.4460) data time 0.0004 (0.0014) model time 0.4365 (0.4448) loss 4.4007 (4.9946) grad_norm 1.3868 (inf) loss_scale 32768.0000 (33612.2641) mem 16696MB [2024-08-04 10:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 8 training takes 0:04:38 [2024-08-04 10:21:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:21:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:21:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.501 (0.501) Loss 1.5986 (1.5986) Acc@1 62.598 (62.598) Acc@5 85.840 (85.840) Mem 16696MB [2024-08-04 10:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.154) Loss 2.5566 (1.8592) Acc@1 43.066 (55.562) Acc@5 70.654 (82.218) Mem 16696MB [2024-08-04 10:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 2.7754 (2.2119) Acc@1 39.844 (49.809) Acc@5 65.674 (75.979) Mem 16696MB [2024-08-04 10:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 50.036 Acc@5 76.124 [2024-08-04 10:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 50.0% [2024-08-04 10:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 50.04% [2024-08-04 10:21:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 10:21:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 10:21:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.488 (0.488) Loss 6.9766 (6.9766) Acc@1 0.439 (0.439) Acc@5 2.637 (2.637) Mem 16696MB [2024-08-04 10:21:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 7.0078 (7.0664) Acc@1 0.195 (0.342) Acc@5 1.221 (1.611) Mem 16696MB [2024-08-04 10:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 6.7617 (6.9574) Acc@1 0.781 (0.384) Acc@5 2.881 (1.858) Mem 16696MB [2024-08-04 10:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.498 Acc@5 2.197 [2024-08-04 10:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.5% [2024-08-04 10:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.50% [2024-08-04 10:21:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 10:22:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:22:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][0/625] eta 0:07:54 lr 0.000541 wd 0.0500 time 0.7585 (0.7585) data time 0.3722 (0.3722) model time 0.0000 (0.0000) loss 5.2831 (5.2831) grad_norm 1.7464 (1.7464) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][10/625] eta 0:04:50 lr 0.000542 wd 0.0500 time 0.4471 (0.4727) data time 0.0007 (0.0345) model time 0.0000 (0.0000) loss 5.7208 (5.3358) grad_norm 1.9725 (1.6582) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][20/625] eta 0:04:37 lr 0.000543 wd 0.0500 time 0.4457 (0.4591) data time 0.0008 (0.0185) model time 0.0000 (0.0000) loss 4.9484 (5.0551) grad_norm 1.7533 (1.7407) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][30/625] eta 0:04:37 lr 0.000544 wd 0.0500 time 0.4445 (0.4671) data time 0.0006 (0.0128) model time 0.0000 (0.0000) loss 4.8860 (4.9904) grad_norm 2.3728 (1.8225) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][40/625] eta 0:04:29 lr 0.000545 wd 0.0500 time 0.4384 (0.4610) data time 0.0006 (0.0099) model time 0.0000 (0.0000) loss 5.3350 (4.9435) grad_norm 1.4204 (1.7784) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][50/625] eta 0:04:23 lr 0.000546 wd 0.0500 time 0.4458 (0.4576) data time 0.0007 (0.0081) model time 0.0000 (0.0000) loss 4.5221 (4.9034) grad_norm 1.5809 (1.7709) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][60/625] eta 0:04:18 lr 0.000547 wd 0.0500 time 0.4455 (0.4582) data time 0.0008 (0.0069) model time 0.4446 (0.4600) loss 5.3091 
(4.9151) grad_norm 1.8604 (1.7596) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][70/625] eta 0:04:13 lr 0.000548 wd 0.0500 time 0.4472 (0.4561) data time 0.0006 (0.0060) model time 0.4466 (0.4514) loss 5.2666 (4.9313) grad_norm 1.3738 (1.7360) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][80/625] eta 0:04:07 lr 0.000549 wd 0.0500 time 0.4436 (0.4545) data time 0.0008 (0.0054) model time 0.4428 (0.4484) loss 5.4176 (4.9456) grad_norm 1.6832 (1.7399) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][90/625] eta 0:04:02 lr 0.000550 wd 0.0500 time 0.4580 (0.4533) data time 0.0009 (0.0049) model time 0.4571 (0.4470) loss 4.8863 (4.9139) grad_norm 1.8915 (1.7336) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][100/625] eta 0:03:57 lr 0.000551 wd 0.0500 time 0.4436 (0.4523) data time 0.0007 (0.0045) model time 0.4429 (0.4461) loss 5.2892 (4.8915) grad_norm 1.5441 (1.7316) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][110/625] eta 0:03:52 lr 0.000552 wd 0.0500 time 0.4430 (0.4514) data time 0.0007 (0.0041) model time 0.4424 (0.4454) loss 5.0741 (4.8762) grad_norm 1.9009 (1.7365) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][120/625] eta 0:03:47 lr 0.000553 wd 0.0500 time 0.4412 (0.4508) data time 0.0006 (0.0039) model time 0.4406 (0.4450) loss 5.3104 (4.8724) grad_norm 1.8781 (1.7525) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][130/625] eta 0:03:42 lr 0.000554 wd 0.0500 
time 0.4507 (0.4503) data time 0.0008 (0.0036) model time 0.4499 (0.4448) loss 4.7378 (4.8790) grad_norm 2.1204 (1.7454) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][140/625] eta 0:03:38 lr 0.000555 wd 0.0500 time 0.4424 (0.4498) data time 0.0010 (0.0034) model time 0.4414 (0.4445) loss 4.5861 (4.8920) grad_norm 1.4659 (1.7543) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][150/625] eta 0:03:33 lr 0.000555 wd 0.0500 time 0.4377 (0.4494) data time 0.0009 (0.0033) model time 0.4369 (0.4444) loss 5.1063 (4.8816) grad_norm 2.0952 (1.7447) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][160/625] eta 0:03:28 lr 0.000556 wd 0.0500 time 0.4410 (0.4490) data time 0.0008 (0.0031) model time 0.4402 (0.4441) loss 5.3497 (4.8770) grad_norm 1.3614 (1.7478) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][170/625] eta 0:03:24 lr 0.000557 wd 0.0500 time 0.4427 (0.4487) data time 0.0009 (0.0030) model time 0.4418 (0.4441) loss 4.9108 (4.8665) grad_norm 1.8161 (1.7565) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][180/625] eta 0:03:19 lr 0.000558 wd 0.0500 time 0.4423 (0.4483) data time 0.0008 (0.0029) model time 0.4415 (0.4439) loss 5.0595 (4.8702) grad_norm 1.4813 (1.7557) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][190/625] eta 0:03:14 lr 0.000559 wd 0.0500 time 0.4430 (0.4481) data time 0.0009 (0.0027) model time 0.4420 (0.4438) loss 3.8475 (4.8635) grad_norm 1.4542 (1.7477) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [9/300][200/625] eta 0:03:10 lr 0.000560 wd 0.0500 time 0.4482 (0.4479) data time 0.0009 (0.0027) model time 0.4473 (0.4438) loss 5.2883 (4.8804) grad_norm 1.5816 (1.7394) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][210/625] eta 0:03:05 lr 0.000561 wd 0.0500 time 0.4411 (0.4477) data time 0.0007 (0.0026) model time 0.4404 (0.4438) loss 5.4741 (4.8879) grad_norm 1.5966 (1.7303) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][220/625] eta 0:03:01 lr 0.000562 wd 0.0500 time 0.4445 (0.4491) data time 0.0008 (0.0025) model time 0.4437 (0.4457) loss 5.4130 (4.8863) grad_norm 1.5752 (1.7242) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][230/625] eta 0:02:57 lr 0.000563 wd 0.0500 time 0.4432 (0.4488) data time 0.0008 (0.0024) model time 0.4424 (0.4455) loss 5.5282 (4.8812) grad_norm 2.4686 (1.7302) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][240/625] eta 0:02:52 lr 0.000564 wd 0.0500 time 0.4438 (0.4487) data time 0.0007 (0.0023) model time 0.4431 (0.4454) loss 5.0119 (4.8990) grad_norm 1.6841 (1.7335) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][250/625] eta 0:02:48 lr 0.000565 wd 0.0500 time 0.4443 (0.4485) data time 0.0009 (0.0023) model time 0.4434 (0.4453) loss 5.1982 (4.9148) grad_norm 1.8167 (1.7334) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][260/625] eta 0:02:43 lr 0.000566 wd 0.0500 time 0.4442 (0.4483) data time 0.0008 (0.0022) model time 0.4434 (0.4453) loss 5.1071 (4.9168) grad_norm 1.4250 (1.7313) 
loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][270/625] eta 0:02:39 lr 0.000567 wd 0.0500 time 0.4424 (0.4482) data time 0.0007 (0.0022) model time 0.4417 (0.4452) loss 3.9594 (4.9074) grad_norm 1.5942 (1.7261) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][280/625] eta 0:02:34 lr 0.000568 wd 0.0500 time 0.4477 (0.4481) data time 0.0006 (0.0021) model time 0.4472 (0.4452) loss 4.8734 (4.9025) grad_norm 1.5640 (1.7295) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][290/625] eta 0:02:30 lr 0.000569 wd 0.0500 time 0.4416 (0.4480) data time 0.0008 (0.0021) model time 0.4408 (0.4451) loss 5.0719 (4.9063) grad_norm 1.9233 (1.7280) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][300/625] eta 0:02:25 lr 0.000570 wd 0.0500 time 0.4428 (0.4479) data time 0.0006 (0.0020) model time 0.4422 (0.4451) loss 5.2498 (4.9059) grad_norm 1.1677 (1.7274) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][310/625] eta 0:02:21 lr 0.000571 wd 0.0500 time 0.4412 (0.4477) data time 0.0006 (0.0020) model time 0.4405 (0.4450) loss 5.6317 (4.9087) grad_norm 1.4245 (1.7255) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][320/625] eta 0:02:16 lr 0.000572 wd 0.0500 time 0.4437 (0.4476) data time 0.0008 (0.0020) model time 0.4429 (0.4450) loss 4.1525 (4.9077) grad_norm 1.3806 (1.7251) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][330/625] eta 0:02:12 lr 0.000573 wd 0.0500 time 0.4413 (0.4475) data time 
0.0008 (0.0019) model time 0.4405 (0.4449) loss 5.2339 (4.9096) grad_norm 1.6509 (1.7199) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][340/625] eta 0:02:07 lr 0.000574 wd 0.0500 time 0.4435 (0.4474) data time 0.0006 (0.0019) model time 0.4430 (0.4448) loss 5.5278 (4.9065) grad_norm 1.6093 (1.7178) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][350/625] eta 0:02:03 lr 0.000575 wd 0.0500 time 0.4443 (0.4473) data time 0.0006 (0.0019) model time 0.4437 (0.4447) loss 4.0262 (4.9049) grad_norm 1.4137 (1.7160) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][360/625] eta 0:01:58 lr 0.000576 wd 0.0500 time 0.4474 (0.4472) data time 0.0006 (0.0018) model time 0.4468 (0.4447) loss 5.1767 (4.9120) grad_norm 1.9432 (1.7171) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][370/625] eta 0:01:54 lr 0.000577 wd 0.0500 time 0.4477 (0.4472) data time 0.0009 (0.0018) model time 0.4468 (0.4447) loss 4.9316 (4.9117) grad_norm 1.9018 (1.7189) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][380/625] eta 0:01:49 lr 0.000578 wd 0.0500 time 0.4441 (0.4471) data time 0.0008 (0.0018) model time 0.4434 (0.4447) loss 4.3082 (4.9124) grad_norm 1.5680 (1.7185) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][390/625] eta 0:01:45 lr 0.000578 wd 0.0500 time 0.4441 (0.4471) data time 0.0009 (0.0018) model time 0.4431 (0.4447) loss 4.4375 (4.9182) grad_norm 1.4317 (1.7187) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [9/300][400/625] eta 0:01:40 lr 0.000579 wd 0.0500 time 0.4436 (0.4474) data time 0.0006 (0.0017) model time 0.4430 (0.4451) loss 5.4460 (4.9144) grad_norm 1.6058 (1.7198) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][410/625] eta 0:01:36 lr 0.000580 wd 0.0500 time 0.4434 (0.4473) data time 0.0010 (0.0017) model time 0.4424 (0.4451) loss 4.9500 (4.9047) grad_norm 1.8187 (1.7231) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][420/625] eta 0:01:31 lr 0.000581 wd 0.0500 time 0.4452 (0.4473) data time 0.0007 (0.0017) model time 0.4445 (0.4450) loss 5.4845 (4.9050) grad_norm 1.4271 (1.7209) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][430/625] eta 0:01:27 lr 0.000582 wd 0.0500 time 0.4454 (0.4472) data time 0.0009 (0.0017) model time 0.4445 (0.4450) loss 5.1247 (4.9053) grad_norm 2.1103 (1.7204) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][440/625] eta 0:01:22 lr 0.000583 wd 0.0500 time 0.4455 (0.4480) data time 0.0006 (0.0016) model time 0.4448 (0.4459) loss 5.4359 (4.9057) grad_norm 1.9527 (1.7182) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][450/625] eta 0:01:18 lr 0.000584 wd 0.0500 time 0.4416 (0.4479) data time 0.0008 (0.0016) model time 0.4408 (0.4459) loss 5.1358 (4.9093) grad_norm 1.7343 (1.7186) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][460/625] eta 0:01:13 lr 0.000585 wd 0.0500 time 0.4412 (0.4478) data time 0.0008 (0.0016) model time 0.4404 (0.4458) loss 5.3259 (4.9080) grad_norm 1.8735 (1.7219) loss_scale 32768.0000 
(32768.0000) mem 16696MB [2024-08-04 10:25:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][470/625] eta 0:01:09 lr 0.000586 wd 0.0500 time 0.4438 (0.4477) data time 0.0009 (0.0016) model time 0.4428 (0.4457) loss 4.7893 (4.9067) grad_norm 1.4573 (1.7225) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][480/625] eta 0:01:04 lr 0.000587 wd 0.0500 time 0.4425 (0.4477) data time 0.0009 (0.0016) model time 0.4416 (0.4457) loss 3.9806 (4.9029) grad_norm 1.7792 (1.7238) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][490/625] eta 0:01:00 lr 0.000588 wd 0.0500 time 0.4420 (0.4476) data time 0.0009 (0.0016) model time 0.4410 (0.4456) loss 5.0992 (4.8974) grad_norm 1.7231 (1.7244) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][500/625] eta 0:00:55 lr 0.000589 wd 0.0500 time 0.4439 (0.4475) data time 0.0007 (0.0015) model time 0.4432 (0.4456) loss 4.1812 (4.8948) grad_norm 1.3380 (1.7243) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][510/625] eta 0:00:51 lr 0.000590 wd 0.0500 time 0.4444 (0.4475) data time 0.0006 (0.0015) model time 0.4438 (0.4455) loss 3.8807 (4.8955) grad_norm 1.2620 (1.7203) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][520/625] eta 0:00:46 lr 0.000591 wd 0.0500 time 0.4450 (0.4474) data time 0.0007 (0.0015) model time 0.4443 (0.4455) loss 5.0901 (4.8890) grad_norm 1.6522 (1.7174) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:25:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][530/625] eta 0:00:42 lr 0.000592 wd 0.0500 time 0.4443 (0.4474) data time 0.0007 (0.0015) model 
time 0.4435 (0.4455) loss 5.1432 (4.8883) grad_norm 1.7885 (1.7186) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][540/625] eta 0:00:38 lr 0.000593 wd 0.0500 time 0.4451 (0.4474) data time 0.0007 (0.0015) model time 0.4444 (0.4455) loss 5.8178 (4.8876) grad_norm 2.4957 (1.7180) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][550/625] eta 0:00:33 lr 0.000594 wd 0.0500 time 0.4507 (0.4473) data time 0.0008 (0.0015) model time 0.4499 (0.4455) loss 4.3009 (4.8877) grad_norm 1.7982 (1.7187) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][560/625] eta 0:00:29 lr 0.000595 wd 0.0500 time 0.4428 (0.4473) data time 0.0008 (0.0015) model time 0.4420 (0.4454) loss 5.0798 (4.8876) grad_norm 1.5526 (1.7173) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][570/625] eta 0:00:24 lr 0.000596 wd 0.0500 time 0.4420 (0.4472) data time 0.0008 (0.0015) model time 0.4411 (0.4454) loss 5.2750 (4.8883) grad_norm 1.2550 (1.7158) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][580/625] eta 0:00:20 lr 0.000597 wd 0.0500 time 0.6553 (0.4479) data time 0.0009 (0.0014) model time 0.6544 (0.4462) loss 4.6512 (4.8876) grad_norm 1.6033 (1.7147) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][590/625] eta 0:00:15 lr 0.000598 wd 0.0500 time 0.4440 (0.4479) data time 0.0008 (0.0014) model time 0.4432 (0.4462) loss 4.5964 (4.8776) grad_norm 1.7892 (1.7160) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][600/625] 
eta 0:00:11 lr 0.000599 wd 0.0500 time 0.4438 (0.4479) data time 0.0006 (0.0014) model time 0.4433 (0.4462) loss 3.9451 (4.8770) grad_norm 1.7304 (1.7166) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][610/625] eta 0:00:06 lr 0.000600 wd 0.0500 time 0.4436 (0.4479) data time 0.0004 (0.0014) model time 0.4432 (0.4462) loss 5.5133 (4.8760) grad_norm 2.1395 (1.7202) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [9/300][620/625] eta 0:00:02 lr 0.000601 wd 0.0500 time 0.4405 (0.4477) data time 0.0004 (0.0014) model time 0.4401 (0.4460) loss 5.3236 (4.8772) grad_norm 1.9601 (1.7190) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 9 training takes 0:04:39 [2024-08-04 10:26:40 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:26:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:26:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.490 (0.490) Loss 1.4424 (1.4424) Acc@1 66.650 (66.650) Acc@5 87.695 (87.695) Mem 16696MB [2024-08-04 10:26:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.121 (0.154) Loss 2.4258 (1.7052) Acc@1 45.801 (59.007) Acc@5 73.096 (84.783) Mem 16696MB [2024-08-04 10:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 2.5566 (2.0610) Acc@1 42.529 (52.958) Acc@5 69.385 (78.406) Mem 16696MB [2024-08-04 10:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 52.955 Acc@5 78.421 [2024-08-04 10:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 53.0% [2024-08-04 10:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 52.96% [2024-08-04 10:26:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 10:26:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 10:26:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 6.7969 (6.7969) Acc@1 0.488 (0.488) Acc@5 3.564 (3.564) Mem 16696MB [2024-08-04 10:26:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 6.8867 (6.9091) Acc@1 0.537 (0.661) Acc@5 2.100 (2.575) Mem 16696MB [2024-08-04 10:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 6.6328 (6.8043) Acc@1 1.172 (0.718) Acc@5 4.004 (2.867) Mem 16696MB [2024-08-04 10:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 0.894 Acc@5 3.347 [2024-08-04 10:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 0.9% [2024-08-04 10:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 0.89% [2024-08-04 10:26:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 10:26:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:26:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][0/625] eta 0:08:20 lr 0.000601 wd 0.0500 time 0.8006 (0.8006) data time 0.4171 (0.4171) model time 0.0000 (0.0000) loss 5.3969 (5.3969) grad_norm 1.4536 (1.4536) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:26:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][10/625] eta 0:04:52 lr 0.000602 wd 0.0500 time 0.4448 (0.4762) data time 0.0006 (0.0386) model time 0.0000 (0.0000) loss 5.5679 (4.9835) grad_norm 1.4081 (1.5841) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][20/625] eta 0:04:38 lr 0.000603 wd 0.0500 time 0.4410 (0.4604) data time 0.0006 (0.0206) model time 0.0000 (0.0000) loss 5.0733 (5.0477) grad_norm 1.5142 (1.5449) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][30/625] eta 0:04:31 lr 0.000604 wd 0.0500 time 0.4423 (0.4555) data time 0.0006 (0.0142) model time 0.0000 (0.0000) loss 5.4053 (4.9414) grad_norm 1.8636 (1.6043) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][40/625] eta 0:04:24 lr 0.000605 wd 0.0500 time 0.4423 (0.4527) data time 0.0006 (0.0109) model time 0.0000 (0.0000) loss 5.7309 (4.9688) grad_norm 1.8552 (1.5987) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][50/625] eta 0:04:19 lr 0.000606 wd 0.0500 time 0.4394 (0.4505) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 5.2951 (4.9861) grad_norm 1.6908 (1.6175) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][60/625] eta 0:04:15 lr 0.000607 wd 0.0500 time 0.4487 (0.4515) data time 0.0007 (0.0076) model time 0.4480 (0.4555) loss 
5.4367 (5.0313) grad_norm 1.9658 (1.6243) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][70/625] eta 0:04:10 lr 0.000608 wd 0.0500 time 0.4424 (0.4505) data time 0.0007 (0.0067) model time 0.4417 (0.4497) loss 3.8873 (4.9666) grad_norm 1.3134 (1.6136) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][80/625] eta 0:04:05 lr 0.000609 wd 0.0500 time 0.4427 (0.4496) data time 0.0009 (0.0060) model time 0.4418 (0.4473) loss 5.0970 (4.9688) grad_norm 1.3781 (1.6157) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][90/625] eta 0:04:00 lr 0.000610 wd 0.0500 time 0.4451 (0.4491) data time 0.0007 (0.0054) model time 0.4444 (0.4465) loss 3.6755 (4.9401) grad_norm 1.8158 (1.6288) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][100/625] eta 0:03:55 lr 0.000611 wd 0.0500 time 0.4433 (0.4488) data time 0.0007 (0.0049) model time 0.4426 (0.4461) loss 3.5941 (4.9246) grad_norm 1.5897 (1.6638) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][110/625] eta 0:03:50 lr 0.000612 wd 0.0500 time 0.4405 (0.4483) data time 0.0006 (0.0046) model time 0.4399 (0.4456) loss 6.0239 (4.9146) grad_norm 1.3724 (1.6546) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][120/625] eta 0:03:46 lr 0.000613 wd 0.0500 time 0.4442 (0.4479) data time 0.0009 (0.0042) model time 0.4433 (0.4452) loss 4.9537 (4.9105) grad_norm 1.3113 (1.6491) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][130/625] eta 0:03:43 lr 
0.000613 wd 0.0500 time 0.4447 (0.4510) data time 0.0007 (0.0040) model time 0.4440 (0.4505) loss 5.5516 (4.9180) grad_norm 2.1213 (1.6415) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][140/625] eta 0:03:38 lr 0.000614 wd 0.0500 time 0.4485 (0.4505) data time 0.0008 (0.0038) model time 0.4477 (0.4497) loss 5.5006 (4.9068) grad_norm 1.2185 (1.6505) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][150/625] eta 0:03:33 lr 0.000615 wd 0.0500 time 0.4412 (0.4501) data time 0.0009 (0.0036) model time 0.4403 (0.4491) loss 3.4168 (4.8903) grad_norm 2.1224 (1.6607) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][160/625] eta 0:03:29 lr 0.000616 wd 0.0500 time 0.4468 (0.4499) data time 0.0008 (0.0034) model time 0.4460 (0.4487) loss 3.9476 (4.8818) grad_norm 1.3607 (1.6537) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][170/625] eta 0:03:24 lr 0.000617 wd 0.0500 time 0.4461 (0.4496) data time 0.0006 (0.0032) model time 0.4455 (0.4484) loss 5.2536 (4.8938) grad_norm 1.3715 (1.6472) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][180/625] eta 0:03:19 lr 0.000618 wd 0.0500 time 0.4458 (0.4493) data time 0.0008 (0.0031) model time 0.4449 (0.4480) loss 4.3833 (4.8733) grad_norm 1.9084 (1.6495) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][190/625] eta 0:03:15 lr 0.000619 wd 0.0500 time 0.4425 (0.4490) data time 0.0007 (0.0030) model time 0.4418 (0.4477) loss 4.8498 (4.8687) grad_norm 1.4135 (1.6556) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 
10:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][200/625] eta 0:03:10 lr 0.000620 wd 0.0500 time 0.4434 (0.4487) data time 0.0007 (0.0029) model time 0.4427 (0.4473) loss 5.5911 (4.8514) grad_norm 1.5681 (1.6497) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][210/625] eta 0:03:06 lr 0.000621 wd 0.0500 time 0.4472 (0.4485) data time 0.0006 (0.0028) model time 0.4466 (0.4470) loss 4.0791 (4.8349) grad_norm 1.7046 (1.6615) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][220/625] eta 0:03:01 lr 0.000622 wd 0.0500 time 0.4527 (0.4483) data time 0.0007 (0.0027) model time 0.4520 (0.4468) loss 4.1670 (4.8309) grad_norm 1.8106 (1.6646) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][230/625] eta 0:02:56 lr 0.000623 wd 0.0500 time 0.4407 (0.4481) data time 0.0006 (0.0026) model time 0.4401 (0.4466) loss 4.4952 (4.8150) grad_norm 1.8060 (1.6679) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][240/625] eta 0:02:52 lr 0.000624 wd 0.0500 time 0.4451 (0.4479) data time 0.0007 (0.0025) model time 0.4444 (0.4464) loss 5.6293 (4.8139) grad_norm 1.5693 (1.6569) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][250/625] eta 0:02:47 lr 0.000625 wd 0.0500 time 0.4513 (0.4478) data time 0.0008 (0.0025) model time 0.4505 (0.4463) loss 5.4403 (4.8043) grad_norm 1.3551 (1.6485) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][260/625] eta 0:02:43 lr 0.000626 wd 0.0500 time 0.4459 (0.4477) data time 0.0009 (0.0024) model time 0.4450 (0.4462) loss 5.1695 
(4.8093) grad_norm 1.9915 (1.6445) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][270/625] eta 0:02:38 lr 0.000627 wd 0.0500 time 0.4427 (0.4475) data time 0.0007 (0.0023) model time 0.4420 (0.4460) loss 3.9848 (4.8045) grad_norm 1.4449 (1.6448) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:28:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][280/625] eta 0:02:34 lr 0.000628 wd 0.0500 time 0.4522 (0.4475) data time 0.0009 (0.0023) model time 0.4513 (0.4460) loss 3.9004 (4.8023) grad_norm 1.3784 (1.6394) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][290/625] eta 0:02:29 lr 0.000629 wd 0.0500 time 0.4596 (0.4475) data time 0.0006 (0.0022) model time 0.4590 (0.4460) loss 5.1467 (4.8070) grad_norm 1.6741 (1.6348) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][300/625] eta 0:02:25 lr 0.000630 wd 0.0500 time 0.4417 (0.4474) data time 0.0007 (0.0022) model time 0.4410 (0.4459) loss 5.3033 (4.8069) grad_norm 1.4474 (1.6318) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][310/625] eta 0:02:20 lr 0.000631 wd 0.0500 time 0.4426 (0.4472) data time 0.0009 (0.0021) model time 0.4417 (0.4458) loss 5.6719 (4.8023) grad_norm 1.6446 (1.6331) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][320/625] eta 0:02:16 lr 0.000632 wd 0.0500 time 0.4416 (0.4471) data time 0.0009 (0.0021) model time 0.4407 (0.4457) loss 4.9888 (4.8082) grad_norm 1.3618 (1.6318) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][330/625] eta 0:02:11 lr 0.000633 wd 
0.0500 time 0.4411 (0.4470) data time 0.0007 (0.0021) model time 0.4405 (0.4456) loss 3.9179 (4.7976) grad_norm 2.1113 (1.6328) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][340/625] eta 0:02:07 lr 0.000634 wd 0.0500 time 0.4428 (0.4469) data time 0.0006 (0.0020) model time 0.4422 (0.4455) loss 5.0402 (4.8023) grad_norm 1.4749 (1.6308) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][350/625] eta 0:02:02 lr 0.000635 wd 0.0500 time 0.4429 (0.4468) data time 0.0007 (0.0020) model time 0.4422 (0.4454) loss 3.7913 (4.8029) grad_norm 1.9300 (1.6304) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][360/625] eta 0:01:58 lr 0.000636 wd 0.0500 time 0.4409 (0.4467) data time 0.0008 (0.0020) model time 0.4400 (0.4453) loss 5.3259 (4.7927) grad_norm 1.6118 (1.6276) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][370/625] eta 0:01:53 lr 0.000636 wd 0.0500 time 0.4414 (0.4466) data time 0.0007 (0.0019) model time 0.4407 (0.4452) loss 5.7631 (4.7995) grad_norm 1.6600 (1.6289) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][380/625] eta 0:01:49 lr 0.000637 wd 0.0500 time 0.4399 (0.4465) data time 0.0006 (0.0019) model time 0.4393 (0.4451) loss 3.4327 (4.7911) grad_norm 1.8076 (1.6339) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][390/625] eta 0:01:44 lr 0.000638 wd 0.0500 time 0.4428 (0.4464) data time 0.0007 (0.0019) model time 0.4421 (0.4450) loss 3.8758 (4.7911) grad_norm 2.0426 (1.6386) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:51 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][400/625] eta 0:01:40 lr 0.000639 wd 0.0500 time 0.4431 (0.4467) data time 0.0007 (0.0018) model time 0.4425 (0.4454) loss 3.6849 (4.7875) grad_norm 1.7543 (1.6354) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][410/625] eta 0:01:36 lr 0.000640 wd 0.0500 time 0.4425 (0.4466) data time 0.0008 (0.0018) model time 0.4417 (0.4453) loss 5.2837 (4.7884) grad_norm 1.6820 (1.6355) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][420/625] eta 0:01:31 lr 0.000641 wd 0.0500 time 0.4439 (0.4465) data time 0.0006 (0.0018) model time 0.4433 (0.4452) loss 3.7269 (4.7795) grad_norm 1.2586 (1.6351) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][430/625] eta 0:01:27 lr 0.000642 wd 0.0500 time 0.4491 (0.4465) data time 0.0008 (0.0018) model time 0.4483 (0.4451) loss 4.9539 (4.7703) grad_norm 1.2665 (1.6306) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][440/625] eta 0:01:22 lr 0.000643 wd 0.0500 time 0.4463 (0.4464) data time 0.0006 (0.0017) model time 0.4457 (0.4451) loss 3.4984 (4.7662) grad_norm 1.3464 (1.6329) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][450/625] eta 0:01:18 lr 0.000644 wd 0.0500 time 0.4435 (0.4464) data time 0.0009 (0.0017) model time 0.4426 (0.4450) loss 3.9747 (4.7696) grad_norm 2.0725 (1.6377) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][460/625] eta 0:01:13 lr 0.000645 wd 0.0500 time 0.6619 (0.4468) data time 0.0006 (0.0017) model time 0.6613 (0.4455) loss 4.9890 (4.7736) 
grad_norm 1.2760 (1.6373) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][470/625] eta 0:01:09 lr 0.000646 wd 0.0500 time 0.4439 (0.4467) data time 0.0008 (0.0017) model time 0.4430 (0.4455) loss 5.0395 (4.7764) grad_norm 1.7700 (1.6407) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][480/625] eta 0:01:04 lr 0.000647 wd 0.0500 time 0.4425 (0.4467) data time 0.0008 (0.0017) model time 0.4417 (0.4454) loss 5.0423 (4.7794) grad_norm 1.4638 (1.6407) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][490/625] eta 0:01:00 lr 0.000648 wd 0.0500 time 0.4418 (0.4467) data time 0.0006 (0.0017) model time 0.4412 (0.4454) loss 5.3419 (4.7769) grad_norm 1.7301 (1.6408) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][500/625] eta 0:00:55 lr 0.000649 wd 0.0500 time 0.4437 (0.4466) data time 0.0006 (0.0016) model time 0.4432 (0.4454) loss 5.2725 (4.7672) grad_norm 1.6085 (1.6375) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][510/625] eta 0:00:51 lr 0.000650 wd 0.0500 time 0.4415 (0.4465) data time 0.0008 (0.0016) model time 0.4406 (0.4453) loss 4.9258 (4.7649) grad_norm 1.3570 (1.6384) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][520/625] eta 0:00:46 lr 0.000651 wd 0.0500 time 0.4432 (0.4465) data time 0.0008 (0.0016) model time 0.4424 (0.4452) loss 3.6908 (4.7577) grad_norm 1.3871 (1.6380) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][530/625] eta 0:00:42 lr 0.000652 wd 0.0500 
time 0.4501 (0.4464) data time 0.0008 (0.0016) model time 0.4493 (0.4452) loss 5.0542 (4.7616) grad_norm 2.4436 (1.6376) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][540/625] eta 0:00:37 lr 0.000653 wd 0.0500 time 0.4503 (0.4464) data time 0.0006 (0.0016) model time 0.4497 (0.4452) loss 5.3879 (4.7633) grad_norm 1.7296 (1.6426) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:30:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][550/625] eta 0:00:33 lr 0.000654 wd 0.0500 time 0.4426 (0.4464) data time 0.0007 (0.0016) model time 0.4419 (0.4451) loss 4.9058 (4.7617) grad_norm 1.5314 (1.6429) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][560/625] eta 0:00:29 lr 0.000655 wd 0.0500 time 0.4453 (0.4463) data time 0.0007 (0.0015) model time 0.4446 (0.4451) loss 5.1521 (4.7592) grad_norm 1.3932 (1.6395) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][570/625] eta 0:00:24 lr 0.000656 wd 0.0500 time 0.4440 (0.4463) data time 0.0007 (0.0015) model time 0.4433 (0.4451) loss 4.4855 (4.7570) grad_norm 1.4922 (1.6394) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][580/625] eta 0:00:20 lr 0.000657 wd 0.0500 time 0.4442 (0.4463) data time 0.0009 (0.0015) model time 0.4434 (0.4451) loss 4.8507 (4.7561) grad_norm 1.2908 (1.6395) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][590/625] eta 0:00:15 lr 0.000658 wd 0.0500 time 0.4411 (0.4464) data time 0.0007 (0.0015) model time 0.4404 (0.4452) loss 4.4592 (4.7505) grad_norm 1.8904 (1.6388) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:20 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][600/625] eta 0:00:11 lr 0.000659 wd 0.0500 time 0.4453 (0.4464) data time 0.0007 (0.0015) model time 0.4445 (0.4452) loss 3.5917 (4.7470) grad_norm 1.4922 (1.6415) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][610/625] eta 0:00:06 lr 0.000659 wd 0.0500 time 0.4390 (0.4463) data time 0.0004 (0.0015) model time 0.4385 (0.4451) loss 5.7619 (4.7442) grad_norm 1.3714 (1.6407) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [10/300][620/625] eta 0:00:02 lr 0.000660 wd 0.0500 time 0.4428 (0.4462) data time 0.0004 (0.0015) model time 0.4423 (0.4450) loss 4.0958 (4.7393) grad_norm 1.4794 (1.6397) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 10 training takes 0:04:38 [2024-08-04 10:31:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:31:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:31:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 1.2949 (1.2949) Acc@1 69.678 (69.678) Acc@5 90.234 (90.234) Mem 16696MB [2024-08-04 10:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 2.1816 (1.5510) Acc@1 50.439 (62.278) Acc@5 77.490 (86.976) Mem 16696MB [2024-08-04 10:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 2.4043 (1.9167) Acc@1 47.363 (55.780) Acc@5 72.168 (80.762) Mem 16696MB [2024-08-04 10:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 55.858 Acc@5 80.878 [2024-08-04 10:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 55.9% [2024-08-04 10:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 55.86% [2024-08-04 10:31:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 10:31:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 10:31:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 6.5391 (6.5391) Acc@1 0.781 (0.781) Acc@5 5.615 (5.615) Mem 16696MB [2024-08-04 10:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 6.7188 (6.6605) Acc@1 0.684 (1.221) Acc@5 3.467 (4.581) Mem 16696MB [2024-08-04 10:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.134) Loss 6.4492 (6.5766) Acc@1 1.221 (1.251) Acc@5 4.834 (4.818) Mem 16696MB [2024-08-04 10:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 1.538 Acc@5 5.426 [2024-08-04 10:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 1.5% [2024-08-04 10:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 1.54% [2024-08-04 10:31:40 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 10:31:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][0/625] eta 0:08:15 lr 0.000661 wd 0.0500 time 0.7921 (0.7921) data time 0.4062 (0.4062) model time 0.0000 (0.0000) loss 3.9156 (3.9156) grad_norm 1.5309 (1.5309) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][10/625] eta 0:04:51 lr 0.000662 wd 0.0500 time 0.4441 (0.4744) data time 0.0006 (0.0376) model time 0.0000 (0.0000) loss 5.4522 (4.4873) grad_norm 1.6029 (1.5140) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][20/625] eta 0:04:38 lr 0.000663 wd 0.0500 time 0.4435 (0.4596) data time 0.0009 (0.0201) model time 0.0000 (0.0000) loss 4.9038 (4.5426) grad_norm 1.8093 (1.5267) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:31:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][30/625] eta 0:04:30 lr 0.000664 wd 0.0500 time 0.4407 (0.4545) data time 0.0007 (0.0139) model time 0.0000 (0.0000) loss 3.4072 (4.4596) grad_norm 1.7003 (1.6678) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][40/625] eta 0:04:24 lr 0.000665 wd 0.0500 time 0.4451 (0.4520) data time 0.0009 (0.0107) model time 0.0000 (0.0000) loss 4.1118 (4.5329) grad_norm 1.7087 (1.6634) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][50/625] eta 0:04:21 lr 0.000666 wd 0.0500 time 0.4438 (0.4546) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 4.0882 (4.5255) grad_norm 1.2556 (1.6354) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][60/625] eta 0:04:17 lr 0.000667 wd 0.0500 time 0.6434 (0.4560) data time 0.0007 (0.0074) model time 0.6427 (0.4625) loss 
5.5699 (4.5852) grad_norm 1.9128 (1.6417) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][70/625] eta 0:04:11 lr 0.000668 wd 0.0500 time 0.4556 (0.4536) data time 0.0009 (0.0065) model time 0.4548 (0.4501) loss 4.9262 (4.5709) grad_norm 2.0813 (1.6524) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][80/625] eta 0:04:06 lr 0.000669 wd 0.0500 time 0.4430 (0.4523) data time 0.0006 (0.0058) model time 0.4424 (0.4476) loss 4.8768 (4.5451) grad_norm 1.1997 (1.6433) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][90/625] eta 0:04:01 lr 0.000670 wd 0.0500 time 0.4455 (0.4514) data time 0.0009 (0.0053) model time 0.4447 (0.4464) loss 4.3747 (4.5650) grad_norm 1.7668 (1.6529) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][100/625] eta 0:03:56 lr 0.000670 wd 0.0500 time 0.4417 (0.4506) data time 0.0008 (0.0048) model time 0.4408 (0.4456) loss 4.9287 (4.5619) grad_norm 2.0761 (1.6570) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][110/625] eta 0:03:51 lr 0.000671 wd 0.0500 time 0.4423 (0.4499) data time 0.0008 (0.0045) model time 0.4415 (0.4452) loss 4.2807 (4.5623) grad_norm 1.7199 (1.6709) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][120/625] eta 0:03:46 lr 0.000672 wd 0.0500 time 0.4416 (0.4493) data time 0.0006 (0.0042) model time 0.4410 (0.4447) loss 3.9377 (4.5564) grad_norm 1.6328 (1.6631) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][130/625] eta 0:03:42 lr 
0.000673 wd 0.0500 time 0.4416 (0.4489) data time 0.0009 (0.0039) model time 0.4408 (0.4444) loss 3.9676 (4.5297) grad_norm 1.5410 (1.6640) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][140/625] eta 0:03:37 lr 0.000674 wd 0.0500 time 0.4425 (0.4485) data time 0.0009 (0.0037) model time 0.4415 (0.4442) loss 5.0377 (4.5412) grad_norm 1.3381 (1.6461) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][150/625] eta 0:03:32 lr 0.000675 wd 0.0500 time 0.4473 (0.4482) data time 0.0008 (0.0035) model time 0.4465 (0.4441) loss 4.9091 (4.5391) grad_norm 1.4171 (1.6418) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][160/625] eta 0:03:28 lr 0.000676 wd 0.0500 time 0.4433 (0.4480) data time 0.0007 (0.0033) model time 0.4426 (0.4441) loss 5.2615 (4.5498) grad_norm 1.6650 (1.6517) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:32:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][170/625] eta 0:03:23 lr 0.000677 wd 0.0500 time 0.4405 (0.4477) data time 0.0009 (0.0032) model time 0.4396 (0.4440) loss 5.0264 (4.5432) grad_norm 1.6239 (1.6496) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][180/625] eta 0:03:19 lr 0.000678 wd 0.0500 time 0.4433 (0.4475) data time 0.0008 (0.0031) model time 0.4425 (0.4439) loss 4.9692 (4.5410) grad_norm 1.6349 (1.6452) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][190/625] eta 0:03:14 lr 0.000679 wd 0.0500 time 0.4475 (0.4472) data time 0.0007 (0.0029) model time 0.4469 (0.4438) loss 4.7648 (4.5565) grad_norm 1.5337 (1.6558) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 
10:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][200/625] eta 0:03:10 lr 0.000680 wd 0.0500 time 0.4469 (0.4471) data time 0.0007 (0.0028) model time 0.4463 (0.4438) loss 3.6271 (4.5502) grad_norm 1.7708 (1.6512) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][210/625] eta 0:03:05 lr 0.000681 wd 0.0500 time 0.4456 (0.4470) data time 0.0008 (0.0027) model time 0.4448 (0.4437) loss 4.2503 (4.5457) grad_norm 1.4159 (1.6442) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][220/625] eta 0:03:00 lr 0.000682 wd 0.0500 time 0.4429 (0.4468) data time 0.0006 (0.0026) model time 0.4423 (0.4437) loss 4.3283 (4.5552) grad_norm 1.4285 (1.6406) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][230/625] eta 0:02:56 lr 0.000683 wd 0.0500 time 0.4462 (0.4468) data time 0.0006 (0.0026) model time 0.4456 (0.4437) loss 4.1620 (4.5630) grad_norm 1.2398 (1.6389) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][240/625] eta 0:02:52 lr 0.000684 wd 0.0500 time 0.4408 (0.4473) data time 0.0009 (0.0025) model time 0.4398 (0.4446) loss 4.6649 (4.5546) grad_norm 1.7036 (1.6320) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][250/625] eta 0:02:47 lr 0.000685 wd 0.0500 time 0.4440 (0.4473) data time 0.0007 (0.0024) model time 0.4433 (0.4446) loss 5.1307 (4.5570) grad_norm 1.3432 (1.6315) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][260/625] eta 0:02:43 lr 0.000686 wd 0.0500 time 0.4403 (0.4471) data time 0.0009 (0.0024) model time 0.4394 (0.4445) loss 4.6483 
(4.5651) grad_norm 1.5146 (1.6304) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][270/625] eta 0:02:38 lr 0.000687 wd 0.0500 time 0.4429 (0.4470) data time 0.0006 (0.0023) model time 0.4423 (0.4444) loss 5.6290 (4.5690) grad_norm 1.4899 (1.6246) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][280/625] eta 0:02:34 lr 0.000688 wd 0.0500 time 0.4445 (0.4469) data time 0.0007 (0.0023) model time 0.4438 (0.4444) loss 5.4168 (4.5899) grad_norm 1.4384 (1.6311) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][290/625] eta 0:02:29 lr 0.000689 wd 0.0500 time 0.4439 (0.4468) data time 0.0009 (0.0022) model time 0.4430 (0.4443) loss 4.1894 (4.5954) grad_norm 2.3695 (1.6390) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][300/625] eta 0:02:25 lr 0.000690 wd 0.0500 time 0.4456 (0.4467) data time 0.0007 (0.0022) model time 0.4449 (0.4443) loss 5.4792 (4.6024) grad_norm 1.3653 (1.6380) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][310/625] eta 0:02:20 lr 0.000691 wd 0.0500 time 0.4428 (0.4467) data time 0.0006 (0.0021) model time 0.4422 (0.4443) loss 4.7776 (4.5930) grad_norm 1.4403 (1.6319) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][320/625] eta 0:02:16 lr 0.000692 wd 0.0500 time 0.4426 (0.4466) data time 0.0008 (0.0021) model time 0.4418 (0.4442) loss 5.4277 (4.5958) grad_norm 1.5468 (1.6236) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][330/625] eta 0:02:11 lr 0.000693 wd 
0.0500 time 0.4430 (0.4465) data time 0.0006 (0.0020) model time 0.4424 (0.4442) loss 3.7372 (4.5948) grad_norm 1.5472 (1.6207) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][340/625] eta 0:02:07 lr 0.000693 wd 0.0500 time 0.4455 (0.4465) data time 0.0006 (0.0020) model time 0.4449 (0.4442) loss 5.1745 (4.5907) grad_norm 2.1444 (1.6206) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][350/625] eta 0:02:02 lr 0.000694 wd 0.0500 time 0.4384 (0.4464) data time 0.0007 (0.0020) model time 0.4376 (0.4441) loss 4.9114 (4.5966) grad_norm 1.5525 (1.6187) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][360/625] eta 0:01:58 lr 0.000695 wd 0.0500 time 0.4440 (0.4463) data time 0.0006 (0.0019) model time 0.4434 (0.4441) loss 3.9152 (4.5960) grad_norm 2.0783 (1.6176) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][370/625] eta 0:01:53 lr 0.000696 wd 0.0500 time 0.4459 (0.4462) data time 0.0007 (0.0019) model time 0.4453 (0.4441) loss 3.7310 (4.5980) grad_norm 1.5808 (1.6224) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][380/625] eta 0:01:49 lr 0.000697 wd 0.0500 time 0.4449 (0.4462) data time 0.0008 (0.0019) model time 0.4441 (0.4441) loss 5.1131 (4.5971) grad_norm 1.5138 (1.6204) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][390/625] eta 0:01:44 lr 0.000698 wd 0.0500 time 0.4430 (0.4461) data time 0.0008 (0.0019) model time 0.4422 (0.4441) loss 5.2082 (4.6071) grad_norm 1.7141 (1.6207) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:40 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][400/625] eta 0:01:40 lr 0.000699 wd 0.0500 time 0.4407 (0.4465) data time 0.0008 (0.0018) model time 0.4398 (0.4444) loss 3.5598 (4.6087) grad_norm 1.9171 (1.6216) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][410/625] eta 0:01:35 lr 0.000700 wd 0.0500 time 0.4503 (0.4464) data time 0.0010 (0.0018) model time 0.4493 (0.4445) loss 4.6927 (4.6027) grad_norm 1.8676 (1.6267) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][420/625] eta 0:01:31 lr 0.000701 wd 0.0500 time 0.4468 (0.4464) data time 0.0005 (0.0018) model time 0.4463 (0.4445) loss 4.5347 (4.6017) grad_norm 1.4948 (1.6248) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][430/625] eta 0:01:27 lr 0.000702 wd 0.0500 time 0.4435 (0.4464) data time 0.0009 (0.0018) model time 0.4426 (0.4444) loss 4.2516 (4.6053) grad_norm 1.4691 (1.6254) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][440/625] eta 0:01:22 lr 0.000703 wd 0.0500 time 0.4437 (0.4464) data time 0.0008 (0.0017) model time 0.4428 (0.4445) loss 4.0503 (4.6070) grad_norm 1.5290 (1.6269) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][450/625] eta 0:01:18 lr 0.000704 wd 0.0500 time 0.4467 (0.4463) data time 0.0007 (0.0017) model time 0.4461 (0.4445) loss 3.5159 (4.6032) grad_norm 1.7254 (1.6273) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][460/625] eta 0:01:13 lr 0.000705 wd 0.0500 time 0.4437 (0.4468) data time 0.0008 (0.0017) model time 0.4429 (0.4450) loss 5.0708 (4.6016) 
grad_norm 2.2074 (1.6279) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][470/625] eta 0:01:09 lr 0.000706 wd 0.0500 time 0.4451 (0.4467) data time 0.0007 (0.0017) model time 0.4444 (0.4450) loss 5.3094 (4.6074) grad_norm 1.4767 (1.6287) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][480/625] eta 0:01:04 lr 0.000707 wd 0.0500 time 0.4431 (0.4467) data time 0.0010 (0.0017) model time 0.4422 (0.4449) loss 5.0940 (4.6089) grad_norm 1.4631 (1.6262) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][490/625] eta 0:01:00 lr 0.000708 wd 0.0500 time 0.4466 (0.4467) data time 0.0006 (0.0016) model time 0.4460 (0.4449) loss 4.7039 (4.6100) grad_norm 1.8255 (1.6263) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][500/625] eta 0:00:55 lr 0.000709 wd 0.0500 time 0.4447 (0.4466) data time 0.0006 (0.0016) model time 0.4441 (0.4449) loss 5.0816 (4.6084) grad_norm 1.6138 (1.6260) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][510/625] eta 0:00:51 lr 0.000710 wd 0.0500 time 0.4440 (0.4466) data time 0.0007 (0.0016) model time 0.4433 (0.4449) loss 5.1645 (4.6091) grad_norm 1.3967 (1.6251) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][520/625] eta 0:00:46 lr 0.000711 wd 0.0500 time 0.4426 (0.4466) data time 0.0005 (0.0016) model time 0.4420 (0.4449) loss 3.5625 (4.6119) grad_norm 1.6453 (1.6259) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][530/625] eta 0:00:42 lr 0.000712 wd 0.0500 
time 0.4521 (0.4466) data time 0.0009 (0.0016) model time 0.4513 (0.4449) loss 4.3760 (4.6056) grad_norm 1.3441 (1.6236) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][540/625] eta 0:00:37 lr 0.000713 wd 0.0500 time 0.4423 (0.4466) data time 0.0009 (0.0016) model time 0.4414 (0.4449) loss 4.7507 (4.6063) grad_norm 2.0344 (1.6219) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][550/625] eta 0:00:33 lr 0.000714 wd 0.0500 time 0.4419 (0.4465) data time 0.0008 (0.0015) model time 0.4411 (0.4449) loss 4.6698 (4.6083) grad_norm 1.5704 (1.6244) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][560/625] eta 0:00:29 lr 0.000715 wd 0.0500 time 0.4426 (0.4465) data time 0.0006 (0.0015) model time 0.4420 (0.4449) loss 4.0886 (4.6065) grad_norm 1.3372 (1.6245) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:35:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][570/625] eta 0:00:24 lr 0.000716 wd 0.0500 time 0.4432 (0.4464) data time 0.0006 (0.0015) model time 0.4426 (0.4448) loss 4.6406 (4.6006) grad_norm 1.4121 (1.6240) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][580/625] eta 0:00:20 lr 0.000716 wd 0.0500 time 0.4415 (0.4465) data time 0.0007 (0.0015) model time 0.4408 (0.4449) loss 4.9776 (4.6035) grad_norm 1.2707 (1.6240) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][590/625] eta 0:00:15 lr 0.000717 wd 0.0500 time 0.4444 (0.4465) data time 0.0008 (0.0015) model time 0.4436 (0.4449) loss 4.5075 (4.6044) grad_norm 1.5011 (1.6243) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:10 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][600/625] eta 0:00:11 lr 0.000718 wd 0.0500 time 0.4486 (0.4465) data time 0.0006 (0.0015) model time 0.4479 (0.4449) loss 4.2198 (4.6060) grad_norm 1.9001 (1.6228) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][610/625] eta 0:00:06 lr 0.000719 wd 0.0500 time 0.4418 (0.4468) data time 0.0004 (0.0015) model time 0.4414 (0.4453) loss 4.1739 (4.6049) grad_norm 1.4746 (1.6226) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [11/300][620/625] eta 0:00:02 lr 0.000720 wd 0.0500 time 0.4462 (0.4467) data time 0.0004 (0.0015) model time 0.4458 (0.4452) loss 3.4325 (4.5997) grad_norm 1.8196 (1.6218) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:21 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 11 training takes 0:04:39 [2024-08-04 10:36:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:36:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:36:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 1.2793 (1.2793) Acc@1 71.631 (71.631) Acc@5 90.137 (90.137) Mem 16696MB
[2024-08-04 10:36:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 2.2031 (1.5206) Acc@1 50.830 (63.743) Acc@5 76.758 (87.611) Mem 16696MB
[2024-08-04 10:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 2.3359 (1.8524) Acc@1 48.193 (57.557) Acc@5 74.170 (82.066) Mem 16696MB
[2024-08-04 10:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 57.554 Acc@5 82.148
[2024-08-04 10:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 57.6%
[2024-08-04 10:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 57.55%
[2024-08-04 10:36:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:36:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:36:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 6.1797 (6.1797) Acc@1 1.953 (1.953) Acc@5 9.277 (9.277) Mem 16696MB
[2024-08-04 10:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 6.4570 (6.3015) Acc@1 1.660 (2.450) Acc@5 6.250 (8.092) Mem 16696MB
[2024-08-04 10:36:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 6.1914 (6.2480) Acc@1 1.953 (2.476) Acc@5 6.689 (8.240) Mem 16696MB
[2024-08-04 10:36:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 2.873 Acc@5 9.081
[2024-08-04 10:36:30 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 2.9%
[2024-08-04 10:36:30 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 2.87%
[2024-08-04 10:36:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:36:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:36:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][0/625] eta 0:08:29 lr 0.000721 wd 0.0500 time 0.8156 (0.8156) data time 0.4316 (0.4316) model time 0.0000 (0.0000) loss 4.6271 (4.6271) grad_norm 1.5635 (1.5635) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][10/625] eta 0:04:53 lr 0.000722 wd 0.0500 time 0.4457 (0.4778) data time 0.0008 (0.0400) model time 0.0000 (0.0000) loss 5.0436 (4.4747) grad_norm 1.9265 (1.7186) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][20/625] eta 0:04:39 lr 0.000723 wd 0.0500 time 0.4435 (0.4620) data time 0.0008 (0.0213) model time 0.0000 (0.0000) loss 3.9787 (4.4715) grad_norm 1.4111 (1.6670) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][30/625] eta 0:04:31 lr 0.000724 wd 0.0500 time 0.4424 (0.4570) data time 0.0007 (0.0147) model time 0.0000 (0.0000) loss 5.3419 (4.5520) grad_norm 1.5327 (1.6221) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][40/625] eta 0:04:25 lr 0.000725 wd 0.0500 time 0.4464 (0.4536) data time 0.0006 (0.0113) model time 0.0000 (0.0000) loss 5.2057 (4.5050) grad_norm 1.3994 (1.6184) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][50/625] eta 0:04:19 lr 0.000726 wd 0.0500 time 0.4426 (0.4515) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 5.0969 (4.5340) grad_norm 1.6715 (1.6316) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:36:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][60/625] eta 0:04:16 lr 0.000727 wd 0.0500 time 0.6603 (0.4538) data time 0.0006 (0.0079) model time 0.6596 (0.4646) loss 
4.5551 (4.4798) grad_norm 1.6154 (1.6028) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][70/625] eta 0:04:10 lr 0.000728 wd 0.0500 time 0.4451 (0.4515) data time 0.0007 (0.0069) model time 0.4444 (0.4507) loss 3.4545 (4.4539) grad_norm 1.9627 (1.5902) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][80/625] eta 0:04:05 lr 0.000728 wd 0.0500 time 0.4443 (0.4506) data time 0.0006 (0.0062) model time 0.4437 (0.4480) loss 4.1321 (4.4884) grad_norm 1.5360 (1.5844) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][90/625] eta 0:04:00 lr 0.000729 wd 0.0500 time 0.4420 (0.4497) data time 0.0009 (0.0056) model time 0.4412 (0.4465) loss 4.6724 (4.5195) grad_norm 1.3898 (1.5882) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][100/625] eta 0:03:55 lr 0.000730 wd 0.0500 time 0.4491 (0.4491) data time 0.0008 (0.0051) model time 0.4483 (0.4457) loss 5.0697 (4.5362) grad_norm 1.5749 (1.5822) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][110/625] eta 0:03:51 lr 0.000731 wd 0.0500 time 0.4436 (0.4486) data time 0.0008 (0.0047) model time 0.4428 (0.4452) loss 4.6165 (4.5429) grad_norm 1.8542 (1.5850) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][120/625] eta 0:03:46 lr 0.000732 wd 0.0500 time 0.4456 (0.4482) data time 0.0009 (0.0044) model time 0.4447 (0.4449) loss 4.6802 (4.5178) grad_norm 1.3015 (1.5858) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][130/625] eta 0:03:41 lr 
0.000733 wd 0.0500 time 0.4461 (0.4478) data time 0.0007 (0.0041) model time 0.4453 (0.4446) loss 4.2259 (4.5335) grad_norm 1.3473 (1.5722) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][140/625] eta 0:03:37 lr 0.000734 wd 0.0500 time 0.4475 (0.4475) data time 0.0009 (0.0039) model time 0.4466 (0.4444) loss 4.7815 (4.5402) grad_norm 1.2981 (1.5654) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][150/625] eta 0:03:32 lr 0.000735 wd 0.0500 time 0.4447 (0.4472) data time 0.0007 (0.0037) model time 0.4440 (0.4442) loss 4.5477 (4.5495) grad_norm 2.2711 (1.5658) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][160/625] eta 0:03:27 lr 0.000736 wd 0.0500 time 0.4436 (0.4469) data time 0.0008 (0.0035) model time 0.4428 (0.4439) loss 4.0736 (4.5482) grad_norm 1.5949 (1.5700) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][170/625] eta 0:03:23 lr 0.000737 wd 0.0500 time 0.4410 (0.4467) data time 0.0008 (0.0034) model time 0.4402 (0.4437) loss 3.3108 (4.5577) grad_norm 1.5756 (1.5706) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][180/625] eta 0:03:18 lr 0.000738 wd 0.0500 time 0.4466 (0.4465) data time 0.0008 (0.0032) model time 0.4458 (0.4436) loss 4.4367 (4.5429) grad_norm 2.2208 (1.5725) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][190/625] eta 0:03:14 lr 0.000739 wd 0.0500 time 0.4434 (0.4463) data time 0.0008 (0.0031) model time 0.4426 (0.4436) loss 4.9060 (4.5537) grad_norm 1.8478 (1.5823) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 
10:38:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][200/625] eta 0:03:09 lr 0.000740 wd 0.0500 time 0.4434 (0.4462) data time 0.0008 (0.0030) model time 0.4426 (0.4435) loss 4.6663 (4.5663) grad_norm 1.2643 (1.5776) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][210/625] eta 0:03:05 lr 0.000741 wd 0.0500 time 0.4414 (0.4478) data time 0.0008 (0.0029) model time 0.4406 (0.4458) loss 4.2555 (4.5781) grad_norm 1.4110 (1.5792) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][220/625] eta 0:03:01 lr 0.000742 wd 0.0500 time 0.4478 (0.4477) data time 0.0009 (0.0028) model time 0.4470 (0.4457) loss 3.6371 (4.5525) grad_norm 1.7726 (1.5852) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][230/625] eta 0:02:56 lr 0.000743 wd 0.0500 time 0.4446 (0.4475) data time 0.0008 (0.0027) model time 0.4438 (0.4455) loss 4.9902 (4.5556) grad_norm 1.5291 (1.5998) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][240/625] eta 0:02:52 lr 0.000744 wd 0.0500 time 0.4441 (0.4473) data time 0.0007 (0.0026) model time 0.4434 (0.4453) loss 3.8021 (4.5425) grad_norm 2.2331 (1.5986) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][250/625] eta 0:02:47 lr 0.000745 wd 0.0500 time 0.4401 (0.4471) data time 0.0008 (0.0025) model time 0.4393 (0.4452) loss 3.7164 (4.5393) grad_norm 2.1159 (1.6045) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][260/625] eta 0:02:43 lr 0.000746 wd 0.0500 time 0.4451 (0.4470) data time 0.0006 (0.0025) model time 0.4445 (0.4451) loss 4.4835 
(4.5468) grad_norm 1.6131 (1.6018) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][270/625] eta 0:02:38 lr 0.000747 wd 0.0500 time 0.4463 (0.4469) data time 0.0006 (0.0024) model time 0.4457 (0.4450) loss 5.0295 (4.5385) grad_norm 1.5231 (1.5990) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][280/625] eta 0:02:34 lr 0.000748 wd 0.0500 time 0.4415 (0.4469) data time 0.0010 (0.0024) model time 0.4405 (0.4451) loss 4.4087 (4.5386) grad_norm 1.7137 (1.6039) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][290/625] eta 0:02:29 lr 0.000749 wd 0.0500 time 0.4422 (0.4468) data time 0.0008 (0.0023) model time 0.4414 (0.4450) loss 3.7536 (4.5440) grad_norm 1.2405 (1.5997) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][300/625] eta 0:02:25 lr 0.000750 wd 0.0500 time 0.4450 (0.4468) data time 0.0007 (0.0023) model time 0.4443 (0.4450) loss 5.3646 (4.5429) grad_norm 1.4127 (1.6008) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][310/625] eta 0:02:20 lr 0.000751 wd 0.0500 time 0.4435 (0.4468) data time 0.0009 (0.0022) model time 0.4425 (0.4450) loss 4.9793 (4.5477) grad_norm 1.5617 (1.6054) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:38:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][320/625] eta 0:02:16 lr 0.000751 wd 0.0500 time 0.4444 (0.4468) data time 0.0007 (0.0022) model time 0.4437 (0.4451) loss 4.6871 (4.5524) grad_norm 2.4153 (1.6069) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][330/625] eta 0:02:11 lr 0.000752 wd 
0.0500 time 0.4373 (0.4467) data time 0.0007 (0.0021) model time 0.4366 (0.4450) loss 3.9315 (4.5455) grad_norm 1.5559 (1.6039) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][340/625] eta 0:02:07 lr 0.000753 wd 0.0500 time 0.4396 (0.4467) data time 0.0006 (0.0021) model time 0.4390 (0.4450) loss 5.1172 (4.5456) grad_norm 1.9816 (1.6038) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][350/625] eta 0:02:02 lr 0.000754 wd 0.0500 time 0.4405 (0.4473) data time 0.0010 (0.0020) model time 0.4395 (0.4457) loss 3.6098 (4.5520) grad_norm 1.5316 (1.6041) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][360/625] eta 0:01:58 lr 0.000755 wd 0.0500 time 0.4441 (0.4476) data time 0.0006 (0.0020) model time 0.4435 (0.4462) loss 3.9846 (4.5481) grad_norm 1.6944 (1.6013) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][370/625] eta 0:01:54 lr 0.000756 wd 0.0500 time 0.4410 (0.4475) data time 0.0008 (0.0020) model time 0.4402 (0.4461) loss 4.6271 (4.5539) grad_norm 1.5316 (1.5972) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][380/625] eta 0:01:49 lr 0.000757 wd 0.0500 time 0.4459 (0.4474) data time 0.0007 (0.0020) model time 0.4453 (0.4460) loss 4.4622 (4.5535) grad_norm 1.3360 (1.6038) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][390/625] eta 0:01:45 lr 0.000758 wd 0.0500 time 0.4430 (0.4474) data time 0.0009 (0.0019) model time 0.4421 (0.4459) loss 4.9557 (4.5515) grad_norm 1.2835 (1.6041) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:31 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][400/625] eta 0:01:40 lr 0.000759 wd 0.0500 time 0.4404 (0.4476) data time 0.0008 (0.0019) model time 0.4396 (0.4462) loss 4.7748 (4.5535) grad_norm 1.4203 (1.6012) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][410/625] eta 0:01:36 lr 0.000760 wd 0.0500 time 0.4437 (0.4475) data time 0.0009 (0.0019) model time 0.4428 (0.4461) loss 4.5860 (4.5592) grad_norm 1.5728 (1.5980) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][420/625] eta 0:01:31 lr 0.000761 wd 0.0500 time 0.4440 (0.4475) data time 0.0007 (0.0018) model time 0.4432 (0.4461) loss 4.1272 (4.5518) grad_norm 1.6445 (1.6005) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][430/625] eta 0:01:27 lr 0.000762 wd 0.0500 time 0.4447 (0.4474) data time 0.0011 (0.0018) model time 0.4437 (0.4461) loss 4.5956 (4.5517) grad_norm 1.5088 (1.5987) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][440/625] eta 0:01:22 lr 0.000763 wd 0.0500 time 0.4420 (0.4474) data time 0.0006 (0.0018) model time 0.4414 (0.4461) loss 5.4661 (4.5537) grad_norm 2.0664 (1.6019) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][450/625] eta 0:01:18 lr 0.000764 wd 0.0500 time 0.4439 (0.4473) data time 0.0008 (0.0018) model time 0.4431 (0.4460) loss 4.1713 (4.5562) grad_norm 2.1233 (1.6043) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][460/625] eta 0:01:13 lr 0.000765 wd 0.0500 time 0.4422 (0.4473) data time 0.0007 (0.0017) model time 0.4415 (0.4459) loss 4.5239 (4.5492) 
grad_norm 1.4841 (1.6026) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][470/625] eta 0:01:09 lr 0.000766 wd 0.0500 time 0.4432 (0.4472) data time 0.0006 (0.0017) model time 0.4426 (0.4459) loss 3.5944 (4.5404) grad_norm 1.1625 (1.6032) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][480/625] eta 0:01:04 lr 0.000767 wd 0.0500 time 0.4471 (0.4472) data time 0.0008 (0.0017) model time 0.4463 (0.4458) loss 4.8715 (4.5409) grad_norm 1.6105 (1.6021) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][490/625] eta 0:01:00 lr 0.000768 wd 0.0500 time 0.4439 (0.4471) data time 0.0007 (0.0017) model time 0.4432 (0.4458) loss 3.3751 (4.5394) grad_norm 1.2238 (1.6001) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][500/625] eta 0:00:55 lr 0.000769 wd 0.0500 time 0.4444 (0.4471) data time 0.0005 (0.0017) model time 0.4439 (0.4457) loss 4.6239 (4.5365) grad_norm 1.8363 (1.5975) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][510/625] eta 0:00:51 lr 0.000770 wd 0.0500 time 0.4447 (0.4470) data time 0.0009 (0.0017) model time 0.4438 (0.4457) loss 4.3511 (4.5334) grad_norm 1.7520 (1.5957) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][520/625] eta 0:00:46 lr 0.000771 wd 0.0500 time 0.4436 (0.4469) data time 0.0006 (0.0016) model time 0.4430 (0.4456) loss 3.9326 (4.5369) grad_norm 1.8329 (1.5945) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][530/625] eta 0:00:42 lr 0.000772 wd 0.0500 
time 0.4447 (0.4469) data time 0.0008 (0.0016) model time 0.4439 (0.4456) loss 3.8358 (4.5318) grad_norm 1.3383 (1.5928) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][540/625] eta 0:00:37 lr 0.000773 wd 0.0500 time 0.4430 (0.4468) data time 0.0008 (0.0016) model time 0.4422 (0.4455) loss 4.2121 (4.5309) grad_norm 1.2920 (1.5926) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][550/625] eta 0:00:33 lr 0.000774 wd 0.0500 time 0.4437 (0.4468) data time 0.0007 (0.0016) model time 0.4429 (0.4455) loss 4.3041 (4.5252) grad_norm 1.4720 (1.5906) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][560/625] eta 0:00:29 lr 0.000774 wd 0.0500 time 0.4456 (0.4467) data time 0.0006 (0.0016) model time 0.4450 (0.4454) loss 4.4369 (4.5224) grad_norm 1.5336 (1.5875) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][570/625] eta 0:00:24 lr 0.000775 wd 0.0500 time 0.4438 (0.4467) data time 0.0006 (0.0016) model time 0.4432 (0.4454) loss 5.3270 (4.5267) grad_norm 1.1163 (1.5853) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][580/625] eta 0:00:20 lr 0.000776 wd 0.0500 time 0.4356 (0.4466) data time 0.0009 (0.0016) model time 0.4348 (0.4453) loss 4.4156 (4.5261) grad_norm 1.8510 (1.5832) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][590/625] eta 0:00:15 lr 0.000777 wd 0.0500 time 0.4444 (0.4466) data time 0.0007 (0.0015) model time 0.4438 (0.4454) loss 5.3833 (4.5293) grad_norm 1.9670 (1.5868) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:00 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][600/625] eta 0:00:11 lr 0.000778 wd 0.0500 time 0.4435 (0.4466) data time 0.0008 (0.0015) model time 0.4427 (0.4453) loss 4.6109 (4.5299) grad_norm 2.4157 (1.5857) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][610/625] eta 0:00:06 lr 0.000779 wd 0.0500 time 0.4375 (0.4465) data time 0.0004 (0.0015) model time 0.4371 (0.4453) loss 3.7803 (4.5288) grad_norm 1.4029 (1.5873) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [12/300][620/625] eta 0:00:02 lr 0.000780 wd 0.0500 time 0.4390 (0.4464) data time 0.0006 (0.0015) model time 0.4385 (0.4452) loss 4.7274 (4.5289) grad_norm 1.3209 (1.5856) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 12 training takes 0:04:38 [2024-08-04 10:41:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:41:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:41:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 1.1475 (1.1475) Acc@1 73.096 (73.096) Acc@5 91.162 (91.162) Mem 16696MB
[2024-08-04 10:41:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 2.0254 (1.3822) Acc@1 54.688 (65.887) Acc@5 79.639 (89.080) Mem 16696MB
[2024-08-04 10:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 2.2168 (1.7227) Acc@1 50.244 (59.617) Acc@5 75.098 (83.598) Mem 16696MB
[2024-08-04 10:41:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 59.601 Acc@5 83.581
[2024-08-04 10:41:16 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 59.6%
[2024-08-04 10:41:16 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 59.60%
[2024-08-04 10:41:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:41:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:41:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 5.7422 (5.7422) Acc@1 4.395 (4.395) Acc@5 15.820 (15.820) Mem 16696MB
[2024-08-04 10:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.150) Loss 6.0742 (5.8413) Acc@1 3.223 (5.114) Acc@5 10.400 (14.222) Mem 16696MB
[2024-08-04 10:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 5.8438 (5.8218) Acc@1 3.076 (4.911) Acc@5 10.645 (14.174) Mem 16696MB
[2024-08-04 10:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 5.508 Acc@5 15.323
[2024-08-04 10:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 5.5%
[2024-08-04 10:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 5.51%
[2024-08-04 10:41:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:41:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:41:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][0/625] eta 0:08:19 lr 0.000781 wd 0.0500 time 0.7991 (0.7991) data time 0.4156 (0.4156) model time 0.0000 (0.0000) loss 4.9522 (4.9522) grad_norm 2.0476 (2.0476) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][10/625] eta 0:04:52 lr 0.000782 wd 0.0500 time 0.4427 (0.4762) data time 0.0007 (0.0385) model time 0.0000 (0.0000) loss 3.7729 (4.5217) grad_norm 1.8301 (1.6060) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][20/625] eta 0:04:38 lr 0.000783 wd 0.0500 time 0.4433 (0.4611) data time 0.0008 (0.0205) model time 0.0000 (0.0000) loss 5.1215 (4.5651) grad_norm 2.1845 (1.7618) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][30/625] eta 0:04:35 lr 0.000784 wd 0.0500 time 0.4431 (0.4624) data time 0.0006 (0.0142) model time 0.0000 (0.0000) loss 5.0777 (4.4931) grad_norm 1.2821 (1.6905) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][40/625] eta 0:04:27 lr 0.000785 wd 0.0500 time 0.4434 (0.4580) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 4.4148 (4.4579) grad_norm 1.5409 (1.6892) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][50/625] eta 0:04:21 lr 0.000785 wd 0.0500 time 0.4387 (0.4549) data time 0.0006 (0.0089) model time 0.0000 (0.0000) loss 3.9861 (4.4833) grad_norm 1.5850 (1.7065) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][60/625] eta 0:04:17 lr 0.000786 wd 0.0500 time 0.6578 (0.4563) data time 0.0008 (0.0076) model time 0.6571 (0.4629) loss 
4.9188 (4.5085) grad_norm 1.3142 (1.6773) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][70/625] eta 0:04:11 lr 0.000787 wd 0.0500 time 0.4397 (0.4534) data time 0.0006 (0.0066) model time 0.4391 (0.4487) loss 4.8324 (4.4835) grad_norm 2.2197 (1.6504) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:41:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][80/625] eta 0:04:06 lr 0.000788 wd 0.0500 time 0.4402 (0.4521) data time 0.0006 (0.0059) model time 0.4396 (0.4465) loss 3.8517 (4.4924) grad_norm 2.0455 (1.6875) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][90/625] eta 0:04:01 lr 0.000789 wd 0.0500 time 0.4403 (0.4510) data time 0.0006 (0.0053) model time 0.4398 (0.4453) loss 4.7746 (4.4787) grad_norm 1.4167 (1.6793) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][100/625] eta 0:03:56 lr 0.000790 wd 0.0500 time 0.4425 (0.4502) data time 0.0007 (0.0049) model time 0.4418 (0.4446) loss 3.2986 (4.4656) grad_norm 1.4344 (1.6532) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][110/625] eta 0:03:51 lr 0.000791 wd 0.0500 time 0.4397 (0.4494) data time 0.0008 (0.0045) model time 0.4389 (0.4439) loss 4.2090 (4.4469) grad_norm 1.3096 (1.6450) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][120/625] eta 0:03:46 lr 0.000792 wd 0.0500 time 0.4404 (0.4488) data time 0.0008 (0.0042) model time 0.4396 (0.4437) loss 4.6794 (4.4430) grad_norm 1.4242 (1.6170) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][130/625] eta 0:03:41 lr 
0.000793 wd 0.0500 time 0.4408 (0.4483) data time 0.0006 (0.0040) model time 0.4402 (0.4434) loss 5.1411 (4.4463) grad_norm 1.7907 (1.6137) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][140/625] eta 0:03:37 lr 0.000794 wd 0.0500 time 0.4431 (0.4479) data time 0.0009 (0.0037) model time 0.4422 (0.4432) loss 4.7633 (4.4770) grad_norm 1.8694 (1.6053) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][150/625] eta 0:03:32 lr 0.000795 wd 0.0500 time 0.4420 (0.4476) data time 0.0009 (0.0035) model time 0.4411 (0.4431) loss 3.8773 (4.4770) grad_norm 1.3310 (1.5994) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][160/625] eta 0:03:28 lr 0.000796 wd 0.0500 time 0.4432 (0.4474) data time 0.0006 (0.0034) model time 0.4425 (0.4431) loss 4.4561 (4.4866) grad_norm 2.0543 (1.6015) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][170/625] eta 0:03:23 lr 0.000797 wd 0.0500 time 0.4470 (0.4471) data time 0.0007 (0.0032) model time 0.4464 (0.4431) loss 4.2623 (4.4874) grad_norm 1.5531 (1.5979) loss_scale 32768.0000 (32768.0000) mem 16696MB [2024-08-04 10:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][180/625] eta 0:03:18 lr 0.000798 wd 0.0500 time 0.4407 (0.4469) data time 0.0008 (0.0031) model time 0.4399 (0.4430) loss 3.9725 (4.4856) grad_norm 1.5721 (inf) loss_scale 16384.0000 (31862.8066) mem 16696MB [2024-08-04 10:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][190/625] eta 0:03:14 lr 0.000799 wd 0.0500 time 0.4477 (0.4467) data time 0.0010 (0.0030) model time 0.4466 (0.4429) loss 3.6484 (4.4772) grad_norm 1.4368 (inf) loss_scale 16384.0000 (31052.3979) mem 16696MB [2024-08-04 10:42:52 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][200/625] eta 0:03:09 lr 0.000800 wd 0.0500 time 0.4456 (0.4465) data time 0.0007 (0.0029) model time 0.4449 (0.4429) loss 4.9662 (4.4789) grad_norm 1.4447 (inf) loss_scale 16384.0000 (30322.6269) mem 16696MB [2024-08-04 10:42:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][210/625] eta 0:03:05 lr 0.000801 wd 0.0500 time 0.4431 (0.4464) data time 0.0007 (0.0028) model time 0.4424 (0.4430) loss 4.8121 (4.4772) grad_norm 1.8071 (inf) loss_scale 16384.0000 (29662.0284) mem 16696MB [2024-08-04 10:43:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][220/625] eta 0:03:00 lr 0.000802 wd 0.0500 time 0.4429 (0.4463) data time 0.0006 (0.0027) model time 0.4423 (0.4430) loss 3.8319 (4.4654) grad_norm 2.7331 (inf) loss_scale 16384.0000 (29061.2127) mem 16696MB [2024-08-04 10:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][230/625] eta 0:02:56 lr 0.000803 wd 0.0500 time 0.4458 (0.4462) data time 0.0008 (0.0026) model time 0.4449 (0.4430) loss 4.1711 (4.4694) grad_norm 1.1594 (inf) loss_scale 16384.0000 (28512.4156) mem 16696MB [2024-08-04 10:43:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][240/625] eta 0:02:51 lr 0.000804 wd 0.0500 time 0.4474 (0.4461) data time 0.0008 (0.0025) model time 0.4466 (0.4430) loss 4.9005 (4.4841) grad_norm 1.1210 (inf) loss_scale 16384.0000 (28009.1618) mem 16696MB [2024-08-04 10:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][250/625] eta 0:02:47 lr 0.000805 wd 0.0500 time 0.4432 (0.4461) data time 0.0007 (0.0024) model time 0.4425 (0.4430) loss 4.1640 (4.4868) grad_norm 1.4834 (inf) loss_scale 16384.0000 (27546.0080) mem 16696MB [2024-08-04 10:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][260/625] eta 0:02:42 lr 0.000806 wd 0.0500 time 0.4477 (0.4460) data time 0.0006 (0.0024) model time 0.4471 (0.4431) loss 4.8653 (4.4893) grad_norm 1.4713 
(inf) loss_scale 16384.0000 (27118.3448) mem 16696MB [2024-08-04 10:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][270/625] eta 0:02:38 lr 0.000807 wd 0.0500 time 0.4443 (0.4460) data time 0.0007 (0.0023) model time 0.4436 (0.4432) loss 5.1093 (4.4959) grad_norm 1.3261 (inf) loss_scale 16384.0000 (26722.2435) mem 16696MB [2024-08-04 10:43:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][280/625] eta 0:02:33 lr 0.000808 wd 0.0500 time 0.4414 (0.4460) data time 0.0006 (0.0023) model time 0.4407 (0.4432) loss 4.3778 (4.4942) grad_norm 1.2554 (inf) loss_scale 16384.0000 (26354.3345) mem 16696MB [2024-08-04 10:43:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][290/625] eta 0:02:29 lr 0.000808 wd 0.0500 time 0.4450 (0.4459) data time 0.0006 (0.0022) model time 0.4444 (0.4433) loss 3.1289 (4.4826) grad_norm 1.3743 (inf) loss_scale 16384.0000 (26011.7113) mem 16696MB [2024-08-04 10:43:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][300/625] eta 0:02:24 lr 0.000809 wd 0.0500 time 0.4412 (0.4459) data time 0.0008 (0.0022) model time 0.4404 (0.4432) loss 3.5128 (4.4843) grad_norm 1.6096 (inf) loss_scale 16384.0000 (25691.8538) mem 16696MB [2024-08-04 10:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][310/625] eta 0:02:20 lr 0.000810 wd 0.0500 time 0.4466 (0.4458) data time 0.0008 (0.0021) model time 0.4457 (0.4433) loss 4.9266 (4.4827) grad_norm 1.6505 (inf) loss_scale 16384.0000 (25392.5659) mem 16696MB [2024-08-04 10:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][320/625] eta 0:02:15 lr 0.000811 wd 0.0500 time 0.4404 (0.4458) data time 0.0008 (0.0021) model time 0.4395 (0.4432) loss 4.8905 (4.4931) grad_norm 1.4822 (inf) loss_scale 16384.0000 (25111.9252) mem 16696MB [2024-08-04 10:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][330/625] eta 0:02:11 lr 0.000812 wd 0.0500 time 0.4427 (0.4457) data time 0.0008 
(0.0020) model time 0.4420 (0.4432) loss 4.8403 (4.4948) grad_norm 1.7782 (inf) loss_scale 16384.0000 (24848.2417) mem 16696MB [2024-08-04 10:43:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][340/625] eta 0:02:07 lr 0.000813 wd 0.0500 time 0.4426 (0.4456) data time 0.0008 (0.0020) model time 0.4418 (0.4432) loss 4.8131 (4.4978) grad_norm 1.2233 (inf) loss_scale 16384.0000 (24600.0235) mem 16696MB [2024-08-04 10:43:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][350/625] eta 0:02:02 lr 0.000814 wd 0.0500 time 0.4471 (0.4456) data time 0.0009 (0.0020) model time 0.4463 (0.4432) loss 4.9380 (4.4910) grad_norm 1.3086 (inf) loss_scale 16384.0000 (24365.9487) mem 16696MB [2024-08-04 10:44:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][360/625] eta 0:01:58 lr 0.000815 wd 0.0500 time 0.4408 (0.4456) data time 0.0009 (0.0019) model time 0.4399 (0.4433) loss 3.5666 (4.4820) grad_norm 1.5009 (inf) loss_scale 16384.0000 (24144.8421) mem 16696MB [2024-08-04 10:44:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][370/625] eta 0:01:53 lr 0.000816 wd 0.0500 time 0.4499 (0.4462) data time 0.0008 (0.0019) model time 0.4491 (0.4440) loss 4.6916 (4.4905) grad_norm 1.7059 (inf) loss_scale 16384.0000 (23935.6550) mem 16696MB [2024-08-04 10:44:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][380/625] eta 0:01:49 lr 0.000817 wd 0.0500 time 0.4429 (0.4461) data time 0.0006 (0.0019) model time 0.4423 (0.4440) loss 4.5975 (4.4939) grad_norm 1.8392 (inf) loss_scale 16384.0000 (23737.4488) mem 16696MB [2024-08-04 10:44:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][390/625] eta 0:01:44 lr 0.000818 wd 0.0500 time 0.4420 (0.4461) data time 0.0006 (0.0018) model time 0.4413 (0.4440) loss 4.3550 (4.4957) grad_norm 1.0738 (inf) loss_scale 16384.0000 (23549.3811) mem 16696MB [2024-08-04 10:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][400/625] 
eta 0:01:40 lr 0.000819 wd 0.0500 time 0.4411 (0.4465) data time 0.0007 (0.0018) model time 0.4403 (0.4445) loss 4.8717 (4.4982) grad_norm 2.3699 (inf) loss_scale 16384.0000 (23370.6933) mem 16696MB [2024-08-04 10:44:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][410/625] eta 0:01:35 lr 0.000820 wd 0.0500 time 0.4471 (0.4464) data time 0.0010 (0.0018) model time 0.4461 (0.4445) loss 4.7256 (4.4977) grad_norm 1.0849 (inf) loss_scale 16384.0000 (23200.7007) mem 16696MB [2024-08-04 10:44:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][420/625] eta 0:01:31 lr 0.000821 wd 0.0500 time 0.4444 (0.4464) data time 0.0006 (0.0018) model time 0.4438 (0.4445) loss 5.2442 (4.5028) grad_norm 1.7418 (inf) loss_scale 16384.0000 (23038.7838) mem 16696MB [2024-08-04 10:44:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][430/625] eta 0:01:27 lr 0.000822 wd 0.0500 time 0.4445 (0.4464) data time 0.0007 (0.0017) model time 0.4438 (0.4445) loss 5.3278 (4.5050) grad_norm 1.1631 (inf) loss_scale 16384.0000 (22884.3805) mem 16696MB [2024-08-04 10:44:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][440/625] eta 0:01:22 lr 0.000823 wd 0.0500 time 0.4472 (0.4464) data time 0.0008 (0.0017) model time 0.4463 (0.4445) loss 4.5852 (4.5050) grad_norm 1.8284 (inf) loss_scale 16384.0000 (22736.9796) mem 16696MB [2024-08-04 10:44:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][450/625] eta 0:01:18 lr 0.000824 wd 0.0500 time 0.4459 (0.4463) data time 0.0008 (0.0017) model time 0.4452 (0.4444) loss 3.3163 (4.5007) grad_norm 1.5358 (inf) loss_scale 16384.0000 (22596.1153) mem 16696MB [2024-08-04 10:44:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][460/625] eta 0:01:13 lr 0.000825 wd 0.0500 time 0.4398 (0.4462) data time 0.0008 (0.0017) model time 0.4389 (0.4444) loss 4.1346 (4.4937) grad_norm 1.9273 (inf) loss_scale 16384.0000 (22461.3623) mem 16696MB [2024-08-04 10:44:52 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][470/625] eta 0:01:09 lr 0.000826 wd 0.0500 time 0.4416 (0.4462) data time 0.0009 (0.0017) model time 0.4407 (0.4443) loss 4.5082 (4.4979) grad_norm 1.5313 (inf) loss_scale 16384.0000 (22332.3312) mem 16696MB [2024-08-04 10:44:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][480/625] eta 0:01:04 lr 0.000827 wd 0.0500 time 0.4466 (0.4461) data time 0.0006 (0.0017) model time 0.4460 (0.4443) loss 3.7725 (4.5006) grad_norm 1.1174 (inf) loss_scale 16384.0000 (22208.6653) mem 16696MB [2024-08-04 10:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][490/625] eta 0:01:00 lr 0.000828 wd 0.0500 time 0.4407 (0.4461) data time 0.0009 (0.0016) model time 0.4397 (0.4442) loss 4.6477 (4.5006) grad_norm 1.4481 (inf) loss_scale 16384.0000 (22090.0367) mem 16696MB [2024-08-04 10:45:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][500/625] eta 0:00:55 lr 0.000829 wd 0.0500 time 0.4502 (0.4460) data time 0.0008 (0.0016) model time 0.4494 (0.4442) loss 4.7098 (4.5044) grad_norm 1.4252 (inf) loss_scale 16384.0000 (21976.1437) mem 16696MB [2024-08-04 10:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][510/625] eta 0:00:51 lr 0.000830 wd 0.0500 time 0.4459 (0.4460) data time 0.0007 (0.0016) model time 0.4452 (0.4442) loss 4.4409 (4.5008) grad_norm 1.3765 (inf) loss_scale 16384.0000 (21866.7084) mem 16696MB [2024-08-04 10:45:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][520/625] eta 0:00:46 lr 0.000831 wd 0.0500 time 0.4406 (0.4459) data time 0.0007 (0.0016) model time 0.4398 (0.4442) loss 5.1022 (4.5053) grad_norm 1.1553 (inf) loss_scale 16384.0000 (21761.4741) mem 16696MB [2024-08-04 10:45:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][530/625] eta 0:00:42 lr 0.000831 wd 0.0500 time 0.4452 (0.4459) data time 0.0008 (0.0016) model time 0.4444 (0.4442) loss 2.8331 (4.4987) grad_norm 1.4516 
(inf) loss_scale 16384.0000 (21660.2034) mem 16696MB [2024-08-04 10:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][540/625] eta 0:00:37 lr 0.000832 wd 0.0500 time 0.4441 (0.4459) data time 0.0007 (0.0016) model time 0.4435 (0.4442) loss 5.1047 (4.4960) grad_norm 1.5857 (inf) loss_scale 16384.0000 (21562.6765) mem 16696MB [2024-08-04 10:45:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][550/625] eta 0:00:33 lr 0.000833 wd 0.0500 time 0.4421 (0.4459) data time 0.0009 (0.0015) model time 0.4413 (0.4442) loss 5.1905 (4.4946) grad_norm 1.6344 (inf) loss_scale 16384.0000 (21468.6897) mem 16696MB [2024-08-04 10:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][560/625] eta 0:00:28 lr 0.000834 wd 0.0500 time 0.4468 (0.4461) data time 0.0006 (0.0015) model time 0.4462 (0.4445) loss 4.7302 (4.4861) grad_norm 1.2413 (inf) loss_scale 16384.0000 (21378.0535) mem 16696MB [2024-08-04 10:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][570/625] eta 0:00:24 lr 0.000835 wd 0.0500 time 0.4483 (0.4461) data time 0.0008 (0.0015) model time 0.4475 (0.4445) loss 4.5070 (4.4853) grad_norm 1.2780 (inf) loss_scale 16384.0000 (21290.5919) mem 16696MB [2024-08-04 10:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][580/625] eta 0:00:20 lr 0.000836 wd 0.0500 time 0.4448 (0.4461) data time 0.0009 (0.0015) model time 0.4439 (0.4445) loss 3.7146 (4.4848) grad_norm 1.4447 (inf) loss_scale 16384.0000 (21206.1411) mem 16696MB [2024-08-04 10:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][590/625] eta 0:00:15 lr 0.000837 wd 0.0500 time 0.4520 (0.4462) data time 0.0009 (0.0015) model time 0.4512 (0.4446) loss 5.1172 (4.4832) grad_norm 1.5030 (inf) loss_scale 16384.0000 (21124.5482) mem 16696MB [2024-08-04 10:45:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][600/625] eta 0:00:11 lr 0.000838 wd 0.0500 time 0.4427 (0.4462) data time 0.0007 
(0.0015) model time 0.4419 (0.4446) loss 4.4394 (4.4839) grad_norm 1.4680 (inf) loss_scale 16384.0000 (21045.6705) mem 16696MB [2024-08-04 10:45:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][610/625] eta 0:00:06 lr 0.000839 wd 0.0500 time 0.4392 (0.4461) data time 0.0004 (0.0015) model time 0.4388 (0.4445) loss 4.6794 (4.4826) grad_norm 3.0598 (inf) loss_scale 16384.0000 (20969.3748) mem 16696MB [2024-08-04 10:45:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [13/300][620/625] eta 0:00:02 lr 0.000840 wd 0.0500 time 0.4394 (0.4460) data time 0.0006 (0.0015) model time 0.4387 (0.4445) loss 4.1431 (4.4849) grad_norm 1.1074 (inf) loss_scale 16384.0000 (20895.5362) mem 16696MB [2024-08-04 10:46:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 13 training takes 0:04:38 [2024-08-04 10:46:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:46:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:46:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 1.1348 (1.1348) Acc@1 73.145 (73.145) Acc@5 91.357 (91.357) Mem 16696MB [2024-08-04 10:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 1.9287 (1.3389) Acc@1 55.322 (67.116) Acc@5 81.299 (89.795) Mem 16696MB [2024-08-04 10:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 2.0645 (1.6379) Acc@1 53.662 (61.442) Acc@5 77.783 (84.975) Mem 16696MB [2024-08-04 10:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 61.470 Acc@5 84.967 [2024-08-04 10:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 61.5% [2024-08-04 10:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 61.47% [2024-08-04 10:46:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 10:46:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 10:46:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 5.1836 (5.1836) Acc@1 9.814 (9.814) Acc@5 26.123 (26.123) Mem 16696MB [2024-08-04 10:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 5.5586 (5.2752) Acc@1 7.422 (9.628) Acc@5 18.896 (23.877) Mem 16696MB [2024-08-04 10:46:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 5.3789 (5.2878) Acc@1 5.957 (9.249) Acc@5 18.750 (23.177) Mem 16696MB [2024-08-04 10:46:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 10.009 Acc@5 24.530 [2024-08-04 10:46:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 10.0% [2024-08-04 10:46:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 10.01% [2024-08-04 10:46:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 10:46:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 10:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][0/625] eta 0:08:31 lr 0.000841 wd 0.0500 time 0.8179 (0.8179) data time 0.4355 (0.4355) model time 0.0000 (0.0000) loss 4.3374 (4.3374) grad_norm 1.8450 (1.8450) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][10/625] eta 0:04:54 lr 0.000842 wd 0.0500 time 0.4433 (0.4783) data time 0.0008 (0.0403) model time 0.0000 (0.0000) loss 4.3948 (4.5928) grad_norm 1.3170 (1.4192) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][20/625] eta 0:04:39 lr 0.000843 wd 0.0500 time 0.4408 (0.4617) data time 0.0007 (0.0215) model time 0.0000 (0.0000) loss 4.6572 (4.5212) grad_norm 1.5686 (1.5166) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][30/625] eta 0:04:31 lr 0.000843 wd 0.0500 time 0.4461 (0.4559) data time 0.0008 (0.0148) model time 0.0000 (0.0000) loss 4.9831 (4.5520) grad_norm 1.2955 (1.5101) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][40/625] eta 0:04:25 lr 0.000844 wd 0.0500 time 0.4489 (0.4531) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 4.2714 (4.5295) grad_norm 2.4107 (1.5607) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][50/625] eta 0:04:19 lr 0.000845 wd 0.0500 time 0.4400 (0.4513) data time 0.0007 (0.0093) model time 0.0000 (0.0000) loss 4.4005 (4.4824) grad_norm 2.1576 (1.5695) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][60/625] eta 0:04:16 lr 0.000846 wd 0.0500 time 0.6717 (0.4536) data time 0.0008 (0.0079) model time 0.6709 (0.4647) loss 
4.2652 (4.4614) grad_norm 1.4043 (1.5793) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][70/625] eta 0:04:10 lr 0.000847 wd 0.0500 time 0.4466 (0.4514) data time 0.0008 (0.0069) model time 0.4458 (0.4507) loss 3.9785 (4.4674) grad_norm 1.3318 (1.6018) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][80/625] eta 0:04:05 lr 0.000848 wd 0.0500 time 0.4435 (0.4505) data time 0.0006 (0.0062) model time 0.4429 (0.4483) loss 4.4032 (4.4764) grad_norm 1.4315 (1.5795) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][90/625] eta 0:04:00 lr 0.000849 wd 0.0500 time 0.4430 (0.4497) data time 0.0008 (0.0056) model time 0.4422 (0.4467) loss 4.2907 (4.4652) grad_norm 1.3175 (1.5829) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:46:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][100/625] eta 0:03:55 lr 0.000850 wd 0.0500 time 0.4471 (0.4492) data time 0.0008 (0.0051) model time 0.4464 (0.4463) loss 4.7809 (4.4698) grad_norm 2.4212 (1.5903) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][110/625] eta 0:03:51 lr 0.000851 wd 0.0500 time 0.4372 (0.4489) data time 0.0008 (0.0047) model time 0.4365 (0.4460) loss 4.6862 (4.4854) grad_norm 2.0527 (1.5999) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][120/625] eta 0:03:46 lr 0.000852 wd 0.0500 time 0.4486 (0.4486) data time 0.0008 (0.0044) model time 0.4478 (0.4458) loss 4.3916 (4.4747) grad_norm 1.7428 (1.6012) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][130/625] eta 0:03:42 lr 
0.000853 wd 0.0500 time 0.4446 (0.4499) data time 0.0007 (0.0041) model time 0.4439 (0.4482) loss 3.5667 (4.4595) grad_norm 1.5656 (1.5986) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][140/625] eta 0:03:38 lr 0.000854 wd 0.0500 time 0.4431 (0.4495) data time 0.0008 (0.0039) model time 0.4423 (0.4477) loss 3.2331 (4.4471) grad_norm 1.3116 (1.5924) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][150/625] eta 0:03:33 lr 0.000855 wd 0.0500 time 0.4434 (0.4491) data time 0.0009 (0.0037) model time 0.4425 (0.4471) loss 4.4800 (4.4454) grad_norm 1.5014 (1.5901) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][160/625] eta 0:03:28 lr 0.000856 wd 0.0500 time 0.4447 (0.4488) data time 0.0008 (0.0035) model time 0.4439 (0.4468) loss 4.4650 (4.4520) grad_norm 1.1786 (1.5854) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][170/625] eta 0:03:24 lr 0.000857 wd 0.0500 time 0.4470 (0.4486) data time 0.0007 (0.0034) model time 0.4463 (0.4466) loss 4.0982 (4.4467) grad_norm 1.7511 (1.5964) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][180/625] eta 0:03:19 lr 0.000858 wd 0.0500 time 0.4467 (0.4483) data time 0.0006 (0.0032) model time 0.4461 (0.4463) loss 4.9183 (4.4607) grad_norm 1.1679 (1.5998) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][190/625] eta 0:03:14 lr 0.000859 wd 0.0500 time 0.4437 (0.4481) data time 0.0008 (0.0031) model time 0.4429 (0.4461) loss 4.0688 (4.4595) grad_norm 2.0380 (1.5975) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
10:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][200/625] eta 0:03:10 lr 0.000860 wd 0.0500 time 0.4457 (0.4479) data time 0.0006 (0.0030) model time 0.4451 (0.4460) loss 5.0537 (4.4482) grad_norm 1.6734 (1.5897) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][210/625] eta 0:03:05 lr 0.000861 wd 0.0500 time 0.4442 (0.4478) data time 0.0007 (0.0029) model time 0.4436 (0.4458) loss 4.9794 (4.4522) grad_norm 1.5120 (1.5835) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][220/625] eta 0:03:01 lr 0.000862 wd 0.0500 time 0.4443 (0.4476) data time 0.0008 (0.0028) model time 0.4435 (0.4457) loss 4.7894 (4.4741) grad_norm 1.4495 (1.5753) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:47:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][230/625] eta 0:02:56 lr 0.000863 wd 0.0500 time 0.4457 (0.4475) data time 0.0006 (0.0027) model time 0.4451 (0.4456) loss 5.0332 (4.4535) grad_norm 1.3062 (1.5681) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][240/625] eta 0:02:52 lr 0.000864 wd 0.0500 time 0.4473 (0.4475) data time 0.0008 (0.0026) model time 0.4465 (0.4456) loss 4.9038 (4.4701) grad_norm 1.3557 (1.5700) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][250/625] eta 0:02:47 lr 0.000865 wd 0.0500 time 0.4481 (0.4474) data time 0.0006 (0.0025) model time 0.4474 (0.4456) loss 3.1944 (4.4427) grad_norm 1.5005 (1.5726) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][260/625] eta 0:02:43 lr 0.000866 wd 0.0500 time 0.4482 (0.4474) data time 0.0006 (0.0025) model time 0.4476 (0.4456) loss 4.1370 
(4.4267) grad_norm 1.9356 (1.5688) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][270/625] eta 0:02:38 lr 0.000866 wd 0.0500 time 0.4448 (0.4472) data time 0.0006 (0.0024) model time 0.4442 (0.4455) loss 5.0991 (4.4357) grad_norm 1.1612 (1.5663) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][280/625] eta 0:02:34 lr 0.000867 wd 0.0500 time 0.4462 (0.4472) data time 0.0008 (0.0024) model time 0.4454 (0.4454) loss 4.6221 (4.4312) grad_norm 1.3262 (1.5644) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][290/625] eta 0:02:29 lr 0.000868 wd 0.0500 time 0.4437 (0.4471) data time 0.0006 (0.0023) model time 0.4431 (0.4454) loss 4.8649 (4.4259) grad_norm 1.1692 (1.5572) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][300/625] eta 0:02:25 lr 0.000869 wd 0.0500 time 0.4437 (0.4470) data time 0.0008 (0.0022) model time 0.4430 (0.4453) loss 3.2019 (4.4094) grad_norm 1.2779 (1.5534) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][310/625] eta 0:02:20 lr 0.000870 wd 0.0500 time 0.4432 (0.4469) data time 0.0006 (0.0022) model time 0.4426 (0.4453) loss 4.8012 (4.4080) grad_norm 1.2948 (1.5549) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][320/625] eta 0:02:16 lr 0.000871 wd 0.0500 time 0.4449 (0.4469) data time 0.0006 (0.0022) model time 0.4442 (0.4453) loss 3.0790 (4.4028) grad_norm 1.5245 (1.5517) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][330/625] eta 0:02:11 lr 0.000872 wd 
0.0500 time 0.4460 (0.4469) data time 0.0007 (0.0021) model time 0.4453 (0.4453) loss 4.8535 (4.4091) grad_norm 1.4044 (1.5538) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][340/625] eta 0:02:07 lr 0.000873 wd 0.0500 time 0.4474 (0.4469) data time 0.0007 (0.0021) model time 0.4466 (0.4453) loss 5.1907 (4.4089) grad_norm 1.4138 (1.5597) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][350/625] eta 0:02:03 lr 0.000874 wd 0.0500 time 0.4408 (0.4473) data time 0.0007 (0.0020) model time 0.4402 (0.4459) loss 4.4244 (4.4038) grad_norm 1.2934 (1.5611) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][360/625] eta 0:01:58 lr 0.000875 wd 0.0500 time 0.4467 (0.4473) data time 0.0008 (0.0020) model time 0.4459 (0.4458) loss 4.7029 (4.4119) grad_norm 1.3448 (1.5598) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][370/625] eta 0:01:54 lr 0.000876 wd 0.0500 time 0.4438 (0.4472) data time 0.0006 (0.0020) model time 0.4432 (0.4458) loss 3.8907 (4.4030) grad_norm 1.6186 (1.5623) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][380/625] eta 0:01:49 lr 0.000877 wd 0.0500 time 0.4457 (0.4472) data time 0.0006 (0.0019) model time 0.4451 (0.4457) loss 4.6201 (4.4007) grad_norm 1.2137 (1.5563) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][390/625] eta 0:01:45 lr 0.000878 wd 0.0500 time 0.4427 (0.4471) data time 0.0006 (0.0019) model time 0.4421 (0.4457) loss 4.2543 (4.4004) grad_norm 1.3053 (1.5542) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:11 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][400/625] eta 0:01:40 lr 0.000879 wd 0.0500 time 0.4436 (0.4475) data time 0.0006 (0.0019) model time 0.4431 (0.4461) loss 4.2926 (4.3953) grad_norm 2.2664 (1.5545) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][410/625] eta 0:01:36 lr 0.000880 wd 0.0500 time 0.4441 (0.4474) data time 0.0007 (0.0019) model time 0.4434 (0.4461) loss 3.8796 (4.3919) grad_norm 2.0999 (1.5603) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][420/625] eta 0:01:31 lr 0.000881 wd 0.0500 time 0.4439 (0.4474) data time 0.0009 (0.0018) model time 0.4430 (0.4460) loss 4.6629 (4.3987) grad_norm 1.3718 (1.5580) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][430/625] eta 0:01:27 lr 0.000882 wd 0.0500 time 0.4449 (0.4473) data time 0.0008 (0.0018) model time 0.4441 (0.4459) loss 3.2253 (4.3870) grad_norm 1.8095 (1.5593) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][440/625] eta 0:01:22 lr 0.000883 wd 0.0500 time 0.4446 (0.4472) data time 0.0006 (0.0018) model time 0.4440 (0.4459) loss 3.0659 (4.3849) grad_norm 1.5549 (1.5562) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][450/625] eta 0:01:18 lr 0.000884 wd 0.0500 time 0.4431 (0.4472) data time 0.0007 (0.0018) model time 0.4423 (0.4459) loss 3.8857 (4.3871) grad_norm 1.2203 (1.5529) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][460/625] eta 0:01:13 lr 0.000885 wd 0.0500 time 0.4448 (0.4471) data time 0.0009 (0.0017) model time 0.4439 (0.4458) loss 4.6052 (4.3868) 
grad_norm 1.4276 (1.5524) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][470/625] eta 0:01:09 lr 0.000886 wd 0.0500 time 0.4449 (0.4471) data time 0.0008 (0.0017) model time 0.4441 (0.4458) loss 3.8560 (4.3886) grad_norm 1.1112 (1.5498) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][480/625] eta 0:01:04 lr 0.000887 wd 0.0500 time 0.4469 (0.4470) data time 0.0006 (0.0017) model time 0.4463 (0.4457) loss 5.3217 (4.3866) grad_norm 1.4462 (1.5454) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][490/625] eta 0:01:00 lr 0.000888 wd 0.0500 time 0.4467 (0.4469) data time 0.0009 (0.0017) model time 0.4458 (0.4456) loss 4.3248 (4.3851) grad_norm 2.5487 (1.5469) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][500/625] eta 0:00:55 lr 0.000889 wd 0.0500 time 0.4441 (0.4473) data time 0.0009 (0.0017) model time 0.4432 (0.4461) loss 4.5029 (4.3818) grad_norm 1.3744 (1.5449) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][510/625] eta 0:00:51 lr 0.000889 wd 0.0500 time 0.4465 (0.4473) data time 0.0008 (0.0017) model time 0.4457 (0.4460) loss 4.8266 (4.3818) grad_norm 2.0473 (1.5452) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][520/625] eta 0:00:46 lr 0.000890 wd 0.0500 time 0.4423 (0.4473) data time 0.0006 (0.0016) model time 0.4418 (0.4460) loss 3.3521 (4.3831) grad_norm 1.6652 (1.5496) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][530/625] eta 0:00:42 lr 0.000891 wd 0.0500 
time 0.4438 (0.4472) data time 0.0009 (0.0016) model time 0.4429 (0.4460) loss 4.7885 (4.3890) grad_norm 1.4534 (1.5509) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][540/625] eta 0:00:38 lr 0.000892 wd 0.0500 time 0.4440 (0.4472) data time 0.0006 (0.0016) model time 0.4434 (0.4459) loss 3.5048 (4.3851) grad_norm 1.1908 (1.5494) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][550/625] eta 0:00:33 lr 0.000893 wd 0.0500 time 0.4454 (0.4471) data time 0.0008 (0.0016) model time 0.4445 (0.4459) loss 4.7166 (4.3825) grad_norm 1.2000 (1.5458) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][560/625] eta 0:00:29 lr 0.000894 wd 0.0500 time 0.4429 (0.4471) data time 0.0009 (0.0016) model time 0.4421 (0.4458) loss 4.5039 (4.3827) grad_norm 1.4095 (1.5422) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][570/625] eta 0:00:24 lr 0.000895 wd 0.0500 time 0.4369 (0.4470) data time 0.0009 (0.0016) model time 0.4360 (0.4458) loss 4.2514 (4.3788) grad_norm 1.1064 (1.5406) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][580/625] eta 0:00:20 lr 0.000896 wd 0.0500 time 0.4433 (0.4469) data time 0.0008 (0.0016) model time 0.4425 (0.4457) loss 3.4128 (4.3718) grad_norm 1.6311 (1.5382) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][590/625] eta 0:00:15 lr 0.000897 wd 0.0500 time 0.4432 (0.4470) data time 0.0007 (0.0015) model time 0.4425 (0.4457) loss 4.6730 (4.3723) grad_norm 1.5064 (1.5387) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:40 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][600/625] eta 0:00:11 lr 0.000898 wd 0.0500 time 0.4558 (0.4469) data time 0.0009 (0.0015) model time 0.4550 (0.4457) loss 4.3649 (4.3673) grad_norm 1.6211 (1.5422) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][610/625] eta 0:00:06 lr 0.000899 wd 0.0500 time 0.4393 (0.4469) data time 0.0004 (0.0015) model time 0.4389 (0.4457) loss 5.2475 (4.3691) grad_norm 1.7588 (1.5428) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [14/300][620/625] eta 0:00:02 lr 0.000900 wd 0.0500 time 0.4356 (0.4468) data time 0.0006 (0.0015) model time 0.4350 (0.4456) loss 4.4225 (4.3726) grad_norm 1.7766 (1.5439) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:50:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 14 training takes 0:04:39 [2024-08-04 10:50:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:50:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:50:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 1.0557 (1.0557) Acc@1 75.146 (75.146) Acc@5 92.822 (92.822) Mem 16696MB
[2024-08-04 10:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.154) Loss 1.8701 (1.2893) Acc@1 55.811 (68.124) Acc@5 82.812 (90.598) Mem 16696MB
[2024-08-04 10:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 2.0391 (1.5872) Acc@1 53.760 (62.746) Acc@5 79.150 (85.893) Mem 16696MB
[2024-08-04 10:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 62.590 Acc@5 85.829
[2024-08-04 10:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 62.6%
[2024-08-04 10:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 62.59%
[2024-08-04 10:50:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:50:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:50:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.491 (0.491) Loss 4.5430 (4.5430) Acc@1 17.773 (17.773) Acc@5 38.672 (38.672) Mem 16696MB
[2024-08-04 10:50:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 4.9570 (4.6197) Acc@1 13.086 (16.366) Acc@5 28.760 (36.022) Mem 16696MB
[2024-08-04 10:51:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 4.8594 (4.6789) Acc@1 11.279 (15.595) Acc@5 29.639 (34.584) Mem 16696MB
[2024-08-04 10:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 16.531 Acc@5 36.038
[2024-08-04 10:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 16.5%
[2024-08-04 10:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 16.53%
[2024-08-04 10:51:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:51:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:51:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][0/625] eta 0:08:01 lr 0.000900 wd 0.0500 time 0.7700 (0.7700) data time 0.3889 (0.3889) model time 0.0000 (0.0000) loss 4.1554 (4.1554) grad_norm 1.2821 (1.2821) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][10/625] eta 0:04:50 lr 0.000901 wd 0.0500 time 0.4440 (0.4726) data time 0.0006 (0.0360) model time 0.0000 (0.0000) loss 4.8187 (4.3990) grad_norm 1.4257 (1.4163) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][20/625] eta 0:04:37 lr 0.000902 wd 0.0500 time 0.4418 (0.4584) data time 0.0008 (0.0193) model time 0.0000 (0.0000) loss 4.8242 (4.4770) grad_norm 1.5312 (1.4418) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][30/625] eta 0:04:30 lr 0.000903 wd 0.0500 time 0.4402 (0.4539) data time 0.0006 (0.0133) model time 0.0000 (0.0000) loss 4.9674 (4.3401) grad_norm 1.2694 (1.4618) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][40/625] eta 0:04:23 lr 0.000904 wd 0.0500 time 0.4456 (0.4512) data time 0.0006 (0.0102) model time 0.0000 (0.0000) loss 5.0376 (4.4519) grad_norm 1.8742 (1.4532) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][50/625] eta 0:04:18 lr 0.000905 wd 0.0500 time 0.4462 (0.4499) data time 0.0006 (0.0084) model time 0.0000 (0.0000) loss 4.0267 (4.5001) grad_norm 1.6295 (1.4605) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][60/625] eta 0:04:15 lr 0.000906 wd 0.0500 time 0.6557 (0.4524) data time 0.0008 (0.0071) model time 0.6549 (0.4647) loss 
4.2962 (4.4664) grad_norm 1.7559 (1.5045) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][70/625] eta 0:04:09 lr 0.000907 wd 0.0500 time 0.4430 (0.4503) data time 0.0006 (0.0062) model time 0.4424 (0.4505) loss 4.8833 (4.4599) grad_norm 1.6526 (1.5370) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][80/625] eta 0:04:05 lr 0.000908 wd 0.0500 time 0.4528 (0.4499) data time 0.0009 (0.0056) model time 0.4519 (0.4490) loss 4.7409 (4.4486) grad_norm 1.2234 (1.5176) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][90/625] eta 0:04:01 lr 0.000909 wd 0.0500 time 0.4447 (0.4516) data time 0.0006 (0.0051) model time 0.4441 (0.4531) loss 5.3143 (4.4572) grad_norm 1.7618 (1.5034) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][100/625] eta 0:03:56 lr 0.000910 wd 0.0500 time 0.4399 (0.4508) data time 0.0010 (0.0046) model time 0.4389 (0.4509) loss 4.5456 (4.4507) grad_norm 1.0332 (1.4838) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][110/625] eta 0:03:51 lr 0.000911 wd 0.0500 time 0.4413 (0.4501) data time 0.0008 (0.0043) model time 0.4405 (0.4494) loss 4.5130 (4.4649) grad_norm 1.6999 (1.4753) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:51:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][120/625] eta 0:03:47 lr 0.000912 wd 0.0500 time 0.4416 (0.4495) data time 0.0008 (0.0040) model time 0.4408 (0.4484) loss 4.6432 (4.4641) grad_norm 1.2805 (1.4604) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][130/625] eta 0:03:42 lr 
0.000913 wd 0.0500 time 0.4429 (0.4491) data time 0.0006 (0.0038) model time 0.4423 (0.4477) loss 3.1992 (4.4189) grad_norm 1.7960 (1.4736) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][140/625] eta 0:03:37 lr 0.000914 wd 0.0500 time 0.4427 (0.4487) data time 0.0007 (0.0035) model time 0.4420 (0.4472) loss 3.8573 (4.3844) grad_norm 1.3635 (1.4666) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][150/625] eta 0:03:32 lr 0.000915 wd 0.0500 time 0.4427 (0.4483) data time 0.0008 (0.0034) model time 0.4419 (0.4467) loss 4.7745 (4.3880) grad_norm 1.1661 (1.4656) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][160/625] eta 0:03:28 lr 0.000916 wd 0.0500 time 0.4464 (0.4480) data time 0.0006 (0.0032) model time 0.4458 (0.4464) loss 4.5714 (4.3969) grad_norm 1.3731 (1.4721) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][170/625] eta 0:03:23 lr 0.000917 wd 0.0500 time 0.4450 (0.4479) data time 0.0006 (0.0031) model time 0.4444 (0.4463) loss 3.8080 (4.3917) grad_norm 1.5112 (1.4767) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][180/625] eta 0:03:19 lr 0.000918 wd 0.0500 time 0.4424 (0.4477) data time 0.0008 (0.0029) model time 0.4416 (0.4460) loss 4.7061 (4.3867) grad_norm 1.5307 (1.4870) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][190/625] eta 0:03:14 lr 0.000919 wd 0.0500 time 0.4418 (0.4475) data time 0.0006 (0.0028) model time 0.4412 (0.4458) loss 5.0396 (4.3871) grad_norm 1.7705 (1.4889) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
10:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][200/625] eta 0:03:10 lr 0.000920 wd 0.0500 time 0.4443 (0.4473) data time 0.0007 (0.0027) model time 0.4437 (0.4457) loss 5.1074 (4.3870) grad_norm 1.6477 (1.4957) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][210/625] eta 0:03:05 lr 0.000921 wd 0.0500 time 0.4450 (0.4472) data time 0.0006 (0.0026) model time 0.4444 (0.4456) loss 5.2351 (4.3771) grad_norm 1.3416 (1.4968) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][220/625] eta 0:03:01 lr 0.000922 wd 0.0500 time 0.4436 (0.4471) data time 0.0006 (0.0025) model time 0.4429 (0.4455) loss 4.5673 (4.3686) grad_norm 1.3622 (1.4966) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][230/625] eta 0:02:56 lr 0.000923 wd 0.0500 time 0.4456 (0.4471) data time 0.0007 (0.0025) model time 0.4448 (0.4455) loss 3.4939 (4.3656) grad_norm 1.3763 (1.4992) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][240/625] eta 0:02:52 lr 0.000924 wd 0.0500 time 0.4453 (0.4470) data time 0.0006 (0.0024) model time 0.4447 (0.4454) loss 3.2024 (4.3640) grad_norm 1.2643 (1.5057) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][250/625] eta 0:02:47 lr 0.000924 wd 0.0500 time 0.4431 (0.4469) data time 0.0006 (0.0023) model time 0.4425 (0.4453) loss 5.2698 (4.3664) grad_norm 1.1697 (1.5002) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][260/625] eta 0:02:43 lr 0.000925 wd 0.0500 time 0.4429 (0.4468) data time 0.0007 (0.0023) model time 0.4421 (0.4452) loss 4.2406 
(4.3587) grad_norm 1.9280 (1.4983) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][270/625] eta 0:02:38 lr 0.000926 wd 0.0500 time 0.4513 (0.4467) data time 0.0008 (0.0022) model time 0.4504 (0.4452) loss 3.7104 (4.3387) grad_norm 1.4588 (1.4942) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][280/625] eta 0:02:34 lr 0.000927 wd 0.0500 time 0.4440 (0.4466) data time 0.0008 (0.0022) model time 0.4432 (0.4451) loss 3.9844 (4.3482) grad_norm 1.6698 (1.4912) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][290/625] eta 0:02:29 lr 0.000928 wd 0.0500 time 0.4417 (0.4466) data time 0.0008 (0.0021) model time 0.4408 (0.4451) loss 4.6746 (4.3517) grad_norm 1.8785 (1.4925) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][300/625] eta 0:02:25 lr 0.000929 wd 0.0500 time 0.4436 (0.4464) data time 0.0008 (0.0021) model time 0.4428 (0.4450) loss 4.5112 (4.3604) grad_norm 1.3907 (1.4916) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][310/625] eta 0:02:20 lr 0.000930 wd 0.0500 time 0.4418 (0.4464) data time 0.0006 (0.0020) model time 0.4412 (0.4449) loss 3.6920 (4.3617) grad_norm 1.8109 (1.4935) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][320/625] eta 0:02:16 lr 0.000931 wd 0.0500 time 0.4399 (0.4463) data time 0.0008 (0.0020) model time 0.4391 (0.4448) loss 4.6501 (4.3703) grad_norm 1.1442 (1.4899) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][330/625] eta 0:02:11 lr 0.000932 wd 
0.0500 time 0.4426 (0.4462) data time 0.0006 (0.0020) model time 0.4420 (0.4447) loss 4.4940 (4.3762) grad_norm 1.3651 (1.4885) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][340/625] eta 0:02:07 lr 0.000933 wd 0.0500 time 0.4401 (0.4461) data time 0.0008 (0.0019) model time 0.4393 (0.4446) loss 4.5980 (4.3781) grad_norm 1.8546 (1.4909) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][350/625] eta 0:02:02 lr 0.000934 wd 0.0500 time 0.4466 (0.4460) data time 0.0006 (0.0019) model time 0.4460 (0.4446) loss 4.6073 (4.3719) grad_norm 1.8019 (1.4958) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][360/625] eta 0:01:58 lr 0.000935 wd 0.0500 time 0.4466 (0.4460) data time 0.0008 (0.0019) model time 0.4459 (0.4445) loss 4.6480 (4.3713) grad_norm 1.2681 (1.4935) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][370/625] eta 0:01:53 lr 0.000936 wd 0.0500 time 0.4438 (0.4459) data time 0.0008 (0.0018) model time 0.4431 (0.4445) loss 3.7988 (4.3655) grad_norm 1.4848 (1.4909) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][380/625] eta 0:01:49 lr 0.000937 wd 0.0500 time 0.4453 (0.4459) data time 0.0008 (0.0018) model time 0.4446 (0.4445) loss 4.8044 (4.3607) grad_norm 1.5905 (1.4921) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][390/625] eta 0:01:44 lr 0.000938 wd 0.0500 time 0.4480 (0.4459) data time 0.0006 (0.0018) model time 0.4474 (0.4445) loss 3.2688 (4.3529) grad_norm 1.2758 (1.4925) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:01 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][400/625] eta 0:01:40 lr 0.000939 wd 0.0500 time 0.4444 (0.4462) data time 0.0006 (0.0018) model time 0.4438 (0.4449) loss 4.9154 (4.3531) grad_norm 1.2074 (1.4923) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][410/625] eta 0:01:35 lr 0.000940 wd 0.0500 time 0.4423 (0.4462) data time 0.0009 (0.0017) model time 0.4414 (0.4449) loss 4.6254 (4.3487) grad_norm 1.7605 (1.4878) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][420/625] eta 0:01:31 lr 0.000941 wd 0.0500 time 0.4432 (0.4461) data time 0.0008 (0.0017) model time 0.4424 (0.4448) loss 3.4968 (4.3516) grad_norm 1.2463 (1.4899) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][430/625] eta 0:01:27 lr 0.000942 wd 0.0500 time 0.4500 (0.4466) data time 0.0008 (0.0017) model time 0.4492 (0.4454) loss 4.5813 (4.3524) grad_norm 1.3359 (1.4955) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][440/625] eta 0:01:22 lr 0.000943 wd 0.0500 time 0.4404 (0.4465) data time 0.0008 (0.0017) model time 0.4396 (0.4453) loss 4.5827 (4.3491) grad_norm 1.4345 (1.4994) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][450/625] eta 0:01:18 lr 0.000944 wd 0.0500 time 0.4462 (0.4465) data time 0.0007 (0.0016) model time 0.4455 (0.4453) loss 5.0862 (4.3474) grad_norm 1.3149 (1.4981) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][460/625] eta 0:01:13 lr 0.000945 wd 0.0500 time 0.4438 (0.4464) data time 0.0005 (0.0016) model time 0.4432 (0.4452) loss 5.3474 (4.3444) 
grad_norm 1.8068 (1.4981) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][470/625] eta 0:01:09 lr 0.000946 wd 0.0500 time 0.4413 (0.4464) data time 0.0008 (0.0016) model time 0.4405 (0.4452) loss 4.0312 (4.3461) grad_norm 1.3898 (1.5055) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][480/625] eta 0:01:04 lr 0.000947 wd 0.0500 time 0.4453 (0.4464) data time 0.0008 (0.0016) model time 0.4445 (0.4452) loss 3.7942 (4.3354) grad_norm 1.3215 (1.5049) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][490/625] eta 0:01:00 lr 0.000947 wd 0.0500 time 0.4408 (0.4463) data time 0.0007 (0.0016) model time 0.4401 (0.4452) loss 3.5606 (4.3363) grad_norm 1.2528 (1.5010) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][500/625] eta 0:00:55 lr 0.000948 wd 0.0500 time 0.4416 (0.4463) data time 0.0007 (0.0016) model time 0.4409 (0.4451) loss 4.0280 (4.3379) grad_norm 1.4773 (1.5036) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][510/625] eta 0:00:51 lr 0.000949 wd 0.0500 time 0.4460 (0.4463) data time 0.0007 (0.0015) model time 0.4453 (0.4451) loss 4.5278 (4.3404) grad_norm 1.3595 (1.5041) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][520/625] eta 0:00:46 lr 0.000950 wd 0.0500 time 0.4468 (0.4463) data time 0.0008 (0.0015) model time 0.4460 (0.4451) loss 4.5341 (4.3400) grad_norm 1.2975 (1.5025) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][530/625] eta 0:00:42 lr 0.000951 wd 0.0500 
time 0.4433 (0.4463) data time 0.0008 (0.0015) model time 0.4425 (0.4451) loss 4.7549 (4.3358) grad_norm 1.1568 (1.5029) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][540/625] eta 0:00:37 lr 0.000952 wd 0.0500 time 0.4459 (0.4463) data time 0.0006 (0.0015) model time 0.4453 (0.4451) loss 4.6273 (4.3362) grad_norm 1.6960 (1.5053) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][550/625] eta 0:00:33 lr 0.000953 wd 0.0500 time 0.4433 (0.4463) data time 0.0009 (0.0015) model time 0.4424 (0.4451) loss 4.1018 (4.3368) grad_norm 1.2545 (1.5042) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][560/625] eta 0:00:29 lr 0.000954 wd 0.0500 time 0.4484 (0.4462) data time 0.0007 (0.0015) model time 0.4477 (0.4451) loss 4.6366 (4.3368) grad_norm 1.3770 (1.5055) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][570/625] eta 0:00:24 lr 0.000955 wd 0.0500 time 0.4419 (0.4462) data time 0.0009 (0.0015) model time 0.4410 (0.4451) loss 4.3691 (4.3398) grad_norm 1.6710 (1.5028) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][580/625] eta 0:00:20 lr 0.000956 wd 0.0500 time 0.4480 (0.4462) data time 0.0008 (0.0015) model time 0.4471 (0.4451) loss 4.2332 (4.3354) grad_norm 1.8754 (1.5033) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][590/625] eta 0:00:15 lr 0.000957 wd 0.0500 time 0.4444 (0.4463) data time 0.0006 (0.0014) model time 0.4438 (0.4452) loss 5.2938 (4.3337) grad_norm 1.5593 (1.5076) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:30 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][600/625] eta 0:00:11 lr 0.000958 wd 0.0500 time 0.4449 (0.4463) data time 0.0006 (0.0014) model time 0.4443 (0.4452) loss 5.4560 (4.3335) grad_norm 1.7202 (1.5088) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][610/625] eta 0:00:06 lr 0.000959 wd 0.0500 time 0.4391 (0.4462) data time 0.0006 (0.0014) model time 0.4386 (0.4451) loss 3.7838 (4.3323) grad_norm 1.3992 (1.5046) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [15/300][620/625] eta 0:00:02 lr 0.000960 wd 0.0500 time 0.4367 (0.4464) data time 0.0005 (0.0014) model time 0.4362 (0.4453) loss 4.2324 (4.3294) grad_norm 1.7627 (1.5047) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 15 training takes 0:04:38 [2024-08-04 10:55:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 10:55:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 10:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.9761 (0.9761) Acc@1 75.586 (75.586) Acc@5 93.994 (93.994) Mem 16696MB
[2024-08-04 10:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 1.7822 (1.2343) Acc@1 58.984 (69.695) Acc@5 82.764 (91.295) Mem 16696MB
[2024-08-04 10:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 1.8652 (1.5228) Acc@1 57.715 (64.211) Acc@5 81.592 (86.879) Mem 16696MB
[2024-08-04 10:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 63.992 Acc@5 86.832
[2024-08-04 10:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 64.0%
[2024-08-04 10:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 63.99%
[2024-08-04 10:55:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 10:55:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 10:55:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 3.8496 (3.8496) Acc@1 27.734 (27.734) Acc@5 52.393 (52.393) Mem 16696MB
[2024-08-04 10:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.152) Loss 4.3477 (3.9554) Acc@1 19.287 (24.290) Acc@5 41.064 (48.318) Mem 16696MB
[2024-08-04 10:55:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.134) Loss 4.3320 (4.0648) Acc@1 17.090 (23.077) Acc@5 38.721 (45.866) Mem 16696MB
[2024-08-04 10:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 24.052 Acc@5 47.147
[2024-08-04 10:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 24.1%
[2024-08-04 10:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 24.05%
[2024-08-04 10:55:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 10:55:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 10:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][0/625] eta 0:08:43 lr 0.000960 wd 0.0500 time 0.8380 (0.8380) data time 0.4541 (0.4541) model time 0.0000 (0.0000) loss 4.4370 (4.4370) grad_norm 1.6192 (1.6192) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:55:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][10/625] eta 0:04:54 lr 0.000961 wd 0.0500 time 0.4435 (0.4794) data time 0.0006 (0.0420) model time 0.0000 (0.0000) loss 4.8595 (4.3727) grad_norm 1.4307 (1.6422) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][20/625] eta 0:04:40 lr 0.000962 wd 0.0500 time 0.4517 (0.4631) data time 0.0006 (0.0224) model time 0.0000 (0.0000) loss 5.1464 (4.4483) grad_norm 1.3401 (1.5226) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][30/625] eta 0:04:32 lr 0.000963 wd 0.0500 time 0.4488 (0.4575) data time 0.0009 (0.0154) model time 0.0000 (0.0000) loss 3.7646 (4.3978) grad_norm 3.0121 (1.5740) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][40/625] eta 0:04:26 lr 0.000964 wd 0.0500 time 0.4444 (0.4547) data time 0.0009 (0.0119) model time 0.0000 (0.0000) loss 4.7138 (4.4428) grad_norm 1.6242 (1.6459) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][50/625] eta 0:04:20 lr 0.000965 wd 0.0500 time 0.4431 (0.4527) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 4.1097 (4.3648) grad_norm 1.2551 (1.5862) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][60/625] eta 0:04:17 lr 0.000966 wd 0.0500 time 0.6605 (0.4551) data time 0.0009 (0.0083) model time 0.6596 (0.4665) loss 
4.6439 (4.4281) grad_norm 1.3742 (1.5739) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][70/625] eta 0:04:11 lr 0.000967 wd 0.0500 time 0.4459 (0.4527) data time 0.0008 (0.0072) model time 0.4451 (0.4519) loss 5.1040 (4.4157) grad_norm 1.3581 (1.5551) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][80/625] eta 0:04:06 lr 0.000968 wd 0.0500 time 0.4429 (0.4518) data time 0.0007 (0.0064) model time 0.4422 (0.4495) loss 5.1379 (4.4198) grad_norm 1.3825 (1.5341) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][90/625] eta 0:04:01 lr 0.000969 wd 0.0500 time 0.4447 (0.4510) data time 0.0006 (0.0058) model time 0.4441 (0.4480) loss 3.9947 (4.4051) grad_norm 1.6757 (1.5503) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][100/625] eta 0:03:56 lr 0.000970 wd 0.0500 time 0.4436 (0.4502) data time 0.0007 (0.0053) model time 0.4428 (0.4469) loss 2.7889 (4.4049) grad_norm 1.4458 (1.5531) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][110/625] eta 0:03:51 lr 0.000971 wd 0.0500 time 0.4437 (0.4498) data time 0.0007 (0.0049) model time 0.4430 (0.4465) loss 4.8251 (4.3945) grad_norm 2.4422 (1.5531) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][120/625] eta 0:03:46 lr 0.000972 wd 0.0500 time 0.4427 (0.4493) data time 0.0008 (0.0046) model time 0.4419 (0.4459) loss 4.7141 (4.4048) grad_norm 1.2281 (1.5520) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][130/625] eta 0:03:42 lr 
0.000973 wd 0.0500 time 0.4462 (0.4489) data time 0.0009 (0.0043) model time 0.4453 (0.4456) loss 4.7051 (4.3995) grad_norm 1.2574 (1.5378) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][140/625] eta 0:03:38 lr 0.000974 wd 0.0500 time 0.4392 (0.4498) data time 0.0008 (0.0041) model time 0.4384 (0.4473) loss 4.7897 (4.3905) grad_norm 1.4298 (1.5345) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][150/625] eta 0:03:33 lr 0.000975 wd 0.0500 time 0.4475 (0.4494) data time 0.0009 (0.0038) model time 0.4466 (0.4469) loss 4.3810 (4.3741) grad_norm 1.4982 (1.5285) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][160/625] eta 0:03:28 lr 0.000976 wd 0.0500 time 0.4464 (0.4491) data time 0.0009 (0.0037) model time 0.4456 (0.4466) loss 4.0657 (4.3605) grad_norm 1.0907 (1.5186) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][170/625] eta 0:03:24 lr 0.000977 wd 0.0500 time 0.4433 (0.4488) data time 0.0006 (0.0035) model time 0.4427 (0.4463) loss 4.0037 (4.3574) grad_norm 1.4659 (1.5247) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][180/625] eta 0:03:19 lr 0.000978 wd 0.0500 time 0.4513 (0.4484) data time 0.0006 (0.0033) model time 0.4507 (0.4460) loss 4.8586 (4.3644) grad_norm 1.2114 (1.5148) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][190/625] eta 0:03:14 lr 0.000979 wd 0.0500 time 0.4413 (0.4483) data time 0.0007 (0.0032) model time 0.4406 (0.4458) loss 4.0772 (4.3438) grad_norm 1.1383 (1.5112) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
10:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][200/625] eta 0:03:10 lr 0.000980 wd 0.0500 time 0.4411 (0.4480) data time 0.0006 (0.0031) model time 0.4405 (0.4455) loss 5.0449 (4.3502) grad_norm 1.7840 (1.5278) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][210/625] eta 0:03:05 lr 0.000981 wd 0.0500 time 0.4400 (0.4477) data time 0.0008 (0.0030) model time 0.4392 (0.4453) loss 5.2300 (4.3484) grad_norm 1.1391 (1.5330) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][220/625] eta 0:03:01 lr 0.000981 wd 0.0500 time 0.4476 (0.4475) data time 0.0006 (0.0029) model time 0.4470 (0.4451) loss 3.7892 (4.3332) grad_norm 1.3986 (1.5324) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][230/625] eta 0:02:56 lr 0.000982 wd 0.0500 time 0.4423 (0.4474) data time 0.0005 (0.0028) model time 0.4418 (0.4451) loss 3.8702 (4.3225) grad_norm 1.3442 (1.5305) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][240/625] eta 0:02:52 lr 0.000983 wd 0.0500 time 0.4431 (0.4472) data time 0.0008 (0.0027) model time 0.4423 (0.4449) loss 4.3056 (4.3159) grad_norm 1.2721 (1.5265) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][250/625] eta 0:02:47 lr 0.000984 wd 0.0500 time 0.4433 (0.4471) data time 0.0007 (0.0026) model time 0.4426 (0.4449) loss 5.0758 (4.3357) grad_norm 1.3956 (1.5264) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][260/625] eta 0:02:43 lr 0.000985 wd 0.0500 time 0.4419 (0.4470) data time 0.0007 (0.0026) model time 0.4412 (0.4448) loss 4.5487 
(4.3453) grad_norm 1.1739 (1.5247) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][270/625] eta 0:02:38 lr 0.000986 wd 0.0500 time 0.4440 (0.4468) data time 0.0008 (0.0025) model time 0.4432 (0.4446) loss 4.3126 (4.3444) grad_norm 1.4009 (1.5251) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:57:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][280/625] eta 0:02:34 lr 0.000987 wd 0.0500 time 0.4430 (0.4467) data time 0.0009 (0.0024) model time 0.4421 (0.4445) loss 4.7226 (4.3419) grad_norm 1.6496 (1.5317) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:58:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][290/625] eta 0:02:29 lr 0.000988 wd 0.0500 time 0.4449 (0.4466) data time 0.0009 (0.0024) model time 0.4440 (0.4445) loss 4.5449 (4.3411) grad_norm 1.4917 (1.5333) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 10:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][300/625] eta 0:02:25 lr 0.000989 wd 0.0500 time 0.4387 (0.4465) data time 0.0007 (0.0023) model time 0.4380 (0.4444) loss 5.0317 (4.3439) grad_norm 1.1676 (1.5297) loss_scale 32768.0000 (16656.1595) mem 16696MB [2024-08-04 10:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][310/625] eta 0:02:20 lr 0.000990 wd 0.0500 time 0.4436 (0.4464) data time 0.0006 (0.0023) model time 0.4430 (0.4444) loss 3.9409 (4.3449) grad_norm 1.0296 (1.5239) loss_scale 32768.0000 (17174.2251) mem 16696MB [2024-08-04 10:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][320/625] eta 0:02:16 lr 0.000991 wd 0.0500 time 0.4432 (0.4463) data time 0.0008 (0.0022) model time 0.4424 (0.4443) loss 4.5616 (4.3357) grad_norm 1.0632 (1.5186) loss_scale 32768.0000 (17660.0125) mem 16696MB [2024-08-04 10:58:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][330/625] eta 0:02:11 lr 0.000992 wd 
0.0500 time 0.4372 (0.4463) data time 0.0009 (0.0022) model time 0.4364 (0.4443) loss 4.7580 (4.3363) grad_norm 1.4231 (1.5200) loss_scale 32768.0000 (18116.4471) mem 16696MB [2024-08-04 10:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][340/625] eta 0:02:07 lr 0.000993 wd 0.0500 time 0.4437 (0.4462) data time 0.0008 (0.0021) model time 0.4428 (0.4443) loss 4.4876 (4.3406) grad_norm 1.4270 (1.5179) loss_scale 32768.0000 (18546.1114) mem 16696MB [2024-08-04 10:58:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][350/625] eta 0:02:02 lr 0.000994 wd 0.0500 time 0.4423 (0.4462) data time 0.0007 (0.0021) model time 0.4416 (0.4442) loss 3.6230 (4.3368) grad_norm 2.9273 (1.5222) loss_scale 32768.0000 (18951.2934) mem 16696MB [2024-08-04 10:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][360/625] eta 0:01:58 lr 0.000995 wd 0.0500 time 0.4433 (0.4461) data time 0.0010 (0.0021) model time 0.4423 (0.4442) loss 4.7106 (4.3395) grad_norm 1.3208 (1.5255) loss_scale 32768.0000 (19334.0277) mem 16696MB [2024-08-04 10:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][370/625] eta 0:01:53 lr 0.000996 wd 0.0500 time 0.4491 (0.4461) data time 0.0007 (0.0020) model time 0.4485 (0.4442) loss 3.4864 (4.3373) grad_norm 1.3095 (1.5350) loss_scale 32768.0000 (19696.1294) mem 16696MB [2024-08-04 10:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][380/625] eta 0:01:49 lr 0.000997 wd 0.0500 time 0.4458 (0.4460) data time 0.0009 (0.0020) model time 0.4449 (0.4441) loss 4.7143 (4.3338) grad_norm 1.1366 (inf) loss_scale 16384.0000 (19609.1969) mem 16696MB [2024-08-04 10:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][390/625] eta 0:01:44 lr 0.000998 wd 0.0500 time 0.4460 (0.4459) data time 0.0007 (0.0020) model time 0.4453 (0.4441) loss 5.0508 (4.3315) grad_norm 1.3588 (inf) loss_scale 16384.0000 (19526.7110) mem 16696MB [2024-08-04 10:58:51 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][400/625] eta 0:01:40 lr 0.000999 wd 0.0500 time 0.4397 (0.4463) data time 0.0009 (0.0020) model time 0.4388 (0.4446) loss 2.8997 (4.3241) grad_norm 1.3211 (inf) loss_scale 16384.0000 (19448.3392) mem 16696MB [2024-08-04 10:58:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][410/625] eta 0:01:35 lr 0.001000 wd 0.0500 time 0.4405 (0.4463) data time 0.0007 (0.0019) model time 0.4398 (0.4446) loss 4.8516 (4.3229) grad_norm 2.1853 (inf) loss_scale 16384.0000 (19373.7810) mem 16696MB [2024-08-04 10:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][420/625] eta 0:01:31 lr 0.001001 wd 0.0500 time 0.4475 (0.4462) data time 0.0006 (0.0019) model time 0.4469 (0.4445) loss 4.5797 (4.3225) grad_norm 1.6492 (inf) loss_scale 16384.0000 (19302.7648) mem 16696MB [2024-08-04 10:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][430/625] eta 0:01:27 lr 0.001002 wd 0.0500 time 0.4421 (0.4462) data time 0.0008 (0.0019) model time 0.4414 (0.4445) loss 4.6182 (4.3132) grad_norm 1.3870 (inf) loss_scale 16384.0000 (19235.0441) mem 16696MB [2024-08-04 10:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][440/625] eta 0:01:22 lr 0.001003 wd 0.0500 time 0.4430 (0.4462) data time 0.0008 (0.0018) model time 0.4422 (0.4445) loss 4.1123 (4.3067) grad_norm 1.0835 (inf) loss_scale 16384.0000 (19170.3946) mem 16696MB [2024-08-04 10:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][450/625] eta 0:01:18 lr 0.001004 wd 0.0500 time 0.4417 (0.4461) data time 0.0009 (0.0018) model time 0.4408 (0.4445) loss 4.7235 (4.3109) grad_norm 1.1830 (inf) loss_scale 16384.0000 (19108.6120) mem 16696MB [2024-08-04 10:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][460/625] eta 0:01:13 lr 0.001004 wd 0.0500 time 0.4430 (0.4461) data time 0.0008 (0.0018) model time 0.4423 (0.4445) loss 3.2925 (4.3053) grad_norm 1.5857 
(inf) loss_scale 16384.0000 (19049.5098) mem 16696MB [2024-08-04 10:59:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][470/625] eta 0:01:09 lr 0.001005 wd 0.0500 time 0.6615 (0.4466) data time 0.0008 (0.0018) model time 0.6608 (0.4450) loss 4.4533 (4.3099) grad_norm 1.6906 (inf) loss_scale 16384.0000 (18992.9172) mem 16696MB [2024-08-04 10:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][480/625] eta 0:01:04 lr 0.001006 wd 0.0500 time 0.4412 (0.4465) data time 0.0008 (0.0018) model time 0.4404 (0.4449) loss 2.8862 (4.3110) grad_norm 1.8309 (inf) loss_scale 16384.0000 (18938.6778) mem 16696MB [2024-08-04 10:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][490/625] eta 0:01:00 lr 0.001007 wd 0.0500 time 0.4456 (0.4465) data time 0.0007 (0.0017) model time 0.4448 (0.4449) loss 4.8029 (4.3109) grad_norm 1.6222 (inf) loss_scale 16384.0000 (18886.6477) mem 16696MB [2024-08-04 10:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][500/625] eta 0:00:55 lr 0.001008 wd 0.0500 time 0.4396 (0.4464) data time 0.0007 (0.0017) model time 0.4390 (0.4448) loss 3.5070 (4.3042) grad_norm 1.8304 (inf) loss_scale 16384.0000 (18836.6946) mem 16696MB [2024-08-04 10:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][510/625] eta 0:00:51 lr 0.001009 wd 0.0500 time 0.4414 (0.4463) data time 0.0006 (0.0017) model time 0.4408 (0.4447) loss 5.0777 (4.3083) grad_norm 2.3669 (inf) loss_scale 16384.0000 (18788.6967) mem 16696MB [2024-08-04 10:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][520/625] eta 0:00:46 lr 0.001010 wd 0.0500 time 0.4410 (0.4462) data time 0.0009 (0.0017) model time 0.4401 (0.4447) loss 4.4284 (4.3036) grad_norm 1.3423 (inf) loss_scale 16384.0000 (18742.5413) mem 16696MB [2024-08-04 10:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][530/625] eta 0:00:42 lr 0.001011 wd 0.0500 time 0.4417 (0.4461) data time 0.0009 
(0.0017) model time 0.4408 (0.4446) loss 3.9292 (4.3036) grad_norm 1.4318 (inf) loss_scale 16384.0000 (18698.1243) mem 16696MB [2024-08-04 10:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][540/625] eta 0:00:37 lr 0.001012 wd 0.0500 time 0.4426 (0.4461) data time 0.0007 (0.0017) model time 0.4419 (0.4446) loss 4.6071 (4.2985) grad_norm 1.5254 (inf) loss_scale 16384.0000 (18655.3494) mem 16696MB [2024-08-04 10:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][550/625] eta 0:00:33 lr 0.001013 wd 0.0500 time 0.4429 (0.4460) data time 0.0006 (0.0016) model time 0.4423 (0.4445) loss 4.9107 (4.2994) grad_norm 1.4652 (inf) loss_scale 16384.0000 (18614.1270) mem 16696MB [2024-08-04 11:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][560/625] eta 0:00:28 lr 0.001014 wd 0.0500 time 0.4480 (0.4460) data time 0.0007 (0.0016) model time 0.4473 (0.4445) loss 3.4839 (4.2948) grad_norm 1.7136 (inf) loss_scale 16384.0000 (18574.3743) mem 16696MB [2024-08-04 11:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][570/625] eta 0:00:24 lr 0.001015 wd 0.0500 time 0.4399 (0.4459) data time 0.0007 (0.0016) model time 0.4392 (0.4445) loss 5.0469 (4.2888) grad_norm 1.6090 (inf) loss_scale 16384.0000 (18536.0140) mem 16696MB [2024-08-04 11:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][580/625] eta 0:00:20 lr 0.001016 wd 0.0500 time 0.4487 (0.4460) data time 0.0008 (0.0016) model time 0.4479 (0.4445) loss 4.6933 (4.2882) grad_norm 1.1782 (inf) loss_scale 16384.0000 (18498.9742) mem 16696MB [2024-08-04 11:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][590/625] eta 0:00:15 lr 0.001017 wd 0.0500 time 0.4454 (0.4460) data time 0.0006 (0.0016) model time 0.4448 (0.4446) loss 3.2818 (4.2852) grad_norm 1.4699 (inf) loss_scale 16384.0000 (18463.1878) mem 16696MB [2024-08-04 11:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][600/625] 
eta 0:00:11 lr 0.001018 wd 0.0500 time 0.4431 (0.4460) data time 0.0007 (0.0016) model time 0.4424 (0.4446) loss 4.1814 (4.2828) grad_norm 1.1035 (inf) loss_scale 16384.0000 (18428.5923) mem 16696MB [2024-08-04 11:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][610/625] eta 0:00:06 lr 0.001019 wd 0.0500 time 0.4403 (0.4460) data time 0.0006 (0.0016) model time 0.4397 (0.4445) loss 3.5004 (4.2826) grad_norm 1.6612 (inf) loss_scale 16384.0000 (18395.1293) mem 16696MB [2024-08-04 11:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [16/300][620/625] eta 0:00:02 lr 0.001020 wd 0.0500 time 0.4485 (0.4459) data time 0.0004 (0.0015) model time 0.4481 (0.4445) loss 3.7592 (4.2786) grad_norm 1.4136 (inf) loss_scale 16384.0000 (18362.7440) mem 16696MB [2024-08-04 11:00:31 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 16 training takes 0:04:38 [2024-08-04 11:00:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:00:33 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.9780 (0.9780) Acc@1 77.246 (77.246) Acc@5 93.945 (93.945) Mem 16696MB
[2024-08-04 11:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.156) Loss 1.7725 (1.2498) Acc@1 59.424 (70.126) Acc@5 84.082 (91.553) Mem 16696MB
[2024-08-04 11:00:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.137) Loss 1.9521 (1.5362) Acc@1 57.812 (64.748) Acc@5 81.055 (87.309) Mem 16696MB
[2024-08-04 11:00:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 64.679 Acc@5 87.280
[2024-08-04 11:00:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 64.7%
[2024-08-04 11:00:36 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 64.68%
[2024-08-04 11:00:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:00:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.508 (0.508) Loss 3.2129 (3.2129) Acc@1 37.354 (37.354) Acc@5 62.207 (62.207) Mem 16696MB
[2024-08-04 11:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.154) Loss 3.7988 (3.3514) Acc@1 26.416 (32.449) Acc@5 50.928 (58.518) Mem 16696MB
[2024-08-04 11:00:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.135) Loss 3.8281 (3.5013) Acc@1 23.682 (30.783) Acc@5 48.486 (55.564) Mem 16696MB
[2024-08-04 11:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 31.552 Acc@5 56.656
[2024-08-04 11:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 31.6%
[2024-08-04 11:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 31.55%
[2024-08-04 11:00:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:00:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][0/625] eta 0:07:49 lr 0.001020 wd 0.0500 time 0.7516 (0.7516) data time 0.3691 (0.3691) model time 0.0000 (0.0000) loss 3.5019 (3.5019) grad_norm 1.1091 (1.1091) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][10/625] eta 0:04:51 lr 0.001021 wd 0.0500 time 0.4530 (0.4739) data time 0.0006 (0.0342) model time 0.0000 (0.0000) loss 4.7232 (4.5853) grad_norm 1.9072 (1.4924) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][20/625] eta 0:04:38 lr 0.001022 wd 0.0500 time 0.4388 (0.4595) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 3.3522 (4.2944) grad_norm 1.1454 (1.4448) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][30/625] eta 0:04:30 lr 0.001023 wd 0.0500 time 0.4391 (0.4544) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 4.3151 (4.2252) grad_norm 1.3728 (1.4810) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][40/625] eta 0:04:24 lr 0.001024 wd 0.0500 time 0.4480 (0.4520) data time 0.0006 (0.0097) model time 0.0000 (0.0000) loss 4.6246 (4.1812) grad_norm 1.6404 (1.4685) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][50/625] eta 0:04:18 lr 0.001025 wd 0.0500 time 0.4410 (0.4503) data time 0.0008 (0.0080) model time 0.0000 (0.0000) loss 4.0564 (4.1964) grad_norm 1.2955 (1.4530) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][60/625] eta 0:04:15 lr 0.001026 wd 0.0500 time 0.6605 (0.4530) data time 0.0006 (0.0068) model time 0.6599 (0.4663) loss 
5.0563 (4.2357) grad_norm 1.3483 (1.4381) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][70/625] eta 0:04:10 lr 0.001027 wd 0.0500 time 0.4449 (0.4518) data time 0.0006 (0.0060) model time 0.4443 (0.4550) loss 5.0392 (4.2379) grad_norm 1.2196 (1.4257) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][80/625] eta 0:04:05 lr 0.001028 wd 0.0500 time 0.4472 (0.4510) data time 0.0008 (0.0053) model time 0.4464 (0.4513) loss 4.3428 (4.2640) grad_norm 1.6764 (1.4222) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][90/625] eta 0:04:00 lr 0.001029 wd 0.0500 time 0.4501 (0.4502) data time 0.0009 (0.0049) model time 0.4492 (0.4493) loss 3.4600 (4.2615) grad_norm 1.1282 (1.4312) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][100/625] eta 0:03:56 lr 0.001030 wd 0.0500 time 0.4464 (0.4496) data time 0.0007 (0.0045) model time 0.4457 (0.4481) loss 3.2092 (4.2159) grad_norm 2.1892 (1.4328) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][110/625] eta 0:03:51 lr 0.001031 wd 0.0500 time 0.4375 (0.4492) data time 0.0006 (0.0041) model time 0.4369 (0.4475) loss 4.8104 (4.2356) grad_norm 1.5895 (1.4605) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][120/625] eta 0:03:46 lr 0.001032 wd 0.0500 time 0.4459 (0.4488) data time 0.0008 (0.0038) model time 0.4451 (0.4469) loss 4.2216 (4.2334) grad_norm 2.1162 (1.4662) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][130/625] eta 0:03:42 lr 
0.001033 wd 0.0500 time 0.4429 (0.4485) data time 0.0006 (0.0036) model time 0.4422 (0.4466) loss 5.0109 (4.2340) grad_norm 1.0982 (1.4652) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][140/625] eta 0:03:37 lr 0.001034 wd 0.0500 time 0.4416 (0.4482) data time 0.0008 (0.0034) model time 0.4408 (0.4462) loss 4.8797 (4.2232) grad_norm 1.4309 (1.4518) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][150/625] eta 0:03:32 lr 0.001035 wd 0.0500 time 0.4452 (0.4479) data time 0.0006 (0.0032) model time 0.4446 (0.4459) loss 3.0689 (4.2085) grad_norm 1.4785 (1.4509) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][160/625] eta 0:03:28 lr 0.001036 wd 0.0500 time 0.4457 (0.4477) data time 0.0006 (0.0031) model time 0.4451 (0.4457) loss 4.6976 (4.2006) grad_norm 1.3887 (1.4495) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][170/625] eta 0:03:23 lr 0.001037 wd 0.0500 time 0.4413 (0.4474) data time 0.0006 (0.0029) model time 0.4407 (0.4454) loss 3.3995 (4.2016) grad_norm 1.3743 (1.4492) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][180/625] eta 0:03:19 lr 0.001038 wd 0.0500 time 0.4441 (0.4472) data time 0.0006 (0.0028) model time 0.4434 (0.4452) loss 3.4969 (4.1919) grad_norm 2.3327 (1.4552) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][190/625] eta 0:03:14 lr 0.001039 wd 0.0500 time 0.4482 (0.4472) data time 0.0008 (0.0027) model time 0.4474 (0.4452) loss 3.3962 (4.1938) grad_norm 1.2853 (1.4574) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
11:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][200/625] eta 0:03:09 lr 0.001039 wd 0.0500 time 0.4450 (0.4470) data time 0.0007 (0.0026) model time 0.4443 (0.4451) loss 4.3048 (4.1831) grad_norm 1.2688 (1.4596) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][210/625] eta 0:03:05 lr 0.001040 wd 0.0500 time 0.4434 (0.4469) data time 0.0006 (0.0025) model time 0.4428 (0.4450) loss 5.1876 (4.1863) grad_norm 1.6058 (1.4525) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][220/625] eta 0:03:00 lr 0.001041 wd 0.0500 time 0.4485 (0.4468) data time 0.0008 (0.0025) model time 0.4476 (0.4449) loss 4.2332 (4.1901) grad_norm 1.7599 (1.4617) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][230/625] eta 0:02:56 lr 0.001042 wd 0.0500 time 0.4472 (0.4467) data time 0.0006 (0.0024) model time 0.4466 (0.4449) loss 4.2340 (4.2000) grad_norm 1.3930 (1.4609) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][240/625] eta 0:02:51 lr 0.001043 wd 0.0500 time 0.4481 (0.4466) data time 0.0006 (0.0023) model time 0.4474 (0.4448) loss 4.0932 (4.2059) grad_norm 1.4654 (1.4681) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][250/625] eta 0:02:47 lr 0.001044 wd 0.0500 time 0.4462 (0.4466) data time 0.0007 (0.0023) model time 0.4455 (0.4448) loss 4.1021 (4.2108) grad_norm 1.9194 (1.4766) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][260/625] eta 0:02:43 lr 0.001045 wd 0.0500 time 0.4492 (0.4472) data time 0.0008 (0.0022) model time 0.4484 (0.4456) loss 4.4622 
(4.2173) grad_norm 1.6943 (1.4802) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][270/625] eta 0:02:38 lr 0.001046 wd 0.0500 time 0.4416 (0.4471) data time 0.0006 (0.0022) model time 0.4410 (0.4455) loss 2.9066 (4.2109) grad_norm 1.6135 (1.4831) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][280/625] eta 0:02:34 lr 0.001047 wd 0.0500 time 0.4470 (0.4470) data time 0.0008 (0.0021) model time 0.4462 (0.4454) loss 4.5713 (4.2105) grad_norm 1.2132 (1.4810) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][290/625] eta 0:02:29 lr 0.001048 wd 0.0500 time 0.4451 (0.4469) data time 0.0006 (0.0021) model time 0.4445 (0.4453) loss 3.7566 (4.2047) grad_norm 1.7410 (1.4847) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][300/625] eta 0:02:25 lr 0.001049 wd 0.0500 time 0.4466 (0.4468) data time 0.0006 (0.0020) model time 0.4460 (0.4453) loss 5.2894 (4.1931) grad_norm 1.1064 (1.4797) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][310/625] eta 0:02:20 lr 0.001050 wd 0.0500 time 0.4414 (0.4467) data time 0.0009 (0.0020) model time 0.4406 (0.4452) loss 4.6490 (4.1915) grad_norm 1.4002 (1.4836) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][320/625] eta 0:02:16 lr 0.001051 wd 0.0500 time 0.4422 (0.4466) data time 0.0008 (0.0020) model time 0.4413 (0.4451) loss 4.4458 (4.1958) grad_norm 1.7122 (1.4864) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][330/625] eta 0:02:11 lr 0.001052 wd 
0.0500 time 0.4442 (0.4466) data time 0.0008 (0.0019) model time 0.4434 (0.4451) loss 4.2668 (4.1969) grad_norm 1.7357 (1.4860) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][340/625] eta 0:02:07 lr 0.001053 wd 0.0500 time 0.4394 (0.4464) data time 0.0006 (0.0019) model time 0.4387 (0.4450) loss 4.1112 (4.2018) grad_norm 1.6712 (1.4849) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][350/625] eta 0:02:02 lr 0.001054 wd 0.0500 time 0.4387 (0.4463) data time 0.0009 (0.0019) model time 0.4379 (0.4448) loss 4.4616 (4.2051) grad_norm 2.0477 (1.4898) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][360/625] eta 0:01:58 lr 0.001055 wd 0.0500 time 0.4430 (0.4462) data time 0.0007 (0.0018) model time 0.4424 (0.4448) loss 4.6791 (4.2066) grad_norm 2.2867 (1.4965) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][370/625] eta 0:01:53 lr 0.001056 wd 0.0500 time 0.4413 (0.4462) data time 0.0008 (0.0018) model time 0.4406 (0.4447) loss 4.5456 (4.2074) grad_norm 1.8072 (1.4949) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][380/625] eta 0:01:49 lr 0.001057 wd 0.0500 time 0.4463 (0.4461) data time 0.0008 (0.0018) model time 0.4456 (0.4447) loss 4.2424 (4.2084) grad_norm 1.3816 (1.4963) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][390/625] eta 0:01:44 lr 0.001058 wd 0.0500 time 0.4386 (0.4461) data time 0.0007 (0.0017) model time 0.4379 (0.4447) loss 4.7014 (4.2125) grad_norm 1.7328 (1.4962) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][400/625] eta 0:01:40 lr 0.001059 wd 0.0500 time 0.4430 (0.4464) data time 0.0008 (0.0017) model time 0.4422 (0.4450) loss 4.1824 (4.2165) grad_norm 1.2858 (1.5026) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][410/625] eta 0:01:35 lr 0.001060 wd 0.0500 time 0.4415 (0.4463) data time 0.0007 (0.0017) model time 0.4408 (0.4449) loss 4.0133 (4.2132) grad_norm 1.3317 (1.4969) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][420/625] eta 0:01:31 lr 0.001061 wd 0.0500 time 0.4428 (0.4462) data time 0.0006 (0.0017) model time 0.4422 (0.4449) loss 5.1223 (4.2231) grad_norm 1.3413 (1.4903) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][430/625] eta 0:01:27 lr 0.001062 wd 0.0500 time 0.4431 (0.4462) data time 0.0008 (0.0017) model time 0.4423 (0.4448) loss 4.0366 (4.2164) grad_norm 1.3625 (1.4962) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][440/625] eta 0:01:22 lr 0.001062 wd 0.0500 time 0.4641 (0.4462) data time 0.0006 (0.0016) model time 0.4635 (0.4448) loss 4.5275 (4.2156) grad_norm 1.4691 (1.4958) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][450/625] eta 0:01:18 lr 0.001063 wd 0.0500 time 0.4444 (0.4461) data time 0.0008 (0.0016) model time 0.4436 (0.4448) loss 3.3848 (4.2166) grad_norm 1.4012 (1.4963) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][460/625] eta 0:01:13 lr 0.001064 wd 0.0500 time 0.4449 (0.4460) data time 0.0006 (0.0016) model time 0.4443 (0.4447) loss 3.9859 (4.2173) 
grad_norm 0.9534 (1.4903) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][470/625] eta 0:01:09 lr 0.001065 wd 0.0500 time 0.4410 (0.4460) data time 0.0008 (0.0016) model time 0.4402 (0.4446) loss 4.7338 (4.2212) grad_norm 1.5168 (1.4889) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][480/625] eta 0:01:04 lr 0.001066 wd 0.0500 time 0.4412 (0.4463) data time 0.0009 (0.0016) model time 0.4403 (0.4450) loss 3.1067 (4.2201) grad_norm 1.3334 (1.4856) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][490/625] eta 0:01:00 lr 0.001067 wd 0.0500 time 0.4434 (0.4463) data time 0.0008 (0.0015) model time 0.4426 (0.4450) loss 4.0474 (4.2176) grad_norm 2.0048 (1.4864) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][500/625] eta 0:00:55 lr 0.001068 wd 0.0500 time 0.4401 (0.4462) data time 0.0007 (0.0015) model time 0.4394 (0.4449) loss 4.3911 (4.2198) grad_norm 1.5735 (1.4904) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][510/625] eta 0:00:51 lr 0.001069 wd 0.0500 time 0.4407 (0.4461) data time 0.0008 (0.0015) model time 0.4399 (0.4449) loss 2.9184 (4.2194) grad_norm 1.4104 (1.4894) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][520/625] eta 0:00:46 lr 0.001070 wd 0.0500 time 0.4424 (0.4461) data time 0.0007 (0.0015) model time 0.4417 (0.4448) loss 5.1100 (4.2239) grad_norm 1.4954 (1.4896) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][530/625] eta 0:00:42 lr 0.001071 wd 0.0500 
time 0.4413 (0.4460) data time 0.0008 (0.0015) model time 0.4405 (0.4448) loss 3.9844 (4.2233) grad_norm 1.1192 (1.4857) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][540/625] eta 0:00:37 lr 0.001072 wd 0.0500 time 0.4422 (0.4460) data time 0.0007 (0.0015) model time 0.4415 (0.4447) loss 5.0927 (4.2239) grad_norm 1.0890 (1.4851) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][550/625] eta 0:00:33 lr 0.001073 wd 0.0500 time 0.4451 (0.4459) data time 0.0009 (0.0015) model time 0.4442 (0.4447) loss 4.3559 (4.2175) grad_norm 1.1200 (1.4871) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][560/625] eta 0:00:28 lr 0.001074 wd 0.0500 time 0.4374 (0.4459) data time 0.0006 (0.0015) model time 0.4368 (0.4446) loss 3.1694 (4.2165) grad_norm 1.3437 (1.4840) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][570/625] eta 0:00:24 lr 0.001075 wd 0.0500 time 0.4459 (0.4458) data time 0.0006 (0.0014) model time 0.4453 (0.4446) loss 4.8019 (4.2135) grad_norm 1.2254 (1.4842) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][580/625] eta 0:00:20 lr 0.001076 wd 0.0500 time 0.4407 (0.4458) data time 0.0009 (0.0014) model time 0.4398 (0.4446) loss 3.5610 (4.2133) grad_norm 2.7867 (1.4877) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][590/625] eta 0:00:15 lr 0.001077 wd 0.0500 time 0.4439 (0.4459) data time 0.0007 (0.0014) model time 0.4432 (0.4447) loss 3.5003 (4.2139) grad_norm 1.8008 (1.4898) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:10 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][600/625] eta 0:00:11 lr 0.001078 wd 0.0500 time 0.4445 (0.4458) data time 0.0007 (0.0014) model time 0.4438 (0.4446) loss 4.9087 (4.2109) grad_norm 1.1806 (1.4890) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][610/625] eta 0:00:06 lr 0.001079 wd 0.0500 time 0.4419 (0.4458) data time 0.0006 (0.0014) model time 0.4414 (0.4446) loss 3.7668 (4.2148) grad_norm 2.0996 (1.4898) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [17/300][620/625] eta 0:00:02 lr 0.001080 wd 0.0500 time 0.4402 (0.4460) data time 0.0006 (0.0014) model time 0.4396 (0.4449) loss 4.2604 (4.2109) grad_norm 1.6929 (1.4878) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:21 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 17 training takes 0:04:38 [2024-08-04 11:05:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:05:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:05:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.9580 (0.9580) Acc@1 77.637 (77.637) Acc@5 93.994 (93.994) Mem 16696MB
[2024-08-04 11:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.6904 (1.1764) Acc@1 60.693 (71.365) Acc@5 84.814 (91.868) Mem 16696MB
[2024-08-04 11:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.8359 (1.4420) Acc@1 58.643 (66.088) Acc@5 82.324 (87.951) Mem 16696MB
[2024-08-04 11:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 66.005 Acc@5 87.908
[2024-08-04 11:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 66.0%
[2024-08-04 11:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 66.00%
[2024-08-04 11:05:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:05:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 2.6660 (2.6660) Acc@1 45.996 (45.996) Acc@5 69.922 (69.922) Mem 16696MB
[2024-08-04 11:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 3.3242 (2.8345) Acc@1 33.057 (39.719) Acc@5 58.643 (66.726) Mem 16696MB
[2024-08-04 11:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 3.3945 (3.0204) Acc@1 30.078 (37.400) Acc@5 56.348 (63.367) Mem 16696MB
[2024-08-04 11:05:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 38.046 Acc@5 64.161
[2024-08-04 11:05:31 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 38.0%
[2024-08-04 11:05:31 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 38.05%
[2024-08-04 11:05:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:05:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][0/625] eta 0:07:38 lr 0.001080 wd 0.0500 time 0.7336 (0.7336) data time 0.3427 (0.3427) model time 0.0000 (0.0000) loss 4.8025 (4.8025) grad_norm 1.8283 (1.8283) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][10/625] eta 0:04:48 lr 0.001081 wd 0.0500 time 0.4450 (0.4689) data time 0.0006 (0.0319) model time 0.0000 (0.0000) loss 3.3801 (4.3212) grad_norm 1.9414 (1.7970) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][20/625] eta 0:04:36 lr 0.001082 wd 0.0500 time 0.4381 (0.4569) data time 0.0009 (0.0171) model time 0.0000 (0.0000) loss 4.2793 (4.2636) grad_norm 1.3471 (1.6378) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][30/625] eta 0:04:29 lr 0.001083 wd 0.0500 time 0.4425 (0.4527) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 3.7421 (4.1253) grad_norm 1.1617 (1.5749) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][40/625] eta 0:04:23 lr 0.001084 wd 0.0500 time 0.4435 (0.4508) data time 0.0007 (0.0092) model time 0.0000 (0.0000) loss 4.3750 (4.1739) grad_norm 1.1021 (1.5455) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][50/625] eta 0:04:18 lr 0.001085 wd 0.0500 time 0.4435 (0.4497) data time 0.0006 (0.0075) model time 0.0000 (0.0000) loss 4.8018 (4.1926) grad_norm 1.1376 (1.5094) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][60/625] eta 0:04:15 lr 0.001086 wd 0.0500 time 0.6374 (0.4519) data time 0.0008 (0.0064) model time 0.6367 (0.4620) loss 
4.0152 (4.1943) grad_norm 1.0798 (1.4775) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][70/625] eta 0:04:09 lr 0.001087 wd 0.0500 time 0.4425 (0.4499) data time 0.0006 (0.0056) model time 0.4418 (0.4495) loss 4.6676 (4.1929) grad_norm 1.6406 (1.4756) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][80/625] eta 0:04:04 lr 0.001088 wd 0.0500 time 0.4447 (0.4492) data time 0.0006 (0.0050) model time 0.4441 (0.4475) loss 3.6958 (4.2005) grad_norm 1.3184 (1.4881) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][90/625] eta 0:03:59 lr 0.001089 wd 0.0500 time 0.4428 (0.4484) data time 0.0006 (0.0046) model time 0.4422 (0.4459) loss 4.5968 (4.2099) grad_norm 1.3292 (1.4716) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][100/625] eta 0:03:55 lr 0.001090 wd 0.0500 time 0.4455 (0.4479) data time 0.0009 (0.0042) model time 0.4446 (0.4452) loss 4.4741 (4.2230) grad_norm 1.6016 (1.4612) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][110/625] eta 0:03:50 lr 0.001091 wd 0.0500 time 0.4403 (0.4474) data time 0.0006 (0.0039) model time 0.4398 (0.4446) loss 3.8183 (4.2221) grad_norm 1.2219 (1.4527) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][120/625] eta 0:03:45 lr 0.001092 wd 0.0500 time 0.4427 (0.4471) data time 0.0006 (0.0036) model time 0.4421 (0.4443) loss 2.5805 (4.2124) grad_norm 1.2800 (1.4577) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][130/625] eta 0:03:41 lr 
0.001093 wd 0.0500 time 0.4437 (0.4468) data time 0.0008 (0.0034) model time 0.4429 (0.4441) loss 3.8780 (4.2241) grad_norm 1.2206 (1.4549) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][140/625] eta 0:03:36 lr 0.001094 wd 0.0500 time 0.4452 (0.4467) data time 0.0011 (0.0032) model time 0.4441 (0.4442) loss 4.5446 (4.2172) grad_norm 1.8297 (1.4625) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][150/625] eta 0:03:32 lr 0.001095 wd 0.0500 time 0.4502 (0.4466) data time 0.0007 (0.0031) model time 0.4494 (0.4442) loss 4.3281 (4.2157) grad_norm 1.7141 (1.4724) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][160/625] eta 0:03:27 lr 0.001096 wd 0.0500 time 0.4447 (0.4465) data time 0.0008 (0.0029) model time 0.4439 (0.4442) loss 4.2290 (4.1946) grad_norm 1.2864 (1.4870) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][170/625] eta 0:03:23 lr 0.001096 wd 0.0500 time 0.4409 (0.4462) data time 0.0008 (0.0028) model time 0.4401 (0.4440) loss 4.8958 (4.2239) grad_norm 1.3862 (1.4808) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][180/625] eta 0:03:18 lr 0.001097 wd 0.0500 time 0.4519 (0.4461) data time 0.0006 (0.0027) model time 0.4513 (0.4438) loss 4.4125 (4.2087) grad_norm 1.1094 (1.4802) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][190/625] eta 0:03:14 lr 0.001098 wd 0.0500 time 0.4408 (0.4468) data time 0.0006 (0.0026) model time 0.4402 (0.4450) loss 3.8765 (4.2120) grad_norm 1.1105 (1.4791) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
11:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][200/625] eta 0:03:09 lr 0.001099 wd 0.0500 time 0.4422 (0.4466) data time 0.0006 (0.0025) model time 0.4415 (0.4448) loss 4.7927 (4.2181) grad_norm 1.1381 (1.4737) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][210/625] eta 0:03:05 lr 0.001100 wd 0.0500 time 0.4457 (0.4465) data time 0.0006 (0.0024) model time 0.4451 (0.4447) loss 4.3910 (4.2180) grad_norm 2.1983 (1.4832) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][220/625] eta 0:03:00 lr 0.001101 wd 0.0500 time 0.4396 (0.4463) data time 0.0007 (0.0023) model time 0.4389 (0.4445) loss 4.6151 (4.2230) grad_norm 1.4152 (1.4838) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][230/625] eta 0:02:56 lr 0.001102 wd 0.0500 time 0.4496 (0.4462) data time 0.0008 (0.0023) model time 0.4488 (0.4444) loss 3.6337 (4.2205) grad_norm 1.0069 (1.4860) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][240/625] eta 0:02:51 lr 0.001103 wd 0.0500 time 0.4444 (0.4461) data time 0.0009 (0.0022) model time 0.4435 (0.4443) loss 3.2561 (4.2303) grad_norm 1.7814 (1.4867) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][250/625] eta 0:02:47 lr 0.001104 wd 0.0500 time 0.4431 (0.4460) data time 0.0006 (0.0022) model time 0.4425 (0.4442) loss 4.1760 (4.2425) grad_norm 1.6242 (1.4991) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][260/625] eta 0:02:42 lr 0.001105 wd 0.0500 time 0.4438 (0.4459) data time 0.0009 (0.0021) model time 0.4430 (0.4442) loss 4.3501 
(4.2467) grad_norm 2.0826 (1.5003) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][270/625] eta 0:02:38 lr 0.001106 wd 0.0500 time 0.4421 (0.4458) data time 0.0006 (0.0021) model time 0.4414 (0.4440) loss 3.9422 (4.2405) grad_norm 1.2409 (1.5005) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][280/625] eta 0:02:33 lr 0.001107 wd 0.0500 time 0.4440 (0.4457) data time 0.0006 (0.0020) model time 0.4434 (0.4440) loss 3.0655 (4.2218) grad_norm 2.2084 (1.5145) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][290/625] eta 0:02:29 lr 0.001108 wd 0.0500 time 0.4401 (0.4456) data time 0.0009 (0.0020) model time 0.4392 (0.4439) loss 3.4803 (4.2288) grad_norm 1.5334 (1.5074) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][300/625] eta 0:02:24 lr 0.001109 wd 0.0500 time 0.4447 (0.4455) data time 0.0008 (0.0019) model time 0.4438 (0.4438) loss 4.1382 (4.2343) grad_norm 2.4732 (1.5082) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][310/625] eta 0:02:20 lr 0.001110 wd 0.0500 time 0.4387 (0.4453) data time 0.0006 (0.0019) model time 0.4381 (0.4437) loss 4.1570 (4.2263) grad_norm 1.2461 (1.5086) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][320/625] eta 0:02:15 lr 0.001111 wd 0.0500 time 0.4438 (0.4453) data time 0.0006 (0.0019) model time 0.4432 (0.4437) loss 5.1152 (4.2281) grad_norm 1.2334 (1.5041) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][330/625] eta 0:02:11 lr 0.001112 wd 
0.0500 time 0.4450 (0.4452) data time 0.0008 (0.0018) model time 0.4442 (0.4436) loss 3.3441 (4.2159) grad_norm 1.1244 (1.5055) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][340/625] eta 0:02:06 lr 0.001113 wd 0.0500 time 0.4446 (0.4452) data time 0.0010 (0.0018) model time 0.4436 (0.4436) loss 4.0688 (4.2087) grad_norm 1.3511 (1.4996) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][350/625] eta 0:02:02 lr 0.001114 wd 0.0500 time 0.4426 (0.4452) data time 0.0007 (0.0018) model time 0.4419 (0.4436) loss 3.2196 (4.2048) grad_norm 1.4619 (1.5002) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][360/625] eta 0:01:57 lr 0.001115 wd 0.0500 time 0.4429 (0.4451) data time 0.0007 (0.0017) model time 0.4422 (0.4436) loss 5.2411 (4.2100) grad_norm 1.4152 (1.5027) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][370/625] eta 0:01:53 lr 0.001116 wd 0.0500 time 0.4438 (0.4451) data time 0.0007 (0.0017) model time 0.4431 (0.4435) loss 2.9648 (4.2029) grad_norm 0.9538 (1.4996) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][380/625] eta 0:01:49 lr 0.001117 wd 0.0500 time 0.4410 (0.4450) data time 0.0008 (0.0017) model time 0.4402 (0.4435) loss 4.2485 (4.1958) grad_norm 1.4731 (1.4933) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][390/625] eta 0:01:44 lr 0.001118 wd 0.0500 time 0.4401 (0.4449) data time 0.0008 (0.0017) model time 0.4394 (0.4434) loss 4.4006 (4.1968) grad_norm 1.3562 (1.4991) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:31 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][400/625] eta 0:01:40 lr 0.001119 wd 0.0500 time 0.4426 (0.4452) data time 0.0008 (0.0016) model time 0.4418 (0.4438) loss 3.4636 (4.1900) grad_norm 2.0712 (1.5038) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][410/625] eta 0:01:35 lr 0.001119 wd 0.0500 time 0.4475 (0.4457) data time 0.0008 (0.0016) model time 0.4468 (0.4444) loss 4.7648 (4.1956) grad_norm 1.5106 (1.5006) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][420/625] eta 0:01:31 lr 0.001120 wd 0.0500 time 0.4454 (0.4457) data time 0.0006 (0.0016) model time 0.4448 (0.4443) loss 4.8398 (4.1945) grad_norm 1.1659 (1.5034) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][430/625] eta 0:01:26 lr 0.001121 wd 0.0500 time 0.4404 (0.4456) data time 0.0008 (0.0016) model time 0.4396 (0.4443) loss 4.5304 (4.1965) grad_norm 1.3265 (1.5036) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][440/625] eta 0:01:22 lr 0.001122 wd 0.0500 time 0.4439 (0.4456) data time 0.0006 (0.0016) model time 0.4432 (0.4442) loss 4.2996 (4.1947) grad_norm 1.5603 (1.5035) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][450/625] eta 0:01:17 lr 0.001123 wd 0.0500 time 0.4445 (0.4455) data time 0.0008 (0.0015) model time 0.4436 (0.4442) loss 4.1142 (4.1980) grad_norm 1.2595 (1.5053) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][460/625] eta 0:01:13 lr 0.001124 wd 0.0500 time 0.4442 (0.4455) data time 0.0007 (0.0015) model time 0.4435 (0.4441) loss 4.4350 (4.1971) 
grad_norm 1.2088 (1.5063) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][470/625] eta 0:01:09 lr 0.001125 wd 0.0500 time 0.4402 (0.4454) data time 0.0009 (0.0015) model time 0.4393 (0.4441) loss 3.9356 (4.1877) grad_norm 2.1533 (1.5038) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][480/625] eta 0:01:04 lr 0.001126 wd 0.0500 time 0.4428 (0.4454) data time 0.0009 (0.0015) model time 0.4418 (0.4441) loss 3.8961 (4.1832) grad_norm 1.2484 (1.5050) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][490/625] eta 0:01:00 lr 0.001127 wd 0.0500 time 0.4436 (0.4453) data time 0.0007 (0.0015) model time 0.4429 (0.4440) loss 3.7206 (4.1809) grad_norm 1.5403 (1.5052) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][500/625] eta 0:00:55 lr 0.001128 wd 0.0500 time 0.4420 (0.4453) data time 0.0008 (0.0015) model time 0.4412 (0.4440) loss 3.9320 (4.1783) grad_norm 1.5101 (1.5069) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][510/625] eta 0:00:51 lr 0.001129 wd 0.0500 time 0.4441 (0.4453) data time 0.0008 (0.0015) model time 0.4433 (0.4440) loss 4.2736 (4.1768) grad_norm 1.4507 (1.5083) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][520/625] eta 0:00:46 lr 0.001130 wd 0.0500 time 0.4439 (0.4453) data time 0.0006 (0.0014) model time 0.4433 (0.4440) loss 3.9498 (4.1745) grad_norm 1.3465 (1.5061) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][530/625] eta 0:00:42 lr 0.001131 wd 0.0500 
time 0.4413 (0.4453) data time 0.0007 (0.0014) model time 0.4406 (0.4440) loss 4.1302 (4.1780) grad_norm 1.6795 (1.5068) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][540/625] eta 0:00:37 lr 0.001132 wd 0.0500 time 0.4404 (0.4453) data time 0.0008 (0.0014) model time 0.4397 (0.4440) loss 4.3988 (4.1766) grad_norm 1.2121 (1.5026) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][550/625] eta 0:00:33 lr 0.001133 wd 0.0500 time 0.6446 (0.4457) data time 0.0008 (0.0014) model time 0.6438 (0.4445) loss 4.3550 (4.1741) grad_norm 1.2840 (1.4989) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][560/625] eta 0:00:28 lr 0.001134 wd 0.0500 time 0.4452 (0.4457) data time 0.0006 (0.0014) model time 0.4446 (0.4445) loss 5.0255 (4.1813) grad_norm 1.4262 (1.4973) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][570/625] eta 0:00:24 lr 0.001135 wd 0.0500 time 0.4455 (0.4457) data time 0.0007 (0.0014) model time 0.4447 (0.4445) loss 4.2938 (4.1801) grad_norm 2.2620 (1.4987) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][580/625] eta 0:00:20 lr 0.001136 wd 0.0500 time 0.4391 (0.4456) data time 0.0008 (0.0014) model time 0.4383 (0.4444) loss 4.4159 (4.1767) grad_norm 1.6469 (1.4966) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][590/625] eta 0:00:15 lr 0.001137 wd 0.0500 time 0.4439 (0.4457) data time 0.0006 (0.0014) model time 0.4433 (0.4445) loss 4.4688 (4.1748) grad_norm 1.1100 (1.4970) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:00 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][600/625] eta 0:00:11 lr 0.001138 wd 0.0500 time 0.4442 (0.4456) data time 0.0007 (0.0014) model time 0.4435 (0.4445) loss 4.6269 (4.1787) grad_norm 1.2773 (1.4943) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][610/625] eta 0:00:06 lr 0.001139 wd 0.0500 time 0.4377 (0.4456) data time 0.0004 (0.0014) model time 0.4373 (0.4444) loss 4.5508 (4.1753) grad_norm 1.3523 (1.4935) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [18/300][620/625] eta 0:00:02 lr 0.001140 wd 0.0500 time 0.4396 (0.4455) data time 0.0004 (0.0013) model time 0.4392 (0.4443) loss 4.8945 (4.1743) grad_norm 1.0889 (1.4920) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:11 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 18 training takes 0:04:38 [2024-08-04 11:10:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:10:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:10:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.9487 (0.9487) Acc@1 77.686 (77.686) Acc@5 94.482 (94.482) Mem 16696MB
[2024-08-04 11:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 1.6592 (1.1365) Acc@1 61.768 (72.328) Acc@5 85.449 (92.591) Mem 16696MB
[2024-08-04 11:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.8311 (1.4171) Acc@1 58.252 (66.750) Acc@5 82.861 (88.586) Mem 16696MB
[2024-08-04 11:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 66.685 Acc@5 88.524
[2024-08-04 11:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 66.7%
[2024-08-04 11:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 66.68%
[2024-08-04 11:10:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:10:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:10:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 2.2227 (2.2227) Acc@1 52.197 (52.197) Acc@5 76.611 (76.611) Mem 16696MB
[2024-08-04 11:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 2.9258 (2.4199) Acc@1 39.160 (46.280) Acc@5 64.404 (73.229) Mem 16696MB
[2024-08-04 11:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 3.0391 (2.6303) Acc@1 35.742 (43.455) Acc@5 61.377 (69.468) Mem 16696MB
[2024-08-04 11:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 43.976 Acc@5 70.076
[2024-08-04 11:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 44.0%
[2024-08-04 11:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 43.98%
[2024-08-04 11:10:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:10:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:10:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][0/625] eta 0:07:42 lr 0.001140 wd 0.0500 time 0.7406 (0.7406) data time 0.3471 (0.3471) model time 0.0000 (0.0000) loss 4.3105 (4.3105) grad_norm 1.4049 (1.4049) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][10/625] eta 0:04:49 lr 0.001141 wd 0.0500 time 0.4466 (0.4708) data time 0.0006 (0.0322) model time 0.0000 (0.0000) loss 3.6305 (4.0450) grad_norm 1.6716 (1.5006) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][20/625] eta 0:04:37 lr 0.001142 wd 0.0500 time 0.4426 (0.4587) data time 0.0006 (0.0172) model time 0.0000 (0.0000) loss 4.6615 (4.0548) grad_norm 1.3903 (1.5447) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][30/625] eta 0:04:30 lr 0.001143 wd 0.0500 time 0.4475 (0.4543) data time 0.0006 (0.0119) model time 0.0000 (0.0000) loss 3.8302 (4.1987) grad_norm 1.5458 (1.4989) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][40/625] eta 0:04:24 lr 0.001144 wd 0.0500 time 0.4473 (0.4517) data time 0.0007 (0.0092) model time 0.0000 (0.0000) loss 4.1871 (4.1293) grad_norm 1.4542 (1.5400) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][50/625] eta 0:04:18 lr 0.001145 wd 0.0500 time 0.4374 (0.4500) data time 0.0007 (0.0075) model time 0.0000 (0.0000) loss 4.4838 (4.1493) grad_norm 1.0867 (1.5163) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][60/625] eta 0:04:15 lr 0.001146 wd 0.0500 time 0.6536 (0.4526) data time 0.0008 (0.0064) model time 0.6528 (0.4652) loss 
3.7354 (4.1496) grad_norm 2.9239 (1.5383) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][70/625] eta 0:04:10 lr 0.001147 wd 0.0500 time 0.4433 (0.4507) data time 0.0007 (0.0056) model time 0.4426 (0.4518) loss 3.9846 (4.1299) grad_norm 1.2532 (1.5552) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:10:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][80/625] eta 0:04:05 lr 0.001148 wd 0.0500 time 0.4422 (0.4499) data time 0.0005 (0.0050) model time 0.4417 (0.4491) loss 3.0220 (4.1555) grad_norm 1.3323 (1.5259) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][90/625] eta 0:04:00 lr 0.001149 wd 0.0500 time 0.4462 (0.4494) data time 0.0007 (0.0046) model time 0.4455 (0.4478) loss 4.4108 (4.1564) grad_norm 1.9007 (1.5315) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][100/625] eta 0:03:55 lr 0.001150 wd 0.0500 time 0.4467 (0.4488) data time 0.0008 (0.0042) model time 0.4459 (0.4468) loss 4.0025 (4.1651) grad_norm 1.3940 (1.5158) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][110/625] eta 0:03:50 lr 0.001151 wd 0.0500 time 0.4423 (0.4483) data time 0.0009 (0.0039) model time 0.4414 (0.4461) loss 3.4456 (4.1331) grad_norm 1.6338 (1.5010) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][120/625] eta 0:03:47 lr 0.001152 wd 0.0500 time 0.4451 (0.4498) data time 0.0006 (0.0036) model time 0.4444 (0.4489) loss 3.6848 (4.1295) grad_norm 1.3042 (1.4980) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][130/625] eta 0:03:42 lr 
0.001153 wd 0.0500 time 0.4447 (0.4493) data time 0.0009 (0.0034) model time 0.4439 (0.4481) loss 4.3282 (4.1557) grad_norm 1.1433 (1.5309) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][140/625] eta 0:03:37 lr 0.001154 wd 0.0500 time 0.4418 (0.4491) data time 0.0010 (0.0032) model time 0.4408 (0.4478) loss 3.9622 (4.1392) grad_norm 1.4361 (1.5217) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][150/625] eta 0:03:33 lr 0.001154 wd 0.0500 time 0.4456 (0.4487) data time 0.0006 (0.0031) model time 0.4450 (0.4473) loss 4.8731 (4.1297) grad_norm 1.3582 (1.5110) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][160/625] eta 0:03:28 lr 0.001155 wd 0.0500 time 0.4449 (0.4484) data time 0.0006 (0.0029) model time 0.4443 (0.4469) loss 4.9072 (4.1336) grad_norm 1.1270 (1.4949) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][170/625] eta 0:03:23 lr 0.001156 wd 0.0500 time 0.4425 (0.4482) data time 0.0009 (0.0028) model time 0.4416 (0.4466) loss 4.4940 (4.1436) grad_norm 1.8474 (1.4873) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][180/625] eta 0:03:19 lr 0.001157 wd 0.0500 time 0.4452 (0.4479) data time 0.0006 (0.0027) model time 0.4446 (0.4463) loss 4.0475 (4.1458) grad_norm 1.4111 (1.4757) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][190/625] eta 0:03:14 lr 0.001158 wd 0.0500 time 0.4447 (0.4477) data time 0.0006 (0.0026) model time 0.4441 (0.4461) loss 4.5121 (4.1513) grad_norm 1.5355 (1.4779) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
11:11:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][200/625] eta 0:03:10 lr 0.001159 wd 0.0500 time 0.4454 (0.4476) data time 0.0010 (0.0025) model time 0.4445 (0.4459) loss 4.3728 (4.1588) grad_norm 1.2680 (1.4829) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][210/625] eta 0:03:05 lr 0.001160 wd 0.0500 time 0.4464 (0.4474) data time 0.0009 (0.0024) model time 0.4455 (0.4458) loss 3.2927 (4.1473) grad_norm 1.3601 (1.4810) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][220/625] eta 0:03:01 lr 0.001161 wd 0.0500 time 0.4403 (0.4473) data time 0.0006 (0.0023) model time 0.4397 (0.4458) loss 3.0902 (4.1307) grad_norm 2.0218 (1.4768) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][230/625] eta 0:02:56 lr 0.001162 wd 0.0500 time 0.4458 (0.4472) data time 0.0007 (0.0023) model time 0.4451 (0.4456) loss 3.7987 (4.1155) grad_norm 1.0036 (1.4789) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][240/625] eta 0:02:52 lr 0.001163 wd 0.0500 time 0.4405 (0.4470) data time 0.0008 (0.0022) model time 0.4397 (0.4455) loss 4.3318 (4.1242) grad_norm 1.1704 (1.4694) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][250/625] eta 0:02:47 lr 0.001164 wd 0.0500 time 0.4447 (0.4470) data time 0.0007 (0.0022) model time 0.4440 (0.4454) loss 3.7655 (4.1134) grad_norm 0.9971 (1.4558) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][260/625] eta 0:02:43 lr 0.001165 wd 0.0500 time 0.4451 (0.4469) data time 0.0008 (0.0021) model time 0.4443 (0.4454) loss 2.5947 
(4.1049) grad_norm 1.8273 (1.4498) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][270/625] eta 0:02:38 lr 0.001166 wd 0.0500 time 0.4465 (0.4468) data time 0.0006 (0.0021) model time 0.4459 (0.4453) loss 3.2899 (4.0986) grad_norm 1.3110 (1.4529) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][280/625] eta 0:02:34 lr 0.001167 wd 0.0500 time 0.4437 (0.4468) data time 0.0007 (0.0020) model time 0.4431 (0.4453) loss 3.1119 (4.0941) grad_norm 1.5366 (1.4571) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][290/625] eta 0:02:29 lr 0.001168 wd 0.0500 time 0.4449 (0.4468) data time 0.0008 (0.0020) model time 0.4441 (0.4453) loss 3.1665 (4.1019) grad_norm 2.3602 (1.4576) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][300/625] eta 0:02:25 lr 0.001169 wd 0.0500 time 0.4429 (0.4467) data time 0.0006 (0.0019) model time 0.4423 (0.4452) loss 3.7534 (4.1026) grad_norm 2.7024 (1.4650) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][310/625] eta 0:02:20 lr 0.001170 wd 0.0500 time 0.4395 (0.4466) data time 0.0008 (0.0019) model time 0.4387 (0.4451) loss 3.9347 (4.0910) grad_norm 1.0002 (1.4704) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][320/625] eta 0:02:16 lr 0.001171 wd 0.0500 time 0.4413 (0.4465) data time 0.0007 (0.0019) model time 0.4406 (0.4450) loss 4.9315 (4.0850) grad_norm 2.8931 (1.4762) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][330/625] eta 0:02:11 lr 0.001172 wd 
0.0500 time 0.4437 (0.4464) data time 0.0008 (0.0018) model time 0.4429 (0.4449) loss 3.6442 (4.0757) grad_norm 1.2711 (1.4696) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][340/625] eta 0:02:07 lr 0.001173 wd 0.0500 time 0.4476 (0.4463) data time 0.0006 (0.0018) model time 0.4470 (0.4449) loss 3.7050 (4.0708) grad_norm 1.3172 (1.4622) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][350/625] eta 0:02:02 lr 0.001174 wd 0.0500 time 0.4506 (0.4463) data time 0.0006 (0.0018) model time 0.4501 (0.4448) loss 3.4072 (4.0646) grad_norm 1.6339 (1.4643) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][360/625] eta 0:01:58 lr 0.001175 wd 0.0500 time 0.4435 (0.4462) data time 0.0006 (0.0017) model time 0.4429 (0.4447) loss 3.3813 (4.0717) grad_norm 1.3676 (1.4643) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][370/625] eta 0:01:53 lr 0.001176 wd 0.0500 time 0.4392 (0.4460) data time 0.0006 (0.0017) model time 0.4386 (0.4446) loss 3.8534 (4.0713) grad_norm 2.3163 (1.4726) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][380/625] eta 0:01:49 lr 0.001177 wd 0.0500 time 0.4423 (0.4460) data time 0.0008 (0.0017) model time 0.4415 (0.4446) loss 4.2653 (4.0754) grad_norm 1.1143 (1.4672) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][390/625] eta 0:01:44 lr 0.001177 wd 0.0500 time 0.4435 (0.4459) data time 0.0008 (0.0017) model time 0.4427 (0.4445) loss 4.1403 (4.0846) grad_norm 2.0322 (1.4684) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:21 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][400/625] eta 0:01:40 lr 0.001178 wd 0.0500 time 0.4437 (0.4462) data time 0.0007 (0.0016) model time 0.4430 (0.4449) loss 3.2079 (4.0766) grad_norm 1.3865 (1.4736) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][410/625] eta 0:01:35 lr 0.001179 wd 0.0500 time 0.4441 (0.4462) data time 0.0008 (0.0016) model time 0.4433 (0.4448) loss 5.1761 (4.0858) grad_norm 1.4687 (1.4743) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][420/625] eta 0:01:31 lr 0.001180 wd 0.0500 time 0.4413 (0.4461) data time 0.0006 (0.0016) model time 0.4407 (0.4448) loss 3.0757 (4.0837) grad_norm 1.7688 (1.4762) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][430/625] eta 0:01:26 lr 0.001181 wd 0.0500 time 0.4434 (0.4461) data time 0.0006 (0.0016) model time 0.4428 (0.4448) loss 3.8164 (4.0891) grad_norm 0.9689 (1.4711) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][440/625] eta 0:01:22 lr 0.001182 wd 0.0500 time 0.4431 (0.4460) data time 0.0007 (0.0016) model time 0.4424 (0.4447) loss 4.7524 (4.0923) grad_norm 1.2726 (1.4667) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][450/625] eta 0:01:18 lr 0.001183 wd 0.0500 time 0.4440 (0.4465) data time 0.0008 (0.0016) model time 0.4432 (0.4452) loss 4.2383 (4.0934) grad_norm 1.8828 (1.4636) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][460/625] eta 0:01:13 lr 0.001184 wd 0.0500 time 0.4444 (0.4464) data time 0.0009 (0.0015) model time 0.4434 (0.4452) loss 4.4182 (4.0865) 
grad_norm 1.6473 (1.4660) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][470/625] eta 0:01:09 lr 0.001185 wd 0.0500 time 0.4445 (0.4464) data time 0.0009 (0.0015) model time 0.4436 (0.4451) loss 4.5528 (4.0940) grad_norm 1.5645 (1.4637) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][480/625] eta 0:01:04 lr 0.001186 wd 0.0500 time 0.4412 (0.4463) data time 0.0008 (0.0015) model time 0.4404 (0.4451) loss 4.1448 (4.0891) grad_norm 1.4625 (1.4664) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][490/625] eta 0:01:00 lr 0.001187 wd 0.0500 time 0.4448 (0.4463) data time 0.0009 (0.0015) model time 0.4440 (0.4451) loss 3.3698 (4.0834) grad_norm 1.6306 (1.4660) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][500/625] eta 0:00:55 lr 0.001188 wd 0.0500 time 0.4427 (0.4462) data time 0.0007 (0.0015) model time 0.4419 (0.4450) loss 4.6243 (4.0777) grad_norm 1.4449 (1.4677) loss_scale 32768.0000 (16547.5130) mem 16696MB [2024-08-04 11:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][510/625] eta 0:00:51 lr 0.001189 wd 0.0500 time 0.4472 (0.4462) data time 0.0009 (0.0015) model time 0.4463 (0.4450) loss 4.9105 (4.0797) grad_norm 1.5119 (1.4688) loss_scale 32768.0000 (16864.9393) mem 16696MB [2024-08-04 11:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][520/625] eta 0:00:46 lr 0.001190 wd 0.0500 time 0.4458 (0.4462) data time 0.0008 (0.0015) model time 0.4450 (0.4450) loss 4.3712 (4.0767) grad_norm 2.1449 (1.4698) loss_scale 32768.0000 (17170.1804) mem 16696MB [2024-08-04 11:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][530/625] eta 0:00:42 lr 0.001191 wd 0.0500 
time 0.4450 (0.4462) data time 0.0009 (0.0014) model time 0.4441 (0.4450) loss 4.4434 (4.0744) grad_norm 1.3468 (1.4703) loss_scale 32768.0000 (17463.9247) mem 16696MB [2024-08-04 11:14:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][540/625] eta 0:00:37 lr 0.001192 wd 0.0500 time 0.4456 (0.4462) data time 0.0006 (0.0014) model time 0.4450 (0.4450) loss 3.0599 (4.0732) grad_norm 1.4619 (1.4656) loss_scale 32768.0000 (17746.8096) mem 16696MB [2024-08-04 11:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][550/625] eta 0:00:33 lr 0.001193 wd 0.0500 time 0.4501 (0.4462) data time 0.0007 (0.0014) model time 0.4495 (0.4450) loss 4.0413 (4.0734) grad_norm 1.0004 (1.4710) loss_scale 32768.0000 (18019.4265) mem 16696MB [2024-08-04 11:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][560/625] eta 0:00:29 lr 0.001194 wd 0.0500 time 0.4475 (0.4462) data time 0.0009 (0.0014) model time 0.4466 (0.4450) loss 4.4901 (4.0684) grad_norm 1.5725 (1.4711) loss_scale 32768.0000 (18282.3244) mem 16696MB [2024-08-04 11:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][570/625] eta 0:00:24 lr 0.001195 wd 0.0500 time 0.4422 (0.4461) data time 0.0008 (0.0014) model time 0.4413 (0.4450) loss 3.8730 (4.0658) grad_norm 1.2686 (inf) loss_scale 16384.0000 (18249.0788) mem 16696MB [2024-08-04 11:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][580/625] eta 0:00:20 lr 0.001196 wd 0.0500 time 0.4429 (0.4461) data time 0.0008 (0.0014) model time 0.4420 (0.4450) loss 4.3647 (4.0627) grad_norm 1.7788 (inf) loss_scale 16384.0000 (18216.9776) mem 16696MB [2024-08-04 11:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][590/625] eta 0:00:15 lr 0.001197 wd 0.0500 time 0.4477 (0.4462) data time 0.0009 (0.0014) model time 0.4468 (0.4450) loss 4.5968 (4.0668) grad_norm 1.2236 (inf) loss_scale 16384.0000 (18185.9628) mem 16696MB [2024-08-04 11:14:50 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [19/300][600/625] eta 0:00:11 lr 0.001198 wd 0.0500 time 0.4448 (0.4462) data time 0.0006 (0.0014) model time 0.4442 (0.4450) loss 4.6556 (4.0670) grad_norm 1.1822 (inf) loss_scale 16384.0000 (18155.9800) mem 16696MB [2024-08-04 11:14:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][610/625] eta 0:00:06 lr 0.001199 wd 0.0500 time 0.4424 (0.4462) data time 0.0006 (0.0014) model time 0.4419 (0.4450) loss 4.6572 (4.0656) grad_norm 1.4740 (inf) loss_scale 16384.0000 (18126.9787) mem 16696MB [2024-08-04 11:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [19/300][620/625] eta 0:00:02 lr 0.001200 wd 0.0500 time 0.4417 (0.4460) data time 0.0004 (0.0014) model time 0.4412 (0.4449) loss 4.9862 (4.0685) grad_norm 1.6905 (inf) loss_scale 16384.0000 (18098.9114) mem 16696MB [2024-08-04 11:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 19 training takes 0:04:38 [2024-08-04 11:15:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:15:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:15:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.9102 (0.9102) Acc@1 78.760 (78.760) Acc@5 94.922 (94.922) Mem 16696MB
[2024-08-04 11:15:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.154) Loss 1.6182 (1.1487) Acc@1 63.965 (72.745) Acc@5 87.256 (92.805) Mem 16696MB
[2024-08-04 11:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.136) Loss 1.7949 (1.4121) Acc@1 60.107 (67.569) Acc@5 83.984 (89.058) Mem 16696MB
[2024-08-04 11:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 67.462 Acc@5 89.062
[2024-08-04 11:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 67.5%
[2024-08-04 11:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 67.46%
[2024-08-04 11:15:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:15:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:15:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 1.8770 (1.8770) Acc@1 58.447 (58.447) Acc@5 81.689 (81.689) Mem 16696MB
[2024-08-04 11:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 2.6113 (2.0920) Acc@1 42.725 (51.776) Acc@5 70.459 (78.445) Mem 16696MB
[2024-08-04 11:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 2.7461 (2.3216) Acc@1 40.088 (48.521) Acc@5 65.674 (74.282) Mem 16696MB
[2024-08-04 11:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 48.850 Acc@5 74.764
[2024-08-04 11:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 48.9%
[2024-08-04 11:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 48.85%
[2024-08-04 11:15:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:15:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:15:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][0/625] eta 0:07:58 lr 0.001200 wd 0.0500 time 0.7656 (0.7656) data time 0.3786 (0.3786) model time 0.0000 (0.0000) loss 4.3843 (4.3843) grad_norm 1.2530 (1.2530) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][10/625] eta 0:04:51 lr 0.001200 wd 0.0500 time 0.4457 (0.4746) data time 0.0008 (0.0352) model time 0.0000 (0.0000) loss 4.6126 (4.1737) grad_norm 1.5556 (1.5701) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][20/625] eta 0:04:38 lr 0.001200 wd 0.0500 time 0.4478 (0.4605) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 4.2533 (4.2380) grad_norm 1.2772 (1.5398) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][30/625] eta 0:04:38 lr 0.001200 wd 0.0500 time 0.4424 (0.4683) data time 0.0007 (0.0130) model time 0.0000 (0.0000) loss 3.4971 (4.1932) grad_norm 0.9926 (1.4566) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][40/625] eta 0:04:30 lr 0.001200 wd 0.0500 time 0.4418 (0.4622) data time 0.0006 (0.0100) model time 0.0000 (0.0000) loss 4.6540 (4.1517) grad_norm 1.4138 (1.4684) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][50/625] eta 0:04:23 lr 0.001200 wd 0.0500 time 0.4424 (0.4586) data time 0.0007 (0.0082) model time 0.0000 (0.0000) loss 4.9784 (4.1711) grad_norm 1.3218 (1.4536) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][60/625] eta 0:04:19 lr 0.001200 wd 0.0500 time 0.6672 (0.4597) data time 0.0010 (0.0070) model time 0.6662 (0.4649) loss 
3.3229 (4.1213) grad_norm 1.3520 (1.4580) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][70/625] eta 0:04:13 lr 0.001200 wd 0.0500 time 0.4426 (0.4566) data time 0.0009 (0.0061) model time 0.4417 (0.4507) loss 4.7795 (4.0836) grad_norm 1.3615 (1.4470) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][80/625] eta 0:04:07 lr 0.001200 wd 0.0500 time 0.4459 (0.4550) data time 0.0006 (0.0055) model time 0.4453 (0.4482) loss 3.2373 (4.1143) grad_norm 1.7054 (1.4430) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][90/625] eta 0:04:02 lr 0.001200 wd 0.0500 time 0.4429 (0.4538) data time 0.0009 (0.0050) model time 0.4420 (0.4469) loss 4.0539 (4.1269) grad_norm 1.8853 (1.4642) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][100/625] eta 0:03:57 lr 0.001200 wd 0.0500 time 0.4422 (0.4529) data time 0.0007 (0.0046) model time 0.4415 (0.4463) loss 4.7217 (4.1225) grad_norm 1.5651 (1.4505) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][110/625] eta 0:03:52 lr 0.001200 wd 0.0500 time 0.4428 (0.4520) data time 0.0007 (0.0042) model time 0.4422 (0.4457) loss 3.7437 (4.1147) grad_norm 2.5430 (1.4542) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][120/625] eta 0:03:47 lr 0.001200 wd 0.0500 time 0.4424 (0.4514) data time 0.0008 (0.0039) model time 0.4416 (0.4454) loss 3.0791 (4.1095) grad_norm 1.4446 (1.4472) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][130/625] eta 0:03:43 lr 
0.001200 wd 0.0500 time 0.4425 (0.4509) data time 0.0007 (0.0037) model time 0.4418 (0.4452) loss 4.3971 (4.1085) grad_norm 0.9019 (1.4608) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][140/625] eta 0:03:38 lr 0.001200 wd 0.0500 time 0.4473 (0.4504) data time 0.0009 (0.0035) model time 0.4464 (0.4449) loss 4.4057 (4.0921) grad_norm 1.1979 (1.4520) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][150/625] eta 0:03:33 lr 0.001200 wd 0.0500 time 0.4379 (0.4499) data time 0.0008 (0.0033) model time 0.4371 (0.4447) loss 4.0395 (4.0908) grad_norm 1.8989 (1.4440) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][160/625] eta 0:03:29 lr 0.001200 wd 0.0500 time 0.4445 (0.4496) data time 0.0008 (0.0032) model time 0.4437 (0.4446) loss 4.1597 (4.0934) grad_norm 1.6521 (1.4451) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][170/625] eta 0:03:24 lr 0.001200 wd 0.0500 time 0.4480 (0.4493) data time 0.0008 (0.0030) model time 0.4473 (0.4446) loss 4.0725 (4.0959) grad_norm 1.3679 (1.4366) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][180/625] eta 0:03:19 lr 0.001200 wd 0.0500 time 0.4426 (0.4491) data time 0.0006 (0.0029) model time 0.4420 (0.4446) loss 4.3030 (4.1042) grad_norm 1.5730 (1.4284) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][190/625] eta 0:03:15 lr 0.001200 wd 0.0500 time 0.4415 (0.4488) data time 0.0006 (0.0028) model time 0.4409 (0.4445) loss 4.4807 (4.1065) grad_norm 1.1313 (1.4346) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
11:16:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][200/625] eta 0:03:10 lr 0.001200 wd 0.0500 time 0.4440 (0.4486) data time 0.0008 (0.0027) model time 0.4432 (0.4444) loss 4.2414 (4.1095) grad_norm 2.2531 (1.4410) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][210/625] eta 0:03:06 lr 0.001200 wd 0.0500 time 0.4436 (0.4485) data time 0.0009 (0.0026) model time 0.4427 (0.4445) loss 3.9306 (4.1025) grad_norm 1.3909 (1.4450) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][220/625] eta 0:03:02 lr 0.001200 wd 0.0500 time 0.4416 (0.4499) data time 0.0006 (0.0025) model time 0.4410 (0.4466) loss 5.1934 (4.1116) grad_norm 1.1864 (1.4449) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:16:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][230/625] eta 0:02:57 lr 0.001200 wd 0.0500 time 0.4417 (0.4497) data time 0.0006 (0.0024) model time 0.4411 (0.4463) loss 3.4563 (4.1159) grad_norm 1.5645 (1.4446) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][240/625] eta 0:02:53 lr 0.001200 wd 0.0500 time 0.4450 (0.4494) data time 0.0006 (0.0024) model time 0.4444 (0.4462) loss 4.8638 (4.1122) grad_norm 1.4481 (1.4497) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][250/625] eta 0:02:48 lr 0.001200 wd 0.0500 time 0.4464 (0.4492) data time 0.0006 (0.0023) model time 0.4458 (0.4460) loss 3.5430 (4.0928) grad_norm 1.2683 (1.4473) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][260/625] eta 0:02:43 lr 0.001200 wd 0.0500 time 0.4419 (0.4491) data time 0.0006 (0.0022) model time 0.4413 (0.4460) loss 3.3898 
(4.0973) grad_norm 1.4482 (1.4468) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][270/625] eta 0:02:39 lr 0.001200 wd 0.0500 time 0.4452 (0.4489) data time 0.0006 (0.0022) model time 0.4446 (0.4458) loss 5.1549 (4.1135) grad_norm 1.6340 (1.4454) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][280/625] eta 0:02:34 lr 0.001200 wd 0.0500 time 0.4466 (0.4487) data time 0.0008 (0.0021) model time 0.4458 (0.4457) loss 4.2218 (4.1045) grad_norm 2.2331 (1.4488) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][290/625] eta 0:02:30 lr 0.001200 wd 0.0500 time 0.4415 (0.4486) data time 0.0009 (0.0021) model time 0.4406 (0.4456) loss 3.6027 (4.0934) grad_norm 1.8723 (1.4552) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][300/625] eta 0:02:25 lr 0.001200 wd 0.0500 time 0.4414 (0.4484) data time 0.0008 (0.0021) model time 0.4406 (0.4455) loss 3.9909 (4.0901) grad_norm 2.1613 (1.4561) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][310/625] eta 0:02:21 lr 0.001200 wd 0.0500 time 0.4443 (0.4483) data time 0.0008 (0.0020) model time 0.4435 (0.4455) loss 4.4688 (4.0863) grad_norm 1.6822 (1.4554) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][320/625] eta 0:02:16 lr 0.001200 wd 0.0500 time 0.4398 (0.4482) data time 0.0006 (0.0020) model time 0.4392 (0.4454) loss 4.4835 (4.0889) grad_norm 1.6457 (1.4549) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][330/625] eta 0:02:12 lr 0.001200 wd 
0.0500 time 0.4416 (0.4481) data time 0.0006 (0.0019) model time 0.4410 (0.4454) loss 3.9495 (4.0896) grad_norm 1.3481 (1.4524) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][340/625] eta 0:02:07 lr 0.001200 wd 0.0500 time 0.4396 (0.4479) data time 0.0006 (0.0019) model time 0.4390 (0.4453) loss 3.6051 (4.0896) grad_norm 1.1680 (1.4521) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][350/625] eta 0:02:03 lr 0.001200 wd 0.0500 time 0.4402 (0.4478) data time 0.0007 (0.0019) model time 0.4395 (0.4452) loss 3.9966 (4.0932) grad_norm 1.6418 (1.4583) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][360/625] eta 0:01:58 lr 0.001200 wd 0.0500 time 0.4455 (0.4477) data time 0.0006 (0.0018) model time 0.4449 (0.4452) loss 3.3057 (4.0854) grad_norm 1.0386 (1.4562) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][370/625] eta 0:01:54 lr 0.001200 wd 0.0500 time 0.4464 (0.4476) data time 0.0008 (0.0018) model time 0.4456 (0.4451) loss 4.3788 (4.0880) grad_norm 1.4007 (1.4541) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][380/625] eta 0:01:49 lr 0.001200 wd 0.0500 time 0.4405 (0.4476) data time 0.0008 (0.0018) model time 0.4397 (0.4451) loss 4.7800 (4.0856) grad_norm 1.5523 (1.4577) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][390/625] eta 0:01:45 lr 0.001200 wd 0.0500 time 0.4449 (0.4475) data time 0.0008 (0.0018) model time 0.4441 (0.4450) loss 4.7247 (4.0829) grad_norm 1.0281 (1.4629) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:11 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][400/625] eta 0:01:40 lr 0.001200 wd 0.0500 time 0.4406 (0.4477) data time 0.0007 (0.0017) model time 0.4399 (0.4454) loss 4.7293 (4.0847) grad_norm 1.4770 (1.4624) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][410/625] eta 0:01:36 lr 0.001200 wd 0.0500 time 0.4461 (0.4476) data time 0.0007 (0.0017) model time 0.4454 (0.4453) loss 3.3529 (4.0906) grad_norm 1.2104 (1.4581) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][420/625] eta 0:01:31 lr 0.001200 wd 0.0500 time 0.4469 (0.4475) data time 0.0006 (0.0017) model time 0.4464 (0.4452) loss 4.2917 (4.0978) grad_norm 1.7281 (1.4628) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][430/625] eta 0:01:27 lr 0.001200 wd 0.0500 time 0.4420 (0.4475) data time 0.0009 (0.0017) model time 0.4411 (0.4452) loss 3.9032 (4.0949) grad_norm 1.1631 (1.4613) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][440/625] eta 0:01:22 lr 0.001200 wd 0.0500 time 0.4424 (0.4482) data time 0.0008 (0.0016) model time 0.4416 (0.4460) loss 4.5817 (4.0957) grad_norm 1.3406 (1.4612) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][450/625] eta 0:01:18 lr 0.001200 wd 0.0500 time 0.4412 (0.4481) data time 0.0008 (0.0016) model time 0.4404 (0.4459) loss 4.2222 (4.0928) grad_norm 1.6195 (1.4600) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][460/625] eta 0:01:13 lr 0.001200 wd 0.0500 time 0.4454 (0.4480) data time 0.0008 (0.0016) model time 0.4446 (0.4459) loss 4.5460 (4.0958) 
grad_norm 1.7025 (1.4663) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][470/625] eta 0:01:09 lr 0.001200 wd 0.0500 time 0.4437 (0.4479) data time 0.0006 (0.0016) model time 0.4431 (0.4458) loss 2.7342 (4.0961) grad_norm 1.6971 (1.4744) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][480/625] eta 0:01:04 lr 0.001200 wd 0.0500 time 0.4432 (0.4478) data time 0.0008 (0.0016) model time 0.4425 (0.4457) loss 2.9171 (4.0912) grad_norm 1.2438 (1.4744) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][490/625] eta 0:01:00 lr 0.001200 wd 0.0500 time 0.4433 (0.4477) data time 0.0008 (0.0016) model time 0.4425 (0.4456) loss 4.0710 (4.0828) grad_norm 1.2235 (1.4707) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][500/625] eta 0:00:55 lr 0.001200 wd 0.0500 time 0.4458 (0.4476) data time 0.0009 (0.0015) model time 0.4449 (0.4456) loss 2.9159 (4.0779) grad_norm 1.4705 (1.4678) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][510/625] eta 0:00:51 lr 0.001200 wd 0.0500 time 0.4408 (0.4475) data time 0.0008 (0.0015) model time 0.4400 (0.4455) loss 4.4998 (4.0831) grad_norm 1.2824 (1.4739) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][520/625] eta 0:00:46 lr 0.001200 wd 0.0500 time 0.4436 (0.4475) data time 0.0007 (0.0015) model time 0.4429 (0.4455) loss 4.8140 (4.0886) grad_norm 1.3634 (1.4743) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][530/625] eta 0:00:42 lr 0.001200 wd 0.0500 
time 0.4436 (0.4474) data time 0.0006 (0.0015) model time 0.4430 (0.4454) loss 2.9919 (4.0906) grad_norm 1.1824 (1.4716) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][540/625] eta 0:00:38 lr 0.001200 wd 0.0500 time 0.4405 (0.4473) data time 0.0006 (0.0015) model time 0.4399 (0.4454) loss 4.2866 (4.0938) grad_norm 1.2956 (1.4680) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][550/625] eta 0:00:33 lr 0.001200 wd 0.0500 time 0.4414 (0.4473) data time 0.0006 (0.0015) model time 0.4408 (0.4453) loss 4.1514 (4.0958) grad_norm 1.1478 (1.4675) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][560/625] eta 0:00:29 lr 0.001200 wd 0.0500 time 0.4431 (0.4472) data time 0.0008 (0.0015) model time 0.4423 (0.4453) loss 3.1895 (4.0943) grad_norm 1.4217 (1.4639) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][570/625] eta 0:00:24 lr 0.001200 wd 0.0500 time 0.4446 (0.4471) data time 0.0008 (0.0014) model time 0.4437 (0.4452) loss 3.3725 (4.0909) grad_norm 1.1716 (1.4624) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][580/625] eta 0:00:20 lr 0.001200 wd 0.0500 time 0.6320 (0.4478) data time 0.0008 (0.0014) model time 0.6312 (0.4460) loss 4.1327 (4.0860) grad_norm 1.6820 (1.4632) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][590/625] eta 0:00:15 lr 0.001200 wd 0.0500 time 0.4454 (0.4478) data time 0.0008 (0.0014) model time 0.4446 (0.4460) loss 3.2626 (4.0891) grad_norm 2.2046 (1.4678) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][600/625] eta 0:00:11 lr 0.001200 wd 0.0500 time 0.4457 (0.4478) data time 0.0006 (0.0014) model time 0.4451 (0.4460) loss 4.2778 (4.0918) grad_norm 1.5060 (1.4703) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][610/625] eta 0:00:06 lr 0.001200 wd 0.0500 time 0.4350 (0.4477) data time 0.0007 (0.0014) model time 0.4343 (0.4460) loss 3.8900 (4.0940) grad_norm 1.1831 (1.4694) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [20/300][620/625] eta 0:00:02 lr 0.001200 wd 0.0500 time 0.4380 (0.4476) data time 0.0005 (0.0014) model time 0.4375 (0.4458) loss 4.4807 (4.0917) grad_norm 1.5317 (1.4669) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 20 training takes 0:04:39 [2024-08-04 11:19:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:19:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:19:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.8359 (0.8359) Acc@1 80.078 (80.078) Acc@5 95.410 (95.410) Mem 16696MB
[2024-08-04 11:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 1.5527 (1.0667) Acc@1 62.744 (73.438) Acc@5 87.012 (93.191) Mem 16696MB
[2024-08-04 11:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.7451 (1.3188) Acc@1 59.668 (68.404) Acc@5 83.545 (89.653) Mem 16696MB
[2024-08-04 11:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 68.316 Acc@5 89.549
[2024-08-04 11:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 68.3%
[2024-08-04 11:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 68.32%
[2024-08-04 11:19:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:19:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:19:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 1.6055 (1.6055) Acc@1 63.330 (63.330) Acc@5 84.375 (84.375) Mem 16696MB
[2024-08-04 11:20:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 2.3574 (1.8329) Acc@1 46.191 (56.339) Acc@5 74.512 (82.098) Mem 16696MB
[2024-08-04 11:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.134) Loss 2.5000 (2.0760) Acc@1 43.896 (52.718) Acc@5 70.020 (77.818) Mem 16696MB
[2024-08-04 11:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 52.943 Acc@5 78.239
[2024-08-04 11:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 52.9%
[2024-08-04 11:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 52.94%
[2024-08-04 11:20:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:20:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][0/625] eta 0:07:59 lr 0.001200 wd 0.0500 time 0.7676 (0.7676) data time 0.3859 (0.3859) model time 0.0000 (0.0000) loss 4.1063 (4.1063) grad_norm 1.3820 (1.3820) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][10/625] eta 0:04:51 lr 0.001200 wd 0.0500 time 0.4425 (0.4733) data time 0.0005 (0.0358) model time 0.0000 (0.0000) loss 4.6852 (4.0031) grad_norm 1.4062 (1.5117) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][20/625] eta 0:04:37 lr 0.001200 wd 0.0500 time 0.4479 (0.4593) data time 0.0007 (0.0191) model time 0.0000 (0.0000) loss 4.4208 (4.0239) grad_norm 1.4467 (1.4139) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][30/625] eta 0:04:30 lr 0.001200 wd 0.0500 time 0.4506 (0.4540) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 4.9322 (4.1528) grad_norm 1.3760 (inf) loss_scale 8192.0000 (13741.4194) mem 16696MB [2024-08-04 11:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][40/625] eta 0:04:23 lr 0.001200 wd 0.0500 time 0.4415 (0.4512) data time 0.0008 (0.0102) model time 0.0000 (0.0000) loss 3.5222 (4.0329) grad_norm 1.0697 (inf) loss_scale 8192.0000 (12387.9024) mem 16696MB [2024-08-04 11:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][50/625] eta 0:04:18 lr 0.001200 wd 0.0500 time 0.4448 (0.4500) data time 0.0010 (0.0084) model time 0.0000 (0.0000) loss 4.1221 (4.0522) grad_norm 1.1802 (inf) loss_scale 8192.0000 (11565.1765) mem 16696MB [2024-08-04 11:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][60/625] eta 0:04:15 lr 0.001200 wd 0.0500 time 0.6530 (0.4527) data time 0.0010 (0.0072) model time 0.6520 (0.4655) loss 3.4035 (4.0746) 
grad_norm 1.7886 (inf) loss_scale 8192.0000 (11012.1967) mem 16696MB [2024-08-04 11:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][70/625] eta 0:04:10 lr 0.001200 wd 0.0500 time 0.4427 (0.4506) data time 0.0008 (0.0063) model time 0.4418 (0.4514) loss 3.8533 (4.0443) grad_norm 1.3214 (inf) loss_scale 8192.0000 (10614.9859) mem 16696MB [2024-08-04 11:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][80/625] eta 0:04:05 lr 0.001200 wd 0.0500 time 0.4381 (0.4497) data time 0.0007 (0.0056) model time 0.4374 (0.4483) loss 4.7878 (4.0504) grad_norm 1.1829 (inf) loss_scale 8192.0000 (10315.8519) mem 16696MB [2024-08-04 11:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][90/625] eta 0:04:00 lr 0.001200 wd 0.0500 time 0.4429 (0.4489) data time 0.0008 (0.0051) model time 0.4420 (0.4468) loss 3.2624 (4.0783) grad_norm 1.3742 (inf) loss_scale 8192.0000 (10082.4615) mem 16696MB [2024-08-04 11:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][100/625] eta 0:03:55 lr 0.001200 wd 0.0500 time 0.4460 (0.4485) data time 0.0009 (0.0047) model time 0.4450 (0.4462) loss 4.5877 (4.0958) grad_norm 1.7126 (inf) loss_scale 8192.0000 (9895.2871) mem 16696MB [2024-08-04 11:20:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][110/625] eta 0:03:50 lr 0.001200 wd 0.0500 time 0.4413 (0.4480) data time 0.0006 (0.0043) model time 0.4407 (0.4455) loss 4.2301 (4.0875) grad_norm 1.6550 (inf) loss_scale 8192.0000 (9741.8378) mem 16696MB [2024-08-04 11:20:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][120/625] eta 0:03:46 lr 0.001200 wd 0.0500 time 0.4433 (0.4477) data time 0.0008 (0.0040) model time 0.4425 (0.4453) loss 4.4845 (4.0476) grad_norm 1.3467 (inf) loss_scale 8192.0000 (9613.7521) mem 16696MB [2024-08-04 11:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][130/625] eta 0:03:43 lr 0.001200 wd 0.0500 time 0.4436 (0.4505) data time 
0.0008 (0.0038) model time 0.4427 (0.4501) loss 4.3932 (4.0479) grad_norm 1.6156 (inf) loss_scale 8192.0000 (9505.2214) mem 16696MB [2024-08-04 11:21:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][140/625] eta 0:03:38 lr 0.001200 wd 0.0500 time 0.4455 (0.4501) data time 0.0008 (0.0036) model time 0.4446 (0.4493) loss 3.6821 (4.0467) grad_norm 1.1433 (inf) loss_scale 8192.0000 (9412.0851) mem 16696MB [2024-08-04 11:21:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][150/625] eta 0:03:33 lr 0.001200 wd 0.0500 time 0.4436 (0.4497) data time 0.0007 (0.0034) model time 0.4429 (0.4487) loss 4.7850 (4.0292) grad_norm 1.4725 (inf) loss_scale 8192.0000 (9331.2848) mem 16696MB [2024-08-04 11:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][160/625] eta 0:03:28 lr 0.001200 wd 0.0500 time 0.4411 (0.4493) data time 0.0008 (0.0032) model time 0.4403 (0.4481) loss 3.9882 (4.0168) grad_norm 1.2815 (inf) loss_scale 8192.0000 (9260.5217) mem 16696MB [2024-08-04 11:21:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][170/625] eta 0:03:24 lr 0.001200 wd 0.0500 time 0.4432 (0.4489) data time 0.0008 (0.0031) model time 0.4425 (0.4477) loss 3.9669 (4.0318) grad_norm 2.0281 (inf) loss_scale 8192.0000 (9198.0351) mem 16696MB [2024-08-04 11:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][180/625] eta 0:03:19 lr 0.001200 wd 0.0500 time 0.4434 (0.4487) data time 0.0006 (0.0029) model time 0.4427 (0.4474) loss 4.0761 (4.0113) grad_norm 1.1373 (inf) loss_scale 8192.0000 (9142.4530) mem 16696MB [2024-08-04 11:21:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][190/625] eta 0:03:15 lr 0.001200 wd 0.0500 time 0.4458 (0.4484) data time 0.0009 (0.0028) model time 0.4450 (0.4471) loss 3.8812 (3.9923) grad_norm 1.4724 (inf) loss_scale 8192.0000 (9092.6911) mem 16696MB [2024-08-04 11:21:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][200/625] eta 
0:03:10 lr 0.001200 wd 0.0500 time 0.4405 (0.4481) data time 0.0008 (0.0027) model time 0.4397 (0.4467) loss 4.1553 (3.9918) grad_norm 1.2987 (inf) loss_scale 8192.0000 (9047.8806) mem 16696MB [2024-08-04 11:21:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][210/625] eta 0:03:05 lr 0.001200 wd 0.0500 time 0.4437 (0.4479) data time 0.0007 (0.0027) model time 0.4430 (0.4464) loss 2.9823 (3.9878) grad_norm 1.6736 (inf) loss_scale 8192.0000 (9007.3175) mem 16696MB [2024-08-04 11:21:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][220/625] eta 0:03:01 lr 0.001200 wd 0.0500 time 0.4451 (0.4478) data time 0.0007 (0.0026) model time 0.4444 (0.4463) loss 4.2899 (3.9968) grad_norm 1.4990 (inf) loss_scale 8192.0000 (8970.4253) mem 16696MB [2024-08-04 11:21:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][230/625] eta 0:02:56 lr 0.001200 wd 0.0500 time 0.4395 (0.4476) data time 0.0008 (0.0025) model time 0.4386 (0.4461) loss 4.1429 (4.0025) grad_norm 1.1129 (inf) loss_scale 8192.0000 (8936.7273) mem 16696MB [2024-08-04 11:21:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][240/625] eta 0:02:52 lr 0.001200 wd 0.0500 time 0.4452 (0.4474) data time 0.0006 (0.0024) model time 0.4445 (0.4459) loss 3.2473 (3.9916) grad_norm 1.5031 (inf) loss_scale 8192.0000 (8905.8257) mem 16696MB [2024-08-04 11:21:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][250/625] eta 0:02:47 lr 0.001200 wd 0.0500 time 0.4406 (0.4473) data time 0.0006 (0.0024) model time 0.4400 (0.4458) loss 4.1568 (4.0084) grad_norm 1.3991 (inf) loss_scale 8192.0000 (8877.3865) mem 16696MB [2024-08-04 11:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][260/625] eta 0:02:43 lr 0.001200 wd 0.0500 time 0.4444 (0.4473) data time 0.0007 (0.0023) model time 0.4437 (0.4458) loss 4.1053 (4.0136) grad_norm 2.2329 (inf) loss_scale 8192.0000 (8851.1264) mem 16696MB [2024-08-04 11:22:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [21/300][270/625] eta 0:02:38 lr 0.001200 wd 0.0500 time 0.4464 (0.4472) data time 0.0009 (0.0023) model time 0.4455 (0.4457) loss 3.1808 (3.9981) grad_norm 1.7074 (inf) loss_scale 8192.0000 (8826.8044) mem 16696MB [2024-08-04 11:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][280/625] eta 0:02:34 lr 0.001200 wd 0.0500 time 0.4423 (0.4471) data time 0.0009 (0.0022) model time 0.4414 (0.4456) loss 3.6574 (3.9813) grad_norm 1.6614 (inf) loss_scale 8192.0000 (8804.2135) mem 16696MB [2024-08-04 11:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][290/625] eta 0:02:29 lr 0.001200 wd 0.0500 time 0.4489 (0.4471) data time 0.0008 (0.0022) model time 0.4480 (0.4457) loss 4.2661 (3.9794) grad_norm 1.6186 (inf) loss_scale 8192.0000 (8783.1753) mem 16696MB [2024-08-04 11:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][300/625] eta 0:02:25 lr 0.001200 wd 0.0500 time 0.4409 (0.4471) data time 0.0009 (0.0021) model time 0.4400 (0.4456) loss 4.1689 (3.9767) grad_norm 1.1414 (inf) loss_scale 8192.0000 (8763.5349) mem 16696MB [2024-08-04 11:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][310/625] eta 0:02:20 lr 0.001200 wd 0.0500 time 0.4476 (0.4471) data time 0.0008 (0.0021) model time 0.4468 (0.4457) loss 4.2759 (3.9908) grad_norm 1.1658 (inf) loss_scale 8192.0000 (8745.1576) mem 16696MB [2024-08-04 11:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][320/625] eta 0:02:16 lr 0.001200 wd 0.0500 time 0.4410 (0.4470) data time 0.0008 (0.0020) model time 0.4402 (0.4456) loss 3.9425 (3.9964) grad_norm 1.3889 (inf) loss_scale 8192.0000 (8727.9252) mem 16696MB [2024-08-04 11:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][330/625] eta 0:02:11 lr 0.001200 wd 0.0500 time 0.4435 (0.4469) data time 0.0009 (0.0020) model time 0.4426 (0.4456) loss 3.7328 (3.9960) grad_norm 1.5858 (inf) loss_scale 8192.0000 
(8711.7341) mem 16696MB [2024-08-04 11:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][340/625] eta 0:02:07 lr 0.001200 wd 0.0500 time 0.4443 (0.4469) data time 0.0008 (0.0020) model time 0.4435 (0.4455) loss 4.5640 (3.9986) grad_norm 1.0098 (inf) loss_scale 8192.0000 (8696.4927) mem 16696MB [2024-08-04 11:22:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][350/625] eta 0:02:02 lr 0.001200 wd 0.0500 time 0.4462 (0.4468) data time 0.0006 (0.0019) model time 0.4456 (0.4454) loss 4.3464 (3.9999) grad_norm 1.4766 (inf) loss_scale 4096.0000 (8565.4245) mem 16696MB [2024-08-04 11:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][360/625] eta 0:01:58 lr 0.001200 wd 0.0500 time 0.4435 (0.4468) data time 0.0007 (0.0019) model time 0.4428 (0.4454) loss 3.8430 (3.9941) grad_norm 1.6817 (inf) loss_scale 4096.0000 (8441.6177) mem 16696MB [2024-08-04 11:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][370/625] eta 0:01:53 lr 0.001200 wd 0.0500 time 0.4431 (0.4467) data time 0.0006 (0.0019) model time 0.4425 (0.4454) loss 3.7139 (3.9952) grad_norm 1.5348 (inf) loss_scale 4096.0000 (8324.4852) mem 16696MB [2024-08-04 11:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][380/625] eta 0:01:49 lr 0.001200 wd 0.0500 time 0.4410 (0.4466) data time 0.0007 (0.0018) model time 0.4404 (0.4453) loss 3.5396 (3.9942) grad_norm 1.8274 (inf) loss_scale 4096.0000 (8213.5013) mem 16696MB [2024-08-04 11:22:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][390/625] eta 0:01:44 lr 0.001200 wd 0.0500 time 0.4443 (0.4466) data time 0.0008 (0.0018) model time 0.4435 (0.4453) loss 3.4538 (3.9914) grad_norm 1.4867 (inf) loss_scale 4096.0000 (8108.1944) mem 16696MB [2024-08-04 11:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][400/625] eta 0:01:40 lr 0.001200 wd 0.0500 time 0.4416 (0.4470) data time 0.0007 (0.0018) model time 0.4410 (0.4457) loss 
2.7440 (3.9765) grad_norm 1.2702 (inf) loss_scale 4096.0000 (8008.1397) mem 16696MB [2024-08-04 11:23:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][410/625] eta 0:01:36 lr 0.001200 wd 0.0500 time 0.4429 (0.4469) data time 0.0008 (0.0018) model time 0.4422 (0.4457) loss 3.1456 (3.9777) grad_norm 1.5420 (inf) loss_scale 4096.0000 (7912.9538) mem 16696MB [2024-08-04 11:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][420/625] eta 0:01:31 lr 0.001200 wd 0.0500 time 0.4434 (0.4469) data time 0.0007 (0.0017) model time 0.4427 (0.4456) loss 2.9136 (3.9725) grad_norm 2.6587 (inf) loss_scale 4096.0000 (7822.2898) mem 16696MB [2024-08-04 11:23:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][430/625] eta 0:01:27 lr 0.001200 wd 0.0500 time 0.4466 (0.4468) data time 0.0009 (0.0017) model time 0.4457 (0.4456) loss 4.1897 (3.9785) grad_norm 1.2074 (inf) loss_scale 4096.0000 (7735.8329) mem 16696MB [2024-08-04 11:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][440/625] eta 0:01:22 lr 0.001200 wd 0.0500 time 0.4433 (0.4468) data time 0.0008 (0.0017) model time 0.4425 (0.4455) loss 3.2192 (3.9709) grad_norm 1.9862 (inf) loss_scale 4096.0000 (7653.2971) mem 16696MB [2024-08-04 11:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][450/625] eta 0:01:18 lr 0.001200 wd 0.0500 time 0.4457 (0.4467) data time 0.0006 (0.0017) model time 0.4451 (0.4455) loss 5.1905 (3.9715) grad_norm 1.3100 (inf) loss_scale 4096.0000 (7574.4213) mem 16696MB [2024-08-04 11:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][460/625] eta 0:01:13 lr 0.001200 wd 0.0500 time 0.6606 (0.4471) data time 0.0009 (0.0017) model time 0.6596 (0.4460) loss 4.2385 (3.9721) grad_norm 1.1899 (inf) loss_scale 4096.0000 (7498.9675) mem 16696MB [2024-08-04 11:23:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][470/625] eta 0:01:09 lr 0.001200 wd 0.0500 time 0.4432 (0.4470) 
data time 0.0006 (0.0016) model time 0.4426 (0.4459) loss 3.5963 (3.9675) grad_norm 1.6991 (inf) loss_scale 4096.0000 (7426.7176) mem 16696MB [2024-08-04 11:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][480/625] eta 0:01:04 lr 0.001200 wd 0.0500 time 0.4446 (0.4470) data time 0.0009 (0.0016) model time 0.4437 (0.4458) loss 4.5126 (3.9727) grad_norm 1.7549 (inf) loss_scale 4096.0000 (7357.4719) mem 16696MB [2024-08-04 11:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][490/625] eta 0:01:00 lr 0.001200 wd 0.0500 time 0.4438 (0.4469) data time 0.0007 (0.0016) model time 0.4431 (0.4457) loss 4.4776 (3.9753) grad_norm 1.1651 (inf) loss_scale 4096.0000 (7291.0468) mem 16696MB [2024-08-04 11:23:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][500/625] eta 0:00:55 lr 0.001200 wd 0.0500 time 0.4400 (0.4468) data time 0.0009 (0.0016) model time 0.4390 (0.4456) loss 4.7731 (3.9736) grad_norm 1.8468 (inf) loss_scale 4096.0000 (7227.2735) mem 16696MB [2024-08-04 11:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][510/625] eta 0:00:51 lr 0.001200 wd 0.0500 time 0.4454 (0.4467) data time 0.0008 (0.0016) model time 0.4446 (0.4456) loss 3.9675 (3.9698) grad_norm 2.2711 (inf) loss_scale 4096.0000 (7165.9961) mem 16696MB [2024-08-04 11:23:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][520/625] eta 0:00:46 lr 0.001200 wd 0.0500 time 0.4420 (0.4467) data time 0.0008 (0.0016) model time 0.4412 (0.4455) loss 4.3373 (3.9631) grad_norm 2.2986 (inf) loss_scale 4096.0000 (7107.0710) mem 16696MB [2024-08-04 11:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][530/625] eta 0:00:42 lr 0.001200 wd 0.0500 time 0.4430 (0.4466) data time 0.0006 (0.0016) model time 0.4424 (0.4454) loss 3.1245 (3.9562) grad_norm 1.0623 (inf) loss_scale 4096.0000 (7050.3653) mem 16696MB [2024-08-04 11:24:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[21/300][540/625] eta 0:00:37 lr 0.001200 wd 0.0500 time 0.4420 (0.4466) data time 0.0009 (0.0015) model time 0.4411 (0.4454) loss 4.0347 (3.9573) grad_norm 1.9339 (inf) loss_scale 4096.0000 (6995.7560) mem 16696MB [2024-08-04 11:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][550/625] eta 0:00:33 lr 0.001200 wd 0.0500 time 0.4459 (0.4465) data time 0.0008 (0.0015) model time 0.4451 (0.4454) loss 4.3356 (3.9631) grad_norm 1.2986 (inf) loss_scale 4096.0000 (6943.1289) mem 16696MB [2024-08-04 11:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][560/625] eta 0:00:29 lr 0.001200 wd 0.0500 time 0.4444 (0.4465) data time 0.0007 (0.0015) model time 0.4437 (0.4453) loss 2.8387 (3.9660) grad_norm 1.2896 (inf) loss_scale 4096.0000 (6892.3779) mem 16696MB [2024-08-04 11:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][570/625] eta 0:00:24 lr 0.001200 wd 0.0500 time 0.4469 (0.4465) data time 0.0008 (0.0015) model time 0.4461 (0.4453) loss 4.3995 (3.9616) grad_norm 1.2148 (inf) loss_scale 4096.0000 (6843.4046) mem 16696MB [2024-08-04 11:24:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][580/625] eta 0:00:20 lr 0.001200 wd 0.0500 time 0.4497 (0.4465) data time 0.0009 (0.0015) model time 0.4488 (0.4453) loss 3.4168 (3.9601) grad_norm 1.3113 (inf) loss_scale 4096.0000 (6796.1170) mem 16696MB [2024-08-04 11:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][590/625] eta 0:00:15 lr 0.001200 wd 0.0500 time 0.4440 (0.4465) data time 0.0008 (0.0015) model time 0.4432 (0.4454) loss 4.0574 (3.9629) grad_norm 1.7512 (inf) loss_scale 4096.0000 (6750.4298) mem 16696MB [2024-08-04 11:24:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][600/625] eta 0:00:11 lr 0.001200 wd 0.0500 time 0.4438 (0.4465) data time 0.0009 (0.0015) model time 0.4429 (0.4454) loss 4.2717 (3.9646) grad_norm 2.2809 (inf) loss_scale 4096.0000 (6706.2629) mem 16696MB [2024-08-04 11:24:35 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][610/625] eta 0:00:06 lr 0.001200 wd 0.0500 time 0.4370 (0.4465) data time 0.0004 (0.0015) model time 0.4365 (0.4453) loss 3.6214 (3.9630) grad_norm 1.0791 (inf) loss_scale 4096.0000 (6663.5417) mem 16696MB [2024-08-04 11:24:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [21/300][620/625] eta 0:00:02 lr 0.001200 wd 0.0500 time 0.4381 (0.4464) data time 0.0004 (0.0014) model time 0.4377 (0.4452) loss 4.7933 (3.9643) grad_norm 1.7944 (inf) loss_scale 4096.0000 (6622.1965) mem 16696MB [2024-08-04 11:24:42 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 21 training takes 0:04:38 [2024-08-04 11:24:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:24:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 11:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.8770 (0.8770) Acc@1 79.297 (79.297) Acc@5 95.312 (95.312) Mem 16696MB [2024-08-04 11:24:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.4902 (1.0753) Acc@1 65.918 (74.285) Acc@5 88.330 (93.679) Mem 16696MB [2024-08-04 11:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.7070 (1.3241) Acc@1 60.547 (69.010) Acc@5 84.766 (90.011) Mem 16696MB [2024-08-04 11:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 68.882 Acc@5 89.957 [2024-08-04 11:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 68.9% [2024-08-04 11:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 68.88% [2024-08-04 11:24:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-04 11:24:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 1.3955 (1.3955) Acc@1 66.699 (66.699) Acc@5 87.256 (87.256) Mem 16696MB
[2024-08-04 11:24:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 2.1465 (1.6264) Acc@1 50.098 (60.183) Acc@5 77.588 (84.841) Mem 16696MB
[2024-08-04 11:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 2.2969 (1.8778) Acc@1 47.607 (56.194) Acc@5 73.486 (80.720) Mem 16696MB
[2024-08-04 11:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 56.392 Acc@5 81.054
[2024-08-04 11:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 56.4%
[2024-08-04 11:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 56.39%
[2024-08-04 11:24:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:24:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][0/625] eta 0:07:35 lr 0.001200 wd 0.0500 time 0.7294 (0.7294) data time 0.3456 (0.3456) model time 0.0000 (0.0000) loss 4.5101 (4.5101) grad_norm 1.5156 (1.5156) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:24:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][10/625] eta 0:04:49 lr 0.001200 wd 0.0500 time 0.4457 (0.4703) data time 0.0008 (0.0322) model time 0.0000 (0.0000) loss 4.1908 (4.0883) grad_norm 1.1664 (1.5599) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][20/625] eta 0:04:36 lr 0.001200 wd 0.0500 time 0.4365 (0.4574) data time 0.0006 (0.0172) model time 0.0000 (0.0000) loss 4.4131 (4.0610) grad_norm 1.8512 (1.6474) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][30/625] eta 0:04:29 lr 0.001200 wd 0.0500 time 0.4442 (0.4529) data time 0.0006 (0.0119) model time 0.0000 (0.0000) loss 4.2209 (4.1793) grad_norm 1.4061 (1.5861) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][40/625] eta 0:04:23 lr 0.001200 wd 0.0500 time 0.4473 (0.4509) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 3.6985 (4.1045) grad_norm 1.6862 (1.5941) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][50/625] eta 0:04:20 lr 0.001200 wd 0.0500 time 0.4416 (0.4535) data time 0.0008 (0.0075) model time 0.0000 (0.0000) loss 3.9740 (4.0932) grad_norm 1.1363 (1.5725) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][60/625] eta 0:04:17 lr 0.001200 wd 0.0500 time 0.6591 (0.4555) data time 0.0008 (0.0064) model time 0.6583 (0.4644) loss 4.0979 (4.0235) 
grad_norm 1.0915 (1.5471) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][70/625] eta 0:04:11 lr 0.001200 wd 0.0500 time 0.4425 (0.4530) data time 0.0006 (0.0056) model time 0.4419 (0.4510) loss 2.9626 (3.9919) grad_norm 1.7537 (1.5177) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][80/625] eta 0:04:06 lr 0.001200 wd 0.0500 time 0.4444 (0.4520) data time 0.0007 (0.0050) model time 0.4438 (0.4485) loss 3.5970 (3.9928) grad_norm 1.3666 (1.5363) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][90/625] eta 0:04:01 lr 0.001200 wd 0.0500 time 0.4412 (0.4511) data time 0.0006 (0.0046) model time 0.4406 (0.4473) loss 4.9432 (4.0119) grad_norm 1.1914 (1.5163) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][100/625] eta 0:03:56 lr 0.001200 wd 0.0500 time 0.4426 (0.4504) data time 0.0007 (0.0042) model time 0.4419 (0.4464) loss 4.6661 (4.0083) grad_norm 1.9350 (1.5086) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][110/625] eta 0:03:51 lr 0.001200 wd 0.0500 time 0.4442 (0.4498) data time 0.0007 (0.0039) model time 0.4435 (0.4458) loss 4.6081 (4.0145) grad_norm 1.5755 (1.5133) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][120/625] eta 0:03:46 lr 0.001200 wd 0.0500 time 0.4438 (0.4494) data time 0.0006 (0.0036) model time 0.4431 (0.4456) loss 3.6507 (4.0131) grad_norm 1.2451 (1.4959) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][130/625] eta 0:03:42 lr 0.001200 wd 0.0500 time 0.4425 
(0.4490) data time 0.0007 (0.0034) model time 0.4418 (0.4453) loss 2.9004 (3.9870) grad_norm 1.8548 (1.5058) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:25:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][140/625] eta 0:03:37 lr 0.001200 wd 0.0500 time 0.4429 (0.4486) data time 0.0008 (0.0032) model time 0.4422 (0.4451) loss 4.8765 (4.0026) grad_norm 1.4278 (1.5096) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][150/625] eta 0:03:32 lr 0.001200 wd 0.0500 time 0.4443 (0.4483) data time 0.0008 (0.0031) model time 0.4435 (0.4449) loss 4.3154 (4.0068) grad_norm 0.9892 (1.5019) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][160/625] eta 0:03:28 lr 0.001200 wd 0.0500 time 0.4418 (0.4482) data time 0.0007 (0.0029) model time 0.4411 (0.4449) loss 4.5369 (4.0095) grad_norm 1.5800 (1.4933) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][170/625] eta 0:03:23 lr 0.001200 wd 0.0500 time 0.4445 (0.4479) data time 0.0008 (0.0028) model time 0.4437 (0.4448) loss 3.2241 (3.9865) grad_norm 1.4850 (1.4921) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][180/625] eta 0:03:19 lr 0.001200 wd 0.0500 time 0.4391 (0.4477) data time 0.0007 (0.0027) model time 0.4384 (0.4447) loss 2.9296 (3.9841) grad_norm 1.2059 (1.4873) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][190/625] eta 0:03:14 lr 0.001200 wd 0.0500 time 0.4411 (0.4475) data time 0.0007 (0.0026) model time 0.4404 (0.4445) loss 3.7741 (3.9827) grad_norm 1.1589 (1.4848) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:23 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [22/300][200/625] eta 0:03:10 lr 0.001200 wd 0.0500 time 0.4511 (0.4473) data time 0.0007 (0.0025) model time 0.4504 (0.4444) loss 4.0745 (3.9768) grad_norm 1.6732 (1.4763) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][210/625] eta 0:03:05 lr 0.001200 wd 0.0500 time 0.4437 (0.4472) data time 0.0009 (0.0024) model time 0.4429 (0.4444) loss 3.6810 (3.9775) grad_norm 1.1792 (1.4696) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][220/625] eta 0:03:01 lr 0.001200 wd 0.0500 time 0.4440 (0.4471) data time 0.0008 (0.0023) model time 0.4432 (0.4444) loss 4.2685 (3.9721) grad_norm 1.5699 (1.4650) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][230/625] eta 0:02:56 lr 0.001200 wd 0.0500 time 0.4473 (0.4470) data time 0.0008 (0.0023) model time 0.4464 (0.4444) loss 4.2972 (3.9586) grad_norm 1.5616 (1.4601) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][240/625] eta 0:02:52 lr 0.001200 wd 0.0500 time 0.4412 (0.4477) data time 0.0008 (0.0022) model time 0.4404 (0.4453) loss 4.1392 (3.9670) grad_norm 1.6853 (1.4598) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][250/625] eta 0:02:47 lr 0.001200 wd 0.0500 time 0.4422 (0.4476) data time 0.0008 (0.0022) model time 0.4415 (0.4453) loss 3.5085 (3.9774) grad_norm 1.3947 (1.4597) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][260/625] eta 0:02:43 lr 0.001200 wd 0.0500 time 0.4386 (0.4475) data time 0.0006 (0.0021) model time 0.4380 (0.4452) loss 5.0939 (3.9879) grad_norm 1.4355 (1.4602) loss_scale 4096.0000 
(4096.0000) mem 16696MB [2024-08-04 11:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][270/625] eta 0:02:38 lr 0.001200 wd 0.0500 time 0.4410 (0.4473) data time 0.0006 (0.0021) model time 0.4404 (0.4451) loss 3.5708 (3.9815) grad_norm 1.1010 (1.4619) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][280/625] eta 0:02:34 lr 0.001200 wd 0.0500 time 0.4504 (0.4472) data time 0.0006 (0.0020) model time 0.4498 (0.4450) loss 3.8706 (3.9808) grad_norm 1.1968 (1.4596) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][290/625] eta 0:02:29 lr 0.001200 wd 0.0500 time 0.4418 (0.4471) data time 0.0006 (0.0020) model time 0.4412 (0.4450) loss 3.1269 (3.9814) grad_norm 1.3401 (1.4647) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][300/625] eta 0:02:25 lr 0.001200 wd 0.0500 time 0.4409 (0.4471) data time 0.0006 (0.0019) model time 0.4402 (0.4450) loss 4.1945 (3.9834) grad_norm 1.1380 (1.4636) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][310/625] eta 0:02:20 lr 0.001200 wd 0.0500 time 0.4433 (0.4470) data time 0.0009 (0.0019) model time 0.4423 (0.4449) loss 4.2822 (3.9786) grad_norm 1.9368 (1.4635) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][320/625] eta 0:02:16 lr 0.001200 wd 0.0500 time 0.4477 (0.4469) data time 0.0006 (0.0018) model time 0.4471 (0.4448) loss 4.6737 (3.9919) grad_norm 1.3739 (1.4658) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][330/625] eta 0:02:11 lr 0.001200 wd 0.0500 time 0.4442 (0.4468) data time 0.0006 (0.0018) model time 
0.4436 (0.4448) loss 4.7676 (3.9962) grad_norm 0.9637 (1.4659) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][340/625] eta 0:02:07 lr 0.001200 wd 0.0500 time 0.4433 (0.4467) data time 0.0008 (0.0018) model time 0.4425 (0.4447) loss 3.6059 (4.0034) grad_norm 1.3219 (1.4623) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][350/625] eta 0:02:02 lr 0.001200 wd 0.0500 time 0.4419 (0.4466) data time 0.0008 (0.0018) model time 0.4411 (0.4446) loss 2.7212 (4.0004) grad_norm 1.1581 (1.4619) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][360/625] eta 0:01:58 lr 0.001200 wd 0.0500 time 0.4430 (0.4465) data time 0.0006 (0.0017) model time 0.4424 (0.4445) loss 4.5339 (3.9930) grad_norm 1.2226 (1.4605) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][370/625] eta 0:01:53 lr 0.001200 wd 0.0500 time 0.4488 (0.4464) data time 0.0006 (0.0017) model time 0.4483 (0.4445) loss 3.4201 (3.9887) grad_norm 1.4778 (1.4549) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][380/625] eta 0:01:49 lr 0.001200 wd 0.0500 time 0.4466 (0.4464) data time 0.0006 (0.0017) model time 0.4460 (0.4445) loss 3.6838 (3.9897) grad_norm 1.1684 (1.4512) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][390/625] eta 0:01:44 lr 0.001200 wd 0.0500 time 0.4454 (0.4463) data time 0.0008 (0.0017) model time 0.4446 (0.4444) loss 4.3922 (3.9928) grad_norm 1.4819 (1.4502) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][400/625] eta 0:01:40 
lr 0.001200 wd 0.0500 time 0.4425 (0.4467) data time 0.0007 (0.0016) model time 0.4418 (0.4449) loss 2.7366 (3.9984) grad_norm 1.4322 (1.4508) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][410/625] eta 0:01:36 lr 0.001200 wd 0.0500 time 0.4399 (0.4466) data time 0.0007 (0.0016) model time 0.4392 (0.4448) loss 2.6497 (3.9984) grad_norm 1.1956 (1.4515) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][420/625] eta 0:01:31 lr 0.001200 wd 0.0500 time 0.4427 (0.4465) data time 0.0009 (0.0016) model time 0.4418 (0.4447) loss 2.7096 (3.9966) grad_norm 1.3670 (1.4445) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][430/625] eta 0:01:27 lr 0.001200 wd 0.0500 time 0.4429 (0.4464) data time 0.0007 (0.0016) model time 0.4422 (0.4446) loss 2.7570 (3.9919) grad_norm 1.3804 (1.4413) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][440/625] eta 0:01:22 lr 0.001200 wd 0.0500 time 0.4421 (0.4463) data time 0.0007 (0.0016) model time 0.4414 (0.4446) loss 4.6246 (3.9981) grad_norm 1.1725 (1.4424) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][450/625] eta 0:01:18 lr 0.001200 wd 0.0500 time 0.4415 (0.4463) data time 0.0009 (0.0015) model time 0.4406 (0.4445) loss 3.9511 (4.0053) grad_norm 1.9509 (1.4496) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][460/625] eta 0:01:13 lr 0.001200 wd 0.0500 time 0.4476 (0.4466) data time 0.0008 (0.0015) model time 0.4468 (0.4450) loss 4.0184 (3.9941) grad_norm 0.9672 (1.4453) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:23 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][470/625] eta 0:01:09 lr 0.001200 wd 0.0500 time 0.4462 (0.4466) data time 0.0008 (0.0015) model time 0.4454 (0.4449) loss 4.2895 (3.9901) grad_norm 1.1225 (1.4395) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][480/625] eta 0:01:04 lr 0.001200 wd 0.0500 time 0.4441 (0.4465) data time 0.0009 (0.0015) model time 0.4432 (0.4449) loss 4.5994 (3.9847) grad_norm 1.3090 (1.4361) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][490/625] eta 0:01:00 lr 0.001200 wd 0.0500 time 0.4413 (0.4464) data time 0.0008 (0.0015) model time 0.4405 (0.4448) loss 3.9228 (3.9861) grad_norm 1.1934 (1.4403) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][500/625] eta 0:00:55 lr 0.001200 wd 0.0500 time 0.4436 (0.4464) data time 0.0009 (0.0015) model time 0.4427 (0.4448) loss 3.8955 (3.9855) grad_norm 1.1697 (1.4420) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][510/625] eta 0:00:51 lr 0.001200 wd 0.0500 time 0.4463 (0.4463) data time 0.0008 (0.0015) model time 0.4456 (0.4447) loss 4.3588 (3.9819) grad_norm 1.1858 (1.4411) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][520/625] eta 0:00:46 lr 0.001200 wd 0.0500 time 0.4422 (0.4463) data time 0.0009 (0.0014) model time 0.4413 (0.4447) loss 4.4582 (3.9785) grad_norm 1.5431 (1.4422) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][530/625] eta 0:00:42 lr 0.001200 wd 0.0500 time 0.4457 (0.4462) data time 0.0008 (0.0014) model time 0.4449 (0.4447) loss 3.1970 (3.9749) grad_norm 
1.3021 (1.4410) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][540/625] eta 0:00:37 lr 0.001200 wd 0.0500 time 0.4424 (0.4462) data time 0.0007 (0.0014) model time 0.4417 (0.4446) loss 5.0661 (3.9801) grad_norm 1.0810 (1.4430) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:28:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][550/625] eta 0:00:33 lr 0.001200 wd 0.0500 time 0.4423 (0.4461) data time 0.0007 (0.0014) model time 0.4417 (0.4446) loss 4.5353 (3.9785) grad_norm 1.6246 (1.4434) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][560/625] eta 0:00:28 lr 0.001200 wd 0.0500 time 0.4438 (0.4461) data time 0.0007 (0.0014) model time 0.4431 (0.4446) loss 2.8974 (3.9722) grad_norm 1.6265 (1.4422) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][570/625] eta 0:00:24 lr 0.001200 wd 0.0500 time 0.4444 (0.4461) data time 0.0009 (0.0014) model time 0.4435 (0.4445) loss 4.4329 (3.9736) grad_norm 1.3935 (1.4445) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][580/625] eta 0:00:20 lr 0.001200 wd 0.0500 time 0.4386 (0.4460) data time 0.0008 (0.0014) model time 0.4378 (0.4445) loss 3.9871 (3.9731) grad_norm 1.1780 (1.4449) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][590/625] eta 0:00:15 lr 0.001200 wd 0.0500 time 0.4444 (0.4461) data time 0.0009 (0.0014) model time 0.4435 (0.4446) loss 4.0931 (3.9736) grad_norm 1.2184 (1.4431) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][600/625] eta 0:00:11 lr 0.001200 wd 0.0500 time 0.4443 (0.4460) data 
time 0.0009 (0.0014) model time 0.4434 (0.4445) loss 4.1875 (3.9739) grad_norm 1.2205 (1.4405) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][610/625] eta 0:00:06 lr 0.001200 wd 0.0500 time 0.4413 (0.4463) data time 0.0006 (0.0014) model time 0.4407 (0.4449) loss 4.0762 (3.9709) grad_norm 1.6457 (1.4387) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [22/300][620/625] eta 0:00:02 lr 0.001200 wd 0.0500 time 0.4396 (0.4462) data time 0.0006 (0.0013) model time 0.4390 (0.4448) loss 4.4254 (3.9733) grad_norm 1.0597 (1.4389) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 22 training takes 0:04:38 [2024-08-04 11:29:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:29:33 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.8813 (0.8813) Acc@1 79.395 (79.395) Acc@5 95.557 (95.557) Mem 16696MB [2024-08-04 11:29:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 1.5908 (1.0850) Acc@1 63.379 (74.689) Acc@5 87.305 (93.701) Mem 16696MB [2024-08-04 11:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 1.6055 (1.3175) Acc@1 63.965 (69.717) Acc@5 86.816 (90.386) Mem 16696MB [2024-08-04 11:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 69.624 Acc@5 90.385 [2024-08-04 11:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 69.6% [2024-08-04 11:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 69.62% [2024-08-04 11:29:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 11:29:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 11:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 1.2383 (1.2383) Acc@1 69.189 (69.189) Acc@5 89.062 (89.062) Mem 16696MB [2024-08-04 11:29:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 1.9697 (1.4642) Acc@1 53.125 (63.419) Acc@5 79.834 (87.007) Mem 16696MB [2024-08-04 11:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 2.1270 (1.7183) Acc@1 50.781 (59.149) Acc@5 76.660 (82.985) Mem 16696MB [2024-08-04 11:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 59.267 Acc@5 83.247 [2024-08-04 11:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 59.3% [2024-08-04 11:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 59.27% [2024-08-04 11:29:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 11:29:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 11:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][0/625] eta 0:07:43 lr 0.001200 wd 0.0500 time 0.7412 (0.7412) data time 0.3530 (0.3530) model time 0.0000 (0.0000) loss 4.4835 (4.4835) grad_norm 1.5983 (1.5983) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][10/625] eta 0:04:49 lr 0.001200 wd 0.0500 time 0.4472 (0.4705) data time 0.0008 (0.0329) model time 0.0000 (0.0000) loss 3.0126 (4.0209) grad_norm 1.4730 (1.5259) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][20/625] eta 0:04:37 lr 0.001200 wd 0.0500 time 0.4452 (0.4587) data time 0.0007 (0.0176) model time 0.0000 (0.0000) loss 3.9330 (3.9799) grad_norm 1.4702 (1.5201) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][30/625] eta 0:04:30 lr 0.001200 wd 0.0500 time 0.4633 (0.4546) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 3.8868 (3.9980) grad_norm 1.2580 (1.5472) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][40/625] eta 0:04:24 lr 0.001200 wd 0.0500 time 0.4481 (0.4518) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 4.0276 (3.9771) grad_norm 1.0934 (1.4565) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][50/625] eta 0:04:18 lr 0.001200 wd 0.0500 time 0.4456 (0.4502) data time 0.0008 (0.0077) model time 0.0000 (0.0000) loss 4.3065 (3.9872) grad_norm 1.7890 (1.4525) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][60/625] eta 0:04:15 lr 0.001200 wd 0.0500 time 0.6568 (0.4527) data time 0.0009 (0.0066) model time 0.6559 (0.4642) loss 4.3210 (4.0131) 
grad_norm 1.0394 (1.4452) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][70/625] eta 0:04:10 lr 0.001200 wd 0.0500 time 0.4430 (0.4505) data time 0.0008 (0.0058) model time 0.4422 (0.4502) loss 4.0438 (3.9728) grad_norm 1.1251 (1.4270) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][80/625] eta 0:04:04 lr 0.001200 wd 0.0500 time 0.4410 (0.4494) data time 0.0007 (0.0052) model time 0.4403 (0.4473) loss 4.2714 (3.9799) grad_norm 1.5980 (1.4263) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][90/625] eta 0:04:00 lr 0.001200 wd 0.0500 time 0.4444 (0.4488) data time 0.0009 (0.0047) model time 0.4435 (0.4462) loss 4.4178 (3.9909) grad_norm 1.1176 (1.4339) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][100/625] eta 0:03:55 lr 0.001200 wd 0.0500 time 0.4447 (0.4486) data time 0.0008 (0.0043) model time 0.4439 (0.4461) loss 2.9461 (4.0125) grad_norm 1.0942 (1.4249) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][110/625] eta 0:03:50 lr 0.001200 wd 0.0500 time 0.4442 (0.4482) data time 0.0006 (0.0040) model time 0.4436 (0.4456) loss 3.9885 (4.0041) grad_norm 2.4167 (1.4485) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][120/625] eta 0:03:46 lr 0.001200 wd 0.0500 time 0.4511 (0.4481) data time 0.0007 (0.0037) model time 0.4504 (0.4458) loss 3.5951 (4.0260) grad_norm 1.5650 (1.4589) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][130/625] eta 0:03:41 lr 0.001200 wd 0.0500 time 0.4584 
(0.4481) data time 0.0007 (0.0035) model time 0.4577 (0.4460) loss 3.5689 (4.0216) grad_norm 1.4685 (1.4694) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][140/625] eta 0:03:37 lr 0.001200 wd 0.0500 time 0.4466 (0.4479) data time 0.0011 (0.0033) model time 0.4455 (0.4457) loss 3.7833 (4.0135) grad_norm 1.4966 (1.4664) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][150/625] eta 0:03:32 lr 0.001200 wd 0.0500 time 0.4454 (0.4477) data time 0.0008 (0.0031) model time 0.4445 (0.4456) loss 4.2352 (4.0220) grad_norm 1.3977 (1.4748) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][160/625] eta 0:03:28 lr 0.001200 wd 0.0500 time 0.4459 (0.4475) data time 0.0006 (0.0030) model time 0.4453 (0.4454) loss 3.6733 (4.0215) grad_norm 1.4812 (1.4725) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][170/625] eta 0:03:23 lr 0.001200 wd 0.0500 time 0.4397 (0.4473) data time 0.0009 (0.0029) model time 0.4388 (0.4452) loss 3.7428 (4.0254) grad_norm 2.0675 (1.4854) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][180/625] eta 0:03:18 lr 0.001200 wd 0.0500 time 0.4410 (0.4472) data time 0.0007 (0.0028) model time 0.4403 (0.4451) loss 3.8932 (4.0262) grad_norm 1.9464 (1.4853) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][190/625] eta 0:03:14 lr 0.001200 wd 0.0500 time 0.4440 (0.4470) data time 0.0009 (0.0027) model time 0.4432 (0.4450) loss 3.0743 (4.0177) grad_norm 0.9355 (1.4936) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [23/300][200/625] eta 0:03:09 lr 0.001200 wd 0.0500 time 0.4437 (0.4469) data time 0.0007 (0.0026) model time 0.4430 (0.4449) loss 4.2401 (4.0239) grad_norm 1.3129 (1.4846) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][210/625] eta 0:03:06 lr 0.001200 wd 0.0500 time 0.4424 (0.4484) data time 0.0009 (0.0025) model time 0.4415 (0.4470) loss 4.4214 (4.0152) grad_norm 2.0055 (1.4711) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][220/625] eta 0:03:01 lr 0.001200 wd 0.0500 time 0.4439 (0.4482) data time 0.0007 (0.0024) model time 0.4432 (0.4468) loss 4.2560 (4.0287) grad_norm 1.3915 (1.4714) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][230/625] eta 0:02:56 lr 0.001200 wd 0.0500 time 0.4450 (0.4481) data time 0.0007 (0.0023) model time 0.4444 (0.4467) loss 4.4984 (4.0274) grad_norm 1.2806 (1.4766) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][240/625] eta 0:02:52 lr 0.001200 wd 0.0500 time 0.4476 (0.4479) data time 0.0009 (0.0023) model time 0.4467 (0.4465) loss 4.3760 (4.0391) grad_norm 1.4750 (1.4708) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][250/625] eta 0:02:47 lr 0.001200 wd 0.0500 time 0.4416 (0.4478) data time 0.0008 (0.0022) model time 0.4409 (0.4464) loss 4.6707 (4.0406) grad_norm 1.2885 (1.4704) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][260/625] eta 0:02:43 lr 0.001200 wd 0.0500 time 0.4452 (0.4477) data time 0.0007 (0.0022) model time 0.4445 (0.4462) loss 4.3311 (4.0305) grad_norm 2.1458 (1.4879) loss_scale 4096.0000 
(4096.0000) mem 16696MB [2024-08-04 11:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][270/625] eta 0:02:38 lr 0.001200 wd 0.0500 time 0.4420 (0.4475) data time 0.0009 (0.0021) model time 0.4411 (0.4461) loss 3.1711 (4.0129) grad_norm 1.3765 (1.4832) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][280/625] eta 0:02:34 lr 0.001200 wd 0.0500 time 0.4446 (0.4474) data time 0.0007 (0.0021) model time 0.4439 (0.4460) loss 4.4619 (4.0150) grad_norm 1.8324 (1.4733) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][290/625] eta 0:02:29 lr 0.001200 wd 0.0500 time 0.4458 (0.4473) data time 0.0007 (0.0020) model time 0.4451 (0.4459) loss 3.4257 (4.0103) grad_norm 1.1563 (1.4724) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:31:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][300/625] eta 0:02:25 lr 0.001200 wd 0.0500 time 0.4414 (0.4472) data time 0.0009 (0.0020) model time 0.4405 (0.4458) loss 3.0132 (4.0038) grad_norm 1.2037 (1.4728) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][310/625] eta 0:02:20 lr 0.001200 wd 0.0500 time 0.4463 (0.4472) data time 0.0009 (0.0019) model time 0.4454 (0.4458) loss 3.7612 (3.9962) grad_norm 1.0529 (1.4751) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][320/625] eta 0:02:16 lr 0.001200 wd 0.0500 time 0.4488 (0.4472) data time 0.0008 (0.0019) model time 0.4480 (0.4458) loss 3.7396 (3.9787) grad_norm 1.2962 (1.4691) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][330/625] eta 0:02:11 lr 0.001200 wd 0.0500 time 0.4548 (0.4471) data time 0.0007 (0.0019) model time 
0.4541 (0.4458) loss 4.0582 (3.9732) grad_norm 1.4411 (1.4709) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][340/625] eta 0:02:07 lr 0.001200 wd 0.0500 time 0.4430 (0.4471) data time 0.0009 (0.0018) model time 0.4421 (0.4457) loss 3.9271 (3.9695) grad_norm 1.1240 (1.4619) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][350/625] eta 0:02:03 lr 0.001200 wd 0.0500 time 0.4427 (0.4476) data time 0.0010 (0.0018) model time 0.4417 (0.4463) loss 3.5906 (3.9646) grad_norm 0.9441 (1.4569) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][360/625] eta 0:01:58 lr 0.001200 wd 0.0500 time 0.4511 (0.4480) data time 0.0006 (0.0018) model time 0.4505 (0.4468) loss 3.3022 (3.9576) grad_norm 1.3524 (1.4677) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][370/625] eta 0:01:54 lr 0.001200 wd 0.0500 time 0.4402 (0.4479) data time 0.0009 (0.0018) model time 0.4393 (0.4467) loss 3.8316 (3.9565) grad_norm 1.0300 (1.4690) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][380/625] eta 0:01:49 lr 0.001200 wd 0.0500 time 0.4415 (0.4478) data time 0.0006 (0.0017) model time 0.4409 (0.4466) loss 4.2561 (3.9561) grad_norm 1.6656 (1.4708) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][390/625] eta 0:01:45 lr 0.001200 wd 0.0500 time 0.4429 (0.4476) data time 0.0007 (0.0017) model time 0.4422 (0.4464) loss 3.5513 (3.9517) grad_norm 1.2405 (1.4686) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][400/625] eta 0:01:40 
lr 0.001200 wd 0.0500 time 0.4429 (0.4480) data time 0.0007 (0.0017) model time 0.4422 (0.4468) loss 3.1886 (3.9448) grad_norm 1.1693 (1.4633) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][410/625] eta 0:01:36 lr 0.001200 wd 0.0500 time 0.4417 (0.4479) data time 0.0011 (0.0017) model time 0.4406 (0.4467) loss 4.1068 (3.9420) grad_norm 1.2813 (1.4574) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][420/625] eta 0:01:31 lr 0.001199 wd 0.0500 time 0.4424 (0.4477) data time 0.0008 (0.0017) model time 0.4416 (0.4466) loss 4.0653 (3.9487) grad_norm 1.2755 (1.4554) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][430/625] eta 0:01:27 lr 0.001199 wd 0.0500 time 0.4404 (0.4476) data time 0.0006 (0.0016) model time 0.4398 (0.4465) loss 3.0586 (3.9484) grad_norm 1.0809 (1.4578) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][440/625] eta 0:01:22 lr 0.001199 wd 0.0500 time 0.4441 (0.4476) data time 0.0007 (0.0016) model time 0.4434 (0.4464) loss 3.5471 (3.9474) grad_norm 1.3183 (1.4544) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][450/625] eta 0:01:18 lr 0.001199 wd 0.0500 time 0.4404 (0.4475) data time 0.0006 (0.0016) model time 0.4397 (0.4463) loss 4.4262 (3.9428) grad_norm 1.4650 (1.4525) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][460/625] eta 0:01:13 lr 0.001199 wd 0.0500 time 0.4409 (0.4474) data time 0.0008 (0.0016) model time 0.4400 (0.4462) loss 4.4004 (3.9469) grad_norm 1.1169 (1.4499) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:13 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][470/625] eta 0:01:09 lr 0.001199 wd 0.0500 time 0.4400 (0.4473) data time 0.0008 (0.0016) model time 0.4392 (0.4461) loss 4.1158 (3.9482) grad_norm 1.8652 (1.4507) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][480/625] eta 0:01:04 lr 0.001199 wd 0.0500 time 0.4383 (0.4471) data time 0.0007 (0.0015) model time 0.4376 (0.4459) loss 3.4914 (3.9458) grad_norm 1.2564 (1.4505) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][490/625] eta 0:01:00 lr 0.001199 wd 0.0500 time 0.4430 (0.4470) data time 0.0007 (0.0015) model time 0.4424 (0.4458) loss 3.2783 (3.9490) grad_norm 1.6357 (1.4469) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][500/625] eta 0:00:55 lr 0.001199 wd 0.0500 time 0.4439 (0.4469) data time 0.0006 (0.0015) model time 0.4432 (0.4457) loss 4.3035 (3.9499) grad_norm 1.2406 (1.4449) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][510/625] eta 0:00:51 lr 0.001199 wd 0.0500 time 0.4426 (0.4468) data time 0.0008 (0.0015) model time 0.4418 (0.4456) loss 4.1351 (3.9484) grad_norm 1.7870 (1.4503) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][520/625] eta 0:00:46 lr 0.001199 wd 0.0500 time 0.4386 (0.4467) data time 0.0006 (0.0015) model time 0.4381 (0.4455) loss 3.1916 (3.9441) grad_norm 1.0076 (1.4501) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][530/625] eta 0:00:42 lr 0.001199 wd 0.0500 time 0.4470 (0.4467) data time 0.0007 (0.0015) model time 0.4463 (0.4455) loss 4.5198 (3.9462) grad_norm 
1.1205 (1.4487) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][540/625] eta 0:00:37 lr 0.001199 wd 0.0500 time 0.4465 (0.4467) data time 0.0008 (0.0015) model time 0.4458 (0.4455) loss 4.0826 (3.9495) grad_norm 1.1263 (1.4500) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][550/625] eta 0:00:33 lr 0.001199 wd 0.0500 time 0.4421 (0.4466) data time 0.0009 (0.0014) model time 0.4412 (0.4454) loss 3.1790 (3.9461) grad_norm 1.1239 (1.4499) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][560/625] eta 0:00:29 lr 0.001199 wd 0.0500 time 0.4407 (0.4465) data time 0.0006 (0.0014) model time 0.4400 (0.4454) loss 4.3884 (3.9436) grad_norm 1.0790 (1.4519) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:33:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][570/625] eta 0:00:24 lr 0.001199 wd 0.0500 time 0.4429 (0.4465) data time 0.0009 (0.0014) model time 0.4420 (0.4453) loss 4.2389 (3.9431) grad_norm 1.6741 (1.4542) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][580/625] eta 0:00:20 lr 0.001199 wd 0.0500 time 0.4456 (0.4464) data time 0.0007 (0.0014) model time 0.4449 (0.4453) loss 4.3554 (3.9453) grad_norm 1.2431 (1.4520) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][590/625] eta 0:00:15 lr 0.001199 wd 0.0500 time 0.4417 (0.4464) data time 0.0007 (0.0014) model time 0.4409 (0.4453) loss 3.4503 (3.9436) grad_norm 1.9357 (1.4551) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][600/625] eta 0:00:11 lr 0.001199 wd 0.0500 time 0.4427 (0.4464) data 
time 0.0006 (0.0014) model time 0.4421 (0.4452) loss 4.5627 (3.9401) grad_norm 1.2740 (1.4526) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][610/625] eta 0:00:06 lr 0.001199 wd 0.0500 time 0.4407 (0.4463) data time 0.0004 (0.0014) model time 0.4402 (0.4452) loss 4.6150 (3.9421) grad_norm 1.0060 (1.4517) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [23/300][620/625] eta 0:00:02 lr 0.001199 wd 0.0500 time 0.4410 (0.4462) data time 0.0004 (0.0014) model time 0.4406 (0.4451) loss 3.0660 (3.9396) grad_norm 1.0487 (1.4504) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:21 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 23 training takes 0:04:38 [2024-08-04 11:34:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:34:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.8223 (0.8223) Acc@1 80.762 (80.762) Acc@5 95.654 (95.654) Mem 16696MB
[2024-08-04 11:34:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.4727 (1.0085) Acc@1 67.139 (76.181) Acc@5 87.744 (94.247) Mem 16696MB
[2024-08-04 11:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 1.6416 (1.2571) Acc@1 63.184 (70.938) Acc@5 85.693 (90.813) Mem 16696MB
[2024-08-04 11:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 70.853 Acc@5 90.809
[2024-08-04 11:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 70.9%
[2024-08-04 11:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 70.85%
[2024-08-04 11:34:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:34:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 1.1133 (1.1133) Acc@1 72.021 (72.021) Acc@5 90.625 (90.625) Mem 16696MB
[2024-08-04 11:34:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 1.8252 (1.3343) Acc@1 56.152 (66.167) Acc@5 82.178 (88.712) Mem 16696MB
[2024-08-04 11:34:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.9824 (1.5884) Acc@1 53.320 (61.614) Acc@5 78.564 (84.768) Mem 16696MB
[2024-08-04 11:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 61.734 Acc@5 84.975
[2024-08-04 11:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 61.7%
[2024-08-04 11:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 61.73%
[2024-08-04 11:34:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:34:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:34:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][0/625] eta 0:08:09 lr 0.001199 wd 0.0500 time 0.7826 (0.7826) data time 0.3935 (0.3935) model time 0.0000 (0.0000) loss 4.9371 (4.9371) grad_norm 1.7307 (1.7307) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][10/625] eta 0:04:51 lr 0.001199 wd 0.0500 time 0.4431 (0.4741) data time 0.0008 (0.0365) model time 0.0000 (0.0000) loss 4.2559 (4.0695) grad_norm 2.1325 (1.5882) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][20/625] eta 0:04:38 lr 0.001199 wd 0.0500 time 0.4396 (0.4596) data time 0.0010 (0.0195) model time 0.0000 (0.0000) loss 4.3076 (3.9443) grad_norm 1.1661 (1.5240) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][30/625] eta 0:04:34 lr 0.001199 wd 0.0500 time 0.4450 (0.4612) data time 0.0010 (0.0135) model time 0.0000 (0.0000) loss 4.0185 (3.8944) grad_norm 1.3444 (1.4400) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][40/625] eta 0:04:27 lr 0.001199 wd 0.0500 time 0.4441 (0.4572) data time 0.0006 (0.0104) model time 0.0000 (0.0000) loss 3.8462 (3.9058) grad_norm 1.4908 (1.3715) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:34:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][50/625] eta 0:04:21 lr 0.001199 wd 0.0500 time 0.4420 (0.4544) data time 0.0009 (0.0085) model time 0.0000 (0.0000) loss 3.2507 (3.8824) grad_norm 1.3799 (1.3726) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][60/625] eta 0:04:17 lr 0.001199 wd 0.0500 time 0.6518 (0.4558) data time 0.0008 (0.0072) model time 0.6510 (0.4619) loss 3.2194 (3.8682) 
grad_norm 1.0687 (1.3491) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][70/625] eta 0:04:11 lr 0.001199 wd 0.0500 time 0.4421 (0.4532) data time 0.0007 (0.0063) model time 0.4415 (0.4495) loss 3.1531 (3.8315) grad_norm 1.6746 (1.3653) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][80/625] eta 0:04:06 lr 0.001199 wd 0.0500 time 0.4474 (0.4521) data time 0.0007 (0.0056) model time 0.4467 (0.4474) loss 4.0326 (3.8592) grad_norm 1.1104 (1.3842) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][90/625] eta 0:04:01 lr 0.001199 wd 0.0500 time 0.4437 (0.4512) data time 0.0006 (0.0051) model time 0.4431 (0.4463) loss 4.9583 (3.8838) grad_norm 1.4522 (1.3969) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][100/625] eta 0:03:56 lr 0.001199 wd 0.0500 time 0.4422 (0.4503) data time 0.0008 (0.0047) model time 0.4414 (0.4453) loss 3.7064 (3.8792) grad_norm 1.0677 (1.3937) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][110/625] eta 0:03:51 lr 0.001199 wd 0.0500 time 0.4470 (0.4497) data time 0.0007 (0.0043) model time 0.4463 (0.4449) loss 4.5263 (3.8817) grad_norm 1.2077 (1.3927) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][120/625] eta 0:03:46 lr 0.001199 wd 0.0500 time 0.4444 (0.4493) data time 0.0007 (0.0040) model time 0.4437 (0.4448) loss 2.6264 (3.8923) grad_norm 1.1845 (1.4013) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][130/625] eta 0:03:42 lr 0.001199 wd 0.0500 time 0.4444 
(0.4491) data time 0.0009 (0.0038) model time 0.4435 (0.4450) loss 4.1955 (3.9002) grad_norm 1.2784 (1.3956) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][140/625] eta 0:03:37 lr 0.001199 wd 0.0500 time 0.4413 (0.4489) data time 0.0010 (0.0036) model time 0.4403 (0.4450) loss 2.6018 (3.8989) grad_norm 1.1718 (1.3930) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][150/625] eta 0:03:33 lr 0.001199 wd 0.0500 time 0.4475 (0.4487) data time 0.0007 (0.0034) model time 0.4468 (0.4450) loss 4.9891 (3.9200) grad_norm 1.2242 (1.3920) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][160/625] eta 0:03:28 lr 0.001199 wd 0.0500 time 0.4459 (0.4485) data time 0.0009 (0.0032) model time 0.4450 (0.4449) loss 4.3123 (3.9157) grad_norm 1.4685 (1.4011) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][170/625] eta 0:03:23 lr 0.001199 wd 0.0500 time 0.4459 (0.4482) data time 0.0009 (0.0031) model time 0.4451 (0.4448) loss 3.4625 (3.9029) grad_norm 1.4561 (1.4036) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][180/625] eta 0:03:19 lr 0.001199 wd 0.0500 time 0.4444 (0.4481) data time 0.0008 (0.0030) model time 0.4436 (0.4448) loss 4.4120 (3.9192) grad_norm 1.8720 (1.4089) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:35:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][190/625] eta 0:03:14 lr 0.001199 wd 0.0500 time 0.4434 (0.4479) data time 0.0008 (0.0029) model time 0.4426 (0.4447) loss 4.3362 (3.9221) grad_norm 0.9995 (1.3986) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:02 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [24/300][200/625] eta 0:03:10 lr 0.001199 wd 0.0500 time 0.4437 (0.4477) data time 0.0009 (0.0028) model time 0.4429 (0.4446) loss 3.5950 (3.9159) grad_norm 1.0411 (1.3968) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][210/625] eta 0:03:05 lr 0.001199 wd 0.0500 time 0.4461 (0.4474) data time 0.0009 (0.0027) model time 0.4452 (0.4444) loss 4.2933 (3.9152) grad_norm 1.3135 (1.3961) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][220/625] eta 0:03:01 lr 0.001199 wd 0.0500 time 0.4446 (0.4473) data time 0.0008 (0.0026) model time 0.4438 (0.4444) loss 3.8173 (3.9167) grad_norm 1.2301 (1.3934) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][230/625] eta 0:02:56 lr 0.001199 wd 0.0500 time 0.4490 (0.4471) data time 0.0008 (0.0025) model time 0.4482 (0.4443) loss 3.3367 (3.9098) grad_norm 1.3213 (1.3917) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][240/625] eta 0:02:52 lr 0.001199 wd 0.0500 time 0.4398 (0.4470) data time 0.0007 (0.0024) model time 0.4391 (0.4442) loss 3.9840 (3.9106) grad_norm 1.4610 (1.4007) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][250/625] eta 0:02:47 lr 0.001199 wd 0.0500 time 0.4453 (0.4469) data time 0.0009 (0.0024) model time 0.4444 (0.4442) loss 3.6053 (3.9191) grad_norm 1.1798 (1.4007) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][260/625] eta 0:02:43 lr 0.001199 wd 0.0500 time 0.4454 (0.4468) data time 0.0009 (0.0023) model time 0.4445 (0.4442) loss 3.5150 (3.9119) grad_norm 2.0962 (1.4123) loss_scale 4096.0000 
(4096.0000) mem 16696MB [2024-08-04 11:36:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][270/625] eta 0:02:38 lr 0.001199 wd 0.0500 time 0.4418 (0.4467) data time 0.0008 (0.0023) model time 0.4410 (0.4441) loss 2.7562 (3.9120) grad_norm 1.4811 (1.4079) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][280/625] eta 0:02:34 lr 0.001199 wd 0.0500 time 0.4436 (0.4466) data time 0.0007 (0.0022) model time 0.4430 (0.4440) loss 3.3061 (3.9175) grad_norm 1.5676 (1.4093) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][290/625] eta 0:02:29 lr 0.001199 wd 0.0500 time 0.4430 (0.4465) data time 0.0008 (0.0022) model time 0.4423 (0.4440) loss 4.5111 (3.9163) grad_norm 1.4911 (1.4175) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][300/625] eta 0:02:25 lr 0.001199 wd 0.0500 time 0.4451 (0.4464) data time 0.0009 (0.0021) model time 0.4442 (0.4440) loss 3.7339 (3.9255) grad_norm 1.4656 (1.4272) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][310/625] eta 0:02:20 lr 0.001199 wd 0.0500 time 0.4432 (0.4463) data time 0.0008 (0.0021) model time 0.4423 (0.4439) loss 4.1509 (3.9175) grad_norm 0.9338 (1.4261) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][320/625] eta 0:02:16 lr 0.001199 wd 0.0500 time 0.4426 (0.4463) data time 0.0008 (0.0020) model time 0.4418 (0.4439) loss 4.1936 (3.9172) grad_norm 1.2689 (1.4239) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][330/625] eta 0:02:11 lr 0.001199 wd 0.0500 time 0.4417 (0.4462) data time 0.0007 (0.0020) model time 
0.4410 (0.4439) loss 3.0021 (3.9044) grad_norm 1.3910 (1.4211) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][340/625] eta 0:02:07 lr 0.001199 wd 0.0500 time 0.4450 (0.4461) data time 0.0009 (0.0020) model time 0.4442 (0.4439) loss 3.8100 (3.9064) grad_norm 1.1218 (1.4271) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][350/625] eta 0:02:02 lr 0.001199 wd 0.0500 time 0.4468 (0.4461) data time 0.0006 (0.0019) model time 0.4462 (0.4438) loss 4.4626 (3.9042) grad_norm 1.1880 (1.4247) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][360/625] eta 0:01:58 lr 0.001199 wd 0.0500 time 0.4451 (0.4460) data time 0.0008 (0.0019) model time 0.4443 (0.4438) loss 4.2307 (3.9070) grad_norm 1.3431 (1.4201) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][370/625] eta 0:01:53 lr 0.001199 wd 0.0500 time 0.4460 (0.4466) data time 0.0007 (0.0019) model time 0.4453 (0.4445) loss 4.1979 (3.9109) grad_norm 1.4811 (1.4212) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][380/625] eta 0:01:49 lr 0.001199 wd 0.0500 time 0.4452 (0.4465) data time 0.0006 (0.0019) model time 0.4445 (0.4444) loss 3.8777 (3.9149) grad_norm 1.6274 (1.4207) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][390/625] eta 0:01:44 lr 0.001199 wd 0.0500 time 0.4422 (0.4464) data time 0.0006 (0.0018) model time 0.4416 (0.4444) loss 3.0891 (3.9112) grad_norm 1.4716 (1.4210) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][400/625] eta 0:01:40 
lr 0.001199 wd 0.0500 time 0.4425 (0.4467) data time 0.0006 (0.0018) model time 0.4418 (0.4447) loss 3.2545 (3.9048) grad_norm 1.0769 (1.4258) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][410/625] eta 0:01:36 lr 0.001199 wd 0.0500 time 0.4469 (0.4467) data time 0.0008 (0.0018) model time 0.4461 (0.4448) loss 3.6747 (3.9020) grad_norm 1.8070 (1.4271) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][420/625] eta 0:01:31 lr 0.001199 wd 0.0500 time 0.4554 (0.4466) data time 0.0008 (0.0018) model time 0.4546 (0.4447) loss 3.9216 (3.9049) grad_norm 1.6820 (1.4274) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][430/625] eta 0:01:27 lr 0.001199 wd 0.0500 time 0.4468 (0.4466) data time 0.0008 (0.0017) model time 0.4460 (0.4447) loss 3.1791 (3.8982) grad_norm 1.7964 (1.4248) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][440/625] eta 0:01:22 lr 0.001199 wd 0.0500 time 0.4452 (0.4465) data time 0.0008 (0.0017) model time 0.4444 (0.4447) loss 4.2169 (3.8992) grad_norm 1.1357 (1.4217) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][450/625] eta 0:01:18 lr 0.001199 wd 0.0500 time 0.4465 (0.4464) data time 0.0008 (0.0017) model time 0.4457 (0.4446) loss 4.3838 (3.8967) grad_norm 1.3076 (1.4244) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][460/625] eta 0:01:13 lr 0.001199 wd 0.0500 time 0.4401 (0.4464) data time 0.0009 (0.0017) model time 0.4392 (0.4445) loss 2.5744 (3.8935) grad_norm 1.3440 (1.4226) loss_scale 4096.0000 (4096.0000) mem 16696MB [2024-08-04 11:38:03 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][470/625] eta 0:01:09 lr 0.001199 wd 0.0500 time 0.4464 (0.4463) data time 0.0006 (0.0017) model time 0.4458 (0.4445) loss 3.3314 (3.8872) grad_norm 1.4105 (1.4299) loss_scale 8192.0000 (4139.4820) mem 16696MB [2024-08-04 11:38:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][480/625] eta 0:01:04 lr 0.001199 wd 0.0500 time 0.4432 (0.4463) data time 0.0008 (0.0016) model time 0.4424 (0.4445) loss 3.6790 (3.8839) grad_norm 1.2834 (1.4277) loss_scale 8192.0000 (4223.7339) mem 16696MB [2024-08-04 11:38:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][490/625] eta 0:01:00 lr 0.001199 wd 0.0500 time 0.4419 (0.4463) data time 0.0007 (0.0016) model time 0.4412 (0.4445) loss 4.2844 (3.8891) grad_norm 1.4788 (1.4292) loss_scale 8192.0000 (4304.5540) mem 16696MB [2024-08-04 11:38:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][500/625] eta 0:00:55 lr 0.001199 wd 0.0500 time 0.4496 (0.4462) data time 0.0008 (0.0016) model time 0.4488 (0.4445) loss 2.8280 (3.8871) grad_norm 1.2638 (1.4326) loss_scale 8192.0000 (4382.1477) mem 16696MB [2024-08-04 11:38:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][510/625] eta 0:00:51 lr 0.001199 wd 0.0500 time 0.4437 (0.4462) data time 0.0008 (0.0016) model time 0.4429 (0.4445) loss 3.8219 (3.8891) grad_norm 1.0657 (1.4304) loss_scale 8192.0000 (4456.7045) mem 16696MB [2024-08-04 11:38:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][520/625] eta 0:00:46 lr 0.001199 wd 0.0500 time 0.4402 (0.4462) data time 0.0008 (0.0016) model time 0.4394 (0.4445) loss 4.4538 (3.8927) grad_norm 1.1707 (1.4296) loss_scale 8192.0000 (4528.3992) mem 16696MB [2024-08-04 11:38:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][530/625] eta 0:00:42 lr 0.001199 wd 0.0500 time 0.4423 (0.4462) data time 0.0007 (0.0016) model time 0.4417 (0.4445) loss 2.2316 (3.8889) grad_norm 
1.6285 (1.4305) loss_scale 8192.0000 (4597.3936) mem 16696MB [2024-08-04 11:38:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][540/625] eta 0:00:37 lr 0.001199 wd 0.0500 time 0.4425 (0.4461) data time 0.0008 (0.0015) model time 0.4417 (0.4444) loss 4.2943 (3.8906) grad_norm 1.3979 (1.4301) loss_scale 8192.0000 (4663.8373) mem 16696MB [2024-08-04 11:38:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][550/625] eta 0:00:33 lr 0.001199 wd 0.0500 time 0.4453 (0.4461) data time 0.0006 (0.0015) model time 0.4447 (0.4444) loss 3.1125 (3.8913) grad_norm 1.1141 (1.4280) loss_scale 8192.0000 (4727.8693) mem 16696MB [2024-08-04 11:38:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][560/625] eta 0:00:29 lr 0.001199 wd 0.0500 time 0.4454 (0.4464) data time 0.0008 (0.0015) model time 0.4446 (0.4448) loss 3.6053 (3.8934) grad_norm 1.4072 (1.4314) loss_scale 8192.0000 (4789.6185) mem 16696MB [2024-08-04 11:38:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][570/625] eta 0:00:24 lr 0.001199 wd 0.0500 time 0.4440 (0.4463) data time 0.0007 (0.0015) model time 0.4433 (0.4447) loss 4.1132 (3.8995) grad_norm 1.2828 (1.4289) loss_scale 8192.0000 (4849.2049) mem 16696MB [2024-08-04 11:38:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][580/625] eta 0:00:20 lr 0.001199 wd 0.0500 time 0.4494 (0.4463) data time 0.0006 (0.0015) model time 0.4487 (0.4447) loss 3.1301 (3.8975) grad_norm 1.2670 (1.4266) loss_scale 8192.0000 (4906.7401) mem 16696MB [2024-08-04 11:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][590/625] eta 0:00:15 lr 0.001199 wd 0.0500 time 0.4458 (0.4464) data time 0.0008 (0.0015) model time 0.4450 (0.4448) loss 3.3884 (3.8957) grad_norm 1.4450 (1.4237) loss_scale 8192.0000 (4962.3283) mem 16696MB [2024-08-04 11:39:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][600/625] eta 0:00:11 lr 0.001199 wd 0.0500 time 0.4429 (0.4463) data 
time 0.0008 (0.0015) model time 0.4421 (0.4448) loss 3.8953 (3.8929) grad_norm 1.3130 (1.4246) loss_scale 8192.0000 (5016.0666) mem 16696MB [2024-08-04 11:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][610/625] eta 0:00:06 lr 0.001199 wd 0.0500 time 0.4403 (0.4463) data time 0.0004 (0.0015) model time 0.4398 (0.4447) loss 2.9974 (3.8917) grad_norm 1.8875 (1.4263) loss_scale 8192.0000 (5068.0458) mem 16696MB [2024-08-04 11:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [24/300][620/625] eta 0:00:02 lr 0.001199 wd 0.0500 time 0.4402 (0.4462) data time 0.0004 (0.0014) model time 0.4397 (0.4446) loss 5.0745 (3.8951) grad_norm 1.1386 (1.4241) loss_scale 8192.0000 (5118.3510) mem 16696MB [2024-08-04 11:39:11 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 24 training takes 0:04:38 [2024-08-04 11:39:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:39:13 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.7930 (0.7930) Acc@1 81.055 (81.055) Acc@5 95.850 (95.850) Mem 16696MB
[2024-08-04 11:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.123 (0.152) Loss 1.3965 (0.9776) Acc@1 66.748 (76.496) Acc@5 89.697 (94.256) Mem 16696MB
[2024-08-04 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 1.5674 (1.2134) Acc@1 63.477 (71.517) Acc@5 87.012 (91.188) Mem 16696MB
[2024-08-04 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.345 Acc@5 91.105
[2024-08-04 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 71.3%
[2024-08-04 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 71.35%
[2024-08-04 11:39:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 11:39:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 11:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 1.0098 (1.0098) Acc@1 74.023 (74.023) Acc@5 91.699 (91.699) Mem 16696MB
[2024-08-04 11:39:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.7051 (1.2292) Acc@1 58.789 (68.364) Acc@5 83.545 (89.906) Mem 16696MB
[2024-08-04 11:39:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.8691 (1.4815) Acc@1 55.273 (63.783) Acc@5 80.176 (86.121) Mem 16696MB
[2024-08-04 11:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 63.884 Acc@5 86.286
[2024-08-04 11:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 63.9%
[2024-08-04 11:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 63.88%
[2024-08-04 11:39:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 11:39:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 11:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][0/625] eta 0:08:24 lr 0.001199 wd 0.0500 time 0.8065 (0.8065) data time 0.4209 (0.4209) model time 0.0000 (0.0000) loss 4.4062 (4.4062) grad_norm 1.5690 (1.5690) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][10/625] eta 0:04:55 lr 0.001199 wd 0.0500 time 0.4441 (0.4802) data time 0.0006 (0.0390) model time 0.0000 (0.0000) loss 4.1420 (3.7996) grad_norm 1.6500 (1.8296) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][20/625] eta 0:04:40 lr 0.001199 wd 0.0500 time 0.4481 (0.4640) data time 0.0006 (0.0208) model time 0.0000 (0.0000) loss 2.6213 (3.6921) grad_norm 1.2186 (1.6874) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][30/625] eta 0:04:32 lr 0.001199 wd 0.0500 time 0.4451 (0.4581) data time 0.0010 (0.0144) model time 0.0000 (0.0000) loss 3.9539 (3.7275) grad_norm 1.1560 (1.4940) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][40/625] eta 0:04:25 lr 0.001199 wd 0.0500 time 0.4545 (0.4547) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 3.9519 (3.7391) grad_norm 3.5272 (1.5234) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][50/625] eta 0:04:20 lr 0.001199 wd 0.0500 time 0.4457 (0.4527) data time 0.0009 (0.0091) model time 0.0000 (0.0000) loss 3.6284 (3.8139) grad_norm 1.0946 (1.4716) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][60/625] eta 0:04:17 lr 0.001199 wd 0.0500 time 0.6497 (0.4550) data time 0.0008 (0.0077) model time 0.6489 (0.4660) loss 4.1287 (3.7652) 
grad_norm 1.0902 (1.4484) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][70/625] eta 0:04:11 lr 0.001199 wd 0.0500 time 0.4450 (0.4526) data time 0.0008 (0.0067) model time 0.4443 (0.4515) loss 3.8668 (3.7851) grad_norm 2.0023 (1.4405) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:39:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][80/625] eta 0:04:06 lr 0.001199 wd 0.0500 time 0.4474 (0.4515) data time 0.0007 (0.0060) model time 0.4467 (0.4485) loss 4.6223 (3.7958) grad_norm 1.0148 (1.4389) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][90/625] eta 0:04:01 lr 0.001199 wd 0.0500 time 0.4427 (0.4506) data time 0.0010 (0.0054) model time 0.4417 (0.4471) loss 4.2398 (3.7976) grad_norm 1.1977 (1.4302) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][100/625] eta 0:03:56 lr 0.001199 wd 0.0500 time 0.4445 (0.4500) data time 0.0006 (0.0050) model time 0.4440 (0.4463) loss 3.0013 (3.8021) grad_norm 2.1134 (1.4464) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][110/625] eta 0:03:51 lr 0.001199 wd 0.0500 time 0.4403 (0.4495) data time 0.0007 (0.0046) model time 0.4396 (0.4459) loss 4.9152 (3.8320) grad_norm 1.2579 (1.4388) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][120/625] eta 0:03:46 lr 0.001199 wd 0.0500 time 0.4431 (0.4492) data time 0.0006 (0.0043) model time 0.4425 (0.4459) loss 3.1186 (3.8280) grad_norm 1.3426 (1.4305) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][130/625] eta 0:03:42 lr 0.001199 wd 0.0500 time 0.4480 
(0.4503) data time 0.0009 (0.0040) model time 0.4471 (0.4479) loss 3.8219 (3.8369) grad_norm 1.9360 (1.4333) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][140/625] eta 0:03:38 lr 0.001199 wd 0.0500 time 0.4422 (0.4498) data time 0.0008 (0.0038) model time 0.4415 (0.4473) loss 3.0755 (3.8165) grad_norm 1.2726 (1.4344) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][150/625] eta 0:03:33 lr 0.001199 wd 0.0500 time 0.4432 (0.4494) data time 0.0006 (0.0036) model time 0.4426 (0.4469) loss 2.8635 (3.8326) grad_norm 1.2206 (1.4264) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][160/625] eta 0:03:28 lr 0.001199 wd 0.0500 time 0.4439 (0.4490) data time 0.0006 (0.0034) model time 0.4433 (0.4465) loss 4.3674 (3.8183) grad_norm 1.1275 (1.4207) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][170/625] eta 0:03:24 lr 0.001199 wd 0.0500 time 0.4458 (0.4487) data time 0.0008 (0.0033) model time 0.4450 (0.4462) loss 3.8281 (3.8300) grad_norm 1.4512 (1.4318) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][180/625] eta 0:03:19 lr 0.001199 wd 0.0500 time 0.4446 (0.4485) data time 0.0008 (0.0031) model time 0.4438 (0.4460) loss 4.2858 (3.8333) grad_norm 1.5499 (1.4223) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][190/625] eta 0:03:14 lr 0.001199 wd 0.0500 time 0.4427 (0.4483) data time 0.0008 (0.0030) model time 0.4419 (0.4459) loss 3.7690 (3.8253) grad_norm 0.9412 (1.4086) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:52 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [25/300][200/625] eta 0:03:10 lr 0.001199 wd 0.0500 time 0.4443 (0.4480) data time 0.0009 (0.0029) model time 0.4435 (0.4456) loss 3.8789 (3.8359) grad_norm 1.5017 (1.4113) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][210/625] eta 0:03:05 lr 0.001199 wd 0.0500 time 0.4431 (0.4477) data time 0.0007 (0.0028) model time 0.4424 (0.4454) loss 4.5788 (3.8386) grad_norm 1.6843 (1.4137) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][220/625] eta 0:03:01 lr 0.001199 wd 0.0500 time 0.4436 (0.4475) data time 0.0006 (0.0027) model time 0.4429 (0.4452) loss 4.5957 (3.8444) grad_norm 1.1360 (1.4064) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][230/625] eta 0:02:56 lr 0.001199 wd 0.0500 time 0.4436 (0.4473) data time 0.0008 (0.0026) model time 0.4428 (0.4450) loss 4.1574 (3.8425) grad_norm 1.9853 (1.4123) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][240/625] eta 0:02:52 lr 0.001199 wd 0.0500 time 0.4394 (0.4471) data time 0.0007 (0.0025) model time 0.4388 (0.4448) loss 4.3621 (3.8482) grad_norm 2.6665 (1.4230) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][250/625] eta 0:02:47 lr 0.001199 wd 0.0500 time 0.4444 (0.4470) data time 0.0008 (0.0025) model time 0.4436 (0.4448) loss 4.8287 (3.8574) grad_norm 1.0777 (1.4191) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][260/625] eta 0:02:43 lr 0.001199 wd 0.0500 time 0.4441 (0.4469) data time 0.0008 (0.0024) model time 0.4433 (0.4447) loss 4.4896 (3.8685) grad_norm 1.1820 (1.4132) loss_scale 8192.0000 
(8192.0000) mem 16696MB [2024-08-04 11:41:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][270/625] eta 0:02:38 lr 0.001199 wd 0.0500 time 0.4473 (0.4468) data time 0.0007 (0.0024) model time 0.4467 (0.4447) loss 2.9231 (3.8670) grad_norm 1.3559 (1.4125) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][280/625] eta 0:02:34 lr 0.001199 wd 0.0500 time 0.4452 (0.4468) data time 0.0006 (0.0023) model time 0.4446 (0.4447) loss 4.6604 (3.8730) grad_norm 1.6427 (1.4184) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][290/625] eta 0:02:29 lr 0.001199 wd 0.0500 time 0.4410 (0.4466) data time 0.0009 (0.0022) model time 0.4401 (0.4446) loss 3.0844 (3.8620) grad_norm 1.3094 (1.4166) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][300/625] eta 0:02:25 lr 0.001199 wd 0.0500 time 0.4441 (0.4466) data time 0.0006 (0.0022) model time 0.4435 (0.4445) loss 2.5766 (3.8477) grad_norm 2.7768 (1.4230) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][310/625] eta 0:02:20 lr 0.001199 wd 0.0500 time 0.4460 (0.4465) data time 0.0010 (0.0022) model time 0.4450 (0.4445) loss 3.9477 (3.8407) grad_norm 1.1500 (1.4252) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][320/625] eta 0:02:16 lr 0.001199 wd 0.0500 time 0.4422 (0.4464) data time 0.0007 (0.0021) model time 0.4416 (0.4444) loss 2.9584 (3.8417) grad_norm 1.4127 (1.4273) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][330/625] eta 0:02:11 lr 0.001199 wd 0.0500 time 0.4433 (0.4463) data time 0.0009 (0.0021) model time 
0.4424 (0.4444) loss 4.3062 (3.8381) grad_norm 1.3276 (1.4212) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][340/625] eta 0:02:07 lr 0.001199 wd 0.0500 time 0.4422 (0.4463) data time 0.0008 (0.0020) model time 0.4414 (0.4443) loss 4.2819 (3.8385) grad_norm 1.0892 (1.4144) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:41:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][350/625] eta 0:02:02 lr 0.001199 wd 0.0500 time 0.4460 (0.4467) data time 0.0006 (0.0020) model time 0.4454 (0.4448) loss 4.5371 (3.8380) grad_norm 1.5664 (1.4138) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][360/625] eta 0:01:58 lr 0.001199 wd 0.0500 time 0.4466 (0.4466) data time 0.0006 (0.0020) model time 0.4459 (0.4448) loss 3.7609 (3.8253) grad_norm 1.0917 (1.4140) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][370/625] eta 0:01:53 lr 0.001199 wd 0.0500 time 0.4422 (0.4466) data time 0.0008 (0.0019) model time 0.4414 (0.4448) loss 4.7531 (3.8286) grad_norm 1.5564 (1.4138) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][380/625] eta 0:01:49 lr 0.001199 wd 0.0500 time 0.4412 (0.4465) data time 0.0006 (0.0019) model time 0.4406 (0.4447) loss 2.9658 (3.8317) grad_norm 1.7269 (1.4163) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][390/625] eta 0:01:44 lr 0.001199 wd 0.0500 time 0.4454 (0.4465) data time 0.0007 (0.0019) model time 0.4446 (0.4447) loss 4.7126 (3.8220) grad_norm 1.4138 (1.4162) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][400/625] eta 0:01:40 
lr 0.001199 wd 0.0500 time 0.4434 (0.4468) data time 0.0007 (0.0019) model time 0.4427 (0.4451) loss 3.0842 (3.8245) grad_norm 1.4728 (1.4170) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][410/625] eta 0:01:36 lr 0.001199 wd 0.0500 time 0.4467 (0.4468) data time 0.0008 (0.0018) model time 0.4460 (0.4451) loss 4.0562 (3.8260) grad_norm 1.0732 (1.4155) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][420/625] eta 0:01:31 lr 0.001199 wd 0.0500 time 0.4435 (0.4468) data time 0.0009 (0.0018) model time 0.4426 (0.4451) loss 4.2762 (3.8266) grad_norm 1.3819 (1.4161) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][430/625] eta 0:01:27 lr 0.001199 wd 0.0500 time 0.4457 (0.4468) data time 0.0009 (0.0018) model time 0.4448 (0.4451) loss 3.4463 (3.8235) grad_norm 1.7524 (1.4151) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][440/625] eta 0:01:22 lr 0.001199 wd 0.0500 time 0.4418 (0.4467) data time 0.0009 (0.0018) model time 0.4408 (0.4451) loss 4.1788 (3.8247) grad_norm 1.4797 (1.4171) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][450/625] eta 0:01:18 lr 0.001199 wd 0.0500 time 0.4443 (0.4466) data time 0.0010 (0.0017) model time 0.4433 (0.4451) loss 3.8443 (3.8262) grad_norm 0.9449 (1.4147) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][460/625] eta 0:01:13 lr 0.001199 wd 0.0500 time 0.4458 (0.4466) data time 0.0009 (0.0017) model time 0.4450 (0.4450) loss 4.0979 (3.8293) grad_norm 1.9300 (1.4152) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:53 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][470/625] eta 0:01:09 lr 0.001199 wd 0.0500 time 0.4428 (0.4465) data time 0.0009 (0.0017) model time 0.4419 (0.4450) loss 4.1190 (3.8266) grad_norm 1.2859 (1.4109) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:42:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][480/625] eta 0:01:04 lr 0.001199 wd 0.0500 time 0.4502 (0.4465) data time 0.0007 (0.0017) model time 0.4496 (0.4449) loss 3.0983 (3.8226) grad_norm 1.7609 (1.4111) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][490/625] eta 0:01:00 lr 0.001199 wd 0.0500 time 0.4421 (0.4464) data time 0.0008 (0.0017) model time 0.4414 (0.4448) loss 2.8136 (3.8209) grad_norm 1.3309 (1.4123) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][500/625] eta 0:00:55 lr 0.001199 wd 0.0500 time 0.4420 (0.4467) data time 0.0008 (0.0017) model time 0.4412 (0.4453) loss 3.2720 (3.8241) grad_norm 1.4310 (1.4131) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][510/625] eta 0:00:51 lr 0.001199 wd 0.0500 time 0.4406 (0.4467) data time 0.0008 (0.0016) model time 0.4398 (0.4452) loss 4.2195 (3.8272) grad_norm 1.4077 (1.4125) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][520/625] eta 0:00:46 lr 0.001199 wd 0.0500 time 0.4443 (0.4466) data time 0.0007 (0.0016) model time 0.4436 (0.4452) loss 4.6313 (3.8346) grad_norm 1.2886 (1.4148) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][530/625] eta 0:00:42 lr 0.001199 wd 0.0500 time 0.4451 (0.4466) data time 0.0008 (0.0016) model time 0.4442 (0.4451) loss 3.8949 (3.8338) grad_norm 
1.7452 (1.4164) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][540/625] eta 0:00:37 lr 0.001199 wd 0.0500 time 0.4473 (0.4465) data time 0.0006 (0.0016) model time 0.4467 (0.4451) loss 4.4760 (3.8364) grad_norm 1.3066 (1.4177) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][550/625] eta 0:00:33 lr 0.001199 wd 0.0500 time 0.4452 (0.4465) data time 0.0008 (0.0016) model time 0.4444 (0.4450) loss 3.4675 (3.8259) grad_norm 2.1638 (1.4177) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][560/625] eta 0:00:29 lr 0.001199 wd 0.0500 time 0.4416 (0.4464) data time 0.0007 (0.0016) model time 0.4410 (0.4450) loss 3.3634 (3.8236) grad_norm 1.0669 (1.4131) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][570/625] eta 0:00:24 lr 0.001199 wd 0.0500 time 0.4442 (0.4464) data time 0.0006 (0.0015) model time 0.4436 (0.4449) loss 4.8290 (3.8265) grad_norm 1.1074 (1.4124) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][580/625] eta 0:00:20 lr 0.001199 wd 0.0500 time 0.4448 (0.4463) data time 0.0007 (0.0015) model time 0.4442 (0.4449) loss 3.9912 (3.8290) grad_norm 1.2590 (1.4117) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][590/625] eta 0:00:15 lr 0.001199 wd 0.0500 time 0.4459 (0.4463) data time 0.0008 (0.0015) model time 0.4451 (0.4449) loss 4.0679 (3.8269) grad_norm 1.4461 (1.4086) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][600/625] eta 0:00:11 lr 0.001199 wd 0.0500 time 0.4412 (0.4463) data 
time 0.0008 (0.0015) model time 0.4404 (0.4449) loss 4.2763 (3.8294) grad_norm 1.1987 (1.4082) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][610/625] eta 0:00:06 lr 0.001199 wd 0.0500 time 0.4370 (0.4462) data time 0.0004 (0.0015) model time 0.4365 (0.4448) loss 2.7224 (3.8274) grad_norm 1.4196 (1.4116) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:43:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [25/300][620/625] eta 0:00:02 lr 0.001199 wd 0.0500 time 0.4388 (0.4461) data time 0.0006 (0.0015) model time 0.4382 (0.4447) loss 2.8609 (3.8267) grad_norm 1.5517 (1.4125) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 25 training takes 0:04:38 [2024-08-04 11:44:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:44:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:44:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.8110 (0.8110) Acc@1 80.957 (80.957) Acc@5 95.898 (95.898) Mem 16696MB [2024-08-04 11:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 1.3965 (0.9947) Acc@1 66.699 (76.123) Acc@5 88.916 (94.283) Mem 16696MB [2024-08-04 11:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.5605 (1.2095) Acc@1 62.549 (71.536) Acc@5 86.768 (91.223) Mem 16696MB [2024-08-04 11:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.367 Acc@5 91.157 [2024-08-04 11:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 71.4% [2024-08-04 11:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 71.37% [2024-08-04 11:44:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 11:44:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 11:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.9287 (0.9287) Acc@1 76.074 (76.074) Acc@5 92.871 (92.871) Mem 16696MB [2024-08-04 11:44:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.6035 (1.1444) Acc@1 61.084 (70.184) Acc@5 85.156 (90.985) Mem 16696MB [2024-08-04 11:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.7695 (1.3930) Acc@1 57.617 (65.602) Acc@5 81.689 (87.388) Mem 16696MB [2024-08-04 11:44:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 65.639 Acc@5 87.496 [2024-08-04 11:44:11 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 65.6% [2024-08-04 11:44:11 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 65.64% [2024-08-04 11:44:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 11:44:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 11:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][0/625] eta 0:07:46 lr 0.001199 wd 0.0500 time 0.7469 (0.7469) data time 0.3508 (0.3508) model time 0.0000 (0.0000) loss 4.0797 (4.0797) grad_norm 2.2341 (2.2341) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][10/625] eta 0:04:49 lr 0.001199 wd 0.0500 time 0.4455 (0.4706) data time 0.0007 (0.0326) model time 0.0000 (0.0000) loss 3.8988 (3.9188) grad_norm 1.8447 (1.6432) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][20/625] eta 0:04:37 lr 0.001199 wd 0.0500 time 0.4418 (0.4581) data time 0.0007 (0.0175) model time 0.0000 (0.0000) loss 3.3509 (3.7719) grad_norm 1.1626 (1.5345) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][30/625] eta 0:04:29 lr 0.001199 wd 0.0500 time 0.4436 (0.4533) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 4.1069 (3.7221) grad_norm 1.2827 (1.4976) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][40/625] eta 0:04:23 lr 0.001199 wd 0.0500 time 0.4448 (0.4510) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 3.5339 (3.7530) grad_norm 1.5427 (1.4567) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][50/625] eta 0:04:18 lr 0.001199 wd 0.0500 time 0.4404 (0.4495) data time 0.0009 (0.0077) model time 0.0000 (0.0000) loss 4.2492 (3.7438) grad_norm 1.2107 (1.4334) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][60/625] eta 0:04:15 lr 0.001199 wd 0.0500 time 0.6296 (0.4515) data time 0.0009 (0.0066) model time 0.6287 (0.4608) loss 2.6068 (3.7273) 
grad_norm 1.1859 (1.4080) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][70/625] eta 0:04:09 lr 0.001199 wd 0.0500 time 0.4420 (0.4495) data time 0.0005 (0.0057) model time 0.4414 (0.4488) loss 4.2108 (3.7524) grad_norm 1.1036 (1.3892) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][80/625] eta 0:04:04 lr 0.001199 wd 0.0500 time 0.4422 (0.4489) data time 0.0008 (0.0051) model time 0.4414 (0.4471) loss 4.5786 (3.7658) grad_norm 1.7568 (1.4011) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][90/625] eta 0:04:01 lr 0.001199 wd 0.0500 time 0.4452 (0.4507) data time 0.0009 (0.0046) model time 0.4443 (0.4513) loss 4.1435 (3.7642) grad_norm 1.0636 (1.4080) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][100/625] eta 0:03:56 lr 0.001199 wd 0.0500 time 0.4415 (0.4500) data time 0.0006 (0.0042) model time 0.4409 (0.4497) loss 4.6403 (3.7516) grad_norm 1.6437 (1.3986) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][110/625] eta 0:03:51 lr 0.001199 wd 0.0500 time 0.4438 (0.4495) data time 0.0007 (0.0039) model time 0.4431 (0.4487) loss 3.7586 (3.7495) grad_norm 1.2852 (1.3984) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][120/625] eta 0:03:46 lr 0.001199 wd 0.0500 time 0.4425 (0.4491) data time 0.0007 (0.0037) model time 0.4418 (0.4480) loss 3.3599 (3.7403) grad_norm 1.3236 (1.3943) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][130/625] eta 0:03:42 lr 0.001199 wd 0.0500 time 0.4415 
(0.4486) data time 0.0006 (0.0034) model time 0.4408 (0.4473) loss 2.8347 (3.7417) grad_norm 1.0994 (1.4046) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][140/625] eta 0:03:37 lr 0.001199 wd 0.0500 time 0.4439 (0.4482) data time 0.0009 (0.0033) model time 0.4431 (0.4467) loss 4.4205 (3.7385) grad_norm 1.3604 (1.4038) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][150/625] eta 0:03:32 lr 0.001199 wd 0.0500 time 0.4433 (0.4479) data time 0.0008 (0.0031) model time 0.4425 (0.4464) loss 2.9486 (3.7336) grad_norm 0.9678 (1.4029) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][160/625] eta 0:03:28 lr 0.001199 wd 0.0500 time 0.4430 (0.4477) data time 0.0008 (0.0029) model time 0.4421 (0.4461) loss 4.2480 (3.7497) grad_norm 1.3649 (1.4112) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][170/625] eta 0:03:23 lr 0.001199 wd 0.0500 time 0.4458 (0.4474) data time 0.0009 (0.0028) model time 0.4449 (0.4458) loss 3.8828 (3.7415) grad_norm 1.2673 (1.4066) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][180/625] eta 0:03:19 lr 0.001199 wd 0.0500 time 0.4493 (0.4473) data time 0.0007 (0.0027) model time 0.4486 (0.4457) loss 4.1447 (3.7485) grad_norm 2.3907 (1.4162) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][190/625] eta 0:03:14 lr 0.001199 wd 0.0500 time 0.4413 (0.4472) data time 0.0006 (0.0026) model time 0.4406 (0.4456) loss 4.9257 (3.7653) grad_norm 1.8808 (1.4219) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [26/300][200/625] eta 0:03:09 lr 0.001199 wd 0.0500 time 0.4439 (0.4470) data time 0.0008 (0.0025) model time 0.4431 (0.4454) loss 4.3506 (3.7664) grad_norm 1.2974 (1.4118) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][210/625] eta 0:03:05 lr 0.001198 wd 0.0500 time 0.4477 (0.4469) data time 0.0006 (0.0024) model time 0.4471 (0.4453) loss 4.5080 (3.7534) grad_norm 1.0530 (1.4132) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][220/625] eta 0:03:00 lr 0.001198 wd 0.0500 time 0.4413 (0.4467) data time 0.0006 (0.0024) model time 0.4407 (0.4452) loss 3.5127 (3.7511) grad_norm 1.0657 (1.4235) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:45:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][230/625] eta 0:02:56 lr 0.001198 wd 0.0500 time 0.4412 (0.4466) data time 0.0008 (0.0023) model time 0.4404 (0.4450) loss 3.3635 (3.7525) grad_norm 1.3707 (1.4308) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][240/625] eta 0:02:51 lr 0.001198 wd 0.0500 time 0.4427 (0.4465) data time 0.0008 (0.0022) model time 0.4419 (0.4449) loss 4.0820 (3.7494) grad_norm 1.7892 (1.4371) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][250/625] eta 0:02:47 lr 0.001198 wd 0.0500 time 0.4405 (0.4463) data time 0.0006 (0.0022) model time 0.4399 (0.4448) loss 4.3284 (3.7595) grad_norm 1.6767 (1.4356) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][260/625] eta 0:02:42 lr 0.001198 wd 0.0500 time 0.4419 (0.4462) data time 0.0009 (0.0021) model time 0.4409 (0.4446) loss 3.8496 (3.7697) grad_norm 1.5999 (1.4439) loss_scale 8192.0000 
(8192.0000) mem 16696MB [2024-08-04 11:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][270/625] eta 0:02:38 lr 0.001198 wd 0.0500 time 0.4484 (0.4461) data time 0.0006 (0.0021) model time 0.4477 (0.4445) loss 4.5639 (3.7743) grad_norm 1.3948 (1.4419) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][280/625] eta 0:02:33 lr 0.001198 wd 0.0500 time 0.4435 (0.4460) data time 0.0008 (0.0020) model time 0.4427 (0.4444) loss 4.2276 (3.7724) grad_norm 1.3645 (1.4414) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][290/625] eta 0:02:29 lr 0.001198 wd 0.0500 time 0.4450 (0.4459) data time 0.0007 (0.0020) model time 0.4444 (0.4444) loss 4.6599 (3.7681) grad_norm 1.9723 (1.4430) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][300/625] eta 0:02:24 lr 0.001198 wd 0.0500 time 0.4485 (0.4459) data time 0.0009 (0.0020) model time 0.4476 (0.4444) loss 3.6934 (3.7628) grad_norm 1.7686 (1.4492) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][310/625] eta 0:02:20 lr 0.001198 wd 0.0500 time 0.4395 (0.4459) data time 0.0006 (0.0019) model time 0.4389 (0.4444) loss 4.5677 (3.7552) grad_norm 2.3824 (1.4499) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][320/625] eta 0:02:15 lr 0.001198 wd 0.0500 time 0.4436 (0.4458) data time 0.0008 (0.0019) model time 0.4428 (0.4443) loss 4.4608 (3.7591) grad_norm 1.3726 (1.4464) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][330/625] eta 0:02:11 lr 0.001198 wd 0.0500 time 0.4462 (0.4458) data time 0.0008 (0.0018) model time 
0.4454 (0.4444) loss 4.4101 (3.7636) grad_norm 1.2119 (1.4465) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][340/625] eta 0:02:07 lr 0.001198 wd 0.0500 time 0.4427 (0.4458) data time 0.0006 (0.0018) model time 0.4421 (0.4443) loss 3.2034 (3.7708) grad_norm 1.2143 (1.4433) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][350/625] eta 0:02:02 lr 0.001198 wd 0.0500 time 0.4404 (0.4457) data time 0.0006 (0.0018) model time 0.4398 (0.4443) loss 3.4711 (3.7708) grad_norm 1.1793 (1.4415) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][360/625] eta 0:01:58 lr 0.001198 wd 0.0500 time 0.4467 (0.4457) data time 0.0008 (0.0017) model time 0.4459 (0.4442) loss 3.4688 (3.7705) grad_norm 1.1954 (1.4365) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:46:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][370/625] eta 0:01:53 lr 0.001198 wd 0.0500 time 0.4433 (0.4456) data time 0.0006 (0.0017) model time 0.4427 (0.4442) loss 4.5616 (3.7707) grad_norm 1.3052 (1.4355) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][380/625] eta 0:01:49 lr 0.001198 wd 0.0500 time 0.4450 (0.4456) data time 0.0006 (0.0017) model time 0.4444 (0.4442) loss 4.3207 (3.7726) grad_norm 1.0690 (1.4306) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][390/625] eta 0:01:44 lr 0.001198 wd 0.0500 time 0.4407 (0.4455) data time 0.0008 (0.0017) model time 0.4398 (0.4442) loss 4.4406 (3.7784) grad_norm 1.3504 (1.4271) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][400/625] eta 0:01:40 
lr 0.001198 wd 0.0500 time 0.4404 (0.4459) data time 0.0007 (0.0017) model time 0.4397 (0.4446) loss 3.7637 (3.7748) grad_norm 1.1163 (1.4238) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][410/625] eta 0:01:35 lr 0.001198 wd 0.0500 time 0.4490 (0.4459) data time 0.0008 (0.0016) model time 0.4483 (0.4446) loss 4.7305 (3.7710) grad_norm 1.2376 (1.4228) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][420/625] eta 0:01:31 lr 0.001198 wd 0.0500 time 0.4474 (0.4458) data time 0.0006 (0.0016) model time 0.4468 (0.4445) loss 3.6455 (3.7717) grad_norm 1.2867 (1.4248) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][430/625] eta 0:01:27 lr 0.001198 wd 0.0500 time 0.4443 (0.4463) data time 0.0006 (0.0016) model time 0.4437 (0.4451) loss 4.3707 (3.7714) grad_norm 1.8874 (1.4379) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][440/625] eta 0:01:22 lr 0.001198 wd 0.0500 time 0.4412 (0.4463) data time 0.0006 (0.0016) model time 0.4406 (0.4451) loss 2.9786 (3.7746) grad_norm 1.5497 (1.4379) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][450/625] eta 0:01:18 lr 0.001198 wd 0.0500 time 0.4430 (0.4462) data time 0.0008 (0.0016) model time 0.4422 (0.4450) loss 3.9401 (3.7816) grad_norm 1.3582 (1.4369) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][460/625] eta 0:01:13 lr 0.001198 wd 0.0500 time 0.4432 (0.4461) data time 0.0007 (0.0015) model time 0.4425 (0.4449) loss 4.0775 (3.7889) grad_norm 1.1201 (1.4370) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:42 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][470/625] eta 0:01:09 lr 0.001198 wd 0.0500 time 0.4426 (0.4460) data time 0.0008 (0.0015) model time 0.4417 (0.4448) loss 4.4930 (3.7807) grad_norm 1.3433 (1.4337) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][480/625] eta 0:01:04 lr 0.001198 wd 0.0500 time 0.4567 (0.4460) data time 0.0007 (0.0015) model time 0.4560 (0.4448) loss 4.0075 (3.7797) grad_norm 1.3029 (1.4310) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][490/625] eta 0:01:00 lr 0.001198 wd 0.0500 time 0.4420 (0.4460) data time 0.0007 (0.0015) model time 0.4413 (0.4448) loss 3.8384 (3.7780) grad_norm 1.2581 (1.4280) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][500/625] eta 0:00:55 lr 0.001198 wd 0.0500 time 0.4399 (0.4459) data time 0.0007 (0.0015) model time 0.4392 (0.4448) loss 4.4275 (3.7834) grad_norm 2.4573 (1.4294) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][510/625] eta 0:00:51 lr 0.001198 wd 0.0500 time 0.4464 (0.4459) data time 0.0006 (0.0015) model time 0.4457 (0.4447) loss 4.5326 (3.7929) grad_norm 1.0541 (1.4301) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][520/625] eta 0:00:46 lr 0.001198 wd 0.0500 time 0.4436 (0.4459) data time 0.0008 (0.0015) model time 0.4428 (0.4447) loss 3.7338 (3.7963) grad_norm 1.7887 (1.4325) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][530/625] eta 0:00:42 lr 0.001198 wd 0.0500 time 0.4400 (0.4459) data time 0.0008 (0.0014) model time 0.4393 (0.4447) loss 3.8602 (3.7992) grad_norm 
1.1333 (1.4305) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][540/625] eta 0:00:37 lr 0.001198 wd 0.0500 time 0.4408 (0.4458) data time 0.0008 (0.0014) model time 0.4399 (0.4447) loss 2.8752 (3.7916) grad_norm 1.1922 (1.4310) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][550/625] eta 0:00:33 lr 0.001198 wd 0.0500 time 0.4452 (0.4458) data time 0.0008 (0.0014) model time 0.4444 (0.4447) loss 2.7993 (3.7894) grad_norm 1.3201 (1.4263) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][560/625] eta 0:00:28 lr 0.001198 wd 0.0500 time 0.4437 (0.4458) data time 0.0006 (0.0014) model time 0.4431 (0.4446) loss 3.5411 (3.7838) grad_norm 1.2320 (1.4258) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][570/625] eta 0:00:24 lr 0.001198 wd 0.0500 time 0.4397 (0.4457) data time 0.0009 (0.0014) model time 0.4388 (0.4446) loss 2.9265 (3.7796) grad_norm 1.0291 (1.4229) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][580/625] eta 0:00:20 lr 0.001198 wd 0.0500 time 0.4446 (0.4457) data time 0.0010 (0.0014) model time 0.4436 (0.4445) loss 3.9600 (3.7809) grad_norm 1.6006 (1.4223) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][590/625] eta 0:00:15 lr 0.001198 wd 0.0500 time 0.4502 (0.4458) data time 0.0008 (0.0014) model time 0.4494 (0.4446) loss 3.6933 (3.7819) grad_norm 1.1987 (1.4211) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][600/625] eta 0:00:11 lr 0.001198 wd 0.0500 time 0.4425 (0.4458) data 
time 0.0008 (0.0014) model time 0.4417 (0.4446) loss 3.6740 (3.7861) grad_norm 1.9233 (1.4226) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][610/625] eta 0:00:06 lr 0.001198 wd 0.0500 time 0.4423 (0.4457) data time 0.0004 (0.0014) model time 0.4419 (0.4446) loss 4.1748 (3.7829) grad_norm 1.0249 (1.4211) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [26/300][620/625] eta 0:00:02 lr 0.001198 wd 0.0500 time 0.4399 (0.4459) data time 0.0006 (0.0014) model time 0.4393 (0.4448) loss 2.5623 (3.7827) grad_norm 1.4049 (1.4181) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 26 training takes 0:04:38 [2024-08-04 11:48:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:48:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:48:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.7944 (0.7944) Acc@1 83.789 (83.789) Acc@5 96.338 (96.338) Mem 16696MB [2024-08-04 11:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.4219 (1.0107) Acc@1 66.895 (76.744) Acc@5 89.404 (94.740) Mem 16696MB [2024-08-04 11:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.134) Loss 1.5791 (1.2261) Acc@1 64.209 (72.240) Acc@5 87.646 (91.685) Mem 16696MB [2024-08-04 11:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.195 Acc@5 91.711 [2024-08-04 11:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 72.2% [2024-08-04 11:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 72.19% [2024-08-04 11:48:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 11:48:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 11:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.8608 (0.8608) Acc@1 77.881 (77.881) Acc@5 93.848 (93.848) Mem 16696MB [2024-08-04 11:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.150) Loss 1.5205 (1.0747) Acc@1 62.207 (71.733) Acc@5 86.621 (91.934) Mem 16696MB [2024-08-04 11:49:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.6855 (1.3187) Acc@1 59.424 (67.104) Acc@5 82.715 (88.421) Mem 16696MB [2024-08-04 11:49:01 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 67.103 Acc@5 88.462 [2024-08-04 11:49:01 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 67.1% [2024-08-04 11:49:01 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 67.10% [2024-08-04 11:49:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 11:49:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 11:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][0/625] eta 0:08:17 lr 0.001198 wd 0.0500 time 0.7954 (0.7954) data time 0.4102 (0.4102) model time 0.0000 (0.0000) loss 3.8710 (3.8710) grad_norm 1.4050 (1.4050) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][10/625] eta 0:04:53 lr 0.001198 wd 0.0500 time 0.4452 (0.4772) data time 0.0006 (0.0380) model time 0.0000 (0.0000) loss 4.0959 (3.8086) grad_norm 1.5739 (1.4498) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][20/625] eta 0:04:39 lr 0.001198 wd 0.0500 time 0.4456 (0.4614) data time 0.0006 (0.0202) model time 0.0000 (0.0000) loss 3.0628 (3.7111) grad_norm 1.4568 (1.4868) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][30/625] eta 0:04:31 lr 0.001198 wd 0.0500 time 0.4458 (0.4558) data time 0.0006 (0.0140) model time 0.0000 (0.0000) loss 2.1671 (3.6537) grad_norm 1.6817 (1.4768) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][40/625] eta 0:04:24 lr 0.001198 wd 0.0500 time 0.4508 (0.4529) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 4.1090 (3.6680) grad_norm 1.2545 (1.4387) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][50/625] eta 0:04:19 lr 0.001198 wd 0.0500 time 0.4438 (0.4515) data time 0.0008 (0.0088) model time 0.0000 (0.0000) loss 3.4815 (3.6830) grad_norm 1.6739 (1.4116) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][60/625] eta 0:04:16 lr 0.001198 wd 0.0500 time 0.6338 (0.4536) data time 0.0006 (0.0075) model time 0.6332 (0.4630) loss 2.6954 (3.6655) 
grad_norm 1.3604 (1.3897) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][70/625] eta 0:04:10 lr 0.001198 wd 0.0500 time 0.4437 (0.4516) data time 0.0006 (0.0065) model time 0.4431 (0.4511) loss 4.4571 (3.7108) grad_norm 1.1743 (1.3875) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][80/625] eta 0:04:05 lr 0.001198 wd 0.0500 time 0.4421 (0.4505) data time 0.0008 (0.0058) model time 0.4413 (0.4480) loss 4.1145 (3.7595) grad_norm 1.2423 (1.3717) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][90/625] eta 0:04:00 lr 0.001198 wd 0.0500 time 0.4431 (0.4499) data time 0.0006 (0.0053) model time 0.4424 (0.4470) loss 3.2078 (3.7491) grad_norm 1.1118 (1.3506) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][100/625] eta 0:03:55 lr 0.001198 wd 0.0500 time 0.4432 (0.4492) data time 0.0006 (0.0048) model time 0.4426 (0.4461) loss 4.3595 (3.7456) grad_norm 1.5085 (1.3391) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][110/625] eta 0:03:51 lr 0.001198 wd 0.0500 time 0.4431 (0.4487) data time 0.0009 (0.0045) model time 0.4422 (0.4455) loss 3.5301 (3.7175) grad_norm 1.3135 (1.3444) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][120/625] eta 0:03:46 lr 0.001198 wd 0.0500 time 0.4454 (0.4483) data time 0.0006 (0.0042) model time 0.4448 (0.4452) loss 3.3890 (3.7172) grad_norm 1.4470 (1.3483) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][130/625] eta 0:03:41 lr 0.001198 wd 0.0500 time 0.4439 
(0.4481) data time 0.0006 (0.0039) model time 0.4433 (0.4451) loss 2.7589 (3.7283) grad_norm 1.3551 (1.3847) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][140/625] eta 0:03:37 lr 0.001198 wd 0.0500 time 0.4453 (0.4491) data time 0.0009 (0.0037) model time 0.4444 (0.4470) loss 4.3743 (3.7358) grad_norm 1.3379 (1.4049) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][150/625] eta 0:03:33 lr 0.001198 wd 0.0500 time 0.4414 (0.4487) data time 0.0011 (0.0035) model time 0.4403 (0.4464) loss 3.5045 (3.7231) grad_norm 0.9696 (1.4007) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][160/625] eta 0:03:28 lr 0.001198 wd 0.0500 time 0.4448 (0.4484) data time 0.0007 (0.0033) model time 0.4441 (0.4461) loss 2.9443 (3.7084) grad_norm 1.6028 (1.4146) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][170/625] eta 0:03:23 lr 0.001198 wd 0.0500 time 0.4415 (0.4481) data time 0.0009 (0.0032) model time 0.4406 (0.4458) loss 4.0844 (3.7129) grad_norm 1.3880 (1.4105) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][180/625] eta 0:03:19 lr 0.001198 wd 0.0500 time 0.4470 (0.4479) data time 0.0006 (0.0031) model time 0.4464 (0.4457) loss 4.5259 (3.7156) grad_norm 1.5579 (1.4142) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][190/625] eta 0:03:14 lr 0.001198 wd 0.0500 time 0.4469 (0.4477) data time 0.0009 (0.0029) model time 0.4461 (0.4455) loss 4.1758 (3.7193) grad_norm 1.1901 (1.4042) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [27/300][200/625] eta 0:03:10 lr 0.001198 wd 0.0500 time 0.4480 (0.4476) data time 0.0009 (0.0028) model time 0.4472 (0.4454) loss 4.1860 (3.7243) grad_norm 1.2713 (1.4000) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][210/625] eta 0:03:05 lr 0.001198 wd 0.0500 time 0.4440 (0.4474) data time 0.0006 (0.0027) model time 0.4434 (0.4452) loss 4.4163 (3.7387) grad_norm 1.1172 (1.4013) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][220/625] eta 0:03:01 lr 0.001198 wd 0.0500 time 0.4405 (0.4472) data time 0.0009 (0.0027) model time 0.4395 (0.4451) loss 2.7476 (3.7307) grad_norm 1.6901 (1.4035) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][230/625] eta 0:02:56 lr 0.001198 wd 0.0500 time 0.4410 (0.4470) data time 0.0008 (0.0026) model time 0.4402 (0.4449) loss 4.1576 (3.7443) grad_norm 1.1873 (1.4036) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][240/625] eta 0:02:52 lr 0.001198 wd 0.0500 time 0.4438 (0.4468) data time 0.0006 (0.0025) model time 0.4432 (0.4448) loss 4.3430 (3.7458) grad_norm 1.0787 (1.4089) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][250/625] eta 0:02:47 lr 0.001198 wd 0.0500 time 0.4457 (0.4467) data time 0.0006 (0.0024) model time 0.4451 (0.4447) loss 2.7242 (3.7475) grad_norm 1.0194 (1.4092) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:50:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][260/625] eta 0:02:43 lr 0.001198 wd 0.0500 time 0.4431 (0.4466) data time 0.0006 (0.0024) model time 0.4425 (0.4446) loss 2.7312 (3.7515) grad_norm 1.5591 (1.4029) loss_scale 8192.0000 
(8192.0000) mem 16696MB [2024-08-04 11:51:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][270/625] eta 0:02:38 lr 0.001198 wd 0.0500 time 0.4432 (0.4465) data time 0.0009 (0.0023) model time 0.4423 (0.4445) loss 4.5183 (3.7520) grad_norm 1.2954 (1.4034) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][280/625] eta 0:02:34 lr 0.001198 wd 0.0500 time 0.4465 (0.4464) data time 0.0007 (0.0023) model time 0.4458 (0.4445) loss 4.2257 (3.7579) grad_norm 1.2226 (1.3944) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][290/625] eta 0:02:29 lr 0.001198 wd 0.0500 time 0.4416 (0.4464) data time 0.0006 (0.0022) model time 0.4410 (0.4445) loss 3.3771 (3.7634) grad_norm 1.4841 (1.3932) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][300/625] eta 0:02:25 lr 0.001198 wd 0.0500 time 0.4381 (0.4463) data time 0.0008 (0.0022) model time 0.4373 (0.4444) loss 3.7509 (3.7643) grad_norm 1.4232 (1.3900) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][310/625] eta 0:02:20 lr 0.001198 wd 0.0500 time 0.4455 (0.4463) data time 0.0007 (0.0021) model time 0.4448 (0.4444) loss 4.5490 (3.7585) grad_norm 1.4733 (1.3926) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][320/625] eta 0:02:16 lr 0.001198 wd 0.0500 time 0.4425 (0.4462) data time 0.0007 (0.0021) model time 0.4419 (0.4444) loss 3.7571 (3.7496) grad_norm 1.4170 (1.3929) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][330/625] eta 0:02:11 lr 0.001198 wd 0.0500 time 0.4438 (0.4461) data time 0.0008 (0.0020) model time 
0.4431 (0.4444) loss 3.7260 (3.7566) grad_norm 2.6525 (1.3945) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][340/625] eta 0:02:07 lr 0.001198 wd 0.0500 time 0.4528 (0.4461) data time 0.0006 (0.0020) model time 0.4522 (0.4444) loss 3.5941 (3.7575) grad_norm 1.2716 (1.3944) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][350/625] eta 0:02:02 lr 0.001198 wd 0.0500 time 0.4410 (0.4460) data time 0.0008 (0.0020) model time 0.4402 (0.4443) loss 3.4806 (3.7542) grad_norm 2.1398 (1.3946) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][360/625] eta 0:01:58 lr 0.001198 wd 0.0500 time 0.4441 (0.4460) data time 0.0006 (0.0019) model time 0.4434 (0.4443) loss 4.4650 (3.7544) grad_norm 1.1660 (1.3974) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][370/625] eta 0:01:53 lr 0.001198 wd 0.0500 time 0.4525 (0.4460) data time 0.0009 (0.0019) model time 0.4517 (0.4443) loss 4.1363 (3.7609) grad_norm 1.5790 (1.3992) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][380/625] eta 0:01:49 lr 0.001198 wd 0.0500 time 0.4428 (0.4459) data time 0.0007 (0.0019) model time 0.4421 (0.4443) loss 4.1331 (3.7593) grad_norm 1.2443 (1.3965) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][390/625] eta 0:01:44 lr 0.001198 wd 0.0500 time 0.4420 (0.4459) data time 0.0009 (0.0018) model time 0.4411 (0.4442) loss 3.2281 (3.7595) grad_norm 1.3125 (1.3899) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][400/625] eta 0:01:40 
lr 0.001198 wd 0.0500 time 0.4445 (0.4461) data time 0.0008 (0.0018) model time 0.4437 (0.4445) loss 2.7647 (3.7575) grad_norm 0.9144 (1.3885) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][410/625] eta 0:01:35 lr 0.001198 wd 0.0500 time 0.4398 (0.4461) data time 0.0011 (0.0018) model time 0.4387 (0.4445) loss 3.2488 (3.7582) grad_norm 1.2910 (1.3897) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][420/625] eta 0:01:31 lr 0.001198 wd 0.0500 time 0.4445 (0.4460) data time 0.0008 (0.0018) model time 0.4437 (0.4444) loss 3.9697 (3.7644) grad_norm 1.5748 (1.3909) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][430/625] eta 0:01:26 lr 0.001198 wd 0.0500 time 0.4411 (0.4459) data time 0.0008 (0.0017) model time 0.4403 (0.4443) loss 3.9015 (3.7625) grad_norm 1.1075 (1.3887) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][440/625] eta 0:01:22 lr 0.001198 wd 0.0500 time 0.4400 (0.4458) data time 0.0008 (0.0017) model time 0.4392 (0.4442) loss 2.7535 (3.7612) grad_norm 1.0084 (1.3914) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][450/625] eta 0:01:18 lr 0.001198 wd 0.0500 time 0.4438 (0.4457) data time 0.0006 (0.0017) model time 0.4432 (0.4442) loss 4.4464 (3.7664) grad_norm 1.3377 (1.3886) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][460/625] eta 0:01:13 lr 0.001198 wd 0.0500 time 0.4428 (0.4457) data time 0.0006 (0.0017) model time 0.4422 (0.4442) loss 4.7645 (3.7709) grad_norm 2.7163 (1.3917) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:32 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][470/625] eta 0:01:09 lr 0.001198 wd 0.0500 time 0.6574 (0.4461) data time 0.0006 (0.0017) model time 0.6568 (0.4447) loss 3.9818 (3.7650) grad_norm 1.4925 (1.3967) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][480/625] eta 0:01:04 lr 0.001198 wd 0.0500 time 0.4476 (0.4461) data time 0.0008 (0.0017) model time 0.4468 (0.4447) loss 3.8733 (3.7648) grad_norm 1.1955 (1.3960) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][490/625] eta 0:01:00 lr 0.001198 wd 0.0500 time 0.4438 (0.4461) data time 0.0008 (0.0016) model time 0.4431 (0.4446) loss 3.4642 (3.7602) grad_norm 1.0221 (1.3935) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][500/625] eta 0:00:55 lr 0.001198 wd 0.0500 time 0.4534 (0.4461) data time 0.0008 (0.0016) model time 0.4525 (0.4447) loss 3.7107 (3.7571) grad_norm 1.4834 (1.3913) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][510/625] eta 0:00:51 lr 0.001198 wd 0.0500 time 0.4468 (0.4461) data time 0.0008 (0.0016) model time 0.4460 (0.4447) loss 3.8359 (3.7517) grad_norm 1.2646 (1.3954) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][520/625] eta 0:00:46 lr 0.001198 wd 0.0500 time 0.4437 (0.4460) data time 0.0008 (0.0016) model time 0.4429 (0.4446) loss 3.4350 (3.7493) grad_norm 1.2842 (1.3957) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][530/625] eta 0:00:42 lr 0.001198 wd 0.0500 time 0.4472 (0.4460) data time 0.0006 (0.0016) model time 0.4466 (0.4446) loss 4.5808 (3.7554) grad_norm 
1.2072 (1.3934) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][540/625] eta 0:00:37 lr 0.001198 wd 0.0500 time 0.4465 (0.4460) data time 0.0008 (0.0016) model time 0.4457 (0.4446) loss 3.6661 (3.7557) grad_norm 1.0093 (1.3875) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][550/625] eta 0:00:33 lr 0.001198 wd 0.0500 time 0.4464 (0.4460) data time 0.0009 (0.0015) model time 0.4455 (0.4446) loss 3.9473 (3.7608) grad_norm 1.3859 (1.3841) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][560/625] eta 0:00:28 lr 0.001198 wd 0.0500 time 0.4437 (0.4460) data time 0.0006 (0.0015) model time 0.4431 (0.4446) loss 3.9656 (3.7617) grad_norm 1.1699 (1.3879) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][570/625] eta 0:00:24 lr 0.001198 wd 0.0500 time 0.4416 (0.4460) data time 0.0008 (0.0015) model time 0.4408 (0.4446) loss 4.2685 (3.7608) grad_norm 1.0565 (1.3894) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][580/625] eta 0:00:20 lr 0.001198 wd 0.0500 time 0.4432 (0.4459) data time 0.0007 (0.0015) model time 0.4425 (0.4446) loss 3.1566 (3.7564) grad_norm 1.3071 (1.3887) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][590/625] eta 0:00:15 lr 0.001198 wd 0.0500 time 0.4448 (0.4460) data time 0.0008 (0.0015) model time 0.4440 (0.4446) loss 4.0964 (3.7568) grad_norm 1.0680 (1.3897) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 11:53:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][600/625] eta 0:00:11 lr 0.001198 wd 0.0500 time 0.4440 (0.4460) data 
time 0.0008 (0.0015) model time 0.4432 (0.4446) loss 3.0920 (3.7566) grad_norm 0.9921 (1.3873) loss_scale 16384.0000 (8328.3062) mem 16696MB [2024-08-04 11:53:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][610/625] eta 0:00:06 lr 0.001198 wd 0.0500 time 0.4412 (0.4459) data time 0.0004 (0.0015) model time 0.4408 (0.4446) loss 2.8611 (3.7579) grad_norm 1.3548 (1.3918) loss_scale 16384.0000 (8460.1506) mem 16696MB [2024-08-04 11:53:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [27/300][620/625] eta 0:00:02 lr 0.001198 wd 0.0500 time 0.4433 (0.4459) data time 0.0004 (0.0015) model time 0.4429 (0.4446) loss 3.3007 (3.7575) grad_norm 1.2575 (1.3933) loss_scale 16384.0000 (8587.7488) mem 16696MB [2024-08-04 11:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 27 training takes 0:04:38 [2024-08-04 11:53:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:53:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:53:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.7656 (0.7656) Acc@1 83.447 (83.447) Acc@5 96.582 (96.582) Mem 16696MB [2024-08-04 11:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 1.3496 (0.9684) Acc@1 69.580 (77.424) Acc@5 89.453 (94.616) Mem 16696MB [2024-08-04 11:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.5010 (1.1811) Acc@1 64.697 (72.631) Acc@5 87.305 (91.690) Mem 16696MB [2024-08-04 11:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.551 Acc@5 91.667 [2024-08-04 11:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 72.6% [2024-08-04 11:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 72.55% [2024-08-04 11:53:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 11:53:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 11:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.8052 (0.8052) Acc@1 79.492 (79.492) Acc@5 94.580 (94.580) Mem 16696MB [2024-08-04 11:53:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.4512 (1.0176) Acc@1 64.209 (73.060) Acc@5 87.402 (92.569) Mem 16696MB [2024-08-04 11:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.6172 (1.2569) Acc@1 60.986 (68.434) Acc@5 83.691 (89.242) Mem 16696MB [2024-08-04 11:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 68.394 Acc@5 89.257 [2024-08-04 11:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 68.4% [2024-08-04 11:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 68.39% [2024-08-04 11:53:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 11:53:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 11:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][0/625] eta 0:08:08 lr 0.001198 wd 0.0500 time 0.7814 (0.7814) data time 0.4015 (0.4015) model time 0.0000 (0.0000) loss 2.9829 (2.9829) grad_norm 1.7367 (1.7367) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][10/625] eta 0:04:51 lr 0.001198 wd 0.0500 time 0.4469 (0.4741) data time 0.0007 (0.0372) model time 0.0000 (0.0000) loss 4.7673 (3.7024) grad_norm 1.7758 (1.3173) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][20/625] eta 0:04:38 lr 0.001198 wd 0.0500 time 0.4462 (0.4601) data time 0.0008 (0.0199) model time 0.0000 (0.0000) loss 4.1299 (3.7738) grad_norm 1.1114 (1.4067) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][30/625] eta 0:04:30 lr 0.001198 wd 0.0500 time 0.4433 (0.4547) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 3.9627 (3.7057) grad_norm 1.6028 (1.4021) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][40/625] eta 0:04:24 lr 0.001198 wd 0.0500 time 0.4508 (0.4526) data time 0.0009 (0.0106) model time 0.0000 (0.0000) loss 3.8444 (3.7319) grad_norm 1.3657 (1.4720) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][50/625] eta 0:04:19 lr 0.001198 wd 0.0500 time 0.4427 (0.4507) data time 0.0009 (0.0087) model time 0.0000 (0.0000) loss 2.8203 (3.7035) grad_norm 1.6152 (1.4806) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][60/625] eta 0:04:15 lr 0.001198 wd 0.0500 time 0.6279 (0.4527) data time 0.0008 (0.0074) model time 0.6271 (0.4618) loss 
4.2081 (3.6835) grad_norm 1.9285 (1.4836) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][70/625] eta 0:04:10 lr 0.001198 wd 0.0500 time 0.4425 (0.4519) data time 0.0007 (0.0064) model time 0.4419 (0.4542) loss 4.0613 (3.7167) grad_norm 1.0530 (1.4502) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][80/625] eta 0:04:05 lr 0.001198 wd 0.0500 time 0.4401 (0.4508) data time 0.0009 (0.0057) model time 0.4392 (0.4502) loss 4.2837 (3.7602) grad_norm 1.5356 (1.4599) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][90/625] eta 0:04:00 lr 0.001198 wd 0.0500 time 0.4452 (0.4501) data time 0.0006 (0.0052) model time 0.4447 (0.4486) loss 3.3986 (3.7517) grad_norm 1.7189 (1.4440) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][100/625] eta 0:03:56 lr 0.001198 wd 0.0500 time 0.4380 (0.4495) data time 0.0009 (0.0048) model time 0.4371 (0.4475) loss 3.8316 (3.7490) grad_norm 1.5277 (1.4280) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][110/625] eta 0:03:51 lr 0.001198 wd 0.0500 time 0.4435 (0.4491) data time 0.0009 (0.0044) model time 0.4426 (0.4469) loss 3.4345 (3.7529) grad_norm 1.5500 (1.4250) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][120/625] eta 0:03:46 lr 0.001197 wd 0.0500 time 0.4458 (0.4487) data time 0.0008 (0.0041) model time 0.4450 (0.4465) loss 3.9045 (3.7404) grad_norm 1.0534 (1.4281) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][130/625] eta 0:03:41 lr 
0.001197 wd 0.0500 time 0.4433 (0.4485) data time 0.0008 (0.0039) model time 0.4425 (0.4462) loss 3.1970 (3.7330) grad_norm 1.3815 (1.4429) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:54:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][140/625] eta 0:03:37 lr 0.001197 wd 0.0500 time 0.4392 (0.4482) data time 0.0007 (0.0036) model time 0.4384 (0.4459) loss 4.5366 (3.7460) grad_norm 1.4478 (1.4299) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][150/625] eta 0:03:32 lr 0.001197 wd 0.0500 time 0.4510 (0.4480) data time 0.0006 (0.0035) model time 0.4504 (0.4458) loss 3.9283 (3.7366) grad_norm 1.3642 (1.4251) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][160/625] eta 0:03:28 lr 0.001197 wd 0.0500 time 0.4438 (0.4477) data time 0.0006 (0.0033) model time 0.4432 (0.4455) loss 3.0979 (3.7451) grad_norm 1.5840 (1.4162) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][170/625] eta 0:03:23 lr 0.001197 wd 0.0500 time 0.4367 (0.4475) data time 0.0009 (0.0031) model time 0.4358 (0.4453) loss 4.0430 (3.7344) grad_norm 1.5121 (1.4153) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][180/625] eta 0:03:19 lr 0.001197 wd 0.0500 time 0.4500 (0.4472) data time 0.0007 (0.0030) model time 0.4493 (0.4451) loss 3.8023 (3.7385) grad_norm 0.9366 (1.4030) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][190/625] eta 0:03:14 lr 0.001197 wd 0.0500 time 0.4450 (0.4471) data time 0.0006 (0.0029) model time 0.4444 (0.4449) loss 3.0677 (3.7175) grad_norm 1.4963 (1.4020) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
11:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][200/625] eta 0:03:09 lr 0.001197 wd 0.0500 time 0.4436 (0.4469) data time 0.0008 (0.0028) model time 0.4428 (0.4448) loss 4.0413 (3.7113) grad_norm 1.0238 (1.3947) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][210/625] eta 0:03:05 lr 0.001197 wd 0.0500 time 0.4454 (0.4468) data time 0.0009 (0.0027) model time 0.4444 (0.4448) loss 4.2357 (3.7148) grad_norm 1.2714 (1.3976) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][220/625] eta 0:03:00 lr 0.001197 wd 0.0500 time 0.4448 (0.4468) data time 0.0009 (0.0026) model time 0.4439 (0.4448) loss 3.7149 (3.7158) grad_norm 1.2568 (1.3931) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][230/625] eta 0:02:56 lr 0.001197 wd 0.0500 time 0.4415 (0.4466) data time 0.0006 (0.0025) model time 0.4409 (0.4447) loss 3.5426 (3.7114) grad_norm 2.3625 (1.4016) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][240/625] eta 0:02:51 lr 0.001197 wd 0.0500 time 0.4419 (0.4465) data time 0.0008 (0.0025) model time 0.4411 (0.4446) loss 2.9004 (3.7115) grad_norm 1.5012 (1.4038) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][250/625] eta 0:02:47 lr 0.001197 wd 0.0500 time 0.4492 (0.4464) data time 0.0006 (0.0024) model time 0.4485 (0.4446) loss 2.6476 (3.7162) grad_norm 1.9273 (1.4088) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][260/625] eta 0:02:43 lr 0.001197 wd 0.0500 time 0.4394 (0.4469) data time 0.0007 (0.0023) model time 0.4387 (0.4452) loss 4.3163 
(3.7266) grad_norm 1.6259 (1.4071) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][270/625] eta 0:02:38 lr 0.001197 wd 0.0500 time 0.4437 (0.4469) data time 0.0006 (0.0023) model time 0.4431 (0.4452) loss 4.5753 (3.7249) grad_norm 1.2884 (1.4024) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:55:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][280/625] eta 0:02:34 lr 0.001197 wd 0.0500 time 0.4470 (0.4469) data time 0.0007 (0.0022) model time 0.4463 (0.4452) loss 4.1798 (3.7107) grad_norm 1.9143 (1.4030) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][290/625] eta 0:02:29 lr 0.001197 wd 0.0500 time 0.4488 (0.4469) data time 0.0008 (0.0022) model time 0.4480 (0.4452) loss 4.0786 (3.7097) grad_norm 1.6787 (1.4043) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][300/625] eta 0:02:25 lr 0.001197 wd 0.0500 time 0.4449 (0.4468) data time 0.0008 (0.0021) model time 0.4441 (0.4452) loss 2.9648 (3.7043) grad_norm 1.4639 (1.4082) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][310/625] eta 0:02:20 lr 0.001197 wd 0.0500 time 0.4431 (0.4467) data time 0.0006 (0.0021) model time 0.4424 (0.4451) loss 4.3167 (3.7107) grad_norm 1.9502 (1.4134) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][320/625] eta 0:02:16 lr 0.001197 wd 0.0500 time 0.4401 (0.4466) data time 0.0009 (0.0020) model time 0.4392 (0.4450) loss 3.9959 (3.7032) grad_norm 1.1063 (1.4143) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][330/625] eta 0:02:11 lr 0.001197 wd 
0.0500 time 0.4440 (0.4465) data time 0.0007 (0.0020) model time 0.4434 (0.4449) loss 4.5630 (3.6993) grad_norm 1.0837 (1.4151) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][340/625] eta 0:02:07 lr 0.001197 wd 0.0500 time 0.4403 (0.4465) data time 0.0008 (0.0020) model time 0.4395 (0.4449) loss 2.8508 (3.6975) grad_norm 1.4026 (1.4098) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][350/625] eta 0:02:02 lr 0.001197 wd 0.0500 time 0.4426 (0.4463) data time 0.0007 (0.0019) model time 0.4419 (0.4448) loss 4.7224 (3.6983) grad_norm 1.5127 (1.4093) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][360/625] eta 0:01:58 lr 0.001197 wd 0.0500 time 0.4443 (0.4463) data time 0.0008 (0.0019) model time 0.4435 (0.4448) loss 2.5076 (3.6957) grad_norm 1.8564 (1.4165) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][370/625] eta 0:01:53 lr 0.001197 wd 0.0500 time 0.4395 (0.4462) data time 0.0009 (0.0019) model time 0.4386 (0.4447) loss 3.5721 (3.6985) grad_norm 1.5859 (1.4175) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][380/625] eta 0:01:49 lr 0.001197 wd 0.0500 time 0.4460 (0.4462) data time 0.0008 (0.0019) model time 0.4451 (0.4447) loss 3.1038 (3.7019) grad_norm 1.3474 (1.4178) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][390/625] eta 0:01:44 lr 0.001197 wd 0.0500 time 0.4419 (0.4462) data time 0.0009 (0.0018) model time 0.4410 (0.4447) loss 3.2426 (3.6969) grad_norm 1.1066 (1.4182) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:51 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][400/625] eta 0:01:40 lr 0.001197 wd 0.0500 time 0.4443 (0.4465) data time 0.0008 (0.0018) model time 0.4435 (0.4451) loss 4.2511 (3.7002) grad_norm 1.4293 (1.4197) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][410/625] eta 0:01:35 lr 0.001197 wd 0.0500 time 0.4381 (0.4465) data time 0.0010 (0.0018) model time 0.4371 (0.4450) loss 3.7405 (3.7033) grad_norm 1.0026 (1.4214) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][420/625] eta 0:01:31 lr 0.001197 wd 0.0500 time 0.4416 (0.4464) data time 0.0007 (0.0018) model time 0.4409 (0.4450) loss 2.9511 (3.7013) grad_norm 1.1528 (1.4165) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][430/625] eta 0:01:27 lr 0.001197 wd 0.0500 time 0.4439 (0.4463) data time 0.0007 (0.0017) model time 0.4432 (0.4449) loss 3.2096 (3.7074) grad_norm 0.9379 (1.4134) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][440/625] eta 0:01:22 lr 0.001197 wd 0.0500 time 0.4419 (0.4462) data time 0.0008 (0.0017) model time 0.4411 (0.4449) loss 3.5907 (3.7051) grad_norm 1.6541 (1.4135) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][450/625] eta 0:01:18 lr 0.001197 wd 0.0500 time 0.4462 (0.4463) data time 0.0006 (0.0017) model time 0.4456 (0.4449) loss 4.1998 (3.7047) grad_norm 1.1489 (1.4183) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][460/625] eta 0:01:13 lr 0.001197 wd 0.0500 time 0.4530 (0.4463) data time 0.0008 (0.0017) model time 0.4522 (0.4449) loss 4.2501 (3.7031) 
grad_norm 1.7366 (1.4157) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][470/625] eta 0:01:09 lr 0.001197 wd 0.0500 time 0.4455 (0.4462) data time 0.0009 (0.0017) model time 0.4446 (0.4449) loss 4.0378 (3.7082) grad_norm 1.1619 (1.4160) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][480/625] eta 0:01:04 lr 0.001197 wd 0.0500 time 0.4451 (0.4466) data time 0.0008 (0.0016) model time 0.4443 (0.4453) loss 2.2110 (3.7035) grad_norm 1.3857 (1.4137) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][490/625] eta 0:01:00 lr 0.001197 wd 0.0500 time 0.4433 (0.4466) data time 0.0009 (0.0016) model time 0.4424 (0.4453) loss 3.0868 (3.6970) grad_norm 1.4547 (1.4117) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][500/625] eta 0:00:55 lr 0.001197 wd 0.0500 time 0.4419 (0.4465) data time 0.0008 (0.0016) model time 0.4411 (0.4452) loss 4.3624 (3.6993) grad_norm 1.5264 (1.4105) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][510/625] eta 0:00:51 lr 0.001197 wd 0.0500 time 0.4405 (0.4464) data time 0.0009 (0.0016) model time 0.4396 (0.4452) loss 3.7818 (3.6993) grad_norm 1.2307 (1.4113) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][520/625] eta 0:00:46 lr 0.001197 wd 0.0500 time 0.4422 (0.4464) data time 0.0006 (0.0016) model time 0.4416 (0.4451) loss 4.6252 (3.7040) grad_norm 1.6897 (1.4104) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][530/625] eta 0:00:42 lr 0.001197 wd 0.0500 
time 0.4388 (0.4463) data time 0.0007 (0.0016) model time 0.4381 (0.4450) loss 3.9324 (3.7019) grad_norm 1.7125 (1.4098) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][540/625] eta 0:00:37 lr 0.001197 wd 0.0500 time 0.4443 (0.4463) data time 0.0006 (0.0016) model time 0.4438 (0.4450) loss 3.8753 (3.7053) grad_norm 1.6283 (1.4099) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:57:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][550/625] eta 0:00:33 lr 0.001197 wd 0.0500 time 0.4442 (0.4462) data time 0.0006 (0.0015) model time 0.4436 (0.4449) loss 4.3551 (3.7024) grad_norm 1.0164 (1.4093) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][560/625] eta 0:00:29 lr 0.001197 wd 0.0500 time 0.4438 (0.4462) data time 0.0008 (0.0015) model time 0.4430 (0.4449) loss 3.9023 (3.7023) grad_norm 1.8849 (1.4139) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][570/625] eta 0:00:24 lr 0.001197 wd 0.0500 time 0.4440 (0.4461) data time 0.0009 (0.0015) model time 0.4432 (0.4448) loss 3.4865 (3.7036) grad_norm 1.0555 (1.4136) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][580/625] eta 0:00:20 lr 0.001197 wd 0.0500 time 0.4511 (0.4461) data time 0.0006 (0.0015) model time 0.4505 (0.4448) loss 2.6423 (3.7061) grad_norm 1.8023 (1.4146) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][590/625] eta 0:00:15 lr 0.001197 wd 0.0500 time 0.4425 (0.4461) data time 0.0007 (0.0015) model time 0.4418 (0.4449) loss 4.6049 (3.7072) grad_norm 1.7762 (1.4220) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:20 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][600/625] eta 0:00:11 lr 0.001197 wd 0.0500 time 0.4424 (0.4460) data time 0.0006 (0.0015) model time 0.4418 (0.4448) loss 4.7586 (3.7139) grad_norm 1.4771 (1.4189) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][610/625] eta 0:00:06 lr 0.001197 wd 0.0500 time 0.4391 (0.4460) data time 0.0006 (0.0015) model time 0.4385 (0.4447) loss 4.0669 (3.7149) grad_norm 1.1727 (1.4188) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [28/300][620/625] eta 0:00:02 lr 0.001197 wd 0.0500 time 0.4391 (0.4462) data time 0.0004 (0.0015) model time 0.4387 (0.4450) loss 4.4447 (3.7137) grad_norm 1.1569 (1.4152) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 28 training takes 0:04:38 [2024-08-04 11:58:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 11:58:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 11:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.7671 (0.7671) Acc@1 83.057 (83.057) Acc@5 96.533 (96.533) Mem 16696MB [2024-08-04 11:58:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.3164 (0.9407) Acc@1 68.066 (77.730) Acc@5 90.234 (94.882) Mem 16696MB [2024-08-04 11:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.134) Loss 1.4521 (1.1410) Acc@1 66.309 (73.272) Acc@5 88.965 (92.136) Mem 16696MB [2024-08-04 11:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.171 Acc@5 92.087 [2024-08-04 11:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.2% [2024-08-04 11:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 73.17% [2024-08-04 11:58:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 11:58:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 11:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.7617 (0.7617) Acc@1 80.518 (80.518) Acc@5 94.922 (94.922) Mem 16696MB [2024-08-04 11:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 1.3857 (0.9684) Acc@1 64.990 (74.228) Acc@5 87.939 (93.137) Mem 16696MB [2024-08-04 11:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.133) Loss 1.5527 (1.2026) Acc@1 62.305 (69.657) Acc@5 84.961 (89.972) Mem 16696MB [2024-08-04 11:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 69.612 Acc@5 89.965 [2024-08-04 11:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 69.6% [2024-08-04 11:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 69.61% [2024-08-04 11:58:40 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 11:58:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 11:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][0/625] eta 0:07:35 lr 0.001197 wd 0.0500 time 0.7291 (0.7291) data time 0.3430 (0.3430) model time 0.0000 (0.0000) loss 3.2473 (3.2473) grad_norm 1.5605 (1.5605) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][10/625] eta 0:04:49 lr 0.001197 wd 0.0500 time 0.4420 (0.4706) data time 0.0006 (0.0319) model time 0.0000 (0.0000) loss 4.1718 (3.6430) grad_norm 1.3076 (1.4553) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][20/625] eta 0:04:37 lr 0.001197 wd 0.0500 time 0.4414 (0.4585) data time 0.0009 (0.0171) model time 0.0000 (0.0000) loss 3.5747 (3.7465) grad_norm 1.2093 (1.3732) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:58:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][30/625] eta 0:04:29 lr 0.001197 wd 0.0500 time 0.4450 (0.4537) data time 0.0007 (0.0118) model time 0.0000 (0.0000) loss 3.0547 (3.7163) grad_norm 1.1593 (1.4024) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][40/625] eta 0:04:24 lr 0.001197 wd 0.0500 time 0.4527 (0.4514) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 3.9042 (3.7033) grad_norm 1.4165 (1.3696) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][50/625] eta 0:04:18 lr 0.001197 wd 0.0500 time 0.4466 (0.4501) data time 0.0007 (0.0075) model time 0.0000 (0.0000) loss 3.6872 (3.7477) grad_norm 1.2878 (1.3443) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][60/625] eta 0:04:15 lr 0.001197 wd 0.0500 time 0.6579 (0.4525) data time 0.0008 (0.0064) model time 0.6570 (0.4644) loss 
3.5552 (3.7231) grad_norm 1.8197 (1.3637) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][70/625] eta 0:04:10 lr 0.001197 wd 0.0500 time 0.4472 (0.4506) data time 0.0009 (0.0056) model time 0.4464 (0.4511) loss 4.0881 (3.7190) grad_norm 1.0314 (1.3563) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][80/625] eta 0:04:05 lr 0.001197 wd 0.0500 time 0.4436 (0.4496) data time 0.0008 (0.0050) model time 0.4428 (0.4481) loss 3.8077 (3.7226) grad_norm 4.0624 (1.4038) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][90/625] eta 0:04:00 lr 0.001197 wd 0.0500 time 0.4457 (0.4490) data time 0.0007 (0.0046) model time 0.4451 (0.4470) loss 3.9929 (3.7486) grad_norm 1.0775 (1.4085) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][100/625] eta 0:03:55 lr 0.001197 wd 0.0500 time 0.4392 (0.4485) data time 0.0008 (0.0042) model time 0.4384 (0.4461) loss 4.5906 (3.7634) grad_norm 2.2036 (1.4310) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][110/625] eta 0:03:50 lr 0.001197 wd 0.0500 time 0.4401 (0.4480) data time 0.0007 (0.0039) model time 0.4394 (0.4455) loss 4.5487 (3.7680) grad_norm 1.6402 (1.4231) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][120/625] eta 0:03:46 lr 0.001197 wd 0.0500 time 0.4448 (0.4477) data time 0.0006 (0.0036) model time 0.4442 (0.4453) loss 4.1384 (3.7756) grad_norm 1.8432 (1.4176) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][130/625] eta 0:03:41 lr 
0.001197 wd 0.0500 time 0.4469 (0.4474) data time 0.0006 (0.0034) model time 0.4462 (0.4450) loss 4.0695 (3.7824) grad_norm 1.0094 (1.4131) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][140/625] eta 0:03:36 lr 0.001197 wd 0.0500 time 0.4434 (0.4472) data time 0.0007 (0.0032) model time 0.4427 (0.4448) loss 4.0057 (3.7941) grad_norm 1.3238 (1.4017) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][150/625] eta 0:03:32 lr 0.001197 wd 0.0500 time 0.4469 (0.4471) data time 0.0006 (0.0031) model time 0.4462 (0.4448) loss 4.1901 (3.7961) grad_norm 1.0425 (1.3921) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][160/625] eta 0:03:27 lr 0.001197 wd 0.0500 time 0.4461 (0.4470) data time 0.0009 (0.0029) model time 0.4453 (0.4447) loss 3.7432 (3.8031) grad_norm 1.2270 (1.3808) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 11:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][170/625] eta 0:03:23 lr 0.001197 wd 0.0500 time 0.4488 (0.4468) data time 0.0007 (0.0028) model time 0.4481 (0.4447) loss 3.6281 (3.7848) grad_norm 1.3670 (1.3803) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][180/625] eta 0:03:18 lr 0.001197 wd 0.0500 time 0.4416 (0.4467) data time 0.0008 (0.0027) model time 0.4408 (0.4446) loss 3.6372 (3.7998) grad_norm 1.3806 (1.3776) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][190/625] eta 0:03:14 lr 0.001197 wd 0.0500 time 0.4421 (0.4474) data time 0.0008 (0.0026) model time 0.4412 (0.4457) loss 3.8880 (3.7959) grad_norm 1.6772 (1.3773) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
12:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][200/625] eta 0:03:10 lr 0.001197 wd 0.0500 time 0.4473 (0.4474) data time 0.0008 (0.0025) model time 0.4465 (0.4456) loss 4.0431 (3.8041) grad_norm 1.4450 (1.3883) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][210/625] eta 0:03:05 lr 0.001197 wd 0.0500 time 0.4471 (0.4473) data time 0.0007 (0.0024) model time 0.4464 (0.4456) loss 3.8279 (3.7939) grad_norm 1.9518 (1.4045) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][220/625] eta 0:03:01 lr 0.001197 wd 0.0500 time 0.4429 (0.4471) data time 0.0006 (0.0023) model time 0.4423 (0.4454) loss 2.8184 (3.7866) grad_norm 1.2103 (1.3990) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][230/625] eta 0:02:56 lr 0.001197 wd 0.0500 time 0.4420 (0.4470) data time 0.0008 (0.0023) model time 0.4412 (0.4453) loss 3.9051 (3.7836) grad_norm 1.0792 (1.3982) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][240/625] eta 0:02:52 lr 0.001197 wd 0.0500 time 0.4466 (0.4470) data time 0.0006 (0.0022) model time 0.4460 (0.4454) loss 4.3276 (3.7788) grad_norm 1.3675 (1.3902) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][250/625] eta 0:02:47 lr 0.001197 wd 0.0500 time 0.4439 (0.4469) data time 0.0009 (0.0022) model time 0.4430 (0.4453) loss 4.4145 (3.7777) grad_norm 1.0806 (1.3861) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][260/625] eta 0:02:43 lr 0.001197 wd 0.0500 time 0.4419 (0.4468) data time 0.0008 (0.0021) model time 0.4411 (0.4452) loss 4.0232 
(3.7829) grad_norm 1.1911 (1.3884) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][270/625] eta 0:02:38 lr 0.001197 wd 0.0500 time 0.4408 (0.4467) data time 0.0007 (0.0021) model time 0.4401 (0.4451) loss 3.9756 (3.7788) grad_norm 1.5835 (1.3880) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][280/625] eta 0:02:34 lr 0.001197 wd 0.0500 time 0.4444 (0.4466) data time 0.0006 (0.0020) model time 0.4438 (0.4451) loss 3.3547 (3.7789) grad_norm 1.2504 (1.3838) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][290/625] eta 0:02:29 lr 0.001197 wd 0.0500 time 0.4491 (0.4466) data time 0.0011 (0.0020) model time 0.4481 (0.4450) loss 2.9584 (3.7805) grad_norm 1.2231 (1.3876) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][300/625] eta 0:02:25 lr 0.001197 wd 0.0500 time 0.4435 (0.4465) data time 0.0008 (0.0019) model time 0.4428 (0.4449) loss 4.1299 (3.7789) grad_norm 1.5749 (1.3867) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][310/625] eta 0:02:20 lr 0.001197 wd 0.0500 time 0.4447 (0.4464) data time 0.0006 (0.0019) model time 0.4441 (0.4449) loss 4.1889 (3.7719) grad_norm 1.9433 (1.3892) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][320/625] eta 0:02:16 lr 0.001197 wd 0.0500 time 0.4443 (0.4463) data time 0.0007 (0.0019) model time 0.4435 (0.4448) loss 4.2723 (3.7631) grad_norm 1.2870 (1.3873) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][330/625] eta 0:02:11 lr 0.001197 wd 
0.0500 time 0.4420 (0.4462) data time 0.0007 (0.0018) model time 0.4413 (0.4447) loss 4.5350 (3.7613) grad_norm 2.1679 (1.3903) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][340/625] eta 0:02:07 lr 0.001197 wd 0.0500 time 0.4422 (0.4461) data time 0.0006 (0.0018) model time 0.4416 (0.4446) loss 3.2748 (3.7532) grad_norm 1.3071 (1.3910) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][350/625] eta 0:02:02 lr 0.001197 wd 0.0500 time 0.4392 (0.4460) data time 0.0008 (0.0018) model time 0.4384 (0.4445) loss 4.0977 (3.7599) grad_norm 1.1116 (1.3879) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][360/625] eta 0:01:58 lr 0.001197 wd 0.0500 time 0.4462 (0.4460) data time 0.0008 (0.0017) model time 0.4454 (0.4445) loss 3.7573 (3.7570) grad_norm 1.4019 (1.3855) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][370/625] eta 0:01:53 lr 0.001197 wd 0.0500 time 0.4432 (0.4459) data time 0.0006 (0.0017) model time 0.4426 (0.4445) loss 3.3243 (3.7562) grad_norm 0.9982 (1.3829) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][380/625] eta 0:01:49 lr 0.001197 wd 0.0500 time 0.4411 (0.4459) data time 0.0006 (0.0017) model time 0.4405 (0.4445) loss 3.8234 (3.7587) grad_norm 1.5885 (1.3846) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][390/625] eta 0:01:44 lr 0.001197 wd 0.0500 time 0.4455 (0.4458) data time 0.0008 (0.0017) model time 0.4447 (0.4444) loss 3.5852 (3.7582) grad_norm 1.0267 (1.3849) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][400/625] eta 0:01:40 lr 0.001197 wd 0.0500 time 0.4525 (0.4462) data time 0.0006 (0.0016) model time 0.4520 (0.4448) loss 4.5583 (3.7600) grad_norm 1.3569 (1.3869) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][410/625] eta 0:01:36 lr 0.001197 wd 0.0500 time 0.4422 (0.4466) data time 0.0010 (0.0016) model time 0.4413 (0.4453) loss 2.9007 (3.7547) grad_norm 1.1175 (1.3868) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][420/625] eta 0:01:31 lr 0.001197 wd 0.0500 time 0.4415 (0.4465) data time 0.0006 (0.0016) model time 0.4409 (0.4453) loss 4.7487 (3.7460) grad_norm 1.3265 (1.3853) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][430/625] eta 0:01:27 lr 0.001196 wd 0.0500 time 0.4447 (0.4465) data time 0.0008 (0.0016) model time 0.4439 (0.4452) loss 3.6334 (3.7500) grad_norm 1.7933 (1.3848) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][440/625] eta 0:01:22 lr 0.001196 wd 0.0500 time 0.4409 (0.4464) data time 0.0006 (0.0016) model time 0.4403 (0.4451) loss 2.4499 (3.7468) grad_norm 1.0446 (1.3866) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][450/625] eta 0:01:18 lr 0.001196 wd 0.0500 time 0.4438 (0.4463) data time 0.0006 (0.0015) model time 0.4432 (0.4451) loss 4.0761 (3.7473) grad_norm 1.1639 (1.3859) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][460/625] eta 0:01:13 lr 0.001196 wd 0.0500 time 0.4442 (0.4462) data time 0.0008 (0.0015) model time 0.4434 (0.4450) loss 3.4471 (3.7442) 
grad_norm 1.4767 (1.3926) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][470/625] eta 0:01:09 lr 0.001196 wd 0.0500 time 0.4447 (0.4462) data time 0.0008 (0.0015) model time 0.4439 (0.4449) loss 4.2365 (3.7428) grad_norm 1.2999 (1.3908) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][480/625] eta 0:01:04 lr 0.001196 wd 0.0500 time 0.4424 (0.4461) data time 0.0006 (0.0015) model time 0.4418 (0.4449) loss 2.3744 (3.7379) grad_norm 1.6816 (1.3865) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][490/625] eta 0:01:00 lr 0.001196 wd 0.0500 time 0.4439 (0.4461) data time 0.0008 (0.0015) model time 0.4432 (0.4449) loss 3.3995 (3.7376) grad_norm 1.3951 (1.3864) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][500/625] eta 0:00:55 lr 0.001196 wd 0.0500 time 0.4453 (0.4461) data time 0.0007 (0.0015) model time 0.4446 (0.4448) loss 2.8500 (3.7298) grad_norm 1.2176 (1.3839) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][510/625] eta 0:00:51 lr 0.001196 wd 0.0500 time 0.4497 (0.4460) data time 0.0008 (0.0015) model time 0.4489 (0.4448) loss 3.7979 (3.7280) grad_norm 1.3529 (1.3882) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][520/625] eta 0:00:46 lr 0.001196 wd 0.0500 time 0.4414 (0.4460) data time 0.0005 (0.0014) model time 0.4409 (0.4448) loss 2.4764 (3.7291) grad_norm 1.4914 (1.3948) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][530/625] eta 0:00:42 lr 0.001196 wd 0.0500 
time 0.4469 (0.4460) data time 0.0006 (0.0014) model time 0.4462 (0.4448) loss 4.0966 (3.7302) grad_norm 1.2446 (1.3945) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][540/625] eta 0:00:37 lr 0.001196 wd 0.0500 time 0.4433 (0.4460) data time 0.0006 (0.0014) model time 0.4427 (0.4448) loss 4.2838 (3.7295) grad_norm 1.0526 (1.3941) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][550/625] eta 0:00:33 lr 0.001196 wd 0.0500 time 0.6321 (0.4463) data time 0.0006 (0.0014) model time 0.6315 (0.4451) loss 4.3060 (3.7282) grad_norm 0.9019 (1.3902) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][560/625] eta 0:00:29 lr 0.001196 wd 0.0500 time 0.4477 (0.4462) data time 0.0009 (0.0014) model time 0.4468 (0.4450) loss 3.7399 (3.7275) grad_norm 1.1028 (1.3876) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][570/625] eta 0:00:24 lr 0.001196 wd 0.0500 time 0.4408 (0.4462) data time 0.0007 (0.0014) model time 0.4401 (0.4450) loss 4.1369 (3.7337) grad_norm 2.5245 (1.3911) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][580/625] eta 0:00:20 lr 0.001196 wd 0.0500 time 0.4459 (0.4461) data time 0.0008 (0.0014) model time 0.4451 (0.4450) loss 3.3647 (3.7300) grad_norm 1.2624 (1.3917) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][590/625] eta 0:00:15 lr 0.001196 wd 0.0500 time 0.4437 (0.4462) data time 0.0006 (0.0014) model time 0.4431 (0.4451) loss 3.1714 (3.7251) grad_norm 1.3155 (1.3913) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:10 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][600/625] eta 0:00:11 lr 0.001196 wd 0.0500 time 0.4469 (0.4462) data time 0.0006 (0.0014) model time 0.4463 (0.4451) loss 3.8360 (3.7262) grad_norm 1.3098 (1.3908) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][610/625] eta 0:00:06 lr 0.001196 wd 0.0500 time 0.4437 (0.4462) data time 0.0004 (0.0013) model time 0.4433 (0.4451) loss 4.1661 (3.7210) grad_norm 1.3842 (1.3949) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [29/300][620/625] eta 0:00:02 lr 0.001196 wd 0.0500 time 0.4377 (0.4461) data time 0.0005 (0.0013) model time 0.4372 (0.4450) loss 3.4490 (3.7176) grad_norm 1.3224 (1.3978) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 29 training takes 0:04:38 [2024-08-04 12:03:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 12:03:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 12:03:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.7710 (0.7710) Acc@1 84.180 (84.180) Acc@5 96.436 (96.436) Mem 16696MB [2024-08-04 12:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.151) Loss 1.3408 (0.9689) Acc@1 66.895 (77.601) Acc@5 90.381 (94.949) Mem 16696MB [2024-08-04 12:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 1.4648 (1.1678) Acc@1 66.406 (73.307) Acc@5 88.574 (92.327) Mem 16696MB [2024-08-04 12:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.311 Acc@5 92.280 [2024-08-04 12:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.3% [2024-08-04 12:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 73.31% [2024-08-04 12:03:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 12:03:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 12:03:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.7256 (0.7256) Acc@1 81.299 (81.299) Acc@5 95.166 (95.166) Mem 16696MB [2024-08-04 12:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.3311 (0.9267) Acc@1 66.113 (75.231) Acc@5 88.818 (93.648) Mem 16696MB [2024-08-04 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.5010 (1.1569) Acc@1 63.525 (70.771) Acc@5 85.840 (90.590) Mem 16696MB [2024-08-04 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 70.687 Acc@5 90.571 [2024-08-04 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 70.7% [2024-08-04 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 70.69% [2024-08-04 12:03:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 12:03:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 12:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][0/625] eta 0:08:25 lr 0.001196 wd 0.0500 time 0.8095 (0.8095) data time 0.4150 (0.4150) model time 0.0000 (0.0000) loss 2.5091 (2.5091) grad_norm 1.0208 (1.0208) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][10/625] eta 0:04:53 lr 0.001196 wd 0.0500 time 0.4448 (0.4770) data time 0.0009 (0.0385) model time 0.0000 (0.0000) loss 3.2404 (3.4618) grad_norm 1.6979 (1.2449) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][20/625] eta 0:04:39 lr 0.001196 wd 0.0500 time 0.4432 (0.4627) data time 0.0009 (0.0206) model time 0.0000 (0.0000) loss 3.8570 (3.5670) grad_norm 1.0190 (1.3156) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][30/625] eta 0:04:32 lr 0.001196 wd 0.0500 time 0.4463 (0.4573) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 3.6336 (3.6308) grad_norm 2.1318 (1.3760) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][40/625] eta 0:04:25 lr 0.001196 wd 0.0500 time 0.4474 (0.4545) data time 0.0006 (0.0109) model time 0.0000 (0.0000) loss 2.7591 (3.6281) grad_norm 1.6877 (1.3852) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][50/625] eta 0:04:20 lr 0.001196 wd 0.0500 time 0.4449 (0.4523) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 4.3482 (3.6662) grad_norm 1.1559 (1.4127) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][60/625] eta 0:04:16 lr 0.001196 wd 0.0500 time 0.6318 (0.4542) data time 0.0008 (0.0076) model time 0.6310 (0.4629) loss 
3.6533 (3.6874) grad_norm 1.1784 (1.3847) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][70/625] eta 0:04:10 lr 0.001196 wd 0.0500 time 0.4393 (0.4518) data time 0.0010 (0.0067) model time 0.4383 (0.4497) loss 3.2983 (3.6700) grad_norm 1.1190 (1.4140) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][80/625] eta 0:04:05 lr 0.001196 wd 0.0500 time 0.4404 (0.4507) data time 0.0008 (0.0059) model time 0.4395 (0.4472) loss 3.0256 (3.6670) grad_norm 1.4021 (1.4236) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][90/625] eta 0:04:00 lr 0.001196 wd 0.0500 time 0.4452 (0.4498) data time 0.0008 (0.0054) model time 0.4444 (0.4459) loss 3.0197 (3.6669) grad_norm 1.9353 (1.4065) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][100/625] eta 0:03:55 lr 0.001196 wd 0.0500 time 0.4432 (0.4491) data time 0.0008 (0.0049) model time 0.4424 (0.4452) loss 3.3618 (3.6311) grad_norm 1.1974 (1.4155) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][110/625] eta 0:03:51 lr 0.001196 wd 0.0500 time 0.4436 (0.4487) data time 0.0008 (0.0045) model time 0.4429 (0.4449) loss 3.7704 (3.6420) grad_norm 1.5228 (1.4196) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][120/625] eta 0:03:47 lr 0.001196 wd 0.0500 time 0.4445 (0.4503) data time 0.0008 (0.0042) model time 0.4437 (0.4480) loss 4.1267 (3.6556) grad_norm 1.7101 (1.4210) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][130/625] eta 0:03:42 lr 
0.001196 wd 0.0500 time 0.4457 (0.4498) data time 0.0008 (0.0040) model time 0.4449 (0.4474) loss 2.7920 (3.6618) grad_norm 1.3160 (1.4153) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][140/625] eta 0:03:37 lr 0.001196 wd 0.0500 time 0.4431 (0.4493) data time 0.0007 (0.0037) model time 0.4424 (0.4469) loss 3.9091 (3.6850) grad_norm 1.0745 (1.4172) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][150/625] eta 0:03:33 lr 0.001196 wd 0.0500 time 0.4434 (0.4491) data time 0.0006 (0.0035) model time 0.4428 (0.4466) loss 4.1896 (3.6914) grad_norm 1.2717 (1.4097) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][160/625] eta 0:03:28 lr 0.001196 wd 0.0500 time 0.4437 (0.4489) data time 0.0007 (0.0034) model time 0.4430 (0.4465) loss 4.2293 (3.6852) grad_norm 1.2602 (1.4051) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][170/625] eta 0:03:24 lr 0.001196 wd 0.0500 time 0.4449 (0.4486) data time 0.0006 (0.0032) model time 0.4443 (0.4463) loss 3.4751 (3.6767) grad_norm 1.2756 (1.4003) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][180/625] eta 0:03:19 lr 0.001196 wd 0.0500 time 0.4429 (0.4484) data time 0.0007 (0.0031) model time 0.4422 (0.4461) loss 3.4648 (3.6847) grad_norm 1.2707 (1.3937) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][190/625] eta 0:03:14 lr 0.001196 wd 0.0500 time 0.4432 (0.4482) data time 0.0006 (0.0030) model time 0.4426 (0.4459) loss 3.2769 (3.6651) grad_norm 1.3030 (1.3896) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
12:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][200/625] eta 0:03:10 lr 0.001196 wd 0.0500 time 0.4430 (0.4479) data time 0.0006 (0.0029) model time 0.4424 (0.4456) loss 3.5879 (3.6683) grad_norm 1.3165 (1.3860) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][210/625] eta 0:03:05 lr 0.001196 wd 0.0500 time 0.4456 (0.4477) data time 0.0006 (0.0028) model time 0.4450 (0.4455) loss 3.3595 (3.6485) grad_norm 1.8017 (1.3968) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][220/625] eta 0:03:01 lr 0.001196 wd 0.0500 time 0.4420 (0.4476) data time 0.0007 (0.0027) model time 0.4413 (0.4454) loss 3.8778 (3.6527) grad_norm 1.0829 (1.3987) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][230/625] eta 0:02:56 lr 0.001196 wd 0.0500 time 0.4431 (0.4474) data time 0.0006 (0.0026) model time 0.4425 (0.4452) loss 3.4224 (3.6592) grad_norm 1.0107 (1.3983) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][240/625] eta 0:02:52 lr 0.001196 wd 0.0500 time 0.4440 (0.4473) data time 0.0009 (0.0025) model time 0.4430 (0.4451) loss 3.7803 (3.6532) grad_norm 1.2540 (1.3997) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][250/625] eta 0:02:47 lr 0.001196 wd 0.0500 time 0.4438 (0.4471) data time 0.0008 (0.0024) model time 0.4430 (0.4451) loss 3.7648 (3.6447) grad_norm 1.1971 (1.3972) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][260/625] eta 0:02:43 lr 0.001196 wd 0.0500 time 0.4427 (0.4470) data time 0.0008 (0.0024) model time 0.4419 (0.4449) loss 3.9955 
(3.6518) grad_norm 1.7888 (1.3990) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][270/625] eta 0:02:38 lr 0.001196 wd 0.0500 time 0.4426 (0.4468) data time 0.0008 (0.0023) model time 0.4418 (0.4448) loss 4.2115 (3.6547) grad_norm 1.1652 (1.4043) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][280/625] eta 0:02:34 lr 0.001196 wd 0.0500 time 0.4442 (0.4467) data time 0.0006 (0.0023) model time 0.4436 (0.4447) loss 3.4562 (3.6544) grad_norm 1.0155 (1.3939) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][290/625] eta 0:02:29 lr 0.001196 wd 0.0500 time 0.4430 (0.4466) data time 0.0009 (0.0022) model time 0.4422 (0.4446) loss 3.7626 (3.6487) grad_norm 1.4559 (1.3949) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][300/625] eta 0:02:25 lr 0.001196 wd 0.0500 time 0.4458 (0.4465) data time 0.0005 (0.0022) model time 0.4453 (0.4446) loss 4.0500 (3.6452) grad_norm 1.1605 (1.4074) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][310/625] eta 0:02:20 lr 0.001196 wd 0.0500 time 0.4444 (0.4464) data time 0.0008 (0.0021) model time 0.4435 (0.4445) loss 3.7020 (3.6524) grad_norm 1.1369 (1.4047) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][320/625] eta 0:02:16 lr 0.001196 wd 0.0500 time 0.4427 (0.4463) data time 0.0006 (0.0021) model time 0.4421 (0.4444) loss 4.2194 (3.6588) grad_norm 1.5726 (1.4060) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:05:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][330/625] eta 0:02:11 lr 0.001196 wd 
0.0500 time 0.4457 (0.4462) data time 0.0007 (0.0020) model time 0.4450 (0.4443) loss 4.0484 (3.6630) grad_norm 1.1270 (1.4103) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][340/625] eta 0:02:07 lr 0.001196 wd 0.0500 time 0.4422 (0.4461) data time 0.0008 (0.0020) model time 0.4413 (0.4442) loss 3.4490 (3.6591) grad_norm 1.7396 (1.4153) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][350/625] eta 0:02:02 lr 0.001196 wd 0.0500 time 0.4423 (0.4461) data time 0.0008 (0.0020) model time 0.4415 (0.4442) loss 3.9240 (3.6716) grad_norm 1.2380 (1.4126) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][360/625] eta 0:01:58 lr 0.001196 wd 0.0500 time 0.4419 (0.4460) data time 0.0010 (0.0019) model time 0.4409 (0.4442) loss 3.9562 (3.6833) grad_norm 1.2725 (1.4141) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][370/625] eta 0:01:53 lr 0.001196 wd 0.0500 time 0.4461 (0.4460) data time 0.0008 (0.0019) model time 0.4453 (0.4442) loss 3.7411 (3.6786) grad_norm 1.5920 (1.4101) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][380/625] eta 0:01:49 lr 0.001196 wd 0.0500 time 0.4414 (0.4459) data time 0.0008 (0.0019) model time 0.4405 (0.4441) loss 2.3022 (3.6721) grad_norm 1.5467 (1.4055) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][390/625] eta 0:01:44 lr 0.001196 wd 0.0500 time 0.4467 (0.4459) data time 0.0009 (0.0019) model time 0.4458 (0.4442) loss 2.7855 (3.6672) grad_norm 1.8859 (1.4144) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:31 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][400/625] eta 0:01:40 lr 0.001196 wd 0.0500 time 0.4463 (0.4462) data time 0.0008 (0.0018) model time 0.4455 (0.4446) loss 3.8149 (3.6669) grad_norm 1.1389 (1.4110) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][410/625] eta 0:01:35 lr 0.001196 wd 0.0500 time 0.4402 (0.4462) data time 0.0008 (0.0018) model time 0.4394 (0.4446) loss 3.1374 (3.6669) grad_norm 1.1553 (1.4094) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][420/625] eta 0:01:31 lr 0.001196 wd 0.0500 time 0.4443 (0.4462) data time 0.0006 (0.0018) model time 0.4436 (0.4445) loss 4.0198 (3.6759) grad_norm 1.1720 (1.4058) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][430/625] eta 0:01:26 lr 0.001196 wd 0.0500 time 0.4434 (0.4461) data time 0.0010 (0.0018) model time 0.4424 (0.4445) loss 4.0539 (3.6799) grad_norm 1.7956 (1.4043) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][440/625] eta 0:01:22 lr 0.001196 wd 0.0500 time 0.4444 (0.4460) data time 0.0008 (0.0017) model time 0.4436 (0.4444) loss 3.5492 (3.6808) grad_norm 1.6048 (1.4081) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][450/625] eta 0:01:18 lr 0.001196 wd 0.0500 time 0.4462 (0.4464) data time 0.0006 (0.0017) model time 0.4456 (0.4449) loss 3.3803 (3.6809) grad_norm 1.3239 (1.4053) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][460/625] eta 0:01:13 lr 0.001196 wd 0.0500 time 0.4455 (0.4464) data time 0.0006 (0.0017) model time 0.4449 (0.4448) loss 4.6850 (3.6862) 
grad_norm 1.6074 (1.4050) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][470/625] eta 0:01:09 lr 0.001196 wd 0.0500 time 0.4455 (0.4463) data time 0.0006 (0.0017) model time 0.4448 (0.4448) loss 3.5959 (3.6906) grad_norm 1.4040 (1.4024) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][480/625] eta 0:01:04 lr 0.001196 wd 0.0500 time 0.4462 (0.4463) data time 0.0010 (0.0017) model time 0.4452 (0.4448) loss 3.9438 (3.6960) grad_norm 1.5746 (1.4002) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][490/625] eta 0:01:00 lr 0.001196 wd 0.0500 time 0.4436 (0.4463) data time 0.0008 (0.0016) model time 0.4427 (0.4448) loss 3.8405 (3.6961) grad_norm 1.1380 (1.4032) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][500/625] eta 0:00:55 lr 0.001196 wd 0.0500 time 0.4406 (0.4462) data time 0.0006 (0.0016) model time 0.4400 (0.4447) loss 2.7601 (3.6935) grad_norm 1.3047 (1.4076) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][510/625] eta 0:00:51 lr 0.001196 wd 0.0500 time 0.4475 (0.4462) data time 0.0009 (0.0016) model time 0.4466 (0.4447) loss 3.8103 (3.6866) grad_norm 1.4159 (1.4047) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][520/625] eta 0:00:46 lr 0.001196 wd 0.0500 time 0.4465 (0.4461) data time 0.0005 (0.0016) model time 0.4460 (0.4447) loss 4.4110 (3.6858) grad_norm 1.5450 (1.4029) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][530/625] eta 0:00:42 lr 0.001196 wd 0.0500 
time 0.4456 (0.4461) data time 0.0006 (0.0016) model time 0.4449 (0.4446) loss 3.1689 (3.6825) grad_norm 1.6749 (1.4042) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][540/625] eta 0:00:37 lr 0.001196 wd 0.0500 time 0.4406 (0.4461) data time 0.0007 (0.0016) model time 0.4400 (0.4446) loss 3.9945 (3.6859) grad_norm 1.2591 (1.4045) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][550/625] eta 0:00:33 lr 0.001196 wd 0.0500 time 0.4488 (0.4461) data time 0.0006 (0.0016) model time 0.4482 (0.4446) loss 4.2054 (3.6842) grad_norm 0.9194 (1.4013) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][560/625] eta 0:00:28 lr 0.001196 wd 0.0500 time 0.4450 (0.4460) data time 0.0006 (0.0015) model time 0.4443 (0.4446) loss 3.2641 (3.6810) grad_norm 1.4988 (1.4034) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][570/625] eta 0:00:24 lr 0.001196 wd 0.0500 time 0.4486 (0.4460) data time 0.0008 (0.0015) model time 0.4478 (0.4446) loss 4.1294 (3.6794) grad_norm 1.3360 (1.4058) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][580/625] eta 0:00:20 lr 0.001196 wd 0.0500 time 0.4455 (0.4460) data time 0.0011 (0.0015) model time 0.4444 (0.4446) loss 2.5422 (3.6752) grad_norm 1.2180 (1.4049) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][590/625] eta 0:00:15 lr 0.001196 wd 0.0500 time 0.4447 (0.4460) data time 0.0008 (0.0015) model time 0.4439 (0.4446) loss 4.1499 (3.6773) grad_norm 2.1145 (1.4092) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:00 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][600/625] eta 0:00:11 lr 0.001196 wd 0.0500 time 0.4464 (0.4460) data time 0.0008 (0.0015) model time 0.4456 (0.4446) loss 3.8645 (3.6807) grad_norm 1.5434 (1.4131) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][610/625] eta 0:00:06 lr 0.001196 wd 0.0500 time 0.4386 (0.4460) data time 0.0006 (0.0015) model time 0.4381 (0.4446) loss 3.9983 (3.6807) grad_norm 1.3190 (1.4134) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [30/300][620/625] eta 0:00:02 lr 0.001195 wd 0.0500 time 0.4394 (0.4459) data time 0.0005 (0.0015) model time 0.4389 (0.4445) loss 3.5816 (3.6795) grad_norm 1.3581 (1.4139) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 30 training takes 0:04:38 [2024-08-04 12:08:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 12:08:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 12:08:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.7676 (0.7676) Acc@1 83.203 (83.203) Acc@5 96.582 (96.582) Mem 16696MB
[2024-08-04 12:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.152) Loss 1.2988 (0.9241) Acc@1 68.506 (78.209) Acc@5 90.869 (95.117) Mem 16696MB
[2024-08-04 12:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 1.5010 (1.1259) Acc@1 65.137 (73.879) Acc@5 87.793 (92.487) Mem 16696MB
[2024-08-04 12:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.818 Acc@5 92.426
[2024-08-04 12:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 73.8%
[2024-08-04 12:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 73.82%
[2024-08-04 12:08:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 12:08:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 12:08:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.6973 (0.6973) Acc@1 81.787 (81.787) Acc@5 95.459 (95.459) Mem 16696MB
[2024-08-04 12:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.2861 (0.8929) Acc@1 67.285 (76.119) Acc@5 89.209 (94.016) Mem 16696MB
[2024-08-04 12:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.135) Loss 1.4541 (1.1180) Acc@1 64.307 (71.591) Acc@5 86.328 (91.053) Mem 16696MB
[2024-08-04 12:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 71.541 Acc@5 91.051
[2024-08-04 12:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 71.5%
[2024-08-04 12:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 71.54%
[2024-08-04 12:08:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 12:08:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 12:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][0/625] eta 0:07:56 lr 0.001195 wd 0.0500 time 0.7624 (0.7624) data time 0.3800 (0.3800) model time 0.0000 (0.0000) loss 4.3327 (4.3327) grad_norm 1.4583 (1.4583) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][10/625] eta 0:04:51 lr 0.001195 wd 0.0500 time 0.4444 (0.4735) data time 0.0007 (0.0353) model time 0.0000 (0.0000) loss 2.8706 (3.5319) grad_norm 1.2807 (1.2655) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][20/625] eta 0:04:37 lr 0.001195 wd 0.0500 time 0.4399 (0.4595) data time 0.0007 (0.0189) model time 0.0000 (0.0000) loss 3.6476 (3.4138) grad_norm 1.8490 (1.2968) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][30/625] eta 0:04:37 lr 0.001195 wd 0.0500 time 0.4442 (0.4666) data time 0.0007 (0.0131) model time 0.0000 (0.0000) loss 4.2782 (3.5574) grad_norm 1.3324 (1.3597) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][40/625] eta 0:04:29 lr 0.001195 wd 0.0500 time 0.4437 (0.4609) data time 0.0007 (0.0101) model time 0.0000 (0.0000) loss 4.2308 (3.6011) grad_norm 0.9764 (1.3087) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][50/625] eta 0:04:23 lr 0.001195 wd 0.0500 time 0.4513 (0.4577) data time 0.0009 (0.0083) model time 0.0000 (0.0000) loss 3.9030 (3.6909) grad_norm 1.2252 (1.2682) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][60/625] eta 0:04:19 lr 0.001195 wd 0.0500 time 0.6636 (0.4588) data time 0.0008 (0.0070) model time 0.6628 (0.4637) loss 
4.0076 (3.6822) grad_norm 1.8302 (1.2797) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][70/625] eta 0:04:13 lr 0.001195 wd 0.0500 time 0.4402 (0.4559) data time 0.0011 (0.0062) model time 0.4392 (0.4505) loss 2.6347 (3.6945) grad_norm 1.3347 (1.3091) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:08:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][80/625] eta 0:04:07 lr 0.001195 wd 0.0500 time 0.4513 (0.4546) data time 0.0008 (0.0055) model time 0.4505 (0.4486) loss 3.6428 (3.7146) grad_norm 1.4627 (1.3170) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][90/625] eta 0:04:02 lr 0.001195 wd 0.0500 time 0.4447 (0.4533) data time 0.0006 (0.0050) model time 0.4440 (0.4468) loss 4.0369 (3.7184) grad_norm 1.5065 (1.3255) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:09:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][100/625] eta 0:03:57 lr 0.001195 wd 0.0500 time 0.4436 (0.4524) data time 0.0008 (0.0046) model time 0.4428 (0.4462) loss 3.7856 (3.7213) grad_norm 1.3373 (1.3289) loss_scale 32768.0000 (18006.1782) mem 16696MB [2024-08-04 12:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][110/625] eta 0:03:52 lr 0.001195 wd 0.0500 time 0.4472 (0.4517) data time 0.0006 (0.0042) model time 0.4466 (0.4458) loss 3.8751 (3.7457) grad_norm 1.2644 (1.3532) loss_scale 32768.0000 (19336.0721) mem 16696MB [2024-08-04 12:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][120/625] eta 0:03:47 lr 0.001195 wd 0.0500 time 0.4442 (0.4512) data time 0.0006 (0.0039) model time 0.4436 (0.4456) loss 4.1386 (3.7481) grad_norm 1.2861 (1.3560) loss_scale 32768.0000 (20446.1488) mem 16696MB [2024-08-04 12:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][130/625] eta 0:03:43 lr 
0.001195 wd 0.0500 time 0.4451 (0.4505) data time 0.0006 (0.0037) model time 0.4445 (0.4452) loss 2.9013 (3.7327) grad_norm 2.1559 (1.3546) loss_scale 32768.0000 (21386.7481) mem 16696MB [2024-08-04 12:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][140/625] eta 0:03:38 lr 0.001195 wd 0.0500 time 0.4496 (0.4501) data time 0.0010 (0.0035) model time 0.4487 (0.4450) loss 3.2937 (3.7354) grad_norm 1.6382 (1.3657) loss_scale 32768.0000 (22193.9291) mem 16696MB [2024-08-04 12:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][150/625] eta 0:03:33 lr 0.001195 wd 0.0500 time 0.4457 (0.4497) data time 0.0008 (0.0033) model time 0.4449 (0.4448) loss 4.4064 (3.7348) grad_norm 1.1467 (1.3655) loss_scale 32768.0000 (22894.1987) mem 16696MB [2024-08-04 12:09:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][160/625] eta 0:03:28 lr 0.001195 wd 0.0500 time 0.4450 (0.4494) data time 0.0009 (0.0032) model time 0.4441 (0.4447) loss 3.7655 (3.7382) grad_norm 1.2392 (1.3671) loss_scale 32768.0000 (23507.4783) mem 16696MB [2024-08-04 12:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][170/625] eta 0:03:24 lr 0.001195 wd 0.0500 time 0.4549 (0.4493) data time 0.0006 (0.0030) model time 0.4543 (0.4449) loss 4.7648 (3.7475) grad_norm 1.1740 (1.3673) loss_scale 32768.0000 (24049.0292) mem 16696MB [2024-08-04 12:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][180/625] eta 0:03:19 lr 0.001195 wd 0.0500 time 0.4438 (0.4490) data time 0.0006 (0.0029) model time 0.4433 (0.4448) loss 2.6145 (3.7374) grad_norm 1.2035 (1.3613) loss_scale 32768.0000 (24530.7403) mem 16696MB [2024-08-04 12:09:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][190/625] eta 0:03:15 lr 0.001195 wd 0.0500 time 0.4434 (0.4487) data time 0.0008 (0.0028) model time 0.4426 (0.4446) loss 3.3884 (3.7397) grad_norm 1.4281 (1.3641) loss_scale 32768.0000 (24962.0105) mem 16696MB [2024-08-04 
12:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][200/625] eta 0:03:10 lr 0.001195 wd 0.0500 time 0.4522 (0.4485) data time 0.0007 (0.0027) model time 0.4514 (0.4446) loss 2.4937 (3.7434) grad_norm 1.3664 (1.3691) loss_scale 32768.0000 (25350.3682) mem 16696MB [2024-08-04 12:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][210/625] eta 0:03:06 lr 0.001195 wd 0.0500 time 0.4375 (0.4484) data time 0.0008 (0.0026) model time 0.4366 (0.4446) loss 2.9928 (3.7384) grad_norm 1.4180 (1.3700) loss_scale 32768.0000 (25701.9147) mem 16696MB [2024-08-04 12:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][220/625] eta 0:03:02 lr 0.001195 wd 0.0500 time 0.4484 (0.4496) data time 0.0008 (0.0025) model time 0.4477 (0.4464) loss 3.7131 (3.7428) grad_norm 0.9520 (1.3675) loss_scale 32768.0000 (26021.6471) mem 16696MB [2024-08-04 12:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][230/625] eta 0:02:57 lr 0.001195 wd 0.0500 time 0.4405 (0.4493) data time 0.0009 (0.0025) model time 0.4397 (0.4462) loss 3.9468 (3.7397) grad_norm 1.9087 (1.3849) loss_scale 32768.0000 (26313.6970) mem 16696MB [2024-08-04 12:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][240/625] eta 0:02:52 lr 0.001195 wd 0.0500 time 0.4423 (0.4491) data time 0.0009 (0.0024) model time 0.4414 (0.4460) loss 3.0175 (3.7371) grad_norm 1.5327 (1.3785) loss_scale 32768.0000 (26581.5104) mem 16696MB [2024-08-04 12:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][250/625] eta 0:02:48 lr 0.001195 wd 0.0500 time 0.4473 (0.4489) data time 0.0007 (0.0023) model time 0.4466 (0.4458) loss 4.5010 (3.7427) grad_norm 1.3993 (1.3746) loss_scale 32768.0000 (26827.9841) mem 16696MB [2024-08-04 12:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][260/625] eta 0:02:43 lr 0.001195 wd 0.0500 time 0.4413 (0.4487) data time 0.0008 (0.0023) model time 0.4405 (0.4457) loss 2.3908 
(3.7323) grad_norm 1.0938 (1.3758) loss_scale 32768.0000 (27055.5709) mem 16696MB [2024-08-04 12:10:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][270/625] eta 0:02:39 lr 0.001195 wd 0.0500 time 0.4433 (0.4484) data time 0.0008 (0.0022) model time 0.4425 (0.4455) loss 3.2627 (3.7271) grad_norm 0.9713 (1.3751) loss_scale 32768.0000 (27266.3616) mem 16696MB [2024-08-04 12:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][280/625] eta 0:02:34 lr 0.001195 wd 0.0500 time 0.4488 (0.4482) data time 0.0006 (0.0022) model time 0.4482 (0.4453) loss 3.0940 (3.7149) grad_norm 1.3362 (1.3766) loss_scale 32768.0000 (27462.1495) mem 16696MB [2024-08-04 12:10:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][290/625] eta 0:02:30 lr 0.001195 wd 0.0500 time 0.4405 (0.4480) data time 0.0006 (0.0021) model time 0.4399 (0.4452) loss 3.1010 (3.7062) grad_norm 1.2563 (1.3747) loss_scale 32768.0000 (27644.4811) mem 16696MB [2024-08-04 12:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][300/625] eta 0:02:25 lr 0.001195 wd 0.0500 time 0.4477 (0.4480) data time 0.0006 (0.0021) model time 0.4472 (0.4452) loss 4.3476 (3.7047) grad_norm 1.5122 (1.3747) loss_scale 32768.0000 (27814.6977) mem 16696MB [2024-08-04 12:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][310/625] eta 0:02:21 lr 0.001195 wd 0.0500 time 0.4400 (0.4478) data time 0.0007 (0.0020) model time 0.4393 (0.4451) loss 3.7671 (3.7096) grad_norm 1.4931 (1.3791) loss_scale 32768.0000 (27973.9678) mem 16696MB [2024-08-04 12:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][320/625] eta 0:02:16 lr 0.001195 wd 0.0500 time 0.4427 (0.4478) data time 0.0006 (0.0020) model time 0.4421 (0.4451) loss 4.0811 (3.7042) grad_norm 1.6910 (1.3847) loss_scale 32768.0000 (28123.3146) mem 16696MB [2024-08-04 12:10:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][330/625] eta 0:02:12 lr 0.001195 wd 
0.0500 time 0.4397 (0.4476) data time 0.0008 (0.0019) model time 0.4389 (0.4450) loss 3.8388 (3.7118) grad_norm 1.3044 (1.3832) loss_scale 32768.0000 (28263.6375) mem 16696MB [2024-08-04 12:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][340/625] eta 0:02:07 lr 0.001195 wd 0.0500 time 0.4455 (0.4475) data time 0.0006 (0.0019) model time 0.4449 (0.4450) loss 2.7955 (3.7009) grad_norm 1.1126 (1.3766) loss_scale 32768.0000 (28395.7302) mem 16696MB [2024-08-04 12:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][350/625] eta 0:02:03 lr 0.001195 wd 0.0500 time 0.4414 (0.4475) data time 0.0008 (0.0019) model time 0.4406 (0.4450) loss 4.2443 (3.7105) grad_norm 0.9488 (1.3701) loss_scale 32768.0000 (28520.2963) mem 16696MB [2024-08-04 12:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][360/625] eta 0:01:58 lr 0.001195 wd 0.0500 time 0.4434 (0.4474) data time 0.0007 (0.0019) model time 0.4427 (0.4449) loss 4.4648 (3.7166) grad_norm 2.6023 (1.3729) loss_scale 32768.0000 (28637.9612) mem 16696MB [2024-08-04 12:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][370/625] eta 0:01:54 lr 0.001195 wd 0.0500 time 0.4448 (0.4474) data time 0.0006 (0.0018) model time 0.4442 (0.4449) loss 3.7499 (3.7156) grad_norm 1.4991 (1.3782) loss_scale 32768.0000 (28749.2830) mem 16696MB [2024-08-04 12:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][380/625] eta 0:01:49 lr 0.001195 wd 0.0500 time 0.4456 (0.4473) data time 0.0008 (0.0018) model time 0.4448 (0.4449) loss 4.1187 (3.7139) grad_norm 1.4708 (1.3840) loss_scale 32768.0000 (28854.7612) mem 16696MB [2024-08-04 12:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][390/625] eta 0:01:45 lr 0.001195 wd 0.0500 time 0.4452 (0.4472) data time 0.0006 (0.0018) model time 0.4446 (0.4448) loss 4.2585 (3.7144) grad_norm 1.4366 (1.3857) loss_scale 32768.0000 (28954.8440) mem 16696MB [2024-08-04 12:11:21 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][400/625] eta 0:01:40 lr 0.001195 wd 0.0500 time 0.4414 (0.4475) data time 0.0007 (0.0017) model time 0.4408 (0.4452) loss 4.6930 (3.7074) grad_norm 1.4324 (1.3845) loss_scale 32768.0000 (29049.9352) mem 16696MB [2024-08-04 12:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][410/625] eta 0:01:36 lr 0.001195 wd 0.0500 time 0.4455 (0.4474) data time 0.0008 (0.0017) model time 0.4447 (0.4452) loss 3.2977 (3.7062) grad_norm 1.3272 (1.3868) loss_scale 32768.0000 (29140.3990) mem 16696MB [2024-08-04 12:11:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][420/625] eta 0:01:31 lr 0.001195 wd 0.0500 time 0.4494 (0.4474) data time 0.0008 (0.0017) model time 0.4486 (0.4452) loss 3.4345 (3.7042) grad_norm 0.9857 (1.3897) loss_scale 32768.0000 (29226.5653) mem 16696MB [2024-08-04 12:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][430/625] eta 0:01:27 lr 0.001195 wd 0.0500 time 0.4448 (0.4474) data time 0.0006 (0.0017) model time 0.4442 (0.4452) loss 2.9848 (3.7024) grad_norm 1.1419 (1.3879) loss_scale 32768.0000 (29308.7332) mem 16696MB [2024-08-04 12:11:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][440/625] eta 0:01:22 lr 0.001195 wd 0.0500 time 0.4526 (0.4480) data time 0.0007 (0.0017) model time 0.4519 (0.4459) loss 2.3466 (3.6922) grad_norm 1.2890 (1.3851) loss_scale 32768.0000 (29387.1746) mem 16696MB [2024-08-04 12:11:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][450/625] eta 0:01:18 lr 0.001195 wd 0.0500 time 0.4402 (0.4479) data time 0.0008 (0.0016) model time 0.4394 (0.4458) loss 2.5031 (3.6895) grad_norm 1.4073 (inf) loss_scale 16384.0000 (29098.8559) mem 16696MB [2024-08-04 12:11:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][460/625] eta 0:01:13 lr 0.001195 wd 0.0500 time 0.4374 (0.4478) data time 0.0008 (0.0016) model time 0.4366 (0.4457) loss 3.5221 (3.6925) 
grad_norm 1.1149 (inf) loss_scale 16384.0000 (28823.0456) mem 16696MB [2024-08-04 12:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][470/625] eta 0:01:09 lr 0.001195 wd 0.0500 time 0.4435 (0.4477) data time 0.0009 (0.0016) model time 0.4426 (0.4457) loss 3.0154 (3.6945) grad_norm 2.2212 (inf) loss_scale 16384.0000 (28558.9469) mem 16696MB [2024-08-04 12:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][480/625] eta 0:01:04 lr 0.001195 wd 0.0500 time 0.4454 (0.4477) data time 0.0008 (0.0016) model time 0.4446 (0.4457) loss 3.4219 (3.6884) grad_norm 1.2462 (inf) loss_scale 16384.0000 (28305.8295) mem 16696MB [2024-08-04 12:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][490/625] eta 0:01:00 lr 0.001195 wd 0.0500 time 0.4473 (0.4476) data time 0.0007 (0.0016) model time 0.4465 (0.4456) loss 4.0019 (3.6909) grad_norm 1.1827 (inf) loss_scale 16384.0000 (28063.0224) mem 16696MB [2024-08-04 12:12:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][500/625] eta 0:00:55 lr 0.001195 wd 0.0500 time 0.4437 (0.4475) data time 0.0008 (0.0016) model time 0.4429 (0.4456) loss 2.9735 (3.6893) grad_norm 1.1027 (inf) loss_scale 16384.0000 (27829.9082) mem 16696MB [2024-08-04 12:12:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][510/625] eta 0:00:51 lr 0.001195 wd 0.0500 time 0.4423 (0.4475) data time 0.0008 (0.0015) model time 0.4416 (0.4455) loss 4.0227 (3.6906) grad_norm 1.1466 (inf) loss_scale 16384.0000 (27605.9178) mem 16696MB [2024-08-04 12:12:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][520/625] eta 0:00:46 lr 0.001195 wd 0.0500 time 0.4409 (0.4474) data time 0.0006 (0.0015) model time 0.4403 (0.4455) loss 3.9092 (3.6922) grad_norm 1.0518 (inf) loss_scale 16384.0000 (27390.5259) mem 16696MB [2024-08-04 12:12:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][530/625] eta 0:00:42 lr 0.001195 wd 0.0500 time 0.4451 (0.4473) 
data time 0.0009 (0.0015) model time 0.4442 (0.4454) loss 2.6597 (3.6892) grad_norm 1.1589 (inf) loss_scale 16384.0000 (27183.2467) mem 16696MB [2024-08-04 12:12:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][540/625] eta 0:00:38 lr 0.001195 wd 0.0500 time 0.4411 (0.4473) data time 0.0007 (0.0015) model time 0.4404 (0.4454) loss 4.3773 (3.6899) grad_norm 1.2755 (inf) loss_scale 16384.0000 (26983.6303) mem 16696MB [2024-08-04 12:12:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][550/625] eta 0:00:33 lr 0.001195 wd 0.0500 time 0.4450 (0.4472) data time 0.0007 (0.0015) model time 0.4443 (0.4454) loss 4.3615 (3.6906) grad_norm 2.1869 (inf) loss_scale 16384.0000 (26791.2595) mem 16696MB [2024-08-04 12:12:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][560/625] eta 0:00:29 lr 0.001195 wd 0.0500 time 0.4464 (0.4472) data time 0.0007 (0.0015) model time 0.4457 (0.4453) loss 4.4221 (3.6900) grad_norm 1.3666 (inf) loss_scale 16384.0000 (26605.7469) mem 16696MB [2024-08-04 12:12:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][570/625] eta 0:00:24 lr 0.001195 wd 0.0500 time 0.4405 (0.4471) data time 0.0009 (0.0015) model time 0.4397 (0.4453) loss 3.8641 (3.6865) grad_norm 1.4211 (inf) loss_scale 16384.0000 (26426.7320) mem 16696MB [2024-08-04 12:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][580/625] eta 0:00:20 lr 0.001195 wd 0.0500 time 0.6363 (0.4477) data time 0.0008 (0.0015) model time 0.6356 (0.4460) loss 3.8899 (3.6885) grad_norm 1.1496 (inf) loss_scale 16384.0000 (26253.8795) mem 16696MB [2024-08-04 12:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][590/625] eta 0:00:15 lr 0.001195 wd 0.0500 time 0.4436 (0.4477) data time 0.0006 (0.0014) model time 0.4431 (0.4460) loss 4.2956 (3.6895) grad_norm 1.2516 (inf) loss_scale 16384.0000 (26086.8765) mem 16696MB [2024-08-04 12:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[31/300][600/625] eta 0:00:11 lr 0.001195 wd 0.0500 time 0.4423 (0.4477) data time 0.0007 (0.0014) model time 0.4416 (0.4460) loss 4.1467 (3.6861) grad_norm 1.3909 (inf) loss_scale 16384.0000 (25925.4309) mem 16696MB [2024-08-04 12:12:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][610/625] eta 0:00:06 lr 0.001195 wd 0.0500 time 0.4401 (0.4477) data time 0.0004 (0.0014) model time 0.4397 (0.4459) loss 3.9683 (3.6842) grad_norm 1.3400 (inf) loss_scale 16384.0000 (25769.2700) mem 16696MB [2024-08-04 12:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [31/300][620/625] eta 0:00:02 lr 0.001195 wd 0.0500 time 0.4438 (0.4476) data time 0.0004 (0.0014) model time 0.4434 (0.4459) loss 3.3003 (3.6826) grad_norm 0.9627 (inf) loss_scale 16384.0000 (25618.1385) mem 16696MB [2024-08-04 12:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 31 training takes 0:04:39 [2024-08-04 12:13:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 12:13:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 12:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.7280 (0.7280) Acc@1 83.545 (83.545) Acc@5 96.875 (96.875) Mem 16696MB [2024-08-04 12:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 1.3213 (0.9133) Acc@1 68.457 (78.613) Acc@5 90.039 (95.139) Mem 16696MB [2024-08-04 12:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.4111 (1.1089) Acc@1 67.529 (74.414) Acc@5 88.965 (92.615) Mem 16696MB [2024-08-04 12:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.326 Acc@5 92.634 [2024-08-04 12:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.3% [2024-08-04 12:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.33% [2024-08-04 12:13:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 12:13:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 12:13:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.6738 (0.6738) Acc@1 82.373 (82.373) Acc@5 95.947 (95.947) Mem 16696MB [2024-08-04 12:13:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 1.2441 (0.8636) Acc@1 68.066 (76.998) Acc@5 89.795 (94.385) Mem 16696MB [2024-08-04 12:13:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.134) Loss 1.4141 (1.0844) Acc@1 64.990 (72.380) Acc@5 86.768 (91.532) Mem 16696MB [2024-08-04 12:13:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 72.311 Acc@5 91.535 [2024-08-04 12:13:11 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 72.3% [2024-08-04 12:13:11 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 72.31% [2024-08-04 12:13:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 12:13:13 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 12:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][0/625] eta 0:07:33 lr 0.001195 wd 0.0500 time 0.7253 (0.7253) data time 0.3385 (0.3385) model time 0.0000 (0.0000) loss 4.1721 (4.1721) grad_norm 0.9349 (0.9349) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][10/625] eta 0:04:49 lr 0.001195 wd 0.0500 time 0.4520 (0.4710) data time 0.0008 (0.0315) model time 0.0000 (0.0000) loss 4.5081 (3.5959) grad_norm 1.3086 (1.1969) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][20/625] eta 0:04:36 lr 0.001195 wd 0.0500 time 0.4401 (0.4576) data time 0.0008 (0.0169) model time 0.0000 (0.0000) loss 3.7599 (3.6276) grad_norm 1.3184 (1.2769) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][30/625] eta 0:04:29 lr 0.001195 wd 0.0500 time 0.4422 (0.4534) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 3.8115 (3.6274) grad_norm 1.2880 (1.3752) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][40/625] eta 0:04:24 lr 0.001195 wd 0.0500 time 0.4575 (0.4517) data time 0.0008 (0.0091) model time 0.0000 (0.0000) loss 3.6505 (3.7110) grad_norm 1.2301 (1.3590) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][50/625] eta 0:04:18 lr 0.001195 wd 0.0500 time 0.4416 (0.4500) data time 0.0009 (0.0075) model time 0.0000 (0.0000) loss 3.7843 (3.7219) grad_norm 1.0542 (1.3382) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][60/625] eta 0:04:15 lr 0.001195 wd 0.0500 time 0.6479 (0.4521) data time 0.0006 (0.0064) model time 0.6473 (0.4620) loss 
2.9052 (3.6816) grad_norm 2.3997 (1.3848) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][70/625] eta 0:04:09 lr 0.001195 wd 0.0500 time 0.4410 (0.4499) data time 0.0009 (0.0056) model time 0.4402 (0.4488) loss 3.5547 (3.7026) grad_norm 1.4745 (1.3732) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][80/625] eta 0:04:04 lr 0.001195 wd 0.0500 time 0.4423 (0.4490) data time 0.0008 (0.0050) model time 0.4414 (0.4466) loss 3.8712 (3.7227) grad_norm 1.6244 (1.3718) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][90/625] eta 0:04:00 lr 0.001194 wd 0.0500 time 0.4487 (0.4487) data time 0.0008 (0.0045) model time 0.4479 (0.4462) loss 3.9934 (3.6954) grad_norm 0.9966 (1.3588) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][100/625] eta 0:03:55 lr 0.001194 wd 0.0500 time 0.4447 (0.4482) data time 0.0008 (0.0042) model time 0.4440 (0.4455) loss 4.1796 (3.7238) grad_norm 1.7953 (1.3749) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][110/625] eta 0:03:50 lr 0.001194 wd 0.0500 time 0.4393 (0.4478) data time 0.0009 (0.0039) model time 0.4384 (0.4452) loss 3.6240 (3.7058) grad_norm 1.2781 (1.3802) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][120/625] eta 0:03:45 lr 0.001194 wd 0.0500 time 0.4444 (0.4475) data time 0.0009 (0.0036) model time 0.4435 (0.4449) loss 4.1012 (3.6724) grad_norm 2.2983 (1.3896) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][130/625] eta 0:03:42 lr 
0.001194 wd 0.0500 time 0.4407 (0.4501) data time 0.0008 (0.0034) model time 0.4399 (0.4494) loss 3.5082 (3.6628) grad_norm 1.4166 (1.3869) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][140/625] eta 0:03:38 lr 0.001194 wd 0.0500 time 0.4449 (0.4496) data time 0.0006 (0.0032) model time 0.4442 (0.4486) loss 4.2670 (3.6594) grad_norm 1.6391 (1.3893) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][150/625] eta 0:03:33 lr 0.001194 wd 0.0500 time 0.4434 (0.4492) data time 0.0009 (0.0031) model time 0.4426 (0.4480) loss 2.9251 (3.6428) grad_norm 1.9584 (1.4023) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][160/625] eta 0:03:28 lr 0.001194 wd 0.0500 time 0.4484 (0.4488) data time 0.0006 (0.0029) model time 0.4477 (0.4475) loss 3.3955 (3.6538) grad_norm 1.1606 (1.3977) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][170/625] eta 0:03:24 lr 0.001194 wd 0.0500 time 0.4431 (0.4485) data time 0.0007 (0.0028) model time 0.4425 (0.4471) loss 4.4643 (3.6504) grad_norm 1.3659 (1.3954) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][180/625] eta 0:03:19 lr 0.001194 wd 0.0500 time 0.4440 (0.4482) data time 0.0009 (0.0027) model time 0.4431 (0.4467) loss 3.1829 (3.6358) grad_norm 0.9807 (1.3907) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][190/625] eta 0:03:14 lr 0.001194 wd 0.0500 time 0.4460 (0.4480) data time 0.0006 (0.0026) model time 0.4454 (0.4465) loss 2.5340 (3.6257) grad_norm 1.1437 (1.3847) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 
12:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][200/625] eta 0:03:10 lr 0.001194 wd 0.0500 time 0.4427 (0.4479) data time 0.0009 (0.0025) model time 0.4418 (0.4464) loss 3.8385 (3.6301) grad_norm 1.1130 (1.3791) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][210/625] eta 0:03:05 lr 0.001194 wd 0.0500 time 0.4426 (0.4478) data time 0.0009 (0.0024) model time 0.4417 (0.4463) loss 3.3593 (3.6234) grad_norm 1.1945 (1.3747) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][220/625] eta 0:03:01 lr 0.001194 wd 0.0500 time 0.4438 (0.4476) data time 0.0009 (0.0023) model time 0.4429 (0.4461) loss 3.8154 (3.6324) grad_norm 1.6647 (1.3775) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][230/625] eta 0:02:56 lr 0.001194 wd 0.0500 time 0.4446 (0.4475) data time 0.0007 (0.0023) model time 0.4439 (0.4459) loss 4.2239 (3.6333) grad_norm 1.9035 (1.3835) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][240/625] eta 0:02:52 lr 0.001194 wd 0.0500 time 0.4429 (0.4473) data time 0.0009 (0.0022) model time 0.4421 (0.4458) loss 3.8807 (3.6321) grad_norm 1.2372 (1.3900) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][250/625] eta 0:02:47 lr 0.001194 wd 0.0500 time 0.4397 (0.4472) data time 0.0007 (0.0022) model time 0.4390 (0.4456) loss 3.6134 (3.6337) grad_norm 2.0231 (1.3924) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][260/625] eta 0:02:43 lr 0.001194 wd 0.0500 time 0.4466 (0.4472) data time 0.0007 (0.0021) model time 0.4460 (0.4457) loss 3.3531 
(3.6234) grad_norm 0.9886 (1.3894) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][270/625] eta 0:02:38 lr 0.001194 wd 0.0500 time 0.4443 (0.4471) data time 0.0008 (0.0021) model time 0.4435 (0.4456) loss 2.6502 (3.6226) grad_norm 1.0853 (1.3887) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][280/625] eta 0:02:34 lr 0.001194 wd 0.0500 time 0.4426 (0.4470) data time 0.0007 (0.0020) model time 0.4419 (0.4455) loss 4.3399 (3.6304) grad_norm 2.1430 (1.3874) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][290/625] eta 0:02:29 lr 0.001194 wd 0.0500 time 0.4454 (0.4468) data time 0.0008 (0.0020) model time 0.4445 (0.4453) loss 3.4314 (3.6333) grad_norm 1.1183 (1.3867) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][300/625] eta 0:02:25 lr 0.001194 wd 0.0500 time 0.4414 (0.4467) data time 0.0007 (0.0019) model time 0.4407 (0.4452) loss 2.2691 (3.6349) grad_norm 0.9030 (1.3813) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][310/625] eta 0:02:20 lr 0.001194 wd 0.0500 time 0.4431 (0.4466) data time 0.0008 (0.0019) model time 0.4424 (0.4451) loss 4.1392 (3.6323) grad_norm 1.8963 (1.3784) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][320/625] eta 0:02:16 lr 0.001194 wd 0.0500 time 0.4418 (0.4465) data time 0.0009 (0.0019) model time 0.4410 (0.4451) loss 3.9187 (3.6330) grad_norm 1.2087 (1.3849) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][330/625] eta 0:02:11 lr 0.001194 wd 
0.0500 time 0.4456 (0.4465) data time 0.0008 (0.0018) model time 0.4448 (0.4451) loss 3.9084 (3.6358) grad_norm 1.1286 (1.3825) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][340/625] eta 0:02:07 lr 0.001194 wd 0.0500 time 0.4453 (0.4465) data time 0.0006 (0.0018) model time 0.4447 (0.4450) loss 4.4213 (3.6442) grad_norm 1.0362 (1.3814) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][350/625] eta 0:02:02 lr 0.001194 wd 0.0500 time 0.4473 (0.4464) data time 0.0006 (0.0018) model time 0.4467 (0.4450) loss 3.0842 (3.6328) grad_norm 1.1412 (1.3795) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][360/625] eta 0:01:58 lr 0.001194 wd 0.0500 time 0.4391 (0.4464) data time 0.0006 (0.0017) model time 0.4385 (0.4450) loss 3.5796 (3.6250) grad_norm 1.4858 (1.3788) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][370/625] eta 0:01:53 lr 0.001194 wd 0.0500 time 0.4432 (0.4464) data time 0.0009 (0.0017) model time 0.4423 (0.4450) loss 3.6442 (3.6302) grad_norm 1.0457 (1.3804) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][380/625] eta 0:01:49 lr 0.001194 wd 0.0500 time 0.4415 (0.4463) data time 0.0008 (0.0017) model time 0.4407 (0.4449) loss 3.7945 (3.6320) grad_norm 1.4342 (1.3788) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][390/625] eta 0:01:44 lr 0.001194 wd 0.0500 time 0.4455 (0.4463) data time 0.0010 (0.0017) model time 0.4445 (0.4450) loss 3.6128 (3.6336) grad_norm 1.5761 (1.3790) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:12 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][400/625] eta 0:01:40 lr 0.001194 wd 0.0500 time 0.4469 (0.4467) data time 0.0009 (0.0017) model time 0.4460 (0.4454) loss 3.5478 (3.6334) grad_norm 1.9165 (1.3837) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][410/625] eta 0:01:36 lr 0.001194 wd 0.0500 time 0.4461 (0.4467) data time 0.0007 (0.0016) model time 0.4453 (0.4454) loss 4.3301 (3.6293) grad_norm 1.2035 (1.3883) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][420/625] eta 0:01:31 lr 0.001194 wd 0.0500 time 0.4457 (0.4467) data time 0.0008 (0.0016) model time 0.4449 (0.4454) loss 3.8504 (3.6327) grad_norm 1.5513 (1.3905) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][430/625] eta 0:01:27 lr 0.001194 wd 0.0500 time 0.4439 (0.4466) data time 0.0008 (0.0016) model time 0.4431 (0.4453) loss 2.5109 (3.6373) grad_norm 1.1559 (1.3911) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][440/625] eta 0:01:22 lr 0.001194 wd 0.0500 time 0.4448 (0.4466) data time 0.0008 (0.0016) model time 0.4440 (0.4453) loss 2.9001 (3.6346) grad_norm 1.0253 (1.3865) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][450/625] eta 0:01:18 lr 0.001194 wd 0.0500 time 0.4455 (0.4466) data time 0.0009 (0.0016) model time 0.4446 (0.4454) loss 3.7157 (3.6317) grad_norm 0.9575 (1.3827) loss_scale 16384.0000 (16384.0000) mem 16696MB [2024-08-04 12:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][460/625] eta 0:01:13 lr 0.001194 wd 0.0500 time 0.6699 (0.4470) data time 0.0008 (0.0015) model time 0.6691 (0.4459) loss 4.0069 (3.6366) 
grad_norm 2.2503 (inf) loss_scale 8192.0000 (16277.3796) mem 16696MB [2024-08-04 12:16:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][470/625] eta 0:01:09 lr 0.001194 wd 0.0500 time 0.4441 (0.4470) data time 0.0008 (0.0015) model time 0.4433 (0.4458) loss 3.7670 (3.6389) grad_norm 1.9741 (inf) loss_scale 8192.0000 (16105.7155) mem 16696MB [2024-08-04 12:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][480/625] eta 0:01:04 lr 0.001194 wd 0.0500 time 0.4426 (0.4469) data time 0.0008 (0.0015) model time 0.4418 (0.4458) loss 4.3432 (3.6408) grad_norm 0.9946 (inf) loss_scale 8192.0000 (15941.1892) mem 16696MB [2024-08-04 12:16:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][490/625] eta 0:01:00 lr 0.001194 wd 0.0500 time 0.4467 (0.4469) data time 0.0006 (0.0015) model time 0.4462 (0.4457) loss 4.5504 (3.6429) grad_norm 1.2258 (inf) loss_scale 8192.0000 (15783.3646) mem 16696MB [2024-08-04 12:16:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][500/625] eta 0:00:55 lr 0.001194 wd 0.0500 time 0.4483 (0.4469) data time 0.0009 (0.0015) model time 0.4474 (0.4457) loss 2.3488 (3.6362) grad_norm 1.0438 (inf) loss_scale 8192.0000 (15631.8403) mem 16696MB [2024-08-04 12:17:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][510/625] eta 0:00:51 lr 0.001194 wd 0.0500 time 0.4455 (0.4468) data time 0.0006 (0.0015) model time 0.4449 (0.4457) loss 3.6413 (3.6370) grad_norm 1.2067 (inf) loss_scale 8192.0000 (15486.2466) mem 16696MB [2024-08-04 12:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][520/625] eta 0:00:46 lr 0.001194 wd 0.0500 time 0.4452 (0.4468) data time 0.0009 (0.0015) model time 0.4443 (0.4456) loss 4.3182 (3.6382) grad_norm 2.1174 (inf) loss_scale 8192.0000 (15346.2418) mem 16696MB [2024-08-04 12:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][530/625] eta 0:00:42 lr 0.001194 wd 0.0500 time 0.4468 (0.4468) data 
time 0.0007 (0.0014) model time 0.4461 (0.4456) loss 3.6907 (3.6365) grad_norm 1.4840 (inf) loss_scale 8192.0000 (15211.5104) mem 16696MB [2024-08-04 12:17:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][540/625] eta 0:00:37 lr 0.001194 wd 0.0500 time 0.4433 (0.4467) data time 0.0008 (0.0014) model time 0.4426 (0.4456) loss 3.5936 (3.6407) grad_norm 1.9826 (inf) loss_scale 8192.0000 (15081.7597) mem 16696MB [2024-08-04 12:17:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][550/625] eta 0:00:33 lr 0.001194 wd 0.0500 time 0.4447 (0.4467) data time 0.0006 (0.0014) model time 0.4441 (0.4456) loss 4.5220 (3.6406) grad_norm 1.3239 (inf) loss_scale 8192.0000 (14956.7187) mem 16696MB [2024-08-04 12:17:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][560/625] eta 0:00:29 lr 0.001194 wd 0.0500 time 0.4464 (0.4467) data time 0.0007 (0.0014) model time 0.4457 (0.4455) loss 3.9711 (3.6376) grad_norm 1.4439 (inf) loss_scale 8192.0000 (14836.1355) mem 16696MB [2024-08-04 12:17:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][570/625] eta 0:00:24 lr 0.001194 wd 0.0500 time 0.4457 (0.4467) data time 0.0007 (0.0014) model time 0.4451 (0.4455) loss 4.0279 (3.6426) grad_norm 1.4698 (inf) loss_scale 8192.0000 (14719.7758) mem 16696MB [2024-08-04 12:17:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][580/625] eta 0:00:20 lr 0.001194 wd 0.0500 time 0.4467 (0.4467) data time 0.0008 (0.0014) model time 0.4459 (0.4455) loss 3.9755 (3.6435) grad_norm 0.9651 (inf) loss_scale 8192.0000 (14607.4217) mem 16696MB [2024-08-04 12:17:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][590/625] eta 0:00:15 lr 0.001194 wd 0.0500 time 0.4434 (0.4467) data time 0.0008 (0.0014) model time 0.4426 (0.4456) loss 3.8069 (3.6477) grad_norm 1.5002 (inf) loss_scale 8192.0000 (14498.8697) mem 16696MB [2024-08-04 12:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[32/300][600/625] eta 0:00:11 lr 0.001194 wd 0.0500 time 0.4465 (0.4467) data time 0.0006 (0.0014) model time 0.4459 (0.4456) loss 3.5776 (3.6529) grad_norm 1.1989 (inf) loss_scale 8192.0000 (14393.9301) mem 16696MB [2024-08-04 12:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][610/625] eta 0:00:06 lr 0.001194 wd 0.0500 time 0.4397 (0.4466) data time 0.0004 (0.0014) model time 0.4393 (0.4455) loss 3.1069 (3.6528) grad_norm 1.1491 (inf) loss_scale 8192.0000 (14292.4255) mem 16696MB [2024-08-04 12:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [32/300][620/625] eta 0:00:02 lr 0.001194 wd 0.0500 time 0.4372 (0.4465) data time 0.0004 (0.0014) model time 0.4368 (0.4454) loss 2.7835 (3.6570) grad_norm 1.2795 (inf) loss_scale 8192.0000 (14194.1900) mem 16696MB [2024-08-04 12:17:52 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 32 training takes 0:04:39 [2024-08-04 12:17:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 12:17:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 12:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.7075 (0.7075) Acc@1 83.643 (83.643) Acc@5 96.973 (96.973) Mem 16696MB [2024-08-04 12:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 1.2275 (0.8953) Acc@1 70.801 (79.070) Acc@5 91.650 (95.344) Mem 16696MB [2024-08-04 12:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 1.3555 (1.0932) Acc@1 69.092 (74.772) Acc@5 90.332 (92.804) Mem 16696MB [2024-08-04 12:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.450 Acc@5 92.820 [2024-08-04 12:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 74.5% [2024-08-04 12:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 74.45% [2024-08-04 12:17:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 12:17:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 12:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.6558 (0.6558) Acc@1 82.959 (82.959) Acc@5 96.387 (96.387) Mem 16696MB [2024-08-04 12:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 1.2100 (0.8393) Acc@1 69.043 (77.677) Acc@5 90.088 (94.682) Mem 16696MB [2024-08-04 12:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 1.3770 (1.0552) Acc@1 66.016 (73.112) Acc@5 87.451 (91.885) Mem 16696MB [2024-08-04 12:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.029 Acc@5 91.879 [2024-08-04 12:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 73.0% [2024-08-04 12:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 73.03% [2024-08-04 12:18:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 12:18:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 12:18:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][0/625] eta 0:07:28 lr 0.001194 wd 0.0500 time 0.7182 (0.7182) data time 0.3364 (0.3364) model time 0.0000 (0.0000) loss 3.5672 (3.5672) grad_norm 1.3842 (1.3842) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][10/625] eta 0:04:48 lr 0.001194 wd 0.0500 time 0.4414 (0.4689) data time 0.0008 (0.0313) model time 0.0000 (0.0000) loss 3.3050 (3.4333) grad_norm 1.5011 (1.2653) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][20/625] eta 0:04:36 lr 0.001194 wd 0.0500 time 0.4446 (0.4572) data time 0.0006 (0.0168) model time 0.0000 (0.0000) loss 4.1593 (3.4623) grad_norm 1.3812 (1.4640) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][30/625] eta 0:04:29 lr 0.001194 wd 0.0500 time 0.4391 (0.4531) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 2.9698 (3.5659) grad_norm 1.4854 (1.4895) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][40/625] eta 0:04:23 lr 0.001194 wd 0.0500 time 0.4559 (0.4511) data time 0.0006 (0.0090) model time 0.0000 (0.0000) loss 2.9268 (3.5807) grad_norm 1.2297 (1.4625) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][50/625] eta 0:04:20 lr 0.001194 wd 0.0500 time 0.4410 (0.4538) data time 0.0007 (0.0074) model time 0.0000 (0.0000) loss 3.0447 (3.5863) grad_norm 1.2066 (1.4331) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][60/625] eta 0:04:17 lr 0.001194 wd 0.0500 time 0.6378 (0.4555) data time 0.0009 (0.0063) model time 0.6369 (0.4634) loss 3.6498 (3.6083) 
grad_norm 0.9915 (1.4049) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][70/625] eta 0:04:11 lr 0.001194 wd 0.0500 time 0.4395 (0.4530) data time 0.0008 (0.0055) model time 0.4387 (0.4501) loss 3.2231 (3.6119) grad_norm 1.3343 (1.3892) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][80/625] eta 0:04:06 lr 0.001194 wd 0.0500 time 0.4446 (0.4517) data time 0.0008 (0.0050) model time 0.4438 (0.4474) loss 3.3742 (3.6202) grad_norm 1.1366 (1.4117) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][90/625] eta 0:04:01 lr 0.001194 wd 0.0500 time 0.4403 (0.4510) data time 0.0008 (0.0045) model time 0.4395 (0.4465) loss 4.0798 (3.6137) grad_norm 1.6690 (1.4302) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][100/625] eta 0:03:56 lr 0.001194 wd 0.0500 time 0.4482 (0.4504) data time 0.0009 (0.0041) model time 0.4473 (0.4460) loss 4.1491 (3.6199) grad_norm 1.2120 (1.4344) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][110/625] eta 0:03:51 lr 0.001194 wd 0.0500 time 0.4404 (0.4498) data time 0.0006 (0.0038) model time 0.4399 (0.4455) loss 2.5990 (3.6073) grad_norm 1.1137 (1.4159) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][120/625] eta 0:03:46 lr 0.001194 wd 0.0500 time 0.4372 (0.4493) data time 0.0008 (0.0036) model time 0.4364 (0.4452) loss 3.8394 (3.6204) grad_norm 0.9507 (1.4070) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][130/625] eta 0:03:42 lr 0.001193 wd 0.0500 time 0.4438 
(0.4490) data time 0.0007 (0.0034) model time 0.4431 (0.4451) loss 4.2324 (3.6200) grad_norm 1.1730 (1.4132) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][140/625] eta 0:03:37 lr 0.001193 wd 0.0500 time 0.4470 (0.4488) data time 0.0009 (0.0032) model time 0.4461 (0.4452) loss 3.9389 (3.6243) grad_norm 1.4670 (1.4073) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][150/625] eta 0:03:33 lr 0.001193 wd 0.0500 time 0.4434 (0.4486) data time 0.0009 (0.0030) model time 0.4425 (0.4451) loss 3.9985 (3.6135) grad_norm 1.5796 (1.4099) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][160/625] eta 0:03:28 lr 0.001193 wd 0.0500 time 0.4437 (0.4484) data time 0.0006 (0.0029) model time 0.4431 (0.4451) loss 4.3027 (3.6262) grad_norm 1.1790 (1.4037) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][170/625] eta 0:03:23 lr 0.001193 wd 0.0500 time 0.4455 (0.4483) data time 0.0008 (0.0028) model time 0.4447 (0.4451) loss 2.7034 (3.6141) grad_norm 2.4533 (1.4067) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][180/625] eta 0:03:19 lr 0.001193 wd 0.0500 time 0.4457 (0.4481) data time 0.0006 (0.0026) model time 0.4451 (0.4451) loss 4.3518 (3.6062) grad_norm 0.9826 (1.3996) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][190/625] eta 0:03:14 lr 0.001193 wd 0.0500 time 0.4430 (0.4479) data time 0.0007 (0.0026) model time 0.4423 (0.4449) loss 3.4160 (3.6104) grad_norm 1.0024 (1.3817) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [33/300][200/625] eta 0:03:10 lr 0.001193 wd 0.0500 time 0.4424 (0.4477) data time 0.0007 (0.0025) model time 0.4417 (0.4449) loss 3.6938 (3.6063) grad_norm 0.9639 (1.3805) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][210/625] eta 0:03:05 lr 0.001193 wd 0.0500 time 0.4436 (0.4475) data time 0.0007 (0.0024) model time 0.4429 (0.4447) loss 4.4061 (3.6113) grad_norm 1.2806 (1.3866) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][220/625] eta 0:03:01 lr 0.001193 wd 0.0500 time 0.4440 (0.4473) data time 0.0006 (0.0023) model time 0.4433 (0.4446) loss 4.4385 (3.6298) grad_norm 1.1442 (1.3975) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][230/625] eta 0:02:56 lr 0.001193 wd 0.0500 time 0.4439 (0.4472) data time 0.0006 (0.0022) model time 0.4433 (0.4445) loss 3.6203 (3.6215) grad_norm 1.1353 (1.3920) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][240/625] eta 0:02:52 lr 0.001193 wd 0.0500 time 0.4422 (0.4477) data time 0.0006 (0.0022) model time 0.4416 (0.4453) loss 4.5564 (3.6170) grad_norm 1.0545 (1.3949) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][250/625] eta 0:02:47 lr 0.001193 wd 0.0500 time 0.4419 (0.4475) data time 0.0007 (0.0021) model time 0.4412 (0.4451) loss 4.3816 (3.6172) grad_norm 1.1230 (1.3992) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:19:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][260/625] eta 0:02:43 lr 0.001193 wd 0.0500 time 0.4434 (0.4474) data time 0.0008 (0.0021) model time 0.4426 (0.4450) loss 3.5693 (3.6069) grad_norm 1.1339 (1.3940) loss_scale 8192.0000 
(8192.0000) mem 16696MB [2024-08-04 12:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][270/625] eta 0:02:38 lr 0.001193 wd 0.0500 time 0.4419 (0.4472) data time 0.0006 (0.0020) model time 0.4413 (0.4449) loss 3.0270 (3.6099) grad_norm 1.2178 (1.3928) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][280/625] eta 0:02:34 lr 0.001193 wd 0.0500 time 0.4448 (0.4470) data time 0.0009 (0.0020) model time 0.4439 (0.4447) loss 3.2195 (3.6098) grad_norm 1.2242 (1.3888) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][290/625] eta 0:02:29 lr 0.001193 wd 0.0500 time 0.4419 (0.4469) data time 0.0008 (0.0019) model time 0.4411 (0.4446) loss 3.8212 (3.6071) grad_norm 1.4667 (1.3940) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][300/625] eta 0:02:25 lr 0.001193 wd 0.0500 time 0.4412 (0.4467) data time 0.0006 (0.0019) model time 0.4406 (0.4445) loss 3.8011 (3.6128) grad_norm 1.2446 (1.3947) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][310/625] eta 0:02:20 lr 0.001193 wd 0.0500 time 0.4437 (0.4466) data time 0.0007 (0.0019) model time 0.4430 (0.4444) loss 3.7352 (3.6076) grad_norm 1.1441 (1.3972) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][320/625] eta 0:02:16 lr 0.001193 wd 0.0500 time 0.4415 (0.4466) data time 0.0006 (0.0018) model time 0.4409 (0.4444) loss 3.7597 (3.5979) grad_norm 1.2284 (1.3926) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][330/625] eta 0:02:11 lr 0.001193 wd 0.0500 time 0.4406 (0.4465) data time 0.0007 (0.0018) model time 
0.4399 (0.4444) loss 3.2650 (3.6038) grad_norm 1.3987 (1.3896) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][340/625] eta 0:02:07 lr 0.001193 wd 0.0500 time 0.4434 (0.4464) data time 0.0009 (0.0018) model time 0.4425 (0.4443) loss 3.0804 (3.5996) grad_norm 1.4871 (1.3903) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][350/625] eta 0:02:02 lr 0.001193 wd 0.0500 time 0.4411 (0.4463) data time 0.0009 (0.0018) model time 0.4402 (0.4442) loss 3.7094 (3.6002) grad_norm 1.5182 (1.3944) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][360/625] eta 0:01:58 lr 0.001193 wd 0.0500 time 0.4449 (0.4463) data time 0.0008 (0.0017) model time 0.4441 (0.4443) loss 2.7494 (3.5977) grad_norm 1.5131 (1.3920) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][370/625] eta 0:01:53 lr 0.001193 wd 0.0500 time 0.4436 (0.4462) data time 0.0007 (0.0017) model time 0.4430 (0.4442) loss 3.4464 (3.5988) grad_norm 1.3631 (1.3929) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][380/625] eta 0:01:49 lr 0.001193 wd 0.0500 time 0.4422 (0.4462) data time 0.0007 (0.0017) model time 0.4415 (0.4442) loss 3.9120 (3.6057) grad_norm 1.2040 (1.3902) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:20:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][390/625] eta 0:01:44 lr 0.001193 wd 0.0500 time 0.4431 (0.4461) data time 0.0006 (0.0017) model time 0.4425 (0.4441) loss 2.7504 (3.6033) grad_norm 1.1266 (1.3885) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][400/625] eta 0:01:40 
lr 0.001193 wd 0.0500 time 0.4453 (0.4465) data time 0.0007 (0.0017) model time 0.4446 (0.4446) loss 2.8477 (3.5991) grad_norm 1.4687 (1.3885) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][410/625] eta 0:01:35 lr 0.001193 wd 0.0500 time 0.4466 (0.4464) data time 0.0008 (0.0016) model time 0.4457 (0.4446) loss 3.8401 (3.6011) grad_norm 1.2690 (1.3861) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][420/625] eta 0:01:31 lr 0.001193 wd 0.0500 time 0.4419 (0.4465) data time 0.0006 (0.0016) model time 0.4412 (0.4446) loss 4.1649 (3.6048) grad_norm 1.1852 (1.3844) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][430/625] eta 0:01:27 lr 0.001193 wd 0.0500 time 0.4436 (0.4464) data time 0.0006 (0.0016) model time 0.4430 (0.4446) loss 2.7091 (3.6011) grad_norm 1.4653 (1.3878) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][440/625] eta 0:01:22 lr 0.001193 wd 0.0500 time 0.4411 (0.4464) data time 0.0006 (0.0016) model time 0.4405 (0.4446) loss 2.8219 (3.6030) grad_norm 2.3281 (1.3874) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][450/625] eta 0:01:18 lr 0.001193 wd 0.0500 time 0.4478 (0.4464) data time 0.0007 (0.0016) model time 0.4470 (0.4446) loss 3.1404 (3.6001) grad_norm 2.1987 (1.3962) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][460/625] eta 0:01:13 lr 0.001193 wd 0.0500 time 0.4455 (0.4468) data time 0.0008 (0.0016) model time 0.4446 (0.4451) loss 3.8583 (3.5973) grad_norm 1.0000 (1.3930) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:33 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][470/625] eta 0:01:09 lr 0.001193 wd 0.0500 time 0.4488 (0.4468) data time 0.0006 (0.0015) model time 0.4482 (0.4451) loss 2.5554 (3.5990) grad_norm 1.3797 (1.3953) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][480/625] eta 0:01:04 lr 0.001193 wd 0.0500 time 0.4431 (0.4468) data time 0.0006 (0.0015) model time 0.4425 (0.4452) loss 3.1930 (3.6000) grad_norm 0.9614 (1.3959) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][490/625] eta 0:01:00 lr 0.001193 wd 0.0500 time 0.4418 (0.4467) data time 0.0007 (0.0015) model time 0.4411 (0.4451) loss 2.9406 (3.5964) grad_norm 1.3322 (1.3942) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][500/625] eta 0:00:55 lr 0.001193 wd 0.0500 time 0.4459 (0.4467) data time 0.0006 (0.0015) model time 0.4453 (0.4450) loss 4.4195 (3.5932) grad_norm 1.1935 (1.3903) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][510/625] eta 0:00:51 lr 0.001193 wd 0.0500 time 0.4406 (0.4466) data time 0.0009 (0.0015) model time 0.4397 (0.4450) loss 3.5615 (3.5891) grad_norm 2.4814 (1.3927) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:21:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][520/625] eta 0:00:46 lr 0.001193 wd 0.0500 time 0.4421 (0.4465) data time 0.0006 (0.0015) model time 0.4415 (0.4449) loss 4.1027 (3.5890) grad_norm 1.5789 (1.3951) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][530/625] eta 0:00:42 lr 0.001193 wd 0.0500 time 0.4538 (0.4465) data time 0.0007 (0.0015) model time 0.4531 (0.4449) loss 3.0524 (3.5924) grad_norm 
1.2025 (1.3920) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][540/625] eta 0:00:37 lr 0.001193 wd 0.0500 time 0.4392 (0.4465) data time 0.0009 (0.0015) model time 0.4383 (0.4449) loss 2.9224 (3.5911) grad_norm 1.6389 (1.3905) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][550/625] eta 0:00:33 lr 0.001193 wd 0.0500 time 0.4415 (0.4464) data time 0.0009 (0.0015) model time 0.4406 (0.4448) loss 3.2377 (3.5896) grad_norm 1.6798 (1.3874) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][560/625] eta 0:00:29 lr 0.001193 wd 0.0500 time 0.4467 (0.4464) data time 0.0008 (0.0014) model time 0.4458 (0.4448) loss 4.1661 (3.5892) grad_norm 1.1630 (1.3874) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][570/625] eta 0:00:24 lr 0.001193 wd 0.0500 time 0.4451 (0.4464) data time 0.0007 (0.0014) model time 0.4444 (0.4448) loss 4.4994 (3.5956) grad_norm 1.8037 (1.3866) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][580/625] eta 0:00:20 lr 0.001193 wd 0.0500 time 0.4424 (0.4463) data time 0.0009 (0.0014) model time 0.4415 (0.4447) loss 3.1072 (3.5941) grad_norm 1.6180 (1.3874) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][590/625] eta 0:00:15 lr 0.001193 wd 0.0500 time 0.4416 (0.4463) data time 0.0007 (0.0014) model time 0.4410 (0.4448) loss 4.1860 (3.5975) grad_norm 1.4313 (1.3894) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][600/625] eta 0:00:11 lr 0.001193 wd 0.0500 time 0.4450 (0.4463) data 
time 0.0008 (0.0014) model time 0.4441 (0.4447) loss 3.8854 (3.5967) grad_norm 1.2596 (1.3890) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][610/625] eta 0:00:06 lr 0.001193 wd 0.0500 time 0.4420 (0.4466) data time 0.0006 (0.0014) model time 0.4415 (0.4451) loss 4.0070 (3.5976) grad_norm 2.0828 (1.3926) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [33/300][620/625] eta 0:00:02 lr 0.001193 wd 0.0500 time 0.4388 (0.4465) data time 0.0006 (0.0014) model time 0.4382 (0.4450) loss 3.5183 (3.5975) grad_norm 1.3315 (1.3970) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:22:42 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 33 training takes 0:04:39 [2024-08-04 12:22:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 12:22:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 12:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.486 (0.486) Loss 0.7090 (0.7090) Acc@1 84.717 (84.717) Acc@5 96.973 (96.973) Mem 16696MB [2024-08-04 12:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 1.1953 (0.8856) Acc@1 72.021 (79.572) Acc@5 90.918 (95.459) Mem 16696MB [2024-08-04 12:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 1.3877 (1.0827) Acc@1 68.408 (75.128) Acc@5 89.502 (92.932) Mem 16696MB [2024-08-04 12:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.002 Acc@5 93.004 [2024-08-04 12:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.0% [2024-08-04 12:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.00% [2024-08-04 12:22:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 12:22:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 12:22:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.6392 (0.6392) Acc@1 83.496 (83.496) Acc@5 96.484 (96.484) Mem 16696MB [2024-08-04 12:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.1777 (0.8184) Acc@1 70.020 (78.307) Acc@5 90.381 (94.873) Mem 16696MB [2024-08-04 12:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.134) Loss 1.3477 (1.0300) Acc@1 66.553 (73.789) Acc@5 88.037 (92.227) Mem 16696MB [2024-08-04 12:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 73.706 Acc@5 92.214 [2024-08-04 12:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 73.7% [2024-08-04 12:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 73.71% [2024-08-04 12:22:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 12:22:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 12:22:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][0/625] eta 0:08:10 lr 0.001193 wd 0.0500 time 0.7850 (0.7850) data time 0.4011 (0.4011) model time 0.0000 (0.0000) loss 4.0275 (4.0275) grad_norm 1.7486 (1.7486) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][10/625] eta 0:04:51 lr 0.001193 wd 0.0500 time 0.4447 (0.4738) data time 0.0006 (0.0372) model time 0.0000 (0.0000) loss 4.3154 (3.6960) grad_norm 1.0547 (1.3721) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][20/625] eta 0:04:37 lr 0.001193 wd 0.0500 time 0.4426 (0.4591) data time 0.0008 (0.0200) model time 0.0000 (0.0000) loss 3.7759 (3.5523) grad_norm 1.6150 (1.4722) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][30/625] eta 0:04:30 lr 0.001193 wd 0.0500 time 0.4396 (0.4540) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 3.6965 (3.6575) grad_norm 1.1740 (1.4480) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][40/625] eta 0:04:24 lr 0.001193 wd 0.0500 time 0.4470 (0.4517) data time 0.0009 (0.0107) model time 0.0000 (0.0000) loss 3.8758 (3.6005) grad_norm 1.7318 (1.4623) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][50/625] eta 0:04:18 lr 0.001193 wd 0.0500 time 0.4440 (0.4500) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 4.3773 (3.5831) grad_norm 1.5815 (1.4326) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][60/625] eta 0:04:15 lr 0.001193 wd 0.0500 time 0.6526 (0.4525) data time 0.0008 (0.0074) model time 0.6518 (0.4644) loss 3.6939 (3.4983) 
grad_norm 1.7634 (1.4472) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][70/625] eta 0:04:09 lr 0.001193 wd 0.0500 time 0.4424 (0.4504) data time 0.0007 (0.0065) model time 0.4418 (0.4505) loss 2.9971 (3.4610) grad_norm 1.3210 (1.4595) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][80/625] eta 0:04:05 lr 0.001193 wd 0.0500 time 0.4560 (0.4497) data time 0.0007 (0.0058) model time 0.4552 (0.4483) loss 4.0529 (3.4861) grad_norm 1.4422 (1.4547) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][90/625] eta 0:04:00 lr 0.001193 wd 0.0500 time 0.4454 (0.4489) data time 0.0009 (0.0053) model time 0.4445 (0.4467) loss 3.9598 (3.5238) grad_norm 1.3629 (1.4529) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][100/625] eta 0:03:55 lr 0.001193 wd 0.0500 time 0.4427 (0.4484) data time 0.0007 (0.0048) model time 0.4420 (0.4459) loss 4.1441 (3.5291) grad_norm 1.2328 (1.4365) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][110/625] eta 0:03:50 lr 0.001193 wd 0.0500 time 0.4521 (0.4481) data time 0.0006 (0.0045) model time 0.4515 (0.4455) loss 3.6663 (3.5491) grad_norm 1.7545 (1.4331) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][120/625] eta 0:03:46 lr 0.001192 wd 0.0500 time 0.4484 (0.4478) data time 0.0006 (0.0042) model time 0.4478 (0.4454) loss 2.8970 (3.5640) grad_norm 1.9256 (1.4399) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][130/625] eta 0:03:41 lr 0.001192 wd 0.0500 time 0.4463 
(0.4476) data time 0.0008 (0.0040) model time 0.4455 (0.4451) loss 4.2160 (3.5625) grad_norm 2.0966 (1.4342) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:23:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][140/625] eta 0:03:36 lr 0.001192 wd 0.0500 time 0.4454 (0.4473) data time 0.0008 (0.0037) model time 0.4445 (0.4449) loss 4.1066 (3.5585) grad_norm 1.1385 (1.4322) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][150/625] eta 0:03:32 lr 0.001192 wd 0.0500 time 0.4440 (0.4472) data time 0.0009 (0.0035) model time 0.4432 (0.4448) loss 3.2225 (3.5644) grad_norm 1.2892 (1.4416) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][160/625] eta 0:03:27 lr 0.001192 wd 0.0500 time 0.4444 (0.4470) data time 0.0009 (0.0034) model time 0.4436 (0.4447) loss 3.7414 (3.5530) grad_norm 1.6113 (1.4376) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][170/625] eta 0:03:23 lr 0.001192 wd 0.0500 time 0.4399 (0.4469) data time 0.0007 (0.0032) model time 0.4392 (0.4447) loss 4.1848 (3.5561) grad_norm 1.4903 (1.4289) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][180/625] eta 0:03:18 lr 0.001192 wd 0.0500 time 0.4478 (0.4468) data time 0.0009 (0.0031) model time 0.4469 (0.4447) loss 3.1381 (3.5520) grad_norm 1.5268 (1.4293) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][190/625] eta 0:03:14 lr 0.001192 wd 0.0500 time 0.4531 (0.4469) data time 0.0008 (0.0030) model time 0.4523 (0.4448) loss 3.8747 (3.5511) grad_norm 1.4746 (1.4222) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:25 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [34/300][200/625] eta 0:03:09 lr 0.001192 wd 0.0500 time 0.4526 (0.4468) data time 0.0006 (0.0029) model time 0.4520 (0.4449) loss 4.1878 (3.5559) grad_norm 1.0054 (1.4186) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][210/625] eta 0:03:06 lr 0.001192 wd 0.0500 time 0.4410 (0.4484) data time 0.0008 (0.0028) model time 0.4401 (0.4470) loss 3.4629 (3.5432) grad_norm 2.2633 (1.4226) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][220/625] eta 0:03:01 lr 0.001192 wd 0.0500 time 0.4393 (0.4482) data time 0.0010 (0.0027) model time 0.4383 (0.4468) loss 3.9029 (3.5404) grad_norm 0.9954 (1.4222) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][230/625] eta 0:02:56 lr 0.001192 wd 0.0500 time 0.4451 (0.4481) data time 0.0006 (0.0026) model time 0.4445 (0.4466) loss 4.9114 (3.5428) grad_norm 1.3268 (1.4215) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][240/625] eta 0:02:52 lr 0.001192 wd 0.0500 time 0.4470 (0.4480) data time 0.0009 (0.0026) model time 0.4460 (0.4466) loss 3.1615 (3.5462) grad_norm 1.1924 (1.4157) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][250/625] eta 0:02:47 lr 0.001192 wd 0.0500 time 0.4437 (0.4478) data time 0.0006 (0.0025) model time 0.4431 (0.4464) loss 4.1116 (3.5396) grad_norm 1.1731 (1.4074) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:24:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][260/625] eta 0:02:43 lr 0.001192 wd 0.0500 time 0.4480 (0.4477) data time 0.0006 (0.0024) model time 0.4474 (0.4462) loss 2.6124 (3.5276) grad_norm 1.1058 (1.4060) loss_scale 8192.0000 
(8192.0000) mem 16696MB [2024-08-04 12:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][270/625] eta 0:02:38 lr 0.001192 wd 0.0500 time 0.4395 (0.4475) data time 0.0011 (0.0024) model time 0.4384 (0.4461) loss 4.3290 (3.5320) grad_norm 1.7731 (1.4090) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][280/625] eta 0:02:34 lr 0.001192 wd 0.0500 time 0.4424 (0.4474) data time 0.0007 (0.0023) model time 0.4417 (0.4460) loss 2.7922 (3.5347) grad_norm 1.8842 (1.4129) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][290/625] eta 0:02:29 lr 0.001192 wd 0.0500 time 0.4499 (0.4474) data time 0.0008 (0.0023) model time 0.4490 (0.4460) loss 3.7476 (3.5286) grad_norm 1.3046 (1.4099) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][300/625] eta 0:02:25 lr 0.001192 wd 0.0500 time 0.4466 (0.4473) data time 0.0009 (0.0022) model time 0.4457 (0.4459) loss 3.0838 (3.5238) grad_norm 2.1501 (1.4085) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][310/625] eta 0:02:20 lr 0.001192 wd 0.0500 time 0.4437 (0.4472) data time 0.0009 (0.0022) model time 0.4428 (0.4457) loss 3.9366 (3.5205) grad_norm 1.7513 (1.4103) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][320/625] eta 0:02:16 lr 0.001192 wd 0.0500 time 0.4480 (0.4471) data time 0.0007 (0.0021) model time 0.4473 (0.4456) loss 4.3751 (3.5319) grad_norm 2.5880 (1.4124) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][330/625] eta 0:02:11 lr 0.001192 wd 0.0500 time 0.4435 (0.4470) data time 0.0009 (0.0021) model time 
0.4426 (0.4455) loss 3.8183 (3.5384) grad_norm 1.0629 (1.4091) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][340/625] eta 0:02:07 lr 0.001192 wd 0.0500 time 0.4433 (0.4469) data time 0.0008 (0.0021) model time 0.4425 (0.4454) loss 3.8468 (3.5394) grad_norm 1.8001 (1.4125) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][350/625] eta 0:02:03 lr 0.001192 wd 0.0500 time 0.4406 (0.4474) data time 0.0009 (0.0020) model time 0.4397 (0.4461) loss 3.6577 (3.5263) grad_norm 1.7059 (1.4121) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][360/625] eta 0:01:58 lr 0.001192 wd 0.0500 time 0.4465 (0.4478) data time 0.0010 (0.0020) model time 0.4456 (0.4465) loss 2.3428 (3.5214) grad_norm 1.1915 (1.4135) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][370/625] eta 0:01:54 lr 0.001192 wd 0.0500 time 0.4409 (0.4477) data time 0.0009 (0.0020) model time 0.4400 (0.4464) loss 4.4690 (3.5236) grad_norm 1.4847 (1.4069) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][380/625] eta 0:01:49 lr 0.001192 wd 0.0500 time 0.4423 (0.4476) data time 0.0006 (0.0020) model time 0.4417 (0.4463) loss 3.8608 (3.5247) grad_norm 1.7306 (1.4093) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][390/625] eta 0:01:45 lr 0.001192 wd 0.0500 time 0.4451 (0.4475) data time 0.0008 (0.0019) model time 0.4442 (0.4462) loss 3.6800 (3.5294) grad_norm 1.7165 (1.4122) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:25:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][400/625] eta 0:01:40 
lr 0.001192 wd 0.0500 time 0.4480 (0.4478) data time 0.0006 (0.0019) model time 0.4474 (0.4466) loss 2.8131 (3.5284) grad_norm 1.8149 (1.4080) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:26:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][410/625] eta 0:01:36 lr 0.001192 wd 0.0500 time 0.4431 (0.4477) data time 0.0008 (0.0019) model time 0.4423 (0.4465) loss 3.6700 (3.5268) grad_norm 1.6310 (1.4073) loss_scale 8192.0000 (8192.0000) mem 16696MB [2024-08-04 12:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 12:26:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 12:26:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 12:32:12 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 12:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 12:32:28 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 14:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 14:28:48 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 14:29:01 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-04 14:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 14:29:13 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-04 14:29:16 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 14:29:18 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 14:29:18 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 34) [2024-08-04 14:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 14:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][420/625] eta 0:35:33 lr 0.001192 wd 0.0500 time 1.2447 (10.4055) data time 0.0012 (0.5272) model time 1.2435 (9.8783) loss 4.4211 (4.5494) grad_norm 1.4393 (1.2317) loss_scale 8192.0000 (8192.0000) mem 16697MB [2024-08-04 14:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][430/625] eta 0:06:52 lr 0.001192 wd 0.0500 time 0.4554 (2.1133) data time 0.0009 (0.0888) model time 0.4546 (2.0245) loss 2.7047 (3.8762) grad_norm 1.3749 (1.3507) loss_scale 8192.0000 (8192.0000) mem 16695MB [2024-08-04 14:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][440/625] eta 0:04:11 lr 0.001192 wd 0.0500 time 0.4563 (1.3605) data time 0.0011 (0.0489) model time 0.4552 (1.3115) loss 3.7962 (3.8358) grad_norm 1.1216 (1.3093) loss_scale 8192.0000 (8192.0000) mem 16695MB [2024-08-04 14:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 14:29:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-04 14:30:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 14:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 14:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 14:32:20 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 14:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 14:32:34 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-04 14:32:36 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 14:32:38 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 14:32:38 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 34) [2024-08-04 14:32:39 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 14:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][450/625] eta 0:10:18 lr 0.001192 wd 0.0500 time 0.4915 (3.5367) data time 0.0012 (0.0853) model time 0.4903 (3.4514) loss 3.6730 (3.9642) grad_norm 1.2641 (1.3911) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][460/625] eta 0:04:48 lr 0.001192 wd 0.0500 time 0.4893 (1.7465) data time 0.0011 (0.0358) model time 0.4882 (1.7108) loss 3.4386 (3.7672) grad_norm 1.9174 (1.4329) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[34/300][470/625] eta 0:03:18 lr 0.001192 wd 0.0500 time 0.4996 (1.2811) data time 0.0008 (0.0229) model time 0.4988 (1.2582) loss 4.7843 (3.8359) grad_norm 1.4388 (1.3884) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][480/625] eta 0:02:36 lr 0.001192 wd 0.0500 time 0.7454 (1.0809) data time 0.0010 (0.0170) model time 0.7444 (1.0639) loss 3.4241 (3.8230) grad_norm 1.3776 (1.3197) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][490/625] eta 0:02:08 lr 0.001192 wd 0.0500 time 0.4941 (0.9532) data time 0.0009 (0.0137) model time 0.4932 (0.9396) loss 3.8343 (3.7663) grad_norm 1.7673 (1.3535) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][500/625] eta 0:01:48 lr 0.001192 wd 0.0500 time 0.4880 (0.8718) data time 0.0010 (0.0114) model time 0.4870 (0.8604) loss 3.7378 (3.7554) grad_norm 1.6860 (1.3987) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][510/625] eta 0:01:33 lr 0.001192 wd 0.0500 time 0.4920 (0.8153) data time 0.0010 (0.0099) model time 0.4910 (0.8054) loss 4.0803 (3.7147) grad_norm 1.9354 (1.4332) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][520/625] eta 0:01:21 lr 0.001192 wd 0.0500 time 0.4907 (0.7735) data time 0.0011 (0.0088) model time 0.4897 (0.7647) loss 3.8114 (3.6845) grad_norm 1.2615 (1.4347) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][530/625] eta 0:01:10 lr 0.001192 wd 0.0500 time 0.4962 (0.7413) data time 0.0011 (0.0079) model time 0.4951 (0.7334) loss 3.4130 (3.6363) grad_norm 1.0808 (1.4102) loss_scale 8192.0000 (8192.0000) mem 16721MB 
[2024-08-04 14:33:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][540/625] eta 0:01:00 lr 0.001192 wd 0.0500 time 0.4920 (0.7155) data time 0.0011 (0.0072) model time 0.4909 (0.7084) loss 3.5836 (3.6433) grad_norm 1.4987 (1.4047) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:33:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][550/625] eta 0:00:52 lr 0.001192 wd 0.0500 time 0.4895 (0.6944) data time 0.0008 (0.0066) model time 0.4887 (0.6878) loss 3.4399 (3.6583) grad_norm 1.1956 (1.4038) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][560/625] eta 0:00:43 lr 0.001192 wd 0.0500 time 0.4865 (0.6769) data time 0.0010 (0.0061) model time 0.4855 (0.6707) loss 3.9502 (3.6571) grad_norm 1.3006 (1.4066) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][570/625] eta 0:00:36 lr 0.001192 wd 0.0500 time 0.4971 (0.6621) data time 0.0011 (0.0057) model time 0.4961 (0.6564) loss 3.7137 (3.6547) grad_norm 1.7798 (1.4060) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][580/625] eta 0:00:29 lr 0.001192 wd 0.0500 time 0.4916 (0.6498) data time 0.0009 (0.0054) model time 0.4907 (0.6444) loss 3.5419 (3.6534) grad_norm 1.6460 (1.4061) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][590/625] eta 0:00:22 lr 0.001192 wd 0.0500 time 0.4909 (0.6391) data time 0.0010 (0.0051) model time 0.4899 (0.6340) loss 3.7914 (3.6481) grad_norm 0.9211 (1.3906) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][600/625] eta 0:00:15 lr 0.001192 wd 0.0500 time 0.4893 (0.6299) data time 0.0010 (0.0049) model time 0.4883 (0.6251) loss 3.6345 
(3.6517) grad_norm 1.5294 (1.3888) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][610/625] eta 0:00:09 lr 0.001192 wd 0.0500 time 0.4870 (0.6217) data time 0.0008 (0.0047) model time 0.4862 (0.6170) loss 3.7167 (3.6518) grad_norm 1.1465 (1.3842) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [34/300][620/625] eta 0:00:03 lr 0.001192 wd 0.0500 time 0.4876 (0.6141) data time 0.0007 (0.0044) model time 0.4868 (0.6097) loss 3.4189 (3.6457) grad_norm 1.3728 (1.3804) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:34:34 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 34 training takes 0:01:50 [2024-08-04 14:34:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 14:34:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 14:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.589 (0.589) Loss 0.6719 (0.6719) Acc@1 84.619 (84.619) Acc@5 96.973 (96.973) Mem 16721MB [2024-08-04 14:34:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.168) Loss 1.2285 (0.8642) Acc@1 70.996 (79.643) Acc@5 91.504 (95.517) Mem 16721MB [2024-08-04 14:34:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.144) Loss 1.3545 (1.0574) Acc@1 67.041 (75.316) Acc@5 89.795 (93.048) Mem 16721MB [2024-08-04 14:34:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.196 Acc@5 93.054 [2024-08-04 14:34:47 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.2% [2024-08-04 14:34:47 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.20% [2024-08-04 14:34:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 14:34:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 14:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.6221 (0.6221) Acc@1 83.984 (83.984) Acc@5 96.973 (96.973) Mem 16721MB [2024-08-04 14:34:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 1.1455 (0.7973) Acc@1 70.947 (78.880) Acc@5 90.674 (95.104) Mem 16721MB [2024-08-04 14:34:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.3203 (1.0052) Acc@1 67.676 (74.454) Acc@5 88.672 (92.506) Mem 16721MB [2024-08-04 14:34:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.350 Acc@5 92.494 [2024-08-04 14:34:52 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 74.4% [2024-08-04 14:34:52 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 74.35% [2024-08-04 14:34:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 14:34:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 14:34:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][0/625] eta 0:10:11 lr 0.001192 wd 0.0500 time 0.9788 (0.9788) data time 0.4320 (0.4320) model time 0.0000 (0.0000) loss 4.1960 (4.1960) grad_norm 1.2886 (1.2886) loss_scale 8192.0000 (8192.0000) mem 16725MB [2024-08-04 14:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][10/625] eta 0:05:27 lr 0.001192 wd 0.0500 time 0.4912 (0.5324) data time 0.0010 (0.0402) model time 0.0000 (0.0000) loss 3.6332 (3.6719) grad_norm 1.2361 (1.4885) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][20/625] eta 0:05:10 lr 0.001192 wd 0.0500 time 0.4911 (0.5132) data time 0.0008 (0.0216) model time 0.0000 (0.0000) loss 3.9255 (3.6059) grad_norm 1.3158 (1.3815) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][30/625] eta 0:05:01 lr 0.001192 wd 0.0500 time 0.4886 (0.5062) data time 0.0011 (0.0150) model time 0.0000 (0.0000) loss 3.6522 (3.5611) grad_norm 1.2074 (1.3364) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][40/625] eta 0:04:54 lr 0.001192 wd 0.0500 time 0.4954 (0.5028) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 4.0848 (3.5398) grad_norm 1.4528 (1.3535) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][50/625] eta 0:04:47 lr 0.001192 wd 0.0500 time 0.4906 (0.5002) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 3.9952 (3.5439) grad_norm 1.1940 (1.3909) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][60/625] eta 0:04:41 lr 0.001191 wd 0.0500 time 0.4876 (0.4982) data time 0.0007 (0.0081) model time 0.4869 (0.4869) loss 4.0003 (3.5475) 
grad_norm 1.2590 (1.3542) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][70/625] eta 0:04:35 lr 0.001191 wd 0.0500 time 0.4914 (0.4969) data time 0.0008 (0.0071) model time 0.4906 (0.4874) loss 4.3270 (3.5318) grad_norm 1.1068 (1.3387) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][80/625] eta 0:04:31 lr 0.001191 wd 0.0500 time 0.6999 (0.4984) data time 0.0010 (0.0064) model time 0.6989 (0.4944) loss 3.6414 (3.5047) grad_norm 1.1212 (1.3271) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][90/625] eta 0:04:26 lr 0.001191 wd 0.0500 time 0.4913 (0.4976) data time 0.0010 (0.0058) model time 0.4903 (0.4932) loss 3.0259 (3.4997) grad_norm 1.5603 (1.3450) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][100/625] eta 0:04:20 lr 0.001191 wd 0.0500 time 0.4966 (0.4971) data time 0.0010 (0.0053) model time 0.4956 (0.4928) loss 2.8028 (3.5090) grad_norm 2.3920 (1.3678) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][110/625] eta 0:04:15 lr 0.001191 wd 0.0500 time 0.4881 (0.4965) data time 0.0011 (0.0050) model time 0.4871 (0.4923) loss 3.3393 (3.5109) grad_norm 1.5659 (1.3908) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][120/625] eta 0:04:10 lr 0.001191 wd 0.0500 time 0.4979 (0.4961) data time 0.0010 (0.0046) model time 0.4969 (0.4921) loss 3.5683 (3.4881) grad_norm 1.1113 (1.3843) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:35:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][130/625] eta 0:04:05 lr 0.001191 wd 0.0500 time 0.4899 
(0.4957) data time 0.0010 (0.0044) model time 0.4890 (0.4917) loss 3.8987 (3.4866) grad_norm 1.5521 (1.3751) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][140/625] eta 0:04:00 lr 0.001191 wd 0.0500 time 0.4868 (0.4961) data time 0.0012 (0.0041) model time 0.4856 (0.4927) loss 3.9227 (3.5194) grad_norm 1.1374 (1.3729) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][150/625] eta 0:03:55 lr 0.001191 wd 0.0500 time 0.4912 (0.4958) data time 0.0010 (0.0039) model time 0.4902 (0.4925) loss 4.0566 (3.5295) grad_norm 1.3650 (1.3748) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][160/625] eta 0:03:50 lr 0.001191 wd 0.0500 time 0.5008 (0.4955) data time 0.0007 (0.0037) model time 0.5000 (0.4922) loss 3.5790 (3.5323) grad_norm 1.1005 (1.3632) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][170/625] eta 0:03:45 lr 0.001191 wd 0.0500 time 0.4935 (0.4953) data time 0.0007 (0.0036) model time 0.4928 (0.4921) loss 4.3324 (3.5385) grad_norm 1.0697 (1.3607) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][180/625] eta 0:03:40 lr 0.001191 wd 0.0500 time 0.4915 (0.4952) data time 0.0008 (0.0034) model time 0.4907 (0.4921) loss 3.9560 (3.5332) grad_norm 1.5070 (1.3593) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][190/625] eta 0:03:35 lr 0.001191 wd 0.0500 time 0.4910 (0.4951) data time 0.0011 (0.0033) model time 0.4899 (0.4921) loss 3.3254 (3.5185) grad_norm 0.9881 (1.3791) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:33 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [35/300][200/625] eta 0:03:30 lr 0.001191 wd 0.0500 time 0.4991 (0.4949) data time 0.0008 (0.0032) model time 0.4983 (0.4921) loss 4.1746 (3.5247) grad_norm 1.2674 (1.3755) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][210/625] eta 0:03:25 lr 0.001191 wd 0.0500 time 0.4888 (0.4946) data time 0.0008 (0.0031) model time 0.4880 (0.4918) loss 3.0417 (3.5143) grad_norm 1.5021 (1.3795) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][220/625] eta 0:03:20 lr 0.001191 wd 0.0500 time 0.4892 (0.4945) data time 0.0009 (0.0030) model time 0.4884 (0.4917) loss 4.7454 (3.5320) grad_norm 1.7724 (1.3869) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][230/625] eta 0:03:15 lr 0.001191 wd 0.0500 time 0.4926 (0.4942) data time 0.0011 (0.0029) model time 0.4915 (0.4914) loss 3.7634 (3.5428) grad_norm 1.3790 (1.3906) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][240/625] eta 0:03:10 lr 0.001191 wd 0.0500 time 0.4921 (0.4941) data time 0.0008 (0.0029) model time 0.4913 (0.4913) loss 3.7694 (3.5428) grad_norm 1.1574 (1.3863) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][250/625] eta 0:03:05 lr 0.001191 wd 0.0500 time 0.4960 (0.4940) data time 0.0008 (0.0028) model time 0.4953 (0.4913) loss 3.3157 (3.5584) grad_norm 1.1294 (1.3849) loss_scale 8192.0000 (8192.0000) mem 16721MB [2024-08-04 14:36:59 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 14:36:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-04 14:37:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 14:45:52 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 14:45:54 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 14:46:07 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 14:46:19 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 14:46:19 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-04 14:46:22 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 14:46:24 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 14:46:24 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 35) [2024-08-04 14:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 14:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 14:48:25 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 14:48:35 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-04 14:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 14:48:51 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-04 14:48:54 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 14:48:56 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 14:48:56 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 35) [2024-08-04 14:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 14:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][260/625] eta 0:17:38 lr 0.001191 wd 0.0500 time 0.4461 (2.9013) data time 0.0007 (0.0873) model time 0.4454 (2.8140) loss 3.4572 (3.8845) grad_norm 1.3060 (1.1670) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][270/625] eta 0:09:05 lr 0.001191 wd 0.0500 time 0.4385 (1.5373) data time 0.0007 (0.0393) model time 0.4377 (1.4980) loss 4.0795 (3.7646) grad_norm 1.5934 (1.2026) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][280/625] eta 0:06:35 lr 0.001191 wd 0.0500 time 0.4489 (1.1477) data time 0.0008 (0.0256) model time 0.4481 (1.1221) loss 3.7165 (3.7922) grad_norm 1.8237 (1.2559) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][290/625] eta 0:05:26 lr 0.001191 wd 0.0500 time 0.3835 (0.9746) data time 0.0009 (0.0191) model time 0.3826 (0.9555) loss 3.6137 (3.7751) grad_norm 1.4337 (1.3012) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [35/300][300/625] eta 0:04:40 lr 0.001191 wd 0.0500 time 0.4477 (0.8646) data time 0.0006 (0.0153) model time 0.4471 (0.8492) loss 3.5699 (3.7277) grad_norm 1.7283 (1.3281) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][310/625] eta 0:04:09 lr 0.001191 wd 0.0500 time 0.4505 (0.7930) data time 0.0007 (0.0128) model time 0.4497 (0.7802) loss 2.9000 (3.7131) grad_norm 0.9875 (1.2917) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][320/625] eta 0:03:46 lr 0.001191 wd 0.0500 time 0.4532 (0.7428) data time 0.0007 (0.0111) model time 0.4525 (0.7317) loss 2.5359 (3.6890) grad_norm 1.3962 (1.2917) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][330/625] eta 0:03:28 lr 0.001191 wd 0.0500 time 0.4491 (0.7053) data time 0.0007 (0.0098) model time 0.4484 (0.6955) loss 3.1452 (3.6654) grad_norm 1.3136 (1.3215) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:49:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][340/625] eta 0:03:12 lr 0.001191 wd 0.0500 time 0.4505 (0.6763) data time 0.0008 (0.0088) model time 0.4496 (0.6675) loss 3.7684 (3.6481) grad_norm 1.1943 (1.3153) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][350/625] eta 0:02:59 lr 0.001191 wd 0.0500 time 0.4548 (0.6531) data time 0.0006 (0.0080) model time 0.4542 (0.6452) loss 4.4780 (3.6699) grad_norm 1.1826 (1.3329) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][360/625] eta 0:02:48 lr 0.001191 wd 0.0500 time 0.4446 (0.6342) data time 0.0006 (0.0073) model time 0.4439 (0.6269) loss 3.0639 (3.6748) grad_norm 1.5976 (1.3333) 
loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][370/625] eta 0:02:37 lr 0.001191 wd 0.0500 time 0.4477 (0.6185) data time 0.0009 (0.0068) model time 0.4468 (0.6117) loss 3.0894 (3.6746) grad_norm 1.4642 (1.3380) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][380/625] eta 0:02:28 lr 0.001191 wd 0.0500 time 0.4486 (0.6051) data time 0.0007 (0.0063) model time 0.4479 (0.5988) loss 4.1071 (3.6610) grad_norm 1.4316 (1.3465) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][390/625] eta 0:02:19 lr 0.001191 wd 0.0500 time 0.4474 (0.5938) data time 0.0008 (0.0059) model time 0.4466 (0.5879) loss 3.4437 (3.6580) grad_norm 1.1286 (1.3528) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][400/625] eta 0:02:11 lr 0.001191 wd 0.0500 time 0.4528 (0.5840) data time 0.0007 (0.0056) model time 0.4521 (0.5785) loss 4.1151 (3.6548) grad_norm 1.3973 (1.3614) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][410/625] eta 0:02:03 lr 0.001191 wd 0.0500 time 0.4492 (0.5756) data time 0.0006 (0.0053) model time 0.4486 (0.5703) loss 2.8643 (3.6452) grad_norm 1.2765 (1.3540) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][420/625] eta 0:01:56 lr 0.001191 wd 0.0500 time 0.4488 (0.5680) data time 0.0009 (0.0050) model time 0.4480 (0.5630) loss 4.0356 (3.6476) grad_norm 1.1813 (1.3583) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][430/625] eta 0:01:49 lr 0.001191 wd 0.0500 time 0.4501 (0.5612) data time 0.0006 
(0.0048) model time 0.4495 (0.5564) loss 2.6179 (3.6274) grad_norm 1.3437 (1.3593) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][440/625] eta 0:01:42 lr 0.001191 wd 0.0500 time 0.4454 (0.5559) data time 0.0007 (0.0046) model time 0.4448 (0.5513) loss 4.0984 (3.6230) grad_norm 1.3615 (1.3490) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][450/625] eta 0:01:36 lr 0.001191 wd 0.0500 time 0.4461 (0.5505) data time 0.0009 (0.0044) model time 0.4453 (0.5461) loss 2.5087 (3.6109) grad_norm 2.1226 (1.3520) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][460/625] eta 0:01:30 lr 0.001191 wd 0.0500 time 0.4498 (0.5456) data time 0.0010 (0.0042) model time 0.4488 (0.5414) loss 4.1309 (3.6036) grad_norm 1.4967 (1.3520) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:50:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][470/625] eta 0:01:23 lr 0.001191 wd 0.0500 time 0.4478 (0.5412) data time 0.0009 (0.0041) model time 0.4469 (0.5371) loss 3.0594 (3.5940) grad_norm 1.7612 (1.3660) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][480/625] eta 0:01:17 lr 0.001191 wd 0.0500 time 0.4459 (0.5372) data time 0.0009 (0.0039) model time 0.4450 (0.5332) loss 3.7436 (3.6064) grad_norm 1.1901 (1.3647) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][490/625] eta 0:01:12 lr 0.001191 wd 0.0500 time 0.4487 (0.5335) data time 0.0006 (0.0038) model time 0.4481 (0.5296) loss 4.4138 (3.6022) grad_norm 1.2279 (1.3583) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[35/300][500/625] eta 0:01:06 lr 0.001191 wd 0.0500 time 0.4462 (0.5300) data time 0.0009 (0.0037) model time 0.4454 (0.5263) loss 3.3110 (3.5956) grad_norm 1.2826 (1.3560) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][510/625] eta 0:01:00 lr 0.001191 wd 0.0500 time 0.4438 (0.5268) data time 0.0009 (0.0036) model time 0.4429 (0.5232) loss 3.7702 (3.5885) grad_norm 1.4151 (1.3615) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][520/625] eta 0:00:54 lr 0.001191 wd 0.0500 time 0.4441 (0.5238) data time 0.0007 (0.0035) model time 0.4434 (0.5203) loss 4.2295 (3.5805) grad_norm 1.2174 (1.3596) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][530/625] eta 0:00:49 lr 0.001191 wd 0.0500 time 0.4467 (0.5211) data time 0.0009 (0.0034) model time 0.4458 (0.5177) loss 2.7827 (3.5822) grad_norm 1.0477 (1.3594) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][540/625] eta 0:00:44 lr 0.001191 wd 0.0500 time 0.4453 (0.5186) data time 0.0007 (0.0033) model time 0.4446 (0.5153) loss 4.4672 (3.5848) grad_norm 1.5234 (1.3618) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][550/625] eta 0:00:38 lr 0.001191 wd 0.0500 time 0.4501 (0.5162) data time 0.0008 (0.0032) model time 0.4493 (0.5130) loss 3.7030 (3.5691) grad_norm 1.5494 (1.3631) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][560/625] eta 0:00:33 lr 0.001191 wd 0.0500 time 0.4463 (0.5140) data time 0.0006 (0.0031) model time 0.4457 (0.5109) loss 3.2298 (3.5667) grad_norm 1.5064 (1.3637) loss_scale 8192.0000 (8192.0000) mem 16700MB 
[2024-08-04 14:51:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][570/625] eta 0:00:28 lr 0.001191 wd 0.0500 time 0.4458 (0.5119) data time 0.0008 (0.0031) model time 0.4450 (0.5089) loss 3.2527 (3.5776) grad_norm 1.3952 (1.3629) loss_scale 8192.0000 (8192.0000) mem 16700MB [2024-08-04 14:51:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][580/625] eta 0:00:22 lr 0.001191 wd 0.0500 time 0.4515 (0.5100) data time 0.0006 (0.0030) model time 0.4509 (0.5070) loss 4.1930 (3.5863) grad_norm 1.3289 (1.3663) loss_scale 16384.0000 (8216.9756) mem 16700MB [2024-08-04 14:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][590/625] eta 0:00:17 lr 0.001191 wd 0.0500 time 0.4448 (0.5082) data time 0.0008 (0.0029) model time 0.4440 (0.5052) loss 3.8517 (3.5828) grad_norm 0.9906 (1.3651) loss_scale 16384.0000 (8458.6036) mem 16700MB [2024-08-04 14:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][600/625] eta 0:00:12 lr 0.001191 wd 0.0500 time 0.4497 (0.5064) data time 0.0006 (0.0029) model time 0.4491 (0.5036) loss 3.3557 (3.5837) grad_norm 1.1673 (1.3613) loss_scale 16384.0000 (8686.3448) mem 16700MB [2024-08-04 14:52:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][610/625] eta 0:00:07 lr 0.001190 wd 0.0500 time 0.4462 (0.5048) data time 0.0006 (0.0028) model time 0.4456 (0.5020) loss 3.7591 (3.5878) grad_norm 1.7818 (1.3588) loss_scale 16384.0000 (8901.3631) mem 16700MB [2024-08-04 14:52:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [35/300][620/625] eta 0:00:02 lr 0.001190 wd 0.0500 time 0.4448 (0.5037) data time 0.0004 (0.0028) model time 0.4444 (0.5009) loss 3.3622 (3.5862) grad_norm 1.2681 (1.3566) loss_scale 16384.0000 (9104.6957) mem 16700MB [2024-08-04 14:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 35 training takes 0:03:07 [2024-08-04 14:52:07 vssm_base_ms_e300] (utils.py 118): INFO 
./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 14:52:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 14:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.6772 (0.6772) Acc@1 84.521 (84.521) Acc@5 97.656 (97.656) Mem 16700MB [2024-08-04 14:52:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 1.1992 (0.8533) Acc@1 72.900 (79.803) Acc@5 91.748 (95.779) Mem 16700MB [2024-08-04 14:52:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 1.4053 (1.0518) Acc@1 66.016 (75.393) Acc@5 89.209 (93.273) Mem 16700MB [2024-08-04 14:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.254 Acc@5 93.202 [2024-08-04 14:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.3% [2024-08-04 14:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.25% [2024-08-04 14:52:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 14:52:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 14:52:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.6084 (0.6084) Acc@1 84.570 (84.570) Acc@5 97.168 (97.168) Mem 16700MB [2024-08-04 14:52:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.149) Loss 1.1191 (0.7812) Acc@1 71.582 (79.359) Acc@5 91.113 (95.304) Mem 16700MB [2024-08-04 14:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.134) Loss 1.2959 (0.9847) Acc@1 68.164 (74.949) Acc@5 89.062 (92.801) Mem 16700MB [2024-08-04 14:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 74.830 Acc@5 92.792 [2024-08-04 14:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 74.8% [2024-08-04 14:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 74.83% [2024-08-04 14:52:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 14:52:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 14:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][0/625] eta 0:08:14 lr 0.001190 wd 0.0500 time 0.7906 (0.7906) data time 0.3117 (0.3117) model time 0.0000 (0.0000) loss 4.1407 (4.1407) grad_norm 1.3747 (1.3747) loss_scale 16384.0000 (16384.0000) mem 16712MB [2024-08-04 14:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][10/625] eta 0:05:03 lr 0.001190 wd 0.0500 time 0.4525 (0.4930) data time 0.0009 (0.0292) model time 0.0000 (0.0000) loss 2.1633 (3.4385) grad_norm 1.2411 (1.5277) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][20/625] eta 0:04:45 lr 0.001190 wd 0.0500 time 0.4498 (0.4723) data time 0.0008 (0.0157) model time 0.0000 (0.0000) loss 3.7219 (3.4428) grad_norm 1.0873 (1.3842) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:52:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][30/625] eta 0:04:36 lr 0.001190 wd 0.0500 time 0.4505 (0.4650) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 3.3160 (3.5737) grad_norm 1.5874 (1.3694) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][40/625] eta 0:04:29 lr 0.001190 wd 0.0500 time 0.4486 (0.4612) data time 0.0009 (0.0084) model time 0.0000 (0.0000) loss 3.7315 (3.6143) grad_norm 1.6011 (1.3698) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][50/625] eta 0:04:23 lr 0.001190 wd 0.0500 time 0.4487 (0.4587) data time 0.0007 (0.0069) model time 0.0000 (0.0000) loss 3.0187 (3.5896) grad_norm 1.0709 (1.3388) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][60/625] eta 0:04:18 lr 0.001190 wd 0.0500 time 0.4465 (0.4571) data time 0.0007 (0.0060) model time 0.4458 (0.4477) loss 
4.4214 (3.6516) grad_norm 1.3608 (1.3164) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][70/625] eta 0:04:12 lr 0.001190 wd 0.0500 time 0.4457 (0.4558) data time 0.0006 (0.0052) model time 0.4451 (0.4474) loss 3.5215 (3.6518) grad_norm 1.0335 (1.3307) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][80/625] eta 0:04:07 lr 0.001190 wd 0.0500 time 0.4453 (0.4547) data time 0.0007 (0.0047) model time 0.4446 (0.4470) loss 2.3646 (3.6168) grad_norm 0.9587 (1.3449) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][90/625] eta 0:04:02 lr 0.001190 wd 0.0500 time 0.4521 (0.4540) data time 0.0007 (0.0043) model time 0.4514 (0.4471) loss 3.4776 (3.5999) grad_norm 1.7642 (1.3450) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][100/625] eta 0:03:58 lr 0.001190 wd 0.0500 time 0.4494 (0.4537) data time 0.0007 (0.0040) model time 0.4487 (0.4477) loss 3.3834 (3.5761) grad_norm 1.3875 (1.3508) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][110/625] eta 0:03:53 lr 0.001190 wd 0.0500 time 0.4524 (0.4535) data time 0.0007 (0.0037) model time 0.4517 (0.4481) loss 3.9827 (3.5898) grad_norm 1.0462 (1.3483) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][120/625] eta 0:03:48 lr 0.001190 wd 0.0500 time 0.4489 (0.4532) data time 0.0007 (0.0035) model time 0.4482 (0.4482) loss 2.9655 (3.5849) grad_norm 1.2455 (1.3351) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][130/625] eta 0:03:44 lr 
0.001190 wd 0.0500 time 0.4474 (0.4529) data time 0.0006 (0.0033) model time 0.4467 (0.4483) loss 2.7841 (3.5787) grad_norm 1.5099 (1.3280) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][140/625] eta 0:03:39 lr 0.001190 wd 0.0500 time 0.4511 (0.4527) data time 0.0008 (0.0031) model time 0.4503 (0.4483) loss 4.4489 (3.6127) grad_norm 1.3153 (1.3247) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][150/625] eta 0:03:34 lr 0.001190 wd 0.0500 time 0.4534 (0.4525) data time 0.0008 (0.0029) model time 0.4526 (0.4484) loss 3.5412 (3.6017) grad_norm 1.5468 (1.3349) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][160/625] eta 0:03:30 lr 0.001190 wd 0.0500 time 0.4456 (0.4524) data time 0.0007 (0.0028) model time 0.4450 (0.4485) loss 3.8970 (3.6048) grad_norm 1.3063 (1.3386) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][170/625] eta 0:03:25 lr 0.001190 wd 0.0500 time 0.4576 (0.4524) data time 0.0008 (0.0027) model time 0.4569 (0.4488) loss 3.5360 (3.5909) grad_norm 1.7205 (1.3410) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][180/625] eta 0:03:21 lr 0.001190 wd 0.0500 time 0.4528 (0.4523) data time 0.0007 (0.0026) model time 0.4522 (0.4488) loss 4.1329 (3.6004) grad_norm 1.5678 (1.3491) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][190/625] eta 0:03:16 lr 0.001190 wd 0.0500 time 0.4489 (0.4521) data time 0.0009 (0.0025) model time 0.4481 (0.4488) loss 3.2187 (3.6099) grad_norm 1.1235 (1.3507) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 
14:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][200/625] eta 0:03:12 lr 0.001190 wd 0.0500 time 0.4445 (0.4520) data time 0.0007 (0.0024) model time 0.4438 (0.4488) loss 3.2094 (3.6148) grad_norm 1.6965 (1.3509) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][210/625] eta 0:03:07 lr 0.001190 wd 0.0500 time 0.4508 (0.4518) data time 0.0008 (0.0024) model time 0.4500 (0.4487) loss 3.8997 (3.6218) grad_norm 1.1514 (1.3489) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][220/625] eta 0:03:03 lr 0.001190 wd 0.0500 time 0.4470 (0.4523) data time 0.0009 (0.0023) model time 0.4461 (0.4495) loss 2.8200 (3.6197) grad_norm 1.3505 (1.3428) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][230/625] eta 0:02:58 lr 0.001190 wd 0.0500 time 0.4493 (0.4520) data time 0.0008 (0.0022) model time 0.4485 (0.4493) loss 2.8559 (3.6123) grad_norm 1.3817 (1.3431) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][240/625] eta 0:02:53 lr 0.001190 wd 0.0500 time 0.4465 (0.4519) data time 0.0009 (0.0022) model time 0.4456 (0.4492) loss 4.1730 (3.6158) grad_norm 1.4921 (1.3460) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][250/625] eta 0:02:49 lr 0.001190 wd 0.0500 time 0.4470 (0.4517) data time 0.0006 (0.0021) model time 0.4464 (0.4491) loss 4.1119 (3.6257) grad_norm 1.0904 (1.3469) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][260/625] eta 0:02:44 lr 0.001190 wd 0.0500 time 0.4481 (0.4516) data time 0.0007 (0.0021) model time 0.4474 (0.4490) loss 4.0986 
(3.6308) grad_norm 1.3736 (1.3477) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][270/625] eta 0:02:40 lr 0.001190 wd 0.0500 time 0.4455 (0.4515) data time 0.0006 (0.0020) model time 0.4449 (0.4489) loss 4.0397 (3.6202) grad_norm 1.3381 (1.3439) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][280/625] eta 0:02:35 lr 0.001190 wd 0.0500 time 0.4482 (0.4514) data time 0.0007 (0.0020) model time 0.4475 (0.4489) loss 4.0925 (3.6198) grad_norm 1.2703 (1.3521) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][290/625] eta 0:02:31 lr 0.001190 wd 0.0500 time 0.4459 (0.4513) data time 0.0008 (0.0020) model time 0.4450 (0.4489) loss 3.6854 (3.6102) grad_norm 1.1223 (1.3519) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][300/625] eta 0:02:26 lr 0.001190 wd 0.0500 time 0.4450 (0.4512) data time 0.0006 (0.0019) model time 0.4444 (0.4488) loss 4.1048 (3.6209) grad_norm 1.1601 (1.3489) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][310/625] eta 0:02:22 lr 0.001190 wd 0.0500 time 0.4498 (0.4512) data time 0.0006 (0.0019) model time 0.4491 (0.4488) loss 4.0984 (3.6217) grad_norm 1.1378 (1.3434) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:54:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][320/625] eta 0:02:17 lr 0.001190 wd 0.0500 time 0.4543 (0.4511) data time 0.0009 (0.0018) model time 0.4535 (0.4488) loss 3.8492 (3.6170) grad_norm 1.4866 (1.3412) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][330/625] eta 0:02:13 lr 0.001190 wd 
0.0500 time 0.4479 (0.4511) data time 0.0006 (0.0018) model time 0.4473 (0.4489) loss 3.7139 (3.6062) grad_norm 1.3435 (1.3547) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][340/625] eta 0:02:08 lr 0.001190 wd 0.0500 time 0.4483 (0.4515) data time 0.0006 (0.0018) model time 0.4477 (0.4493) loss 3.6787 (3.5986) grad_norm 0.9770 (1.3568) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][350/625] eta 0:02:04 lr 0.001190 wd 0.0500 time 0.4450 (0.4514) data time 0.0006 (0.0018) model time 0.4443 (0.4493) loss 3.3019 (3.5926) grad_norm 1.2952 (1.3598) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][360/625] eta 0:01:59 lr 0.001190 wd 0.0500 time 0.4528 (0.4519) data time 0.0008 (0.0017) model time 0.4520 (0.4499) loss 3.7112 (3.5947) grad_norm 1.1522 (1.3602) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][370/625] eta 0:01:55 lr 0.001190 wd 0.0500 time 0.4454 (0.4518) data time 0.0008 (0.0017) model time 0.4446 (0.4498) loss 3.7963 (3.6002) grad_norm 1.6476 (1.3651) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][380/625] eta 0:01:50 lr 0.001190 wd 0.0500 time 0.4483 (0.4517) data time 0.0008 (0.0017) model time 0.4475 (0.4497) loss 2.3919 (3.5935) grad_norm 1.5077 (1.3684) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][390/625] eta 0:01:46 lr 0.001190 wd 0.0500 time 0.4491 (0.4516) data time 0.0007 (0.0017) model time 0.4485 (0.4496) loss 3.2168 (3.5946) grad_norm 1.0601 (1.3632) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:33 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][400/625] eta 0:01:41 lr 0.001190 wd 0.0500 time 0.4477 (0.4515) data time 0.0007 (0.0016) model time 0.4470 (0.4496) loss 4.1241 (3.6000) grad_norm 1.8182 (1.3672) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][410/625] eta 0:01:37 lr 0.001190 wd 0.0500 time 0.4512 (0.4515) data time 0.0008 (0.0016) model time 0.4503 (0.4496) loss 3.6716 (3.5989) grad_norm 1.8797 (1.3654) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][420/625] eta 0:01:32 lr 0.001190 wd 0.0500 time 0.4454 (0.4514) data time 0.0008 (0.0016) model time 0.4445 (0.4495) loss 3.9723 (3.5989) grad_norm 0.9854 (1.3626) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][430/625] eta 0:01:28 lr 0.001190 wd 0.0500 time 0.4418 (0.4514) data time 0.0009 (0.0016) model time 0.4410 (0.4495) loss 2.9523 (3.5975) grad_norm 1.0477 (1.3571) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][440/625] eta 0:01:23 lr 0.001190 wd 0.0500 time 0.4469 (0.4513) data time 0.0006 (0.0016) model time 0.4462 (0.4494) loss 2.7703 (3.5865) grad_norm 1.3583 (1.3567) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][450/625] eta 0:01:18 lr 0.001190 wd 0.0500 time 0.4469 (0.4512) data time 0.0008 (0.0016) model time 0.4461 (0.4493) loss 2.9692 (3.5901) grad_norm 1.3054 (1.3596) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][460/625] eta 0:01:14 lr 0.001190 wd 0.0500 time 0.4484 (0.4511) data time 0.0006 (0.0015) model time 0.4478 (0.4493) loss 3.6087 (3.5856) 
grad_norm 1.4585 (1.3569) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][470/625] eta 0:01:09 lr 0.001190 wd 0.0500 time 0.4576 (0.4511) data time 0.0006 (0.0015) model time 0.4570 (0.4493) loss 4.3226 (3.5857) grad_norm 1.3820 (1.3563) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][480/625] eta 0:01:05 lr 0.001190 wd 0.0500 time 0.4496 (0.4510) data time 0.0006 (0.0015) model time 0.4490 (0.4493) loss 3.0291 (3.5838) grad_norm 1.6458 (1.3560) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][490/625] eta 0:01:00 lr 0.001189 wd 0.0500 time 0.4472 (0.4510) data time 0.0008 (0.0015) model time 0.4464 (0.4492) loss 2.5910 (3.5811) grad_norm 1.2784 (1.3535) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][500/625] eta 0:00:56 lr 0.001189 wd 0.0500 time 0.4495 (0.4509) data time 0.0006 (0.0015) model time 0.4489 (0.4492) loss 3.6169 (3.5803) grad_norm 1.2442 (1.3528) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][510/625] eta 0:00:51 lr 0.001189 wd 0.0500 time 0.4459 (0.4509) data time 0.0009 (0.0015) model time 0.4450 (0.4491) loss 3.7931 (3.5762) grad_norm 1.1028 (1.3597) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][520/625] eta 0:00:47 lr 0.001189 wd 0.0500 time 0.4487 (0.4508) data time 0.0006 (0.0015) model time 0.4480 (0.4491) loss 3.7371 (3.5750) grad_norm 2.0499 (1.3616) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][530/625] eta 0:00:42 lr 0.001189 wd 0.0500 
time 0.4480 (0.4509) data time 0.0008 (0.0015) model time 0.4472 (0.4492) loss 3.8330 (3.5752) grad_norm 0.9547 (1.3588) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][540/625] eta 0:00:38 lr 0.001189 wd 0.0500 time 0.4482 (0.4508) data time 0.0006 (0.0014) model time 0.4476 (0.4491) loss 4.1661 (3.5726) grad_norm 1.2291 (1.3598) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][550/625] eta 0:00:33 lr 0.001189 wd 0.0500 time 0.4470 (0.4508) data time 0.0009 (0.0014) model time 0.4462 (0.4491) loss 3.8721 (3.5781) grad_norm 1.0088 (1.3595) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][560/625] eta 0:00:29 lr 0.001189 wd 0.0500 time 0.4503 (0.4508) data time 0.0007 (0.0014) model time 0.4496 (0.4491) loss 3.4946 (3.5846) grad_norm 1.0325 (1.3620) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][570/625] eta 0:00:24 lr 0.001189 wd 0.0500 time 0.4465 (0.4507) data time 0.0008 (0.0014) model time 0.4457 (0.4490) loss 2.4420 (3.5785) grad_norm 1.1931 (1.3680) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][580/625] eta 0:00:20 lr 0.001189 wd 0.0500 time 0.4471 (0.4506) data time 0.0006 (0.0014) model time 0.4465 (0.4490) loss 3.5367 (3.5724) grad_norm 1.4222 (1.3700) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:56:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][590/625] eta 0:00:15 lr 0.001189 wd 0.0500 time 0.4475 (0.4506) data time 0.0009 (0.0014) model time 0.4466 (0.4490) loss 3.6176 (3.5739) grad_norm 1.0516 (1.3687) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:02 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][600/625] eta 0:00:11 lr 0.001189 wd 0.0500 time 0.4488 (0.4506) data time 0.0007 (0.0014) model time 0.4481 (0.4489) loss 3.5363 (3.5739) grad_norm 1.0080 (1.3655) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][610/625] eta 0:00:06 lr 0.001189 wd 0.0500 time 0.4401 (0.4505) data time 0.0004 (0.0014) model time 0.4397 (0.4489) loss 2.7404 (3.5720) grad_norm 1.1882 (1.3630) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [36/300][620/625] eta 0:00:02 lr 0.001189 wd 0.0500 time 0.4442 (0.4504) data time 0.0004 (0.0014) model time 0.4437 (0.4488) loss 2.9524 (3.5717) grad_norm 1.3652 (1.3632) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:13 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 36 training takes 0:04:41 [2024-08-04 14:57:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 14:57:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 14:57:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.6934 (0.6934) Acc@1 84.277 (84.277) Acc@5 97.168 (97.168) Mem 16703MB [2024-08-04 14:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.153) Loss 1.1768 (0.8522) Acc@1 71.680 (80.393) Acc@5 92.725 (95.787) Mem 16703MB [2024-08-04 14:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.136) Loss 1.3057 (1.0352) Acc@1 69.287 (76.058) Acc@5 89.844 (93.473) Mem 16703MB [2024-08-04 14:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.916 Acc@5 93.448 [2024-08-04 14:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.9% [2024-08-04 14:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 75.92% [2024-08-04 14:57:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 14:57:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 14:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.6006 (0.6006) Acc@1 84.961 (84.961) Acc@5 97.217 (97.217) Mem 16703MB [2024-08-04 14:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.150) Loss 1.0977 (0.7682) Acc@1 72.363 (79.909) Acc@5 91.504 (95.494) Mem 16703MB [2024-08-04 14:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.134) Loss 1.2734 (0.9674) Acc@1 68.408 (75.467) Acc@5 89.111 (93.025) Mem 16703MB [2024-08-04 14:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.316 Acc@5 93.006 [2024-08-04 14:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 75.3% [2024-08-04 14:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 75.32% [2024-08-04 14:57:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 14:57:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 14:57:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][0/625] eta 0:07:35 lr 0.001189 wd 0.0500 time 0.7292 (0.7292) data time 0.3403 (0.3403) model time 0.0000 (0.0000) loss 3.3997 (3.3997) grad_norm 1.5617 (1.5617) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][10/625] eta 0:04:51 lr 0.001189 wd 0.0500 time 0.4494 (0.4734) data time 0.0006 (0.0317) model time 0.0000 (0.0000) loss 4.6146 (3.5818) grad_norm 1.2762 (1.3858) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][20/625] eta 0:04:42 lr 0.001189 wd 0.0500 time 0.4454 (0.4673) data time 0.0006 (0.0170) model time 0.0000 (0.0000) loss 2.9391 (3.6110) grad_norm 1.1686 (1.3548) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][30/625] eta 0:04:38 lr 0.001189 wd 0.0500 time 0.6731 (0.4684) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 3.1825 (3.5285) grad_norm 1.2763 (1.3493) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][40/625] eta 0:04:31 lr 0.001189 wd 0.0500 time 0.4493 (0.4637) data time 0.0006 (0.0091) model time 0.0000 (0.0000) loss 4.4468 (3.5890) grad_norm 1.2939 (1.3242) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][50/625] eta 0:04:25 lr 0.001189 wd 0.0500 time 0.4496 (0.4609) data time 0.0008 (0.0075) model time 0.0000 (0.0000) loss 2.9000 (3.5818) grad_norm 1.0185 (1.3079) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][60/625] eta 0:04:19 lr 0.001189 wd 0.0500 time 0.4506 (0.4591) data time 0.0006 (0.0064) model time 0.4500 (0.4490) loss 
4.2949 (3.6385) grad_norm 1.3533 (1.3040) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:57:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][70/625] eta 0:04:14 lr 0.001189 wd 0.0500 time 0.4473 (0.4577) data time 0.0008 (0.0056) model time 0.4465 (0.4488) loss 3.2441 (3.6168) grad_norm 1.2112 (1.2861) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][80/625] eta 0:04:08 lr 0.001189 wd 0.0500 time 0.4489 (0.4565) data time 0.0008 (0.0050) model time 0.4481 (0.4482) loss 3.9649 (3.5864) grad_norm 1.1458 (1.2868) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][90/625] eta 0:04:03 lr 0.001189 wd 0.0500 time 0.4483 (0.4556) data time 0.0006 (0.0046) model time 0.4478 (0.4481) loss 2.3910 (3.5665) grad_norm 1.1920 (1.2986) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][100/625] eta 0:03:58 lr 0.001189 wd 0.0500 time 0.4468 (0.4549) data time 0.0006 (0.0042) model time 0.4462 (0.4480) loss 3.0639 (3.5658) grad_norm 1.2060 (1.3475) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][110/625] eta 0:03:54 lr 0.001189 wd 0.0500 time 0.4501 (0.4545) data time 0.0006 (0.0039) model time 0.4495 (0.4482) loss 3.4717 (3.5696) grad_norm 1.3891 (1.3573) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][120/625] eta 0:03:49 lr 0.001189 wd 0.0500 time 0.4498 (0.4541) data time 0.0006 (0.0036) model time 0.4491 (0.4483) loss 4.3210 (3.5809) grad_norm 1.1262 (1.3591) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][130/625] eta 0:03:44 lr 
0.001189 wd 0.0500 time 0.4481 (0.4538) data time 0.0009 (0.0034) model time 0.4472 (0.4484) loss 3.5580 (3.5815) grad_norm 2.1067 (1.3932) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][140/625] eta 0:03:39 lr 0.001189 wd 0.0500 time 0.4487 (0.4535) data time 0.0009 (0.0033) model time 0.4478 (0.4485) loss 3.5981 (3.5780) grad_norm 1.2318 (1.3984) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][150/625] eta 0:03:35 lr 0.001189 wd 0.0500 time 0.4523 (0.4532) data time 0.0008 (0.0031) model time 0.4515 (0.4484) loss 3.6900 (3.5787) grad_norm 0.8952 (1.3882) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][160/625] eta 0:03:30 lr 0.001189 wd 0.0500 time 0.4481 (0.4529) data time 0.0008 (0.0030) model time 0.4472 (0.4484) loss 2.6892 (3.5712) grad_norm 0.9657 (1.3790) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][170/625] eta 0:03:25 lr 0.001189 wd 0.0500 time 0.4517 (0.4527) data time 0.0006 (0.0028) model time 0.4511 (0.4484) loss 4.1263 (3.5688) grad_norm 1.2911 (1.3753) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][180/625] eta 0:03:21 lr 0.001189 wd 0.0500 time 0.4475 (0.4525) data time 0.0009 (0.0027) model time 0.4466 (0.4483) loss 3.8791 (3.5851) grad_norm 1.3028 (1.3715) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][190/625] eta 0:03:16 lr 0.001189 wd 0.0500 time 0.4533 (0.4524) data time 0.0006 (0.0026) model time 0.4527 (0.4484) loss 3.3756 (3.5982) grad_norm 1.4660 (1.3720) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 
14:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][200/625] eta 0:03:12 lr 0.001189 wd 0.0500 time 0.4464 (0.4522) data time 0.0006 (0.0025) model time 0.4458 (0.4484) loss 3.8346 (3.5961) grad_norm 1.7093 (1.3803) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:58:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][210/625] eta 0:03:07 lr 0.001189 wd 0.0500 time 0.4512 (0.4522) data time 0.0009 (0.0025) model time 0.4503 (0.4486) loss 3.6768 (3.5987) grad_norm 1.7988 (1.3801) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][220/625] eta 0:03:03 lr 0.001189 wd 0.0500 time 0.4465 (0.4520) data time 0.0006 (0.0024) model time 0.4458 (0.4485) loss 3.0234 (3.5974) grad_norm 1.4789 (1.3800) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][230/625] eta 0:02:58 lr 0.001189 wd 0.0500 time 0.4483 (0.4519) data time 0.0006 (0.0023) model time 0.4476 (0.4485) loss 2.3478 (3.5802) grad_norm 1.6168 (1.3848) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][240/625] eta 0:02:53 lr 0.001189 wd 0.0500 time 0.4464 (0.4517) data time 0.0008 (0.0023) model time 0.4456 (0.4484) loss 3.9412 (3.5824) grad_norm 1.1493 (1.3804) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][250/625] eta 0:02:49 lr 0.001189 wd 0.0500 time 0.4436 (0.4515) data time 0.0008 (0.0022) model time 0.4428 (0.4483) loss 2.5170 (3.5856) grad_norm 1.4457 (1.3890) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][260/625] eta 0:02:44 lr 0.001189 wd 0.0500 time 0.4550 (0.4515) data time 0.0006 (0.0022) model time 0.4544 (0.4484) loss 4.0367 
(3.5760) grad_norm 1.3591 (1.3901) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][270/625] eta 0:02:40 lr 0.001189 wd 0.0500 time 0.4498 (0.4515) data time 0.0006 (0.0021) model time 0.4492 (0.4485) loss 2.9706 (3.5826) grad_norm 1.2480 (1.3911) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][280/625] eta 0:02:35 lr 0.001189 wd 0.0500 time 0.4456 (0.4514) data time 0.0007 (0.0021) model time 0.4449 (0.4484) loss 4.3102 (3.5920) grad_norm 1.4023 (1.3881) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][290/625] eta 0:02:31 lr 0.001189 wd 0.0500 time 0.4510 (0.4514) data time 0.0007 (0.0020) model time 0.4503 (0.4485) loss 3.9447 (3.5999) grad_norm 1.2815 (1.3866) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][300/625] eta 0:02:26 lr 0.001189 wd 0.0500 time 0.4525 (0.4513) data time 0.0006 (0.0020) model time 0.4519 (0.4485) loss 3.0544 (3.5987) grad_norm 0.9608 (1.3846) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][310/625] eta 0:02:22 lr 0.001189 wd 0.0500 time 0.4440 (0.4512) data time 0.0009 (0.0019) model time 0.4431 (0.4485) loss 3.5202 (3.5998) grad_norm 1.3079 (1.3879) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][320/625] eta 0:02:17 lr 0.001189 wd 0.0500 time 0.4485 (0.4511) data time 0.0007 (0.0019) model time 0.4477 (0.4484) loss 3.0749 (3.5910) grad_norm 1.5357 (1.3873) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][330/625] eta 0:02:13 lr 0.001189 wd 
0.0500 time 0.4510 (0.4510) data time 0.0006 (0.0019) model time 0.4505 (0.4484) loss 4.4163 (3.5857) grad_norm 1.5860 (1.3836) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 14:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][340/625] eta 0:02:08 lr 0.001189 wd 0.0500 time 0.4476 (0.4510) data time 0.0008 (0.0018) model time 0.4468 (0.4484) loss 3.5959 (3.5822) grad_norm 1.1113 (1.3805) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][350/625] eta 0:02:04 lr 0.001189 wd 0.0500 time 0.4501 (0.4509) data time 0.0009 (0.0018) model time 0.4493 (0.4484) loss 3.8646 (3.5833) grad_norm 1.4767 (1.3836) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][360/625] eta 0:01:59 lr 0.001188 wd 0.0500 time 0.4495 (0.4513) data time 0.0011 (0.0018) model time 0.4484 (0.4489) loss 3.9359 (3.5906) grad_norm 1.6994 (1.3826) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][370/625] eta 0:01:55 lr 0.001188 wd 0.0500 time 0.4504 (0.4518) data time 0.0006 (0.0018) model time 0.4498 (0.4495) loss 3.8133 (3.5924) grad_norm 1.7681 (1.3881) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][380/625] eta 0:01:50 lr 0.001188 wd 0.0500 time 0.4452 (0.4517) data time 0.0009 (0.0018) model time 0.4443 (0.4494) loss 3.4284 (3.5893) grad_norm 1.9922 (1.3916) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][390/625] eta 0:01:46 lr 0.001188 wd 0.0500 time 0.4504 (0.4516) data time 0.0006 (0.0017) model time 0.4497 (0.4494) loss 2.4269 (3.5879) grad_norm 1.1443 (1.3979) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:25 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][400/625] eta 0:01:41 lr 0.001188 wd 0.0500 time 0.4494 (0.4516) data time 0.0009 (0.0017) model time 0.4485 (0.4493) loss 4.2620 (3.5918) grad_norm 1.2640 (1.3971) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][410/625] eta 0:01:37 lr 0.001188 wd 0.0500 time 0.4521 (0.4515) data time 0.0007 (0.0017) model time 0.4514 (0.4493) loss 3.0666 (3.5973) grad_norm 1.1602 (1.3947) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][420/625] eta 0:01:32 lr 0.001188 wd 0.0500 time 0.4509 (0.4515) data time 0.0009 (0.0017) model time 0.4500 (0.4493) loss 3.7489 (3.5956) grad_norm 1.1507 (1.3921) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][430/625] eta 0:01:28 lr 0.001188 wd 0.0500 time 0.4492 (0.4515) data time 0.0009 (0.0017) model time 0.4483 (0.4493) loss 3.0408 (3.5930) grad_norm 1.8545 (1.3900) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][440/625] eta 0:01:23 lr 0.001188 wd 0.0500 time 0.4449 (0.4515) data time 0.0009 (0.0016) model time 0.4439 (0.4494) loss 3.3004 (3.5947) grad_norm 1.4369 (1.3900) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][450/625] eta 0:01:18 lr 0.001188 wd 0.0500 time 0.4501 (0.4514) data time 0.0008 (0.0016) model time 0.4493 (0.4493) loss 2.9417 (3.5938) grad_norm 1.2241 (1.3861) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][460/625] eta 0:01:14 lr 0.001188 wd 0.0500 time 0.4478 (0.4514) data time 0.0007 (0.0016) model time 0.4471 (0.4493) loss 4.1504 (3.5966) 
grad_norm 1.3843 (1.3873) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][470/625] eta 0:01:09 lr 0.001188 wd 0.0500 time 0.4528 (0.4513) data time 0.0011 (0.0016) model time 0.4517 (0.4493) loss 4.0578 (3.5951) grad_norm 1.1518 (1.3855) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][480/625] eta 0:01:05 lr 0.001188 wd 0.0500 time 0.4500 (0.4513) data time 0.0008 (0.0016) model time 0.4492 (0.4493) loss 3.8639 (3.5873) grad_norm 1.2059 (1.3816) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][490/625] eta 0:01:00 lr 0.001188 wd 0.0500 time 0.4501 (0.4513) data time 0.0006 (0.0016) model time 0.4495 (0.4493) loss 3.5252 (3.5797) grad_norm 1.7034 (1.3849) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][500/625] eta 0:00:56 lr 0.001188 wd 0.0500 time 0.4509 (0.4513) data time 0.0006 (0.0016) model time 0.4503 (0.4493) loss 3.7458 (3.5754) grad_norm 1.4421 (1.3834) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][510/625] eta 0:00:51 lr 0.001188 wd 0.0500 time 0.4477 (0.4513) data time 0.0008 (0.0015) model time 0.4469 (0.4493) loss 3.5736 (3.5798) grad_norm 1.1381 (1.3819) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][520/625] eta 0:00:47 lr 0.001188 wd 0.0500 time 0.4450 (0.4512) data time 0.0008 (0.0015) model time 0.4442 (0.4493) loss 4.1044 (3.5830) grad_norm 1.1569 (1.3828) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][530/625] eta 0:00:42 lr 0.001188 wd 0.0500 
time 0.4495 (0.4512) data time 0.0006 (0.0015) model time 0.4489 (0.4493) loss 2.4522 (3.5848) grad_norm 1.3854 (1.3829) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][540/625] eta 0:00:38 lr 0.001188 wd 0.0500 time 0.5425 (0.4513) data time 0.0008 (0.0015) model time 0.5417 (0.4494) loss 3.6030 (3.5832) grad_norm 1.2091 (1.3810) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][550/625] eta 0:00:33 lr 0.001188 wd 0.0500 time 0.4478 (0.4512) data time 0.0007 (0.0015) model time 0.4471 (0.4493) loss 4.0760 (3.5814) grad_norm 1.8461 (1.3815) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][560/625] eta 0:00:29 lr 0.001188 wd 0.0500 time 0.4502 (0.4514) data time 0.0006 (0.0015) model time 0.4496 (0.4496) loss 3.3381 (3.5825) grad_norm 1.4131 (1.3822) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][570/625] eta 0:00:24 lr 0.001188 wd 0.0500 time 0.4471 (0.4514) data time 0.0006 (0.0015) model time 0.4465 (0.4496) loss 2.5473 (3.5759) grad_norm 1.5112 (1.3792) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][580/625] eta 0:00:20 lr 0.001188 wd 0.0500 time 0.4500 (0.4514) data time 0.0010 (0.0015) model time 0.4490 (0.4496) loss 4.0190 (3.5765) grad_norm 1.1396 (1.3771) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][590/625] eta 0:00:15 lr 0.001188 wd 0.0500 time 0.4475 (0.4514) data time 0.0006 (0.0015) model time 0.4468 (0.4496) loss 3.9561 (3.5775) grad_norm 1.1424 (1.3761) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:55 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][600/625] eta 0:00:11 lr 0.001188 wd 0.0500 time 0.4449 (0.4513) data time 0.0008 (0.0014) model time 0.4441 (0.4495) loss 3.4028 (3.5806) grad_norm 1.5184 (1.3739) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:01:56 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 15:01:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:01:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 15:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 15:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 15:04:45 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 15:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 15:04:57 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-04 15:04:59 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 15:05:01 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 15:05:01 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 37) [2024-08-04 15:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 15:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][610/625] eta 0:00:39 lr 0.001188 wd 0.0500 time 0.4667 (2.6246) data time 0.0006 (0.0820) model time 0.4661 (2.5426) loss 4.0460 (3.9874) grad_norm 1.0724 (1.3337) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:05:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [37/300][620/625] eta 0:00:07 lr 0.001188 wd 0.0500 time 0.4588 (1.4869) data time 0.0008 (0.0393) model time 0.4581 (1.4476) loss 3.5148 (3.7703) grad_norm 1.5587 (1.3059) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:05:36 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 37 training takes 0:00:30 [2024-08-04 15:05:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:05:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 0.6719 (0.6719) Acc@1 85.010 (85.010) Acc@5 97.021 (97.021) Mem 16699MB [2024-08-04 15:05:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.159) Loss 1.1016 (0.8484) Acc@1 72.852 (80.069) Acc@5 93.018 (95.650) Mem 16699MB [2024-08-04 15:05:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.3232 (1.0252) Acc@1 68.408 (75.951) Acc@5 90.430 (93.513) Mem 16699MB [2024-08-04 15:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.768 Acc@5 93.470 [2024-08-04 15:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 75.8% [2024-08-04 15:05:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.966 (0.966) Loss 0.5923 (0.5923) Acc@1 85.107 (85.107) Acc@5 97.168 (97.168) Mem 16699MB [2024-08-04 15:05:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.123 (0.203) Loss 1.0791 (0.7560) Acc@1 72.754 (80.202) Acc@5 91.895 (95.681) Mem 16699MB [2024-08-04 15:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.162) Loss 1.2578 (0.9515) Acc@1 68.408 (75.777) Acc@5 89.404 (93.269) Mem 16699MB [2024-08-04 15:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 75.624 Acc@5 93.248 [2024-08-04 15:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 75.6% [2024-08-04 15:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 75.62% [2024-08-04 15:05:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 15:05:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 15:05:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][0/625] eta 0:09:54 lr 0.001188 wd 0.0500 time 0.9511 (0.9511) data time 0.3936 (0.3936) model time 0.0000 (0.0000) loss 3.6579 (3.6579) grad_norm 1.1122 (1.1122) loss_scale 16384.0000 (16384.0000) mem 16711MB [2024-08-04 15:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][10/625] eta 0:05:28 lr 0.001188 wd 0.0500 time 0.4635 (0.5341) data time 0.0008 (0.0368) model time 0.0000 (0.0000) loss 3.0246 (3.7535) grad_norm 1.1452 (1.2229) loss_scale 16384.0000 (16384.0000) mem 16706MB [2024-08-04 15:06:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][20/625] eta 0:05:02 lr 0.001188 wd 0.0500 time 0.4580 (0.5001) data time 0.0008 (0.0198) model time 0.0000 (0.0000) loss 3.3787 (3.6184) grad_norm 2.1152 (1.2838) loss_scale 16384.0000 (16384.0000) mem 16706MB [2024-08-04 15:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][30/625] eta 0:04:50 lr 0.001188 wd 0.0500 time 0.4599 (0.4878) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 4.3188 (3.6315) grad_norm 2.8079 (1.4085) loss_scale 16384.0000 (16384.0000) mem 16706MB [2024-08-04 15:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][40/625] eta 0:04:41 lr 0.001188 wd 0.0500 time 0.4613 (0.4817) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 3.7767 (3.6006) grad_norm 2.3378 (1.5023) loss_scale 16384.0000 (16384.0000) mem 16706MB [2024-08-04 15:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][50/625] eta 0:04:35 lr 0.001188 wd 0.0500 time 0.4659 (0.4785) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 3.6978 (3.5688) grad_norm 1.3900 (1.4783) loss_scale 16384.0000 (16384.0000) mem 16706MB [2024-08-04 15:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 15:06:24 vssm_base_ms_e300] (utils.py 118): INFO 
./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:06:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 15:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 15:08:16 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 15:08:29 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 15:08:41 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 15:08:41 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-04 15:08:43 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 15:08:45 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 15:08:45 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 38) [2024-08-04 15:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 15:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][60/625] eta 0:39:04 lr 0.001188 wd 0.0500 time 0.4702 (4.1493) data time 0.0008 (0.1145) model time 0.4693 (4.0348) loss 4.1679 (4.1069) grad_norm 1.9831 (1.6984) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][70/625] eta 0:17:06 lr 0.001188 wd 0.0500 time 0.4713 (1.8496) data time 0.0012 (0.0437) model time 0.4701 (1.8059) loss 3.7924 (3.8726) grad_norm 1.3235 (1.7120) loss_scale 16384.0000 (16384.0000) mem 16699MB 
[2024-08-04 15:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][80/625] eta 0:11:57 lr 0.001188 wd 0.0500 time 0.4636 (1.3170) data time 0.0009 (0.0273) model time 0.4627 (1.2897) loss 3.4649 (3.8422) grad_norm 1.3575 (1.5686) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][90/625] eta 0:09:41 lr 0.001188 wd 0.0500 time 0.4661 (1.0878) data time 0.0010 (0.0200) model time 0.4651 (1.0678) loss 3.9872 (3.8442) grad_norm 1.4137 (1.4790) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][100/625] eta 0:08:22 lr 0.001188 wd 0.0500 time 0.4660 (0.9566) data time 0.0010 (0.0159) model time 0.4649 (0.9407) loss 2.9620 (3.7862) grad_norm 1.0016 (1.5149) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][110/625] eta 0:07:27 lr 0.001188 wd 0.0500 time 0.4710 (0.8692) data time 0.0008 (0.0133) model time 0.4703 (0.8559) loss 4.0277 (3.7459) grad_norm 0.9157 (1.4938) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][120/625] eta 0:06:48 lr 0.001188 wd 0.0500 time 0.4686 (0.8083) data time 0.0008 (0.0114) model time 0.4678 (0.7969) loss 2.8116 (3.6986) grad_norm 2.2627 (1.4968) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][130/625] eta 0:06:17 lr 0.001188 wd 0.0500 time 0.4685 (0.7636) data time 0.0010 (0.0101) model time 0.4674 (0.7535) loss 3.9979 (3.6708) grad_norm 1.1997 (1.4614) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][140/625] eta 0:05:53 lr 0.001188 wd 0.0500 time 0.4664 (0.7292) data time 0.0009 (0.0090) model time 0.4655 (0.7202) 
loss 3.1160 (3.6332) grad_norm 1.0844 (1.4333) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][150/625] eta 0:05:33 lr 0.001188 wd 0.0500 time 0.4642 (0.7016) data time 0.0008 (0.0082) model time 0.4635 (0.6934) loss 3.6374 (3.6325) grad_norm 1.7496 (1.4583) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][160/625] eta 0:05:15 lr 0.001188 wd 0.0500 time 0.4601 (0.6791) data time 0.0012 (0.0075) model time 0.4589 (0.6716) loss 4.5946 (3.6571) grad_norm 1.1207 (1.4524) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][170/625] eta 0:05:00 lr 0.001188 wd 0.0500 time 0.4676 (0.6605) data time 0.0008 (0.0070) model time 0.4668 (0.6536) loss 4.3537 (3.6579) grad_norm 1.6374 (1.4469) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][180/625] eta 0:04:46 lr 0.001188 wd 0.0500 time 0.4622 (0.6449) data time 0.0009 (0.0065) model time 0.4613 (0.6384) loss 2.5399 (3.6484) grad_norm 1.0789 (1.4306) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][190/625] eta 0:04:34 lr 0.001188 wd 0.0500 time 0.4643 (0.6318) data time 0.0010 (0.0061) model time 0.4632 (0.6257) loss 3.1939 (3.6475) grad_norm 1.2420 (1.4128) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][200/625] eta 0:04:23 lr 0.001187 wd 0.0500 time 0.4695 (0.6205) data time 0.0007 (0.0058) model time 0.4687 (0.6148) loss 3.2761 (3.6374) grad_norm 1.1669 (1.4040) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][210/625] eta 0:04:13 lr 
0.001187 wd 0.0500 time 0.4667 (0.6108) data time 0.0010 (0.0055) model time 0.4658 (0.6053) loss 4.0128 (3.6342) grad_norm 1.3838 (1.3991) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][220/625] eta 0:04:03 lr 0.001187 wd 0.0500 time 0.4684 (0.6021) data time 0.0011 (0.0052) model time 0.4673 (0.5969) loss 3.7149 (3.6344) grad_norm 2.1616 (1.3958) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][230/625] eta 0:03:54 lr 0.001187 wd 0.0500 time 0.4624 (0.5943) data time 0.0009 (0.0050) model time 0.4615 (0.5893) loss 3.5119 (3.6222) grad_norm 0.9748 (1.3901) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][240/625] eta 0:03:46 lr 0.001187 wd 0.0500 time 0.4635 (0.5880) data time 0.0010 (0.0048) model time 0.4624 (0.5833) loss 3.3588 (3.6183) grad_norm 1.2252 (1.3853) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][250/625] eta 0:03:38 lr 0.001187 wd 0.0500 time 0.4639 (0.5817) data time 0.0009 (0.0046) model time 0.4630 (0.5772) loss 2.8859 (3.6070) grad_norm 0.9490 (1.3840) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][260/625] eta 0:03:30 lr 0.001187 wd 0.0500 time 0.4668 (0.5761) data time 0.0008 (0.0044) model time 0.4660 (0.5717) loss 2.4905 (3.5868) grad_norm 1.9296 (1.3904) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][270/625] eta 0:03:22 lr 0.001187 wd 0.0500 time 0.4694 (0.5712) data time 0.0008 (0.0043) model time 0.4687 (0.5669) loss 2.8262 (3.5848) grad_norm 1.5136 (1.3922) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 
15:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][280/625] eta 0:03:15 lr 0.001187 wd 0.0500 time 0.4661 (0.5667) data time 0.0008 (0.0041) model time 0.4653 (0.5626) loss 3.2505 (3.5921) grad_norm 1.4513 (1.3862) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][290/625] eta 0:03:08 lr 0.001187 wd 0.0500 time 0.4685 (0.5626) data time 0.0008 (0.0040) model time 0.4678 (0.5586) loss 4.2846 (3.5852) grad_norm 1.5249 (1.3882) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][300/625] eta 0:03:01 lr 0.001187 wd 0.0500 time 0.4614 (0.5586) data time 0.0009 (0.0039) model time 0.4606 (0.5547) loss 2.2521 (3.5787) grad_norm 2.4811 (1.3975) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][310/625] eta 0:02:54 lr 0.001187 wd 0.0500 time 0.4629 (0.5549) data time 0.0011 (0.0038) model time 0.4618 (0.5511) loss 2.8723 (3.5728) grad_norm 1.4229 (1.3956) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][320/625] eta 0:02:48 lr 0.001187 wd 0.0500 time 0.4609 (0.5515) data time 0.0009 (0.0037) model time 0.4600 (0.5478) loss 3.0867 (3.5640) grad_norm 1.7319 (1.3966) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][330/625] eta 0:02:41 lr 0.001187 wd 0.0500 time 0.4749 (0.5484) data time 0.0011 (0.0036) model time 0.4738 (0.5448) loss 4.0623 (3.5640) grad_norm 0.9574 (1.3918) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][340/625] eta 0:02:35 lr 0.001187 wd 0.0500 time 0.4679 (0.5456) data time 0.0010 (0.0035) model time 0.4669 (0.5421) loss 3.2673 
(3.5601) grad_norm 1.4605 (1.3913) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][350/625] eta 0:02:29 lr 0.001187 wd 0.0500 time 0.4659 (0.5430) data time 0.0011 (0.0034) model time 0.4648 (0.5396) loss 2.8505 (3.5560) grad_norm 0.9529 (1.3860) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][360/625] eta 0:02:23 lr 0.001187 wd 0.0500 time 0.4686 (0.5407) data time 0.0008 (0.0033) model time 0.4678 (0.5373) loss 2.5644 (3.5436) grad_norm 1.4748 (1.3843) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][370/625] eta 0:02:17 lr 0.001187 wd 0.0500 time 0.4677 (0.5383) data time 0.0008 (0.0033) model time 0.4670 (0.5350) loss 4.0083 (3.5518) grad_norm 1.6150 (1.3791) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][380/625] eta 0:02:11 lr 0.001187 wd 0.0500 time 0.4652 (0.5361) data time 0.0010 (0.0032) model time 0.4643 (0.5329) loss 3.3703 (3.5619) grad_norm 1.0988 (1.3821) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][390/625] eta 0:02:05 lr 0.001187 wd 0.0500 time 0.4642 (0.5340) data time 0.0012 (0.0031) model time 0.4631 (0.5309) loss 3.3797 (3.5580) grad_norm 1.2400 (1.3962) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][400/625] eta 0:01:59 lr 0.001187 wd 0.0500 time 0.4697 (0.5321) data time 0.0010 (0.0031) model time 0.4687 (0.5290) loss 3.9690 (3.5601) grad_norm 1.0684 (1.3878) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][410/625] eta 0:01:54 lr 0.001187 wd 
0.0500 time 0.4726 (0.5303) data time 0.0008 (0.0030) model time 0.4718 (0.5273) loss 2.5195 (3.5577) grad_norm 0.9849 (1.3824) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][420/625] eta 0:01:48 lr 0.001187 wd 0.0500 time 0.4694 (0.5293) data time 0.0012 (0.0030) model time 0.4682 (0.5263) loss 3.8505 (3.5581) grad_norm 1.5118 (1.3827) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:12:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][430/625] eta 0:01:42 lr 0.001187 wd 0.0500 time 0.4742 (0.5276) data time 0.0008 (0.0029) model time 0.4735 (0.5247) loss 2.2932 (3.5495) grad_norm 1.3806 (1.3832) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][440/625] eta 0:01:37 lr 0.001187 wd 0.0500 time 0.4632 (0.5261) data time 0.0010 (0.0029) model time 0.4622 (0.5232) loss 3.6495 (3.5419) grad_norm 1.6905 (1.3875) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][450/625] eta 0:01:31 lr 0.001187 wd 0.0500 time 0.4676 (0.5246) data time 0.0009 (0.0028) model time 0.4667 (0.5218) loss 4.2512 (3.5445) grad_norm 1.1833 (1.3862) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][460/625] eta 0:01:26 lr 0.001187 wd 0.0500 time 0.4670 (0.5231) data time 0.0009 (0.0028) model time 0.4661 (0.5204) loss 3.8173 (3.5480) grad_norm 1.3547 (inf) loss_scale 8192.0000 (16323.4680) mem 16699MB [2024-08-04 15:12:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][470/625] eta 0:01:20 lr 0.001187 wd 0.0500 time 0.4656 (0.5218) data time 0.0008 (0.0027) model time 0.4648 (0.5190) loss 3.1912 (3.5471) grad_norm 1.7487 (inf) loss_scale 8192.0000 (16128.0000) mem 16699MB [2024-08-04 15:12:32 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][480/625] eta 0:01:15 lr 0.001187 wd 0.0500 time 0.4710 (0.5205) data time 0.0011 (0.0027) model time 0.4699 (0.5178) loss 3.8076 (3.5503) grad_norm 1.7803 (inf) loss_scale 8192.0000 (15941.7089) mem 16699MB [2024-08-04 15:12:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][490/625] eta 0:01:10 lr 0.001187 wd 0.0500 time 0.4705 (0.5194) data time 0.0010 (0.0027) model time 0.4694 (0.5167) loss 3.4285 (3.5562) grad_norm 1.2144 (inf) loss_scale 8192.0000 (15763.9633) mem 16699MB [2024-08-04 15:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][500/625] eta 0:01:04 lr 0.001187 wd 0.0500 time 0.4691 (0.5183) data time 0.0008 (0.0026) model time 0.4683 (0.5157) loss 3.6241 (3.5573) grad_norm 1.9184 (inf) loss_scale 8192.0000 (15594.1883) mem 16699MB [2024-08-04 15:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][510/625] eta 0:00:59 lr 0.001187 wd 0.0500 time 0.4773 (0.5172) data time 0.0008 (0.0026) model time 0.4765 (0.5146) loss 3.4753 (3.5555) grad_norm 1.0926 (inf) loss_scale 8192.0000 (15431.8596) mem 16699MB [2024-08-04 15:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][520/625] eta 0:00:54 lr 0.001187 wd 0.0500 time 0.4666 (0.5161) data time 0.0008 (0.0026) model time 0.4658 (0.5136) loss 2.5163 (3.5486) grad_norm 1.0853 (inf) loss_scale 8192.0000 (15276.4979) mem 16699MB [2024-08-04 15:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][530/625] eta 0:00:48 lr 0.001187 wd 0.0500 time 0.4645 (0.5151) data time 0.0007 (0.0025) model time 0.4638 (0.5125) loss 2.9301 (3.5409) grad_norm 1.2203 (inf) loss_scale 8192.0000 (15127.6639) mem 16699MB [2024-08-04 15:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][540/625] eta 0:00:43 lr 0.001187 wd 0.0500 time 0.4674 (0.5141) data time 0.0010 (0.0025) model time 0.4663 (0.5116) loss 3.9059 (3.5454) grad_norm 1.2670 (inf) 
loss_scale 8192.0000 (14984.9547) mem 16699MB [2024-08-04 15:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][550/625] eta 0:00:38 lr 0.001187 wd 0.0500 time 0.4645 (0.5131) data time 0.0008 (0.0025) model time 0.4637 (0.5106) loss 2.8841 (3.5425) grad_norm 1.0604 (inf) loss_scale 8192.0000 (14848.0000) mem 16699MB [2024-08-04 15:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][560/625] eta 0:00:33 lr 0.001187 wd 0.0500 time 0.4671 (0.5122) data time 0.0010 (0.0024) model time 0.4661 (0.5098) loss 3.2412 (3.5420) grad_norm 1.1252 (inf) loss_scale 8192.0000 (14716.4585) mem 16699MB [2024-08-04 15:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][570/625] eta 0:00:28 lr 0.001187 wd 0.0500 time 0.4725 (0.5115) data time 0.0010 (0.0024) model time 0.4715 (0.5090) loss 2.9911 (3.5483) grad_norm 2.5031 (inf) loss_scale 8192.0000 (14590.0155) mem 16699MB [2024-08-04 15:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][580/625] eta 0:00:22 lr 0.001187 wd 0.0500 time 0.4723 (0.5110) data time 0.0008 (0.0024) model time 0.4715 (0.5086) loss 2.0978 (3.5429) grad_norm 1.2017 (inf) loss_scale 8192.0000 (14468.3802) mem 16699MB [2024-08-04 15:13:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][590/625] eta 0:00:17 lr 0.001187 wd 0.0500 time 0.4682 (0.5102) data time 0.0011 (0.0024) model time 0.4671 (0.5078) loss 3.7795 (3.5400) grad_norm 1.3078 (inf) loss_scale 8192.0000 (14351.2836) mem 16699MB [2024-08-04 15:13:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][600/625] eta 0:00:12 lr 0.001187 wd 0.0500 time 0.4643 (0.5094) data time 0.0008 (0.0023) model time 0.4635 (0.5071) loss 3.0796 (3.5384) grad_norm 1.4243 (inf) loss_scale 8192.0000 (14238.4762) mem 16699MB [2024-08-04 15:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][610/625] eta 0:00:07 lr 0.001187 wd 0.0500 time 0.4674 (0.5089) data time 0.0008 (0.0023) model 
time 0.4667 (0.5066) loss 3.7236 (3.5443) grad_norm 1.0435 (inf) loss_scale 8192.0000 (14129.7266) mem 16699MB [2024-08-04 15:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [38/300][620/625] eta 0:00:02 lr 0.001187 wd 0.0500 time 0.4674 (0.5082) data time 0.0006 (0.0023) model time 0.4668 (0.5059) loss 2.3813 (3.5448) grad_norm 1.4489 (inf) loss_scale 8192.0000 (14024.8198) mem 16699MB [2024-08-04 15:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 38 training takes 0:04:49 [2024-08-04 15:13:40 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:13:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 15:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.6792 (0.6792) Acc@1 84.570 (84.570) Acc@5 97.363 (97.363) Mem 16699MB [2024-08-04 15:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.122 (0.161) Loss 1.1484 (0.8403) Acc@1 71.826 (80.442) Acc@5 92.529 (95.779) Mem 16699MB [2024-08-04 15:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.2725 (1.0142) Acc@1 69.580 (76.307) Acc@5 90.527 (93.615) Mem 16699MB [2024-08-04 15:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.042 Acc@5 93.504 [2024-08-04 15:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.0% [2024-08-04 15:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.04% [2024-08-04 15:13:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 15:13:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 15:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5840 (0.5840) Acc@1 85.156 (85.156) Acc@5 97.363 (97.363) Mem 16699MB
[2024-08-04 15:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 1.0635 (0.7463) Acc@1 73.193 (80.597) Acc@5 92.188 (95.810) Mem 16699MB
[2024-08-04 15:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 1.2363 (0.9375) Acc@1 68.994 (76.197) Acc@5 89.697 (93.427) Mem 16699MB
[2024-08-04 15:13:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.028 Acc@5 93.416
[2024-08-04 15:13:57 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 76.0%
[2024-08-04 15:13:57 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 76.03%
[2024-08-04 15:13:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:13:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:13:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][0/625] eta 0:09:50 lr 0.001187 wd 0.0500 time 0.9451 (0.9451) data time 0.4211 (0.4211) model time 0.0000 (0.0000) loss 3.3843 (3.3843) grad_norm 1.2132 (1.2132) loss_scale 8192.0000 (8192.0000) mem 16712MB [2024-08-04 15:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][10/625] eta 0:05:12 lr 0.001187 wd 0.0500 time 0.4656 (0.5084) data time 0.0007 (0.0392) model time 0.0000 (0.0000) loss 4.3219 (3.6827) grad_norm 1.0881 (1.1848) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][20/625] eta 0:04:56 lr 0.001187 wd 0.0500 time 0.4705 (0.4896) data time 0.0011 (0.0211) model time 0.0000 (0.0000) loss 4.0517 (3.6577) grad_norm 1.5907 (1.3352) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][30/625] eta 0:04:47 lr 0.001186 wd 0.0500 time 0.4668 (0.4829) data time 0.0009 (0.0146) model time 0.0000 (0.0000) loss 2.7588 (3.6100) grad_norm 1.7227 (1.3944) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][40/625] eta 0:04:40 lr 0.001186 wd 0.0500 time 0.4738 (0.4800) data time 0.0012 (0.0113) model time 0.0000 (0.0000) loss 3.4126 (3.5949) grad_norm 1.0531 (1.3267) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][50/625] eta 0:04:34 lr 0.001186 wd 0.0500 time 0.4703 (0.4773) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 4.0856 (3.6160) grad_norm 1.7856 (1.3287) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][60/625] eta 0:04:28 lr 0.001186 wd 0.0500 time 0.4639 (0.4758) data time 0.0010 (0.0080) model time 0.4628 (0.4670) loss 4.0093 (3.6186) 
grad_norm 1.8141 (1.3546) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][70/625] eta 0:04:23 lr 0.001186 wd 0.0500 time 0.4646 (0.4745) data time 0.0008 (0.0070) model time 0.4638 (0.4665) loss 2.5360 (3.5900) grad_norm 1.9605 (1.3587) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][80/625] eta 0:04:18 lr 0.001186 wd 0.0500 time 0.4695 (0.4736) data time 0.0008 (0.0063) model time 0.4687 (0.4663) loss 4.7415 (3.5918) grad_norm 1.1806 (1.3674) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][90/625] eta 0:04:12 lr 0.001186 wd 0.0500 time 0.4662 (0.4728) data time 0.0008 (0.0057) model time 0.4654 (0.4661) loss 2.4079 (3.5511) grad_norm 1.2441 (1.3714) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][100/625] eta 0:04:08 lr 0.001186 wd 0.0500 time 0.4692 (0.4724) data time 0.0008 (0.0052) model time 0.4684 (0.4664) loss 4.2641 (3.5831) grad_norm 1.0279 (1.3527) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][110/625] eta 0:04:03 lr 0.001186 wd 0.0500 time 0.4636 (0.4735) data time 0.0008 (0.0049) model time 0.4629 (0.4693) loss 3.8277 (3.5930) grad_norm 1.3260 (1.3715) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][120/625] eta 0:03:58 lr 0.001186 wd 0.0500 time 0.4722 (0.4732) data time 0.0011 (0.0045) model time 0.4711 (0.4692) loss 2.6316 (3.5722) grad_norm 1.8314 (1.3950) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][130/625] eta 0:03:54 lr 0.001186 wd 0.0500 time 0.4626 
(0.4727) data time 0.0010 (0.0043) model time 0.4616 (0.4688) loss 3.7641 (3.5617) grad_norm 1.1880 (1.4031) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][140/625] eta 0:03:49 lr 0.001186 wd 0.0500 time 0.4655 (0.4722) data time 0.0009 (0.0041) model time 0.4645 (0.4683) loss 2.5476 (3.5510) grad_norm 0.9679 (1.3900) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][150/625] eta 0:03:44 lr 0.001186 wd 0.0500 time 0.4677 (0.4719) data time 0.0009 (0.0039) model time 0.4668 (0.4680) loss 4.1438 (3.5361) grad_norm 1.4161 (1.3837) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][160/625] eta 0:03:39 lr 0.001186 wd 0.0500 time 0.4695 (0.4718) data time 0.0011 (0.0037) model time 0.4683 (0.4682) loss 3.7934 (3.5345) grad_norm 2.1566 (1.3949) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][170/625] eta 0:03:34 lr 0.001186 wd 0.0500 time 0.4677 (0.4715) data time 0.0011 (0.0035) model time 0.4666 (0.4680) loss 4.2551 (3.5511) grad_norm 1.1427 (1.3835) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][180/625] eta 0:03:29 lr 0.001186 wd 0.0500 time 0.4654 (0.4715) data time 0.0010 (0.0034) model time 0.4643 (0.4681) loss 3.9379 (3.5408) grad_norm 1.6794 (1.3749) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][190/625] eta 0:03:25 lr 0.001186 wd 0.0500 time 0.4654 (0.4714) data time 0.0011 (0.0033) model time 0.4643 (0.4681) loss 3.5023 (3.5420) grad_norm 1.8674 (1.3800) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [39/300][200/625] eta 0:03:20 lr 0.001186 wd 0.0500 time 0.4631 (0.4711) data time 0.0009 (0.0032) model time 0.4622 (0.4679) loss 4.1850 (3.5480) grad_norm 0.9887 (1.3690) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][210/625] eta 0:03:15 lr 0.001186 wd 0.0500 time 0.4627 (0.4708) data time 0.0008 (0.0031) model time 0.4619 (0.4676) loss 3.9831 (3.5506) grad_norm 1.2134 (1.3713) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][220/625] eta 0:03:10 lr 0.001186 wd 0.0500 time 0.4624 (0.4705) data time 0.0010 (0.0030) model time 0.4613 (0.4674) loss 2.9173 (3.5413) grad_norm 1.4517 (1.3690) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][230/625] eta 0:03:05 lr 0.001186 wd 0.0500 time 0.4718 (0.4703) data time 0.0008 (0.0029) model time 0.4710 (0.4673) loss 3.7521 (3.5412) grad_norm 1.7800 (1.3717) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][240/625] eta 0:03:01 lr 0.001186 wd 0.0500 time 0.4654 (0.4702) data time 0.0009 (0.0028) model time 0.4645 (0.4673) loss 2.6118 (3.5175) grad_norm 1.2068 (1.3655) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][250/625] eta 0:02:56 lr 0.001186 wd 0.0500 time 0.4700 (0.4702) data time 0.0008 (0.0028) model time 0.4692 (0.4673) loss 3.7861 (3.5261) grad_norm 1.3075 (1.3622) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][260/625] eta 0:02:51 lr 0.001186 wd 0.0500 time 0.4644 (0.4701) data time 0.0009 (0.0027) model time 0.4635 (0.4673) loss 2.9537 (3.5130) grad_norm 1.0294 (1.3682) loss_scale 8192.0000 
(8192.0000) mem 16703MB [2024-08-04 15:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][270/625] eta 0:02:46 lr 0.001186 wd 0.0500 time 0.4675 (0.4699) data time 0.0008 (0.0026) model time 0.4667 (0.4672) loss 3.9455 (3.5104) grad_norm 1.4250 (1.3666) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][280/625] eta 0:02:42 lr 0.001186 wd 0.0500 time 0.4626 (0.4698) data time 0.0008 (0.0026) model time 0.4618 (0.4671) loss 4.0018 (3.5086) grad_norm 1.9578 (1.3747) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][290/625] eta 0:02:37 lr 0.001186 wd 0.0500 time 0.4651 (0.4697) data time 0.0011 (0.0025) model time 0.4640 (0.4670) loss 3.6378 (3.5088) grad_norm 1.5940 (1.3767) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][300/625] eta 0:02:32 lr 0.001186 wd 0.0500 time 0.4673 (0.4698) data time 0.0007 (0.0025) model time 0.4666 (0.4672) loss 3.4102 (3.5086) grad_norm 1.7214 (1.3770) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][310/625] eta 0:02:28 lr 0.001186 wd 0.0500 time 0.6853 (0.4704) data time 0.0008 (0.0024) model time 0.6845 (0.4680) loss 4.0837 (3.5059) grad_norm 1.3947 (1.3722) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][320/625] eta 0:02:23 lr 0.001186 wd 0.0500 time 0.4671 (0.4704) data time 0.0011 (0.0024) model time 0.4661 (0.4680) loss 2.6991 (3.4970) grad_norm 1.2403 (1.3681) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][330/625] eta 0:02:18 lr 0.001186 wd 0.0500 time 0.4686 (0.4703) data time 0.0010 (0.0023) model time 
0.4675 (0.4680) loss 4.0832 (3.4995) grad_norm 1.2155 (1.3642) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][340/625] eta 0:02:14 lr 0.001186 wd 0.0500 time 0.4621 (0.4702) data time 0.0011 (0.0023) model time 0.4610 (0.4679) loss 2.4348 (3.4969) grad_norm 2.9250 (1.3752) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][350/625] eta 0:02:09 lr 0.001186 wd 0.0500 time 0.4684 (0.4701) data time 0.0008 (0.0023) model time 0.4676 (0.4678) loss 3.9871 (3.5004) grad_norm 1.1754 (1.3751) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][360/625] eta 0:02:04 lr 0.001186 wd 0.0500 time 0.4653 (0.4700) data time 0.0010 (0.0022) model time 0.4643 (0.4677) loss 3.8158 (3.5088) grad_norm 1.2911 (1.3755) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][370/625] eta 0:01:59 lr 0.001186 wd 0.0500 time 0.4675 (0.4699) data time 0.0009 (0.0022) model time 0.4667 (0.4676) loss 3.8631 (3.5064) grad_norm 1.5396 (1.3712) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][380/625] eta 0:01:55 lr 0.001186 wd 0.0500 time 0.4827 (0.4699) data time 0.0010 (0.0022) model time 0.4817 (0.4676) loss 3.0440 (3.4935) grad_norm 2.0920 (1.3740) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][390/625] eta 0:01:50 lr 0.001186 wd 0.0500 time 0.4670 (0.4698) data time 0.0010 (0.0022) model time 0.4660 (0.4676) loss 3.8490 (3.4961) grad_norm 1.1414 (1.3731) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][400/625] eta 0:01:45 
lr 0.001186 wd 0.0500 time 0.4710 (0.4698) data time 0.0010 (0.0021) model time 0.4701 (0.4676) loss 3.2286 (3.4969) grad_norm 1.2468 (1.3698) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][410/625] eta 0:01:40 lr 0.001186 wd 0.0500 time 0.4674 (0.4697) data time 0.0012 (0.0021) model time 0.4662 (0.4676) loss 3.5623 (3.4999) grad_norm 1.3421 (1.3663) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][420/625] eta 0:01:36 lr 0.001186 wd 0.0500 time 0.4640 (0.4696) data time 0.0011 (0.0021) model time 0.4629 (0.4675) loss 3.9972 (3.5024) grad_norm 1.7598 (1.3692) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][430/625] eta 0:01:31 lr 0.001186 wd 0.0500 time 0.4655 (0.4695) data time 0.0011 (0.0021) model time 0.4644 (0.4674) loss 3.1237 (3.5026) grad_norm 1.3355 (1.3732) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][440/625] eta 0:01:26 lr 0.001186 wd 0.0500 time 0.4625 (0.4695) data time 0.0011 (0.0020) model time 0.4614 (0.4674) loss 3.8692 (3.5066) grad_norm 1.5022 (1.3753) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][450/625] eta 0:01:22 lr 0.001186 wd 0.0500 time 0.4716 (0.4694) data time 0.0008 (0.0020) model time 0.4708 (0.4673) loss 3.8783 (3.5062) grad_norm 1.0028 (1.3732) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][460/625] eta 0:01:17 lr 0.001185 wd 0.0500 time 0.4707 (0.4694) data time 0.0011 (0.0020) model time 0.4696 (0.4674) loss 3.6926 (3.4967) grad_norm 1.6458 (1.3723) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:40 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][470/625] eta 0:01:12 lr 0.001185 wd 0.0500 time 0.4707 (0.4695) data time 0.0008 (0.0020) model time 0.4698 (0.4674) loss 4.2382 (3.5019) grad_norm 2.0082 (1.3803) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][480/625] eta 0:01:08 lr 0.001185 wd 0.0500 time 0.4666 (0.4695) data time 0.0010 (0.0020) model time 0.4656 (0.4675) loss 2.7730 (3.5006) grad_norm 1.7362 (1.3808) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][490/625] eta 0:01:03 lr 0.001185 wd 0.0500 time 0.4679 (0.4694) data time 0.0008 (0.0019) model time 0.4671 (0.4675) loss 2.4634 (3.5021) grad_norm 1.1867 (1.3805) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][500/625] eta 0:00:58 lr 0.001185 wd 0.0500 time 0.6450 (0.4697) data time 0.0009 (0.0019) model time 0.6440 (0.4678) loss 3.9799 (3.5002) grad_norm 1.1015 (1.3799) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][510/625] eta 0:00:54 lr 0.001185 wd 0.0500 time 0.4684 (0.4697) data time 0.0009 (0.0019) model time 0.4675 (0.4677) loss 2.3661 (3.4978) grad_norm 1.2967 (1.3781) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][520/625] eta 0:00:49 lr 0.001185 wd 0.0500 time 0.4755 (0.4699) data time 0.0010 (0.0019) model time 0.4745 (0.4680) loss 3.6641 (3.4974) grad_norm 1.1863 (1.3768) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][530/625] eta 0:00:44 lr 0.001185 wd 0.0500 time 0.4659 (0.4699) data time 0.0010 (0.0019) model time 0.4648 (0.4680) loss 3.6861 (3.4951) grad_norm 
1.6842 (1.3761) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][540/625] eta 0:00:39 lr 0.001185 wd 0.0500 time 0.4680 (0.4699) data time 0.0007 (0.0019) model time 0.4673 (0.4680) loss 3.9954 (3.4952) grad_norm 1.2942 (1.3776) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][550/625] eta 0:00:35 lr 0.001185 wd 0.0500 time 0.4669 (0.4698) data time 0.0009 (0.0018) model time 0.4660 (0.4680) loss 2.7802 (3.4979) grad_norm 1.1881 (1.3814) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][560/625] eta 0:00:30 lr 0.001185 wd 0.0500 time 0.4698 (0.4698) data time 0.0008 (0.0018) model time 0.4690 (0.4680) loss 4.0253 (3.5015) grad_norm 1.3330 (1.3823) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][570/625] eta 0:00:25 lr 0.001185 wd 0.0500 time 0.4623 (0.4697) data time 0.0010 (0.0018) model time 0.4613 (0.4679) loss 2.3132 (3.4977) grad_norm 1.3093 (1.3833) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][580/625] eta 0:00:21 lr 0.001185 wd 0.0500 time 0.4700 (0.4697) data time 0.0009 (0.0018) model time 0.4691 (0.4679) loss 3.5029 (3.4984) grad_norm 1.3360 (1.3818) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][590/625] eta 0:00:16 lr 0.001185 wd 0.0500 time 0.4649 (0.4696) data time 0.0007 (0.0018) model time 0.4642 (0.4678) loss 4.3817 (3.4983) grad_norm 1.2658 (1.3841) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][600/625] eta 0:00:11 lr 0.001185 wd 0.0500 time 0.4638 (0.4696) data 
time 0.0010 (0.0018) model time 0.4628 (0.4678) loss 3.4522 (3.4977) grad_norm 1.0150 (1.3810) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][610/625] eta 0:00:07 lr 0.001185 wd 0.0500 time 0.4670 (0.4696) data time 0.0007 (0.0018) model time 0.4662 (0.4678) loss 3.5513 (3.5033) grad_norm 0.9653 (1.3812) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [39/300][620/625] eta 0:00:02 lr 0.001185 wd 0.0500 time 0.4649 (0.4695) data time 0.0007 (0.0018) model time 0.4642 (0.4677) loss 2.2648 (3.5074) grad_norm 1.8574 (1.3803) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 39 training takes 0:04:53 [2024-08-04 15:18:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:18:54 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:18:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.6665 (0.6665) Acc@1 84.082 (84.082) Acc@5 97.363 (97.363) Mem 16703MB
[2024-08-04 15:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.160) Loss 1.1133 (0.8340) Acc@1 73.926 (80.473) Acc@5 92.529 (95.836) Mem 16703MB
[2024-08-04 15:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 1.2646 (1.0134) Acc@1 70.215 (76.265) Acc@5 90.527 (93.610) Mem 16703MB
[2024-08-04 15:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.138 Acc@5 93.560
[2024-08-04 15:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.1%
[2024-08-04 15:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.14%
[2024-08-04 15:18:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:18:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:19:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5771 (0.5771) Acc@1 85.400 (85.400) Acc@5 97.510 (97.510) Mem 16703MB
[2024-08-04 15:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.159) Loss 1.0488 (0.7373) Acc@1 73.633 (80.913) Acc@5 92.578 (95.938) Mem 16703MB
[2024-08-04 15:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.2197 (0.9249) Acc@1 69.434 (76.544) Acc@5 89.990 (93.573) Mem 16703MB
[2024-08-04 15:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.378 Acc@5 93.558
[2024-08-04 15:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 76.4%
[2024-08-04 15:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 76.38%
[2024-08-04 15:19:02 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:19:04 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][0/625] eta 0:08:06 lr 0.001185 wd 0.0500 time 0.7782 (0.7782) data time 0.3639 (0.3639) model time 0.0000 (0.0000) loss 3.2264 (3.2264) grad_norm 1.1598 (1.1598) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][10/625] eta 0:05:04 lr 0.001185 wd 0.0500 time 0.4675 (0.4946) data time 0.0008 (0.0340) model time 0.0000 (0.0000) loss 3.2526 (3.6139) grad_norm 1.6782 (1.2925) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][20/625] eta 0:04:51 lr 0.001185 wd 0.0500 time 0.4673 (0.4819) data time 0.0008 (0.0182) model time 0.0000 (0.0000) loss 2.8324 (3.5567) grad_norm 1.0860 (1.2956) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][30/625] eta 0:04:43 lr 0.001185 wd 0.0500 time 0.4635 (0.4769) data time 0.0008 (0.0127) model time 0.0000 (0.0000) loss 2.5609 (3.5179) grad_norm 1.7729 (1.3326) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][40/625] eta 0:04:37 lr 0.001185 wd 0.0500 time 0.4797 (0.4751) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 2.8489 (3.4620) grad_norm 1.5156 (1.3893) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][50/625] eta 0:04:32 lr 0.001185 wd 0.0500 time 0.4682 (0.4742) data time 0.0010 (0.0082) model time 0.0000 (0.0000) loss 3.7392 (3.4834) grad_norm 1.1369 (1.4057) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][60/625] eta 0:04:27 lr 0.001185 wd 0.0500 time 0.4656 (0.4735) data time 0.0009 (0.0070) model time 0.4648 (0.4691) loss 4.4540 (3.4539) 
grad_norm 1.4190 (1.4009) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][70/625] eta 0:04:22 lr 0.001185 wd 0.0500 time 0.4718 (0.4728) data time 0.0008 (0.0062) model time 0.4711 (0.4683) loss 4.0924 (3.4979) grad_norm 1.1961 (1.3743) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][80/625] eta 0:04:17 lr 0.001185 wd 0.0500 time 0.4631 (0.4720) data time 0.0012 (0.0056) model time 0.4619 (0.4673) loss 4.3443 (3.5225) grad_norm 1.1133 (1.3527) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][90/625] eta 0:04:13 lr 0.001185 wd 0.0500 time 0.4686 (0.4733) data time 0.0008 (0.0051) model time 0.4678 (0.4711) loss 4.0716 (3.5439) grad_norm 1.1751 (1.3392) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][100/625] eta 0:04:08 lr 0.001185 wd 0.0500 time 0.4649 (0.4726) data time 0.0009 (0.0047) model time 0.4640 (0.4699) loss 3.7143 (3.5606) grad_norm 1.7439 (1.3477) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:19:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][110/625] eta 0:04:03 lr 0.001185 wd 0.0500 time 0.4703 (0.4723) data time 0.0008 (0.0043) model time 0.4695 (0.4695) loss 3.4545 (3.5486) grad_norm 1.5765 (1.3339) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][120/625] eta 0:03:58 lr 0.001185 wd 0.0500 time 0.4690 (0.4722) data time 0.0009 (0.0041) model time 0.4680 (0.4696) loss 3.3688 (3.5297) grad_norm 1.1051 (1.3329) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][130/625] eta 0:03:53 lr 0.001185 wd 0.0500 time 0.4688 
(0.4718) data time 0.0011 (0.0038) model time 0.4677 (0.4691) loss 3.9070 (3.5233) grad_norm 1.5232 (1.3304) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][140/625] eta 0:03:48 lr 0.001185 wd 0.0500 time 0.4677 (0.4716) data time 0.0011 (0.0036) model time 0.4666 (0.4690) loss 2.9576 (3.5148) grad_norm 1.3008 (1.3341) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][150/625] eta 0:03:43 lr 0.001185 wd 0.0500 time 0.4672 (0.4713) data time 0.0010 (0.0035) model time 0.4662 (0.4687) loss 3.6812 (3.5207) grad_norm 1.5327 (1.3399) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][160/625] eta 0:03:39 lr 0.001185 wd 0.0500 time 0.4673 (0.4710) data time 0.0010 (0.0033) model time 0.4664 (0.4685) loss 3.6505 (3.5246) grad_norm 1.0883 (1.3340) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][170/625] eta 0:03:34 lr 0.001185 wd 0.0500 time 0.4632 (0.4708) data time 0.0010 (0.0032) model time 0.4622 (0.4682) loss 3.4178 (3.5403) grad_norm 0.9097 (1.3530) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][180/625] eta 0:03:29 lr 0.001185 wd 0.0500 time 0.4699 (0.4706) data time 0.0011 (0.0031) model time 0.4688 (0.4681) loss 3.2895 (3.5460) grad_norm 1.8221 (1.3468) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][190/625] eta 0:03:24 lr 0.001185 wd 0.0500 time 0.4668 (0.4706) data time 0.0008 (0.0030) model time 0.4660 (0.4681) loss 3.3566 (3.5504) grad_norm 1.3481 (1.3432) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [40/300][200/625] eta 0:03:20 lr 0.001185 wd 0.0500 time 0.4705 (0.4713) data time 0.0007 (0.0029) model time 0.4698 (0.4692) loss 2.7076 (3.5427) grad_norm 0.8230 (1.3362) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][210/625] eta 0:03:15 lr 0.001185 wd 0.0500 time 0.4728 (0.4712) data time 0.0010 (0.0028) model time 0.4718 (0.4691) loss 3.5387 (3.5506) grad_norm 2.4587 (1.3485) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][220/625] eta 0:03:10 lr 0.001185 wd 0.0500 time 0.4711 (0.4711) data time 0.0008 (0.0027) model time 0.4703 (0.4690) loss 2.3507 (3.5449) grad_norm 1.1747 (1.3522) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][230/625] eta 0:03:06 lr 0.001185 wd 0.0500 time 0.4660 (0.4709) data time 0.0008 (0.0027) model time 0.4652 (0.4689) loss 3.4064 (3.5486) grad_norm 0.9694 (1.3464) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][240/625] eta 0:03:01 lr 0.001185 wd 0.0500 time 0.4704 (0.4707) data time 0.0010 (0.0026) model time 0.4695 (0.4687) loss 4.0023 (3.5473) grad_norm 1.1491 (1.3405) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][250/625] eta 0:02:56 lr 0.001185 wd 0.0500 time 0.4615 (0.4706) data time 0.0011 (0.0025) model time 0.4604 (0.4686) loss 3.6543 (3.5544) grad_norm 1.3584 (1.3414) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][260/625] eta 0:02:51 lr 0.001184 wd 0.0500 time 0.4673 (0.4705) data time 0.0009 (0.0025) model time 0.4664 (0.4684) loss 4.2855 (3.5513) grad_norm 1.2181 (1.3415) loss_scale 8192.0000 
(8192.0000) mem 16703MB [2024-08-04 15:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][270/625] eta 0:02:47 lr 0.001184 wd 0.0500 time 0.4767 (0.4704) data time 0.0010 (0.0024) model time 0.4757 (0.4685) loss 2.0462 (3.5526) grad_norm 1.4940 (1.3439) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][280/625] eta 0:02:42 lr 0.001184 wd 0.0500 time 0.4716 (0.4704) data time 0.0010 (0.0024) model time 0.4706 (0.4685) loss 2.8846 (3.5452) grad_norm 1.0170 (1.3468) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][290/625] eta 0:02:37 lr 0.001184 wd 0.0500 time 0.4623 (0.4704) data time 0.0010 (0.0023) model time 0.4613 (0.4685) loss 3.2387 (3.5237) grad_norm 1.3519 (1.3477) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][300/625] eta 0:02:32 lr 0.001184 wd 0.0500 time 0.4688 (0.4702) data time 0.0010 (0.0023) model time 0.4677 (0.4683) loss 3.5864 (3.5134) grad_norm 1.0592 (1.3448) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][310/625] eta 0:02:28 lr 0.001184 wd 0.0500 time 0.4669 (0.4708) data time 0.0011 (0.0023) model time 0.4658 (0.4690) loss 3.3636 (3.5138) grad_norm 1.1898 (1.3427) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][320/625] eta 0:02:23 lr 0.001184 wd 0.0500 time 0.4628 (0.4706) data time 0.0010 (0.0022) model time 0.4618 (0.4688) loss 3.3448 (3.5148) grad_norm 1.3115 (1.3435) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][330/625] eta 0:02:18 lr 0.001184 wd 0.0500 time 0.4662 (0.4705) data time 0.0008 (0.0022) model time 
0.4654 (0.4687) loss 4.0755 (3.5223) grad_norm 1.7801 (1.3493) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][340/625] eta 0:02:14 lr 0.001184 wd 0.0500 time 0.6926 (0.4711) data time 0.0011 (0.0022) model time 0.6915 (0.4694) loss 3.7818 (3.5128) grad_norm 2.2446 (1.3609) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][350/625] eta 0:02:09 lr 0.001184 wd 0.0500 time 0.4654 (0.4708) data time 0.0008 (0.0021) model time 0.4646 (0.4691) loss 3.2214 (3.5079) grad_norm 1.2889 (1.3653) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][360/625] eta 0:02:04 lr 0.001184 wd 0.0500 time 0.4692 (0.4706) data time 0.0011 (0.0021) model time 0.4681 (0.4690) loss 3.8333 (3.5078) grad_norm 1.5539 (1.3600) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][370/625] eta 0:01:59 lr 0.001184 wd 0.0500 time 0.4642 (0.4705) data time 0.0007 (0.0021) model time 0.4635 (0.4688) loss 3.2592 (3.4986) grad_norm 1.1350 (1.3572) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][380/625] eta 0:01:55 lr 0.001184 wd 0.0500 time 0.4641 (0.4704) data time 0.0008 (0.0020) model time 0.4633 (0.4687) loss 3.2409 (3.4989) grad_norm 1.9222 (1.3579) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][390/625] eta 0:01:50 lr 0.001184 wd 0.0500 time 0.4652 (0.4703) data time 0.0008 (0.0020) model time 0.4643 (0.4686) loss 4.5810 (3.4989) grad_norm 1.5182 (1.3591) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][400/625] eta 0:01:45 
lr 0.001184 wd 0.0500 time 0.4631 (0.4702) data time 0.0008 (0.0020) model time 0.4623 (0.4685) loss 3.3180 (3.5014) grad_norm 0.9703 (1.3558) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][410/625] eta 0:01:41 lr 0.001184 wd 0.0500 time 0.4662 (0.4701) data time 0.0012 (0.0020) model time 0.4650 (0.4684) loss 3.3034 (3.5091) grad_norm 1.3801 (1.3563) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][420/625] eta 0:01:36 lr 0.001184 wd 0.0500 time 0.4659 (0.4701) data time 0.0012 (0.0020) model time 0.4647 (0.4684) loss 3.8816 (3.5113) grad_norm 1.3452 (1.3554) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][430/625] eta 0:01:31 lr 0.001184 wd 0.0500 time 0.4703 (0.4701) data time 0.0008 (0.0019) model time 0.4696 (0.4684) loss 4.1692 (3.5137) grad_norm 1.0470 (1.3551) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][440/625] eta 0:01:26 lr 0.001184 wd 0.0500 time 0.4680 (0.4700) data time 0.0010 (0.0019) model time 0.4670 (0.4683) loss 3.7351 (3.5161) grad_norm 1.6462 (1.3540) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][450/625] eta 0:01:22 lr 0.001184 wd 0.0500 time 0.4685 (0.4704) data time 0.0009 (0.0019) model time 0.4677 (0.4688) loss 2.2495 (3.5052) grad_norm 1.5123 (1.3534) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][460/625] eta 0:01:17 lr 0.001184 wd 0.0500 time 0.4707 (0.4703) data time 0.0009 (0.0019) model time 0.4698 (0.4687) loss 3.1861 (3.4960) grad_norm 1.1020 (1.3521) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:46 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][470/625] eta 0:01:12 lr 0.001184 wd 0.0500 time 0.4756 (0.4703) data time 0.0008 (0.0019) model time 0.4748 (0.4687) loss 3.8684 (3.4925) grad_norm 1.0987 (1.3503) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][480/625] eta 0:01:08 lr 0.001184 wd 0.0500 time 0.4664 (0.4702) data time 0.0008 (0.0018) model time 0.4656 (0.4687) loss 4.3987 (3.4926) grad_norm 1.2804 (1.3473) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][490/625] eta 0:01:03 lr 0.001184 wd 0.0500 time 0.4690 (0.4702) data time 0.0008 (0.0018) model time 0.4682 (0.4686) loss 3.9409 (3.4933) grad_norm 1.5830 (1.3490) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][500/625] eta 0:00:58 lr 0.001184 wd 0.0500 time 0.4657 (0.4701) data time 0.0007 (0.0018) model time 0.4650 (0.4686) loss 4.0720 (3.4957) grad_norm 1.2497 (1.3545) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][510/625] eta 0:00:54 lr 0.001184 wd 0.0500 time 0.4690 (0.4700) data time 0.0007 (0.0018) model time 0.4683 (0.4685) loss 3.5083 (3.5016) grad_norm 1.0049 (1.3528) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][520/625] eta 0:00:49 lr 0.001184 wd 0.0500 time 0.4641 (0.4700) data time 0.0009 (0.0018) model time 0.4632 (0.4684) loss 3.7518 (3.4989) grad_norm 1.4464 (1.3502) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][530/625] eta 0:00:44 lr 0.001184 wd 0.0500 time 0.4638 (0.4699) data time 0.0008 (0.0018) model time 0.4630 (0.4683) loss 4.1740 (3.4997) grad_norm 
1.5008 (1.3498) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][540/625] eta 0:00:39 lr 0.001184 wd 0.0500 time 0.4671 (0.4698) data time 0.0009 (0.0017) model time 0.4662 (0.4682) loss 3.9595 (3.5000) grad_norm 1.0347 (1.3472) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][550/625] eta 0:00:35 lr 0.001184 wd 0.0500 time 0.4708 (0.4697) data time 0.0010 (0.0017) model time 0.4699 (0.4682) loss 2.8378 (3.4989) grad_norm 1.3475 (1.3494) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][560/625] eta 0:00:30 lr 0.001184 wd 0.0500 time 0.4673 (0.4697) data time 0.0009 (0.0017) model time 0.4664 (0.4682) loss 3.0066 (3.4924) grad_norm 1.5028 (1.3492) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][570/625] eta 0:00:25 lr 0.001184 wd 0.0500 time 0.4650 (0.4697) data time 0.0010 (0.0017) model time 0.4640 (0.4681) loss 3.9586 (3.4988) grad_norm 1.2380 (1.3486) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][580/625] eta 0:00:21 lr 0.001184 wd 0.0500 time 0.4673 (0.4696) data time 0.0010 (0.0017) model time 0.4663 (0.4681) loss 4.1135 (3.5023) grad_norm 2.2213 (1.3543) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][590/625] eta 0:00:16 lr 0.001184 wd 0.0500 time 0.4669 (0.4695) data time 0.0008 (0.0017) model time 0.4662 (0.4680) loss 4.2814 (3.5002) grad_norm 0.9395 (1.3513) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][600/625] eta 0:00:11 lr 0.001184 wd 0.0500 time 0.4641 (0.4695) data 
time 0.0011 (0.0017) model time 0.4630 (0.4680) loss 3.5820 (3.5030) grad_norm 1.1969 (1.3517) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][610/625] eta 0:00:07 lr 0.001184 wd 0.0500 time 0.4627 (0.4694) data time 0.0008 (0.0017) model time 0.4620 (0.4679) loss 3.6151 (3.5036) grad_norm 1.0143 (1.3533) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [40/300][620/625] eta 0:00:02 lr 0.001184 wd 0.0500 time 0.4633 (0.4694) data time 0.0007 (0.0017) model time 0.4625 (0.4678) loss 3.1580 (3.5025) grad_norm 1.1822 (1.3540) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:23:57 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 40 training takes 0:04:53 [2024-08-04 15:23:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:24:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.6729 (0.6729) Acc@1 84.180 (84.180) Acc@5 97.461 (97.461) Mem 16703MB
[2024-08-04 15:24:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.163) Loss 1.1113 (0.8417) Acc@1 73.877 (80.651) Acc@5 92.334 (96.001) Mem 16703MB
[2024-08-04 15:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.2871 (1.0139) Acc@1 69.189 (76.700) Acc@5 91.113 (93.750) Mem 16703MB
[2024-08-04 15:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.470 Acc@5 93.720
[2024-08-04 15:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.5%
[2024-08-04 15:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.47%
[2024-08-04 15:24:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:24:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:24:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5728 (0.5728) Acc@1 85.596 (85.596) Acc@5 97.607 (97.607) Mem 16703MB
[2024-08-04 15:24:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 1.0342 (0.7300) Acc@1 73.633 (81.161) Acc@5 92.822 (96.040) Mem 16703MB
[2024-08-04 15:24:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.2051 (0.9141) Acc@1 69.824 (76.886) Acc@5 90.186 (93.743) Mem 16703MB
[2024-08-04 15:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.715 Acc@5 93.722
[2024-08-04 15:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 76.7%
[2024-08-04 15:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 76.72%
[2024-08-04 15:24:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:24:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][0/625] eta 0:08:58 lr 0.001184 wd 0.0500 time 0.8621 (0.8621) data time 0.4481 (0.4481) model time 0.0000 (0.0000) loss 3.6951 (3.6951) grad_norm 1.1001 (1.1001) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][10/625] eta 0:05:09 lr 0.001184 wd 0.0500 time 0.4685 (0.5039) data time 0.0010 (0.0417) model time 0.0000 (0.0000) loss 3.6989 (3.6908) grad_norm 1.0906 (1.3360) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][20/625] eta 0:04:54 lr 0.001184 wd 0.0500 time 0.4645 (0.4867) data time 0.0008 (0.0223) model time 0.0000 (0.0000) loss 2.8638 (3.5238) grad_norm 0.9183 (1.2714) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][30/625] eta 0:04:45 lr 0.001184 wd 0.0500 time 0.4688 (0.4805) data time 0.0008 (0.0154) model time 0.0000 (0.0000) loss 2.4553 (3.4384) grad_norm 1.0058 (1.2805) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][40/625] eta 0:04:39 lr 0.001183 wd 0.0500 time 0.4688 (0.4772) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 3.6842 (3.4183) grad_norm 1.7975 (1.2989) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][50/625] eta 0:04:33 lr 0.001183 wd 0.0500 time 0.4675 (0.4752) data time 0.0010 (0.0098) model time 0.0000 (0.0000) loss 3.4804 (3.3804) grad_norm 1.3067 (1.3103) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][60/625] eta 0:04:27 lr 0.001183 wd 0.0500 time 0.4662 (0.4741) data time 0.0010 (0.0083) model time 0.4652 (0.4674) loss 3.4602 (3.4526) 
grad_norm 1.1707 (1.2962) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][70/625] eta 0:04:24 lr 0.001183 wd 0.0500 time 0.4673 (0.4765) data time 0.0010 (0.0073) model time 0.4663 (0.4785) loss 2.7437 (3.4338) grad_norm 1.5345 (1.3290) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][80/625] eta 0:04:19 lr 0.001183 wd 0.0500 time 0.4672 (0.4755) data time 0.0008 (0.0065) model time 0.4664 (0.4748) loss 3.9458 (3.4297) grad_norm 1.1738 (1.3075) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][90/625] eta 0:04:13 lr 0.001183 wd 0.0500 time 0.4703 (0.4747) data time 0.0010 (0.0059) model time 0.4693 (0.4730) loss 3.7255 (3.4409) grad_norm 1.2885 (1.3004) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][100/625] eta 0:04:08 lr 0.001183 wd 0.0500 time 0.4638 (0.4738) data time 0.0008 (0.0054) model time 0.4630 (0.4712) loss 2.9429 (3.4107) grad_norm 1.4566 (1.2934) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][110/625] eta 0:04:03 lr 0.001183 wd 0.0500 time 0.4639 (0.4730) data time 0.0007 (0.0051) model time 0.4632 (0.4700) loss 3.2755 (3.4163) grad_norm 1.4576 (1.2966) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][120/625] eta 0:03:58 lr 0.001183 wd 0.0500 time 0.4641 (0.4723) data time 0.0008 (0.0047) model time 0.4633 (0.4691) loss 3.6857 (3.4090) grad_norm 2.1642 (1.3248) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][130/625] eta 0:03:53 lr 0.001183 wd 0.0500 time 0.4653 
(0.4717) data time 0.0011 (0.0044) model time 0.4642 (0.4684) loss 3.4919 (3.4080) grad_norm 0.9931 (1.3288) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][140/625] eta 0:03:48 lr 0.001183 wd 0.0500 time 0.4669 (0.4713) data time 0.0010 (0.0042) model time 0.4659 (0.4680) loss 3.3161 (3.3977) grad_norm 1.1812 (1.3209) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][150/625] eta 0:03:43 lr 0.001183 wd 0.0500 time 0.4665 (0.4711) data time 0.0009 (0.0040) model time 0.4656 (0.4679) loss 3.1057 (3.3825) grad_norm 1.2249 (1.3135) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][160/625] eta 0:03:38 lr 0.001183 wd 0.0500 time 0.4669 (0.4709) data time 0.0012 (0.0038) model time 0.4657 (0.4678) loss 2.4921 (3.3810) grad_norm 2.1043 (1.3266) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][170/625] eta 0:03:34 lr 0.001183 wd 0.0500 time 0.4644 (0.4706) data time 0.0009 (0.0037) model time 0.4635 (0.4675) loss 4.0484 (3.3830) grad_norm 1.2180 (1.3384) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][180/625] eta 0:03:29 lr 0.001183 wd 0.0500 time 0.4642 (0.4703) data time 0.0010 (0.0035) model time 0.4632 (0.4673) loss 3.4154 (3.3802) grad_norm 1.7293 (1.3392) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][190/625] eta 0:03:24 lr 0.001183 wd 0.0500 time 0.4648 (0.4701) data time 0.0008 (0.0034) model time 0.4640 (0.4671) loss 3.7832 (3.4000) grad_norm 1.4109 (1.3365) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:49 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [41/300][200/625] eta 0:03:19 lr 0.001183 wd 0.0500 time 0.4660 (0.4699) data time 0.0010 (0.0033) model time 0.4650 (0.4670) loss 3.2172 (3.3930) grad_norm 1.5880 (1.3349) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][210/625] eta 0:03:15 lr 0.001183 wd 0.0500 time 0.4851 (0.4700) data time 0.0009 (0.0032) model time 0.4842 (0.4673) loss 3.9246 (3.4056) grad_norm 1.1435 (1.3369) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:25:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][220/625] eta 0:03:10 lr 0.001183 wd 0.0500 time 0.4668 (0.4699) data time 0.0010 (0.0031) model time 0.4658 (0.4673) loss 3.6042 (3.4180) grad_norm 1.4014 (1.3431) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][230/625] eta 0:03:05 lr 0.001183 wd 0.0500 time 0.4633 (0.4699) data time 0.0009 (0.0030) model time 0.4624 (0.4673) loss 3.3117 (3.4165) grad_norm 1.0981 (1.3473) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][240/625] eta 0:03:00 lr 0.001183 wd 0.0500 time 0.4703 (0.4698) data time 0.0011 (0.0029) model time 0.4692 (0.4672) loss 3.3156 (3.4161) grad_norm 1.3670 (1.3460) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][250/625] eta 0:02:56 lr 0.001183 wd 0.0500 time 0.4691 (0.4702) data time 0.0010 (0.0028) model time 0.4680 (0.4679) loss 3.7042 (3.4255) grad_norm 1.6223 (1.3556) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][260/625] eta 0:02:51 lr 0.001183 wd 0.0500 time 0.4595 (0.4701) data time 0.0008 (0.0028) model time 0.4587 (0.4678) loss 2.5804 (3.4254) grad_norm 1.2270 (1.3523) loss_scale 8192.0000 
(8192.0000) mem 16703MB [2024-08-04 15:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][270/625] eta 0:02:46 lr 0.001183 wd 0.0500 time 0.4649 (0.4699) data time 0.0013 (0.0027) model time 0.4636 (0.4676) loss 3.5922 (3.4324) grad_norm 1.5885 (1.3556) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][280/625] eta 0:02:42 lr 0.001183 wd 0.0500 time 0.4704 (0.4699) data time 0.0008 (0.0026) model time 0.4696 (0.4676) loss 3.4075 (3.4268) grad_norm 1.1576 (1.3551) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][290/625] eta 0:02:37 lr 0.001183 wd 0.0500 time 0.4709 (0.4699) data time 0.0010 (0.0026) model time 0.4699 (0.4677) loss 3.7969 (3.4342) grad_norm 1.4462 (1.3510) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][300/625] eta 0:02:32 lr 0.001183 wd 0.0500 time 0.4681 (0.4699) data time 0.0010 (0.0025) model time 0.4671 (0.4677) loss 3.5000 (3.4320) grad_norm 0.9303 (1.3499) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][310/625] eta 0:02:28 lr 0.001183 wd 0.0500 time 0.4641 (0.4698) data time 0.0008 (0.0025) model time 0.4633 (0.4677) loss 3.8399 (3.4391) grad_norm 1.9887 (1.3523) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][320/625] eta 0:02:23 lr 0.001183 wd 0.0500 time 0.4675 (0.4698) data time 0.0008 (0.0025) model time 0.4668 (0.4677) loss 2.9479 (3.4484) grad_norm 1.6650 (1.3553) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][330/625] eta 0:02:18 lr 0.001183 wd 0.0500 time 0.4605 (0.4697) data time 0.0008 (0.0024) model time 
0.4597 (0.4676) loss 3.8864 (3.4475) grad_norm 1.1859 (1.3518) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][340/625] eta 0:02:13 lr 0.001183 wd 0.0500 time 0.4646 (0.4696) data time 0.0010 (0.0024) model time 0.4636 (0.4675) loss 3.8233 (3.4527) grad_norm 1.2791 (1.3529) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:26:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][350/625] eta 0:02:09 lr 0.001183 wd 0.0500 time 0.4770 (0.4695) data time 0.0008 (0.0023) model time 0.4762 (0.4675) loss 3.9309 (3.4597) grad_norm 1.2812 (1.3521) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][360/625] eta 0:02:04 lr 0.001183 wd 0.0500 time 0.4658 (0.4694) data time 0.0010 (0.0023) model time 0.4648 (0.4674) loss 2.7700 (3.4574) grad_norm 1.1800 (1.3462) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][370/625] eta 0:01:59 lr 0.001183 wd 0.0500 time 0.4659 (0.4694) data time 0.0008 (0.0023) model time 0.4651 (0.4674) loss 4.2349 (3.4655) grad_norm 1.3150 (1.3469) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][380/625] eta 0:01:54 lr 0.001183 wd 0.0500 time 0.4762 (0.4693) data time 0.0010 (0.0022) model time 0.4752 (0.4674) loss 4.0400 (3.4580) grad_norm 1.6078 (1.3532) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][390/625] eta 0:01:50 lr 0.001183 wd 0.0500 time 0.4679 (0.4693) data time 0.0008 (0.0022) model time 0.4671 (0.4673) loss 2.7870 (3.4563) grad_norm 1.0541 (1.3561) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][400/625] eta 0:01:45 
lr 0.001183 wd 0.0500 time 0.4608 (0.4693) data time 0.0008 (0.0022) model time 0.4599 (0.4673) loss 2.7956 (3.4557) grad_norm 1.4858 (1.3526) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][410/625] eta 0:01:40 lr 0.001183 wd 0.0500 time 0.4690 (0.4697) data time 0.0009 (0.0022) model time 0.4682 (0.4679) loss 3.1639 (3.4526) grad_norm 1.2238 (1.3552) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][420/625] eta 0:01:36 lr 0.001183 wd 0.0500 time 0.4655 (0.4696) data time 0.0010 (0.0021) model time 0.4645 (0.4678) loss 3.4325 (3.4487) grad_norm 1.2753 (1.3556) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][430/625] eta 0:01:31 lr 0.001183 wd 0.0500 time 0.4718 (0.4696) data time 0.0010 (0.0021) model time 0.4709 (0.4678) loss 3.4220 (3.4453) grad_norm 1.2588 (1.3551) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][440/625] eta 0:01:26 lr 0.001182 wd 0.0500 time 0.4665 (0.4698) data time 0.0009 (0.0021) model time 0.4656 (0.4680) loss 4.0173 (3.4436) grad_norm 1.2769 (1.3577) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][450/625] eta 0:01:22 lr 0.001182 wd 0.0500 time 0.4679 (0.4697) data time 0.0010 (0.0021) model time 0.4669 (0.4679) loss 3.2599 (3.4465) grad_norm 1.0874 (1.3580) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][460/625] eta 0:01:17 lr 0.001182 wd 0.0500 time 0.4659 (0.4696) data time 0.0008 (0.0020) model time 0.4651 (0.4678) loss 3.9739 (3.4493) grad_norm 1.1571 (1.3590) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:27:55 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][470/625] eta 0:01:12 lr 0.001182 wd 0.0500 time 0.4612 (0.4695) data time 0.0008 (0.0020) model time 0.4604 (0.4677) loss 4.5360 (3.4478) grad_norm 1.3614 (1.3534) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][480/625] eta 0:01:08 lr 0.001182 wd 0.0500 time 0.4658 (0.4694) data time 0.0010 (0.0020) model time 0.4648 (0.4676) loss 3.0435 (3.4471) grad_norm 1.1677 (1.3512) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][490/625] eta 0:01:03 lr 0.001182 wd 0.0500 time 0.4610 (0.4693) data time 0.0008 (0.0020) model time 0.4602 (0.4675) loss 2.7083 (3.4437) grad_norm 0.9784 (1.3483) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][500/625] eta 0:00:58 lr 0.001182 wd 0.0500 time 0.4638 (0.4692) data time 0.0010 (0.0020) model time 0.4629 (0.4674) loss 2.9422 (3.4478) grad_norm 1.3495 (1.3491) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][510/625] eta 0:00:53 lr 0.001182 wd 0.0500 time 0.4665 (0.4691) data time 0.0010 (0.0019) model time 0.4655 (0.4673) loss 3.6198 (3.4507) grad_norm 1.1789 (1.3528) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][520/625] eta 0:00:49 lr 0.001182 wd 0.0500 time 0.4676 (0.4690) data time 0.0010 (0.0019) model time 0.4666 (0.4673) loss 3.3266 (3.4532) grad_norm 1.5349 (1.3523) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][530/625] eta 0:00:44 lr 0.001182 wd 0.0500 time 0.4623 (0.4690) data time 0.0008 (0.0019) model time 0.4615 (0.4672) loss 2.1599 (3.4549) grad_norm 
1.5866 (1.3537) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][540/625] eta 0:00:39 lr 0.001182 wd 0.0500 time 0.4620 (0.4689) data time 0.0008 (0.0019) model time 0.4612 (0.4671) loss 3.7480 (3.4527) grad_norm 0.9078 (1.3537) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][550/625] eta 0:00:35 lr 0.001182 wd 0.0500 time 0.4639 (0.4688) data time 0.0011 (0.0019) model time 0.4628 (0.4671) loss 3.5753 (3.4523) grad_norm 1.5626 (1.3519) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][560/625] eta 0:00:30 lr 0.001182 wd 0.0500 time 0.4732 (0.4687) data time 0.0008 (0.0019) model time 0.4724 (0.4670) loss 4.1073 (3.4549) grad_norm 1.2843 (1.3490) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][570/625] eta 0:00:25 lr 0.001182 wd 0.0500 time 0.4652 (0.4687) data time 0.0009 (0.0018) model time 0.4642 (0.4669) loss 3.3050 (3.4522) grad_norm 1.1936 (1.3498) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][580/625] eta 0:00:21 lr 0.001182 wd 0.0500 time 0.4656 (0.4686) data time 0.0010 (0.0018) model time 0.4646 (0.4669) loss 3.9197 (3.4508) grad_norm 1.1339 (1.3502) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][590/625] eta 0:00:16 lr 0.001182 wd 0.0500 time 0.4707 (0.4686) data time 0.0008 (0.0018) model time 0.4699 (0.4669) loss 2.2004 (3.4549) grad_norm 1.6595 (1.3544) loss_scale 16384.0000 (8302.8900) mem 16703MB [2024-08-04 15:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][600/625] eta 0:00:11 lr 0.001182 wd 0.0500 time 0.4718 (0.4689) 
data time 0.0010 (0.0018) model time 0.4707 (0.4672) loss 3.7536 (3.4529) grad_norm 1.3131 (1.3533) loss_scale 16384.0000 (8437.3511) mem 16703MB [2024-08-04 15:29:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][610/625] eta 0:00:07 lr 0.001182 wd 0.0500 time 0.4673 (0.4688) data time 0.0007 (0.0018) model time 0.4666 (0.4672) loss 3.5751 (3.4528) grad_norm 1.0304 (1.3519) loss_scale 16384.0000 (8567.4108) mem 16703MB [2024-08-04 15:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [41/300][620/625] eta 0:00:02 lr 0.001182 wd 0.0500 time 0.4615 (0.4688) data time 0.0005 (0.0018) model time 0.4610 (0.4671) loss 3.9855 (3.4536) grad_norm 1.2604 (1.3489) loss_scale 16384.0000 (8693.2818) mem 16703MB [2024-08-04 15:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 41 training takes 0:04:52 [2024-08-04 15:29:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:29:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.6445 (0.6445) Acc@1 84.277 (84.277) Acc@5 97.168 (97.168) Mem 16703MB
[2024-08-04 15:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 1.1367 (0.8129) Acc@1 73.486 (80.899) Acc@5 92.383 (95.907) Mem 16703MB
[2024-08-04 15:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 1.2617 (0.9980) Acc@1 69.775 (76.742) Acc@5 90.283 (93.669) Mem 16703MB
[2024-08-04 15:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.565 Acc@5 93.652
[2024-08-04 15:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.6%
[2024-08-04 15:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.57%
[2024-08-04 15:29:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:29:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.506 (0.506) Loss 0.5688 (0.5688) Acc@1 85.791 (85.791) Acc@5 97.803 (97.803) Mem 16703MB
[2024-08-04 15:29:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.159) Loss 1.0225 (0.7235) Acc@1 73.828 (81.396) Acc@5 92.969 (96.120) Mem 16703MB
[2024-08-04 15:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.139) Loss 1.1904 (0.9042) Acc@1 70.264 (77.209) Acc@5 90.479 (93.890) Mem 16703MB
[2024-08-04 15:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.021 Acc@5 93.880
[2024-08-04 15:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.0%
[2024-08-04 15:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.02%
[2024-08-04 15:29:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:29:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:29:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][0/625] eta 0:08:21 lr 0.001182 wd 0.0500 time 0.8025 (0.8025) data time 0.3870 (0.3870) model time 0.0000 (0.0000) loss 3.5966 (3.5966) grad_norm 1.4396 (1.4396) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][10/625] eta 0:05:05 lr 0.001182 wd 0.0500 time 0.4769 (0.4975) data time 0.0008 (0.0362) model time 0.0000 (0.0000) loss 3.9952 (3.8322) grad_norm 1.3787 (1.3596) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][20/625] eta 0:04:52 lr 0.001182 wd 0.0500 time 0.4661 (0.4834) data time 0.0009 (0.0195) model time 0.0000 (0.0000) loss 2.5720 (3.4342) grad_norm 1.1093 (1.6380) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][30/625] eta 0:04:44 lr 0.001182 wd 0.0500 time 0.4745 (0.4789) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 3.2054 (3.4495) grad_norm 1.1839 (1.5902) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][40/625] eta 0:04:38 lr 0.001182 wd 0.0500 time 0.4676 (0.4761) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 4.2930 (3.4875) grad_norm 1.4335 (1.5450) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][50/625] eta 0:04:32 lr 0.001182 wd 0.0500 time 0.4667 (0.4745) data time 0.0009 (0.0087) model time 0.0000 (0.0000) loss 3.0767 (3.4773) grad_norm 1.5220 (1.4924) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][60/625] eta 0:04:27 lr 0.001182 wd 0.0500 time 0.4687 (0.4733) data time 0.0008 (0.0074) model time 0.4679 (0.4667) loss 
2.5707 (3.4591) grad_norm 2.4106 (1.5095) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][70/625] eta 0:04:22 lr 0.001182 wd 0.0500 time 0.4684 (0.4725) data time 0.0010 (0.0065) model time 0.4673 (0.4664) loss 3.2548 (3.4708) grad_norm 1.5319 (1.4958) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][80/625] eta 0:04:17 lr 0.001182 wd 0.0500 time 0.4747 (0.4720) data time 0.0008 (0.0058) model time 0.4738 (0.4669) loss 4.2149 (3.5014) grad_norm 1.2147 (1.4724) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][90/625] eta 0:04:12 lr 0.001182 wd 0.0500 time 0.4636 (0.4715) data time 0.0010 (0.0053) model time 0.4626 (0.4667) loss 3.4357 (3.5131) grad_norm 1.6166 (1.4727) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][100/625] eta 0:04:07 lr 0.001182 wd 0.0500 time 0.4739 (0.4714) data time 0.0010 (0.0049) model time 0.4729 (0.4672) loss 3.2393 (3.4987) grad_norm 1.4264 (1.4527) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][110/625] eta 0:04:02 lr 0.001182 wd 0.0500 time 0.4668 (0.4713) data time 0.0010 (0.0046) model time 0.4658 (0.4675) loss 4.1761 (3.5236) grad_norm 1.2309 (1.4370) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][120/625] eta 0:03:57 lr 0.001182 wd 0.0500 time 0.4710 (0.4711) data time 0.0010 (0.0043) model time 0.4700 (0.4675) loss 3.7457 (3.5442) grad_norm 1.1780 (1.4332) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][130/625] eta 0:03:53 lr 
0.001182 wd 0.0500 time 0.4645 (0.4708) data time 0.0010 (0.0040) model time 0.4635 (0.4673) loss 3.7958 (3.5516) grad_norm 1.1588 (1.4238) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][140/625] eta 0:03:48 lr 0.001182 wd 0.0500 time 0.4637 (0.4705) data time 0.0010 (0.0038) model time 0.4627 (0.4671) loss 3.6095 (3.5397) grad_norm 0.9901 (1.4168) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][150/625] eta 0:03:43 lr 0.001182 wd 0.0500 time 0.4691 (0.4703) data time 0.0010 (0.0036) model time 0.4680 (0.4671) loss 3.3686 (3.5305) grad_norm 1.1047 (1.4082) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][160/625] eta 0:03:39 lr 0.001182 wd 0.0500 time 0.6818 (0.4716) data time 0.0011 (0.0035) model time 0.6807 (0.4692) loss 2.9363 (3.5373) grad_norm 2.9179 (1.4125) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][170/625] eta 0:03:34 lr 0.001182 wd 0.0500 time 0.4707 (0.4716) data time 0.0009 (0.0033) model time 0.4697 (0.4694) loss 2.7238 (3.5243) grad_norm 1.1427 (1.4116) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][180/625] eta 0:03:29 lr 0.001182 wd 0.0500 time 0.4737 (0.4716) data time 0.0011 (0.0032) model time 0.4727 (0.4695) loss 3.5164 (3.5240) grad_norm 0.9920 (1.4079) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:30:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][190/625] eta 0:03:25 lr 0.001181 wd 0.0500 time 0.4686 (0.4715) data time 0.0008 (0.0031) model time 0.4678 (0.4693) loss 4.5384 (3.5317) grad_norm 1.9022 (1.4128) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 
15:30:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][200/625] eta 0:03:20 lr 0.001181 wd 0.0500 time 0.4662 (0.4713) data time 0.0008 (0.0030) model time 0.4654 (0.4691) loss 3.0189 (3.5347) grad_norm 1.5312 (1.4133) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][210/625] eta 0:03:15 lr 0.001181 wd 0.0500 time 0.4707 (0.4712) data time 0.0007 (0.0029) model time 0.4700 (0.4690) loss 4.1914 (3.5267) grad_norm 1.1768 (1.4030) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][220/625] eta 0:03:10 lr 0.001181 wd 0.0500 time 0.4716 (0.4711) data time 0.0009 (0.0028) model time 0.4707 (0.4690) loss 3.4306 (3.5280) grad_norm 1.1439 (1.4003) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][230/625] eta 0:03:06 lr 0.001181 wd 0.0500 time 0.4670 (0.4710) data time 0.0010 (0.0027) model time 0.4660 (0.4690) loss 4.0090 (3.5284) grad_norm 1.1469 (1.3933) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][240/625] eta 0:03:01 lr 0.001181 wd 0.0500 time 0.4721 (0.4710) data time 0.0007 (0.0027) model time 0.4714 (0.4690) loss 4.3427 (3.5292) grad_norm 1.8912 (1.4003) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][250/625] eta 0:02:56 lr 0.001181 wd 0.0500 time 0.4667 (0.4709) data time 0.0010 (0.0026) model time 0.4658 (0.4689) loss 2.9352 (3.5292) grad_norm 1.0907 (1.3996) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][260/625] eta 0:02:51 lr 0.001181 wd 0.0500 time 0.4616 (0.4708) data time 0.0008 (0.0026) model time 0.4607 (0.4689) loss 4.3278 
(3.5427) grad_norm 1.1721 (1.3947) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][270/625] eta 0:02:47 lr 0.001181 wd 0.0500 time 0.4636 (0.4706) data time 0.0008 (0.0025) model time 0.4629 (0.4687) loss 4.2517 (3.5513) grad_norm 1.1954 (1.3915) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][280/625] eta 0:02:42 lr 0.001181 wd 0.0500 time 0.4638 (0.4705) data time 0.0008 (0.0025) model time 0.4631 (0.4685) loss 3.5246 (3.5467) grad_norm 1.0027 (1.3886) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][290/625] eta 0:02:37 lr 0.001181 wd 0.0500 time 0.4671 (0.4704) data time 0.0010 (0.0024) model time 0.4660 (0.4685) loss 2.6560 (3.5418) grad_norm 1.3469 (1.3833) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][300/625] eta 0:02:32 lr 0.001181 wd 0.0500 time 0.4692 (0.4703) data time 0.0011 (0.0024) model time 0.4682 (0.4684) loss 3.8208 (3.5386) grad_norm 1.0795 (1.3739) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][310/625] eta 0:02:28 lr 0.001181 wd 0.0500 time 0.4643 (0.4702) data time 0.0010 (0.0023) model time 0.4633 (0.4683) loss 3.3622 (3.5291) grad_norm 1.5039 (1.3713) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][320/625] eta 0:02:23 lr 0.001181 wd 0.0500 time 0.4638 (0.4701) data time 0.0008 (0.0023) model time 0.4631 (0.4682) loss 4.2209 (3.5294) grad_norm 2.2898 (1.3703) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][330/625] eta 0:02:18 lr 0.001181 wd 
0.0500 time 0.4786 (0.4701) data time 0.0007 (0.0022) model time 0.4778 (0.4682) loss 3.2948 (3.5295) grad_norm 1.5468 (1.3758) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][340/625] eta 0:02:13 lr 0.001181 wd 0.0500 time 0.4629 (0.4699) data time 0.0010 (0.0022) model time 0.4620 (0.4681) loss 2.9892 (3.5243) grad_norm 1.5490 (1.3747) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][350/625] eta 0:02:09 lr 0.001181 wd 0.0500 time 0.4662 (0.4698) data time 0.0011 (0.0022) model time 0.4651 (0.4679) loss 2.8577 (3.5158) grad_norm 1.3281 (1.3737) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][360/625] eta 0:02:04 lr 0.001181 wd 0.0500 time 0.4675 (0.4697) data time 0.0011 (0.0022) model time 0.4665 (0.4678) loss 3.8020 (3.5181) grad_norm 0.9506 (1.3759) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][370/625] eta 0:01:59 lr 0.001181 wd 0.0500 time 0.4711 (0.4696) data time 0.0010 (0.0021) model time 0.4700 (0.4678) loss 3.5905 (3.5183) grad_norm 1.4274 (1.3713) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][380/625] eta 0:01:55 lr 0.001181 wd 0.0500 time 0.4662 (0.4696) data time 0.0011 (0.0021) model time 0.4651 (0.4678) loss 4.0076 (3.5186) grad_norm 0.9951 (1.3676) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][390/625] eta 0:01:50 lr 0.001181 wd 0.0500 time 0.4754 (0.4706) data time 0.0010 (0.0021) model time 0.4744 (0.4689) loss 3.5217 (3.5179) grad_norm 1.4069 (1.3712) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:32:29 
vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 15:32:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:32:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 15:35:42 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 15:35:44 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 15:35:46 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 15:36:05 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 15:36:05 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-04 15:36:08 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 15:36:10 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 15:36:10 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 42) [2024-08-04 15:36:10 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 15:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][400/625] eta 0:36:16 lr 0.001181 wd 0.0500 time 0.9786 (9.6730) data time 0.0008 (0.3404) model time 0.9778 (9.3326) loss 4.3101 (4.3295) grad_norm 1.8840 (1.5055) loss_scale 16384.0000 (16384.0000) mem 16705MB [2024-08-04 15:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][410/625] eta 0:07:05 lr 0.001181 wd 0.0500 time 0.4434 (1.9802) data time 0.0006 (0.0574) model time 0.4427 (1.9228) loss 2.2521 (3.7352) grad_norm 1.2172 (1.3102) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][420/625] eta 0:04:22 lr 0.001181 wd 0.0500 time 0.4413 (1.2815) data time 0.0008 (0.0317) model time 0.4404 (1.2498) loss 3.7638 (3.7043) grad_norm 1.4704 (1.2948) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][430/625] eta 0:03:20 lr 0.001181 wd 0.0500 time 0.4428 (1.0274) data time 0.0008 (0.0221) model time 0.4420 (1.0053) loss 3.7360 (3.7328) grad_norm 1.3274 (1.3102) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][440/625] eta 0:02:45 lr 0.001181 wd 0.0500 time 0.4666 (0.8925) data time 0.0009 (0.0170) model time 0.4657 (0.8755) loss 3.8642 (3.6893) grad_norm 1.4199 (1.2911) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[42/300][450/625] eta 0:02:21 lr 0.001181 wd 0.0500 time 0.4451 (0.8060) data time 0.0007 (0.0139) model time 0.4444 (0.7921) loss 3.8642 (3.6700) grad_norm 0.9795 (1.3186) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][460/625] eta 0:02:03 lr 0.001181 wd 0.0500 time 0.4458 (0.7476) data time 0.0006 (0.0118) model time 0.4452 (0.7358) loss 4.4098 (3.6545) grad_norm 1.4067 (1.3759) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][470/625] eta 0:01:49 lr 0.001181 wd 0.0500 time 0.4452 (0.7053) data time 0.0009 (0.0103) model time 0.4443 (0.6950) loss 3.2280 (3.6032) grad_norm 1.6007 (1.3816) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][480/625] eta 0:01:37 lr 0.001181 wd 0.0500 time 0.4473 (0.6733) data time 0.0009 (0.0092) model time 0.4464 (0.6641) loss 3.6608 (3.5839) grad_norm 1.4915 (1.4126) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][490/625] eta 0:01:27 lr 0.001181 wd 0.0500 time 0.4459 (0.6483) data time 0.0007 (0.0082) model time 0.4452 (0.6400) loss 2.4325 (3.5552) grad_norm 1.3780 (1.3935) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][500/625] eta 0:01:18 lr 0.001181 wd 0.0500 time 0.4443 (0.6280) data time 0.0006 (0.0075) model time 0.4437 (0.6205) loss 4.0299 (3.5741) grad_norm 1.6963 (1.3976) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][510/625] eta 0:01:10 lr 0.001181 wd 0.0500 time 0.4408 (0.6114) data time 0.0009 (0.0069) model time 0.4399 (0.6045) loss 3.9294 (3.5602) grad_norm 2.8408 (1.4111) loss_scale 16384.0000 
(16384.0000) mem 16699MB [2024-08-04 15:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][520/625] eta 0:01:02 lr 0.001181 wd 0.0500 time 0.4396 (0.5975) data time 0.0006 (0.0064) model time 0.4389 (0.5911) loss 3.8074 (3.5599) grad_norm 1.9975 (1.4177) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][530/625] eta 0:00:55 lr 0.001181 wd 0.0500 time 0.4414 (0.5857) data time 0.0009 (0.0060) model time 0.4405 (0.5797) loss 3.5883 (3.5531) grad_norm 1.4031 (1.4145) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][540/625] eta 0:00:48 lr 0.001181 wd 0.0500 time 0.4423 (0.5757) data time 0.0008 (0.0056) model time 0.4415 (0.5700) loss 3.5826 (3.5388) grad_norm 1.9789 (1.4092) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][550/625] eta 0:00:42 lr 0.001181 wd 0.0500 time 0.4381 (0.5670) data time 0.0010 (0.0054) model time 0.4372 (0.5616) loss 3.8316 (3.5369) grad_norm 1.0378 (1.4087) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][560/625] eta 0:00:36 lr 0.001181 wd 0.0500 time 0.4407 (0.5594) data time 0.0010 (0.0051) model time 0.4398 (0.5543) loss 3.6025 (3.5421) grad_norm 1.1815 (1.4191) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][570/625] eta 0:00:30 lr 0.001180 wd 0.0500 time 0.4397 (0.5526) data time 0.0009 (0.0048) model time 0.4388 (0.5477) loss 2.9775 (3.5353) grad_norm 1.3111 (1.4020) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][580/625] eta 0:00:24 lr 0.001180 wd 0.0500 time 0.4440 (0.5466) data time 0.0009 (0.0046) 
model time 0.4431 (0.5420) loss 3.5996 (3.5239) grad_norm 1.6189 (1.3953) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][590/625] eta 0:00:18 lr 0.001180 wd 0.0500 time 0.4447 (0.5420) data time 0.0008 (0.0044) model time 0.4439 (0.5376) loss 4.2619 (3.5246) grad_norm 1.3676 (1.3957) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][600/625] eta 0:00:13 lr 0.001180 wd 0.0500 time 0.4452 (0.5371) data time 0.0006 (0.0042) model time 0.4446 (0.5329) loss 3.9150 (3.5158) grad_norm 1.2901 (1.3941) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:38:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][610/625] eta 0:00:07 lr 0.001180 wd 0.0500 time 0.4362 (0.5327) data time 0.0005 (0.0041) model time 0.4357 (0.5286) loss 3.5088 (3.5069) grad_norm 1.2614 (1.3952) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:38:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [42/300][620/625] eta 0:00:02 lr 0.001180 wd 0.0500 time 0.4364 (0.5284) data time 0.0004 (0.0039) model time 0.4360 (0.5244) loss 4.3642 (3.5036) grad_norm 2.1057 (1.3962) loss_scale 16384.0000 (16384.0000) mem 16699MB [2024-08-04 15:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 42 training takes 0:01:59 [2024-08-04 15:38:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:38:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.6665 (0.6665) Acc@1 86.084 (86.084) Acc@5 97.656 (97.656) Mem 16699MB
[2024-08-04 15:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 1.1836 (0.8544) Acc@1 71.387 (80.935) Acc@5 92.578 (96.143) Mem 16699MB
[2024-08-04 15:38:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 1.2988 (1.0263) Acc@1 70.361 (77.048) Acc@5 90.186 (93.880) Mem 16699MB
[2024-08-04 15:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.761 Acc@5 93.842
[2024-08-04 15:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 76.8%
[2024-08-04 15:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.76%
[2024-08-04 15:38:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:38:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:38:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5640 (0.5640) Acc@1 85.840 (85.840) Acc@5 97.754 (97.754) Mem 16699MB
[2024-08-04 15:38:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 1.0098 (0.7172) Acc@1 73.779 (81.605) Acc@5 93.066 (96.196) Mem 16699MB
[2024-08-04 15:38:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.1777 (0.8954) Acc@1 70.703 (77.465) Acc@5 90.723 (94.022) Mem 16699MB
[2024-08-04 15:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.287 Acc@5 94.018
[2024-08-04 15:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.3%
[2024-08-04 15:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.29%
[2024-08-04 15:38:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:38:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:38:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][0/625] eta 0:09:36 lr 0.001180 wd 0.0500 time 0.9227 (0.9227) data time 0.3841 (0.3841) model time 0.0000 (0.0000) loss 3.5199 (3.5199) grad_norm 0.8140 (0.8140) loss_scale 16384.0000 (16384.0000) mem 16712MB [2024-08-04 15:38:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][10/625] eta 0:04:58 lr 0.001180 wd 0.0500 time 0.4452 (0.4852) data time 0.0007 (0.0357) model time 0.0000 (0.0000) loss 3.4271 (3.4030) grad_norm 1.6771 (1.3033) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:38:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][20/625] eta 0:04:41 lr 0.001180 wd 0.0500 time 0.4450 (0.4647) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 3.5454 (3.4106) grad_norm 1.4194 (1.4884) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:38:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][30/625] eta 0:04:32 lr 0.001180 wd 0.0500 time 0.4473 (0.4575) data time 0.0006 (0.0132) model time 0.0000 (0.0000) loss 2.2964 (3.2938) grad_norm 1.6416 (1.4487) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:38:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][40/625] eta 0:04:25 lr 0.001180 wd 0.0500 time 0.4467 (0.4541) data time 0.0006 (0.0102) model time 0.0000 (0.0000) loss 2.6459 (3.3012) grad_norm 1.2122 (1.4698) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:38:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][50/625] eta 0:04:19 lr 0.001180 wd 0.0500 time 0.4377 (0.4519) data time 0.0009 (0.0083) model time 0.0000 (0.0000) loss 3.4549 (3.3657) grad_norm 1.4356 (1.4625) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:39:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][60/625] eta 0:04:16 lr 0.001180 wd 0.0500 time 0.6953 (0.4547) data time 0.0007 (0.0071) model time 0.6946 (0.4680) loss 
4.6438 (3.3710) grad_norm 1.3711 (1.4498) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:39:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][70/625] eta 0:04:12 lr 0.001180 wd 0.0500 time 0.3882 (0.4551) data time 0.0009 (0.0062) model time 0.3872 (0.4624) loss 2.3578 (3.3276) grad_norm 1.2110 (inf) loss_scale 8192.0000 (16153.2394) mem 16703MB [2024-08-04 15:39:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][80/625] eta 0:04:07 lr 0.001180 wd 0.0500 time 0.4414 (0.4535) data time 0.0006 (0.0056) model time 0.4407 (0.4552) loss 3.9695 (3.3370) grad_norm 1.2035 (inf) loss_scale 8192.0000 (15170.3704) mem 16703MB [2024-08-04 15:39:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][90/625] eta 0:04:02 lr 0.001180 wd 0.0500 time 0.4407 (0.4525) data time 0.0008 (0.0051) model time 0.4399 (0.4523) loss 3.5420 (3.3647) grad_norm 1.7227 (inf) loss_scale 8192.0000 (14403.5165) mem 16703MB [2024-08-04 15:39:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][100/625] eta 0:03:57 lr 0.001180 wd 0.0500 time 0.4483 (0.4519) data time 0.0007 (0.0046) model time 0.4476 (0.4509) loss 2.9712 (3.4004) grad_norm 1.1021 (inf) loss_scale 8192.0000 (13788.5149) mem 16703MB [2024-08-04 15:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][110/625] eta 0:03:52 lr 0.001180 wd 0.0500 time 0.4445 (0.4513) data time 0.0009 (0.0043) model time 0.4437 (0.4500) loss 2.9780 (3.3916) grad_norm 1.1829 (inf) loss_scale 8192.0000 (13284.3243) mem 16703MB [2024-08-04 15:39:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][120/625] eta 0:03:47 lr 0.001180 wd 0.0500 time 0.4405 (0.4505) data time 0.0007 (0.0040) model time 0.4398 (0.4487) loss 2.9798 (3.4126) grad_norm 1.1342 (inf) loss_scale 8192.0000 (12863.4711) mem 16703MB [2024-08-04 15:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][130/625] eta 0:03:42 lr 0.001180 wd 0.0500 time 0.4407 
(0.4499) data time 0.0008 (0.0038) model time 0.4398 (0.4478) loss 2.9625 (3.4218) grad_norm 1.1154 (inf) loss_scale 8192.0000 (12506.8702) mem 16703MB [2024-08-04 15:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][140/625] eta 0:03:37 lr 0.001180 wd 0.0500 time 0.4404 (0.4493) data time 0.0007 (0.0036) model time 0.4397 (0.4469) loss 2.7018 (3.4244) grad_norm 1.7350 (inf) loss_scale 8192.0000 (12200.8511) mem 16703MB [2024-08-04 15:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][150/625] eta 0:03:33 lr 0.001180 wd 0.0500 time 0.4420 (0.4488) data time 0.0009 (0.0034) model time 0.4412 (0.4464) loss 3.3590 (3.4176) grad_norm 1.2909 (inf) loss_scale 8192.0000 (11935.3642) mem 16703MB [2024-08-04 15:39:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][160/625] eta 0:03:28 lr 0.001180 wd 0.0500 time 0.4428 (0.4485) data time 0.0007 (0.0032) model time 0.4420 (0.4460) loss 3.5285 (3.4102) grad_norm 1.6921 (inf) loss_scale 8192.0000 (11702.8571) mem 16703MB [2024-08-04 15:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][170/625] eta 0:03:23 lr 0.001180 wd 0.0500 time 0.4432 (0.4482) data time 0.0007 (0.0031) model time 0.4425 (0.4457) loss 3.1063 (3.4168) grad_norm 1.5194 (inf) loss_scale 8192.0000 (11497.5439) mem 16703MB [2024-08-04 15:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][180/625] eta 0:03:19 lr 0.001180 wd 0.0500 time 0.4473 (0.4479) data time 0.0009 (0.0030) model time 0.4464 (0.4455) loss 3.3170 (3.4358) grad_norm 0.9634 (inf) loss_scale 8192.0000 (11314.9171) mem 16703MB [2024-08-04 15:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][190/625] eta 0:03:14 lr 0.001180 wd 0.0500 time 0.4421 (0.4477) data time 0.0006 (0.0029) model time 0.4415 (0.4453) loss 2.5528 (3.4378) grad_norm 1.1679 (inf) loss_scale 8192.0000 (11151.4136) mem 16703MB [2024-08-04 15:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [43/300][200/625] eta 0:03:10 lr 0.001180 wd 0.0500 time 0.4429 (0.4475) data time 0.0008 (0.0027) model time 0.4422 (0.4451) loss 3.9897 (3.4501) grad_norm 1.0073 (inf) loss_scale 8192.0000 (11004.1791) mem 16703MB [2024-08-04 15:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][210/625] eta 0:03:05 lr 0.001180 wd 0.0500 time 0.4382 (0.4471) data time 0.0009 (0.0027) model time 0.4373 (0.4448) loss 4.1107 (3.4652) grad_norm 1.1724 (inf) loss_scale 8192.0000 (10870.9005) mem 16703MB [2024-08-04 15:40:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][220/625] eta 0:03:01 lr 0.001180 wd 0.0500 time 0.4435 (0.4469) data time 0.0008 (0.0026) model time 0.4427 (0.4446) loss 3.1857 (3.4672) grad_norm 1.7281 (inf) loss_scale 8192.0000 (10749.6833) mem 16703MB [2024-08-04 15:40:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][230/625] eta 0:02:56 lr 0.001180 wd 0.0500 time 0.4438 (0.4467) data time 0.0008 (0.0025) model time 0.4430 (0.4444) loss 2.4162 (3.4581) grad_norm 1.2203 (inf) loss_scale 8192.0000 (10638.9610) mem 16703MB [2024-08-04 15:40:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][240/625] eta 0:02:51 lr 0.001180 wd 0.0500 time 0.4438 (0.4466) data time 0.0008 (0.0024) model time 0.4430 (0.4443) loss 3.1784 (3.4521) grad_norm 1.3730 (inf) loss_scale 8192.0000 (10537.4274) mem 16703MB [2024-08-04 15:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][250/625] eta 0:02:47 lr 0.001180 wd 0.0500 time 0.4432 (0.4464) data time 0.0007 (0.0024) model time 0.4426 (0.4442) loss 4.0395 (3.4490) grad_norm 0.9358 (inf) loss_scale 8192.0000 (10443.9841) mem 16703MB [2024-08-04 15:40:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][260/625] eta 0:02:42 lr 0.001180 wd 0.0500 time 0.4474 (0.4463) data time 0.0008 (0.0023) model time 0.4466 (0.4441) loss 3.2899 (3.4550) grad_norm 1.1154 (inf) loss_scale 8192.0000 (10357.7011) mem 16703MB 
[2024-08-04 15:40:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][270/625] eta 0:02:38 lr 0.001180 wd 0.0500 time 0.4519 (0.4462) data time 0.0007 (0.0023) model time 0.4513 (0.4441) loss 3.6275 (3.4509) grad_norm 1.0258 (inf) loss_scale 8192.0000 (10277.7860) mem 16703MB [2024-08-04 15:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][280/625] eta 0:02:33 lr 0.001180 wd 0.0500 time 0.4406 (0.4462) data time 0.0009 (0.0022) model time 0.4397 (0.4441) loss 3.5866 (3.4541) grad_norm 1.1847 (inf) loss_scale 8192.0000 (10203.5587) mem 16703MB [2024-08-04 15:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][290/625] eta 0:02:29 lr 0.001180 wd 0.0500 time 0.4421 (0.4461) data time 0.0009 (0.0022) model time 0.4413 (0.4440) loss 3.9088 (3.4703) grad_norm 1.5802 (inf) loss_scale 8192.0000 (10134.4330) mem 16703MB [2024-08-04 15:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][300/625] eta 0:02:24 lr 0.001180 wd 0.0500 time 0.4429 (0.4460) data time 0.0009 (0.0021) model time 0.4421 (0.4439) loss 3.5226 (3.4588) grad_norm 1.3861 (inf) loss_scale 8192.0000 (10069.9003) mem 16703MB [2024-08-04 15:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][310/625] eta 0:02:20 lr 0.001179 wd 0.0500 time 0.4455 (0.4459) data time 0.0006 (0.0021) model time 0.4449 (0.4439) loss 2.4022 (3.4554) grad_norm 1.0261 (inf) loss_scale 8192.0000 (10009.5177) mem 16703MB [2024-08-04 15:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][320/625] eta 0:02:15 lr 0.001179 wd 0.0500 time 0.4450 (0.4459) data time 0.0006 (0.0020) model time 0.4443 (0.4439) loss 4.1310 (3.4590) grad_norm 1.3412 (inf) loss_scale 8192.0000 (9952.8972) mem 16703MB [2024-08-04 15:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][330/625] eta 0:02:11 lr 0.001179 wd 0.0500 time 0.4469 (0.4458) data time 0.0006 (0.0020) model time 0.4462 (0.4438) loss 2.3520 (3.4699) 
grad_norm 1.1626 (inf) loss_scale 8192.0000 (9899.6979) mem 16703MB [2024-08-04 15:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][340/625] eta 0:02:07 lr 0.001179 wd 0.0500 time 0.4415 (0.4457) data time 0.0009 (0.0020) model time 0.4407 (0.4438) loss 3.7078 (3.4753) grad_norm 1.2737 (inf) loss_scale 8192.0000 (9849.6188) mem 16703MB [2024-08-04 15:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][350/625] eta 0:02:02 lr 0.001179 wd 0.0500 time 0.4449 (0.4457) data time 0.0006 (0.0019) model time 0.4442 (0.4437) loss 4.3065 (3.4766) grad_norm 1.7941 (inf) loss_scale 8192.0000 (9802.3932) mem 16703MB [2024-08-04 15:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][360/625] eta 0:01:58 lr 0.001179 wd 0.0500 time 0.4430 (0.4456) data time 0.0006 (0.0019) model time 0.4424 (0.4437) loss 3.9545 (3.4808) grad_norm 1.5065 (inf) loss_scale 8192.0000 (9757.7839) mem 16703MB [2024-08-04 15:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][370/625] eta 0:01:53 lr 0.001179 wd 0.0500 time 0.4444 (0.4456) data time 0.0007 (0.0019) model time 0.4436 (0.4438) loss 3.9670 (3.4825) grad_norm 1.0131 (inf) loss_scale 8192.0000 (9715.5795) mem 16703MB [2024-08-04 15:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][380/625] eta 0:01:49 lr 0.001179 wd 0.0500 time 0.4451 (0.4456) data time 0.0008 (0.0019) model time 0.4443 (0.4438) loss 3.4428 (3.4793) grad_norm 1.1793 (inf) loss_scale 8192.0000 (9675.5906) mem 16703MB [2024-08-04 15:41:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][390/625] eta 0:01:44 lr 0.001179 wd 0.0500 time 0.4459 (0.4456) data time 0.0008 (0.0018) model time 0.4451 (0.4438) loss 3.9503 (3.4816) grad_norm 1.3517 (inf) loss_scale 8192.0000 (9637.6471) mem 16703MB [2024-08-04 15:41:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][400/625] eta 0:01:40 lr 0.001179 wd 0.0500 time 0.4468 (0.4461) data time 
0.0006 (0.0018) model time 0.4462 (0.4444) loss 3.1009 (3.4858) grad_norm 1.4133 (inf) loss_scale 8192.0000 (9601.5960) mem 16703MB [2024-08-04 15:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][410/625] eta 0:01:35 lr 0.001179 wd 0.0500 time 0.4432 (0.4464) data time 0.0007 (0.0018) model time 0.4425 (0.4448) loss 3.3449 (3.4855) grad_norm 1.3129 (inf) loss_scale 8192.0000 (9567.2993) mem 16703MB [2024-08-04 15:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][420/625] eta 0:01:31 lr 0.001179 wd 0.0500 time 0.4448 (0.4463) data time 0.0006 (0.0018) model time 0.4442 (0.4447) loss 3.5062 (3.4798) grad_norm 1.7481 (inf) loss_scale 8192.0000 (9534.6318) mem 16703MB [2024-08-04 15:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][430/625] eta 0:01:27 lr 0.001179 wd 0.0500 time 0.4368 (0.4462) data time 0.0008 (0.0017) model time 0.4359 (0.4446) loss 2.5804 (3.4766) grad_norm 1.1525 (inf) loss_scale 8192.0000 (9503.4803) mem 16703MB [2024-08-04 15:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][440/625] eta 0:01:22 lr 0.001179 wd 0.0500 time 0.4450 (0.4461) data time 0.0008 (0.0017) model time 0.4442 (0.4445) loss 4.0662 (3.4756) grad_norm 1.3577 (inf) loss_scale 8192.0000 (9473.7415) mem 16703MB [2024-08-04 15:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][450/625] eta 0:01:18 lr 0.001179 wd 0.0500 time 0.4426 (0.4460) data time 0.0007 (0.0017) model time 0.4419 (0.4444) loss 3.3070 (3.4806) grad_norm 1.0311 (inf) loss_scale 8192.0000 (9445.3215) mem 16703MB [2024-08-04 15:42:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][460/625] eta 0:01:13 lr 0.001179 wd 0.0500 time 0.4434 (0.4459) data time 0.0006 (0.0017) model time 0.4428 (0.4443) loss 4.1168 (3.4838) grad_norm 1.4376 (inf) loss_scale 8192.0000 (9418.1345) mem 16703MB [2024-08-04 15:42:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][470/625] eta 
0:01:09 lr 0.001179 wd 0.0500 time 0.4377 (0.4459) data time 0.0008 (0.0017) model time 0.4369 (0.4443) loss 2.6033 (3.4761) grad_norm 1.7670 (inf) loss_scale 8192.0000 (9392.1019) mem 16703MB [2024-08-04 15:42:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][480/625] eta 0:01:04 lr 0.001179 wd 0.0500 time 0.4461 (0.4458) data time 0.0006 (0.0016) model time 0.4455 (0.4443) loss 2.7247 (3.4751) grad_norm 1.1498 (inf) loss_scale 8192.0000 (9367.1518) mem 16703MB [2024-08-04 15:42:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][490/625] eta 0:01:00 lr 0.001179 wd 0.0500 time 0.4462 (0.4458) data time 0.0009 (0.0016) model time 0.4453 (0.4442) loss 2.3570 (3.4662) grad_norm 1.2273 (inf) loss_scale 8192.0000 (9343.2179) mem 16703MB [2024-08-04 15:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][500/625] eta 0:00:55 lr 0.001179 wd 0.0500 time 0.4411 (0.4457) data time 0.0006 (0.0016) model time 0.4405 (0.4442) loss 2.6579 (3.4661) grad_norm 1.3229 (inf) loss_scale 8192.0000 (9320.2395) mem 16703MB [2024-08-04 15:42:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][510/625] eta 0:00:51 lr 0.001179 wd 0.0500 time 0.4440 (0.4457) data time 0.0006 (0.0016) model time 0.4434 (0.4441) loss 3.6946 (3.4735) grad_norm 1.7777 (inf) loss_scale 8192.0000 (9298.1605) mem 16703MB [2024-08-04 15:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][520/625] eta 0:00:46 lr 0.001179 wd 0.0500 time 0.4431 (0.4456) data time 0.0007 (0.0016) model time 0.4424 (0.4441) loss 4.1234 (3.4760) grad_norm 1.5032 (inf) loss_scale 8192.0000 (9276.9290) mem 16703MB [2024-08-04 15:42:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][530/625] eta 0:00:42 lr 0.001179 wd 0.0500 time 0.4407 (0.4456) data time 0.0009 (0.0016) model time 0.4397 (0.4441) loss 3.6349 (3.4684) grad_norm 1.3607 (inf) loss_scale 8192.0000 (9256.4972) mem 16703MB [2024-08-04 15:42:35 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [43/300][540/625] eta 0:00:37 lr 0.001179 wd 0.0500 time 0.4416 (0.4456) data time 0.0008 (0.0016) model time 0.4408 (0.4440) loss 2.7747 (3.4730) grad_norm 2.2178 (inf) loss_scale 8192.0000 (9236.8207) mem 16703MB [2024-08-04 15:42:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][550/625] eta 0:00:33 lr 0.001179 wd 0.0500 time 0.4435 (0.4455) data time 0.0009 (0.0015) model time 0.4426 (0.4440) loss 2.8519 (3.4737) grad_norm 1.0746 (inf) loss_scale 8192.0000 (9217.8584) mem 16703MB [2024-08-04 15:42:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][560/625] eta 0:00:28 lr 0.001179 wd 0.0500 time 0.4444 (0.4455) data time 0.0007 (0.0015) model time 0.4437 (0.4440) loss 3.9800 (3.4754) grad_norm 1.0263 (inf) loss_scale 8192.0000 (9199.5722) mem 16703MB [2024-08-04 15:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][570/625] eta 0:00:24 lr 0.001179 wd 0.0500 time 0.4424 (0.4454) data time 0.0007 (0.0015) model time 0.4418 (0.4440) loss 3.8084 (3.4764) grad_norm 2.4467 (inf) loss_scale 8192.0000 (9181.9264) mem 16703MB [2024-08-04 15:42:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][580/625] eta 0:00:20 lr 0.001179 wd 0.0500 time 0.4383 (0.4454) data time 0.0006 (0.0015) model time 0.4377 (0.4439) loss 2.5273 (3.4724) grad_norm 1.9111 (inf) loss_scale 8192.0000 (9164.8881) mem 16703MB [2024-08-04 15:42:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][590/625] eta 0:00:15 lr 0.001179 wd 0.0500 time 0.4422 (0.4456) data time 0.0008 (0.0015) model time 0.4414 (0.4442) loss 3.6761 (3.4715) grad_norm 1.1304 (inf) loss_scale 8192.0000 (9148.4264) mem 16703MB [2024-08-04 15:43:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][600/625] eta 0:00:11 lr 0.001179 wd 0.0500 time 0.4416 (0.4457) data time 0.0006 (0.0015) model time 0.4410 (0.4442) loss 3.0146 (3.4670) grad_norm 1.5561 (inf) loss_scale 8192.0000 
(9132.5125) mem 16703MB [2024-08-04 15:43:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][610/625] eta 0:00:06 lr 0.001179 wd 0.0500 time 0.4414 (0.4456) data time 0.0006 (0.0015) model time 0.4408 (0.4442) loss 2.4119 (3.4656) grad_norm 1.2126 (inf) loss_scale 8192.0000 (9117.1195) mem 16703MB [2024-08-04 15:43:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [43/300][620/625] eta 0:00:02 lr 0.001179 wd 0.0500 time 0.4411 (0.4455) data time 0.0006 (0.0015) model time 0.4406 (0.4441) loss 3.5339 (3.4651) grad_norm 1.1853 (inf) loss_scale 8192.0000 (9102.2222) mem 16703MB [2024-08-04 15:43:13 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 43 training takes 0:04:38 [2024-08-04 15:43:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:43:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.6650 (0.6650) Acc@1 85.498 (85.498) Acc@5 97.607 (97.607) Mem 16703MB
[2024-08-04 15:43:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.149) Loss 1.1309 (0.8250) Acc@1 73.047 (81.037) Acc@5 93.555 (96.143) Mem 16703MB
[2024-08-04 15:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.2402 (0.9959) Acc@1 70.752 (77.200) Acc@5 90.918 (94.024) Mem 16703MB
[2024-08-04 15:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 76.979 Acc@5 93.998
[2024-08-04 15:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.0%
[2024-08-04 15:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 76.98%
[2024-08-04 15:43:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:43:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.5605 (0.5605) Acc@1 86.084 (86.084) Acc@5 97.803 (97.803) Mem 16703MB
[2024-08-04 15:43:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 1.0000 (0.7124) Acc@1 74.170 (81.836) Acc@5 93.262 (96.294) Mem 16703MB
[2024-08-04 15:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1680 (0.8879) Acc@1 70.752 (77.783) Acc@5 90.967 (94.182) Mem 16703MB
[2024-08-04 15:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.577 Acc@5 94.176
[2024-08-04 15:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.6%
[2024-08-04 15:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.58%
[2024-08-04 15:43:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:43:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:43:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][0/625] eta 0:08:25 lr 0.001179 wd 0.0500 time 0.8088 (0.8088) data time 0.4242 (0.4242) model time 0.0000 (0.0000) loss 2.8891 (2.8891) grad_norm 1.0057 (1.0057) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][10/625] eta 0:04:53 lr 0.001179 wd 0.0500 time 0.4396 (0.4768) data time 0.0009 (0.0393) model time 0.0000 (0.0000) loss 3.3118 (3.4580) grad_norm 1.1788 (1.2428) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][20/625] eta 0:04:38 lr 0.001179 wd 0.0500 time 0.4405 (0.4604) data time 0.0006 (0.0210) model time 0.0000 (0.0000) loss 2.0722 (3.3825) grad_norm 1.2150 (1.3111) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][30/625] eta 0:04:30 lr 0.001179 wd 0.0500 time 0.4434 (0.4550) data time 0.0006 (0.0145) model time 0.0000 (0.0000) loss 2.2765 (3.3314) grad_norm 1.0735 (1.2797) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][40/625] eta 0:04:24 lr 0.001178 wd 0.0500 time 0.4427 (0.4524) data time 0.0006 (0.0111) model time 0.0000 (0.0000) loss 3.8912 (3.3473) grad_norm 1.2480 (1.2887) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][50/625] eta 0:04:19 lr 0.001178 wd 0.0500 time 0.4429 (0.4508) data time 0.0006 (0.0091) model time 0.0000 (0.0000) loss 3.8858 (3.3609) grad_norm 1.0170 (1.3208) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][60/625] eta 0:04:14 lr 0.001178 wd 0.0500 time 0.4439 (0.4498) data time 0.0007 (0.0078) model time 0.4432 (0.4437) loss 2.8430 (3.3614) 
grad_norm 0.9779 (1.3549) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][70/625] eta 0:04:09 lr 0.001178 wd 0.0500 time 0.4431 (0.4488) data time 0.0008 (0.0068) model time 0.4423 (0.4429) loss 3.6192 (3.4012) grad_norm 1.3888 (1.3875) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][80/625] eta 0:04:04 lr 0.001178 wd 0.0500 time 0.4438 (0.4481) data time 0.0006 (0.0061) model time 0.4432 (0.4426) loss 4.3803 (3.4497) grad_norm 1.4778 (1.3889) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][90/625] eta 0:03:59 lr 0.001178 wd 0.0500 time 0.4419 (0.4474) data time 0.0008 (0.0055) model time 0.4410 (0.4423) loss 2.1008 (3.4283) grad_norm 1.9118 (1.3991) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][100/625] eta 0:03:54 lr 0.001178 wd 0.0500 time 0.4436 (0.4470) data time 0.0008 (0.0050) model time 0.4427 (0.4422) loss 3.7938 (3.4023) grad_norm 1.6017 (1.4021) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][110/625] eta 0:03:50 lr 0.001178 wd 0.0500 time 0.4436 (0.4472) data time 0.0008 (0.0046) model time 0.4428 (0.4433) loss 3.4175 (3.4086) grad_norm 0.9689 (1.4002) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][120/625] eta 0:03:45 lr 0.001178 wd 0.0500 time 0.4433 (0.4469) data time 0.0006 (0.0043) model time 0.4427 (0.4432) loss 4.4435 (3.4219) grad_norm 1.3346 (1.3978) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][130/625] eta 0:03:41 lr 0.001178 wd 0.0500 time 0.4406 
(0.4466) data time 0.0006 (0.0041) model time 0.4400 (0.4431) loss 4.1023 (3.4242) grad_norm 0.8894 (1.3924) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][140/625] eta 0:03:36 lr 0.001178 wd 0.0500 time 0.4473 (0.4464) data time 0.0010 (0.0039) model time 0.4463 (0.4430) loss 3.6839 (3.4237) grad_norm 1.2188 (1.3785) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][150/625] eta 0:03:31 lr 0.001178 wd 0.0500 time 0.4443 (0.4462) data time 0.0008 (0.0037) model time 0.4435 (0.4430) loss 3.5587 (3.4173) grad_norm 1.4115 (1.3754) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][160/625] eta 0:03:27 lr 0.001178 wd 0.0500 time 0.4410 (0.4469) data time 0.0006 (0.0035) model time 0.4404 (0.4443) loss 3.4395 (3.4354) grad_norm 1.5920 (1.3945) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][170/625] eta 0:03:23 lr 0.001178 wd 0.0500 time 0.4400 (0.4466) data time 0.0008 (0.0033) model time 0.4392 (0.4440) loss 3.5335 (3.4285) grad_norm 1.1520 (1.3931) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][180/625] eta 0:03:18 lr 0.001178 wd 0.0500 time 0.4455 (0.4464) data time 0.0006 (0.0032) model time 0.4449 (0.4438) loss 3.5845 (3.4136) grad_norm 1.7986 (1.3826) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][190/625] eta 0:03:14 lr 0.001178 wd 0.0500 time 0.4461 (0.4463) data time 0.0006 (0.0031) model time 0.4454 (0.4438) loss 3.6549 (3.4316) grad_norm 1.1479 (1.3808) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [44/300][200/625] eta 0:03:09 lr 0.001178 wd 0.0500 time 0.4418 (0.4462) data time 0.0009 (0.0029) model time 0.4409 (0.4437) loss 4.2714 (3.4382) grad_norm 1.6591 (1.3993) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][210/625] eta 0:03:05 lr 0.001178 wd 0.0500 time 0.4411 (0.4460) data time 0.0006 (0.0028) model time 0.4405 (0.4437) loss 3.6674 (3.4452) grad_norm 1.3161 (1.3957) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][220/625] eta 0:03:00 lr 0.001178 wd 0.0500 time 0.4408 (0.4459) data time 0.0008 (0.0028) model time 0.4400 (0.4436) loss 2.0700 (3.4358) grad_norm 1.7507 (1.3951) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][230/625] eta 0:02:56 lr 0.001178 wd 0.0500 time 0.4456 (0.4459) data time 0.0008 (0.0027) model time 0.4447 (0.4437) loss 3.8001 (3.4394) grad_norm 1.2695 (1.3907) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][240/625] eta 0:02:51 lr 0.001178 wd 0.0500 time 0.4426 (0.4458) data time 0.0007 (0.0026) model time 0.4420 (0.4436) loss 2.6373 (3.4336) grad_norm 1.4061 (1.3897) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][250/625] eta 0:02:47 lr 0.001178 wd 0.0500 time 0.4430 (0.4463) data time 0.0008 (0.0025) model time 0.4422 (0.4443) loss 3.2195 (3.4322) grad_norm 1.6890 (1.3993) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][260/625] eta 0:02:42 lr 0.001178 wd 0.0500 time 0.4451 (0.4462) data time 0.0008 (0.0025) model time 0.4442 (0.4442) loss 3.7144 (3.4353) grad_norm 1.3860 (1.4024) loss_scale 8192.0000 
(8192.0000) mem 16703MB [2024-08-04 15:45:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][270/625] eta 0:02:38 lr 0.001178 wd 0.0500 time 0.4410 (0.4461) data time 0.0007 (0.0024) model time 0.4403 (0.4441) loss 3.4562 (3.4427) grad_norm 1.0076 (1.3962) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][280/625] eta 0:02:33 lr 0.001178 wd 0.0500 time 0.4440 (0.4460) data time 0.0007 (0.0023) model time 0.4434 (0.4440) loss 2.5079 (3.4471) grad_norm 1.2584 (1.3888) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][290/625] eta 0:02:29 lr 0.001178 wd 0.0500 time 0.4432 (0.4459) data time 0.0008 (0.0023) model time 0.4424 (0.4440) loss 3.7732 (3.4439) grad_norm 1.2552 (1.3924) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][300/625] eta 0:02:24 lr 0.001178 wd 0.0500 time 0.4394 (0.4458) data time 0.0008 (0.0022) model time 0.4386 (0.4439) loss 3.7974 (3.4473) grad_norm 1.2838 (1.3955) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][310/625] eta 0:02:20 lr 0.001178 wd 0.0500 time 0.4435 (0.4457) data time 0.0006 (0.0022) model time 0.4430 (0.4439) loss 3.2597 (3.4454) grad_norm 1.1782 (1.3895) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][320/625] eta 0:02:15 lr 0.001178 wd 0.0500 time 0.4425 (0.4456) data time 0.0006 (0.0022) model time 0.4418 (0.4438) loss 3.1689 (3.4475) grad_norm 1.3527 (1.3868) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][330/625] eta 0:02:11 lr 0.001178 wd 0.0500 time 0.4438 (0.4456) data time 0.0006 (0.0021) model time 
0.4432 (0.4438) loss 3.9151 (3.4584) grad_norm 1.2453 (1.3864) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:45:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][340/625] eta 0:02:06 lr 0.001178 wd 0.0500 time 0.4407 (0.4455) data time 0.0008 (0.0021) model time 0.4399 (0.4438) loss 3.1697 (3.4641) grad_norm 1.1190 (1.3837) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][350/625] eta 0:02:02 lr 0.001178 wd 0.0500 time 0.4479 (0.4455) data time 0.0008 (0.0020) model time 0.4471 (0.4438) loss 2.0587 (3.4653) grad_norm 1.4234 (1.3829) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][360/625] eta 0:01:58 lr 0.001178 wd 0.0500 time 0.4433 (0.4455) data time 0.0006 (0.0020) model time 0.4427 (0.4438) loss 4.1105 (3.4678) grad_norm 1.1315 (1.3818) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][370/625] eta 0:01:53 lr 0.001178 wd 0.0500 time 0.4430 (0.4455) data time 0.0008 (0.0020) model time 0.4422 (0.4438) loss 3.8219 (3.4644) grad_norm 1.5481 (1.3801) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][380/625] eta 0:01:49 lr 0.001178 wd 0.0500 time 0.4423 (0.4459) data time 0.0006 (0.0019) model time 0.4417 (0.4443) loss 2.3742 (3.4584) grad_norm 1.3894 (1.3814) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][390/625] eta 0:01:44 lr 0.001177 wd 0.0500 time 0.4448 (0.4459) data time 0.0006 (0.0019) model time 0.4442 (0.4443) loss 4.3760 (3.4631) grad_norm 1.6660 (1.3840) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][400/625] eta 0:01:40 
lr 0.001177 wd 0.0500 time 0.4478 (0.4458) data time 0.0008 (0.0019) model time 0.4469 (0.4443) loss 3.4011 (3.4565) grad_norm 1.2802 (1.3818) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][410/625] eta 0:01:35 lr 0.001177 wd 0.0500 time 0.4458 (0.4459) data time 0.0007 (0.0019) model time 0.4451 (0.4443) loss 3.4739 (3.4588) grad_norm 1.5733 (1.3823) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][420/625] eta 0:01:31 lr 0.001177 wd 0.0500 time 0.4434 (0.4458) data time 0.0008 (0.0018) model time 0.4426 (0.4443) loss 2.8606 (3.4578) grad_norm 1.4823 (1.3831) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][430/625] eta 0:01:26 lr 0.001177 wd 0.0500 time 0.4445 (0.4458) data time 0.0008 (0.0018) model time 0.4437 (0.4443) loss 3.8870 (3.4699) grad_norm 1.5050 (1.3842) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][440/625] eta 0:01:22 lr 0.001177 wd 0.0500 time 0.4414 (0.4457) data time 0.0008 (0.0018) model time 0.4406 (0.4443) loss 2.8657 (3.4707) grad_norm 1.5504 (1.3847) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][450/625] eta 0:01:17 lr 0.001177 wd 0.0500 time 0.4436 (0.4457) data time 0.0006 (0.0018) model time 0.4429 (0.4442) loss 2.6685 (3.4717) grad_norm 1.2527 (1.3848) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][460/625] eta 0:01:13 lr 0.001177 wd 0.0500 time 0.4428 (0.4456) data time 0.0008 (0.0018) model time 0.4419 (0.4441) loss 3.4602 (3.4701) grad_norm 1.0247 (1.3854) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:53 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][470/625] eta 0:01:09 lr 0.001177 wd 0.0500 time 0.4372 (0.4456) data time 0.0006 (0.0017) model time 0.4366 (0.4441) loss 2.7146 (3.4607) grad_norm 1.4731 (1.3823) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:46:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][480/625] eta 0:01:04 lr 0.001177 wd 0.0500 time 0.4492 (0.4455) data time 0.0006 (0.0017) model time 0.4486 (0.4441) loss 3.3354 (3.4626) grad_norm 2.7854 (1.3881) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][490/625] eta 0:01:00 lr 0.001177 wd 0.0500 time 0.4416 (0.4455) data time 0.0008 (0.0017) model time 0.4408 (0.4441) loss 2.5983 (3.4615) grad_norm 1.4665 (1.3852) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][500/625] eta 0:00:55 lr 0.001177 wd 0.0500 time 0.4445 (0.4455) data time 0.0006 (0.0017) model time 0.4439 (0.4440) loss 4.0921 (3.4655) grad_norm 1.5286 (1.3881) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][510/625] eta 0:00:51 lr 0.001177 wd 0.0500 time 0.4394 (0.4454) data time 0.0008 (0.0017) model time 0.4386 (0.4440) loss 3.1529 (3.4665) grad_norm 1.4176 (1.3868) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][520/625] eta 0:00:46 lr 0.001177 wd 0.0500 time 0.4429 (0.4454) data time 0.0010 (0.0017) model time 0.4420 (0.4439) loss 4.0948 (3.4696) grad_norm 1.2062 (1.3836) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][530/625] eta 0:00:42 lr 0.001177 wd 0.0500 time 0.4415 (0.4457) data time 0.0010 (0.0017) model time 0.4406 (0.4443) loss 3.2860 (3.4720) grad_norm 
1.3376 (1.3839) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][540/625] eta 0:00:37 lr 0.001177 wd 0.0500 time 0.4441 (0.4457) data time 0.0006 (0.0016) model time 0.4435 (0.4443) loss 2.4282 (3.4731) grad_norm 1.2960 (1.3833) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][550/625] eta 0:00:33 lr 0.001177 wd 0.0500 time 0.4443 (0.4457) data time 0.0007 (0.0016) model time 0.4436 (0.4443) loss 4.1339 (3.4753) grad_norm 1.2451 (1.3794) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][560/625] eta 0:00:28 lr 0.001177 wd 0.0500 time 0.4418 (0.4456) data time 0.0007 (0.0016) model time 0.4412 (0.4443) loss 3.8170 (3.4774) grad_norm 1.6471 (1.3778) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][570/625] eta 0:00:24 lr 0.001177 wd 0.0500 time 0.4429 (0.4456) data time 0.0008 (0.0016) model time 0.4421 (0.4442) loss 3.3995 (3.4734) grad_norm 1.2314 (1.3793) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][580/625] eta 0:00:20 lr 0.001177 wd 0.0500 time 0.4440 (0.4456) data time 0.0007 (0.0016) model time 0.4434 (0.4442) loss 3.7373 (3.4727) grad_norm 1.2729 (1.3805) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][590/625] eta 0:00:15 lr 0.001177 wd 0.0500 time 0.4401 (0.4458) data time 0.0006 (0.0016) model time 0.4395 (0.4444) loss 4.1245 (3.4707) grad_norm 1.0734 (1.3774) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][600/625] eta 0:00:11 lr 0.001177 wd 0.0500 time 0.4402 (0.4457) data 
time 0.0008 (0.0016) model time 0.4394 (0.4444) loss 3.5974 (3.4712) grad_norm 1.5092 (1.3793) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][610/625] eta 0:00:06 lr 0.001177 wd 0.0500 time 0.4384 (0.4456) data time 0.0004 (0.0016) model time 0.4380 (0.4443) loss 4.0236 (3.4732) grad_norm 1.2549 (1.3802) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [44/300][620/625] eta 0:00:02 lr 0.001177 wd 0.0500 time 0.4406 (0.4455) data time 0.0006 (0.0015) model time 0.4399 (0.4442) loss 3.6337 (3.4716) grad_norm 1.5202 (1.3828) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 44 training takes 0:04:38 [2024-08-04 15:48:02 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:48:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.6538 (0.6538) Acc@1 86.377 (86.377) Acc@5 97.412 (97.412) Mem 16703MB
[2024-08-04 15:48:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.1084 (0.8307) Acc@1 74.121 (81.410) Acc@5 93.066 (96.147) Mem 16703MB
[2024-08-04 15:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.2881 (0.9997) Acc@1 70.752 (77.525) Acc@5 91.016 (94.127) Mem 16703MB
[2024-08-04 15:48:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.203 Acc@5 94.060
[2024-08-04 15:48:07 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.2%
[2024-08-04 15:48:07 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.20%
[2024-08-04 15:48:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:48:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:48:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5571 (0.5571) Acc@1 86.182 (86.182) Acc@5 97.900 (97.900) Mem 16703MB
[2024-08-04 15:48:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.9878 (0.7081) Acc@1 74.609 (82.058) Acc@5 93.408 (96.378) Mem 16703MB
[2024-08-04 15:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.1572 (0.8810) Acc@1 70.850 (78.006) Acc@5 91.211 (94.303) Mem 16703MB
[2024-08-04 15:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.811 Acc@5 94.292
[2024-08-04 15:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 77.8%
[2024-08-04 15:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.81%
[2024-08-04 15:48:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:48:13 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][0/625] eta 0:08:08 lr 0.001177 wd 0.0500 time 0.7808 (0.7808) data time 0.3991 (0.3991) model time 0.0000 (0.0000) loss 2.6120 (2.6120) grad_norm 1.1467 (1.1467) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][10/625] eta 0:04:51 lr 0.001177 wd 0.0500 time 0.4429 (0.4746) data time 0.0006 (0.0370) model time 0.0000 (0.0000) loss 2.5584 (3.0305) grad_norm 1.4422 (1.3252) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][20/625] eta 0:04:38 lr 0.001177 wd 0.0500 time 0.4420 (0.4599) data time 0.0006 (0.0197) model time 0.0000 (0.0000) loss 3.1758 (3.1152) grad_norm 1.1451 (1.2484) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][30/625] eta 0:04:30 lr 0.001177 wd 0.0500 time 0.4472 (0.4546) data time 0.0006 (0.0136) model time 0.0000 (0.0000) loss 4.5135 (3.3338) grad_norm 1.2252 (1.3108) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][40/625] eta 0:04:24 lr 0.001177 wd 0.0500 time 0.4442 (0.4523) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 3.8780 (3.3671) grad_norm 1.0597 (1.3316) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][50/625] eta 0:04:18 lr 0.001177 wd 0.0500 time 0.4424 (0.4504) data time 0.0009 (0.0086) model time 0.0000 (0.0000) loss 3.1935 (3.4232) grad_norm 1.0287 (1.3420) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][60/625] eta 0:04:13 lr 0.001177 wd 0.0500 time 0.4433 (0.4492) data time 0.0008 (0.0073) model time 0.4425 (0.4423) loss 3.9765 (3.4032) 
grad_norm 1.3109 (1.3516) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][70/625] eta 0:04:09 lr 0.001177 wd 0.0500 time 0.4449 (0.4487) data time 0.0008 (0.0064) model time 0.4441 (0.4435) loss 3.5383 (3.3901) grad_norm 1.4652 (1.3451) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][80/625] eta 0:04:04 lr 0.001177 wd 0.0500 time 0.4467 (0.4482) data time 0.0010 (0.0057) model time 0.4457 (0.4437) loss 3.5329 (3.3927) grad_norm 1.6895 (1.3637) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][90/625] eta 0:04:00 lr 0.001177 wd 0.0500 time 0.4391 (0.4495) data time 0.0007 (0.0052) model time 0.4384 (0.4476) loss 4.0848 (3.3680) grad_norm 2.1727 (1.3752) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][100/625] eta 0:03:55 lr 0.001176 wd 0.0500 time 0.4506 (0.4490) data time 0.0006 (0.0048) model time 0.4499 (0.4468) loss 2.6516 (3.3699) grad_norm 1.2750 (1.4045) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][110/625] eta 0:03:51 lr 0.001176 wd 0.0500 time 0.4431 (0.4504) data time 0.0006 (0.0044) model time 0.4425 (0.4495) loss 3.3370 (3.3777) grad_norm 1.5231 (1.4145) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][120/625] eta 0:03:47 lr 0.001176 wd 0.0500 time 0.4433 (0.4499) data time 0.0007 (0.0041) model time 0.4426 (0.4486) loss 3.4673 (3.3919) grad_norm 1.3532 (1.4163) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][130/625] eta 0:03:42 lr 0.001176 wd 0.0500 time 0.4424 
(0.4493) data time 0.0007 (0.0039) model time 0.4417 (0.4478) loss 3.8370 (3.4211) grad_norm 1.3379 (1.4021) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][140/625] eta 0:03:37 lr 0.001176 wd 0.0500 time 0.4449 (0.4488) data time 0.0006 (0.0037) model time 0.4443 (0.4470) loss 3.9540 (3.4258) grad_norm 1.3027 (1.3931) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][150/625] eta 0:03:32 lr 0.001176 wd 0.0500 time 0.4446 (0.4483) data time 0.0008 (0.0035) model time 0.4439 (0.4464) loss 3.5454 (3.4316) grad_norm 1.3052 (1.3951) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][160/625] eta 0:03:28 lr 0.001176 wd 0.0500 time 0.4443 (0.4479) data time 0.0008 (0.0033) model time 0.4435 (0.4459) loss 3.9273 (3.4470) grad_norm 1.3678 (1.3985) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][170/625] eta 0:03:23 lr 0.001176 wd 0.0500 time 0.4413 (0.4477) data time 0.0006 (0.0032) model time 0.4407 (0.4457) loss 2.2646 (3.4169) grad_norm 1.5567 (1.4017) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][180/625] eta 0:03:19 lr 0.001176 wd 0.0500 time 0.4385 (0.4474) data time 0.0008 (0.0030) model time 0.4377 (0.4454) loss 3.3905 (3.4008) grad_norm 1.7921 (1.4112) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][190/625] eta 0:03:14 lr 0.001176 wd 0.0500 time 0.4419 (0.4472) data time 0.0006 (0.0029) model time 0.4414 (0.4451) loss 3.7491 (3.3933) grad_norm 1.1868 (1.4120) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:43 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [45/300][200/625] eta 0:03:09 lr 0.001176 wd 0.0500 time 0.4428 (0.4469) data time 0.0008 (0.0028) model time 0.4420 (0.4449) loss 2.5240 (3.3905) grad_norm 1.3749 (1.4132) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][210/625] eta 0:03:05 lr 0.001176 wd 0.0500 time 0.4429 (0.4467) data time 0.0009 (0.0027) model time 0.4420 (0.4447) loss 3.4921 (3.3904) grad_norm 1.7965 (1.4116) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][220/625] eta 0:03:00 lr 0.001176 wd 0.0500 time 0.4444 (0.4466) data time 0.0008 (0.0026) model time 0.4436 (0.4446) loss 3.2149 (3.3975) grad_norm 1.3634 (1.4146) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][230/625] eta 0:02:56 lr 0.001176 wd 0.0500 time 0.4499 (0.4466) data time 0.0006 (0.0026) model time 0.4493 (0.4446) loss 3.1206 (3.4088) grad_norm 1.0353 (1.4081) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][240/625] eta 0:02:52 lr 0.001176 wd 0.0500 time 0.4416 (0.4471) data time 0.0009 (0.0025) model time 0.4407 (0.4454) loss 3.6675 (3.4072) grad_norm 1.1976 (1.4027) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][250/625] eta 0:02:47 lr 0.001176 wd 0.0500 time 0.4425 (0.4469) data time 0.0008 (0.0024) model time 0.4417 (0.4452) loss 2.7702 (3.4063) grad_norm 1.5733 (1.4015) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][260/625] eta 0:02:43 lr 0.001176 wd 0.0500 time 0.4417 (0.4468) data time 0.0006 (0.0024) model time 0.4411 (0.4450) loss 3.3782 (3.4086) grad_norm 1.0823 (1.3935) loss_scale 8192.0000 
(8192.0000) mem 16703MB [2024-08-04 15:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][270/625] eta 0:02:38 lr 0.001176 wd 0.0500 time 0.4398 (0.4466) data time 0.0008 (0.0023) model time 0.4389 (0.4449) loss 2.8239 (3.4038) grad_norm 1.8219 (1.3944) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][280/625] eta 0:02:34 lr 0.001176 wd 0.0500 time 0.4415 (0.4465) data time 0.0009 (0.0023) model time 0.4406 (0.4448) loss 3.5526 (3.3958) grad_norm 1.2387 (1.3981) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][290/625] eta 0:02:29 lr 0.001176 wd 0.0500 time 0.4467 (0.4464) data time 0.0006 (0.0022) model time 0.4461 (0.4447) loss 2.6528 (3.4043) grad_norm 1.7877 (1.3978) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][300/625] eta 0:02:25 lr 0.001176 wd 0.0500 time 0.4474 (0.4463) data time 0.0008 (0.0022) model time 0.4466 (0.4446) loss 3.7091 (3.4140) grad_norm 1.5817 (1.3940) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][310/625] eta 0:02:20 lr 0.001176 wd 0.0500 time 0.4470 (0.4462) data time 0.0006 (0.0021) model time 0.4464 (0.4446) loss 3.1929 (3.4084) grad_norm 1.5131 (1.3917) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][320/625] eta 0:02:16 lr 0.001176 wd 0.0500 time 0.4377 (0.4461) data time 0.0007 (0.0021) model time 0.4371 (0.4444) loss 4.0962 (3.4195) grad_norm 1.4198 (1.3881) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][330/625] eta 0:02:11 lr 0.001176 wd 0.0500 time 0.4396 (0.4460) data time 0.0008 (0.0020) model time 
0.4388 (0.4443) loss 3.4768 (3.4199) grad_norm 0.9807 (1.3821) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][340/625] eta 0:02:07 lr 0.001176 wd 0.0500 time 0.4432 (0.4459) data time 0.0006 (0.0020) model time 0.4426 (0.4442) loss 3.9903 (3.4215) grad_norm 1.7206 (1.3786) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][350/625] eta 0:02:02 lr 0.001176 wd 0.0500 time 0.4385 (0.4458) data time 0.0007 (0.0020) model time 0.4378 (0.4442) loss 2.9555 (3.4214) grad_norm 1.4733 (1.3755) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][360/625] eta 0:01:58 lr 0.001176 wd 0.0500 time 0.4438 (0.4457) data time 0.0009 (0.0019) model time 0.4429 (0.4441) loss 3.8836 (3.4279) grad_norm 1.7983 (1.3773) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:50:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][370/625] eta 0:01:53 lr 0.001176 wd 0.0500 time 0.4392 (0.4456) data time 0.0006 (0.0019) model time 0.4385 (0.4440) loss 2.5751 (3.4222) grad_norm 2.3556 (1.3877) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][380/625] eta 0:01:49 lr 0.001176 wd 0.0500 time 0.4453 (0.4455) data time 0.0006 (0.0019) model time 0.4447 (0.4439) loss 3.2892 (3.4196) grad_norm 1.3217 (1.3865) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][390/625] eta 0:01:44 lr 0.001176 wd 0.0500 time 0.4418 (0.4454) data time 0.0006 (0.0019) model time 0.4411 (0.4438) loss 2.4681 (3.4136) grad_norm 2.0490 (1.3891) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][400/625] eta 0:01:40 
lr 0.001176 wd 0.0500 time 0.4448 (0.4453) data time 0.0006 (0.0018) model time 0.4442 (0.4437) loss 3.8301 (3.4193) grad_norm 0.9844 (1.3909) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][410/625] eta 0:01:35 lr 0.001176 wd 0.0500 time 0.4373 (0.4452) data time 0.0009 (0.0018) model time 0.4363 (0.4436) loss 3.4574 (3.4212) grad_norm 1.2931 (1.3888) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][420/625] eta 0:01:31 lr 0.001176 wd 0.0500 time 0.4419 (0.4451) data time 0.0006 (0.0018) model time 0.4413 (0.4435) loss 2.7322 (3.4140) grad_norm 1.0241 (1.3833) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][430/625] eta 0:01:26 lr 0.001175 wd 0.0500 time 0.4400 (0.4450) data time 0.0008 (0.0018) model time 0.4393 (0.4434) loss 3.1264 (3.4165) grad_norm 1.5696 (1.3862) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][440/625] eta 0:01:22 lr 0.001175 wd 0.0500 time 0.4420 (0.4454) data time 0.0008 (0.0017) model time 0.4411 (0.4439) loss 2.7737 (3.4103) grad_norm 1.2415 (1.3927) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][450/625] eta 0:01:17 lr 0.001175 wd 0.0500 time 0.4426 (0.4453) data time 0.0006 (0.0017) model time 0.4420 (0.4438) loss 3.1241 (3.4063) grad_norm 1.2911 (1.3956) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][460/625] eta 0:01:13 lr 0.001175 wd 0.0500 time 0.4413 (0.4453) data time 0.0006 (0.0017) model time 0.4407 (0.4438) loss 3.3704 (3.4072) grad_norm 1.7723 (1.3988) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:42 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][470/625] eta 0:01:09 lr 0.001175 wd 0.0500 time 0.4410 (0.4453) data time 0.0008 (0.0017) model time 0.4402 (0.4438) loss 2.5548 (3.4048) grad_norm 1.3104 (1.3970) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][480/625] eta 0:01:04 lr 0.001175 wd 0.0500 time 0.4427 (0.4452) data time 0.0009 (0.0017) model time 0.4418 (0.4437) loss 3.8357 (3.4039) grad_norm 1.4055 (1.3933) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][490/625] eta 0:01:00 lr 0.001175 wd 0.0500 time 0.4411 (0.4451) data time 0.0008 (0.0017) model time 0.4403 (0.4436) loss 3.7964 (3.4001) grad_norm 1.3020 (1.3896) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][500/625] eta 0:00:55 lr 0.001175 wd 0.0500 time 0.4449 (0.4450) data time 0.0008 (0.0016) model time 0.4441 (0.4436) loss 4.0304 (3.4019) grad_norm 1.5433 (1.3868) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][510/625] eta 0:00:51 lr 0.001175 wd 0.0500 time 0.4450 (0.4450) data time 0.0008 (0.0016) model time 0.4442 (0.4436) loss 3.0015 (3.3959) grad_norm 1.8605 (1.3864) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][520/625] eta 0:00:46 lr 0.001175 wd 0.0500 time 0.4400 (0.4450) data time 0.0007 (0.0016) model time 0.4394 (0.4435) loss 3.8059 (3.3983) grad_norm 1.1156 (1.3871) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][530/625] eta 0:00:42 lr 0.001175 wd 0.0500 time 0.4398 (0.4449) data time 0.0008 (0.0016) model time 0.4390 (0.4435) loss 3.7453 (3.4010) grad_norm 
1.2937 (1.3857) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][540/625] eta 0:00:37 lr 0.001175 wd 0.0500 time 0.4393 (0.4449) data time 0.0009 (0.0016) model time 0.4384 (0.4435) loss 3.6387 (3.4016) grad_norm 1.1027 (1.3829) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][550/625] eta 0:00:33 lr 0.001175 wd 0.0500 time 0.4428 (0.4448) data time 0.0009 (0.0016) model time 0.4418 (0.4434) loss 3.8330 (3.4037) grad_norm 1.2852 (1.3783) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][560/625] eta 0:00:28 lr 0.001175 wd 0.0500 time 0.4417 (0.4448) data time 0.0007 (0.0016) model time 0.4410 (0.4434) loss 3.7587 (3.4057) grad_norm 1.5166 (1.3751) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][570/625] eta 0:00:24 lr 0.001175 wd 0.0500 time 0.4432 (0.4450) data time 0.0009 (0.0015) model time 0.4423 (0.4437) loss 3.5441 (3.4052) grad_norm 1.3036 (1.3730) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][580/625] eta 0:00:20 lr 0.001175 wd 0.0500 time 0.4389 (0.4450) data time 0.0006 (0.0015) model time 0.4382 (0.4436) loss 2.9015 (3.4056) grad_norm 2.5586 (1.3787) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][590/625] eta 0:00:15 lr 0.001175 wd 0.0500 time 0.4420 (0.4449) data time 0.0006 (0.0015) model time 0.4414 (0.4436) loss 2.8343 (3.4040) grad_norm 1.0881 (1.3775) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][600/625] eta 0:00:11 lr 0.001175 wd 0.0500 time 0.4463 (0.4449) data 
time 0.0006 (0.0015) model time 0.4456 (0.4436) loss 3.1233 (3.4078) grad_norm 1.6927 (1.3740) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][610/625] eta 0:00:06 lr 0.001175 wd 0.0500 time 0.4388 (0.4449) data time 0.0006 (0.0015) model time 0.4382 (0.4435) loss 3.7372 (3.4101) grad_norm 1.3693 (1.3735) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [45/300][620/625] eta 0:00:02 lr 0.001175 wd 0.0500 time 0.4422 (0.4448) data time 0.0004 (0.0015) model time 0.4417 (0.4435) loss 2.5986 (3.4110) grad_norm 1.5589 (1.3735) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:52:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 45 training takes 0:04:37 [2024-08-04 15:52:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:52:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:52:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.6309 (0.6309) Acc@1 85.400 (85.400) Acc@5 97.803 (97.803) Mem 16703MB
[2024-08-04 15:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.0352 (0.7790) Acc@1 74.756 (81.721) Acc@5 93.311 (96.316) Mem 16703MB
[2024-08-04 15:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1699 (0.9522) Acc@1 72.070 (77.674) Acc@5 91.455 (94.229) Mem 16703MB
[2024-08-04 15:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.439 Acc@5 94.170
[2024-08-04 15:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.4%
[2024-08-04 15:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.44%
[2024-08-04 15:52:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 15:52:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 15:52:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5542 (0.5542) Acc@1 86.230 (86.230) Acc@5 97.949 (97.949) Mem 16703MB
[2024-08-04 15:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.151) Loss 0.9780 (0.7046) Acc@1 74.658 (82.173) Acc@5 93.604 (96.449) Mem 16703MB
[2024-08-04 15:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1455 (0.8748) Acc@1 71.045 (78.181) Acc@5 91.211 (94.415) Mem 16703MB
[2024-08-04 15:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.985 Acc@5 94.410
[2024-08-04 15:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.0%
[2024-08-04 15:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 77.99%
[2024-08-04 15:53:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 15:53:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 15:53:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][0/625] eta 0:07:46 lr 0.001175 wd 0.0500 time 0.7460 (0.7460) data time 0.3612 (0.3612) model time 0.0000 (0.0000) loss 3.4950 (3.4950) grad_norm 0.9696 (0.9696) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][10/625] eta 0:05:02 lr 0.001175 wd 0.0500 time 0.4424 (0.4913) data time 0.0008 (0.0336) model time 0.0000 (0.0000) loss 2.4734 (3.4750) grad_norm 1.1055 (1.3663) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][20/625] eta 0:04:43 lr 0.001175 wd 0.0500 time 0.4447 (0.4687) data time 0.0006 (0.0180) model time 0.0000 (0.0000) loss 2.8962 (3.4008) grad_norm 1.9673 (1.3936) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][30/625] eta 0:04:33 lr 0.001175 wd 0.0500 time 0.4424 (0.4605) data time 0.0007 (0.0125) model time 0.0000 (0.0000) loss 4.0058 (3.4326) grad_norm 1.7368 (1.4551) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][40/625] eta 0:04:26 lr 0.001175 wd 0.0500 time 0.4435 (0.4562) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 3.8563 (3.4927) grad_norm 1.2155 (1.4405) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][50/625] eta 0:04:20 lr 0.001175 wd 0.0500 time 0.4394 (0.4533) data time 0.0008 (0.0079) model time 0.0000 (0.0000) loss 3.7311 (3.4787) grad_norm 1.0867 (1.4175) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][60/625] eta 0:04:15 lr 0.001175 wd 0.0500 time 0.4434 (0.4516) data time 0.0009 (0.0068) model time 0.4425 (0.4420) loss 3.0382 (3.4830) 
grad_norm 1.1061 (1.3908) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][70/625] eta 0:04:11 lr 0.001175 wd 0.0500 time 0.6549 (0.4534) data time 0.0007 (0.0059) model time 0.6543 (0.4527) loss 2.9731 (3.4982) grad_norm 2.1082 (1.3932) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][80/625] eta 0:04:06 lr 0.001175 wd 0.0500 time 0.4432 (0.4514) data time 0.0008 (0.0053) model time 0.4424 (0.4473) loss 3.6154 (3.4788) grad_norm 1.7754 (1.3947) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][90/625] eta 0:04:00 lr 0.001175 wd 0.0500 time 0.4422 (0.4504) data time 0.0008 (0.0048) model time 0.4414 (0.4459) loss 2.7973 (3.4942) grad_norm 1.5923 (1.3915) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][100/625] eta 0:03:56 lr 0.001175 wd 0.0500 time 0.4434 (0.4497) data time 0.0008 (0.0044) model time 0.4426 (0.4451) loss 3.4696 (3.4637) grad_norm 1.1917 (1.3695) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][110/625] eta 0:03:51 lr 0.001175 wd 0.0500 time 0.4416 (0.4491) data time 0.0007 (0.0041) model time 0.4409 (0.4447) loss 4.1045 (3.4660) grad_norm 1.4271 (1.3545) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:53:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][120/625] eta 0:03:46 lr 0.001175 wd 0.0500 time 0.4440 (0.4485) data time 0.0009 (0.0038) model time 0.4431 (0.4442) loss 2.9196 (3.4386) grad_norm 1.5361 (1.3724) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][130/625] eta 0:03:41 lr 0.001175 wd 0.0500 time 0.4380 
(0.4480) data time 0.0007 (0.0036) model time 0.4373 (0.4438) loss 2.6743 (3.4227) grad_norm 1.4060 (1.3660) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][140/625] eta 0:03:37 lr 0.001174 wd 0.0500 time 0.4408 (0.4476) data time 0.0009 (0.0034) model time 0.4399 (0.4435) loss 3.5359 (3.4178) grad_norm 1.2470 (1.3537) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][150/625] eta 0:03:32 lr 0.001174 wd 0.0500 time 0.4382 (0.4471) data time 0.0006 (0.0032) model time 0.4376 (0.4431) loss 3.9702 (3.4067) grad_norm 1.3443 (1.3468) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][160/625] eta 0:03:27 lr 0.001174 wd 0.0500 time 0.4440 (0.4468) data time 0.0008 (0.0031) model time 0.4432 (0.4430) loss 3.3552 (3.4007) grad_norm 1.7835 (1.3521) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][170/625] eta 0:03:23 lr 0.001174 wd 0.0500 time 0.4447 (0.4467) data time 0.0007 (0.0030) model time 0.4440 (0.4430) loss 3.6598 (3.4032) grad_norm 1.5700 (1.3571) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][180/625] eta 0:03:18 lr 0.001174 wd 0.0500 time 0.4448 (0.4466) data time 0.0008 (0.0028) model time 0.4440 (0.4431) loss 3.6440 (3.4073) grad_norm 1.6038 (1.3602) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][190/625] eta 0:03:14 lr 0.001174 wd 0.0500 time 0.6600 (0.4476) data time 0.0006 (0.0027) model time 0.6594 (0.4447) loss 3.6093 (3.4026) grad_norm 1.6367 (1.3633) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 15:54:32 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [46/300][200/625] eta 0:03:10 lr 0.001174 wd 0.0500 time 0.4437 (0.4473) data time 0.0007 (0.0026) model time 0.4430 (0.4445) loss 2.4448 (3.3900) grad_norm 1.1614 (1.3555) loss_scale 16384.0000 (8477.2935) mem 16703MB [2024-08-04 15:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][210/625] eta 0:03:05 lr 0.001174 wd 0.0500 time 0.4399 (0.4471) data time 0.0008 (0.0026) model time 0.4392 (0.4442) loss 3.4603 (3.3802) grad_norm 1.5272 (1.3528) loss_scale 16384.0000 (8852.0190) mem 16703MB [2024-08-04 15:54:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][220/625] eta 0:03:01 lr 0.001174 wd 0.0500 time 0.5718 (0.4480) data time 0.0008 (0.0025) model time 0.5710 (0.4456) loss 3.9721 (3.3915) grad_norm 2.8609 (1.3664) loss_scale 16384.0000 (9192.8326) mem 16703MB [2024-08-04 15:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][230/625] eta 0:02:56 lr 0.001174 wd 0.0500 time 0.4454 (0.4478) data time 0.0006 (0.0024) model time 0.4448 (0.4455) loss 4.4107 (3.4105) grad_norm 1.1072 (1.3636) loss_scale 16384.0000 (9504.1385) mem 16703MB [2024-08-04 15:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][240/625] eta 0:02:52 lr 0.001174 wd 0.0500 time 0.4446 (0.4476) data time 0.0008 (0.0023) model time 0.4438 (0.4453) loss 3.5183 (3.4137) grad_norm 1.1877 (1.3576) loss_scale 16384.0000 (9789.6100) mem 16703MB [2024-08-04 15:54:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][250/625] eta 0:02:47 lr 0.001174 wd 0.0500 time 0.4457 (0.4475) data time 0.0006 (0.0023) model time 0.4451 (0.4452) loss 3.1986 (3.4197) grad_norm 0.8942 (1.3571) loss_scale 16384.0000 (10052.3347) mem 16703MB [2024-08-04 15:54:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][260/625] eta 0:02:43 lr 0.001174 wd 0.0500 time 0.4390 (0.4473) data time 0.0006 (0.0022) model time 0.4384 (0.4450) loss 3.3466 (3.4102) grad_norm 1.4814 (1.3591) loss_scale 
16384.0000 (10294.9272) mem 16703MB [2024-08-04 15:55:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][270/625] eta 0:02:38 lr 0.001174 wd 0.0500 time 0.4396 (0.4471) data time 0.0006 (0.0022) model time 0.4390 (0.4448) loss 3.7028 (3.4118) grad_norm 1.2781 (1.3660) loss_scale 16384.0000 (10519.6162) mem 16703MB [2024-08-04 15:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][280/625] eta 0:02:34 lr 0.001174 wd 0.0500 time 0.4434 (0.4469) data time 0.0008 (0.0021) model time 0.4426 (0.4447) loss 3.7542 (3.4138) grad_norm 1.9329 (1.3620) loss_scale 16384.0000 (10728.3132) mem 16703MB [2024-08-04 15:55:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][290/625] eta 0:02:29 lr 0.001174 wd 0.0500 time 0.4410 (0.4468) data time 0.0008 (0.0021) model time 0.4402 (0.4445) loss 2.9284 (3.4047) grad_norm 2.0986 (1.3713) loss_scale 16384.0000 (10922.6667) mem 16703MB [2024-08-04 15:55:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][300/625] eta 0:02:25 lr 0.001174 wd 0.0500 time 0.4430 (0.4467) data time 0.0009 (0.0021) model time 0.4421 (0.4445) loss 3.5035 (3.4049) grad_norm 1.0807 (1.3682) loss_scale 16384.0000 (11104.1063) mem 16703MB [2024-08-04 15:55:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][310/625] eta 0:02:20 lr 0.001174 wd 0.0500 time 0.4443 (0.4466) data time 0.0006 (0.0020) model time 0.4437 (0.4445) loss 2.9359 (3.4038) grad_norm 1.1455 (1.3674) loss_scale 16384.0000 (11273.8778) mem 16703MB [2024-08-04 15:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][320/625] eta 0:02:16 lr 0.001174 wd 0.0500 time 0.4445 (0.4466) data time 0.0006 (0.0020) model time 0.4439 (0.4445) loss 2.7188 (3.3966) grad_norm 1.5128 (1.3693) loss_scale 16384.0000 (11433.0717) mem 16703MB [2024-08-04 15:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][330/625] eta 0:02:11 lr 0.001174 wd 0.0500 time 0.4449 (0.4465) data time 0.0008 
(0.0019) model time 0.4442 (0.4445) loss 3.2723 (3.3995) grad_norm 1.6213 (1.3730) loss_scale 16384.0000 (11582.6465) mem 16703MB [2024-08-04 15:55:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][340/625] eta 0:02:07 lr 0.001174 wd 0.0500 time 0.4380 (0.4464) data time 0.0008 (0.0019) model time 0.4372 (0.4444) loss 3.6682 (3.4025) grad_norm 1.6603 (1.3783) loss_scale 16384.0000 (11723.4487) mem 16703MB [2024-08-04 15:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][350/625] eta 0:02:02 lr 0.001174 wd 0.0500 time 0.4410 (0.4463) data time 0.0006 (0.0019) model time 0.4404 (0.4443) loss 3.8522 (3.4093) grad_norm 1.1615 (1.3845) loss_scale 16384.0000 (11856.2279) mem 16703MB [2024-08-04 15:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][360/625] eta 0:01:58 lr 0.001174 wd 0.0500 time 0.4410 (0.4462) data time 0.0008 (0.0018) model time 0.4403 (0.4442) loss 2.5674 (3.4085) grad_norm 1.4335 (1.3820) loss_scale 16384.0000 (11981.6510) mem 16703MB [2024-08-04 15:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][370/625] eta 0:01:53 lr 0.001174 wd 0.0500 time 0.4403 (0.4466) data time 0.0008 (0.0018) model time 0.4395 (0.4447) loss 2.7810 (3.4094) grad_norm 1.3167 (1.3818) loss_scale 16384.0000 (12100.3127) mem 16703MB [2024-08-04 15:55:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][380/625] eta 0:01:49 lr 0.001174 wd 0.0500 time 0.4432 (0.4466) data time 0.0006 (0.0018) model time 0.4425 (0.4447) loss 3.1927 (3.4049) grad_norm 1.0619 (1.3754) loss_scale 16384.0000 (12212.7454) mem 16703MB [2024-08-04 15:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][390/625] eta 0:01:44 lr 0.001174 wd 0.0500 time 0.4418 (0.4465) data time 0.0009 (0.0018) model time 0.4409 (0.4447) loss 3.2025 (3.4122) grad_norm 0.9017 (1.3701) loss_scale 16384.0000 (12319.4271) mem 16703MB [2024-08-04 15:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [46/300][400/625] eta 0:01:40 lr 0.001174 wd 0.0500 time 0.4460 (0.4465) data time 0.0006 (0.0017) model time 0.4454 (0.4446) loss 2.2672 (3.4049) grad_norm 1.8350 (1.3693) loss_scale 16384.0000 (12420.7880) mem 16703MB [2024-08-04 15:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][410/625] eta 0:01:35 lr 0.001174 wd 0.0500 time 0.4440 (0.4464) data time 0.0007 (0.0017) model time 0.4433 (0.4446) loss 3.4857 (3.4074) grad_norm 1.1053 (1.3683) loss_scale 16384.0000 (12517.2165) mem 16703MB [2024-08-04 15:56:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][420/625] eta 0:01:31 lr 0.001174 wd 0.0500 time 0.4470 (0.4471) data time 0.0009 (0.0017) model time 0.4462 (0.4454) loss 3.3053 (3.4077) grad_norm 1.9102 (1.3699) loss_scale 16384.0000 (12609.0641) mem 16703MB [2024-08-04 15:56:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][430/625] eta 0:01:27 lr 0.001174 wd 0.0500 time 0.4432 (0.4470) data time 0.0008 (0.0017) model time 0.4424 (0.4453) loss 3.1866 (3.4129) grad_norm 1.4449 (1.3698) loss_scale 16384.0000 (12696.6497) mem 16703MB [2024-08-04 15:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][440/625] eta 0:01:22 lr 0.001174 wd 0.0500 time 0.4441 (0.4469) data time 0.0006 (0.0017) model time 0.4435 (0.4453) loss 3.2725 (3.4118) grad_norm 1.5076 (1.3661) loss_scale 16384.0000 (12780.2630) mem 16703MB [2024-08-04 15:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][450/625] eta 0:01:18 lr 0.001174 wd 0.0500 time 0.4443 (0.4469) data time 0.0007 (0.0016) model time 0.4436 (0.4452) loss 3.7366 (3.4144) grad_norm 1.3704 (1.3642) loss_scale 16384.0000 (12860.1685) mem 16703MB [2024-08-04 15:56:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][460/625] eta 0:01:13 lr 0.001173 wd 0.0500 time 0.4422 (0.4468) data time 0.0008 (0.0016) model time 0.4414 (0.4452) loss 3.5261 (3.4204) grad_norm 1.1478 (1.3635) loss_scale 16384.0000 
(12936.6074) mem 16703MB [2024-08-04 15:56:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][470/625] eta 0:01:09 lr 0.001173 wd 0.0500 time 0.4445 (0.4468) data time 0.0008 (0.0016) model time 0.4436 (0.4452) loss 2.8488 (3.4249) grad_norm 2.0132 (1.3656) loss_scale 16384.0000 (13009.8004) mem 16703MB [2024-08-04 15:56:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][480/625] eta 0:01:04 lr 0.001173 wd 0.0500 time 0.4452 (0.4467) data time 0.0008 (0.0016) model time 0.4443 (0.4451) loss 3.4750 (3.4261) grad_norm 1.5426 (1.3657) loss_scale 16384.0000 (13079.9501) mem 16703MB [2024-08-04 15:56:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][490/625] eta 0:01:00 lr 0.001173 wd 0.0500 time 0.4442 (0.4467) data time 0.0008 (0.0016) model time 0.4434 (0.4451) loss 2.8725 (3.4235) grad_norm 1.4919 (1.3653) loss_scale 16384.0000 (13147.2424) mem 16703MB [2024-08-04 15:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][500/625] eta 0:00:55 lr 0.001173 wd 0.0500 time 0.4411 (0.4466) data time 0.0007 (0.0016) model time 0.4404 (0.4450) loss 3.7393 (3.4262) grad_norm 1.4032 (1.3626) loss_scale 16384.0000 (13211.8483) mem 16703MB [2024-08-04 15:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][510/625] eta 0:00:51 lr 0.001173 wd 0.0500 time 0.4387 (0.4465) data time 0.0009 (0.0015) model time 0.4378 (0.4450) loss 3.8669 (3.4232) grad_norm 1.4437 (1.3632) loss_scale 16384.0000 (13273.9256) mem 16703MB [2024-08-04 15:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][520/625] eta 0:00:46 lr 0.001173 wd 0.0500 time 0.4427 (0.4464) data time 0.0008 (0.0015) model time 0.4419 (0.4449) loss 3.4480 (3.4252) grad_norm 1.0847 (1.3640) loss_scale 16384.0000 (13333.6200) mem 16703MB [2024-08-04 15:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][530/625] eta 0:00:42 lr 0.001173 wd 0.0500 time 0.4420 (0.4464) data time 0.0008 (0.0015) 
model time 0.4412 (0.4448) loss 3.2897 (3.4259) grad_norm 1.3000 (1.3618) loss_scale 16384.0000 (13391.0659) mem 16703MB [2024-08-04 15:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][540/625] eta 0:00:37 lr 0.001173 wd 0.0500 time 0.4387 (0.4463) data time 0.0007 (0.0015) model time 0.4379 (0.4447) loss 4.2311 (3.4317) grad_norm 1.5067 (1.3612) loss_scale 16384.0000 (13446.3882) mem 16703MB [2024-08-04 15:57:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][550/625] eta 0:00:33 lr 0.001173 wd 0.0500 time 0.4382 (0.4462) data time 0.0009 (0.0015) model time 0.4372 (0.4447) loss 2.2310 (3.4294) grad_norm 1.3129 (1.3657) loss_scale 16384.0000 (13499.7024) mem 16703MB [2024-08-04 15:57:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][560/625] eta 0:00:29 lr 0.001173 wd 0.0500 time 0.4466 (0.4471) data time 0.0006 (0.0015) model time 0.4460 (0.4457) loss 2.8449 (3.4294) grad_norm 1.0843 (1.3653) loss_scale 16384.0000 (13551.1159) mem 16703MB [2024-08-04 15:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][570/625] eta 0:00:24 lr 0.001173 wd 0.0500 time 0.4422 (0.4470) data time 0.0009 (0.0015) model time 0.4414 (0.4456) loss 2.6603 (3.4308) grad_norm 1.1315 (1.3644) loss_scale 16384.0000 (13600.7285) mem 16703MB [2024-08-04 15:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][580/625] eta 0:00:20 lr 0.001173 wd 0.0500 time 0.4420 (0.4469) data time 0.0006 (0.0015) model time 0.4414 (0.4455) loss 2.3583 (3.4356) grad_norm 1.0939 (1.3623) loss_scale 16384.0000 (13648.6334) mem 16703MB [2024-08-04 15:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][590/625] eta 0:00:15 lr 0.001173 wd 0.0500 time 0.4482 (0.4469) data time 0.0007 (0.0015) model time 0.4475 (0.4454) loss 2.9868 (3.4396) grad_norm 1.3279 (1.3664) loss_scale 16384.0000 (13694.9171) mem 16703MB [2024-08-04 15:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[46/300][600/625] eta 0:00:11 lr 0.001173 wd 0.0500 time 0.4480 (0.4468) data time 0.0008 (0.0014) model time 0.4472 (0.4454) loss 3.4860 (3.4416) grad_norm 1.1231 (1.3680) loss_scale 16384.0000 (13739.6606) mem 16703MB [2024-08-04 15:57:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][610/625] eta 0:00:06 lr 0.001173 wd 0.0500 time 0.4389 (0.4468) data time 0.0003 (0.0014) model time 0.4386 (0.4454) loss 4.6348 (3.4460) grad_norm 1.7980 (1.3673) loss_scale 16384.0000 (13782.9394) mem 16703MB [2024-08-04 15:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [46/300][620/625] eta 0:00:02 lr 0.001173 wd 0.0500 time 0.4360 (0.4467) data time 0.0006 (0.0014) model time 0.4354 (0.4453) loss 3.1242 (3.4450) grad_norm 1.1323 (1.3673) loss_scale 16384.0000 (13824.8245) mem 16703MB [2024-08-04 15:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 46 training takes 0:04:39 [2024-08-04 15:57:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 15:57:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 15:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.6597 (0.6597) Acc@1 85.596 (85.596) Acc@5 97.559 (97.559) Mem 16703MB [2024-08-04 15:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.0693 (0.8066) Acc@1 75.684 (81.760) Acc@5 93.408 (96.151) Mem 16703MB [2024-08-04 15:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.2646 (0.9784) Acc@1 70.605 (77.618) Acc@5 90.576 (93.978) Mem 16703MB [2024-08-04 15:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.449 Acc@5 94.004 [2024-08-04 15:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.4% [2024-08-04 15:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.45% [2024-08-04 15:57:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 15:57:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 15:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5527 (0.5527) Acc@1 86.426 (86.426) Acc@5 97.949 (97.949) Mem 16703MB [2024-08-04 15:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 0.9678 (0.7015) Acc@1 75.098 (82.351) Acc@5 93.604 (96.511) Mem 16703MB [2024-08-04 15:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.1367 (0.8691) Acc@1 71.240 (78.381) Acc@5 91.455 (94.527) Mem 16703MB [2024-08-04 15:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.173 Acc@5 94.520 [2024-08-04 15:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.2% [2024-08-04 15:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.17% [2024-08-04 15:57:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 15:57:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 15:57:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][0/625] eta 0:07:19 lr 0.001173 wd 0.0500 time 0.7040 (0.7040) data time 0.3207 (0.3207) model time 0.0000 (0.0000) loss 3.6906 (3.6906) grad_norm 1.7058 (1.7058) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][10/625] eta 0:04:46 lr 0.001173 wd 0.0500 time 0.4429 (0.4665) data time 0.0009 (0.0299) model time 0.0000 (0.0000) loss 4.2057 (3.5262) grad_norm 1.1471 (1.4253) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:58:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][20/625] eta 0:04:35 lr 0.001173 wd 0.0500 time 0.4441 (0.4547) data time 0.0007 (0.0161) model time 0.0000 (0.0000) loss 2.2518 (3.4257) grad_norm 1.0072 (1.4924) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:58:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][30/625] eta 0:04:28 lr 0.001173 wd 0.0500 time 0.4443 (0.4512) data time 0.0007 (0.0112) model time 0.0000 (0.0000) loss 3.7379 (3.3832) grad_norm 0.9436 (1.4825) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][40/625] eta 0:04:25 lr 0.001173 wd 0.0500 time 0.4398 (0.4542) data time 0.0010 (0.0087) model time 0.0000 (0.0000) loss 3.7871 (3.4055) grad_norm 1.1323 (1.4495) loss_scale 16384.0000 (16384.0000) mem 16703MB [2024-08-04 15:58:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][50/625] eta 0:04:21 lr 0.001173 wd 0.0500 time 0.3844 (0.4541) data time 0.0007 (0.0071) model time 0.0000 (0.0000) loss 4.7447 (3.4357) grad_norm 1.2342 (inf) loss_scale 8192.0000 (15902.1176) mem 16703MB [2024-08-04 15:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][60/625] eta 0:04:15 lr 0.001173 wd 0.0500 time 0.4392 (0.4522) data time 0.0009 (0.0061) model time 0.4382 (0.4411) loss 2.6478 
(3.3840) grad_norm 1.7457 (inf) loss_scale 8192.0000 (14638.1639) mem 16703MB [2024-08-04 15:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][70/625] eta 0:04:10 lr 0.001173 wd 0.0500 time 0.4437 (0.4507) data time 0.0008 (0.0054) model time 0.4429 (0.4411) loss 3.4997 (3.3582) grad_norm 1.2559 (inf) loss_scale 8192.0000 (13730.2535) mem 16703MB [2024-08-04 15:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][80/625] eta 0:04:05 lr 0.001173 wd 0.0500 time 0.4312 (0.4496) data time 0.0007 (0.0048) model time 0.4304 (0.4409) loss 3.6269 (3.3688) grad_norm 1.4052 (inf) loss_scale 8192.0000 (13046.5185) mem 16703MB [2024-08-04 15:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][90/625] eta 0:04:00 lr 0.001173 wd 0.0500 time 0.4421 (0.4487) data time 0.0006 (0.0044) model time 0.4414 (0.4410) loss 3.8438 (3.3853) grad_norm 1.0755 (inf) loss_scale 8192.0000 (12513.0549) mem 16703MB [2024-08-04 15:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][100/625] eta 0:03:55 lr 0.001173 wd 0.0500 time 0.4429 (0.4481) data time 0.0007 (0.0040) model time 0.4422 (0.4410) loss 2.5023 (3.3834) grad_norm 1.8768 (inf) loss_scale 8192.0000 (12085.2277) mem 16703MB [2024-08-04 15:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][110/625] eta 0:03:50 lr 0.001173 wd 0.0500 time 0.4479 (0.4477) data time 0.0007 (0.0037) model time 0.4472 (0.4413) loss 3.0192 (3.3894) grad_norm 1.0768 (inf) loss_scale 8192.0000 (11734.4865) mem 16703MB [2024-08-04 15:58:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][120/625] eta 0:03:46 lr 0.001173 wd 0.0500 time 0.4415 (0.4490) data time 0.0006 (0.0035) model time 0.4409 (0.4443) loss 2.4414 (3.3716) grad_norm 1.5960 (inf) loss_scale 8192.0000 (11441.7190) mem 16703MB [2024-08-04 15:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][130/625] eta 0:03:41 lr 0.001173 wd 0.0500 time 0.4402 (0.4485) 
data time 0.0008 (0.0033) model time 0.4393 (0.4440) loss 3.0275 (3.3574) grad_norm 1.5265 (inf) loss_scale 8192.0000 (11193.6489) mem 16703MB [2024-08-04 15:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][140/625] eta 0:03:37 lr 0.001173 wd 0.0500 time 0.4423 (0.4481) data time 0.0006 (0.0031) model time 0.4417 (0.4438) loss 2.5712 (3.3337) grad_norm 1.3099 (inf) loss_scale 8192.0000 (10980.7660) mem 16703MB [2024-08-04 15:58:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][150/625] eta 0:03:32 lr 0.001172 wd 0.0500 time 0.4433 (0.4477) data time 0.0008 (0.0030) model time 0.4425 (0.4435) loss 3.6903 (3.3386) grad_norm 2.5024 (inf) loss_scale 8192.0000 (10796.0795) mem 16703MB [2024-08-04 15:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][160/625] eta 0:03:28 lr 0.001172 wd 0.0500 time 0.4464 (0.4475) data time 0.0008 (0.0028) model time 0.4456 (0.4436) loss 2.5631 (3.3261) grad_norm 1.4955 (inf) loss_scale 8192.0000 (10634.3354) mem 16703MB [2024-08-04 15:59:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][170/625] eta 0:03:23 lr 0.001172 wd 0.0500 time 0.4424 (0.4473) data time 0.0006 (0.0027) model time 0.4418 (0.4435) loss 3.8676 (3.3390) grad_norm 1.3791 (inf) loss_scale 8192.0000 (10491.5088) mem 16703MB [2024-08-04 15:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][180/625] eta 0:03:18 lr 0.001172 wd 0.0500 time 0.4485 (0.4471) data time 0.0006 (0.0026) model time 0.4478 (0.4435) loss 3.6530 (3.3440) grad_norm 1.2660 (inf) loss_scale 8192.0000 (10364.4641) mem 16703MB [2024-08-04 15:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][190/625] eta 0:03:14 lr 0.001172 wd 0.0500 time 0.4453 (0.4469) data time 0.0008 (0.0025) model time 0.4444 (0.4435) loss 3.8859 (3.3495) grad_norm 1.5194 (inf) loss_scale 8192.0000 (10250.7225) mem 16703MB [2024-08-04 15:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[47/300][200/625] eta 0:03:10 lr 0.001172 wd 0.0500 time 0.4410 (0.4476) data time 0.0009 (0.0024) model time 0.4401 (0.4446) loss 3.7358 (3.3447) grad_norm 1.2179 (inf) loss_scale 8192.0000 (10148.2985) mem 16703MB [2024-08-04 15:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][210/625] eta 0:03:05 lr 0.001172 wd 0.0500 time 0.4428 (0.4474) data time 0.0008 (0.0024) model time 0.4420 (0.4445) loss 3.4945 (3.3605) grad_norm 0.9888 (inf) loss_scale 8192.0000 (10055.5829) mem 16703MB [2024-08-04 15:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][220/625] eta 0:03:01 lr 0.001172 wd 0.0500 time 0.4390 (0.4472) data time 0.0007 (0.0023) model time 0.4383 (0.4443) loss 2.5829 (3.3625) grad_norm 1.0266 (inf) loss_scale 8192.0000 (9971.2579) mem 16703MB [2024-08-04 15:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][230/625] eta 0:02:56 lr 0.001172 wd 0.0500 time 0.4440 (0.4470) data time 0.0006 (0.0022) model time 0.4434 (0.4441) loss 2.7151 (3.3581) grad_norm 1.1663 (inf) loss_scale 8192.0000 (9894.2338) mem 16703MB [2024-08-04 15:59:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][240/625] eta 0:02:52 lr 0.001172 wd 0.0500 time 0.4436 (0.4469) data time 0.0009 (0.0022) model time 0.4428 (0.4441) loss 3.5671 (3.3633) grad_norm 1.4210 (inf) loss_scale 8192.0000 (9823.6017) mem 16703MB [2024-08-04 15:59:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][250/625] eta 0:02:47 lr 0.001172 wd 0.0500 time 0.4444 (0.4468) data time 0.0008 (0.0021) model time 0.4436 (0.4440) loss 3.9828 (3.3750) grad_norm 1.4502 (inf) loss_scale 8192.0000 (9758.5976) mem 16703MB [2024-08-04 15:59:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][260/625] eta 0:02:43 lr 0.001172 wd 0.0500 time 0.4482 (0.4467) data time 0.0008 (0.0021) model time 0.4473 (0.4441) loss 3.2785 (3.3678) grad_norm 1.2954 (inf) loss_scale 8192.0000 (9698.5747) mem 16703MB [2024-08-04 
15:59:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][270/625] eta 0:02:38 lr 0.001172 wd 0.0500 time 0.4390 (0.4467) data time 0.0006 (0.0020) model time 0.4383 (0.4441) loss 3.7599 (3.3701) grad_norm 0.9150 (inf) loss_scale 8192.0000 (9642.9815) mem 16703MB [2024-08-04 15:59:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][280/625] eta 0:02:34 lr 0.001172 wd 0.0500 time 0.4436 (0.4466) data time 0.0006 (0.0020) model time 0.4429 (0.4441) loss 3.3115 (3.3686) grad_norm 1.3543 (inf) loss_scale 8192.0000 (9591.3452) mem 16703MB [2024-08-04 16:00:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][290/625] eta 0:02:29 lr 0.001172 wd 0.0500 time 0.4417 (0.4465) data time 0.0010 (0.0019) model time 0.4407 (0.4440) loss 2.7431 (3.3718) grad_norm 1.3480 (inf) loss_scale 8192.0000 (9543.2577) mem 16703MB [2024-08-04 16:00:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][300/625] eta 0:02:25 lr 0.001172 wd 0.0500 time 0.4433 (0.4464) data time 0.0006 (0.0019) model time 0.4426 (0.4440) loss 2.8256 (3.3682) grad_norm 1.5568 (inf) loss_scale 8192.0000 (9498.3654) mem 16703MB [2024-08-04 16:00:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][310/625] eta 0:02:20 lr 0.001172 wd 0.0500 time 0.4421 (0.4463) data time 0.0008 (0.0019) model time 0.4413 (0.4439) loss 3.7270 (3.3777) grad_norm 1.2652 (inf) loss_scale 8192.0000 (9456.3601) mem 16703MB [2024-08-04 16:00:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][320/625] eta 0:02:16 lr 0.001172 wd 0.0500 time 0.4445 (0.4462) data time 0.0008 (0.0018) model time 0.4437 (0.4439) loss 2.7977 (3.3730) grad_norm 1.1562 (inf) loss_scale 8192.0000 (9416.9720) mem 16703MB [2024-08-04 16:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][330/625] eta 0:02:11 lr 0.001172 wd 0.0500 time 0.4447 (0.4462) data time 0.0008 (0.0018) model time 0.4439 (0.4439) loss 3.8116 (3.3798) grad_norm 1.5053 
(inf) loss_scale 8192.0000 (9379.9637) mem 16703MB [2024-08-04 16:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][340/625] eta 0:02:07 lr 0.001172 wd 0.0500 time 0.4449 (0.4462) data time 0.0006 (0.0018) model time 0.4443 (0.4439) loss 2.9829 (3.3750) grad_norm 1.4422 (inf) loss_scale 8192.0000 (9345.1261) mem 16703MB [2024-08-04 16:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][350/625] eta 0:02:02 lr 0.001172 wd 0.0500 time 0.4505 (0.4461) data time 0.0007 (0.0017) model time 0.4499 (0.4439) loss 2.7566 (3.3745) grad_norm 1.3080 (inf) loss_scale 8192.0000 (9312.2735) mem 16703MB [2024-08-04 16:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][360/625] eta 0:01:58 lr 0.001172 wd 0.0500 time 0.4493 (0.4461) data time 0.0007 (0.0017) model time 0.4486 (0.4440) loss 3.2918 (3.3715) grad_norm 1.4568 (inf) loss_scale 8192.0000 (9281.2410) mem 16703MB [2024-08-04 16:00:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][370/625] eta 0:01:53 lr 0.001172 wd 0.0500 time 0.4453 (0.4465) data time 0.0008 (0.0017) model time 0.4445 (0.4445) loss 3.7663 (3.3776) grad_norm 1.1994 (inf) loss_scale 8192.0000 (9251.8814) mem 16703MB [2024-08-04 16:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][380/625] eta 0:01:49 lr 0.001172 wd 0.0500 time 0.4514 (0.4465) data time 0.0008 (0.0017) model time 0.4506 (0.4445) loss 3.9242 (3.3755) grad_norm 1.9269 (inf) loss_scale 8192.0000 (9224.0630) mem 16703MB [2024-08-04 16:00:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][390/625] eta 0:01:44 lr 0.001172 wd 0.0500 time 0.4420 (0.4464) data time 0.0006 (0.0017) model time 0.4414 (0.4445) loss 3.6474 (3.3801) grad_norm 1.4556 (inf) loss_scale 8192.0000 (9197.6675) mem 16703MB [2024-08-04 16:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][400/625] eta 0:01:40 lr 0.001172 wd 0.0500 time 0.4450 (0.4464) data time 0.0009 (0.0016) model 
time 0.4441 (0.4444) loss 3.4979 (3.3821) grad_norm 1.2192 (inf) loss_scale 8192.0000 (9172.5885) mem 16703MB [2024-08-04 16:00:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][410/625] eta 0:01:35 lr 0.001172 wd 0.0500 time 0.4433 (0.4464) data time 0.0008 (0.0016) model time 0.4425 (0.4444) loss 2.4807 (3.3855) grad_norm 1.3531 (inf) loss_scale 8192.0000 (9148.7299) mem 16703MB [2024-08-04 16:01:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][420/625] eta 0:01:31 lr 0.001172 wd 0.0500 time 0.4436 (0.4463) data time 0.0007 (0.0016) model time 0.4430 (0.4444) loss 3.8294 (3.3797) grad_norm 1.2310 (inf) loss_scale 8192.0000 (9126.0048) mem 16703MB [2024-08-04 16:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][430/625] eta 0:01:27 lr 0.001172 wd 0.0500 time 0.4387 (0.4462) data time 0.0006 (0.0016) model time 0.4380 (0.4443) loss 3.7440 (3.3804) grad_norm 1.2444 (inf) loss_scale 8192.0000 (9104.3341) mem 16703MB [2024-08-04 16:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][440/625] eta 0:01:22 lr 0.001172 wd 0.0500 time 0.4374 (0.4461) data time 0.0006 (0.0016) model time 0.4368 (0.4442) loss 2.7677 (3.3791) grad_norm 1.6585 (inf) loss_scale 8192.0000 (9083.6463) mem 16703MB [2024-08-04 16:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][450/625] eta 0:01:18 lr 0.001172 wd 0.0500 time 0.6502 (0.4465) data time 0.0008 (0.0015) model time 0.6494 (0.4447) loss 3.5789 (3.3787) grad_norm 1.9448 (inf) loss_scale 8192.0000 (9063.8758) mem 16703MB [2024-08-04 16:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][460/625] eta 0:01:13 lr 0.001171 wd 0.0500 time 0.4388 (0.4464) data time 0.0008 (0.0015) model time 0.4380 (0.4446) loss 2.3352 (3.3758) grad_norm 0.9766 (inf) loss_scale 8192.0000 (9044.9631) mem 16703MB [2024-08-04 16:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][470/625] eta 0:01:09 lr 0.001171 wd 
0.0500 time 0.4445 (0.4463) data time 0.0006 (0.0015) model time 0.4439 (0.4445) loss 3.8063 (3.3764) grad_norm 1.4951 (inf) loss_scale 8192.0000 (9026.8535) mem 16703MB [2024-08-04 16:01:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][480/625] eta 0:01:04 lr 0.001171 wd 0.0500 time 0.4428 (0.4462) data time 0.0008 (0.0015) model time 0.4420 (0.4445) loss 3.4179 (3.3806) grad_norm 1.3250 (inf) loss_scale 8192.0000 (9009.4969) mem 16703MB [2024-08-04 16:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][490/625] eta 0:01:00 lr 0.001171 wd 0.0500 time 0.4437 (0.4462) data time 0.0007 (0.0015) model time 0.4429 (0.4445) loss 2.9382 (3.3806) grad_norm 1.4452 (inf) loss_scale 8192.0000 (8992.8473) mem 16703MB [2024-08-04 16:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][500/625] eta 0:00:55 lr 0.001171 wd 0.0500 time 0.4476 (0.4462) data time 0.0006 (0.0015) model time 0.4470 (0.4445) loss 3.6748 (3.3838) grad_norm 1.6410 (inf) loss_scale 8192.0000 (8976.8623) mem 16703MB [2024-08-04 16:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][510/625] eta 0:00:51 lr 0.001171 wd 0.0500 time 0.4426 (0.4462) data time 0.0008 (0.0015) model time 0.4417 (0.4444) loss 3.8293 (3.3921) grad_norm 1.4399 (inf) loss_scale 8192.0000 (8961.5029) mem 16703MB [2024-08-04 16:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][520/625] eta 0:00:46 lr 0.001171 wd 0.0500 time 0.4457 (0.4461) data time 0.0006 (0.0014) model time 0.4450 (0.4444) loss 3.3595 (3.3934) grad_norm 1.0327 (inf) loss_scale 8192.0000 (8946.7332) mem 16703MB [2024-08-04 16:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][530/625] eta 0:00:42 lr 0.001171 wd 0.0500 time 0.5955 (0.4464) data time 0.0006 (0.0014) model time 0.5949 (0.4447) loss 2.9098 (3.3962) grad_norm 1.1604 (inf) loss_scale 8192.0000 (8932.5198) mem 16703MB [2024-08-04 16:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [47/300][540/625] eta 0:00:37 lr 0.001171 wd 0.0500 time 0.4459 (0.4463) data time 0.0007 (0.0014) model time 0.4452 (0.4447) loss 2.9036 (3.3997) grad_norm 1.2601 (inf) loss_scale 8192.0000 (8918.8318) mem 16703MB [2024-08-04 16:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][550/625] eta 0:00:33 lr 0.001171 wd 0.0500 time 0.4445 (0.4463) data time 0.0009 (0.0014) model time 0.4436 (0.4446) loss 3.0329 (3.3984) grad_norm 1.2732 (inf) loss_scale 8192.0000 (8905.6407) mem 16703MB [2024-08-04 16:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][560/625] eta 0:00:29 lr 0.001171 wd 0.0500 time 0.4409 (0.4465) data time 0.0006 (0.0014) model time 0.4403 (0.4449) loss 3.2842 (3.3951) grad_norm 1.2053 (inf) loss_scale 8192.0000 (8892.9198) mem 16703MB [2024-08-04 16:02:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][570/625] eta 0:00:24 lr 0.001171 wd 0.0500 time 0.4424 (0.4465) data time 0.0007 (0.0014) model time 0.4417 (0.4449) loss 3.1844 (3.4013) grad_norm 1.4699 (inf) loss_scale 8192.0000 (8880.6445) mem 16703MB [2024-08-04 16:02:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][580/625] eta 0:00:20 lr 0.001171 wd 0.0500 time 0.4407 (0.4464) data time 0.0007 (0.0014) model time 0.4401 (0.4448) loss 3.9416 (3.4018) grad_norm 1.1259 (inf) loss_scale 8192.0000 (8868.7917) mem 16703MB [2024-08-04 16:02:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][590/625] eta 0:00:15 lr 0.001171 wd 0.0500 time 0.4402 (0.4463) data time 0.0007 (0.0014) model time 0.4395 (0.4447) loss 3.8309 (3.4043) grad_norm 1.2382 (inf) loss_scale 8192.0000 (8857.3401) mem 16703MB [2024-08-04 16:02:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][600/625] eta 0:00:11 lr 0.001171 wd 0.0500 time 0.4407 (0.4462) data time 0.0006 (0.0014) model time 0.4401 (0.4447) loss 2.8041 (3.4018) grad_norm 1.2076 (inf) loss_scale 8192.0000 (8846.2696) mem 16703MB 
[2024-08-04 16:02:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][610/625] eta 0:00:06 lr 0.001171 wd 0.0500 time 0.4418 (0.4461) data time 0.0004 (0.0014) model time 0.4413 (0.4446) loss 2.0336 (3.3981) grad_norm 0.9502 (inf) loss_scale 8192.0000 (8835.5614) mem 16703MB [2024-08-04 16:02:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [47/300][620/625] eta 0:00:02 lr 0.001171 wd 0.0500 time 0.4418 (0.4460) data time 0.0004 (0.0013) model time 0.4413 (0.4445) loss 3.4829 (3.3980) grad_norm 1.7455 (inf) loss_scale 8192.0000 (8825.1981) mem 16703MB [2024-08-04 16:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 47 training takes 0:04:38 [2024-08-04 16:02:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:02:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 16:02:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.6245 (0.6245) Acc@1 85.400 (85.400) Acc@5 97.852 (97.852) Mem 16703MB [2024-08-04 16:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.151) Loss 1.0811 (0.8061) Acc@1 74.805 (81.246) Acc@5 93.213 (96.316) Mem 16703MB [2024-08-04 16:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.134) Loss 1.2373 (0.9708) Acc@1 70.752 (77.627) Acc@5 90.820 (94.292) Mem 16703MB [2024-08-04 16:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.325 Acc@5 94.244 [2024-08-04 16:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.3% [2024-08-04 16:02:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.778 (0.778) Loss 0.5518 (0.5518) Acc@1 86.426 (86.426) Acc@5 97.900 (97.900) Mem 16703MB [2024-08-04 16:02:37 vssm_base_ms_e300] 
(main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.180) Loss 0.9585 (0.6990) Acc@1 75.293 (82.524) Acc@5 93.848 (96.582) Mem 16703MB [2024-08-04 16:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.149) Loss 1.1279 (0.8640) Acc@1 71.387 (78.599) Acc@5 91.602 (94.589) Mem 16703MB [2024-08-04 16:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.377 Acc@5 94.580 [2024-08-04 16:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.4% [2024-08-04 16:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.38% [2024-08-04 16:02:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:02:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-04 16:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][0/625] eta 0:08:28 lr 0.001171 wd 0.0500 time 0.8137 (0.8137) data time 0.4290 (0.4290) model time 0.0000 (0.0000) loss 4.0370 (4.0370) grad_norm 1.1711 (1.1711) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:02:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][10/625] eta 0:04:53 lr 0.001171 wd 0.0500 time 0.4467 (0.4769) data time 0.0007 (0.0397) model time 0.0000 (0.0000) loss 3.2082 (3.3847) grad_norm 1.4928 (1.4023) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:02:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][20/625] eta 0:04:38 lr 0.001171 wd 0.0500 time 0.4400 (0.4605) data time 0.0007 (0.0212) model time 0.0000 (0.0000) loss 2.9556 (3.3508) grad_norm 1.2483 (1.3967) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][30/625] eta 0:04:34 lr 0.001171 wd 0.0500 time 0.4476 (0.4620) data time 
0.0006 (0.0146) model time 0.0000 (0.0000) loss 2.4695 (3.3688) grad_norm 0.8786 (1.3208) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:02:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][40/625] eta 0:04:27 lr 0.001171 wd 0.0500 time 0.4437 (0.4574) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 3.3436 (3.4114) grad_norm 1.3370 (1.2980) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][50/625] eta 0:04:21 lr 0.001171 wd 0.0500 time 0.4401 (0.4547) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 3.7287 (3.3858) grad_norm 1.0821 (1.3498) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][60/625] eta 0:04:15 lr 0.001171 wd 0.0500 time 0.4438 (0.4529) data time 0.0008 (0.0078) model time 0.4430 (0.4428) loss 2.9510 (3.4068) grad_norm 1.2306 (1.3498) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][70/625] eta 0:04:10 lr 0.001171 wd 0.0500 time 0.4435 (0.4516) data time 0.0009 (0.0068) model time 0.4427 (0.4431) loss 3.5037 (3.4222) grad_norm 1.8175 (1.3352) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][80/625] eta 0:04:06 lr 0.001171 wd 0.0500 time 0.3870 (0.4521) data time 0.0009 (0.0061) model time 0.3861 (0.4469) loss 3.2288 (3.3997) grad_norm 1.6173 (1.3432) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][90/625] eta 0:04:01 lr 0.001171 wd 0.0500 time 0.4434 (0.4509) data time 0.0006 (0.0055) model time 0.4428 (0.4453) loss 4.3888 (3.4032) grad_norm 1.2452 (1.3514) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[48/300][100/625] eta 0:03:56 lr 0.001171 wd 0.0500 time 0.4363 (0.4499) data time 0.0009 (0.0051) model time 0.4354 (0.4440) loss 3.1752 (3.3942) grad_norm 0.9411 (1.3392) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][110/625] eta 0:03:51 lr 0.001171 wd 0.0500 time 0.4409 (0.4490) data time 0.0008 (0.0047) model time 0.4401 (0.4433) loss 2.7742 (3.3971) grad_norm 1.0881 (1.3278) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][120/625] eta 0:03:46 lr 0.001171 wd 0.0500 time 0.4428 (0.4485) data time 0.0010 (0.0044) model time 0.4418 (0.4432) loss 3.1171 (3.3788) grad_norm 1.0730 (1.3507) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][130/625] eta 0:03:42 lr 0.001170 wd 0.0500 time 0.6138 (0.4494) data time 0.0007 (0.0041) model time 0.6131 (0.4451) loss 3.2945 (3.3875) grad_norm 1.3775 (1.3571) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][140/625] eta 0:03:37 lr 0.001170 wd 0.0500 time 0.4466 (0.4489) data time 0.0007 (0.0039) model time 0.4459 (0.4448) loss 3.8943 (3.3897) grad_norm 1.4476 (1.3625) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][150/625] eta 0:03:33 lr 0.001170 wd 0.0500 time 0.4478 (0.4487) data time 0.0009 (0.0037) model time 0.4470 (0.4448) loss 3.5799 (3.3817) grad_norm 1.9033 (1.3674) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:03:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][160/625] eta 0:03:28 lr 0.001170 wd 0.0500 time 0.4520 (0.4485) data time 0.0008 (0.0035) model time 0.4511 (0.4448) loss 3.2334 (3.3777) grad_norm 1.0740 (1.3649) loss_scale 8192.0000 (8192.0000) mem 16703MB 
[2024-08-04 16:03:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][170/625] eta 0:03:23 lr 0.001170 wd 0.0500 time 0.4416 (0.4482) data time 0.0008 (0.0034) model time 0.4407 (0.4445) loss 3.1199 (3.3786) grad_norm 1.2337 (1.3789) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][180/625] eta 0:03:19 lr 0.001170 wd 0.0500 time 0.4534 (0.4480) data time 0.0008 (0.0032) model time 0.4526 (0.4445) loss 3.6364 (3.3814) grad_norm 1.1926 (1.3816) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][190/625] eta 0:03:14 lr 0.001170 wd 0.0500 time 0.4448 (0.4480) data time 0.0008 (0.0031) model time 0.4440 (0.4446) loss 3.0833 (3.3760) grad_norm 1.3961 (1.3838) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][200/625] eta 0:03:10 lr 0.001170 wd 0.0500 time 0.4433 (0.4477) data time 0.0008 (0.0030) model time 0.4425 (0.4445) loss 3.5938 (3.3692) grad_norm 1.2163 (1.3813) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][210/625] eta 0:03:05 lr 0.001170 wd 0.0500 time 0.4413 (0.4475) data time 0.0008 (0.0029) model time 0.4405 (0.4444) loss 3.4724 (3.3822) grad_norm 1.0368 (1.3792) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][220/625] eta 0:03:01 lr 0.001170 wd 0.0500 time 0.4465 (0.4481) data time 0.0010 (0.0028) model time 0.4455 (0.4453) loss 3.6065 (3.3806) grad_norm 1.9010 (1.3804) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][230/625] eta 0:02:56 lr 0.001170 wd 0.0500 time 0.4521 (0.4480) data time 0.0008 (0.0027) model time 0.4513 (0.4452) loss 2.6733 
(3.3899) grad_norm 1.3219 (1.3721) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][240/625] eta 0:02:52 lr 0.001170 wd 0.0500 time 0.4419 (0.4478) data time 0.0006 (0.0026) model time 0.4413 (0.4451) loss 3.0788 (3.3916) grad_norm 2.0758 (1.3738) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][250/625] eta 0:02:47 lr 0.001170 wd 0.0500 time 0.4395 (0.4475) data time 0.0006 (0.0026) model time 0.4389 (0.4449) loss 3.3603 (3.3890) grad_norm 0.9045 (1.3667) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][260/625] eta 0:02:43 lr 0.001170 wd 0.0500 time 0.4485 (0.4473) data time 0.0006 (0.0025) model time 0.4479 (0.4447) loss 4.2687 (3.3972) grad_norm 1.2957 (1.3630) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][270/625] eta 0:02:38 lr 0.001170 wd 0.0500 time 0.4497 (0.4478) data time 0.0006 (0.0024) model time 0.4491 (0.4453) loss 3.2631 (3.3990) grad_norm 1.4553 (1.3680) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][280/625] eta 0:02:34 lr 0.001170 wd 0.0500 time 0.4428 (0.4476) data time 0.0009 (0.0024) model time 0.4419 (0.4451) loss 3.6678 (3.4094) grad_norm 0.9152 (1.3619) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][290/625] eta 0:02:29 lr 0.001170 wd 0.0500 time 0.4419 (0.4474) data time 0.0009 (0.0023) model time 0.4410 (0.4450) loss 3.5265 (3.4112) grad_norm 1.0939 (1.3633) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][300/625] eta 0:02:25 lr 0.001170 wd 0.0500 time 
0.4424 (0.4472) data time 0.0008 (0.0023) model time 0.4416 (0.4448) loss 3.6476 (3.4191) grad_norm 1.0444 (1.3597) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:04:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][310/625] eta 0:02:20 lr 0.001170 wd 0.0500 time 0.4423 (0.4470) data time 0.0008 (0.0022) model time 0.4415 (0.4446) loss 3.7883 (3.4145) grad_norm 1.7023 (1.3604) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][320/625] eta 0:02:16 lr 0.001170 wd 0.0500 time 0.4459 (0.4468) data time 0.0008 (0.0022) model time 0.4451 (0.4445) loss 3.8724 (3.4208) grad_norm 1.1299 (1.3591) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][330/625] eta 0:02:11 lr 0.001170 wd 0.0500 time 0.4394 (0.4467) data time 0.0007 (0.0021) model time 0.4388 (0.4444) loss 4.1330 (3.4174) grad_norm 1.0838 (1.3566) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][340/625] eta 0:02:07 lr 0.001170 wd 0.0500 time 0.4401 (0.4465) data time 0.0008 (0.0021) model time 0.4393 (0.4442) loss 3.6122 (3.4186) grad_norm 1.5487 (1.3619) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][350/625] eta 0:02:02 lr 0.001170 wd 0.0500 time 0.6149 (0.4469) data time 0.0008 (0.0021) model time 0.6141 (0.4447) loss 3.3478 (3.4116) grad_norm 1.3309 (1.3605) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][360/625] eta 0:01:58 lr 0.001170 wd 0.0500 time 0.4401 (0.4468) data time 0.0008 (0.0020) model time 0.4392 (0.4446) loss 1.9891 (3.4049) grad_norm 1.5944 (1.3615) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:26 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [48/300][370/625] eta 0:01:53 lr 0.001170 wd 0.0500 time 0.4403 (0.4466) data time 0.0009 (0.0020) model time 0.4394 (0.4445) loss 3.6401 (3.4118) grad_norm 2.2132 (1.3601) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][380/625] eta 0:01:49 lr 0.001170 wd 0.0500 time 0.4409 (0.4465) data time 0.0006 (0.0020) model time 0.4403 (0.4444) loss 3.8795 (3.4146) grad_norm 1.2060 (1.3640) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][390/625] eta 0:01:44 lr 0.001170 wd 0.0500 time 0.4439 (0.4464) data time 0.0009 (0.0019) model time 0.4431 (0.4443) loss 3.4486 (3.4170) grad_norm 1.3100 (1.3636) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][400/625] eta 0:01:40 lr 0.001170 wd 0.0500 time 0.4393 (0.4463) data time 0.0009 (0.0019) model time 0.4385 (0.4442) loss 3.0354 (3.4133) grad_norm 1.3924 (1.3642) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][410/625] eta 0:01:35 lr 0.001170 wd 0.0500 time 0.4418 (0.4462) data time 0.0007 (0.0019) model time 0.4412 (0.4441) loss 4.5097 (3.4180) grad_norm 1.2679 (1.3669) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][420/625] eta 0:01:31 lr 0.001170 wd 0.0500 time 0.4425 (0.4461) data time 0.0007 (0.0019) model time 0.4419 (0.4440) loss 3.0364 (3.4188) grad_norm 1.2810 (1.3643) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][430/625] eta 0:01:26 lr 0.001169 wd 0.0500 time 0.4439 (0.4460) data time 0.0008 (0.0018) model time 0.4430 (0.4440) loss 3.8203 (3.4128) grad_norm 1.5093 (1.3625) 
loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][440/625] eta 0:01:22 lr 0.001169 wd 0.0500 time 0.4421 (0.4464) data time 0.0006 (0.0018) model time 0.4415 (0.4445) loss 4.1090 (3.4164) grad_norm 1.0072 (1.3623) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][450/625] eta 0:01:18 lr 0.001169 wd 0.0500 time 0.4435 (0.4464) data time 0.0006 (0.0018) model time 0.4429 (0.4445) loss 3.2475 (3.4187) grad_norm 1.3591 (1.3623) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][460/625] eta 0:01:13 lr 0.001169 wd 0.0500 time 0.4421 (0.4463) data time 0.0008 (0.0018) model time 0.4413 (0.4444) loss 2.7735 (3.4155) grad_norm 1.9271 (1.3627) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][470/625] eta 0:01:09 lr 0.001169 wd 0.0500 time 0.4439 (0.4462) data time 0.0009 (0.0018) model time 0.4431 (0.4444) loss 2.8768 (3.4109) grad_norm 1.1663 (1.3609) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][480/625] eta 0:01:04 lr 0.001169 wd 0.0500 time 0.4407 (0.4461) data time 0.0009 (0.0017) model time 0.4397 (0.4442) loss 2.5412 (3.4142) grad_norm 1.4126 (1.3582) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][490/625] eta 0:01:00 lr 0.001169 wd 0.0500 time 0.6260 (0.4464) data time 0.0006 (0.0017) model time 0.6253 (0.4446) loss 2.4043 (3.4081) grad_norm 1.1097 (1.3589) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][500/625] eta 0:00:55 lr 0.001169 wd 0.0500 time 0.4412 (0.4465) data time 0.0008 
(0.0017) model time 0.4404 (0.4448) loss 3.8000 (3.4146) grad_norm 1.2769 (1.3575) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][510/625] eta 0:00:51 lr 0.001169 wd 0.0500 time 0.4411 (0.4464) data time 0.0007 (0.0017) model time 0.4404 (0.4447) loss 3.6734 (3.4088) grad_norm 1.7027 (1.3588) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][520/625] eta 0:00:46 lr 0.001169 wd 0.0500 time 0.4600 (0.4464) data time 0.0008 (0.0017) model time 0.4592 (0.4446) loss 3.3648 (3.4109) grad_norm 1.4277 (1.3615) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][530/625] eta 0:00:42 lr 0.001169 wd 0.0500 time 0.4462 (0.4463) data time 0.0008 (0.0017) model time 0.4455 (0.4446) loss 3.6330 (3.4059) grad_norm 1.2304 (1.3591) loss_scale 8192.0000 (8192.0000) mem 16703MB [2024-08-04 16:06:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][540/625] eta 0:00:37 lr 0.001169 wd 0.0500 time 0.4415 (0.4462) data time 0.0009 (0.0016) model time 0.4406 (0.4445) loss 3.0215 (3.4030) grad_norm 1.3291 (inf) loss_scale 4096.0000 (8176.8577) mem 16703MB [2024-08-04 16:06:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][550/625] eta 0:00:33 lr 0.001169 wd 0.0500 time 0.4386 (0.4461) data time 0.0006 (0.0016) model time 0.4380 (0.4444) loss 2.8908 (3.4008) grad_norm 1.1838 (inf) loss_scale 4096.0000 (8102.7949) mem 16703MB [2024-08-04 16:06:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][560/625] eta 0:00:28 lr 0.001169 wd 0.0500 time 0.4424 (0.4461) data time 0.0009 (0.0016) model time 0.4416 (0.4444) loss 2.9912 (3.4016) grad_norm 1.0035 (inf) loss_scale 4096.0000 (8031.3725) mem 16703MB [2024-08-04 16:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][570/625] 
eta 0:00:24 lr 0.001169 wd 0.0500 time 0.4457 (0.4460) data time 0.0008 (0.0016) model time 0.4449 (0.4443) loss 3.1028 (3.4048) grad_norm 1.3470 (inf) loss_scale 4096.0000 (7962.4518) mem 16703MB [2024-08-04 16:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][580/625] eta 0:00:20 lr 0.001169 wd 0.0500 time 0.4395 (0.4459) data time 0.0009 (0.0016) model time 0.4386 (0.4443) loss 3.5650 (3.4022) grad_norm 1.1170 (inf) loss_scale 4096.0000 (7895.9036) mem 16703MB [2024-08-04 16:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][590/625] eta 0:00:15 lr 0.001169 wd 0.0500 time 0.4424 (0.4463) data time 0.0006 (0.0016) model time 0.4417 (0.4446) loss 3.1354 (3.3987) grad_norm 1.5903 (inf) loss_scale 4096.0000 (7831.6074) mem 16703MB [2024-08-04 16:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][600/625] eta 0:00:11 lr 0.001169 wd 0.0500 time 0.4441 (0.4462) data time 0.0006 (0.0016) model time 0.4435 (0.4446) loss 2.3177 (3.3945) grad_norm 1.2894 (inf) loss_scale 4096.0000 (7769.4509) mem 16703MB [2024-08-04 16:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][610/625] eta 0:00:06 lr 0.001169 wd 0.0500 time 0.4460 (0.4462) data time 0.0006 (0.0015) model time 0.4454 (0.4446) loss 3.3636 (3.3937) grad_norm 1.3065 (inf) loss_scale 4096.0000 (7709.3290) mem 16703MB [2024-08-04 16:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [48/300][620/625] eta 0:00:02 lr 0.001169 wd 0.0500 time 0.4348 (0.4461) data time 0.0004 (0.0015) model time 0.4344 (0.4444) loss 3.5522 (3.4002) grad_norm 1.3047 (inf) loss_scale 4096.0000 (7651.1433) mem 16703MB [2024-08-04 16:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 48 training takes 0:04:38 [2024-08-04 16:07:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-04 16:07:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-04 16:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.6216 (0.6216) Acc@1 86.230 (86.230) Acc@5 97.705 (97.705) Mem 16703MB
[2024-08-04 16:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 1.0674 (0.8062) Acc@1 75.049 (81.641) Acc@5 92.920 (96.222) Mem 16703MB
[2024-08-04 16:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.2178 (0.9647) Acc@1 69.971 (77.653) Acc@5 91.504 (94.275) Mem 16703MB
[2024-08-04 16:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.405 Acc@5 94.274
[2024-08-04 16:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.4%
[2024-08-04 16:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.848 (0.848) Loss 0.5518 (0.5518) Acc@1 86.572 (86.572) Acc@5 97.998 (97.998) Mem 16703MB
[2024-08-04 16:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.188) Loss 0.9526 (0.6971) Acc@1 75.977 (82.719) Acc@5 94.043 (96.635) Mem 16703MB
[2024-08-04 16:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.153) Loss 1.1191 (0.8596) Acc@1 71.680 (78.848) Acc@5 91.602 (94.689) Mem 16703MB
[2024-08-04 16:07:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.609 Acc@5 94.672
[2024-08-04 16:07:28 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.6%
[2024-08-04 16:07:28 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.61%
[2024-08-04 16:07:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 16:07:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-04 16:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][0/625] eta 0:07:29 lr 0.001169 wd 0.0500 time 0.7186 (0.7186) data time 0.3268 (0.3268) model time 0.0000 (0.0000) loss 4.0551 (4.0551) grad_norm 1.1043 (1.1043) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:07:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][10/625] eta 0:04:47 lr 0.001169 wd 0.0500 time 0.4426 (0.4677) data time 0.0006 (0.0304) model time 0.0000 (0.0000) loss 2.9299 (3.4296) grad_norm 1.1818 (1.2076) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:07:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][20/625] eta 0:04:36 lr 0.001169 wd 0.0500 time 0.4392 (0.4565) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 3.7818 (3.4654) grad_norm 1.2879 (1.2219) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][30/625] eta 0:04:29 lr 0.001169 wd 0.0500 time 0.4439 (0.4527) data time 0.0007 (0.0113) model time 0.0000 (0.0000) loss 2.4305 (3.3977) grad_norm 1.5125 (1.2497) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][40/625] eta 0:04:23 lr 0.001169 wd 0.0500 time 0.4516 (0.4507) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 4.5317 (3.4417) grad_norm 1.8439 (1.3221) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][50/625] eta 0:04:18 lr 0.001169 wd 0.0500 time 0.4433 (0.4490) data time 0.0006 (0.0072) model time 0.0000 (0.0000) loss 2.4148 (3.4325) grad_norm 1.1949 (1.4151) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [49/300][60/625] eta 0:04:13 lr 0.001169 wd 0.0500 time 0.4401 (0.4481) data time 0.0006 (0.0062) model time 0.4395 (0.4431) loss 3.5000 (3.4374) grad_norm 1.5762 (1.4323) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][70/625] eta 0:04:08 lr 0.001169 wd 0.0500 time 0.4460 (0.4474) data time 0.0008 (0.0054) model time 0.4452 (0.4427) loss 4.0575 (3.4694) grad_norm 0.9736 (1.4293) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][80/625] eta 0:04:03 lr 0.001169 wd 0.0500 time 0.4436 (0.4469) data time 0.0009 (0.0048) model time 0.4427 (0.4427) loss 3.7528 (3.4828) grad_norm 1.2992 (1.4008) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][90/625] eta 0:04:00 lr 0.001169 wd 0.0500 time 0.6106 (0.4504) data time 0.0006 (0.0044) model time 0.6099 (0.4514) loss 2.8712 (3.4501) grad_norm 1.6843 (1.4264) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][100/625] eta 0:03:55 lr 0.001168 wd 0.0500 time 0.4370 (0.4495) data time 0.0007 (0.0041) model time 0.4363 (0.4492) loss 4.1315 (3.4348) grad_norm 1.1965 (1.4208) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][110/625] eta 0:03:51 lr 0.001168 wd 0.0500 time 0.4458 (0.4490) data time 0.0006 (0.0038) model time 0.4452 (0.4482) loss 3.4211 (3.3968) grad_norm 1.0267 (1.4083) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][120/625] eta 0:03:46 lr 0.001168 wd 0.0500 time 0.4453 (0.4485) data time 0.0006 (0.0035) model time 0.4447 (0.4474) loss 2.8413 (3.4045) grad_norm 1.5642 (1.4358) loss_scale 4096.0000 (4096.0000) mem 
16703MB [2024-08-04 16:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][130/625] eta 0:03:43 lr 0.001168 wd 0.0500 time 0.4436 (0.4507) data time 0.0009 (0.0033) model time 0.4427 (0.4510) loss 3.8874 (3.3867) grad_norm 1.3283 (1.4392) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][140/625] eta 0:03:38 lr 0.001168 wd 0.0500 time 0.4395 (0.4502) data time 0.0009 (0.0031) model time 0.4386 (0.4501) loss 3.2108 (3.3920) grad_norm 1.1601 (1.4183) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][150/625] eta 0:03:33 lr 0.001168 wd 0.0500 time 0.4429 (0.4497) data time 0.0007 (0.0030) model time 0.4423 (0.4493) loss 4.3198 (3.3892) grad_norm 1.3426 (1.4088) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][160/625] eta 0:03:28 lr 0.001168 wd 0.0500 time 0.4449 (0.4493) data time 0.0008 (0.0028) model time 0.4441 (0.4487) loss 3.2099 (3.3850) grad_norm 1.2906 (1.4075) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][170/625] eta 0:03:24 lr 0.001168 wd 0.0500 time 0.4408 (0.4489) data time 0.0006 (0.0027) model time 0.4402 (0.4481) loss 3.5219 (3.3785) grad_norm 1.3754 (1.4055) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][180/625] eta 0:03:19 lr 0.001168 wd 0.0500 time 0.4405 (0.4486) data time 0.0008 (0.0026) model time 0.4397 (0.4476) loss 3.5358 (3.3895) grad_norm 1.2697 (1.4060) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][190/625] eta 0:03:15 lr 0.001168 wd 0.0500 time 0.4421 (0.4483) data time 0.0011 (0.0025) model time 0.4410 (0.4472) loss 
3.5914 (3.3875) grad_norm 1.3471 (1.4026) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:08:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][200/625] eta 0:03:10 lr 0.001168 wd 0.0500 time 0.4445 (0.4480) data time 0.0006 (0.0024) model time 0.4439 (0.4468) loss 3.9986 (3.4016) grad_norm 1.1037 (1.4029) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][210/625] eta 0:03:05 lr 0.001168 wd 0.0500 time 0.4427 (0.4477) data time 0.0007 (0.0024) model time 0.4420 (0.4465) loss 2.9402 (3.4002) grad_norm 1.1107 (1.4102) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][220/625] eta 0:03:01 lr 0.001168 wd 0.0500 time 0.4353 (0.4475) data time 0.0006 (0.0023) model time 0.4346 (0.4462) loss 4.6733 (3.4044) grad_norm 1.3271 (1.4059) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][230/625] eta 0:02:56 lr 0.001168 wd 0.0500 time 0.4442 (0.4473) data time 0.0008 (0.0022) model time 0.4434 (0.4461) loss 3.6150 (3.4012) grad_norm 1.8608 (1.4145) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][240/625] eta 0:02:52 lr 0.001168 wd 0.0500 time 0.4445 (0.4472) data time 0.0008 (0.0022) model time 0.4437 (0.4459) loss 3.6474 (3.4104) grad_norm 1.2501 (1.4100) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][250/625] eta 0:02:47 lr 0.001168 wd 0.0500 time 0.4416 (0.4470) data time 0.0008 (0.0021) model time 0.4408 (0.4457) loss 3.5641 (3.4019) grad_norm 1.4864 (1.4119) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][260/625] eta 0:02:43 lr 0.001168 wd 0.0500 
time 0.4408 (0.4468) data time 0.0008 (0.0021) model time 0.4400 (0.4455) loss 3.8136 (3.3912) grad_norm 1.1682 (1.4048) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][270/625] eta 0:02:38 lr 0.001168 wd 0.0500 time 0.4432 (0.4466) data time 0.0006 (0.0020) model time 0.4425 (0.4453) loss 4.4430 (3.3950) grad_norm 2.1164 (1.4062) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][280/625] eta 0:02:34 lr 0.001168 wd 0.0500 time 0.4486 (0.4465) data time 0.0008 (0.0020) model time 0.4478 (0.4452) loss 3.9745 (3.3999) grad_norm 1.5200 (1.4066) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][290/625] eta 0:02:29 lr 0.001168 wd 0.0500 time 0.4403 (0.4463) data time 0.0007 (0.0019) model time 0.4396 (0.4449) loss 2.7685 (3.3903) grad_norm 1.2354 (1.4066) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][300/625] eta 0:02:25 lr 0.001168 wd 0.0500 time 0.4518 (0.4462) data time 0.0008 (0.0019) model time 0.4510 (0.4449) loss 3.0197 (3.3900) grad_norm 1.5752 (1.4021) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][310/625] eta 0:02:20 lr 0.001168 wd 0.0500 time 0.4433 (0.4461) data time 0.0006 (0.0019) model time 0.4427 (0.4448) loss 4.1457 (3.3916) grad_norm 1.5335 (1.3975) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][320/625] eta 0:02:16 lr 0.001168 wd 0.0500 time 0.4434 (0.4461) data time 0.0006 (0.0018) model time 0.4428 (0.4447) loss 4.1212 (3.3936) grad_norm 1.1782 (1.3983) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:09:57 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [49/300][330/625] eta 0:02:11 lr 0.001168 wd 0.0500 time 0.4409 (0.4460) data time 0.0008 (0.0018) model time 0.4401 (0.4446) loss 3.4995 (3.3847) grad_norm 0.9705 (1.3948) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][340/625] eta 0:02:07 lr 0.001168 wd 0.0500 time 0.4426 (0.4458) data time 0.0008 (0.0018) model time 0.4419 (0.4445) loss 3.7235 (3.3907) grad_norm 1.1733 (1.3943) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][350/625] eta 0:02:02 lr 0.001168 wd 0.0500 time 0.4401 (0.4458) data time 0.0007 (0.0017) model time 0.4394 (0.4444) loss 2.0422 (3.3912) grad_norm 1.5055 (1.3928) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][360/625] eta 0:01:58 lr 0.001168 wd 0.0500 time 0.4415 (0.4457) data time 0.0006 (0.0017) model time 0.4409 (0.4443) loss 2.3240 (3.3850) grad_norm 1.4291 (1.3912) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][370/625] eta 0:01:53 lr 0.001168 wd 0.0500 time 0.4388 (0.4456) data time 0.0008 (0.0017) model time 0.4380 (0.4443) loss 3.8635 (3.3865) grad_norm 1.9939 (1.3921) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][380/625] eta 0:01:49 lr 0.001168 wd 0.0500 time 0.4442 (0.4456) data time 0.0007 (0.0017) model time 0.4436 (0.4443) loss 4.0119 (3.3882) grad_norm 1.1984 (1.3894) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][390/625] eta 0:01:44 lr 0.001167 wd 0.0500 time 0.4438 (0.4456) data time 0.0008 (0.0016) model time 0.4430 (0.4443) loss 3.5918 (3.3891) grad_norm 1.5693 (1.3924) 
loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][400/625] eta 0:01:40 lr 0.001167 wd 0.0500 time 0.4440 (0.4455) data time 0.0006 (0.0016) model time 0.4434 (0.4442) loss 2.6685 (3.3897) grad_norm 1.2153 (1.3975) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][410/625] eta 0:01:35 lr 0.001167 wd 0.0500 time 0.4390 (0.4454) data time 0.0009 (0.0016) model time 0.4381 (0.4441) loss 3.2188 (3.3839) grad_norm 2.0410 (1.3999) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][420/625] eta 0:01:31 lr 0.001167 wd 0.0500 time 0.4410 (0.4454) data time 0.0006 (0.0016) model time 0.4404 (0.4441) loss 3.2387 (3.3868) grad_norm 1.2974 (1.3998) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][430/625] eta 0:01:26 lr 0.001167 wd 0.0500 time 0.4454 (0.4458) data time 0.0009 (0.0016) model time 0.4445 (0.4445) loss 3.6331 (3.3884) grad_norm 1.2671 (1.4004) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][440/625] eta 0:01:22 lr 0.001167 wd 0.0500 time 0.4423 (0.4457) data time 0.0006 (0.0016) model time 0.4417 (0.4445) loss 4.1374 (3.3887) grad_norm 1.2047 (1.3976) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][450/625] eta 0:01:17 lr 0.001167 wd 0.0500 time 0.4439 (0.4457) data time 0.0006 (0.0015) model time 0.4433 (0.4444) loss 3.1488 (3.3886) grad_norm 1.3422 (1.3927) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][460/625] eta 0:01:13 lr 0.001167 wd 0.0500 time 0.4395 (0.4459) data time 0.0006 
(0.0015) model time 0.4389 (0.4447) loss 4.0466 (3.3893) grad_norm 1.3747 (1.3894) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][470/625] eta 0:01:09 lr 0.001167 wd 0.0500 time 0.4423 (0.4463) data time 0.0009 (0.0015) model time 0.4414 (0.4452) loss 3.0496 (3.3905) grad_norm 1.1353 (1.3901) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][480/625] eta 0:01:04 lr 0.001167 wd 0.0500 time 0.4410 (0.4462) data time 0.0009 (0.0015) model time 0.4401 (0.4450) loss 3.0477 (3.3897) grad_norm 1.1337 (1.3895) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][490/625] eta 0:01:00 lr 0.001167 wd 0.0500 time 0.4427 (0.4461) data time 0.0006 (0.0015) model time 0.4421 (0.4450) loss 4.4001 (3.3955) grad_norm 1.3662 (1.3882) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][500/625] eta 0:00:55 lr 0.001167 wd 0.0500 time 0.4439 (0.4461) data time 0.0006 (0.0015) model time 0.4433 (0.4449) loss 3.5901 (3.3998) grad_norm 1.5602 (1.3940) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][510/625] eta 0:00:51 lr 0.001167 wd 0.0500 time 0.4427 (0.4460) data time 0.0007 (0.0014) model time 0.4420 (0.4449) loss 2.6176 (3.4019) grad_norm 1.4637 (1.3901) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][520/625] eta 0:00:46 lr 0.001167 wd 0.0500 time 0.4430 (0.4460) data time 0.0008 (0.0014) model time 0.4422 (0.4448) loss 2.9102 (3.3978) grad_norm 1.1311 (1.3909) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[49/300][530/625] eta 0:00:42 lr 0.001167 wd 0.0500 time 0.4431 (0.4459) data time 0.0008 (0.0014) model time 0.4422 (0.4448) loss 3.7399 (3.3989) grad_norm 1.4182 (1.3898) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][540/625] eta 0:00:37 lr 0.001167 wd 0.0500 time 0.4462 (0.4459) data time 0.0009 (0.0014) model time 0.4452 (0.4448) loss 3.2916 (3.3966) grad_norm 1.4463 (1.3954) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][550/625] eta 0:00:33 lr 0.001167 wd 0.0500 time 0.4443 (0.4459) data time 0.0009 (0.0014) model time 0.4435 (0.4448) loss 3.6904 (3.3976) grad_norm 1.6531 (1.4033) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][560/625] eta 0:00:28 lr 0.001167 wd 0.0500 time 0.4424 (0.4459) data time 0.0006 (0.0014) model time 0.4418 (0.4448) loss 3.1534 (3.3990) grad_norm 1.1878 (1.4005) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][570/625] eta 0:00:24 lr 0.001167 wd 0.0500 time 0.4420 (0.4458) data time 0.0006 (0.0014) model time 0.4415 (0.4447) loss 3.2139 (3.3986) grad_norm 0.9347 (1.3986) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][580/625] eta 0:00:20 lr 0.001167 wd 0.0500 time 0.4415 (0.4458) data time 0.0006 (0.0014) model time 0.4409 (0.4446) loss 3.8486 (3.3999) grad_norm 1.3294 (1.3986) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][590/625] eta 0:00:15 lr 0.001167 wd 0.0500 time 0.4420 (0.4457) data time 0.0008 (0.0014) model time 0.4412 (0.4446) loss 2.3482 (3.3983) grad_norm 2.1392 (1.3970) loss_scale 4096.0000 (4096.0000) mem 16703MB 
[2024-08-04 16:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][600/625] eta 0:00:11 lr 0.001167 wd 0.0500 time 0.4450 (0.4456) data time 0.0006 (0.0014) model time 0.4444 (0.4445) loss 3.8172 (3.3977) grad_norm 1.4478 (1.3968) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][610/625] eta 0:00:06 lr 0.001167 wd 0.0500 time 0.4416 (0.4456) data time 0.0004 (0.0014) model time 0.4412 (0.4445) loss 3.9671 (3.4030) grad_norm 0.9500 (1.3962) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [49/300][620/625] eta 0:00:02 lr 0.001167 wd 0.0500 time 0.4384 (0.4457) data time 0.0004 (0.0013) model time 0.4380 (0.4446) loss 3.7497 (3.4015) grad_norm 1.5022 (1.3967) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 49 training takes 0:04:38 [2024-08-04 16:12:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:12:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 16:12:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.6616 (0.6616) Acc@1 84.961 (84.961) Acc@5 97.754 (97.754) Mem 16703MB [2024-08-04 16:12:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 1.0547 (0.8058) Acc@1 74.707 (81.721) Acc@5 93.555 (96.391) Mem 16703MB [2024-08-04 16:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1953 (0.9622) Acc@1 71.240 (77.944) Acc@5 91.357 (94.396) Mem 16703MB [2024-08-04 16:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.775 Acc@5 94.428 [2024-08-04 16:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.8% [2024-08-04 16:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.78% [2024-08-04 16:12:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 16:12:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 16:12:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5498 (0.5498) Acc@1 86.670 (86.670) Acc@5 97.900 (97.900) Mem 16703MB [2024-08-04 16:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.9458 (0.6950) Acc@1 76.221 (82.852) Acc@5 94.238 (96.697) Mem 16703MB [2024-08-04 16:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1143 (0.8555) Acc@1 71.680 (79.004) Acc@5 91.748 (94.792) Mem 16703MB [2024-08-04 16:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.769 Acc@5 94.776 [2024-08-04 16:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-08-04 16:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.77% [2024-08-04 16:12:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:12:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:12:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][0/625] eta 0:07:29 lr 0.001167 wd 0.0500 time 0.7187 (0.7187) data time 0.3355 (0.3355) model time 0.0000 (0.0000) loss 3.4993 (3.4993) grad_norm 1.4789 (1.4789) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][10/625] eta 0:04:48 lr 0.001167 wd 0.0500 time 0.4430 (0.4685) data time 0.0006 (0.0312) model time 0.0000 (0.0000) loss 3.2299 (3.3365) grad_norm 1.4137 (1.2326) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][20/625] eta 0:04:36 lr 0.001167 wd 0.0500 time 0.4418 (0.4564) data time 0.0009 (0.0168) model time 0.0000 (0.0000) loss 3.7762 (3.4372) grad_norm 1.1921 (1.2634) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][30/625] eta 0:04:29 lr 0.001167 wd 0.0500 time 0.4417 (0.4525) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 3.6211 (3.4395) grad_norm 1.4188 (1.2988) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][40/625] eta 0:04:23 lr 0.001167 wd 0.0500 time 0.4398 (0.4500) data time 0.0008 (0.0090) model time 0.0000 (0.0000) loss 3.6986 (3.4876) grad_norm 1.1055 (1.2906) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][50/625] eta 0:04:20 lr 0.001166 wd 0.0500 time 0.4408 (0.4530) data time 0.0009 (0.0074) model time 0.0000 (0.0000) loss 3.4265 (3.4550) grad_norm 1.2714 (1.2947) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][60/625] eta 0:04:15 lr 0.001166 wd 0.0500 time 0.4452 (0.4514) data time 0.0007 (0.0063) model time 0.4446 (0.4425) loss 3.5464 (3.4812) 
grad_norm 1.1329 (1.2759) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][70/625] eta 0:04:09 lr 0.001166 wd 0.0500 time 0.4424 (0.4501) data time 0.0008 (0.0055) model time 0.4416 (0.4422) loss 3.3956 (3.4835) grad_norm 1.8108 (1.3361) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:12:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][80/625] eta 0:04:04 lr 0.001166 wd 0.0500 time 0.4431 (0.4493) data time 0.0006 (0.0050) model time 0.4424 (0.4422) loss 2.8779 (3.4711) grad_norm 1.0888 (1.3361) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][90/625] eta 0:03:59 lr 0.001166 wd 0.0500 time 0.4397 (0.4484) data time 0.0006 (0.0045) model time 0.4391 (0.4419) loss 4.2565 (3.4947) grad_norm 1.5653 (1.3709) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][100/625] eta 0:03:55 lr 0.001166 wd 0.0500 time 0.4453 (0.4481) data time 0.0006 (0.0041) model time 0.4447 (0.4423) loss 3.8849 (3.5181) grad_norm 1.1267 (1.3970) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][110/625] eta 0:03:50 lr 0.001166 wd 0.0500 time 0.4427 (0.4476) data time 0.0008 (0.0038) model time 0.4418 (0.4423) loss 3.0339 (3.4972) grad_norm 1.7212 (1.3993) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][120/625] eta 0:03:46 lr 0.001166 wd 0.0500 time 0.4414 (0.4487) data time 0.0006 (0.0036) model time 0.4408 (0.4447) loss 3.9446 (3.4987) grad_norm 1.2646 (1.4015) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][130/625] eta 0:03:41 lr 0.001166 wd 0.0500 time 0.4448 
(0.4483) data time 0.0007 (0.0034) model time 0.4441 (0.4445) loss 4.0237 (3.4727) grad_norm 1.5011 (1.3940) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][140/625] eta 0:03:37 lr 0.001166 wd 0.0500 time 0.4453 (0.4494) data time 0.0010 (0.0032) model time 0.4443 (0.4466) loss 3.7880 (3.4718) grad_norm 1.5538 (1.3916) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][150/625] eta 0:03:33 lr 0.001166 wd 0.0500 time 0.4425 (0.4490) data time 0.0006 (0.0030) model time 0.4419 (0.4462) loss 3.2622 (3.4635) grad_norm 1.0945 (1.3885) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][160/625] eta 0:03:28 lr 0.001166 wd 0.0500 time 0.4419 (0.4486) data time 0.0007 (0.0029) model time 0.4412 (0.4458) loss 4.2037 (3.4544) grad_norm 1.0928 (1.3802) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][170/625] eta 0:03:23 lr 0.001166 wd 0.0500 time 0.4438 (0.4483) data time 0.0008 (0.0028) model time 0.4429 (0.4455) loss 3.2319 (3.4422) grad_norm 1.0302 (1.3795) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][180/625] eta 0:03:19 lr 0.001166 wd 0.0500 time 0.4459 (0.4480) data time 0.0010 (0.0027) model time 0.4449 (0.4453) loss 3.5328 (3.4344) grad_norm 1.7285 (1.3736) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][190/625] eta 0:03:14 lr 0.001166 wd 0.0500 time 0.4442 (0.4479) data time 0.0009 (0.0026) model time 0.4433 (0.4452) loss 2.3668 (3.4281) grad_norm 1.1145 (1.3701) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [50/300][200/625] eta 0:03:10 lr 0.001166 wd 0.0500 time 0.4439 (0.4478) data time 0.0008 (0.0025) model time 0.4431 (0.4451) loss 3.5399 (3.4298) grad_norm 2.4468 (1.3949) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][210/625] eta 0:03:05 lr 0.001166 wd 0.0500 time 0.4411 (0.4476) data time 0.0008 (0.0024) model time 0.4402 (0.4450) loss 3.9358 (3.4309) grad_norm 1.7138 (1.3923) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][220/625] eta 0:03:01 lr 0.001166 wd 0.0500 time 0.4400 (0.4474) data time 0.0009 (0.0023) model time 0.4391 (0.4448) loss 2.8893 (3.4285) grad_norm 1.2781 (1.3880) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][230/625] eta 0:02:56 lr 0.001166 wd 0.0500 time 0.4440 (0.4472) data time 0.0006 (0.0023) model time 0.4434 (0.4448) loss 3.5321 (3.4231) grad_norm 1.2034 (1.3840) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][240/625] eta 0:02:52 lr 0.001166 wd 0.0500 time 0.4449 (0.4477) data time 0.0007 (0.0022) model time 0.4443 (0.4455) loss 3.5657 (3.4206) grad_norm 1.4513 (1.3878) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][250/625] eta 0:02:47 lr 0.001166 wd 0.0500 time 0.4428 (0.4476) data time 0.0006 (0.0022) model time 0.4421 (0.4454) loss 2.8271 (3.4136) grad_norm 1.2199 (1.3849) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][260/625] eta 0:02:43 lr 0.001166 wd 0.0500 time 0.4455 (0.4475) data time 0.0007 (0.0021) model time 0.4448 (0.4454) loss 2.5475 (3.4058) grad_norm 1.1105 (1.3828) loss_scale 4096.0000 
(4096.0000) mem 16703MB [2024-08-04 16:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][270/625] eta 0:02:38 lr 0.001166 wd 0.0500 time 0.4568 (0.4475) data time 0.0008 (0.0021) model time 0.4560 (0.4454) loss 2.6756 (3.3945) grad_norm 1.4889 (1.3874) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][280/625] eta 0:02:34 lr 0.001166 wd 0.0500 time 0.4428 (0.4474) data time 0.0008 (0.0020) model time 0.4420 (0.4453) loss 3.5890 (3.3962) grad_norm 1.4836 (1.3858) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][290/625] eta 0:02:29 lr 0.001166 wd 0.0500 time 0.4453 (0.4472) data time 0.0009 (0.0020) model time 0.4444 (0.4451) loss 3.6386 (3.3967) grad_norm 1.2206 (1.3884) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][300/625] eta 0:02:25 lr 0.001166 wd 0.0500 time 0.4412 (0.4470) data time 0.0006 (0.0019) model time 0.4406 (0.4450) loss 4.4276 (3.3947) grad_norm 1.5849 (1.3872) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][310/625] eta 0:02:20 lr 0.001166 wd 0.0500 time 0.4388 (0.4468) data time 0.0008 (0.0019) model time 0.4380 (0.4448) loss 3.5643 (3.3883) grad_norm 1.2769 (1.3884) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][320/625] eta 0:02:16 lr 0.001166 wd 0.0500 time 0.4438 (0.4468) data time 0.0006 (0.0019) model time 0.4432 (0.4448) loss 3.8753 (3.3934) grad_norm 1.2062 (1.3902) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][330/625] eta 0:02:11 lr 0.001165 wd 0.0500 time 0.4438 (0.4467) data time 0.0006 (0.0018) model time 
0.4433 (0.4447) loss 3.9581 (3.3926) grad_norm 1.2811 (1.3867) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][340/625] eta 0:02:07 lr 0.001165 wd 0.0500 time 0.4460 (0.4467) data time 0.0008 (0.0018) model time 0.4452 (0.4447) loss 3.6171 (3.4015) grad_norm 1.6917 (1.3821) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][350/625] eta 0:02:02 lr 0.001165 wd 0.0500 time 0.4453 (0.4466) data time 0.0006 (0.0018) model time 0.4447 (0.4447) loss 3.7415 (3.4012) grad_norm 1.5198 (1.3842) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][360/625] eta 0:01:58 lr 0.001165 wd 0.0500 time 0.4424 (0.4465) data time 0.0006 (0.0018) model time 0.4419 (0.4446) loss 2.7836 (3.3993) grad_norm 1.0845 (1.3807) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][370/625] eta 0:01:53 lr 0.001165 wd 0.0500 time 0.4413 (0.4464) data time 0.0005 (0.0017) model time 0.4408 (0.4446) loss 3.8930 (3.4027) grad_norm 1.6451 (1.3770) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][380/625] eta 0:01:49 lr 0.001165 wd 0.0500 time 0.4459 (0.4464) data time 0.0008 (0.0017) model time 0.4451 (0.4445) loss 3.5306 (3.4113) grad_norm 1.1919 (1.3719) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][390/625] eta 0:01:44 lr 0.001165 wd 0.0500 time 0.4462 (0.4463) data time 0.0008 (0.0017) model time 0.4454 (0.4445) loss 3.4655 (3.4150) grad_norm 1.6230 (1.3722) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][400/625] eta 0:01:40 
lr 0.001165 wd 0.0500 time 0.4444 (0.4463) data time 0.0007 (0.0017) model time 0.4437 (0.4445) loss 3.2729 (3.4190) grad_norm 1.2576 (1.3780) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][410/625] eta 0:01:35 lr 0.001165 wd 0.0500 time 0.4455 (0.4462) data time 0.0006 (0.0016) model time 0.4449 (0.4445) loss 3.7721 (3.4198) grad_norm 1.2481 (1.3785) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][420/625] eta 0:01:31 lr 0.001165 wd 0.0500 time 0.4474 (0.4462) data time 0.0008 (0.0016) model time 0.4466 (0.4444) loss 2.6879 (3.4177) grad_norm 1.8016 (1.3791) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][430/625] eta 0:01:26 lr 0.001165 wd 0.0500 time 0.4418 (0.4461) data time 0.0009 (0.0016) model time 0.4408 (0.4444) loss 3.4247 (3.4183) grad_norm 1.6440 (1.3781) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][440/625] eta 0:01:22 lr 0.001165 wd 0.0500 time 0.4417 (0.4460) data time 0.0008 (0.0016) model time 0.4409 (0.4443) loss 3.4046 (3.4172) grad_norm 1.3092 (1.3804) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][450/625] eta 0:01:18 lr 0.001165 wd 0.0500 time 0.4390 (0.4463) data time 0.0007 (0.0016) model time 0.4383 (0.4446) loss 2.3522 (3.4148) grad_norm 1.0341 (1.3785) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][460/625] eta 0:01:13 lr 0.001165 wd 0.0500 time 0.4432 (0.4467) data time 0.0008 (0.0015) model time 0.4424 (0.4451) loss 2.6718 (3.4188) grad_norm 1.4414 (1.3873) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:49 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][470/625] eta 0:01:09 lr 0.001165 wd 0.0500 time 0.4444 (0.4466) data time 0.0006 (0.0015) model time 0.4438 (0.4450) loss 3.9443 (3.4187) grad_norm 1.6872 (1.3874) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][480/625] eta 0:01:04 lr 0.001165 wd 0.0500 time 0.4461 (0.4469) data time 0.0009 (0.0015) model time 0.4452 (0.4454) loss 3.7265 (3.4183) grad_norm 1.0560 (1.3868) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][490/625] eta 0:01:00 lr 0.001165 wd 0.0500 time 0.4424 (0.4469) data time 0.0006 (0.0015) model time 0.4418 (0.4454) loss 2.9453 (3.4152) grad_norm 1.8320 (1.3856) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][500/625] eta 0:00:55 lr 0.001165 wd 0.0500 time 0.4422 (0.4468) data time 0.0006 (0.0015) model time 0.4416 (0.4453) loss 4.2129 (3.4137) grad_norm 1.6175 (1.3894) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][510/625] eta 0:00:51 lr 0.001165 wd 0.0500 time 0.4444 (0.4468) data time 0.0006 (0.0015) model time 0.4438 (0.4453) loss 3.9708 (3.4140) grad_norm 1.4966 (1.3894) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][520/625] eta 0:00:46 lr 0.001165 wd 0.0500 time 0.4414 (0.4467) data time 0.0008 (0.0015) model time 0.4406 (0.4452) loss 3.5839 (3.4118) grad_norm 1.2182 (1.3849) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][530/625] eta 0:00:42 lr 0.001165 wd 0.0500 time 0.4530 (0.4466) data time 0.0005 (0.0014) model time 0.4525 (0.4451) loss 3.4338 (3.4144) grad_norm 
1.3034 (1.3849) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][540/625] eta 0:00:37 lr 0.001165 wd 0.0500 time 0.4441 (0.4466) data time 0.0008 (0.0014) model time 0.4433 (0.4451) loss 3.1424 (3.4161) grad_norm 1.1005 (1.3824) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][550/625] eta 0:00:33 lr 0.001165 wd 0.0500 time 0.4478 (0.4465) data time 0.0006 (0.0014) model time 0.4472 (0.4451) loss 3.9246 (3.4171) grad_norm 1.0261 (1.3826) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][560/625] eta 0:00:29 lr 0.001165 wd 0.0500 time 0.4418 (0.4465) data time 0.0006 (0.0014) model time 0.4412 (0.4451) loss 4.2825 (3.4181) grad_norm 1.2663 (1.3866) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][570/625] eta 0:00:24 lr 0.001165 wd 0.0500 time 0.4412 (0.4465) data time 0.0008 (0.0014) model time 0.4404 (0.4450) loss 3.7947 (3.4176) grad_norm 1.0783 (1.3856) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][580/625] eta 0:00:20 lr 0.001165 wd 0.0500 time 0.4405 (0.4464) data time 0.0008 (0.0014) model time 0.4397 (0.4450) loss 3.4240 (3.4216) grad_norm 1.1810 (1.3855) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][590/625] eta 0:00:15 lr 0.001165 wd 0.0500 time 0.4469 (0.4464) data time 0.0009 (0.0014) model time 0.4460 (0.4449) loss 3.6789 (3.4183) grad_norm 1.3228 (1.3819) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 16:16:46 
vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:16:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 16:18:24 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 16:18:25 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 16:18:32 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 16:18:48 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 16:18:48 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-04 16:18:50 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 16:18:52 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 16:18:52 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 50) [2024-08-04 16:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 16:19:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][600/625] eta 0:02:16 lr 0.001165 wd 0.0500 time 0.4418 (5.4715) data time 0.0006 (0.2002) model time 0.4412 (5.2713) loss 4.0164 (3.7343) grad_norm 1.1718 (1.2301) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:19:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][610/625] eta 0:00:28 lr 0.001164 wd 0.0500 time 0.4395 (1.8790) data time 0.0004 (0.0580) model time 0.4391 (1.8210) loss 3.7841 (3.5687) grad_norm 1.4105 (1.2476) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:19:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [50/300][620/625] eta 0:00:06 lr 0.001164 wd 0.0500 time 0.4376 (1.2782) data time 0.0007 (0.0341) model time 0.4370 (1.2441) loss 3.5682 (3.5821) grad_norm 2.0083 (1.4207) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 50 training takes 0:00:32 [2024-08-04 16:19:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:19:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 16:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.6309 (0.6309) Acc@1 85.254 (85.254) Acc@5 97.754 (97.754) Mem 16721MB [2024-08-04 16:19:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 1.0107 (0.7824) Acc@1 75.244 (81.689) Acc@5 93.799 (96.400) Mem 16721MB [2024-08-04 16:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.142) Loss 1.1992 (0.9460) Acc@1 71.436 (77.958) Acc@5 91.846 (94.399) Mem 16721MB [2024-08-04 16:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.741 Acc@5 94.362 [2024-08-04 16:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.7% [2024-08-04 16:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.142 (1.142) Loss 0.5488 (0.5488) Acc@1 86.670 (86.670) Acc@5 97.949 (97.949) Mem 16721MB [2024-08-04 16:19:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.237) Loss 0.9395 (0.6938) Acc@1 76.660 (83.034) Acc@5 94.434 (96.746) Mem 16721MB [2024-08-04 16:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.113 (0.178) Loss 1.1064 (0.8516) Acc@1 72.119 (79.174) Acc@5 91.992 (94.878) Mem 16721MB [2024-08-04 16:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.945 Acc@5 94.854 [2024-08-04 16:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-08-04 16:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 78.94% [2024-08-04 16:19:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:19:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:19:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][0/625] eta 0:11:16 lr 0.001164 wd 0.0500 time 1.0830 (1.0830) data time 0.3769 (0.3769) model time 0.0000 (0.0000) loss 3.6766 (3.6766) grad_norm 1.4140 (1.4140) loss_scale 4096.0000 (4096.0000) mem 16725MB [2024-08-04 16:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][10/625] eta 0:05:07 lr 0.001164 wd 0.0500 time 0.4414 (0.4998) data time 0.0008 (0.0350) model time 0.0000 (0.0000) loss 3.2814 (3.4821) grad_norm 1.1666 (1.3332) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][20/625] eta 0:04:46 lr 0.001164 wd 0.0500 time 0.4438 (0.4731) data time 0.0010 (0.0188) model time 0.0000 (0.0000) loss 3.3707 (3.5385) grad_norm 2.4893 (1.4015) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:19:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][30/625] eta 0:04:36 lr 0.001164 wd 0.0500 time 0.4436 (0.4641) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 2.5520 (3.4912) grad_norm 1.6968 (1.4019) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][40/625] eta 0:04:28 lr 0.001164 wd 0.0500 time 0.4441 (0.4590) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 3.3535 (3.4710) grad_norm 2.1580 (1.4331) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][50/625] eta 0:04:22 lr 0.001164 wd 0.0500 time 0.4440 (0.4562) data time 0.0010 (0.0083) model time 0.0000 (0.0000) loss 3.3757 (3.4457) grad_norm 1.0349 (1.4407) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][60/625] eta 0:04:16 lr 0.001164 wd 0.0500 time 0.4462 (0.4541) data time 0.0006 (0.0071) model time 0.4456 (0.4424) loss 3.3674 (3.4387) 
grad_norm 1.2831 (1.4536) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][70/625] eta 0:04:11 lr 0.001164 wd 0.0500 time 0.4414 (0.4525) data time 0.0010 (0.0062) model time 0.4404 (0.4421) loss 3.4480 (3.4763) grad_norm 1.2115 (1.4252) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][80/625] eta 0:04:05 lr 0.001164 wd 0.0500 time 0.4398 (0.4513) data time 0.0007 (0.0055) model time 0.4391 (0.4420) loss 3.6920 (3.4892) grad_norm 1.2272 (1.4039) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][90/625] eta 0:04:01 lr 0.001164 wd 0.0500 time 0.4429 (0.4505) data time 0.0009 (0.0050) model time 0.4420 (0.4423) loss 3.6447 (3.4862) grad_norm 1.2992 (1.3878) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][100/625] eta 0:03:56 lr 0.001164 wd 0.0500 time 0.4452 (0.4499) data time 0.0006 (0.0046) model time 0.4446 (0.4426) loss 3.1426 (3.4739) grad_norm 1.2923 (1.3916) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][110/625] eta 0:03:51 lr 0.001164 wd 0.0500 time 0.4408 (0.4494) data time 0.0008 (0.0043) model time 0.4401 (0.4427) loss 4.0635 (3.4765) grad_norm 1.0019 (1.3839) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][120/625] eta 0:03:46 lr 0.001164 wd 0.0500 time 0.4437 (0.4491) data time 0.0006 (0.0040) model time 0.4431 (0.4429) loss 3.1194 (3.4628) grad_norm 1.5564 (1.3872) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][130/625] eta 0:03:42 lr 0.001164 wd 0.0500 time 0.4422 
(0.4485) data time 0.0007 (0.0038) model time 0.4415 (0.4427) loss 3.9996 (3.4646) grad_norm 1.2290 (1.3809) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][140/625] eta 0:03:37 lr 0.001164 wd 0.0500 time 0.4431 (0.4481) data time 0.0008 (0.0036) model time 0.4423 (0.4426) loss 3.3882 (3.4622) grad_norm 1.4827 (1.3918) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][150/625] eta 0:03:32 lr 0.001164 wd 0.0500 time 0.4496 (0.4479) data time 0.0007 (0.0034) model time 0.4489 (0.4427) loss 3.5295 (3.4390) grad_norm 1.4123 (1.3989) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:20:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][160/625] eta 0:03:28 lr 0.001164 wd 0.0500 time 0.4422 (0.4476) data time 0.0007 (0.0032) model time 0.4416 (0.4427) loss 4.0437 (3.4341) grad_norm 1.2958 (1.3944) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][170/625] eta 0:03:23 lr 0.001164 wd 0.0500 time 0.4444 (0.4475) data time 0.0009 (0.0031) model time 0.4435 (0.4428) loss 3.0241 (3.4087) grad_norm 1.7358 (1.3969) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][180/625] eta 0:03:19 lr 0.001164 wd 0.0500 time 0.4430 (0.4473) data time 0.0006 (0.0030) model time 0.4424 (0.4429) loss 3.5353 (3.3957) grad_norm 1.3924 (1.4108) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][190/625] eta 0:03:14 lr 0.001164 wd 0.0500 time 0.4425 (0.4479) data time 0.0006 (0.0029) model time 0.4419 (0.4440) loss 3.9470 (3.3967) grad_norm 1.2449 (1.4032) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [51/300][200/625] eta 0:03:10 lr 0.001164 wd 0.0500 time 0.4460 (0.4477) data time 0.0006 (0.0028) model time 0.4453 (0.4440) loss 2.9762 (3.3997) grad_norm 1.5814 (1.4020) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][210/625] eta 0:03:05 lr 0.001164 wd 0.0500 time 0.4409 (0.4475) data time 0.0006 (0.0027) model time 0.4403 (0.4438) loss 2.7539 (3.3905) grad_norm 1.4115 (1.4022) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][220/625] eta 0:03:01 lr 0.001164 wd 0.0500 time 0.4436 (0.4472) data time 0.0006 (0.0026) model time 0.4430 (0.4437) loss 3.2352 (3.3871) grad_norm 1.0297 (1.3959) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][230/625] eta 0:02:56 lr 0.001164 wd 0.0500 time 0.4434 (0.4471) data time 0.0009 (0.0025) model time 0.4426 (0.4436) loss 3.6855 (3.3747) grad_norm 2.0756 (1.3950) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][240/625] eta 0:02:52 lr 0.001164 wd 0.0500 time 0.4571 (0.4471) data time 0.0006 (0.0024) model time 0.4565 (0.4437) loss 2.3710 (3.3607) grad_norm 1.1853 (1.3988) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][250/625] eta 0:02:47 lr 0.001164 wd 0.0500 time 0.4398 (0.4470) data time 0.0009 (0.0024) model time 0.4389 (0.4438) loss 3.5434 (3.3727) grad_norm 1.4435 (1.3974) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][260/625] eta 0:02:43 lr 0.001163 wd 0.0500 time 0.4487 (0.4470) data time 0.0008 (0.0023) model time 0.4478 (0.4439) loss 3.1529 (3.3763) grad_norm 1.1406 (1.3949) loss_scale 4096.0000 
(4096.0000) mem 16721MB [2024-08-04 16:21:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][270/625] eta 0:02:38 lr 0.001163 wd 0.0500 time 0.4423 (0.4469) data time 0.0008 (0.0023) model time 0.4415 (0.4439) loss 3.7201 (3.3588) grad_norm 1.3475 (1.3909) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][280/625] eta 0:02:34 lr 0.001163 wd 0.0500 time 0.4451 (0.4473) data time 0.0008 (0.0022) model time 0.4443 (0.4445) loss 3.7570 (3.3546) grad_norm 1.7260 (1.3946) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][290/625] eta 0:02:29 lr 0.001163 wd 0.0500 time 0.4430 (0.4472) data time 0.0007 (0.0022) model time 0.4423 (0.4444) loss 4.2882 (3.3660) grad_norm 1.5392 (1.3980) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][300/625] eta 0:02:25 lr 0.001163 wd 0.0500 time 0.4487 (0.4470) data time 0.0008 (0.0021) model time 0.4479 (0.4443) loss 2.3520 (3.3701) grad_norm 2.0225 (1.4068) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][310/625] eta 0:02:20 lr 0.001163 wd 0.0500 time 0.4468 (0.4469) data time 0.0008 (0.0021) model time 0.4459 (0.4442) loss 3.1099 (3.3697) grad_norm 1.0509 (1.4041) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][320/625] eta 0:02:16 lr 0.001163 wd 0.0500 time 0.4416 (0.4468) data time 0.0006 (0.0021) model time 0.4410 (0.4442) loss 3.6519 (3.3710) grad_norm 1.0451 (1.4006) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][330/625] eta 0:02:11 lr 0.001163 wd 0.0500 time 0.4457 (0.4467) data time 0.0007 (0.0020) model time 
0.4451 (0.4441) loss 2.5614 (3.3666) grad_norm 1.1227 (1.3995) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][340/625] eta 0:02:07 lr 0.001163 wd 0.0500 time 0.4443 (0.4472) data time 0.0009 (0.0020) model time 0.4434 (0.4448) loss 4.0894 (3.3717) grad_norm 1.2344 (1.3982) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][350/625] eta 0:02:02 lr 0.001163 wd 0.0500 time 0.4417 (0.4471) data time 0.0007 (0.0020) model time 0.4411 (0.4447) loss 3.7377 (3.3730) grad_norm 1.1796 (1.3989) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][360/625] eta 0:01:58 lr 0.001163 wd 0.0500 time 0.4415 (0.4470) data time 0.0006 (0.0019) model time 0.4409 (0.4446) loss 3.2297 (3.3659) grad_norm 2.1511 (1.3969) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][370/625] eta 0:01:53 lr 0.001163 wd 0.0500 time 0.4421 (0.4469) data time 0.0006 (0.0019) model time 0.4415 (0.4445) loss 3.6868 (3.3666) grad_norm 1.3183 (1.3978) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][380/625] eta 0:01:49 lr 0.001163 wd 0.0500 time 0.4456 (0.4468) data time 0.0008 (0.0019) model time 0.4447 (0.4445) loss 3.3399 (3.3708) grad_norm 1.2130 (1.4022) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][390/625] eta 0:01:44 lr 0.001163 wd 0.0500 time 0.4558 (0.4468) data time 0.0006 (0.0018) model time 0.4552 (0.4445) loss 2.5271 (3.3687) grad_norm 1.0021 (1.4007) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][400/625] eta 0:01:40 
lr 0.001163 wd 0.0500 time 0.4434 (0.4467) data time 0.0007 (0.0018) model time 0.4427 (0.4444) loss 3.4963 (3.3749) grad_norm 0.9520 (1.3992) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][410/625] eta 0:01:36 lr 0.001163 wd 0.0500 time 0.4414 (0.4471) data time 0.0009 (0.0018) model time 0.4405 (0.4449) loss 3.7114 (3.3842) grad_norm 1.6154 (1.3991) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][420/625] eta 0:01:31 lr 0.001163 wd 0.0500 time 0.4439 (0.4471) data time 0.0008 (0.0018) model time 0.4431 (0.4450) loss 3.4796 (3.3845) grad_norm 1.4171 (1.4000) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][430/625] eta 0:01:27 lr 0.001163 wd 0.0500 time 0.4442 (0.4471) data time 0.0008 (0.0018) model time 0.4433 (0.4450) loss 3.3986 (3.3794) grad_norm 1.1269 (1.4005) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][440/625] eta 0:01:22 lr 0.001163 wd 0.0500 time 0.4447 (0.4472) data time 0.0007 (0.0017) model time 0.4440 (0.4451) loss 2.3325 (3.3756) grad_norm 1.6122 (1.3974) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][450/625] eta 0:01:18 lr 0.001163 wd 0.0500 time 0.4442 (0.4472) data time 0.0009 (0.0017) model time 0.4433 (0.4451) loss 3.6854 (3.3751) grad_norm 1.4645 (1.3981) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][460/625] eta 0:01:13 lr 0.001163 wd 0.0500 time 0.4439 (0.4471) data time 0.0009 (0.0017) model time 0.4430 (0.4451) loss 3.7113 (3.3777) grad_norm 1.3688 (1.3983) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:15 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][470/625] eta 0:01:09 lr 0.001163 wd 0.0500 time 0.4505 (0.4472) data time 0.0007 (0.0017) model time 0.4498 (0.4452) loss 3.4833 (3.3791) grad_norm 1.8699 (1.4005) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][480/625] eta 0:01:04 lr 0.001163 wd 0.0500 time 0.4425 (0.4471) data time 0.0006 (0.0017) model time 0.4419 (0.4452) loss 3.9531 (3.3833) grad_norm 1.2073 (1.4053) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][490/625] eta 0:01:00 lr 0.001163 wd 0.0500 time 0.4460 (0.4470) data time 0.0008 (0.0016) model time 0.4451 (0.4451) loss 2.7542 (3.3885) grad_norm 1.2943 (1.4079) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][500/625] eta 0:00:55 lr 0.001163 wd 0.0500 time 0.4400 (0.4470) data time 0.0006 (0.0016) model time 0.4394 (0.4450) loss 4.0722 (3.3839) grad_norm 1.2851 (1.4050) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][510/625] eta 0:00:51 lr 0.001163 wd 0.0500 time 0.4505 (0.4469) data time 0.0008 (0.0016) model time 0.4497 (0.4450) loss 4.0460 (3.3825) grad_norm 1.2791 (1.4055) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][520/625] eta 0:00:46 lr 0.001163 wd 0.0500 time 0.4428 (0.4468) data time 0.0006 (0.0016) model time 0.4422 (0.4449) loss 3.9496 (3.3824) grad_norm 1.0596 (1.4048) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][530/625] eta 0:00:42 lr 0.001162 wd 0.0500 time 0.4443 (0.4470) data time 0.0006 (0.0016) model time 0.4437 (0.4452) loss 4.3267 (3.3847) grad_norm 
1.1933 (1.4023) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][540/625] eta 0:00:37 lr 0.001162 wd 0.0500 time 0.4450 (0.4470) data time 0.0008 (0.0016) model time 0.4442 (0.4451) loss 3.3044 (3.3862) grad_norm 1.6396 (1.3992) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][550/625] eta 0:00:33 lr 0.001162 wd 0.0500 time 0.4474 (0.4469) data time 0.0007 (0.0016) model time 0.4467 (0.4451) loss 4.4071 (3.3885) grad_norm 2.2980 (1.4012) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][560/625] eta 0:00:29 lr 0.001162 wd 0.0500 time 0.4518 (0.4472) data time 0.0008 (0.0015) model time 0.4510 (0.4455) loss 2.7510 (3.3897) grad_norm 1.7093 (1.4029) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][570/625] eta 0:00:24 lr 0.001162 wd 0.0500 time 0.4437 (0.4471) data time 0.0008 (0.0015) model time 0.4429 (0.4454) loss 3.2840 (3.3909) grad_norm 1.5494 (1.4031) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][580/625] eta 0:00:20 lr 0.001162 wd 0.0500 time 0.4409 (0.4471) data time 0.0006 (0.0015) model time 0.4403 (0.4453) loss 3.9701 (3.3901) grad_norm 1.5030 (1.4005) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][590/625] eta 0:00:15 lr 0.001162 wd 0.0500 time 0.4428 (0.4470) data time 0.0007 (0.0015) model time 0.4421 (0.4453) loss 3.5433 (3.3910) grad_norm 0.9801 (1.3985) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][600/625] eta 0:00:11 lr 0.001162 wd 0.0500 time 0.4439 (0.4469) data 
time 0.0008 (0.0015) model time 0.4431 (0.4452) loss 2.9553 (3.3920) grad_norm 1.3320 (1.3985) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][610/625] eta 0:00:06 lr 0.001162 wd 0.0500 time 0.4429 (0.4469) data time 0.0004 (0.0015) model time 0.4426 (0.4451) loss 3.9245 (3.3948) grad_norm 1.4335 (1.3974) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [51/300][620/625] eta 0:00:02 lr 0.001162 wd 0.0500 time 0.4386 (0.4467) data time 0.0004 (0.0015) model time 0.4381 (0.4450) loss 3.2094 (3.3913) grad_norm 1.8419 (1.3997) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 51 training takes 0:04:39 [2024-08-04 16:24:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:24:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 16:24:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.6235 (0.6235) Acc@1 85.400 (85.400) Acc@5 97.559 (97.559) Mem 16721MB
[2024-08-04 16:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.0742 (0.7728) Acc@1 74.268 (81.685) Acc@5 93.066 (96.347) Mem 16721MB
[2024-08-04 16:24:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.1631 (0.9338) Acc@1 72.021 (78.065) Acc@5 92.041 (94.455) Mem 16721MB
[2024-08-04 16:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 77.883 Acc@5 94.472
[2024-08-04 16:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 77.9%
[2024-08-04 16:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 77.88%
[2024-08-04 16:24:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-04 16:24:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 16:24:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5483 (0.5483) Acc@1 86.816 (86.816) Acc@5 98.096 (98.096) Mem 16721MB
[2024-08-04 16:24:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.148) Loss 0.9331 (0.6918) Acc@1 76.758 (83.145) Acc@5 94.482 (96.768) Mem 16721MB
[2024-08-04 16:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.113 (0.132) Loss 1.0996 (0.8478) Acc@1 71.973 (79.350) Acc@5 91.992 (94.950) Mem 16721MB
[2024-08-04 16:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.103 Acc@5 94.940
[2024-08-04 16:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.1%
[2024-08-04 16:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.10%
[2024-08-04 16:24:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 16:24:39 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 16:24:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][0/625] eta 0:07:41 lr 0.001162 wd 0.0500 time 0.7379 (0.7379) data time 0.3465 (0.3465) model time 0.0000 (0.0000) loss 3.8598 (3.8598) grad_norm 0.9252 (0.9252) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][10/625] eta 0:04:49 lr 0.001162 wd 0.0500 time 0.4422 (0.4714) data time 0.0009 (0.0331) model time 0.0000 (0.0000) loss 3.4561 (3.1554) grad_norm 1.7997 (1.3076) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][20/625] eta 0:04:37 lr 0.001162 wd 0.0500 time 0.4397 (0.4582) data time 0.0006 (0.0177) model time 0.0000 (0.0000) loss 3.9867 (3.3916) grad_norm 1.7431 (1.3968) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][30/625] eta 0:04:30 lr 0.001162 wd 0.0500 time 0.4534 (0.4538) data time 0.0006 (0.0123) model time 0.0000 (0.0000) loss 3.5993 (3.3627) grad_norm 1.3498 (1.4261) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][40/625] eta 0:04:23 lr 0.001162 wd 0.0500 time 0.4427 (0.4508) data time 0.0009 (0.0095) model time 0.0000 (0.0000) loss 3.3995 (3.3007) grad_norm 1.5433 (1.4075) loss_scale 8192.0000 (4295.8049) mem 16721MB [2024-08-04 16:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][50/625] eta 0:04:18 lr 0.001162 wd 0.0500 time 0.4438 (0.4494) data time 0.0008 (0.0078) model time 0.0000 (0.0000) loss 3.5158 (3.2756) grad_norm 1.6537 (1.3813) loss_scale 8192.0000 (5059.7647) mem 16721MB [2024-08-04 16:25:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][60/625] eta 0:04:13 lr 0.001162 wd 0.0500 time 0.4502 (0.4485) data time 0.0006 (0.0066) model time 0.4496 (0.4437) loss 2.3548 (3.2294) 
grad_norm 1.1747 (1.3736) loss_scale 8192.0000 (5573.2459) mem 16721MB [2024-08-04 16:25:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][70/625] eta 0:04:08 lr 0.001162 wd 0.0500 time 0.4452 (0.4479) data time 0.0006 (0.0058) model time 0.4446 (0.4434) loss 3.4919 (3.2584) grad_norm 2.6776 (1.3972) loss_scale 8192.0000 (5942.0845) mem 16721MB [2024-08-04 16:25:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][80/625] eta 0:04:03 lr 0.001162 wd 0.0500 time 0.4453 (0.4472) data time 0.0008 (0.0052) model time 0.4446 (0.4428) loss 3.7036 (3.2991) grad_norm 1.3118 (1.4379) loss_scale 8192.0000 (6219.8519) mem 16721MB [2024-08-04 16:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][90/625] eta 0:03:58 lr 0.001162 wd 0.0500 time 0.4430 (0.4467) data time 0.0007 (0.0047) model time 0.4422 (0.4425) loss 2.1095 (3.3151) grad_norm 1.2169 (1.4108) loss_scale 8192.0000 (6436.5714) mem 16721MB [2024-08-04 16:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][100/625] eta 0:03:54 lr 0.001162 wd 0.0500 time 0.4458 (0.4463) data time 0.0008 (0.0043) model time 0.4449 (0.4424) loss 2.1996 (3.3068) grad_norm 1.3763 (1.3995) loss_scale 8192.0000 (6610.3762) mem 16721MB [2024-08-04 16:25:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][110/625] eta 0:03:50 lr 0.001162 wd 0.0500 time 0.4428 (0.4471) data time 0.0010 (0.0040) model time 0.4418 (0.4443) loss 3.1432 (3.3374) grad_norm 1.3742 (inf) loss_scale 4096.0000 (6568.3604) mem 16721MB [2024-08-04 16:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][120/625] eta 0:03:46 lr 0.001162 wd 0.0500 time 0.4423 (0.4486) data time 0.0010 (0.0038) model time 0.4413 (0.4472) loss 3.4540 (3.3500) grad_norm 1.2580 (inf) loss_scale 4096.0000 (6364.0331) mem 16721MB [2024-08-04 16:25:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][130/625] eta 0:03:41 lr 0.001162 wd 0.0500 time 0.4441 (0.4482) 
data time 0.0007 (0.0035) model time 0.4435 (0.4467) loss 2.6653 (3.3561) grad_norm 1.4644 (inf) loss_scale 4096.0000 (6190.9008) mem 16721MB [2024-08-04 16:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][140/625] eta 0:03:37 lr 0.001162 wd 0.0500 time 0.4454 (0.4479) data time 0.0009 (0.0034) model time 0.4446 (0.4463) loss 3.6076 (3.3552) grad_norm 1.7986 (inf) loss_scale 4096.0000 (6042.3262) mem 16721MB [2024-08-04 16:25:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][150/625] eta 0:03:32 lr 0.001162 wd 0.0500 time 0.4434 (0.4476) data time 0.0008 (0.0032) model time 0.4426 (0.4458) loss 3.0223 (3.3467) grad_norm 1.0254 (inf) loss_scale 4096.0000 (5913.4305) mem 16721MB [2024-08-04 16:25:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][160/625] eta 0:03:27 lr 0.001162 wd 0.0500 time 0.4432 (0.4473) data time 0.0009 (0.0031) model time 0.4423 (0.4454) loss 3.6461 (3.3199) grad_norm 1.7032 (inf) loss_scale 4096.0000 (5800.5466) mem 16721MB [2024-08-04 16:25:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][170/625] eta 0:03:23 lr 0.001161 wd 0.0500 time 0.4453 (0.4470) data time 0.0008 (0.0029) model time 0.4444 (0.4451) loss 3.9152 (3.3305) grad_norm 1.3776 (inf) loss_scale 4096.0000 (5700.8655) mem 16721MB [2024-08-04 16:25:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][180/625] eta 0:03:18 lr 0.001161 wd 0.0500 time 0.4387 (0.4467) data time 0.0006 (0.0028) model time 0.4381 (0.4449) loss 3.2057 (3.3179) grad_norm 1.9151 (inf) loss_scale 4096.0000 (5612.1989) mem 16721MB [2024-08-04 16:26:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][190/625] eta 0:03:14 lr 0.001161 wd 0.0500 time 0.4469 (0.4466) data time 0.0007 (0.0027) model time 0.4462 (0.4447) loss 2.4438 (3.3104) grad_norm 1.1956 (inf) loss_scale 4096.0000 (5532.8168) mem 16721MB [2024-08-04 16:26:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[52/300][200/625] eta 0:03:09 lr 0.001161 wd 0.0500 time 0.4419 (0.4465) data time 0.0009 (0.0026) model time 0.4410 (0.4446) loss 3.5993 (3.3079) grad_norm 1.0288 (inf) loss_scale 4096.0000 (5461.3333) mem 16721MB [2024-08-04 16:26:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][210/625] eta 0:03:05 lr 0.001161 wd 0.0500 time 0.4420 (0.4463) data time 0.0009 (0.0025) model time 0.4412 (0.4444) loss 3.6557 (3.3066) grad_norm 1.1350 (inf) loss_scale 4096.0000 (5396.6256) mem 16721MB [2024-08-04 16:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][220/625] eta 0:03:00 lr 0.001161 wd 0.0500 time 0.4454 (0.4461) data time 0.0009 (0.0025) model time 0.4445 (0.4443) loss 3.4042 (3.3058) grad_norm 1.3881 (inf) loss_scale 4096.0000 (5337.7738) mem 16721MB [2024-08-04 16:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][230/625] eta 0:02:56 lr 0.001161 wd 0.0500 time 0.4410 (0.4459) data time 0.0007 (0.0024) model time 0.4403 (0.4441) loss 4.3120 (3.3046) grad_norm 1.5848 (inf) loss_scale 4096.0000 (5284.0173) mem 16721MB [2024-08-04 16:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][240/625] eta 0:02:51 lr 0.001161 wd 0.0500 time 0.4424 (0.4458) data time 0.0009 (0.0023) model time 0.4416 (0.4440) loss 3.4688 (3.3045) grad_norm 1.2379 (inf) loss_scale 4096.0000 (5234.7220) mem 16721MB [2024-08-04 16:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][250/625] eta 0:02:47 lr 0.001161 wd 0.0500 time 0.4409 (0.4457) data time 0.0006 (0.0023) model time 0.4403 (0.4439) loss 2.9138 (3.3063) grad_norm 1.1964 (inf) loss_scale 4096.0000 (5189.3546) mem 16721MB [2024-08-04 16:26:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][260/625] eta 0:02:42 lr 0.001161 wd 0.0500 time 0.4486 (0.4456) data time 0.0006 (0.0022) model time 0.4480 (0.4439) loss 3.1392 (3.3067) grad_norm 1.0138 (inf) loss_scale 4096.0000 (5147.4636) mem 16721MB [2024-08-04 16:26:39 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][270/625] eta 0:02:38 lr 0.001161 wd 0.0500 time 0.4440 (0.4456) data time 0.0006 (0.0022) model time 0.4434 (0.4439) loss 4.4216 (3.3289) grad_norm 1.1593 (inf) loss_scale 4096.0000 (5108.6642) mem 16721MB [2024-08-04 16:26:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][280/625] eta 0:02:33 lr 0.001161 wd 0.0500 time 0.4411 (0.4455) data time 0.0009 (0.0021) model time 0.4401 (0.4438) loss 3.3070 (3.3440) grad_norm 2.0707 (inf) loss_scale 4096.0000 (5072.6263) mem 16721MB [2024-08-04 16:26:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][290/625] eta 0:02:29 lr 0.001161 wd 0.0500 time 0.4366 (0.4454) data time 0.0009 (0.0021) model time 0.4358 (0.4437) loss 2.8781 (3.3373) grad_norm 1.0015 (inf) loss_scale 4096.0000 (5039.0653) mem 16721MB [2024-08-04 16:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][300/625] eta 0:02:24 lr 0.001161 wd 0.0500 time 0.4455 (0.4459) data time 0.0006 (0.0020) model time 0.4449 (0.4443) loss 2.6975 (3.3273) grad_norm 1.2529 (inf) loss_scale 4096.0000 (5007.7342) mem 16721MB [2024-08-04 16:26:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][310/625] eta 0:02:20 lr 0.001161 wd 0.0500 time 0.4418 (0.4458) data time 0.0007 (0.0020) model time 0.4412 (0.4443) loss 3.2113 (3.3281) grad_norm 1.8727 (inf) loss_scale 4096.0000 (4978.4180) mem 16721MB [2024-08-04 16:27:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][320/625] eta 0:02:15 lr 0.001161 wd 0.0500 time 0.4413 (0.4458) data time 0.0009 (0.0020) model time 0.4403 (0.4442) loss 3.2256 (3.3331) grad_norm 1.5394 (inf) loss_scale 4096.0000 (4950.9283) mem 16721MB [2024-08-04 16:27:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][330/625] eta 0:02:11 lr 0.001161 wd 0.0500 time 0.4459 (0.4462) data time 0.0009 (0.0019) model time 0.4450 (0.4448) loss 3.7431 (3.3332) grad_norm 1.6622 (inf) 
loss_scale 4096.0000 (4925.0997) mem 16721MB [2024-08-04 16:27:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][340/625] eta 0:02:07 lr 0.001161 wd 0.0500 time 0.4463 (0.4462) data time 0.0009 (0.0019) model time 0.4455 (0.4448) loss 2.7463 (3.3321) grad_norm 1.3375 (inf) loss_scale 4096.0000 (4900.7859) mem 16721MB [2024-08-04 16:27:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][350/625] eta 0:02:02 lr 0.001161 wd 0.0500 time 0.4552 (0.4462) data time 0.0007 (0.0019) model time 0.4545 (0.4448) loss 3.3423 (3.3324) grad_norm 1.1035 (inf) loss_scale 4096.0000 (4877.8575) mem 16721MB [2024-08-04 16:27:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][360/625] eta 0:01:58 lr 0.001161 wd 0.0500 time 0.4427 (0.4462) data time 0.0007 (0.0018) model time 0.4420 (0.4449) loss 4.2743 (3.3416) grad_norm 2.2899 (inf) loss_scale 4096.0000 (4856.1994) mem 16721MB [2024-08-04 16:27:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][370/625] eta 0:01:53 lr 0.001161 wd 0.0500 time 0.4411 (0.4462) data time 0.0008 (0.0018) model time 0.4403 (0.4448) loss 2.4996 (3.3328) grad_norm 2.0259 (inf) loss_scale 4096.0000 (4835.7089) mem 16721MB [2024-08-04 16:27:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][380/625] eta 0:01:49 lr 0.001161 wd 0.0500 time 0.4577 (0.4462) data time 0.0007 (0.0018) model time 0.4571 (0.4448) loss 3.4033 (3.3296) grad_norm 0.9718 (inf) loss_scale 4096.0000 (4816.2940) mem 16721MB [2024-08-04 16:27:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][390/625] eta 0:01:44 lr 0.001161 wd 0.0500 time 0.4445 (0.4461) data time 0.0008 (0.0018) model time 0.4436 (0.4447) loss 2.5115 (3.3395) grad_norm 1.2109 (inf) loss_scale 4096.0000 (4797.8721) mem 16721MB [2024-08-04 16:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][400/625] eta 0:01:40 lr 0.001161 wd 0.0500 time 0.4449 (0.4460) data time 0.0009 (0.0017) model time 
0.4441 (0.4447) loss 3.5065 (3.3406) grad_norm 2.1648 (inf) loss_scale 4096.0000 (4780.3691) mem 16721MB [2024-08-04 16:27:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][410/625] eta 0:01:35 lr 0.001161 wd 0.0500 time 0.4440 (0.4459) data time 0.0010 (0.0017) model time 0.4430 (0.4446) loss 3.6492 (3.3455) grad_norm 1.6275 (inf) loss_scale 4096.0000 (4763.7178) mem 16721MB [2024-08-04 16:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][420/625] eta 0:01:31 lr 0.001161 wd 0.0500 time 0.4393 (0.4459) data time 0.0009 (0.0017) model time 0.4384 (0.4445) loss 3.6478 (3.3383) grad_norm 1.0358 (inf) loss_scale 4096.0000 (4747.8575) mem 16721MB [2024-08-04 16:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][430/625] eta 0:01:26 lr 0.001160 wd 0.0500 time 0.4450 (0.4458) data time 0.0008 (0.0017) model time 0.4442 (0.4445) loss 3.3122 (3.3406) grad_norm 2.0003 (inf) loss_scale 4096.0000 (4732.7332) mem 16721MB [2024-08-04 16:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][440/625] eta 0:01:22 lr 0.001160 wd 0.0500 time 0.4420 (0.4457) data time 0.0008 (0.0017) model time 0.4412 (0.4444) loss 3.3310 (3.3367) grad_norm 1.2248 (inf) loss_scale 4096.0000 (4718.2948) mem 16721MB [2024-08-04 16:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][450/625] eta 0:01:18 lr 0.001160 wd 0.0500 time 0.4392 (0.4461) data time 0.0010 (0.0016) model time 0.4382 (0.4449) loss 3.5754 (3.3387) grad_norm 1.1405 (inf) loss_scale 4096.0000 (4704.4967) mem 16721MB [2024-08-04 16:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][460/625] eta 0:01:13 lr 0.001160 wd 0.0500 time 0.4440 (0.4461) data time 0.0008 (0.0016) model time 0.4432 (0.4448) loss 3.6385 (3.3402) grad_norm 1.3909 (inf) loss_scale 4096.0000 (4691.2972) mem 16721MB [2024-08-04 16:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][470/625] eta 0:01:09 lr 0.001160 wd 0.0500 
time 0.4434 (0.4463) data time 0.0008 (0.0016) model time 0.4426 (0.4451) loss 2.7453 (3.3431) grad_norm 1.5442 (inf) loss_scale 4096.0000 (4678.6582) mem 16721MB [2024-08-04 16:28:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][480/625] eta 0:01:04 lr 0.001160 wd 0.0500 time 0.4447 (0.4463) data time 0.0007 (0.0016) model time 0.4440 (0.4451) loss 3.2498 (3.3469) grad_norm 1.2850 (inf) loss_scale 4096.0000 (4666.5447) mem 16721MB [2024-08-04 16:28:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][490/625] eta 0:01:00 lr 0.001160 wd 0.0500 time 0.4432 (0.4464) data time 0.0009 (0.0016) model time 0.4424 (0.4452) loss 3.8169 (3.3496) grad_norm 1.5162 (inf) loss_scale 4096.0000 (4654.9246) mem 16721MB [2024-08-04 16:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][500/625] eta 0:00:55 lr 0.001160 wd 0.0500 time 0.4460 (0.4463) data time 0.0007 (0.0016) model time 0.4454 (0.4451) loss 2.2746 (3.3481) grad_norm 1.1362 (inf) loss_scale 4096.0000 (4643.7685) mem 16721MB [2024-08-04 16:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][510/625] eta 0:00:51 lr 0.001160 wd 0.0500 time 0.4425 (0.4463) data time 0.0006 (0.0015) model time 0.4418 (0.4451) loss 2.5992 (3.3463) grad_norm 1.7550 (inf) loss_scale 4096.0000 (4633.0489) mem 16721MB [2024-08-04 16:28:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][520/625] eta 0:00:46 lr 0.001160 wd 0.0500 time 0.4419 (0.4462) data time 0.0006 (0.0015) model time 0.4413 (0.4450) loss 4.5003 (3.3525) grad_norm 1.3002 (inf) loss_scale 4096.0000 (4622.7409) mem 16721MB [2024-08-04 16:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][530/625] eta 0:00:42 lr 0.001160 wd 0.0500 time 0.4411 (0.4461) data time 0.0009 (0.0015) model time 0.4402 (0.4449) loss 3.7525 (3.3606) grad_norm 1.2912 (inf) loss_scale 4096.0000 (4612.8211) mem 16721MB [2024-08-04 16:28:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [52/300][540/625] eta 0:00:37 lr 0.001160 wd 0.0500 time 0.4461 (0.4461) data time 0.0008 (0.0015) model time 0.4453 (0.4449) loss 3.6614 (3.3625) grad_norm 1.2126 (inf) loss_scale 4096.0000 (4603.2680) mem 16721MB [2024-08-04 16:28:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][550/625] eta 0:00:33 lr 0.001160 wd 0.0500 time 0.4405 (0.4460) data time 0.0008 (0.0015) model time 0.4397 (0.4448) loss 3.4670 (3.3640) grad_norm 1.1498 (inf) loss_scale 4096.0000 (4594.0617) mem 16721MB [2024-08-04 16:28:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][560/625] eta 0:00:28 lr 0.001160 wd 0.0500 time 0.4466 (0.4460) data time 0.0009 (0.0015) model time 0.4457 (0.4448) loss 3.3213 (3.3658) grad_norm 1.1926 (inf) loss_scale 4096.0000 (4585.1836) mem 16721MB [2024-08-04 16:28:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][570/625] eta 0:00:24 lr 0.001160 wd 0.0500 time 0.4457 (0.4460) data time 0.0010 (0.0015) model time 0.4447 (0.4448) loss 3.8964 (3.3641) grad_norm 1.3813 (inf) loss_scale 4096.0000 (4576.6165) mem 16721MB [2024-08-04 16:28:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][580/625] eta 0:00:20 lr 0.001160 wd 0.0500 time 0.4447 (0.4459) data time 0.0009 (0.0015) model time 0.4438 (0.4447) loss 3.3553 (3.3621) grad_norm 1.5325 (inf) loss_scale 4096.0000 (4568.3442) mem 16721MB [2024-08-04 16:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][590/625] eta 0:00:15 lr 0.001160 wd 0.0500 time 0.4444 (0.4459) data time 0.0008 (0.0015) model time 0.4436 (0.4447) loss 3.1490 (3.3640) grad_norm 1.1476 (inf) loss_scale 4096.0000 (4560.3519) mem 16721MB [2024-08-04 16:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][600/625] eta 0:00:11 lr 0.001160 wd 0.0500 time 0.4435 (0.4459) data time 0.0008 (0.0014) model time 0.4427 (0.4447) loss 3.6021 (3.3609) grad_norm 1.3485 (inf) loss_scale 4096.0000 (4552.6256) mem 16721MB [2024-08-04 
16:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][610/625] eta 0:00:06 lr 0.001160 wd 0.0500 time 0.4373 (0.4458) data time 0.0006 (0.0014) model time 0.4366 (0.4446) loss 3.6366 (3.3665) grad_norm 1.1281 (inf) loss_scale 4096.0000 (4545.1522) mem 16721MB [2024-08-04 16:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [52/300][620/625] eta 0:00:02 lr 0.001160 wd 0.0500 time 0.4385 (0.4457) data time 0.0004 (0.0014) model time 0.4381 (0.4445) loss 3.8681 (3.3689) grad_norm 1.4421 (inf) loss_scale 4096.0000 (4537.9195) mem 16721MB [2024-08-04 16:29:17 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 52 training takes 0:04:38 [2024-08-04 16:29:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:29:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 16:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.6196 (0.6196) Acc@1 86.768 (86.768) Acc@5 97.852 (97.852) Mem 16721MB [2024-08-04 16:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.0166 (0.7751) Acc@1 75.586 (82.249) Acc@5 93.555 (96.524) Mem 16721MB [2024-08-04 16:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.1045 (0.9299) Acc@1 72.998 (78.555) Acc@5 92.236 (94.589) Mem 16721MB [2024-08-04 16:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.205 Acc@5 94.504 [2024-08-04 16:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.2% [2024-08-04 16:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.20% [2024-08-04 16:29:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-04 16:29:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! [2024-08-04 16:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5479 (0.5479) Acc@1 87.061 (87.061) Acc@5 98.145 (98.145) Mem 16721MB [2024-08-04 16:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.150) Loss 0.9268 (0.6899) Acc@1 77.002 (83.310) Acc@5 94.580 (96.817) Mem 16721MB [2024-08-04 16:29:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.0928 (0.8444) Acc@1 71.973 (79.506) Acc@5 92.041 (95.001) Mem 16721MB [2024-08-04 16:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.255 Acc@5 95.002 [2024-08-04 16:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-08-04 16:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.25% [2024-08-04 16:29:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:29:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][0/625] eta 0:07:43 lr 0.001160 wd 0.0500 time 0.7416 (0.7416) data time 0.3549 (0.3549) model time 0.0000 (0.0000) loss 3.1102 (3.1102) grad_norm 1.1456 (1.1456) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][10/625] eta 0:04:50 lr 0.001160 wd 0.0500 time 0.4441 (0.4721) data time 0.0009 (0.0331) model time 0.0000 (0.0000) loss 3.0233 (3.5276) grad_norm 1.3614 (1.5092) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][20/625] eta 0:04:37 lr 0.001160 wd 0.0500 time 0.4403 (0.4584) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 3.5112 (3.4000) grad_norm 0.9855 (1.4680) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][30/625] eta 0:04:37 lr 0.001160 wd 0.0500 time 0.4433 (0.4664) data time 0.0006 (0.0123) model time 0.0000 (0.0000) loss 2.2995 (3.3474) grad_norm 1.0830 (1.4030) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:29:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][40/625] eta 0:04:29 lr 0.001160 wd 0.0500 time 0.4413 (0.4604) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 3.0920 (3.2924) grad_norm 1.8516 (1.3895) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][50/625] eta 0:04:22 lr 0.001160 wd 0.0500 time 0.4497 (0.4570) data time 0.0010 (0.0078) model time 0.0000 (0.0000) loss 3.2171 (3.3054) grad_norm 1.5595 (1.3882) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][60/625] eta 0:04:16 lr 0.001160 wd 0.0500 time 0.4444 (0.4547) data time 0.0006 (0.0067) model time 0.4438 (0.4422) loss 4.4365 (3.3211) 
grad_norm 1.4227 (1.3932) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][70/625] eta 0:04:11 lr 0.001159 wd 0.0500 time 0.4390 (0.4530) data time 0.0006 (0.0058) model time 0.4384 (0.4420) loss 3.9915 (3.3194) grad_norm 1.4644 (1.3918) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][80/625] eta 0:04:07 lr 0.001159 wd 0.0500 time 0.4438 (0.4541) data time 0.0008 (0.0052) model time 0.4430 (0.4482) loss 2.8326 (3.3385) grad_norm 1.1224 (1.3840) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][90/625] eta 0:04:02 lr 0.001159 wd 0.0500 time 0.4438 (0.4529) data time 0.0009 (0.0048) model time 0.4430 (0.4469) loss 2.8090 (3.3611) grad_norm 1.3804 (1.3783) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][100/625] eta 0:03:57 lr 0.001159 wd 0.0500 time 0.4420 (0.4520) data time 0.0008 (0.0044) model time 0.4412 (0.4460) loss 3.1393 (3.3635) grad_norm 1.3812 (1.3635) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][110/625] eta 0:03:52 lr 0.001159 wd 0.0500 time 0.4409 (0.4511) data time 0.0007 (0.0041) model time 0.4403 (0.4453) loss 4.0816 (3.3698) grad_norm 1.9050 (1.3865) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][120/625] eta 0:03:47 lr 0.001159 wd 0.0500 time 0.4426 (0.4504) data time 0.0006 (0.0038) model time 0.4420 (0.4447) loss 2.9984 (3.3690) grad_norm 1.9760 (1.4008) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][130/625] eta 0:03:42 lr 0.001159 wd 0.0500 time 0.4456 
(0.4498) data time 0.0006 (0.0036) model time 0.4450 (0.4443) loss 3.2142 (3.3925) grad_norm 1.1714 (1.3907) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][140/625] eta 0:03:37 lr 0.001159 wd 0.0500 time 0.4456 (0.4494) data time 0.0009 (0.0034) model time 0.4448 (0.4442) loss 3.4813 (3.3869) grad_norm 1.2023 (1.3920) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][150/625] eta 0:03:33 lr 0.001159 wd 0.0500 time 0.3877 (0.4501) data time 0.0008 (0.0032) model time 0.3869 (0.4457) loss 3.5356 (3.3804) grad_norm 1.0690 (1.3861) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][160/625] eta 0:03:29 lr 0.001159 wd 0.0500 time 0.4449 (0.4498) data time 0.0008 (0.0031) model time 0.4441 (0.4456) loss 3.0367 (3.3819) grad_norm 1.2955 (1.3830) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][170/625] eta 0:03:24 lr 0.001159 wd 0.0500 time 0.4430 (0.4495) data time 0.0006 (0.0029) model time 0.4424 (0.4454) loss 2.8966 (3.3896) grad_norm 1.1365 (1.3857) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][180/625] eta 0:03:19 lr 0.001159 wd 0.0500 time 0.4457 (0.4493) data time 0.0008 (0.0028) model time 0.4449 (0.4454) loss 2.3597 (3.3903) grad_norm 1.6732 (1.3930) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][190/625] eta 0:03:15 lr 0.001159 wd 0.0500 time 0.4435 (0.4490) data time 0.0010 (0.0027) model time 0.4425 (0.4452) loss 3.4694 (3.3984) grad_norm 1.3283 (1.3883) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:30:58 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [53/300][200/625] eta 0:03:10 lr 0.001159 wd 0.0500 time 0.4438 (0.4487) data time 0.0006 (0.0026) model time 0.4432 (0.4450) loss 3.1342 (3.3856) grad_norm 1.6870 (1.3916) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][210/625] eta 0:03:06 lr 0.001159 wd 0.0500 time 0.4432 (0.4484) data time 0.0006 (0.0025) model time 0.4426 (0.4448) loss 3.4897 (3.3607) grad_norm 1.0620 (1.3900) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][220/625] eta 0:03:02 lr 0.001159 wd 0.0500 time 0.4417 (0.4498) data time 0.0006 (0.0025) model time 0.4411 (0.4467) loss 3.3186 (3.3470) grad_norm 2.0211 (1.4012) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][230/625] eta 0:02:57 lr 0.001159 wd 0.0500 time 0.4428 (0.4494) data time 0.0008 (0.0024) model time 0.4420 (0.4464) loss 3.2466 (3.3606) grad_norm 2.6825 (1.4136) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][240/625] eta 0:02:52 lr 0.001159 wd 0.0500 time 0.4382 (0.4491) data time 0.0008 (0.0023) model time 0.4374 (0.4461) loss 3.0930 (3.3629) grad_norm 1.3761 (1.4233) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][250/625] eta 0:02:48 lr 0.001159 wd 0.0500 time 0.4422 (0.4488) data time 0.0008 (0.0023) model time 0.4414 (0.4459) loss 3.6922 (3.3707) grad_norm 1.4093 (1.4163) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][260/625] eta 0:02:43 lr 0.001159 wd 0.0500 time 0.4537 (0.4486) data time 0.0009 (0.0022) model time 0.4528 (0.4458) loss 2.9615 (3.3664) grad_norm 0.8902 (1.4107) loss_scale 4096.0000 
(4096.0000) mem 16721MB [2024-08-04 16:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][270/625] eta 0:02:39 lr 0.001159 wd 0.0500 time 0.4418 (0.4484) data time 0.0008 (0.0022) model time 0.4410 (0.4456) loss 2.5183 (3.3624) grad_norm 1.2517 (1.4100) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][280/625] eta 0:02:34 lr 0.001159 wd 0.0500 time 0.4452 (0.4482) data time 0.0009 (0.0021) model time 0.4443 (0.4454) loss 3.5469 (3.3680) grad_norm 1.1577 (1.4081) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][290/625] eta 0:02:30 lr 0.001159 wd 0.0500 time 0.4503 (0.4481) data time 0.0009 (0.0021) model time 0.4494 (0.4453) loss 3.8277 (3.3557) grad_norm 1.1298 (1.4010) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][300/625] eta 0:02:25 lr 0.001159 wd 0.0500 time 0.4449 (0.4480) data time 0.0009 (0.0020) model time 0.4439 (0.4453) loss 3.4755 (3.3558) grad_norm 1.3581 (1.4008) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][310/625] eta 0:02:21 lr 0.001159 wd 0.0500 time 0.4418 (0.4478) data time 0.0007 (0.0020) model time 0.4411 (0.4452) loss 4.1619 (3.3571) grad_norm 1.3921 (1.4026) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][320/625] eta 0:02:16 lr 0.001159 wd 0.0500 time 0.4404 (0.4477) data time 0.0009 (0.0020) model time 0.4395 (0.4450) loss 3.6935 (3.3607) grad_norm 1.9469 (1.4019) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][330/625] eta 0:02:12 lr 0.001158 wd 0.0500 time 0.4440 (0.4475) data time 0.0008 (0.0019) model time 
0.4432 (0.4450) loss 3.4691 (3.3682) grad_norm 1.6412 (1.4073) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][340/625] eta 0:02:07 lr 0.001158 wd 0.0500 time 0.4422 (0.4474) data time 0.0009 (0.0019) model time 0.4413 (0.4449) loss 2.6182 (3.3661) grad_norm 1.8641 (1.4050) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][350/625] eta 0:02:02 lr 0.001158 wd 0.0500 time 0.4456 (0.4473) data time 0.0008 (0.0019) model time 0.4448 (0.4448) loss 3.2810 (3.3659) grad_norm 1.0290 (1.4023) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][360/625] eta 0:01:58 lr 0.001158 wd 0.0500 time 0.4412 (0.4471) data time 0.0008 (0.0018) model time 0.4404 (0.4447) loss 2.8261 (3.3704) grad_norm 1.6037 (1.4034) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][370/625] eta 0:01:53 lr 0.001158 wd 0.0500 time 0.4423 (0.4470) data time 0.0007 (0.0018) model time 0.4416 (0.4446) loss 3.9052 (3.3583) grad_norm 1.7413 (1.4043) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][380/625] eta 0:01:49 lr 0.001158 wd 0.0500 time 0.4443 (0.4469) data time 0.0006 (0.0018) model time 0.4437 (0.4445) loss 3.3137 (3.3467) grad_norm 1.2609 (1.3997) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][390/625] eta 0:01:45 lr 0.001158 wd 0.0500 time 0.4410 (0.4469) data time 0.0007 (0.0018) model time 0.4403 (0.4445) loss 3.8815 (3.3467) grad_norm 1.3479 (1.3982) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][400/625] eta 0:01:40 
lr 0.001158 wd 0.0500 time 0.4405 (0.4467) data time 0.0006 (0.0017) model time 0.4398 (0.4444) loss 2.9601 (3.3457) grad_norm 1.3895 (1.3999) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][410/625] eta 0:01:36 lr 0.001158 wd 0.0500 time 0.4446 (0.4467) data time 0.0009 (0.0017) model time 0.4437 (0.4444) loss 3.7555 (3.3477) grad_norm 1.6043 (1.4010) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][420/625] eta 0:01:31 lr 0.001158 wd 0.0500 time 0.4404 (0.4470) data time 0.0009 (0.0017) model time 0.4395 (0.4448) loss 3.6926 (3.3525) grad_norm 1.3011 (1.4039) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][430/625] eta 0:01:27 lr 0.001158 wd 0.0500 time 0.4446 (0.4470) data time 0.0007 (0.0017) model time 0.4439 (0.4448) loss 2.6861 (3.3573) grad_norm 1.3008 (1.4038) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][440/625] eta 0:01:22 lr 0.001158 wd 0.0500 time 0.4474 (0.4477) data time 0.0006 (0.0016) model time 0.4467 (0.4457) loss 3.3851 (3.3556) grad_norm 1.3875 (1.4034) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][450/625] eta 0:01:18 lr 0.001158 wd 0.0500 time 0.4476 (0.4476) data time 0.0008 (0.0016) model time 0.4468 (0.4456) loss 3.6559 (3.3569) grad_norm 1.3115 (1.4004) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][460/625] eta 0:01:13 lr 0.001158 wd 0.0500 time 0.4430 (0.4476) data time 0.0008 (0.0016) model time 0.4422 (0.4455) loss 3.1471 (3.3571) grad_norm 2.5242 (1.4023) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:32:59 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][470/625] eta 0:01:09 lr 0.001158 wd 0.0500 time 0.4472 (0.4475) data time 0.0006 (0.0016) model time 0.4465 (0.4455) loss 2.2566 (3.3528) grad_norm 1.2374 (1.3999) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][480/625] eta 0:01:04 lr 0.001158 wd 0.0500 time 0.4443 (0.4474) data time 0.0008 (0.0016) model time 0.4435 (0.4454) loss 3.8532 (3.3458) grad_norm 1.8956 (1.4001) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][490/625] eta 0:01:00 lr 0.001158 wd 0.0500 time 0.4438 (0.4476) data time 0.0008 (0.0016) model time 0.4429 (0.4457) loss 3.3506 (3.3498) grad_norm 1.6445 (1.4011) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][500/625] eta 0:00:55 lr 0.001158 wd 0.0500 time 0.4451 (0.4476) data time 0.0009 (0.0016) model time 0.4442 (0.4456) loss 2.6050 (3.3488) grad_norm 1.1419 (1.4025) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][510/625] eta 0:00:51 lr 0.001158 wd 0.0500 time 0.4440 (0.4475) data time 0.0007 (0.0015) model time 0.4433 (0.4456) loss 3.0747 (3.3490) grad_norm 0.9545 (1.4005) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][520/625] eta 0:00:46 lr 0.001158 wd 0.0500 time 0.4408 (0.4474) data time 0.0007 (0.0015) model time 0.4402 (0.4455) loss 2.4124 (3.3537) grad_norm 1.1014 (1.3988) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][530/625] eta 0:00:42 lr 0.001158 wd 0.0500 time 0.4418 (0.4473) data time 0.0008 (0.0015) model time 0.4411 (0.4454) loss 3.4594 (3.3544) grad_norm 
1.3598 (1.4007) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][540/625] eta 0:00:38 lr 0.001158 wd 0.0500 time 0.4422 (0.4472) data time 0.0008 (0.0015) model time 0.4413 (0.4454) loss 3.3938 (3.3520) grad_norm 1.3776 (1.4080) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][550/625] eta 0:00:33 lr 0.001158 wd 0.0500 time 0.4447 (0.4472) data time 0.0008 (0.0015) model time 0.4439 (0.4453) loss 3.1410 (3.3547) grad_norm 1.6795 (1.4067) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][560/625] eta 0:00:29 lr 0.001158 wd 0.0500 time 0.4411 (0.4471) data time 0.0009 (0.0015) model time 0.4402 (0.4453) loss 3.7264 (3.3568) grad_norm 1.7835 (1.4051) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][570/625] eta 0:00:24 lr 0.001158 wd 0.0500 time 0.4411 (0.4471) data time 0.0007 (0.0015) model time 0.4404 (0.4452) loss 4.5503 (3.3557) grad_norm 1.3968 (1.4036) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][580/625] eta 0:00:20 lr 0.001157 wd 0.0500 time 0.4446 (0.4474) data time 0.0006 (0.0015) model time 0.4440 (0.4456) loss 3.5165 (3.3529) grad_norm 1.1311 (1.3998) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][590/625] eta 0:00:15 lr 0.001157 wd 0.0500 time 0.4451 (0.4476) data time 0.0007 (0.0015) model time 0.4444 (0.4458) loss 2.4953 (3.3477) grad_norm 1.6052 (1.4001) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:33:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][600/625] eta 0:00:11 lr 0.001157 wd 0.0500 time 0.4461 (0.4475) data 
time 0.0006 (0.0015) model time 0.4454 (0.4458) loss 4.2338 (3.3529) grad_norm 1.5002 (1.4004) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][610/625] eta 0:00:06 lr 0.001157 wd 0.0500 time 0.4382 (0.4477) data time 0.0004 (0.0014) model time 0.4378 (0.4460) loss 2.2260 (3.3517) grad_norm 1.5443 (1.3991) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [53/300][620/625] eta 0:00:02 lr 0.001157 wd 0.0500 time 0.4378 (0.4476) data time 0.0005 (0.0014) model time 0.4373 (0.4459) loss 3.3455 (3.3485) grad_norm 1.7304 (1.4022) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 53 training takes 0:04:39 [2024-08-04 16:34:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:34:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 16:34:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.6323 (0.6323) Acc@1 86.621 (86.621) Acc@5 97.510 (97.510) Mem 16721MB [2024-08-04 16:34:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 1.0918 (0.7839) Acc@1 73.682 (82.187) Acc@5 93.555 (96.373) Mem 16721MB [2024-08-04 16:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.1631 (0.9408) Acc@1 72.607 (78.390) Acc@5 91.943 (94.473) Mem 16721MB [2024-08-04 16:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.173 Acc@5 94.444 [2024-08-04 16:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.2% [2024-08-04 16:34:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.867 (0.867) Loss 0.5464 (0.5464) Acc@1 87.061 (87.061) Acc@5 98.145 (98.145) Mem 16721MB [2024-08-04 16:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.186) Loss 0.9229 (0.6881) Acc@1 77.002 (83.403) Acc@5 94.727 (96.835) Mem 16721MB [2024-08-04 16:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.152) Loss 1.0840 (0.8408) Acc@1 72.217 (79.660) Acc@5 92.285 (95.071) Mem 16721MB [2024-08-04 16:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.405 Acc@5 95.072 [2024-08-04 16:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-08-04 16:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.40% [2024-08-04 16:34:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:34:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:34:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][0/625] eta 0:07:32 lr 0.001157 wd 0.0500 time 0.7246 (0.7246) data time 0.3361 (0.3361) model time 0.0000 (0.0000) loss 3.8381 (3.8381) grad_norm 1.1643 (1.1643) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][10/625] eta 0:04:57 lr 0.001157 wd 0.0500 time 0.4400 (0.4836) data time 0.0007 (0.0314) model time 0.0000 (0.0000) loss 2.2584 (3.2240) grad_norm 1.7071 (1.6095) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][20/625] eta 0:04:40 lr 0.001157 wd 0.0500 time 0.4389 (0.4645) data time 0.0009 (0.0168) model time 0.0000 (0.0000) loss 2.8091 (3.0922) grad_norm 1.0753 (1.5026) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][30/625] eta 0:04:32 lr 0.001157 wd 0.0500 time 0.4466 (0.4581) data time 0.0008 (0.0117) model time 0.0000 (0.0000) loss 3.0967 (3.1143) grad_norm 1.1547 (1.4273) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][40/625] eta 0:04:25 lr 0.001157 wd 0.0500 time 0.4494 (0.4547) data time 0.0006 (0.0090) model time 0.0000 (0.0000) loss 3.2231 (3.1685) grad_norm 1.0530 (1.4034) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][50/625] eta 0:04:20 lr 0.001157 wd 0.0500 time 0.4421 (0.4522) data time 0.0008 (0.0074) model time 0.0000 (0.0000) loss 2.3596 (3.1336) grad_norm 1.4249 (1.4330) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][60/625] eta 0:04:14 lr 0.001157 wd 0.0500 time 0.4433 (0.4505) data time 0.0007 (0.0063) model time 0.4427 (0.4412) loss 2.2587 (3.1532) 
grad_norm 1.0209 (1.4213) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][70/625] eta 0:04:09 lr 0.001157 wd 0.0500 time 0.4419 (0.4497) data time 0.0006 (0.0056) model time 0.4413 (0.4425) loss 3.8896 (3.1727) grad_norm 1.2577 (1.3914) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:34:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][80/625] eta 0:04:04 lr 0.001157 wd 0.0500 time 0.4416 (0.4488) data time 0.0007 (0.0050) model time 0.4409 (0.4422) loss 2.0713 (3.1539) grad_norm 1.0833 (1.3853) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][90/625] eta 0:03:59 lr 0.001157 wd 0.0500 time 0.4415 (0.4481) data time 0.0008 (0.0045) model time 0.4408 (0.4421) loss 3.1430 (3.1767) grad_norm 1.6525 (1.4010) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][100/625] eta 0:03:55 lr 0.001157 wd 0.0500 time 0.4479 (0.4477) data time 0.0006 (0.0042) model time 0.4473 (0.4423) loss 3.6637 (3.1933) grad_norm 1.2917 (1.3901) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][110/625] eta 0:03:50 lr 0.001157 wd 0.0500 time 0.4424 (0.4473) data time 0.0006 (0.0039) model time 0.4418 (0.4423) loss 3.4562 (3.2295) grad_norm 1.3148 (1.3722) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][120/625] eta 0:03:45 lr 0.001157 wd 0.0500 time 0.4484 (0.4471) data time 0.0007 (0.0036) model time 0.4478 (0.4425) loss 2.5308 (3.2242) grad_norm 1.0559 (1.3858) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][130/625] eta 0:03:41 lr 0.001157 wd 0.0500 time 0.4402 
(0.4484) data time 0.0009 (0.0034) model time 0.4394 (0.4451) loss 3.2054 (3.2409) grad_norm 1.3124 (1.3875) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][140/625] eta 0:03:37 lr 0.001157 wd 0.0500 time 0.4446 (0.4480) data time 0.0007 (0.0032) model time 0.4439 (0.4447) loss 2.3402 (3.2509) grad_norm 1.6253 (1.3969) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][150/625] eta 0:03:32 lr 0.001157 wd 0.0500 time 0.4406 (0.4478) data time 0.0006 (0.0031) model time 0.4400 (0.4448) loss 3.8707 (3.2715) grad_norm 1.4188 (1.3920) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][160/625] eta 0:03:28 lr 0.001157 wd 0.0500 time 0.4439 (0.4476) data time 0.0006 (0.0029) model time 0.4433 (0.4447) loss 3.8351 (3.2551) grad_norm 1.3893 (1.3849) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][170/625] eta 0:03:23 lr 0.001157 wd 0.0500 time 0.4468 (0.4474) data time 0.0008 (0.0028) model time 0.4460 (0.4446) loss 3.3051 (3.2730) grad_norm 1.5566 (1.3999) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][180/625] eta 0:03:19 lr 0.001157 wd 0.0500 time 0.4432 (0.4472) data time 0.0006 (0.0027) model time 0.4426 (0.4444) loss 3.7644 (3.2655) grad_norm 2.6395 (1.4101) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][190/625] eta 0:03:14 lr 0.001157 wd 0.0500 time 0.4461 (0.4471) data time 0.0006 (0.0026) model time 0.4455 (0.4444) loss 2.5233 (3.2828) grad_norm 0.9976 (1.4130) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:49 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [54/300][200/625] eta 0:03:09 lr 0.001157 wd 0.0500 time 0.4446 (0.4471) data time 0.0008 (0.0025) model time 0.4438 (0.4445) loss 3.2329 (3.2853) grad_norm 1.1996 (1.4135) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][210/625] eta 0:03:05 lr 0.001156 wd 0.0500 time 0.4433 (0.4468) data time 0.0006 (0.0024) model time 0.4428 (0.4443) loss 3.9081 (3.2884) grad_norm 1.1667 (1.4140) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:35:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][220/625] eta 0:03:00 lr 0.001156 wd 0.0500 time 0.4405 (0.4466) data time 0.0006 (0.0023) model time 0.4399 (0.4441) loss 3.8400 (3.2949) grad_norm 1.2119 (1.4236) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][230/625] eta 0:02:56 lr 0.001156 wd 0.0500 time 0.4466 (0.4465) data time 0.0006 (0.0023) model time 0.4460 (0.4441) loss 4.2832 (3.3075) grad_norm 1.2572 (1.4229) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][240/625] eta 0:02:51 lr 0.001156 wd 0.0500 time 0.4461 (0.4465) data time 0.0007 (0.0022) model time 0.4454 (0.4441) loss 3.8083 (3.3160) grad_norm 1.9606 (1.4205) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][250/625] eta 0:02:47 lr 0.001156 wd 0.0500 time 0.4469 (0.4464) data time 0.0006 (0.0022) model time 0.4463 (0.4441) loss 4.2292 (3.3237) grad_norm 1.5485 (1.4135) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][260/625] eta 0:02:42 lr 0.001156 wd 0.0500 time 0.4446 (0.4463) data time 0.0008 (0.0021) model time 0.4438 (0.4441) loss 3.6752 (3.3308) grad_norm 2.0144 (1.4147) loss_scale 4096.0000 
(4096.0000) mem 16721MB [2024-08-04 16:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][270/625] eta 0:02:38 lr 0.001156 wd 0.0500 time 0.4425 (0.4462) data time 0.0008 (0.0021) model time 0.4417 (0.4440) loss 3.9127 (3.3346) grad_norm 1.1545 (1.4077) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][280/625] eta 0:02:33 lr 0.001156 wd 0.0500 time 0.4400 (0.4460) data time 0.0006 (0.0020) model time 0.4394 (0.4438) loss 2.9353 (3.3308) grad_norm 1.1773 (1.4112) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][290/625] eta 0:02:29 lr 0.001156 wd 0.0500 time 0.4430 (0.4459) data time 0.0009 (0.0020) model time 0.4421 (0.4438) loss 3.5524 (3.3424) grad_norm 1.2446 (1.4018) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][300/625] eta 0:02:25 lr 0.001156 wd 0.0500 time 0.6246 (0.4465) data time 0.0006 (0.0019) model time 0.6240 (0.4445) loss 3.9752 (3.3372) grad_norm 1.1333 (1.3999) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][310/625] eta 0:02:20 lr 0.001156 wd 0.0500 time 0.4464 (0.4464) data time 0.0008 (0.0019) model time 0.4456 (0.4444) loss 3.6620 (3.3341) grad_norm 1.8206 (1.4044) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][320/625] eta 0:02:16 lr 0.001156 wd 0.0500 time 0.4458 (0.4464) data time 0.0009 (0.0019) model time 0.4449 (0.4444) loss 2.8683 (3.3268) grad_norm 1.3424 (1.4058) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][330/625] eta 0:02:11 lr 0.001156 wd 0.0500 time 0.4447 (0.4463) data time 0.0007 (0.0018) model time 
0.4441 (0.4444) loss 2.7367 (3.3179) grad_norm 1.3580 (1.4000) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][340/625] eta 0:02:07 lr 0.001156 wd 0.0500 time 0.4389 (0.4462) data time 0.0011 (0.0018) model time 0.4378 (0.4443) loss 2.9314 (3.3186) grad_norm 1.1948 (1.4038) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][350/625] eta 0:02:02 lr 0.001156 wd 0.0500 time 0.4425 (0.4461) data time 0.0009 (0.0018) model time 0.4417 (0.4443) loss 3.4025 (3.3181) grad_norm 1.3470 (1.3998) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][360/625] eta 0:01:58 lr 0.001156 wd 0.0500 time 0.4437 (0.4461) data time 0.0006 (0.0018) model time 0.4431 (0.4442) loss 3.8731 (3.3160) grad_norm 0.9433 (1.4022) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][370/625] eta 0:01:53 lr 0.001156 wd 0.0500 time 0.4447 (0.4460) data time 0.0006 (0.0017) model time 0.4441 (0.4442) loss 3.7853 (3.3176) grad_norm 1.1509 (1.3991) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][380/625] eta 0:01:49 lr 0.001156 wd 0.0500 time 0.4420 (0.4460) data time 0.0008 (0.0017) model time 0.4412 (0.4442) loss 2.2151 (3.3150) grad_norm 1.3538 (1.3981) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][390/625] eta 0:01:44 lr 0.001156 wd 0.0500 time 0.4432 (0.4459) data time 0.0009 (0.0017) model time 0.4422 (0.4442) loss 2.9342 (3.3127) grad_norm 1.1195 (1.3948) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][400/625] eta 0:01:40 
lr 0.001156 wd 0.0500 time 0.4416 (0.4459) data time 0.0008 (0.0017) model time 0.4408 (0.4441) loss 3.3527 (3.3121) grad_norm 0.9246 (1.3916) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][410/625] eta 0:01:35 lr 0.001156 wd 0.0500 time 0.4428 (0.4458) data time 0.0007 (0.0016) model time 0.4421 (0.4441) loss 3.8016 (3.3104) grad_norm 1.5105 (1.3914) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][420/625] eta 0:01:31 lr 0.001156 wd 0.0500 time 0.4411 (0.4461) data time 0.0006 (0.0016) model time 0.4405 (0.4445) loss 4.0386 (3.3129) grad_norm 1.4998 (1.3947) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][430/625] eta 0:01:26 lr 0.001156 wd 0.0500 time 0.4457 (0.4461) data time 0.0006 (0.0016) model time 0.4451 (0.4444) loss 3.7098 (3.3169) grad_norm 1.7711 (1.4016) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][440/625] eta 0:01:22 lr 0.001156 wd 0.0500 time 0.4425 (0.4460) data time 0.0006 (0.0016) model time 0.4419 (0.4443) loss 3.0929 (3.3171) grad_norm 2.3447 (1.4031) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][450/625] eta 0:01:18 lr 0.001155 wd 0.0500 time 0.4311 (0.4459) data time 0.0008 (0.0016) model time 0.4302 (0.4443) loss 3.1121 (3.3183) grad_norm 0.9581 (1.4024) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][460/625] eta 0:01:13 lr 0.001155 wd 0.0500 time 0.4405 (0.4458) data time 0.0006 (0.0016) model time 0.4399 (0.4442) loss 3.7569 (3.3163) grad_norm 1.6489 (1.4023) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:50 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][470/625] eta 0:01:09 lr 0.001155 wd 0.0500 time 0.4407 (0.4462) data time 0.0006 (0.0015) model time 0.4402 (0.4447) loss 3.2687 (3.3164) grad_norm 1.3630 (1.3992) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][480/625] eta 0:01:04 lr 0.001155 wd 0.0500 time 0.4448 (0.4462) data time 0.0008 (0.0015) model time 0.4440 (0.4446) loss 2.1936 (3.3128) grad_norm 1.5881 (1.3989) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][490/625] eta 0:01:00 lr 0.001155 wd 0.0500 time 0.5610 (0.4463) data time 0.0008 (0.0015) model time 0.5602 (0.4448) loss 3.9881 (3.3128) grad_norm 1.1220 (1.4010) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][500/625] eta 0:00:55 lr 0.001155 wd 0.0500 time 0.4459 (0.4463) data time 0.0006 (0.0015) model time 0.4453 (0.4448) loss 3.5234 (3.3135) grad_norm 1.5389 (1.4003) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][510/625] eta 0:00:51 lr 0.001155 wd 0.0500 time 0.4402 (0.4462) data time 0.0008 (0.0015) model time 0.4394 (0.4447) loss 2.2664 (3.3130) grad_norm 1.6714 (1.4033) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][520/625] eta 0:00:46 lr 0.001155 wd 0.0500 time 0.4426 (0.4461) data time 0.0006 (0.0015) model time 0.4420 (0.4446) loss 4.0561 (3.3132) grad_norm 1.1828 (1.4031) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][530/625] eta 0:00:42 lr 0.001155 wd 0.0500 time 0.4448 (0.4461) data time 0.0008 (0.0015) model time 0.4439 (0.4446) loss 3.7919 (3.3171) grad_norm 
1.7010 (1.4017) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][540/625] eta 0:00:37 lr 0.001155 wd 0.0500 time 0.4451 (0.4460) data time 0.0008 (0.0014) model time 0.4443 (0.4446) loss 3.9045 (3.3212) grad_norm 0.9572 (1.4007) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][550/625] eta 0:00:33 lr 0.001155 wd 0.0500 time 0.4425 (0.4460) data time 0.0006 (0.0014) model time 0.4419 (0.4445) loss 3.5547 (3.3285) grad_norm 2.0683 (1.4011) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][560/625] eta 0:00:29 lr 0.001155 wd 0.0500 time 0.4450 (0.4462) data time 0.0006 (0.0014) model time 0.4444 (0.4448) loss 2.6403 (3.3229) grad_norm 1.0867 (1.3996) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][570/625] eta 0:00:24 lr 0.001155 wd 0.0500 time 0.4414 (0.4461) data time 0.0006 (0.0014) model time 0.4407 (0.4447) loss 2.6691 (3.3225) grad_norm 1.3501 (1.3955) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][580/625] eta 0:00:20 lr 0.001155 wd 0.0500 time 0.4431 (0.4461) data time 0.0006 (0.0014) model time 0.4425 (0.4447) loss 3.7094 (3.3243) grad_norm 1.5471 (1.3966) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][590/625] eta 0:00:15 lr 0.001155 wd 0.0500 time 0.4474 (0.4461) data time 0.0006 (0.0014) model time 0.4467 (0.4447) loss 2.8574 (3.3255) grad_norm 1.4162 (1.3963) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][600/625] eta 0:00:11 lr 0.001155 wd 0.0500 time 0.4427 (0.4460) data 
time 0.0006 (0.0014) model time 0.4421 (0.4446) loss 2.9934 (3.3218) grad_norm 1.7258 (1.3988) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][610/625] eta 0:00:06 lr 0.001155 wd 0.0500 time 0.4409 (0.4460) data time 0.0004 (0.0014) model time 0.4405 (0.4446) loss 3.9903 (3.3226) grad_norm 1.0049 (1.4008) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [54/300][620/625] eta 0:00:02 lr 0.001155 wd 0.0500 time 0.4375 (0.4459) data time 0.0004 (0.0014) model time 0.4371 (0.4445) loss 4.0318 (3.3244) grad_norm 1.3403 (1.4021) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:38:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 54 training takes 0:04:38 [2024-08-04 16:38:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:39:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 16:39:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.6187 (0.6187) Acc@1 85.254 (85.254) Acc@5 98.096 (98.096) Mem 16721MB [2024-08-04 16:39:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.149) Loss 1.0781 (0.7921) Acc@1 74.854 (81.863) Acc@5 93.359 (96.413) Mem 16721MB [2024-08-04 16:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.1895 (0.9478) Acc@1 71.436 (78.306) Acc@5 92.188 (94.559) Mem 16721MB [2024-08-04 16:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.083 Acc@5 94.540 [2024-08-04 16:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.1% [2024-08-04 16:39:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.971 (0.971) Loss 0.5449 (0.5449) Acc@1 87.158 (87.158) Acc@5 98.047 (98.047) Mem 16721MB [2024-08-04 16:39:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.196) Loss 0.9165 (0.6862) Acc@1 77.197 (83.501) Acc@5 94.629 (96.831) Mem 16721MB [2024-08-04 16:39:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.157) Loss 1.0771 (0.8377) Acc@1 72.559 (79.827) Acc@5 92.578 (95.110) Mem 16721MB [2024-08-04 16:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.555 Acc@5 95.110 [2024-08-04 16:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-08-04 16:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.55% [2024-08-04 16:39:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:39:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:39:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][0/625] eta 0:07:48 lr 0.001155 wd 0.0500 time 0.7492 (0.7492) data time 0.3651 (0.3651) model time 0.0000 (0.0000) loss 3.3762 (3.3762) grad_norm 1.0397 (1.0397) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][10/625] eta 0:04:49 lr 0.001155 wd 0.0500 time 0.4446 (0.4711) data time 0.0006 (0.0339) model time 0.0000 (0.0000) loss 2.6077 (3.1848) grad_norm 1.3486 (1.2571) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][20/625] eta 0:04:36 lr 0.001155 wd 0.0500 time 0.4471 (0.4576) data time 0.0008 (0.0182) model time 0.0000 (0.0000) loss 3.4640 (3.2974) grad_norm 1.1963 (1.3601) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][30/625] eta 0:04:29 lr 0.001155 wd 0.0500 time 0.4422 (0.4533) data time 0.0006 (0.0126) model time 0.0000 (0.0000) loss 3.7983 (3.3886) grad_norm 1.4174 (1.3378) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][40/625] eta 0:04:23 lr 0.001155 wd 0.0500 time 0.4445 (0.4509) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 3.5331 (3.4323) grad_norm 1.9421 (1.3779) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][50/625] eta 0:04:20 lr 0.001155 wd 0.0500 time 0.4419 (0.4533) data time 0.0006 (0.0080) model time 0.0000 (0.0000) loss 4.0612 (3.4147) grad_norm 3.3618 (1.4257) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][60/625] eta 0:04:15 lr 0.001155 wd 0.0500 time 0.4415 (0.4516) data time 0.0007 (0.0068) model time 0.4408 (0.4419) loss 2.5542 (3.3814) 
grad_norm 1.6562 (1.4383) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][70/625] eta 0:04:09 lr 0.001154 wd 0.0500 time 0.4443 (0.4504) data time 0.0006 (0.0060) model time 0.4437 (0.4422) loss 4.3980 (3.3799) grad_norm 1.3238 (1.4324) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][80/625] eta 0:04:07 lr 0.001154 wd 0.0500 time 0.6065 (0.4537) data time 0.0006 (0.0053) model time 0.6059 (0.4535) loss 2.8674 (3.3598) grad_norm 1.2616 (1.4641) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][90/625] eta 0:04:02 lr 0.001154 wd 0.0500 time 0.4408 (0.4524) data time 0.0008 (0.0048) model time 0.4400 (0.4503) loss 3.5199 (3.3568) grad_norm 1.2137 (1.4585) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:39:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][100/625] eta 0:03:57 lr 0.001154 wd 0.0500 time 0.4366 (0.4515) data time 0.0007 (0.0044) model time 0.4359 (0.4489) loss 2.8229 (3.3649) grad_norm 1.4274 (1.4630) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][110/625] eta 0:03:52 lr 0.001154 wd 0.0500 time 0.4443 (0.4508) data time 0.0008 (0.0041) model time 0.4435 (0.4479) loss 3.6496 (3.3733) grad_norm 1.1689 (1.4374) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][120/625] eta 0:03:47 lr 0.001154 wd 0.0500 time 0.4481 (0.4503) data time 0.0006 (0.0038) model time 0.4475 (0.4473) loss 4.2258 (3.4125) grad_norm 1.5138 (1.4189) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][130/625] eta 0:03:42 lr 0.001154 wd 0.0500 time 0.4442 
(0.4499) data time 0.0008 (0.0036) model time 0.4434 (0.4468) loss 3.2405 (3.3858) grad_norm 1.2009 (1.4047) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][140/625] eta 0:03:37 lr 0.001154 wd 0.0500 time 0.4437 (0.4495) data time 0.0009 (0.0034) model time 0.4428 (0.4465) loss 3.6785 (3.3873) grad_norm 1.6088 (1.4028) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][150/625] eta 0:03:33 lr 0.001154 wd 0.0500 time 0.4401 (0.4490) data time 0.0007 (0.0032) model time 0.4394 (0.4460) loss 3.9360 (3.3914) grad_norm 1.0186 (1.4084) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][160/625] eta 0:03:28 lr 0.001154 wd 0.0500 time 0.4482 (0.4486) data time 0.0008 (0.0031) model time 0.4474 (0.4457) loss 3.5356 (3.3938) grad_norm 1.2527 (1.4142) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][170/625] eta 0:03:23 lr 0.001154 wd 0.0500 time 0.4388 (0.4483) data time 0.0006 (0.0030) model time 0.4382 (0.4453) loss 3.8017 (3.4033) grad_norm 1.6973 (1.4089) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][180/625] eta 0:03:19 lr 0.001154 wd 0.0500 time 0.4546 (0.4480) data time 0.0008 (0.0028) model time 0.4538 (0.4451) loss 3.2655 (3.4137) grad_norm 1.3217 (1.4093) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][190/625] eta 0:03:14 lr 0.001154 wd 0.0500 time 0.4416 (0.4479) data time 0.0008 (0.0027) model time 0.4407 (0.4451) loss 3.9150 (3.4222) grad_norm 1.2777 (1.4078) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:40 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [55/300][200/625] eta 0:03:10 lr 0.001154 wd 0.0500 time 0.4466 (0.4477) data time 0.0006 (0.0026) model time 0.4460 (0.4450) loss 3.8318 (3.4307) grad_norm 1.4036 (1.4048) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][210/625] eta 0:03:05 lr 0.001154 wd 0.0500 time 0.4434 (0.4475) data time 0.0009 (0.0026) model time 0.4425 (0.4448) loss 3.0639 (3.4142) grad_norm 0.9933 (1.3927) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][220/625] eta 0:03:01 lr 0.001154 wd 0.0500 time 0.4441 (0.4472) data time 0.0008 (0.0025) model time 0.4433 (0.4446) loss 2.8826 (3.4071) grad_norm 1.6956 (1.3911) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][230/625] eta 0:02:56 lr 0.001154 wd 0.0500 time 0.4424 (0.4470) data time 0.0008 (0.0024) model time 0.4416 (0.4444) loss 2.7548 (3.4043) grad_norm 1.2134 (1.4044) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-04 16:40:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][240/625] eta 0:02:52 lr 0.001154 wd 0.0500 time 0.4434 (0.4475) data time 0.0007 (0.0023) model time 0.4428 (0.4451) loss 2.5972 (3.4069) grad_norm 1.0194 (1.3979) loss_scale 8192.0000 (4265.9585) mem 16721MB [2024-08-04 16:41:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][250/625] eta 0:02:47 lr 0.001154 wd 0.0500 time 0.4397 (0.4473) data time 0.0010 (0.0023) model time 0.4387 (0.4450) loss 3.1694 (3.4105) grad_norm 1.3237 (1.3915) loss_scale 8192.0000 (4422.3745) mem 16721MB [2024-08-04 16:41:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][260/625] eta 0:02:43 lr 0.001154 wd 0.0500 time 0.4431 (0.4472) data time 0.0008 (0.0022) model time 0.4423 (0.4449) loss 2.7066 (3.3982) grad_norm 2.1220 (1.3910) loss_scale 8192.0000 
(4566.8046) mem 16721MB [2024-08-04 16:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][270/625] eta 0:02:38 lr 0.001154 wd 0.0500 time 0.4475 (0.4471) data time 0.0008 (0.0022) model time 0.4466 (0.4449) loss 3.2357 (3.3922) grad_norm 1.0963 (1.3924) loss_scale 8192.0000 (4700.5756) mem 16721MB [2024-08-04 16:41:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][280/625] eta 0:02:34 lr 0.001154 wd 0.0500 time 0.4454 (0.4470) data time 0.0009 (0.0021) model time 0.4445 (0.4448) loss 3.6183 (3.3908) grad_norm 1.6804 (1.3914) loss_scale 8192.0000 (4824.8256) mem 16721MB [2024-08-04 16:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 16:41:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:41:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 16:44:39 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-04 16:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-04 16:44:48 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-04 16:45:03 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-04 16:45:03 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-04 16:45:06 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 16:45:08 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 16:45:08 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 55) [2024-08-04 16:45:08 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 16:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][290/625] eta 0:27:34 lr 0.001154 wd 0.0500 time 0.4479 (4.9379) data time 0.0006 (0.1609) model time 0.4473 (4.7770) loss 4.1802 (3.8480) grad_norm 1.0067 (1.2394) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][300/625] eta 0:09:22 lr 0.001154 wd 0.0500 time 0.4468 (1.7299) data time 0.0007 (0.0466) model time 0.4461 (1.6834) loss 3.7445 (3.6918) grad_norm 1.1957 (1.3491) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][310/625] eta 0:06:16 lr 0.001154 wd 0.0500 time 0.4440 (1.1948) data time 0.0009 (0.0275) model time 0.4432 (1.1673) loss 3.2495 (3.6498) grad_norm 1.0662 (1.3458) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][320/625] eta 0:04:59 lr 0.001153 wd 0.0500 time 0.4443 (0.9820) data time 0.0006 (0.0197) model time 0.4437 (0.9623) loss 2.8247 (3.6103) grad_norm 1.2773 (1.3633) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][330/625] eta 0:04:14 lr 0.001153 wd 0.0500 time 0.4441 (0.8637) data time 0.0006 (0.0154) model time 0.4435 (0.8483) loss 3.4708 (3.5544) grad_norm 1.1629 (1.3507) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:45:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][340/625] eta 
0:03:44 lr 0.001153 wd 0.0500 time 0.4471 (0.7861) data time 0.0006 (0.0127) model time 0.4465 (0.7734) loss 4.3478 (3.5490) grad_norm 1.6858 (1.3901) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:45:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][350/625] eta 0:03:21 lr 0.001153 wd 0.0500 time 0.4475 (0.7331) data time 0.0006 (0.0108) model time 0.4469 (0.7223) loss 3.6187 (3.5227) grad_norm 1.6256 (1.4088) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][360/625] eta 0:03:04 lr 0.001153 wd 0.0500 time 0.4484 (0.6945) data time 0.0007 (0.0095) model time 0.4477 (0.6850) loss 3.7434 (3.4907) grad_norm 1.0998 (1.4025) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][370/625] eta 0:02:49 lr 0.001153 wd 0.0500 time 0.4475 (0.6650) data time 0.0009 (0.0085) model time 0.4466 (0.6565) loss 3.6810 (3.4687) grad_norm 1.0113 (1.3941) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][380/625] eta 0:02:37 lr 0.001153 wd 0.0500 time 0.4439 (0.6418) data time 0.0008 (0.0076) model time 0.4431 (0.6342) loss 3.1593 (3.4606) grad_norm 1.1258 (1.3765) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][390/625] eta 0:02:26 lr 0.001153 wd 0.0500 time 0.4444 (0.6232) data time 0.0009 (0.0070) model time 0.4435 (0.6162) loss 3.2998 (3.4895) grad_norm 1.2352 (1.3839) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][400/625] eta 0:02:16 lr 0.001153 wd 0.0500 time 0.4457 (0.6077) data time 0.0008 (0.0065) model time 0.4449 (0.6013) loss 3.5673 (3.4844) grad_norm 1.2931 (1.3772) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:26 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][410/625] eta 0:02:07 lr 0.001153 wd 0.0500 time 0.4431 (0.5946) data time 0.0008 (0.0060) model time 0.4423 (0.5886) loss 2.9001 (3.4794) grad_norm 2.4430 (1.3913) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][420/625] eta 0:01:59 lr 0.001153 wd 0.0500 time 0.4499 (0.5837) data time 0.0009 (0.0056) model time 0.4490 (0.5781) loss 3.7380 (3.4694) grad_norm 2.1161 (1.3892) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][430/625] eta 0:01:51 lr 0.001153 wd 0.0500 time 0.4448 (0.5743) data time 0.0007 (0.0053) model time 0.4441 (0.5690) loss 3.1676 (3.4526) grad_norm 1.8290 (1.3862) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][440/625] eta 0:01:44 lr 0.001153 wd 0.0500 time 0.4503 (0.5662) data time 0.0007 (0.0050) model time 0.4497 (0.5612) loss 3.4263 (3.4443) grad_norm 1.3385 (1.3856) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][450/625] eta 0:01:37 lr 0.001153 wd 0.0500 time 0.4476 (0.5590) data time 0.0006 (0.0047) model time 0.4470 (0.5543) loss 3.2157 (3.4437) grad_norm 1.0334 (1.3807) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][460/625] eta 0:01:31 lr 0.001153 wd 0.0500 time 0.4474 (0.5526) data time 0.0008 (0.0045) model time 0.4466 (0.5481) loss 2.1676 (3.4301) grad_norm 1.0696 (1.3749) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][470/625] eta 0:01:24 lr 0.001153 wd 0.0500 time 0.3870 (0.5478) data time 0.0007 (0.0043) model time 0.3863 (0.5435) loss 3.4779 (3.4229) grad_norm 
1.5128 (1.3751) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:46:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][480/625] eta 0:01:18 lr 0.001153 wd 0.0500 time 0.4437 (0.5425) data time 0.0006 (0.0041) model time 0.4431 (0.5383) loss 3.4567 (3.4235) grad_norm 1.2427 (1.3694) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][490/625] eta 0:01:12 lr 0.001153 wd 0.0500 time 0.4518 (0.5378) data time 0.0008 (0.0040) model time 0.4510 (0.5338) loss 4.0637 (3.4091) grad_norm 1.9811 (1.3770) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][500/625] eta 0:01:06 lr 0.001153 wd 0.0500 time 0.4471 (0.5336) data time 0.0006 (0.0038) model time 0.4465 (0.5298) loss 3.3465 (3.4041) grad_norm 1.2423 (1.3868) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][510/625] eta 0:01:00 lr 0.001153 wd 0.0500 time 0.4466 (0.5298) data time 0.0009 (0.0037) model time 0.4457 (0.5261) loss 3.6282 (3.4037) grad_norm 1.5196 (1.3815) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][520/625] eta 0:00:55 lr 0.001153 wd 0.0500 time 0.4487 (0.5263) data time 0.0006 (0.0036) model time 0.4481 (0.5227) loss 2.6088 (3.3950) grad_norm 1.3841 (1.3761) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][530/625] eta 0:00:49 lr 0.001153 wd 0.0500 time 0.4503 (0.5230) data time 0.0006 (0.0035) model time 0.4497 (0.5195) loss 2.4245 (3.3999) grad_norm 1.0042 (1.3745) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][540/625] eta 0:00:44 lr 0.001153 wd 0.0500 time 0.4480 (0.5200) data 
time 0.0006 (0.0034) model time 0.4474 (0.5166) loss 2.4679 (3.3860) grad_norm 1.5594 (1.3747) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][550/625] eta 0:00:38 lr 0.001153 wd 0.0500 time 0.4431 (0.5172) data time 0.0006 (0.0033) model time 0.4425 (0.5139) loss 3.1257 (3.3810) grad_norm 1.6895 (1.3792) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][560/625] eta 0:00:33 lr 0.001152 wd 0.0500 time 0.4454 (0.5146) data time 0.0008 (0.0032) model time 0.4446 (0.5114) loss 3.4127 (3.3776) grad_norm 1.3627 (1.3734) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][570/625] eta 0:00:28 lr 0.001152 wd 0.0500 time 0.4480 (0.5122) data time 0.0006 (0.0031) model time 0.4474 (0.5091) loss 2.6111 (3.3771) grad_norm 1.5690 (1.3776) loss_scale 8192.0000 (8192.0000) mem 16702MB [2024-08-04 16:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][580/625] eta 0:00:22 lr 0.001152 wd 0.0500 time 0.4459 (0.5099) data time 0.0006 (0.0030) model time 0.4453 (0.5069) loss 3.0870 (3.3725) grad_norm 1.8184 (inf) loss_scale 4096.0000 (8150.2041) mem 16702MB [2024-08-04 16:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][590/625] eta 0:00:17 lr 0.001152 wd 0.0500 time 0.4459 (0.5079) data time 0.0008 (0.0029) model time 0.4451 (0.5049) loss 3.4522 (3.3622) grad_norm 2.7151 (inf) loss_scale 4096.0000 (8016.8421) mem 16702MB [2024-08-04 16:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][600/625] eta 0:00:12 lr 0.001152 wd 0.0500 time 0.4454 (0.5059) data time 0.0008 (0.0029) model time 0.4447 (0.5031) loss 3.7130 (3.3615) grad_norm 1.0961 (inf) loss_scale 4096.0000 (7891.9745) mem 16702MB [2024-08-04 16:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[55/300][610/625] eta 0:00:07 lr 0.001152 wd 0.0500 time 0.4398 (0.5040) data time 0.0004 (0.0028) model time 0.4394 (0.5012) loss 3.8808 (3.3731) grad_norm 1.1629 (inf) loss_scale 4096.0000 (7774.8148) mem 16702MB [2024-08-04 16:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [55/300][620/625] eta 0:00:02 lr 0.001152 wd 0.0500 time 0.4422 (0.5021) data time 0.0004 (0.0027) model time 0.4418 (0.4994) loss 3.3017 (3.3688) grad_norm 1.2319 (inf) loss_scale 4096.0000 (7664.6707) mem 16702MB [2024-08-04 16:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 55 training takes 0:02:49 [2024-08-04 16:48:02 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:48:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 16:48:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 0.6597 (0.6597) Acc@1 86.719 (86.719) Acc@5 97.852 (97.852) Mem 16702MB [2024-08-04 16:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.152) Loss 1.0352 (0.8154) Acc@1 75.830 (82.386) Acc@5 93.799 (96.515) Mem 16702MB [2024-08-04 16:48:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 1.2031 (0.9669) Acc@1 72.461 (78.771) Acc@5 91.992 (94.622) Mem 16702MB [2024-08-04 16:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.411 Acc@5 94.600 [2024-08-04 16:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.4% [2024-08-04 16:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.41% [2024-08-04 16:48:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-04 16:48:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! [2024-08-04 16:48:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.5430 (0.5430) Acc@1 87.207 (87.207) Acc@5 98.096 (98.096) Mem 16702MB [2024-08-04 16:48:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 0.9126 (0.6847) Acc@1 77.344 (83.651) Acc@5 94.727 (96.919) Mem 16702MB [2024-08-04 16:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.132) Loss 1.0742 (0.8345) Acc@1 72.705 (79.985) Acc@5 92.725 (95.201) Mem 16702MB [2024-08-04 16:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.694 Acc@5 95.188 [2024-08-04 16:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-08-04 16:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.69% [2024-08-04 16:48:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:48:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][0/625] eta 0:09:04 lr 0.001152 wd 0.0500 time 0.8706 (0.8706) data time 0.3981 (0.3981) model time 0.0000 (0.0000) loss 2.9211 (2.9211) grad_norm 0.9748 (0.9748) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-04 16:48:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][10/625] eta 0:04:58 lr 0.001152 wd 0.0500 time 0.4496 (0.4850) data time 0.0006 (0.0370) model time 0.0000 (0.0000) loss 3.8421 (3.3561) grad_norm 1.7247 (1.2976) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][20/625] eta 0:04:55 lr 0.001152 wd 0.0500 time 0.6647 (0.4892) data time 0.0007 (0.0198) model time 0.0000 (0.0000) loss 2.6760 (3.3062) grad_norm 1.2064 (1.3615) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][30/625] eta 0:04:42 lr 0.001152 wd 0.0500 time 0.4500 (0.4743) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 3.4830 (3.2746) grad_norm 1.2313 (1.3986) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][40/625] eta 0:04:33 lr 0.001152 wd 0.0500 time 0.4465 (0.4680) data time 0.0007 (0.0105) model time 0.0000 (0.0000) loss 3.7436 (3.3146) grad_norm 1.6274 (1.4406) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][50/625] eta 0:04:26 lr 0.001152 wd 0.0500 time 0.4477 (0.4639) data time 0.0006 (0.0086) model time 0.0000 (0.0000) loss 3.6140 (3.2686) grad_norm 1.0553 (1.4239) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][60/625] eta 0:04:20 lr 0.001152 wd 0.0500 time 0.4510 (0.4614) data time 0.0006 (0.0073) model time 0.4504 (0.4476) loss 3.5135 (3.3012) 
grad_norm 1.3926 (1.4311) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][70/625] eta 0:04:14 lr 0.001152 wd 0.0500 time 0.4434 (0.4593) data time 0.0009 (0.0064) model time 0.4426 (0.4467) loss 3.4303 (3.3333) grad_norm 1.8089 (1.4573) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][80/625] eta 0:04:09 lr 0.001152 wd 0.0500 time 0.4441 (0.4576) data time 0.0006 (0.0057) model time 0.4434 (0.4461) loss 2.8971 (3.3186) grad_norm 1.1495 (1.4742) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][90/625] eta 0:04:04 lr 0.001152 wd 0.0500 time 0.4507 (0.4566) data time 0.0006 (0.0052) model time 0.4501 (0.4464) loss 3.4713 (3.3625) grad_norm 1.0435 (1.4496) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][100/625] eta 0:03:59 lr 0.001152 wd 0.0500 time 0.4458 (0.4556) data time 0.0008 (0.0047) model time 0.4450 (0.4464) loss 3.6479 (3.3845) grad_norm 1.2278 (1.4430) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][110/625] eta 0:03:54 lr 0.001152 wd 0.0500 time 0.4487 (0.4549) data time 0.0008 (0.0044) model time 0.4478 (0.4465) loss 3.6355 (3.3700) grad_norm 1.3854 (1.4477) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][120/625] eta 0:03:49 lr 0.001152 wd 0.0500 time 0.4485 (0.4543) data time 0.0008 (0.0041) model time 0.4477 (0.4465) loss 2.9480 (3.3484) grad_norm 1.2158 (1.4518) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][130/625] eta 0:03:44 lr 0.001152 wd 0.0500 time 0.4444 
(0.4537) data time 0.0006 (0.0038) model time 0.4438 (0.4464) loss 2.3323 (3.3206) grad_norm 1.0490 (1.4453) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][140/625] eta 0:03:39 lr 0.001152 wd 0.0500 time 0.4437 (0.4532) data time 0.0009 (0.0036) model time 0.4428 (0.4463) loss 3.3707 (3.3166) grad_norm 1.2815 (1.4399) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][150/625] eta 0:03:35 lr 0.001152 wd 0.0500 time 0.4504 (0.4528) data time 0.0008 (0.0034) model time 0.4496 (0.4463) loss 3.0548 (3.3372) grad_norm 2.2915 (1.4424) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][160/625] eta 0:03:30 lr 0.001152 wd 0.0500 time 0.4493 (0.4524) data time 0.0006 (0.0033) model time 0.4487 (0.4463) loss 3.3389 (3.3340) grad_norm 1.6259 (1.4447) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][170/625] eta 0:03:25 lr 0.001151 wd 0.0500 time 0.4487 (0.4522) data time 0.0007 (0.0031) model time 0.4480 (0.4464) loss 4.3657 (3.3424) grad_norm 1.3768 (1.4383) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][180/625] eta 0:03:21 lr 0.001151 wd 0.0500 time 0.4455 (0.4519) data time 0.0009 (0.0030) model time 0.4446 (0.4464) loss 2.2678 (3.3450) grad_norm 1.3011 (1.4243) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][190/625] eta 0:03:16 lr 0.001151 wd 0.0500 time 0.4474 (0.4516) data time 0.0007 (0.0029) model time 0.4467 (0.4463) loss 3.6654 (3.3320) grad_norm 2.1103 (1.4276) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [56/300][200/625] eta 0:03:11 lr 0.001151 wd 0.0500 time 0.4454 (0.4514) data time 0.0009 (0.0028) model time 0.4445 (0.4463) loss 3.4582 (3.3279) grad_norm 1.1484 (1.4150) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][210/625] eta 0:03:07 lr 0.001151 wd 0.0500 time 0.4536 (0.4512) data time 0.0006 (0.0027) model time 0.4529 (0.4463) loss 4.0359 (3.3323) grad_norm 1.4385 (1.4134) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][220/625] eta 0:03:02 lr 0.001151 wd 0.0500 time 0.4460 (0.4510) data time 0.0007 (0.0026) model time 0.4454 (0.4463) loss 4.1775 (3.3398) grad_norm 2.8158 (1.4210) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][230/625] eta 0:02:58 lr 0.001151 wd 0.0500 time 0.4444 (0.4508) data time 0.0009 (0.0025) model time 0.4435 (0.4463) loss 3.2235 (3.3489) grad_norm 1.9164 (1.4245) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][240/625] eta 0:02:53 lr 0.001151 wd 0.0500 time 0.4517 (0.4507) data time 0.0007 (0.0025) model time 0.4510 (0.4464) loss 3.9762 (3.3546) grad_norm 1.7421 (1.4317) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][250/625] eta 0:02:49 lr 0.001151 wd 0.0500 time 0.4423 (0.4507) data time 0.0008 (0.0024) model time 0.4415 (0.4465) loss 2.8771 (3.3568) grad_norm 1.6925 (1.4339) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][260/625] eta 0:02:44 lr 0.001151 wd 0.0500 time 0.4522 (0.4507) data time 0.0009 (0.0024) model time 0.4513 (0.4466) loss 3.3646 (3.3562) grad_norm 1.0888 (1.4246) loss_scale 4096.0000 
(4096.0000) mem 16704MB [2024-08-04 16:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][270/625] eta 0:02:39 lr 0.001151 wd 0.0500 time 0.4464 (0.4506) data time 0.0006 (0.0023) model time 0.4458 (0.4466) loss 4.1877 (3.3537) grad_norm 1.2776 (1.4243) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][280/625] eta 0:02:35 lr 0.001151 wd 0.0500 time 0.4447 (0.4504) data time 0.0006 (0.0022) model time 0.4441 (0.4466) loss 3.4029 (3.3527) grad_norm 1.0351 (1.4196) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][290/625] eta 0:02:30 lr 0.001151 wd 0.0500 time 0.4491 (0.4503) data time 0.0008 (0.0022) model time 0.4482 (0.4466) loss 3.0500 (3.3550) grad_norm 1.8989 (1.4267) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][300/625] eta 0:02:26 lr 0.001151 wd 0.0500 time 0.4461 (0.4502) data time 0.0007 (0.0021) model time 0.4455 (0.4466) loss 4.0319 (3.3583) grad_norm 2.0716 (1.4279) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][310/625] eta 0:02:21 lr 0.001151 wd 0.0500 time 0.4453 (0.4501) data time 0.0007 (0.0021) model time 0.4447 (0.4465) loss 2.9596 (3.3476) grad_norm 1.4671 (1.4285) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][320/625] eta 0:02:17 lr 0.001151 wd 0.0500 time 0.4452 (0.4500) data time 0.0007 (0.0021) model time 0.4445 (0.4465) loss 3.8558 (3.3434) grad_norm 1.2965 (1.4247) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][330/625] eta 0:02:12 lr 0.001151 wd 0.0500 time 0.4510 (0.4499) data time 0.0006 (0.0020) model time 
0.4504 (0.4465) loss 3.4816 (3.3384) grad_norm 1.1279 (1.4160) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:50:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][340/625] eta 0:02:08 lr 0.001151 wd 0.0500 time 0.4425 (0.4498) data time 0.0006 (0.0020) model time 0.4419 (0.4465) loss 3.3470 (3.3440) grad_norm 1.5202 (1.4156) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][350/625] eta 0:02:03 lr 0.001151 wd 0.0500 time 0.4452 (0.4503) data time 0.0009 (0.0019) model time 0.4444 (0.4471) loss 2.3566 (3.3448) grad_norm 0.9603 (1.4132) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][360/625] eta 0:01:59 lr 0.001151 wd 0.0500 time 0.4472 (0.4505) data time 0.0008 (0.0019) model time 0.4464 (0.4475) loss 2.3394 (3.3375) grad_norm 1.8067 (1.4170) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][370/625] eta 0:01:54 lr 0.001151 wd 0.0500 time 0.4487 (0.4503) data time 0.0008 (0.0019) model time 0.4479 (0.4474) loss 2.3820 (3.3341) grad_norm 1.2293 (1.4193) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][380/625] eta 0:01:50 lr 0.001151 wd 0.0500 time 0.4477 (0.4502) data time 0.0006 (0.0019) model time 0.4471 (0.4473) loss 3.7081 (3.3289) grad_norm 1.0565 (1.4142) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][390/625] eta 0:01:45 lr 0.001151 wd 0.0500 time 0.4456 (0.4502) data time 0.0008 (0.0018) model time 0.4448 (0.4473) loss 3.9532 (3.3276) grad_norm 1.8092 (1.4157) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][400/625] eta 0:01:41 
lr 0.001151 wd 0.0500 time 0.4440 (0.4501) data time 0.0008 (0.0018) model time 0.4433 (0.4472) loss 3.4682 (3.3373) grad_norm 1.3056 (1.4174) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][410/625] eta 0:01:36 lr 0.001150 wd 0.0500 time 0.4487 (0.4500) data time 0.0009 (0.0018) model time 0.4477 (0.4472) loss 3.4502 (3.3345) grad_norm 1.0443 (1.4144) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][420/625] eta 0:01:32 lr 0.001150 wd 0.0500 time 0.4462 (0.4499) data time 0.0008 (0.0018) model time 0.4454 (0.4471) loss 3.6275 (3.3344) grad_norm 1.1376 (1.4114) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][430/625] eta 0:01:27 lr 0.001150 wd 0.0500 time 0.4486 (0.4498) data time 0.0006 (0.0017) model time 0.4480 (0.4471) loss 3.0809 (3.3343) grad_norm 1.1360 (1.4082) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][440/625] eta 0:01:23 lr 0.001150 wd 0.0500 time 0.4411 (0.4498) data time 0.0006 (0.0017) model time 0.4405 (0.4471) loss 2.7331 (3.3368) grad_norm 1.3912 (1.4071) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][450/625] eta 0:01:18 lr 0.001150 wd 0.0500 time 0.4477 (0.4497) data time 0.0009 (0.0017) model time 0.4468 (0.4471) loss 3.4843 (3.3397) grad_norm 1.6848 (1.4064) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][460/625] eta 0:01:14 lr 0.001150 wd 0.0500 time 0.4466 (0.4497) data time 0.0008 (0.0017) model time 0.4458 (0.4471) loss 3.6063 (3.3415) grad_norm 1.1630 (1.4115) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:51:57 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][470/625] eta 0:01:09 lr 0.001150 wd 0.0500 time 0.4442 (0.4496) data time 0.0007 (0.0017) model time 0.4435 (0.4471) loss 2.2073 (3.3333) grad_norm 1.3468 (1.4123) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][480/625] eta 0:01:05 lr 0.001150 wd 0.0500 time 0.4475 (0.4496) data time 0.0008 (0.0016) model time 0.4468 (0.4471) loss 3.5815 (3.3339) grad_norm 1.8307 (1.4125) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][490/625] eta 0:01:00 lr 0.001150 wd 0.0500 time 0.4552 (0.4495) data time 0.0007 (0.0016) model time 0.4545 (0.4470) loss 3.5719 (3.3268) grad_norm 1.0994 (1.4148) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][500/625] eta 0:00:56 lr 0.001150 wd 0.0500 time 0.4436 (0.4495) data time 0.0006 (0.0016) model time 0.4430 (0.4470) loss 3.7325 (3.3263) grad_norm 1.3842 (1.4122) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][510/625] eta 0:00:51 lr 0.001150 wd 0.0500 time 0.4455 (0.4494) data time 0.0009 (0.0016) model time 0.4447 (0.4470) loss 3.5941 (3.3233) grad_norm 1.1834 (1.4102) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][520/625] eta 0:00:47 lr 0.001150 wd 0.0500 time 0.4457 (0.4493) data time 0.0006 (0.0016) model time 0.4451 (0.4469) loss 3.6310 (3.3258) grad_norm 1.1893 (1.4081) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][530/625] eta 0:00:42 lr 0.001150 wd 0.0500 time 0.4460 (0.4493) data time 0.0008 (0.0016) model time 0.4452 (0.4469) loss 3.3071 (3.3263) grad_norm 
1.2758 (1.4087) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][540/625] eta 0:00:38 lr 0.001150 wd 0.0500 time 0.4450 (0.4495) data time 0.0007 (0.0016) model time 0.4443 (0.4472) loss 3.6389 (3.3278) grad_norm 0.9701 (1.4065) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][550/625] eta 0:00:33 lr 0.001150 wd 0.0500 time 0.4472 (0.4496) data time 0.0007 (0.0015) model time 0.4465 (0.4473) loss 3.4606 (3.3294) grad_norm 1.1283 (1.4062) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][560/625] eta 0:00:29 lr 0.001150 wd 0.0500 time 0.4467 (0.4496) data time 0.0009 (0.0015) model time 0.4458 (0.4473) loss 2.3068 (3.3240) grad_norm 1.3627 (1.4069) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][570/625] eta 0:00:24 lr 0.001150 wd 0.0500 time 0.4578 (0.4495) data time 0.0008 (0.0015) model time 0.4570 (0.4473) loss 3.9023 (3.3286) grad_norm 1.5491 (1.4104) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][580/625] eta 0:00:20 lr 0.001150 wd 0.0500 time 0.4454 (0.4495) data time 0.0006 (0.0015) model time 0.4448 (0.4473) loss 3.8753 (3.3309) grad_norm 1.3579 (1.4086) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][590/625] eta 0:00:15 lr 0.001150 wd 0.0500 time 0.4449 (0.4494) data time 0.0006 (0.0015) model time 0.4443 (0.4472) loss 4.0206 (3.3370) grad_norm 1.2945 (1.4084) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:52:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][600/625] eta 0:00:11 lr 0.001150 wd 0.0500 time 0.4455 (0.4494) data 
time 0.0008 (0.0015) model time 0.4447 (0.4472) loss 3.8134 (3.3351) grad_norm 1.2379 (1.4080) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][610/625] eta 0:00:06 lr 0.001150 wd 0.0500 time 0.4419 (0.4493) data time 0.0004 (0.0015) model time 0.4415 (0.4472) loss 2.4923 (3.3301) grad_norm 1.2084 (1.4058) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [56/300][620/625] eta 0:00:02 lr 0.001150 wd 0.0500 time 0.4433 (0.4492) data time 0.0004 (0.0015) model time 0.4429 (0.4471) loss 2.2759 (3.3287) grad_norm 1.8391 (1.4081) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:06 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 56 training takes 0:04:40 [2024-08-04 16:53:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:53:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 16:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.6465 (0.6465) Acc@1 85.303 (85.303) Acc@5 97.217 (97.217) Mem 16704MB [2024-08-04 16:53:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 1.0020 (0.7796) Acc@1 77.002 (82.382) Acc@5 94.141 (96.373) Mem 16704MB [2024-08-04 16:53:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 1.1445 (0.9328) Acc@1 72.705 (78.704) Acc@5 92.578 (94.638) Mem 16704MB [2024-08-04 16:53:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.469 Acc@5 94.602 [2024-08-04 16:53:11 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.5% [2024-08-04 16:53:11 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.47% [2024-08-04 16:53:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 16:53:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 16:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5420 (0.5420) Acc@1 87.207 (87.207) Acc@5 98.145 (98.145) Mem 16704MB [2024-08-04 16:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.147) Loss 0.9097 (0.6835) Acc@1 77.393 (83.794) Acc@5 94.727 (96.933) Mem 16704MB [2024-08-04 16:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.132) Loss 1.0664 (0.8316) Acc@1 72.852 (80.129) Acc@5 93.115 (95.247) Mem 16704MB [2024-08-04 16:53:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.830 Acc@5 95.238 [2024-08-04 16:53:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.8% [2024-08-04 16:53:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.83% [2024-08-04 16:53:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:53:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][0/625] eta 0:07:57 lr 0.001150 wd 0.0500 time 0.7637 (0.7637) data time 0.3791 (0.3791) model time 0.0000 (0.0000) loss 3.3786 (3.3786) grad_norm 1.2621 (1.2621) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][10/625] eta 0:04:53 lr 0.001150 wd 0.0500 time 0.4480 (0.4771) data time 0.0007 (0.0353) model time 0.0000 (0.0000) loss 3.5288 (3.4872) grad_norm 1.2823 (1.5026) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][20/625] eta 0:04:39 lr 0.001149 wd 0.0500 time 0.4475 (0.4625) data time 0.0008 (0.0188) model time 0.0000 (0.0000) loss 3.7706 (3.3939) grad_norm 1.3625 (1.4093) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][30/625] eta 0:04:32 lr 0.001149 wd 0.0500 time 0.4439 (0.4574) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 3.1811 (3.3407) grad_norm 0.8741 (1.3962) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][40/625] eta 0:04:28 lr 0.001149 wd 0.0500 time 0.4458 (0.4587) data time 0.0007 (0.0101) model time 0.0000 (0.0000) loss 3.8066 (3.3748) grad_norm 1.4330 (1.5016) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 16:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][50/625] eta 0:04:22 lr 0.001149 wd 0.0500 time 0.4361 (0.4566) data time 0.0007 (0.0082) model time 0.0000 (0.0000) loss 3.7356 (3.4323) grad_norm inf (inf) loss_scale 2048.0000 (4055.8431) mem 16704MB [2024-08-04 16:53:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][60/625] eta 0:04:17 lr 0.001149 wd 0.0500 time 0.4481 (0.4552) data time 0.0009 (0.0070) model time 0.4472 (0.4472) loss 2.4522 (3.3635) 
grad_norm 0.9320 (inf) loss_scale 2048.0000 (3726.6885) mem 16704MB [2024-08-04 16:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][70/625] eta 0:04:12 lr 0.001149 wd 0.0500 time 0.4496 (0.4543) data time 0.0006 (0.0062) model time 0.4489 (0.4477) loss 3.3840 (3.3243) grad_norm 1.5659 (inf) loss_scale 2048.0000 (3490.2535) mem 16704MB [2024-08-04 16:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][80/625] eta 0:04:07 lr 0.001149 wd 0.0500 time 0.4502 (0.4535) data time 0.0008 (0.0055) model time 0.4494 (0.4474) loss 2.4574 (3.3695) grad_norm 1.6070 (inf) loss_scale 2048.0000 (3312.1975) mem 16704MB [2024-08-04 16:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][90/625] eta 0:04:02 lr 0.001149 wd 0.0500 time 0.4508 (0.4530) data time 0.0008 (0.0050) model time 0.4500 (0.4477) loss 3.0840 (3.3613) grad_norm 2.0087 (inf) loss_scale 2048.0000 (3173.2747) mem 16704MB [2024-08-04 16:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][100/625] eta 0:03:57 lr 0.001149 wd 0.0500 time 0.4512 (0.4528) data time 0.0009 (0.0046) model time 0.4503 (0.4481) loss 3.6851 (3.3786) grad_norm 1.7868 (inf) loss_scale 2048.0000 (3061.8614) mem 16704MB [2024-08-04 16:54:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][110/625] eta 0:03:53 lr 0.001149 wd 0.0500 time 0.4514 (0.4526) data time 0.0008 (0.0042) model time 0.4506 (0.4484) loss 3.6320 (3.3498) grad_norm 1.6127 (inf) loss_scale 2048.0000 (2970.5225) mem 16704MB [2024-08-04 16:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][120/625] eta 0:03:49 lr 0.001149 wd 0.0500 time 0.4526 (0.4539) data time 0.0008 (0.0040) model time 0.4518 (0.4511) loss 2.8910 (3.3518) grad_norm 1.1906 (inf) loss_scale 2048.0000 (2894.2810) mem 16704MB [2024-08-04 16:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][130/625] eta 0:03:44 lr 0.001149 wd 0.0500 time 0.4485 (0.4535) data time 0.0010 
(0.0037) model time 0.4475 (0.4508) loss 2.7303 (3.3342) grad_norm 1.3330 (inf) loss_scale 2048.0000 (2829.6794) mem 16704MB [2024-08-04 16:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][140/625] eta 0:03:39 lr 0.001149 wd 0.0500 time 0.4480 (0.4531) data time 0.0008 (0.0035) model time 0.4471 (0.4504) loss 3.5699 (3.3306) grad_norm 1.1240 (inf) loss_scale 2048.0000 (2774.2411) mem 16704MB [2024-08-04 16:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][150/625] eta 0:03:35 lr 0.001149 wd 0.0500 time 0.4556 (0.4528) data time 0.0007 (0.0033) model time 0.4548 (0.4500) loss 3.9120 (3.3272) grad_norm 1.0950 (inf) loss_scale 2048.0000 (2726.1457) mem 16704MB [2024-08-04 16:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][160/625] eta 0:03:30 lr 0.001149 wd 0.0500 time 0.4423 (0.4523) data time 0.0008 (0.0032) model time 0.4415 (0.4494) loss 2.9717 (3.3268) grad_norm 3.3038 (inf) loss_scale 2048.0000 (2684.0248) mem 16704MB [2024-08-04 16:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][170/625] eta 0:03:25 lr 0.001149 wd 0.0500 time 0.4454 (0.4519) data time 0.0006 (0.0030) model time 0.4448 (0.4491) loss 3.5281 (3.3337) grad_norm 1.4274 (inf) loss_scale 2048.0000 (2646.8304) mem 16704MB [2024-08-04 16:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][180/625] eta 0:03:20 lr 0.001149 wd 0.0500 time 0.4489 (0.4516) data time 0.0008 (0.0029) model time 0.4480 (0.4489) loss 3.4388 (3.3443) grad_norm 1.1352 (inf) loss_scale 2048.0000 (2613.7459) mem 16704MB [2024-08-04 16:54:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][190/625] eta 0:03:16 lr 0.001149 wd 0.0500 time 0.4484 (0.4514) data time 0.0006 (0.0028) model time 0.4478 (0.4487) loss 2.7537 (3.3391) grad_norm 1.4886 (inf) loss_scale 2048.0000 (2584.1257) mem 16704MB [2024-08-04 16:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][200/625] eta 0:03:11 
lr 0.001149 wd 0.0500 time 0.4438 (0.4512) data time 0.0006 (0.0027) model time 0.4432 (0.4486) loss 2.0304 (3.3238) grad_norm 1.4908 (inf) loss_scale 2048.0000 (2557.4527) mem 16704MB [2024-08-04 16:54:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][210/625] eta 0:03:07 lr 0.001149 wd 0.0500 time 0.4482 (0.4510) data time 0.0006 (0.0026) model time 0.4476 (0.4483) loss 4.4647 (3.3406) grad_norm 1.9906 (inf) loss_scale 2048.0000 (2533.3081) mem 16704MB [2024-08-04 16:55:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][220/625] eta 0:03:02 lr 0.001149 wd 0.0500 time 0.4463 (0.4508) data time 0.0008 (0.0025) model time 0.4455 (0.4482) loss 3.7968 (3.3611) grad_norm 1.5458 (inf) loss_scale 2048.0000 (2511.3484) mem 16704MB [2024-08-04 16:55:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][230/625] eta 0:02:57 lr 0.001149 wd 0.0500 time 0.4465 (0.4505) data time 0.0008 (0.0025) model time 0.4457 (0.4480) loss 3.3252 (3.3612) grad_norm 1.1468 (inf) loss_scale 2048.0000 (2491.2900) mem 16704MB [2024-08-04 16:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][240/625] eta 0:02:53 lr 0.001149 wd 0.0500 time 0.4486 (0.4504) data time 0.0008 (0.0024) model time 0.4478 (0.4479) loss 3.6150 (3.3668) grad_norm 1.6084 (inf) loss_scale 2048.0000 (2472.8963) mem 16704MB [2024-08-04 16:55:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][250/625] eta 0:02:48 lr 0.001148 wd 0.0500 time 0.4423 (0.4502) data time 0.0008 (0.0023) model time 0.4415 (0.4477) loss 3.5464 (3.3673) grad_norm 1.3280 (inf) loss_scale 2048.0000 (2455.9681) mem 16704MB [2024-08-04 16:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][260/625] eta 0:02:44 lr 0.001148 wd 0.0500 time 0.4520 (0.4501) data time 0.0008 (0.0023) model time 0.4512 (0.4477) loss 3.5836 (3.3649) grad_norm 1.2354 (inf) loss_scale 2048.0000 (2440.3372) mem 16704MB [2024-08-04 16:55:23 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [57/300][270/625] eta 0:02:39 lr 0.001148 wd 0.0500 time 0.4514 (0.4500) data time 0.0007 (0.0022) model time 0.4507 (0.4477) loss 3.0975 (3.3601) grad_norm 1.6168 (inf) loss_scale 2048.0000 (2425.8598) mem 16704MB [2024-08-04 16:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][280/625] eta 0:02:35 lr 0.001148 wd 0.0500 time 0.4500 (0.4500) data time 0.0008 (0.0022) model time 0.4492 (0.4478) loss 3.0219 (3.3600) grad_norm 1.4825 (inf) loss_scale 2048.0000 (2412.4128) mem 16704MB [2024-08-04 16:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][290/625] eta 0:02:30 lr 0.001148 wd 0.0500 time 0.4449 (0.4500) data time 0.0008 (0.0021) model time 0.4441 (0.4478) loss 3.1972 (3.3472) grad_norm 1.3192 (inf) loss_scale 2048.0000 (2399.8900) mem 16704MB [2024-08-04 16:55:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][300/625] eta 0:02:26 lr 0.001148 wd 0.0500 time 0.4469 (0.4499) data time 0.0008 (0.0021) model time 0.4461 (0.4477) loss 3.6916 (3.3519) grad_norm 1.0676 (inf) loss_scale 2048.0000 (2388.1993) mem 16704MB [2024-08-04 16:55:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][310/625] eta 0:02:21 lr 0.001148 wd 0.0500 time 0.4456 (0.4497) data time 0.0006 (0.0020) model time 0.4451 (0.4476) loss 4.2213 (3.3573) grad_norm 1.2934 (inf) loss_scale 2048.0000 (2377.2605) mem 16704MB [2024-08-04 16:55:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][320/625] eta 0:02:17 lr 0.001148 wd 0.0500 time 0.4438 (0.4496) data time 0.0008 (0.0020) model time 0.4430 (0.4475) loss 3.0956 (3.3676) grad_norm 1.1715 (inf) loss_scale 2048.0000 (2367.0031) mem 16704MB [2024-08-04 16:55:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][330/625] eta 0:02:12 lr 0.001148 wd 0.0500 time 0.4465 (0.4495) data time 0.0006 (0.0020) model time 0.4458 (0.4474) loss 3.7696 (3.3693) grad_norm 1.7222 (inf) loss_scale 2048.0000 
(2357.3656) mem 16704MB [2024-08-04 16:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][340/625] eta 0:02:08 lr 0.001148 wd 0.0500 time 0.4488 (0.4502) data time 0.0006 (0.0019) model time 0.4482 (0.4482) loss 3.9894 (3.3725) grad_norm 1.3698 (inf) loss_scale 2048.0000 (2348.2933) mem 16704MB [2024-08-04 16:55:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][350/625] eta 0:02:03 lr 0.001148 wd 0.0500 time 0.4464 (0.4501) data time 0.0006 (0.0019) model time 0.4458 (0.4482) loss 3.8826 (3.3636) grad_norm 1.4756 (inf) loss_scale 2048.0000 (2339.7379) mem 16704MB [2024-08-04 16:56:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][360/625] eta 0:01:59 lr 0.001148 wd 0.0500 time 0.4482 (0.4501) data time 0.0006 (0.0019) model time 0.4476 (0.4482) loss 2.7718 (3.3527) grad_norm 1.6846 (inf) loss_scale 2048.0000 (2331.6565) mem 16704MB [2024-08-04 16:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][370/625] eta 0:01:54 lr 0.001148 wd 0.0500 time 0.5806 (0.4504) data time 0.0008 (0.0018) model time 0.5798 (0.4486) loss 3.4998 (3.3562) grad_norm 0.9881 (inf) loss_scale 2048.0000 (2324.0108) mem 16704MB [2024-08-04 16:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][380/625] eta 0:01:50 lr 0.001148 wd 0.0500 time 0.4510 (0.4503) data time 0.0009 (0.0018) model time 0.4502 (0.4485) loss 3.3851 (3.3536) grad_norm 1.9930 (inf) loss_scale 2048.0000 (2316.7664) mem 16704MB [2024-08-04 16:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][390/625] eta 0:01:45 lr 0.001148 wd 0.0500 time 0.4424 (0.4502) data time 0.0008 (0.0018) model time 0.4416 (0.4484) loss 3.5653 (3.3579) grad_norm 0.8913 (inf) loss_scale 2048.0000 (2309.8926) mem 16704MB [2024-08-04 16:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][400/625] eta 0:01:41 lr 0.001148 wd 0.0500 time 0.4481 (0.4502) data time 0.0008 (0.0018) model time 0.4473 (0.4484) loss 
3.6271 (3.3610) grad_norm 2.4452 (inf) loss_scale 2048.0000 (2303.3616) mem 16704MB [2024-08-04 16:56:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][410/625] eta 0:01:36 lr 0.001148 wd 0.0500 time 0.4476 (0.4501) data time 0.0008 (0.0017) model time 0.4467 (0.4484) loss 3.7074 (3.3652) grad_norm 1.3142 (inf) loss_scale 2048.0000 (2297.1484) mem 16704MB [2024-08-04 16:56:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][420/625] eta 0:01:32 lr 0.001148 wd 0.0500 time 0.4522 (0.4501) data time 0.0008 (0.0017) model time 0.4514 (0.4484) loss 2.5145 (3.3634) grad_norm 1.3229 (inf) loss_scale 2048.0000 (2291.2304) mem 16704MB [2024-08-04 16:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][430/625] eta 0:01:27 lr 0.001148 wd 0.0500 time 0.4437 (0.4501) data time 0.0007 (0.0017) model time 0.4430 (0.4484) loss 4.0172 (3.3651) grad_norm 1.8498 (inf) loss_scale 2048.0000 (2285.5870) mem 16704MB [2024-08-04 16:56:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][440/625] eta 0:01:23 lr 0.001148 wd 0.0500 time 0.4522 (0.4500) data time 0.0006 (0.0017) model time 0.4516 (0.4483) loss 3.5378 (3.3711) grad_norm 1.1417 (inf) loss_scale 2048.0000 (2280.1995) mem 16704MB [2024-08-04 16:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][450/625] eta 0:01:18 lr 0.001148 wd 0.0500 time 0.4443 (0.4500) data time 0.0008 (0.0017) model time 0.4435 (0.4483) loss 3.6867 (3.3724) grad_norm 1.3662 (inf) loss_scale 2048.0000 (2275.0510) mem 16704MB [2024-08-04 16:56:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][460/625] eta 0:01:14 lr 0.001148 wd 0.0500 time 0.4494 (0.4499) data time 0.0008 (0.0016) model time 0.4486 (0.4483) loss 3.3996 (3.3706) grad_norm 1.0925 (inf) loss_scale 2048.0000 (2270.1258) mem 16704MB [2024-08-04 16:56:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][470/625] eta 0:01:09 lr 0.001148 wd 0.0500 time 0.4434 (0.4499) 
data time 0.0007 (0.0016) model time 0.4427 (0.4482) loss 2.6036 (3.3688) grad_norm 0.9893 (inf) loss_scale 2048.0000 (2265.4098) mem 16704MB [2024-08-04 16:56:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][480/625] eta 0:01:05 lr 0.001147 wd 0.0500 time 0.4482 (0.4502) data time 0.0007 (0.0016) model time 0.4475 (0.4487) loss 4.0777 (3.3694) grad_norm 1.1638 (inf) loss_scale 2048.0000 (2260.8898) mem 16704MB [2024-08-04 16:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][490/625] eta 0:01:00 lr 0.001147 wd 0.0500 time 0.4441 (0.4502) data time 0.0008 (0.0016) model time 0.4433 (0.4486) loss 3.5317 (3.3681) grad_norm 1.9768 (inf) loss_scale 2048.0000 (2256.5540) mem 16704MB [2024-08-04 16:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][500/625] eta 0:00:56 lr 0.001147 wd 0.0500 time 0.4567 (0.4501) data time 0.0006 (0.0016) model time 0.4561 (0.4486) loss 3.4774 (3.3732) grad_norm 1.0603 (inf) loss_scale 2048.0000 (2252.3912) mem 16704MB [2024-08-04 16:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][510/625] eta 0:00:51 lr 0.001147 wd 0.0500 time 0.4465 (0.4501) data time 0.0009 (0.0016) model time 0.4456 (0.4486) loss 3.9151 (3.3726) grad_norm 0.9616 (inf) loss_scale 2048.0000 (2248.3914) mem 16704MB [2024-08-04 16:57:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][520/625] eta 0:00:47 lr 0.001147 wd 0.0500 time 0.4454 (0.4500) data time 0.0006 (0.0015) model time 0.4448 (0.4485) loss 2.6417 (3.3588) grad_norm 1.3913 (inf) loss_scale 2048.0000 (2244.5451) mem 16704MB [2024-08-04 16:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][530/625] eta 0:00:42 lr 0.001147 wd 0.0500 time 0.4479 (0.4500) data time 0.0007 (0.0015) model time 0.4472 (0.4485) loss 3.3889 (3.3533) grad_norm 1.0217 (inf) loss_scale 2048.0000 (2240.8437) mem 16704MB [2024-08-04 16:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[57/300][540/625] eta 0:00:38 lr 0.001147 wd 0.0500 time 0.4497 (0.4499) data time 0.0008 (0.0015) model time 0.4489 (0.4484) loss 3.8520 (3.3534) grad_norm 1.1624 (inf) loss_scale 2048.0000 (2237.2791) mem 16704MB [2024-08-04 16:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][550/625] eta 0:00:33 lr 0.001147 wd 0.0500 time 0.4435 (0.4499) data time 0.0009 (0.0015) model time 0.4426 (0.4484) loss 2.8669 (3.3548) grad_norm 1.4034 (inf) loss_scale 2048.0000 (2233.8439) mem 16704MB [2024-08-04 16:57:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][560/625] eta 0:00:29 lr 0.001147 wd 0.0500 time 0.3893 (0.4499) data time 0.0007 (0.0015) model time 0.3887 (0.4484) loss 4.0315 (3.3596) grad_norm 1.9547 (inf) loss_scale 2048.0000 (2230.5312) mem 16704MB [2024-08-04 16:57:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][570/625] eta 0:00:24 lr 0.001147 wd 0.0500 time 0.4466 (0.4499) data time 0.0008 (0.0015) model time 0.4458 (0.4484) loss 3.5523 (3.3574) grad_norm 1.3818 (inf) loss_scale 2048.0000 (2227.3345) mem 16704MB [2024-08-04 16:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][580/625] eta 0:00:20 lr 0.001147 wd 0.0500 time 0.4511 (0.4499) data time 0.0006 (0.0015) model time 0.4505 (0.4484) loss 2.3317 (3.3568) grad_norm 1.3372 (inf) loss_scale 2048.0000 (2224.2478) mem 16704MB [2024-08-04 16:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][590/625] eta 0:00:15 lr 0.001147 wd 0.0500 time 0.4510 (0.4498) data time 0.0006 (0.0015) model time 0.4504 (0.4484) loss 3.7579 (3.3567) grad_norm 1.1882 (inf) loss_scale 2048.0000 (2221.2657) mem 16704MB [2024-08-04 16:57:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][600/625] eta 0:00:11 lr 0.001147 wd 0.0500 time 0.4477 (0.4498) data time 0.0008 (0.0015) model time 0.4469 (0.4484) loss 3.0344 (3.3545) grad_norm 1.3985 (inf) loss_scale 2048.0000 (2218.3827) mem 16704MB [2024-08-04 16:57:56 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][610/625] eta 0:00:06 lr 0.001147 wd 0.0500 time 0.4417 (0.4498) data time 0.0006 (0.0014) model time 0.4411 (0.4483) loss 3.5663 (3.3533) grad_norm 1.3155 (inf) loss_scale 2048.0000 (2215.5941) mem 16704MB [2024-08-04 16:58:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [57/300][620/625] eta 0:00:02 lr 0.001147 wd 0.0500 time 0.4432 (0.4497) data time 0.0004 (0.0014) model time 0.4428 (0.4482) loss 4.1173 (3.3516) grad_norm 1.1953 (inf) loss_scale 2048.0000 (2212.8953) mem 16704MB [2024-08-04 16:58:02 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 57 training takes 0:04:41 [2024-08-04 16:58:02 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 16:58:04 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 16:58:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.6362 (0.6362) Acc@1 86.182 (86.182) Acc@5 98.096 (98.096) Mem 16704MB [2024-08-04 16:58:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 1.0244 (0.7776) Acc@1 75.928 (82.568) Acc@5 93.408 (96.604) Mem 16704MB [2024-08-04 16:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1680 (0.9318) Acc@1 71.484 (79.071) Acc@5 92.236 (94.680) Mem 16704MB [2024-08-04 16:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.813 Acc@5 94.698 [2024-08-04 16:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8% [2024-08-04 16:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.81% [2024-08-04 16:58:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-04 16:58:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! [2024-08-04 16:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5400 (0.5400) Acc@1 87.207 (87.207) Acc@5 98.291 (98.291) Mem 16704MB [2024-08-04 16:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.148) Loss 0.9072 (0.6823) Acc@1 77.490 (83.887) Acc@5 94.873 (96.973) Mem 16704MB [2024-08-04 16:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.132) Loss 1.0615 (0.8288) Acc@1 73.047 (80.239) Acc@5 93.164 (95.271) Mem 16704MB [2024-08-04 16:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.946 Acc@5 95.264 [2024-08-04 16:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-08-04 16:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 79.95% [2024-08-04 16:58:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 16:58:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 16:58:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][0/625] eta 0:08:12 lr 0.001147 wd 0.0500 time 0.7873 (0.7873) data time 0.4037 (0.4037) model time 0.0000 (0.0000) loss 3.6352 (3.6352) grad_norm 1.1091 (1.1091) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][10/625] eta 0:04:55 lr 0.001147 wd 0.0500 time 0.4515 (0.4798) data time 0.0007 (0.0375) model time 0.0000 (0.0000) loss 3.5215 (3.5184) grad_norm 1.2953 (1.3038) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][20/625] eta 0:04:41 lr 0.001147 wd 0.0500 time 0.4442 (0.4651) data time 0.0009 (0.0200) model time 0.0000 (0.0000) loss 3.3794 (3.4924) grad_norm 1.9006 (1.3123) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][30/625] eta 0:04:33 lr 0.001147 wd 0.0500 time 0.4418 (0.4591) data time 0.0008 (0.0138) model time 0.0000 (0.0000) loss 2.4687 (3.4380) grad_norm 1.2651 (1.3523) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][40/625] eta 0:04:27 lr 0.001147 wd 0.0500 time 0.4477 (0.4565) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 3.6772 (3.4456) grad_norm 1.0511 (1.3634) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][50/625] eta 0:04:23 lr 0.001147 wd 0.0500 time 0.4444 (0.4574) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 2.8728 (3.4586) grad_norm 1.6715 (1.4063) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][60/625] eta 0:04:17 lr 0.001147 wd 0.0500 time 0.4511 (0.4557) data time 0.0006 (0.0074) model time 0.4504 (0.4464) loss 3.8061 (3.3430) 
grad_norm 1.2607 (1.4062) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][70/625] eta 0:04:12 lr 0.001147 wd 0.0500 time 0.4474 (0.4548) data time 0.0006 (0.0065) model time 0.4468 (0.4473) loss 3.1959 (3.2726) grad_norm 1.0501 (1.3884) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][80/625] eta 0:04:07 lr 0.001146 wd 0.0500 time 0.4468 (0.4540) data time 0.0006 (0.0058) model time 0.4462 (0.4473) loss 4.1770 (3.2924) grad_norm 0.9944 (1.3674) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:58:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][90/625] eta 0:04:03 lr 0.001146 wd 0.0500 time 0.4474 (0.4555) data time 0.0006 (0.0052) model time 0.4468 (0.4523) loss 2.9981 (3.2764) grad_norm 3.6288 (1.3927) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][100/625] eta 0:03:58 lr 0.001146 wd 0.0500 time 0.4484 (0.4548) data time 0.0008 (0.0048) model time 0.4476 (0.4513) loss 3.8702 (3.2986) grad_norm 1.1547 (1.4155) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][110/625] eta 0:03:53 lr 0.001146 wd 0.0500 time 0.4487 (0.4542) data time 0.0007 (0.0044) model time 0.4480 (0.4508) loss 4.0005 (3.3085) grad_norm 1.1820 (1.3944) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][120/625] eta 0:03:49 lr 0.001146 wd 0.0500 time 0.4447 (0.4537) data time 0.0006 (0.0041) model time 0.4441 (0.4502) loss 2.9112 (3.3140) grad_norm 1.6996 (1.4038) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][130/625] eta 0:03:44 lr 0.001146 wd 0.0500 time 0.4462 
(0.4533) data time 0.0006 (0.0039) model time 0.4456 (0.4498) loss 3.7132 (3.3079) grad_norm 1.2113 (1.4019) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][140/625] eta 0:03:39 lr 0.001146 wd 0.0500 time 0.4546 (0.4529) data time 0.0008 (0.0036) model time 0.4538 (0.4495) loss 3.5133 (3.3185) grad_norm 1.2190 (1.4130) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][150/625] eta 0:03:34 lr 0.001146 wd 0.0500 time 0.4473 (0.4525) data time 0.0008 (0.0035) model time 0.4464 (0.4492) loss 3.2732 (3.3200) grad_norm 1.1539 (1.4054) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][160/625] eta 0:03:30 lr 0.001146 wd 0.0500 time 0.4445 (0.4521) data time 0.0006 (0.0033) model time 0.4439 (0.4488) loss 3.2090 (3.3190) grad_norm 0.9297 (1.3930) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][170/625] eta 0:03:25 lr 0.001146 wd 0.0500 time 0.4448 (0.4517) data time 0.0008 (0.0031) model time 0.4441 (0.4485) loss 3.8180 (3.3033) grad_norm 2.6546 (1.4023) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][180/625] eta 0:03:20 lr 0.001146 wd 0.0500 time 0.4474 (0.4514) data time 0.0008 (0.0030) model time 0.4467 (0.4482) loss 3.1823 (3.3177) grad_norm 1.7240 (1.4174) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][190/625] eta 0:03:16 lr 0.001146 wd 0.0500 time 0.4413 (0.4511) data time 0.0008 (0.0029) model time 0.4405 (0.4480) loss 2.7009 (3.3171) grad_norm 1.6017 (1.4285) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [58/300][200/625] eta 0:03:11 lr 0.001146 wd 0.0500 time 0.4472 (0.4509) data time 0.0006 (0.0028) model time 0.4466 (0.4479) loss 2.9482 (3.3160) grad_norm 1.3726 (1.4134) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][210/625] eta 0:03:07 lr 0.001146 wd 0.0500 time 0.4483 (0.4508) data time 0.0005 (0.0027) model time 0.4478 (0.4478) loss 2.5910 (3.3327) grad_norm 1.3291 (1.4119) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][220/625] eta 0:03:02 lr 0.001146 wd 0.0500 time 0.4490 (0.4507) data time 0.0008 (0.0026) model time 0.4482 (0.4478) loss 3.5171 (3.3309) grad_norm 1.7877 (1.4230) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 16:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][230/625] eta 0:02:57 lr 0.001146 wd 0.0500 time 0.4470 (0.4505) data time 0.0008 (0.0026) model time 0.4461 (0.4478) loss 3.0706 (3.3301) grad_norm 1.2141 (1.4220) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][240/625] eta 0:02:53 lr 0.001146 wd 0.0500 time 0.4494 (0.4504) data time 0.0008 (0.0025) model time 0.4487 (0.4477) loss 3.0197 (3.3342) grad_norm 1.0387 (1.4175) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][250/625] eta 0:02:48 lr 0.001146 wd 0.0500 time 0.4489 (0.4503) data time 0.0007 (0.0024) model time 0.4482 (0.4477) loss 3.3104 (3.3325) grad_norm 1.5376 (1.4163) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][260/625] eta 0:02:44 lr 0.001146 wd 0.0500 time 0.4484 (0.4502) data time 0.0006 (0.0024) model time 0.4477 (0.4476) loss 4.0615 (3.3281) grad_norm 1.4352 (1.4250) loss_scale 2048.0000 
(2048.0000) mem 16704MB [2024-08-04 17:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][270/625] eta 0:02:39 lr 0.001146 wd 0.0500 time 0.4435 (0.4501) data time 0.0006 (0.0023) model time 0.4429 (0.4476) loss 3.4638 (3.3228) grad_norm 1.5502 (1.4262) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][280/625] eta 0:02:35 lr 0.001146 wd 0.0500 time 0.4482 (0.4500) data time 0.0006 (0.0023) model time 0.4476 (0.4475) loss 2.2491 (3.3152) grad_norm 1.2193 (1.4218) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][290/625] eta 0:02:30 lr 0.001146 wd 0.0500 time 0.4474 (0.4499) data time 0.0006 (0.0022) model time 0.4469 (0.4475) loss 3.5774 (3.3265) grad_norm 1.2253 (1.4131) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][300/625] eta 0:02:26 lr 0.001145 wd 0.0500 time 0.4507 (0.4499) data time 0.0006 (0.0022) model time 0.4500 (0.4475) loss 2.5178 (3.3271) grad_norm 2.1677 (1.4130) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][310/625] eta 0:02:21 lr 0.001145 wd 0.0500 time 0.4444 (0.4498) data time 0.0006 (0.0021) model time 0.4438 (0.4475) loss 3.7486 (3.3196) grad_norm 2.0327 (1.4228) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][320/625] eta 0:02:17 lr 0.001145 wd 0.0500 time 0.4445 (0.4497) data time 0.0008 (0.0021) model time 0.4437 (0.4474) loss 2.9659 (3.3199) grad_norm 1.4291 (1.4203) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][330/625] eta 0:02:12 lr 0.001145 wd 0.0500 time 0.4447 (0.4496) data time 0.0008 (0.0020) model time 
0.4440 (0.4473) loss 3.5838 (3.3090) grad_norm 1.7670 (1.4197) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][340/625] eta 0:02:08 lr 0.001145 wd 0.0500 time 0.4449 (0.4495) data time 0.0006 (0.0020) model time 0.4443 (0.4473) loss 3.8518 (3.3071) grad_norm 1.7742 (1.4224) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][350/625] eta 0:02:03 lr 0.001145 wd 0.0500 time 0.4499 (0.4494) data time 0.0006 (0.0020) model time 0.4493 (0.4472) loss 2.8137 (3.3050) grad_norm 1.2554 (1.4304) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][360/625] eta 0:01:59 lr 0.001145 wd 0.0500 time 0.4462 (0.4493) data time 0.0008 (0.0019) model time 0.4454 (0.4471) loss 3.6967 (3.3008) grad_norm 1.7649 (1.4337) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][370/625] eta 0:01:54 lr 0.001145 wd 0.0500 time 0.4501 (0.4493) data time 0.0006 (0.0019) model time 0.4495 (0.4472) loss 3.2479 (3.3021) grad_norm 1.2060 (1.4376) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][380/625] eta 0:01:50 lr 0.001145 wd 0.0500 time 0.4496 (0.4496) data time 0.0008 (0.0019) model time 0.4488 (0.4476) loss 2.4465 (3.2923) grad_norm 1.2071 (1.4336) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][390/625] eta 0:01:45 lr 0.001145 wd 0.0500 time 0.4481 (0.4496) data time 0.0006 (0.0018) model time 0.4475 (0.4476) loss 2.5873 (3.2980) grad_norm 1.1268 (1.4335) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][400/625] eta 0:01:41 
lr 0.001145 wd 0.0500 time 0.4488 (0.4495) data time 0.0008 (0.0018) model time 0.4480 (0.4476) loss 3.5552 (3.2895) grad_norm 1.4734 (1.4301) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][410/625] eta 0:01:36 lr 0.001145 wd 0.0500 time 0.4477 (0.4495) data time 0.0008 (0.0018) model time 0.4469 (0.4476) loss 3.4018 (3.2871) grad_norm 1.2470 (1.4252) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][420/625] eta 0:01:32 lr 0.001145 wd 0.0500 time 0.4504 (0.4500) data time 0.0006 (0.0018) model time 0.4498 (0.4481) loss 3.3731 (3.2942) grad_norm 1.6714 (1.4245) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][430/625] eta 0:01:27 lr 0.001145 wd 0.0500 time 0.4450 (0.4499) data time 0.0008 (0.0017) model time 0.4442 (0.4481) loss 3.4977 (3.2923) grad_norm 0.9384 (1.4231) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][440/625] eta 0:01:23 lr 0.001145 wd 0.0500 time 0.4457 (0.4499) data time 0.0008 (0.0017) model time 0.4449 (0.4481) loss 3.8043 (3.2950) grad_norm 1.0474 (1.4204) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][450/625] eta 0:01:18 lr 0.001145 wd 0.0500 time 0.4488 (0.4499) data time 0.0006 (0.0017) model time 0.4481 (0.4481) loss 4.0613 (3.2995) grad_norm 1.2744 (1.4173) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][460/625] eta 0:01:14 lr 0.001145 wd 0.0500 time 0.4463 (0.4498) data time 0.0008 (0.0017) model time 0.4455 (0.4480) loss 3.0402 (3.2969) grad_norm 1.3641 (1.4241) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:46 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][470/625] eta 0:01:09 lr 0.001145 wd 0.0500 time 0.4465 (0.4498) data time 0.0006 (0.0017) model time 0.4459 (0.4480) loss 3.4184 (3.2962) grad_norm 1.2493 (1.4253) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][480/625] eta 0:01:05 lr 0.001145 wd 0.0500 time 0.4460 (0.4497) data time 0.0006 (0.0016) model time 0.4454 (0.4480) loss 4.1830 (3.2973) grad_norm 1.0025 (1.4244) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][490/625] eta 0:01:00 lr 0.001145 wd 0.0500 time 0.4456 (0.4496) data time 0.0008 (0.0016) model time 0.4448 (0.4479) loss 3.2017 (3.2987) grad_norm 1.0242 (1.4191) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][500/625] eta 0:00:56 lr 0.001145 wd 0.0500 time 0.4517 (0.4496) data time 0.0008 (0.0016) model time 0.4509 (0.4479) loss 3.9030 (3.3011) grad_norm 1.3823 (1.4168) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][510/625] eta 0:00:51 lr 0.001145 wd 0.0500 time 0.4445 (0.4495) data time 0.0008 (0.0016) model time 0.4437 (0.4478) loss 3.7382 (3.3002) grad_norm 1.0672 (1.4165) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][520/625] eta 0:00:47 lr 0.001145 wd 0.0500 time 0.4458 (0.4495) data time 0.0008 (0.0016) model time 0.4450 (0.4478) loss 3.8089 (3.3003) grad_norm 1.3307 (1.4171) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][530/625] eta 0:00:42 lr 0.001144 wd 0.0500 time 0.4449 (0.4494) data time 0.0010 (0.0016) model time 0.4439 (0.4478) loss 3.4113 (3.2999) grad_norm 
1.8245 (1.4184) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][540/625] eta 0:00:38 lr 0.001144 wd 0.0500 time 0.4456 (0.4494) data time 0.0008 (0.0016) model time 0.4448 (0.4477) loss 2.7534 (3.3004) grad_norm 1.1268 (1.4184) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][550/625] eta 0:00:33 lr 0.001144 wd 0.0500 time 0.4504 (0.4493) data time 0.0006 (0.0015) model time 0.4498 (0.4477) loss 2.8557 (3.3062) grad_norm 1.5828 (1.4147) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][560/625] eta 0:00:29 lr 0.001144 wd 0.0500 time 0.4450 (0.4493) data time 0.0008 (0.0015) model time 0.4443 (0.4477) loss 2.6346 (3.3068) grad_norm 1.0873 (1.4144) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][570/625] eta 0:00:24 lr 0.001144 wd 0.0500 time 0.4433 (0.4493) data time 0.0008 (0.0015) model time 0.4426 (0.4477) loss 2.6103 (3.3095) grad_norm 1.5426 (1.4169) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][580/625] eta 0:00:20 lr 0.001144 wd 0.0500 time 0.4426 (0.4492) data time 0.0007 (0.0015) model time 0.4419 (0.4477) loss 3.0951 (3.3110) grad_norm 1.3372 (1.4169) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][590/625] eta 0:00:15 lr 0.001144 wd 0.0500 time 0.4490 (0.4492) data time 0.0006 (0.0015) model time 0.4484 (0.4476) loss 3.9016 (3.3112) grad_norm 1.5991 (1.4167) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][600/625] eta 0:00:11 lr 0.001144 wd 0.0500 time 0.4505 (0.4492) data 
time 0.0007 (0.0015) model time 0.4498 (0.4476) loss 3.7592 (3.3137) grad_norm 1.7660 (1.4225) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][610/625] eta 0:00:06 lr 0.001144 wd 0.0500 time 0.4431 (0.4494) data time 0.0004 (0.0015) model time 0.4427 (0.4479) loss 2.3665 (3.3130) grad_norm 2.4948 (1.4297) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [58/300][620/625] eta 0:00:02 lr 0.001144 wd 0.0500 time 0.4421 (0.4493) data time 0.0004 (0.0015) model time 0.4417 (0.4478) loss 3.6094 (3.3107) grad_norm 1.5909 (1.4278) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 58 training takes 0:04:40 [2024-08-04 17:02:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:02:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 17:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.6299 (0.6299) Acc@1 86.426 (86.426) Acc@5 97.852 (97.852) Mem 16704MB [2024-08-04 17:02:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 1.1152 (0.7850) Acc@1 72.412 (82.049) Acc@5 93.506 (96.502) Mem 16704MB [2024-08-04 17:02:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 1.1191 (0.9277) Acc@1 73.828 (78.723) Acc@5 92.432 (94.817) Mem 16704MB [2024-08-04 17:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.483 Acc@5 94.748 [2024-08-04 17:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.5% [2024-08-04 17:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.915 (0.915) Loss 0.5386 (0.5386) Acc@1 87.354 (87.354) Acc@5 98.340 (98.340) Mem 16704MB [2024-08-04 17:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.191) Loss 0.9033 (0.6810) Acc@1 77.637 (83.975) Acc@5 94.727 (96.990) Mem 16704MB [2024-08-04 17:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.155) Loss 1.0557 (0.8258) Acc@1 73.047 (80.306) Acc@5 93.311 (95.331) Mem 16704MB [2024-08-04 17:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.030 Acc@5 95.333 [2024-08-04 17:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-08-04 17:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.03% [2024-08-04 17:03:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 17:03:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 17:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][0/625] eta 0:07:59 lr 0.001144 wd 0.0500 time 0.7672 (0.7672) data time 0.3840 (0.3840) model time 0.0000 (0.0000) loss 3.2941 (3.2941) grad_norm 1.1585 (1.1585) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][10/625] eta 0:04:52 lr 0.001144 wd 0.0500 time 0.4475 (0.4762) data time 0.0009 (0.0357) model time 0.0000 (0.0000) loss 3.0983 (3.1335) grad_norm 1.0521 (1.3336) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][20/625] eta 0:04:40 lr 0.001144 wd 0.0500 time 0.4418 (0.4634) data time 0.0006 (0.0190) model time 0.0000 (0.0000) loss 2.7080 (3.1481) grad_norm 1.2147 (1.3325) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][30/625] eta 0:04:32 lr 0.001144 wd 0.0500 time 0.4544 (0.4586) data time 0.0009 (0.0132) model time 0.0000 (0.0000) loss 2.8068 (3.1556) grad_norm 1.1628 (1.3281) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][40/625] eta 0:04:27 lr 0.001144 wd 0.0500 time 0.4571 (0.4565) data time 0.0009 (0.0102) model time 0.0000 (0.0000) loss 3.2443 (3.1503) grad_norm 1.2619 (1.4111) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][50/625] eta 0:04:21 lr 0.001144 wd 0.0500 time 0.4429 (0.4545) data time 0.0006 (0.0083) model time 0.0000 (0.0000) loss 3.7933 (3.1797) grad_norm 1.6215 (1.4124) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][60/625] eta 0:04:16 lr 0.001144 wd 0.0500 time 0.4448 (0.4532) data time 0.0007 (0.0071) model time 0.4442 (0.4456) loss 4.0330 (3.2446) 
grad_norm 1.5440 (1.4116) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][70/625] eta 0:04:11 lr 0.001144 wd 0.0500 time 0.4523 (0.4524) data time 0.0008 (0.0062) model time 0.4516 (0.4463) loss 2.4274 (3.2507) grad_norm 1.2928 (1.4035) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][80/625] eta 0:04:06 lr 0.001144 wd 0.0500 time 0.4480 (0.4518) data time 0.0008 (0.0055) model time 0.4472 (0.4465) loss 3.2353 (3.2539) grad_norm 1.3394 (1.3936) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][90/625] eta 0:04:01 lr 0.001144 wd 0.0500 time 0.4474 (0.4514) data time 0.0008 (0.0050) model time 0.4466 (0.4467) loss 3.5023 (3.2567) grad_norm 1.6037 (1.3843) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][100/625] eta 0:03:57 lr 0.001144 wd 0.0500 time 0.4478 (0.4517) data time 0.0007 (0.0046) model time 0.4471 (0.4479) loss 3.4758 (3.2591) grad_norm 1.3721 (1.3962) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][110/625] eta 0:03:52 lr 0.001144 wd 0.0500 time 0.4500 (0.4514) data time 0.0006 (0.0043) model time 0.4494 (0.4480) loss 3.9499 (3.2829) grad_norm 1.1918 (1.3924) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][120/625] eta 0:03:47 lr 0.001143 wd 0.0500 time 0.4479 (0.4512) data time 0.0006 (0.0040) model time 0.4473 (0.4479) loss 3.4086 (3.2894) grad_norm 1.5485 (1.3914) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][130/625] eta 0:03:43 lr 0.001143 wd 0.0500 time 0.4478 
(0.4509) data time 0.0006 (0.0037) model time 0.4472 (0.4478) loss 3.0285 (3.2905) grad_norm 1.6792 (1.4121) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][140/625] eta 0:03:38 lr 0.001143 wd 0.0500 time 0.4492 (0.4508) data time 0.0009 (0.0035) model time 0.4483 (0.4478) loss 3.5742 (3.2903) grad_norm 2.4910 (1.4185) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][150/625] eta 0:03:33 lr 0.001143 wd 0.0500 time 0.4492 (0.4505) data time 0.0006 (0.0033) model time 0.4486 (0.4477) loss 4.1340 (3.2877) grad_norm 1.0178 (1.4239) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][160/625] eta 0:03:29 lr 0.001143 wd 0.0500 time 0.4481 (0.4503) data time 0.0006 (0.0032) model time 0.4475 (0.4475) loss 3.6038 (3.2868) grad_norm 1.2096 (1.4239) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][170/625] eta 0:03:24 lr 0.001143 wd 0.0500 time 0.4468 (0.4501) data time 0.0008 (0.0030) model time 0.4460 (0.4475) loss 2.1528 (3.2857) grad_norm 1.2531 (1.4125) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][180/625] eta 0:03:20 lr 0.001143 wd 0.0500 time 0.4464 (0.4500) data time 0.0008 (0.0029) model time 0.4456 (0.4474) loss 3.5913 (3.2864) grad_norm 1.3366 (1.4133) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][190/625] eta 0:03:15 lr 0.001143 wd 0.0500 time 0.4438 (0.4499) data time 0.0006 (0.0028) model time 0.4432 (0.4474) loss 3.7412 (3.2888) grad_norm 1.4546 (1.4089) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [59/300][200/625] eta 0:03:11 lr 0.001143 wd 0.0500 time 0.4459 (0.4498) data time 0.0008 (0.0027) model time 0.4451 (0.4474) loss 2.7666 (3.2885) grad_norm 1.7596 (1.4140) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][210/625] eta 0:03:06 lr 0.001143 wd 0.0500 time 0.4513 (0.4498) data time 0.0006 (0.0026) model time 0.4506 (0.4475) loss 3.7892 (3.2825) grad_norm 2.0255 (1.4285) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][220/625] eta 0:03:02 lr 0.001143 wd 0.0500 time 0.4447 (0.4498) data time 0.0010 (0.0025) model time 0.4437 (0.4476) loss 3.3433 (3.2855) grad_norm 1.5376 (1.4234) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][230/625] eta 0:02:57 lr 0.001143 wd 0.0500 time 0.4471 (0.4497) data time 0.0010 (0.0025) model time 0.4462 (0.4476) loss 3.8198 (3.2894) grad_norm 1.3727 (1.4266) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][240/625] eta 0:02:53 lr 0.001143 wd 0.0500 time 0.4534 (0.4497) data time 0.0006 (0.0024) model time 0.4528 (0.4476) loss 3.9551 (3.3080) grad_norm 1.2349 (1.4237) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:04:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][250/625] eta 0:02:48 lr 0.001143 wd 0.0500 time 0.4490 (0.4498) data time 0.0006 (0.0023) model time 0.4484 (0.4477) loss 2.5412 (3.2972) grad_norm 1.4898 (1.4294) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][260/625] eta 0:02:44 lr 0.001143 wd 0.0500 time 0.4424 (0.4497) data time 0.0007 (0.0023) model time 0.4417 (0.4477) loss 3.1867 (3.2954) grad_norm 1.4988 (1.4230) loss_scale 2048.0000 
(2048.0000) mem 16704MB [2024-08-04 17:05:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][270/625] eta 0:02:39 lr 0.001143 wd 0.0500 time 0.4491 (0.4496) data time 0.0006 (0.0022) model time 0.4485 (0.4476) loss 3.4350 (3.2967) grad_norm 1.2129 (1.4211) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][280/625] eta 0:02:35 lr 0.001143 wd 0.0500 time 0.4515 (0.4495) data time 0.0007 (0.0022) model time 0.4508 (0.4476) loss 2.9574 (3.2992) grad_norm 1.2908 (1.4184) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][290/625] eta 0:02:30 lr 0.001143 wd 0.0500 time 0.4570 (0.4496) data time 0.0007 (0.0021) model time 0.4564 (0.4478) loss 2.9942 (3.2892) grad_norm 2.0194 (1.4166) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][300/625] eta 0:02:26 lr 0.001143 wd 0.0500 time 0.4509 (0.4496) data time 0.0006 (0.0021) model time 0.4503 (0.4478) loss 3.9853 (3.2926) grad_norm 1.0384 (1.4164) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][310/625] eta 0:02:21 lr 0.001143 wd 0.0500 time 0.4503 (0.4504) data time 0.0006 (0.0020) model time 0.4497 (0.4487) loss 3.8629 (3.2925) grad_norm 1.2769 (1.4221) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][320/625] eta 0:02:17 lr 0.001143 wd 0.0500 time 0.4558 (0.4509) data time 0.0006 (0.0020) model time 0.4552 (0.4494) loss 3.8418 (3.2980) grad_norm 1.2878 (1.4215) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][330/625] eta 0:02:13 lr 0.001143 wd 0.0500 time 0.4623 (0.4509) data time 0.0008 (0.0020) model time 
0.4615 (0.4494) loss 2.9052 (3.2966) grad_norm 1.4945 (1.4257) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][340/625] eta 0:02:08 lr 0.001142 wd 0.0500 time 0.4505 (0.4508) data time 0.0008 (0.0019) model time 0.4497 (0.4494) loss 3.9230 (3.3000) grad_norm 1.0757 (1.4240) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][350/625] eta 0:02:03 lr 0.001142 wd 0.0500 time 0.4452 (0.4508) data time 0.0008 (0.0019) model time 0.4444 (0.4493) loss 3.3500 (3.3075) grad_norm 1.0060 (1.4235) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][360/625] eta 0:01:59 lr 0.001142 wd 0.0500 time 0.4448 (0.4507) data time 0.0007 (0.0019) model time 0.4440 (0.4493) loss 3.4882 (3.3115) grad_norm 1.9045 (1.4219) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][370/625] eta 0:01:54 lr 0.001142 wd 0.0500 time 0.4471 (0.4506) data time 0.0006 (0.0018) model time 0.4465 (0.4492) loss 3.0073 (3.3107) grad_norm 1.7411 (1.4220) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][380/625] eta 0:01:50 lr 0.001142 wd 0.0500 time 0.4510 (0.4506) data time 0.0008 (0.0018) model time 0.4502 (0.4492) loss 2.1581 (3.3075) grad_norm 1.1385 (1.4160) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][390/625] eta 0:01:45 lr 0.001142 wd 0.0500 time 0.4547 (0.4506) data time 0.0010 (0.0018) model time 0.4537 (0.4492) loss 2.5508 (3.3138) grad_norm 1.7380 (1.4158) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][400/625] eta 0:01:41 
lr 0.001142 wd 0.0500 time 0.4516 (0.4505) data time 0.0008 (0.0018) model time 0.4508 (0.4491) loss 3.6627 (3.3118) grad_norm 1.7315 (1.4180) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][410/625] eta 0:01:36 lr 0.001142 wd 0.0500 time 0.4432 (0.4504) data time 0.0006 (0.0017) model time 0.4425 (0.4490) loss 2.1296 (3.3146) grad_norm 1.3567 (1.4181) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][420/625] eta 0:01:32 lr 0.001142 wd 0.0500 time 0.4508 (0.4504) data time 0.0006 (0.0017) model time 0.4502 (0.4490) loss 3.8548 (3.3190) grad_norm 1.3033 (1.4237) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][430/625] eta 0:01:27 lr 0.001142 wd 0.0500 time 0.4498 (0.4503) data time 0.0006 (0.0017) model time 0.4492 (0.4489) loss 3.8347 (3.3256) grad_norm 1.2055 (1.4291) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][440/625] eta 0:01:23 lr 0.001142 wd 0.0500 time 0.4447 (0.4502) data time 0.0009 (0.0017) model time 0.4438 (0.4489) loss 3.5838 (3.3210) grad_norm 1.1964 (1.4311) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][450/625] eta 0:01:18 lr 0.001142 wd 0.0500 time 0.4497 (0.4502) data time 0.0006 (0.0017) model time 0.4492 (0.4488) loss 3.2257 (3.3248) grad_norm 1.0397 (1.4325) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][460/625] eta 0:01:14 lr 0.001142 wd 0.0500 time 0.4437 (0.4504) data time 0.0006 (0.0016) model time 0.4431 (0.4491) loss 4.3483 (3.3289) grad_norm 1.7585 (1.4363) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:37 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][470/625] eta 0:01:09 lr 0.001142 wd 0.0500 time 0.4482 (0.4504) data time 0.0008 (0.0016) model time 0.4473 (0.4490) loss 2.5475 (3.3281) grad_norm 1.3616 (1.4352) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][480/625] eta 0:01:05 lr 0.001142 wd 0.0500 time 0.4459 (0.4503) data time 0.0008 (0.0016) model time 0.4451 (0.4490) loss 3.5901 (3.3335) grad_norm 1.0224 (1.4295) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][490/625] eta 0:01:00 lr 0.001142 wd 0.0500 time 0.4476 (0.4502) data time 0.0006 (0.0016) model time 0.4470 (0.4489) loss 4.1472 (3.3385) grad_norm 0.8974 (1.4253) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][500/625] eta 0:00:56 lr 0.001142 wd 0.0500 time 0.4506 (0.4505) data time 0.0008 (0.0016) model time 0.4498 (0.4492) loss 2.8760 (3.3387) grad_norm 1.7019 (1.4227) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][510/625] eta 0:00:51 lr 0.001142 wd 0.0500 time 0.4452 (0.4504) data time 0.0006 (0.0016) model time 0.4446 (0.4492) loss 4.0201 (3.3413) grad_norm 1.1238 (1.4221) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][520/625] eta 0:00:47 lr 0.001142 wd 0.0500 time 0.4485 (0.4504) data time 0.0006 (0.0015) model time 0.4480 (0.4491) loss 2.6081 (3.3451) grad_norm 1.1956 (1.4160) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][530/625] eta 0:00:42 lr 0.001142 wd 0.0500 time 0.4535 (0.4504) data time 0.0008 (0.0015) model time 0.4527 (0.4491) loss 3.4698 (3.3394) grad_norm 
1.1591 (1.4180) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][540/625] eta 0:00:38 lr 0.001142 wd 0.0500 time 0.4454 (0.4503) data time 0.0006 (0.0015) model time 0.4448 (0.4491) loss 3.9863 (3.3387) grad_norm 2.1099 (1.4173) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][550/625] eta 0:00:33 lr 0.001142 wd 0.0500 time 0.4559 (0.4503) data time 0.0008 (0.0015) model time 0.4550 (0.4491) loss 3.7428 (3.3385) grad_norm 1.2389 (1.4164) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][560/625] eta 0:00:29 lr 0.001141 wd 0.0500 time 0.4487 (0.4503) data time 0.0008 (0.0015) model time 0.4479 (0.4491) loss 3.0616 (3.3395) grad_norm 1.4359 (1.4169) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][570/625] eta 0:00:24 lr 0.001141 wd 0.0500 time 0.4488 (0.4503) data time 0.0008 (0.0015) model time 0.4480 (0.4491) loss 2.8819 (3.3382) grad_norm 1.2988 (1.4174) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][580/625] eta 0:00:20 lr 0.001141 wd 0.0500 time 0.4472 (0.4504) data time 0.0006 (0.0015) model time 0.4466 (0.4492) loss 3.1533 (3.3322) grad_norm 1.1813 (1.4181) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][590/625] eta 0:00:15 lr 0.001141 wd 0.0500 time 0.4571 (0.4503) data time 0.0008 (0.0015) model time 0.4563 (0.4491) loss 3.1497 (3.3317) grad_norm 1.6638 (1.4184) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][600/625] eta 0:00:11 lr 0.001141 wd 0.0500 time 0.4462 (0.4503) data 
time 0.0006 (0.0014) model time 0.4456 (0.4491) loss 3.8029 (3.3328) grad_norm 1.3146 (1.4173) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][610/625] eta 0:00:06 lr 0.001141 wd 0.0500 time 0.4440 (0.4503) data time 0.0005 (0.0014) model time 0.4435 (0.4491) loss 1.9611 (3.3287) grad_norm 1.1891 (1.4175) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [59/300][620/625] eta 0:00:02 lr 0.001141 wd 0.0500 time 0.4436 (0.4502) data time 0.0006 (0.0014) model time 0.4431 (0.4490) loss 3.7553 (3.3311) grad_norm 2.1465 (1.4176) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 59 training takes 0:04:41 [2024-08-04 17:07:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:07:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 17:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.6187 (0.6187) Acc@1 86.963 (86.963) Acc@5 98.047 (98.047) Mem 16704MB [2024-08-04 17:07:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.0576 (0.7763) Acc@1 75.830 (82.777) Acc@5 93.848 (96.702) Mem 16704MB [2024-08-04 17:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1553 (0.9180) Acc@1 72.754 (79.222) Acc@5 91.992 (94.975) Mem 16704MB [2024-08-04 17:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.909 Acc@5 94.920 [2024-08-04 17:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-08-04 17:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.91% [2024-08-04 17:07:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-04 17:07:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-04 17:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.5366 (0.5366) Acc@1 87.354 (87.354) Acc@5 98.291 (98.291) Mem 16704MB [2024-08-04 17:07:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 0.9009 (0.6792) Acc@1 77.832 (84.007) Acc@5 94.824 (97.017) Mem 16704MB [2024-08-04 17:07:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.132) Loss 1.0488 (0.8226) Acc@1 73.096 (80.401) Acc@5 93.506 (95.394) Mem 16704MB [2024-08-04 17:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.132 Acc@5 95.401 [2024-08-04 17:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-08-04 17:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.13% [2024-08-04 17:08:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-04 17:08:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-04 17:08:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][0/625] eta 0:07:57 lr 0.001141 wd 0.0500 time 0.7638 (0.7638) data time 0.3772 (0.3772) model time 0.0000 (0.0000) loss 3.5413 (3.5413) grad_norm 1.3873 (1.3873) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][10/625] eta 0:04:52 lr 0.001141 wd 0.0500 time 0.4489 (0.4753) data time 0.0007 (0.0351) model time 0.0000 (0.0000) loss 3.7053 (3.2177) grad_norm 1.6759 (1.3613) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][20/625] eta 0:04:39 lr 0.001141 wd 0.0500 time 0.4462 (0.4621) data time 0.0006 (0.0188) model time 0.0000 (0.0000) loss 2.7871 (3.1199) grad_norm 1.2643 (1.3541) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][30/625] eta 0:04:32 lr 0.001141 wd 0.0500 time 0.4489 (0.4582) data time 0.0006 (0.0131) model time 0.0000 (0.0000) loss 3.3939 (3.0935) grad_norm 1.8532 (1.4042) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][40/625] eta 0:04:26 lr 0.001141 wd 0.0500 time 0.4480 (0.4559) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 2.1509 (3.1098) grad_norm 1.0238 (1.4624) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][50/625] eta 0:04:21 lr 0.001141 wd 0.0500 time 0.4467 (0.4543) data time 0.0009 (0.0083) model time 0.0000 (0.0000) loss 3.2706 (3.1155) grad_norm 1.3183 (1.4515) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][60/625] eta 0:04:16 lr 0.001141 wd 0.0500 time 0.4467 (0.4533) data time 0.0009 (0.0071) model time 0.4457 (0.4475) loss 3.2113 (3.1887) 
grad_norm 1.1747 (1.5324) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][70/625] eta 0:04:11 lr 0.001141 wd 0.0500 time 0.4481 (0.4526) data time 0.0006 (0.0062) model time 0.4474 (0.4475) loss 2.3187 (3.1720) grad_norm 1.5248 (1.5654) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][80/625] eta 0:04:07 lr 0.001141 wd 0.0500 time 0.4478 (0.4539) data time 0.0006 (0.0055) model time 0.4472 (0.4524) loss 2.9049 (3.2055) grad_norm 1.6773 (1.5634) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][90/625] eta 0:04:02 lr 0.001141 wd 0.0500 time 0.4480 (0.4532) data time 0.0006 (0.0050) model time 0.4474 (0.4510) loss 4.4864 (3.2262) grad_norm 2.2791 (1.5471) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][100/625] eta 0:03:57 lr 0.001141 wd 0.0500 time 0.4481 (0.4526) data time 0.0008 (0.0046) model time 0.4474 (0.4500) loss 3.8467 (3.2645) grad_norm 1.5540 (1.5120) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][110/625] eta 0:03:52 lr 0.001141 wd 0.0500 time 0.4532 (0.4524) data time 0.0008 (0.0042) model time 0.4524 (0.4499) loss 2.5442 (3.2498) grad_norm 1.5378 (1.4986) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:08:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][120/625] eta 0:03:48 lr 0.001141 wd 0.0500 time 0.4504 (0.4522) data time 0.0006 (0.0040) model time 0.4497 (0.4497) loss 2.8161 (3.2431) grad_norm 1.4549 (1.4785) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:09:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][130/625] eta 0:03:43 lr 0.001141 wd 0.0500 time 0.4437 
(0.4517) data time 0.0006 (0.0038) model time 0.4431 (0.4491) loss 3.4834 (3.2415) grad_norm 1.3187 (1.4644) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:09:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][140/625] eta 0:03:38 lr 0.001141 wd 0.0500 time 0.4491 (0.4514) data time 0.0009 (0.0036) model time 0.4482 (0.4489) loss 3.2436 (3.2633) grad_norm 1.1268 (1.4520) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][150/625] eta 0:03:34 lr 0.001140 wd 0.0500 time 0.4468 (0.4513) data time 0.0007 (0.0034) model time 0.4461 (0.4488) loss 3.1573 (3.2529) grad_norm 1.2445 (1.4455) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:09:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][160/625] eta 0:03:29 lr 0.001140 wd 0.0500 time 0.4488 (0.4508) data time 0.0007 (0.0032) model time 0.4481 (0.4483) loss 2.2967 (3.2573) grad_norm 1.1538 (1.4293) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:09:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][170/625] eta 0:03:24 lr 0.001140 wd 0.0500 time 0.4430 (0.4504) data time 0.0009 (0.0031) model time 0.4422 (0.4479) loss 3.1095 (3.2587) grad_norm 1.2802 (1.4253) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:09:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][180/625] eta 0:03:20 lr 0.001140 wd 0.0500 time 0.4411 (0.4501) data time 0.0010 (0.0030) model time 0.4401 (0.4476) loss 3.5075 (3.2666) grad_norm 1.7145 (1.4292) loss_scale 4096.0000 (2115.8895) mem 16704MB [2024-08-04 17:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][190/625] eta 0:03:15 lr 0.001140 wd 0.0500 time 0.4473 (0.4500) data time 0.0007 (0.0028) model time 0.4466 (0.4475) loss 3.4398 (3.2645) grad_norm 1.0020 (1.4363) loss_scale 4096.0000 (2219.5602) mem 16704MB [2024-08-04 17:09:32 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [60/300][200/625] eta 0:03:11 lr 0.001140 wd 0.0500 time 0.4484 (0.4498) data time 0.0009 (0.0027) model time 0.4474 (0.4475) loss 3.4492 (3.2808) grad_norm 1.3168 (1.4229) loss_scale 4096.0000 (2312.9154) mem 16704MB [2024-08-04 17:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][210/625] eta 0:03:06 lr 0.001140 wd 0.0500 time 0.4505 (0.4497) data time 0.0007 (0.0027) model time 0.4498 (0.4474) loss 3.9373 (3.2799) grad_norm 1.3189 (1.4142) loss_scale 4096.0000 (2397.4218) mem 16704MB [2024-08-04 17:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][220/625] eta 0:03:02 lr 0.001140 wd 0.0500 time 0.4417 (0.4496) data time 0.0008 (0.0026) model time 0.4409 (0.4473) loss 2.6903 (3.2780) grad_norm 1.3095 (1.4128) loss_scale 4096.0000 (2474.2805) mem 16704MB [2024-08-04 17:09:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][230/625] eta 0:02:57 lr 0.001140 wd 0.0500 time 0.4476 (0.4495) data time 0.0008 (0.0025) model time 0.4468 (0.4472) loss 3.7774 (3.2728) grad_norm 1.3260 (1.4153) loss_scale 4096.0000 (2544.4848) mem 16704MB [2024-08-04 17:09:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][240/625] eta 0:02:53 lr 0.001140 wd 0.0500 time 0.4534 (0.4494) data time 0.0008 (0.0024) model time 0.4525 (0.4473) loss 3.6077 (3.2686) grad_norm 1.1709 (1.4165) loss_scale 4096.0000 (2608.8631) mem 16704MB [2024-08-04 17:09:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][250/625] eta 0:02:48 lr 0.001140 wd 0.0500 time 0.4441 (0.4498) data time 0.0008 (0.0024) model time 0.4433 (0.4478) loss 2.6814 (3.2619) grad_norm 1.5495 (1.4189) loss_scale 4096.0000 (2668.1116) mem 16704MB [2024-08-04 17:09:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][260/625] eta 0:02:44 lr 0.001140 wd 0.0500 time 0.4446 (0.4497) data time 0.0008 (0.0023) model time 0.4439 (0.4477) loss 2.7719 (3.2712) grad_norm 1.6093 (1.4136) loss_scale 4096.0000 
(2722.8199) mem 16704MB [2024-08-04 17:10:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][270/625] eta 0:02:39 lr 0.001140 wd 0.0500 time 0.4466 (0.4496) data time 0.0008 (0.0023) model time 0.4458 (0.4476) loss 2.8299 (3.2668) grad_norm 0.9178 (1.4074) loss_scale 4096.0000 (2773.4908) mem 16704MB [2024-08-04 17:10:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][280/625] eta 0:02:35 lr 0.001140 wd 0.0500 time 0.4470 (0.4495) data time 0.0008 (0.0022) model time 0.4461 (0.4476) loss 2.9950 (3.2665) grad_norm 1.3305 (1.4083) loss_scale 4096.0000 (2820.5552) mem 16704MB [2024-08-04 17:10:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][290/625] eta 0:02:30 lr 0.001140 wd 0.0500 time 0.4448 (0.4494) data time 0.0006 (0.0022) model time 0.4442 (0.4475) loss 2.4404 (3.2682) grad_norm 1.2653 (1.4036) loss_scale 4096.0000 (2864.3849) mem 16704MB [2024-08-04 17:10:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][300/625] eta 0:02:26 lr 0.001140 wd 0.0500 time 0.4442 (0.4499) data time 0.0008 (0.0021) model time 0.4434 (0.4481) loss 2.2687 (3.2670) grad_norm 1.3878 (1.4006) loss_scale 4096.0000 (2905.3023) mem 16704MB [2024-08-04 17:10:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][310/625] eta 0:02:21 lr 0.001140 wd 0.0500 time 0.4442 (0.4498) data time 0.0008 (0.0021) model time 0.4434 (0.4480) loss 3.6978 (3.2713) grad_norm 1.7154 (1.4074) loss_scale 4096.0000 (2943.5884) mem 16704MB [2024-08-04 17:10:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][320/625] eta 0:02:17 lr 0.001140 wd 0.0500 time 0.4469 (0.4496) data time 0.0009 (0.0020) model time 0.4460 (0.4479) loss 3.3187 (3.2675) grad_norm 1.0043 (1.4147) loss_scale 4096.0000 (2979.4891) mem 16704MB [2024-08-04 17:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][330/625] eta 0:02:12 lr 0.001140 wd 0.0500 time 0.4414 (0.4495) data time 0.0008 (0.0020) model time 
0.4406 (0.4478) loss 3.4298 (3.2691) grad_norm 1.0624 (1.4123) loss_scale 4096.0000 (3013.2205) mem 16704MB [2024-08-04 17:10:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][340/625] eta 0:02:08 lr 0.001140 wd 0.0500 time 0.4428 (0.4494) data time 0.0008 (0.0020) model time 0.4419 (0.4477) loss 3.0819 (3.2681) grad_norm 1.1309 (1.4090) loss_scale 4096.0000 (3044.9736) mem 16704MB [2024-08-04 17:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][350/625] eta 0:02:03 lr 0.001140 wd 0.0500 time 0.4495 (0.4493) data time 0.0008 (0.0019) model time 0.4487 (0.4476) loss 3.2674 (3.2666) grad_norm 1.8015 (1.4068) loss_scale 4096.0000 (3074.9174) mem 16704MB [2024-08-04 17:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][360/625] eta 0:01:59 lr 0.001139 wd 0.0500 time 0.4442 (0.4492) data time 0.0008 (0.0019) model time 0.4434 (0.4475) loss 3.7890 (3.2693) grad_norm 1.1203 (1.4068) loss_scale 4096.0000 (3103.2022) mem 16704MB [2024-08-04 17:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][370/625] eta 0:01:54 lr 0.001139 wd 0.0500 time 0.4426 (0.4491) data time 0.0006 (0.0019) model time 0.4420 (0.4474) loss 2.7453 (3.2663) grad_norm 1.6239 (1.4056) loss_scale 4096.0000 (3129.9623) mem 16704MB [2024-08-04 17:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][380/625] eta 0:01:50 lr 0.001139 wd 0.0500 time 0.4439 (0.4490) data time 0.0008 (0.0018) model time 0.4430 (0.4474) loss 3.5101 (3.2675) grad_norm 1.6932 (1.4017) loss_scale 4096.0000 (3155.3176) mem 16704MB [2024-08-04 17:10:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][390/625] eta 0:01:45 lr 0.001139 wd 0.0500 time 0.3865 (0.4493) data time 0.0009 (0.0018) model time 0.3856 (0.4477) loss 3.5867 (3.2713) grad_norm 1.1284 (1.3997) loss_scale 4096.0000 (3179.3760) mem 16704MB [2024-08-04 17:11:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][400/625] eta 0:01:41 
lr 0.001139 wd 0.0500 time 0.4546 (0.4492) data time 0.0008 (0.0018) model time 0.4538 (0.4476) loss 3.5369 (3.2793) grad_norm 1.2481 (1.3957) loss_scale 4096.0000 (3202.2344) mem 16704MB [2024-08-04 17:11:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][410/625] eta 0:01:36 lr 0.001139 wd 0.0500 time 0.4438 (0.4491) data time 0.0007 (0.0018) model time 0.4431 (0.4475) loss 3.0773 (3.2807) grad_norm 1.9078 (1.4022) loss_scale 4096.0000 (3223.9805) mem 16704MB [2024-08-04 17:11:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][420/625] eta 0:01:32 lr 0.001139 wd 0.0500 time 0.4425 (0.4490) data time 0.0006 (0.0017) model time 0.4419 (0.4475) loss 3.8279 (3.2839) grad_norm 1.0238 (1.4031) loss_scale 4096.0000 (3244.6936) mem 16704MB [2024-08-04 17:11:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][430/625] eta 0:01:27 lr 0.001139 wd 0.0500 time 0.4476 (0.4490) data time 0.0006 (0.0017) model time 0.4470 (0.4475) loss 4.0939 (3.2871) grad_norm 1.4729 (1.4020) loss_scale 4096.0000 (3264.4455) mem 16704MB [2024-08-04 17:11:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][440/625] eta 0:01:23 lr 0.001139 wd 0.0500 time 0.4461 (0.4489) data time 0.0008 (0.0017) model time 0.4453 (0.4474) loss 3.1191 (3.2851) grad_norm 1.6826 (1.4040) loss_scale 4096.0000 (3283.3016) mem 16704MB [2024-08-04 17:11:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][450/625] eta 0:01:18 lr 0.001139 wd 0.0500 time 0.4461 (0.4493) data time 0.0006 (0.0017) model time 0.4455 (0.4478) loss 2.8964 (3.2835) grad_norm 1.4931 (1.4068) loss_scale 4096.0000 (3301.3215) mem 16704MB [2024-08-04 17:11:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][460/625] eta 0:01:14 lr 0.001139 wd 0.0500 time 0.4429 (0.4493) data time 0.0008 (0.0017) model time 0.4421 (0.4478) loss 2.3985 (3.2887) grad_norm 1.5347 (1.4074) loss_scale 4096.0000 (3318.5597) mem 16704MB [2024-08-04 17:11:33 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][470/625] eta 0:01:09 lr 0.001139 wd 0.0500 time 0.4466 (0.4492) data time 0.0008 (0.0016) model time 0.4457 (0.4477) loss 2.8600 (3.2918) grad_norm 0.9543 (1.4130) loss_scale 4096.0000 (3335.0658) mem 16704MB [2024-08-04 17:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][480/625] eta 0:01:05 lr 0.001139 wd 0.0500 time 0.4505 (0.4492) data time 0.0008 (0.0016) model time 0.4497 (0.4477) loss 3.4323 (3.2965) grad_norm 1.4069 (1.4112) loss_scale 4096.0000 (3350.8857) mem 16704MB [2024-08-04 17:11:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][490/625] eta 0:01:00 lr 0.001139 wd 0.0500 time 0.4482 (0.4491) data time 0.0009 (0.0016) model time 0.4474 (0.4477) loss 3.6946 (3.2912) grad_norm 1.3660 (1.4070) loss_scale 4096.0000 (3366.0611) mem 16704MB [2024-08-04 17:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][500/625] eta 0:00:56 lr 0.001139 wd 0.0500 time 0.4439 (0.4491) data time 0.0009 (0.0016) model time 0.4430 (0.4476) loss 2.8293 (3.2860) grad_norm 1.1739 (1.4062) loss_scale 4096.0000 (3380.6307) mem 16704MB [2024-08-04 17:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][510/625] eta 0:00:51 lr 0.001139 wd 0.0500 time 0.4494 (0.4490) data time 0.0006 (0.0016) model time 0.4488 (0.4476) loss 3.7095 (3.2876) grad_norm 2.5832 (1.4118) loss_scale 4096.0000 (3394.6301) mem 16704MB [2024-08-04 17:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][520/625] eta 0:00:47 lr 0.001139 wd 0.0500 time 0.4425 (0.4490) data time 0.0008 (0.0016) model time 0.4417 (0.4476) loss 3.1036 (3.2875) grad_norm 1.0110 (1.4146) loss_scale 4096.0000 (3408.0921) mem 16704MB [2024-08-04 17:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][530/625] eta 0:00:42 lr 0.001139 wd 0.0500 time 0.4451 (0.4489) data time 0.0009 (0.0016) model time 0.4442 (0.4475) loss 3.2981 (3.2873) grad_norm 
1.3286 (1.4117) loss_scale 4096.0000 (3421.0471) mem 16704MB [2024-08-04 17:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][540/625] eta 0:00:38 lr 0.001139 wd 0.0500 time 0.4480 (0.4489) data time 0.0006 (0.0015) model time 0.4474 (0.4475) loss 3.7449 (3.2909) grad_norm 1.2354 (1.4108) loss_scale 4096.0000 (3433.5231) mem 16704MB [2024-08-04 17:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][550/625] eta 0:00:33 lr 0.001139 wd 0.0500 time 0.4509 (0.4489) data time 0.0007 (0.0015) model time 0.4502 (0.4475) loss 2.6947 (3.2923) grad_norm 1.0848 (1.4097) loss_scale 4096.0000 (3445.5463) mem 16704MB [2024-08-04 17:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][560/625] eta 0:00:29 lr 0.001139 wd 0.0500 time 0.4463 (0.4489) data time 0.0006 (0.0015) model time 0.4457 (0.4475) loss 2.2135 (3.2891) grad_norm 1.4398 (1.4094) loss_scale 4096.0000 (3457.1408) mem 16704MB [2024-08-04 17:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][570/625] eta 0:00:24 lr 0.001139 wd 0.0500 time 0.4487 (0.4488) data time 0.0006 (0.0015) model time 0.4481 (0.4475) loss 2.3853 (3.2905) grad_norm 1.0786 (1.4063) loss_scale 4096.0000 (3468.3292) mem 16704MB [2024-08-04 17:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][580/625] eta 0:00:20 lr 0.001138 wd 0.0500 time 0.4447 (0.4488) data time 0.0008 (0.0015) model time 0.4439 (0.4474) loss 3.5887 (3.2914) grad_norm 1.3472 (1.4049) loss_scale 4096.0000 (3479.1325) mem 16704MB [2024-08-04 17:12:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][590/625] eta 0:00:15 lr 0.001138 wd 0.0500 time 0.4486 (0.4487) data time 0.0006 (0.0015) model time 0.4479 (0.4474) loss 2.8499 (3.2902) grad_norm 1.1396 (1.4054) loss_scale 4096.0000 (3489.5702) mem 16704MB [2024-08-04 17:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][600/625] eta 0:00:11 lr 0.001138 wd 0.0500 time 0.4459 (0.4487) data 
time 0.0006 (0.0015) model time 0.4453 (0.4474) loss 4.0165 (3.2895) grad_norm 1.4478 (1.4049) loss_scale 4096.0000 (3499.6606) mem 16704MB [2024-08-04 17:12:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][610/625] eta 0:00:06 lr 0.001138 wd 0.0500 time 0.4437 (0.4487) data time 0.0006 (0.0015) model time 0.4431 (0.4473) loss 2.2546 (3.2893) grad_norm 1.9066 (1.4042) loss_scale 4096.0000 (3509.4206) mem 16704MB [2024-08-04 17:12:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [60/300][620/625] eta 0:00:02 lr 0.001138 wd 0.0500 time 0.4419 (0.4486) data time 0.0004 (0.0014) model time 0.4415 (0.4472) loss 2.3876 (3.2912) grad_norm 1.0868 (1.4042) loss_scale 4096.0000 (3518.8663) mem 16704MB [2024-08-04 17:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 60 training takes 0:04:40 [2024-08-04 17:12:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:12:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 17:12:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.6387 (0.6387) Acc@1 86.230 (86.230) Acc@5 97.852 (97.852) Mem 16704MB
[2024-08-04 17:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.0176 (0.7716) Acc@1 76.221 (82.906) Acc@5 93.359 (96.622) Mem 16704MB
[2024-08-04 17:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1621 (0.9182) Acc@1 72.021 (79.127) Acc@5 92.822 (94.850) Mem 16704MB
[2024-08-04 17:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.789 Acc@5 94.822
[2024-08-04 17:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.8%
[2024-08-04 17:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.772 (0.772) Loss 0.5342 (0.5342) Acc@1 87.402 (87.402) Acc@5 98.340 (98.340) Mem 16704MB
[2024-08-04 17:12:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.181) Loss 0.8999 (0.6778) Acc@1 77.979 (84.055) Acc@5 94.873 (97.044) Mem 16704MB
[2024-08-04 17:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.150) Loss 1.0439 (0.8195) Acc@1 73.047 (80.434) Acc@5 93.506 (95.468) Mem 16704MB
[2024-08-04 17:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.192 Acc@5 95.463
[2024-08-04 17:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.2%
[2024-08-04 17:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.19%
[2024-08-04 17:12:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 17:12:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 17:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][0/625] eta 0:08:15 lr 0.001138 wd 0.0500 time 0.7934 (0.7934) data time 0.4057 (0.4057) model time 0.0000 (0.0000) loss 2.3898 (2.3898) grad_norm 2.4790 (2.4790) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][10/625] eta 0:05:03 lr 0.001138 wd 0.0500 time 0.4492 (0.4931) data time 0.0006 (0.0376) model time 0.0000 (0.0000) loss 3.9690 (3.0472) grad_norm 0.9434 (1.5219) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][20/625] eta 0:04:45 lr 0.001138 wd 0.0500 time 0.4471 (0.4716) data time 0.0006 (0.0201) model time 0.0000 (0.0000) loss 2.9415 (3.1127) grad_norm 1.1946 (1.4922) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][30/625] eta 0:04:35 lr 0.001138 wd 0.0500 time 0.4483 (0.4636) data time 0.0006 (0.0139) model time 0.0000 (0.0000) loss 2.9921 (3.2261) grad_norm 1.4088 (1.3840) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][40/625] eta 0:04:28 lr 0.001138 wd 0.0500 time 0.4477 (0.4596) data time 0.0007 (0.0107) model time 0.0000 (0.0000) loss 3.8658 (3.2455) grad_norm 3.3030 (1.4241) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][50/625] eta 0:04:22 lr 0.001138 wd 0.0500 time 0.4477 (0.4571) data time 0.0008 (0.0088) model time 0.0000 (0.0000) loss 1.9533 (3.2570) grad_norm 1.4621 (1.4634) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][60/625] eta 0:04:17 lr 0.001138 wd 0.0500 time 0.4489 (0.4557) data time 0.0006 (0.0075) model time 0.4483 (0.4475) loss 4.1549 (3.2636) 
grad_norm 1.0905 (1.4482) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][70/625] eta 0:04:13 lr 0.001138 wd 0.0500 time 0.4471 (0.4572) data time 0.0006 (0.0065) model time 0.4464 (0.4567) loss 3.2786 (3.2786) grad_norm 1.4767 (1.4541) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][80/625] eta 0:04:08 lr 0.001138 wd 0.0500 time 0.4546 (0.4563) data time 0.0006 (0.0058) model time 0.4539 (0.4540) loss 3.3131 (3.2520) grad_norm 1.3310 (1.4546) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][90/625] eta 0:04:03 lr 0.001138 wd 0.0500 time 0.4475 (0.4555) data time 0.0008 (0.0053) model time 0.4467 (0.4526) loss 2.8099 (3.2509) grad_norm 1.0429 (1.4572) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][100/625] eta 0:03:58 lr 0.001138 wd 0.0500 time 0.4463 (0.4547) data time 0.0006 (0.0048) model time 0.4457 (0.4515) loss 3.8522 (3.2930) grad_norm 1.2930 (1.4591) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][110/625] eta 0:03:53 lr 0.001138 wd 0.0500 time 0.4475 (0.4542) data time 0.0007 (0.0045) model time 0.4468 (0.4510) loss 3.3306 (3.2991) grad_norm 1.0824 (1.4584) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][120/625] eta 0:03:49 lr 0.001138 wd 0.0500 time 0.4493 (0.4538) data time 0.0008 (0.0042) model time 0.4484 (0.4506) loss 3.4707 (3.3091) grad_norm 1.5118 (1.4577) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][130/625] eta 0:03:44 lr 0.001138 wd 0.0500 time 0.4498 
(0.4534) data time 0.0008 (0.0039) model time 0.4490 (0.4502) loss 3.0976 (3.3199) grad_norm 0.9859 (1.4398) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][140/625] eta 0:03:39 lr 0.001138 wd 0.0500 time 0.4469 (0.4532) data time 0.0010 (0.0037) model time 0.4459 (0.4502) loss 3.6570 (3.3169) grad_norm 1.4764 (1.4335) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][150/625] eta 0:03:35 lr 0.001138 wd 0.0500 time 0.4462 (0.4529) data time 0.0008 (0.0035) model time 0.4454 (0.4498) loss 3.9630 (3.3346) grad_norm 1.1678 (1.4132) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][160/625] eta 0:03:30 lr 0.001137 wd 0.0500 time 0.4467 (0.4526) data time 0.0008 (0.0034) model time 0.4459 (0.4496) loss 3.2491 (3.3363) grad_norm 1.3356 (1.4117) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][170/625] eta 0:03:25 lr 0.001137 wd 0.0500 time 0.4474 (0.4522) data time 0.0006 (0.0032) model time 0.4468 (0.4493) loss 4.2935 (3.3572) grad_norm 1.2713 (1.4044) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][180/625] eta 0:03:21 lr 0.001137 wd 0.0500 time 0.4503 (0.4520) data time 0.0008 (0.0031) model time 0.4496 (0.4491) loss 3.5061 (3.3518) grad_norm 0.8795 (1.4102) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][190/625] eta 0:03:16 lr 0.001137 wd 0.0500 time 0.4483 (0.4519) data time 0.0006 (0.0030) model time 0.4477 (0.4492) loss 2.6165 (3.3584) grad_norm 1.4794 (1.4125) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:22 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [61/300][200/625] eta 0:03:12 lr 0.001137 wd 0.0500 time 0.4529 (0.4518) data time 0.0007 (0.0029) model time 0.4523 (0.4492) loss 2.8935 (3.3528) grad_norm 1.4871 (1.4166) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][210/625] eta 0:03:07 lr 0.001137 wd 0.0500 time 0.4542 (0.4517) data time 0.0008 (0.0028) model time 0.4534 (0.4492) loss 3.9202 (3.3559) grad_norm 1.1475 (1.4178) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][220/625] eta 0:03:02 lr 0.001137 wd 0.0500 time 0.4477 (0.4518) data time 0.0008 (0.0027) model time 0.4469 (0.4493) loss 2.6487 (3.3531) grad_norm 1.5297 (1.4163) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][230/625] eta 0:02:58 lr 0.001137 wd 0.0500 time 0.4492 (0.4517) data time 0.0008 (0.0026) model time 0.4483 (0.4493) loss 3.4939 (3.3545) grad_norm 1.3222 (1.4174) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][240/625] eta 0:02:53 lr 0.001137 wd 0.0500 time 0.4487 (0.4516) data time 0.0008 (0.0025) model time 0.4479 (0.4493) loss 3.2391 (3.3434) grad_norm 1.8887 (1.4165) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][250/625] eta 0:02:49 lr 0.001137 wd 0.0500 time 0.4480 (0.4515) data time 0.0006 (0.0025) model time 0.4474 (0.4492) loss 3.7037 (3.3394) grad_norm 1.1239 (1.4192) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][260/625] eta 0:02:44 lr 0.001137 wd 0.0500 time 0.4459 (0.4514) data time 0.0006 (0.0024) model time 0.4453 (0.4492) loss 2.3378 (3.3376) grad_norm 1.8877 (1.4217) loss_scale 4096.0000 
(4096.0000) mem 16704MB [2024-08-04 17:14:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][270/625] eta 0:02:40 lr 0.001137 wd 0.0500 time 0.4567 (0.4514) data time 0.0009 (0.0023) model time 0.4559 (0.4492) loss 3.7503 (3.3447) grad_norm 1.6401 (1.4255) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:14:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][280/625] eta 0:02:35 lr 0.001137 wd 0.0500 time 0.4516 (0.4515) data time 0.0006 (0.0023) model time 0.4510 (0.4494) loss 3.7720 (3.3484) grad_norm 1.3399 (1.4372) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][290/625] eta 0:02:31 lr 0.001137 wd 0.0500 time 0.4602 (0.4515) data time 0.0006 (0.0022) model time 0.4596 (0.4495) loss 3.5342 (3.3378) grad_norm 1.6940 (1.4393) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][300/625] eta 0:02:26 lr 0.001137 wd 0.0500 time 0.4501 (0.4515) data time 0.0007 (0.0022) model time 0.4494 (0.4496) loss 3.6647 (3.3438) grad_norm 1.2520 (1.4399) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][310/625] eta 0:02:22 lr 0.001137 wd 0.0500 time 0.4486 (0.4516) data time 0.0007 (0.0021) model time 0.4479 (0.4497) loss 2.9496 (3.3403) grad_norm 1.5146 (1.4382) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][320/625] eta 0:02:17 lr 0.001137 wd 0.0500 time 0.4452 (0.4515) data time 0.0007 (0.0021) model time 0.4445 (0.4496) loss 3.3405 (3.3363) grad_norm 1.2224 (1.4390) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][330/625] eta 0:02:13 lr 0.001137 wd 0.0500 time 0.4484 (0.4514) data time 0.0007 (0.0021) model time 
0.4477 (0.4495) loss 3.7542 (3.3385) grad_norm 1.3917 (1.4423) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][340/625] eta 0:02:08 lr 0.001137 wd 0.0500 time 0.4407 (0.4513) data time 0.0008 (0.0020) model time 0.4399 (0.4495) loss 3.0664 (3.3390) grad_norm 2.0932 (1.4385) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][350/625] eta 0:02:04 lr 0.001137 wd 0.0500 time 0.4525 (0.4517) data time 0.0007 (0.0020) model time 0.4518 (0.4500) loss 2.6599 (3.3330) grad_norm 1.6364 (1.4342) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][360/625] eta 0:01:59 lr 0.001137 wd 0.0500 time 0.4466 (0.4516) data time 0.0008 (0.0020) model time 0.4458 (0.4499) loss 3.7086 (3.3325) grad_norm 1.8607 (1.4408) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][370/625] eta 0:01:55 lr 0.001136 wd 0.0500 time 0.4492 (0.4516) data time 0.0006 (0.0019) model time 0.4486 (0.4498) loss 3.4400 (3.3363) grad_norm 1.3184 (1.4434) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][380/625] eta 0:01:50 lr 0.001136 wd 0.0500 time 0.4466 (0.4515) data time 0.0006 (0.0019) model time 0.4460 (0.4498) loss 3.5149 (3.3309) grad_norm 1.5290 (1.4409) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][390/625] eta 0:01:46 lr 0.001136 wd 0.0500 time 0.4450 (0.4514) data time 0.0009 (0.0019) model time 0.4442 (0.4497) loss 2.6147 (3.3333) grad_norm 1.0141 (1.4368) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][400/625] eta 0:01:41 
lr 0.001136 wd 0.0500 time 0.6690 (0.4519) data time 0.0006 (0.0018) model time 0.6684 (0.4503) loss 3.4570 (3.3249) grad_norm 1.5554 (1.4348) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][410/625] eta 0:01:37 lr 0.001136 wd 0.0500 time 0.4465 (0.4518) data time 0.0007 (0.0018) model time 0.4458 (0.4502) loss 3.5266 (3.3254) grad_norm 0.8414 (1.4318) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][420/625] eta 0:01:32 lr 0.001136 wd 0.0500 time 0.4470 (0.4517) data time 0.0008 (0.0018) model time 0.4462 (0.4501) loss 3.3796 (3.3205) grad_norm 1.2171 (1.4294) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][430/625] eta 0:01:28 lr 0.001136 wd 0.0500 time 0.4525 (0.4517) data time 0.0007 (0.0018) model time 0.4518 (0.4501) loss 3.8405 (3.3191) grad_norm 1.1123 (1.4238) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][440/625] eta 0:01:23 lr 0.001136 wd 0.0500 time 0.4450 (0.4516) data time 0.0008 (0.0017) model time 0.4442 (0.4500) loss 3.5532 (3.3147) grad_norm 1.1790 (1.4207) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][450/625] eta 0:01:19 lr 0.001136 wd 0.0500 time 0.4507 (0.4515) data time 0.0006 (0.0017) model time 0.4500 (0.4500) loss 3.5223 (3.3151) grad_norm 0.8092 (1.4189) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][460/625] eta 0:01:14 lr 0.001136 wd 0.0500 time 0.4499 (0.4514) data time 0.0006 (0.0017) model time 0.4494 (0.4499) loss 3.7374 (3.3147) grad_norm 1.0129 (1.4229) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:24 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][470/625] eta 0:01:09 lr 0.001136 wd 0.0500 time 0.4431 (0.4513) data time 0.0007 (0.0017) model time 0.4424 (0.4498) loss 3.9865 (3.3141) grad_norm 1.7607 (1.4253) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][480/625] eta 0:01:05 lr 0.001136 wd 0.0500 time 0.4479 (0.4512) data time 0.0006 (0.0017) model time 0.4473 (0.4497) loss 3.8256 (3.3092) grad_norm 1.4217 (1.4296) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-04 17:16:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][490/625] eta 0:01:00 lr 0.001136 wd 0.0500 time 0.4354 (0.4511) data time 0.0006 (0.0016) model time 0.4348 (0.4496) loss 3.8407 (3.3058) grad_norm inf (inf) loss_scale 2048.0000 (4091.8289) mem 16704MB [2024-08-04 17:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][500/625] eta 0:00:56 lr 0.001136 wd 0.0500 time 0.4507 (0.4510) data time 0.0006 (0.0016) model time 0.4501 (0.4495) loss 3.6962 (3.3075) grad_norm 1.2075 (inf) loss_scale 2048.0000 (4051.0339) mem 16704MB [2024-08-04 17:16:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][510/625] eta 0:00:51 lr 0.001136 wd 0.0500 time 0.4477 (0.4510) data time 0.0009 (0.0016) model time 0.4467 (0.4495) loss 2.4745 (3.3118) grad_norm 1.6263 (inf) loss_scale 2048.0000 (4011.8356) mem 16704MB [2024-08-04 17:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][520/625] eta 0:00:47 lr 0.001136 wd 0.0500 time 0.4497 (0.4509) data time 0.0006 (0.0016) model time 0.4491 (0.4494) loss 3.2151 (3.3123) grad_norm 1.5147 (inf) loss_scale 2048.0000 (3974.1420) mem 16704MB [2024-08-04 17:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][530/625] eta 0:00:42 lr 0.001136 wd 0.0500 time 0.4469 (0.4509) data time 0.0006 (0.0016) model time 0.4462 (0.4494) loss 4.1717 (3.3149) grad_norm 1.5975 (inf) 
loss_scale 2048.0000 (3937.8682) mem 16704MB [2024-08-04 17:16:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][540/625] eta 0:00:38 lr 0.001136 wd 0.0500 time 0.4484 (0.4509) data time 0.0006 (0.0016) model time 0.4477 (0.4495) loss 3.9556 (3.3133) grad_norm 1.3067 (inf) loss_scale 2048.0000 (3902.9353) mem 16704MB [2024-08-04 17:17:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][550/625] eta 0:00:33 lr 0.001136 wd 0.0500 time 0.4484 (0.4509) data time 0.0008 (0.0016) model time 0.4476 (0.4494) loss 3.5968 (3.3107) grad_norm 1.3405 (inf) loss_scale 2048.0000 (3869.2704) mem 16704MB [2024-08-04 17:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][560/625] eta 0:00:29 lr 0.001136 wd 0.0500 time 0.4457 (0.4508) data time 0.0008 (0.0015) model time 0.4449 (0.4494) loss 3.1063 (3.3118) grad_norm 1.1052 (inf) loss_scale 2048.0000 (3836.8057) mem 16704MB [2024-08-04 17:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][570/625] eta 0:00:24 lr 0.001136 wd 0.0500 time 0.4491 (0.4508) data time 0.0007 (0.0015) model time 0.4484 (0.4494) loss 2.9916 (3.3118) grad_norm 1.0418 (inf) loss_scale 2048.0000 (3805.4781) mem 16704MB [2024-08-04 17:17:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][580/625] eta 0:00:20 lr 0.001135 wd 0.0500 time 0.4515 (0.4508) data time 0.0009 (0.0015) model time 0.4507 (0.4493) loss 3.5894 (3.3158) grad_norm 1.6272 (inf) loss_scale 2048.0000 (3775.2289) mem 16704MB [2024-08-04 17:17:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][590/625] eta 0:00:15 lr 0.001135 wd 0.0500 time 0.6001 (0.4510) data time 0.0010 (0.0015) model time 0.5991 (0.4496) loss 2.5664 (3.3197) grad_norm 1.7495 (inf) loss_scale 2048.0000 (3746.0034) mem 16704MB [2024-08-04 17:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][600/625] eta 0:00:11 lr 0.001135 wd 0.0500 time 0.4573 (0.4510) data time 0.0008 (0.0015) model time 
0.4565 (0.4496) loss 2.3180 (3.3185) grad_norm 1.1291 (inf) loss_scale 2048.0000 (3717.7504) mem 16704MB [2024-08-04 17:17:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][610/625] eta 0:00:06 lr 0.001135 wd 0.0500 time 0.4410 (0.4509) data time 0.0005 (0.0015) model time 0.4405 (0.4495) loss 2.8080 (3.3131) grad_norm 1.0660 (inf) loss_scale 2048.0000 (3690.4223) mem 16704MB [2024-08-04 17:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [61/300][620/625] eta 0:00:02 lr 0.001135 wd 0.0500 time 0.4423 (0.4508) data time 0.0004 (0.0015) model time 0.4420 (0.4494) loss 2.5978 (3.3087) grad_norm 1.1546 (inf) loss_scale 2048.0000 (3663.9742) mem 16704MB [2024-08-04 17:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 61 training takes 0:04:41 [2024-08-04 17:17:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:17:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 17:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.448 (0.448) Loss 0.6226 (0.6226) Acc@1 86.719 (86.719) Acc@5 97.949 (97.949) Mem 16704MB
[2024-08-04 17:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.150) Loss 1.0820 (0.7771) Acc@1 74.756 (82.710) Acc@5 93.066 (96.573) Mem 16704MB
[2024-08-04 17:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.133) Loss 1.1543 (0.9191) Acc@1 72.363 (79.134) Acc@5 93.018 (94.996) Mem 16704MB
[2024-08-04 17:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.873 Acc@5 94.960
[2024-08-04 17:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9%
[2024-08-04 17:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.783 (0.783) Loss 0.5332 (0.5332) Acc@1 87.500 (87.500) Acc@5 98.340 (98.340) Mem 16704MB
[2024-08-04 17:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.182) Loss 0.8979 (0.6762) Acc@1 78.125 (84.233) Acc@5 94.824 (97.039) Mem 16704MB
[2024-08-04 17:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.150) Loss 1.0400 (0.8166) Acc@1 73.047 (80.590) Acc@5 93.604 (95.526) Mem 16704MB
[2024-08-04 17:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.342 Acc@5 95.521
[2024-08-04 17:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.3%
[2024-08-04 17:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.34%
[2024-08-04 17:17:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 17:17:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 17:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][0/625] eta 0:07:48 lr 0.001135 wd 0.0500 time 0.7498 (0.7498) data time 0.3603 (0.3603) model time 0.0000 (0.0000) loss 2.9963 (2.9963) grad_norm 1.8133 (1.8133) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:17:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][10/625] eta 0:04:52 lr 0.001135 wd 0.0500 time 0.4493 (0.4753) data time 0.0006 (0.0335) model time 0.0000 (0.0000) loss 2.5588 (3.2912) grad_norm 1.0941 (1.4017) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][20/625] eta 0:04:40 lr 0.001135 wd 0.0500 time 0.4472 (0.4629) data time 0.0008 (0.0179) model time 0.0000 (0.0000) loss 3.4492 (3.1709) grad_norm 1.3079 (1.4639) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:17:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][30/625] eta 0:04:32 lr 0.001135 wd 0.0500 time 0.4476 (0.4585) data time 0.0007 (0.0124) model time 0.0000 (0.0000) loss 3.3668 (3.2463) grad_norm 1.3474 (1.4731) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][40/625] eta 0:04:26 lr 0.001135 wd 0.0500 time 0.4466 (0.4562) data time 0.0006 (0.0096) model time 0.0000 (0.0000) loss 3.2599 (3.2445) grad_norm 0.9783 (1.4472) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][50/625] eta 0:04:21 lr 0.001135 wd 0.0500 time 0.4474 (0.4546) data time 0.0007 (0.0079) model time 0.0000 (0.0000) loss 3.9767 (3.2292) grad_norm 1.9454 (1.4594) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][60/625] eta 0:04:17 lr 0.001135 wd 0.0500 time 0.4492 (0.4556) data time 0.0006 (0.0067) model time 0.4486 (0.4600) loss 3.7580 (3.2158) 
grad_norm 1.4372 (1.4392) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][70/625] eta 0:04:12 lr 0.001135 wd 0.0500 time 0.4467 (0.4544) data time 0.0008 (0.0059) model time 0.4459 (0.4533) loss 3.5883 (3.2492) grad_norm 1.2808 (1.4192) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][80/625] eta 0:04:07 lr 0.001135 wd 0.0500 time 0.4488 (0.4538) data time 0.0006 (0.0052) model time 0.4482 (0.4517) loss 3.2149 (3.2545) grad_norm 1.3802 (1.4169) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][90/625] eta 0:04:02 lr 0.001135 wd 0.0500 time 0.4560 (0.4533) data time 0.0008 (0.0047) model time 0.4552 (0.4510) loss 3.4081 (3.2914) grad_norm 1.4848 (1.4247) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][100/625] eta 0:03:57 lr 0.001135 wd 0.0500 time 0.4457 (0.4528) data time 0.0008 (0.0044) model time 0.4449 (0.4502) loss 3.2757 (3.2757) grad_norm 1.1583 (1.4221) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][110/625] eta 0:03:53 lr 0.001135 wd 0.0500 time 0.4469 (0.4524) data time 0.0009 (0.0040) model time 0.4460 (0.4498) loss 3.8753 (3.2724) grad_norm 1.8436 (1.4241) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][120/625] eta 0:03:48 lr 0.001135 wd 0.0500 time 0.4465 (0.4520) data time 0.0006 (0.0038) model time 0.4458 (0.4493) loss 3.6630 (3.2732) grad_norm 1.4374 (1.4257) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][130/625] eta 0:03:43 lr 0.001135 wd 0.0500 time 0.4448 
(0.4515) data time 0.0008 (0.0036) model time 0.4440 (0.4488) loss 3.4831 (3.2836) grad_norm 2.0651 (1.4189) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][140/625] eta 0:03:38 lr 0.001135 wd 0.0500 time 0.4521 (0.4512) data time 0.0009 (0.0034) model time 0.4512 (0.4485) loss 2.3307 (3.2787) grad_norm 1.2357 (1.4275) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][150/625] eta 0:03:34 lr 0.001135 wd 0.0500 time 0.4482 (0.4509) data time 0.0007 (0.0032) model time 0.4475 (0.4482) loss 3.8305 (3.2780) grad_norm 1.1424 (1.4232) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][160/625] eta 0:03:30 lr 0.001134 wd 0.0500 time 0.4481 (0.4519) data time 0.0006 (0.0030) model time 0.4476 (0.4499) loss 3.4614 (3.2839) grad_norm 1.2007 (1.4136) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][170/625] eta 0:03:25 lr 0.001134 wd 0.0500 time 0.4487 (0.4517) data time 0.0006 (0.0029) model time 0.4481 (0.4496) loss 2.9168 (3.2943) grad_norm 1.2763 (1.4267) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][180/625] eta 0:03:20 lr 0.001134 wd 0.0500 time 0.4497 (0.4515) data time 0.0006 (0.0028) model time 0.4492 (0.4495) loss 2.1031 (3.2761) grad_norm 1.0398 (1.4254) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][190/625] eta 0:03:16 lr 0.001134 wd 0.0500 time 0.4483 (0.4513) data time 0.0006 (0.0027) model time 0.4476 (0.4493) loss 3.8478 (3.2800) grad_norm 2.3665 (1.4369) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [62/300][200/625] eta 0:03:11 lr 0.001134 wd 0.0500 time 0.4475 (0.4511) data time 0.0007 (0.0026) model time 0.4468 (0.4491) loss 3.7289 (3.2819) grad_norm 1.7387 (1.4484) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][210/625] eta 0:03:07 lr 0.001134 wd 0.0500 time 0.4472 (0.4510) data time 0.0008 (0.0025) model time 0.4465 (0.4491) loss 2.3285 (3.2772) grad_norm 1.6005 (1.4458) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][220/625] eta 0:03:02 lr 0.001134 wd 0.0500 time 0.4457 (0.4509) data time 0.0006 (0.0024) model time 0.4450 (0.4489) loss 3.9001 (3.2733) grad_norm 1.3433 (1.4454) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][230/625] eta 0:02:58 lr 0.001134 wd 0.0500 time 0.4523 (0.4508) data time 0.0008 (0.0024) model time 0.4515 (0.4489) loss 3.8639 (3.2803) grad_norm 1.0720 (1.4416) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][240/625] eta 0:02:53 lr 0.001134 wd 0.0500 time 0.4471 (0.4507) data time 0.0008 (0.0023) model time 0.4463 (0.4488) loss 2.8499 (3.2757) grad_norm 1.3257 (1.4462) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][250/625] eta 0:02:48 lr 0.001134 wd 0.0500 time 0.4499 (0.4506) data time 0.0006 (0.0022) model time 0.4493 (0.4487) loss 3.0515 (3.2686) grad_norm 2.4464 (1.4490) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][260/625] eta 0:02:44 lr 0.001134 wd 0.0500 time 0.4540 (0.4505) data time 0.0008 (0.0022) model time 0.4532 (0.4487) loss 3.7054 (3.2687) grad_norm 1.7368 (1.4502) loss_scale 2048.0000 
(2048.0000) mem 16704MB [2024-08-04 17:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][270/625] eta 0:02:39 lr 0.001134 wd 0.0500 time 0.4487 (0.4504) data time 0.0006 (0.0021) model time 0.4481 (0.4486) loss 3.1666 (3.2773) grad_norm 1.1664 (1.4468) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][280/625] eta 0:02:35 lr 0.001134 wd 0.0500 time 0.3865 (0.4503) data time 0.0008 (0.0021) model time 0.3856 (0.4486) loss 3.2667 (3.2768) grad_norm 0.9343 (1.4418) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][290/625] eta 0:02:30 lr 0.001134 wd 0.0500 time 0.4458 (0.4502) data time 0.0007 (0.0020) model time 0.4451 (0.4485) loss 2.2709 (3.2770) grad_norm 1.2168 (1.4395) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:19:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][300/625] eta 0:02:26 lr 0.001134 wd 0.0500 time 0.4482 (0.4501) data time 0.0010 (0.0020) model time 0.4471 (0.4484) loss 3.9270 (3.2756) grad_norm 1.4259 (1.4434) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][310/625] eta 0:02:21 lr 0.001134 wd 0.0500 time 0.4506 (0.4500) data time 0.0008 (0.0020) model time 0.4498 (0.4483) loss 3.7581 (3.2784) grad_norm 1.6558 (1.4482) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][320/625] eta 0:02:17 lr 0.001134 wd 0.0500 time 0.4486 (0.4499) data time 0.0007 (0.0019) model time 0.4479 (0.4483) loss 3.6417 (3.2844) grad_norm 1.7058 (1.4453) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][330/625] eta 0:02:12 lr 0.001134 wd 0.0500 time 0.4478 (0.4499) data time 0.0008 (0.0019) model time 
0.4470 (0.4482) loss 3.2847 (3.2786) grad_norm 2.2812 (1.4482) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][340/625] eta 0:02:08 lr 0.001134 wd 0.0500 time 0.4493 (0.4498) data time 0.0007 (0.0019) model time 0.4486 (0.4481) loss 4.0220 (3.2797) grad_norm 1.2726 (1.4403) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][350/625] eta 0:02:03 lr 0.001134 wd 0.0500 time 0.4515 (0.4497) data time 0.0006 (0.0018) model time 0.4509 (0.4481) loss 3.6734 (3.2858) grad_norm 1.1566 (1.4350) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][360/625] eta 0:01:59 lr 0.001134 wd 0.0500 time 0.4439 (0.4496) data time 0.0008 (0.0018) model time 0.4431 (0.4480) loss 3.6468 (3.2925) grad_norm 1.9244 (1.4369) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][370/625] eta 0:01:54 lr 0.001133 wd 0.0500 time 0.4443 (0.4495) data time 0.0010 (0.0018) model time 0.4433 (0.4479) loss 3.0631 (3.2848) grad_norm 1.2631 (1.4433) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][380/625] eta 0:01:50 lr 0.001133 wd 0.0500 time 0.6162 (0.4504) data time 0.0008 (0.0018) model time 0.6154 (0.4490) loss 3.0499 (3.2918) grad_norm 1.0615 (1.4458) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][390/625] eta 0:01:45 lr 0.001133 wd 0.0500 time 0.4472 (0.4504) data time 0.0006 (0.0017) model time 0.4466 (0.4489) loss 3.0428 (3.2895) grad_norm 1.5810 (1.4435) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][400/625] eta 0:01:41 
lr 0.001133 wd 0.0500 time 0.4495 (0.4503) data time 0.0008 (0.0017) model time 0.4487 (0.4489) loss 2.5601 (3.2871) grad_norm 1.0679 (1.4421) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][410/625] eta 0:01:36 lr 0.001133 wd 0.0500 time 0.4473 (0.4503) data time 0.0006 (0.0017) model time 0.4466 (0.4488) loss 3.0417 (3.2887) grad_norm 1.5657 (1.4403) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][420/625] eta 0:01:32 lr 0.001133 wd 0.0500 time 0.4488 (0.4502) data time 0.0009 (0.0017) model time 0.4479 (0.4488) loss 3.7036 (3.2905) grad_norm 1.2107 (1.4391) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:20:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][430/625] eta 0:01:27 lr 0.001133 wd 0.0500 time 0.4432 (0.4504) data time 0.0006 (0.0016) model time 0.4426 (0.4491) loss 3.8728 (3.2887) grad_norm 1.2153 (1.4373) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][440/625] eta 0:01:23 lr 0.001133 wd 0.0500 time 0.4491 (0.4503) data time 0.0008 (0.0016) model time 0.4483 (0.4490) loss 3.7147 (3.2910) grad_norm 1.5803 (1.4415) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][450/625] eta 0:01:18 lr 0.001133 wd 0.0500 time 0.4463 (0.4503) data time 0.0006 (0.0016) model time 0.4458 (0.4489) loss 2.2826 (3.2879) grad_norm 1.4402 (1.4430) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][460/625] eta 0:01:14 lr 0.001133 wd 0.0500 time 0.4504 (0.4503) data time 0.0008 (0.0016) model time 0.4497 (0.4489) loss 4.1594 (3.2916) grad_norm 1.2140 (1.4392) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:15 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][470/625] eta 0:01:09 lr 0.001133 wd 0.0500 time 0.4495 (0.4503) data time 0.0008 (0.0016) model time 0.4487 (0.4490) loss 3.3178 (3.2939) grad_norm 1.6627 (1.4397) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][480/625] eta 0:01:05 lr 0.001133 wd 0.0500 time 0.4486 (0.4503) data time 0.0007 (0.0016) model time 0.4479 (0.4490) loss 3.8320 (3.2952) grad_norm 1.1958 (1.4386) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][490/625] eta 0:01:00 lr 0.001133 wd 0.0500 time 0.4463 (0.4502) data time 0.0006 (0.0015) model time 0.4456 (0.4489) loss 3.6662 (3.2955) grad_norm 1.3069 (1.4360) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][500/625] eta 0:00:56 lr 0.001133 wd 0.0500 time 0.4504 (0.4502) data time 0.0006 (0.0015) model time 0.4498 (0.4489) loss 2.9915 (3.2957) grad_norm 1.2503 (1.4334) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][510/625] eta 0:00:51 lr 0.001133 wd 0.0500 time 0.4454 (0.4502) data time 0.0008 (0.0015) model time 0.4446 (0.4489) loss 3.4160 (3.2969) grad_norm 2.0585 (1.4365) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][520/625] eta 0:00:47 lr 0.001133 wd 0.0500 time 0.4511 (0.4501) data time 0.0006 (0.0015) model time 0.4504 (0.4489) loss 3.1541 (3.2933) grad_norm 1.0730 (1.4398) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][530/625] eta 0:00:42 lr 0.001133 wd 0.0500 time 0.4455 (0.4508) data time 0.0006 (0.0015) model time 0.4449 (0.4496) loss 3.2352 (3.2987) grad_norm 
1.2112 (1.4376) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][540/625] eta 0:00:38 lr 0.001133 wd 0.0500 time 0.4511 (0.4508) data time 0.0006 (0.0015) model time 0.4505 (0.4496) loss 3.6242 (3.2993) grad_norm 1.3569 (1.4357) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][550/625] eta 0:00:33 lr 0.001133 wd 0.0500 time 0.4483 (0.4508) data time 0.0008 (0.0015) model time 0.4476 (0.4496) loss 3.2482 (3.3008) grad_norm 1.4036 (1.4340) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:21:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][560/625] eta 0:00:29 lr 0.001133 wd 0.0500 time 0.4464 (0.4507) data time 0.0006 (0.0015) model time 0.4459 (0.4495) loss 2.9245 (3.3003) grad_norm 1.8010 (1.4330) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][570/625] eta 0:00:24 lr 0.001132 wd 0.0500 time 0.4477 (0.4507) data time 0.0008 (0.0014) model time 0.4469 (0.4495) loss 3.3297 (3.3029) grad_norm 0.9707 (1.4336) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][580/625] eta 0:00:20 lr 0.001132 wd 0.0500 time 0.4483 (0.4507) data time 0.0006 (0.0014) model time 0.4478 (0.4495) loss 3.2411 (3.3051) grad_norm 1.3234 (1.4319) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][590/625] eta 0:00:15 lr 0.001132 wd 0.0500 time 0.4479 (0.4507) data time 0.0006 (0.0014) model time 0.4473 (0.4495) loss 3.9526 (3.3053) grad_norm 1.7368 (1.4310) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][600/625] eta 0:00:11 lr 0.001132 wd 0.0500 time 0.4519 (0.4506) data 
time 0.0006 (0.0014) model time 0.4513 (0.4495) loss 3.1261 (3.3099) grad_norm 1.6714 (1.4344) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][610/625] eta 0:00:06 lr 0.001132 wd 0.0500 time 0.4465 (0.4507) data time 0.0004 (0.0014) model time 0.4461 (0.4495) loss 4.0418 (3.3165) grad_norm 2.5866 (1.4353) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [62/300][620/625] eta 0:00:02 lr 0.001132 wd 0.0500 time 0.4458 (0.4506) data time 0.0004 (0.0014) model time 0.4454 (0.4494) loss 3.0409 (3.3183) grad_norm 0.9400 (1.4353) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:22:25 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 62 training takes 0:04:41 [2024-08-04 17:22:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:22:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-04 17:22:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.6377 (0.6377) Acc@1 86.572 (86.572) Acc@5 97.510 (97.510) Mem 16704MB
[2024-08-04 17:22:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 1.0918 (0.8079) Acc@1 74.854 (82.320) Acc@5 93.115 (96.480) Mem 16704MB
[2024-08-04 17:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 1.2568 (0.9474) Acc@1 70.459 (78.813) Acc@5 91.846 (94.713) Mem 16704MB
[2024-08-04 17:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.529 Acc@5 94.668
[2024-08-04 17:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.5%
[2024-08-04 17:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.817 (0.817) Loss 0.5322 (0.5322) Acc@1 87.598 (87.598) Acc@5 98.340 (98.340) Mem 16704MB
[2024-08-04 17:22:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.183) Loss 0.8955 (0.6750) Acc@1 77.881 (84.286) Acc@5 94.727 (97.061) Mem 16704MB
[2024-08-04 17:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 1.0342 (0.8139) Acc@1 73.340 (80.680) Acc@5 93.652 (95.573) Mem 16704MB
[2024-08-04 17:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.442 Acc@5 95.559
[2024-08-04 17:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-08-04 17:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.44%
[2024-08-04 17:22:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 17:22:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 17:22:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][0/625] eta 0:07:28 lr 0.001132 wd 0.0500 time 0.7174 (0.7174) data time 0.3343 (0.3343) model time 0.0000 (0.0000) loss 3.4804 (3.4804) grad_norm 1.3256 (1.3256) loss_scale 2048.0000 (2048.0000) mem 16704MB
[2024-08-04 17:22:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][10/625] eta 0:04:49 lr 0.001132 wd 0.0500 time 0.4492 (0.4706) data time 0.0006 (0.0312) model time 0.0000 (0.0000) loss 2.8329 (3.4310) grad_norm 1.5927 (1.3821) loss_scale 2048.0000 (2048.0000) mem 16704MB
[2024-08-04 17:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-08-04 17:22:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-04 17:22:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-04 17:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-04 17:24:42 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-04 17:24:54 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-04 17:25:06 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-04 17:25:06 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-04 17:25:08 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-04 17:25:10 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-04 17:25:10 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 63) [2024-08-04 17:25:10 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-04 17:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][20/625] eta 1:05:12 lr 0.001132 wd 0.0500 time 0.4414 (6.4669) data time 0.0006 (0.2618) model time 0.0000 (0.0000) loss 3.0396 (3.5921) grad_norm 1.2512 (1.4990) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][30/625] eta 0:18:10 lr 0.001132 wd 0.0500 time 0.4382 (1.8324) data time 0.0008 (0.0611) model time 0.0000 (0.0000) loss 3.3617 (3.5378) grad_norm 1.6754 (1.3275) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][40/625] eta 0:11:57 lr 0.001132 wd 0.0500 time 0.4392 (1.2271) data time 0.0006 (0.0349) model time 0.0000 (0.0000) loss 3.8407 (3.5828) grad_norm 1.0336 (1.3477) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][50/625] eta 0:09:32 lr 0.001132 wd 0.0500 time 0.4354 (0.9964) data time 0.0007 (0.0246) model time 0.0000 (0.0000) loss 4.2937 (3.5976) grad_norm 1.2788 (1.3362) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][60/625] eta 0:08:12 lr 0.001132 wd 0.0500 time 0.4417 (0.8712) data time 0.0008 (0.0191) model time 0.4409 (0.4572) loss 3.1392 (3.5069) grad_norm 1.2198 (1.3236) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:25:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][70/625] eta 
0:07:18 lr 0.001132 wd 0.0500 time 0.4411 (0.7896) data time 0.0008 (0.0157) model time 0.4404 (0.4476) loss 3.6136 (3.5098) grad_norm 1.0010 (1.4292) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][80/625] eta 0:06:40 lr 0.001132 wd 0.0500 time 0.4394 (0.7342) data time 0.0007 (0.0133) model time 0.4387 (0.4449) loss 3.1087 (3.4759) grad_norm 1.1419 (1.4260) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][90/625] eta 0:06:11 lr 0.001132 wd 0.0500 time 0.4384 (0.6940) data time 0.0008 (0.0116) model time 0.4376 (0.4437) loss 3.7111 (3.4416) grad_norm 1.3810 (1.4250) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][100/625] eta 0:05:48 lr 0.001132 wd 0.0500 time 0.4416 (0.6635) data time 0.0006 (0.0103) model time 0.4410 (0.4429) loss 2.1914 (3.4002) grad_norm 1.4557 (1.4163) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][110/625] eta 0:05:29 lr 0.001132 wd 0.0500 time 0.4384 (0.6396) data time 0.0006 (0.0093) model time 0.4378 (0.4425) loss 3.7391 (3.3822) grad_norm 1.0069 (1.3930) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][120/625] eta 0:05:13 lr 0.001132 wd 0.0500 time 0.4397 (0.6202) data time 0.0006 (0.0085) model time 0.4391 (0.4419) loss 4.0331 (3.4023) grad_norm 1.6293 (1.3966) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][130/625] eta 0:04:59 lr 0.001132 wd 0.0500 time 0.4409 (0.6043) data time 0.0006 (0.0078) model time 0.4404 (0.4417) loss 3.3664 (3.3880) grad_norm 1.4605 (1.4005) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:27 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][140/625] eta 0:04:46 lr 0.001132 wd 0.0500 time 0.4428 (0.5911) data time 0.0007 (0.0072) model time 0.4421 (0.4416) loss 3.6603 (3.3873) grad_norm 1.1631 (1.4039) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][150/625] eta 0:04:35 lr 0.001131 wd 0.0500 time 0.4426 (0.5800) data time 0.0008 (0.0068) model time 0.4418 (0.4417) loss 3.8071 (3.3848) grad_norm 1.4163 (1.4022) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][160/625] eta 0:04:25 lr 0.001131 wd 0.0500 time 0.4459 (0.5704) data time 0.0008 (0.0063) model time 0.4451 (0.4418) loss 3.4089 (3.3634) grad_norm 1.1287 (1.4152) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][170/625] eta 0:04:15 lr 0.001131 wd 0.0500 time 0.4441 (0.5621) data time 0.0006 (0.0060) model time 0.4436 (0.4418) loss 3.4820 (3.3527) grad_norm 1.4311 (1.4154) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][180/625] eta 0:04:06 lr 0.001131 wd 0.0500 time 0.4441 (0.5549) data time 0.0007 (0.0057) model time 0.4435 (0.4420) loss 2.6397 (3.3543) grad_norm 1.0665 (1.4149) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][190/625] eta 0:03:58 lr 0.001131 wd 0.0500 time 0.4554 (0.5485) data time 0.0009 (0.0054) model time 0.4545 (0.4420) loss 3.5689 (3.3551) grad_norm 2.2197 (1.4096) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][200/625] eta 0:03:51 lr 0.001131 wd 0.0500 time 0.6528 (0.5438) data time 0.0007 (0.0052) model time 0.6521 (0.4433) loss 3.8083 (3.3455) grad_norm 
1.6042 (1.4146) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][210/625] eta 0:03:43 lr 0.001131 wd 0.0500 time 0.4421 (0.5382) data time 0.0006 (0.0049) model time 0.4415 (0.4429) loss 3.4411 (3.3411) grad_norm 1.0527 (1.4138) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][220/625] eta 0:03:36 lr 0.001131 wd 0.0500 time 0.4409 (0.5336) data time 0.0006 (0.0047) model time 0.4403 (0.4428) loss 2.6811 (3.3299) grad_norm 1.4196 (1.4060) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][230/625] eta 0:03:29 lr 0.001131 wd 0.0500 time 0.4442 (0.5293) data time 0.0008 (0.0046) model time 0.4435 (0.4428) loss 2.0990 (3.3197) grad_norm 1.3234 (1.4003) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][240/625] eta 0:03:22 lr 0.001131 wd 0.0500 time 0.4469 (0.5255) data time 0.0006 (0.0044) model time 0.4463 (0.4428) loss 2.5722 (3.3149) grad_norm 1.2291 (1.3948) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][250/625] eta 0:03:15 lr 0.001131 wd 0.0500 time 0.4433 (0.5220) data time 0.0008 (0.0042) model time 0.4426 (0.4428) loss 2.5670 (3.3091) grad_norm 1.7024 (1.3939) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][260/625] eta 0:03:09 lr 0.001131 wd 0.0500 time 0.4426 (0.5187) data time 0.0008 (0.0041) model time 0.4418 (0.4427) loss 3.8197 (3.3117) grad_norm 1.1693 (1.3895) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][270/625] eta 0:03:03 lr 0.001131 wd 0.0500 time 0.4560 (0.5158) data 
time 0.0008 (0.0040) model time 0.4552 (0.4428) loss 3.6631 (3.3060) grad_norm 1.5826 (1.3913) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][280/625] eta 0:02:56 lr 0.001131 wd 0.0500 time 0.4404 (0.5129) data time 0.0005 (0.0038) model time 0.4398 (0.4427) loss 3.1309 (3.2948) grad_norm 1.3037 (1.3904) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][290/625] eta 0:02:50 lr 0.001131 wd 0.0500 time 0.4381 (0.5103) data time 0.0006 (0.0037) model time 0.4375 (0.4426) loss 3.9832 (3.2915) grad_norm 1.5625 (1.3861) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][300/625] eta 0:02:45 lr 0.001131 wd 0.0500 time 0.4516 (0.5079) data time 0.0008 (0.0036) model time 0.4508 (0.4426) loss 3.5890 (3.2903) grad_norm 1.4472 (1.3902) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][310/625] eta 0:02:39 lr 0.001131 wd 0.0500 time 0.4432 (0.5056) data time 0.0008 (0.0035) model time 0.4424 (0.4425) loss 3.4461 (3.2888) grad_norm 2.0116 (1.3890) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][320/625] eta 0:02:33 lr 0.001131 wd 0.0500 time 0.4414 (0.5035) data time 0.0008 (0.0035) model time 0.4406 (0.4424) loss 3.1499 (3.2820) grad_norm 1.2358 (1.3884) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][330/625] eta 0:02:27 lr 0.001131 wd 0.0500 time 0.4493 (0.5016) data time 0.0006 (0.0034) model time 0.4487 (0.4424) loss 3.8647 (3.2829) grad_norm 0.9783 (1.3919) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [63/300][340/625] eta 0:02:22 lr 0.001131 wd 0.0500 time 0.4375 (0.4997) data time 0.0008 (0.0033) model time 0.4367 (0.4423) loss 3.6549 (3.2915) grad_norm 1.1893 (1.3892) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][350/625] eta 0:02:16 lr 0.001130 wd 0.0500 time 0.4408 (0.4980) data time 0.0008 (0.0032) model time 0.4400 (0.4423) loss 3.2445 (3.2898) grad_norm 1.7896 (1.3957) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][360/625] eta 0:02:11 lr 0.001130 wd 0.0500 time 0.4450 (0.4963) data time 0.0008 (0.0032) model time 0.4442 (0.4422) loss 3.1907 (3.2909) grad_norm 1.5038 (1.3942) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][370/625] eta 0:02:06 lr 0.001130 wd 0.0500 time 0.4414 (0.4948) data time 0.0006 (0.0031) model time 0.4408 (0.4422) loss 3.6213 (3.2935) grad_norm 1.7387 (1.3967) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][380/625] eta 0:02:00 lr 0.001130 wd 0.0500 time 0.4437 (0.4934) data time 0.0006 (0.0030) model time 0.4431 (0.4422) loss 3.7505 (3.2940) grad_norm 2.0300 (1.3967) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][390/625] eta 0:01:55 lr 0.001130 wd 0.0500 time 0.4430 (0.4926) data time 0.0006 (0.0030) model time 0.4424 (0.4429) loss 3.9200 (3.2892) grad_norm 2.0001 (1.4009) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][400/625] eta 0:01:50 lr 0.001130 wd 0.0500 time 0.4524 (0.4914) data time 0.0007 (0.0029) model time 0.4517 (0.4429) loss 2.3746 (3.2842) grad_norm 1.2420 (1.4010) loss_scale 2048.0000 (2048.0000) mem 
16694MB [2024-08-04 17:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][410/625] eta 0:01:45 lr 0.001130 wd 0.0500 time 0.4445 (0.4902) data time 0.0008 (0.0029) model time 0.4437 (0.4429) loss 3.9818 (3.2827) grad_norm 1.0971 (1.3994) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][420/625] eta 0:01:40 lr 0.001130 wd 0.0500 time 0.4384 (0.4890) data time 0.0007 (0.0028) model time 0.4376 (0.4429) loss 3.1875 (3.2895) grad_norm 1.5903 (1.4028) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][430/625] eta 0:01:35 lr 0.001130 wd 0.0500 time 0.4426 (0.4879) data time 0.0008 (0.0028) model time 0.4418 (0.4428) loss 3.8677 (3.2936) grad_norm 1.4475 (1.4085) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][440/625] eta 0:01:30 lr 0.001130 wd 0.0500 time 0.4393 (0.4868) data time 0.0006 (0.0027) model time 0.4388 (0.4428) loss 2.6567 (3.2916) grad_norm 1.1800 (1.4057) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][450/625] eta 0:01:25 lr 0.001130 wd 0.0500 time 0.4422 (0.4858) data time 0.0006 (0.0027) model time 0.4416 (0.4428) loss 4.3498 (3.3008) grad_norm 1.1019 (1.4013) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][460/625] eta 0:01:19 lr 0.001130 wd 0.0500 time 0.4400 (0.4848) data time 0.0006 (0.0026) model time 0.4394 (0.4427) loss 3.3193 (3.3040) grad_norm 2.0418 (1.4078) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][470/625] eta 0:01:14 lr 0.001130 wd 0.0500 time 0.4405 (0.4838) data time 0.0008 (0.0026) model time 0.4398 (0.4427) loss 
2.2775 (3.3005) grad_norm 1.3518 (1.4086) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:28:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][480/625] eta 0:01:10 lr 0.001130 wd 0.0500 time 0.4392 (0.4829) data time 0.0006 (0.0026) model time 0.4386 (0.4426) loss 3.0949 (3.2979) grad_norm 1.6096 (1.4060) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][490/625] eta 0:01:05 lr 0.001130 wd 0.0500 time 0.4383 (0.4820) data time 0.0006 (0.0025) model time 0.4377 (0.4426) loss 3.0815 (3.2916) grad_norm 1.7880 (1.4065) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][500/625] eta 0:01:00 lr 0.001130 wd 0.0500 time 0.4381 (0.4811) data time 0.0006 (0.0025) model time 0.4375 (0.4425) loss 4.1428 (3.2919) grad_norm 1.2374 (1.4040) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][510/625] eta 0:00:55 lr 0.001130 wd 0.0500 time 0.4411 (0.4803) data time 0.0006 (0.0025) model time 0.4405 (0.4424) loss 2.6691 (3.2907) grad_norm 1.7319 (1.4021) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][520/625] eta 0:00:50 lr 0.001130 wd 0.0500 time 0.4408 (0.4796) data time 0.0007 (0.0024) model time 0.4401 (0.4424) loss 2.7985 (3.2890) grad_norm 1.7362 (1.4056) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][530/625] eta 0:00:45 lr 0.001130 wd 0.0500 time 0.4414 (0.4789) data time 0.0006 (0.0024) model time 0.4408 (0.4424) loss 3.9617 (3.2965) grad_norm 1.0404 (1.4011) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][540/625] eta 0:00:40 lr 0.001130 wd 0.0500 
time 0.4469 (0.4785) data time 0.0007 (0.0024) model time 0.4461 (0.4427) loss 3.3416 (3.2940) grad_norm 1.2134 (1.4003) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][550/625] eta 0:00:35 lr 0.001129 wd 0.0500 time 0.4433 (0.4778) data time 0.0006 (0.0023) model time 0.4428 (0.4427) loss 3.4798 (3.2928) grad_norm 1.6637 (1.4015) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][560/625] eta 0:00:31 lr 0.001129 wd 0.0500 time 0.4427 (0.4772) data time 0.0008 (0.0023) model time 0.4419 (0.4427) loss 3.2639 (3.2897) grad_norm 2.2919 (1.4019) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][570/625] eta 0:00:26 lr 0.001129 wd 0.0500 time 0.4413 (0.4765) data time 0.0006 (0.0023) model time 0.4407 (0.4427) loss 3.6852 (3.2936) grad_norm 1.1304 (1.4036) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][580/625] eta 0:00:21 lr 0.001129 wd 0.0500 time 0.4444 (0.4762) data time 0.0008 (0.0023) model time 0.4436 (0.4429) loss 2.7907 (3.2953) grad_norm 1.6116 (1.4090) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][590/625] eta 0:00:16 lr 0.001129 wd 0.0500 time 0.4435 (0.4756) data time 0.0007 (0.0022) model time 0.4429 (0.4429) loss 3.2440 (3.2944) grad_norm 1.2814 (1.4060) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][600/625] eta 0:00:11 lr 0.001129 wd 0.0500 time 0.4466 (0.4751) data time 0.0008 (0.0022) model time 0.4458 (0.4429) loss 3.5217 (3.2994) grad_norm 1.4488 (1.4047) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:29:56 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [63/300][610/625] eta 0:00:07 lr 0.001129 wd 0.0500 time 0.4367 (0.4745) data time 0.0006 (0.0022) model time 0.4361 (0.4429) loss 2.1499 (3.2966) grad_norm 2.3150 (1.4051) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [63/300][620/625] eta 0:00:02 lr 0.001129 wd 0.0500 time 0.4410 (0.4739) data time 0.0006 (0.0022) model time 0.4404 (0.4428) loss 2.5131 (3.2953) grad_norm 1.4752 (1.4075) loss_scale 2048.0000 (2048.0000) mem 16694MB [2024-08-04 17:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 63 training takes 0:04:47 [2024-08-04 17:30:02 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:30:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-04 17:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.453 (0.453) Loss 0.6045 (0.6045) Acc@1 86.133 (86.133) Acc@5 97.803 (97.803) Mem 16694MB [2024-08-04 17:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 1.0127 (0.7517) Acc@1 76.074 (82.604) Acc@5 93.555 (96.604) Mem 16694MB [2024-08-04 17:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 1.1250 (0.8926) Acc@1 71.582 (79.153) Acc@5 92.627 (94.896) Mem 16694MB [2024-08-04 17:30:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.945 Acc@5 94.894 [2024-08-04 17:30:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 78.9% [2024-08-04 17:30:13 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 78.94% [2024-08-04 17:30:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-04 17:30:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-04 17:30:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.449 (0.449) Loss 0.5317 (0.5317) Acc@1 87.646 (87.646) Acc@5 98.340 (98.340) Mem 16694MB
[2024-08-04 17:30:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.148) Loss 0.8945 (0.6745) Acc@1 78.027 (84.317) Acc@5 94.775 (97.084) Mem 16694MB
[2024-08-04 17:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.132) Loss 1.0293 (0.8118) Acc@1 73.633 (80.776) Acc@5 93.750 (95.633) Mem 16694MB
[2024-08-04 17:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.522 Acc@5 95.611
[2024-08-04 17:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.5%
[2024-08-04 17:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.52%
[2024-08-04 17:30:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-04 17:30:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-04 17:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][0/625] eta 0:09:02 lr 0.001129 wd 0.0500 time 0.8687 (0.8687) data time 0.3965 (0.3965) model time 0.0000 (0.0000) loss 3.0435 (3.0435) grad_norm 1.0646 (1.0646) loss_scale 2048.0000 (2048.0000) mem 16704MB [2024-08-04 17:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][10/625] eta 0:04:55 lr 0.001129 wd 0.0500 time 0.4480 (0.4811) data time 0.0009 (0.0368) model time 0.0000 (0.0000) loss 3.0827 (3.3887) grad_norm 1.2653 (1.5444) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][20/625] eta 0:04:39 lr 0.001129 wd 0.0500 time 0.4408 (0.4626) data time 0.0006 (0.0197) model time 0.0000 (0.0000) loss 2.9455 (3.4687) grad_norm 1.3584 (1.5321) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][30/625] eta 0:04:31 lr 0.001129 wd 0.0500 time 0.4399 (0.4562) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 3.3015 (3.4691) grad_norm 1.2495 (1.5641) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][40/625] eta 0:04:24 lr 0.001129 wd 0.0500 time 0.4391 (0.4530) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 3.1796 (3.3678) grad_norm 1.5750 (1.5476) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][50/625] eta 0:04:19 lr 0.001129 wd 0.0500 time 0.4436 (0.4511) data time 0.0006 (0.0086) model time 0.0000 (0.0000) loss 2.6871 (3.3287) grad_norm 1.3672 (1.5778) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][60/625] eta 0:04:15 lr 0.001129 wd 0.0500 time 0.4473 (0.4528) data time 0.0006 (0.0074) model time 0.4467 (0.4602) loss 3.5330 (3.3169) 
grad_norm 1.1923 (1.5544) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][70/625] eta 0:04:10 lr 0.001129 wd 0.0500 time 0.4468 (0.4513) data time 0.0008 (0.0064) model time 0.4459 (0.4511) loss 3.3819 (3.3517) grad_norm 1.2111 (1.5323) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][80/625] eta 0:04:05 lr 0.001129 wd 0.0500 time 0.4426 (0.4503) data time 0.0007 (0.0057) model time 0.4419 (0.4482) loss 3.3636 (3.3518) grad_norm 2.0154 (1.5278) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][90/625] eta 0:04:00 lr 0.001129 wd 0.0500 time 0.4426 (0.4497) data time 0.0006 (0.0052) model time 0.4420 (0.4471) loss 3.7078 (3.3106) grad_norm 1.4851 (1.5138) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][100/625] eta 0:03:55 lr 0.001129 wd 0.0500 time 0.4397 (0.4489) data time 0.0009 (0.0048) model time 0.4388 (0.4457) loss 3.5033 (3.3112) grad_norm 1.5644 (1.4993) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][110/625] eta 0:03:50 lr 0.001129 wd 0.0500 time 0.4413 (0.4483) data time 0.0006 (0.0044) model time 0.4407 (0.4450) loss 3.8696 (3.2800) grad_norm 1.7615 (1.4849) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][120/625] eta 0:03:46 lr 0.001128 wd 0.0500 time 0.4384 (0.4477) data time 0.0008 (0.0041) model time 0.4376 (0.4444) loss 3.8348 (3.2763) grad_norm 1.7542 (1.4759) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][130/625] eta 0:03:41 lr 0.001128 wd 0.0500 time 0.4420 
(0.4473) data time 0.0008 (0.0039) model time 0.4412 (0.4440) loss 3.3864 (3.3114) grad_norm 1.0223 (1.4665) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][140/625] eta 0:03:36 lr 0.001128 wd 0.0500 time 0.4391 (0.4468) data time 0.0007 (0.0037) model time 0.4383 (0.4435) loss 1.9727 (3.3022) grad_norm 1.3556 (1.4619) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][150/625] eta 0:03:32 lr 0.001128 wd 0.0500 time 0.4371 (0.4478) data time 0.0006 (0.0035) model time 0.4365 (0.4452) loss 4.0553 (3.2945) grad_norm 1.0823 (1.4454) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][160/625] eta 0:03:28 lr 0.001128 wd 0.0500 time 0.4433 (0.4474) data time 0.0008 (0.0033) model time 0.4424 (0.4448) loss 3.5544 (3.3057) grad_norm 1.0755 (1.4467) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][170/625] eta 0:03:23 lr 0.001128 wd 0.0500 time 0.4511 (0.4471) data time 0.0007 (0.0032) model time 0.4505 (0.4445) loss 4.1915 (3.3207) grad_norm 1.5912 (1.4583) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][180/625] eta 0:03:18 lr 0.001128 wd 0.0500 time 0.4394 (0.4468) data time 0.0007 (0.0031) model time 0.4386 (0.4442) loss 3.9691 (3.3232) grad_norm 1.6923 (1.4604) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][190/625] eta 0:03:14 lr 0.001128 wd 0.0500 time 0.4427 (0.4466) data time 0.0006 (0.0029) model time 0.4421 (0.4440) loss 3.5757 (3.3197) grad_norm 1.6406 (1.4514) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [64/300][200/625] eta 0:03:09 lr 0.001128 wd 0.0500 time 0.4430 (0.4464) data time 0.0008 (0.0028) model time 0.4422 (0.4439) loss 3.5909 (3.3017) grad_norm 1.5090 (1.4463) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:31:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][210/625] eta 0:03:05 lr 0.001128 wd 0.0500 time 0.4443 (0.4461) data time 0.0008 (0.0027) model time 0.4434 (0.4436) loss 3.5628 (3.3013) grad_norm 1.2153 (1.4508) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][220/625] eta 0:03:00 lr 0.001128 wd 0.0500 time 0.4426 (0.4459) data time 0.0008 (0.0027) model time 0.4418 (0.4435) loss 3.2528 (3.2867) grad_norm 1.7558 (1.4533) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][230/625] eta 0:02:56 lr 0.001128 wd 0.0500 time 0.4486 (0.4458) data time 0.0006 (0.0026) model time 0.4480 (0.4434) loss 3.8901 (3.2831) grad_norm 1.2406 (1.4636) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][240/625] eta 0:02:51 lr 0.001128 wd 0.0500 time 0.4361 (0.4456) data time 0.0006 (0.0025) model time 0.4355 (0.4433) loss 2.7885 (3.2780) grad_norm 1.5069 (1.4627) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][250/625] eta 0:02:47 lr 0.001128 wd 0.0500 time 0.4489 (0.4457) data time 0.0006 (0.0024) model time 0.4483 (0.4434) loss 2.8471 (3.2835) grad_norm 0.9404 (1.4614) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][260/625] eta 0:02:42 lr 0.001128 wd 0.0500 time 0.4418 (0.4455) data time 0.0007 (0.0024) model time 0.4411 (0.4433) loss 3.9482 (3.2832) grad_norm 1.2462 (1.4556) loss_scale 2048.0000 
(2048.0000) mem 16700MB [2024-08-04 17:32:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][270/625] eta 0:02:38 lr 0.001128 wd 0.0500 time 0.4390 (0.4454) data time 0.0009 (0.0023) model time 0.4381 (0.4432) loss 3.2817 (3.2824) grad_norm 1.6278 (1.4558) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][280/625] eta 0:02:33 lr 0.001128 wd 0.0500 time 0.4376 (0.4452) data time 0.0008 (0.0023) model time 0.4368 (0.4430) loss 3.5472 (3.2782) grad_norm 1.3540 (1.4498) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][290/625] eta 0:02:29 lr 0.001128 wd 0.0500 time 0.4410 (0.4451) data time 0.0006 (0.0022) model time 0.4404 (0.4429) loss 2.3768 (3.2718) grad_norm 3.4169 (1.4589) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][300/625] eta 0:02:24 lr 0.001128 wd 0.0500 time 0.4401 (0.4449) data time 0.0009 (0.0022) model time 0.4392 (0.4428) loss 2.7157 (3.2725) grad_norm 1.3921 (1.4742) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][310/625] eta 0:02:20 lr 0.001127 wd 0.0500 time 0.4394 (0.4448) data time 0.0008 (0.0021) model time 0.4386 (0.4427) loss 3.0081 (3.2759) grad_norm 1.0014 (1.4721) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][320/625] eta 0:02:15 lr 0.001127 wd 0.0500 time 0.4440 (0.4447) data time 0.0006 (0.0021) model time 0.4434 (0.4427) loss 3.9165 (3.2895) grad_norm 1.2489 (1.4683) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][330/625] eta 0:02:11 lr 0.001127 wd 0.0500 time 0.4442 (0.4447) data time 0.0007 (0.0021) model time 
0.4435 (0.4426) loss 3.0973 (3.2876) grad_norm 1.0515 (1.4653) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][340/625] eta 0:02:06 lr 0.001127 wd 0.0500 time 0.4422 (0.4446) data time 0.0006 (0.0020) model time 0.4416 (0.4426) loss 3.2814 (3.2838) grad_norm 1.3678 (1.4641) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:32:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][350/625] eta 0:02:02 lr 0.001127 wd 0.0500 time 0.4442 (0.4445) data time 0.0006 (0.0020) model time 0.4436 (0.4425) loss 3.7914 (3.2812) grad_norm 1.1714 (1.4624) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][360/625] eta 0:01:57 lr 0.001127 wd 0.0500 time 0.4521 (0.4444) data time 0.0008 (0.0020) model time 0.4513 (0.4424) loss 3.3336 (3.2829) grad_norm 1.1052 (1.4613) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][370/625] eta 0:01:53 lr 0.001127 wd 0.0500 time 0.4401 (0.4452) data time 0.0009 (0.0019) model time 0.4393 (0.4434) loss 2.5091 (3.2838) grad_norm 1.3890 (1.4631) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][380/625] eta 0:01:49 lr 0.001127 wd 0.0500 time 0.4402 (0.4451) data time 0.0009 (0.0019) model time 0.4393 (0.4433) loss 1.8692 (3.2810) grad_norm 1.0027 (1.4631) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][390/625] eta 0:01:44 lr 0.001127 wd 0.0500 time 0.4450 (0.4451) data time 0.0008 (0.0019) model time 0.4441 (0.4433) loss 3.6806 (3.2835) grad_norm 1.5199 (1.4731) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][400/625] eta 0:01:40 
lr 0.001127 wd 0.0500 time 0.4411 (0.4450) data time 0.0005 (0.0019) model time 0.4406 (0.4432) loss 3.6386 (3.2818) grad_norm 1.5505 (1.4784) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][410/625] eta 0:01:35 lr 0.001127 wd 0.0500 time 0.4433 (0.4449) data time 0.0007 (0.0018) model time 0.4426 (0.4432) loss 2.9788 (3.2840) grad_norm 1.1499 (1.4751) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][420/625] eta 0:01:31 lr 0.001127 wd 0.0500 time 0.4409 (0.4448) data time 0.0008 (0.0018) model time 0.4400 (0.4431) loss 2.8151 (3.2805) grad_norm 1.3414 (1.4718) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][430/625] eta 0:01:26 lr 0.001127 wd 0.0500 time 0.4371 (0.4448) data time 0.0006 (0.0018) model time 0.4365 (0.4431) loss 3.9365 (3.2838) grad_norm 1.5538 (1.4686) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][440/625] eta 0:01:22 lr 0.001127 wd 0.0500 time 0.4413 (0.4447) data time 0.0008 (0.0018) model time 0.4405 (0.4430) loss 2.4850 (3.2824) grad_norm 1.2920 (1.4625) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][450/625] eta 0:01:17 lr 0.001127 wd 0.0500 time 0.4456 (0.4446) data time 0.0006 (0.0018) model time 0.4450 (0.4429) loss 3.7249 (3.2847) grad_norm 1.7759 (1.4628) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][460/625] eta 0:01:13 lr 0.001127 wd 0.0500 time 0.4402 (0.4446) data time 0.0008 (0.0017) model time 0.4394 (0.4429) loss 3.7488 (3.2875) grad_norm 1.4686 (1.4632) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:52 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][470/625] eta 0:01:08 lr 0.001127 wd 0.0500 time 0.4434 (0.4448) data time 0.0008 (0.0017) model time 0.4426 (0.4432) loss 3.6277 (3.2838) grad_norm 1.4857 (1.4600) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:33:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][480/625] eta 0:01:04 lr 0.001127 wd 0.0500 time 0.4428 (0.4448) data time 0.0006 (0.0017) model time 0.4422 (0.4431) loss 2.1166 (3.2794) grad_norm 1.7684 (1.4587) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][490/625] eta 0:01:00 lr 0.001127 wd 0.0500 time 0.4437 (0.4448) data time 0.0006 (0.0017) model time 0.4431 (0.4431) loss 2.3672 (3.2779) grad_norm 1.1033 (1.4623) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:34:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][500/625] eta 0:00:55 lr 0.001127 wd 0.0500 time 0.4428 (0.4447) data time 0.0007 (0.0017) model time 0.4422 (0.4431) loss 3.2657 (3.2792) grad_norm 1.3634 (1.4600) loss_scale 2048.0000 (2048.0000) mem 16700MB [2024-08-04 17:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-04 17:34:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-04 17:34:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-06 22:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-06 22:50:06 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-06 22:50:08 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-06 22:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-06 22:50:28 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-06 22:50:31 vssm_base_ms_e300] (utils.py 30): INFO resuming model:
[2024-08-06 22:50:33 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema:
[2024-08-06 22:50:33 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 64)
[2024-08-06 22:50:33 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-08-06 22:50:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][510/625] eta 0:09:58 lr 0.001126 wd 0.0500 time 0.4478 (5.2086) data time 0.0006 (0.1103) model time 0.4472 (5.0983) loss 3.9889 (3.6536) grad_norm 1.2619 (1.4133) loss_scale 2048.0000 (2048.0000) mem 16695MB
[2024-08-06 22:51:00 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-08-06 22:51:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-06 22:51:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-07 08:08:56 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-07 08:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-07 08:09:02 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-07 08:59:52 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-07 08:59:53 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-07 09:00:03 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-07 09:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-07 09:05:18 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-07 09:05:32 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-07 09:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-07 09:05:42 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-07 09:05:44 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 09:05:46 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 09:05:46 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 64) [2024-08-07 09:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 09:06:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][520/625] eta 0:14:32 lr 0.001126 wd 0.0500 time 0.4761 (8.3109) data time 0.0010 (0.2614) model time 0.4751 (8.0495) loss 2.8881 (3.8036) grad_norm 1.6918 (1.3744) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][530/625] eta 0:03:37 lr 0.001126 wd 0.0500 time 0.4814 (2.2883) data time 0.0011 (0.0613) model time 0.4803 (2.2271) loss 3.4215 (3.5774) grad_norm 1.3896 (1.4138) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][540/625] eta 0:02:07 lr 0.001126 wd 0.0500 time 0.4800 (1.5012) data time 0.0009 (0.0352) model time 0.4790 (1.4660) loss 3.9821 (3.5883) grad_norm 1.1605 (1.3997) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][550/625] eta 0:01:29 lr 0.001126 wd 0.0500 time 0.4744 (1.1991) data time 0.0009 (0.0249) model time 0.4735 (1.1742) loss 4.1778 (3.5700) grad_norm 1.5130 (1.4055) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][560/625] eta 0:01:07 lr 0.001126 wd 0.0500 time 0.4738 (1.0354) data time 0.0012 (0.0194) model time 0.4726 (1.0160) loss 3.1949 (3.5119) grad_norm 1.6442 (1.4226) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][570/625] eta 
0:00:51 lr 0.001126 wd 0.0500 time 0.4735 (0.9295) data time 0.0011 (0.0160) model time 0.4724 (0.9136) loss 3.3806 (3.5051) grad_norm 1.2943 (1.4319) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][580/625] eta 0:00:38 lr 0.001126 wd 0.0500 time 0.4798 (0.8583) data time 0.0012 (0.0136) model time 0.4786 (0.8447) loss 3.0972 (3.4829) grad_norm 1.4353 (1.4116) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][590/625] eta 0:00:28 lr 0.001126 wd 0.0500 time 0.4822 (0.8067) data time 0.0011 (0.0119) model time 0.4811 (0.7948) loss 3.7113 (3.4425) grad_norm 1.9523 (1.3945) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][600/625] eta 0:00:19 lr 0.001126 wd 0.0500 time 0.4843 (0.7674) data time 0.0008 (0.0106) model time 0.4834 (0.7568) loss 2.8042 (3.3997) grad_norm 2.1898 (1.4244) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][610/625] eta 0:00:11 lr 0.001126 wd 0.0500 time 0.4684 (0.7364) data time 0.0006 (0.0096) model time 0.4678 (0.7267) loss 3.8358 (3.3971) grad_norm 1.1848 (1.4495) loss_scale 2048.0000 (2048.0000) mem 16695MB [2024-08-07 09:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [64/300][620/625] eta 0:00:03 lr 0.001126 wd 0.0500 time 0.4693 (0.7106) data time 0.0006 (0.0088) model time 0.4687 (0.7018) loss 3.7388 (3.4215) grad_norm 2.1672 (1.4708) loss_scale 4096.0000 (2167.3010) mem 16695MB [2024-08-07 09:07:06 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 64 training takes 0:01:15 [2024-08-07 09:07:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-07 09:07:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 09:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5947 (0.5947) Acc@1 86.768 (86.768) Acc@5 97.998 (97.998) Mem 16695MB [2024-08-07 09:07:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.163) Loss 1.0264 (0.7501) Acc@1 76.270 (82.977) Acc@5 93.164 (96.729) Mem 16695MB [2024-08-07 09:07:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 1.1230 (0.9027) Acc@1 73.389 (79.327) Acc@5 93.164 (94.950) Mem 16695MB [2024-08-07 09:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.079 Acc@5 94.924 [2024-08-07 09:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1% [2024-08-07 09:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.08% [2024-08-07 09:07:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-07 09:07:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-07 09:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5312 (0.5312) Acc@1 87.793 (87.793) Acc@5 98.340 (98.340) Mem 16695MB [2024-08-07 09:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.160) Loss 0.8921 (0.6727) Acc@1 78.125 (84.424) Acc@5 94.775 (97.110) Mem 16695MB [2024-08-07 09:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.140) Loss 1.0254 (0.8088) Acc@1 73.779 (80.864) Acc@5 93.994 (95.675) Mem 16695MB [2024-08-07 09:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.612 Acc@5 95.661 [2024-08-07 09:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-08-07 09:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.61% [2024-08-07 09:07:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 09:07:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 09:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][0/625] eta 0:10:36 lr 0.001126 wd 0.0500 time 1.0190 (1.0190) data time 0.3981 (0.3981) model time 0.0000 (0.0000) loss 2.4560 (2.4560) grad_norm 2.2029 (2.2029) loss_scale 4096.0000 (4096.0000) mem 16704MB [2024-08-07 09:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][10/625] eta 0:05:22 lr 0.001126 wd 0.0500 time 0.4720 (0.5246) data time 0.0011 (0.0373) model time 0.0000 (0.0000) loss 3.0391 (3.2301) grad_norm 1.3881 (1.3789) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][20/625] eta 0:05:03 lr 0.001126 wd 0.0500 time 0.4682 (0.5011) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 3.3207 (3.1877) grad_norm 1.2892 (1.3774) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:07:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][30/625] eta 0:04:55 lr 0.001126 wd 0.0500 time 0.4144 (0.4960) data time 0.0011 (0.0140) model time 0.0000 (0.0000) loss 3.5641 (3.2435) grad_norm 1.2468 (1.4432) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:07:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][40/625] eta 0:04:48 lr 0.001126 wd 0.0500 time 0.4786 (0.4926) data time 0.0008 (0.0108) model time 0.0000 (0.0000) loss 3.5701 (3.2569) grad_norm 1.5518 (1.4501) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][50/625] eta 0:04:41 lr 0.001126 wd 0.0500 time 0.4851 (0.4898) data time 0.0010 (0.0090) model time 0.0000 (0.0000) loss 2.7215 (3.2507) grad_norm 1.3848 (1.4471) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][60/625] eta 0:04:35 lr 0.001126 wd 0.0500 time 0.4804 (0.4878) data time 0.0011 (0.0077) model time 0.4793 (0.4765) loss 3.0330 (3.2607) 
grad_norm 1.2928 (1.4437) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][70/625] eta 0:04:29 lr 0.001126 wd 0.0500 time 0.4765 (0.4864) data time 0.0008 (0.0068) model time 0.4757 (0.4766) loss 2.8275 (3.2299) grad_norm 1.5187 (1.4346) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][80/625] eta 0:04:24 lr 0.001125 wd 0.0500 time 0.4836 (0.4853) data time 0.0008 (0.0061) model time 0.4828 (0.4764) loss 3.5225 (3.2390) grad_norm 2.1228 (1.4316) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][90/625] eta 0:04:19 lr 0.001125 wd 0.0500 time 0.4779 (0.4843) data time 0.0011 (0.0055) model time 0.4768 (0.4761) loss 2.3636 (3.2263) grad_norm 0.9903 (1.4364) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][100/625] eta 0:04:13 lr 0.001125 wd 0.0500 time 0.4750 (0.4835) data time 0.0011 (0.0051) model time 0.4739 (0.4758) loss 3.8383 (3.2093) grad_norm 4.1675 (1.4958) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][110/625] eta 0:04:09 lr 0.001125 wd 0.0500 time 0.6999 (0.4850) data time 0.0012 (0.0048) model time 0.6987 (0.4797) loss 3.1211 (3.2099) grad_norm 0.9595 (1.4997) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][120/625] eta 0:04:04 lr 0.001125 wd 0.0500 time 0.4741 (0.4845) data time 0.0012 (0.0045) model time 0.4729 (0.4794) loss 3.3695 (3.2301) grad_norm 1.6317 (1.4900) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][130/625] eta 0:03:59 lr 0.001125 wd 0.0500 time 0.4774 
(0.4840) data time 0.0009 (0.0042) model time 0.4766 (0.4791) loss 3.5639 (3.2258) grad_norm 1.1664 (1.4761) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][140/625] eta 0:03:54 lr 0.001125 wd 0.0500 time 0.4769 (0.4837) data time 0.0011 (0.0040) model time 0.4757 (0.4790) loss 2.5296 (3.2158) grad_norm 1.1864 (1.4683) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][150/625] eta 0:03:49 lr 0.001125 wd 0.0500 time 0.4770 (0.4832) data time 0.0012 (0.0038) model time 0.4758 (0.4787) loss 3.2227 (3.2041) grad_norm 1.0300 (1.4602) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][160/625] eta 0:03:44 lr 0.001125 wd 0.0500 time 0.4802 (0.4829) data time 0.0010 (0.0036) model time 0.4793 (0.4785) loss 3.7623 (3.1994) grad_norm 1.6862 (1.4527) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][170/625] eta 0:03:39 lr 0.001125 wd 0.0500 time 0.4740 (0.4825) data time 0.0011 (0.0035) model time 0.4730 (0.4782) loss 2.1310 (3.2085) grad_norm 1.4049 (1.4553) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][180/625] eta 0:03:35 lr 0.001125 wd 0.0500 time 0.4812 (0.4833) data time 0.0010 (0.0034) model time 0.4802 (0.4795) loss 3.6492 (3.2092) grad_norm 1.2982 (1.4455) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][190/625] eta 0:03:30 lr 0.001125 wd 0.0500 time 0.4787 (0.4830) data time 0.0011 (0.0033) model time 0.4776 (0.4793) loss 3.0286 (3.1906) grad_norm 1.5227 (1.4425) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [65/300][200/625] eta 0:03:25 lr 0.001125 wd 0.0500 time 0.4789 (0.4827) data time 0.0008 (0.0032) model time 0.4781 (0.4792) loss 2.5109 (3.1884) grad_norm 1.2705 (1.4629) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][210/625] eta 0:03:20 lr 0.001125 wd 0.0500 time 0.4829 (0.4825) data time 0.0011 (0.0031) model time 0.4818 (0.4789) loss 3.1351 (3.2074) grad_norm 1.1538 (1.4589) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][220/625] eta 0:03:15 lr 0.001125 wd 0.0500 time 0.4799 (0.4822) data time 0.0009 (0.0030) model time 0.4790 (0.4788) loss 3.7807 (3.2207) grad_norm 1.6069 (1.4586) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][230/625] eta 0:03:10 lr 0.001125 wd 0.0500 time 0.4730 (0.4821) data time 0.0010 (0.0029) model time 0.4720 (0.4787) loss 3.5977 (3.2170) grad_norm 1.1194 (1.4638) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][240/625] eta 0:03:05 lr 0.001125 wd 0.0500 time 0.4792 (0.4819) data time 0.0009 (0.0028) model time 0.4783 (0.4786) loss 3.6521 (3.2210) grad_norm 1.8780 (1.4665) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][250/625] eta 0:03:00 lr 0.001125 wd 0.0500 time 0.4796 (0.4818) data time 0.0011 (0.0028) model time 0.4785 (0.4786) loss 3.5115 (3.2246) grad_norm 1.4352 (1.4634) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][260/625] eta 0:02:55 lr 0.001125 wd 0.0500 time 0.4772 (0.4817) data time 0.0008 (0.0027) model time 0.4763 (0.4785) loss 2.6688 (3.2234) grad_norm 1.6044 (1.4685) loss_scale 4096.0000 
(4096.0000) mem 16699MB [2024-08-07 09:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][270/625] eta 0:02:50 lr 0.001124 wd 0.0500 time 0.4731 (0.4816) data time 0.0011 (0.0026) model time 0.4720 (0.4785) loss 4.0406 (3.2275) grad_norm 1.1159 (1.4702) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][280/625] eta 0:02:46 lr 0.001124 wd 0.0500 time 0.4742 (0.4815) data time 0.0008 (0.0026) model time 0.4734 (0.4784) loss 3.0159 (3.2205) grad_norm 1.3495 (1.4802) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][290/625] eta 0:02:41 lr 0.001124 wd 0.0500 time 0.4824 (0.4815) data time 0.0010 (0.0025) model time 0.4814 (0.4785) loss 3.6671 (3.2237) grad_norm 1.4275 (1.4760) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][300/625] eta 0:02:36 lr 0.001124 wd 0.0500 time 0.4871 (0.4818) data time 0.0011 (0.0025) model time 0.4860 (0.4790) loss 3.6699 (3.2352) grad_norm 1.4422 (1.4753) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][310/625] eta 0:02:31 lr 0.001124 wd 0.0500 time 0.4799 (0.4817) data time 0.0008 (0.0024) model time 0.4791 (0.4789) loss 3.8679 (3.2336) grad_norm 1.2458 (1.4767) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][320/625] eta 0:02:26 lr 0.001124 wd 0.0500 time 0.4830 (0.4816) data time 0.0008 (0.0024) model time 0.4822 (0.4789) loss 3.9820 (3.2424) grad_norm 1.0217 (1.4721) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][330/625] eta 0:02:22 lr 0.001124 wd 0.0500 time 0.4823 (0.4815) data time 0.0010 (0.0024) model time 
0.4813 (0.4788) loss 3.1475 (3.2472) grad_norm 1.1686 (1.4641) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][340/625] eta 0:02:17 lr 0.001124 wd 0.0500 time 0.4811 (0.4814) data time 0.0010 (0.0023) model time 0.4801 (0.4787) loss 2.5910 (3.2456) grad_norm 1.5155 (1.4624) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][350/625] eta 0:02:12 lr 0.001124 wd 0.0500 time 0.4802 (0.4813) data time 0.0009 (0.0023) model time 0.4793 (0.4787) loss 3.1869 (3.2437) grad_norm 2.7237 (1.4795) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][360/625] eta 0:02:07 lr 0.001124 wd 0.0500 time 0.4804 (0.4812) data time 0.0011 (0.0023) model time 0.4793 (0.4786) loss 3.1372 (3.2394) grad_norm 1.3908 (1.4840) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][370/625] eta 0:02:02 lr 0.001124 wd 0.0500 time 0.4807 (0.4811) data time 0.0010 (0.0022) model time 0.4797 (0.4786) loss 2.3097 (3.2345) grad_norm 1.1700 (1.4810) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][380/625] eta 0:01:57 lr 0.001124 wd 0.0500 time 0.4809 (0.4811) data time 0.0011 (0.0022) model time 0.4799 (0.4786) loss 3.2930 (3.2416) grad_norm 2.1159 (1.4857) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][390/625] eta 0:01:53 lr 0.001124 wd 0.0500 time 0.4840 (0.4811) data time 0.0009 (0.0022) model time 0.4832 (0.4786) loss 3.4981 (3.2420) grad_norm 1.7293 (1.4881) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][400/625] eta 0:01:48 
lr 0.001124 wd 0.0500 time 0.4841 (0.4810) data time 0.0008 (0.0022) model time 0.4833 (0.4786) loss 3.8425 (3.2466) grad_norm 1.0480 (1.4851) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][410/625] eta 0:01:43 lr 0.001124 wd 0.0500 time 0.4748 (0.4809) data time 0.0011 (0.0021) model time 0.4736 (0.4785) loss 3.1802 (3.2540) grad_norm 1.2720 (1.4838) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][420/625] eta 0:01:38 lr 0.001124 wd 0.0500 time 0.4742 (0.4808) data time 0.0009 (0.0021) model time 0.4734 (0.4784) loss 1.9722 (3.2442) grad_norm 1.1825 (1.4816) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][430/625] eta 0:01:33 lr 0.001124 wd 0.0500 time 0.4765 (0.4807) data time 0.0009 (0.0021) model time 0.4756 (0.4783) loss 3.7877 (3.2433) grad_norm 1.9715 (1.4839) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:10:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][440/625] eta 0:01:28 lr 0.001124 wd 0.0500 time 0.4780 (0.4806) data time 0.0009 (0.0021) model time 0.4771 (0.4783) loss 3.6918 (3.2452) grad_norm 1.5876 (1.4847) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][450/625] eta 0:01:24 lr 0.001124 wd 0.0500 time 0.4727 (0.4811) data time 0.0010 (0.0020) model time 0.4717 (0.4788) loss 3.7223 (3.2492) grad_norm 1.6543 (1.4873) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][460/625] eta 0:01:19 lr 0.001123 wd 0.0500 time 0.4762 (0.4809) data time 0.0008 (0.0020) model time 0.4754 (0.4787) loss 4.2898 (3.2567) grad_norm 1.4043 (1.4830) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:12 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][470/625] eta 0:01:14 lr 0.001123 wd 0.0500 time 0.4765 (0.4809) data time 0.0010 (0.0020) model time 0.4754 (0.4787) loss 3.6394 (3.2586) grad_norm 1.1190 (1.4803) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][480/625] eta 0:01:09 lr 0.001123 wd 0.0500 time 0.4770 (0.4808) data time 0.0011 (0.0020) model time 0.4759 (0.4785) loss 3.3867 (3.2626) grad_norm 1.1185 (1.4744) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][490/625] eta 0:01:04 lr 0.001123 wd 0.0500 time 0.4771 (0.4807) data time 0.0011 (0.0020) model time 0.4760 (0.4785) loss 2.9684 (3.2646) grad_norm 1.4785 (1.4717) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][500/625] eta 0:01:00 lr 0.001123 wd 0.0500 time 0.4734 (0.4806) data time 0.0011 (0.0020) model time 0.4723 (0.4784) loss 3.0261 (3.2617) grad_norm 1.3170 (1.4658) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][510/625] eta 0:00:55 lr 0.001123 wd 0.0500 time 0.4695 (0.4805) data time 0.0012 (0.0019) model time 0.4683 (0.4783) loss 2.9248 (3.2627) grad_norm 1.5019 (1.4656) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][520/625] eta 0:00:50 lr 0.001123 wd 0.0500 time 0.4782 (0.4808) data time 0.0008 (0.0019) model time 0.4773 (0.4786) loss 2.9960 (3.2661) grad_norm 1.1433 (1.4697) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][530/625] eta 0:00:45 lr 0.001123 wd 0.0500 time 0.4663 (0.4806) data time 0.0011 (0.0019) model time 0.4652 (0.4785) loss 3.2717 (3.2670) grad_norm 
1.4797 (1.4648) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][540/625] eta 0:00:40 lr 0.001123 wd 0.0500 time 0.4692 (0.4806) data time 0.0011 (0.0019) model time 0.4681 (0.4785) loss 3.4729 (3.2615) grad_norm 1.1878 (1.4665) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][550/625] eta 0:00:36 lr 0.001123 wd 0.0500 time 0.4744 (0.4805) data time 0.0009 (0.0019) model time 0.4735 (0.4784) loss 2.6563 (3.2594) grad_norm 1.7242 (1.4638) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][560/625] eta 0:00:31 lr 0.001123 wd 0.0500 time 0.4769 (0.4805) data time 0.0008 (0.0019) model time 0.4761 (0.4784) loss 3.9368 (3.2626) grad_norm 1.6466 (1.4623) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][570/625] eta 0:00:26 lr 0.001123 wd 0.0500 time 0.4772 (0.4804) data time 0.0012 (0.0019) model time 0.4760 (0.4783) loss 3.2131 (3.2668) grad_norm 1.2286 (1.4577) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][580/625] eta 0:00:21 lr 0.001123 wd 0.0500 time 0.4764 (0.4803) data time 0.0009 (0.0018) model time 0.4755 (0.4782) loss 3.2016 (3.2709) grad_norm 1.2580 (1.4601) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][590/625] eta 0:00:16 lr 0.001123 wd 0.0500 time 0.4735 (0.4802) data time 0.0008 (0.0018) model time 0.4726 (0.4781) loss 3.8864 (3.2660) grad_norm 1.2627 (1.4575) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][600/625] eta 0:00:12 lr 0.001123 wd 0.0500 time 0.4718 (0.4801) data 
time 0.0010 (0.0018) model time 0.4708 (0.4780) loss 3.6511 (3.2669) grad_norm 1.6693 (1.4588) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][610/625] eta 0:00:07 lr 0.001123 wd 0.0500 time 0.4782 (0.4800) data time 0.0006 (0.0018) model time 0.4776 (0.4780) loss 3.9737 (3.2622) grad_norm 1.1844 (1.4592) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [65/300][620/625] eta 0:00:02 lr 0.001123 wd 0.0500 time 0.4763 (0.4800) data time 0.0008 (0.0018) model time 0.4754 (0.4779) loss 3.1293 (3.2607) grad_norm 1.3734 (1.4595) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 65 training takes 0:04:59 [2024-08-07 09:12:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:12:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 09:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.535 (0.535) Loss 0.6284 (0.6284) Acc@1 86.084 (86.084) Acc@5 97.754 (97.754) Mem 16699MB [2024-08-07 09:12:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 1.0732 (0.7688) Acc@1 74.365 (82.440) Acc@5 93.408 (96.680) Mem 16699MB [2024-08-07 09:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.142) Loss 1.1387 (0.9157) Acc@1 72.900 (79.136) Acc@5 92.578 (94.950) Mem 16699MB [2024-08-07 09:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 78.995 Acc@5 94.956 [2024-08-07 09:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.0% [2024-08-07 09:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.838 (0.838) Loss 0.5288 (0.5288) Acc@1 87.891 (87.891) Acc@5 98.389 (98.389) Mem 16699MB [2024-08-07 09:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.194) Loss 0.8872 (0.6707) Acc@1 78.271 (84.521) Acc@5 94.922 (97.177) Mem 16699MB [2024-08-07 09:12:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.158) Loss 1.0195 (0.8060) Acc@1 73.730 (80.971) Acc@5 93.994 (95.731) Mem 16699MB [2024-08-07 09:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.712 Acc@5 95.723 [2024-08-07 09:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-07 09:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.71% [2024-08-07 09:12:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 09:12:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 09:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][0/625] eta 0:08:18 lr 0.001123 wd 0.0500 time 0.7972 (0.7972) data time 0.3882 (0.3882) model time 0.0000 (0.0000) loss 3.2720 (3.2720) grad_norm 1.4415 (1.4415) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][10/625] eta 0:05:13 lr 0.001123 wd 0.0500 time 0.4860 (0.5092) data time 0.0013 (0.0365) model time 0.0000 (0.0000) loss 3.4610 (3.4398) grad_norm 2.1567 (1.5789) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][20/625] eta 0:05:06 lr 0.001123 wd 0.0500 time 0.4806 (0.5060) data time 0.0011 (0.0197) model time 0.0000 (0.0000) loss 2.1048 (3.2935) grad_norm 1.3199 (1.4911) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][30/625] eta 0:04:55 lr 0.001122 wd 0.0500 time 0.4778 (0.4968) data time 0.0008 (0.0137) model time 0.0000 (0.0000) loss 3.2572 (3.3279) grad_norm 1.4517 (1.4852) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][40/625] eta 0:04:47 lr 0.001122 wd 0.0500 time 0.4719 (0.4921) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 4.1657 (3.3663) grad_norm 2.2517 (1.4701) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][50/625] eta 0:04:43 lr 0.001122 wd 0.0500 time 0.6498 (0.4930) data time 0.0010 (0.0088) model time 0.0000 (0.0000) loss 3.8239 (3.3405) grad_norm 1.4416 (1.4846) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][60/625] eta 0:04:37 lr 0.001122 wd 0.0500 time 0.4765 (0.4907) data time 0.0011 (0.0076) model time 0.4754 (0.4777) loss 3.2750 (3.3160) 
grad_norm 1.6030 (1.4826) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][70/625] eta 0:04:31 lr 0.001122 wd 0.0500 time 0.4732 (0.4891) data time 0.0011 (0.0067) model time 0.4721 (0.4782) loss 2.4048 (3.3003) grad_norm 1.7544 (1.5038) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][80/625] eta 0:04:26 lr 0.001122 wd 0.0500 time 0.4793 (0.4882) data time 0.0009 (0.0060) model time 0.4784 (0.4788) loss 2.4258 (3.2518) grad_norm 1.9798 (1.5260) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][90/625] eta 0:04:20 lr 0.001122 wd 0.0500 time 0.4872 (0.4875) data time 0.0012 (0.0055) model time 0.4860 (0.4793) loss 3.0727 (3.2637) grad_norm 1.4008 (1.5373) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][100/625] eta 0:04:15 lr 0.001122 wd 0.0500 time 0.4734 (0.4867) data time 0.0008 (0.0051) model time 0.4726 (0.4790) loss 3.2720 (3.2471) grad_norm 1.3765 (1.5132) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][110/625] eta 0:04:10 lr 0.001122 wd 0.0500 time 0.4764 (0.4863) data time 0.0008 (0.0047) model time 0.4756 (0.4793) loss 3.5017 (3.2382) grad_norm 1.5332 (1.5130) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][120/625] eta 0:04:05 lr 0.001122 wd 0.0500 time 0.4906 (0.4860) data time 0.0008 (0.0044) model time 0.4898 (0.4797) loss 3.2522 (3.2392) grad_norm 1.5526 (1.5143) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][130/625] eta 0:04:00 lr 0.001122 wd 0.0500 time 0.4803 
(0.4856) data time 0.0010 (0.0042) model time 0.4793 (0.4796) loss 2.1589 (3.2359) grad_norm 1.5722 (1.5083) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][140/625] eta 0:03:55 lr 0.001122 wd 0.0500 time 0.4800 (0.4852) data time 0.0008 (0.0040) model time 0.4791 (0.4795) loss 3.4549 (3.2446) grad_norm 2.2629 (1.5221) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][150/625] eta 0:03:50 lr 0.001122 wd 0.0500 time 0.4788 (0.4848) data time 0.0011 (0.0038) model time 0.4777 (0.4794) loss 3.4952 (3.2349) grad_norm 1.4961 (1.5359) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][160/625] eta 0:03:45 lr 0.001122 wd 0.0500 time 0.4523 (0.4845) data time 0.0008 (0.0036) model time 0.4515 (0.4794) loss 3.3761 (3.2347) grad_norm 1.3507 (1.5209) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][170/625] eta 0:03:40 lr 0.001122 wd 0.0500 time 0.4800 (0.4842) data time 0.0010 (0.0035) model time 0.4790 (0.4793) loss 3.4348 (3.2449) grad_norm 1.0920 (1.5106) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][180/625] eta 0:03:35 lr 0.001122 wd 0.0500 time 0.4840 (0.4840) data time 0.0008 (0.0033) model time 0.4832 (0.4793) loss 3.6448 (3.2368) grad_norm 1.7006 (1.5123) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][190/625] eta 0:03:30 lr 0.001122 wd 0.0500 time 0.4721 (0.4837) data time 0.0011 (0.0032) model time 0.4710 (0.4791) loss 3.2196 (3.2552) grad_norm 1.2510 (1.5186) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:13 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [66/300][200/625] eta 0:03:25 lr 0.001122 wd 0.0500 time 0.4746 (0.4836) data time 0.0008 (0.0031) model time 0.4738 (0.4792) loss 3.2828 (3.2744) grad_norm 1.3494 (1.5151) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][210/625] eta 0:03:20 lr 0.001122 wd 0.0500 time 0.4710 (0.4842) data time 0.0010 (0.0030) model time 0.4700 (0.4802) loss 2.5748 (3.2562) grad_norm 1.3485 (1.5133) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][220/625] eta 0:03:16 lr 0.001121 wd 0.0500 time 0.4827 (0.4840) data time 0.0009 (0.0029) model time 0.4818 (0.4802) loss 3.1328 (3.2409) grad_norm 1.3595 (1.5120) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][230/625] eta 0:03:11 lr 0.001121 wd 0.0500 time 0.4779 (0.4838) data time 0.0010 (0.0029) model time 0.4769 (0.4800) loss 3.2660 (3.2412) grad_norm 1.2251 (1.5077) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][240/625] eta 0:03:06 lr 0.001121 wd 0.0500 time 0.4112 (0.4838) data time 0.0008 (0.0028) model time 0.4104 (0.4802) loss 3.6565 (3.2511) grad_norm 1.0810 (1.5015) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][250/625] eta 0:03:01 lr 0.001121 wd 0.0500 time 0.4895 (0.4838) data time 0.0009 (0.0027) model time 0.4886 (0.4802) loss 2.0703 (3.2455) grad_norm 1.3664 (1.4998) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][260/625] eta 0:02:56 lr 0.001121 wd 0.0500 time 0.4771 (0.4836) data time 0.0009 (0.0027) model time 0.4762 (0.4801) loss 2.7860 (3.2447) grad_norm 1.5979 (1.4934) loss_scale 4096.0000 
(4096.0000) mem 16699MB [2024-08-07 09:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][270/625] eta 0:02:51 lr 0.001121 wd 0.0500 time 0.4745 (0.4835) data time 0.0010 (0.0026) model time 0.4735 (0.4801) loss 3.6997 (3.2443) grad_norm 1.1112 (1.4898) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][280/625] eta 0:02:46 lr 0.001121 wd 0.0500 time 0.4821 (0.4834) data time 0.0011 (0.0026) model time 0.4810 (0.4801) loss 3.0642 (3.2503) grad_norm 1.1378 (1.4823) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][290/625] eta 0:02:41 lr 0.001121 wd 0.0500 time 0.4766 (0.4833) data time 0.0008 (0.0025) model time 0.4758 (0.4801) loss 2.1820 (3.2483) grad_norm 1.2899 (1.4788) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][300/625] eta 0:02:37 lr 0.001121 wd 0.0500 time 0.4808 (0.4832) data time 0.0008 (0.0025) model time 0.4800 (0.4800) loss 2.8086 (3.2409) grad_norm 1.3739 (1.4805) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][310/625] eta 0:02:32 lr 0.001121 wd 0.0500 time 0.4723 (0.4830) data time 0.0010 (0.0024) model time 0.4714 (0.4799) loss 4.1178 (3.2527) grad_norm 1.3534 (1.4742) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][320/625] eta 0:02:27 lr 0.001121 wd 0.0500 time 0.4745 (0.4829) data time 0.0011 (0.0024) model time 0.4734 (0.4799) loss 3.3928 (3.2516) grad_norm 1.7967 (1.4803) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][330/625] eta 0:02:22 lr 0.001121 wd 0.0500 time 0.4817 (0.4830) data time 0.0008 (0.0024) model time 
0.4808 (0.4800) loss 3.3453 (3.2564) grad_norm 2.0483 (1.4888) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][340/625] eta 0:02:17 lr 0.001121 wd 0.0500 time 0.4894 (0.4831) data time 0.0010 (0.0023) model time 0.4883 (0.4802) loss 2.9402 (3.2475) grad_norm 1.1706 (1.4861) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][350/625] eta 0:02:12 lr 0.001121 wd 0.0500 time 0.4844 (0.4830) data time 0.0010 (0.0023) model time 0.4834 (0.4802) loss 3.5982 (3.2488) grad_norm 1.5364 (1.4830) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][360/625] eta 0:02:07 lr 0.001121 wd 0.0500 time 0.4753 (0.4829) data time 0.0010 (0.0023) model time 0.4743 (0.4801) loss 3.7838 (3.2479) grad_norm 1.5660 (1.4839) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][370/625] eta 0:02:03 lr 0.001121 wd 0.0500 time 0.4750 (0.4828) data time 0.0011 (0.0022) model time 0.4740 (0.4799) loss 3.7901 (3.2491) grad_norm 1.6154 (1.4818) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][380/625] eta 0:01:58 lr 0.001121 wd 0.0500 time 0.4740 (0.4826) data time 0.0009 (0.0022) model time 0.4732 (0.4798) loss 3.2887 (3.2530) grad_norm 1.3350 (1.4820) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][390/625] eta 0:01:53 lr 0.001121 wd 0.0500 time 0.4853 (0.4825) data time 0.0010 (0.0022) model time 0.4843 (0.4797) loss 3.0404 (3.2566) grad_norm 1.5857 (1.4816) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][400/625] eta 0:01:48 
lr 0.001121 wd 0.0500 time 0.4774 (0.4824) data time 0.0011 (0.0022) model time 0.4763 (0.4797) loss 2.9638 (3.2582) grad_norm 1.3773 (1.4750) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][410/625] eta 0:01:43 lr 0.001120 wd 0.0500 time 0.4794 (0.4823) data time 0.0009 (0.0021) model time 0.4785 (0.4796) loss 3.6896 (3.2594) grad_norm 1.3514 (1.4739) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:15:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][420/625] eta 0:01:38 lr 0.001120 wd 0.0500 time 0.4736 (0.4822) data time 0.0010 (0.0021) model time 0.4726 (0.4795) loss 3.7470 (3.2594) grad_norm 1.8497 (1.4754) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][430/625] eta 0:01:34 lr 0.001120 wd 0.0500 time 0.4727 (0.4825) data time 0.0011 (0.0021) model time 0.4716 (0.4799) loss 2.5786 (3.2531) grad_norm 1.7150 (1.4745) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][440/625] eta 0:01:29 lr 0.001120 wd 0.0500 time 0.4746 (0.4824) data time 0.0008 (0.0021) model time 0.4737 (0.4798) loss 3.9817 (3.2581) grad_norm 1.0517 (1.4955) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][450/625] eta 0:01:24 lr 0.001120 wd 0.0500 time 0.4838 (0.4823) data time 0.0009 (0.0021) model time 0.4829 (0.4797) loss 3.9716 (3.2687) grad_norm 2.0256 (1.4909) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][460/625] eta 0:01:19 lr 0.001120 wd 0.0500 time 0.4072 (0.4825) data time 0.0011 (0.0020) model time 0.4061 (0.4800) loss 2.8874 (3.2709) grad_norm 1.7792 (1.4909) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:23 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][470/625] eta 0:01:14 lr 0.001120 wd 0.0500 time 0.4806 (0.4824) data time 0.0008 (0.0020) model time 0.4798 (0.4799) loss 3.9715 (3.2734) grad_norm 1.1045 (1.4918) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][480/625] eta 0:01:09 lr 0.001120 wd 0.0500 time 0.4746 (0.4823) data time 0.0009 (0.0020) model time 0.4736 (0.4798) loss 3.2010 (3.2754) grad_norm 1.6795 (1.4894) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][490/625] eta 0:01:05 lr 0.001120 wd 0.0500 time 0.4810 (0.4822) data time 0.0013 (0.0020) model time 0.4797 (0.4797) loss 3.6673 (3.2743) grad_norm 1.3661 (1.4874) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][500/625] eta 0:01:00 lr 0.001120 wd 0.0500 time 0.4737 (0.4821) data time 0.0010 (0.0020) model time 0.4727 (0.4797) loss 3.9848 (3.2723) grad_norm 1.0004 (1.4860) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][510/625] eta 0:00:55 lr 0.001120 wd 0.0500 time 0.4862 (0.4820) data time 0.0010 (0.0020) model time 0.4852 (0.4796) loss 3.6287 (3.2733) grad_norm 1.4612 (1.4813) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][520/625] eta 0:00:50 lr 0.001120 wd 0.0500 time 0.4673 (0.4819) data time 0.0010 (0.0019) model time 0.4663 (0.4796) loss 3.3915 (3.2668) grad_norm 1.3008 (1.4810) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][530/625] eta 0:00:45 lr 0.001120 wd 0.0500 time 0.4799 (0.4819) data time 0.0010 (0.0019) model time 0.4788 (0.4795) loss 3.0648 (3.2668) grad_norm 
1.4105 (1.4772) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:16:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][540/625] eta 0:00:40 lr 0.001120 wd 0.0500 time 0.4742 (0.4818) data time 0.0011 (0.0019) model time 0.4731 (0.4795) loss 3.2091 (3.2681) grad_norm 2.9222 (1.4793) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][550/625] eta 0:00:36 lr 0.001120 wd 0.0500 time 0.4717 (0.4817) data time 0.0011 (0.0019) model time 0.4706 (0.4794) loss 3.2171 (3.2761) grad_norm 1.4354 (1.4824) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][560/625] eta 0:00:31 lr 0.001120 wd 0.0500 time 0.4743 (0.4816) data time 0.0011 (0.0019) model time 0.4731 (0.4793) loss 3.5095 (3.2784) grad_norm 1.8617 (1.4813) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][570/625] eta 0:00:26 lr 0.001120 wd 0.0500 time 0.4765 (0.4815) data time 0.0011 (0.0019) model time 0.4754 (0.4792) loss 3.5070 (3.2802) grad_norm 1.2400 (1.4788) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][580/625] eta 0:00:21 lr 0.001120 wd 0.0500 time 0.4728 (0.4818) data time 0.0011 (0.0019) model time 0.4717 (0.4795) loss 2.4639 (3.2771) grad_norm 1.4455 (1.4776) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][590/625] eta 0:00:16 lr 0.001119 wd 0.0500 time 0.4781 (0.4817) data time 0.0011 (0.0018) model time 0.4770 (0.4794) loss 3.4185 (3.2723) grad_norm 1.2136 (1.4753) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][600/625] eta 0:00:12 lr 0.001119 wd 0.0500 time 0.4748 (0.4816) data 
time 0.0011 (0.0018) model time 0.4737 (0.4794) loss 3.5571 (3.2750) grad_norm 1.0939 (1.4741) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][610/625] eta 0:00:07 lr 0.001119 wd 0.0500 time 0.4760 (0.4818) data time 0.0009 (0.0018) model time 0.4751 (0.4796) loss 3.0932 (3.2750) grad_norm 2.0614 (1.4782) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [66/300][620/625] eta 0:00:02 lr 0.001119 wd 0.0500 time 0.4727 (0.4817) data time 0.0006 (0.0018) model time 0.4721 (0.4795) loss 3.4645 (3.2786) grad_norm 0.9785 (1.4787) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 66 training takes 0:05:01 [2024-08-07 09:17:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:17:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 09:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.6138 (0.6138) Acc@1 85.352 (85.352) Acc@5 97.852 (97.852) Mem 16699MB
[2024-08-07 09:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.9844 (0.7543) Acc@1 76.660 (82.923) Acc@5 94.141 (96.822) Mem 16699MB
[2024-08-07 09:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.1553 (0.9018) Acc@1 71.582 (79.346) Acc@5 91.943 (95.059) Mem 16699MB
[2024-08-07 09:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.079 Acc@5 95.034
[2024-08-07 09:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1%
[2024-08-07 09:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.08%
[2024-08-07 09:17:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-07 09:17:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-07 09:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5283 (0.5283) Acc@1 87.988 (87.988) Acc@5 98.389 (98.389) Mem 16699MB
[2024-08-07 09:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.160) Loss 0.8862 (0.6691) Acc@1 78.174 (84.610) Acc@5 94.824 (97.195) Mem 16699MB
[2024-08-07 09:17:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 1.0176 (0.8036) Acc@1 73.535 (81.064) Acc@5 94.189 (95.761) Mem 16699MB
[2024-08-07 09:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.796 Acc@5 95.763
[2024-08-07 09:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.8%
[2024-08-07 09:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.80%
[2024-08-07 09:17:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-07 09:17:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-07 09:17:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][0/625] eta 0:08:22 lr 0.001119 wd 0.0500 time 0.8042 (0.8042) data time 0.3923 (0.3923) model time 0.0000 (0.0000) loss 3.7853 (3.7853) grad_norm 1.3723 (1.3723) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][10/625] eta 0:05:10 lr 0.001119 wd 0.0500 time 0.4738 (0.5048) data time 0.0008 (0.0368) model time 0.0000 (0.0000) loss 3.2748 (3.4661) grad_norm 1.1306 (1.3241) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][20/625] eta 0:04:57 lr 0.001119 wd 0.0500 time 0.4756 (0.4913) data time 0.0011 (0.0198) model time 0.0000 (0.0000) loss 3.6571 (3.4018) grad_norm 1.5696 (inf) loss_scale 2048.0000 (3315.8095) mem 16699MB [2024-08-07 09:18:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][30/625] eta 0:04:49 lr 0.001119 wd 0.0500 time 0.4699 (0.4861) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 2.1949 (3.3906) grad_norm 1.9230 (inf) loss_scale 2048.0000 (2906.8387) mem 16699MB [2024-08-07 09:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][40/625] eta 0:04:42 lr 0.001119 wd 0.0500 time 0.4704 (0.4832) data time 0.0012 (0.0107) model time 0.0000 (0.0000) loss 3.7639 (3.3570) grad_norm 2.2882 (inf) loss_scale 2048.0000 (2697.3659) mem 16699MB [2024-08-07 09:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][50/625] eta 0:04:37 lr 0.001119 wd 0.0500 time 0.4790 (0.4819) data time 0.0009 (0.0088) model time 0.0000 (0.0000) loss 2.6360 (3.3688) grad_norm 0.9446 (inf) loss_scale 2048.0000 (2570.0392) mem 16699MB [2024-08-07 09:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][60/625] eta 0:04:32 lr 0.001119 wd 0.0500 time 0.4728 (0.4822) data time 0.0010 (0.0076) model time 0.4718 (0.4828) loss 3.5763 (3.3564) grad_norm 
1.3749 (inf) loss_scale 2048.0000 (2484.4590) mem 16699MB [2024-08-07 09:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][70/625] eta 0:04:27 lr 0.001119 wd 0.0500 time 0.4837 (0.4818) data time 0.0011 (0.0067) model time 0.4826 (0.4803) loss 2.7141 (3.3398) grad_norm 1.9723 (inf) loss_scale 2048.0000 (2422.9859) mem 16699MB [2024-08-07 09:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][80/625] eta 0:04:22 lr 0.001119 wd 0.0500 time 0.4718 (0.4810) data time 0.0011 (0.0060) model time 0.4706 (0.4784) loss 2.8087 (3.3347) grad_norm 1.9679 (inf) loss_scale 2048.0000 (2376.6914) mem 16699MB [2024-08-07 09:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][90/625] eta 0:04:17 lr 0.001119 wd 0.0500 time 0.4766 (0.4820) data time 0.0011 (0.0055) model time 0.4756 (0.4810) loss 2.5110 (3.3428) grad_norm 1.5896 (inf) loss_scale 2048.0000 (2340.5714) mem 16699MB [2024-08-07 09:18:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][100/625] eta 0:04:12 lr 0.001119 wd 0.0500 time 0.4750 (0.4814) data time 0.0008 (0.0050) model time 0.4742 (0.4798) loss 4.0905 (3.3557) grad_norm 1.2407 (inf) loss_scale 2048.0000 (2311.6040) mem 16699MB [2024-08-07 09:18:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][110/625] eta 0:04:07 lr 0.001119 wd 0.0500 time 0.4758 (0.4812) data time 0.0011 (0.0047) model time 0.4748 (0.4793) loss 3.6347 (3.3409) grad_norm 0.9951 (inf) loss_scale 2048.0000 (2287.8559) mem 16699MB [2024-08-07 09:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][120/625] eta 0:04:02 lr 0.001119 wd 0.0500 time 0.4814 (0.4810) data time 0.0010 (0.0044) model time 0.4804 (0.4791) loss 3.3864 (3.3148) grad_norm 1.3990 (inf) loss_scale 2048.0000 (2268.0331) mem 16699MB [2024-08-07 09:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][130/625] eta 0:03:58 lr 0.001119 wd 0.0500 time 0.4716 (0.4824) data time 0.0008 (0.0042) 
model time 0.4708 (0.4815) loss 2.9965 (3.2744) grad_norm 1.2763 (inf) loss_scale 2048.0000 (2251.2366) mem 16699MB [2024-08-07 09:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][140/625] eta 0:03:53 lr 0.001119 wd 0.0500 time 0.4778 (0.4819) data time 0.0008 (0.0040) model time 0.4770 (0.4807) loss 3.3604 (3.2693) grad_norm 1.2535 (inf) loss_scale 2048.0000 (2236.8227) mem 16699MB [2024-08-07 09:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][150/625] eta 0:03:48 lr 0.001118 wd 0.0500 time 0.4788 (0.4815) data time 0.0010 (0.0038) model time 0.4778 (0.4801) loss 3.5356 (3.2888) grad_norm 1.5797 (inf) loss_scale 2048.0000 (2224.3179) mem 16699MB [2024-08-07 09:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][160/625] eta 0:03:43 lr 0.001118 wd 0.0500 time 0.4692 (0.4810) data time 0.0012 (0.0036) model time 0.4680 (0.4794) loss 3.5805 (3.2903) grad_norm 1.3346 (inf) loss_scale 2048.0000 (2213.3665) mem 16699MB [2024-08-07 09:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][170/625] eta 0:03:38 lr 0.001118 wd 0.0500 time 0.4815 (0.4808) data time 0.0008 (0.0035) model time 0.4808 (0.4790) loss 2.6563 (3.2951) grad_norm 1.4791 (inf) loss_scale 2048.0000 (2203.6959) mem 16699MB [2024-08-07 09:19:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][180/625] eta 0:03:33 lr 0.001118 wd 0.0500 time 0.4745 (0.4805) data time 0.0010 (0.0034) model time 0.4735 (0.4788) loss 3.6376 (3.2920) grad_norm 1.3326 (inf) loss_scale 2048.0000 (2195.0939) mem 16699MB [2024-08-07 09:19:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][190/625] eta 0:03:28 lr 0.001118 wd 0.0500 time 0.4810 (0.4803) data time 0.0008 (0.0033) model time 0.4803 (0.4785) loss 3.5746 (3.2803) grad_norm 1.5569 (inf) loss_scale 2048.0000 (2187.3927) mem 16699MB [2024-08-07 09:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][200/625] eta 0:03:24 lr 
0.001118 wd 0.0500 time 0.4756 (0.4801) data time 0.0008 (0.0032) model time 0.4748 (0.4782) loss 2.5720 (3.2754) grad_norm 1.5935 (inf) loss_scale 2048.0000 (2180.4577) mem 16699MB [2024-08-07 09:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][210/625] eta 0:03:19 lr 0.001118 wd 0.0500 time 0.4768 (0.4799) data time 0.0007 (0.0031) model time 0.4760 (0.4780) loss 2.1767 (3.2544) grad_norm 1.0544 (inf) loss_scale 2048.0000 (2174.1801) mem 16699MB [2024-08-07 09:19:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][220/625] eta 0:03:14 lr 0.001118 wd 0.0500 time 0.4731 (0.4797) data time 0.0011 (0.0030) model time 0.4720 (0.4779) loss 3.6083 (3.2592) grad_norm 1.9441 (inf) loss_scale 2048.0000 (2168.4706) mem 16699MB [2024-08-07 09:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][230/625] eta 0:03:09 lr 0.001118 wd 0.0500 time 0.4848 (0.4797) data time 0.0008 (0.0029) model time 0.4840 (0.4778) loss 3.7728 (3.2597) grad_norm 1.0176 (inf) loss_scale 2048.0000 (2163.2554) mem 16699MB [2024-08-07 09:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][240/625] eta 0:03:04 lr 0.001118 wd 0.0500 time 0.4780 (0.4796) data time 0.0010 (0.0028) model time 0.4771 (0.4777) loss 3.4415 (3.2670) grad_norm 1.4992 (inf) loss_scale 2048.0000 (2158.4730) mem 16699MB [2024-08-07 09:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][250/625] eta 0:02:59 lr 0.001118 wd 0.0500 time 0.4769 (0.4795) data time 0.0008 (0.0027) model time 0.4761 (0.4777) loss 2.2717 (3.2769) grad_norm 1.7807 (inf) loss_scale 2048.0000 (2154.0717) mem 16699MB [2024-08-07 09:19:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][260/625] eta 0:02:55 lr 0.001118 wd 0.0500 time 0.4811 (0.4795) data time 0.0008 (0.0027) model time 0.4803 (0.4778) loss 3.7542 (3.2807) grad_norm 2.0355 (inf) loss_scale 2048.0000 (2150.0077) mem 16699MB [2024-08-07 09:19:58 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [67/300][270/625] eta 0:02:50 lr 0.001118 wd 0.0500 time 0.4754 (0.4795) data time 0.0007 (0.0026) model time 0.4747 (0.4778) loss 2.8487 (3.2757) grad_norm 1.4872 (inf) loss_scale 2048.0000 (2146.2435) mem 16699MB [2024-08-07 09:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][280/625] eta 0:02:45 lr 0.001118 wd 0.0500 time 0.4726 (0.4795) data time 0.0009 (0.0026) model time 0.4717 (0.4778) loss 3.2612 (3.2829) grad_norm 1.3180 (inf) loss_scale 2048.0000 (2142.7473) mem 16699MB [2024-08-07 09:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][290/625] eta 0:02:40 lr 0.001118 wd 0.0500 time 0.4811 (0.4794) data time 0.0010 (0.0025) model time 0.4800 (0.4777) loss 2.0587 (3.2628) grad_norm 1.9232 (inf) loss_scale 2048.0000 (2139.4914) mem 16699MB [2024-08-07 09:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][300/625] eta 0:02:35 lr 0.001118 wd 0.0500 time 0.4797 (0.4794) data time 0.0007 (0.0025) model time 0.4790 (0.4777) loss 2.2671 (3.2505) grad_norm 1.3014 (inf) loss_scale 2048.0000 (2136.4518) mem 16699MB [2024-08-07 09:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][310/625] eta 0:02:30 lr 0.001118 wd 0.0500 time 0.4746 (0.4793) data time 0.0008 (0.0024) model time 0.4738 (0.4776) loss 2.6284 (3.2479) grad_norm 1.0974 (inf) loss_scale 2048.0000 (2133.6077) mem 16699MB [2024-08-07 09:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][320/625] eta 0:02:26 lr 0.001118 wd 0.0500 time 0.4768 (0.4792) data time 0.0010 (0.0024) model time 0.4758 (0.4776) loss 3.7865 (3.2523) grad_norm 1.0298 (inf) loss_scale 2048.0000 (2130.9408) mem 16699MB [2024-08-07 09:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][330/625] eta 0:02:21 lr 0.001118 wd 0.0500 time 0.4768 (0.4792) data time 0.0008 (0.0023) model time 0.4760 (0.4776) loss 3.0527 (3.2524) grad_norm 0.9375 (inf) loss_scale 2048.0000 
(2128.4350) mem 16699MB [2024-08-07 09:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][340/625] eta 0:02:16 lr 0.001117 wd 0.0500 time 0.4806 (0.4791) data time 0.0008 (0.0023) model time 0.4798 (0.4775) loss 3.9160 (3.2551) grad_norm 1.7817 (inf) loss_scale 2048.0000 (2126.0762) mem 16699MB [2024-08-07 09:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][350/625] eta 0:02:11 lr 0.001117 wd 0.0500 time 0.4763 (0.4791) data time 0.0007 (0.0023) model time 0.4756 (0.4774) loss 3.5185 (3.2597) grad_norm 1.1397 (inf) loss_scale 2048.0000 (2123.8519) mem 16699MB [2024-08-07 09:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][360/625] eta 0:02:06 lr 0.001117 wd 0.0500 time 0.4772 (0.4791) data time 0.0008 (0.0022) model time 0.4765 (0.4774) loss 2.6512 (3.2573) grad_norm 1.2922 (inf) loss_scale 2048.0000 (2121.7507) mem 16699MB [2024-08-07 09:20:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][370/625] eta 0:02:02 lr 0.001117 wd 0.0500 time 0.4800 (0.4791) data time 0.0010 (0.0022) model time 0.4790 (0.4775) loss 3.6642 (3.2588) grad_norm 1.1219 (inf) loss_scale 2048.0000 (2119.7628) mem 16699MB [2024-08-07 09:20:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][380/625] eta 0:01:57 lr 0.001117 wd 0.0500 time 0.4863 (0.4791) data time 0.0011 (0.0022) model time 0.4852 (0.4775) loss 3.7638 (3.2604) grad_norm 1.2778 (inf) loss_scale 2048.0000 (2117.8793) mem 16699MB [2024-08-07 09:20:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][390/625] eta 0:01:52 lr 0.001117 wd 0.0500 time 0.4737 (0.4790) data time 0.0008 (0.0022) model time 0.4729 (0.4774) loss 3.8295 (3.2592) grad_norm 1.1023 (inf) loss_scale 2048.0000 (2116.0921) mem 16699MB [2024-08-07 09:21:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][400/625] eta 0:01:47 lr 0.001117 wd 0.0500 time 0.4753 (0.4790) data time 0.0008 (0.0021) model time 0.4744 (0.4774) loss 
2.8653 (3.2531) grad_norm 2.7479 (inf) loss_scale 2048.0000 (2114.3940) mem 16699MB [2024-08-07 09:21:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][410/625] eta 0:01:42 lr 0.001117 wd 0.0500 time 0.4827 (0.4789) data time 0.0009 (0.0021) model time 0.4818 (0.4773) loss 3.4024 (3.2617) grad_norm 1.1406 (inf) loss_scale 2048.0000 (2112.7786) mem 16699MB [2024-08-07 09:21:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][420/625] eta 0:01:38 lr 0.001117 wd 0.0500 time 0.4677 (0.4791) data time 0.0011 (0.0021) model time 0.4666 (0.4776) loss 2.6121 (3.2663) grad_norm 1.2043 (inf) loss_scale 2048.0000 (2111.2399) mem 16699MB [2024-08-07 09:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][430/625] eta 0:01:33 lr 0.001117 wd 0.0500 time 0.4646 (0.4790) data time 0.0011 (0.0021) model time 0.4635 (0.4774) loss 3.1739 (3.2646) grad_norm 1.2475 (inf) loss_scale 2048.0000 (2109.7726) mem 16699MB [2024-08-07 09:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][440/625] eta 0:01:28 lr 0.001117 wd 0.0500 time 0.4787 (0.4790) data time 0.0008 (0.0021) model time 0.4779 (0.4774) loss 3.5436 (3.2704) grad_norm 1.3530 (inf) loss_scale 2048.0000 (2108.3719) mem 16699MB [2024-08-07 09:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][450/625] eta 0:01:23 lr 0.001117 wd 0.0500 time 0.4780 (0.4789) data time 0.0010 (0.0020) model time 0.4770 (0.4773) loss 3.3881 (3.2675) grad_norm 1.3884 (inf) loss_scale 2048.0000 (2107.0333) mem 16699MB [2024-08-07 09:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][460/625] eta 0:01:19 lr 0.001117 wd 0.0500 time 0.4757 (0.4793) data time 0.0010 (0.0020) model time 0.4747 (0.4778) loss 3.6061 (3.2669) grad_norm 1.6370 (inf) loss_scale 2048.0000 (2105.7527) mem 16699MB [2024-08-07 09:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][470/625] eta 0:01:14 lr 0.001117 wd 0.0500 time 0.4781 (0.4792) 
data time 0.0011 (0.0020) model time 0.4770 (0.4777) loss 3.4024 (3.2708) grad_norm 2.0870 (inf) loss_scale 2048.0000 (2104.5265) mem 16699MB [2024-08-07 09:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][480/625] eta 0:01:09 lr 0.001117 wd 0.0500 time 0.4754 (0.4791) data time 0.0008 (0.0020) model time 0.4746 (0.4776) loss 2.7109 (3.2709) grad_norm 1.9373 (inf) loss_scale 2048.0000 (2103.3514) mem 16699MB [2024-08-07 09:21:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][490/625] eta 0:01:04 lr 0.001117 wd 0.0500 time 0.4793 (0.4791) data time 0.0010 (0.0020) model time 0.4783 (0.4776) loss 3.2498 (3.2683) grad_norm 1.6934 (inf) loss_scale 2048.0000 (2102.2240) mem 16699MB [2024-08-07 09:21:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][500/625] eta 0:00:59 lr 0.001117 wd 0.0500 time 0.4746 (0.4790) data time 0.0013 (0.0019) model time 0.4732 (0.4775) loss 3.6554 (3.2684) grad_norm 0.9904 (inf) loss_scale 2048.0000 (2101.1417) mem 16699MB [2024-08-07 09:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][510/625] eta 0:00:55 lr 0.001117 wd 0.0500 time 0.4788 (0.4790) data time 0.0011 (0.0019) model time 0.4777 (0.4775) loss 3.4436 (3.2678) grad_norm 1.2239 (inf) loss_scale 2048.0000 (2100.1018) mem 16699MB [2024-08-07 09:21:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][520/625] eta 0:00:50 lr 0.001116 wd 0.0500 time 0.4813 (0.4790) data time 0.0011 (0.0019) model time 0.4802 (0.4776) loss 3.4876 (3.2712) grad_norm 1.2555 (inf) loss_scale 2048.0000 (2099.1017) mem 16699MB [2024-08-07 09:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][530/625] eta 0:00:45 lr 0.001116 wd 0.0500 time 0.4887 (0.4790) data time 0.0008 (0.0019) model time 0.4879 (0.4776) loss 3.6572 (3.2734) grad_norm 2.4481 (inf) loss_scale 2048.0000 (2098.1394) mem 16699MB [2024-08-07 09:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[67/300][540/625] eta 0:00:40 lr 0.001116 wd 0.0500 time 0.4868 (0.4791) data time 0.0011 (0.0019) model time 0.4857 (0.4776) loss 2.6045 (3.2653) grad_norm 1.8813 (inf) loss_scale 2048.0000 (2097.2126) mem 16699MB [2024-08-07 09:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][550/625] eta 0:00:35 lr 0.001116 wd 0.0500 time 0.4854 (0.4791) data time 0.0011 (0.0019) model time 0.4844 (0.4776) loss 3.1778 (3.2690) grad_norm 1.2629 (inf) loss_scale 2048.0000 (2096.3194) mem 16699MB [2024-08-07 09:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][560/625] eta 0:00:31 lr 0.001116 wd 0.0500 time 0.4748 (0.4791) data time 0.0009 (0.0019) model time 0.4739 (0.4776) loss 1.8117 (3.2623) grad_norm 1.3249 (inf) loss_scale 2048.0000 (2095.4581) mem 16699MB [2024-08-07 09:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][570/625] eta 0:00:26 lr 0.001116 wd 0.0500 time 0.4805 (0.4791) data time 0.0011 (0.0018) model time 0.4793 (0.4777) loss 3.7202 (3.2610) grad_norm 1.3930 (inf) loss_scale 2048.0000 (2094.6270) mem 16699MB [2024-08-07 09:22:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][580/625] eta 0:00:21 lr 0.001116 wd 0.0500 time 0.4804 (0.4791) data time 0.0009 (0.0018) model time 0.4796 (0.4777) loss 2.6098 (3.2597) grad_norm 1.3891 (inf) loss_scale 2048.0000 (2093.8244) mem 16699MB [2024-08-07 09:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][590/625] eta 0:00:16 lr 0.001116 wd 0.0500 time 0.4804 (0.4792) data time 0.0011 (0.0018) model time 0.4794 (0.4777) loss 2.0258 (3.2562) grad_norm 1.7320 (inf) loss_scale 2048.0000 (2093.0491) mem 16699MB [2024-08-07 09:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][600/625] eta 0:00:11 lr 0.001116 wd 0.0500 time 0.4769 (0.4792) data time 0.0011 (0.0018) model time 0.4758 (0.4778) loss 3.8406 (3.2562) grad_norm 1.1458 (inf) loss_scale 2048.0000 (2092.2995) mem 16699MB [2024-08-07 09:22:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][610/625] eta 0:00:07 lr 0.001116 wd 0.0500 time 0.4750 (0.4793) data time 0.0006 (0.0018) model time 0.4744 (0.4779) loss 2.2181 (3.2526) grad_norm 1.1290 (inf) loss_scale 2048.0000 (2091.5745) mem 16699MB [2024-08-07 09:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [67/300][620/625] eta 0:00:02 lr 0.001116 wd 0.0500 time 0.4777 (0.4792) data time 0.0006 (0.0018) model time 0.4771 (0.4778) loss 3.8101 (3.2550) grad_norm 1.3086 (inf) loss_scale 2048.0000 (2090.8728) mem 16699MB [2024-08-07 09:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 67 training takes 0:04:59 [2024-08-07 09:22:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:22:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 09:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.6216 (0.6216) Acc@1 86.084 (86.084) Acc@5 98.145 (98.145) Mem 16699MB [2024-08-07 09:22:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.9966 (0.7668) Acc@1 76.807 (83.003) Acc@5 94.678 (96.817) Mem 16699MB [2024-08-07 09:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.1211 (0.9090) Acc@1 73.242 (79.376) Acc@5 92.725 (95.106) Mem 16699MB [2024-08-07 09:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.225 Acc@5 95.078 [2024-08-07 09:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-08-07 09:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.22% [2024-08-07 09:22:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-07 09:22:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! [2024-08-07 09:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.5278 (0.5278) Acc@1 87.939 (87.939) Acc@5 98.340 (98.340) Mem 16699MB [2024-08-07 09:22:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.8843 (0.6681) Acc@1 78.320 (84.641) Acc@5 95.020 (97.226) Mem 16699MB [2024-08-07 09:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 1.0137 (0.8009) Acc@1 73.730 (81.176) Acc@5 94.189 (95.791) Mem 16699MB [2024-08-07 09:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.890 Acc@5 95.791 [2024-08-07 09:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-08-07 09:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.89% [2024-08-07 09:22:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 09:23:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 09:23:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][0/625] eta 0:09:22 lr 0.001116 wd 0.0500 time 0.9004 (0.9004) data time 0.4849 (0.4849) model time 0.0000 (0.0000) loss 3.5167 (3.5167) grad_norm 1.4607 (1.4607) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][10/625] eta 0:05:20 lr 0.001116 wd 0.0500 time 0.4767 (0.5205) data time 0.0012 (0.0451) model time 0.0000 (0.0000) loss 2.5350 (3.1959) grad_norm 0.9638 (1.2862) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][20/625] eta 0:05:02 lr 0.001116 wd 0.0500 time 0.4766 (0.4998) data time 0.0011 (0.0242) model time 0.0000 (0.0000) loss 3.4862 (3.1948) grad_norm 1.2697 (1.3261) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][30/625] eta 0:04:53 lr 0.001116 wd 0.0500 time 0.4778 (0.4932) data time 0.0011 (0.0167) model time 0.0000 (0.0000) loss 2.3660 (3.2743) grad_norm 1.4454 (1.3577) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][40/625] eta 0:04:49 lr 0.001116 wd 0.0500 time 0.7015 (0.4948) data time 0.0011 (0.0130) model time 0.0000 (0.0000) loss 3.8882 (3.2857) grad_norm 1.1467 (1.4114) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][50/625] eta 0:04:43 lr 0.001116 wd 0.0500 time 0.4705 (0.4930) data time 0.0011 (0.0107) model time 0.0000 (0.0000) loss 3.5332 (3.3122) grad_norm 1.4259 (1.4044) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][60/625] eta 0:04:36 lr 0.001116 wd 0.0500 time 0.4748 (0.4902) data time 0.0009 (0.0091) model time 0.4739 (0.4749) loss 3.7016 (3.3316) 
grad_norm 1.4644 (1.4334) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][70/625] eta 0:04:31 lr 0.001116 wd 0.0500 time 0.4726 (0.4883) data time 0.0008 (0.0080) model time 0.4719 (0.4753) loss 3.7283 (3.3170) grad_norm 2.1275 (1.4614) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][80/625] eta 0:04:25 lr 0.001115 wd 0.0500 time 0.4700 (0.4869) data time 0.0011 (0.0071) model time 0.4688 (0.4754) loss 3.3780 (3.2745) grad_norm 1.6720 (1.4546) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][90/625] eta 0:04:19 lr 0.001115 wd 0.0500 time 0.4727 (0.4858) data time 0.0009 (0.0065) model time 0.4718 (0.4754) loss 3.3730 (3.2981) grad_norm 1.2183 (1.4272) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][100/625] eta 0:04:14 lr 0.001115 wd 0.0500 time 0.4772 (0.4855) data time 0.0010 (0.0060) model time 0.4763 (0.4767) loss 3.4774 (3.2836) grad_norm 1.7989 (1.4434) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][110/625] eta 0:04:09 lr 0.001115 wd 0.0500 time 0.4698 (0.4846) data time 0.0008 (0.0055) model time 0.4690 (0.4763) loss 4.0574 (3.3151) grad_norm 1.3946 (1.4741) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][120/625] eta 0:04:04 lr 0.001115 wd 0.0500 time 0.4751 (0.4840) data time 0.0010 (0.0052) model time 0.4740 (0.4762) loss 3.4368 (3.3003) grad_norm 1.1529 (1.4798) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][130/625] eta 0:03:59 lr 0.001115 wd 0.0500 time 0.4764 
(0.4833) data time 0.0011 (0.0049) model time 0.4753 (0.4760) loss 2.5592 (3.2874) grad_norm 1.2014 (1.4607) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][140/625] eta 0:03:54 lr 0.001115 wd 0.0500 time 0.4757 (0.4827) data time 0.0011 (0.0046) model time 0.4746 (0.4757) loss 3.2722 (3.2833) grad_norm 1.5034 (1.4455) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][150/625] eta 0:03:49 lr 0.001115 wd 0.0500 time 0.4759 (0.4822) data time 0.0010 (0.0044) model time 0.4749 (0.4756) loss 3.6893 (3.2936) grad_norm 1.4702 (1.4333) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][160/625] eta 0:03:44 lr 0.001115 wd 0.0500 time 0.4702 (0.4817) data time 0.0008 (0.0042) model time 0.4694 (0.4754) loss 2.3930 (3.2947) grad_norm 1.0956 (1.4244) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][170/625] eta 0:03:39 lr 0.001115 wd 0.0500 time 0.4790 (0.4814) data time 0.0010 (0.0040) model time 0.4780 (0.4754) loss 3.0770 (3.2942) grad_norm 1.3855 (1.4298) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][180/625] eta 0:03:34 lr 0.001115 wd 0.0500 time 0.4794 (0.4811) data time 0.0008 (0.0038) model time 0.4786 (0.4753) loss 3.9251 (3.2963) grad_norm 1.2017 (1.4519) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][190/625] eta 0:03:29 lr 0.001115 wd 0.0500 time 0.4742 (0.4808) data time 0.0008 (0.0037) model time 0.4734 (0.4752) loss 2.6705 (3.3057) grad_norm 1.0568 (1.4460) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:36 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [68/300][200/625] eta 0:03:24 lr 0.001115 wd 0.0500 time 0.4740 (0.4806) data time 0.0010 (0.0036) model time 0.4730 (0.4752) loss 2.7900 (3.3018) grad_norm 1.9986 (1.4655) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][210/625] eta 0:03:19 lr 0.001115 wd 0.0500 time 0.4741 (0.4804) data time 0.0008 (0.0034) model time 0.4733 (0.4752) loss 4.3218 (3.3127) grad_norm 1.0544 (1.4697) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][220/625] eta 0:03:14 lr 0.001115 wd 0.0500 time 0.4740 (0.4802) data time 0.0010 (0.0033) model time 0.4730 (0.4752) loss 3.1497 (3.2955) grad_norm 1.6521 (1.4685) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][230/625] eta 0:03:09 lr 0.001115 wd 0.0500 time 0.6482 (0.4806) data time 0.0010 (0.0032) model time 0.6472 (0.4760) loss 3.0362 (3.2989) grad_norm 2.3178 (1.4766) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:24:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][240/625] eta 0:03:04 lr 0.001115 wd 0.0500 time 0.4753 (0.4804) data time 0.0010 (0.0032) model time 0.4743 (0.4759) loss 2.6245 (3.2855) grad_norm 1.3686 (1.4654) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][250/625] eta 0:03:00 lr 0.001115 wd 0.0500 time 0.4685 (0.4802) data time 0.0008 (0.0031) model time 0.4678 (0.4758) loss 2.9863 (3.2790) grad_norm 1.3446 (1.4714) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][260/625] eta 0:02:55 lr 0.001114 wd 0.0500 time 0.4786 (0.4800) data time 0.0010 (0.0030) model time 0.4776 (0.4757) loss 3.0653 (3.2760) grad_norm 2.1003 (1.4802) loss_scale 2048.0000 
(2048.0000) mem 16699MB [2024-08-07 09:25:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][270/625] eta 0:02:50 lr 0.001114 wd 0.0500 time 0.4796 (0.4799) data time 0.0008 (0.0029) model time 0.4789 (0.4758) loss 2.8222 (3.2705) grad_norm 1.2408 (1.4746) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][280/625] eta 0:02:45 lr 0.001114 wd 0.0500 time 0.4843 (0.4799) data time 0.0007 (0.0028) model time 0.4835 (0.4759) loss 3.4358 (3.2694) grad_norm 2.0256 (1.4707) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][290/625] eta 0:02:40 lr 0.001114 wd 0.0500 time 0.4764 (0.4798) data time 0.0008 (0.0028) model time 0.4756 (0.4759) loss 3.8272 (3.2722) grad_norm 1.4013 (1.4706) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][300/625] eta 0:02:35 lr 0.001114 wd 0.0500 time 0.4685 (0.4797) data time 0.0008 (0.0027) model time 0.4676 (0.4758) loss 2.6482 (3.2743) grad_norm 1.7611 (1.4798) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][310/625] eta 0:02:31 lr 0.001114 wd 0.0500 time 0.4825 (0.4796) data time 0.0008 (0.0027) model time 0.4817 (0.4758) loss 1.7628 (3.2702) grad_norm 1.6727 (1.4788) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][320/625] eta 0:02:26 lr 0.001114 wd 0.0500 time 0.4777 (0.4800) data time 0.0008 (0.0026) model time 0.4769 (0.4764) loss 3.7134 (3.2666) grad_norm 1.1286 (1.4763) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][330/625] eta 0:02:21 lr 0.001114 wd 0.0500 time 0.4750 (0.4799) data time 0.0010 (0.0026) model time 
0.4740 (0.4764) loss 3.1498 (3.2605) grad_norm 1.5140 (1.4701) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][340/625] eta 0:02:16 lr 0.001114 wd 0.0500 time 0.4750 (0.4798) data time 0.0011 (0.0026) model time 0.4739 (0.4764) loss 3.7511 (3.2669) grad_norm 1.8111 (1.4722) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][350/625] eta 0:02:11 lr 0.001114 wd 0.0500 time 0.4835 (0.4797) data time 0.0010 (0.0025) model time 0.4825 (0.4763) loss 2.2380 (3.2712) grad_norm 0.8692 (1.4677) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][360/625] eta 0:02:07 lr 0.001114 wd 0.0500 time 0.4761 (0.4796) data time 0.0011 (0.0025) model time 0.4751 (0.4763) loss 4.0364 (3.2727) grad_norm 1.3826 (1.4723) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:25:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][370/625] eta 0:02:02 lr 0.001114 wd 0.0500 time 0.4735 (0.4795) data time 0.0011 (0.0024) model time 0.4724 (0.4762) loss 3.7178 (3.2721) grad_norm 1.1232 (1.4679) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][380/625] eta 0:01:57 lr 0.001114 wd 0.0500 time 0.4815 (0.4794) data time 0.0011 (0.0024) model time 0.4804 (0.4762) loss 3.1294 (3.2696) grad_norm 1.5195 (1.4611) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][390/625] eta 0:01:52 lr 0.001114 wd 0.0500 time 0.4776 (0.4793) data time 0.0012 (0.0024) model time 0.4764 (0.4761) loss 3.7994 (3.2708) grad_norm 1.8272 (1.4563) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][400/625] eta 0:01:47 
lr 0.001114 wd 0.0500 time 0.4760 (0.4792) data time 0.0012 (0.0024) model time 0.4749 (0.4761) loss 3.2265 (3.2746) grad_norm 1.0263 (1.4593) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][410/625] eta 0:01:43 lr 0.001114 wd 0.0500 time 0.4877 (0.4791) data time 0.0012 (0.0023) model time 0.4866 (0.4760) loss 3.3022 (3.2685) grad_norm 1.4711 (1.4579) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][420/625] eta 0:01:38 lr 0.001114 wd 0.0500 time 0.4748 (0.4791) data time 0.0011 (0.0023) model time 0.4738 (0.4760) loss 3.8804 (3.2693) grad_norm 1.4201 (1.4606) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][430/625] eta 0:01:33 lr 0.001114 wd 0.0500 time 0.4741 (0.4790) data time 0.0010 (0.0023) model time 0.4732 (0.4760) loss 4.1308 (3.2730) grad_norm 1.7585 (1.4623) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][440/625] eta 0:01:28 lr 0.001113 wd 0.0500 time 0.4761 (0.4790) data time 0.0011 (0.0023) model time 0.4750 (0.4760) loss 3.1944 (3.2640) grad_norm 2.0055 (1.4702) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][450/625] eta 0:01:23 lr 0.001113 wd 0.0500 time 0.6925 (0.4794) data time 0.0011 (0.0022) model time 0.6914 (0.4765) loss 3.4083 (3.2662) grad_norm 1.5502 (1.4708) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][460/625] eta 0:01:19 lr 0.001113 wd 0.0500 time 0.4087 (0.4796) data time 0.0012 (0.0022) model time 0.4075 (0.4768) loss 3.0488 (3.2687) grad_norm 1.0973 (1.4696) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:46 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][470/625] eta 0:01:14 lr 0.001113 wd 0.0500 time 0.4828 (0.4796) data time 0.0009 (0.0022) model time 0.4820 (0.4769) loss 3.6553 (3.2731) grad_norm 1.2576 (1.4662) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][480/625] eta 0:01:09 lr 0.001113 wd 0.0500 time 0.4823 (0.4796) data time 0.0009 (0.0022) model time 0.4814 (0.4769) loss 2.8945 (3.2695) grad_norm 1.7142 (1.4615) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:26:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][490/625] eta 0:01:04 lr 0.001113 wd 0.0500 time 0.4826 (0.4796) data time 0.0011 (0.0021) model time 0.4815 (0.4769) loss 3.1238 (3.2671) grad_norm 1.2876 (1.4604) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][500/625] eta 0:00:59 lr 0.001113 wd 0.0500 time 0.4730 (0.4796) data time 0.0011 (0.0021) model time 0.4719 (0.4769) loss 4.1151 (3.2684) grad_norm 1.7353 (1.4579) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][510/625] eta 0:00:55 lr 0.001113 wd 0.0500 time 0.4757 (0.4795) data time 0.0011 (0.0021) model time 0.4746 (0.4769) loss 2.8162 (3.2723) grad_norm 1.0995 (1.4587) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][520/625] eta 0:00:50 lr 0.001113 wd 0.0500 time 0.4735 (0.4795) data time 0.0008 (0.0021) model time 0.4727 (0.4768) loss 3.1359 (3.2694) grad_norm 1.5159 (1.4567) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][530/625] eta 0:00:45 lr 0.001113 wd 0.0500 time 0.4865 (0.4794) data time 0.0011 (0.0021) model time 0.4854 (0.4768) loss 2.3827 (3.2679) grad_norm 
1.3422 (1.4550) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][540/625] eta 0:00:40 lr 0.001113 wd 0.0500 time 0.4765 (0.4794) data time 0.0008 (0.0020) model time 0.4757 (0.4768) loss 2.4156 (3.2685) grad_norm 1.9280 (1.4557) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][550/625] eta 0:00:35 lr 0.001113 wd 0.0500 time 0.4774 (0.4794) data time 0.0008 (0.0020) model time 0.4766 (0.4768) loss 2.6652 (3.2724) grad_norm 1.4685 (1.4553) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][560/625] eta 0:00:31 lr 0.001113 wd 0.0500 time 0.4750 (0.4793) data time 0.0009 (0.0020) model time 0.4742 (0.4768) loss 4.2786 (3.2757) grad_norm 1.6785 (1.4554) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][570/625] eta 0:00:26 lr 0.001113 wd 0.0500 time 0.4755 (0.4793) data time 0.0011 (0.0020) model time 0.4744 (0.4768) loss 3.5800 (3.2780) grad_norm 1.4462 (1.4575) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][580/625] eta 0:00:21 lr 0.001113 wd 0.0500 time 0.4772 (0.4793) data time 0.0011 (0.0020) model time 0.4761 (0.4768) loss 3.2692 (3.2830) grad_norm 1.8376 (1.4590) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][590/625] eta 0:00:16 lr 0.001113 wd 0.0500 time 0.4769 (0.4793) data time 0.0008 (0.0020) model time 0.4761 (0.4768) loss 3.0444 (3.2855) grad_norm 1.6561 (1.4600) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][600/625] eta 0:00:11 lr 0.001113 wd 0.0500 time 0.4762 (0.4796) data 
time 0.0012 (0.0020) model time 0.4750 (0.4772) loss 3.4570 (3.2852) grad_norm 1.2754 (1.4631) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][610/625] eta 0:00:07 lr 0.001113 wd 0.0500 time 0.4752 (0.4795) data time 0.0006 (0.0020) model time 0.4746 (0.4771) loss 3.5014 (3.2829) grad_norm 1.3627 (1.4617) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [68/300][620/625] eta 0:00:02 lr 0.001112 wd 0.0500 time 0.4702 (0.4794) data time 0.0005 (0.0019) model time 0.4697 (0.4771) loss 3.1565 (3.2845) grad_norm 1.5409 (1.4643) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:27:59 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 68 training takes 0:04:59 [2024-08-07 09:27:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:28:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 09:28:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 0.5942 (0.5942) Acc@1 86.133 (86.133) Acc@5 97.900 (97.900) Mem 16699MB [2024-08-07 09:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.9854 (0.7322) Acc@1 75.879 (82.986) Acc@5 94.238 (96.800) Mem 16699MB [2024-08-07 09:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0879 (0.8784) Acc@1 72.754 (79.467) Acc@5 92.432 (95.013) Mem 16699MB [2024-08-07 09:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.261 Acc@5 94.992 [2024-08-07 09:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-08-07 09:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.26% [2024-08-07 09:28:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-07 09:28:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-07 09:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.5269 (0.5269) Acc@1 87.939 (87.939) Acc@5 98.340 (98.340) Mem 16699MB [2024-08-07 09:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8813 (0.6675) Acc@1 78.467 (84.703) Acc@5 95.166 (97.235) Mem 16699MB [2024-08-07 09:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0088 (0.7990) Acc@1 73.779 (81.234) Acc@5 94.141 (95.824) Mem 16699MB [2024-08-07 09:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.954 Acc@5 95.823 [2024-08-07 09:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-08-07 09:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 80.95% [2024-08-07 09:28:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 09:28:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 09:28:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][0/625] eta 0:08:46 lr 0.001112 wd 0.0500 time 0.8423 (0.8423) data time 0.4282 (0.4282) model time 0.0000 (0.0000) loss 2.4331 (2.4331) grad_norm 1.3868 (1.3868) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][10/625] eta 0:05:14 lr 0.001112 wd 0.0500 time 0.4772 (0.5110) data time 0.0011 (0.0399) model time 0.0000 (0.0000) loss 3.2493 (3.1938) grad_norm 1.3990 (1.2532) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][20/625] eta 0:05:04 lr 0.001112 wd 0.0500 time 0.4761 (0.5025) data time 0.0010 (0.0215) model time 0.0000 (0.0000) loss 3.2451 (3.2490) grad_norm 1.2332 (1.2656) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][30/625] eta 0:04:54 lr 0.001112 wd 0.0500 time 0.4804 (0.4944) data time 0.0010 (0.0149) model time 0.0000 (0.0000) loss 3.7526 (3.2820) grad_norm 2.7586 (1.4828) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][40/625] eta 0:04:46 lr 0.001112 wd 0.0500 time 0.4709 (0.4905) data time 0.0010 (0.0115) model time 0.0000 (0.0000) loss 3.8550 (3.3761) grad_norm 1.5429 (1.5037) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][50/625] eta 0:04:40 lr 0.001112 wd 0.0500 time 0.4731 (0.4875) data time 0.0012 (0.0095) model time 0.0000 (0.0000) loss 4.0451 (3.3284) grad_norm 2.6284 (1.5252) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][60/625] eta 0:04:34 lr 0.001112 wd 0.0500 time 0.4762 (0.4855) data time 0.0008 (0.0081) model time 0.4754 (0.4742) loss 3.2986 (3.3381) 
grad_norm 1.1918 (1.4852) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][70/625] eta 0:04:28 lr 0.001112 wd 0.0500 time 0.4725 (0.4838) data time 0.0011 (0.0071) model time 0.4714 (0.4732) loss 3.3206 (3.3418) grad_norm 1.2949 (1.4861) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][80/625] eta 0:04:22 lr 0.001112 wd 0.0500 time 0.4709 (0.4825) data time 0.0011 (0.0064) model time 0.4699 (0.4730) loss 3.3368 (3.3477) grad_norm 1.2502 (1.4664) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:28:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][90/625] eta 0:04:17 lr 0.001112 wd 0.0500 time 0.4724 (0.4816) data time 0.0010 (0.0058) model time 0.4714 (0.4729) loss 3.3634 (3.3570) grad_norm 1.9852 (1.4937) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][100/625] eta 0:04:12 lr 0.001112 wd 0.0500 time 0.4721 (0.4808) data time 0.0007 (0.0053) model time 0.4713 (0.4729) loss 3.3527 (3.3853) grad_norm 1.3839 (1.5379) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][110/625] eta 0:04:07 lr 0.001112 wd 0.0500 time 0.4748 (0.4801) data time 0.0008 (0.0050) model time 0.4741 (0.4728) loss 3.6285 (3.3763) grad_norm 1.1911 (1.5125) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][120/625] eta 0:04:02 lr 0.001112 wd 0.0500 time 0.4698 (0.4796) data time 0.0011 (0.0046) model time 0.4688 (0.4728) loss 3.4463 (3.3720) grad_norm 1.5518 (1.5020) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][130/625] eta 0:03:57 lr 0.001112 wd 0.0500 time 0.4675 
(0.4793) data time 0.0010 (0.0044) model time 0.4665 (0.4730) loss 2.1814 (3.3582) grad_norm 1.3386 (1.5173) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][140/625] eta 0:03:53 lr 0.001112 wd 0.0500 time 0.4769 (0.4806) data time 0.0011 (0.0041) model time 0.4759 (0.4755) loss 3.1032 (3.3443) grad_norm 1.6984 (1.5226) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][150/625] eta 0:03:48 lr 0.001112 wd 0.0500 time 0.4763 (0.4801) data time 0.0008 (0.0039) model time 0.4755 (0.4753) loss 3.1191 (3.3430) grad_norm 1.9168 (1.5164) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][160/625] eta 0:03:43 lr 0.001112 wd 0.0500 time 0.4731 (0.4799) data time 0.0010 (0.0038) model time 0.4720 (0.4752) loss 3.5397 (3.3471) grad_norm 1.2471 (1.5067) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][170/625] eta 0:03:38 lr 0.001112 wd 0.0500 time 0.4788 (0.4797) data time 0.0008 (0.0036) model time 0.4781 (0.4753) loss 3.4547 (3.3481) grad_norm 1.4594 (1.5007) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][180/625] eta 0:03:33 lr 0.001111 wd 0.0500 time 0.4735 (0.4794) data time 0.0009 (0.0035) model time 0.4726 (0.4752) loss 3.8963 (3.3380) grad_norm 1.3480 (1.5050) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][190/625] eta 0:03:28 lr 0.001111 wd 0.0500 time 0.4702 (0.4792) data time 0.0008 (0.0034) model time 0.4694 (0.4751) loss 3.9452 (3.3241) grad_norm 1.1814 (1.5111) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [69/300][200/625] eta 0:03:23 lr 0.001111 wd 0.0500 time 0.4770 (0.4794) data time 0.0011 (0.0032) model time 0.4760 (0.4755) loss 3.1776 (3.3228) grad_norm 1.0132 (1.5045) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][210/625] eta 0:03:18 lr 0.001111 wd 0.0500 time 0.4791 (0.4793) data time 0.0010 (0.0031) model time 0.4781 (0.4756) loss 3.1092 (3.3260) grad_norm 1.4976 (1.5009) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][220/625] eta 0:03:14 lr 0.001111 wd 0.0500 time 0.4792 (0.4792) data time 0.0008 (0.0030) model time 0.4784 (0.4756) loss 3.8859 (3.3220) grad_norm 1.7410 (1.4982) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][230/625] eta 0:03:09 lr 0.001111 wd 0.0500 time 0.4710 (0.4791) data time 0.0010 (0.0030) model time 0.4700 (0.4756) loss 2.7708 (3.3180) grad_norm 1.2627 (1.5065) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][240/625] eta 0:03:04 lr 0.001111 wd 0.0500 time 0.4704 (0.4796) data time 0.0011 (0.0029) model time 0.4693 (0.4763) loss 2.9841 (3.3065) grad_norm 1.3773 (1.4964) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][250/625] eta 0:02:59 lr 0.001111 wd 0.0500 time 0.4751 (0.4794) data time 0.0011 (0.0028) model time 0.4740 (0.4762) loss 2.7227 (3.2942) grad_norm 1.3807 (1.4902) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][260/625] eta 0:02:54 lr 0.001111 wd 0.0500 time 0.4748 (0.4793) data time 0.0008 (0.0028) model time 0.4740 (0.4762) loss 2.5016 (3.2824) grad_norm 1.1133 (1.4842) loss_scale 2048.0000 
(2048.0000) mem 16699MB [2024-08-07 09:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][270/625] eta 0:02:50 lr 0.001111 wd 0.0500 time 0.4728 (0.4791) data time 0.0009 (0.0027) model time 0.4719 (0.4761) loss 2.3527 (3.2814) grad_norm 1.2620 (1.4773) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][280/625] eta 0:02:45 lr 0.001111 wd 0.0500 time 0.4768 (0.4791) data time 0.0011 (0.0026) model time 0.4757 (0.4761) loss 2.1169 (3.2673) grad_norm 1.7093 (1.4734) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][290/625] eta 0:02:40 lr 0.001111 wd 0.0500 time 0.4806 (0.4790) data time 0.0008 (0.0026) model time 0.4798 (0.4760) loss 3.5350 (3.2724) grad_norm 1.9373 (1.4755) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][300/625] eta 0:02:35 lr 0.001111 wd 0.0500 time 0.4780 (0.4789) data time 0.0011 (0.0025) model time 0.4769 (0.4760) loss 3.1819 (3.2680) grad_norm 1.0710 (1.4666) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][310/625] eta 0:02:30 lr 0.001111 wd 0.0500 time 0.4758 (0.4788) data time 0.0009 (0.0025) model time 0.4749 (0.4760) loss 2.8132 (3.2701) grad_norm 1.0550 (1.4608) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][320/625] eta 0:02:26 lr 0.001111 wd 0.0500 time 0.4773 (0.4787) data time 0.0010 (0.0025) model time 0.4763 (0.4760) loss 3.5159 (3.2664) grad_norm 1.3565 (1.4635) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][330/625] eta 0:02:21 lr 0.001111 wd 0.0500 time 0.4760 (0.4787) data time 0.0010 (0.0024) model time 
0.4750 (0.4760) loss 3.3126 (3.2749) grad_norm 1.6200 (1.4636) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][340/625] eta 0:02:16 lr 0.001111 wd 0.0500 time 0.4836 (0.4787) data time 0.0010 (0.0024) model time 0.4826 (0.4761) loss 3.2624 (3.2713) grad_norm 2.2037 (1.4684) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][350/625] eta 0:02:11 lr 0.001111 wd 0.0500 time 0.4745 (0.4786) data time 0.0008 (0.0023) model time 0.4736 (0.4760) loss 3.8622 (3.2697) grad_norm 0.9728 (1.4675) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][360/625] eta 0:02:06 lr 0.001110 wd 0.0500 time 0.4754 (0.4786) data time 0.0011 (0.0023) model time 0.4744 (0.4760) loss 3.8302 (3.2690) grad_norm 1.3191 (1.4630) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][370/625] eta 0:02:02 lr 0.001110 wd 0.0500 time 0.4756 (0.4785) data time 0.0011 (0.0023) model time 0.4746 (0.4760) loss 3.0563 (3.2747) grad_norm 1.4679 (1.4612) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][380/625] eta 0:01:57 lr 0.001110 wd 0.0500 time 0.4757 (0.4784) data time 0.0011 (0.0022) model time 0.4746 (0.4759) loss 3.6627 (3.2700) grad_norm 1.3305 (1.4630) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][390/625] eta 0:01:52 lr 0.001110 wd 0.0500 time 0.4778 (0.4788) data time 0.0011 (0.0022) model time 0.4767 (0.4764) loss 2.8742 (3.2719) grad_norm 1.0970 (1.4580) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][400/625] eta 0:01:47 
lr 0.001110 wd 0.0500 time 0.4791 (0.4788) data time 0.0008 (0.0022) model time 0.4783 (0.4764) loss 2.5233 (3.2652) grad_norm 2.0878 (1.4563) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][410/625] eta 0:01:42 lr 0.001110 wd 0.0500 time 0.4795 (0.4788) data time 0.0009 (0.0022) model time 0.4786 (0.4764) loss 3.4948 (3.2695) grad_norm 1.1658 (1.4526) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][420/625] eta 0:01:38 lr 0.001110 wd 0.0500 time 0.4733 (0.4787) data time 0.0010 (0.0021) model time 0.4724 (0.4764) loss 3.8336 (3.2688) grad_norm 1.1099 (1.4513) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][430/625] eta 0:01:33 lr 0.001110 wd 0.0500 time 0.4844 (0.4787) data time 0.0011 (0.0021) model time 0.4833 (0.4764) loss 3.5076 (3.2741) grad_norm 1.5059 (1.4518) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][440/625] eta 0:01:28 lr 0.001110 wd 0.0500 time 0.4761 (0.4787) data time 0.0011 (0.0021) model time 0.4751 (0.4764) loss 3.2974 (3.2694) grad_norm 1.2913 (1.4549) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][450/625] eta 0:01:23 lr 0.001110 wd 0.0500 time 0.4773 (0.4787) data time 0.0009 (0.0021) model time 0.4764 (0.4764) loss 3.8264 (3.2724) grad_norm 1.5667 (1.4574) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][460/625] eta 0:01:18 lr 0.001110 wd 0.0500 time 0.4824 (0.4786) data time 0.0010 (0.0021) model time 0.4814 (0.4764) loss 3.1260 (3.2657) grad_norm 1.3756 (1.4532) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:31:57 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][470/625] eta 0:01:14 lr 0.001110 wd 0.0500 time 0.4699 (0.4791) data time 0.0011 (0.0020) model time 0.4688 (0.4769) loss 3.0531 (3.2621) grad_norm 1.2253 (1.4480) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][480/625] eta 0:01:09 lr 0.001110 wd 0.0500 time 0.4701 (0.4790) data time 0.0011 (0.0020) model time 0.4690 (0.4769) loss 3.5948 (3.2604) grad_norm 0.9519 (1.4493) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][490/625] eta 0:01:04 lr 0.001110 wd 0.0500 time 0.4723 (0.4789) data time 0.0012 (0.0020) model time 0.4711 (0.4768) loss 3.2730 (3.2626) grad_norm 1.4584 (1.4480) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][500/625] eta 0:00:59 lr 0.001110 wd 0.0500 time 0.4846 (0.4789) data time 0.0010 (0.0020) model time 0.4836 (0.4768) loss 3.8786 (3.2605) grad_norm 1.2811 (1.4549) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][510/625] eta 0:00:55 lr 0.001110 wd 0.0500 time 0.4760 (0.4789) data time 0.0010 (0.0020) model time 0.4749 (0.4768) loss 3.6625 (3.2618) grad_norm 1.2381 (1.4520) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][520/625] eta 0:00:50 lr 0.001110 wd 0.0500 time 0.4758 (0.4788) data time 0.0008 (0.0020) model time 0.4749 (0.4767) loss 2.9362 (3.2608) grad_norm 1.0603 (1.4510) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][530/625] eta 0:00:45 lr 0.001109 wd 0.0500 time 0.4790 (0.4788) data time 0.0008 (0.0019) model time 0.4782 (0.4767) loss 2.2298 (3.2590) grad_norm 
1.6434 (1.4618) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][540/625] eta 0:00:40 lr 0.001109 wd 0.0500 time 0.4764 (0.4788) data time 0.0010 (0.0019) model time 0.4754 (0.4768) loss 3.7299 (3.2580) grad_norm 1.1669 (1.4623) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][550/625] eta 0:00:35 lr 0.001109 wd 0.0500 time 0.4828 (0.4788) data time 0.0011 (0.0019) model time 0.4817 (0.4768) loss 2.5365 (3.2573) grad_norm 1.4629 (1.4600) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][560/625] eta 0:00:31 lr 0.001109 wd 0.0500 time 0.4798 (0.4788) data time 0.0009 (0.0019) model time 0.4789 (0.4768) loss 3.5050 (3.2583) grad_norm 1.8331 (1.4646) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][570/625] eta 0:00:26 lr 0.001109 wd 0.0500 time 0.4833 (0.4788) data time 0.0011 (0.0019) model time 0.4822 (0.4768) loss 3.4940 (3.2614) grad_norm 1.2371 (1.4696) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][580/625] eta 0:00:21 lr 0.001109 wd 0.0500 time 0.4803 (0.4788) data time 0.0011 (0.0019) model time 0.4792 (0.4768) loss 3.4819 (3.2573) grad_norm 1.3410 (1.4678) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][590/625] eta 0:00:16 lr 0.001109 wd 0.0500 time 0.4727 (0.4787) data time 0.0010 (0.0019) model time 0.4717 (0.4768) loss 3.2231 (3.2584) grad_norm 1.4721 (1.4662) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:32:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][600/625] eta 0:00:11 lr 0.001109 wd 0.0500 time 0.4766 (0.4788) data 
time 0.0011 (0.0018) model time 0.4755 (0.4768) loss 2.2781 (3.2599) grad_norm 1.5415 (1.4676) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][610/625] eta 0:00:07 lr 0.001109 wd 0.0500 time 0.4713 (0.4787) data time 0.0008 (0.0018) model time 0.4705 (0.4768) loss 2.7429 (3.2584) grad_norm 1.0429 (1.4668) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [69/300][620/625] eta 0:00:02 lr 0.001109 wd 0.0500 time 0.4661 (0.4786) data time 0.0005 (0.0018) model time 0.4655 (0.4766) loss 2.5690 (3.2597) grad_norm 1.1651 (1.4679) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 69 training takes 0:04:59 [2024-08-07 09:33:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:33:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 09:33:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.529 (0.529) Loss 0.6055 (0.6055) Acc@1 86.230 (86.230) Acc@5 97.803 (97.803) Mem 16699MB
[2024-08-07 09:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9814 (0.7434) Acc@1 76.367 (82.879) Acc@5 93.896 (96.715) Mem 16699MB
[2024-08-07 09:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.1162 (0.8910) Acc@1 72.021 (79.322) Acc@5 93.115 (95.054) Mem 16699MB
[2024-08-07 09:33:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.145 Acc@5 95.064
[2024-08-07 09:33:16 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.1%
[2024-08-07 09:33:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.865 (0.865) Loss 0.5259 (0.5259) Acc@1 88.037 (88.037) Acc@5 98.340 (98.340) Mem 16699MB
[2024-08-07 09:33:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.193) Loss 0.8779 (0.6665) Acc@1 78.564 (84.748) Acc@5 95.215 (97.270) Mem 16699MB
[2024-08-07 09:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 1.0059 (0.7971) Acc@1 74.072 (81.352) Acc@5 94.141 (95.850) Mem 16699MB
[2024-08-07 09:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.076 Acc@5 95.841
[2024-08-07 09:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1%
[2024-08-07 09:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.08%
[2024-08-07 09:33:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-07 09:33:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-07 09:33:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][0/625] eta 0:10:43 lr 0.001109 wd 0.0500 time 1.0295 (1.0295) data time 0.3883 (0.3883) model time 0.0000 (0.0000) loss 3.4992 (3.4992) grad_norm 1.3426 (1.3426) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][10/625] eta 0:05:25 lr 0.001109 wd 0.0500 time 0.4755 (0.5295) data time 0.0008 (0.0363) model time 0.0000 (0.0000) loss 2.9183 (3.6045) grad_norm 1.4959 (1.4644) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][20/625] eta 0:05:05 lr 0.001109 wd 0.0500 time 0.4783 (0.5057) data time 0.0008 (0.0195) model time 0.0000 (0.0000) loss 2.5659 (3.4074) grad_norm 1.2035 (1.7690) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][30/625] eta 0:04:55 lr 0.001109 wd 0.0500 time 0.4859 (0.4971) data time 0.0007 (0.0135) model time 0.0000 (0.0000) loss 3.7887 (3.3473) grad_norm 1.1885 (1.6837) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][40/625] eta 0:04:48 lr 0.001109 wd 0.0500 time 0.4820 (0.4927) data time 0.0011 (0.0105) model time 0.0000 (0.0000) loss 3.1392 (3.2954) grad_norm 1.2033 (1.6047) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][50/625] eta 0:04:41 lr 0.001109 wd 0.0500 time 0.4726 (0.4898) data time 0.0008 (0.0087) model time 0.0000 (0.0000) loss 3.9174 (3.2804) grad_norm 1.3746 (1.5600) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][60/625] eta 0:04:39 lr 0.001109 wd 0.0500 time 0.4799 (0.4949) data time 0.0012 (0.0074) model time 0.4787 (0.5196) loss 3.5347 (3.2836) 
grad_norm 1.1355 (1.5546) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][70/625] eta 0:04:33 lr 0.001109 wd 0.0500 time 0.4781 (0.4927) data time 0.0008 (0.0065) model time 0.4773 (0.4989) loss 3.7372 (3.2928) grad_norm 1.4442 (1.5643) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][80/625] eta 0:04:27 lr 0.001108 wd 0.0500 time 0.4784 (0.4911) data time 0.0010 (0.0059) model time 0.4773 (0.4921) loss 3.3809 (3.3163) grad_norm 1.3819 (1.5798) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][90/625] eta 0:04:22 lr 0.001108 wd 0.0500 time 0.4743 (0.4898) data time 0.0008 (0.0053) model time 0.4735 (0.4886) loss 1.9646 (3.3060) grad_norm 1.1426 (1.5565) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][100/625] eta 0:04:16 lr 0.001108 wd 0.0500 time 0.4829 (0.4889) data time 0.0010 (0.0049) model time 0.4819 (0.4869) loss 2.9586 (3.2895) grad_norm 1.8358 (1.5597) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][110/625] eta 0:04:11 lr 0.001108 wd 0.0500 time 0.4799 (0.4879) data time 0.0011 (0.0046) model time 0.4788 (0.4852) loss 3.4648 (3.2703) grad_norm 4.0518 (1.5722) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][120/625] eta 0:04:06 lr 0.001108 wd 0.0500 time 0.4794 (0.4873) data time 0.0008 (0.0043) model time 0.4786 (0.4844) loss 3.8313 (3.2831) grad_norm 1.8199 (1.5720) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][130/625] eta 0:04:00 lr 0.001108 wd 0.0500 time 0.4763 
(0.4867) data time 0.0010 (0.0041) model time 0.4752 (0.4835) loss 3.5613 (3.2852) grad_norm 1.7633 (1.5466) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-07 09:34:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][140/625] eta 0:03:55 lr 0.001108 wd 0.0500 time 0.4743 (0.4861) data time 0.0009 (0.0038) model time 0.4734 (0.4830) loss 2.6993 (3.2713) grad_norm 1.4537 (1.5304) loss_scale 4096.0000 (2091.5745) mem 16699MB [2024-08-07 09:34:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][150/625] eta 0:03:50 lr 0.001108 wd 0.0500 time 0.4773 (0.4855) data time 0.0009 (0.0037) model time 0.4764 (0.4822) loss 4.0368 (3.2838) grad_norm 2.4129 (1.5407) loss_scale 4096.0000 (2224.3179) mem 16699MB [2024-08-07 09:34:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][160/625] eta 0:03:45 lr 0.001108 wd 0.0500 time 0.4769 (0.4850) data time 0.0008 (0.0035) model time 0.4760 (0.4816) loss 3.0750 (3.2931) grad_norm 1.7150 (1.5485) loss_scale 4096.0000 (2340.5714) mem 16699MB [2024-08-07 09:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][170/625] eta 0:03:40 lr 0.001108 wd 0.0500 time 0.4713 (0.4844) data time 0.0011 (0.0034) model time 0.4702 (0.4809) loss 3.5312 (3.2802) grad_norm 1.1612 (1.5400) loss_scale 4096.0000 (2443.2281) mem 16699MB [2024-08-07 09:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][180/625] eta 0:03:35 lr 0.001108 wd 0.0500 time 0.4748 (0.4838) data time 0.0009 (0.0033) model time 0.4739 (0.4803) loss 3.0495 (3.2819) grad_norm 1.4041 (1.5458) loss_scale 4096.0000 (2534.5414) mem 16699MB [2024-08-07 09:34:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][190/625] eta 0:03:30 lr 0.001108 wd 0.0500 time 0.4772 (0.4834) data time 0.0012 (0.0032) model time 0.4760 (0.4799) loss 3.7349 (3.2875) grad_norm 1.3833 (1.5351) loss_scale 4096.0000 (2616.2932) mem 16699MB [2024-08-07 09:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [70/300][200/625] eta 0:03:25 lr 0.001108 wd 0.0500 time 0.4757 (0.4830) data time 0.0011 (0.0031) model time 0.4746 (0.4795) loss 3.7532 (3.2798) grad_norm 2.1714 (1.5349) loss_scale 4096.0000 (2689.9104) mem 16699MB [2024-08-07 09:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][210/625] eta 0:03:20 lr 0.001108 wd 0.0500 time 0.4823 (0.4827) data time 0.0014 (0.0030) model time 0.4809 (0.4792) loss 3.6466 (3.2739) grad_norm 1.5508 (1.5420) loss_scale 4096.0000 (2756.5498) mem 16699MB [2024-08-07 09:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][220/625] eta 0:03:15 lr 0.001108 wd 0.0500 time 0.4753 (0.4824) data time 0.0008 (0.0029) model time 0.4745 (0.4790) loss 3.4925 (3.2698) grad_norm 1.5198 (1.5469) loss_scale 4096.0000 (2817.1584) mem 16699MB [2024-08-07 09:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][230/625] eta 0:03:10 lr 0.001108 wd 0.0500 time 0.4738 (0.4821) data time 0.0009 (0.0028) model time 0.4729 (0.4787) loss 3.6745 (3.2757) grad_norm 1.1668 (1.5436) loss_scale 4096.0000 (2872.5195) mem 16699MB [2024-08-07 09:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][240/625] eta 0:03:05 lr 0.001108 wd 0.0500 time 0.4738 (0.4818) data time 0.0011 (0.0027) model time 0.4727 (0.4785) loss 3.3340 (3.2635) grad_norm 1.2713 (1.5316) loss_scale 4096.0000 (2923.2863) mem 16699MB [2024-08-07 09:35:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][250/625] eta 0:03:01 lr 0.001108 wd 0.0500 time 0.4799 (0.4828) data time 0.0008 (0.0027) model time 0.4791 (0.4799) loss 2.6098 (3.2654) grad_norm 1.6905 (1.5326) loss_scale 4096.0000 (2970.0080) mem 16699MB [2024-08-07 09:35:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][260/625] eta 0:02:56 lr 0.001107 wd 0.0500 time 0.4755 (0.4825) data time 0.0008 (0.0026) model time 0.4747 (0.4796) loss 3.6493 (3.2695) grad_norm 1.2392 (1.5277) loss_scale 4096.0000 
(3013.1494) mem 16699MB [2024-08-07 09:35:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][270/625] eta 0:02:51 lr 0.001107 wd 0.0500 time 0.4750 (0.4823) data time 0.0009 (0.0026) model time 0.4741 (0.4794) loss 3.9258 (3.2669) grad_norm 1.2827 (1.5163) loss_scale 4096.0000 (3053.1070) mem 16699MB [2024-08-07 09:35:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][280/625] eta 0:02:46 lr 0.001107 wd 0.0500 time 0.4769 (0.4821) data time 0.0008 (0.0025) model time 0.4760 (0.4792) loss 3.7533 (3.2635) grad_norm 1.2993 (1.5227) loss_scale 4096.0000 (3090.2206) mem 16699MB [2024-08-07 09:35:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][290/625] eta 0:02:41 lr 0.001107 wd 0.0500 time 0.4767 (0.4819) data time 0.0009 (0.0025) model time 0.4759 (0.4791) loss 3.3414 (3.2558) grad_norm 1.5825 (1.5154) loss_scale 4096.0000 (3124.7835) mem 16699MB [2024-08-07 09:35:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][300/625] eta 0:02:36 lr 0.001107 wd 0.0500 time 0.4848 (0.4819) data time 0.0008 (0.0024) model time 0.4839 (0.4791) loss 3.9622 (3.2606) grad_norm 1.2886 (1.5101) loss_scale 4096.0000 (3157.0498) mem 16699MB [2024-08-07 09:35:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][310/625] eta 0:02:31 lr 0.001107 wd 0.0500 time 0.4879 (0.4818) data time 0.0009 (0.0024) model time 0.4869 (0.4791) loss 2.6013 (3.2569) grad_norm 1.5194 (1.5041) loss_scale 4096.0000 (3187.2412) mem 16699MB [2024-08-07 09:35:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][320/625] eta 0:02:26 lr 0.001107 wd 0.0500 time 0.4751 (0.4817) data time 0.0011 (0.0023) model time 0.4741 (0.4790) loss 3.7798 (3.2562) grad_norm 1.1698 (1.4994) loss_scale 4096.0000 (3215.5514) mem 16699MB [2024-08-07 09:36:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][330/625] eta 0:02:22 lr 0.001107 wd 0.0500 time 0.4799 (0.4817) data time 0.0009 (0.0023) model time 
0.4791 (0.4790) loss 3.8098 (3.2607) grad_norm 1.4134 (1.4951) loss_scale 4096.0000 (3242.1511) mem 16699MB [2024-08-07 09:36:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][340/625] eta 0:02:17 lr 0.001107 wd 0.0500 time 0.4815 (0.4820) data time 0.0011 (0.0023) model time 0.4804 (0.4795) loss 3.3040 (3.2587) grad_norm 2.0609 (1.4911) loss_scale 4096.0000 (3267.1906) mem 16699MB [2024-08-07 09:36:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][350/625] eta 0:02:12 lr 0.001107 wd 0.0500 time 0.4773 (0.4819) data time 0.0008 (0.0022) model time 0.4765 (0.4794) loss 3.3754 (3.2634) grad_norm 1.2617 (1.4897) loss_scale 4096.0000 (3290.8034) mem 16699MB [2024-08-07 09:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][360/625] eta 0:02:07 lr 0.001107 wd 0.0500 time 0.4775 (0.4818) data time 0.0009 (0.0022) model time 0.4765 (0.4793) loss 2.7420 (3.2652) grad_norm 1.0388 (1.4906) loss_scale 4096.0000 (3313.1080) mem 16699MB [2024-08-07 09:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][370/625] eta 0:02:02 lr 0.001107 wd 0.0500 time 0.4785 (0.4817) data time 0.0008 (0.0022) model time 0.4777 (0.4793) loss 1.8861 (3.2618) grad_norm 1.3694 (1.4882) loss_scale 4096.0000 (3334.2102) mem 16699MB [2024-08-07 09:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][380/625] eta 0:01:57 lr 0.001107 wd 0.0500 time 0.4744 (0.4816) data time 0.0010 (0.0022) model time 0.4734 (0.4791) loss 3.4511 (3.2678) grad_norm 1.0916 (1.4807) loss_scale 4096.0000 (3354.2047) mem 16699MB [2024-08-07 09:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][390/625] eta 0:01:53 lr 0.001107 wd 0.0500 time 0.4745 (0.4814) data time 0.0009 (0.0021) model time 0.4736 (0.4790) loss 3.1059 (3.2665) grad_norm 1.0169 (1.4809) loss_scale 4096.0000 (3373.1765) mem 16699MB [2024-08-07 09:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][400/625] eta 0:01:48 
lr 0.001107 wd 0.0500 time 0.4711 (0.4812) data time 0.0010 (0.0021) model time 0.4701 (0.4788) loss 3.6334 (3.2748) grad_norm 1.6915 (1.4813) loss_scale 4096.0000 (3391.2020) mem 16699MB [2024-08-07 09:36:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][410/625] eta 0:01:43 lr 0.001107 wd 0.0500 time 0.4706 (0.4811) data time 0.0008 (0.0021) model time 0.4698 (0.4787) loss 3.9053 (3.2746) grad_norm 1.6110 (1.4809) loss_scale 4096.0000 (3408.3504) mem 16699MB [2024-08-07 09:36:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][420/625] eta 0:01:38 lr 0.001107 wd 0.0500 time 0.4761 (0.4809) data time 0.0011 (0.0021) model time 0.4750 (0.4785) loss 3.3807 (3.2781) grad_norm 1.2596 (1.4788) loss_scale 4096.0000 (3424.6841) mem 16699MB [2024-08-07 09:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][430/625] eta 0:01:33 lr 0.001106 wd 0.0500 time 0.4751 (0.4808) data time 0.0008 (0.0020) model time 0.4743 (0.4785) loss 3.2162 (3.2760) grad_norm 1.5092 (1.4749) loss_scale 4096.0000 (3440.2599) mem 16699MB [2024-08-07 09:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][440/625] eta 0:01:28 lr 0.001106 wd 0.0500 time 0.4804 (0.4808) data time 0.0011 (0.0020) model time 0.4794 (0.4784) loss 3.1039 (3.2792) grad_norm 1.3206 (1.4766) loss_scale 4096.0000 (3455.1293) mem 16699MB [2024-08-07 09:36:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][450/625] eta 0:01:24 lr 0.001106 wd 0.0500 time 0.4820 (0.4807) data time 0.0010 (0.0020) model time 0.4809 (0.4784) loss 3.3417 (3.2793) grad_norm 1.1385 (1.4762) loss_scale 4096.0000 (3469.3392) mem 16699MB [2024-08-07 09:37:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][460/625] eta 0:01:19 lr 0.001106 wd 0.0500 time 0.4764 (0.4806) data time 0.0010 (0.0020) model time 0.4754 (0.4783) loss 2.6160 (3.2781) grad_norm 1.6003 (1.4758) loss_scale 4096.0000 (3482.9328) mem 16699MB [2024-08-07 09:37:08 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][470/625] eta 0:01:14 lr 0.001106 wd 0.0500 time 0.4799 (0.4817) data time 0.0008 (0.0020) model time 0.4791 (0.4795) loss 3.1584 (3.2759) grad_norm 2.0151 (1.4736) loss_scale 4096.0000 (3495.9490) mem 16699MB [2024-08-07 09:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][480/625] eta 0:01:09 lr 0.001106 wd 0.0500 time 0.4767 (0.4816) data time 0.0008 (0.0020) model time 0.4759 (0.4794) loss 3.5349 (3.2731) grad_norm 0.9939 (1.4752) loss_scale 4096.0000 (3508.4241) mem 16699MB [2024-08-07 09:37:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][490/625] eta 0:01:04 lr 0.001106 wd 0.0500 time 0.4732 (0.4814) data time 0.0011 (0.0019) model time 0.4721 (0.4793) loss 3.7334 (3.2780) grad_norm 1.2432 (1.4726) loss_scale 4096.0000 (3520.3910) mem 16699MB [2024-08-07 09:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][500/625] eta 0:01:00 lr 0.001106 wd 0.0500 time 0.4750 (0.4813) data time 0.0010 (0.0019) model time 0.4740 (0.4792) loss 3.1351 (3.2780) grad_norm 1.5068 (1.4710) loss_scale 4096.0000 (3531.8802) mem 16699MB [2024-08-07 09:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][510/625] eta 0:00:55 lr 0.001106 wd 0.0500 time 0.4833 (0.4813) data time 0.0010 (0.0019) model time 0.4823 (0.4792) loss 3.4075 (3.2795) grad_norm 1.0923 (1.4704) loss_scale 4096.0000 (3542.9198) mem 16699MB [2024-08-07 09:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][520/625] eta 0:00:50 lr 0.001106 wd 0.0500 time 0.4824 (0.4812) data time 0.0011 (0.0019) model time 0.4813 (0.4792) loss 3.7728 (3.2751) grad_norm 1.5295 (1.4713) loss_scale 4096.0000 (3553.5355) mem 16699MB [2024-08-07 09:37:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][530/625] eta 0:00:45 lr 0.001106 wd 0.0500 time 0.4782 (0.4813) data time 0.0011 (0.0019) model time 0.4772 (0.4792) loss 3.5786 (3.2769) grad_norm 
1.5185 (1.4728) loss_scale 4096.0000 (3563.7514) mem 16699MB [2024-08-07 09:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][540/625] eta 0:00:40 lr 0.001106 wd 0.0500 time 0.4793 (0.4812) data time 0.0008 (0.0019) model time 0.4784 (0.4792) loss 1.9211 (3.2772) grad_norm 1.2960 (1.4696) loss_scale 4096.0000 (3573.5896) mem 16699MB [2024-08-07 09:37:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][550/625] eta 0:00:36 lr 0.001106 wd 0.0500 time 0.4706 (0.4812) data time 0.0011 (0.0019) model time 0.4694 (0.4791) loss 3.2809 (3.2759) grad_norm 1.1790 (1.4722) loss_scale 4096.0000 (3583.0708) mem 16699MB [2024-08-07 09:37:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][560/625] eta 0:00:31 lr 0.001106 wd 0.0500 time 0.4738 (0.4811) data time 0.0011 (0.0018) model time 0.4727 (0.4790) loss 2.8927 (3.2774) grad_norm 1.0660 (1.4684) loss_scale 4096.0000 (3592.2139) mem 16699MB [2024-08-07 09:37:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][570/625] eta 0:00:26 lr 0.001106 wd 0.0500 time 0.4736 (0.4810) data time 0.0011 (0.0018) model time 0.4725 (0.4789) loss 3.3049 (3.2797) grad_norm 1.7702 (1.4683) loss_scale 4096.0000 (3601.0368) mem 16699MB [2024-08-07 09:38:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][580/625] eta 0:00:21 lr 0.001106 wd 0.0500 time 0.4784 (0.4809) data time 0.0011 (0.0018) model time 0.4773 (0.4789) loss 2.2251 (3.2769) grad_norm 1.8477 (1.4740) loss_scale 4096.0000 (3609.5559) mem 16699MB [2024-08-07 09:38:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][590/625] eta 0:00:16 lr 0.001106 wd 0.0500 time 0.4750 (0.4809) data time 0.0012 (0.0018) model time 0.4738 (0.4789) loss 3.3835 (3.2750) grad_norm 1.9125 (1.4853) loss_scale 4096.0000 (3617.7868) mem 16699MB [2024-08-07 09:38:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][600/625] eta 0:00:12 lr 0.001106 wd 0.0500 time 0.4821 (0.4808) data 
time 0.0011 (0.0018) model time 0.4811 (0.4788) loss 3.3296 (3.2757) grad_norm 1.7434 (1.4864) loss_scale 4096.0000 (3625.7438) mem 16699MB [2024-08-07 09:38:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][610/625] eta 0:00:07 lr 0.001105 wd 0.0500 time 0.4716 (0.4807) data time 0.0006 (0.0018) model time 0.4711 (0.4787) loss 3.7582 (3.2738) grad_norm 1.1084 (1.4829) loss_scale 4096.0000 (3633.4403) mem 16699MB [2024-08-07 09:38:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [70/300][620/625] eta 0:00:02 lr 0.001105 wd 0.0500 time 0.4748 (0.4815) data time 0.0008 (0.0018) model time 0.4740 (0.4796) loss 3.4742 (3.2788) grad_norm 1.2821 (1.4869) loss_scale 4096.0000 (3640.8889) mem 16699MB [2024-08-07 09:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 70 training takes 0:05:00 [2024-08-07 09:38:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:38:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 09:38:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.539 (0.539) Loss 0.5747 (0.5747) Acc@1 87.158 (87.158) Acc@5 97.949 (97.949) Mem 16699MB [2024-08-07 09:38:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.163) Loss 0.9893 (0.7266) Acc@1 75.830 (82.999) Acc@5 93.994 (96.857) Mem 16699MB [2024-08-07 09:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.1084 (0.8774) Acc@1 72.168 (79.436) Acc@5 92.480 (95.150) Mem 16699MB [2024-08-07 09:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.163 Acc@5 95.128 [2024-08-07 09:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.2% [2024-08-07 09:38:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.891 (0.891) Loss 0.5244 (0.5244) Acc@1 88.037 (88.037) Acc@5 98.340 (98.340) Mem 16699MB [2024-08-07 09:38:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.8735 (0.6648) Acc@1 78.711 (84.814) Acc@5 95.264 (97.301) Mem 16699MB [2024-08-07 09:38:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 1.0020 (0.7947) Acc@1 74.512 (81.403) Acc@5 94.189 (95.889) Mem 16699MB [2024-08-07 09:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.144 Acc@5 95.883 [2024-08-07 09:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-07 09:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.14% [2024-08-07 09:38:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 09:38:33 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 09:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][0/625] eta 0:08:27 lr 0.001105 wd 0.0500 time 0.8123 (0.8123) data time 0.3797 (0.3797) model time 0.0000 (0.0000) loss 3.0963 (3.0963) grad_norm 2.4414 (2.4414) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 09:38:34 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 09:38:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 09:38:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 10:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 10:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 10:23:17 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 10:23:27 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-07 10:23:27 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-07 10:23:30 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 10:23:32 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 10:23:32 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 71) [2024-08-07 10:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 10:24:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][10/625] eta 0:25:29 lr 0.001105 wd 0.0500 time 0.4112 (2.4876) data time 0.0011 (0.0849) model time 0.0000 (0.0000) loss 3.3052 (3.7121) grad_norm 1.5413 (1.9015) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][20/625] eta 0:14:38 lr 0.001105 wd 0.0500 time 0.4179 (1.4516) data time 0.0008 (0.0430) model time 0.0000 (0.0000) loss 3.7128 (3.4882) grad_norm 1.3541 (1.6425) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][30/625] eta 0:11:03 lr 0.001105 wd 0.0500 time 0.4124 (1.1157) data time 0.0009 (0.0290) model time 0.0000 (0.0000) loss 3.8273 (3.5298) grad_norm 1.2988 (1.5221) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][40/625] eta 0:09:13 lr 0.001105 wd 0.0500 time 0.4109 (0.9468) data time 0.0010 (0.0220) model time 0.0000 (0.0000) loss 2.6163 (3.4367) grad_norm 1.4150 (1.5328) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][50/625] eta 0:08:03 lr 0.001105 wd 0.0500 time 0.4121 (0.8401) data time 0.0010 (0.0178) model time 0.0000 (0.0000) loss 2.9059 (3.4018) grad_norm 1.5208 (1.5570) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][60/625] eta 
0:07:14 lr 0.001105 wd 0.0500 time 0.4117 (0.7691) data time 0.0008 (0.0150) model time 0.4109 (0.4132) loss 3.3054 (3.3821) grad_norm 1.6229 (1.5318) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][70/625] eta 0:06:38 lr 0.001105 wd 0.0500 time 0.4095 (0.7184) data time 0.0008 (0.0130) model time 0.4087 (0.4133) loss 2.4618 (3.3489) grad_norm 1.2026 (1.5180) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][80/625] eta 0:06:10 lr 0.001105 wd 0.0500 time 0.4198 (0.6803) data time 0.0010 (0.0115) model time 0.4188 (0.4131) loss 3.6557 (3.3339) grad_norm 1.3838 (1.4995) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][90/625] eta 0:05:48 lr 0.001105 wd 0.0500 time 0.4120 (0.6510) data time 0.0007 (0.0104) model time 0.4113 (0.4136) loss 3.9952 (3.3127) grad_norm 1.3076 (1.4711) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][100/625] eta 0:05:29 lr 0.001105 wd 0.0500 time 0.4153 (0.6274) data time 0.0013 (0.0094) model time 0.4140 (0.4138) loss 3.6238 (3.3205) grad_norm 1.3495 (1.4556) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][110/625] eta 0:05:13 lr 0.001105 wd 0.0500 time 0.4133 (0.6086) data time 0.0009 (0.0086) model time 0.4124 (0.4147) loss 2.8812 (3.3248) grad_norm 2.4738 (1.4672) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][120/625] eta 0:04:59 lr 0.001105 wd 0.0500 time 0.4142 (0.5924) data time 0.0007 (0.0080) model time 0.4135 (0.4145) loss 3.8463 (3.3305) grad_norm 2.0647 (1.5045) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:52 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][130/625] eta 0:04:46 lr 0.001105 wd 0.0500 time 0.4155 (0.5791) data time 0.0007 (0.0075) model time 0.4148 (0.4150) loss 3.1496 (3.3213) grad_norm 1.5347 (1.4945) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:24:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][140/625] eta 0:04:35 lr 0.001105 wd 0.0500 time 0.4156 (0.5677) data time 0.0008 (0.0070) model time 0.4148 (0.4154) loss 2.3893 (3.3168) grad_norm 1.1830 (1.4989) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][150/625] eta 0:04:24 lr 0.001105 wd 0.0500 time 0.4177 (0.5576) data time 0.0013 (0.0066) model time 0.4164 (0.4153) loss 3.5459 (3.3078) grad_norm 1.4097 (1.4978) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][160/625] eta 0:04:15 lr 0.001104 wd 0.0500 time 0.4150 (0.5487) data time 0.0010 (0.0063) model time 0.4140 (0.4152) loss 3.4694 (3.3003) grad_norm 1.1360 (1.4915) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][170/625] eta 0:04:06 lr 0.001104 wd 0.0500 time 0.4181 (0.5410) data time 0.0007 (0.0059) model time 0.4174 (0.4153) loss 2.5338 (3.2983) grad_norm 1.4513 (1.4845) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][180/625] eta 0:03:57 lr 0.001104 wd 0.0500 time 0.4148 (0.5341) data time 0.0008 (0.0057) model time 0.4140 (0.4154) loss 2.5705 (3.2865) grad_norm 1.1993 (1.4879) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][190/625] eta 0:03:50 lr 0.001104 wd 0.0500 time 0.4092 (0.5290) data time 0.0007 (0.0054) model time 0.4085 (0.4169) loss 2.6734 (3.2817) grad_norm 
1.2130 (1.4789) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][200/625] eta 0:03:42 lr 0.001104 wd 0.0500 time 0.4128 (0.5233) data time 0.0009 (0.0052) model time 0.4118 (0.4167) loss 3.4069 (3.2704) grad_norm 1.0968 (1.4738) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][210/625] eta 0:03:35 lr 0.001104 wd 0.0500 time 0.4194 (0.5182) data time 0.0008 (0.0050) model time 0.4185 (0.4166) loss 3.5925 (3.2641) grad_norm 1.9002 (1.4905) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][220/625] eta 0:03:27 lr 0.001104 wd 0.0500 time 0.4228 (0.5135) data time 0.0011 (0.0048) model time 0.4217 (0.4164) loss 3.2556 (3.2642) grad_norm 1.1658 (1.4936) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][230/625] eta 0:03:21 lr 0.001104 wd 0.0500 time 0.4128 (0.5091) data time 0.0010 (0.0047) model time 0.4118 (0.4162) loss 3.4131 (3.2685) grad_norm 1.9278 (1.4941) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][240/625] eta 0:03:20 lr 0.001104 wd 0.0500 time 0.4094 (0.5207) data time 0.0010 (0.0045) model time 0.4084 (0.4357) loss 3.6609 (3.2634) grad_norm 1.3534 (1.5006) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][250/625] eta 0:03:23 lr 0.001104 wd 0.0500 time 0.4397 (0.5440) data time 0.0007 (0.0044) model time 0.4390 (0.4690) loss 2.5715 (3.2542) grad_norm 1.2509 (1.4907) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:25:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][260/625] eta 0:03:19 lr 0.001104 wd 0.0500 time 0.4118 (0.5460) data 
time 0.0011 (0.0042) model time 0.4107 (0.4750) loss 2.3457 (3.2463) grad_norm 1.2438 (1.4835) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][270/625] eta 0:03:12 lr 0.001104 wd 0.0500 time 0.4345 (0.5413) data time 0.0007 (0.0041) model time 0.4338 (0.4724) loss 3.9265 (3.2427) grad_norm 1.0544 (1.4778) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][280/625] eta 0:03:05 lr 0.001104 wd 0.0500 time 0.4114 (0.5367) data time 0.0009 (0.0040) model time 0.4105 (0.4698) loss 3.7569 (3.2483) grad_norm 1.4345 (1.4730) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][290/625] eta 0:02:58 lr 0.001104 wd 0.0500 time 0.4123 (0.5326) data time 0.0010 (0.0039) model time 0.4113 (0.4675) loss 2.8366 (3.2444) grad_norm 1.2289 (1.4690) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][300/625] eta 0:02:51 lr 0.001104 wd 0.0500 time 0.4100 (0.5287) data time 0.0008 (0.0038) model time 0.4092 (0.4654) loss 2.7008 (3.2280) grad_norm 1.2333 (1.4652) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][310/625] eta 0:02:45 lr 0.001104 wd 0.0500 time 0.4133 (0.5249) data time 0.0011 (0.0037) model time 0.4122 (0.4633) loss 3.0160 (3.2249) grad_norm 1.1475 (1.4611) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][320/625] eta 0:02:39 lr 0.001104 wd 0.0500 time 0.4107 (0.5215) data time 0.0009 (0.0036) model time 0.4098 (0.4615) loss 2.7303 (3.2332) grad_norm 1.5877 (1.4607) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [71/300][330/625] eta 0:02:32 lr 0.001103 wd 0.0500 time 0.4105 (0.5182) data time 0.0010 (0.0035) model time 0.4095 (0.4597) loss 3.7471 (3.2360) grad_norm 1.6582 (1.4701) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][340/625] eta 0:02:26 lr 0.001103 wd 0.0500 time 0.4115 (0.5151) data time 0.0010 (0.0035) model time 0.4106 (0.4581) loss 3.1302 (3.2368) grad_norm 1.4620 (1.4687) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][350/625] eta 0:02:20 lr 0.001103 wd 0.0500 time 0.4119 (0.5123) data time 0.0009 (0.0034) model time 0.4110 (0.4567) loss 3.0851 (3.2383) grad_norm 1.4981 (1.4655) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][360/625] eta 0:02:15 lr 0.001103 wd 0.0500 time 0.4069 (0.5095) data time 0.0007 (0.0033) model time 0.4062 (0.4552) loss 3.9308 (3.2419) grad_norm 1.2560 (1.4612) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][370/625] eta 0:02:09 lr 0.001103 wd 0.0500 time 0.4129 (0.5076) data time 0.0011 (0.0033) model time 0.4119 (0.4546) loss 3.9038 (3.2400) grad_norm 1.5212 (1.4636) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][380/625] eta 0:02:03 lr 0.001103 wd 0.0500 time 0.4147 (0.5051) data time 0.0009 (0.0032) model time 0.4138 (0.4533) loss 3.3661 (3.2393) grad_norm 1.5618 (1.4866) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][390/625] eta 0:01:58 lr 0.001103 wd 0.0500 time 0.4094 (0.5028) data time 0.0010 (0.0032) model time 0.4084 (0.4523) loss 2.3859 (3.2299) grad_norm 1.3862 (1.4903) loss_scale 4096.0000 (4096.0000) mem 
16722MB [2024-08-07 10:26:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][400/625] eta 0:01:52 lr 0.001103 wd 0.0500 time 0.4135 (0.5007) data time 0.0012 (0.0031) model time 0.4123 (0.4513) loss 3.1600 (3.2371) grad_norm 1.0956 (1.4872) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][410/625] eta 0:01:47 lr 0.001103 wd 0.0500 time 0.4217 (0.4986) data time 0.0007 (0.0030) model time 0.4210 (0.4502) loss 3.5855 (3.2417) grad_norm 1.0314 (1.4808) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][420/625] eta 0:01:41 lr 0.001103 wd 0.0500 time 0.4131 (0.4966) data time 0.0008 (0.0030) model time 0.4123 (0.4491) loss 3.1818 (3.2362) grad_norm 1.2484 (1.4795) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][430/625] eta 0:01:36 lr 0.001103 wd 0.0500 time 0.4105 (0.4946) data time 0.0012 (0.0030) model time 0.4093 (0.4482) loss 3.6794 (3.2442) grad_norm 2.0194 (1.4911) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][440/625] eta 0:01:31 lr 0.001103 wd 0.0500 time 0.4144 (0.4928) data time 0.0010 (0.0029) model time 0.4134 (0.4473) loss 2.6687 (3.2481) grad_norm 1.3234 (1.4939) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][450/625] eta 0:01:25 lr 0.001103 wd 0.0500 time 0.4222 (0.4911) data time 0.0007 (0.0029) model time 0.4215 (0.4464) loss 3.5073 (3.2496) grad_norm 1.1139 (1.4914) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][460/625] eta 0:01:20 lr 0.001103 wd 0.0500 time 0.4092 (0.4894) data time 0.0008 (0.0028) model time 0.4084 (0.4456) loss 
3.6118 (3.2452) grad_norm 1.3683 (1.4896) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][470/625] eta 0:01:15 lr 0.001103 wd 0.0500 time 0.4106 (0.4878) data time 0.0010 (0.0028) model time 0.4096 (0.4448) loss 3.1484 (3.2376) grad_norm 1.1198 (1.4856) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][480/625] eta 0:01:10 lr 0.001103 wd 0.0500 time 0.4160 (0.4862) data time 0.0010 (0.0027) model time 0.4150 (0.4441) loss 2.8538 (3.2373) grad_norm 1.5962 (1.4876) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][490/625] eta 0:01:05 lr 0.001103 wd 0.0500 time 0.4122 (0.4847) data time 0.0008 (0.0027) model time 0.4113 (0.4433) loss 2.8588 (3.2435) grad_norm 1.5233 (1.4869) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][500/625] eta 0:01:00 lr 0.001102 wd 0.0500 time 0.4134 (0.4833) data time 0.0008 (0.0027) model time 0.4126 (0.4427) loss 2.2093 (3.2433) grad_norm 1.0777 (1.4831) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][510/625] eta 0:00:55 lr 0.001102 wd 0.0500 time 0.4125 (0.4819) data time 0.0009 (0.0026) model time 0.4115 (0.4420) loss 3.4369 (3.2481) grad_norm 0.9128 (1.4783) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][520/625] eta 0:00:50 lr 0.001102 wd 0.0500 time 0.4136 (0.4811) data time 0.0008 (0.0026) model time 0.4128 (0.4419) loss 3.2179 (3.2487) grad_norm 1.1002 (1.4755) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][530/625] eta 0:00:45 lr 0.001102 wd 0.0500 
time 0.4116 (0.4798) data time 0.0010 (0.0026) model time 0.4107 (0.4413) loss 3.2455 (3.2407) grad_norm 1.0756 (1.4733) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][540/625] eta 0:00:40 lr 0.001102 wd 0.0500 time 0.4108 (0.4786) data time 0.0008 (0.0026) model time 0.4100 (0.4407) loss 3.4841 (3.2390) grad_norm 1.0650 (1.4743) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:27:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][550/625] eta 0:00:35 lr 0.001102 wd 0.0500 time 0.4100 (0.4774) data time 0.0009 (0.0025) model time 0.4090 (0.4401) loss 1.9649 (3.2387) grad_norm 1.1272 (1.4747) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][560/625] eta 0:00:30 lr 0.001102 wd 0.0500 time 0.4138 (0.4765) data time 0.0011 (0.0025) model time 0.4127 (0.4399) loss 3.4370 (3.2446) grad_norm 2.5093 (1.4785) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][570/625] eta 0:00:26 lr 0.001102 wd 0.0500 time 0.4199 (0.4754) data time 0.0009 (0.0025) model time 0.4190 (0.4393) loss 3.0522 (3.2469) grad_norm 4.1908 (1.4824) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][580/625] eta 0:00:21 lr 0.001102 wd 0.0500 time 0.4107 (0.4743) data time 0.0007 (0.0024) model time 0.4100 (0.4388) loss 3.0404 (3.2495) grad_norm 1.6710 (1.4895) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][590/625] eta 0:00:16 lr 0.001102 wd 0.0500 time 0.4171 (0.4733) data time 0.0007 (0.0024) model time 0.4164 (0.4384) loss 3.8395 (3.2519) grad_norm 1.2309 (1.4902) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:20 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [71/300][600/625] eta 0:00:11 lr 0.001102 wd 0.0500 time 0.4108 (0.4723) data time 0.0011 (0.0024) model time 0.4097 (0.4379) loss 3.2594 (3.2524) grad_norm 0.8301 (1.4898) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][610/625] eta 0:00:07 lr 0.001102 wd 0.0500 time 0.4273 (0.4714) data time 0.0007 (0.0024) model time 0.4266 (0.4375) loss 3.3846 (3.2511) grad_norm 1.2490 (1.4899) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [71/300][620/625] eta 0:00:02 lr 0.001102 wd 0.0500 time 0.4055 (0.4703) data time 0.0007 (0.0024) model time 0.4048 (0.4369) loss 3.4444 (3.2508) grad_norm 1.1203 (1.4885) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:28:30 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 71 training takes 0:04:53 [2024-08-07 10:28:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 10:28:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 10:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.577 (0.577) Loss 0.5830 (0.5830) Acc@1 87.256 (87.256) Acc@5 98.145 (98.145) Mem 16722MB [2024-08-07 10:28:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.164) Loss 1.0176 (0.7470) Acc@1 76.074 (83.225) Acc@5 93.945 (96.773) Mem 16722MB [2024-08-07 10:28:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.141) Loss 1.1592 (0.9083) Acc@1 72.998 (79.490) Acc@5 92.188 (95.089) Mem 16722MB [2024-08-07 10:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.327 Acc@5 95.086 [2024-08-07 10:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-08-07 10:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.33% [2024-08-07 10:28:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-07 10:28:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-07 10:28:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.581 (0.581) Loss 0.5239 (0.5239) Acc@1 88.086 (88.086) Acc@5 98.389 (98.389) Mem 16722MB [2024-08-07 10:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.164) Loss 0.8696 (0.6637) Acc@1 78.809 (84.823) Acc@5 95.264 (97.332) Mem 16722MB [2024-08-07 10:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.140) Loss 1.0000 (0.7929) Acc@1 74.707 (81.452) Acc@5 94.043 (95.919) Mem 16722MB [2024-08-07 10:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.190 Acc@5 95.905 [2024-08-07 10:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-07 10:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.19% [2024-08-07 10:28:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 10:28:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 10:28:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][0/625] eta 0:14:38 lr 0.001102 wd 0.0500 time 1.4062 (1.4062) data time 0.6264 (0.6264) model time 0.0000 (0.0000) loss 3.3016 (3.3016) grad_norm 1.4255 (1.4255) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][10/625] eta 0:05:11 lr 0.001102 wd 0.0500 time 0.4152 (0.5071) data time 0.0012 (0.0579) model time 0.0000 (0.0000) loss 3.1008 (3.1975) grad_norm 1.2114 (1.3821) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][20/625] eta 0:04:45 lr 0.001102 wd 0.0500 time 0.4156 (0.4725) data time 0.0008 (0.0308) model time 0.0000 (0.0000) loss 2.4359 (3.1213) grad_norm 2.0261 (1.4082) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][30/625] eta 0:04:30 lr 0.001102 wd 0.0500 time 0.4178 (0.4541) data time 0.0007 (0.0211) model time 0.0000 (0.0000) loss 3.4096 (3.2084) grad_norm 1.4725 (1.4335) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][40/625] eta 0:04:20 lr 0.001102 wd 0.0500 time 0.4110 (0.4449) data time 0.0007 (0.0162) model time 0.0000 (0.0000) loss 3.4163 (3.1466) grad_norm 1.3603 (1.4647) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][50/625] eta 0:04:12 lr 0.001101 wd 0.0500 time 0.4211 (0.4391) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 3.2501 (3.2319) grad_norm 1.3220 (1.4639) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][60/625] eta 0:04:06 lr 0.001101 wd 0.0500 time 0.4136 (0.4354) data time 0.0009 (0.0112) model time 0.4127 (0.4157) loss 3.3077 (3.2249) 
grad_norm 1.1979 (1.4365) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][70/625] eta 0:04:00 lr 0.001101 wd 0.0500 time 0.4148 (0.4325) data time 0.0010 (0.0098) model time 0.4138 (0.4147) loss 2.9846 (3.2101) grad_norm 1.3005 (1.4333) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][80/625] eta 0:03:54 lr 0.001101 wd 0.0500 time 0.4111 (0.4306) data time 0.0007 (0.0087) model time 0.4104 (0.4152) loss 3.7650 (3.2004) grad_norm 1.4074 (1.4400) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][90/625] eta 0:03:49 lr 0.001101 wd 0.0500 time 0.4118 (0.4287) data time 0.0010 (0.0078) model time 0.4108 (0.4146) loss 3.7938 (3.1709) grad_norm 2.2326 (1.4714) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][100/625] eta 0:03:44 lr 0.001101 wd 0.0500 time 0.4115 (0.4276) data time 0.0007 (0.0072) model time 0.4108 (0.4149) loss 3.3985 (3.1758) grad_norm 1.4094 (1.4669) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][110/625] eta 0:03:39 lr 0.001101 wd 0.0500 time 0.4108 (0.4263) data time 0.0008 (0.0066) model time 0.4101 (0.4144) loss 3.2289 (3.1915) grad_norm 1.7885 (1.4681) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][120/625] eta 0:03:34 lr 0.001101 wd 0.0500 time 0.4260 (0.4255) data time 0.0010 (0.0061) model time 0.4250 (0.4145) loss 3.5585 (3.2039) grad_norm 1.2701 (1.4594) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][130/625] eta 0:03:31 lr 0.001101 wd 0.0500 time 0.4123 
(0.4264) data time 0.0012 (0.0058) model time 0.4111 (0.4173) loss 2.2199 (3.1858) grad_norm 1.5695 (1.4685) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][140/625] eta 0:03:26 lr 0.001101 wd 0.0500 time 0.4104 (0.4256) data time 0.0008 (0.0054) model time 0.4097 (0.4169) loss 3.5528 (3.2035) grad_norm 1.4855 (1.4709) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][150/625] eta 0:03:21 lr 0.001101 wd 0.0500 time 0.4165 (0.4248) data time 0.0010 (0.0051) model time 0.4155 (0.4165) loss 2.5186 (3.2117) grad_norm 1.2182 (1.4720) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][160/625] eta 0:03:17 lr 0.001101 wd 0.0500 time 0.4089 (0.4241) data time 0.0010 (0.0049) model time 0.4079 (0.4162) loss 2.7814 (3.2095) grad_norm 1.6755 (1.4682) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][170/625] eta 0:03:12 lr 0.001101 wd 0.0500 time 0.4138 (0.4236) data time 0.0008 (0.0046) model time 0.4130 (0.4160) loss 3.1608 (3.2182) grad_norm 1.1687 (1.4656) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][180/625] eta 0:03:08 lr 0.001101 wd 0.0500 time 0.4128 (0.4231) data time 0.0008 (0.0044) model time 0.4120 (0.4158) loss 3.8554 (3.2157) grad_norm 1.3150 (1.4718) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][190/625] eta 0:03:03 lr 0.001101 wd 0.0500 time 0.4060 (0.4226) data time 0.0007 (0.0043) model time 0.4053 (0.4155) loss 3.6401 (3.1883) grad_norm 1.1020 (1.4617) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:16 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [72/300][200/625] eta 0:02:59 lr 0.001101 wd 0.0500 time 0.4109 (0.4222) data time 0.0007 (0.0041) model time 0.4102 (0.4154) loss 2.1214 (3.1845) grad_norm 1.0552 (1.4615) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][210/625] eta 0:02:55 lr 0.001100 wd 0.0500 time 0.4089 (0.4226) data time 0.0010 (0.0039) model time 0.4080 (0.4164) loss 2.7492 (3.1791) grad_norm 1.2262 (1.4531) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][220/625] eta 0:02:51 lr 0.001100 wd 0.0500 time 0.4151 (0.4224) data time 0.0007 (0.0038) model time 0.4144 (0.4164) loss 2.1075 (3.1748) grad_norm 1.6154 (1.4521) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][230/625] eta 0:02:46 lr 0.001100 wd 0.0500 time 0.4082 (0.4220) data time 0.0007 (0.0037) model time 0.4075 (0.4161) loss 3.5903 (3.1825) grad_norm 1.8241 (1.4619) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][240/625] eta 0:02:42 lr 0.001100 wd 0.0500 time 0.4117 (0.4218) data time 0.0010 (0.0036) model time 0.4107 (0.4161) loss 3.3606 (3.1820) grad_norm 1.5832 (1.4632) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][250/625] eta 0:02:38 lr 0.001100 wd 0.0500 time 0.4133 (0.4215) data time 0.0008 (0.0035) model time 0.4126 (0.4160) loss 3.1086 (3.1838) grad_norm 1.7659 (1.4615) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][260/625] eta 0:02:33 lr 0.001100 wd 0.0500 time 0.4142 (0.4213) data time 0.0007 (0.0034) model time 0.4134 (0.4160) loss 2.9020 (3.1808) grad_norm 1.3133 (1.4568) loss_scale 4096.0000 
(4096.0000) mem 16722MB [2024-08-07 10:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][270/625] eta 0:02:29 lr 0.001100 wd 0.0500 time 0.4149 (0.4211) data time 0.0009 (0.0033) model time 0.4140 (0.4159) loss 3.2016 (3.1832) grad_norm 1.2396 (1.4539) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][280/625] eta 0:02:25 lr 0.001100 wd 0.0500 time 0.4098 (0.4209) data time 0.0008 (0.0032) model time 0.4090 (0.4159) loss 3.5953 (3.1843) grad_norm 1.2812 (1.4461) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][290/625] eta 0:02:20 lr 0.001100 wd 0.0500 time 0.4164 (0.4207) data time 0.0008 (0.0031) model time 0.4155 (0.4158) loss 4.0149 (3.1841) grad_norm 2.3022 (1.4520) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:30:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][300/625] eta 0:02:16 lr 0.001100 wd 0.0500 time 0.4128 (0.4208) data time 0.0008 (0.0031) model time 0.4120 (0.4160) loss 3.7357 (3.1945) grad_norm 1.3012 (1.4489) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][310/625] eta 0:02:12 lr 0.001100 wd 0.0500 time 0.4145 (0.4205) data time 0.0008 (0.0030) model time 0.4137 (0.4159) loss 2.4609 (3.1970) grad_norm 1.4774 (1.4448) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][320/625] eta 0:02:08 lr 0.001100 wd 0.0500 time 0.4106 (0.4204) data time 0.0009 (0.0029) model time 0.4097 (0.4159) loss 3.4836 (3.1938) grad_norm 2.3134 (1.4556) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][330/625] eta 0:02:03 lr 0.001100 wd 0.0500 time 0.4174 (0.4203) data time 0.0009 (0.0029) model time 
0.4165 (0.4159) loss 3.2703 (3.1881) grad_norm 1.4402 (1.4548) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][340/625] eta 0:01:59 lr 0.001100 wd 0.0500 time 0.4138 (0.4202) data time 0.0009 (0.0028) model time 0.4128 (0.4159) loss 3.2196 (3.1901) grad_norm 1.4121 (1.4531) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][350/625] eta 0:01:55 lr 0.001100 wd 0.0500 time 0.4194 (0.4201) data time 0.0007 (0.0028) model time 0.4187 (0.4159) loss 3.9381 (3.1994) grad_norm 1.2893 (1.4480) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][360/625] eta 0:01:51 lr 0.001100 wd 0.0500 time 0.4126 (0.4205) data time 0.0009 (0.0027) model time 0.4117 (0.4165) loss 3.4397 (3.1976) grad_norm 1.4427 (1.4468) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][370/625] eta 0:01:47 lr 0.001100 wd 0.0500 time 0.4167 (0.4204) data time 0.0007 (0.0027) model time 0.4161 (0.4164) loss 3.7119 (3.1995) grad_norm 1.3248 (1.4434) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][380/625] eta 0:01:43 lr 0.001099 wd 0.0500 time 0.4137 (0.4206) data time 0.0007 (0.0026) model time 0.4130 (0.4168) loss 2.5714 (3.1955) grad_norm 1.4882 (1.4385) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][390/625] eta 0:01:38 lr 0.001099 wd 0.0500 time 0.4147 (0.4208) data time 0.0008 (0.0026) model time 0.4140 (0.4171) loss 3.9684 (3.2042) grad_norm 1.3314 (1.4403) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][400/625] eta 0:01:34 
lr 0.001099 wd 0.0500 time 0.4108 (0.4207) data time 0.0007 (0.0025) model time 0.4100 (0.4171) loss 2.5361 (3.1970) grad_norm 1.6273 (1.4428) loss_scale 4096.0000 (4096.0000) mem 16722MB [2024-08-07 10:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 10:31:40 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 10:31:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 10:42:16 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 10:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 10:42:31 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 10:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-07 10:42:43 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-07 10:42:46 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 10:42:48 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 10:42:48 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 72) [2024-08-07 10:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 10:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][410/625] eta 0:11:30 lr 0.001099 wd 0.0500 time 0.4730 (3.2115) data time 0.0009 (0.1003) model time 0.4721 (3.1112) loss 3.9930 (3.7612) grad_norm 1.5473 (1.2717) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][420/625] eta 0:05:46 lr 0.001099 wd 0.0500 time 0.4728 (1.6906) data time 0.0009 (0.0452) model time 0.4720 (1.6454) loss 3.8308 (3.6155) grad_norm 1.2592 (1.3128) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][430/625] eta 0:04:04 lr 0.001099 wd 0.0500 time 0.4679 (1.2557) data time 0.0011 (0.0294) model time 0.4667 (1.2263) loss 3.5131 (3.6080) grad_norm 1.2904 (1.3304) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][440/625] eta 0:03:16 lr 0.001099 wd 0.0500 time 0.4118 (1.0632) data time 0.0012 (0.0220) model time 0.4106 (1.0412) loss 3.1620 (3.5345) grad_norm 1.0728 (1.3465) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][450/625] eta 0:02:44 lr 0.001099 wd 0.0500 time 0.4736 (0.9401) data time 0.0008 (0.0176) model time 0.4727 (0.9225) loss 3.6839 (3.5244) grad_norm 1.1658 (1.3660) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][460/625] eta 
0:02:21 lr 0.001099 wd 0.0500 time 0.4775 (0.8604) data time 0.0008 (0.0148) model time 0.4767 (0.8456) loss 2.9738 (3.4891) grad_norm 1.4842 (1.3504) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][470/625] eta 0:02:04 lr 0.001099 wd 0.0500 time 0.4755 (0.8038) data time 0.0008 (0.0128) model time 0.4747 (0.7910) loss 2.6426 (3.4473) grad_norm 1.8816 (1.3593) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][480/625] eta 0:01:50 lr 0.001099 wd 0.0500 time 0.4761 (0.7617) data time 0.0009 (0.0113) model time 0.4752 (0.7504) loss 3.0828 (3.4245) grad_norm 1.1560 (1.3628) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:43:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][490/625] eta 0:01:38 lr 0.001099 wd 0.0500 time 0.4744 (0.7291) data time 0.0011 (0.0102) model time 0.4733 (0.7189) loss 3.7358 (3.3819) grad_norm 1.2834 (1.3476) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:44:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][500/625] eta 0:01:27 lr 0.001099 wd 0.0500 time 0.4743 (0.7029) data time 0.0008 (0.0092) model time 0.4735 (0.6937) loss 4.0457 (3.3769) grad_norm 1.1774 (1.3319) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][510/625] eta 0:01:18 lr 0.001099 wd 0.0500 time 0.4709 (0.6815) data time 0.0010 (0.0085) model time 0.4700 (0.6730) loss 2.6057 (3.3902) grad_norm 1.0008 (1.3213) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:44:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][520/625] eta 0:01:09 lr 0.001099 wd 0.0500 time 0.4754 (0.6637) data time 0.0010 (0.0078) model time 0.4745 (0.6559) loss 3.3628 (3.3887) grad_norm 1.1182 (1.3182) loss_scale 4096.0000 (4096.0000) mem 16700MB [2024-08-07 10:44:13 
vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 10:44:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 10:44:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 10:52:23 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 10:52:25 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 10:52:27 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 10:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-07 10:52:50 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-07 10:52:52 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 10:52:54 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 10:52:55 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 72) [2024-08-07 10:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 10:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][530/625] eta 0:05:23 lr 0.001099 wd 0.0500 time 0.4661 (3.4023) data time 0.0015 (0.0940) model time 0.4645 (3.3084) loss 3.2907 (3.8268) grad_norm 1.3365 (1.5362) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][540/625] eta 0:02:22 lr 0.001099 wd 0.0500 time 0.4676 (1.6759) data time 0.0012 (0.0395) model time 0.4664 (1.6364) loss 3.1662 (3.5669) grad_norm 1.7631 (1.6629) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][550/625] eta 0:01:31 lr 0.001098 wd 0.0500 time 0.4629 (1.2263) data time 0.0010 (0.0253) model time 0.4620 (1.2010) loss 4.2380 (3.5779) grad_norm 1.2849 (1.5754) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][560/625] eta 0:01:07 lr 0.001098 wd 0.0500 time 0.7623 (1.0361) data time 0.0012 (0.0188) model time 0.7611 (1.0173) loss 3.1821 (3.5492) grad_norm 1.3105 (1.5821) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][570/625] eta 0:00:50 lr 0.001098 wd 0.0500 time 0.4638 (0.9128) data time 0.0008 (0.0150) model time 0.4630 (0.8977) loss 3.3264 (3.4780) grad_norm 1.1642 (1.5330) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][580/625] eta 
0:00:37 lr 0.001098 wd 0.0500 time 0.4690 (0.8347) data time 0.0011 (0.0126) model time 0.4679 (0.8221) loss 3.2626 (3.4634) grad_norm 1.5218 (1.5568) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][590/625] eta 0:00:27 lr 0.001098 wd 0.0500 time 0.4657 (0.7804) data time 0.0011 (0.0110) model time 0.4646 (0.7695) loss 3.1900 (3.4211) grad_norm 0.9104 (1.5208) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 10:53:54 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 10:53:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 10:54:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 11:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 11:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 11:00:42 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 11:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 11:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 11:18:16 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-07 11:43:50 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 11:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 11:44:02 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 11:44:16 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-07 11:44:16 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-07 11:44:18 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 11:44:20 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 11:44:20 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 72) [2024-08-07 11:44:20 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 11:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][600/625] eta 0:01:38 lr 0.001098 wd 0.0500 time 0.4393 (3.9565) data time 0.0006 (0.1742) model time 0.4387 (3.7823) loss 4.1230 (3.7804) grad_norm 1.2890 (1.4979) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 11:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][610/625] eta 0:00:24 lr 0.001098 wd 0.0500 time 0.4374 (1.6139) data time 0.0006 (0.0589) model time 0.4368 (1.5550) loss 3.7816 (3.5041) grad_norm 1.1635 (1.3642) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 11:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [72/300][620/625] eta 0:00:05 lr 0.001098 wd 0.0500 time 0.4366 (1.1449) data time 0.0006 (0.0356) model time 0.4360 (1.1093) loss 3.5182 (3.4974) grad_norm 
1.6429 (1.3613) loss_scale 4096.0000 (4096.0000) mem 16699MB [2024-08-07 11:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 72 training takes 0:00:30 [2024-08-07 11:44:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 11:44:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 11:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.5825 (0.5825) Acc@1 87.354 (87.354) Acc@5 97.900 (97.900) Mem 16699MB [2024-08-07 11:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 1.0195 (0.7270) Acc@1 76.465 (83.172) Acc@5 93.457 (96.764) Mem 16699MB [2024-08-07 11:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 1.1221 (0.8743) Acc@1 72.070 (79.532) Acc@5 93.018 (95.136) Mem 16699MB [2024-08-07 11:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.263 Acc@5 95.148 [2024-08-07 11:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.3% [2024-08-07 11:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.809 (0.809) Loss 0.5234 (0.5234) Acc@1 88.184 (88.184) Acc@5 98.389 (98.389) Mem 16699MB [2024-08-07 11:45:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.184) Loss 0.8672 (0.6623) Acc@1 78.760 (84.894) Acc@5 95.264 (97.323) Mem 16699MB [2024-08-07 11:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.151) Loss 0.9971 (0.7910) Acc@1 74.756 (81.520) Acc@5 94.092 (95.917) Mem 16699MB [2024-08-07 11:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.268 Acc@5 95.901 [2024-08-07 11:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 
81.3% [2024-08-07 11:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.27% [2024-08-07 11:45:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 11:45:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-07 11:45:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][0/625] eta 0:19:38 lr 0.001098 wd 0.0500 time 1.8856 (1.8856) data time 0.4212 (0.4212) model time 0.0000 (0.0000) loss 3.8215 (3.8215) grad_norm 1.5030 (1.5030) loss_scale 4096.0000 (4096.0000) mem 16712MB [2024-08-07 11:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][10/625] eta 0:05:52 lr 0.001098 wd 0.0500 time 0.4397 (0.5730) data time 0.0007 (0.0390) model time 0.0000 (0.0000) loss 2.5731 (3.2846) grad_norm 1.1489 (1.6648) loss_scale 4096.0000 (4096.0000) mem 16703MB [2024-08-07 11:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 11:45:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 11:45:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 11:51:18 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 11:51:19 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 11:51:33 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-07 11:53:55 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 11:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 11:54:19 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 11:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-07 11:54:37 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-07 11:54:40 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 11:54:42 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 11:54:42 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 73) [2024-08-07 11:54:42 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 11:55:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][20/625] eta 0:28:01 lr 0.001098 wd 0.0500 time 0.4155 (2.7786) data time 0.0011 (0.0892) model time 0.0000 (0.0000) loss 3.4764 (3.6024) grad_norm 1.0791 (1.2965) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][30/625] eta 0:15:50 lr 0.001098 wd 0.0500 time 0.4149 (1.5978) data time 0.0009 (0.0452) model time 0.0000 (0.0000) loss 3.5869 (3.4513) grad_norm 1.6001 (1.4571) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][40/625] eta 0:11:50 lr 0.001098 wd 0.0500 time 0.4096 (1.2145) data time 0.0011 (0.0305) model time 0.0000 (0.0000) loss 3.8473 (3.5120) grad_norm 1.2700 
(1.4763) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][50/625] eta 0:09:47 lr 0.001098 wd 0.0500 time 0.4138 (1.0224) data time 0.0009 (0.0231) model time 0.0000 (0.0000) loss 2.5779 (3.4420) grad_norm 1.2105 (1.4282) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][60/625] eta 0:08:29 lr 0.001098 wd 0.0500 time 0.4340 (0.9017) data time 0.0011 (0.0187) model time 0.4329 (0.4177) loss 3.4138 (3.4320) grad_norm 1.6047 (1.4328) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][70/625] eta 0:07:35 lr 0.001098 wd 0.0500 time 0.4141 (0.8205) data time 0.0008 (0.0158) model time 0.4132 (0.4155) loss 3.3686 (3.3961) grad_norm 1.7020 (1.4087) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][80/625] eta 0:06:55 lr 0.001098 wd 0.0500 time 0.4131 (0.7631) data time 0.0008 (0.0137) model time 0.4123 (0.4164) loss 2.2320 (3.3373) grad_norm 1.4728 (1.3926) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][90/625] eta 0:06:25 lr 0.001097 wd 0.0500 time 0.4404 (0.7205) data time 0.0012 (0.0121) model time 0.4392 (0.4174) loss 3.5059 (3.3282) grad_norm 4.9454 (1.4671) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][100/625] eta 0:06:00 lr 0.001097 wd 0.0500 time 0.4259 (0.6866) data time 0.0008 (0.0109) model time 0.4251 (0.4169) loss 3.6276 (3.3091) grad_norm 1.4254 (1.4634) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][110/625] eta 0:05:39 lr 0.001097 wd 0.0500 time 0.4192 (0.6594) data time 0.0011 
(0.0099) model time 0.4181 (0.4164) loss 3.4109 (3.3190) grad_norm 1.2147 (1.4397) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:55:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][120/625] eta 0:05:21 lr 0.001097 wd 0.0500 time 0.4308 (0.6373) data time 0.0010 (0.0091) model time 0.4298 (0.4162) loss 2.9382 (3.3202) grad_norm 1.6204 (1.4562) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][130/625] eta 0:05:06 lr 0.001097 wd 0.0500 time 0.4132 (0.6189) data time 0.0008 (0.0084) model time 0.4124 (0.4160) loss 3.6333 (3.3336) grad_norm 1.4735 (1.4738) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][140/625] eta 0:04:52 lr 0.001097 wd 0.0500 time 0.4130 (0.6031) data time 0.0008 (0.0078) model time 0.4122 (0.4157) loss 3.0797 (3.3181) grad_norm 1.0963 (1.4657) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][150/625] eta 0:04:40 lr 0.001097 wd 0.0500 time 0.4152 (0.5896) data time 0.0007 (0.0074) model time 0.4145 (0.4154) loss 2.4028 (3.3110) grad_norm 1.3239 (1.4696) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][160/625] eta 0:04:28 lr 0.001097 wd 0.0500 time 0.4134 (0.5779) data time 0.0010 (0.0069) model time 0.4124 (0.4152) loss 3.4753 (3.3162) grad_norm 2.6197 (1.4884) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][170/625] eta 0:04:18 lr 0.001097 wd 0.0500 time 0.4132 (0.5678) data time 0.0012 (0.0066) model time 0.4120 (0.4152) loss 3.6563 (3.3201) grad_norm 1.1995 (1.4935) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[73/300][180/625] eta 0:04:08 lr 0.001097 wd 0.0500 time 0.4123 (0.5588) data time 0.0008 (0.0063) model time 0.4115 (0.4150) loss 2.0769 (3.3180) grad_norm 1.4298 (1.4910) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][190/625] eta 0:03:59 lr 0.001097 wd 0.0500 time 0.4143 (0.5509) data time 0.0008 (0.0060) model time 0.4135 (0.4151) loss 2.6783 (3.3002) grad_norm 1.2674 (1.4829) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][200/625] eta 0:03:51 lr 0.001097 wd 0.0500 time 0.4172 (0.5448) data time 0.0008 (0.0057) model time 0.4164 (0.4163) loss 2.7129 (3.2937) grad_norm 1.2669 (1.4829) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][210/625] eta 0:03:43 lr 0.001097 wd 0.0500 time 0.4214 (0.5382) data time 0.0010 (0.0055) model time 0.4204 (0.4161) loss 3.6168 (3.2849) grad_norm 1.1143 (1.4757) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][220/625] eta 0:03:35 lr 0.001097 wd 0.0500 time 0.4109 (0.5323) data time 0.0008 (0.0053) model time 0.4100 (0.4159) loss 3.1635 (3.2745) grad_norm 1.1194 (1.4699) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][230/625] eta 0:03:28 lr 0.001097 wd 0.0500 time 0.4111 (0.5270) data time 0.0011 (0.0051) model time 0.4100 (0.4158) loss 3.3897 (3.2705) grad_norm 1.7544 (1.4827) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][240/625] eta 0:03:20 lr 0.001097 wd 0.0500 time 0.4117 (0.5220) data time 0.0010 (0.0049) model time 0.4107 (0.4156) loss 3.3750 (3.2694) grad_norm 1.2928 (1.4713) loss_scale 4096.0000 (4096.0000) mem 16721MB 
[2024-08-07 11:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][250/625] eta 0:03:14 lr 0.001097 wd 0.0500 time 0.4189 (0.5177) data time 0.0010 (0.0047) model time 0.4179 (0.4157) loss 3.4954 (3.2606) grad_norm 1.4482 (1.4726) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][260/625] eta 0:03:07 lr 0.001096 wd 0.0500 time 0.4161 (0.5135) data time 0.0008 (0.0046) model time 0.4153 (0.4155) loss 2.2405 (3.2487) grad_norm 1.2029 (1.4715) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][270/625] eta 0:03:00 lr 0.001096 wd 0.0500 time 0.4087 (0.5097) data time 0.0010 (0.0045) model time 0.4076 (0.4154) loss 2.5781 (3.2429) grad_norm 1.7492 (1.4676) loss_scale 8192.0000 (4222.0308) mem 16721MB [2024-08-07 11:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][280/625] eta 0:02:54 lr 0.001096 wd 0.0500 time 0.4123 (0.5062) data time 0.0011 (0.0043) model time 0.4112 (0.4154) loss 4.0019 (3.2402) grad_norm 0.9907 (1.4706) loss_scale 8192.0000 (4369.0667) mem 16721MB [2024-08-07 11:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][290/625] eta 0:02:48 lr 0.001096 wd 0.0500 time 0.4213 (0.5029) data time 0.0012 (0.0042) model time 0.4201 (0.4153) loss 3.2826 (3.2490) grad_norm 1.2318 (1.4705) loss_scale 8192.0000 (4505.6000) mem 16721MB [2024-08-07 11:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][300/625] eta 0:02:42 lr 0.001096 wd 0.0500 time 0.4144 (0.4999) data time 0.0009 (0.0041) model time 0.4134 (0.4152) loss 2.8847 (3.2454) grad_norm 1.3587 (1.4819) loss_scale 8192.0000 (4632.7172) mem 16721MB [2024-08-07 11:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][310/625] eta 0:02:36 lr 0.001096 wd 0.0500 time 0.4110 (0.4972) data time 0.0009 (0.0040) model time 0.4101 (0.4153) loss 2.9979 
(3.2380) grad_norm 1.6596 (1.4814) loss_scale 8192.0000 (4751.3600) mem 16721MB [2024-08-07 11:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][320/625] eta 0:02:30 lr 0.001096 wd 0.0500 time 0.4136 (0.4945) data time 0.0010 (0.0039) model time 0.4126 (0.4153) loss 3.1203 (3.2333) grad_norm 1.9777 (1.4802) loss_scale 8192.0000 (4862.3484) mem 16721MB [2024-08-07 11:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][330/625] eta 0:02:25 lr 0.001096 wd 0.0500 time 0.4153 (0.4920) data time 0.0010 (0.0038) model time 0.4143 (0.4152) loss 2.9296 (3.2416) grad_norm 1.3764 (1.4923) loss_scale 8192.0000 (4966.4000) mem 16721MB [2024-08-07 11:57:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][340/625] eta 0:02:19 lr 0.001096 wd 0.0500 time 0.4189 (0.4897) data time 0.0008 (0.0037) model time 0.4181 (0.4151) loss 3.6832 (3.2463) grad_norm 1.1104 (1.4973) loss_scale 8192.0000 (5064.1455) mem 16721MB [2024-08-07 11:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][350/625] eta 0:02:14 lr 0.001096 wd 0.0500 time 0.4119 (0.4874) data time 0.0012 (0.0037) model time 0.4107 (0.4150) loss 2.9163 (3.2462) grad_norm 1.2749 (1.4935) loss_scale 8192.0000 (5156.1412) mem 16721MB [2024-08-07 11:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][360/625] eta 0:02:08 lr 0.001096 wd 0.0500 time 0.4154 (0.4854) data time 0.0010 (0.0036) model time 0.4145 (0.4151) loss 2.5256 (3.2499) grad_norm 1.0987 (1.5029) loss_scale 8192.0000 (5242.8800) mem 16721MB [2024-08-07 11:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][370/625] eta 0:02:03 lr 0.001096 wd 0.0500 time 0.4140 (0.4835) data time 0.0008 (0.0035) model time 0.4133 (0.4150) loss 4.3579 (3.2518) grad_norm 1.3374 (1.5065) loss_scale 8192.0000 (5324.8000) mem 16721MB [2024-08-07 11:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][380/625] eta 0:01:58 lr 0.001096 wd 0.0500 time 
0.4105 (0.4822) data time 0.0011 (0.0034) model time 0.4094 (0.4156) loss 3.5900 (3.2470) grad_norm 1.3879 (1.5020) loss_scale 8192.0000 (5402.2919) mem 16721MB [2024-08-07 11:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][390/625] eta 0:01:52 lr 0.001096 wd 0.0500 time 0.4168 (0.4804) data time 0.0008 (0.0034) model time 0.4160 (0.4155) loss 3.1733 (3.2463) grad_norm 1.3392 (1.4990) loss_scale 8192.0000 (5475.7053) mem 16721MB [2024-08-07 11:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][400/625] eta 0:01:47 lr 0.001096 wd 0.0500 time 0.4236 (0.4787) data time 0.0010 (0.0033) model time 0.4225 (0.4155) loss 2.4780 (3.2384) grad_norm 1.9574 (inf) loss_scale 4096.0000 (5440.3282) mem 16721MB [2024-08-07 11:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][410/625] eta 0:01:42 lr 0.001096 wd 0.0500 time 0.4133 (0.4772) data time 0.0011 (0.0033) model time 0.4122 (0.4155) loss 3.3727 (3.2435) grad_norm 1.1516 (inf) loss_scale 4096.0000 (5406.7200) mem 16721MB [2024-08-07 11:58:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][420/625] eta 0:01:37 lr 0.001096 wd 0.0500 time 0.4175 (0.4756) data time 0.0008 (0.0032) model time 0.4167 (0.4154) loss 3.4864 (3.2516) grad_norm 1.0405 (inf) loss_scale 4096.0000 (5374.7512) mem 16721MB [2024-08-07 11:58:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][430/625] eta 0:01:32 lr 0.001095 wd 0.0500 time 0.4076 (0.4742) data time 0.0008 (0.0032) model time 0.4068 (0.4154) loss 3.5977 (3.2488) grad_norm 2.0704 (inf) loss_scale 4096.0000 (5344.3048) mem 16721MB [2024-08-07 11:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][440/625] eta 0:01:27 lr 0.001095 wd 0.0500 time 0.4121 (0.4727) data time 0.0009 (0.0031) model time 0.4111 (0.4153) loss 3.6801 (3.2542) grad_norm 1.4906 (inf) loss_scale 4096.0000 (5315.2744) mem 16721MB [2024-08-07 11:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [73/300][450/625] eta 0:01:22 lr 0.001095 wd 0.0500 time 0.4122 (0.4715) data time 0.0011 (0.0031) model time 0.4111 (0.4153) loss 2.7427 (3.2584) grad_norm 1.4227 (inf) loss_scale 4096.0000 (5287.5636) mem 16721MB [2024-08-07 11:58:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][460/625] eta 0:01:17 lr 0.001095 wd 0.0500 time 0.4110 (0.4702) data time 0.0007 (0.0030) model time 0.4102 (0.4152) loss 3.7535 (3.2573) grad_norm 1.1846 (inf) loss_scale 4096.0000 (5261.0844) mem 16721MB [2024-08-07 11:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][470/625] eta 0:01:12 lr 0.001095 wd 0.0500 time 0.4166 (0.4689) data time 0.0009 (0.0030) model time 0.4157 (0.4151) loss 3.5681 (3.2522) grad_norm 1.0990 (inf) loss_scale 4096.0000 (5235.7565) mem 16721MB [2024-08-07 11:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][480/625] eta 0:01:07 lr 0.001095 wd 0.0500 time 0.4163 (0.4678) data time 0.0010 (0.0029) model time 0.4153 (0.4151) loss 2.9599 (3.2439) grad_norm 1.1987 (inf) loss_scale 4096.0000 (5211.5064) mem 16721MB [2024-08-07 11:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][490/625] eta 0:01:02 lr 0.001095 wd 0.0500 time 0.4110 (0.4666) data time 0.0012 (0.0029) model time 0.4098 (0.4151) loss 2.8852 (3.2421) grad_norm 1.9347 (inf) loss_scale 4096.0000 (5188.2667) mem 16721MB [2024-08-07 11:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][500/625] eta 0:00:58 lr 0.001095 wd 0.0500 time 0.4134 (0.4656) data time 0.0008 (0.0029) model time 0.4126 (0.4151) loss 3.0262 (3.2476) grad_norm 1.3378 (inf) loss_scale 4096.0000 (5165.9755) mem 16721MB [2024-08-07 11:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][510/625] eta 0:00:53 lr 0.001095 wd 0.0500 time 0.4085 (0.4646) data time 0.0008 (0.0028) model time 0.4077 (0.4150) loss 2.3423 (3.2465) grad_norm 1.1934 (inf) loss_scale 4096.0000 (5144.5760) mem 16721MB [2024-08-07 
11:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][520/625] eta 0:00:48 lr 0.001095 wd 0.0500 time 0.4117 (0.4636) data time 0.0011 (0.0028) model time 0.4105 (0.4150) loss 3.4854 (3.2491) grad_norm 1.5447 (inf) loss_scale 4096.0000 (5124.0157) mem 16721MB [2024-08-07 11:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][530/625] eta 0:00:43 lr 0.001095 wd 0.0500 time 0.4138 (0.4631) data time 0.0009 (0.0028) model time 0.4129 (0.4154) loss 3.4351 (3.2512) grad_norm 1.9826 (inf) loss_scale 4096.0000 (5104.2462) mem 16721MB [2024-08-07 11:58:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][540/625] eta 0:00:39 lr 0.001095 wd 0.0500 time 0.4122 (0.4623) data time 0.0010 (0.0027) model time 0.4111 (0.4155) loss 3.0998 (3.2458) grad_norm 1.1569 (inf) loss_scale 4096.0000 (5085.2226) mem 16721MB [2024-08-07 11:58:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][550/625] eta 0:00:34 lr 0.001095 wd 0.0500 time 0.4202 (0.4614) data time 0.0008 (0.0027) model time 0.4195 (0.4155) loss 3.2793 (3.2449) grad_norm 1.3778 (inf) loss_scale 4096.0000 (5066.9037) mem 16721MB [2024-08-07 11:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][560/625] eta 0:00:29 lr 0.001095 wd 0.0500 time 0.4210 (0.4607) data time 0.0019 (0.0027) model time 0.4191 (0.4155) loss 2.2497 (3.2452) grad_norm 1.1059 (inf) loss_scale 4096.0000 (5049.2509) mem 16721MB [2024-08-07 11:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][570/625] eta 0:00:25 lr 0.001095 wd 0.0500 time 0.4130 (0.4601) data time 0.0011 (0.0027) model time 0.4119 (0.4158) loss 3.4565 (3.2497) grad_norm 1.2409 (inf) loss_scale 4096.0000 (5032.2286) mem 16721MB [2024-08-07 11:59:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][580/625] eta 0:00:20 lr 0.001095 wd 0.0500 time 0.4124 (0.4594) data time 0.0010 (0.0026) model time 0.4113 (0.4158) loss 3.2544 (3.2512) grad_norm 1.6480 
(inf) loss_scale 4096.0000 (5015.8035) mem 16721MB [2024-08-07 11:59:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][590/625] eta 0:00:16 lr 0.001094 wd 0.0500 time 0.4114 (0.4586) data time 0.0008 (0.0026) model time 0.4106 (0.4157) loss 3.0676 (3.2499) grad_norm 1.2691 (inf) loss_scale 4096.0000 (4999.9448) mem 16721MB [2024-08-07 11:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][600/625] eta 0:00:11 lr 0.001094 wd 0.0500 time 0.4165 (0.4579) data time 0.0009 (0.0026) model time 0.4157 (0.4158) loss 4.0078 (3.2540) grad_norm 1.6080 (inf) loss_scale 4096.0000 (4984.6237) mem 16721MB [2024-08-07 11:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][610/625] eta 0:00:06 lr 0.001094 wd 0.0500 time 0.4034 (0.4571) data time 0.0008 (0.0025) model time 0.4026 (0.4157) loss 3.2009 (3.2511) grad_norm 1.5806 (inf) loss_scale 4096.0000 (4969.8133) mem 16721MB [2024-08-07 11:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [73/300][620/625] eta 0:00:02 lr 0.001094 wd 0.0500 time 0.4054 (0.4564) data time 0.0006 (0.0025) model time 0.4049 (0.4156) loss 3.4648 (3.2500) grad_norm 1.1778 (inf) loss_scale 4096.0000 (4955.4885) mem 16721MB [2024-08-07 11:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 73 training takes 0:04:39 [2024-08-07 11:59:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 11:59:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 11:59:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.577 (0.577) Loss 0.5806 (0.5806) Acc@1 86.768 (86.768) Acc@5 97.852 (97.852) Mem 16721MB [2024-08-07 11:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.164) Loss 0.9766 (0.7257) Acc@1 76.270 (83.132) Acc@5 93.604 (96.755) Mem 16721MB [2024-08-07 11:59:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 1.1133 (0.8667) Acc@1 72.656 (79.704) Acc@5 92.383 (95.199) Mem 16721MB [2024-08-07 11:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.443 Acc@5 95.206 [2024-08-07 11:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.4% [2024-08-07 11:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.44% [2024-08-07 11:59:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-07 11:59:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-07 11:59:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.578 (0.578) Loss 0.5234 (0.5234) Acc@1 88.184 (88.184) Acc@5 98.438 (98.438) Mem 16721MB [2024-08-07 11:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.121 (0.165) Loss 0.8657 (0.6610) Acc@1 78.809 (84.965) Acc@5 95.410 (97.368) Mem 16721MB [2024-08-07 11:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.142) Loss 0.9941 (0.7889) Acc@1 74.951 (81.589) Acc@5 94.238 (95.964) Mem 16721MB [2024-08-07 11:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.332 Acc@5 95.957 [2024-08-07 11:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-08-07 11:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.33% [2024-08-07 11:59:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 11:59:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 11:59:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][0/625] eta 0:11:03 lr 0.001094 wd 0.0500 time 1.0616 (1.0616) data time 0.4651 (0.4651) model time 0.0000 (0.0000) loss 3.9212 (3.9212) grad_norm 2.4308 (2.4308) loss_scale 4096.0000 (4096.0000) mem 16719MB [2024-08-07 11:59:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][10/625] eta 0:04:50 lr 0.001094 wd 0.0500 time 0.4201 (0.4721) data time 0.0010 (0.0433) model time 0.0000 (0.0000) loss 3.1759 (3.5203) grad_norm 1.2587 (1.5399) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 11:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][20/625] eta 0:04:29 lr 0.001094 wd 0.0500 time 0.4228 (0.4455) data time 0.0011 (0.0232) model time 0.0000 (0.0000) loss 2.6433 (3.4205) grad_norm 1.8744 (1.6880) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][30/625] eta 0:04:22 lr 0.001094 wd 0.0500 time 0.4095 (0.4417) data time 0.0009 (0.0160) model time 0.0000 (0.0000) loss 2.8489 (3.3014) grad_norm 1.7420 (1.6413) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][40/625] eta 0:04:14 lr 0.001094 wd 0.0500 time 0.4231 (0.4355) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 3.3413 (3.3022) grad_norm 1.2319 (1.7701) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][50/625] eta 0:04:07 lr 0.001094 wd 0.0500 time 0.4104 (0.4310) data time 0.0007 (0.0102) model time 0.0000 (0.0000) loss 3.3558 (3.2319) grad_norm 1.4617 (1.7495) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][60/625] eta 0:04:02 lr 0.001094 wd 0.0500 time 0.4180 (0.4286) data time 0.0008 (0.0087) model time 0.4172 (0.4156) loss 3.3353 (3.2938) 
grad_norm 1.8504 (1.7491) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][70/625] eta 0:03:56 lr 0.001094 wd 0.0500 time 0.4115 (0.4267) data time 0.0009 (0.0076) model time 0.4106 (0.4149) loss 3.5838 (3.2864) grad_norm 2.1898 (1.7375) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][80/625] eta 0:03:51 lr 0.001094 wd 0.0500 time 0.4094 (0.4255) data time 0.0010 (0.0068) model time 0.4084 (0.4153) loss 2.8926 (3.2580) grad_norm 1.4172 (1.7374) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][90/625] eta 0:03:46 lr 0.001094 wd 0.0500 time 0.4115 (0.4243) data time 0.0009 (0.0062) model time 0.4107 (0.4147) loss 3.5389 (3.2376) grad_norm 1.3954 (1.6968) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][100/625] eta 0:03:42 lr 0.001094 wd 0.0500 time 0.4093 (0.4233) data time 0.0010 (0.0056) model time 0.4082 (0.4146) loss 3.7574 (3.2052) grad_norm 1.4494 (1.6760) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][110/625] eta 0:03:37 lr 0.001094 wd 0.0500 time 0.4091 (0.4225) data time 0.0008 (0.0052) model time 0.4083 (0.4144) loss 3.1752 (3.2025) grad_norm 1.6943 (1.6677) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][120/625] eta 0:03:33 lr 0.001094 wd 0.0500 time 0.4178 (0.4219) data time 0.0008 (0.0049) model time 0.4170 (0.4143) loss 3.3180 (3.2188) grad_norm 1.3730 (1.6650) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][130/625] eta 0:03:28 lr 0.001093 wd 0.0500 time 0.4154 
(0.4215) data time 0.0011 (0.0046) model time 0.4143 (0.4144) loss 3.1501 (3.2216) grad_norm 1.0319 (1.6490) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][140/625] eta 0:03:24 lr 0.001093 wd 0.0500 time 0.4177 (0.4225) data time 0.0010 (0.0043) model time 0.4166 (0.4167) loss 2.4985 (3.2059) grad_norm 1.8607 (1.6346) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][150/625] eta 0:03:20 lr 0.001093 wd 0.0500 time 0.4172 (0.4220) data time 0.0007 (0.0041) model time 0.4164 (0.4164) loss 3.1944 (3.2230) grad_norm 1.6333 (1.6098) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][160/625] eta 0:03:16 lr 0.001093 wd 0.0500 time 0.4153 (0.4216) data time 0.0010 (0.0039) model time 0.4143 (0.4162) loss 2.2749 (3.2289) grad_norm 1.6941 (1.5907) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:00:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][170/625] eta 0:03:11 lr 0.001093 wd 0.0500 time 0.4280 (0.4218) data time 0.0010 (0.0038) model time 0.4270 (0.4169) loss 2.1586 (3.2318) grad_norm 1.0438 (1.5681) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][180/625] eta 0:03:07 lr 0.001093 wd 0.0500 time 0.4168 (0.4214) data time 0.0007 (0.0036) model time 0.4161 (0.4167) loss 2.9929 (3.2340) grad_norm 2.5032 (1.5659) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][190/625] eta 0:03:03 lr 0.001093 wd 0.0500 time 0.4246 (0.4211) data time 0.0009 (0.0035) model time 0.4237 (0.4165) loss 3.7924 (3.2300) grad_norm 1.3791 (1.5630) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:12 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [74/300][200/625] eta 0:02:58 lr 0.001093 wd 0.0500 time 0.4146 (0.4208) data time 0.0008 (0.0034) model time 0.4138 (0.4164) loss 3.8155 (3.2135) grad_norm 1.2547 (1.5620) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][210/625] eta 0:02:54 lr 0.001093 wd 0.0500 time 0.4110 (0.4206) data time 0.0008 (0.0033) model time 0.4102 (0.4162) loss 2.2460 (3.2143) grad_norm 1.2492 (1.5593) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][220/625] eta 0:02:50 lr 0.001093 wd 0.0500 time 0.4411 (0.4213) data time 0.0011 (0.0032) model time 0.4400 (0.4174) loss 2.5136 (3.2012) grad_norm 1.1805 (1.5484) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][230/625] eta 0:02:46 lr 0.001093 wd 0.0500 time 0.4158 (0.4211) data time 0.0008 (0.0031) model time 0.4150 (0.4172) loss 2.3557 (3.1927) grad_norm 1.3555 (1.5389) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][240/625] eta 0:02:42 lr 0.001093 wd 0.0500 time 0.4117 (0.4209) data time 0.0008 (0.0030) model time 0.4109 (0.4171) loss 3.7425 (3.2046) grad_norm 1.1671 (1.5369) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][250/625] eta 0:02:37 lr 0.001093 wd 0.0500 time 0.4167 (0.4206) data time 0.0011 (0.0029) model time 0.4156 (0.4169) loss 3.4370 (3.2030) grad_norm 2.9383 (1.5366) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][260/625] eta 0:02:33 lr 0.001093 wd 0.0500 time 0.4096 (0.4206) data time 0.0007 (0.0028) model time 0.4089 (0.4170) loss 3.3641 (3.2068) grad_norm 1.2858 (1.5343) loss_scale 4096.0000 
(4096.0000) mem 16721MB [2024-08-07 12:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][270/625] eta 0:02:29 lr 0.001093 wd 0.0500 time 0.4100 (0.4204) data time 0.0008 (0.0028) model time 0.4092 (0.4169) loss 2.4383 (3.2026) grad_norm 1.6004 (1.5298) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][280/625] eta 0:02:24 lr 0.001093 wd 0.0500 time 0.4171 (0.4203) data time 0.0010 (0.0027) model time 0.4161 (0.4168) loss 3.4282 (3.2030) grad_norm 1.2989 (1.5287) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][290/625] eta 0:02:20 lr 0.001093 wd 0.0500 time 0.4095 (0.4200) data time 0.0007 (0.0027) model time 0.4088 (0.4166) loss 3.4699 (3.2069) grad_norm 1.8650 (1.5276) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][300/625] eta 0:02:16 lr 0.001092 wd 0.0500 time 0.4148 (0.4200) data time 0.0012 (0.0026) model time 0.4137 (0.4167) loss 4.0985 (3.2039) grad_norm 1.6242 (1.5393) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][310/625] eta 0:02:12 lr 0.001092 wd 0.0500 time 0.4178 (0.4199) data time 0.0008 (0.0026) model time 0.4170 (0.4167) loss 3.2897 (3.2144) grad_norm 1.4136 (1.5348) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][320/625] eta 0:02:08 lr 0.001092 wd 0.0500 time 0.4121 (0.4197) data time 0.0008 (0.0025) model time 0.4113 (0.4166) loss 2.5934 (3.2206) grad_norm 1.0546 (1.5236) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][330/625] eta 0:02:03 lr 0.001092 wd 0.0500 time 0.4115 (0.4196) data time 0.0008 (0.0025) model time 
0.4107 (0.4165) loss 3.7296 (3.2164) grad_norm 1.8225 (1.5230) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][340/625] eta 0:01:59 lr 0.001092 wd 0.0500 time 0.4125 (0.4195) data time 0.0010 (0.0024) model time 0.4115 (0.4164) loss 3.5590 (3.2053) grad_norm 1.5859 (1.5227) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][350/625] eta 0:01:55 lr 0.001092 wd 0.0500 time 0.4128 (0.4194) data time 0.0011 (0.0024) model time 0.4117 (0.4163) loss 3.1742 (3.2057) grad_norm 1.8615 (1.5191) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][360/625] eta 0:01:51 lr 0.001092 wd 0.0500 time 0.4100 (0.4199) data time 0.0008 (0.0024) model time 0.4092 (0.4170) loss 3.7305 (3.2144) grad_norm 1.7282 (1.5233) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][370/625] eta 0:01:47 lr 0.001092 wd 0.0500 time 0.4286 (0.4198) data time 0.0011 (0.0023) model time 0.4275 (0.4169) loss 3.5133 (3.2132) grad_norm 1.6332 (1.5280) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][380/625] eta 0:01:42 lr 0.001092 wd 0.0500 time 0.4140 (0.4196) data time 0.0008 (0.0023) model time 0.4132 (0.4168) loss 3.7149 (3.2125) grad_norm 1.6485 (1.5310) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][390/625] eta 0:01:38 lr 0.001092 wd 0.0500 time 0.4128 (0.4195) data time 0.0008 (0.0023) model time 0.4120 (0.4168) loss 2.3267 (3.2082) grad_norm 1.4448 (1.5247) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][400/625] eta 0:01:34 
lr 0.001092 wd 0.0500 time 0.4215 (0.4195) data time 0.0011 (0.0022) model time 0.4205 (0.4168) loss 3.8634 (3.2147) grad_norm 1.5866 (1.5218) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][410/625] eta 0:01:30 lr 0.001092 wd 0.0500 time 0.4133 (0.4194) data time 0.0008 (0.0022) model time 0.4125 (0.4167) loss 2.3602 (3.2013) grad_norm 1.5536 (1.5286) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][420/625] eta 0:01:25 lr 0.001092 wd 0.0500 time 0.4336 (0.4194) data time 0.0008 (0.0022) model time 0.4327 (0.4167) loss 3.9427 (3.1990) grad_norm 1.5109 (1.5322) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][430/625] eta 0:01:21 lr 0.001092 wd 0.0500 time 0.4180 (0.4192) data time 0.0011 (0.0021) model time 0.4169 (0.4166) loss 3.0026 (3.2038) grad_norm 1.9860 (1.5282) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][440/625] eta 0:01:17 lr 0.001092 wd 0.0500 time 0.6237 (0.4200) data time 0.0010 (0.0021) model time 0.6228 (0.4175) loss 3.5043 (3.2056) grad_norm 0.9148 (1.5245) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:02:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][450/625] eta 0:01:13 lr 0.001092 wd 0.0500 time 0.4129 (0.4199) data time 0.0011 (0.0021) model time 0.4117 (0.4175) loss 4.0099 (3.2123) grad_norm 1.2235 (1.5256) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][460/625] eta 0:01:09 lr 0.001091 wd 0.0500 time 0.4089 (0.4203) data time 0.0011 (0.0021) model time 0.4078 (0.4179) loss 3.0500 (3.2077) grad_norm 1.4978 (1.5232) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:05 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][470/625] eta 0:01:05 lr 0.001091 wd 0.0500 time 0.4295 (0.4203) data time 0.0008 (0.0021) model time 0.4286 (0.4180) loss 2.5396 (3.2092) grad_norm 1.2391 (1.5263) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][480/625] eta 0:01:00 lr 0.001091 wd 0.0500 time 0.4241 (0.4202) data time 0.0007 (0.0021) model time 0.4233 (0.4179) loss 3.0376 (3.2115) grad_norm 1.9482 (1.5251) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][490/625] eta 0:00:56 lr 0.001091 wd 0.0500 time 0.4236 (0.4202) data time 0.0008 (0.0020) model time 0.4227 (0.4178) loss 3.0046 (3.2117) grad_norm 1.2163 (1.5272) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][500/625] eta 0:00:52 lr 0.001091 wd 0.0500 time 0.4321 (0.4201) data time 0.0010 (0.0020) model time 0.4312 (0.4177) loss 3.6050 (3.2144) grad_norm 1.6014 (1.5248) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][510/625] eta 0:00:48 lr 0.001091 wd 0.0500 time 0.4176 (0.4204) data time 0.0009 (0.0020) model time 0.4167 (0.4181) loss 3.5493 (3.2122) grad_norm 1.2483 (1.5234) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][520/625] eta 0:00:44 lr 0.001091 wd 0.0500 time 0.4109 (0.4203) data time 0.0010 (0.0020) model time 0.4098 (0.4181) loss 2.9689 (3.2132) grad_norm 1.5207 (1.5234) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][530/625] eta 0:00:39 lr 0.001091 wd 0.0500 time 0.4147 (0.4203) data time 0.0010 (0.0020) model time 0.4137 (0.4181) loss 3.0855 (3.2152) grad_norm 
0.9681 (1.5253) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][540/625] eta 0:00:35 lr 0.001091 wd 0.0500 time 0.4262 (0.4203) data time 0.0012 (0.0020) model time 0.4250 (0.4181) loss 3.8256 (3.2158) grad_norm 1.2211 (1.5230) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][550/625] eta 0:00:31 lr 0.001091 wd 0.0500 time 0.4130 (0.4202) data time 0.0012 (0.0019) model time 0.4118 (0.4180) loss 3.3893 (3.2131) grad_norm 1.0700 (1.5203) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][560/625] eta 0:00:27 lr 0.001091 wd 0.0500 time 0.4129 (0.4202) data time 0.0008 (0.0019) model time 0.4121 (0.4180) loss 3.6275 (3.2163) grad_norm 1.1665 (1.5162) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][570/625] eta 0:00:23 lr 0.001091 wd 0.0500 time 0.4124 (0.4201) data time 0.0008 (0.0019) model time 0.4116 (0.4180) loss 3.6150 (3.2240) grad_norm 1.8037 (1.5140) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][580/625] eta 0:00:18 lr 0.001091 wd 0.0500 time 0.4128 (0.4201) data time 0.0008 (0.0019) model time 0.4120 (0.4179) loss 2.9048 (3.2229) grad_norm 1.1697 (1.5125) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][590/625] eta 0:00:14 lr 0.001091 wd 0.0500 time 0.4068 (0.4203) data time 0.0008 (0.0019) model time 0.4060 (0.4182) loss 3.7567 (3.2268) grad_norm 1.5873 (1.5113) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][600/625] eta 0:00:10 lr 0.001091 wd 0.0500 time 0.4141 (0.4202) data 
time 0.0007 (0.0019) model time 0.4133 (0.4181) loss 3.9999 (3.2297) grad_norm 1.2624 (1.5092) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][610/625] eta 0:00:06 lr 0.001091 wd 0.0500 time 0.4033 (0.4201) data time 0.0008 (0.0019) model time 0.4025 (0.4180) loss 3.1976 (3.2271) grad_norm 1.3733 (1.5108) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [74/300][620/625] eta 0:00:02 lr 0.001090 wd 0.0500 time 0.4070 (0.4199) data time 0.0007 (0.0018) model time 0.4062 (0.4178) loss 3.5540 (3.2243) grad_norm 1.7369 (1.5098) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:09 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 74 training takes 0:04:22 [2024-08-07 12:04:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 12:04:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 12:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.592 (0.592) Loss 0.5811 (0.5811) Acc@1 86.865 (86.865) Acc@5 98.047 (98.047) Mem 16721MB [2024-08-07 12:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.168) Loss 0.9658 (0.7346) Acc@1 76.855 (83.363) Acc@5 93.457 (96.689) Mem 16721MB [2024-08-07 12:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.144) Loss 1.0596 (0.8790) Acc@1 73.291 (79.801) Acc@5 92.920 (95.038) Mem 16721MB [2024-08-07 12:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.615 Acc@5 95.070 [2024-08-07 12:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.6% [2024-08-07 12:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.61% [2024-08-07 12:04:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-07 12:04:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-07 12:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.584 (0.584) Loss 0.5234 (0.5234) Acc@1 88.086 (88.086) Acc@5 98.438 (98.438) Mem 16721MB [2024-08-07 12:04:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.165) Loss 0.8638 (0.6593) Acc@1 78.906 (84.988) Acc@5 95.361 (97.385) Mem 16721MB [2024-08-07 12:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.141) Loss 0.9917 (0.7867) Acc@1 75.195 (81.617) Acc@5 94.385 (95.989) Mem 16721MB [2024-08-07 12:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.364 Acc@5 95.983 [2024-08-07 12:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-07 12:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.36% [2024-08-07 12:04:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 12:04:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 12:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][0/625] eta 0:09:41 lr 0.001090 wd 0.0500 time 0.9303 (0.9303) data time 0.4943 (0.4943) model time 0.0000 (0.0000) loss 3.2423 (3.2423) grad_norm 1.0717 (1.0717) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][10/625] eta 0:04:45 lr 0.001090 wd 0.0500 time 0.4332 (0.4647) data time 0.0011 (0.0460) model time 0.0000 (0.0000) loss 1.9736 (3.0686) grad_norm 1.1551 (1.2975) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][20/625] eta 0:04:26 lr 0.001090 wd 0.0500 time 0.4099 (0.4412) data time 0.0011 (0.0247) model time 0.0000 (0.0000) loss 3.4673 (3.2286) grad_norm 1.5728 (1.3048) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][30/625] eta 0:04:17 lr 0.001090 wd 0.0500 time 0.4146 (0.4330) data time 0.0017 (0.0171) model time 0.0000 (0.0000) loss 3.0330 (3.2862) grad_norm 1.3514 (1.3228) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][40/625] eta 0:04:11 lr 0.001090 wd 0.0500 time 0.4179 (0.4293) data time 0.0010 (0.0132) model time 0.0000 (0.0000) loss 3.8829 (3.3500) grad_norm 2.2208 (1.3927) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][50/625] eta 0:04:05 lr 0.001090 wd 0.0500 time 0.4206 (0.4271) data time 0.0011 (0.0108) model time 0.0000 (0.0000) loss 3.7273 (3.3604) grad_norm 1.2980 (1.4727) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][60/625] eta 0:04:00 lr 0.001090 wd 0.0500 time 0.4133 (0.4252) data time 0.0010 (0.0092) model time 0.4123 (0.4143) loss 3.5805 (3.3372) 
grad_norm 1.3490 (1.4600) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][70/625] eta 0:03:55 lr 0.001090 wd 0.0500 time 0.4185 (0.4240) data time 0.0008 (0.0081) model time 0.4177 (0.4152) loss 4.1887 (3.3147) grad_norm 0.9843 (1.4359) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:04:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][80/625] eta 0:03:50 lr 0.001090 wd 0.0500 time 0.4164 (0.4236) data time 0.0008 (0.0072) model time 0.4156 (0.4166) loss 3.3951 (3.2606) grad_norm 1.3613 (1.4320) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][90/625] eta 0:03:46 lr 0.001090 wd 0.0500 time 0.4098 (0.4243) data time 0.0011 (0.0065) model time 0.4088 (0.4197) loss 3.3493 (3.2531) grad_norm 1.5263 (1.4554) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][100/625] eta 0:03:44 lr 0.001090 wd 0.0500 time 0.4080 (0.4280) data time 0.0010 (0.0060) model time 0.4069 (0.4279) loss 3.1913 (3.2400) grad_norm 1.4416 (1.4457) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][110/625] eta 0:03:39 lr 0.001090 wd 0.0500 time 0.4130 (0.4269) data time 0.0008 (0.0055) model time 0.4122 (0.4258) loss 4.0376 (3.2320) grad_norm 2.2238 (1.4528) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][120/625] eta 0:03:35 lr 0.001090 wd 0.0500 time 0.4266 (0.4261) data time 0.0010 (0.0052) model time 0.4256 (0.4243) loss 3.7894 (3.2410) grad_norm 1.7770 (1.4543) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][130/625] eta 0:03:30 lr 0.001090 wd 0.0500 time 0.4068 
(0.4252) data time 0.0010 (0.0049) model time 0.4057 (0.4230) loss 3.6577 (3.2506) grad_norm 1.6084 (1.4483) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][140/625] eta 0:03:25 lr 0.001090 wd 0.0500 time 0.4221 (0.4246) data time 0.0011 (0.0046) model time 0.4210 (0.4221) loss 3.0866 (3.2559) grad_norm 1.9100 (1.4548) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][150/625] eta 0:03:21 lr 0.001090 wd 0.0500 time 0.4199 (0.4242) data time 0.0011 (0.0044) model time 0.4188 (0.4216) loss 3.1464 (3.2552) grad_norm 1.2580 (1.4618) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][160/625] eta 0:03:16 lr 0.001089 wd 0.0500 time 0.4126 (0.4236) data time 0.0010 (0.0042) model time 0.4116 (0.4209) loss 3.2669 (3.2476) grad_norm 1.3367 (1.4556) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][170/625] eta 0:03:12 lr 0.001089 wd 0.0500 time 0.4161 (0.4239) data time 0.0011 (0.0040) model time 0.4150 (0.4215) loss 3.7698 (3.2623) grad_norm 1.2189 (1.4620) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][180/625] eta 0:03:08 lr 0.001089 wd 0.0500 time 0.4159 (0.4235) data time 0.0010 (0.0038) model time 0.4149 (0.4210) loss 3.3629 (3.2573) grad_norm 1.6422 (1.4618) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][190/625] eta 0:03:04 lr 0.001089 wd 0.0500 time 0.4142 (0.4230) data time 0.0008 (0.0037) model time 0.4134 (0.4205) loss 3.7227 (3.2501) grad_norm 0.8377 (1.4590) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [75/300][200/625] eta 0:02:59 lr 0.001089 wd 0.0500 time 0.4234 (0.4227) data time 0.0018 (0.0036) model time 0.4216 (0.4202) loss 3.5935 (3.2480) grad_norm 1.6746 (1.4518) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][210/625] eta 0:02:55 lr 0.001089 wd 0.0500 time 0.4266 (0.4225) data time 0.0008 (0.0034) model time 0.4259 (0.4199) loss 3.1425 (3.2531) grad_norm 1.2465 (1.4589) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][220/625] eta 0:02:50 lr 0.001089 wd 0.0500 time 0.4104 (0.4221) data time 0.0008 (0.0033) model time 0.4095 (0.4195) loss 4.0751 (3.2646) grad_norm 1.1406 (1.4591) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:05:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][230/625] eta 0:02:46 lr 0.001089 wd 0.0500 time 0.4127 (0.4218) data time 0.0012 (0.0032) model time 0.4115 (0.4192) loss 3.4252 (3.2695) grad_norm 1.4169 (1.4607) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][240/625] eta 0:02:42 lr 0.001089 wd 0.0500 time 0.4139 (0.4221) data time 0.0008 (0.0031) model time 0.4131 (0.4197) loss 2.6432 (3.2500) grad_norm 1.3903 (1.4586) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][250/625] eta 0:02:38 lr 0.001089 wd 0.0500 time 0.4127 (0.4219) data time 0.0010 (0.0031) model time 0.4117 (0.4195) loss 2.0803 (3.2331) grad_norm 1.2622 (1.4508) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][260/625] eta 0:02:33 lr 0.001089 wd 0.0500 time 0.4143 (0.4217) data time 0.0011 (0.0030) model time 0.4132 (0.4193) loss 3.5612 (3.2314) grad_norm 1.6769 (1.4462) loss_scale 4096.0000 
(4096.0000) mem 16721MB [2024-08-07 12:06:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][270/625] eta 0:02:29 lr 0.001089 wd 0.0500 time 0.4168 (0.4215) data time 0.0012 (0.0029) model time 0.4156 (0.4191) loss 2.1042 (3.2326) grad_norm 1.4644 (1.4411) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][280/625] eta 0:02:25 lr 0.001089 wd 0.0500 time 0.4130 (0.4213) data time 0.0008 (0.0029) model time 0.4122 (0.4189) loss 3.2467 (3.2394) grad_norm 1.2138 (1.4394) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][290/625] eta 0:02:21 lr 0.001089 wd 0.0500 time 0.4120 (0.4212) data time 0.0011 (0.0028) model time 0.4109 (0.4188) loss 2.9488 (3.2426) grad_norm 1.0012 (1.4344) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][300/625] eta 0:02:16 lr 0.001089 wd 0.0500 time 0.4172 (0.4211) data time 0.0008 (0.0028) model time 0.4163 (0.4187) loss 3.9018 (3.2450) grad_norm 1.4556 (1.4316) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][310/625] eta 0:02:12 lr 0.001089 wd 0.0500 time 0.4192 (0.4216) data time 0.0010 (0.0027) model time 0.4182 (0.4194) loss 3.3846 (3.2361) grad_norm 1.5497 (1.4275) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][320/625] eta 0:02:08 lr 0.001088 wd 0.0500 time 0.4099 (0.4216) data time 0.0011 (0.0027) model time 0.4088 (0.4194) loss 2.2750 (3.2319) grad_norm 1.4732 (1.4283) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][330/625] eta 0:02:04 lr 0.001088 wd 0.0500 time 0.4263 (0.4214) data time 0.0008 (0.0026) model time 
0.4255 (0.4192) loss 2.8869 (3.2189) grad_norm 1.9315 (1.4344) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][340/625] eta 0:02:00 lr 0.001088 wd 0.0500 time 0.4181 (0.4212) data time 0.0011 (0.0026) model time 0.4170 (0.4190) loss 3.0397 (3.2215) grad_norm 1.2798 (1.4292) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][350/625] eta 0:01:55 lr 0.001088 wd 0.0500 time 0.4404 (0.4210) data time 0.0008 (0.0025) model time 0.4396 (0.4189) loss 3.3175 (3.2246) grad_norm 1.4352 (1.4217) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][360/625] eta 0:01:51 lr 0.001088 wd 0.0500 time 0.4235 (0.4209) data time 0.0010 (0.0025) model time 0.4224 (0.4187) loss 4.0264 (3.2264) grad_norm 0.9426 (1.4202) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:06:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][370/625] eta 0:01:47 lr 0.001088 wd 0.0500 time 0.4128 (0.4207) data time 0.0009 (0.0025) model time 0.4119 (0.4185) loss 3.4739 (3.2309) grad_norm 1.3760 (1.4205) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][380/625] eta 0:01:43 lr 0.001088 wd 0.0500 time 0.4175 (0.4205) data time 0.0010 (0.0024) model time 0.4166 (0.4184) loss 2.9535 (3.2315) grad_norm 1.2372 (1.4182) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][390/625] eta 0:01:38 lr 0.001088 wd 0.0500 time 0.4245 (0.4205) data time 0.0008 (0.0024) model time 0.4236 (0.4183) loss 3.6599 (3.2311) grad_norm 1.2601 (1.4196) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][400/625] eta 0:01:34 
lr 0.001088 wd 0.0500 time 0.4141 (0.4204) data time 0.0010 (0.0024) model time 0.4131 (0.4183) loss 2.2554 (3.2336) grad_norm 1.2898 (1.4187) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][410/625] eta 0:01:30 lr 0.001088 wd 0.0500 time 0.4138 (0.4204) data time 0.0012 (0.0023) model time 0.4126 (0.4183) loss 3.4739 (3.2259) grad_norm 1.3613 (1.4239) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][420/625] eta 0:01:26 lr 0.001088 wd 0.0500 time 0.4123 (0.4203) data time 0.0013 (0.0023) model time 0.4109 (0.4182) loss 3.5691 (3.2174) grad_norm 1.3030 (1.4295) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][430/625] eta 0:01:22 lr 0.001088 wd 0.0500 time 0.4146 (0.4207) data time 0.0008 (0.0023) model time 0.4139 (0.4187) loss 3.2921 (3.2153) grad_norm 1.3991 (1.4330) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][440/625] eta 0:01:17 lr 0.001088 wd 0.0500 time 0.4135 (0.4207) data time 0.0010 (0.0023) model time 0.4125 (0.4188) loss 3.3265 (3.2147) grad_norm 1.2933 (1.4361) loss_scale 4096.0000 (4096.0000) mem 16721MB [2024-08-07 12:07:29 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 12:07:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 12:07:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 12:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-07 12:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-07 12:09:52 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-07 12:10:05 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-07 12:10:05 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-07 12:10:07 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-07 12:10:09 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-07 12:10:09 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 75) [2024-08-07 12:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-07 12:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][450/625] eta 0:12:24 lr 0.001088 wd 0.0500 time 0.4672 (4.2543) data time 0.0009 (0.1402) model time 0.4663 (4.1140) loss 3.8214 (3.7015) grad_norm 2.0329 (1.5111) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][460/625] eta 0:04:45 lr 0.001088 wd 0.0500 time 0.4671 (1.7311) data time 0.0009 (0.0475) model time 0.4662 (1.6836) loss 3.9041 (3.4539) grad_norm 1.8110 (1.4429) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][470/625] eta 0:03:09 lr 0.001088 wd 0.0500 time 0.4711 (1.2257) data time 0.0011 (0.0290) model time 0.4700 (1.1967) loss 3.7087 (3.4642) grad_norm 
1.5937 (1.4396) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:10:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][480/625] eta 0:02:27 lr 0.001087 wd 0.0500 time 0.4634 (1.0171) data time 0.0010 (0.0210) model time 0.4624 (0.9961) loss 3.3117 (3.4467) grad_norm 1.4798 (1.4432) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:10:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][490/625] eta 0:02:01 lr 0.001087 wd 0.0500 time 0.4648 (0.8996) data time 0.0010 (0.0166) model time 0.4638 (0.8830) loss 4.0937 (3.4176) grad_norm 1.5669 (1.4289) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][500/625] eta 0:01:42 lr 0.001087 wd 0.0500 time 0.4667 (0.8213) data time 0.0008 (0.0138) model time 0.4659 (0.8075) loss 2.7764 (3.3959) grad_norm 1.2237 (1.4367) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][510/625] eta 0:01:28 lr 0.001087 wd 0.0500 time 0.4715 (0.7670) data time 0.0010 (0.0119) model time 0.4705 (0.7551) loss 3.6002 (3.3906) grad_norm 1.3471 (1.4889) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][520/625] eta 0:01:16 lr 0.001087 wd 0.0500 time 0.4665 (0.7271) data time 0.0010 (0.0105) model time 0.4655 (0.7166) loss 2.2772 (3.3303) grad_norm 1.2959 (1.4727) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][530/625] eta 0:01:06 lr 0.001087 wd 0.0500 time 0.4688 (0.6971) data time 0.0008 (0.0093) model time 0.4680 (0.6878) loss 3.2834 (3.3272) grad_norm 1.1442 (1.4563) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][540/625] eta 0:00:57 lr 0.001087 wd 0.0500 time 0.4664 (0.6729) data 
time 0.0010 (0.0085) model time 0.4654 (0.6644) loss 3.8251 (3.3325) grad_norm 1.4974 (1.4575) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][550/625] eta 0:00:48 lr 0.001087 wd 0.0500 time 0.4645 (0.6532) data time 0.0009 (0.0078) model time 0.4636 (0.6455) loss 2.7724 (3.3554) grad_norm 1.2701 (1.4421) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][560/625] eta 0:00:41 lr 0.001087 wd 0.0500 time 0.4664 (0.6375) data time 0.0008 (0.0072) model time 0.4656 (0.6303) loss 3.0417 (3.3482) grad_norm 1.5165 (1.4246) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][570/625] eta 0:00:34 lr 0.001087 wd 0.0500 time 0.4712 (0.6237) data time 0.0008 (0.0067) model time 0.4703 (0.6170) loss 3.1617 (3.3480) grad_norm 2.1339 (1.4564) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][580/625] eta 0:00:27 lr 0.001087 wd 0.0500 time 0.4672 (0.6121) data time 0.0008 (0.0063) model time 0.4664 (0.6058) loss 2.7392 (3.3425) grad_norm 1.4064 (1.4598) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][590/625] eta 0:00:21 lr 0.001087 wd 0.0500 time 0.4656 (0.6023) data time 0.0009 (0.0060) model time 0.4647 (0.5963) loss 3.1275 (3.3204) grad_norm 1.4304 (1.4570) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][600/625] eta 0:00:14 lr 0.001087 wd 0.0500 time 0.4760 (0.5940) data time 0.0011 (0.0057) model time 0.4749 (0.5883) loss 3.0217 (3.3135) grad_norm 1.7264 (1.4587) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [75/300][610/625] eta 0:00:08 lr 0.001087 wd 0.0500 time 0.4631 (0.5864) data time 0.0007 (0.0054) model time 0.4623 (0.5810) loss 3.5202 (3.3116) grad_norm 1.5080 (1.4789) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [75/300][620/625] eta 0:00:02 lr 0.001087 wd 0.0500 time 0.4590 (0.5793) data time 0.0005 (0.0051) model time 0.4585 (0.5742) loss 3.5261 (3.3047) grad_norm 1.1794 (1.4767) loss_scale 4096.0000 (4096.0000) mem 16698MB [2024-08-07 12:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 75 training takes 0:01:43 [2024-08-07 12:11:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 12:12:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-07 12:12:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.6055 (0.6055) Acc@1 85.938 (85.938) Acc@5 98.145 (98.145) Mem 16698MB [2024-08-07 12:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.9492 (0.7377) Acc@1 77.979 (83.430) Acc@5 94.238 (96.871) Mem 16698MB [2024-08-07 12:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.142) Loss 1.0869 (0.8765) Acc@1 73.096 (80.006) Acc@5 92.920 (95.278) Mem 16698MB [2024-08-07 12:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.710 Acc@5 95.252 [2024-08-07 12:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-08-07 12:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.71% [2024-08-07 12:12:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-07 12:12:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! [2024-08-07 12:12:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.5229 (0.5229) Acc@1 88.184 (88.184) Acc@5 98.438 (98.438) Mem 16698MB [2024-08-07 12:12:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.164) Loss 0.8594 (0.6574) Acc@1 78.955 (85.099) Acc@5 95.361 (97.385) Mem 16698MB [2024-08-07 12:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.142) Loss 0.9883 (0.7843) Acc@1 75.195 (81.694) Acc@5 94.482 (96.005) Mem 16698MB [2024-08-07 12:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.420 Acc@5 96.007 [2024-08-07 12:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-07 12:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.42% [2024-08-07 12:12:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 12:12:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 12:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][0/625] eta 0:10:49 lr 0.001087 wd 0.0500 time 1.0389 (1.0389) data time 0.4030 (0.4030) model time 0.0000 (0.0000) loss 2.6916 (2.6916) grad_norm 0.9583 (0.9583) loss_scale 4096.0000 (4096.0000) mem 16710MB [2024-08-07 12:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][10/625] eta 0:05:19 lr 0.001086 wd 0.0500 time 0.4650 (0.5203) data time 0.0009 (0.0375) model time 0.0000 (0.0000) loss 2.8129 (3.2784) grad_norm 1.8493 (1.4523) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:12:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][20/625] eta 0:05:00 lr 0.001086 wd 0.0500 time 0.4732 (0.4966) data time 0.0010 (0.0203) model time 0.0000 (0.0000) loss 3.4280 (3.1695) grad_norm 1.4918 (1.5325) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:12:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][30/625] eta 0:04:51 lr 0.001086 wd 0.0500 time 0.4831 (0.4896) data time 0.0008 (0.0141) model time 0.0000 (0.0000) loss 3.3156 (3.1265) grad_norm 1.7915 (1.5278) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][40/625] eta 0:04:43 lr 0.001086 wd 0.0500 time 0.4710 (0.4849) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 3.0984 (3.1335) grad_norm 1.4197 (1.5346) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][50/625] eta 0:04:36 lr 0.001086 wd 0.0500 time 0.4694 (0.4817) data time 0.0010 (0.0089) model time 0.0000 (0.0000) loss 3.6353 (3.1713) grad_norm 1.9729 (1.5316) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][60/625] eta 0:04:31 lr 0.001086 wd 0.0500 time 0.4697 (0.4800) data time 0.0010 (0.0076) model time 0.4687 (0.4700) loss 3.7994 (3.1507) 
grad_norm 1.3660 (1.5106) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][70/625] eta 0:04:25 lr 0.001086 wd 0.0500 time 0.4609 (0.4781) data time 0.0008 (0.0067) model time 0.4601 (0.4678) loss 2.3931 (3.1402) grad_norm 1.3458 (1.4914) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][80/625] eta 0:04:21 lr 0.001086 wd 0.0500 time 0.6881 (0.4795) data time 0.0010 (0.0060) model time 0.6871 (0.4748) loss 2.7362 (3.1426) grad_norm 1.9782 (1.4889) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][90/625] eta 0:04:16 lr 0.001086 wd 0.0500 time 0.4683 (0.4785) data time 0.0008 (0.0055) model time 0.4675 (0.4735) loss 4.2434 (3.1513) grad_norm 1.2286 (1.4773) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][100/625] eta 0:04:11 lr 0.001086 wd 0.0500 time 0.4073 (0.4795) data time 0.0010 (0.0050) model time 0.4063 (0.4763) loss 3.7130 (3.1636) grad_norm 2.5458 (1.4872) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][110/625] eta 0:04:06 lr 0.001086 wd 0.0500 time 0.4649 (0.4789) data time 0.0009 (0.0047) model time 0.4639 (0.4754) loss 2.6512 (3.1664) grad_norm 1.5385 (1.4916) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][120/625] eta 0:04:01 lr 0.001086 wd 0.0500 time 0.4699 (0.4783) data time 0.0008 (0.0044) model time 0.4691 (0.4747) loss 3.0355 (3.1407) grad_norm 1.6916 (1.4839) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][130/625] eta 0:03:56 lr 0.001086 wd 0.0500 time 0.4701 
(0.4776) data time 0.0010 (0.0041) model time 0.4691 (0.4739) loss 2.8343 (3.1334) grad_norm 1.4698 (1.4926) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][140/625] eta 0:03:51 lr 0.001086 wd 0.0500 time 0.4679 (0.4768) data time 0.0009 (0.0039) model time 0.4670 (0.4730) loss 3.1495 (3.1618) grad_norm 1.4567 (1.4901) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][150/625] eta 0:03:46 lr 0.001086 wd 0.0500 time 0.4664 (0.4763) data time 0.0007 (0.0037) model time 0.4656 (0.4724) loss 3.8634 (3.1697) grad_norm 1.9476 (1.5027) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][160/625] eta 0:03:41 lr 0.001086 wd 0.0500 time 0.4661 (0.4759) data time 0.0010 (0.0036) model time 0.4651 (0.4721) loss 3.3527 (3.1691) grad_norm 2.0215 (1.5119) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][170/625] eta 0:03:36 lr 0.001085 wd 0.0500 time 0.4706 (0.4756) data time 0.0011 (0.0034) model time 0.4695 (0.4720) loss 2.6530 (3.1724) grad_norm 0.9917 (1.5091) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][180/625] eta 0:03:31 lr 0.001085 wd 0.0500 time 0.4740 (0.4755) data time 0.0007 (0.0033) model time 0.4733 (0.4720) loss 3.8566 (3.1747) grad_norm 1.3673 (1.5160) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][190/625] eta 0:03:26 lr 0.001085 wd 0.0500 time 0.4663 (0.4752) data time 0.0009 (0.0032) model time 0.4654 (0.4718) loss 3.5248 (3.1776) grad_norm 1.1166 (1.5147) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [76/300][200/625] eta 0:03:21 lr 0.001085 wd 0.0500 time 0.4721 (0.4750) data time 0.0008 (0.0030) model time 0.4714 (0.4716) loss 3.7073 (3.1757) grad_norm 1.5678 (1.5135) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][210/625] eta 0:03:17 lr 0.001085 wd 0.0500 time 0.4657 (0.4748) data time 0.0009 (0.0029) model time 0.4648 (0.4715) loss 2.3640 (3.1607) grad_norm 1.4053 (1.5073) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][220/625] eta 0:03:12 lr 0.001085 wd 0.0500 time 0.4689 (0.4745) data time 0.0010 (0.0029) model time 0.4679 (0.4714) loss 2.9274 (3.1677) grad_norm 1.3655 (1.5138) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][230/625] eta 0:03:07 lr 0.001085 wd 0.0500 time 0.4715 (0.4743) data time 0.0007 (0.0028) model time 0.4708 (0.4712) loss 3.4055 (3.1807) grad_norm 1.3465 (1.5159) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][240/625] eta 0:03:02 lr 0.001085 wd 0.0500 time 0.4754 (0.4743) data time 0.0008 (0.0027) model time 0.4746 (0.4713) loss 3.6504 (3.1801) grad_norm 1.9242 (1.5158) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][250/625] eta 0:02:57 lr 0.001085 wd 0.0500 time 0.4656 (0.4742) data time 0.0009 (0.0026) model time 0.4647 (0.4712) loss 3.4045 (3.1940) grad_norm 1.4544 (1.5165) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][260/625] eta 0:02:53 lr 0.001085 wd 0.0500 time 0.4640 (0.4740) data time 0.0010 (0.0026) model time 0.4630 (0.4711) loss 3.1706 (3.2052) grad_norm 1.2900 (1.5121) loss_scale 4096.0000 
(4096.0000) mem 16708MB [2024-08-07 12:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][270/625] eta 0:02:48 lr 0.001085 wd 0.0500 time 0.4689 (0.4738) data time 0.0008 (0.0025) model time 0.4681 (0.4709) loss 3.5268 (3.2106) grad_norm 1.1818 (1.5066) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][280/625] eta 0:02:43 lr 0.001085 wd 0.0500 time 0.4759 (0.4736) data time 0.0007 (0.0025) model time 0.4752 (0.4707) loss 3.2254 (3.2043) grad_norm 1.4176 (1.5003) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][290/625] eta 0:02:38 lr 0.001085 wd 0.0500 time 0.4689 (0.4734) data time 0.0010 (0.0024) model time 0.4679 (0.4706) loss 3.1142 (3.1964) grad_norm 1.3656 (1.4947) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][300/625] eta 0:02:33 lr 0.001085 wd 0.0500 time 0.4686 (0.4732) data time 0.0010 (0.0024) model time 0.4676 (0.4705) loss 2.4162 (3.1961) grad_norm 1.3724 (1.5044) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][310/625] eta 0:02:29 lr 0.001085 wd 0.0500 time 0.4689 (0.4731) data time 0.0007 (0.0023) model time 0.4681 (0.4704) loss 2.6308 (3.2071) grad_norm 1.0389 (1.5039) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][320/625] eta 0:02:24 lr 0.001085 wd 0.0500 time 0.4737 (0.4730) data time 0.0007 (0.0023) model time 0.4730 (0.4704) loss 2.5096 (3.2045) grad_norm 1.3652 (1.5061) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][330/625] eta 0:02:19 lr 0.001084 wd 0.0500 time 0.4632 (0.4729) data time 0.0009 (0.0022) model time 
0.4623 (0.4702) loss 3.3313 (3.2095) grad_norm 1.4106 (1.4999) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][340/625] eta 0:02:14 lr 0.001084 wd 0.0500 time 0.4715 (0.4728) data time 0.0007 (0.0022) model time 0.4708 (0.4702) loss 3.4426 (3.2100) grad_norm 1.2243 (1.4941) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][350/625] eta 0:02:09 lr 0.001084 wd 0.0500 time 0.4654 (0.4726) data time 0.0010 (0.0022) model time 0.4645 (0.4701) loss 3.3248 (3.2032) grad_norm 1.3179 (1.4958) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][360/625] eta 0:02:05 lr 0.001084 wd 0.0500 time 0.4652 (0.4726) data time 0.0007 (0.0021) model time 0.4645 (0.4700) loss 3.5787 (3.2052) grad_norm 1.7189 (1.4902) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][370/625] eta 0:02:00 lr 0.001084 wd 0.0500 time 0.4661 (0.4725) data time 0.0009 (0.0021) model time 0.4651 (0.4700) loss 2.5985 (3.2075) grad_norm 1.0879 (1.4933) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][380/625] eta 0:01:55 lr 0.001084 wd 0.0500 time 0.4715 (0.4725) data time 0.0010 (0.0021) model time 0.4705 (0.4700) loss 3.0563 (3.2161) grad_norm 1.2182 (1.4907) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][390/625] eta 0:01:51 lr 0.001084 wd 0.0500 time 0.4699 (0.4724) data time 0.0010 (0.0021) model time 0.4689 (0.4700) loss 3.5753 (3.2200) grad_norm 1.5037 (1.4973) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][400/625] eta 0:01:46 
lr 0.001084 wd 0.0500 time 0.4805 (0.4725) data time 0.0008 (0.0020) model time 0.4797 (0.4701) loss 3.0259 (3.2236) grad_norm 1.3920 (1.4939) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][410/625] eta 0:01:41 lr 0.001084 wd 0.0500 time 0.4760 (0.4725) data time 0.0008 (0.0020) model time 0.4752 (0.4702) loss 3.7338 (3.2259) grad_norm 1.3817 (1.4899) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][420/625] eta 0:01:36 lr 0.001084 wd 0.0500 time 0.4679 (0.4730) data time 0.0010 (0.0020) model time 0.4669 (0.4707) loss 3.4324 (3.2302) grad_norm 1.4225 (1.4861) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][430/625] eta 0:01:32 lr 0.001084 wd 0.0500 time 0.4671 (0.4728) data time 0.0008 (0.0020) model time 0.4663 (0.4706) loss 3.5033 (3.2305) grad_norm 2.1816 (1.4909) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][440/625] eta 0:01:27 lr 0.001084 wd 0.0500 time 0.4708 (0.4731) data time 0.0010 (0.0020) model time 0.4698 (0.4710) loss 3.5357 (3.2322) grad_norm 1.6873 (1.4873) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][450/625] eta 0:01:22 lr 0.001084 wd 0.0500 time 0.4688 (0.4730) data time 0.0007 (0.0019) model time 0.4681 (0.4709) loss 3.9747 (3.2373) grad_norm 1.0503 (1.4828) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][460/625] eta 0:01:18 lr 0.001084 wd 0.0500 time 0.4656 (0.4729) data time 0.0007 (0.0019) model time 0.4648 (0.4708) loss 2.2540 (3.2362) grad_norm 1.1662 (1.4786) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:12 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][470/625] eta 0:01:13 lr 0.001084 wd 0.0500 time 0.4730 (0.4729) data time 0.0011 (0.0019) model time 0.4718 (0.4708) loss 3.3369 (3.2317) grad_norm 1.4011 (1.4772) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][480/625] eta 0:01:08 lr 0.001084 wd 0.0500 time 0.4727 (0.4729) data time 0.0007 (0.0019) model time 0.4720 (0.4707) loss 3.0878 (3.2280) grad_norm 1.3171 (1.4783) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][490/625] eta 0:01:03 lr 0.001083 wd 0.0500 time 0.4655 (0.4728) data time 0.0007 (0.0019) model time 0.4648 (0.4707) loss 3.9833 (3.2289) grad_norm 1.7206 (1.4834) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][500/625] eta 0:00:59 lr 0.001083 wd 0.0500 time 0.4653 (0.4727) data time 0.0010 (0.0019) model time 0.4643 (0.4706) loss 2.9982 (3.2323) grad_norm 1.7220 (1.4872) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][510/625] eta 0:00:54 lr 0.001083 wd 0.0500 time 0.4693 (0.4726) data time 0.0008 (0.0018) model time 0.4685 (0.4705) loss 2.5634 (3.2305) grad_norm 1.4940 (1.4866) loss_scale 4096.0000 (4096.0000) mem 16708MB [2024-08-07 12:16:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][520/625] eta 0:00:49 lr 0.001083 wd 0.0500 time 0.4674 (0.4726) data time 0.0007 (0.0018) model time 0.4666 (0.4706) loss 3.0079 (3.2258) grad_norm 1.3284 (1.4837) loss_scale 8192.0000 (4135.3090) mem 16708MB [2024-08-07 12:16:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][530/625] eta 0:00:44 lr 0.001083 wd 0.0500 time 0.4774 (0.4727) data time 0.0009 (0.0018) model time 0.4765 (0.4707) loss 3.4528 (3.2275) grad_norm 
1.1586 (1.4815) loss_scale 8192.0000 (4211.7062) mem 16708MB [2024-08-07 12:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][540/625] eta 0:00:40 lr 0.001083 wd 0.0500 time 0.4702 (0.4727) data time 0.0009 (0.0018) model time 0.4692 (0.4707) loss 2.6386 (3.2226) grad_norm 1.6823 (1.4811) loss_scale 8192.0000 (4285.2791) mem 16708MB [2024-08-07 12:16:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][550/625] eta 0:00:35 lr 0.001083 wd 0.0500 time 0.4716 (0.4726) data time 0.0010 (0.0018) model time 0.4706 (0.4706) loss 3.2532 (3.2244) grad_norm 1.8113 (1.4841) loss_scale 8192.0000 (4356.1815) mem 16708MB [2024-08-07 12:16:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][560/625] eta 0:00:30 lr 0.001083 wd 0.0500 time 0.4779 (0.4726) data time 0.0010 (0.0018) model time 0.4769 (0.4706) loss 3.7816 (3.2320) grad_norm 1.2467 (1.4842) loss_scale 8192.0000 (4424.5561) mem 16708MB [2024-08-07 12:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][570/625] eta 0:00:25 lr 0.001083 wd 0.0500 time 0.4662 (0.4725) data time 0.0010 (0.0018) model time 0.4652 (0.4705) loss 2.8561 (3.2298) grad_norm 1.4867 (1.4838) loss_scale 8192.0000 (4490.5359) mem 16708MB [2024-08-07 12:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][580/625] eta 0:00:21 lr 0.001083 wd 0.0500 time 0.4606 (0.4724) data time 0.0009 (0.0018) model time 0.4597 (0.4704) loss 3.6503 (3.2302) grad_norm 1.4956 (1.4808) loss_scale 8192.0000 (4554.2444) mem 16708MB [2024-08-07 12:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][590/625] eta 0:00:16 lr 0.001083 wd 0.0500 time 0.4727 (0.4723) data time 0.0010 (0.0018) model time 0.4717 (0.4704) loss 3.6475 (3.2318) grad_norm 1.0850 (1.4777) loss_scale 8192.0000 (4615.7970) mem 16708MB [2024-08-07 12:17:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][600/625] eta 0:00:11 lr 0.001083 wd 0.0500 time 0.4693 (0.4723) data 
time 0.0010 (0.0017) model time 0.4683 (0.4703) loss 3.4113 (3.2325) grad_norm 1.1109 (1.4745) loss_scale 8192.0000 (4675.3012) mem 16708MB [2024-08-07 12:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][610/625] eta 0:00:07 lr 0.001083 wd 0.0500 time 0.4643 (0.4726) data time 0.0007 (0.0017) model time 0.4636 (0.4706) loss 3.5355 (3.2315) grad_norm 1.3046 (1.4738) loss_scale 8192.0000 (4732.8576) mem 16708MB [2024-08-07 12:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [76/300][620/625] eta 0:00:02 lr 0.001083 wd 0.0500 time 0.4639 (0.4725) data time 0.0007 (0.0017) model time 0.4633 (0.4706) loss 2.8434 (3.2312) grad_norm 1.0823 (1.4775) loss_scale 8192.0000 (4788.5604) mem 16708MB [2024-08-07 12:17:24 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 76 training takes 0:04:55 [2024-08-07 12:17:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 12:17:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-07 12:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.546 (0.546) Loss 0.5811 (0.5811) Acc@1 87.061 (87.061) Acc@5 98.047 (98.047) Mem 16708MB [2024-08-07 12:17:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.165) Loss 0.9629 (0.7349) Acc@1 77.393 (83.518) Acc@5 94.287 (96.862) Mem 16708MB [2024-08-07 12:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.143) Loss 1.0479 (0.8743) Acc@1 74.609 (80.208) Acc@5 93.506 (95.285) Mem 16708MB [2024-08-07 12:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.878 Acc@5 95.266 [2024-08-07 12:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-08-07 12:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.88% [2024-08-07 12:17:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-07 12:17:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-07 12:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.541 (0.541) Loss 0.5234 (0.5234) Acc@1 88.281 (88.281) Acc@5 98.486 (98.486) Mem 16708MB [2024-08-07 12:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.164) Loss 0.8560 (0.6557) Acc@1 79.199 (85.201) Acc@5 95.361 (97.359) Mem 16708MB [2024-08-07 12:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 0.9854 (0.7818) Acc@1 75.293 (81.789) Acc@5 94.385 (95.989) Mem 16708MB [2024-08-07 12:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.512 Acc@5 95.997 [2024-08-07 12:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-07 12:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.51% [2024-08-07 12:17:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-07 12:17:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-07 12:17:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][0/625] eta 0:08:52 lr 0.001083 wd 0.0500 time 0.8522 (0.8522) data time 0.4475 (0.4475) model time 0.0000 (0.0000) loss 3.6093 (3.6093) grad_norm 1.8937 (1.8937) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][10/625] eta 0:05:09 lr 0.001083 wd 0.0500 time 0.4657 (0.5033) data time 0.0009 (0.0415) model time 0.0000 (0.0000) loss 3.5501 (2.8969) grad_norm 1.0344 (1.8135) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:17:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][20/625] eta 0:04:54 lr 0.001082 wd 0.0500 time 0.4680 (0.4866) data time 0.0007 (0.0223) model time 0.0000 (0.0000) loss 2.3375 (3.0273) grad_norm 1.2010 (1.5928) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:17:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][30/625] eta 0:04:45 lr 0.001082 wd 0.0500 time 0.4653 (0.4804) data time 0.0010 (0.0154) model time 0.0000 (0.0000) loss 2.5568 (2.9917) grad_norm 1.3001 (1.4980) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][40/625] eta 0:04:39 lr 0.001082 wd 0.0500 time 0.4768 (0.4780) data time 0.0007 (0.0119) model time 0.0000 (0.0000) loss 2.0956 (3.0096) grad_norm 1.8167 (1.4643) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][50/625] eta 0:04:34 lr 0.001082 wd 0.0500 time 0.4721 (0.4767) data time 0.0008 (0.0098) model time 0.0000 (0.0000) loss 3.4593 (3.0855) grad_norm 1.1627 (1.4505) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][60/625] eta 0:04:28 lr 0.001082 wd 0.0500 time 0.4749 (0.4755) data time 0.0010 (0.0083) model time 0.4740 (0.4686) loss 2.9559 (3.0732) 
grad_norm 1.2725 (1.4335) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][70/625] eta 0:04:23 lr 0.001082 wd 0.0500 time 0.4598 (0.4742) data time 0.0007 (0.0073) model time 0.4591 (0.4669) loss 2.9039 (3.1074) grad_norm 0.8666 (1.4371) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][80/625] eta 0:04:19 lr 0.001082 wd 0.0500 time 0.4704 (0.4757) data time 0.0008 (0.0065) model time 0.4696 (0.4729) loss 2.6256 (3.0908) grad_norm 1.9613 (1.4484) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][90/625] eta 0:04:14 lr 0.001082 wd 0.0500 time 0.4674 (0.4749) data time 0.0009 (0.0059) model time 0.4665 (0.4717) loss 3.1649 (3.1039) grad_norm 1.8700 (1.4589) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][100/625] eta 0:04:08 lr 0.001082 wd 0.0500 time 0.4726 (0.4743) data time 0.0007 (0.0054) model time 0.4719 (0.4708) loss 3.8788 (3.1074) grad_norm 1.0726 (1.4571) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][110/625] eta 0:04:04 lr 0.001082 wd 0.0500 time 0.4712 (0.4741) data time 0.0007 (0.0051) model time 0.4704 (0.4707) loss 3.9847 (3.1118) grad_norm 1.2521 (1.4696) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][120/625] eta 0:03:59 lr 0.001082 wd 0.0500 time 0.4844 (0.4741) data time 0.0008 (0.0048) model time 0.4836 (0.4711) loss 3.4796 (3.1462) grad_norm 1.3947 (1.4865) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][130/625] eta 0:03:54 lr 0.001082 wd 0.0500 time 0.4671 
(0.4739) data time 0.0007 (0.0045) model time 0.4664 (0.4710) loss 2.2115 (3.1609) grad_norm 1.2632 (1.4845) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][140/625] eta 0:03:49 lr 0.001082 wd 0.0500 time 0.4847 (0.4739) data time 0.0008 (0.0042) model time 0.4839 (0.4712) loss 3.0571 (3.1459) grad_norm 1.1167 (1.4859) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][150/625] eta 0:03:44 lr 0.001082 wd 0.0500 time 0.4799 (0.4737) data time 0.0010 (0.0041) model time 0.4789 (0.4709) loss 3.5225 (3.1296) grad_norm 1.7534 (1.4979) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][160/625] eta 0:03:40 lr 0.001082 wd 0.0500 time 0.4621 (0.4733) data time 0.0009 (0.0039) model time 0.4612 (0.4705) loss 3.2273 (3.1414) grad_norm 1.5438 (1.4948) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][170/625] eta 0:03:35 lr 0.001082 wd 0.0500 time 0.4669 (0.4729) data time 0.0007 (0.0037) model time 0.4662 (0.4702) loss 4.0318 (3.1646) grad_norm 1.3497 (1.4833) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][180/625] eta 0:03:30 lr 0.001081 wd 0.0500 time 0.4690 (0.4728) data time 0.0010 (0.0036) model time 0.4680 (0.4701) loss 3.4824 (3.1619) grad_norm 1.3465 (1.4890) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][190/625] eta 0:03:25 lr 0.001081 wd 0.0500 time 0.4714 (0.4726) data time 0.0007 (0.0034) model time 0.4707 (0.4700) loss 3.7121 (3.1623) grad_norm 1.3452 (1.4914) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [77/300][200/625] eta 0:03:20 lr 0.001081 wd 0.0500 time 0.4766 (0.4725) data time 0.0007 (0.0033) model time 0.4759 (0.4700) loss 2.2528 (3.1619) grad_norm 1.8725 (1.4945) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][210/625] eta 0:03:16 lr 0.001081 wd 0.0500 time 0.4749 (0.4725) data time 0.0008 (0.0032) model time 0.4741 (0.4701) loss 3.9123 (3.1842) grad_norm 1.4377 (1.4925) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][220/625] eta 0:03:11 lr 0.001081 wd 0.0500 time 0.4620 (0.4725) data time 0.0008 (0.0031) model time 0.4612 (0.4701) loss 2.2540 (3.1645) grad_norm 1.6679 (1.4936) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][230/625] eta 0:03:06 lr 0.001081 wd 0.0500 time 0.4683 (0.4724) data time 0.0006 (0.0030) model time 0.4676 (0.4701) loss 4.1010 (3.1649) grad_norm 1.2862 (1.5000) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][240/625] eta 0:03:01 lr 0.001081 wd 0.0500 time 0.5425 (0.4726) data time 0.0010 (0.0029) model time 0.5415 (0.4704) loss 3.0723 (3.1735) grad_norm 1.0646 (1.5065) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][250/625] eta 0:02:57 lr 0.001081 wd 0.0500 time 0.4713 (0.4725) data time 0.0008 (0.0029) model time 0.4706 (0.4703) loss 3.7367 (3.1764) grad_norm 1.2362 (1.5020) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][260/625] eta 0:02:52 lr 0.001081 wd 0.0500 time 0.4722 (0.4724) data time 0.0008 (0.0028) model time 0.4714 (0.4703) loss 3.9527 (3.1820) grad_norm 1.2057 (1.4934) loss_scale 8192.0000 
(8192.0000) mem 16708MB [2024-08-07 12:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][270/625] eta 0:02:47 lr 0.001081 wd 0.0500 time 0.4687 (0.4724) data time 0.0009 (0.0027) model time 0.4678 (0.4703) loss 2.7598 (3.1730) grad_norm 1.5190 (1.4955) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][280/625] eta 0:02:42 lr 0.001081 wd 0.0500 time 0.4682 (0.4722) data time 0.0007 (0.0027) model time 0.4674 (0.4701) loss 2.2241 (3.1735) grad_norm 1.7821 (1.4926) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][290/625] eta 0:02:38 lr 0.001081 wd 0.0500 time 0.4717 (0.4721) data time 0.0007 (0.0026) model time 0.4710 (0.4700) loss 2.5551 (3.1723) grad_norm 1.5316 (1.4902) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:19:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][300/625] eta 0:02:33 lr 0.001081 wd 0.0500 time 0.4681 (0.4721) data time 0.0007 (0.0025) model time 0.4674 (0.4701) loss 3.3098 (3.1770) grad_norm 1.6261 (1.4933) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][310/625] eta 0:02:28 lr 0.001081 wd 0.0500 time 0.4633 (0.4727) data time 0.0007 (0.0025) model time 0.4626 (0.4708) loss 3.9166 (3.1829) grad_norm 1.1589 (1.4863) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][320/625] eta 0:02:24 lr 0.001081 wd 0.0500 time 0.4641 (0.4725) data time 0.0007 (0.0024) model time 0.4634 (0.4706) loss 3.7776 (3.1862) grad_norm 1.1262 (1.4860) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][330/625] eta 0:02:19 lr 0.001080 wd 0.0500 time 0.4724 (0.4725) data time 0.0009 (0.0024) model time 
0.4714 (0.4706) loss 3.1412 (3.1896) grad_norm 1.3655 (1.4845) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][340/625] eta 0:02:14 lr 0.001080 wd 0.0500 time 0.4730 (0.4725) data time 0.0009 (0.0024) model time 0.4722 (0.4707) loss 3.1450 (3.1911) grad_norm 1.3026 (1.4830) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][350/625] eta 0:02:09 lr 0.001080 wd 0.0500 time 0.4713 (0.4725) data time 0.0011 (0.0023) model time 0.4702 (0.4707) loss 3.1937 (3.1929) grad_norm 2.2966 (1.4796) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][360/625] eta 0:02:05 lr 0.001080 wd 0.0500 time 0.4719 (0.4724) data time 0.0009 (0.0023) model time 0.4710 (0.4706) loss 3.1129 (3.1852) grad_norm 1.1920 (1.4801) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][370/625] eta 0:02:00 lr 0.001080 wd 0.0500 time 0.4713 (0.4723) data time 0.0007 (0.0023) model time 0.4706 (0.4705) loss 3.7031 (3.1947) grad_norm 2.1621 (1.4840) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][380/625] eta 0:01:55 lr 0.001080 wd 0.0500 time 0.4587 (0.4722) data time 0.0008 (0.0022) model time 0.4578 (0.4704) loss 3.6482 (3.2040) grad_norm 1.8214 (1.4832) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][390/625] eta 0:01:50 lr 0.001080 wd 0.0500 time 0.4681 (0.4721) data time 0.0008 (0.0022) model time 0.4674 (0.4704) loss 2.9267 (3.2022) grad_norm 1.5115 (1.4788) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][400/625] eta 0:01:46 
lr 0.001080 wd 0.0500 time 0.4699 (0.4720) data time 0.0007 (0.0022) model time 0.4692 (0.4703) loss 3.6824 (3.2091) grad_norm 2.1853 (1.4790) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][410/625] eta 0:01:41 lr 0.001080 wd 0.0500 time 0.4767 (0.4720) data time 0.0008 (0.0022) model time 0.4759 (0.4703) loss 3.6300 (3.2108) grad_norm 1.4248 (1.4762) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:20:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][420/625] eta 0:01:36 lr 0.001080 wd 0.0500 time 0.4900 (0.4726) data time 0.0010 (0.0022) model time 0.4890 (0.4709) loss 3.7753 (3.2076) grad_norm 1.1347 (1.4731) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:21:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][430/625] eta 0:01:32 lr 0.001080 wd 0.0500 time 0.4740 (0.4725) data time 0.0010 (0.0021) model time 0.4730 (0.4709) loss 3.7045 (3.2045) grad_norm 1.6954 (1.4703) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:21:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][440/625] eta 0:01:27 lr 0.001080 wd 0.0500 time 0.4663 (0.4724) data time 0.0010 (0.0021) model time 0.4654 (0.4708) loss 2.7109 (3.2084) grad_norm 1.1739 (1.4659) loss_scale 8192.0000 (8192.0000) mem 16708MB [2024-08-07 12:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-07 12:21:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-07 12:21:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 07:53:33 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 07:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 07:53:44 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 07:54:01 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 07:54:01 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 07:54:03 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 07:54:05 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 07:54:05 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 77) [2024-08-10 07:54:06 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 07:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][450/625] eta 0:17:00 lr 0.001080 wd 0.0500 time 0.4693 (5.8311) data time 0.0008 (0.1973) model time 0.4685 (5.6338) loss 4.1442 (3.5557) grad_norm 1.5359 (1.2992) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][460/625] eta 0:05:29 lr 0.001080 wd 0.0500 time 0.4668 (1.9986) data time 0.0008 (0.0572) model time 0.4660 (1.9413) loss 3.4009 (3.4324) grad_norm 1.2789 (1.5858) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][470/625] eta 0:03:30 lr 0.001080 wd 0.0500 time 0.4626 (1.3581) data time 0.0010 (0.0339) model time 0.4616 (1.3242) loss 3.1814 (3.4271) grad_norm 
1.0980 (1.5169) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][480/625] eta 0:02:39 lr 0.001080 wd 0.0500 time 0.4598 (1.1027) data time 0.0009 (0.0242) model time 0.4590 (1.0785) loss 2.9755 (3.4177) grad_norm 1.0831 (1.4665) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][490/625] eta 0:02:09 lr 0.001079 wd 0.0500 time 0.4563 (0.9614) data time 0.0008 (0.0190) model time 0.4555 (0.9425) loss 3.4437 (3.3878) grad_norm 1.9981 (1.4651) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][500/625] eta 0:01:48 lr 0.001079 wd 0.0500 time 0.4601 (0.8689) data time 0.0008 (0.0156) model time 0.4593 (0.8532) loss 4.0030 (3.3920) grad_norm 1.9838 (1.4638) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][510/625] eta 0:01:32 lr 0.001079 wd 0.0500 time 0.4684 (0.8059) data time 0.0008 (0.0134) model time 0.4676 (0.7925) loss 3.5034 (3.3538) grad_norm 1.1253 (1.4851) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][520/625] eta 0:01:19 lr 0.001079 wd 0.0500 time 0.4600 (0.7598) data time 0.0008 (0.0117) model time 0.4592 (0.7481) loss 3.4790 (3.3164) grad_norm 1.1528 (1.4729) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][530/625] eta 0:01:08 lr 0.001079 wd 0.0500 time 0.4662 (0.7247) data time 0.0010 (0.0104) model time 0.4652 (0.7142) loss 3.6254 (3.2917) grad_norm 1.4003 (1.4507) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][540/625] eta 0:00:59 lr 0.001079 wd 0.0500 time 0.4620 (0.6969) data 
time 0.0010 (0.0094) model time 0.4610 (0.6875) loss 2.9507 (3.2878) grad_norm 2.0575 (1.4720) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][550/625] eta 0:00:50 lr 0.001079 wd 0.0500 time 0.4687 (0.6744) data time 0.0011 (0.0086) model time 0.4676 (0.6658) loss 3.6881 (3.3182) grad_norm 1.9427 (1.4967) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][560/625] eta 0:00:42 lr 0.001079 wd 0.0500 time 0.4640 (0.6559) data time 0.0011 (0.0080) model time 0.4629 (0.6479) loss 3.4161 (3.3035) grad_norm 1.1574 (1.4857) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][570/625] eta 0:00:35 lr 0.001079 wd 0.0500 time 0.4646 (0.6403) data time 0.0011 (0.0074) model time 0.4635 (0.6329) loss 2.6489 (3.3003) grad_norm 1.0943 (1.4731) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][580/625] eta 0:00:28 lr 0.001079 wd 0.0500 time 0.4621 (0.6272) data time 0.0010 (0.0069) model time 0.4610 (0.6202) loss 3.6031 (3.3024) grad_norm 1.3499 (1.4716) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][590/625] eta 0:00:21 lr 0.001079 wd 0.0500 time 0.4678 (0.6161) data time 0.0010 (0.0065) model time 0.4668 (0.6095) loss 3.0774 (3.2859) grad_norm 1.4236 (1.4624) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][600/625] eta 0:00:15 lr 0.001079 wd 0.0500 time 0.4716 (0.6064) data time 0.0008 (0.0062) model time 0.4708 (0.6002) loss 3.4230 (3.2834) grad_norm 1.9222 (1.4709) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [77/300][610/625] eta 0:00:08 lr 0.001079 wd 0.0500 time 0.4576 (0.5978) data time 0.0005 (0.0059) model time 0.4570 (0.5919) loss 3.5996 (3.2824) grad_norm 1.1330 (1.4660) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [77/300][620/625] eta 0:00:02 lr 0.001079 wd 0.0500 time 0.4559 (0.5898) data time 0.0008 (0.0056) model time 0.4551 (0.5842) loss 2.3880 (3.2745) grad_norm 1.1986 (1.4657) loss_scale 8192.0000 (8192.0000) mem 16711MB [2024-08-10 07:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 77 training takes 0:01:44 [2024-08-10 07:55:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 07:56:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 07:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.612 (0.612) Loss 0.6060 (0.6060) Acc@1 86.475 (86.475) Acc@5 97.852 (97.852) Mem 16711MB [2024-08-10 07:56:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.173) Loss 0.9653 (0.7353) Acc@1 77.100 (83.683) Acc@5 94.385 (96.853) Mem 16711MB [2024-08-10 07:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.147) Loss 1.1250 (0.8812) Acc@1 71.582 (79.964) Acc@5 92.676 (95.131) Mem 16711MB [2024-08-10 07:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.653 Acc@5 95.156 [2024-08-10 07:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.7% [2024-08-10 07:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.900 (0.900) Loss 0.5210 (0.5210) Acc@1 88.330 (88.330) Acc@5 98.535 (98.535) Mem 16711MB [2024-08-10 07:56:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.197) Loss 0.8535 
(0.6544) Acc@1 79.346 (85.210) Acc@5 95.312 (97.359) Mem 16711MB [2024-08-10 07:56:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.9834 (0.7796) Acc@1 75.049 (81.836) Acc@5 94.482 (96.017) Mem 16711MB [2024-08-10 07:56:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.560 Acc@5 96.025 [2024-08-10 07:56:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-08-10 07:56:12 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.56% [2024-08-10 07:56:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 07:56:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 07:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][0/625] eta 0:10:54 lr 0.001079 wd 0.0500 time 1.0470 (1.0470) data time 0.4234 (0.4234) model time 0.0000 (0.0000) loss 3.4675 (3.4675) grad_norm 1.0569 (1.0569) loss_scale 8192.0000 (8192.0000) mem 16710MB [2024-08-10 07:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][10/625] eta 0:05:17 lr 0.001079 wd 0.0500 time 0.4628 (0.5164) data time 0.0008 (0.0395) model time 0.0000 (0.0000) loss 3.5790 (3.2684) grad_norm 1.8034 (1.6771) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][20/625] eta 0:04:57 lr 0.001078 wd 0.0500 time 0.4656 (0.4920) data time 0.0011 (0.0212) model time 0.0000 (0.0000) loss 2.3677 (3.0804) grad_norm 1.3794 (1.5176) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][30/625] eta 0:04:47 lr 0.001078 wd 0.0500 time 0.4656 (0.4836) data time 0.0009 (0.0147) model time 0.0000 (0.0000) loss 3.4274 (3.0798) grad_norm 
1.1959 (1.4655) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][40/625] eta 0:04:40 lr 0.001078 wd 0.0500 time 0.4629 (0.4790) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 3.4841 (3.0928) grad_norm 1.5171 (1.4522) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][50/625] eta 0:04:33 lr 0.001078 wd 0.0500 time 0.4605 (0.4760) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 2.3272 (3.0989) grad_norm 1.7507 (1.4933) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][60/625] eta 0:04:28 lr 0.001078 wd 0.0500 time 0.4631 (0.4745) data time 0.0009 (0.0081) model time 0.4622 (0.4656) loss 2.5769 (3.1073) grad_norm 1.1458 (1.4872) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][70/625] eta 0:04:22 lr 0.001078 wd 0.0500 time 0.4632 (0.4731) data time 0.0008 (0.0071) model time 0.4625 (0.4648) loss 3.4361 (3.1122) grad_norm 2.1392 (1.4974) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][80/625] eta 0:04:17 lr 0.001078 wd 0.0500 time 0.4663 (0.4723) data time 0.0010 (0.0063) model time 0.4653 (0.4649) loss 3.6478 (3.0974) grad_norm 1.2680 (1.4998) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:56:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][90/625] eta 0:04:13 lr 0.001078 wd 0.0500 time 0.4670 (0.4743) data time 0.0008 (0.0059) model time 0.4662 (0.4708) loss 2.1351 (3.0770) grad_norm 1.8033 (1.4911) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][100/625] eta 0:04:08 lr 0.001078 wd 0.0500 time 0.4639 (0.4741) data time 
0.0010 (0.0054) model time 0.4629 (0.4708) loss 3.6042 (3.1100) grad_norm 1.1651 (1.4756) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][110/625] eta 0:04:03 lr 0.001078 wd 0.0500 time 0.4658 (0.4730) data time 0.0010 (0.0050) model time 0.4647 (0.4691) loss 2.9255 (3.1161) grad_norm 1.0278 (1.4710) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][120/625] eta 0:03:58 lr 0.001078 wd 0.0500 time 0.4612 (0.4725) data time 0.0010 (0.0047) model time 0.4602 (0.4686) loss 3.6046 (3.0890) grad_norm 1.2366 (1.4599) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][130/625] eta 0:03:53 lr 0.001078 wd 0.0500 time 0.4674 (0.4717) data time 0.0010 (0.0044) model time 0.4664 (0.4677) loss 3.8341 (3.0896) grad_norm 1.1103 (1.4554) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][140/625] eta 0:03:49 lr 0.001078 wd 0.0500 time 0.6806 (0.4729) data time 0.0008 (0.0042) model time 0.6798 (0.4700) loss 3.8982 (3.1142) grad_norm 1.1855 (1.5004) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][150/625] eta 0:03:44 lr 0.001078 wd 0.0500 time 0.4879 (0.4721) data time 0.0011 (0.0040) model time 0.4869 (0.4690) loss 1.9892 (3.1273) grad_norm 1.4676 (1.5107) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][160/625] eta 0:03:39 lr 0.001078 wd 0.0500 time 0.4688 (0.4719) data time 0.0011 (0.0039) model time 0.4677 (0.4686) loss 2.7957 (3.1287) grad_norm 1.2287 (1.5181) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[78/300][170/625] eta 0:03:34 lr 0.001078 wd 0.0500 time 0.4693 (0.4717) data time 0.0008 (0.0038) model time 0.4686 (0.4685) loss 3.3752 (3.1407) grad_norm 1.2642 (1.5180) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][180/625] eta 0:03:29 lr 0.001077 wd 0.0500 time 0.4646 (0.4714) data time 0.0008 (0.0036) model time 0.4638 (0.4682) loss 2.8873 (3.1448) grad_norm 1.3956 (1.5136) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][190/625] eta 0:03:24 lr 0.001077 wd 0.0500 time 0.4632 (0.4710) data time 0.0011 (0.0035) model time 0.4622 (0.4679) loss 3.8272 (3.1500) grad_norm 1.1152 (1.5071) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][200/625] eta 0:03:20 lr 0.001077 wd 0.0500 time 0.4615 (0.4706) data time 0.0008 (0.0034) model time 0.4607 (0.4674) loss 3.3192 (3.1560) grad_norm 1.5805 (1.5051) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][210/625] eta 0:03:15 lr 0.001077 wd 0.0500 time 0.4660 (0.4703) data time 0.0008 (0.0033) model time 0.4652 (0.4671) loss 3.3303 (3.1479) grad_norm 1.4138 (1.5031) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][220/625] eta 0:03:10 lr 0.001077 wd 0.0500 time 0.4640 (0.4699) data time 0.0007 (0.0032) model time 0.4632 (0.4668) loss 3.5927 (3.1589) grad_norm 1.6811 (1.5083) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][230/625] eta 0:03:05 lr 0.001077 wd 0.0500 time 0.4634 (0.4696) data time 0.0010 (0.0031) model time 0.4624 (0.4665) loss 3.0480 (3.1680) grad_norm 1.9217 (1.5037) loss_scale 8192.0000 (8192.0000) mem 16715MB 
[2024-08-10 07:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][240/625] eta 0:03:00 lr 0.001077 wd 0.0500 time 0.4664 (0.4694) data time 0.0008 (0.0030) model time 0.4656 (0.4663) loss 2.3058 (3.1601) grad_norm 1.6273 (1.5128) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][250/625] eta 0:02:55 lr 0.001077 wd 0.0500 time 0.4686 (0.4693) data time 0.0008 (0.0029) model time 0.4678 (0.4663) loss 3.1444 (3.1720) grad_norm 1.9057 (1.5154) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][260/625] eta 0:02:51 lr 0.001077 wd 0.0500 time 0.4625 (0.4692) data time 0.0010 (0.0029) model time 0.4614 (0.4662) loss 3.3644 (3.1803) grad_norm 1.5222 (1.5117) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][270/625] eta 0:02:46 lr 0.001077 wd 0.0500 time 0.4604 (0.4690) data time 0.0010 (0.0028) model time 0.4594 (0.4661) loss 3.2639 (3.1772) grad_norm 1.2582 (1.5081) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][280/625] eta 0:02:41 lr 0.001077 wd 0.0500 time 0.4620 (0.4687) data time 0.0010 (0.0027) model time 0.4609 (0.4659) loss 3.2396 (3.1732) grad_norm 1.1712 (1.5124) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][290/625] eta 0:02:36 lr 0.001077 wd 0.0500 time 0.4681 (0.4685) data time 0.0008 (0.0027) model time 0.4673 (0.4657) loss 2.5006 (3.1668) grad_norm 1.7474 (1.5117) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][300/625] eta 0:02:32 lr 0.001077 wd 0.0500 time 0.4662 (0.4683) data time 0.0010 (0.0026) model time 0.4652 (0.4655) loss 3.8848 
(3.1672) grad_norm 1.5389 (1.5070) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][310/625] eta 0:02:27 lr 0.001077 wd 0.0500 time 0.4608 (0.4683) data time 0.0010 (0.0026) model time 0.4597 (0.4655) loss 3.2041 (3.1733) grad_norm 2.1107 (1.5021) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][320/625] eta 0:02:22 lr 0.001077 wd 0.0500 time 0.4664 (0.4682) data time 0.0008 (0.0025) model time 0.4655 (0.4654) loss 3.4736 (3.1733) grad_norm 1.0240 (1.5035) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][330/625] eta 0:02:18 lr 0.001076 wd 0.0500 time 0.4646 (0.4681) data time 0.0008 (0.0025) model time 0.4639 (0.4654) loss 4.0058 (3.1750) grad_norm 1.3345 (1.5070) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][340/625] eta 0:02:13 lr 0.001076 wd 0.0500 time 0.4586 (0.4679) data time 0.0010 (0.0024) model time 0.4576 (0.4653) loss 2.0890 (3.1775) grad_norm 2.7592 (1.5135) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:58:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][350/625] eta 0:02:08 lr 0.001076 wd 0.0500 time 0.4664 (0.4678) data time 0.0007 (0.0024) model time 0.4656 (0.4652) loss 3.6073 (3.1700) grad_norm 1.1133 (1.5153) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][360/625] eta 0:02:03 lr 0.001076 wd 0.0500 time 0.4639 (0.4677) data time 0.0010 (0.0024) model time 0.4629 (0.4651) loss 3.7155 (3.1698) grad_norm 1.9182 (1.5147) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][370/625] eta 0:01:59 lr 0.001076 wd 0.0500 time 
0.4595 (0.4680) data time 0.0008 (0.0023) model time 0.4587 (0.4655) loss 4.4976 (3.1749) grad_norm 1.0044 (1.5074) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][380/625] eta 0:01:54 lr 0.001076 wd 0.0500 time 0.4630 (0.4678) data time 0.0008 (0.0023) model time 0.4622 (0.4654) loss 4.0179 (3.1798) grad_norm 1.1677 (1.5008) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][390/625] eta 0:01:49 lr 0.001076 wd 0.0500 time 0.4638 (0.4678) data time 0.0010 (0.0023) model time 0.4628 (0.4654) loss 3.1263 (3.1845) grad_norm 1.5770 (1.4987) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][400/625] eta 0:01:45 lr 0.001076 wd 0.0500 time 0.4719 (0.4677) data time 0.0008 (0.0022) model time 0.4711 (0.4653) loss 3.8064 (3.1901) grad_norm 1.7829 (1.4993) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][410/625] eta 0:01:40 lr 0.001076 wd 0.0500 time 0.4678 (0.4677) data time 0.0011 (0.0022) model time 0.4667 (0.4653) loss 2.4407 (3.1909) grad_norm 1.2831 (1.4983) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][420/625] eta 0:01:35 lr 0.001076 wd 0.0500 time 0.4624 (0.4681) data time 0.0009 (0.0022) model time 0.4615 (0.4658) loss 2.8512 (3.1902) grad_norm 1.9823 (1.4975) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][430/625] eta 0:01:31 lr 0.001076 wd 0.0500 time 0.4653 (0.4680) data time 0.0008 (0.0022) model time 0.4644 (0.4658) loss 3.4992 (3.1905) grad_norm 1.1948 (1.4961) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:40 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [78/300][440/625] eta 0:01:26 lr 0.001076 wd 0.0500 time 0.4595 (0.4679) data time 0.0009 (0.0021) model time 0.4586 (0.4657) loss 3.4035 (3.1940) grad_norm 1.0798 (1.4949) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][450/625] eta 0:01:21 lr 0.001076 wd 0.0500 time 0.4622 (0.4678) data time 0.0011 (0.0021) model time 0.4611 (0.4656) loss 2.9618 (3.1970) grad_norm 1.5663 (1.4970) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][460/625] eta 0:01:17 lr 0.001076 wd 0.0500 time 0.4619 (0.4677) data time 0.0008 (0.0021) model time 0.4611 (0.4655) loss 3.8722 (3.2001) grad_norm 1.0702 (1.4983) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][470/625] eta 0:01:12 lr 0.001076 wd 0.0500 time 0.4730 (0.4677) data time 0.0008 (0.0021) model time 0.4722 (0.4655) loss 2.8618 (3.1950) grad_norm 1.5094 (1.4979) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 07:59:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][480/625] eta 0:01:07 lr 0.001075 wd 0.0500 time 0.4673 (0.4676) data time 0.0009 (0.0020) model time 0.4664 (0.4655) loss 2.8686 (3.1926) grad_norm 1.7128 (1.4966) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 08:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][490/625] eta 0:01:03 lr 0.001075 wd 0.0500 time 0.4588 (0.4676) data time 0.0008 (0.0020) model time 0.4580 (0.4654) loss 3.4524 (3.1929) grad_norm 1.7589 (1.4960) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 08:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][500/625] eta 0:00:58 lr 0.001075 wd 0.0500 time 0.4575 (0.4675) data time 0.0008 (0.0020) model time 0.4567 (0.4653) loss 2.8146 (3.1957) grad_norm 1.3168 (1.4954) 
loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 08:00:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][510/625] eta 0:00:53 lr 0.001075 wd 0.0500 time 0.4692 (0.4677) data time 0.0011 (0.0020) model time 0.4681 (0.4656) loss 2.5167 (3.1978) grad_norm 1.6612 (1.4972) loss_scale 8192.0000 (8192.0000) mem 16715MB [2024-08-10 08:00:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][520/625] eta 0:00:49 lr 0.001075 wd 0.0500 time 0.4630 (0.4676) data time 0.0010 (0.0020) model time 0.4619 (0.4655) loss 2.1977 (3.1925) grad_norm 1.1499 (inf) loss_scale 4096.0000 (8152.6910) mem 16715MB [2024-08-10 08:00:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][530/625] eta 0:00:44 lr 0.001075 wd 0.0500 time 0.4726 (0.4676) data time 0.0011 (0.0020) model time 0.4714 (0.4655) loss 2.4382 (3.1904) grad_norm 1.4598 (inf) loss_scale 4096.0000 (8076.2938) mem 16715MB [2024-08-10 08:00:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][540/625] eta 0:00:39 lr 0.001075 wd 0.0500 time 0.4703 (0.4676) data time 0.0008 (0.0019) model time 0.4695 (0.4655) loss 3.3511 (3.1856) grad_norm 1.3994 (inf) loss_scale 4096.0000 (8002.7209) mem 16715MB [2024-08-10 08:00:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][550/625] eta 0:00:35 lr 0.001075 wd 0.0500 time 0.4686 (0.4676) data time 0.0010 (0.0019) model time 0.4676 (0.4656) loss 3.9385 (3.1848) grad_norm 1.6723 (inf) loss_scale 4096.0000 (7931.8185) mem 16715MB [2024-08-10 08:00:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][560/625] eta 0:00:30 lr 0.001075 wd 0.0500 time 0.4651 (0.4675) data time 0.0011 (0.0019) model time 0.4641 (0.4655) loss 3.3283 (3.1918) grad_norm 1.2543 (inf) loss_scale 4096.0000 (7863.4439) mem 16715MB [2024-08-10 08:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][570/625] eta 0:00:25 lr 0.001075 wd 0.0500 time 0.4637 (0.4675) data time 0.0011 (0.0019) model 
time 0.4627 (0.4655) loss 3.3335 (3.1904) grad_norm 1.5681 (inf) loss_scale 4096.0000 (7797.4641) mem 16715MB [2024-08-10 08:00:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][580/625] eta 0:00:21 lr 0.001075 wd 0.0500 time 0.4605 (0.4674) data time 0.0011 (0.0019) model time 0.4594 (0.4654) loss 3.7461 (3.1900) grad_norm 1.4217 (inf) loss_scale 4096.0000 (7733.7556) mem 16715MB [2024-08-10 08:00:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][590/625] eta 0:00:16 lr 0.001075 wd 0.0500 time 0.4654 (0.4674) data time 0.0009 (0.0019) model time 0.4646 (0.4654) loss 2.8587 (3.1906) grad_norm 3.3121 (inf) loss_scale 4096.0000 (7672.2030) mem 16715MB [2024-08-10 08:00:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][600/625] eta 0:00:11 lr 0.001075 wd 0.0500 time 0.4695 (0.4673) data time 0.0009 (0.0019) model time 0.4686 (0.4654) loss 2.5124 (3.1946) grad_norm 1.1672 (inf) loss_scale 4096.0000 (7612.6988) mem 16715MB [2024-08-10 08:00:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][610/625] eta 0:00:07 lr 0.001075 wd 0.0500 time 0.4615 (0.4677) data time 0.0009 (0.0018) model time 0.4606 (0.4657) loss 3.3284 (3.1977) grad_norm 2.2371 (inf) loss_scale 4096.0000 (7555.1424) mem 16715MB [2024-08-10 08:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [78/300][620/625] eta 0:00:02 lr 0.001075 wd 0.0500 time 0.4581 (0.4676) data time 0.0007 (0.0018) model time 0.4574 (0.4656) loss 3.6014 (3.1982) grad_norm 1.2002 (inf) loss_scale 4096.0000 (7499.4396) mem 16715MB [2024-08-10 08:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 78 training takes 0:04:52 [2024-08-10 08:01:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:01:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.582 (0.582) Loss 0.6260 (0.6260) Acc@1 86.523 (86.523) Acc@5 97.900 (97.900) Mem 16715MB [2024-08-10 08:01:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.166) Loss 1.0449 (0.7600) Acc@1 76.270 (83.594) Acc@5 93.750 (96.768) Mem 16715MB [2024-08-10 08:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.143) Loss 1.1631 (0.8972) Acc@1 72.705 (80.120) Acc@5 92.578 (95.187) Mem 16715MB [2024-08-10 08:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.796 Acc@5 95.210 [2024-08-10 08:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8% [2024-08-10 08:01:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.870 (0.870) Loss 0.5200 (0.5200) Acc@1 88.379 (88.379) Acc@5 98.584 (98.584) Mem 16715MB [2024-08-10 08:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.196) Loss 0.8516 (0.6528) Acc@1 79.150 (85.223) Acc@5 95.312 (97.377) Mem 16715MB [2024-08-10 08:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.159) Loss 0.9810 (0.7775) Acc@1 75.146 (81.868) Acc@5 94.531 (96.017) Mem 16715MB [2024-08-10 08:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.592 Acc@5 96.023 [2024-08-10 08:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-08-10 08:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.59% [2024-08-10 08:01:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:01:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][0/625] eta 0:08:56 lr 0.001075 wd 0.0500 time 0.8587 (0.8587) data time 0.4451 (0.4451) model time 0.0000 (0.0000) loss 2.9793 (2.9793) grad_norm 1.2866 (1.2866) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][10/625] eta 0:05:06 lr 0.001074 wd 0.0500 time 0.4664 (0.4987) data time 0.0009 (0.0414) model time 0.0000 (0.0000) loss 3.5755 (2.9314) grad_norm 1.9470 (1.8939) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][20/625] eta 0:04:52 lr 0.001074 wd 0.0500 time 0.4645 (0.4828) data time 0.0011 (0.0222) model time 0.0000 (0.0000) loss 3.1794 (3.0713) grad_norm 1.5045 (1.7173) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][30/625] eta 0:04:43 lr 0.001074 wd 0.0500 time 0.4630 (0.4765) data time 0.0008 (0.0154) model time 0.0000 (0.0000) loss 2.4268 (3.0268) grad_norm 1.0236 (1.6486) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][40/625] eta 0:04:37 lr 0.001074 wd 0.0500 time 0.4653 (0.4738) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 1.8880 (3.0334) grad_norm 2.1975 (1.6082) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][50/625] eta 0:04:31 lr 0.001074 wd 0.0500 time 0.4659 (0.4722) data time 0.0010 (0.0097) model time 0.0000 (0.0000) loss 3.1748 (3.0469) grad_norm 0.9535 (1.5619) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][60/625] eta 0:04:27 lr 0.001074 wd 0.0500 time 0.4632 (0.4740) data time 0.0010 (0.0083) model time 0.4622 (0.4826) loss 3.7876 (3.0638) 
grad_norm 1.0351 (1.5168) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][70/625] eta 0:04:22 lr 0.001074 wd 0.0500 time 0.4626 (0.4728) data time 0.0010 (0.0073) model time 0.4617 (0.4735) loss 3.1898 (3.0961) grad_norm 1.5159 (1.5199) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][80/625] eta 0:04:17 lr 0.001074 wd 0.0500 time 0.4602 (0.4719) data time 0.0008 (0.0065) model time 0.4594 (0.4703) loss 3.9896 (3.1021) grad_norm 1.6232 (1.5441) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][90/625] eta 0:04:12 lr 0.001074 wd 0.0500 time 0.4628 (0.4710) data time 0.0011 (0.0059) model time 0.4617 (0.4686) loss 3.5369 (3.1055) grad_norm 1.2684 (1.5648) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][100/625] eta 0:04:06 lr 0.001074 wd 0.0500 time 0.4606 (0.4704) data time 0.0008 (0.0055) model time 0.4599 (0.4676) loss 2.4299 (3.0996) grad_norm 1.2229 (1.5389) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][110/625] eta 0:04:02 lr 0.001074 wd 0.0500 time 0.4631 (0.4700) data time 0.0008 (0.0051) model time 0.4623 (0.4671) loss 2.7808 (3.0977) grad_norm 1.8574 (1.5441) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][120/625] eta 0:03:57 lr 0.001074 wd 0.0500 time 0.4683 (0.4698) data time 0.0008 (0.0047) model time 0.4674 (0.4670) loss 3.8707 (3.1237) grad_norm 1.3950 (1.5469) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][130/625] eta 0:03:52 lr 0.001074 wd 0.0500 time 0.4660 
(0.4695) data time 0.0011 (0.0045) model time 0.4649 (0.4668) loss 2.6693 (3.1457) grad_norm 1.2977 (1.5281) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][140/625] eta 0:03:47 lr 0.001074 wd 0.0500 time 0.4591 (0.4692) data time 0.0011 (0.0042) model time 0.4580 (0.4664) loss 3.0311 (3.1331) grad_norm 2.1005 (1.5482) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][150/625] eta 0:03:42 lr 0.001074 wd 0.0500 time 0.4659 (0.4689) data time 0.0007 (0.0040) model time 0.4651 (0.4661) loss 2.2122 (3.1227) grad_norm 1.1031 (1.5404) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][160/625] eta 0:03:37 lr 0.001073 wd 0.0500 time 0.4659 (0.4686) data time 0.0009 (0.0038) model time 0.4649 (0.4659) loss 3.0409 (3.1364) grad_norm 0.9024 (1.5325) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][170/625] eta 0:03:33 lr 0.001073 wd 0.0500 time 0.4621 (0.4684) data time 0.0011 (0.0037) model time 0.4610 (0.4656) loss 3.2901 (3.1493) grad_norm 1.4619 (1.5197) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][180/625] eta 0:03:28 lr 0.001073 wd 0.0500 time 0.4633 (0.4682) data time 0.0010 (0.0035) model time 0.4623 (0.4655) loss 3.5516 (3.1520) grad_norm 1.0020 (1.5351) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][190/625] eta 0:03:23 lr 0.001073 wd 0.0500 time 0.4688 (0.4680) data time 0.0010 (0.0034) model time 0.4678 (0.4654) loss 2.7990 (3.1534) grad_norm 2.1329 (1.5491) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:02:50 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [79/300][200/625] eta 0:03:18 lr 0.001073 wd 0.0500 time 0.4615 (0.4679) data time 0.0008 (0.0033) model time 0.4607 (0.4653) loss 2.8846 (3.1585) grad_norm 1.8522 (inf) loss_scale 2048.0000 (4014.4876) mem 16715MB [2024-08-10 08:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][210/625] eta 0:03:14 lr 0.001073 wd 0.0500 time 0.4619 (0.4678) data time 0.0007 (0.0032) model time 0.4612 (0.4653) loss 3.1836 (3.1672) grad_norm 1.5950 (inf) loss_scale 2048.0000 (3921.2891) mem 16715MB [2024-08-10 08:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][220/625] eta 0:03:09 lr 0.001073 wd 0.0500 time 0.4828 (0.4677) data time 0.0010 (0.0031) model time 0.4818 (0.4652) loss 2.6428 (3.1539) grad_norm 1.1768 (inf) loss_scale 2048.0000 (3836.5249) mem 16715MB [2024-08-10 08:03:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][230/625] eta 0:03:04 lr 0.001073 wd 0.0500 time 0.4635 (0.4675) data time 0.0008 (0.0030) model time 0.4628 (0.4651) loss 3.5165 (3.1498) grad_norm 1.2710 (inf) loss_scale 2048.0000 (3759.0996) mem 16715MB [2024-08-10 08:03:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][240/625] eta 0:02:59 lr 0.001073 wd 0.0500 time 0.4797 (0.4674) data time 0.0010 (0.0029) model time 0.4786 (0.4650) loss 2.1054 (3.1621) grad_norm 1.7727 (inf) loss_scale 2048.0000 (3688.0996) mem 16715MB [2024-08-10 08:03:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][250/625] eta 0:02:55 lr 0.001073 wd 0.0500 time 0.4682 (0.4673) data time 0.0010 (0.0028) model time 0.4672 (0.4650) loss 3.3670 (3.1638) grad_norm 1.5427 (inf) loss_scale 2048.0000 (3622.7570) mem 16715MB [2024-08-10 08:03:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][260/625] eta 0:02:50 lr 0.001073 wd 0.0500 time 0.4652 (0.4676) data time 0.0010 (0.0028) model time 0.4642 (0.4654) loss 3.4641 (3.1722) grad_norm 1.4304 (inf) loss_scale 2048.0000 (3562.4215) mem 16715MB 
[2024-08-10 08:03:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][270/625] eta 0:02:45 lr 0.001073 wd 0.0500 time 0.4674 (0.4675) data time 0.0010 (0.0027) model time 0.4664 (0.4654) loss 3.6766 (3.1639) grad_norm 1.5506 (inf) loss_scale 2048.0000 (3506.5387) mem 16715MB [2024-08-10 08:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][280/625] eta 0:02:41 lr 0.001073 wd 0.0500 time 0.4674 (0.4675) data time 0.0011 (0.0027) model time 0.4663 (0.4654) loss 3.0376 (3.1712) grad_norm 1.2904 (inf) loss_scale 2048.0000 (3454.6335) mem 16715MB [2024-08-10 08:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][290/625] eta 0:02:36 lr 0.001073 wd 0.0500 time 0.4786 (0.4676) data time 0.0010 (0.0026) model time 0.4776 (0.4655) loss 2.7385 (3.1653) grad_norm 1.4475 (inf) loss_scale 2048.0000 (3406.2955) mem 16715MB [2024-08-10 08:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][300/625] eta 0:02:31 lr 0.001073 wd 0.0500 time 0.4626 (0.4675) data time 0.0010 (0.0026) model time 0.4616 (0.4654) loss 3.7046 (3.1670) grad_norm 2.0310 (inf) loss_scale 2048.0000 (3361.1694) mem 16715MB [2024-08-10 08:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][310/625] eta 0:02:27 lr 0.001072 wd 0.0500 time 0.4672 (0.4681) data time 0.0013 (0.0025) model time 0.4659 (0.4663) loss 3.3728 (3.1677) grad_norm 1.3335 (inf) loss_scale 2048.0000 (3318.9453) mem 16715MB [2024-08-10 08:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][320/625] eta 0:02:22 lr 0.001072 wd 0.0500 time 0.4613 (0.4680) data time 0.0010 (0.0025) model time 0.4603 (0.4662) loss 2.5654 (3.1683) grad_norm 1.2812 (inf) loss_scale 2048.0000 (3279.3520) mem 16715MB [2024-08-10 08:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][330/625] eta 0:02:18 lr 0.001072 wd 0.0500 time 0.4721 (0.4679) data time 0.0008 (0.0024) model time 0.4713 (0.4661) loss 3.2812 (3.1725) grad_norm 
1.1558 (inf) loss_scale 2048.0000 (3242.1511) mem 16715MB [2024-08-10 08:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][340/625] eta 0:02:13 lr 0.001072 wd 0.0500 time 0.4619 (0.4679) data time 0.0011 (0.0024) model time 0.4608 (0.4660) loss 3.1854 (3.1746) grad_norm 1.4484 (inf) loss_scale 2048.0000 (3207.1320) mem 16715MB [2024-08-10 08:04:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][350/625] eta 0:02:08 lr 0.001072 wd 0.0500 time 0.4629 (0.4679) data time 0.0010 (0.0023) model time 0.4620 (0.4661) loss 2.4074 (3.1761) grad_norm 1.3667 (inf) loss_scale 2048.0000 (3174.1083) mem 16715MB [2024-08-10 08:04:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][360/625] eta 0:02:03 lr 0.001072 wd 0.0500 time 0.4632 (0.4678) data time 0.0008 (0.0023) model time 0.4623 (0.4660) loss 2.3014 (3.1725) grad_norm 1.4194 (inf) loss_scale 2048.0000 (3142.9141) mem 16715MB [2024-08-10 08:04:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][370/625] eta 0:01:59 lr 0.001072 wd 0.0500 time 0.4737 (0.4678) data time 0.0009 (0.0023) model time 0.4729 (0.4660) loss 3.8639 (3.1811) grad_norm 1.4341 (inf) loss_scale 2048.0000 (3113.4016) mem 16715MB [2024-08-10 08:04:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][380/625] eta 0:01:54 lr 0.001072 wd 0.0500 time 0.4595 (0.4677) data time 0.0011 (0.0022) model time 0.4585 (0.4659) loss 3.6604 (3.1903) grad_norm 1.3524 (inf) loss_scale 2048.0000 (3085.4383) mem 16715MB [2024-08-10 08:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][390/625] eta 0:01:49 lr 0.001072 wd 0.0500 time 0.4623 (0.4676) data time 0.0010 (0.0022) model time 0.4613 (0.4658) loss 3.5654 (3.1951) grad_norm 1.5226 (inf) loss_scale 2048.0000 (3058.9054) mem 16715MB [2024-08-10 08:04:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][400/625] eta 0:01:45 lr 0.001072 wd 0.0500 time 0.4655 (0.4680) data time 0.0010 (0.0022) 
model time 0.4645 (0.4663) loss 3.4835 (3.1997) grad_norm 1.4943 (inf) loss_scale 2048.0000 (3033.6958) mem 16715MB [2024-08-10 08:04:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][410/625] eta 0:01:40 lr 0.001072 wd 0.0500 time 0.4698 (0.4680) data time 0.0011 (0.0022) model time 0.4687 (0.4663) loss 3.5698 (3.2005) grad_norm 1.1424 (inf) loss_scale 2048.0000 (3009.7129) mem 16715MB [2024-08-10 08:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][420/625] eta 0:01:35 lr 0.001072 wd 0.0500 time 0.4659 (0.4680) data time 0.0010 (0.0022) model time 0.4649 (0.4663) loss 3.2581 (3.1985) grad_norm 2.8748 (inf) loss_scale 2048.0000 (2986.8694) mem 16715MB [2024-08-10 08:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][430/625] eta 0:01:31 lr 0.001072 wd 0.0500 time 0.4629 (0.4679) data time 0.0011 (0.0021) model time 0.4618 (0.4663) loss 2.8936 (3.1935) grad_norm 1.2253 (inf) loss_scale 2048.0000 (2965.0858) mem 16715MB [2024-08-10 08:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][440/625] eta 0:01:26 lr 0.001072 wd 0.0500 time 0.4629 (0.4679) data time 0.0010 (0.0021) model time 0.4619 (0.4662) loss 2.8913 (3.1965) grad_norm 1.5308 (inf) loss_scale 2048.0000 (2944.2902) mem 16715MB [2024-08-10 08:04:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][450/625] eta 0:01:21 lr 0.001072 wd 0.0500 time 0.4617 (0.4678) data time 0.0010 (0.0021) model time 0.4607 (0.4661) loss 3.0393 (3.1911) grad_norm 1.8426 (inf) loss_scale 2048.0000 (2924.4169) mem 16715MB [2024-08-10 08:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][460/625] eta 0:01:17 lr 0.001072 wd 0.0500 time 0.4635 (0.4677) data time 0.0010 (0.0021) model time 0.4625 (0.4660) loss 3.6032 (3.1949) grad_norm 1.2621 (inf) loss_scale 2048.0000 (2905.4056) mem 16715MB [2024-08-10 08:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][470/625] eta 0:01:12 lr 
0.001071 wd 0.0500 time 0.4645 (0.4676) data time 0.0008 (0.0020) model time 0.4636 (0.4659) loss 3.7747 (3.1975) grad_norm 1.6651 (inf) loss_scale 2048.0000 (2887.2017) mem 16715MB [2024-08-10 08:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][480/625] eta 0:01:07 lr 0.001071 wd 0.0500 time 0.4668 (0.4676) data time 0.0011 (0.0020) model time 0.4657 (0.4659) loss 2.4826 (3.2020) grad_norm 1.8795 (inf) loss_scale 2048.0000 (2869.7547) mem 16715MB [2024-08-10 08:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][490/625] eta 0:01:03 lr 0.001071 wd 0.0500 time 0.4636 (0.4675) data time 0.0008 (0.0020) model time 0.4628 (0.4659) loss 3.3638 (3.2047) grad_norm 1.9078 (inf) loss_scale 2048.0000 (2853.0183) mem 16715MB [2024-08-10 08:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][500/625] eta 0:00:58 lr 0.001071 wd 0.0500 time 0.4614 (0.4678) data time 0.0008 (0.0020) model time 0.4606 (0.4662) loss 3.4283 (3.2067) grad_norm 1.0549 (inf) loss_scale 2048.0000 (2836.9501) mem 16715MB [2024-08-10 08:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][510/625] eta 0:00:53 lr 0.001071 wd 0.0500 time 0.4672 (0.4677) data time 0.0008 (0.0020) model time 0.4664 (0.4661) loss 3.6131 (3.2026) grad_norm 1.3935 (inf) loss_scale 2048.0000 (2821.5108) mem 16715MB [2024-08-10 08:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][520/625] eta 0:00:49 lr 0.001071 wd 0.0500 time 0.4639 (0.4676) data time 0.0009 (0.0020) model time 0.4630 (0.4660) loss 2.8267 (3.1961) grad_norm 1.4503 (inf) loss_scale 2048.0000 (2806.6641) mem 16715MB [2024-08-10 08:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][530/625] eta 0:00:44 lr 0.001071 wd 0.0500 time 0.4576 (0.4676) data time 0.0011 (0.0019) model time 0.4565 (0.4661) loss 3.5238 (3.2008) grad_norm 1.5641 (inf) loss_scale 2048.0000 (2792.3766) mem 16715MB [2024-08-10 08:05:29 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [79/300][540/625] eta 0:00:39 lr 0.001071 wd 0.0500 time 0.4585 (0.4675) data time 0.0010 (0.0019) model time 0.4576 (0.4660) loss 3.1715 (3.2006) grad_norm 1.4735 (inf) loss_scale 2048.0000 (2778.6174) mem 16715MB [2024-08-10 08:05:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][550/625] eta 0:00:35 lr 0.001071 wd 0.0500 time 0.4720 (0.4675) data time 0.0011 (0.0019) model time 0.4710 (0.4659) loss 3.2960 (3.2043) grad_norm 3.3189 (inf) loss_scale 2048.0000 (2765.3575) mem 16715MB [2024-08-10 08:05:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][560/625] eta 0:00:30 lr 0.001071 wd 0.0500 time 0.4707 (0.4675) data time 0.0010 (0.0019) model time 0.4698 (0.4659) loss 3.5637 (3.2088) grad_norm 0.9639 (inf) loss_scale 2048.0000 (2752.5704) mem 16715MB [2024-08-10 08:05:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][570/625] eta 0:00:25 lr 0.001071 wd 0.0500 time 0.4681 (0.4675) data time 0.0010 (0.0019) model time 0.4672 (0.4659) loss 3.6549 (3.2120) grad_norm 1.1974 (inf) loss_scale 2048.0000 (2740.2312) mem 16715MB [2024-08-10 08:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][580/625] eta 0:00:21 lr 0.001071 wd 0.0500 time 0.5997 (0.4677) data time 0.0010 (0.0019) model time 0.5987 (0.4661) loss 2.9576 (3.2138) grad_norm 0.8673 (inf) loss_scale 2048.0000 (2728.3167) mem 16715MB [2024-08-10 08:05:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][590/625] eta 0:00:16 lr 0.001071 wd 0.0500 time 0.4659 (0.4675) data time 0.0008 (0.0019) model time 0.4651 (0.4660) loss 3.5801 (3.2143) grad_norm 1.4778 (inf) loss_scale 2048.0000 (2716.8054) mem 16715MB [2024-08-10 08:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][600/625] eta 0:00:11 lr 0.001071 wd 0.0500 time 0.4621 (0.4675) data time 0.0007 (0.0019) model time 0.4614 (0.4659) loss 3.5973 (3.2174) grad_norm 1.3452 (inf) loss_scale 2048.0000 
(2705.6772) mem 16715MB [2024-08-10 08:06:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][610/625] eta 0:00:07 lr 0.001071 wd 0.0500 time 0.4539 (0.4676) data time 0.0008 (0.0018) model time 0.4532 (0.4660) loss 3.8179 (3.2200) grad_norm 1.6689 (inf) loss_scale 2048.0000 (2694.9133) mem 16715MB [2024-08-10 08:06:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [79/300][620/625] eta 0:00:02 lr 0.001070 wd 0.0500 time 0.4590 (0.4674) data time 0.0008 (0.0018) model time 0.4582 (0.4658) loss 3.1627 (3.2193) grad_norm 1.0932 (inf) loss_scale 2048.0000 (2684.4960) mem 16715MB [2024-08-10 08:06:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 79 training takes 0:04:52 [2024-08-10 08:06:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:06:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5923 (0.5923) Acc@1 87.305 (87.305) Acc@5 97.852 (97.852) Mem 16715MB
[2024-08-10 08:06:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 1.0000 (0.7316) Acc@1 76.074 (83.474) Acc@5 93.994 (96.915) Mem 16715MB
[2024-08-10 08:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.1504 (0.8721) Acc@1 71.973 (80.139) Acc@5 92.725 (95.368) Mem 16715MB
[2024-08-10 08:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.788 Acc@5 95.365
[2024-08-10 08:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8%
[2024-08-10 08:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.009 (1.009) Loss 0.5195 (0.5195) Acc@1 88.232 (88.232) Acc@5 98.584 (98.584) Mem 16715MB
[2024-08-10 08:06:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.206) Loss 0.8506 (0.6513) Acc@1 79.199 (85.245) Acc@5 95.361 (97.385) Mem 16715MB
[2024-08-10 08:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.164) Loss 0.9785 (0.7754) Acc@1 75.244 (81.922) Acc@5 94.531 (96.077) Mem 16715MB
[2024-08-10 08:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.650 Acc@5 96.085
[2024-08-10 08:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-08-10 08:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.65%
[2024-08-10 08:06:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 08:06:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 08:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][0/625] eta 0:08:15 lr 0.001070 wd 0.0500 time 0.7924 (0.7924) data time 0.3796 (0.3796) model time 0.0000 (0.0000) loss 3.3029 (3.3029) grad_norm 1.6935 (1.6935) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][10/625] eta 0:05:10 lr 0.001070 wd 0.0500 time 0.4710 (0.5042) data time 0.0011 (0.0360) model time 0.0000 (0.0000) loss 2.3544 (3.1630) grad_norm 0.9868 (1.4113) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][20/625] eta 0:04:56 lr 0.001070 wd 0.0500 time 0.4620 (0.4904) data time 0.0009 (0.0194) model time 0.0000 (0.0000) loss 2.7292 (3.2751) grad_norm 1.3004 (1.4245) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][30/625] eta 0:04:47 lr 0.001070 wd 0.0500 time 0.4749 (0.4828) data time 0.0011 (0.0135) model time 0.0000 (0.0000) loss 2.3350 (3.2859) grad_norm 1.3094 (1.5148) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][40/625] eta 0:04:40 lr 0.001070 wd 0.0500 time 0.4598 (0.4797) data time 0.0010 (0.0105) model time 0.0000 (0.0000) loss 3.1955 (3.3005) grad_norm 1.7212 (1.6249) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][50/625] eta 0:04:34 lr 0.001070 wd 0.0500 time 0.4566 (0.4769) data time 0.0008 (0.0086) model time 0.0000 (0.0000) loss 3.0760 (3.2486) grad_norm 1.5923 (1.5812) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][60/625] eta 0:04:28 lr 0.001070 wd 0.0500 time 0.4698 (0.4750) data time 0.0008 (0.0074) model time 0.4690 (0.4641) loss 3.8925 (3.1905) 
grad_norm 1.5211 (1.6025) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][70/625] eta 0:04:22 lr 0.001070 wd 0.0500 time 0.4648 (0.4738) data time 0.0009 (0.0065) model time 0.4640 (0.4648) loss 3.1440 (3.1555) grad_norm 1.6003 (1.5811) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][80/625] eta 0:04:18 lr 0.001070 wd 0.0500 time 0.6540 (0.4751) data time 0.0012 (0.0058) model time 0.6529 (0.4711) loss 2.6748 (3.1811) grad_norm 1.2732 (1.5572) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][90/625] eta 0:04:13 lr 0.001070 wd 0.0500 time 0.4637 (0.4742) data time 0.0010 (0.0053) model time 0.4627 (0.4697) loss 3.4729 (3.1954) grad_norm 1.2721 (1.5407) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][100/625] eta 0:04:08 lr 0.001070 wd 0.0500 time 0.4627 (0.4730) data time 0.0010 (0.0049) model time 0.4617 (0.4679) loss 2.9877 (3.2020) grad_norm 1.4405 (1.5295) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][110/625] eta 0:04:03 lr 0.001070 wd 0.0500 time 0.4586 (0.4728) data time 0.0012 (0.0046) model time 0.4574 (0.4682) loss 2.9310 (3.2008) grad_norm 1.5278 (1.5427) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][120/625] eta 0:03:58 lr 0.001070 wd 0.0500 time 0.4635 (0.4721) data time 0.0010 (0.0043) model time 0.4624 (0.4675) loss 2.0220 (3.1792) grad_norm 1.7097 (1.5344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][130/625] eta 0:03:53 lr 0.001070 wd 0.0500 time 0.4653 
(0.4715) data time 0.0010 (0.0040) model time 0.4643 (0.4669) loss 3.3298 (3.1858) grad_norm 1.5596 (1.5341) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][140/625] eta 0:03:48 lr 0.001069 wd 0.0500 time 0.4624 (0.4711) data time 0.0011 (0.0038) model time 0.4612 (0.4667) loss 3.6406 (3.1602) grad_norm 1.2260 (1.5499) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][150/625] eta 0:03:43 lr 0.001069 wd 0.0500 time 0.4591 (0.4707) data time 0.0010 (0.0036) model time 0.4581 (0.4665) loss 3.3516 (3.1644) grad_norm 2.1002 (1.5629) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][160/625] eta 0:03:38 lr 0.001069 wd 0.0500 time 0.4630 (0.4704) data time 0.0008 (0.0035) model time 0.4622 (0.4664) loss 3.6461 (3.1698) grad_norm 1.7355 (1.5642) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][170/625] eta 0:03:33 lr 0.001069 wd 0.0500 time 0.4653 (0.4701) data time 0.0010 (0.0033) model time 0.4643 (0.4661) loss 3.7581 (3.1836) grad_norm 1.6370 (1.5600) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][180/625] eta 0:03:29 lr 0.001069 wd 0.0500 time 0.4711 (0.4698) data time 0.0010 (0.0032) model time 0.4701 (0.4660) loss 3.6306 (3.1963) grad_norm 1.2126 (1.5544) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][190/625] eta 0:03:24 lr 0.001069 wd 0.0500 time 0.4595 (0.4695) data time 0.0010 (0.0031) model time 0.4585 (0.4658) loss 2.3871 (3.1934) grad_norm 1.5577 (1.5596) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [80/300][200/625] eta 0:03:19 lr 0.001069 wd 0.0500 time 0.4573 (0.4691) data time 0.0011 (0.0030) model time 0.4563 (0.4654) loss 3.3166 (3.1961) grad_norm 1.7617 (1.5563) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:07:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][210/625] eta 0:03:14 lr 0.001069 wd 0.0500 time 0.4722 (0.4689) data time 0.0008 (0.0029) model time 0.4714 (0.4653) loss 2.6828 (3.2021) grad_norm 1.6922 (1.5494) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][220/625] eta 0:03:09 lr 0.001069 wd 0.0500 time 0.4678 (0.4687) data time 0.0008 (0.0028) model time 0.4671 (0.4652) loss 4.1312 (3.1826) grad_norm 1.5476 (1.5512) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][230/625] eta 0:03:05 lr 0.001069 wd 0.0500 time 0.4644 (0.4687) data time 0.0008 (0.0027) model time 0.4636 (0.4653) loss 3.1474 (3.1640) grad_norm 0.9851 (1.5419) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][240/625] eta 0:03:00 lr 0.001069 wd 0.0500 time 0.4632 (0.4686) data time 0.0008 (0.0027) model time 0.4624 (0.4652) loss 3.9362 (3.1623) grad_norm 1.7709 (1.5373) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][250/625] eta 0:02:55 lr 0.001069 wd 0.0500 time 0.4618 (0.4684) data time 0.0008 (0.0026) model time 0.4610 (0.4652) loss 2.9567 (3.1620) grad_norm 1.2938 (1.5324) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][260/625] eta 0:02:50 lr 0.001069 wd 0.0500 time 0.4622 (0.4681) data time 0.0010 (0.0025) model time 0.4612 (0.4650) loss 3.5833 (3.1722) grad_norm 1.7871 (1.5290) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:08:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][270/625] eta 0:02:46 lr 0.001069 wd 0.0500 time 0.4581 (0.4679) data time 0.0010 (0.0025) model time 0.4571 (0.4647) loss 3.6111 (3.1783) grad_norm 1.3979 (1.5343) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][280/625] eta 0:02:41 lr 0.001069 wd 0.0500 time 0.4662 (0.4677) data time 0.0007 (0.0024) model time 0.4655 (0.4646) loss 2.8596 (3.1827) grad_norm 1.1419 (1.5341) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][290/625] eta 0:02:36 lr 0.001068 wd 0.0500 time 0.4631 (0.4676) data time 0.0009 (0.0024) model time 0.4622 (0.4645) loss 3.3202 (3.1798) grad_norm 1.6193 (1.5274) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][300/625] eta 0:02:32 lr 0.001068 wd 0.0500 time 0.6800 (0.4682) data time 0.0010 (0.0023) model time 0.6790 (0.4654) loss 3.7620 (3.1845) grad_norm 1.1821 (1.5287) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][310/625] eta 0:02:27 lr 0.001068 wd 0.0500 time 0.4622 (0.4680) data time 0.0010 (0.0023) model time 0.4612 (0.4652) loss 2.9847 (3.1879) grad_norm 1.4184 (1.5207) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][320/625] eta 0:02:22 lr 0.001068 wd 0.0500 time 0.4583 (0.4679) data time 0.0008 (0.0023) model time 0.4574 (0.4651) loss 2.5242 (3.1808) grad_norm 0.9743 (1.5176) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][330/625] eta 0:02:18 lr 0.001068 wd 0.0500 time 0.4616 (0.4684) data time 0.0010 (0.0022) model time 
0.4606 (0.4657) loss 3.3308 (3.1700) grad_norm 7.1339 (1.5378) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:08:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][340/625] eta 0:02:13 lr 0.001068 wd 0.0500 time 0.4599 (0.4682) data time 0.0011 (0.0022) model time 0.4588 (0.4656) loss 3.5471 (3.1774) grad_norm 2.0563 (1.5386) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][350/625] eta 0:02:08 lr 0.001068 wd 0.0500 time 0.4607 (0.4681) data time 0.0011 (0.0022) model time 0.4597 (0.4655) loss 2.5593 (3.1785) grad_norm 1.3595 (1.5420) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][360/625] eta 0:02:04 lr 0.001068 wd 0.0500 time 0.4768 (0.4681) data time 0.0008 (0.0021) model time 0.4759 (0.4656) loss 2.9810 (3.1783) grad_norm 1.0998 (1.5381) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][370/625] eta 0:01:59 lr 0.001068 wd 0.0500 time 0.4669 (0.4680) data time 0.0008 (0.0021) model time 0.4661 (0.4656) loss 2.6034 (3.1855) grad_norm 1.3904 (1.5336) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][380/625] eta 0:01:54 lr 0.001068 wd 0.0500 time 0.4881 (0.4680) data time 0.0010 (0.0021) model time 0.4871 (0.4656) loss 2.8575 (3.1866) grad_norm 1.0794 (1.5293) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][390/625] eta 0:01:49 lr 0.001068 wd 0.0500 time 0.4618 (0.4680) data time 0.0010 (0.0021) model time 0.4608 (0.4655) loss 3.1054 (3.1837) grad_norm 2.0741 (1.5315) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][400/625] eta 0:01:45 
lr 0.001068 wd 0.0500 time 0.4610 (0.4678) data time 0.0010 (0.0020) model time 0.4600 (0.4654) loss 3.0554 (3.1884) grad_norm 1.0427 (1.5309) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][410/625] eta 0:01:40 lr 0.001068 wd 0.0500 time 0.4599 (0.4677) data time 0.0012 (0.0020) model time 0.4588 (0.4654) loss 3.2995 (3.1885) grad_norm 1.3408 (1.5337) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][420/625] eta 0:01:35 lr 0.001068 wd 0.0500 time 0.4647 (0.4678) data time 0.0008 (0.0020) model time 0.4640 (0.4654) loss 3.9029 (3.1849) grad_norm 1.1754 (1.5333) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][430/625] eta 0:01:31 lr 0.001068 wd 0.0500 time 0.4676 (0.4678) data time 0.0008 (0.0020) model time 0.4668 (0.4655) loss 3.4903 (3.1839) grad_norm 1.7939 (1.5344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][440/625] eta 0:01:26 lr 0.001067 wd 0.0500 time 0.4657 (0.4677) data time 0.0008 (0.0019) model time 0.4649 (0.4655) loss 1.7978 (3.1777) grad_norm 1.3103 (1.5343) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][450/625] eta 0:01:21 lr 0.001067 wd 0.0500 time 0.4644 (0.4682) data time 0.0008 (0.0019) model time 0.4637 (0.4660) loss 3.4896 (3.1861) grad_norm 1.5387 (1.5369) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][460/625] eta 0:01:17 lr 0.001067 wd 0.0500 time 0.4604 (0.4681) data time 0.0008 (0.0019) model time 0.4597 (0.4660) loss 1.9612 (3.1861) grad_norm 1.7680 (1.5371) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:00 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][470/625] eta 0:01:12 lr 0.001067 wd 0.0500 time 0.4614 (0.4684) data time 0.0007 (0.0019) model time 0.4607 (0.4663) loss 3.2660 (3.1826) grad_norm 1.2475 (1.5346) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][480/625] eta 0:01:07 lr 0.001067 wd 0.0500 time 0.4695 (0.4684) data time 0.0010 (0.0019) model time 0.4685 (0.4662) loss 2.8745 (3.1842) grad_norm 1.6230 (1.5351) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][490/625] eta 0:01:03 lr 0.001067 wd 0.0500 time 0.4583 (0.4682) data time 0.0011 (0.0019) model time 0.4572 (0.4661) loss 3.1750 (3.1780) grad_norm 1.6151 (1.5389) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][500/625] eta 0:00:58 lr 0.001067 wd 0.0500 time 0.4652 (0.4683) data time 0.0008 (0.0019) model time 0.4644 (0.4663) loss 3.4609 (3.1774) grad_norm 1.5362 (1.5354) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][510/625] eta 0:00:53 lr 0.001067 wd 0.0500 time 0.4667 (0.4683) data time 0.0008 (0.0019) model time 0.4659 (0.4662) loss 2.7757 (3.1781) grad_norm 1.2587 (1.5292) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][520/625] eta 0:00:49 lr 0.001067 wd 0.0500 time 0.4636 (0.4683) data time 0.0011 (0.0019) model time 0.4625 (0.4662) loss 3.3134 (3.1761) grad_norm 2.3001 (1.5290) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][530/625] eta 0:00:44 lr 0.001067 wd 0.0500 time 0.4603 (0.4682) data time 0.0008 (0.0018) model time 0.4595 (0.4662) loss 3.3592 (3.1786) grad_norm 
1.9143 (1.5350) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][540/625] eta 0:00:39 lr 0.001067 wd 0.0500 time 0.4634 (0.4681) data time 0.0009 (0.0018) model time 0.4625 (0.4661) loss 2.3966 (3.1749) grad_norm 1.3719 (1.5355) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][550/625] eta 0:00:35 lr 0.001067 wd 0.0500 time 0.4610 (0.4680) data time 0.0008 (0.0018) model time 0.4602 (0.4660) loss 2.3944 (3.1769) grad_norm 1.4773 (1.5355) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][560/625] eta 0:00:30 lr 0.001067 wd 0.0500 time 0.4644 (0.4680) data time 0.0010 (0.0018) model time 0.4635 (0.4660) loss 3.5602 (3.1724) grad_norm 1.4913 (1.5341) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][570/625] eta 0:00:25 lr 0.001067 wd 0.0500 time 0.4608 (0.4679) data time 0.0010 (0.0018) model time 0.4598 (0.4659) loss 3.6336 (3.1724) grad_norm 1.3750 (1.5334) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][580/625] eta 0:00:21 lr 0.001067 wd 0.0500 time 0.4750 (0.4679) data time 0.0008 (0.0018) model time 0.4743 (0.4659) loss 3.4963 (3.1769) grad_norm 1.7419 (1.5365) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:10:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][590/625] eta 0:00:16 lr 0.001066 wd 0.0500 time 0.4655 (0.4679) data time 0.0010 (0.0018) model time 0.4646 (0.4659) loss 3.5000 (3.1768) grad_norm 1.6335 (1.5344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][600/625] eta 0:00:11 lr 0.001066 wd 0.0500 time 0.4683 (0.4678) data 
time 0.0010 (0.0018) model time 0.4673 (0.4659) loss 3.6660 (3.1800) grad_norm 1.2413 (1.5311) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][610/625] eta 0:00:07 lr 0.001066 wd 0.0500 time 0.4591 (0.4678) data time 0.0005 (0.0018) model time 0.4585 (0.4658) loss 4.1099 (3.1817) grad_norm 2.0265 (1.5270) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [80/300][620/625] eta 0:00:02 lr 0.001066 wd 0.0500 time 0.4626 (0.4677) data time 0.0007 (0.0017) model time 0.4619 (0.4657) loss 2.6669 (3.1797) grad_norm 1.2219 (1.5331) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:11 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 80 training takes 0:04:52 [2024-08-10 08:11:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:11:13 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:11:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5791 (0.5791) Acc@1 87.549 (87.549) Acc@5 98.291 (98.291) Mem 16715MB
[2024-08-10 08:11:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9331 (0.7329) Acc@1 77.344 (83.625) Acc@5 95.020 (97.030) Mem 16715MB
[2024-08-10 08:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0830 (0.8744) Acc@1 73.828 (80.090) Acc@5 92.871 (95.373) Mem 16715MB
[2024-08-10 08:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.850 Acc@5 95.349
[2024-08-10 08:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9%
[2024-08-10 08:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.883 (0.883) Loss 0.5190 (0.5190) Acc@1 88.281 (88.281) Acc@5 98.584 (98.584) Mem 16715MB
[2024-08-10 08:11:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.8477 (0.6501) Acc@1 79.053 (85.285) Acc@5 95.361 (97.390) Mem 16715MB
[2024-08-10 08:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.9761 (0.7739) Acc@1 75.586 (81.994) Acc@5 94.482 (96.080) Mem 16715MB
[2024-08-10 08:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.718 Acc@5 96.091
[2024-08-10 08:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.7%
[2024-08-10 08:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.72%
[2024-08-10 08:11:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 08:11:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 08:11:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][0/625] eta 0:08:28 lr 0.001066 wd 0.0500 time 0.8136 (0.8136) data time 0.4000 (0.4000) model time 0.0000 (0.0000) loss 3.7127 (3.7127) grad_norm 1.7018 (1.7018) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][10/625] eta 0:05:05 lr 0.001066 wd 0.0500 time 0.4706 (0.4971) data time 0.0009 (0.0384) model time 0.0000 (0.0000) loss 3.5987 (3.1094) grad_norm 1.3479 (1.4928) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][20/625] eta 0:04:51 lr 0.001066 wd 0.0500 time 0.4680 (0.4819) data time 0.0009 (0.0206) model time 0.0000 (0.0000) loss 2.9159 (3.2091) grad_norm 1.7247 (1.4240) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][30/625] eta 0:04:43 lr 0.001066 wd 0.0500 time 0.4610 (0.4767) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 3.8075 (3.1858) grad_norm 1.4454 (1.4947) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][40/625] eta 0:04:37 lr 0.001066 wd 0.0500 time 0.4756 (0.4741) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 2.6291 (3.2142) grad_norm 1.2303 (1.4655) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][50/625] eta 0:04:33 lr 0.001066 wd 0.0500 time 0.4658 (0.4751) data time 0.0010 (0.0091) model time 0.0000 (0.0000) loss 3.1599 (3.1980) grad_norm 1.2583 (1.4486) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][60/625] eta 0:04:27 lr 0.001066 wd 0.0500 time 0.4666 (0.4733) data time 0.0009 (0.0078) model time 0.4657 (0.4631) loss 3.5066 (3.1857) 
grad_norm 1.2826 (1.4474) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][70/625] eta 0:04:23 lr 0.001066 wd 0.0500 time 0.4636 (0.4751) data time 0.0008 (0.0068) model time 0.4627 (0.4739) loss 3.5779 (3.1759) grad_norm 1.0455 (1.4339) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][80/625] eta 0:04:18 lr 0.001066 wd 0.0500 time 0.4723 (0.4739) data time 0.0008 (0.0061) model time 0.4716 (0.4707) loss 3.4931 (3.2019) grad_norm 1.3251 (1.4620) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][90/625] eta 0:04:12 lr 0.001066 wd 0.0500 time 0.4597 (0.4728) data time 0.0008 (0.0056) model time 0.4589 (0.4687) loss 3.1850 (3.1997) grad_norm 1.6727 (1.4689) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][100/625] eta 0:04:07 lr 0.001066 wd 0.0500 time 0.4668 (0.4720) data time 0.0008 (0.0051) model time 0.4660 (0.4677) loss 3.9881 (3.2149) grad_norm 1.3088 (1.4813) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][110/625] eta 0:04:02 lr 0.001065 wd 0.0500 time 0.4684 (0.4714) data time 0.0010 (0.0047) model time 0.4674 (0.4672) loss 3.4709 (3.2143) grad_norm 1.3509 (1.4738) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][120/625] eta 0:03:57 lr 0.001065 wd 0.0500 time 0.4599 (0.4708) data time 0.0011 (0.0044) model time 0.4588 (0.4666) loss 3.9215 (3.2172) grad_norm 1.1654 (1.4606) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][130/625] eta 0:03:52 lr 0.001065 wd 0.0500 time 0.4622 
(0.4703) data time 0.0008 (0.0042) model time 0.4614 (0.4661) loss 2.5315 (3.2096) grad_norm 1.1073 (1.4504) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][140/625] eta 0:03:47 lr 0.001065 wd 0.0500 time 0.4627 (0.4698) data time 0.0010 (0.0040) model time 0.4617 (0.4657) loss 2.8554 (3.2250) grad_norm 1.1566 (1.4439) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][150/625] eta 0:03:42 lr 0.001065 wd 0.0500 time 0.4608 (0.4693) data time 0.0007 (0.0038) model time 0.4601 (0.4652) loss 3.9634 (3.2120) grad_norm 1.2412 (1.4419) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][160/625] eta 0:03:38 lr 0.001065 wd 0.0500 time 0.4722 (0.4691) data time 0.0012 (0.0036) model time 0.4710 (0.4652) loss 2.7319 (3.2056) grad_norm 1.1304 (1.4314) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][170/625] eta 0:03:33 lr 0.001065 wd 0.0500 time 0.4718 (0.4689) data time 0.0010 (0.0035) model time 0.4708 (0.4652) loss 3.2470 (3.1895) grad_norm 1.6725 (1.4321) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][180/625] eta 0:03:28 lr 0.001065 wd 0.0500 time 0.4539 (0.4688) data time 0.0008 (0.0033) model time 0.4531 (0.4652) loss 2.5772 (3.1743) grad_norm 1.9014 (1.4646) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][190/625] eta 0:03:23 lr 0.001065 wd 0.0500 time 0.4664 (0.4685) data time 0.0010 (0.0032) model time 0.4654 (0.4651) loss 2.8041 (3.1792) grad_norm 1.1333 (1.4683) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [81/300][200/625] eta 0:03:19 lr 0.001065 wd 0.0500 time 0.4635 (0.4684) data time 0.0011 (0.0031) model time 0.4625 (0.4650) loss 2.9709 (3.1733) grad_norm 1.7670 (1.4697) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][210/625] eta 0:03:14 lr 0.001065 wd 0.0500 time 0.4639 (0.4681) data time 0.0008 (0.0030) model time 0.4631 (0.4648) loss 4.0190 (3.1732) grad_norm 1.3162 (1.4781) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][220/625] eta 0:03:09 lr 0.001065 wd 0.0500 time 0.4642 (0.4680) data time 0.0008 (0.0029) model time 0.4635 (0.4648) loss 3.5265 (3.1780) grad_norm 1.0676 (1.4796) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][230/625] eta 0:03:04 lr 0.001065 wd 0.0500 time 0.4738 (0.4679) data time 0.0009 (0.0028) model time 0.4728 (0.4648) loss 2.2883 (3.1764) grad_norm 1.1083 (1.4722) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][240/625] eta 0:03:00 lr 0.001065 wd 0.0500 time 0.4657 (0.4678) data time 0.0010 (0.0028) model time 0.4647 (0.4648) loss 2.8453 (3.1801) grad_norm 1.5177 (1.4758) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][250/625] eta 0:02:55 lr 0.001065 wd 0.0500 time 0.4658 (0.4678) data time 0.0011 (0.0027) model time 0.4648 (0.4648) loss 3.2292 (3.1782) grad_norm 1.6052 (1.4714) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][260/625] eta 0:02:50 lr 0.001064 wd 0.0500 time 0.4653 (0.4677) data time 0.0008 (0.0026) model time 0.4645 (0.4648) loss 3.4722 (3.1775) grad_norm 1.8533 (1.4847) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][270/625] eta 0:02:45 lr 0.001064 wd 0.0500 time 0.4614 (0.4675) data time 0.0008 (0.0026) model time 0.4606 (0.4647) loss 3.6913 (3.1838) grad_norm 1.1709 (1.4929) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][280/625] eta 0:02:41 lr 0.001064 wd 0.0500 time 0.4606 (0.4675) data time 0.0008 (0.0025) model time 0.4598 (0.4647) loss 3.8278 (3.1934) grad_norm 1.6423 (1.4898) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][290/625] eta 0:02:36 lr 0.001064 wd 0.0500 time 0.4738 (0.4674) data time 0.0008 (0.0025) model time 0.4730 (0.4647) loss 3.0547 (3.1931) grad_norm 1.1844 (1.4893) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][300/625] eta 0:02:31 lr 0.001064 wd 0.0500 time 0.4643 (0.4674) data time 0.0011 (0.0024) model time 0.4633 (0.4647) loss 3.0386 (3.1966) grad_norm 1.1116 (1.4868) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][310/625] eta 0:02:27 lr 0.001064 wd 0.0500 time 0.4621 (0.4673) data time 0.0008 (0.0024) model time 0.4614 (0.4647) loss 3.9252 (3.1907) grad_norm 2.6697 (1.4987) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][320/625] eta 0:02:22 lr 0.001064 wd 0.0500 time 0.4632 (0.4673) data time 0.0008 (0.0023) model time 0.4624 (0.4647) loss 3.4841 (3.1890) grad_norm 1.3268 (1.4957) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][330/625] eta 0:02:17 lr 0.001064 wd 0.0500 time 0.4684 (0.4672) data time 0.0010 (0.0023) model time 
0.4674 (0.4647) loss 1.9489 (3.1874) grad_norm 1.6493 (1.4911) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][340/625] eta 0:02:13 lr 0.001064 wd 0.0500 time 0.4638 (0.4672) data time 0.0010 (0.0023) model time 0.4627 (0.4647) loss 3.1081 (3.1857) grad_norm 1.6145 (1.4905) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][350/625] eta 0:02:08 lr 0.001064 wd 0.0500 time 0.4730 (0.4671) data time 0.0008 (0.0022) model time 0.4722 (0.4647) loss 3.6126 (3.1869) grad_norm 1.4917 (1.4942) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][360/625] eta 0:02:03 lr 0.001064 wd 0.0500 time 0.4557 (0.4671) data time 0.0010 (0.0022) model time 0.4547 (0.4647) loss 2.5135 (3.1910) grad_norm 0.9474 (1.4857) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][370/625] eta 0:01:59 lr 0.001064 wd 0.0500 time 0.4738 (0.4671) data time 0.0008 (0.0022) model time 0.4730 (0.4647) loss 4.1947 (3.1904) grad_norm 1.4006 (1.4816) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][380/625] eta 0:01:54 lr 0.001064 wd 0.0500 time 0.4700 (0.4676) data time 0.0011 (0.0021) model time 0.4689 (0.4653) loss 3.8826 (3.1945) grad_norm 1.8385 (1.4825) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][390/625] eta 0:01:49 lr 0.001064 wd 0.0500 time 0.4665 (0.4676) data time 0.0010 (0.0021) model time 0.4655 (0.4654) loss 3.7079 (3.1964) grad_norm 1.2710 (1.4867) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][400/625] eta 0:01:45 
lr 0.001064 wd 0.0500 time 0.4681 (0.4675) data time 0.0008 (0.0021) model time 0.4672 (0.4653) loss 3.9061 (3.2042) grad_norm 1.3082 (1.4824) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][410/625] eta 0:01:40 lr 0.001063 wd 0.0500 time 0.4648 (0.4680) data time 0.0008 (0.0021) model time 0.4640 (0.4659) loss 2.5758 (3.1989) grad_norm 1.3947 (1.4778) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][420/625] eta 0:01:35 lr 0.001063 wd 0.0500 time 0.4640 (0.4679) data time 0.0008 (0.0020) model time 0.4632 (0.4658) loss 2.6876 (3.1990) grad_norm 1.2759 (1.4781) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][430/625] eta 0:01:31 lr 0.001063 wd 0.0500 time 0.4656 (0.4678) data time 0.0007 (0.0020) model time 0.4649 (0.4657) loss 3.6612 (3.2007) grad_norm 1.1419 (1.4718) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][440/625] eta 0:01:26 lr 0.001063 wd 0.0500 time 0.4763 (0.4677) data time 0.0008 (0.0020) model time 0.4755 (0.4657) loss 3.0750 (3.2034) grad_norm 1.2190 (1.4682) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][450/625] eta 0:01:21 lr 0.001063 wd 0.0500 time 0.4607 (0.4678) data time 0.0008 (0.0020) model time 0.4599 (0.4658) loss 2.9404 (3.2005) grad_norm 1.7062 (1.4718) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:14:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][460/625] eta 0:01:17 lr 0.001063 wd 0.0500 time 0.4588 (0.4677) data time 0.0009 (0.0020) model time 0.4579 (0.4657) loss 3.8165 (3.2043) grad_norm 1.2177 (1.4728) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:02 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][470/625] eta 0:01:12 lr 0.001063 wd 0.0500 time 0.4641 (0.4677) data time 0.0007 (0.0019) model time 0.4633 (0.4657) loss 3.5931 (3.2053) grad_norm 1.1959 (1.4703) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][480/625] eta 0:01:07 lr 0.001063 wd 0.0500 time 0.4615 (0.4676) data time 0.0008 (0.0019) model time 0.4607 (0.4656) loss 3.5681 (3.2082) grad_norm 1.4662 (1.4688) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][490/625] eta 0:01:03 lr 0.001063 wd 0.0500 time 0.4617 (0.4676) data time 0.0010 (0.0019) model time 0.4606 (0.4656) loss 3.0605 (3.2098) grad_norm 1.4950 (1.4693) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][500/625] eta 0:00:58 lr 0.001063 wd 0.0500 time 0.4602 (0.4675) data time 0.0011 (0.0019) model time 0.4591 (0.4656) loss 3.3401 (3.2153) grad_norm 2.0199 (1.4714) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][510/625] eta 0:00:53 lr 0.001063 wd 0.0500 time 0.4647 (0.4675) data time 0.0010 (0.0019) model time 0.4637 (0.4655) loss 3.2409 (3.2198) grad_norm 1.2244 (1.4722) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][520/625] eta 0:00:49 lr 0.001063 wd 0.0500 time 0.4638 (0.4674) data time 0.0011 (0.0019) model time 0.4628 (0.4655) loss 3.3587 (3.2220) grad_norm 1.8837 (1.4705) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][530/625] eta 0:00:44 lr 0.001063 wd 0.0500 time 0.4732 (0.4674) data time 0.0008 (0.0019) model time 0.4725 (0.4655) loss 2.4910 (3.2182) grad_norm 
1.5022 (1.4691) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][540/625] eta 0:00:39 lr 0.001063 wd 0.0500 time 0.4694 (0.4674) data time 0.0010 (0.0018) model time 0.4684 (0.4655) loss 2.8453 (3.2164) grad_norm 1.1660 (1.4652) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][550/625] eta 0:00:35 lr 0.001062 wd 0.0500 time 0.4663 (0.4674) data time 0.0010 (0.0018) model time 0.4653 (0.4655) loss 2.1610 (3.2171) grad_norm 1.3686 (1.4659) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][560/625] eta 0:00:30 lr 0.001062 wd 0.0500 time 0.4592 (0.4673) data time 0.0011 (0.0018) model time 0.4582 (0.4655) loss 3.3761 (3.2139) grad_norm 1.0239 (1.4645) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][570/625] eta 0:00:25 lr 0.001062 wd 0.0500 time 0.4628 (0.4674) data time 0.0009 (0.0018) model time 0.4619 (0.4656) loss 2.5058 (3.2135) grad_norm 1.0300 (1.4663) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][580/625] eta 0:00:21 lr 0.001062 wd 0.0500 time 0.4600 (0.4673) data time 0.0008 (0.0018) model time 0.4592 (0.4655) loss 3.6758 (3.2156) grad_norm 1.9927 (1.4692) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][590/625] eta 0:00:16 lr 0.001062 wd 0.0500 time 0.4632 (0.4672) data time 0.0007 (0.0018) model time 0.4625 (0.4654) loss 3.9282 (3.2175) grad_norm 1.1492 (1.4745) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][600/625] eta 0:00:11 lr 0.001062 wd 0.0500 time 0.4640 (0.4675) data 
time 0.0010 (0.0018) model time 0.4631 (0.4657) loss 3.4741 (3.2153) grad_norm 2.5979 (1.4734) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][610/625] eta 0:00:07 lr 0.001062 wd 0.0500 time 0.4616 (0.4675) data time 0.0005 (0.0017) model time 0.4611 (0.4657) loss 2.8389 (3.2180) grad_norm 2.2720 (1.4823) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [81/300][620/625] eta 0:00:02 lr 0.001062 wd 0.0500 time 0.4611 (0.4674) data time 0.0005 (0.0017) model time 0.4606 (0.4656) loss 4.0229 (3.2200) grad_norm 1.7670 (1.4846) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 81 training takes 0:04:52 [2024-08-10 08:16:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:16:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.6089 (0.6089) Acc@1 86.914 (86.914) Acc@5 97.900 (97.900) Mem 16715MB [2024-08-10 08:16:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.9414 (0.7158) Acc@1 78.076 (83.829) Acc@5 94.629 (96.964) Mem 16715MB [2024-08-10 08:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.0889 (0.8563) Acc@1 73.193 (80.259) Acc@5 92.725 (95.415) Mem 16715MB [2024-08-10 08:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.902 Acc@5 95.341 [2024-08-10 08:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.9% [2024-08-10 08:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 79.90% [2024-08-10 08:16:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 08:16:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 08:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.5181 (0.5181) Acc@1 88.428 (88.428) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 08:16:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.8452 (0.6483) Acc@1 79.492 (85.378) Acc@5 95.361 (97.399) Mem 16715MB [2024-08-10 08:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9736 (0.7716) Acc@1 75.488 (82.073) Acc@5 94.531 (96.108) Mem 16715MB [2024-08-10 08:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.796 Acc@5 96.121 [2024-08-10 08:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-10 08:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.80% [2024-08-10 08:16:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:16:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][0/625] eta 0:08:38 lr 0.001062 wd 0.0500 time 0.8288 (0.8288) data time 0.4225 (0.4225) model time 0.0000 (0.0000) loss 4.2711 (4.2711) grad_norm 1.7413 (1.7413) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][10/625] eta 0:05:04 lr 0.001062 wd 0.0500 time 0.4637 (0.4958) data time 0.0010 (0.0394) model time 0.0000 (0.0000) loss 3.8017 (3.4484) grad_norm 1.7264 (1.4556) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][20/625] eta 0:04:50 lr 0.001062 wd 0.0500 time 0.4619 (0.4800) data time 0.0008 (0.0211) model time 0.0000 (0.0000) loss 3.4460 (3.3530) grad_norm 1.4989 (1.4339) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][30/625] eta 0:04:42 lr 0.001062 wd 0.0500 time 0.4727 (0.4749) data time 0.0010 (0.0147) model time 0.0000 (0.0000) loss 3.0251 (3.3637) grad_norm 0.8730 (1.4867) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][40/625] eta 0:04:36 lr 0.001062 wd 0.0500 time 0.4692 (0.4727) data time 0.0010 (0.0114) model time 0.0000 (0.0000) loss 3.5363 (3.3578) grad_norm 1.7539 (1.4834) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][50/625] eta 0:04:32 lr 0.001062 wd 0.0500 time 0.4648 (0.4745) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 3.5249 (3.3394) grad_norm 1.6295 (1.5011) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][60/625] eta 0:04:27 lr 0.001062 wd 0.0500 time 0.4606 (0.4729) data time 0.0012 (0.0080) model time 0.4594 (0.4634) loss 2.8598 (3.2727) 
grad_norm 1.7668 (1.5050) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][70/625] eta 0:04:21 lr 0.001062 wd 0.0500 time 0.4642 (0.4716) data time 0.0011 (0.0070) model time 0.4631 (0.4631) loss 3.1084 (3.2529) grad_norm 1.3228 (1.5366) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][80/625] eta 0:04:16 lr 0.001061 wd 0.0500 time 0.4629 (0.4703) data time 0.0010 (0.0063) model time 0.4620 (0.4620) loss 2.4958 (3.2434) grad_norm 1.0531 (1.5189) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][90/625] eta 0:04:11 lr 0.001061 wd 0.0500 time 0.4641 (0.4693) data time 0.0008 (0.0057) model time 0.4633 (0.4616) loss 2.6074 (3.2647) grad_norm 1.1885 (1.4904) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][100/625] eta 0:04:06 lr 0.001061 wd 0.0500 time 0.4686 (0.4688) data time 0.0011 (0.0053) model time 0.4675 (0.4618) loss 2.9985 (3.2652) grad_norm 1.5700 (1.4747) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][110/625] eta 0:04:01 lr 0.001061 wd 0.0500 time 0.4624 (0.4684) data time 0.0010 (0.0049) model time 0.4615 (0.4621) loss 2.6149 (3.2357) grad_norm 1.8592 (1.4771) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][120/625] eta 0:03:56 lr 0.001061 wd 0.0500 time 0.4639 (0.4681) data time 0.0011 (0.0046) model time 0.4628 (0.4623) loss 3.1105 (3.2310) grad_norm 1.5768 (1.4862) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][130/625] eta 0:03:51 lr 0.001061 wd 0.0500 time 0.4656 
(0.4678) data time 0.0010 (0.0043) model time 0.4646 (0.4624) loss 3.2379 (3.2324) grad_norm 1.3389 (1.4866) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][140/625] eta 0:03:46 lr 0.001061 wd 0.0500 time 0.4588 (0.4675) data time 0.0011 (0.0041) model time 0.4577 (0.4623) loss 2.8147 (3.2270) grad_norm 1.1038 (1.4770) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][150/625] eta 0:03:41 lr 0.001061 wd 0.0500 time 0.4600 (0.4672) data time 0.0010 (0.0039) model time 0.4590 (0.4623) loss 3.0150 (3.2191) grad_norm 1.4104 (1.4692) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][160/625] eta 0:03:37 lr 0.001061 wd 0.0500 time 0.4656 (0.4683) data time 0.0008 (0.0037) model time 0.4648 (0.4643) loss 2.7627 (3.2073) grad_norm 1.0101 (1.4530) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][170/625] eta 0:03:32 lr 0.001061 wd 0.0500 time 0.4665 (0.4679) data time 0.0008 (0.0036) model time 0.4657 (0.4641) loss 3.4570 (3.2014) grad_norm 1.3410 (1.4396) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][180/625] eta 0:03:28 lr 0.001061 wd 0.0500 time 0.4668 (0.4678) data time 0.0008 (0.0034) model time 0.4659 (0.4640) loss 2.1982 (3.1740) grad_norm 1.2033 (1.4510) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][190/625] eta 0:03:23 lr 0.001061 wd 0.0500 time 0.4648 (0.4677) data time 0.0008 (0.0033) model time 0.4640 (0.4641) loss 3.8067 (3.1649) grad_norm 1.2557 (1.4599) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [82/300][200/625] eta 0:03:18 lr 0.001061 wd 0.0500 time 0.4666 (0.4677) data time 0.0010 (0.0032) model time 0.4657 (0.4642) loss 1.9819 (3.1569) grad_norm 0.7987 (1.4542) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][210/625] eta 0:03:13 lr 0.001061 wd 0.0500 time 0.4699 (0.4674) data time 0.0010 (0.0031) model time 0.4689 (0.4641) loss 3.5329 (3.1560) grad_norm 1.2408 (1.4589) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][220/625] eta 0:03:09 lr 0.001060 wd 0.0500 time 0.4616 (0.4672) data time 0.0010 (0.0030) model time 0.4606 (0.4639) loss 3.4098 (3.1684) grad_norm 1.1229 (1.4624) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][230/625] eta 0:03:04 lr 0.001060 wd 0.0500 time 0.4619 (0.4670) data time 0.0007 (0.0029) model time 0.4612 (0.4638) loss 1.9809 (3.1612) grad_norm 1.1105 (1.4592) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][240/625] eta 0:02:59 lr 0.001060 wd 0.0500 time 0.4651 (0.4669) data time 0.0008 (0.0028) model time 0.4643 (0.4638) loss 2.8499 (3.1660) grad_norm 1.3878 (1.4517) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][250/625] eta 0:02:55 lr 0.001060 wd 0.0500 time 0.4666 (0.4669) data time 0.0007 (0.0028) model time 0.4658 (0.4638) loss 3.9219 (3.1664) grad_norm 1.0947 (1.4522) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][260/625] eta 0:02:50 lr 0.001060 wd 0.0500 time 0.4717 (0.4669) data time 0.0012 (0.0027) model time 0.4706 (0.4640) loss 3.6051 (3.1750) grad_norm 2.0688 (1.4575) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][270/625] eta 0:02:45 lr 0.001060 wd 0.0500 time 0.4732 (0.4669) data time 0.0010 (0.0026) model time 0.4722 (0.4641) loss 2.4450 (3.1678) grad_norm 1.4023 (1.4571) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][280/625] eta 0:02:41 lr 0.001060 wd 0.0500 time 0.4553 (0.4668) data time 0.0008 (0.0026) model time 0.4545 (0.4640) loss 2.7264 (3.1648) grad_norm 2.3065 (1.4686) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][290/625] eta 0:02:36 lr 0.001060 wd 0.0500 time 0.4655 (0.4667) data time 0.0008 (0.0025) model time 0.4647 (0.4640) loss 3.7122 (3.1684) grad_norm 2.3650 (1.4709) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][300/625] eta 0:02:31 lr 0.001060 wd 0.0500 time 0.4653 (0.4667) data time 0.0010 (0.0025) model time 0.4643 (0.4640) loss 3.0886 (3.1723) grad_norm 1.9290 (1.4719) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][310/625] eta 0:02:26 lr 0.001060 wd 0.0500 time 0.4668 (0.4666) data time 0.0007 (0.0024) model time 0.4661 (0.4639) loss 3.2035 (3.1690) grad_norm 1.4523 (1.4724) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][320/625] eta 0:02:22 lr 0.001060 wd 0.0500 time 0.4631 (0.4665) data time 0.0008 (0.0024) model time 0.4623 (0.4639) loss 2.2322 (3.1714) grad_norm 1.9344 (1.4778) loss_scale 4096.0000 (2067.1402) mem 16715MB [2024-08-10 08:19:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][330/625] eta 0:02:17 lr 0.001060 wd 0.0500 time 0.4693 (0.4665) data time 0.0010 (0.0023) model time 
0.4684 (0.4640) loss 3.0941 (3.1733) grad_norm 2.0110 (1.4765) loss_scale 4096.0000 (2128.4350) mem 16715MB [2024-08-10 08:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][340/625] eta 0:02:12 lr 0.001060 wd 0.0500 time 0.4619 (0.4665) data time 0.0010 (0.0023) model time 0.4610 (0.4640) loss 3.2839 (3.1812) grad_norm 1.1684 (1.4710) loss_scale 4096.0000 (2186.1349) mem 16715MB [2024-08-10 08:19:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][350/625] eta 0:02:08 lr 0.001060 wd 0.0500 time 0.4675 (0.4664) data time 0.0008 (0.0023) model time 0.4667 (0.4640) loss 3.3857 (3.1752) grad_norm 1.5702 (1.4675) loss_scale 4096.0000 (2240.5470) mem 16715MB [2024-08-10 08:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][360/625] eta 0:02:03 lr 0.001060 wd 0.0500 time 0.4630 (0.4664) data time 0.0010 (0.0022) model time 0.4620 (0.4640) loss 3.1006 (3.1797) grad_norm 2.4343 (1.4680) loss_scale 4096.0000 (2291.9446) mem 16715MB [2024-08-10 08:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][370/625] eta 0:01:58 lr 0.001059 wd 0.0500 time 0.4642 (0.4663) data time 0.0008 (0.0022) model time 0.4635 (0.4639) loss 3.7457 (3.1813) grad_norm 1.4209 (1.4703) loss_scale 4096.0000 (2340.5714) mem 16715MB [2024-08-10 08:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][380/625] eta 0:01:54 lr 0.001059 wd 0.0500 time 0.6571 (0.4666) data time 0.0010 (0.0022) model time 0.6561 (0.4644) loss 2.5447 (3.1812) grad_norm 1.7968 (1.4660) loss_scale 4096.0000 (2386.6457) mem 16715MB [2024-08-10 08:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][390/625] eta 0:01:49 lr 0.001059 wd 0.0500 time 0.4659 (0.4674) data time 0.0010 (0.0022) model time 0.4649 (0.4653) loss 3.1873 (3.1754) grad_norm 1.5458 (1.4703) loss_scale 4096.0000 (2430.3632) mem 16715MB [2024-08-10 08:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][400/625] eta 0:01:45 
lr 0.001059 wd 0.0500 time 0.4577 (0.4674) data time 0.0010 (0.0021) model time 0.4567 (0.4653) loss 3.3585 (3.1742) grad_norm 1.2585 (1.4705) loss_scale 4096.0000 (2471.9002) mem 16715MB [2024-08-10 08:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][410/625] eta 0:01:40 lr 0.001059 wd 0.0500 time 0.4629 (0.4674) data time 0.0011 (0.0021) model time 0.4618 (0.4653) loss 2.6912 (3.1719) grad_norm 1.1027 (1.4675) loss_scale 4096.0000 (2511.4161) mem 16715MB [2024-08-10 08:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][420/625] eta 0:01:35 lr 0.001059 wd 0.0500 time 0.4673 (0.4674) data time 0.0010 (0.0021) model time 0.4663 (0.4653) loss 2.7734 (3.1797) grad_norm 2.0604 (1.4706) loss_scale 4096.0000 (2549.0546) mem 16715MB [2024-08-10 08:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][430/625] eta 0:01:31 lr 0.001059 wd 0.0500 time 0.4547 (0.4673) data time 0.0010 (0.0021) model time 0.4537 (0.4652) loss 2.8704 (3.1766) grad_norm 2.5628 (1.4740) loss_scale 4096.0000 (2584.9466) mem 16715MB [2024-08-10 08:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][440/625] eta 0:01:26 lr 0.001059 wd 0.0500 time 0.4634 (0.4673) data time 0.0010 (0.0020) model time 0.4625 (0.4653) loss 2.7000 (3.1781) grad_norm 1.9931 (1.4768) loss_scale 4096.0000 (2619.2109) mem 16715MB [2024-08-10 08:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][450/625] eta 0:01:21 lr 0.001059 wd 0.0500 time 0.4623 (0.4673) data time 0.0008 (0.0020) model time 0.4615 (0.4653) loss 2.3427 (3.1792) grad_norm 2.5928 (1.5048) loss_scale 4096.0000 (2651.9557) mem 16715MB [2024-08-10 08:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][460/625] eta 0:01:17 lr 0.001059 wd 0.0500 time 0.4727 (0.4673) data time 0.0012 (0.0020) model time 0.4715 (0.4653) loss 2.1902 (3.1788) grad_norm 1.2984 (1.5077) loss_scale 4096.0000 (2683.2798) mem 16715MB [2024-08-10 08:20:06 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][470/625] eta 0:01:12 lr 0.001059 wd 0.0500 time 0.4672 (0.4672) data time 0.0010 (0.0020) model time 0.4662 (0.4652) loss 3.3998 (3.1787) grad_norm 1.1194 (1.5031) loss_scale 4096.0000 (2713.2739) mem 16715MB [2024-08-10 08:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][480/625] eta 0:01:07 lr 0.001059 wd 0.0500 time 0.4718 (0.4672) data time 0.0010 (0.0020) model time 0.4708 (0.4652) loss 3.4892 (3.1776) grad_norm 1.5874 (1.5012) loss_scale 4096.0000 (2742.0208) mem 16715MB [2024-08-10 08:20:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][490/625] eta 0:01:03 lr 0.001059 wd 0.0500 time 0.4696 (0.4671) data time 0.0010 (0.0019) model time 0.4685 (0.4652) loss 3.6358 (3.1796) grad_norm 0.9178 (1.4971) loss_scale 4096.0000 (2769.5967) mem 16715MB [2024-08-10 08:20:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][500/625] eta 0:00:58 lr 0.001059 wd 0.0500 time 0.4609 (0.4671) data time 0.0010 (0.0019) model time 0.4598 (0.4652) loss 3.0061 (3.1777) grad_norm 1.2859 (1.5029) loss_scale 4096.0000 (2796.0719) mem 16715MB [2024-08-10 08:20:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][510/625] eta 0:00:53 lr 0.001058 wd 0.0500 time 0.4627 (0.4673) data time 0.0010 (0.0019) model time 0.4618 (0.4654) loss 2.7874 (3.1757) grad_norm 1.2621 (1.5056) loss_scale 4096.0000 (2821.5108) mem 16715MB [2024-08-10 08:20:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][520/625] eta 0:00:49 lr 0.001058 wd 0.0500 time 0.4544 (0.4673) data time 0.0010 (0.0019) model time 0.4534 (0.4654) loss 3.4874 (3.1766) grad_norm 1.0759 (1.5057) loss_scale 4096.0000 (2845.9731) mem 16715MB [2024-08-10 08:20:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][530/625] eta 0:00:44 lr 0.001058 wd 0.0500 time 0.4617 (0.4683) data time 0.0008 (0.0019) model time 0.4609 (0.4665) loss 2.6878 (3.1784) grad_norm 
1.3870 (1.5046) loss_scale 4096.0000 (2869.5141) mem 16715MB [2024-08-10 08:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][540/625] eta 0:00:39 lr 0.001058 wd 0.0500 time 0.4614 (0.4682) data time 0.0011 (0.0019) model time 0.4603 (0.4664) loss 3.4779 (3.1794) grad_norm 1.3564 (1.5042) loss_scale 4096.0000 (2892.1848) mem 16715MB [2024-08-10 08:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][550/625] eta 0:00:35 lr 0.001058 wd 0.0500 time 0.4540 (0.4682) data time 0.0011 (0.0019) model time 0.4529 (0.4664) loss 3.0056 (3.1806) grad_norm 1.2487 (1.4999) loss_scale 4096.0000 (2914.0327) mem 16715MB [2024-08-10 08:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][560/625] eta 0:00:30 lr 0.001058 wd 0.0500 time 0.4699 (0.4683) data time 0.0011 (0.0019) model time 0.4688 (0.4665) loss 3.3811 (3.1895) grad_norm 1.7638 (1.5013) loss_scale 4096.0000 (2935.1016) mem 16715MB [2024-08-10 08:20:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][570/625] eta 0:00:25 lr 0.001058 wd 0.0500 time 0.4669 (0.4683) data time 0.0008 (0.0019) model time 0.4660 (0.4665) loss 3.6077 (3.1893) grad_norm 1.7174 (1.5017) loss_scale 4096.0000 (2955.4326) mem 16715MB [2024-08-10 08:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][580/625] eta 0:00:21 lr 0.001058 wd 0.0500 time 0.4656 (0.4685) data time 0.0008 (0.0019) model time 0.4648 (0.4668) loss 3.8800 (3.1900) grad_norm 1.3602 (1.5021) loss_scale 4096.0000 (2975.0637) mem 16715MB [2024-08-10 08:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][590/625] eta 0:00:16 lr 0.001058 wd 0.0500 time 0.4589 (0.4685) data time 0.0007 (0.0019) model time 0.4582 (0.4667) loss 3.6393 (3.1893) grad_norm 1.7562 (1.5037) loss_scale 4096.0000 (2994.0305) mem 16715MB [2024-08-10 08:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][600/625] eta 0:00:11 lr 0.001058 wd 0.0500 time 0.4611 (0.4684) data 
time 0.0010 (0.0019) model time 0.4601 (0.4666) loss 3.5321 (3.1887) grad_norm 1.1012 (1.5039) loss_scale 4096.0000 (3012.3661) mem 16715MB [2024-08-10 08:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][610/625] eta 0:00:07 lr 0.001058 wd 0.0500 time 0.4660 (0.4684) data time 0.0005 (0.0019) model time 0.4655 (0.4667) loss 3.2116 (3.1863) grad_norm 1.5652 (1.5035) loss_scale 4096.0000 (3030.1015) mem 16715MB [2024-08-10 08:21:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [82/300][620/625] eta 0:00:02 lr 0.001058 wd 0.0500 time 0.4649 (0.4683) data time 0.0007 (0.0018) model time 0.4642 (0.4666) loss 2.4574 (3.1888) grad_norm 0.9518 (1.5002) loss_scale 4096.0000 (3047.2657) mem 16715MB [2024-08-10 08:21:18 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 82 training takes 0:04:52 [2024-08-10 08:21:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:21:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.5825 (0.5825) Acc@1 86.328 (86.328) Acc@5 97.998 (97.998) Mem 16715MB
[2024-08-10 08:21:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.163) Loss 0.9492 (0.7227) Acc@1 77.490 (83.367) Acc@5 94.678 (96.879) Mem 16715MB
[2024-08-10 08:21:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0879 (0.8571) Acc@1 73.975 (80.034) Acc@5 92.969 (95.366) Mem 16715MB
[2024-08-10 08:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.816 Acc@5 95.341
[2024-08-10 08:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 79.8%
[2024-08-10 08:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.925 (0.925) Loss 0.5181 (0.5181) Acc@1 88.379 (88.379) Acc@5 98.486 (98.486) Mem 16715MB
[2024-08-10 08:21:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.200) Loss 0.8423 (0.6475) Acc@1 79.541 (85.427) Acc@5 95.361 (97.394) Mem 16715MB
[2024-08-10 08:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.161) Loss 0.9692 (0.7700) Acc@1 75.488 (82.157) Acc@5 94.531 (96.098) Mem 16715MB
[2024-08-10 08:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.868 Acc@5 96.109
[2024-08-10 08:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9%
[2024-08-10 08:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.87%
[2024-08-10 08:21:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 08:21:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 08:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][0/625] eta 0:08:40 lr 0.001058 wd 0.0500 time 0.8332 (0.8332) data time 0.4183 (0.4183) model time 0.0000 (0.0000) loss 2.7850 (2.7850) grad_norm 1.5900 (1.5900) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][10/625] eta 0:05:07 lr 0.001058 wd 0.0500 time 0.4819 (0.4995) data time 0.0008 (0.0390) model time 0.0000 (0.0000) loss 3.9796 (3.3177) grad_norm 1.4767 (2.0041) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][20/625] eta 0:04:52 lr 0.001058 wd 0.0500 time 0.4649 (0.4827) data time 0.0008 (0.0209) model time 0.0000 (0.0000) loss 2.1576 (3.2088) grad_norm 1.7592 (1.8093) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:21:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][30/625] eta 0:04:44 lr 0.001057 wd 0.0500 time 0.4647 (0.4774) data time 0.0008 (0.0145) model time 0.0000 (0.0000) loss 3.1230 (3.2052) grad_norm 2.3503 (1.7117) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][40/625] eta 0:04:38 lr 0.001057 wd 0.0500 time 0.4673 (0.4754) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 3.8588 (3.1890) grad_norm 2.2694 (1.6770) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][50/625] eta 0:04:32 lr 0.001057 wd 0.0500 time 0.4607 (0.4741) data time 0.0010 (0.0094) model time 0.0000 (0.0000) loss 2.7148 (3.2040) grad_norm 1.5569 (1.7176) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:21:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][60/625] eta 0:04:27 lr 0.001057 wd 0.0500 time 0.4588 (0.4731) data time 0.0010 (0.0081) model time 0.4578 (0.4664) loss 2.2333 (3.1858) 
grad_norm 1.3859 (1.6662) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][70/625] eta 0:04:22 lr 0.001057 wd 0.0500 time 0.5465 (0.4738) data time 0.0010 (0.0071) model time 0.5455 (0.4717) loss 3.3043 (3.1710) grad_norm 1.8599 (1.6571) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][80/625] eta 0:04:17 lr 0.001057 wd 0.0500 time 0.4646 (0.4727) data time 0.0008 (0.0063) model time 0.4638 (0.4692) loss 3.9778 (3.1876) grad_norm 0.9583 (1.6282) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][90/625] eta 0:04:12 lr 0.001057 wd 0.0500 time 0.4671 (0.4719) data time 0.0008 (0.0058) model time 0.4663 (0.4680) loss 4.0213 (3.2063) grad_norm 1.6207 (1.6239) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][100/625] eta 0:04:08 lr 0.001057 wd 0.0500 time 0.4598 (0.4730) data time 0.0008 (0.0053) model time 0.4590 (0.4707) loss 3.7356 (3.1807) grad_norm 2.2952 (1.6313) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][110/625] eta 0:04:05 lr 0.001057 wd 0.0500 time 0.4657 (0.4765) data time 0.0008 (0.0049) model time 0.4649 (0.4774) loss 3.9453 (3.1888) grad_norm 1.0842 (1.5961) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][120/625] eta 0:04:00 lr 0.001057 wd 0.0500 time 0.4626 (0.4755) data time 0.0010 (0.0046) model time 0.4616 (0.4754) loss 3.5570 (3.1964) grad_norm 1.2154 (1.5660) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][130/625] eta 0:03:55 lr 0.001057 wd 0.0500 time 0.4597 
(0.4749) data time 0.0011 (0.0044) model time 0.4586 (0.4743) loss 2.7403 (3.1830) grad_norm 1.8474 (1.5545) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][140/625] eta 0:03:50 lr 0.001057 wd 0.0500 time 0.4715 (0.4743) data time 0.0010 (0.0041) model time 0.4705 (0.4733) loss 2.6229 (3.1755) grad_norm 1.7264 (1.6032) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][150/625] eta 0:03:45 lr 0.001057 wd 0.0500 time 0.4653 (0.4738) data time 0.0008 (0.0040) model time 0.4644 (0.4724) loss 2.3588 (3.1723) grad_norm 1.3734 (1.5980) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][160/625] eta 0:03:40 lr 0.001057 wd 0.0500 time 0.5286 (0.4736) data time 0.0010 (0.0038) model time 0.5276 (0.4721) loss 2.2711 (3.1782) grad_norm 1.4175 (1.5802) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][170/625] eta 0:03:35 lr 0.001057 wd 0.0500 time 0.4685 (0.4730) data time 0.0008 (0.0037) model time 0.4677 (0.4713) loss 3.5325 (3.1671) grad_norm 2.2035 (1.5760) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][180/625] eta 0:03:30 lr 0.001056 wd 0.0500 time 0.4579 (0.4724) data time 0.0008 (0.0035) model time 0.4572 (0.4705) loss 2.7298 (3.1611) grad_norm 1.0983 (1.5677) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][190/625] eta 0:03:25 lr 0.001056 wd 0.0500 time 0.4643 (0.4720) data time 0.0008 (0.0034) model time 0.4635 (0.4700) loss 3.8894 (3.1808) grad_norm 1.5230 (1.5616) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [83/300][200/625] eta 0:03:20 lr 0.001056 wd 0.0500 time 0.4680 (0.4716) data time 0.0008 (0.0033) model time 0.4672 (0.4696) loss 3.4530 (3.1786) grad_norm 2.0399 (1.5507) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][210/625] eta 0:03:15 lr 0.001056 wd 0.0500 time 0.4697 (0.4714) data time 0.0010 (0.0032) model time 0.4687 (0.4694) loss 2.8668 (3.1745) grad_norm 1.3986 (1.5414) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][220/625] eta 0:03:10 lr 0.001056 wd 0.0500 time 0.4638 (0.4712) data time 0.0007 (0.0031) model time 0.4631 (0.4692) loss 3.9364 (3.1771) grad_norm 1.5589 (1.5391) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][230/625] eta 0:03:06 lr 0.001056 wd 0.0500 time 0.4647 (0.4710) data time 0.0008 (0.0030) model time 0.4639 (0.4689) loss 2.6870 (3.1763) grad_norm 1.0887 (1.5434) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][240/625] eta 0:03:01 lr 0.001056 wd 0.0500 time 0.4617 (0.4706) data time 0.0008 (0.0029) model time 0.4609 (0.4686) loss 2.9273 (3.1734) grad_norm 1.1978 (1.5403) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][250/625] eta 0:02:56 lr 0.001056 wd 0.0500 time 0.4629 (0.4703) data time 0.0010 (0.0028) model time 0.4619 (0.4682) loss 2.3984 (3.1684) grad_norm 1.8133 (1.5384) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][260/625] eta 0:02:51 lr 0.001056 wd 0.0500 time 0.4581 (0.4700) data time 0.0008 (0.0027) model time 0.4573 (0.4678) loss 4.0751 (3.1843) grad_norm 1.1614 (1.5408) loss_scale 4096.0000 
(4096.0000) mem 16715MB [2024-08-10 08:23:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][270/625] eta 0:02:46 lr 0.001056 wd 0.0500 time 0.4714 (0.4698) data time 0.0009 (0.0027) model time 0.4705 (0.4677) loss 3.1254 (3.1861) grad_norm 1.4950 (1.5578) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][280/625] eta 0:02:42 lr 0.001056 wd 0.0500 time 0.4662 (0.4696) data time 0.0010 (0.0026) model time 0.4651 (0.4675) loss 3.5486 (3.1940) grad_norm 1.5430 (1.5540) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][290/625] eta 0:02:37 lr 0.001056 wd 0.0500 time 0.4659 (0.4695) data time 0.0010 (0.0026) model time 0.4649 (0.4674) loss 3.4022 (3.1979) grad_norm 1.5572 (1.5450) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][300/625] eta 0:02:32 lr 0.001056 wd 0.0500 time 0.4588 (0.4693) data time 0.0010 (0.0025) model time 0.4578 (0.4672) loss 3.5437 (3.1940) grad_norm 2.6207 (1.5569) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:23:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][310/625] eta 0:02:27 lr 0.001056 wd 0.0500 time 0.4671 (0.4692) data time 0.0011 (0.0025) model time 0.4660 (0.4671) loss 3.8871 (3.2002) grad_norm 1.1182 (1.5611) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][320/625] eta 0:02:23 lr 0.001055 wd 0.0500 time 0.4608 (0.4696) data time 0.0010 (0.0024) model time 0.4598 (0.4676) loss 3.0614 (3.2020) grad_norm 1.5594 (1.5552) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][330/625] eta 0:02:18 lr 0.001055 wd 0.0500 time 0.4658 (0.4693) data time 0.0007 (0.0024) model time 
0.4651 (0.4673) loss 3.6725 (3.2058) grad_norm 1.2072 (1.5513) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][340/625] eta 0:02:13 lr 0.001055 wd 0.0500 time 0.4608 (0.4691) data time 0.0010 (0.0024) model time 0.4598 (0.4671) loss 3.1213 (3.2028) grad_norm 0.9808 (1.5420) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][350/625] eta 0:02:08 lr 0.001055 wd 0.0500 time 0.4646 (0.4689) data time 0.0007 (0.0023) model time 0.4639 (0.4669) loss 2.3404 (3.2036) grad_norm 1.2834 (1.5357) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][360/625] eta 0:02:04 lr 0.001055 wd 0.0500 time 0.4666 (0.4688) data time 0.0009 (0.0023) model time 0.4657 (0.4668) loss 3.1317 (3.2061) grad_norm 1.4015 (1.5405) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][370/625] eta 0:01:59 lr 0.001055 wd 0.0500 time 0.4675 (0.4688) data time 0.0010 (0.0023) model time 0.4665 (0.4668) loss 3.4835 (3.2108) grad_norm 1.2323 (1.5351) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][380/625] eta 0:01:54 lr 0.001055 wd 0.0500 time 0.4617 (0.4687) data time 0.0010 (0.0022) model time 0.4607 (0.4667) loss 3.0625 (3.2129) grad_norm 1.0054 (1.5299) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][390/625] eta 0:01:50 lr 0.001055 wd 0.0500 time 0.4704 (0.4686) data time 0.0010 (0.0022) model time 0.4695 (0.4666) loss 3.1645 (3.2149) grad_norm 2.0150 (1.5470) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][400/625] eta 0:01:45 
lr 0.001055 wd 0.0500 time 0.4631 (0.4685) data time 0.0010 (0.0022) model time 0.4621 (0.4666) loss 2.8177 (3.2139) grad_norm 1.1679 (1.5430) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][410/625] eta 0:01:40 lr 0.001055 wd 0.0500 time 0.4698 (0.4685) data time 0.0009 (0.0021) model time 0.4690 (0.4666) loss 3.7108 (3.2116) grad_norm 1.4864 (1.5403) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][420/625] eta 0:01:36 lr 0.001055 wd 0.0500 time 0.4730 (0.4685) data time 0.0007 (0.0021) model time 0.4722 (0.4666) loss 2.8180 (3.2137) grad_norm 1.4223 (1.5369) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][430/625] eta 0:01:31 lr 0.001055 wd 0.0500 time 0.4673 (0.4685) data time 0.0010 (0.0021) model time 0.4663 (0.4666) loss 3.5761 (3.2165) grad_norm 1.1182 (1.5308) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:24:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][440/625] eta 0:01:26 lr 0.001055 wd 0.0500 time 0.4696 (0.4690) data time 0.0008 (0.0021) model time 0.4688 (0.4672) loss 3.5898 (3.2182) grad_norm 1.1729 (1.5334) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][450/625] eta 0:01:22 lr 0.001055 wd 0.0500 time 0.4677 (0.4690) data time 0.0008 (0.0020) model time 0.4669 (0.4672) loss 3.4837 (3.2135) grad_norm 1.5783 (1.5311) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][460/625] eta 0:01:17 lr 0.001054 wd 0.0500 time 0.4095 (0.4693) data time 0.0008 (0.0020) model time 0.4087 (0.4676) loss 3.3889 (3.2196) grad_norm 1.4300 (1.5281) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:10 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][470/625] eta 0:01:12 lr 0.001054 wd 0.0500 time 0.4732 (0.4693) data time 0.0008 (0.0020) model time 0.4724 (0.4676) loss 2.5559 (3.2144) grad_norm 1.0991 (1.5258) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][480/625] eta 0:01:08 lr 0.001054 wd 0.0500 time 0.4624 (0.4693) data time 0.0008 (0.0020) model time 0.4617 (0.4676) loss 3.3478 (3.2141) grad_norm 1.3180 (1.5256) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][490/625] eta 0:01:03 lr 0.001054 wd 0.0500 time 0.4625 (0.4691) data time 0.0008 (0.0020) model time 0.4617 (0.4674) loss 3.1273 (3.2154) grad_norm 1.5409 (1.5235) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][500/625] eta 0:00:58 lr 0.001054 wd 0.0500 time 0.4693 (0.4691) data time 0.0011 (0.0019) model time 0.4682 (0.4674) loss 2.7530 (3.2160) grad_norm 1.0117 (1.5190) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][510/625] eta 0:00:53 lr 0.001054 wd 0.0500 time 0.4639 (0.4690) data time 0.0008 (0.0019) model time 0.4631 (0.4673) loss 2.5137 (3.2110) grad_norm 1.2841 (1.5207) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][520/625] eta 0:00:49 lr 0.001054 wd 0.0500 time 0.4627 (0.4689) data time 0.0010 (0.0019) model time 0.4617 (0.4672) loss 3.4313 (3.2127) grad_norm 1.4119 (1.5332) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][530/625] eta 0:00:44 lr 0.001054 wd 0.0500 time 0.4636 (0.4688) data time 0.0008 (0.0019) model time 0.4627 (0.4671) loss 3.1869 (3.2114) grad_norm 
1.2964 (1.5301) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][540/625] eta 0:00:39 lr 0.001054 wd 0.0500 time 0.4633 (0.4687) data time 0.0008 (0.0019) model time 0.4625 (0.4671) loss 3.1754 (3.2083) grad_norm 1.1694 (1.5257) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][550/625] eta 0:00:35 lr 0.001054 wd 0.0500 time 0.4646 (0.4686) data time 0.0010 (0.0019) model time 0.4636 (0.4670) loss 2.9951 (3.2119) grad_norm 1.2193 (1.5213) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][560/625] eta 0:00:30 lr 0.001054 wd 0.0500 time 0.4634 (0.4685) data time 0.0007 (0.0019) model time 0.4627 (0.4669) loss 3.4515 (3.2083) grad_norm 1.3629 (1.5167) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:25:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][570/625] eta 0:00:25 lr 0.001054 wd 0.0500 time 0.4641 (0.4685) data time 0.0008 (0.0018) model time 0.4633 (0.4668) loss 3.4728 (3.2066) grad_norm 1.6636 (1.5145) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][580/625] eta 0:00:21 lr 0.001054 wd 0.0500 time 0.4624 (0.4684) data time 0.0011 (0.0018) model time 0.4613 (0.4668) loss 3.6794 (3.2040) grad_norm 1.0347 (1.5127) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][590/625] eta 0:00:16 lr 0.001054 wd 0.0500 time 0.4643 (0.4684) data time 0.0008 (0.0018) model time 0.4635 (0.4667) loss 3.4330 (3.2026) grad_norm 2.5194 (1.5189) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][600/625] eta 0:00:11 lr 0.001053 wd 0.0500 time 0.4594 (0.4684) data 
time 0.0010 (0.0018) model time 0.4584 (0.4667) loss 3.4791 (3.2002) grad_norm 1.4707 (1.5201) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][610/625] eta 0:00:07 lr 0.001053 wd 0.0500 time 0.4678 (0.4683) data time 0.0005 (0.0018) model time 0.4673 (0.4666) loss 3.4474 (3.2007) grad_norm 1.4276 (1.5201) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [83/300][620/625] eta 0:00:02 lr 0.001053 wd 0.0500 time 0.4533 (0.4681) data time 0.0005 (0.0018) model time 0.4528 (0.4665) loss 3.6328 (3.2007) grad_norm 1.1701 (1.5174) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 83 training takes 0:04:52 [2024-08-10 08:26:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:26:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:26:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.5649 (0.5649) Acc@1 88.037 (88.037) Acc@5 98.291 (98.291) Mem 16715MB
[2024-08-10 08:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9473 (0.7282) Acc@1 78.320 (83.705) Acc@5 94.824 (96.888) Mem 16715MB
[2024-08-10 08:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.1094 (0.8686) Acc@1 73.096 (80.162) Acc@5 92.822 (95.282) Mem 16715MB
[2024-08-10 08:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.996 Acc@5 95.298
[2024-08-10 08:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0%
[2024-08-10 08:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.00%
[2024-08-10 08:26:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 08:26:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 08:26:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.5176 (0.5176) Acc@1 88.428 (88.428) Acc@5 98.486 (98.486) Mem 16715MB
[2024-08-10 08:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.8408 (0.6468) Acc@1 79.443 (85.418) Acc@5 95.410 (97.390) Mem 16715MB
[2024-08-10 08:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9678 (0.7686) Acc@1 75.488 (82.189) Acc@5 94.531 (96.124) Mem 16715MB
[2024-08-10 08:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.904 Acc@5 96.131
[2024-08-10 08:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 81.9%
[2024-08-10 08:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.90%
[2024-08-10 08:26:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 08:26:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 08:26:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][0/625] eta 0:11:14 lr 0.001053 wd 0.0500 time 1.0792 (1.0792) data time 0.6637 (0.6637) model time 0.0000 (0.0000) loss 3.3703 (3.3703) grad_norm 0.9827 (0.9827) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][10/625] eta 0:05:33 lr 0.001053 wd 0.0500 time 0.4627 (0.5422) data time 0.0011 (0.0613) model time 0.0000 (0.0000) loss 2.4597 (2.9745) grad_norm 1.2690 (1.4592) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][20/625] eta 0:05:11 lr 0.001053 wd 0.0500 time 0.4660 (0.5141) data time 0.0008 (0.0326) model time 0.0000 (0.0000) loss 2.4121 (2.9339) grad_norm 1.7535 (1.5101) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][30/625] eta 0:04:56 lr 0.001053 wd 0.0500 time 0.4668 (0.4986) data time 0.0011 (0.0224) model time 0.0000 (0.0000) loss 2.8925 (3.0781) grad_norm 1.6541 (1.5066) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][40/625] eta 0:04:47 lr 0.001053 wd 0.0500 time 0.4530 (0.4915) data time 0.0009 (0.0173) model time 0.0000 (0.0000) loss 4.1213 (3.1407) grad_norm 1.9318 (1.5366) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:26:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][50/625] eta 0:04:39 lr 0.001053 wd 0.0500 time 0.4633 (0.4859) data time 0.0009 (0.0141) model time 0.0000 (0.0000) loss 3.6731 (3.1542) grad_norm 1.5296 (1.5234) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][60/625] eta 0:04:32 lr 0.001053 wd 0.0500 time 0.4541 (0.4819) data time 0.0010 (0.0119) model time 0.4531 (0.4603) loss 2.6061 (3.1491) 
grad_norm 0.9954 (1.4746) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][70/625] eta 0:04:25 lr 0.001053 wd 0.0500 time 0.4569 (0.4788) data time 0.0011 (0.0104) model time 0.4559 (0.4597) loss 3.5332 (3.1615) grad_norm 1.2420 (1.4853) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][80/625] eta 0:04:19 lr 0.001053 wd 0.0500 time 0.4622 (0.4770) data time 0.0010 (0.0093) model time 0.4611 (0.4608) loss 2.6779 (3.1516) grad_norm 1.7046 (1.5064) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][90/625] eta 0:04:14 lr 0.001053 wd 0.0500 time 0.4652 (0.4759) data time 0.0010 (0.0084) model time 0.4642 (0.4620) loss 2.9214 (3.1636) grad_norm 1.2582 (1.5301) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][100/625] eta 0:04:09 lr 0.001053 wd 0.0500 time 0.4655 (0.4748) data time 0.0008 (0.0077) model time 0.4646 (0.4624) loss 3.1294 (3.1592) grad_norm 1.0809 (1.5307) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][110/625] eta 0:04:04 lr 0.001053 wd 0.0500 time 0.4629 (0.4738) data time 0.0008 (0.0071) model time 0.4621 (0.4624) loss 3.4320 (3.1613) grad_norm 1.3713 (1.5423) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][120/625] eta 0:03:58 lr 0.001052 wd 0.0500 time 0.4651 (0.4727) data time 0.0007 (0.0066) model time 0.4643 (0.4620) loss 3.5034 (3.1797) grad_norm 1.1602 (1.5432) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][130/625] eta 0:03:53 lr 0.001052 wd 0.0500 time 0.4610 
(0.4719) data time 0.0011 (0.0062) model time 0.4599 (0.4619) loss 2.8941 (3.1808) grad_norm 1.0755 (1.5363) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][140/625] eta 0:03:48 lr 0.001052 wd 0.0500 time 0.4679 (0.4712) data time 0.0009 (0.0058) model time 0.4670 (0.4617) loss 2.7481 (3.1660) grad_norm 2.0090 (1.5391) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][150/625] eta 0:03:43 lr 0.001052 wd 0.0500 time 0.4634 (0.4707) data time 0.0010 (0.0055) model time 0.4624 (0.4619) loss 3.0173 (3.1520) grad_norm 1.2073 (1.5395) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][160/625] eta 0:03:38 lr 0.001052 wd 0.0500 time 0.4678 (0.4704) data time 0.0008 (0.0052) model time 0.4670 (0.4620) loss 3.5034 (3.1475) grad_norm 1.7039 (1.5304) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][170/625] eta 0:03:33 lr 0.001052 wd 0.0500 time 0.4611 (0.4700) data time 0.0008 (0.0050) model time 0.4603 (0.4622) loss 2.6414 (3.1380) grad_norm 1.6064 (1.5385) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:27:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][180/625] eta 0:03:29 lr 0.001052 wd 0.0500 time 0.4654 (0.4697) data time 0.0010 (0.0048) model time 0.4644 (0.4622) loss 3.3475 (3.1276) grad_norm 1.5721 (1.5471) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][190/625] eta 0:03:24 lr 0.001052 wd 0.0500 time 0.4615 (0.4693) data time 0.0008 (0.0046) model time 0.4607 (0.4621) loss 3.3664 (3.1324) grad_norm 2.2555 (1.5586) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [84/300][200/625] eta 0:03:19 lr 0.001052 wd 0.0500 time 0.4598 (0.4699) data time 0.0007 (0.0044) model time 0.4590 (0.4634) loss 3.3338 (3.1367) grad_norm 1.3522 (1.5554) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][210/625] eta 0:03:14 lr 0.001052 wd 0.0500 time 0.4608 (0.4695) data time 0.0008 (0.0042) model time 0.4600 (0.4632) loss 3.9100 (3.1332) grad_norm 0.9921 (1.5422) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][220/625] eta 0:03:10 lr 0.001052 wd 0.0500 time 0.4667 (0.4692) data time 0.0009 (0.0041) model time 0.4658 (0.4631) loss 3.4896 (3.1272) grad_norm 1.2318 (1.5377) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][230/625] eta 0:03:05 lr 0.001052 wd 0.0500 time 0.4664 (0.4689) data time 0.0011 (0.0039) model time 0.4653 (0.4631) loss 3.1981 (3.1317) grad_norm 1.5162 (1.5266) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][240/625] eta 0:03:00 lr 0.001052 wd 0.0500 time 0.4620 (0.4694) data time 0.0007 (0.0038) model time 0.4613 (0.4640) loss 3.1530 (3.1337) grad_norm 1.6693 (1.5229) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][250/625] eta 0:02:55 lr 0.001052 wd 0.0500 time 0.4631 (0.4693) data time 0.0010 (0.0037) model time 0.4620 (0.4640) loss 3.1486 (3.1444) grad_norm 1.2443 (1.5206) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][260/625] eta 0:02:51 lr 0.001051 wd 0.0500 time 0.4627 (0.4691) data time 0.0010 (0.0036) model time 0.4617 (0.4639) loss 3.4105 (3.1430) grad_norm 1.2018 (1.5132) loss_scale 4096.0000 
(4096.0000) mem 16715MB [2024-08-10 08:28:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][270/625] eta 0:02:46 lr 0.001051 wd 0.0500 time 0.4538 (0.4688) data time 0.0012 (0.0035) model time 0.4527 (0.4637) loss 3.0413 (3.1380) grad_norm 2.0403 (1.5276) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 08:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][280/625] eta 0:02:41 lr 0.001051 wd 0.0500 time 0.4648 (0.4685) data time 0.0008 (0.0034) model time 0.4640 (0.4635) loss 3.6647 (3.1442) grad_norm 1.0884 (inf) loss_scale 2048.0000 (4044.9822) mem 16715MB [2024-08-10 08:28:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][290/625] eta 0:02:36 lr 0.001051 wd 0.0500 time 0.4587 (0.4682) data time 0.0010 (0.0034) model time 0.4577 (0.4634) loss 3.5028 (3.1476) grad_norm 1.2884 (inf) loss_scale 2048.0000 (3976.3574) mem 16715MB [2024-08-10 08:28:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][300/625] eta 0:02:32 lr 0.001051 wd 0.0500 time 0.4651 (0.4681) data time 0.0010 (0.0033) model time 0.4641 (0.4633) loss 2.2129 (3.1462) grad_norm 1.4502 (inf) loss_scale 2048.0000 (3912.2924) mem 16715MB [2024-08-10 08:28:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][310/625] eta 0:02:27 lr 0.001051 wd 0.0500 time 0.4604 (0.4679) data time 0.0008 (0.0032) model time 0.4596 (0.4633) loss 3.8027 (3.1465) grad_norm 1.7262 (inf) loss_scale 2048.0000 (3852.3473) mem 16715MB [2024-08-10 08:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][320/625] eta 0:02:22 lr 0.001051 wd 0.0500 time 0.4653 (0.4678) data time 0.0009 (0.0031) model time 0.4645 (0.4633) loss 3.2895 (3.1479) grad_norm 1.1141 (inf) loss_scale 2048.0000 (3796.1371) mem 16715MB [2024-08-10 08:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][330/625] eta 0:02:17 lr 0.001051 wd 0.0500 time 0.4629 (0.4676) data time 0.0008 (0.0031) model time 0.4621 (0.4632) 
loss 2.9401 (3.1534) grad_norm 4.0582 (inf) loss_scale 2048.0000 (3743.3233) mem 16715MB [2024-08-10 08:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][340/625] eta 0:02:13 lr 0.001051 wd 0.0500 time 0.4670 (0.4675) data time 0.0008 (0.0030) model time 0.4662 (0.4632) loss 2.2487 (3.1459) grad_norm 1.6409 (inf) loss_scale 2048.0000 (3693.6070) mem 16715MB [2024-08-10 08:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][350/625] eta 0:02:08 lr 0.001051 wd 0.0500 time 0.4607 (0.4674) data time 0.0009 (0.0030) model time 0.4598 (0.4632) loss 3.2850 (3.1507) grad_norm 2.6701 (inf) loss_scale 2048.0000 (3646.7236) mem 16715MB [2024-08-10 08:29:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][360/625] eta 0:02:03 lr 0.001051 wd 0.0500 time 0.4711 (0.4673) data time 0.0010 (0.0029) model time 0.4700 (0.4632) loss 3.6897 (3.1553) grad_norm 1.2548 (inf) loss_scale 2048.0000 (3602.4377) mem 16715MB [2024-08-10 08:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][370/625] eta 0:01:59 lr 0.001051 wd 0.0500 time 0.4643 (0.4673) data time 0.0010 (0.0029) model time 0.4633 (0.4632) loss 1.9634 (3.1529) grad_norm 1.1402 (inf) loss_scale 2048.0000 (3560.5391) mem 16715MB [2024-08-10 08:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][380/625] eta 0:01:54 lr 0.001051 wd 0.0500 time 0.4665 (0.4672) data time 0.0008 (0.0028) model time 0.4657 (0.4632) loss 3.6455 (3.1474) grad_norm 1.1561 (inf) loss_scale 2048.0000 (3520.8399) mem 16715MB [2024-08-10 08:29:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][390/625] eta 0:01:49 lr 0.001051 wd 0.0500 time 0.4619 (0.4676) data time 0.0011 (0.0028) model time 0.4608 (0.4638) loss 3.5506 (3.1511) grad_norm 1.4312 (inf) loss_scale 2048.0000 (3483.1714) mem 16715MB [2024-08-10 08:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][400/625] eta 0:01:45 lr 0.001051 wd 0.0500 time 0.4617 
(0.4675) data time 0.0010 (0.0027) model time 0.4607 (0.4638) loss 2.7059 (3.1518) grad_norm 1.1890 (inf) loss_scale 2048.0000 (3447.3815) mem 16715MB [2024-08-10 08:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][410/625] eta 0:01:40 lr 0.001050 wd 0.0500 time 0.4604 (0.4675) data time 0.0009 (0.0027) model time 0.4596 (0.4638) loss 3.0650 (3.1508) grad_norm 1.8859 (inf) loss_scale 2048.0000 (3413.3333) mem 16715MB [2024-08-10 08:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][420/625] eta 0:01:35 lr 0.001050 wd 0.0500 time 0.4575 (0.4683) data time 0.0010 (0.0026) model time 0.4565 (0.4648) loss 3.3985 (3.1472) grad_norm 1.4332 (inf) loss_scale 2048.0000 (3380.9026) mem 16715MB [2024-08-10 08:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][430/625] eta 0:01:31 lr 0.001050 wd 0.0500 time 0.4639 (0.4682) data time 0.0007 (0.0026) model time 0.4632 (0.4647) loss 3.1571 (3.1521) grad_norm 1.2169 (inf) loss_scale 2048.0000 (3349.9768) mem 16715MB [2024-08-10 08:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][440/625] eta 0:01:26 lr 0.001050 wd 0.0500 time 0.4675 (0.4681) data time 0.0010 (0.0026) model time 0.4665 (0.4647) loss 3.2202 (3.1523) grad_norm 1.6547 (inf) loss_scale 2048.0000 (3320.4535) mem 16715MB [2024-08-10 08:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][450/625] eta 0:01:21 lr 0.001050 wd 0.0500 time 0.4568 (0.4680) data time 0.0012 (0.0025) model time 0.4556 (0.4647) loss 2.2471 (3.1525) grad_norm 1.8233 (inf) loss_scale 2048.0000 (3292.2395) mem 16715MB [2024-08-10 08:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][460/625] eta 0:01:17 lr 0.001050 wd 0.0500 time 0.4637 (0.4679) data time 0.0012 (0.0025) model time 0.4626 (0.4646) loss 3.6189 (3.1544) grad_norm 1.2326 (inf) loss_scale 2048.0000 (3265.2495) mem 16715MB [2024-08-10 08:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[84/300][470/625] eta 0:01:12 lr 0.001050 wd 0.0500 time 0.4641 (0.4678) data time 0.0011 (0.0025) model time 0.4631 (0.4646) loss 3.0506 (3.1607) grad_norm 1.3738 (inf) loss_scale 2048.0000 (3239.4055) mem 16715MB [2024-08-10 08:30:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][480/625] eta 0:01:07 lr 0.001050 wd 0.0500 time 0.4663 (0.4677) data time 0.0012 (0.0025) model time 0.4651 (0.4645) loss 3.0723 (3.1633) grad_norm 2.0878 (inf) loss_scale 2048.0000 (3214.6362) mem 16715MB [2024-08-10 08:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][490/625] eta 0:01:03 lr 0.001050 wd 0.0500 time 0.4572 (0.4676) data time 0.0010 (0.0024) model time 0.4563 (0.4644) loss 3.3542 (3.1597) grad_norm 1.7130 (inf) loss_scale 2048.0000 (3190.8758) mem 16715MB [2024-08-10 08:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][500/625] eta 0:00:58 lr 0.001050 wd 0.0500 time 0.4653 (0.4675) data time 0.0008 (0.0024) model time 0.4645 (0.4644) loss 3.4607 (3.1554) grad_norm 2.3513 (inf) loss_scale 2048.0000 (3168.0639) mem 16715MB [2024-08-10 08:30:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][510/625] eta 0:00:53 lr 0.001050 wd 0.0500 time 0.4615 (0.4674) data time 0.0007 (0.0024) model time 0.4607 (0.4643) loss 3.2975 (3.1580) grad_norm 1.9606 (inf) loss_scale 2048.0000 (3146.1448) mem 16715MB [2024-08-10 08:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][520/625] eta 0:00:49 lr 0.001050 wd 0.0500 time 0.4642 (0.4673) data time 0.0009 (0.0024) model time 0.4632 (0.4642) loss 3.5458 (3.1629) grad_norm 2.2143 (inf) loss_scale 2048.0000 (3125.0672) mem 16715MB [2024-08-10 08:30:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][530/625] eta 0:00:44 lr 0.001050 wd 0.0500 time 0.4725 (0.4673) data time 0.0010 (0.0023) model time 0.4715 (0.4642) loss 2.9761 (3.1578) grad_norm 1.1660 (inf) loss_scale 2048.0000 (3104.7834) mem 16715MB [2024-08-10 08:30:47 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][540/625] eta 0:00:39 lr 0.001050 wd 0.0500 time 0.4647 (0.4672) data time 0.0008 (0.0023) model time 0.4639 (0.4642) loss 2.7156 (3.1621) grad_norm 1.1896 (inf) loss_scale 2048.0000 (3085.2495) mem 16715MB [2024-08-10 08:30:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][550/625] eta 0:00:35 lr 0.001049 wd 0.0500 time 0.4630 (0.4672) data time 0.0007 (0.0023) model time 0.4623 (0.4642) loss 2.9149 (3.1663) grad_norm 1.3910 (inf) loss_scale 2048.0000 (3066.4247) mem 16715MB [2024-08-10 08:30:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][560/625] eta 0:00:30 lr 0.001049 wd 0.0500 time 0.6557 (0.4679) data time 0.0009 (0.0023) model time 0.6548 (0.4650) loss 2.5500 (3.1657) grad_norm 1.4723 (inf) loss_scale 2048.0000 (3048.2709) mem 16715MB [2024-08-10 08:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][570/625] eta 0:00:25 lr 0.001049 wd 0.0500 time 0.4594 (0.4678) data time 0.0007 (0.0022) model time 0.4587 (0.4650) loss 3.1081 (3.1672) grad_norm 1.3198 (inf) loss_scale 2048.0000 (3030.7531) mem 16715MB [2024-08-10 08:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][580/625] eta 0:00:21 lr 0.001049 wd 0.0500 time 0.4620 (0.4678) data time 0.0011 (0.0022) model time 0.4609 (0.4650) loss 3.7216 (3.1697) grad_norm 1.4875 (inf) loss_scale 2048.0000 (3013.8382) mem 16715MB [2024-08-10 08:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][590/625] eta 0:00:16 lr 0.001049 wd 0.0500 time 0.4690 (0.4678) data time 0.0009 (0.0022) model time 0.4681 (0.4650) loss 3.9361 (3.1702) grad_norm 1.5707 (inf) loss_scale 2048.0000 (2997.4958) mem 16715MB [2024-08-10 08:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][600/625] eta 0:00:11 lr 0.001049 wd 0.0500 time 0.4625 (0.4677) data time 0.0011 (0.0022) model time 0.4614 (0.4650) loss 3.4233 (3.1724) grad_norm 1.1951 (inf) 
loss_scale 2048.0000 (2981.6972) mem 16715MB [2024-08-10 08:31:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][610/625] eta 0:00:07 lr 0.001049 wd 0.0500 time 0.4646 (0.4677) data time 0.0005 (0.0022) model time 0.4641 (0.4649) loss 2.0375 (3.1703) grad_norm 1.3538 (inf) loss_scale 2048.0000 (2966.4157) mem 16715MB [2024-08-10 08:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [84/300][620/625] eta 0:00:02 lr 0.001049 wd 0.0500 time 0.4617 (0.4676) data time 0.0005 (0.0021) model time 0.4612 (0.4649) loss 4.0014 (3.1723) grad_norm 1.4280 (inf) loss_scale 2048.0000 (2951.6264) mem 16715MB [2024-08-10 08:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 84 training takes 0:04:52 [2024-08-10 08:31:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:31:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:31:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5684 (0.5684) Acc@1 87.549 (87.549) Acc@5 97.998 (97.998) Mem 16715MB [2024-08-10 08:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.160) Loss 0.9355 (0.7038) Acc@1 76.855 (83.767) Acc@5 94.727 (97.053) Mem 16715MB [2024-08-10 08:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.0605 (0.8469) Acc@1 73.779 (80.178) Acc@5 93.555 (95.457) Mem 16715MB [2024-08-10 08:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.990 Acc@5 95.453 [2024-08-10 08:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-08-10 08:31:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.856 (0.856) Loss 0.5161 (0.5161) Acc@1 88.477 (88.477) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 08:31:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.193) Loss 0.8389 (0.6456) Acc@1 79.541 (85.489) Acc@5 95.508 (97.403) Mem 16715MB [2024-08-10 08:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 0.9619 (0.7669) Acc@1 75.684 (82.259) Acc@5 94.580 (96.150) Mem 16715MB [2024-08-10 08:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.984 Acc@5 96.155 [2024-08-10 08:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-10 08:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 81.98% [2024-08-10 08:31:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:31:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][0/625] eta 0:11:17 lr 0.001049 wd 0.0500 time 1.0844 (1.0844) data time 0.4348 (0.4348) model time 0.0000 (0.0000) loss 3.1842 (3.1842) grad_norm 2.0174 (2.0174) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][10/625] eta 0:05:19 lr 0.001049 wd 0.0500 time 0.4620 (0.5195) data time 0.0008 (0.0405) model time 0.0000 (0.0000) loss 4.0415 (3.3113) grad_norm 2.6723 (1.8225) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][20/625] eta 0:04:58 lr 0.001049 wd 0.0500 time 0.4646 (0.4930) data time 0.0013 (0.0217) model time 0.0000 (0.0000) loss 3.3197 (3.2384) grad_norm 1.2821 (1.5989) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][30/625] eta 0:04:48 lr 0.001049 wd 0.0500 time 0.4667 (0.4841) data time 0.0010 (0.0151) model time 0.0000 (0.0000) loss 3.0206 (3.2623) grad_norm 1.5980 (1.4828) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][40/625] eta 0:04:40 lr 0.001049 wd 0.0500 time 0.4685 (0.4795) data time 0.0010 (0.0117) model time 0.0000 (0.0000) loss 2.6424 (3.2437) grad_norm 1.9476 (1.5214) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][50/625] eta 0:04:34 lr 0.001049 wd 0.0500 time 0.4713 (0.4773) data time 0.0010 (0.0096) model time 0.0000 (0.0000) loss 3.2841 (3.2587) grad_norm 1.2743 (1.5160) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][60/625] eta 0:04:28 lr 0.001048 wd 0.0500 time 0.4707 (0.4753) data time 0.0008 (0.0082) model time 0.4699 (0.4643) loss 4.0463 (3.2219) 
grad_norm 1.5204 (1.5181) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][70/625] eta 0:04:22 lr 0.001048 wd 0.0500 time 0.4630 (0.4738) data time 0.0011 (0.0072) model time 0.4619 (0.4640) loss 3.1807 (3.2190) grad_norm 1.1948 (1.5223) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][80/625] eta 0:04:17 lr 0.001048 wd 0.0500 time 0.4652 (0.4727) data time 0.0010 (0.0064) model time 0.4641 (0.4639) loss 3.5060 (3.2110) grad_norm 1.1626 (1.5031) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][90/625] eta 0:04:12 lr 0.001048 wd 0.0500 time 0.4662 (0.4717) data time 0.0008 (0.0058) model time 0.4654 (0.4635) loss 3.0344 (3.2171) grad_norm 1.6095 (1.4903) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][100/625] eta 0:04:07 lr 0.001048 wd 0.0500 time 0.4671 (0.4710) data time 0.0008 (0.0054) model time 0.4663 (0.4637) loss 3.4871 (3.2173) grad_norm 1.5434 (1.4765) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][110/625] eta 0:04:02 lr 0.001048 wd 0.0500 time 0.4690 (0.4707) data time 0.0008 (0.0050) model time 0.4682 (0.4640) loss 2.5789 (3.2285) grad_norm 1.2585 (1.4875) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][120/625] eta 0:03:58 lr 0.001048 wd 0.0500 time 0.4660 (0.4721) data time 0.0010 (0.0046) model time 0.4650 (0.4673) loss 3.2584 (3.2435) grad_norm 1.1095 (1.4884) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][130/625] eta 0:03:53 lr 0.001048 wd 0.0500 time 0.4626 
(0.4715) data time 0.0008 (0.0044) model time 0.4618 (0.4668) loss 3.9067 (3.2467) grad_norm 1.6453 (1.4753) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][140/625] eta 0:03:48 lr 0.001048 wd 0.0500 time 0.4547 (0.4708) data time 0.0012 (0.0041) model time 0.4535 (0.4660) loss 3.2778 (3.2607) grad_norm 1.1876 (1.4716) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][150/625] eta 0:03:43 lr 0.001048 wd 0.0500 time 0.4599 (0.4702) data time 0.0010 (0.0039) model time 0.4589 (0.4655) loss 3.2617 (3.2612) grad_norm 1.4840 (1.4853) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][160/625] eta 0:03:38 lr 0.001048 wd 0.0500 time 0.4642 (0.4696) data time 0.0009 (0.0038) model time 0.4633 (0.4650) loss 3.6096 (3.2509) grad_norm 1.3576 (1.4900) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:32:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][170/625] eta 0:03:33 lr 0.001048 wd 0.0500 time 0.4637 (0.4692) data time 0.0009 (0.0036) model time 0.4628 (0.4648) loss 2.8187 (3.2497) grad_norm 1.0794 (1.4778) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][180/625] eta 0:03:28 lr 0.001048 wd 0.0500 time 0.4616 (0.4690) data time 0.0010 (0.0035) model time 0.4606 (0.4648) loss 3.1419 (3.2416) grad_norm 1.2211 (1.4691) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][190/625] eta 0:03:23 lr 0.001048 wd 0.0500 time 0.4598 (0.4687) data time 0.0008 (0.0033) model time 0.4590 (0.4645) loss 2.3313 (3.2417) grad_norm 1.6097 (1.4559) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [85/300][200/625] eta 0:03:19 lr 0.001047 wd 0.0500 time 0.4688 (0.4685) data time 0.0010 (0.0032) model time 0.4678 (0.4645) loss 3.2162 (3.2332) grad_norm 1.9019 (1.4638) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][210/625] eta 0:03:14 lr 0.001047 wd 0.0500 time 0.4623 (0.4682) data time 0.0008 (0.0031) model time 0.4615 (0.4643) loss 3.6386 (3.2333) grad_norm 1.4234 (1.4644) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][220/625] eta 0:03:09 lr 0.001047 wd 0.0500 time 0.4653 (0.4680) data time 0.0010 (0.0030) model time 0.4642 (0.4642) loss 3.2418 (3.2228) grad_norm 1.2760 (1.4574) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][230/625] eta 0:03:04 lr 0.001047 wd 0.0500 time 0.4816 (0.4680) data time 0.0011 (0.0029) model time 0.4805 (0.4644) loss 3.1387 (3.2174) grad_norm 1.0567 (1.4674) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][240/625] eta 0:03:00 lr 0.001047 wd 0.0500 time 0.4619 (0.4680) data time 0.0009 (0.0029) model time 0.4610 (0.4644) loss 3.3954 (3.2212) grad_norm 1.6117 (1.4720) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][250/625] eta 0:02:55 lr 0.001047 wd 0.0500 time 0.4671 (0.4683) data time 0.0010 (0.0028) model time 0.4661 (0.4649) loss 3.4313 (3.2186) grad_norm 2.1416 (1.4700) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][260/625] eta 0:02:50 lr 0.001047 wd 0.0500 time 0.4683 (0.4683) data time 0.0010 (0.0027) model time 0.4674 (0.4651) loss 3.0915 (3.2179) grad_norm 2.2386 (1.4740) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][270/625] eta 0:02:46 lr 0.001047 wd 0.0500 time 0.4623 (0.4682) data time 0.0009 (0.0027) model time 0.4614 (0.4651) loss 2.9008 (3.2142) grad_norm 1.5232 (1.4802) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][280/625] eta 0:02:41 lr 0.001047 wd 0.0500 time 0.4620 (0.4682) data time 0.0010 (0.0026) model time 0.4610 (0.4652) loss 3.6854 (3.2168) grad_norm 1.6337 (1.4832) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][290/625] eta 0:02:36 lr 0.001047 wd 0.0500 time 0.4616 (0.4680) data time 0.0009 (0.0026) model time 0.4607 (0.4650) loss 3.6257 (3.2186) grad_norm 1.2549 (1.4811) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:33:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][300/625] eta 0:02:32 lr 0.001047 wd 0.0500 time 0.4591 (0.4679) data time 0.0009 (0.0025) model time 0.4583 (0.4649) loss 3.7326 (3.2087) grad_norm 1.6553 (1.4814) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][310/625] eta 0:02:27 lr 0.001047 wd 0.0500 time 0.4633 (0.4677) data time 0.0008 (0.0025) model time 0.4626 (0.4648) loss 3.1882 (3.1981) grad_norm 1.7214 (1.4851) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][320/625] eta 0:02:22 lr 0.001047 wd 0.0500 time 0.4740 (0.4676) data time 0.0007 (0.0024) model time 0.4732 (0.4647) loss 3.8545 (3.1918) grad_norm 1.1843 (1.4896) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][330/625] eta 0:02:17 lr 0.001047 wd 0.0500 time 0.4651 (0.4678) data time 0.0007 (0.0025) model time 
0.4643 (0.4649) loss 1.8615 (3.1891) grad_norm 1.3835 (1.4856) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][340/625] eta 0:02:13 lr 0.001046 wd 0.0500 time 0.4602 (0.4684) data time 0.0012 (0.0024) model time 0.4590 (0.4657) loss 3.1441 (3.1930) grad_norm 1.2310 (1.4810) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][350/625] eta 0:02:08 lr 0.001046 wd 0.0500 time 0.4650 (0.4684) data time 0.0011 (0.0024) model time 0.4639 (0.4657) loss 3.0063 (3.1860) grad_norm 1.9876 (1.4841) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][360/625] eta 0:02:04 lr 0.001046 wd 0.0500 time 0.4787 (0.4684) data time 0.0007 (0.0024) model time 0.4779 (0.4657) loss 2.8616 (3.1835) grad_norm 1.1743 (1.4785) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][370/625] eta 0:01:59 lr 0.001046 wd 0.0500 time 0.4654 (0.4683) data time 0.0008 (0.0024) model time 0.4646 (0.4656) loss 2.9052 (3.1874) grad_norm 1.7221 (1.4859) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][380/625] eta 0:01:54 lr 0.001046 wd 0.0500 time 0.4613 (0.4682) data time 0.0009 (0.0023) model time 0.4604 (0.4655) loss 3.6900 (3.1892) grad_norm 1.5692 (1.4877) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][390/625] eta 0:01:50 lr 0.001046 wd 0.0500 time 0.4759 (0.4682) data time 0.0008 (0.0023) model time 0.4751 (0.4655) loss 3.3256 (3.1938) grad_norm 1.4444 (1.4881) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][400/625] eta 0:01:45 
lr 0.001046 wd 0.0500 time 0.4640 (0.4683) data time 0.0008 (0.0023) model time 0.4632 (0.4657) loss 3.2998 (3.1951) grad_norm 1.6023 (1.4862) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][410/625] eta 0:01:40 lr 0.001046 wd 0.0500 time 0.4651 (0.4684) data time 0.0008 (0.0023) model time 0.4643 (0.4658) loss 2.9856 (3.1890) grad_norm 1.0455 (1.4819) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][420/625] eta 0:01:36 lr 0.001046 wd 0.0500 time 0.4850 (0.4685) data time 0.0011 (0.0023) model time 0.4839 (0.4660) loss 3.5473 (3.1911) grad_norm 0.9773 (1.4820) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][430/625] eta 0:01:31 lr 0.001046 wd 0.0500 time 0.4613 (0.4684) data time 0.0010 (0.0022) model time 0.4603 (0.4659) loss 3.1717 (3.1980) grad_norm 1.2516 (1.4836) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][440/625] eta 0:01:26 lr 0.001046 wd 0.0500 time 0.4628 (0.4682) data time 0.0010 (0.0022) model time 0.4617 (0.4658) loss 2.4088 (3.1956) grad_norm 1.3990 (1.4858) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][450/625] eta 0:01:21 lr 0.001046 wd 0.0500 time 0.5011 (0.4682) data time 0.0008 (0.0022) model time 0.5002 (0.4658) loss 3.6542 (3.2005) grad_norm 1.2229 (1.4822) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][460/625] eta 0:01:17 lr 0.001046 wd 0.0500 time 0.4628 (0.4687) data time 0.0011 (0.0022) model time 0.4617 (0.4663) loss 2.8109 (3.2003) grad_norm 2.3100 (1.4851) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:17 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][470/625] eta 0:01:12 lr 0.001046 wd 0.0500 time 0.4763 (0.4686) data time 0.0010 (0.0022) model time 0.4753 (0.4663) loss 3.7143 (3.1996) grad_norm 1.6903 (1.4850) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][480/625] eta 0:01:07 lr 0.001045 wd 0.0500 time 0.4627 (0.4685) data time 0.0007 (0.0021) model time 0.4620 (0.4663) loss 3.6579 (3.1960) grad_norm 1.1589 (1.4822) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][490/625] eta 0:01:03 lr 0.001045 wd 0.0500 time 0.4638 (0.4685) data time 0.0010 (0.0021) model time 0.4628 (0.4662) loss 3.3590 (3.1943) grad_norm 1.7845 (1.4806) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][500/625] eta 0:00:58 lr 0.001045 wd 0.0500 time 0.4623 (0.4684) data time 0.0009 (0.0021) model time 0.4613 (0.4661) loss 3.5480 (3.1930) grad_norm 1.7983 (1.4805) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][510/625] eta 0:00:53 lr 0.001045 wd 0.0500 time 0.4649 (0.4683) data time 0.0008 (0.0021) model time 0.4641 (0.4660) loss 3.8138 (3.1965) grad_norm 1.1929 (1.4773) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][520/625] eta 0:00:49 lr 0.001045 wd 0.0500 time 0.4647 (0.4682) data time 0.0009 (0.0020) model time 0.4638 (0.4660) loss 3.1197 (3.1933) grad_norm 1.0163 (1.4747) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][530/625] eta 0:00:44 lr 0.001045 wd 0.0500 time 0.4604 (0.4682) data time 0.0008 (0.0020) model time 0.4597 (0.4661) loss 2.6555 (3.1914) grad_norm 
1.9733 (1.4900) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][540/625] eta 0:00:39 lr 0.001045 wd 0.0500 time 0.4671 (0.4682) data time 0.0010 (0.0020) model time 0.4661 (0.4660) loss 2.5038 (3.1898) grad_norm 1.4208 (1.4904) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][550/625] eta 0:00:35 lr 0.001045 wd 0.0500 time 0.4682 (0.4682) data time 0.0008 (0.0020) model time 0.4674 (0.4660) loss 3.7782 (3.1898) grad_norm 1.9566 (1.4958) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:35:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][560/625] eta 0:00:30 lr 0.001045 wd 0.0500 time 0.4645 (0.4682) data time 0.0008 (0.0020) model time 0.4637 (0.4660) loss 3.3141 (3.1893) grad_norm 1.8593 (1.4965) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][570/625] eta 0:00:25 lr 0.001045 wd 0.0500 time 0.4672 (0.4681) data time 0.0008 (0.0020) model time 0.4665 (0.4660) loss 3.2663 (3.1862) grad_norm 1.2461 (1.4942) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][580/625] eta 0:00:21 lr 0.001045 wd 0.0500 time 0.4641 (0.4681) data time 0.0009 (0.0019) model time 0.4632 (0.4660) loss 3.0232 (3.1891) grad_norm 1.3833 (1.5023) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][590/625] eta 0:00:16 lr 0.001045 wd 0.0500 time 0.4659 (0.4680) data time 0.0010 (0.0019) model time 0.4649 (0.4659) loss 2.9215 (3.1918) grad_norm 1.3502 (1.5068) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][600/625] eta 0:00:11 lr 0.001045 wd 0.0500 time 0.4642 (0.4679) data 
time 0.0010 (0.0019) model time 0.4632 (0.4659) loss 3.7841 (3.1919) grad_norm 1.3499 (1.5116) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][610/625] eta 0:00:07 lr 0.001045 wd 0.0500 time 0.4639 (0.4679) data time 0.0005 (0.0019) model time 0.4634 (0.4658) loss 3.8752 (3.1970) grad_norm 1.1739 (1.5091) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [85/300][620/625] eta 0:00:02 lr 0.001044 wd 0.0500 time 0.4591 (0.4678) data time 0.0007 (0.0019) model time 0.4583 (0.4658) loss 3.2703 (3.1947) grad_norm 1.0329 (1.5084) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 85 training takes 0:04:52 [2024-08-10 08:36:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:36:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:36:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.5981 (0.5981) Acc@1 87.158 (87.158) Acc@5 98.242 (98.242) Mem 16715MB [2024-08-10 08:36:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.122 (0.162) Loss 0.9365 (0.7318) Acc@1 78.662 (83.740) Acc@5 94.580 (97.004) Mem 16715MB [2024-08-10 08:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0811 (0.8634) Acc@1 73.633 (80.269) Acc@5 93.164 (95.408) Mem 16715MB [2024-08-10 08:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 79.984 Acc@5 95.389 [2024-08-10 08:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.0% [2024-08-10 08:36:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.862 (0.862) Loss 0.5151 (0.5151) Acc@1 88.477 (88.477) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 08:36:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.197) Loss 0.8369 (0.6449) Acc@1 79.834 (85.529) Acc@5 95.654 (97.425) Mem 16715MB [2024-08-10 08:36:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.9590 (0.7657) Acc@1 75.537 (82.257) Acc@5 94.531 (96.170) Mem 16715MB [2024-08-10 08:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.998 Acc@5 96.179 [2024-08-10 08:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-10 08:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.00% [2024-08-10 08:36:38 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:36:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:36:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][0/625] eta 0:08:39 lr 0.001044 wd 0.0500 time 0.8310 (0.8310) data time 0.4142 (0.4142) model time 0.0000 (0.0000) loss 3.7452 (3.7452) grad_norm 1.8058 (1.8058) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][10/625] eta 0:05:05 lr 0.001044 wd 0.0500 time 0.4674 (0.4972) data time 0.0011 (0.0387) model time 0.0000 (0.0000) loss 3.4693 (3.3273) grad_norm 2.3261 (1.5749) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][20/625] eta 0:04:51 lr 0.001044 wd 0.0500 time 0.4607 (0.4815) data time 0.0008 (0.0208) model time 0.0000 (0.0000) loss 2.4510 (3.2708) grad_norm 1.1462 (1.5620) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][30/625] eta 0:04:47 lr 0.001044 wd 0.0500 time 0.6862 (0.4832) data time 0.0011 (0.0144) model time 0.0000 (0.0000) loss 3.1856 (3.2701) grad_norm 1.5338 (1.5717) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:36:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][40/625] eta 0:04:40 lr 0.001044 wd 0.0500 time 0.4701 (0.4787) data time 0.0007 (0.0111) model time 0.0000 (0.0000) loss 3.4025 (3.2826) grad_norm 1.2684 (1.5378) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][50/625] eta 0:04:34 lr 0.001044 wd 0.0500 time 0.4136 (0.4778) data time 0.0009 (0.0092) model time 0.0000 (0.0000) loss 2.9875 (3.2298) grad_norm 1.7835 (1.5302) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][60/625] eta 0:04:28 lr 0.001044 wd 0.0500 time 0.4648 (0.4761) data time 0.0008 (0.0078) model time 0.4640 (0.4663) loss 3.8168 (3.2787) 
grad_norm 2.1327 (1.6864) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][70/625] eta 0:04:23 lr 0.001044 wd 0.0500 time 0.4651 (0.4750) data time 0.0010 (0.0069) model time 0.4641 (0.4670) loss 2.2944 (3.2568) grad_norm 1.4535 (1.6781) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][80/625] eta 0:04:18 lr 0.001044 wd 0.0500 time 0.4700 (0.4740) data time 0.0010 (0.0062) model time 0.4691 (0.4664) loss 3.5217 (3.2785) grad_norm 1.3166 (1.6248) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][90/625] eta 0:04:13 lr 0.001044 wd 0.0500 time 0.4603 (0.4730) data time 0.0008 (0.0056) model time 0.4595 (0.4658) loss 3.2492 (3.2796) grad_norm 1.1583 (1.6018) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][100/625] eta 0:04:07 lr 0.001044 wd 0.0500 time 0.4658 (0.4723) data time 0.0010 (0.0051) model time 0.4648 (0.4657) loss 3.3947 (3.2714) grad_norm 1.6513 (1.5855) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][110/625] eta 0:04:02 lr 0.001044 wd 0.0500 time 0.4613 (0.4716) data time 0.0008 (0.0048) model time 0.4605 (0.4653) loss 2.5113 (3.2749) grad_norm 1.5250 (1.5857) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][120/625] eta 0:03:57 lr 0.001044 wd 0.0500 time 0.4653 (0.4712) data time 0.0010 (0.0045) model time 0.4643 (0.4654) loss 3.0183 (3.2744) grad_norm 1.5429 (1.6164) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][130/625] eta 0:03:53 lr 0.001044 wd 0.0500 time 0.4619 
(0.4707) data time 0.0010 (0.0042) model time 0.4609 (0.4652) loss 3.0679 (3.2717) grad_norm 1.4599 (1.6183) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][140/625] eta 0:03:48 lr 0.001043 wd 0.0500 time 0.4647 (0.4705) data time 0.0011 (0.0040) model time 0.4636 (0.4653) loss 3.6230 (3.2616) grad_norm 1.2277 (1.6119) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][150/625] eta 0:03:43 lr 0.001043 wd 0.0500 time 0.4585 (0.4701) data time 0.0009 (0.0038) model time 0.4576 (0.4651) loss 2.0279 (3.2496) grad_norm 0.9607 (1.6008) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:37:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][160/625] eta 0:03:38 lr 0.001043 wd 0.0500 time 0.4627 (0.4697) data time 0.0007 (0.0036) model time 0.4619 (0.4650) loss 3.4116 (3.2447) grad_norm 1.0207 (1.5848) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][170/625] eta 0:03:33 lr 0.001043 wd 0.0500 time 0.4613 (0.4696) data time 0.0008 (0.0035) model time 0.4605 (0.4650) loss 3.8449 (3.2538) grad_norm 1.3138 (1.5744) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][180/625] eta 0:03:28 lr 0.001043 wd 0.0500 time 0.4613 (0.4692) data time 0.0008 (0.0033) model time 0.4606 (0.4648) loss 3.0785 (3.2393) grad_norm 1.1427 (1.5617) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][190/625] eta 0:03:24 lr 0.001043 wd 0.0500 time 0.4629 (0.4691) data time 0.0007 (0.0032) model time 0.4622 (0.4649) loss 3.4428 (3.2334) grad_norm 1.8051 (1.5752) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:14 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [86/300][200/625] eta 0:03:19 lr 0.001043 wd 0.0500 time 0.4677 (0.4690) data time 0.0009 (0.0031) model time 0.4668 (0.4650) loss 3.5747 (3.2384) grad_norm 1.9039 (1.5885) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][210/625] eta 0:03:14 lr 0.001043 wd 0.0500 time 0.4636 (0.4690) data time 0.0009 (0.0030) model time 0.4626 (0.4651) loss 3.4476 (3.2225) grad_norm 1.6665 (1.5977) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][220/625] eta 0:03:10 lr 0.001043 wd 0.0500 time 0.4632 (0.4699) data time 0.0012 (0.0030) model time 0.4620 (0.4664) loss 3.4430 (3.2321) grad_norm 1.4091 (1.5857) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][230/625] eta 0:03:05 lr 0.001043 wd 0.0500 time 0.4617 (0.4697) data time 0.0010 (0.0029) model time 0.4607 (0.4663) loss 3.5061 (3.2314) grad_norm 1.5306 (1.5701) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][240/625] eta 0:03:00 lr 0.001043 wd 0.0500 time 0.4691 (0.4696) data time 0.0010 (0.0028) model time 0.4681 (0.4663) loss 3.5703 (3.2374) grad_norm 1.9019 (1.5643) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][250/625] eta 0:02:56 lr 0.001043 wd 0.0500 time 0.4649 (0.4696) data time 0.0008 (0.0027) model time 0.4640 (0.4664) loss 3.0821 (3.2240) grad_norm 1.4109 (1.5565) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][260/625] eta 0:02:51 lr 0.001043 wd 0.0500 time 0.4672 (0.4694) data time 0.0008 (0.0027) model time 0.4664 (0.4663) loss 2.1270 (3.2262) grad_norm 1.3229 (1.5512) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:38:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][270/625] eta 0:02:46 lr 0.001042 wd 0.0500 time 0.4088 (0.4699) data time 0.0011 (0.0026) model time 0.4078 (0.4670) loss 3.0452 (3.2252) grad_norm 1.7334 (1.5543) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][280/625] eta 0:02:42 lr 0.001042 wd 0.0500 time 0.4662 (0.4698) data time 0.0010 (0.0026) model time 0.4652 (0.4669) loss 1.8753 (3.2195) grad_norm 2.3575 (1.5554) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][290/625] eta 0:02:37 lr 0.001042 wd 0.0500 time 0.4655 (0.4697) data time 0.0011 (0.0025) model time 0.4644 (0.4669) loss 3.2115 (3.2286) grad_norm 1.3122 (1.5467) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][300/625] eta 0:02:32 lr 0.001042 wd 0.0500 time 0.4643 (0.4696) data time 0.0011 (0.0025) model time 0.4633 (0.4669) loss 3.0271 (3.2212) grad_norm 1.3136 (1.5363) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][310/625] eta 0:02:27 lr 0.001042 wd 0.0500 time 0.4562 (0.4695) data time 0.0008 (0.0024) model time 0.4554 (0.4668) loss 3.5374 (3.2166) grad_norm 1.8363 (1.5338) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][320/625] eta 0:02:23 lr 0.001042 wd 0.0500 time 0.4656 (0.4693) data time 0.0010 (0.0024) model time 0.4646 (0.4667) loss 3.1781 (3.2164) grad_norm 2.4111 (1.5584) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][330/625] eta 0:02:18 lr 0.001042 wd 0.0500 time 0.4597 (0.4691) data time 0.0008 (0.0023) model time 
0.4588 (0.4665) loss 3.7630 (3.2166) grad_norm 1.1546 (1.5593) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][340/625] eta 0:02:13 lr 0.001042 wd 0.0500 time 0.4693 (0.4690) data time 0.0009 (0.0023) model time 0.4684 (0.4664) loss 3.3414 (3.2136) grad_norm 2.2320 (1.5539) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][350/625] eta 0:02:08 lr 0.001042 wd 0.0500 time 0.4656 (0.4689) data time 0.0008 (0.0023) model time 0.4647 (0.4664) loss 3.6345 (3.2145) grad_norm 1.5997 (1.5565) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][360/625] eta 0:02:04 lr 0.001042 wd 0.0500 time 0.4645 (0.4689) data time 0.0008 (0.0022) model time 0.4637 (0.4664) loss 3.0163 (3.2104) grad_norm 1.0029 (1.5496) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][370/625] eta 0:01:59 lr 0.001042 wd 0.0500 time 0.4630 (0.4688) data time 0.0008 (0.0022) model time 0.4621 (0.4663) loss 1.9854 (3.2032) grad_norm 1.5176 (1.5452) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][380/625] eta 0:01:54 lr 0.001042 wd 0.0500 time 0.4636 (0.4687) data time 0.0008 (0.0022) model time 0.4628 (0.4662) loss 4.0699 (3.2074) grad_norm 2.7123 (1.5463) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][390/625] eta 0:01:50 lr 0.001042 wd 0.0500 time 0.4642 (0.4686) data time 0.0008 (0.0021) model time 0.4634 (0.4661) loss 3.6352 (3.2055) grad_norm 1.7258 (1.5457) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][400/625] eta 0:01:45 
lr 0.001042 wd 0.0500 time 0.4715 (0.4685) data time 0.0008 (0.0021) model time 0.4707 (0.4660) loss 3.7795 (3.2047) grad_norm 1.7153 (1.5476) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][410/625] eta 0:01:40 lr 0.001041 wd 0.0500 time 0.4653 (0.4684) data time 0.0011 (0.0021) model time 0.4642 (0.4659) loss 3.1194 (3.2040) grad_norm 1.5696 (1.5476) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:39:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][420/625] eta 0:01:36 lr 0.001041 wd 0.0500 time 0.4637 (0.4688) data time 0.0007 (0.0021) model time 0.4630 (0.4664) loss 3.8562 (3.2040) grad_norm 1.8582 (1.5452) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][430/625] eta 0:01:31 lr 0.001041 wd 0.0500 time 0.4782 (0.4687) data time 0.0011 (0.0020) model time 0.4771 (0.4664) loss 3.3273 (3.2009) grad_norm 1.3330 (1.5399) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][440/625] eta 0:01:26 lr 0.001041 wd 0.0500 time 0.6462 (0.4690) data time 0.0010 (0.0020) model time 0.6452 (0.4668) loss 3.3524 (3.2007) grad_norm 1.4290 (1.5424) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][450/625] eta 0:01:22 lr 0.001041 wd 0.0500 time 0.4611 (0.4693) data time 0.0008 (0.0020) model time 0.4602 (0.4672) loss 3.6640 (3.2064) grad_norm 1.6941 (1.5381) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][460/625] eta 0:01:17 lr 0.001041 wd 0.0500 time 0.4627 (0.4691) data time 0.0010 (0.0020) model time 0.4617 (0.4670) loss 3.2237 (3.2071) grad_norm 1.5021 (1.5349) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:20 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][470/625] eta 0:01:12 lr 0.001041 wd 0.0500 time 0.4596 (0.4690) data time 0.0010 (0.0020) model time 0.4586 (0.4668) loss 2.8685 (3.2021) grad_norm 1.0747 (1.5303) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][480/625] eta 0:01:07 lr 0.001041 wd 0.0500 time 0.4656 (0.4689) data time 0.0008 (0.0019) model time 0.4648 (0.4667) loss 3.5230 (3.2048) grad_norm 1.7689 (1.5306) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][490/625] eta 0:01:03 lr 0.001041 wd 0.0500 time 0.4634 (0.4688) data time 0.0008 (0.0019) model time 0.4626 (0.4666) loss 2.1258 (3.2008) grad_norm 2.5229 (1.5378) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][500/625] eta 0:00:58 lr 0.001041 wd 0.0500 time 0.4630 (0.4687) data time 0.0011 (0.0019) model time 0.4619 (0.4666) loss 3.4744 (3.2037) grad_norm 1.5464 (1.5412) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][510/625] eta 0:00:53 lr 0.001041 wd 0.0500 time 0.4624 (0.4686) data time 0.0010 (0.0019) model time 0.4614 (0.4665) loss 2.8556 (3.2056) grad_norm 1.1987 (1.5382) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][520/625] eta 0:00:49 lr 0.001041 wd 0.0500 time 0.4614 (0.4685) data time 0.0009 (0.0019) model time 0.4605 (0.4664) loss 3.7044 (3.2063) grad_norm 1.3083 (1.5383) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][530/625] eta 0:00:44 lr 0.001041 wd 0.0500 time 0.4667 (0.4684) data time 0.0010 (0.0019) model time 0.4657 (0.4663) loss 2.3036 (3.2050) grad_norm 
1.5686 (1.5399) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][540/625] eta 0:00:39 lr 0.001041 wd 0.0500 time 0.4627 (0.4683) data time 0.0010 (0.0018) model time 0.4617 (0.4663) loss 3.4202 (3.2026) grad_norm 1.1671 (1.5390) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:40:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][550/625] eta 0:00:35 lr 0.001040 wd 0.0500 time 0.4615 (0.4682) data time 0.0010 (0.0018) model time 0.4605 (0.4662) loss 3.4064 (3.2025) grad_norm 1.6775 (1.5364) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][560/625] eta 0:00:30 lr 0.001040 wd 0.0500 time 0.4637 (0.4682) data time 0.0009 (0.0018) model time 0.4628 (0.4661) loss 3.5440 (3.2030) grad_norm 1.2511 (1.5331) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][570/625] eta 0:00:25 lr 0.001040 wd 0.0500 time 0.4602 (0.4682) data time 0.0008 (0.0018) model time 0.4595 (0.4661) loss 3.2950 (3.2029) grad_norm 1.4139 (1.5353) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][580/625] eta 0:00:21 lr 0.001040 wd 0.0500 time 0.4657 (0.4681) data time 0.0008 (0.0018) model time 0.4649 (0.4661) loss 3.9890 (3.2046) grad_norm 1.3308 (1.5344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][590/625] eta 0:00:16 lr 0.001040 wd 0.0500 time 0.4642 (0.4689) data time 0.0008 (0.0018) model time 0.4634 (0.4670) loss 4.0386 (3.2065) grad_norm 1.5661 (1.5319) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][600/625] eta 0:00:11 lr 0.001040 wd 0.0500 time 0.4644 (0.4688) data 
time 0.0010 (0.0018) model time 0.4634 (0.4669) loss 3.1317 (3.2056) grad_norm 1.1748 (1.5340) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][610/625] eta 0:00:07 lr 0.001040 wd 0.0500 time 0.4557 (0.4687) data time 0.0008 (0.0018) model time 0.4550 (0.4668) loss 3.8358 (3.2085) grad_norm 1.5164 (1.5328) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [86/300][620/625] eta 0:00:02 lr 0.001040 wd 0.0500 time 0.4632 (0.4686) data time 0.0005 (0.0017) model time 0.4626 (0.4667) loss 3.6248 (3.2087) grad_norm 1.4044 (1.5332) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 86 training takes 0:04:52 [2024-08-10 08:41:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:41:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:41:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.6191 (0.6191) Acc@1 86.670 (86.670) Acc@5 97.998 (97.998) Mem 16715MB [2024-08-10 08:41:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.9800 (0.7405) Acc@1 78.076 (83.931) Acc@5 94.287 (96.897) Mem 16715MB [2024-08-10 08:41:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 1.1123 (0.8800) Acc@1 74.023 (80.459) Acc@5 92.822 (95.285) Mem 16715MB [2024-08-10 08:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.140 Acc@5 95.298 [2024-08-10 08:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-08-10 08:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.14% [2024-08-10 08:41:38 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 08:41:39 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 08:41:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.507 (0.507) Loss 0.5132 (0.5132) Acc@1 88.428 (88.428) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 08:41:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.160) Loss 0.8369 (0.6438) Acc@1 79.834 (85.578) Acc@5 95.752 (97.448) Mem 16715MB [2024-08-10 08:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9570 (0.7641) Acc@1 75.537 (82.292) Acc@5 94.629 (96.189) Mem 16715MB [2024-08-10 08:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.036 Acc@5 96.195 [2024-08-10 08:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-10 08:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.04% [2024-08-10 08:41:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:41:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][0/625] eta 0:09:02 lr 0.001040 wd 0.0500 time 0.8679 (0.8679) data time 0.4553 (0.4553) model time 0.0000 (0.0000) loss 2.3414 (2.3414) grad_norm 1.6154 (1.6154) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][10/625] eta 0:05:09 lr 0.001040 wd 0.0500 time 0.4692 (0.5028) data time 0.0010 (0.0424) model time 0.0000 (0.0000) loss 3.2467 (3.1360) grad_norm 1.2958 (1.5128) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][20/625] eta 0:04:53 lr 0.001040 wd 0.0500 time 0.4666 (0.4853) data time 0.0008 (0.0227) model time 0.0000 (0.0000) loss 4.0113 (3.2702) grad_norm 1.5116 (1.6846) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:41:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][30/625] eta 0:04:44 lr 0.001040 wd 0.0500 time 0.4658 (0.4784) data time 0.0009 (0.0157) model time 0.0000 (0.0000) loss 2.8788 (3.2583) grad_norm 2.0371 (1.6544) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][40/625] eta 0:04:37 lr 0.001040 wd 0.0500 time 0.4599 (0.4747) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 2.7792 (3.2219) grad_norm 1.3924 (1.6249) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][50/625] eta 0:04:31 lr 0.001040 wd 0.0500 time 0.4555 (0.4718) data time 0.0011 (0.0100) model time 0.0000 (0.0000) loss 2.8206 (3.1917) grad_norm 1.2909 (1.5988) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][60/625] eta 0:04:25 lr 0.001039 wd 0.0500 time 0.4642 (0.4702) data time 0.0009 (0.0085) model time 0.4633 (0.4609) loss 3.4801 (3.2257) 
grad_norm 1.3116 (1.5800) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][70/625] eta 0:04:20 lr 0.001039 wd 0.0500 time 0.4703 (0.4692) data time 0.0008 (0.0074) model time 0.4696 (0.4614) loss 3.8154 (3.2698) grad_norm 1.2615 (1.5631) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][80/625] eta 0:04:15 lr 0.001039 wd 0.0500 time 0.4672 (0.4687) data time 0.0010 (0.0066) model time 0.4662 (0.4623) loss 3.0464 (3.2687) grad_norm 1.6486 (1.5503) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][90/625] eta 0:04:10 lr 0.001039 wd 0.0500 time 0.4650 (0.4683) data time 0.0008 (0.0060) model time 0.4642 (0.4629) loss 2.4594 (3.2471) grad_norm 1.2925 (1.5594) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][100/625] eta 0:04:05 lr 0.001039 wd 0.0500 time 0.4625 (0.4678) data time 0.0010 (0.0055) model time 0.4615 (0.4628) loss 3.2842 (3.2217) grad_norm 0.9365 (1.5279) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][110/625] eta 0:04:00 lr 0.001039 wd 0.0500 time 0.4609 (0.4672) data time 0.0010 (0.0051) model time 0.4600 (0.4622) loss 3.2649 (3.2256) grad_norm 1.5058 (1.5302) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][120/625] eta 0:03:55 lr 0.001039 wd 0.0500 time 0.4602 (0.4667) data time 0.0008 (0.0048) model time 0.4595 (0.4620) loss 1.7600 (3.2129) grad_norm 1.4554 (1.5255) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][130/625] eta 0:03:52 lr 0.001039 wd 0.0500 time 0.4613 
(0.4696) data time 0.0009 (0.0045) model time 0.4604 (0.4671) loss 2.4913 (3.2224) grad_norm 1.1172 (1.5325) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][140/625] eta 0:03:47 lr 0.001039 wd 0.0500 time 0.4620 (0.4691) data time 0.0011 (0.0043) model time 0.4609 (0.4665) loss 3.3984 (3.2356) grad_norm 1.2755 (1.5180) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:42:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][150/625] eta 0:03:42 lr 0.001039 wd 0.0500 time 0.4659 (0.4687) data time 0.0010 (0.0041) model time 0.4649 (0.4661) loss 3.8319 (3.2273) grad_norm 1.2888 (1.5251) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][160/625] eta 0:03:37 lr 0.001039 wd 0.0500 time 0.4670 (0.4684) data time 0.0011 (0.0039) model time 0.4659 (0.4658) loss 3.7545 (3.2083) grad_norm 1.6287 (1.5119) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][170/625] eta 0:03:33 lr 0.001039 wd 0.0500 time 0.4684 (0.4682) data time 0.0011 (0.0037) model time 0.4674 (0.4656) loss 3.4633 (3.2267) grad_norm 2.2163 (1.5144) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][180/625] eta 0:03:28 lr 0.001039 wd 0.0500 time 0.4635 (0.4680) data time 0.0010 (0.0036) model time 0.4624 (0.4654) loss 2.9778 (3.2337) grad_norm 1.1788 (1.5143) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][190/625] eta 0:03:23 lr 0.001039 wd 0.0500 time 0.4640 (0.4678) data time 0.0010 (0.0034) model time 0.4631 (0.4652) loss 2.6084 (3.2267) grad_norm 1.9233 (1.5132) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [87/300][200/625] eta 0:03:19 lr 0.001038 wd 0.0500 time 0.4626 (0.4683) data time 0.0008 (0.0033) model time 0.4617 (0.4661) loss 3.2199 (3.2304) grad_norm 1.2607 (1.4997) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][210/625] eta 0:03:14 lr 0.001038 wd 0.0500 time 0.4687 (0.4681) data time 0.0009 (0.0032) model time 0.4677 (0.4659) loss 3.3809 (3.2399) grad_norm 1.5026 (1.4991) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][220/625] eta 0:03:09 lr 0.001038 wd 0.0500 time 0.4647 (0.4680) data time 0.0008 (0.0031) model time 0.4639 (0.4657) loss 3.4516 (3.2328) grad_norm 2.0464 (1.4981) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][230/625] eta 0:03:04 lr 0.001038 wd 0.0500 time 0.4711 (0.4681) data time 0.0010 (0.0030) model time 0.4701 (0.4660) loss 3.5074 (3.2196) grad_norm 1.8259 (1.5074) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][240/625] eta 0:03:00 lr 0.001038 wd 0.0500 time 0.4633 (0.4681) data time 0.0010 (0.0029) model time 0.4623 (0.4661) loss 3.0926 (3.2150) grad_norm 1.7961 (1.5031) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][250/625] eta 0:02:55 lr 0.001038 wd 0.0500 time 0.4638 (0.4680) data time 0.0008 (0.0029) model time 0.4630 (0.4659) loss 3.4053 (3.2172) grad_norm 1.3804 (1.5314) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][260/625] eta 0:02:50 lr 0.001038 wd 0.0500 time 0.4627 (0.4678) data time 0.0008 (0.0028) model time 0.4619 (0.4658) loss 3.7384 (3.2205) grad_norm 1.2428 (1.5274) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][270/625] eta 0:02:46 lr 0.001038 wd 0.0500 time 0.4590 (0.4676) data time 0.0008 (0.0027) model time 0.4582 (0.4656) loss 3.1085 (3.2256) grad_norm 1.4044 (1.5241) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:43:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][280/625] eta 0:02:41 lr 0.001038 wd 0.0500 time 0.4627 (0.4674) data time 0.0010 (0.0027) model time 0.4617 (0.4653) loss 3.1492 (3.2287) grad_norm 0.9461 (1.5193) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][290/625] eta 0:02:36 lr 0.001038 wd 0.0500 time 0.4627 (0.4672) data time 0.0011 (0.0026) model time 0.4616 (0.4652) loss 2.4055 (3.2282) grad_norm 1.3848 (1.5098) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][300/625] eta 0:02:31 lr 0.001038 wd 0.0500 time 0.4684 (0.4672) data time 0.0007 (0.0026) model time 0.4676 (0.4652) loss 3.6044 (3.2271) grad_norm 1.2148 (1.5052) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][310/625] eta 0:02:27 lr 0.001038 wd 0.0500 time 0.4651 (0.4671) data time 0.0011 (0.0025) model time 0.4640 (0.4651) loss 2.9375 (3.2124) grad_norm 1.5055 (1.5021) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][320/625] eta 0:02:22 lr 0.001038 wd 0.0500 time 0.4619 (0.4670) data time 0.0011 (0.0025) model time 0.4608 (0.4651) loss 3.1506 (3.2107) grad_norm 1.5815 (1.4970) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][330/625] eta 0:02:17 lr 0.001038 wd 0.0500 time 0.4634 (0.4669) data time 0.0008 (0.0024) model time 
0.4626 (0.4649) loss 2.7581 (3.2130) grad_norm 1.8803 (1.4957) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][340/625] eta 0:02:13 lr 0.001037 wd 0.0500 time 0.4598 (0.4668) data time 0.0007 (0.0024) model time 0.4590 (0.4648) loss 3.6634 (3.2028) grad_norm 1.1053 (1.4995) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][350/625] eta 0:02:08 lr 0.001037 wd 0.0500 time 0.4599 (0.4666) data time 0.0007 (0.0024) model time 0.4591 (0.4647) loss 3.0095 (3.2020) grad_norm 0.9625 (1.4935) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][360/625] eta 0:02:03 lr 0.001037 wd 0.0500 time 0.4634 (0.4665) data time 0.0010 (0.0023) model time 0.4624 (0.4645) loss 3.6999 (3.2031) grad_norm 1.4884 (1.4940) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][370/625] eta 0:01:58 lr 0.001037 wd 0.0500 time 0.4608 (0.4664) data time 0.0013 (0.0023) model time 0.4595 (0.4645) loss 3.5925 (3.2054) grad_norm 1.3063 (1.4983) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][380/625] eta 0:01:54 lr 0.001037 wd 0.0500 time 0.4691 (0.4664) data time 0.0007 (0.0023) model time 0.4683 (0.4645) loss 1.8522 (3.2022) grad_norm 1.6515 (1.5090) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][390/625] eta 0:01:49 lr 0.001037 wd 0.0500 time 0.4679 (0.4663) data time 0.0007 (0.0022) model time 0.4672 (0.4645) loss 3.3716 (3.2045) grad_norm 1.4727 (1.5200) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][400/625] eta 0:01:44 
lr 0.001037 wd 0.0500 time 0.4605 (0.4663) data time 0.0010 (0.0022) model time 0.4595 (0.4644) loss 3.1682 (3.2084) grad_norm 1.2817 (1.5225) loss_scale 4096.0000 (2058.2145) mem 16715MB [2024-08-10 08:44:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][410/625] eta 0:01:40 lr 0.001037 wd 0.0500 time 0.4663 (0.4662) data time 0.0008 (0.0022) model time 0.4655 (0.4643) loss 3.8187 (3.1968) grad_norm 1.1648 (1.5210) loss_scale 4096.0000 (2107.7956) mem 16715MB [2024-08-10 08:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][420/625] eta 0:01:35 lr 0.001037 wd 0.0500 time 0.4665 (0.4665) data time 0.0009 (0.0021) model time 0.4656 (0.4647) loss 2.9198 (3.1969) grad_norm 1.7063 (1.5164) loss_scale 4096.0000 (2155.0214) mem 16715MB [2024-08-10 08:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][430/625] eta 0:01:30 lr 0.001037 wd 0.0500 time 0.4651 (0.4664) data time 0.0008 (0.0021) model time 0.4643 (0.4646) loss 3.5531 (3.1917) grad_norm 1.5161 (1.5119) loss_scale 4096.0000 (2200.0557) mem 16715MB [2024-08-10 08:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][440/625] eta 0:01:26 lr 0.001037 wd 0.0500 time 0.4604 (0.4664) data time 0.0008 (0.0021) model time 0.4596 (0.4646) loss 3.2745 (3.1969) grad_norm 1.4067 (1.5120) loss_scale 4096.0000 (2243.0476) mem 16715MB [2024-08-10 08:45:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][450/625] eta 0:01:21 lr 0.001037 wd 0.0500 time 0.4665 (0.4664) data time 0.0008 (0.0021) model time 0.4658 (0.4646) loss 3.2106 (3.2035) grad_norm 1.4075 (1.5090) loss_scale 4096.0000 (2284.1330) mem 16715MB [2024-08-10 08:45:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][460/625] eta 0:01:16 lr 0.001037 wd 0.0500 time 0.4631 (0.4664) data time 0.0008 (0.0020) model time 0.4623 (0.4646) loss 3.4527 (3.2027) grad_norm 1.1995 (inf) loss_scale 2048.0000 (2292.3384) mem 16715MB [2024-08-10 08:45:25 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][470/625] eta 0:01:12 lr 0.001036 wd 0.0500 time 0.4791 (0.4673) data time 0.0010 (0.0020) model time 0.4781 (0.4657) loss 3.4488 (3.2051) grad_norm 2.2723 (inf) loss_scale 2048.0000 (2287.1507) mem 16715MB [2024-08-10 08:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][480/625] eta 0:01:07 lr 0.001036 wd 0.0500 time 0.4673 (0.4673) data time 0.0008 (0.0020) model time 0.4664 (0.4657) loss 3.6951 (3.2036) grad_norm 1.9525 (inf) loss_scale 2048.0000 (2282.1788) mem 16715MB [2024-08-10 08:45:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][490/625] eta 0:01:03 lr 0.001036 wd 0.0500 time 0.4624 (0.4673) data time 0.0011 (0.0020) model time 0.4613 (0.4657) loss 3.3054 (3.2078) grad_norm 1.0914 (inf) loss_scale 2048.0000 (2277.4094) mem 16715MB [2024-08-10 08:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][500/625] eta 0:00:58 lr 0.001036 wd 0.0500 time 0.4636 (0.4672) data time 0.0009 (0.0020) model time 0.4626 (0.4656) loss 3.6419 (3.2107) grad_norm 1.7355 (inf) loss_scale 2048.0000 (2272.8303) mem 16715MB [2024-08-10 08:45:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][510/625] eta 0:00:53 lr 0.001036 wd 0.0500 time 0.4648 (0.4671) data time 0.0011 (0.0020) model time 0.4637 (0.4656) loss 3.5678 (3.2164) grad_norm 1.2939 (inf) loss_scale 2048.0000 (2268.4305) mem 16715MB [2024-08-10 08:45:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][520/625] eta 0:00:49 lr 0.001036 wd 0.0500 time 0.4626 (0.4671) data time 0.0009 (0.0019) model time 0.4618 (0.4655) loss 4.0398 (3.2169) grad_norm 1.1809 (inf) loss_scale 2048.0000 (2264.1996) mem 16715MB [2024-08-10 08:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][530/625] eta 0:00:44 lr 0.001036 wd 0.0500 time 0.4634 (0.4671) data time 0.0009 (0.0019) model time 0.4625 (0.4655) loss 3.2721 (3.2130) grad_norm 1.2139 (inf) 
loss_scale 2048.0000 (2260.1281) mem 16715MB [2024-08-10 08:45:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][540/625] eta 0:00:39 lr 0.001036 wd 0.0500 time 0.4629 (0.4670) data time 0.0008 (0.0019) model time 0.4621 (0.4654) loss 2.6394 (3.2137) grad_norm 1.5052 (inf) loss_scale 2048.0000 (2256.2070) mem 16715MB [2024-08-10 08:46:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][550/625] eta 0:00:35 lr 0.001036 wd 0.0500 time 0.4559 (0.4670) data time 0.0010 (0.0019) model time 0.4549 (0.4654) loss 3.3390 (3.2110) grad_norm 1.8698 (inf) loss_scale 2048.0000 (2252.4283) mem 16715MB [2024-08-10 08:46:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][560/625] eta 0:00:30 lr 0.001036 wd 0.0500 time 0.4684 (0.4670) data time 0.0007 (0.0019) model time 0.4676 (0.4654) loss 2.9954 (3.2121) grad_norm 1.4938 (inf) loss_scale 2048.0000 (2248.7843) mem 16715MB [2024-08-10 08:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][570/625] eta 0:00:25 lr 0.001036 wd 0.0500 time 0.4631 (0.4672) data time 0.0007 (0.0019) model time 0.4624 (0.4657) loss 3.4173 (3.2142) grad_norm 1.7424 (inf) loss_scale 2048.0000 (2245.2680) mem 16715MB [2024-08-10 08:46:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][580/625] eta 0:00:21 lr 0.001036 wd 0.0500 time 0.4692 (0.4671) data time 0.0010 (0.0019) model time 0.4682 (0.4656) loss 3.4524 (3.2146) grad_norm 1.8614 (inf) loss_scale 2048.0000 (2241.8726) mem 16715MB [2024-08-10 08:46:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][590/625] eta 0:00:16 lr 0.001036 wd 0.0500 time 0.4672 (0.4671) data time 0.0008 (0.0019) model time 0.4664 (0.4656) loss 3.6508 (3.2159) grad_norm 1.4731 (inf) loss_scale 2048.0000 (2238.5922) mem 16715MB [2024-08-10 08:46:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][600/625] eta 0:00:11 lr 0.001036 wd 0.0500 time 0.4688 (0.4671) data time 0.0008 (0.0018) model time 
0.4680 (0.4656) loss 4.2122 (3.2164) grad_norm 1.4571 (inf) loss_scale 2048.0000 (2235.4210) mem 16715MB [2024-08-10 08:46:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][610/625] eta 0:00:07 lr 0.001035 wd 0.0500 time 0.4617 (0.4672) data time 0.0008 (0.0019) model time 0.4609 (0.4656) loss 3.0812 (3.2167) grad_norm 1.2559 (inf) loss_scale 2048.0000 (2232.3535) mem 16715MB [2024-08-10 08:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [87/300][620/625] eta 0:00:02 lr 0.001035 wd 0.0500 time 0.4592 (0.4671) data time 0.0006 (0.0019) model time 0.4586 (0.4655) loss 2.8590 (3.2182) grad_norm 1.3461 (inf) loss_scale 2048.0000 (2229.3849) mem 16715MB [2024-08-10 08:46:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 87 training takes 0:04:51 [2024-08-10 08:46:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:46:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.5811 (0.5811) Acc@1 87.549 (87.549) Acc@5 98.096 (98.096) Mem 16715MB [2024-08-10 08:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9136 (0.7212) Acc@1 79.102 (83.882) Acc@5 95.020 (97.115) Mem 16715MB [2024-08-10 08:46:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0312 (0.8608) Acc@1 74.805 (80.350) Acc@5 93.799 (95.540) Mem 16715MB [2024-08-10 08:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.096 Acc@5 95.541 [2024-08-10 08:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.1% [2024-08-10 08:46:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.956 (0.956) Loss 0.5122 (0.5122) Acc@1 88.477 (88.477) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 08:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.202) Loss 0.8340 (0.6424) Acc@1 79.834 (85.605) Acc@5 95.508 (97.465) Mem 16715MB [2024-08-10 08:46:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.165) Loss 0.9541 (0.7625) Acc@1 75.732 (82.343) Acc@5 94.580 (96.205) Mem 16715MB [2024-08-10 08:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.078 Acc@5 96.211 [2024-08-10 08:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-10 08:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.08% [2024-08-10 08:46:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:46:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:46:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][0/625] eta 0:08:09 lr 0.001035 wd 0.0500 time 0.7832 (0.7832) data time 0.3640 (0.3640) model time 0.0000 (0.0000) loss 1.9977 (1.9977) grad_norm 1.3491 (1.3491) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][10/625] eta 0:05:04 lr 0.001035 wd 0.0500 time 0.4767 (0.4947) data time 0.0011 (0.0344) model time 0.0000 (0.0000) loss 3.7511 (3.1217) grad_norm 1.2788 (1.5133) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:46:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][20/625] eta 0:04:50 lr 0.001035 wd 0.0500 time 0.4574 (0.4798) data time 0.0011 (0.0186) model time 0.0000 (0.0000) loss 3.2790 (3.0356) grad_norm 1.2277 (1.3961) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][30/625] eta 0:04:43 lr 0.001035 wd 0.0500 time 0.4684 (0.4760) data time 0.0008 (0.0129) model time 0.0000 (0.0000) loss 3.2039 (3.0009) grad_norm 1.0648 (1.3651) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][40/625] eta 0:04:37 lr 0.001035 wd 0.0500 time 0.4749 (0.4738) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 3.4281 (2.9796) grad_norm 1.4081 (1.4090) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][50/625] eta 0:04:34 lr 0.001035 wd 0.0500 time 0.4645 (0.4770) data time 0.0011 (0.0083) model time 0.0000 (0.0000) loss 2.8475 (2.9589) grad_norm 1.2768 (1.4530) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][60/625] eta 0:04:30 lr 0.001035 wd 0.0500 time 0.5322 (0.4793) data time 0.0008 (0.0071) model time 0.5314 (0.4898) loss 3.3125 (3.0065) 
grad_norm 1.1880 (1.4376) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][70/625] eta 0:04:25 lr 0.001035 wd 0.0500 time 0.4653 (0.4783) data time 0.0010 (0.0069) model time 0.4643 (0.4783) loss 2.8319 (3.0032) grad_norm 1.6068 (1.4551) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][80/625] eta 0:04:22 lr 0.001035 wd 0.0500 time 0.4679 (0.4812) data time 0.0010 (0.0061) model time 0.4669 (0.4859) loss 3.2102 (3.0381) grad_norm 1.0728 (1.4571) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][90/625] eta 0:04:16 lr 0.001035 wd 0.0500 time 0.4703 (0.4797) data time 0.0008 (0.0056) model time 0.4695 (0.4809) loss 3.4773 (3.0575) grad_norm 1.3897 (1.4352) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][100/625] eta 0:04:11 lr 0.001035 wd 0.0500 time 0.4803 (0.4792) data time 0.0010 (0.0052) model time 0.4793 (0.4794) loss 2.8491 (3.0663) grad_norm 1.5209 (1.4391) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][110/625] eta 0:04:06 lr 0.001035 wd 0.0500 time 0.4709 (0.4780) data time 0.0010 (0.0048) model time 0.4699 (0.4770) loss 3.4043 (3.0830) grad_norm 1.8769 (1.4604) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][120/625] eta 0:04:01 lr 0.001034 wd 0.0500 time 0.4699 (0.4778) data time 0.0007 (0.0045) model time 0.4692 (0.4767) loss 3.0014 (3.1001) grad_norm 1.6830 (1.5017) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][130/625] eta 0:03:56 lr 0.001034 wd 0.0500 time 0.4579 
(0.4768) data time 0.0011 (0.0042) model time 0.4568 (0.4750) loss 2.9982 (3.1015) grad_norm 1.6830 (1.4984) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][140/625] eta 0:03:50 lr 0.001034 wd 0.0500 time 0.4629 (0.4760) data time 0.0011 (0.0040) model time 0.4618 (0.4738) loss 3.4283 (3.1297) grad_norm 1.3858 (1.4892) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:47:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][150/625] eta 0:03:45 lr 0.001034 wd 0.0500 time 0.4741 (0.4751) data time 0.0011 (0.0038) model time 0.4730 (0.4726) loss 3.4770 (3.1255) grad_norm 1.4485 (1.4958) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][160/625] eta 0:03:40 lr 0.001034 wd 0.0500 time 0.4516 (0.4745) data time 0.0011 (0.0037) model time 0.4506 (0.4718) loss 3.1078 (3.1338) grad_norm 1.4182 (1.5160) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][170/625] eta 0:03:35 lr 0.001034 wd 0.0500 time 0.4659 (0.4738) data time 0.0010 (0.0035) model time 0.4648 (0.4710) loss 2.9159 (3.1420) grad_norm 1.1569 (1.5036) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][180/625] eta 0:03:30 lr 0.001034 wd 0.0500 time 0.4730 (0.4734) data time 0.0008 (0.0034) model time 0.4722 (0.4706) loss 2.2856 (3.1275) grad_norm 1.1980 (1.4858) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][190/625] eta 0:03:25 lr 0.001034 wd 0.0500 time 0.4676 (0.4732) data time 0.0011 (0.0033) model time 0.4665 (0.4703) loss 3.4013 (3.1355) grad_norm 1.3588 (1.4734) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [88/300][200/625] eta 0:03:21 lr 0.001034 wd 0.0500 time 0.4603 (0.4730) data time 0.0010 (0.0032) model time 0.4593 (0.4701) loss 2.1897 (3.1455) grad_norm 1.2641 (1.4654) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][210/625] eta 0:03:16 lr 0.001034 wd 0.0500 time 0.4638 (0.4726) data time 0.0011 (0.0031) model time 0.4627 (0.4698) loss 3.7498 (3.1416) grad_norm 1.4284 (1.4608) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][220/625] eta 0:03:11 lr 0.001034 wd 0.0500 time 0.4715 (0.4722) data time 0.0008 (0.0030) model time 0.4707 (0.4694) loss 3.0441 (3.1483) grad_norm 1.2205 (1.4556) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][230/625] eta 0:03:06 lr 0.001034 wd 0.0500 time 0.4675 (0.4720) data time 0.0011 (0.0029) model time 0.4664 (0.4691) loss 3.4354 (3.1493) grad_norm 1.3513 (1.4507) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][240/625] eta 0:03:02 lr 0.001034 wd 0.0500 time 0.6566 (0.4733) data time 0.0010 (0.0029) model time 0.6555 (0.4709) loss 2.5583 (3.1390) grad_norm 2.6477 (1.4583) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][250/625] eta 0:02:57 lr 0.001033 wd 0.0500 time 0.4701 (0.4732) data time 0.0008 (0.0029) model time 0.4693 (0.4707) loss 2.3385 (3.1312) grad_norm 1.7669 (1.4620) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][260/625] eta 0:02:52 lr 0.001033 wd 0.0500 time 0.4587 (0.4729) data time 0.0008 (0.0028) model time 0.4579 (0.4705) loss 2.7922 (3.1340) grad_norm 4.6377 (1.4801) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][270/625] eta 0:02:47 lr 0.001033 wd 0.0500 time 0.4626 (0.4726) data time 0.0008 (0.0027) model time 0.4618 (0.4701) loss 3.7916 (3.1392) grad_norm 1.3600 (1.4768) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][280/625] eta 0:02:42 lr 0.001033 wd 0.0500 time 0.4651 (0.4723) data time 0.0007 (0.0027) model time 0.4643 (0.4698) loss 3.0820 (3.1426) grad_norm 1.8485 (1.4776) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][290/625] eta 0:02:38 lr 0.001033 wd 0.0500 time 0.4708 (0.4719) data time 0.0008 (0.0026) model time 0.4700 (0.4694) loss 3.6593 (3.1497) grad_norm 1.5506 (1.4718) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][300/625] eta 0:02:33 lr 0.001033 wd 0.0500 time 0.4560 (0.4716) data time 0.0011 (0.0026) model time 0.4549 (0.4691) loss 3.6369 (3.1579) grad_norm 1.2906 (1.4735) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][310/625] eta 0:02:28 lr 0.001033 wd 0.0500 time 0.4701 (0.4713) data time 0.0010 (0.0025) model time 0.4691 (0.4688) loss 3.0757 (3.1576) grad_norm 1.0937 (1.4701) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][320/625] eta 0:02:23 lr 0.001033 wd 0.0500 time 0.4587 (0.4711) data time 0.0007 (0.0025) model time 0.4579 (0.4687) loss 2.7129 (3.1395) grad_norm 1.8159 (1.4700) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][330/625] eta 0:02:18 lr 0.001033 wd 0.0500 time 0.4631 (0.4710) data time 0.0008 (0.0024) model time 
0.4624 (0.4685) loss 2.6258 (3.1394) grad_norm 1.3875 (1.4816) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][340/625] eta 0:02:14 lr 0.001033 wd 0.0500 time 0.4689 (0.4708) data time 0.0008 (0.0024) model time 0.4681 (0.4683) loss 2.9759 (3.1373) grad_norm 1.5739 (1.4781) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][350/625] eta 0:02:09 lr 0.001033 wd 0.0500 time 0.4560 (0.4706) data time 0.0012 (0.0024) model time 0.4548 (0.4681) loss 2.8606 (3.1439) grad_norm 1.3948 (1.4709) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][360/625] eta 0:02:04 lr 0.001033 wd 0.0500 time 0.4665 (0.4703) data time 0.0008 (0.0023) model time 0.4657 (0.4679) loss 3.6225 (3.1510) grad_norm 1.1852 (1.4687) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][370/625] eta 0:01:59 lr 0.001033 wd 0.0500 time 0.4610 (0.4701) data time 0.0009 (0.0023) model time 0.4601 (0.4677) loss 2.3022 (3.1534) grad_norm 1.8605 (1.4795) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][380/625] eta 0:01:55 lr 0.001033 wd 0.0500 time 0.4616 (0.4699) data time 0.0012 (0.0023) model time 0.4603 (0.4675) loss 3.3948 (3.1527) grad_norm 1.0841 (1.4744) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][390/625] eta 0:01:50 lr 0.001032 wd 0.0500 time 0.4642 (0.4698) data time 0.0008 (0.0022) model time 0.4634 (0.4674) loss 2.3951 (3.1540) grad_norm 1.2844 (1.4698) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][400/625] eta 0:01:45 
lr 0.001032 wd 0.0500 time 0.4780 (0.4698) data time 0.0011 (0.0022) model time 0.4769 (0.4674) loss 3.5684 (3.1565) grad_norm 2.5231 (1.4735) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][410/625] eta 0:01:40 lr 0.001032 wd 0.0500 time 0.4622 (0.4697) data time 0.0008 (0.0022) model time 0.4614 (0.4673) loss 3.4938 (3.1530) grad_norm 2.1253 (1.4734) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][420/625] eta 0:01:36 lr 0.001032 wd 0.0500 time 0.4639 (0.4700) data time 0.0008 (0.0022) model time 0.4631 (0.4677) loss 2.9135 (3.1542) grad_norm 1.2007 (1.4695) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][430/625] eta 0:01:31 lr 0.001032 wd 0.0500 time 0.4656 (0.4698) data time 0.0008 (0.0021) model time 0.4648 (0.4676) loss 3.7327 (3.1458) grad_norm 1.5283 (1.4705) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][440/625] eta 0:01:26 lr 0.001032 wd 0.0500 time 0.4570 (0.4697) data time 0.0010 (0.0021) model time 0.4559 (0.4675) loss 3.4302 (3.1470) grad_norm 1.3397 (1.4698) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][450/625] eta 0:01:22 lr 0.001032 wd 0.0500 time 0.4662 (0.4696) data time 0.0010 (0.0021) model time 0.4651 (0.4674) loss 2.5367 (3.1477) grad_norm 1.5969 (1.4821) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][460/625] eta 0:01:17 lr 0.001032 wd 0.0500 time 0.4623 (0.4700) data time 0.0010 (0.0021) model time 0.4613 (0.4679) loss 3.5504 (3.1503) grad_norm 1.3208 (1.4782) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:29 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][470/625] eta 0:01:12 lr 0.001032 wd 0.0500 time 0.4700 (0.4707) data time 0.0011 (0.0020) model time 0.4689 (0.4686) loss 3.6137 (3.1534) grad_norm 1.2784 (1.4765) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][480/625] eta 0:01:08 lr 0.001032 wd 0.0500 time 0.4696 (0.4706) data time 0.0010 (0.0020) model time 0.4686 (0.4686) loss 2.8595 (3.1486) grad_norm 1.0844 (1.4739) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][490/625] eta 0:01:03 lr 0.001032 wd 0.0500 time 0.4681 (0.4705) data time 0.0010 (0.0020) model time 0.4671 (0.4685) loss 3.6107 (3.1461) grad_norm 1.0734 (1.4736) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][500/625] eta 0:00:58 lr 0.001032 wd 0.0500 time 0.4640 (0.4704) data time 0.0008 (0.0020) model time 0.4632 (0.4684) loss 3.7772 (3.1488) grad_norm 1.3282 (1.4790) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][510/625] eta 0:00:54 lr 0.001032 wd 0.0500 time 0.4602 (0.4703) data time 0.0010 (0.0020) model time 0.4592 (0.4683) loss 3.1705 (3.1501) grad_norm 1.2220 (1.4779) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][520/625] eta 0:00:49 lr 0.001031 wd 0.0500 time 0.4645 (0.4702) data time 0.0008 (0.0019) model time 0.4637 (0.4682) loss 3.7107 (3.1462) grad_norm 1.2582 (1.4810) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:50:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][530/625] eta 0:00:44 lr 0.001031 wd 0.0500 time 0.4752 (0.4701) data time 0.0008 (0.0019) model time 0.4745 (0.4681) loss 3.9423 (3.1428) grad_norm 
3.0439 (1.4864) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][540/625] eta 0:00:39 lr 0.001031 wd 0.0500 time 0.4641 (0.4700) data time 0.0009 (0.0019) model time 0.4632 (0.4680) loss 3.1361 (3.1438) grad_norm 1.3036 (1.4876) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][550/625] eta 0:00:35 lr 0.001031 wd 0.0500 time 0.4668 (0.4699) data time 0.0010 (0.0019) model time 0.4659 (0.4679) loss 3.4624 (3.1457) grad_norm 1.1296 (1.4854) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][560/625] eta 0:00:30 lr 0.001031 wd 0.0500 time 0.4647 (0.4698) data time 0.0008 (0.0019) model time 0.4638 (0.4679) loss 3.4097 (3.1459) grad_norm 2.2122 (1.4850) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][570/625] eta 0:00:25 lr 0.001031 wd 0.0500 time 0.4610 (0.4697) data time 0.0008 (0.0019) model time 0.4603 (0.4678) loss 2.9440 (3.1453) grad_norm 1.9283 (1.4875) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][580/625] eta 0:00:21 lr 0.001031 wd 0.0500 time 0.4682 (0.4696) data time 0.0010 (0.0019) model time 0.4671 (0.4677) loss 3.6014 (3.1406) grad_norm 2.2525 (1.4912) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][590/625] eta 0:00:16 lr 0.001031 wd 0.0500 time 0.4645 (0.4695) data time 0.0010 (0.0018) model time 0.4635 (0.4676) loss 2.3618 (3.1412) grad_norm 1.2571 (1.4907) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][600/625] eta 0:00:11 lr 0.001031 wd 0.0500 time 0.4591 (0.4694) data 
time 0.0010 (0.0018) model time 0.4581 (0.4675) loss 2.9158 (3.1420) grad_norm 0.9343 (1.4862) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][610/625] eta 0:00:07 lr 0.001031 wd 0.0500 time 0.4614 (0.4703) data time 0.0007 (0.0018) model time 0.4607 (0.4685) loss 3.0745 (3.1430) grad_norm 1.0417 (1.4878) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [88/300][620/625] eta 0:00:02 lr 0.001031 wd 0.0500 time 0.4633 (0.4702) data time 0.0005 (0.0018) model time 0.4628 (0.4684) loss 3.3854 (3.1405) grad_norm 1.3259 (1.4953) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 88 training takes 0:04:53 [2024-08-10 08:51:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:51:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:51:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.565 (0.565) Loss 0.5869 (0.5869) Acc@1 87.451 (87.451) Acc@5 98.047 (98.047) Mem 16715MB [2024-08-10 08:51:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.165) Loss 0.9453 (0.7185) Acc@1 78.271 (83.842) Acc@5 94.824 (97.093) Mem 16715MB [2024-08-10 08:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.143) Loss 1.0664 (0.8493) Acc@1 73.145 (80.425) Acc@5 93.750 (95.582) Mem 16715MB [2024-08-10 08:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.160 Acc@5 95.551 [2024-08-10 08:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-08-10 08:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.16% [2024-08-10 08:51:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 08:51:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 08:51:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.502 (0.502) Loss 0.5122 (0.5122) Acc@1 88.428 (88.428) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 08:51:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.158) Loss 0.8296 (0.6412) Acc@1 80.029 (85.649) Acc@5 95.508 (97.479) Mem 16715MB [2024-08-10 08:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.139) Loss 0.9517 (0.7610) Acc@1 75.928 (82.394) Acc@5 94.629 (96.229) Mem 16715MB [2024-08-10 08:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.128 Acc@5 96.245 [2024-08-10 08:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-10 08:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.13% [2024-08-10 08:51:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:51:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:51:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][0/625] eta 0:08:56 lr 0.001031 wd 0.0500 time 0.8577 (0.8577) data time 0.4507 (0.4507) model time 0.0000 (0.0000) loss 3.5955 (3.5955) grad_norm 1.3637 (1.3637) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:51:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][10/625] eta 0:05:07 lr 0.001031 wd 0.0500 time 0.4617 (0.5002) data time 0.0008 (0.0420) model time 0.0000 (0.0000) loss 4.0384 (3.3224) grad_norm 1.8645 (1.6053) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][20/625] eta 0:04:52 lr 0.001031 wd 0.0500 time 0.4564 (0.4833) data time 0.0012 (0.0226) model time 0.0000 (0.0000) loss 2.4935 (3.1823) grad_norm 1.2298 (1.6217) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][30/625] eta 0:04:43 lr 0.001030 wd 0.0500 time 0.4586 (0.4769) data time 0.0010 (0.0156) model time 0.0000 (0.0000) loss 3.7388 (3.1117) grad_norm 1.1245 (1.5475) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][40/625] eta 0:04:37 lr 0.001030 wd 0.0500 time 0.4699 (0.4735) data time 0.0008 (0.0121) model time 0.0000 (0.0000) loss 3.0356 (3.1500) grad_norm 2.1372 (1.5531) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][50/625] eta 0:04:31 lr 0.001030 wd 0.0500 time 0.4658 (0.4721) data time 0.0010 (0.0099) model time 0.0000 (0.0000) loss 2.3658 (3.1385) grad_norm 1.1482 (1.4973) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][60/625] eta 0:04:26 lr 0.001030 wd 0.0500 time 0.4650 (0.4711) data time 0.0011 (0.0085) model time 0.4639 (0.4649) loss 3.2035 (3.1133) 
grad_norm 1.4830 (1.4685) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][70/625] eta 0:04:21 lr 0.001030 wd 0.0500 time 0.4659 (0.4707) data time 0.0009 (0.0075) model time 0.4650 (0.4658) loss 3.7569 (3.1098) grad_norm 1.7898 (1.4802) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][80/625] eta 0:04:16 lr 0.001030 wd 0.0500 time 0.4684 (0.4702) data time 0.0010 (0.0067) model time 0.4674 (0.4657) loss 3.3314 (3.1291) grad_norm 1.9655 (1.4985) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][90/625] eta 0:04:11 lr 0.001030 wd 0.0500 time 0.4638 (0.4693) data time 0.0008 (0.0061) model time 0.4630 (0.4646) loss 2.6340 (3.1171) grad_norm 1.2815 (1.5452) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][100/625] eta 0:04:06 lr 0.001030 wd 0.0500 time 0.4647 (0.4695) data time 0.0008 (0.0056) model time 0.4639 (0.4656) loss 3.8733 (3.1662) grad_norm 1.7845 (1.5605) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][110/625] eta 0:04:01 lr 0.001030 wd 0.0500 time 0.4613 (0.4691) data time 0.0011 (0.0052) model time 0.4602 (0.4654) loss 3.0450 (3.2009) grad_norm 1.3906 (1.5618) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][120/625] eta 0:03:56 lr 0.001030 wd 0.0500 time 0.4677 (0.4689) data time 0.0008 (0.0048) model time 0.4669 (0.4654) loss 3.9240 (3.2037) grad_norm 1.3648 (1.5550) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][130/625] eta 0:03:51 lr 0.001030 wd 0.0500 time 0.4659 
(0.4686) data time 0.0008 (0.0046) model time 0.4651 (0.4652) loss 3.2782 (3.1943) grad_norm 1.3485 (1.5376) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][140/625] eta 0:03:47 lr 0.001030 wd 0.0500 time 0.4604 (0.4684) data time 0.0011 (0.0043) model time 0.4593 (0.4651) loss 3.3210 (3.1780) grad_norm 1.5773 (1.5327) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][150/625] eta 0:03:42 lr 0.001030 wd 0.0500 time 0.4626 (0.4681) data time 0.0011 (0.0041) model time 0.4616 (0.4649) loss 2.5855 (3.1801) grad_norm 1.6299 (1.5344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][160/625] eta 0:03:37 lr 0.001030 wd 0.0500 time 0.4636 (0.4677) data time 0.0008 (0.0039) model time 0.4628 (0.4646) loss 3.5266 (3.1740) grad_norm 1.7129 (1.5501) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][170/625] eta 0:03:32 lr 0.001029 wd 0.0500 time 0.4682 (0.4675) data time 0.0008 (0.0037) model time 0.4675 (0.4645) loss 3.5514 (3.1897) grad_norm 1.4343 (1.5501) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][180/625] eta 0:03:27 lr 0.001029 wd 0.0500 time 0.4565 (0.4673) data time 0.0008 (0.0036) model time 0.4557 (0.4643) loss 3.7638 (3.1855) grad_norm 1.0963 (1.5422) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][190/625] eta 0:03:23 lr 0.001029 wd 0.0500 time 0.4654 (0.4671) data time 0.0011 (0.0034) model time 0.4644 (0.4642) loss 3.2577 (3.1912) grad_norm 1.6560 (1.5775) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [89/300][200/625] eta 0:03:18 lr 0.001029 wd 0.0500 time 0.4610 (0.4670) data time 0.0010 (0.0033) model time 0.4600 (0.4641) loss 3.5383 (3.1933) grad_norm 1.2815 (1.5760) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][210/625] eta 0:03:14 lr 0.001029 wd 0.0500 time 0.4687 (0.4686) data time 0.0008 (0.0032) model time 0.4679 (0.4664) loss 3.8318 (3.2045) grad_norm 1.3353 (1.5610) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][220/625] eta 0:03:09 lr 0.001029 wd 0.0500 time 0.4683 (0.4685) data time 0.0010 (0.0031) model time 0.4672 (0.4663) loss 3.1556 (3.2057) grad_norm 0.8755 (1.5408) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][230/625] eta 0:03:04 lr 0.001029 wd 0.0500 time 0.4681 (0.4683) data time 0.0011 (0.0030) model time 0.4671 (0.4661) loss 2.0125 (3.1901) grad_norm 1.3970 (1.5273) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][240/625] eta 0:03:00 lr 0.001029 wd 0.0500 time 0.4625 (0.4681) data time 0.0010 (0.0030) model time 0.4615 (0.4660) loss 3.3432 (3.1975) grad_norm 1.7494 (1.5315) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][250/625] eta 0:02:55 lr 0.001029 wd 0.0500 time 0.4593 (0.4679) data time 0.0012 (0.0029) model time 0.4582 (0.4658) loss 3.3941 (3.1953) grad_norm 6.1848 (1.5559) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:53:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][260/625] eta 0:02:50 lr 0.001029 wd 0.0500 time 0.4679 (0.4678) data time 0.0008 (0.0028) model time 0.4672 (0.4657) loss 4.0124 (3.2002) grad_norm 1.6772 (1.5691) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:54:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][270/625] eta 0:02:46 lr 0.001029 wd 0.0500 time 0.4693 (0.4678) data time 0.0010 (0.0027) model time 0.4684 (0.4657) loss 3.5373 (3.2010) grad_norm 2.0614 (1.5651) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][280/625] eta 0:02:41 lr 0.001029 wd 0.0500 time 0.4638 (0.4677) data time 0.0010 (0.0027) model time 0.4628 (0.4656) loss 2.1488 (3.1942) grad_norm 1.1604 (1.5631) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][290/625] eta 0:02:36 lr 0.001029 wd 0.0500 time 0.4625 (0.4675) data time 0.0010 (0.0026) model time 0.4614 (0.4654) loss 3.0284 (3.2022) grad_norm 1.2013 (1.5568) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][300/625] eta 0:02:31 lr 0.001028 wd 0.0500 time 0.4679 (0.4673) data time 0.0010 (0.0026) model time 0.4669 (0.4652) loss 3.2506 (3.2051) grad_norm 1.3957 (1.5502) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][310/625] eta 0:02:27 lr 0.001028 wd 0.0500 time 0.4613 (0.4671) data time 0.0008 (0.0025) model time 0.4604 (0.4651) loss 2.4781 (3.2025) grad_norm 1.2253 (1.5405) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][320/625] eta 0:02:22 lr 0.001028 wd 0.0500 time 0.4665 (0.4675) data time 0.0007 (0.0025) model time 0.4658 (0.4656) loss 3.3483 (3.1976) grad_norm 1.4568 (1.5442) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][330/625] eta 0:02:17 lr 0.001028 wd 0.0500 time 0.4618 (0.4674) data time 0.0008 (0.0024) model time 
0.4610 (0.4655) loss 1.9874 (3.1943) grad_norm 1.4128 (1.5382) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][340/625] eta 0:02:13 lr 0.001028 wd 0.0500 time 0.4697 (0.4673) data time 0.0008 (0.0024) model time 0.4689 (0.4654) loss 3.0284 (3.1978) grad_norm 1.8578 (1.5361) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][350/625] eta 0:02:08 lr 0.001028 wd 0.0500 time 0.7053 (0.4680) data time 0.0010 (0.0024) model time 0.7043 (0.4662) loss 3.6807 (3.1905) grad_norm 1.2856 (1.5355) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][360/625] eta 0:02:04 lr 0.001028 wd 0.0500 time 0.4669 (0.4684) data time 0.0010 (0.0023) model time 0.4659 (0.4668) loss 2.6564 (3.1925) grad_norm 1.7954 (1.5400) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][370/625] eta 0:01:59 lr 0.001028 wd 0.0500 time 0.4625 (0.4683) data time 0.0011 (0.0023) model time 0.4614 (0.4666) loss 2.9063 (3.1939) grad_norm 1.2154 (1.5379) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][380/625] eta 0:01:54 lr 0.001028 wd 0.0500 time 0.4653 (0.4682) data time 0.0008 (0.0023) model time 0.4645 (0.4665) loss 2.8003 (3.1955) grad_norm 1.3494 (1.5367) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:54:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][390/625] eta 0:01:50 lr 0.001028 wd 0.0500 time 0.4629 (0.4681) data time 0.0010 (0.0022) model time 0.4619 (0.4665) loss 3.2618 (3.1861) grad_norm 1.0888 (1.5318) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][400/625] eta 0:01:45 
lr 0.001028 wd 0.0500 time 0.4656 (0.4680) data time 0.0010 (0.0022) model time 0.4646 (0.4663) loss 3.1971 (3.1815) grad_norm 1.5435 (1.5236) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][410/625] eta 0:01:40 lr 0.001028 wd 0.0500 time 0.4763 (0.4679) data time 0.0008 (0.0022) model time 0.4755 (0.4663) loss 3.5645 (3.1856) grad_norm 1.6052 (1.5225) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][420/625] eta 0:01:35 lr 0.001028 wd 0.0500 time 0.4625 (0.4679) data time 0.0008 (0.0021) model time 0.4617 (0.4662) loss 3.5317 (3.1868) grad_norm 1.8821 (1.5263) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][430/625] eta 0:01:31 lr 0.001027 wd 0.0500 time 0.4683 (0.4679) data time 0.0008 (0.0021) model time 0.4676 (0.4662) loss 3.2603 (3.1877) grad_norm 3.2077 (1.5424) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][440/625] eta 0:01:26 lr 0.001027 wd 0.0500 time 0.4645 (0.4678) data time 0.0010 (0.0021) model time 0.4635 (0.4661) loss 3.5544 (3.1839) grad_norm 1.4257 (1.5415) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][450/625] eta 0:01:21 lr 0.001027 wd 0.0500 time 0.4625 (0.4677) data time 0.0010 (0.0021) model time 0.4615 (0.4660) loss 2.5987 (3.1827) grad_norm 1.2915 (1.5348) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][460/625] eta 0:01:17 lr 0.001027 wd 0.0500 time 0.4594 (0.4676) data time 0.0010 (0.0021) model time 0.4584 (0.4659) loss 3.2434 (3.1828) grad_norm 1.4913 (1.5327) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:34 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][470/625] eta 0:01:12 lr 0.001027 wd 0.0500 time 0.4619 (0.4679) data time 0.0008 (0.0020) model time 0.4611 (0.4663) loss 2.1998 (3.1767) grad_norm 2.8387 (1.5334) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][480/625] eta 0:01:07 lr 0.001027 wd 0.0500 time 0.4588 (0.4677) data time 0.0010 (0.0020) model time 0.4578 (0.4661) loss 2.3621 (3.1773) grad_norm 1.5438 (1.5357) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][490/625] eta 0:01:03 lr 0.001027 wd 0.0500 time 0.4681 (0.4677) data time 0.0010 (0.0020) model time 0.4671 (0.4661) loss 3.0850 (3.1788) grad_norm 1.3570 (1.5306) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][500/625] eta 0:00:58 lr 0.001027 wd 0.0500 time 0.4650 (0.4677) data time 0.0008 (0.0020) model time 0.4642 (0.4661) loss 2.2780 (3.1796) grad_norm 1.3166 (1.5326) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][510/625] eta 0:00:53 lr 0.001027 wd 0.0500 time 0.4622 (0.4676) data time 0.0007 (0.0020) model time 0.4614 (0.4660) loss 2.2479 (3.1720) grad_norm 1.4228 (1.5299) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:55:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][520/625] eta 0:00:49 lr 0.001027 wd 0.0500 time 0.4604 (0.4675) data time 0.0008 (0.0019) model time 0.4595 (0.4660) loss 3.0315 (3.1698) grad_norm 1.1148 (1.5273) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][530/625] eta 0:00:44 lr 0.001027 wd 0.0500 time 0.4644 (0.4675) data time 0.0007 (0.0019) model time 0.4637 (0.4659) loss 2.6379 (3.1723) grad_norm 
1.5063 (1.5241) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][540/625] eta 0:00:39 lr 0.001027 wd 0.0500 time 0.4627 (0.4674) data time 0.0007 (0.0019) model time 0.4620 (0.4658) loss 2.9996 (3.1722) grad_norm 1.3294 (1.5226) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][550/625] eta 0:00:35 lr 0.001027 wd 0.0500 time 0.4614 (0.4673) data time 0.0008 (0.0019) model time 0.4606 (0.4658) loss 3.4826 (3.1731) grad_norm 1.1813 (1.5210) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][560/625] eta 0:00:30 lr 0.001027 wd 0.0500 time 0.4657 (0.4673) data time 0.0008 (0.0019) model time 0.4649 (0.4657) loss 3.8104 (3.1755) grad_norm 2.0466 (1.5178) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][570/625] eta 0:00:25 lr 0.001026 wd 0.0500 time 0.4690 (0.4672) data time 0.0011 (0.0019) model time 0.4680 (0.4657) loss 3.1445 (3.1731) grad_norm 1.2579 (1.5160) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][580/625] eta 0:00:21 lr 0.001026 wd 0.0500 time 0.4608 (0.4672) data time 0.0008 (0.0018) model time 0.4600 (0.4656) loss 3.8291 (3.1780) grad_norm 1.3305 (1.5143) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][590/625] eta 0:00:16 lr 0.001026 wd 0.0500 time 0.4596 (0.4671) data time 0.0012 (0.0018) model time 0.4584 (0.4656) loss 2.5599 (3.1743) grad_norm 1.0789 (1.5104) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][600/625] eta 0:00:11 lr 0.001026 wd 0.0500 time 0.4636 (0.4671) data 
time 0.0011 (0.0018) model time 0.4625 (0.4656) loss 2.0572 (3.1753) grad_norm 1.9799 (1.5128) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][610/625] eta 0:00:07 lr 0.001026 wd 0.0500 time 0.4605 (0.4670) data time 0.0005 (0.0018) model time 0.4600 (0.4655) loss 3.8152 (3.1790) grad_norm 1.7928 (1.5133) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [89/300][620/625] eta 0:00:02 lr 0.001026 wd 0.0500 time 0.4608 (0.4669) data time 0.0007 (0.0018) model time 0.4601 (0.4654) loss 3.6531 (3.1795) grad_norm 1.6692 (1.5118) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 89 training takes 0:04:51 [2024-08-10 08:56:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 08:56:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 08:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.5957 (0.5957) Acc@1 87.695 (87.695) Acc@5 98.047 (98.047) Mem 16715MB [2024-08-10 08:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9365 (0.7274) Acc@1 77.588 (83.940) Acc@5 94.434 (97.030) Mem 16715MB [2024-08-10 08:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.1270 (0.8624) Acc@1 72.070 (80.371) Acc@5 92.578 (95.426) Mem 16715MB [2024-08-10 08:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.172 Acc@5 95.405 [2024-08-10 08:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-08-10 08:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.17% [2024-08-10 08:56:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 08:56:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 08:56:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5112 (0.5112) Acc@1 88.428 (88.428) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 08:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.8252 (0.6394) Acc@1 80.176 (85.747) Acc@5 95.654 (97.483) Mem 16715MB [2024-08-10 08:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9512 (0.7591) Acc@1 75.928 (82.461) Acc@5 94.482 (96.226) Mem 16715MB [2024-08-10 08:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.182 Acc@5 96.247 [2024-08-10 08:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-10 08:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.18% [2024-08-10 08:56:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 08:56:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 08:56:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][0/625] eta 0:08:52 lr 0.001026 wd 0.0500 time 0.8514 (0.8514) data time 0.4389 (0.4389) model time 0.0000 (0.0000) loss 3.6176 (3.6176) grad_norm 1.3857 (1.3857) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][10/625] eta 0:05:08 lr 0.001026 wd 0.0500 time 0.4623 (0.5023) data time 0.0008 (0.0408) model time 0.0000 (0.0000) loss 3.5129 (3.1929) grad_norm 1.7682 (1.6728) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][20/625] eta 0:04:53 lr 0.001026 wd 0.0500 time 0.4608 (0.4849) data time 0.0008 (0.0219) model time 0.0000 (0.0000) loss 2.7509 (3.2370) grad_norm 1.4884 (1.6026) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][30/625] eta 0:04:52 lr 0.001026 wd 0.0500 time 0.4602 (0.4922) data time 0.0010 (0.0151) model time 0.0000 (0.0000) loss 3.4957 (3.2641) grad_norm 1.7650 (1.5505) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][40/625] eta 0:04:44 lr 0.001026 wd 0.0500 time 0.4608 (0.4858) data time 0.0010 (0.0117) model time 0.0000 (0.0000) loss 2.3975 (3.1512) grad_norm 1.5219 (1.6171) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][50/625] eta 0:04:37 lr 0.001026 wd 0.0500 time 0.4662 (0.4818) data time 0.0011 (0.0096) model time 0.0000 (0.0000) loss 2.9590 (3.0790) grad_norm 1.1547 (1.6038) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][60/625] eta 0:04:30 lr 0.001026 wd 0.0500 time 0.4591 (0.4788) data time 0.0010 (0.0082) model time 0.4581 (0.4622) loss 3.5503 (3.0889) 
grad_norm 1.5738 (1.5577) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][70/625] eta 0:04:24 lr 0.001025 wd 0.0500 time 0.4595 (0.4769) data time 0.0011 (0.0073) model time 0.4584 (0.4629) loss 2.8178 (3.0945) grad_norm 1.4169 (1.6317) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][80/625] eta 0:04:19 lr 0.001025 wd 0.0500 time 0.4612 (0.4756) data time 0.0008 (0.0066) model time 0.4604 (0.4636) loss 3.3312 (3.1025) grad_norm 1.1820 (1.7323) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][90/625] eta 0:04:13 lr 0.001025 wd 0.0500 time 0.4578 (0.4745) data time 0.0010 (0.0060) model time 0.4568 (0.4637) loss 2.9499 (3.1061) grad_norm 1.6191 (1.6939) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][100/625] eta 0:04:08 lr 0.001025 wd 0.0500 time 0.4612 (0.4732) data time 0.0010 (0.0055) model time 0.4602 (0.4630) loss 2.3115 (3.1028) grad_norm 1.3394 (1.6604) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][110/625] eta 0:04:03 lr 0.001025 wd 0.0500 time 0.4637 (0.4721) data time 0.0008 (0.0051) model time 0.4629 (0.4626) loss 3.4628 (3.1147) grad_norm 0.9195 (1.6474) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][120/625] eta 0:03:58 lr 0.001025 wd 0.0500 time 0.4630 (0.4717) data time 0.0008 (0.0048) model time 0.4622 (0.4630) loss 2.0156 (3.1079) grad_norm 1.4558 (1.6207) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][130/625] eta 0:03:53 lr 0.001025 wd 0.0500 time 0.4599 
(0.4709) data time 0.0011 (0.0045) model time 0.4588 (0.4627) loss 2.0277 (3.1132) grad_norm 1.6751 (1.6198) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][140/625] eta 0:03:48 lr 0.001025 wd 0.0500 time 0.4663 (0.4706) data time 0.0011 (0.0043) model time 0.4652 (0.4631) loss 2.9153 (3.1244) grad_norm 1.7358 (1.6336) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][150/625] eta 0:03:43 lr 0.001025 wd 0.0500 time 0.4667 (0.4703) data time 0.0011 (0.0041) model time 0.4656 (0.4631) loss 3.6480 (3.1184) grad_norm 0.9224 (1.6096) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][160/625] eta 0:03:38 lr 0.001025 wd 0.0500 time 0.4631 (0.4700) data time 0.0009 (0.0039) model time 0.4622 (0.4632) loss 2.7938 (3.1219) grad_norm 1.7094 (1.6195) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][170/625] eta 0:03:33 lr 0.001025 wd 0.0500 time 0.4636 (0.4696) data time 0.0010 (0.0037) model time 0.4626 (0.4632) loss 3.9655 (3.1377) grad_norm 1.5129 (1.6039) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][180/625] eta 0:03:28 lr 0.001025 wd 0.0500 time 0.4576 (0.4692) data time 0.0008 (0.0035) model time 0.4568 (0.4631) loss 3.9200 (3.1315) grad_norm 1.1325 (1.5900) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][190/625] eta 0:03:23 lr 0.001025 wd 0.0500 time 0.4633 (0.4688) data time 0.0008 (0.0034) model time 0.4625 (0.4629) loss 3.8581 (3.1540) grad_norm 1.4855 (1.5835) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [90/300][200/625] eta 0:03:19 lr 0.001025 wd 0.0500 time 0.4589 (0.4688) data time 0.0011 (0.0034) model time 0.4578 (0.4631) loss 2.7959 (3.1518) grad_norm 1.5208 (1.5835) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][210/625] eta 0:03:14 lr 0.001024 wd 0.0500 time 0.4636 (0.4685) data time 0.0010 (0.0033) model time 0.4627 (0.4629) loss 3.4415 (3.1490) grad_norm 1.4325 (1.5875) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][220/625] eta 0:03:09 lr 0.001024 wd 0.0500 time 0.4657 (0.4688) data time 0.0008 (0.0032) model time 0.4649 (0.4636) loss 4.0491 (3.1606) grad_norm 1.5865 (1.5789) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][230/625] eta 0:03:05 lr 0.001024 wd 0.0500 time 0.4717 (0.4687) data time 0.0010 (0.0031) model time 0.4706 (0.4637) loss 2.4137 (3.1653) grad_norm 1.3017 (1.5686) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][240/625] eta 0:03:00 lr 0.001024 wd 0.0500 time 0.4629 (0.4686) data time 0.0010 (0.0030) model time 0.4619 (0.4638) loss 3.2472 (3.1630) grad_norm 1.2827 (1.5651) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][250/625] eta 0:02:55 lr 0.001024 wd 0.0500 time 0.4635 (0.4692) data time 0.0008 (0.0029) model time 0.4627 (0.4647) loss 4.1755 (3.1682) grad_norm 1.0688 (1.5535) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:58:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][260/625] eta 0:02:51 lr 0.001024 wd 0.0500 time 0.4625 (0.4689) data time 0.0010 (0.0029) model time 0.4615 (0.4645) loss 3.0111 (3.1609) grad_norm 1.7337 (1.5537) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 08:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][270/625] eta 0:02:46 lr 0.001024 wd 0.0500 time 0.4602 (0.4687) data time 0.0011 (0.0028) model time 0.4592 (0.4644) loss 3.5179 (3.1692) grad_norm 1.2711 (1.5529) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][280/625] eta 0:02:41 lr 0.001024 wd 0.0500 time 0.4665 (0.4686) data time 0.0010 (0.0028) model time 0.4654 (0.4644) loss 3.4696 (3.1670) grad_norm 1.1176 (1.5516) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][290/625] eta 0:02:36 lr 0.001024 wd 0.0500 time 0.4596 (0.4685) data time 0.0010 (0.0027) model time 0.4587 (0.4644) loss 3.1763 (3.1605) grad_norm 1.0705 (1.5536) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][300/625] eta 0:02:32 lr 0.001024 wd 0.0500 time 0.4645 (0.4684) data time 0.0008 (0.0026) model time 0.4637 (0.4645) loss 2.3889 (3.1546) grad_norm 1.1988 (1.5520) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][310/625] eta 0:02:27 lr 0.001024 wd 0.0500 time 0.4683 (0.4684) data time 0.0010 (0.0026) model time 0.4673 (0.4645) loss 3.3155 (3.1583) grad_norm 2.5516 (1.5577) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][320/625] eta 0:02:22 lr 0.001024 wd 0.0500 time 0.4633 (0.4682) data time 0.0008 (0.0025) model time 0.4625 (0.4644) loss 2.5540 (3.1508) grad_norm 1.1087 (1.5593) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][330/625] eta 0:02:18 lr 0.001024 wd 0.0500 time 0.4588 (0.4680) data time 0.0008 (0.0025) model time 
0.4579 (0.4643) loss 2.3992 (3.1487) grad_norm 1.3383 (1.5551) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][340/625] eta 0:02:13 lr 0.001023 wd 0.0500 time 0.4646 (0.4680) data time 0.0008 (0.0025) model time 0.4637 (0.4643) loss 3.2188 (3.1520) grad_norm 1.2746 (1.5537) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][350/625] eta 0:02:08 lr 0.001023 wd 0.0500 time 0.4627 (0.4678) data time 0.0008 (0.0024) model time 0.4619 (0.4642) loss 3.4096 (3.1473) grad_norm 1.7620 (1.5573) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][360/625] eta 0:02:03 lr 0.001023 wd 0.0500 time 0.4629 (0.4677) data time 0.0010 (0.0024) model time 0.4619 (0.4642) loss 3.8398 (3.1460) grad_norm 1.5240 (1.5574) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][370/625] eta 0:01:59 lr 0.001023 wd 0.0500 time 0.4817 (0.4690) data time 0.0010 (0.0023) model time 0.4807 (0.4658) loss 2.1999 (3.1379) grad_norm 1.3638 (1.5487) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 08:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][380/625] eta 0:01:54 lr 0.001023 wd 0.0500 time 0.4627 (0.4689) data time 0.0010 (0.0023) model time 0.4617 (0.4657) loss 3.4876 (3.1360) grad_norm 1.1168 (1.5456) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][390/625] eta 0:01:50 lr 0.001023 wd 0.0500 time 0.4628 (0.4688) data time 0.0010 (0.0023) model time 0.4619 (0.4657) loss 3.3843 (3.1361) grad_norm 1.1749 (1.5385) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][400/625] eta 0:01:45 
lr 0.001023 wd 0.0500 time 0.4580 (0.4691) data time 0.0009 (0.0023) model time 0.4571 (0.4660) loss 2.2362 (3.1398) grad_norm 2.2681 (1.5363) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][410/625] eta 0:01:40 lr 0.001023 wd 0.0500 time 0.4628 (0.4689) data time 0.0008 (0.0022) model time 0.4620 (0.4658) loss 3.7535 (3.1451) grad_norm 1.1192 (1.5323) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][420/625] eta 0:01:36 lr 0.001023 wd 0.0500 time 0.4591 (0.4687) data time 0.0010 (0.0022) model time 0.4581 (0.4658) loss 3.0233 (3.1510) grad_norm 1.7274 (1.5300) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][430/625] eta 0:01:31 lr 0.001023 wd 0.0500 time 0.4596 (0.4686) data time 0.0012 (0.0022) model time 0.4585 (0.4657) loss 3.3839 (3.1473) grad_norm 1.1291 (1.5447) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][440/625] eta 0:01:26 lr 0.001023 wd 0.0500 time 0.4610 (0.4685) data time 0.0008 (0.0022) model time 0.4603 (0.4656) loss 3.6782 (3.1474) grad_norm 1.6795 (1.5493) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][450/625] eta 0:01:21 lr 0.001023 wd 0.0500 time 0.4651 (0.4684) data time 0.0011 (0.0021) model time 0.4640 (0.4655) loss 3.4740 (3.1484) grad_norm 1.2944 (1.5500) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][460/625] eta 0:01:17 lr 0.001023 wd 0.0500 time 0.4560 (0.4683) data time 0.0011 (0.0021) model time 0.4549 (0.4654) loss 3.6787 (3.1451) grad_norm 1.1268 (1.5460) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:37 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][470/625] eta 0:01:12 lr 0.001022 wd 0.0500 time 0.4567 (0.4682) data time 0.0009 (0.0021) model time 0.4559 (0.4654) loss 3.5385 (3.1462) grad_norm 1.2509 (1.5449) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][480/625] eta 0:01:07 lr 0.001022 wd 0.0500 time 0.4489 (0.4681) data time 0.0011 (0.0021) model time 0.4478 (0.4653) loss 2.9180 (3.1490) grad_norm inf (inf) loss_scale 1024.0000 (2045.8711) mem 16715MB [2024-08-10 09:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][490/625] eta 0:01:03 lr 0.001022 wd 0.0500 time 0.4610 (0.4681) data time 0.0010 (0.0021) model time 0.4600 (0.4653) loss 3.0099 (3.1467) grad_norm 1.1249 (inf) loss_scale 1024.0000 (2025.0591) mem 16715MB [2024-08-10 09:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][500/625] eta 0:00:58 lr 0.001022 wd 0.0500 time 0.4613 (0.4680) data time 0.0008 (0.0021) model time 0.4605 (0.4653) loss 3.5352 (3.1490) grad_norm 1.5133 (inf) loss_scale 1024.0000 (2005.0778) mem 16715MB [2024-08-10 09:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][510/625] eta 0:00:53 lr 0.001022 wd 0.0500 time 0.4668 (0.4681) data time 0.0008 (0.0020) model time 0.4661 (0.4653) loss 3.9041 (3.1512) grad_norm 1.7395 (inf) loss_scale 1024.0000 (1985.8787) mem 16715MB [2024-08-10 09:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][520/625] eta 0:00:49 lr 0.001022 wd 0.0500 time 0.4721 (0.4680) data time 0.0010 (0.0020) model time 0.4710 (0.4653) loss 3.0770 (3.1475) grad_norm 1.2530 (inf) loss_scale 1024.0000 (1967.4165) mem 16715MB [2024-08-10 09:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][530/625] eta 0:00:44 lr 0.001022 wd 0.0500 time 0.4708 (0.4681) data time 0.0008 (0.0020) model time 0.4699 (0.4654) loss 3.5926 (3.1540) grad_norm 3.0958 (inf) 
loss_scale 1024.0000 (1949.6497) mem 16715MB [2024-08-10 09:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][540/625] eta 0:00:39 lr 0.001022 wd 0.0500 time 0.4610 (0.4680) data time 0.0008 (0.0020) model time 0.4602 (0.4654) loss 2.5117 (3.1502) grad_norm 1.9940 (inf) loss_scale 1024.0000 (1932.5397) mem 16715MB [2024-08-10 09:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][550/625] eta 0:00:35 lr 0.001022 wd 0.0500 time 0.4557 (0.4679) data time 0.0009 (0.0020) model time 0.4547 (0.4653) loss 3.4397 (3.1510) grad_norm 1.4524 (inf) loss_scale 1024.0000 (1916.0508) mem 16715MB [2024-08-10 09:01:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][560/625] eta 0:00:30 lr 0.001022 wd 0.0500 time 0.4560 (0.4685) data time 0.0008 (0.0020) model time 0.4552 (0.4659) loss 3.9021 (3.1475) grad_norm 1.7220 (inf) loss_scale 1024.0000 (1900.1497) mem 16715MB [2024-08-10 09:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][570/625] eta 0:00:25 lr 0.001022 wd 0.0500 time 0.4653 (0.4685) data time 0.0008 (0.0020) model time 0.4645 (0.4659) loss 2.7854 (3.1458) grad_norm 1.4129 (inf) loss_scale 1024.0000 (1884.8056) mem 16715MB [2024-08-10 09:01:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][580/625] eta 0:00:21 lr 0.001022 wd 0.0500 time 0.4753 (0.4684) data time 0.0010 (0.0020) model time 0.4743 (0.4659) loss 3.0763 (3.1484) grad_norm 1.3161 (inf) loss_scale 1024.0000 (1869.9897) mem 16715MB [2024-08-10 09:01:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][590/625] eta 0:00:16 lr 0.001022 wd 0.0500 time 0.4592 (0.4683) data time 0.0010 (0.0019) model time 0.4582 (0.4658) loss 3.3430 (3.1542) grad_norm 1.3284 (inf) loss_scale 1024.0000 (1855.6751) mem 16715MB [2024-08-10 09:01:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][600/625] eta 0:00:11 lr 0.001021 wd 0.0500 time 0.4675 (0.4683) data time 0.0008 (0.0019) model time 
0.4667 (0.4658) loss 3.0061 (3.1563) grad_norm 1.3677 (inf) loss_scale 1024.0000 (1841.8369) mem 16715MB [2024-08-10 09:01:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][610/625] eta 0:00:07 lr 0.001021 wd 0.0500 time 0.4626 (0.4682) data time 0.0007 (0.0019) model time 0.4619 (0.4658) loss 3.6218 (3.1571) grad_norm 1.2024 (inf) loss_scale 1024.0000 (1828.4517) mem 16715MB [2024-08-10 09:01:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [90/300][620/625] eta 0:00:02 lr 0.001021 wd 0.0500 time 0.4571 (0.4681) data time 0.0005 (0.0019) model time 0.4566 (0.4656) loss 2.8834 (3.1573) grad_norm 1.4612 (inf) loss_scale 1024.0000 (1815.4976) mem 16715MB [2024-08-10 09:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 90 training takes 0:04:52 [2024-08-10 09:01:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:01:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5962 (0.5962) Acc@1 87.158 (87.158) Acc@5 98.438 (98.438) Mem 16715MB [2024-08-10 09:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.9404 (0.7221) Acc@1 78.271 (84.273) Acc@5 94.922 (97.013) Mem 16715MB [2024-08-10 09:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.141) Loss 1.1016 (0.8656) Acc@1 73.340 (80.673) Acc@5 93.359 (95.471) Mem 16715MB [2024-08-10 09:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.432 Acc@5 95.507 [2024-08-10 09:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-08-10 09:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.43% [2024-08-10 09:01:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 09:01:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 09:01:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.507 (0.507) Loss 0.5103 (0.5103) Acc@1 88.477 (88.477) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.8232 (0.6379) Acc@1 80.371 (85.809) Acc@5 95.801 (97.501) Mem 16715MB [2024-08-10 09:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9482 (0.7571) Acc@1 75.879 (82.566) Acc@5 94.678 (96.250) Mem 16715MB [2024-08-10 09:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.284 Acc@5 96.273 [2024-08-10 09:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 09:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.28% [2024-08-10 09:02:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:02:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 09:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][0/625] eta 0:08:51 lr 0.001021 wd 0.0500 time 0.8501 (0.8501) data time 0.4367 (0.4367) model time 0.0000 (0.0000) loss 3.4414 (3.4414) grad_norm 1.1580 (1.1580) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][10/625] eta 0:05:05 lr 0.001021 wd 0.0500 time 0.4645 (0.4970) data time 0.0010 (0.0406) model time 0.0000 (0.0000) loss 2.6155 (3.0024) grad_norm 1.3482 (1.3531) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][20/625] eta 0:04:51 lr 0.001021 wd 0.0500 time 0.4606 (0.4812) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 3.2985 (3.0595) grad_norm 1.2842 (1.4034) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][30/625] eta 0:04:42 lr 0.001021 wd 0.0500 time 0.4682 (0.4753) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 3.5523 (3.0654) grad_norm 1.1297 (1.4043) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][40/625] eta 0:04:36 lr 0.001021 wd 0.0500 time 0.4652 (0.4731) data time 0.0011 (0.0117) model time 0.0000 (0.0000) loss 2.3412 (3.0537) grad_norm 1.5460 (1.4497) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][50/625] eta 0:04:31 lr 0.001021 wd 0.0500 time 0.4601 (0.4713) data time 0.0010 (0.0096) model time 0.0000 (0.0000) loss 3.5183 (3.0492) grad_norm 1.1879 (1.4212) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][60/625] eta 0:04:25 lr 0.001021 wd 0.0500 time 0.4607 (0.4700) data time 0.0009 (0.0082) model time 0.4598 (0.4623) loss 2.4039 (3.0962) 
grad_norm 1.5615 (1.3951) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][70/625] eta 0:04:20 lr 0.001021 wd 0.0500 time 0.4630 (0.4691) data time 0.0010 (0.0072) model time 0.4620 (0.4624) loss 3.1517 (3.0908) grad_norm 1.3396 (1.3954) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][80/625] eta 0:04:15 lr 0.001021 wd 0.0500 time 0.4652 (0.4686) data time 0.0008 (0.0064) model time 0.4644 (0.4630) loss 3.7798 (3.1087) grad_norm 1.5846 (1.4173) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][90/625] eta 0:04:10 lr 0.001021 wd 0.0500 time 0.4596 (0.4682) data time 0.0012 (0.0059) model time 0.4585 (0.4630) loss 2.9993 (3.1288) grad_norm 1.2791 (1.4375) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][100/625] eta 0:04:05 lr 0.001021 wd 0.0500 time 0.4690 (0.4682) data time 0.0009 (0.0054) model time 0.4681 (0.4639) loss 3.4543 (3.1336) grad_norm 1.3216 (1.4340) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][110/625] eta 0:04:01 lr 0.001020 wd 0.0500 time 0.4639 (0.4681) data time 0.0010 (0.0050) model time 0.4629 (0.4643) loss 2.1412 (3.1398) grad_norm 1.1114 (1.4372) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:02:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][120/625] eta 0:03:56 lr 0.001020 wd 0.0500 time 0.4625 (0.4678) data time 0.0010 (0.0047) model time 0.4615 (0.4642) loss 3.4540 (3.1381) grad_norm 2.4468 (1.4513) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][130/625] eta 0:03:53 lr 0.001020 wd 0.0500 time 0.4597 
(0.4721) data time 0.0008 (0.0044) model time 0.4589 (0.4716) loss 2.3794 (3.1315) grad_norm 1.1513 (1.4295) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][140/625] eta 0:03:48 lr 0.001020 wd 0.0500 time 0.4667 (0.4715) data time 0.0011 (0.0042) model time 0.4656 (0.4706) loss 3.3035 (3.1513) grad_norm 1.3914 (1.4214) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][150/625] eta 0:03:43 lr 0.001020 wd 0.0500 time 0.4744 (0.4712) data time 0.0008 (0.0039) model time 0.4736 (0.4701) loss 3.1748 (3.1666) grad_norm 1.7410 (1.4460) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][160/625] eta 0:03:38 lr 0.001020 wd 0.0500 time 0.4666 (0.4708) data time 0.0008 (0.0038) model time 0.4657 (0.4694) loss 3.8348 (3.1758) grad_norm 1.2669 (1.4426) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][170/625] eta 0:03:34 lr 0.001020 wd 0.0500 time 0.4657 (0.4705) data time 0.0010 (0.0036) model time 0.4647 (0.4691) loss 2.6058 (3.1640) grad_norm 1.9311 (1.4485) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][180/625] eta 0:03:29 lr 0.001020 wd 0.0500 time 0.4627 (0.4702) data time 0.0008 (0.0035) model time 0.4619 (0.4687) loss 3.2888 (3.1621) grad_norm 1.0558 (1.4427) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][190/625] eta 0:03:24 lr 0.001020 wd 0.0500 time 0.4657 (0.4701) data time 0.0008 (0.0034) model time 0.4649 (0.4685) loss 3.1667 (3.1700) grad_norm 1.6863 (1.4386) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:36 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [91/300][200/625] eta 0:03:19 lr 0.001020 wd 0.0500 time 0.4695 (0.4698) data time 0.0009 (0.0032) model time 0.4687 (0.4682) loss 3.7362 (3.1695) grad_norm 1.5255 (1.4394) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][210/625] eta 0:03:14 lr 0.001020 wd 0.0500 time 0.4596 (0.4695) data time 0.0009 (0.0031) model time 0.4587 (0.4679) loss 3.2501 (3.1764) grad_norm 2.1685 (1.4505) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][220/625] eta 0:03:10 lr 0.001020 wd 0.0500 time 0.4632 (0.4702) data time 0.0009 (0.0030) model time 0.4624 (0.4688) loss 3.5068 (3.1721) grad_norm 1.9556 (1.4576) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][230/625] eta 0:03:05 lr 0.001020 wd 0.0500 time 0.4681 (0.4701) data time 0.0008 (0.0030) model time 0.4673 (0.4687) loss 3.0746 (3.1793) grad_norm 1.2973 (1.4542) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][240/625] eta 0:03:00 lr 0.001019 wd 0.0500 time 0.4647 (0.4700) data time 0.0009 (0.0029) model time 0.4638 (0.4685) loss 3.4095 (3.1890) grad_norm 1.6147 (1.4512) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][250/625] eta 0:02:56 lr 0.001019 wd 0.0500 time 0.4651 (0.4699) data time 0.0009 (0.0028) model time 0.4642 (0.4684) loss 3.0004 (3.1897) grad_norm 1.0538 (1.4524) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][260/625] eta 0:02:51 lr 0.001019 wd 0.0500 time 0.4644 (0.4697) data time 0.0008 (0.0027) model time 0.4636 (0.4683) loss 3.3391 (3.1867) grad_norm 1.3363 (1.4491) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 09:04:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][270/625] eta 0:02:46 lr 0.001019 wd 0.0500 time 0.4631 (0.4696) data time 0.0010 (0.0027) model time 0.4621 (0.4681) loss 2.7942 (3.1831) grad_norm 1.3310 (1.4988) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][280/625] eta 0:02:41 lr 0.001019 wd 0.0500 time 0.4614 (0.4694) data time 0.0010 (0.0026) model time 0.4604 (0.4679) loss 3.1781 (3.1808) grad_norm 1.1309 (1.5026) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][290/625] eta 0:02:37 lr 0.001019 wd 0.0500 time 0.4628 (0.4692) data time 0.0010 (0.0026) model time 0.4618 (0.4677) loss 3.2828 (3.1818) grad_norm 1.3979 (1.5022) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][300/625] eta 0:02:32 lr 0.001019 wd 0.0500 time 0.4592 (0.4690) data time 0.0007 (0.0025) model time 0.4585 (0.4675) loss 3.4142 (3.1890) grad_norm 1.5077 (1.5027) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][310/625] eta 0:02:27 lr 0.001019 wd 0.0500 time 0.4643 (0.4689) data time 0.0010 (0.0025) model time 0.4633 (0.4673) loss 3.0986 (3.1899) grad_norm 1.0791 (1.5017) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][320/625] eta 0:02:22 lr 0.001019 wd 0.0500 time 0.4663 (0.4688) data time 0.0010 (0.0024) model time 0.4653 (0.4672) loss 2.9382 (3.1978) grad_norm 1.2959 (1.4977) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][330/625] eta 0:02:18 lr 0.001019 wd 0.0500 time 0.4656 (0.4686) data time 0.0010 (0.0024) model time 
0.4646 (0.4671) loss 3.0347 (3.1964) grad_norm 1.3343 (1.4913) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][340/625] eta 0:02:13 lr 0.001019 wd 0.0500 time 0.4646 (0.4685) data time 0.0010 (0.0024) model time 0.4636 (0.4669) loss 3.3894 (3.1969) grad_norm 1.2943 (1.4901) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][350/625] eta 0:02:09 lr 0.001019 wd 0.0500 time 0.6503 (0.4701) data time 0.0008 (0.0023) model time 0.6496 (0.4688) loss 3.6580 (3.1945) grad_norm 1.2466 (1.4903) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][360/625] eta 0:02:04 lr 0.001019 wd 0.0500 time 0.6841 (0.4705) data time 0.0010 (0.0023) model time 0.6831 (0.4693) loss 3.1622 (3.1918) grad_norm 1.5305 (1.4956) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:04:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][370/625] eta 0:01:59 lr 0.001018 wd 0.0500 time 0.4617 (0.4702) data time 0.0009 (0.0023) model time 0.4608 (0.4689) loss 3.4705 (3.1907) grad_norm 1.3422 (1.4959) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][380/625] eta 0:01:55 lr 0.001018 wd 0.0500 time 0.4664 (0.4700) data time 0.0008 (0.0022) model time 0.4657 (0.4687) loss 1.9849 (3.1826) grad_norm 1.3398 (1.5015) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][390/625] eta 0:01:50 lr 0.001018 wd 0.0500 time 0.4661 (0.4699) data time 0.0008 (0.0022) model time 0.4653 (0.4686) loss 3.4699 (3.1803) grad_norm 1.4946 (1.5030) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][400/625] eta 0:01:45 
lr 0.001018 wd 0.0500 time 0.4647 (0.4698) data time 0.0010 (0.0022) model time 0.4637 (0.4685) loss 3.4108 (3.1818) grad_norm 1.2338 (1.5050) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][410/625] eta 0:01:40 lr 0.001018 wd 0.0500 time 0.4661 (0.4697) data time 0.0009 (0.0022) model time 0.4651 (0.4683) loss 3.3230 (3.1785) grad_norm 1.4013 (1.5035) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][420/625] eta 0:01:36 lr 0.001018 wd 0.0500 time 0.4689 (0.4696) data time 0.0010 (0.0021) model time 0.4679 (0.4682) loss 2.5428 (3.1797) grad_norm 2.3160 (1.5048) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][430/625] eta 0:01:31 lr 0.001018 wd 0.0500 time 0.4739 (0.4694) data time 0.0008 (0.0021) model time 0.4732 (0.4681) loss 3.7602 (3.1813) grad_norm 1.4259 (1.5045) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][440/625] eta 0:01:26 lr 0.001018 wd 0.0500 time 0.4626 (0.4693) data time 0.0008 (0.0021) model time 0.4618 (0.4679) loss 2.8526 (3.1847) grad_norm 1.5435 (1.5038) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][450/625] eta 0:01:22 lr 0.001018 wd 0.0500 time 0.4665 (0.4692) data time 0.0008 (0.0021) model time 0.4657 (0.4678) loss 3.4100 (3.1872) grad_norm 1.7588 (1.5058) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][460/625] eta 0:01:17 lr 0.001018 wd 0.0500 time 0.4710 (0.4691) data time 0.0010 (0.0020) model time 0.4700 (0.4677) loss 2.8820 (3.1880) grad_norm 2.4874 (1.5098) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:43 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][470/625] eta 0:01:12 lr 0.001018 wd 0.0500 time 0.4652 (0.4690) data time 0.0009 (0.0020) model time 0.4642 (0.4676) loss 3.6196 (3.1889) grad_norm 0.9880 (1.5063) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][480/625] eta 0:01:07 lr 0.001018 wd 0.0500 time 0.4617 (0.4689) data time 0.0008 (0.0020) model time 0.4610 (0.4676) loss 2.7457 (3.1893) grad_norm 1.5615 (1.5021) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][490/625] eta 0:01:03 lr 0.001018 wd 0.0500 time 0.4631 (0.4688) data time 0.0010 (0.0020) model time 0.4622 (0.4674) loss 2.5186 (3.1924) grad_norm 1.7291 (1.5010) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][500/625] eta 0:00:58 lr 0.001017 wd 0.0500 time 0.4616 (0.4699) data time 0.0008 (0.0020) model time 0.4609 (0.4687) loss 3.6216 (3.1891) grad_norm 2.3161 (1.5035) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][510/625] eta 0:00:54 lr 0.001017 wd 0.0500 time 0.4514 (0.4698) data time 0.0010 (0.0019) model time 0.4504 (0.4685) loss 2.9044 (3.1857) grad_norm 1.1468 (1.5025) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][520/625] eta 0:00:49 lr 0.001017 wd 0.0500 time 0.4634 (0.4696) data time 0.0008 (0.0019) model time 0.4626 (0.4683) loss 3.8407 (3.1876) grad_norm 1.6225 (1.5010) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][530/625] eta 0:00:44 lr 0.001017 wd 0.0500 time 0.4684 (0.4695) data time 0.0008 (0.0019) model time 0.4676 (0.4682) loss 3.5680 (3.1910) grad_norm 
1.2943 (1.5003) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][540/625] eta 0:00:39 lr 0.001017 wd 0.0500 time 0.4628 (0.4694) data time 0.0009 (0.0019) model time 0.4618 (0.4681) loss 3.2721 (3.1905) grad_norm 4.1117 (1.5063) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][550/625] eta 0:00:35 lr 0.001017 wd 0.0500 time 0.4682 (0.4694) data time 0.0010 (0.0019) model time 0.4673 (0.4681) loss 3.1930 (3.1918) grad_norm 1.7543 (1.5118) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][560/625] eta 0:00:30 lr 0.001017 wd 0.0500 time 0.4630 (0.4693) data time 0.0008 (0.0019) model time 0.4622 (0.4680) loss 3.2237 (3.1921) grad_norm 2.4349 (1.5087) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][570/625] eta 0:00:25 lr 0.001017 wd 0.0500 time 0.4720 (0.4692) data time 0.0007 (0.0019) model time 0.4713 (0.4679) loss 3.4506 (3.1881) grad_norm 1.6319 (1.5100) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][580/625] eta 0:00:21 lr 0.001017 wd 0.0500 time 0.4644 (0.4691) data time 0.0010 (0.0018) model time 0.4634 (0.4678) loss 3.7813 (3.1893) grad_norm 1.6198 (1.5139) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][590/625] eta 0:00:16 lr 0.001017 wd 0.0500 time 0.4593 (0.4690) data time 0.0010 (0.0018) model time 0.4583 (0.4677) loss 3.1518 (3.1850) grad_norm 1.2661 (1.5144) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][600/625] eta 0:00:11 lr 0.001017 wd 0.0500 time 0.4651 (0.4689) data 
time 0.0010 (0.0018) model time 0.4641 (0.4676) loss 2.4777 (3.1836) grad_norm 1.2026 (1.5106) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][610/625] eta 0:00:07 lr 0.001017 wd 0.0500 time 0.4663 (0.4689) data time 0.0005 (0.0018) model time 0.4657 (0.4676) loss 4.0232 (3.1879) grad_norm 2.1603 (1.5122) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [91/300][620/625] eta 0:00:02 lr 0.001017 wd 0.0500 time 0.4631 (0.4688) data time 0.0005 (0.0018) model time 0.4626 (0.4675) loss 3.1909 (3.1870) grad_norm 1.6446 (1.5090) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 91 training takes 0:04:52 [2024-08-10 09:06:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:06:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:06:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.5854 (0.5854) Acc@1 87.451 (87.451) Acc@5 97.949 (97.949) Mem 16715MB [2024-08-10 09:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.9771 (0.7330) Acc@1 77.881 (83.949) Acc@5 94.775 (96.813) Mem 16715MB [2024-08-10 09:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0605 (0.8635) Acc@1 74.170 (80.459) Acc@5 93.457 (95.419) Mem 16715MB [2024-08-10 09:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.218 Acc@5 95.487 [2024-08-10 09:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-08-10 09:07:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.893 (0.893) Loss 0.5093 (0.5093) Acc@1 88.428 (88.428) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.199) Loss 0.8232 (0.6371) Acc@1 80.420 (85.809) Acc@5 95.850 (97.505) Mem 16715MB [2024-08-10 09:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 0.9434 (0.7555) Acc@1 75.781 (82.550) Acc@5 94.775 (96.263) Mem 16715MB [2024-08-10 09:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.278 Acc@5 96.289 [2024-08-10 09:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 09:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][0/625] eta 0:14:02 lr 0.001016 wd 0.0500 time 1.3479 (1.3479) data time 0.6884 (0.6884) model time 0.0000 (0.0000) loss 3.8445 (3.8445) grad_norm 1.5989 (1.5989) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][10/625] eta 0:05:33 lr 0.001016 wd 0.0500 time 0.4596 (0.5425) data 
time 0.0008 (0.0635) model time 0.0000 (0.0000) loss 3.5577 (3.4801) grad_norm 1.1914 (1.5190) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][20/625] eta 0:05:10 lr 0.001016 wd 0.0500 time 0.4626 (0.5140) data time 0.0011 (0.0338) model time 0.0000 (0.0000) loss 2.9179 (3.2076) grad_norm 1.3081 (1.5040) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][30/625] eta 0:04:56 lr 0.001016 wd 0.0500 time 0.4580 (0.4976) data time 0.0009 (0.0233) model time 0.0000 (0.0000) loss 3.9663 (3.1594) grad_norm 1.2235 (1.5196) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][40/625] eta 0:04:46 lr 0.001016 wd 0.0500 time 0.4639 (0.4893) data time 0.0009 (0.0179) model time 0.0000 (0.0000) loss 3.3511 (3.2487) grad_norm 1.2387 (1.4667) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][50/625] eta 0:04:38 lr 0.001016 wd 0.0500 time 0.4654 (0.4849) data time 0.0011 (0.0146) model time 0.0000 (0.0000) loss 2.3210 (3.1690) grad_norm 1.4981 (1.4455) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][60/625] eta 0:04:32 lr 0.001016 wd 0.0500 time 0.4700 (0.4821) data time 0.0008 (0.0123) model time 0.4691 (0.4666) loss 3.7489 (3.2178) grad_norm 1.5757 (1.5701) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][70/625] eta 0:04:26 lr 0.001016 wd 0.0500 time 0.4670 (0.4799) data time 0.0008 (0.0107) model time 0.4663 (0.4662) loss 3.1392 (3.1908) grad_norm 1.6990 (1.5839) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[92/300][80/625] eta 0:04:20 lr 0.001016 wd 0.0500 time 0.4640 (0.4782) data time 0.0010 (0.0096) model time 0.4630 (0.4656) loss 2.8593 (3.1463) grad_norm 1.1293 (1.5667) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][90/625] eta 0:04:18 lr 0.001016 wd 0.0500 time 0.6415 (0.4835) data time 0.0008 (0.0086) model time 0.6407 (0.4807) loss 3.4081 (3.1651) grad_norm 2.0930 (1.5599) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][100/625] eta 0:04:12 lr 0.001016 wd 0.0500 time 0.4604 (0.4815) data time 0.0008 (0.0079) model time 0.4596 (0.4769) loss 3.1008 (3.1784) grad_norm 1.5028 (1.5413) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][110/625] eta 0:04:07 lr 0.001016 wd 0.0500 time 0.4705 (0.4800) data time 0.0011 (0.0073) model time 0.4694 (0.4747) loss 3.4354 (3.1768) grad_norm 1.1987 (1.5287) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][120/625] eta 0:04:01 lr 0.001016 wd 0.0500 time 0.4601 (0.4787) data time 0.0009 (0.0067) model time 0.4592 (0.4732) loss 2.2257 (3.1435) grad_norm 1.3006 (1.5359) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][130/625] eta 0:03:56 lr 0.001015 wd 0.0500 time 0.4641 (0.4778) data time 0.0008 (0.0063) model time 0.4632 (0.4722) loss 2.1052 (3.1394) grad_norm 1.4580 (1.5328) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][140/625] eta 0:03:51 lr 0.001015 wd 0.0500 time 0.4642 (0.4768) data time 0.0011 (0.0059) model time 0.4631 (0.4711) loss 2.3349 (3.1354) grad_norm 1.5895 (1.5316) loss_scale 1024.0000 (1024.0000) mem 16715MB 
[2024-08-10 09:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][150/625] eta 0:03:46 lr 0.001015 wd 0.0500 time 0.4678 (0.4759) data time 0.0008 (0.0056) model time 0.4670 (0.4702) loss 2.1725 (3.1363) grad_norm 1.4855 (1.5186) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][160/625] eta 0:03:40 lr 0.001015 wd 0.0500 time 0.4592 (0.4750) data time 0.0011 (0.0053) model time 0.4581 (0.4694) loss 3.2674 (3.1420) grad_norm 1.4120 (1.5290) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][170/625] eta 0:03:35 lr 0.001015 wd 0.0500 time 0.4607 (0.4743) data time 0.0008 (0.0051) model time 0.4598 (0.4687) loss 3.7082 (3.1384) grad_norm 3.5592 (1.5584) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][180/625] eta 0:03:30 lr 0.001015 wd 0.0500 time 0.4595 (0.4736) data time 0.0010 (0.0048) model time 0.4584 (0.4680) loss 3.2243 (3.1480) grad_norm 1.2102 (1.5787) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][190/625] eta 0:03:25 lr 0.001015 wd 0.0500 time 0.4666 (0.4731) data time 0.0010 (0.0047) model time 0.4656 (0.4677) loss 2.9260 (3.1304) grad_norm 1.8583 (1.5832) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][200/625] eta 0:03:20 lr 0.001015 wd 0.0500 time 0.4628 (0.4726) data time 0.0008 (0.0045) model time 0.4620 (0.4674) loss 2.8845 (3.1285) grad_norm 1.1548 (1.5722) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][210/625] eta 0:03:15 lr 0.001015 wd 0.0500 time 0.4657 (0.4722) data time 0.0010 (0.0043) model time 0.4647 (0.4671) loss 3.6341 
(3.1318) grad_norm 1.5618 (1.5649) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][220/625] eta 0:03:11 lr 0.001015 wd 0.0500 time 0.4711 (0.4718) data time 0.0008 (0.0042) model time 0.4704 (0.4668) loss 1.9647 (3.1300) grad_norm 1.8930 (1.5687) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][230/625] eta 0:03:06 lr 0.001015 wd 0.0500 time 0.4628 (0.4714) data time 0.0008 (0.0040) model time 0.4620 (0.4665) loss 2.3146 (3.1263) grad_norm 1.4804 (1.5667) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][240/625] eta 0:03:01 lr 0.001015 wd 0.0500 time 0.4609 (0.4717) data time 0.0011 (0.0039) model time 0.4598 (0.4671) loss 2.1974 (3.1319) grad_norm 1.7890 (1.5653) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][250/625] eta 0:02:56 lr 0.001015 wd 0.0500 time 0.4577 (0.4713) data time 0.0010 (0.0038) model time 0.4567 (0.4668) loss 3.1545 (3.1270) grad_norm 1.9413 (1.5961) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][260/625] eta 0:02:51 lr 0.001014 wd 0.0500 time 0.4654 (0.4710) data time 0.0010 (0.0037) model time 0.4644 (0.4665) loss 3.7317 (3.1277) grad_norm 1.1068 (1.6001) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][270/625] eta 0:02:47 lr 0.001014 wd 0.0500 time 0.4699 (0.4707) data time 0.0009 (0.0036) model time 0.4690 (0.4663) loss 3.0071 (3.1266) grad_norm 1.2734 (1.5998) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][280/625] eta 0:02:42 lr 0.001014 wd 0.0500 time 
0.4627 (0.4705) data time 0.0009 (0.0035) model time 0.4618 (0.4662) loss 3.7078 (3.1266) grad_norm 1.6493 (1.5860) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][290/625] eta 0:02:37 lr 0.001014 wd 0.0500 time 0.4611 (0.4703) data time 0.0008 (0.0034) model time 0.4603 (0.4661) loss 3.5081 (3.1364) grad_norm 1.2523 (1.5744) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][300/625] eta 0:02:32 lr 0.001014 wd 0.0500 time 0.4669 (0.4702) data time 0.0012 (0.0034) model time 0.4657 (0.4660) loss 2.6318 (3.1420) grad_norm 1.2259 (1.5703) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][310/625] eta 0:02:28 lr 0.001014 wd 0.0500 time 0.4607 (0.4700) data time 0.0010 (0.0033) model time 0.4597 (0.4660) loss 3.3937 (3.1378) grad_norm 1.2702 (1.5673) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][320/625] eta 0:02:23 lr 0.001014 wd 0.0500 time 0.4598 (0.4698) data time 0.0010 (0.0033) model time 0.4588 (0.4658) loss 3.4222 (3.1334) grad_norm 1.4947 (1.5632) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][330/625] eta 0:02:18 lr 0.001014 wd 0.0500 time 0.4690 (0.4696) data time 0.0008 (0.0032) model time 0.4682 (0.4657) loss 3.3374 (3.1410) grad_norm 1.7835 (1.5597) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][340/625] eta 0:02:13 lr 0.001014 wd 0.0500 time 0.4626 (0.4696) data time 0.0008 (0.0031) model time 0.4618 (0.4657) loss 2.4759 (3.1460) grad_norm 1.8993 (1.5591) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [92/300][350/625] eta 0:02:09 lr 0.001014 wd 0.0500 time 0.4653 (0.4695) data time 0.0008 (0.0031) model time 0.4645 (0.4657) loss 3.8655 (3.1512) grad_norm 1.0024 (1.5511) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][360/625] eta 0:02:04 lr 0.001014 wd 0.0500 time 0.4652 (0.4695) data time 0.0008 (0.0031) model time 0.4644 (0.4657) loss 3.3636 (3.1529) grad_norm 1.2893 (1.5479) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][370/625] eta 0:01:59 lr 0.001014 wd 0.0500 time 0.4622 (0.4693) data time 0.0008 (0.0030) model time 0.4614 (0.4656) loss 3.5034 (3.1480) grad_norm 1.6490 (1.5451) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][380/625] eta 0:01:55 lr 0.001014 wd 0.0500 time 0.4138 (0.4696) data time 0.0011 (0.0030) model time 0.4127 (0.4660) loss 3.7955 (3.1553) grad_norm 1.4556 (1.5466) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][390/625] eta 0:01:50 lr 0.001013 wd 0.0500 time 0.4637 (0.4694) data time 0.0008 (0.0029) model time 0.4629 (0.4659) loss 3.7326 (3.1504) grad_norm 1.4660 (1.5449) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][400/625] eta 0:01:45 lr 0.001013 wd 0.0500 time 0.4635 (0.4694) data time 0.0010 (0.0029) model time 0.4625 (0.4659) loss 2.9196 (3.1451) grad_norm 1.9505 (1.5548) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][410/625] eta 0:01:40 lr 0.001013 wd 0.0500 time 0.4828 (0.4693) data time 0.0011 (0.0028) model time 0.4817 (0.4659) loss 3.1969 (3.1451) grad_norm 1.3253 (1.5509) 
loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][420/625] eta 0:01:36 lr 0.001013 wd 0.0500 time 0.4607 (0.4694) data time 0.0010 (0.0029) model time 0.4597 (0.4660) loss 2.7625 (3.1413) grad_norm 1.2499 (1.5485) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][430/625] eta 0:01:31 lr 0.001013 wd 0.0500 time 0.4670 (0.4704) data time 0.0010 (0.0028) model time 0.4660 (0.4671) loss 3.4556 (3.1444) grad_norm 1.3520 (1.5508) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][440/625] eta 0:01:26 lr 0.001013 wd 0.0500 time 0.4674 (0.4702) data time 0.0010 (0.0028) model time 0.4664 (0.4671) loss 2.8205 (3.1507) grad_norm 1.0933 (1.5469) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][450/625] eta 0:01:22 lr 0.001013 wd 0.0500 time 0.4643 (0.4700) data time 0.0010 (0.0028) model time 0.4632 (0.4669) loss 3.4413 (3.1470) grad_norm 1.5543 (1.5472) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][460/625] eta 0:01:17 lr 0.001013 wd 0.0500 time 0.4665 (0.4699) data time 0.0010 (0.0027) model time 0.4655 (0.4668) loss 3.3858 (3.1503) grad_norm 1.4330 (1.5456) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][470/625] eta 0:01:12 lr 0.001013 wd 0.0500 time 0.4678 (0.4698) data time 0.0008 (0.0027) model time 0.4670 (0.4667) loss 2.3820 (3.1543) grad_norm 1.0585 (1.5448) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][480/625] eta 0:01:08 lr 0.001013 wd 0.0500 time 0.4646 (0.4697) data time 0.0008 
(0.0027) model time 0.4638 (0.4667) loss 3.6880 (3.1569) grad_norm 1.8354 (1.5412) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][490/625] eta 0:01:03 lr 0.001013 wd 0.0500 time 0.4708 (0.4697) data time 0.0008 (0.0026) model time 0.4700 (0.4667) loss 3.6801 (3.1617) grad_norm 1.5132 (1.5371) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][500/625] eta 0:00:58 lr 0.001013 wd 0.0500 time 0.4698 (0.4696) data time 0.0011 (0.0026) model time 0.4688 (0.4666) loss 3.2211 (3.1654) grad_norm 2.5383 (1.5504) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][510/625] eta 0:00:53 lr 0.001013 wd 0.0500 time 0.4685 (0.4695) data time 0.0010 (0.0026) model time 0.4675 (0.4665) loss 3.2886 (3.1651) grad_norm 1.3041 (1.5503) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][520/625] eta 0:00:49 lr 0.001012 wd 0.0500 time 0.4666 (0.4694) data time 0.0010 (0.0025) model time 0.4655 (0.4665) loss 3.4275 (3.1692) grad_norm 1.1266 (1.5421) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][530/625] eta 0:00:44 lr 0.001012 wd 0.0500 time 0.4683 (0.4693) data time 0.0008 (0.0025) model time 0.4675 (0.4664) loss 3.2267 (3.1663) grad_norm 1.1906 (1.5368) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][540/625] eta 0:00:39 lr 0.001012 wd 0.0500 time 0.4680 (0.4693) data time 0.0011 (0.0025) model time 0.4669 (0.4664) loss 3.0843 (3.1697) grad_norm 1.1953 (1.5348) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[92/300][550/625] eta 0:00:35 lr 0.001012 wd 0.0500 time 0.4716 (0.4692) data time 0.0007 (0.0025) model time 0.4709 (0.4664) loss 3.7656 (3.1724) grad_norm 1.1636 (1.5320) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][560/625] eta 0:00:30 lr 0.001012 wd 0.0500 time 0.4629 (0.4691) data time 0.0008 (0.0024) model time 0.4621 (0.4663) loss 3.9235 (3.1783) grad_norm 1.4689 (1.5404) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][570/625] eta 0:00:25 lr 0.001012 wd 0.0500 time 0.4638 (0.4691) data time 0.0008 (0.0024) model time 0.4630 (0.4663) loss 2.9008 (3.1794) grad_norm 1.1934 (1.5413) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][580/625] eta 0:00:21 lr 0.001012 wd 0.0500 time 0.4675 (0.4690) data time 0.0011 (0.0024) model time 0.4664 (0.4662) loss 3.2022 (3.1728) grad_norm 1.2085 (1.5373) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][590/625] eta 0:00:16 lr 0.001012 wd 0.0500 time 0.4622 (0.4689) data time 0.0008 (0.0024) model time 0.4614 (0.4662) loss 3.4837 (3.1682) grad_norm 1.8397 (1.5402) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][600/625] eta 0:00:11 lr 0.001012 wd 0.0500 time 0.4640 (0.4688) data time 0.0012 (0.0023) model time 0.4628 (0.4661) loss 3.3326 (3.1697) grad_norm 1.5442 (1.5383) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][610/625] eta 0:00:07 lr 0.001012 wd 0.0500 time 0.4584 (0.4687) data time 0.0005 (0.0023) model time 0.4578 (0.4660) loss 3.3411 (3.1743) grad_norm 1.3457 (1.5354) loss_scale 1024.0000 (1024.0000) mem 16715MB 
[2024-08-10 09:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [92/300][620/625] eta 0:00:02 lr 0.001012 wd 0.0500 time 0.4605 (0.4691) data time 0.0008 (0.0023) model time 0.4597 (0.4665) loss 3.1745 (3.1726) grad_norm 1.7210 (1.5337) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 92 training takes 0:04:53 [2024-08-10 09:11:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:11:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 09:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.563 (0.563) Loss 0.6050 (0.6050) Acc@1 87.012 (87.012) Acc@5 98.193 (98.193) Mem 16715MB [2024-08-10 09:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.165) Loss 0.9390 (0.7249) Acc@1 77.930 (84.024) Acc@5 94.824 (97.097) Mem 16715MB [2024-08-10 09:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.122 (0.144) Loss 1.0752 (0.8609) Acc@1 73.682 (80.448) Acc@5 93.506 (95.554) Mem 16715MB [2024-08-10 09:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.246 Acc@5 95.525 [2024-08-10 09:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.2% [2024-08-10 09:12:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.835 (0.835) Loss 0.5083 (0.5083) Acc@1 88.574 (88.574) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.194) Loss 0.8237 (0.6365) Acc@1 80.566 (85.871) Acc@5 95.703 (97.523) Mem 16715MB [2024-08-10 09:12:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.9414 (0.7545) Acc@1 75.830 (82.636) Acc@5 94.873 
(96.275) Mem 16715MB [2024-08-10 09:12:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.344 Acc@5 96.299 [2024-08-10 09:12:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 09:12:06 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.34% [2024-08-10 09:12:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:12:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 09:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][0/625] eta 0:08:24 lr 0.001012 wd 0.0500 time 0.8074 (0.8074) data time 0.3965 (0.3965) model time 0.0000 (0.0000) loss 2.6641 (2.6641) grad_norm 1.6320 (1.6320) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][10/625] eta 0:05:05 lr 0.001012 wd 0.0500 time 0.4654 (0.4960) data time 0.0009 (0.0369) model time 0.0000 (0.0000) loss 2.7075 (2.9711) grad_norm 1.4746 (1.6016) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][20/625] eta 0:04:50 lr 0.001011 wd 0.0500 time 0.4592 (0.4806) data time 0.0008 (0.0199) model time 0.0000 (0.0000) loss 4.0176 (3.0466) grad_norm 1.3956 (1.5075) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][30/625] eta 0:04:42 lr 0.001011 wd 0.0500 time 0.4594 (0.4744) data time 0.0010 (0.0138) model time 0.0000 (0.0000) loss 3.2194 (3.0847) grad_norm 2.1022 (1.6341) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][40/625] eta 0:04:35 lr 0.001011 wd 0.0500 time 0.4646 (0.4715) data time 0.0008 
(0.0107) model time 0.0000 (0.0000) loss 3.6457 (3.0770) grad_norm 1.6685 (1.6479) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][50/625] eta 0:04:30 lr 0.001011 wd 0.0500 time 0.4787 (0.4701) data time 0.0007 (0.0088) model time 0.0000 (0.0000) loss 3.5203 (3.0593) grad_norm 1.3584 (1.5900) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][60/625] eta 0:04:25 lr 0.001011 wd 0.0500 time 0.4661 (0.4691) data time 0.0010 (0.0075) model time 0.4652 (0.4630) loss 3.3576 (3.0762) grad_norm 1.0827 (1.5466) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][70/625] eta 0:04:20 lr 0.001011 wd 0.0500 time 0.4617 (0.4685) data time 0.0010 (0.0066) model time 0.4607 (0.4633) loss 3.3257 (3.0640) grad_norm 2.2329 (1.5723) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][80/625] eta 0:04:15 lr 0.001011 wd 0.0500 time 0.4657 (0.4680) data time 0.0010 (0.0059) model time 0.4647 (0.4633) loss 3.1454 (3.0825) grad_norm 1.4018 (1.5670) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][90/625] eta 0:04:10 lr 0.001011 wd 0.0500 time 0.4682 (0.4677) data time 0.0008 (0.0054) model time 0.4674 (0.4636) loss 2.9966 (3.0527) grad_norm 1.5322 (1.5704) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:12:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][100/625] eta 0:04:05 lr 0.001011 wd 0.0500 time 0.4662 (0.4673) data time 0.0007 (0.0049) model time 0.4654 (0.4635) loss 3.2983 (3.0774) grad_norm 1.4693 (1.5640) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[93/300][110/625] eta 0:04:00 lr 0.001011 wd 0.0500 time 0.4577 (0.4671) data time 0.0008 (0.0046) model time 0.4569 (0.4634) loss 3.1614 (3.0851) grad_norm 1.4784 (1.5621) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][120/625] eta 0:03:55 lr 0.001011 wd 0.0500 time 0.4657 (0.4673) data time 0.0008 (0.0043) model time 0.4649 (0.4642) loss 3.7272 (3.1201) grad_norm 1.2990 (1.6199) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][130/625] eta 0:03:51 lr 0.001011 wd 0.0500 time 0.4677 (0.4672) data time 0.0010 (0.0041) model time 0.4667 (0.4643) loss 2.7641 (3.1348) grad_norm 1.2253 (1.6100) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][140/625] eta 0:03:47 lr 0.001011 wd 0.0500 time 0.4611 (0.4686) data time 0.0010 (0.0039) model time 0.4601 (0.4666) loss 2.9754 (3.1426) grad_norm 1.7382 (1.5933) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][150/625] eta 0:03:43 lr 0.001010 wd 0.0500 time 0.4116 (0.4697) data time 0.0011 (0.0037) model time 0.4105 (0.4683) loss 3.7116 (3.1562) grad_norm 1.2177 (1.5791) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][160/625] eta 0:03:38 lr 0.001010 wd 0.0500 time 0.4638 (0.4695) data time 0.0012 (0.0035) model time 0.4626 (0.4681) loss 2.2525 (3.1647) grad_norm 1.3780 (1.5671) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][170/625] eta 0:03:33 lr 0.001010 wd 0.0500 time 0.4614 (0.4693) data time 0.0010 (0.0034) model time 0.4604 (0.4678) loss 3.6300 (3.1552) grad_norm 1.1451 (1.5540) loss_scale 1024.0000 (1024.0000) mem 16715MB 
[2024-08-10 09:13:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][180/625] eta 0:03:28 lr 0.001010 wd 0.0500 time 0.4555 (0.4688) data time 0.0010 (0.0033) model time 0.4545 (0.4672) loss 2.6085 (3.1601) grad_norm 1.2206 (1.5418) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][190/625] eta 0:03:23 lr 0.001010 wd 0.0500 time 0.4603 (0.4684) data time 0.0011 (0.0031) model time 0.4592 (0.4667) loss 3.0521 (3.1622) grad_norm 1.3510 (1.5345) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][200/625] eta 0:03:18 lr 0.001010 wd 0.0500 time 0.4630 (0.4681) data time 0.0010 (0.0030) model time 0.4620 (0.4663) loss 2.3055 (3.1532) grad_norm 1.9536 (1.5322) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][210/625] eta 0:03:14 lr 0.001010 wd 0.0500 time 0.4647 (0.4678) data time 0.0010 (0.0030) model time 0.4637 (0.4660) loss 3.2636 (3.1568) grad_norm 1.6828 (1.5478) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][220/625] eta 0:03:09 lr 0.001010 wd 0.0500 time 0.4633 (0.4677) data time 0.0011 (0.0029) model time 0.4623 (0.4659) loss 3.5700 (3.1499) grad_norm 1.2488 (1.5535) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][230/625] eta 0:03:04 lr 0.001010 wd 0.0500 time 0.4572 (0.4676) data time 0.0008 (0.0028) model time 0.4564 (0.4657) loss 3.9112 (3.1533) grad_norm 1.3089 (1.5557) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][240/625] eta 0:02:59 lr 0.001010 wd 0.0500 time 0.4607 (0.4675) data time 0.0008 (0.0027) model time 0.4599 (0.4656) loss 3.7983 
(3.1565) grad_norm 1.4443 (1.5473) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][250/625] eta 0:02:55 lr 0.001010 wd 0.0500 time 0.4627 (0.4674) data time 0.0008 (0.0027) model time 0.4619 (0.4656) loss 3.7724 (3.1544) grad_norm 1.3861 (1.5489) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][260/625] eta 0:02:50 lr 0.001010 wd 0.0500 time 0.4599 (0.4673) data time 0.0008 (0.0026) model time 0.4591 (0.4655) loss 3.6794 (3.1455) grad_norm 1.3805 (1.5471) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][270/625] eta 0:02:45 lr 0.001010 wd 0.0500 time 0.4651 (0.4671) data time 0.0010 (0.0025) model time 0.4642 (0.4653) loss 3.1588 (3.1464) grad_norm 1.1898 (1.5517) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][280/625] eta 0:02:41 lr 0.001009 wd 0.0500 time 0.4633 (0.4670) data time 0.0007 (0.0025) model time 0.4626 (0.4651) loss 2.9645 (3.1459) grad_norm 1.0460 (1.5465) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][290/625] eta 0:02:36 lr 0.001009 wd 0.0500 time 0.4615 (0.4669) data time 0.0010 (0.0024) model time 0.4605 (0.4651) loss 3.7947 (3.1475) grad_norm 1.6972 (1.5486) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][300/625] eta 0:02:31 lr 0.001009 wd 0.0500 time 0.4669 (0.4668) data time 0.0008 (0.0024) model time 0.4661 (0.4651) loss 2.3367 (3.1330) grad_norm 1.9356 (1.5504) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][310/625] eta 0:02:27 lr 0.001009 wd 0.0500 time 
0.4618 (0.4667) data time 0.0008 (0.0024) model time 0.4610 (0.4650) loss 3.5875 (3.1346) grad_norm 1.1222 (1.5507) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][320/625] eta 0:02:22 lr 0.001009 wd 0.0500 time 0.4676 (0.4673) data time 0.0009 (0.0023) model time 0.4667 (0.4657) loss 3.6440 (3.1306) grad_norm 1.4530 (1.5500) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][330/625] eta 0:02:17 lr 0.001009 wd 0.0500 time 0.4584 (0.4673) data time 0.0010 (0.0023) model time 0.4574 (0.4657) loss 3.4903 (3.1355) grad_norm 1.3983 (1.5448) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][340/625] eta 0:02:13 lr 0.001009 wd 0.0500 time 0.4602 (0.4672) data time 0.0010 (0.0023) model time 0.4591 (0.4656) loss 3.0860 (3.1395) grad_norm 1.1749 (1.5483) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][350/625] eta 0:02:08 lr 0.001009 wd 0.0500 time 0.4602 (0.4671) data time 0.0008 (0.0023) model time 0.4594 (0.4655) loss 4.0406 (3.1368) grad_norm 1.1034 (1.5478) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][360/625] eta 0:02:03 lr 0.001009 wd 0.0500 time 0.4639 (0.4670) data time 0.0010 (0.0022) model time 0.4629 (0.4654) loss 3.2801 (3.1388) grad_norm 1.2635 (1.5439) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][370/625] eta 0:01:59 lr 0.001009 wd 0.0500 time 0.4618 (0.4670) data time 0.0008 (0.0022) model time 0.4610 (0.4653) loss 3.4108 (3.1387) grad_norm 1.4936 (1.5417) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:06 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [93/300][380/625] eta 0:01:54 lr 0.001009 wd 0.0500 time 0.4727 (0.4669) data time 0.0009 (0.0022) model time 0.4718 (0.4652) loss 2.3437 (3.1399) grad_norm 1.3794 (1.5352) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][390/625] eta 0:01:49 lr 0.001009 wd 0.0500 time 0.4636 (0.4668) data time 0.0011 (0.0021) model time 0.4624 (0.4651) loss 3.3690 (3.1405) grad_norm 1.2349 (1.5361) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][400/625] eta 0:01:45 lr 0.001009 wd 0.0500 time 0.4603 (0.4667) data time 0.0008 (0.0021) model time 0.4594 (0.4651) loss 2.5873 (3.1364) grad_norm 1.2830 (1.5315) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][410/625] eta 0:01:40 lr 0.001008 wd 0.0500 time 0.4670 (0.4667) data time 0.0008 (0.0021) model time 0.4661 (0.4650) loss 2.9398 (3.1337) grad_norm 1.5027 (1.5227) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][420/625] eta 0:01:35 lr 0.001008 wd 0.0500 time 0.4674 (0.4666) data time 0.0008 (0.0021) model time 0.4666 (0.4650) loss 3.7020 (3.1363) grad_norm 1.6590 (1.5337) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][430/625] eta 0:01:30 lr 0.001008 wd 0.0500 time 0.4659 (0.4666) data time 0.0009 (0.0020) model time 0.4650 (0.4650) loss 2.7125 (3.1403) grad_norm 1.1810 (1.5290) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][440/625] eta 0:01:26 lr 0.001008 wd 0.0500 time 0.4620 (0.4665) data time 0.0008 (0.0020) model time 0.4612 (0.4649) loss 3.1529 (3.1388) grad_norm 1.7153 (1.5249) 
loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][450/625] eta 0:01:21 lr 0.001008 wd 0.0500 time 0.4623 (0.4665) data time 0.0009 (0.0020) model time 0.4614 (0.4649) loss 3.3777 (3.1396) grad_norm 1.4045 (1.5206) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][460/625] eta 0:01:16 lr 0.001008 wd 0.0500 time 0.4616 (0.4664) data time 0.0008 (0.0020) model time 0.4608 (0.4648) loss 3.9667 (3.1448) grad_norm 1.3563 (1.5176) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][470/625] eta 0:01:12 lr 0.001008 wd 0.0500 time 0.4614 (0.4664) data time 0.0008 (0.0020) model time 0.4605 (0.4648) loss 2.4253 (3.1459) grad_norm 1.2030 (1.5152) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][480/625] eta 0:01:07 lr 0.001008 wd 0.0500 time 0.4631 (0.4667) data time 0.0007 (0.0019) model time 0.4624 (0.4652) loss 3.4756 (3.1527) grad_norm 1.6576 (1.5115) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][490/625] eta 0:01:03 lr 0.001008 wd 0.0500 time 0.4590 (0.4670) data time 0.0009 (0.0019) model time 0.4581 (0.4655) loss 3.1123 (3.1535) grad_norm 1.2617 (1.5120) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][500/625] eta 0:00:58 lr 0.001008 wd 0.0500 time 0.4680 (0.4670) data time 0.0008 (0.0019) model time 0.4672 (0.4655) loss 3.6064 (3.1522) grad_norm 1.6784 (1.5133) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][510/625] eta 0:00:53 lr 0.001008 wd 0.0500 time 0.4723 (0.4674) data time 0.0008 
(0.0019) model time 0.4716 (0.4660) loss 3.7458 (3.1572) grad_norm 2.0443 (1.5125) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][520/625] eta 0:00:49 lr 0.001008 wd 0.0500 time 0.4637 (0.4674) data time 0.0007 (0.0019) model time 0.4629 (0.4660) loss 2.2930 (3.1592) grad_norm 1.4147 (1.5274) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][530/625] eta 0:00:44 lr 0.001008 wd 0.0500 time 0.4623 (0.4674) data time 0.0010 (0.0019) model time 0.4612 (0.4660) loss 3.0870 (3.1639) grad_norm 1.0993 (1.5342) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][540/625] eta 0:00:39 lr 0.001007 wd 0.0500 time 0.4650 (0.4674) data time 0.0008 (0.0018) model time 0.4642 (0.4659) loss 3.4754 (3.1640) grad_norm 1.1218 (1.5292) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][550/625] eta 0:00:35 lr 0.001007 wd 0.0500 time 0.4600 (0.4673) data time 0.0007 (0.0018) model time 0.4592 (0.4659) loss 3.0636 (3.1627) grad_norm 0.9983 (1.5274) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][560/625] eta 0:00:30 lr 0.001007 wd 0.0500 time 0.4611 (0.4672) data time 0.0007 (0.0018) model time 0.4604 (0.4658) loss 3.5352 (3.1658) grad_norm 1.5364 (1.5245) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][570/625] eta 0:00:25 lr 0.001007 wd 0.0500 time 0.4710 (0.4672) data time 0.0008 (0.0018) model time 0.4702 (0.4657) loss 3.0097 (3.1651) grad_norm 1.3765 (1.5253) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[93/300][580/625] eta 0:00:21 lr 0.001007 wd 0.0500 time 0.4606 (0.4672) data time 0.0012 (0.0018) model time 0.4594 (0.4658) loss 3.4871 (3.1662) grad_norm 1.4906 (1.5236) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][590/625] eta 0:00:16 lr 0.001007 wd 0.0500 time 0.4634 (0.4672) data time 0.0010 (0.0018) model time 0.4624 (0.4658) loss 3.0174 (3.1645) grad_norm 1.3907 (1.5246) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][600/625] eta 0:00:11 lr 0.001007 wd 0.0500 time 0.4671 (0.4672) data time 0.0008 (0.0018) model time 0.4663 (0.4658) loss 2.7448 (3.1679) grad_norm 1.0328 (1.5216) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 09:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][610/625] eta 0:00:07 lr 0.001007 wd 0.0500 time 0.4574 (0.4671) data time 0.0007 (0.0018) model time 0.4566 (0.4657) loss 2.5237 (3.1626) grad_norm 2.2936 (1.5167) loss_scale 2048.0000 (1034.0556) mem 16715MB [2024-08-10 09:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [93/300][620/625] eta 0:00:02 lr 0.001007 wd 0.0500 time 0.4561 (0.4670) data time 0.0005 (0.0017) model time 0.4556 (0.4656) loss 2.8501 (3.1667) grad_norm 1.8660 (1.5186) loss_scale 2048.0000 (1050.3833) mem 16715MB [2024-08-10 09:17:00 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 93 training takes 0:04:51 [2024-08-10 09:17:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:17:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:17:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.5557 (0.5557) Acc@1 87.646 (87.646) Acc@5 98.389 (98.389) Mem 16715MB [2024-08-10 09:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9390 (0.7050) Acc@1 76.514 (84.051) Acc@5 94.727 (97.079) Mem 16715MB [2024-08-10 09:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.142) Loss 1.0479 (0.8487) Acc@1 74.316 (80.452) Acc@5 94.092 (95.475) Mem 16715MB [2024-08-10 09:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.268 Acc@5 95.477 [2024-08-10 09:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.3% [2024-08-10 09:17:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.849 (0.849) Loss 0.5083 (0.5083) Acc@1 88.623 (88.623) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:17:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.194) Loss 0.8228 (0.6354) Acc@1 80.566 (85.902) Acc@5 95.654 (97.528) Mem 16715MB [2024-08-10 09:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.9395 (0.7531) Acc@1 75.684 (82.617) Acc@5 94.873 (96.280) Mem 16715MB [2024-08-10 09:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.316 Acc@5 96.305 [2024-08-10 09:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 09:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][0/625] eta 0:14:19 lr 0.001007 wd 0.0500 time 1.3748 (1.3748) data time 0.8858 (0.8858) model time 0.0000 (0.0000) loss 3.2978 (3.2978) grad_norm 1.3739 (1.3739) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][10/625] eta 0:05:35 lr 0.001007 wd 0.0500 time 0.4664 (0.5462) data 
time 0.0008 (0.0815) model time 0.0000 (0.0000) loss 3.5403 (3.0529) grad_norm 1.5064 (1.5526) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][20/625] eta 0:05:07 lr 0.001007 wd 0.0500 time 0.4739 (0.5085) data time 0.0008 (0.0432) model time 0.0000 (0.0000) loss 3.9154 (3.2229) grad_norm 1.8964 (1.6037) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][30/625] eta 0:04:57 lr 0.001007 wd 0.0500 time 0.4641 (0.5000) data time 0.0008 (0.0296) model time 0.0000 (0.0000) loss 2.6604 (3.2277) grad_norm 1.8526 (1.6713) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][40/625] eta 0:04:47 lr 0.001006 wd 0.0500 time 0.4687 (0.4916) data time 0.0009 (0.0227) model time 0.0000 (0.0000) loss 2.2020 (3.1778) grad_norm 1.2916 (1.6077) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][50/625] eta 0:04:39 lr 0.001006 wd 0.0500 time 0.4600 (0.4859) data time 0.0012 (0.0185) model time 0.0000 (0.0000) loss 2.7219 (3.1226) grad_norm 1.8100 (1.6043) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][60/625] eta 0:04:32 lr 0.001006 wd 0.0500 time 0.4633 (0.4821) data time 0.0008 (0.0156) model time 0.4625 (0.4618) loss 2.3080 (3.1323) grad_norm 1.3385 (1.5664) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][70/625] eta 0:04:27 lr 0.001006 wd 0.0500 time 0.4628 (0.4822) data time 0.0008 (0.0135) model time 0.4620 (0.4718) loss 3.4198 (3.1645) grad_norm 1.4521 (1.5267) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[94/300][80/625] eta 0:04:21 lr 0.001006 wd 0.0500 time 0.4664 (0.4799) data time 0.0008 (0.0120) model time 0.4656 (0.4686) loss 3.7364 (3.1945) grad_norm 1.1175 (1.5035) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][90/625] eta 0:04:17 lr 0.001006 wd 0.0500 time 0.4616 (0.4808) data time 0.0010 (0.0108) model time 0.4605 (0.4731) loss 3.1442 (3.2106) grad_norm 1.6009 (1.5022) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:17:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][100/625] eta 0:04:11 lr 0.001006 wd 0.0500 time 0.4700 (0.4792) data time 0.0008 (0.0098) model time 0.4692 (0.4714) loss 3.6225 (3.1863) grad_norm 1.8207 (1.5460) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][110/625] eta 0:04:06 lr 0.001006 wd 0.0500 time 0.4666 (0.4780) data time 0.0008 (0.0090) model time 0.4658 (0.4703) loss 3.6584 (3.2008) grad_norm 1.4371 (1.5775) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][120/625] eta 0:04:00 lr 0.001006 wd 0.0500 time 0.4573 (0.4767) data time 0.0008 (0.0084) model time 0.4564 (0.4690) loss 3.6851 (3.1931) grad_norm 1.0791 (1.5546) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][130/625] eta 0:03:55 lr 0.001006 wd 0.0500 time 0.4644 (0.4757) data time 0.0011 (0.0078) model time 0.4633 (0.4681) loss 2.6714 (3.2121) grad_norm 1.4434 (1.5508) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][140/625] eta 0:03:50 lr 0.001006 wd 0.0500 time 0.4611 (0.4747) data time 0.0009 (0.0073) model time 0.4603 (0.4674) loss 2.9603 (3.1890) grad_norm 1.5855 (1.5474) loss_scale 2048.0000 (2048.0000) mem 16715MB 
[2024-08-10 09:18:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][150/625] eta 0:03:45 lr 0.001006 wd 0.0500 time 0.4595 (0.4740) data time 0.0010 (0.0069) model time 0.4584 (0.4668) loss 3.2959 (3.1873) grad_norm 1.7681 (1.5461) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][160/625] eta 0:03:40 lr 0.001005 wd 0.0500 time 0.4634 (0.4735) data time 0.0007 (0.0065) model time 0.4627 (0.4667) loss 4.0421 (3.1965) grad_norm 1.4827 (1.5342) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][170/625] eta 0:03:35 lr 0.001005 wd 0.0500 time 0.4648 (0.4731) data time 0.0008 (0.0062) model time 0.4639 (0.4666) loss 2.6078 (3.1975) grad_norm 1.4367 (1.5371) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][180/625] eta 0:03:30 lr 0.001005 wd 0.0500 time 0.4614 (0.4736) data time 0.0010 (0.0060) model time 0.4605 (0.4677) loss 3.6267 (3.1969) grad_norm 1.4069 (1.5383) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][190/625] eta 0:03:25 lr 0.001005 wd 0.0500 time 0.4657 (0.4731) data time 0.0010 (0.0057) model time 0.4647 (0.4674) loss 3.3724 (3.2014) grad_norm 1.0857 (1.5407) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][200/625] eta 0:03:20 lr 0.001005 wd 0.0500 time 0.4627 (0.4726) data time 0.0008 (0.0055) model time 0.4620 (0.4671) loss 2.5920 (3.1820) grad_norm 1.7452 (1.5391) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][210/625] eta 0:03:15 lr 0.001005 wd 0.0500 time 0.4638 (0.4722) data time 0.0010 (0.0052) model time 0.4628 (0.4668) loss 2.9513 
(3.1723) grad_norm 1.3162 (1.5369) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][220/625] eta 0:03:11 lr 0.001005 wd 0.0500 time 0.4640 (0.4718) data time 0.0008 (0.0050) model time 0.4632 (0.4665) loss 3.1165 (3.1719) grad_norm 1.5504 (1.5351) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][230/625] eta 0:03:06 lr 0.001005 wd 0.0500 time 0.4706 (0.4715) data time 0.0010 (0.0049) model time 0.4696 (0.4664) loss 3.5868 (3.1657) grad_norm 2.1096 (1.5513) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][240/625] eta 0:03:01 lr 0.001005 wd 0.0500 time 0.4671 (0.4712) data time 0.0011 (0.0047) model time 0.4660 (0.4663) loss 2.9609 (3.1717) grad_norm 2.0321 (1.5715) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][250/625] eta 0:02:56 lr 0.001005 wd 0.0500 time 0.4660 (0.4710) data time 0.0010 (0.0046) model time 0.4650 (0.4661) loss 2.9858 (3.1650) grad_norm 1.1180 (1.5695) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][260/625] eta 0:02:52 lr 0.001005 wd 0.0500 time 0.4609 (0.4713) data time 0.0009 (0.0044) model time 0.4599 (0.4667) loss 2.0803 (3.1582) grad_norm 1.1634 (1.5564) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][270/625] eta 0:02:47 lr 0.001005 wd 0.0500 time 0.4709 (0.4710) data time 0.0009 (0.0043) model time 0.4699 (0.4665) loss 2.7724 (3.1647) grad_norm 1.2781 (1.5440) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][280/625] eta 0:02:42 lr 0.001005 wd 0.0500 time 
0.4713 (0.4707) data time 0.0010 (0.0042) model time 0.4703 (0.4663) loss 3.1163 (3.1615) grad_norm 1.0219 (1.5347) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][290/625] eta 0:02:37 lr 0.001004 wd 0.0500 time 0.4625 (0.4705) data time 0.0010 (0.0041) model time 0.4615 (0.4661) loss 3.1536 (3.1562) grad_norm 1.1858 (1.5353) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][300/625] eta 0:02:32 lr 0.001004 wd 0.0500 time 0.4709 (0.4703) data time 0.0009 (0.0040) model time 0.4699 (0.4661) loss 3.4904 (3.1634) grad_norm 1.5884 (1.5302) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][310/625] eta 0:02:28 lr 0.001004 wd 0.0500 time 0.4860 (0.4708) data time 0.0009 (0.0039) model time 0.4850 (0.4668) loss 2.7406 (3.1577) grad_norm 1.7177 (1.5295) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][320/625] eta 0:02:23 lr 0.001004 wd 0.0500 time 0.4689 (0.4707) data time 0.0009 (0.0038) model time 0.4681 (0.4668) loss 3.9539 (3.1579) grad_norm 2.6152 (1.5331) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][330/625] eta 0:02:18 lr 0.001004 wd 0.0500 time 0.4573 (0.4705) data time 0.0007 (0.0037) model time 0.4566 (0.4667) loss 3.2396 (3.1595) grad_norm 1.8351 (1.5446) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][340/625] eta 0:02:14 lr 0.001004 wd 0.0500 time 0.4612 (0.4703) data time 0.0012 (0.0036) model time 0.4599 (0.4666) loss 3.0817 (3.1597) grad_norm 1.4693 (1.5525) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:53 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [94/300][350/625] eta 0:02:09 lr 0.001004 wd 0.0500 time 0.4570 (0.4702) data time 0.0008 (0.0036) model time 0.4563 (0.4665) loss 3.9092 (3.1673) grad_norm 1.9582 (1.5782) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:19:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][360/625] eta 0:02:04 lr 0.001004 wd 0.0500 time 0.4632 (0.4700) data time 0.0008 (0.0035) model time 0.4624 (0.4663) loss 1.8535 (3.1581) grad_norm 1.4969 (1.5749) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][370/625] eta 0:01:59 lr 0.001004 wd 0.0500 time 0.4595 (0.4697) data time 0.0010 (0.0034) model time 0.4585 (0.4661) loss 3.4279 (3.1629) grad_norm 1.9904 (1.5789) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][380/625] eta 0:01:55 lr 0.001004 wd 0.0500 time 0.4693 (0.4696) data time 0.0008 (0.0034) model time 0.4685 (0.4661) loss 2.6316 (3.1615) grad_norm 1.5447 (1.5822) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][390/625] eta 0:01:50 lr 0.001004 wd 0.0500 time 0.4641 (0.4695) data time 0.0010 (0.0033) model time 0.4631 (0.4660) loss 3.7160 (3.1512) grad_norm 1.7735 (1.5785) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][400/625] eta 0:01:45 lr 0.001004 wd 0.0500 time 0.4663 (0.4694) data time 0.0008 (0.0033) model time 0.4654 (0.4659) loss 3.7602 (3.1546) grad_norm 1.7295 (1.5744) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][410/625] eta 0:01:40 lr 0.001004 wd 0.0500 time 0.4628 (0.4693) data time 0.0008 (0.0032) model time 0.4620 (0.4659) loss 2.4134 (3.1555) grad_norm 1.6101 (1.5716) 
loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][420/625] eta 0:01:36 lr 0.001003 wd 0.0500 time 0.4606 (0.4691) data time 0.0010 (0.0031) model time 0.4596 (0.4658) loss 3.0274 (3.1508) grad_norm 1.7649 (1.5680) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][430/625] eta 0:01:31 lr 0.001003 wd 0.0500 time 0.4608 (0.4691) data time 0.0010 (0.0031) model time 0.4598 (0.4658) loss 3.8739 (3.1514) grad_norm 2.3531 (1.5660) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][440/625] eta 0:01:26 lr 0.001003 wd 0.0500 time 0.4753 (0.4690) data time 0.0008 (0.0031) model time 0.4745 (0.4657) loss 2.8924 (3.1483) grad_norm 1.5176 (1.5652) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][450/625] eta 0:01:22 lr 0.001003 wd 0.0500 time 0.4637 (0.4689) data time 0.0008 (0.0030) model time 0.4629 (0.4656) loss 3.5068 (3.1487) grad_norm 1.4907 (1.5622) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][460/625] eta 0:01:17 lr 0.001003 wd 0.0500 time 0.4660 (0.4693) data time 0.0010 (0.0030) model time 0.4650 (0.4662) loss 2.8892 (3.1545) grad_norm 1.2503 (1.5601) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][470/625] eta 0:01:12 lr 0.001003 wd 0.0500 time 0.4639 (0.4692) data time 0.0009 (0.0029) model time 0.4630 (0.4661) loss 3.5598 (3.1538) grad_norm 1.2797 (1.5589) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][480/625] eta 0:01:08 lr 0.001003 wd 0.0500 time 0.4629 (0.4695) data time 0.0010 
(0.0029) model time 0.4619 (0.4665) loss 3.2213 (3.1515) grad_norm 2.4081 (1.5573) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:20:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][490/625] eta 0:01:03 lr 0.001003 wd 0.0500 time 0.4604 (0.4694) data time 0.0008 (0.0029) model time 0.4595 (0.4664) loss 3.2695 (3.1529) grad_norm 1.0696 (1.5521) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][500/625] eta 0:00:58 lr 0.001003 wd 0.0500 time 0.4648 (0.4693) data time 0.0010 (0.0028) model time 0.4638 (0.4663) loss 3.5066 (3.1519) grad_norm 1.7935 (1.5573) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][510/625] eta 0:00:53 lr 0.001003 wd 0.0500 time 0.4629 (0.4695) data time 0.0008 (0.0028) model time 0.4621 (0.4666) loss 3.7017 (3.1603) grad_norm 1.6947 (1.5618) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][520/625] eta 0:00:49 lr 0.001003 wd 0.0500 time 0.4620 (0.4693) data time 0.0009 (0.0028) model time 0.4611 (0.4665) loss 3.7360 (3.1625) grad_norm 1.5544 (1.5610) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][530/625] eta 0:00:44 lr 0.001003 wd 0.0500 time 0.4642 (0.4692) data time 0.0011 (0.0027) model time 0.4631 (0.4664) loss 2.5942 (3.1630) grad_norm 1.2829 (1.5566) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][540/625] eta 0:00:39 lr 0.001002 wd 0.0500 time 0.4708 (0.4692) data time 0.0009 (0.0027) model time 0.4699 (0.4664) loss 2.0964 (3.1619) grad_norm 1.1355 (1.5686) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[94/300][550/625] eta 0:00:35 lr 0.001002 wd 0.0500 time 0.4663 (0.4692) data time 0.0009 (0.0027) model time 0.4655 (0.4664) loss 3.5007 (3.1624) grad_norm 1.3317 (1.5646) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][560/625] eta 0:00:30 lr 0.001002 wd 0.0500 time 0.4610 (0.4690) data time 0.0008 (0.0026) model time 0.4602 (0.4663) loss 3.7430 (3.1637) grad_norm 1.4528 (1.5604) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][570/625] eta 0:00:25 lr 0.001002 wd 0.0500 time 0.4602 (0.4689) data time 0.0012 (0.0026) model time 0.4591 (0.4662) loss 2.7395 (3.1656) grad_norm 1.3513 (1.5590) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][580/625] eta 0:00:21 lr 0.001002 wd 0.0500 time 0.4626 (0.4688) data time 0.0011 (0.0026) model time 0.4615 (0.4661) loss 2.7798 (3.1676) grad_norm 0.9810 (1.5637) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][590/625] eta 0:00:16 lr 0.001002 wd 0.0500 time 0.4594 (0.4687) data time 0.0008 (0.0026) model time 0.4587 (0.4660) loss 3.1020 (3.1670) grad_norm 1.8752 (1.5668) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][600/625] eta 0:00:11 lr 0.001002 wd 0.0500 time 0.4649 (0.4689) data time 0.0008 (0.0027) model time 0.4641 (0.4661) loss 2.4668 (3.1641) grad_norm 1.3500 (1.5691) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:21:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][610/625] eta 0:00:07 lr 0.001002 wd 0.0500 time 0.4666 (0.4688) data time 0.0007 (0.0026) model time 0.4659 (0.4660) loss 3.3022 (3.1651) grad_norm 1.0001 (1.5701) loss_scale 2048.0000 (2048.0000) mem 16715MB 
[2024-08-10 09:22:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [94/300][620/625] eta 0:00:02 lr 0.001002 wd 0.0500 time 0.4589 (0.4688) data time 0.0005 (0.0026) model time 0.4583 (0.4660) loss 2.3883 (3.1596) grad_norm 2.0143 (1.5717) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:02 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 94 training takes 0:04:53 [2024-08-10 09:22:02 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:22:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 09:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.527 (0.527) Loss 0.5688 (0.5688) Acc@1 87.109 (87.109) Acc@5 98.291 (98.291) Mem 16715MB [2024-08-10 09:22:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8760 (0.6986) Acc@1 79.102 (84.282) Acc@5 94.873 (97.039) Mem 16715MB [2024-08-10 09:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 1.0010 (0.8327) Acc@1 74.561 (80.792) Acc@5 93.848 (95.554) Mem 16715MB [2024-08-10 09:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.396 Acc@5 95.519 [2024-08-10 09:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4% [2024-08-10 09:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.876 (0.876) Loss 0.5073 (0.5073) Acc@1 88.525 (88.525) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:22:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.196) Loss 0.8223 (0.6347) Acc@1 80.176 (85.866) Acc@5 95.654 (97.519) Mem 16715MB [2024-08-10 09:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.159) Loss 0.9375 (0.7517) Acc@1 75.586 (82.601) Acc@5 94.824 
(96.291) Mem 16715MB [2024-08-10 09:22:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.298 Acc@5 96.329 [2024-08-10 09:22:11 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 09:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][0/625] eta 0:13:12 lr 0.001002 wd 0.0500 time 1.2687 (1.2687) data time 0.7005 (0.7005) model time 0.0000 (0.0000) loss 2.3557 (2.3557) grad_norm 1.2660 (1.2660) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][10/625] eta 0:05:30 lr 0.001002 wd 0.0500 time 0.4686 (0.5379) data time 0.0008 (0.0648) model time 0.0000 (0.0000) loss 3.3511 (3.0598) grad_norm 1.4196 (1.2852) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][20/625] eta 0:05:04 lr 0.001002 wd 0.0500 time 0.4636 (0.5029) data time 0.0009 (0.0344) model time 0.0000 (0.0000) loss 2.6538 (3.0946) grad_norm 1.6011 (1.4777) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][30/625] eta 0:04:52 lr 0.001002 wd 0.0500 time 0.5175 (0.4922) data time 0.0010 (0.0237) model time 0.0000 (0.0000) loss 3.1446 (3.0274) grad_norm 1.5383 (1.4876) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][40/625] eta 0:04:44 lr 0.001001 wd 0.0500 time 0.4642 (0.4857) data time 0.0010 (0.0181) model time 0.0000 (0.0000) loss 2.5482 (3.0627) grad_norm 1.3046 (1.4838) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][50/625] eta 0:04:37 lr 0.001001 wd 0.0500 time 0.4670 (0.4827) data time 0.0010 (0.0148) model time 0.0000 (0.0000) loss 2.1939 (3.1324) grad_norm 1.5334 (1.4706) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 09:22:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][60/625] eta 0:04:32 lr 0.001001 wd 0.0500 time 0.4693 (0.4825) data time 0.0007 (0.0125) model time 0.4685 (0.4805) loss 2.5161 (3.1437) grad_norm 1.6878 (1.4650) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][70/625] eta 0:04:26 lr 0.001001 wd 0.0500 time 0.4645 (0.4801) data time 0.0010 (0.0109) model time 0.4634 (0.4722) loss 3.2620 (3.1495) grad_norm 1.4102 (1.4783) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][80/625] eta 0:04:22 lr 0.001001 wd 0.0500 time 0.4647 (0.4808) data time 0.0007 (0.0097) model time 0.4640 (0.4766) loss 3.6497 (3.1520) grad_norm 1.1643 (1.4640) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][90/625] eta 0:04:16 lr 0.001001 wd 0.0500 time 0.4633 (0.4789) data time 0.0009 (0.0088) model time 0.4625 (0.4731) loss 3.0108 (3.1638) grad_norm 1.0997 (1.4479) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][100/625] eta 0:04:10 lr 0.001001 wd 0.0500 time 0.4622 (0.4774) data time 0.0011 (0.0080) model time 0.4611 (0.4709) loss 3.5193 (3.1336) grad_norm 1.6169 (1.4438) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][110/625] eta 0:04:05 lr 0.001001 wd 0.0500 time 0.4708 (0.4763) data time 0.0011 (0.0074) model time 0.4697 (0.4697) loss 3.5072 (3.1451) grad_norm 1.2367 (1.4317) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][120/625] eta 0:04:00 lr 0.001001 wd 0.0500 time 0.4608 (0.4753) data time 0.0010 (0.0068) model time 0.4598 
(0.4688) loss 3.3118 (3.1213) grad_norm 1.2520 (1.4448) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][130/625] eta 0:03:54 lr 0.001001 wd 0.0500 time 0.4614 (0.4744) data time 0.0008 (0.0064) model time 0.4605 (0.4680) loss 3.2928 (3.1329) grad_norm 1.5178 (1.4505) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][140/625] eta 0:03:49 lr 0.001001 wd 0.0500 time 0.4723 (0.4737) data time 0.0009 (0.0060) model time 0.4714 (0.4675) loss 3.8264 (3.1235) grad_norm 1.6803 (1.4657) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][150/625] eta 0:03:44 lr 0.001001 wd 0.0500 time 0.4639 (0.4729) data time 0.0008 (0.0057) model time 0.4631 (0.4668) loss 2.0425 (3.1254) grad_norm 1.6171 (1.4972) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][160/625] eta 0:03:39 lr 0.001001 wd 0.0500 time 0.4617 (0.4722) data time 0.0011 (0.0054) model time 0.4606 (0.4662) loss 3.4387 (3.1304) grad_norm 0.9861 (1.4955) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][170/625] eta 0:03:34 lr 0.001000 wd 0.0500 time 0.4670 (0.4716) data time 0.0009 (0.0052) model time 0.4661 (0.4658) loss 2.3836 (3.1362) grad_norm 1.3756 (1.4860) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][180/625] eta 0:03:29 lr 0.001000 wd 0.0500 time 0.4654 (0.4712) data time 0.0013 (0.0049) model time 0.4641 (0.4656) loss 3.2489 (3.1427) grad_norm 0.8324 (1.5015) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][190/625] eta 0:03:25 lr 
0.001000 wd 0.0500 time 0.4768 (0.4714) data time 0.0011 (0.0047) model time 0.4758 (0.4662) loss 2.7857 (3.1366) grad_norm 1.1779 (1.5006) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][200/625] eta 0:03:20 lr 0.001000 wd 0.0500 time 0.4173 (0.4720) data time 0.0009 (0.0046) model time 0.4164 (0.4673) loss 3.8778 (3.1486) grad_norm 1.6571 (1.4946) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][210/625] eta 0:03:15 lr 0.001000 wd 0.0500 time 0.4690 (0.4716) data time 0.0008 (0.0044) model time 0.4682 (0.4670) loss 2.2101 (3.1558) grad_norm 1.2655 (1.4840) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][220/625] eta 0:03:10 lr 0.001000 wd 0.0500 time 0.4639 (0.4712) data time 0.0010 (0.0042) model time 0.4629 (0.4667) loss 3.2875 (3.1669) grad_norm 4.3688 (1.5290) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:23:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][230/625] eta 0:03:05 lr 0.001000 wd 0.0500 time 0.4609 (0.4709) data time 0.0010 (0.0041) model time 0.4599 (0.4664) loss 2.6620 (3.1737) grad_norm 1.4457 (1.5422) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][240/625] eta 0:03:01 lr 0.001000 wd 0.0500 time 0.4616 (0.4705) data time 0.0008 (0.0040) model time 0.4608 (0.4661) loss 4.1181 (3.1740) grad_norm 1.6296 (1.5337) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][250/625] eta 0:02:56 lr 0.001000 wd 0.0500 time 0.4597 (0.4702) data time 0.0008 (0.0039) model time 0.4589 (0.4659) loss 2.1225 (3.1711) grad_norm 1.2614 (1.5225) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:13 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][260/625] eta 0:02:51 lr 0.001000 wd 0.0500 time 0.4670 (0.4701) data time 0.0010 (0.0038) model time 0.4660 (0.4659) loss 2.6539 (3.1740) grad_norm 1.4528 (1.5219) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][270/625] eta 0:02:46 lr 0.001000 wd 0.0500 time 0.4678 (0.4699) data time 0.0008 (0.0037) model time 0.4670 (0.4659) loss 2.4816 (3.1745) grad_norm 2.0667 (1.5356) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][280/625] eta 0:02:42 lr 0.001000 wd 0.0500 time 0.4667 (0.4698) data time 0.0010 (0.0036) model time 0.4657 (0.4658) loss 3.4774 (3.1688) grad_norm 1.4120 (1.5328) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][290/625] eta 0:02:37 lr 0.000999 wd 0.0500 time 0.4643 (0.4696) data time 0.0010 (0.0035) model time 0.4633 (0.4658) loss 3.0351 (3.1665) grad_norm 1.0902 (1.5319) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][300/625] eta 0:02:32 lr 0.000999 wd 0.0500 time 0.4731 (0.4695) data time 0.0010 (0.0034) model time 0.4721 (0.4657) loss 2.8176 (3.1707) grad_norm 1.2712 (1.5215) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][310/625] eta 0:02:27 lr 0.000999 wd 0.0500 time 0.4623 (0.4693) data time 0.0007 (0.0033) model time 0.4616 (0.4656) loss 3.3147 (3.1650) grad_norm 1.7482 (1.5174) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][320/625] eta 0:02:23 lr 0.000999 wd 0.0500 time 0.4585 (0.4691) data time 0.0008 (0.0033) model time 0.4577 (0.4654) loss 3.6860 (3.1626) grad_norm 
1.8449 (1.5210) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][330/625] eta 0:02:18 lr 0.000999 wd 0.0500 time 0.4653 (0.4690) data time 0.0010 (0.0032) model time 0.4643 (0.4654) loss 3.4676 (3.1641) grad_norm 1.2833 (1.5178) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][340/625] eta 0:02:13 lr 0.000999 wd 0.0500 time 0.4687 (0.4699) data time 0.0010 (0.0031) model time 0.4677 (0.4666) loss 3.4136 (3.1611) grad_norm 1.9111 (1.5125) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][350/625] eta 0:02:09 lr 0.000999 wd 0.0500 time 0.4674 (0.4698) data time 0.0010 (0.0031) model time 0.4664 (0.4666) loss 3.1466 (3.1623) grad_norm 1.6009 (1.5176) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][360/625] eta 0:02:04 lr 0.000999 wd 0.0500 time 0.4629 (0.4697) data time 0.0010 (0.0030) model time 0.4618 (0.4665) loss 3.1353 (3.1615) grad_norm 1.5470 (1.5150) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][370/625] eta 0:01:59 lr 0.000999 wd 0.0500 time 0.4620 (0.4695) data time 0.0010 (0.0030) model time 0.4610 (0.4664) loss 3.0690 (3.1523) grad_norm 1.3589 (1.5216) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][380/625] eta 0:01:55 lr 0.000999 wd 0.0500 time 0.4732 (0.4694) data time 0.0008 (0.0029) model time 0.4724 (0.4663) loss 2.7271 (3.1517) grad_norm 1.9340 (1.5212) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][390/625] eta 0:01:50 lr 0.000999 wd 0.0500 time 0.4554 (0.4693) data 
time 0.0010 (0.0029) model time 0.4544 (0.4662) loss 3.1275 (3.1497) grad_norm 1.0485 (1.5168) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][400/625] eta 0:01:45 lr 0.000999 wd 0.0500 time 0.4685 (0.4692) data time 0.0010 (0.0028) model time 0.4675 (0.4661) loss 2.9215 (3.1498) grad_norm 1.3507 (1.5170) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][410/625] eta 0:01:41 lr 0.000999 wd 0.0500 time 0.4679 (0.4701) data time 0.0012 (0.0028) model time 0.4667 (0.4672) loss 3.6440 (3.1525) grad_norm 1.3077 (1.5182) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][420/625] eta 0:01:36 lr 0.000998 wd 0.0500 time 0.4648 (0.4700) data time 0.0009 (0.0027) model time 0.4639 (0.4671) loss 2.9447 (3.1473) grad_norm 0.8717 (1.5191) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][430/625] eta 0:01:31 lr 0.000998 wd 0.0500 time 0.4642 (0.4699) data time 0.0008 (0.0027) model time 0.4634 (0.4671) loss 2.9089 (3.1524) grad_norm 0.9687 (1.5168) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][440/625] eta 0:01:26 lr 0.000998 wd 0.0500 time 0.4623 (0.4697) data time 0.0008 (0.0027) model time 0.4615 (0.4670) loss 3.5291 (3.1518) grad_norm 1.4875 (1.5129) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][450/625] eta 0:01:22 lr 0.000998 wd 0.0500 time 0.4630 (0.4696) data time 0.0011 (0.0026) model time 0.4619 (0.4669) loss 3.4290 (3.1497) grad_norm 1.4544 (1.5160) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [95/300][460/625] eta 0:01:17 lr 0.000998 wd 0.0500 time 0.4631 (0.4695) data time 0.0010 (0.0026) model time 0.4621 (0.4668) loss 2.9024 (3.1464) grad_norm 1.3120 (1.5142) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][470/625] eta 0:01:12 lr 0.000998 wd 0.0500 time 0.4691 (0.4694) data time 0.0009 (0.0026) model time 0.4682 (0.4667) loss 3.8518 (3.1507) grad_norm 1.2946 (1.5122) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:25:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][480/625] eta 0:01:08 lr 0.000998 wd 0.0500 time 0.4648 (0.4693) data time 0.0010 (0.0025) model time 0.4638 (0.4666) loss 3.3377 (3.1445) grad_norm 1.5710 (1.5229) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][490/625] eta 0:01:03 lr 0.000998 wd 0.0500 time 0.4648 (0.4693) data time 0.0010 (0.0025) model time 0.4637 (0.4666) loss 3.4130 (3.1494) grad_norm 1.4453 (1.5249) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][500/625] eta 0:00:58 lr 0.000998 wd 0.0500 time 0.4617 (0.4691) data time 0.0008 (0.0025) model time 0.4609 (0.4665) loss 3.0941 (3.1507) grad_norm 1.2806 (1.5228) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][510/625] eta 0:00:53 lr 0.000998 wd 0.0500 time 0.4684 (0.4691) data time 0.0010 (0.0025) model time 0.4674 (0.4665) loss 3.2403 (3.1507) grad_norm 1.9777 (1.5248) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][520/625] eta 0:00:49 lr 0.000998 wd 0.0500 time 0.4612 (0.4690) data time 0.0010 (0.0024) model time 0.4602 (0.4664) loss 3.2857 (3.1528) grad_norm 1.5464 (1.5248) loss_scale 2048.0000 (2048.0000) mem 
16715MB [2024-08-10 09:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][530/625] eta 0:00:44 lr 0.000998 wd 0.0500 time 0.4904 (0.4696) data time 0.0008 (0.0024) model time 0.4896 (0.4671) loss 3.4015 (3.1520) grad_norm 2.3466 (1.5397) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][540/625] eta 0:00:39 lr 0.000997 wd 0.0500 time 0.4649 (0.4698) data time 0.0010 (0.0024) model time 0.4639 (0.4674) loss 3.4552 (3.1576) grad_norm 1.3189 (1.5415) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][550/625] eta 0:00:35 lr 0.000997 wd 0.0500 time 0.4693 (0.4698) data time 0.0010 (0.0024) model time 0.4684 (0.4674) loss 2.4174 (3.1588) grad_norm 1.0958 (1.5387) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][560/625] eta 0:00:30 lr 0.000997 wd 0.0500 time 0.4657 (0.4700) data time 0.0008 (0.0023) model time 0.4649 (0.4677) loss 4.0515 (3.1599) grad_norm 1.4475 (1.5337) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][570/625] eta 0:00:25 lr 0.000997 wd 0.0500 time 0.4618 (0.4700) data time 0.0008 (0.0023) model time 0.4610 (0.4677) loss 3.6755 (3.1622) grad_norm 1.2234 (1.5346) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][580/625] eta 0:00:21 lr 0.000997 wd 0.0500 time 0.4624 (0.4699) data time 0.0010 (0.0023) model time 0.4614 (0.4676) loss 3.1617 (3.1599) grad_norm 1.7700 (1.5346) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][590/625] eta 0:00:16 lr 0.000997 wd 0.0500 time 0.4569 (0.4699) data time 0.0010 (0.0023) model time 0.4559 (0.4676) loss 
3.2388 (3.1595) grad_norm 1.1211 (1.5335) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][600/625] eta 0:00:11 lr 0.000997 wd 0.0500 time 0.4688 (0.4702) data time 0.0010 (0.0022) model time 0.4678 (0.4680) loss 3.0453 (3.1587) grad_norm 1.3891 (1.5301) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][610/625] eta 0:00:07 lr 0.000997 wd 0.0500 time 0.4555 (0.4701) data time 0.0005 (0.0022) model time 0.4549 (0.4679) loss 3.3896 (3.1571) grad_norm 1.1957 (1.5335) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [95/300][620/625] eta 0:00:02 lr 0.000997 wd 0.0500 time 0.4566 (0.4700) data time 0.0005 (0.0022) model time 0.4561 (0.4678) loss 3.5698 (3.1616) grad_norm 1.2413 (1.5331) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:04 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 95 training takes 0:04:53 [2024-08-10 09:27:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:27:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.5649 (0.5649) Acc@1 87.646 (87.646) Acc@5 98.242 (98.242) Mem 16715MB
[2024-08-10 09:27:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.9541 (0.7188) Acc@1 76.514 (84.020) Acc@5 94.775 (97.021) Mem 16715MB
[2024-08-10 09:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 1.0664 (0.8481) Acc@1 73.242 (80.620) Acc@5 93.359 (95.471) Mem 16715MB
[2024-08-10 09:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.410 Acc@5 95.477
[2024-08-10 09:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.4%
[2024-08-10 09:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.849 (0.849) Loss 0.5068 (0.5068) Acc@1 88.574 (88.574) Acc@5 98.486 (98.486) Mem 16715MB
[2024-08-10 09:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.192) Loss 0.8218 (0.6345) Acc@1 80.225 (85.898) Acc@5 95.703 (97.532) Mem 16715MB
[2024-08-10 09:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 0.9355 (0.7510) Acc@1 75.732 (82.657) Acc@5 94.775 (96.305) Mem 16715MB
[2024-08-10 09:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.370 Acc@5 96.347
[2024-08-10 09:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4%
[2024-08-10 09:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.37%
[2024-08-10 09:27:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 09:27:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 09:27:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][0/625] eta 0:08:44 lr 0.000997 wd 0.0500 time 0.8397 (0.8397) data time 0.4350 (0.4350) model time 0.0000 (0.0000) loss 2.8748 (2.8748) grad_norm 1.1027 (1.1027) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][10/625] eta 0:05:07 lr 0.000997 wd 0.0500 time 0.4642 (0.5006) data time 0.0010 (0.0408) model time 0.0000 (0.0000) loss 3.4758 (3.2505) grad_norm 1.3643 (1.5662) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][20/625] eta 0:04:53 lr 0.000997 wd 0.0500 time 0.4550 (0.4849) data time 0.0010 (0.0219) model time 0.0000 (0.0000) loss 3.5143 (3.1215) grad_norm 1.8696 (1.6127) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][30/625] eta 0:04:44 lr 0.000997 wd 0.0500 time 0.4637 (0.4784) data time 0.0007 (0.0152) model time 0.0000 (0.0000) loss 3.3356 (3.1506) grad_norm 1.1404 (1.6699) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][40/625] eta 0:04:39 lr 0.000996 wd 0.0500 time 0.4639 (0.4777) data time 0.0010 (0.0118) model time 0.0000 (0.0000) loss 2.6823 (3.1949) grad_norm 2.4626 (1.6281) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][50/625] eta 0:04:33 lr 0.000996 wd 0.0500 time 0.4611 (0.4748) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 2.5772 (3.1521) grad_norm 1.3921 (1.5955) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][60/625] eta 0:04:28 lr 0.000996 wd 0.0500 time 0.4634 (0.4746) data time 0.0010 (0.0083) model time 0.4624 (0.4720) loss 3.4615 (3.2026) 
grad_norm 1.3897 (1.5713) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][70/625] eta 0:04:22 lr 0.000996 wd 0.0500 time 0.4812 (0.4734) data time 0.0009 (0.0073) model time 0.4803 (0.4688) loss 3.6057 (3.2439) grad_norm 1.1980 (1.5340) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][80/625] eta 0:04:17 lr 0.000996 wd 0.0500 time 0.4581 (0.4725) data time 0.0010 (0.0065) model time 0.4571 (0.4675) loss 2.9729 (3.2682) grad_norm 1.6645 (1.5401) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:27:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][90/625] eta 0:04:13 lr 0.000996 wd 0.0500 time 0.4658 (0.4735) data time 0.0012 (0.0059) model time 0.4647 (0.4706) loss 3.8474 (3.2580) grad_norm 1.3425 (1.5315) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][100/625] eta 0:04:08 lr 0.000996 wd 0.0500 time 0.4751 (0.4730) data time 0.0008 (0.0054) model time 0.4743 (0.4700) loss 1.8847 (3.2189) grad_norm 1.4010 (1.5180) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][110/625] eta 0:04:04 lr 0.000996 wd 0.0500 time 0.4651 (0.4750) data time 0.0012 (0.0050) model time 0.4639 (0.4740) loss 3.0554 (3.2119) grad_norm 1.1821 (1.5094) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][120/625] eta 0:04:00 lr 0.000996 wd 0.0500 time 0.4587 (0.4753) data time 0.0010 (0.0047) model time 0.4576 (0.4746) loss 3.1463 (3.2261) grad_norm 1.1112 (1.5094) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][130/625] eta 0:03:54 lr 0.000996 wd 0.0500 time 0.4641 
(0.4745) data time 0.0009 (0.0044) model time 0.4633 (0.4732) loss 2.5927 (3.2148) grad_norm 1.5096 (1.5007) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][140/625] eta 0:03:49 lr 0.000996 wd 0.0500 time 0.4676 (0.4739) data time 0.0011 (0.0042) model time 0.4666 (0.4723) loss 3.5020 (3.2046) grad_norm 1.6498 (1.4906) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][150/625] eta 0:03:44 lr 0.000996 wd 0.0500 time 0.4728 (0.4734) data time 0.0008 (0.0040) model time 0.4719 (0.4715) loss 1.8242 (3.1894) grad_norm 2.3746 (1.4941) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][160/625] eta 0:03:39 lr 0.000996 wd 0.0500 time 0.4659 (0.4728) data time 0.0010 (0.0038) model time 0.4649 (0.4708) loss 3.3800 (3.1852) grad_norm 1.4603 (1.4863) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][170/625] eta 0:03:35 lr 0.000995 wd 0.0500 time 0.4626 (0.4737) data time 0.0008 (0.0036) model time 0.4618 (0.4721) loss 3.4245 (3.1861) grad_norm 1.7536 (1.4875) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][180/625] eta 0:03:30 lr 0.000995 wd 0.0500 time 0.4630 (0.4732) data time 0.0011 (0.0035) model time 0.4619 (0.4715) loss 2.4926 (3.1742) grad_norm 1.1246 (1.4775) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][190/625] eta 0:03:25 lr 0.000995 wd 0.0500 time 0.4570 (0.4726) data time 0.0008 (0.0034) model time 0.4562 (0.4707) loss 3.0946 (3.1711) grad_norm 1.3929 (1.4818) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:50 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [96/300][200/625] eta 0:03:20 lr 0.000995 wd 0.0500 time 0.4635 (0.4720) data time 0.0010 (0.0033) model time 0.4625 (0.4700) loss 2.9809 (3.1597) grad_norm 1.4497 (1.4807) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][210/625] eta 0:03:15 lr 0.000995 wd 0.0500 time 0.4618 (0.4717) data time 0.0012 (0.0032) model time 0.4607 (0.4696) loss 3.2779 (3.1610) grad_norm 1.6145 (1.4788) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:28:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][220/625] eta 0:03:10 lr 0.000995 wd 0.0500 time 0.4626 (0.4714) data time 0.0009 (0.0031) model time 0.4617 (0.4693) loss 2.0473 (3.1583) grad_norm 1.1964 (1.4984) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][230/625] eta 0:03:06 lr 0.000995 wd 0.0500 time 0.4631 (0.4711) data time 0.0008 (0.0030) model time 0.4623 (0.4690) loss 3.2077 (3.1654) grad_norm 1.0073 (1.4951) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][240/625] eta 0:03:01 lr 0.000995 wd 0.0500 time 0.4647 (0.4708) data time 0.0007 (0.0029) model time 0.4640 (0.4686) loss 3.0114 (3.1611) grad_norm 1.3821 (1.4827) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][250/625] eta 0:02:56 lr 0.000995 wd 0.0500 time 0.4623 (0.4705) data time 0.0007 (0.0028) model time 0.4615 (0.4683) loss 3.5160 (3.1661) grad_norm 1.3716 (1.4771) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][260/625] eta 0:02:51 lr 0.000995 wd 0.0500 time 0.4644 (0.4702) data time 0.0010 (0.0028) model time 0.4633 (0.4680) loss 3.5043 (3.1646) grad_norm 1.5272 (1.4764) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 09:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][270/625] eta 0:02:46 lr 0.000995 wd 0.0500 time 0.4710 (0.4700) data time 0.0008 (0.0027) model time 0.4702 (0.4678) loss 3.2466 (3.1658) grad_norm 1.0319 (1.4782) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][280/625] eta 0:02:42 lr 0.000995 wd 0.0500 time 0.4689 (0.4701) data time 0.0010 (0.0026) model time 0.4678 (0.4680) loss 3.2566 (3.1626) grad_norm 1.2355 (1.4808) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][290/625] eta 0:02:37 lr 0.000994 wd 0.0500 time 0.4646 (0.4700) data time 0.0008 (0.0026) model time 0.4638 (0.4679) loss 3.4616 (3.1543) grad_norm 1.2991 (1.4861) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][300/625] eta 0:02:32 lr 0.000994 wd 0.0500 time 0.4621 (0.4698) data time 0.0009 (0.0025) model time 0.4612 (0.4677) loss 3.0094 (3.1449) grad_norm 1.2918 (1.4915) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][310/625] eta 0:02:27 lr 0.000994 wd 0.0500 time 0.4627 (0.4696) data time 0.0008 (0.0025) model time 0.4619 (0.4676) loss 2.6900 (3.1454) grad_norm 1.1810 (1.4905) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][320/625] eta 0:02:23 lr 0.000994 wd 0.0500 time 0.4676 (0.4694) data time 0.0007 (0.0024) model time 0.4669 (0.4674) loss 2.9200 (3.1473) grad_norm 1.3657 (1.5016) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][330/625] eta 0:02:18 lr 0.000994 wd 0.0500 time 0.4607 (0.4699) data time 0.0010 (0.0024) model time 
0.4597 (0.4679) loss 3.7136 (3.1422) grad_norm 1.1064 (1.4957) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][340/625] eta 0:02:13 lr 0.000994 wd 0.0500 time 0.4563 (0.4695) data time 0.0010 (0.0024) model time 0.4554 (0.4675) loss 2.7991 (3.1444) grad_norm 2.0961 (1.4991) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][350/625] eta 0:02:09 lr 0.000994 wd 0.0500 time 0.4656 (0.4693) data time 0.0012 (0.0023) model time 0.4644 (0.4673) loss 3.5301 (3.1475) grad_norm 1.1960 (1.5040) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][360/625] eta 0:02:04 lr 0.000994 wd 0.0500 time 0.4634 (0.4692) data time 0.0010 (0.0023) model time 0.4624 (0.4672) loss 3.0955 (3.1488) grad_norm 1.0793 (1.5033) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][370/625] eta 0:01:59 lr 0.000994 wd 0.0500 time 0.4644 (0.4691) data time 0.0010 (0.0023) model time 0.4634 (0.4671) loss 2.5871 (3.1453) grad_norm 1.6247 (1.5030) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][380/625] eta 0:01:54 lr 0.000994 wd 0.0500 time 0.4615 (0.4689) data time 0.0008 (0.0022) model time 0.4607 (0.4669) loss 3.7695 (3.1510) grad_norm 1.4703 (1.4997) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][390/625] eta 0:01:50 lr 0.000994 wd 0.0500 time 0.4781 (0.4692) data time 0.0008 (0.0022) model time 0.4774 (0.4673) loss 3.6872 (3.1545) grad_norm 1.2283 (1.4958) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][400/625] eta 0:01:45 
lr 0.000994 wd 0.0500 time 0.4550 (0.4691) data time 0.0010 (0.0022) model time 0.4540 (0.4672) loss 2.9850 (3.1512) grad_norm 1.8136 (1.5004) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][410/625] eta 0:01:40 lr 0.000994 wd 0.0500 time 0.4626 (0.4690) data time 0.0011 (0.0021) model time 0.4615 (0.4670) loss 3.2350 (3.1498) grad_norm 1.4777 (1.5018) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][420/625] eta 0:01:36 lr 0.000993 wd 0.0500 time 0.4629 (0.4688) data time 0.0011 (0.0021) model time 0.4618 (0.4669) loss 3.3449 (3.1515) grad_norm 1.3271 (1.4985) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][430/625] eta 0:01:31 lr 0.000993 wd 0.0500 time 0.4646 (0.4688) data time 0.0010 (0.0021) model time 0.4636 (0.4669) loss 2.1521 (3.1476) grad_norm 1.2482 (1.4933) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][440/625] eta 0:01:26 lr 0.000993 wd 0.0500 time 0.4763 (0.4687) data time 0.0010 (0.0021) model time 0.4753 (0.4669) loss 3.3603 (3.1444) grad_norm 1.4595 (1.4997) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][450/625] eta 0:01:22 lr 0.000993 wd 0.0500 time 0.4590 (0.4691) data time 0.0009 (0.0020) model time 0.4581 (0.4672) loss 3.3715 (3.1459) grad_norm 1.2906 (1.5024) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][460/625] eta 0:01:17 lr 0.000993 wd 0.0500 time 0.4588 (0.4689) data time 0.0008 (0.0020) model time 0.4579 (0.4671) loss 2.2754 (3.1423) grad_norm 1.4105 (1.5020) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:30:56 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][470/625] eta 0:01:12 lr 0.000993 wd 0.0500 time 0.4633 (0.4692) data time 0.0008 (0.0020) model time 0.4625 (0.4674) loss 3.4524 (3.1481) grad_norm 3.1887 (1.5047) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][480/625] eta 0:01:08 lr 0.000993 wd 0.0500 time 0.4673 (0.4690) data time 0.0010 (0.0020) model time 0.4663 (0.4673) loss 3.4680 (3.1499) grad_norm 1.4598 (1.5042) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][490/625] eta 0:01:03 lr 0.000993 wd 0.0500 time 0.4654 (0.4689) data time 0.0008 (0.0020) model time 0.4646 (0.4672) loss 3.6899 (3.1483) grad_norm 1.1434 (1.4975) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][500/625] eta 0:00:58 lr 0.000993 wd 0.0500 time 0.4636 (0.4692) data time 0.0011 (0.0020) model time 0.4626 (0.4675) loss 3.7161 (3.1485) grad_norm 2.3350 (1.4966) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][510/625] eta 0:00:53 lr 0.000993 wd 0.0500 time 0.4633 (0.4691) data time 0.0009 (0.0019) model time 0.4624 (0.4674) loss 3.3619 (3.1438) grad_norm 1.6498 (1.4985) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][520/625] eta 0:00:49 lr 0.000993 wd 0.0500 time 0.4770 (0.4691) data time 0.0008 (0.0019) model time 0.4762 (0.4674) loss 3.8440 (3.1480) grad_norm 1.5168 (1.4998) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][530/625] eta 0:00:44 lr 0.000993 wd 0.0500 time 0.4637 (0.4694) data time 0.0008 (0.0019) model time 0.4630 (0.4678) loss 2.7344 (3.1456) grad_norm 
2.1267 (1.4998) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][540/625] eta 0:00:39 lr 0.000992 wd 0.0500 time 0.4523 (0.4693) data time 0.0008 (0.0019) model time 0.4515 (0.4676) loss 3.4694 (3.1471) grad_norm 1.4585 (1.4957) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][550/625] eta 0:00:35 lr 0.000992 wd 0.0500 time 0.4480 (0.4691) data time 0.0009 (0.0019) model time 0.4471 (0.4675) loss 2.3704 (3.1476) grad_norm 2.0409 (1.5013) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][560/625] eta 0:00:30 lr 0.000992 wd 0.0500 time 0.4697 (0.4690) data time 0.0010 (0.0019) model time 0.4687 (0.4674) loss 3.5117 (3.1483) grad_norm 1.3749 (1.5029) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][570/625] eta 0:00:25 lr 0.000992 wd 0.0500 time 0.4709 (0.4690) data time 0.0009 (0.0018) model time 0.4700 (0.4673) loss 2.4607 (3.1458) grad_norm 1.1957 (1.5041) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][580/625] eta 0:00:21 lr 0.000992 wd 0.0500 time 0.4658 (0.4689) data time 0.0010 (0.0018) model time 0.4648 (0.4673) loss 3.2154 (3.1478) grad_norm 1.4293 (1.4999) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][590/625] eta 0:00:16 lr 0.000992 wd 0.0500 time 0.4602 (0.4688) data time 0.0008 (0.0018) model time 0.4594 (0.4672) loss 3.9941 (3.1474) grad_norm 1.2960 (1.4999) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:31:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][600/625] eta 0:00:11 lr 0.000992 wd 0.0500 time 0.4656 (0.4688) data 
time 0.0011 (0.0018) model time 0.4645 (0.4671) loss 2.8827 (3.1455) grad_norm 1.7674 (1.5058) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][610/625] eta 0:00:07 lr 0.000992 wd 0.0500 time 0.4614 (0.4687) data time 0.0008 (0.0018) model time 0.4606 (0.4670) loss 3.1278 (3.1493) grad_norm 1.0510 (1.5070) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [96/300][620/625] eta 0:00:02 lr 0.000992 wd 0.0500 time 0.4599 (0.4685) data time 0.0007 (0.0018) model time 0.4592 (0.4669) loss 2.5528 (3.1481) grad_norm 1.5822 (1.5115) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 96 training takes 0:04:52 [2024-08-10 09:32:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:32:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.5762 (0.5762) Acc@1 87.402 (87.402) Acc@5 98.193 (98.193) Mem 16715MB [2024-08-10 09:32:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.9751 (0.7202) Acc@1 76.807 (83.984) Acc@5 94.678 (97.155) Mem 16715MB [2024-08-10 09:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 1.0654 (0.8565) Acc@1 74.414 (80.585) Acc@5 93.555 (95.568) Mem 16715MB [2024-08-10 09:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.488 Acc@5 95.593 [2024-08-10 09:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-08-10 09:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.49% [2024-08-10 09:32:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 09:32:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 09:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.5073 (0.5073) Acc@1 88.623 (88.623) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:32:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.160) Loss 0.8208 (0.6337) Acc@1 80.322 (85.871) Acc@5 95.703 (97.532) Mem 16715MB [2024-08-10 09:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9321 (0.7500) Acc@1 75.928 (82.671) Acc@5 94.775 (96.308) Mem 16715MB [2024-08-10 09:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.386 Acc@5 96.347 [2024-08-10 09:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 09:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.39% [2024-08-10 09:32:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:32:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 09:32:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][0/625] eta 0:08:29 lr 0.000992 wd 0.0500 time 0.8155 (0.8155) data time 0.4096 (0.4096) model time 0.0000 (0.0000) loss 3.2789 (3.2789) grad_norm 1.6761 (1.6761) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][10/625] eta 0:05:06 lr 0.000992 wd 0.0500 time 0.4645 (0.4984) data time 0.0008 (0.0383) model time 0.0000 (0.0000) loss 3.2442 (3.3425) grad_norm 1.9314 (1.5135) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][20/625] eta 0:04:52 lr 0.000992 wd 0.0500 time 0.4622 (0.4836) data time 0.0010 (0.0205) model time 0.0000 (0.0000) loss 3.0915 (3.3024) grad_norm 0.9718 (1.4140) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][30/625] eta 0:04:51 lr 0.000992 wd 0.0500 time 0.4659 (0.4893) data time 0.0010 (0.0143) model time 0.0000 (0.0000) loss 3.4332 (3.2393) grad_norm 2.0499 (1.5506) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][40/625] eta 0:04:42 lr 0.000991 wd 0.0500 time 0.4611 (0.4832) data time 0.0008 (0.0110) model time 0.0000 (0.0000) loss 2.9356 (3.1760) grad_norm 1.2721 (1.5175) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][50/625] eta 0:04:36 lr 0.000991 wd 0.0500 time 0.4660 (0.4804) data time 0.0007 (0.0091) model time 0.0000 (0.0000) loss 3.5840 (3.1099) grad_norm 2.0703 (1.4823) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][60/625] eta 0:04:29 lr 0.000991 wd 0.0500 time 0.4610 (0.4775) data time 0.0007 (0.0078) model time 0.4602 (0.4615) loss 3.9970 (3.1029) 
grad_norm 1.0075 (1.4772) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][70/625] eta 0:04:23 lr 0.000991 wd 0.0500 time 0.4672 (0.4754) data time 0.0008 (0.0069) model time 0.4664 (0.4616) loss 2.6467 (3.0879) grad_norm 1.8679 (1.4880) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:32:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][80/625] eta 0:04:19 lr 0.000991 wd 0.0500 time 0.4632 (0.4764) data time 0.0010 (0.0061) model time 0.4621 (0.4686) loss 2.3336 (3.0613) grad_norm 1.5718 (1.5210) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:33:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][90/625] eta 0:04:14 lr 0.000991 wd 0.0500 time 0.4688 (0.4752) data time 0.0008 (0.0057) model time 0.4680 (0.4671) loss 2.6968 (3.0502) grad_norm 1.3339 (1.5064) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][100/625] eta 0:04:08 lr 0.000991 wd 0.0500 time 0.4624 (0.4743) data time 0.0008 (0.0053) model time 0.4616 (0.4666) loss 2.4666 (3.0466) grad_norm 2.2398 (1.5183) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:33:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][110/625] eta 0:04:04 lr 0.000991 wd 0.0500 time 0.4609 (0.4755) data time 0.0007 (0.0049) model time 0.4602 (0.4700) loss 3.2864 (3.0706) grad_norm 1.1336 (1.5165) loss_scale 4096.0000 (2158.7027) mem 16715MB [2024-08-10 09:33:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][120/625] eta 0:03:59 lr 0.000991 wd 0.0500 time 0.4632 (0.4745) data time 0.0010 (0.0046) model time 0.4622 (0.4689) loss 2.8347 (3.0700) grad_norm 1.4424 (1.5043) loss_scale 4096.0000 (2318.8099) mem 16715MB [2024-08-10 09:33:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][130/625] eta 0:03:54 lr 0.000991 wd 0.0500 time 0.4694 
(0.4737) data time 0.0009 (0.0043) model time 0.4685 (0.4682) loss 3.6475 (3.0585) grad_norm 1.9407 (1.5027) loss_scale 4096.0000 (2454.4733) mem 16715MB [2024-08-10 09:33:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][140/625] eta 0:03:49 lr 0.000991 wd 0.0500 time 0.4721 (0.4742) data time 0.0008 (0.0041) model time 0.4713 (0.4693) loss 2.3639 (3.0388) grad_norm 1.6765 (1.4999) loss_scale 4096.0000 (2570.8936) mem 16715MB [2024-08-10 09:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][150/625] eta 0:03:45 lr 0.000991 wd 0.0500 time 0.4642 (0.4741) data time 0.0008 (0.0039) model time 0.4635 (0.4697) loss 3.9455 (3.0391) grad_norm 2.4276 (1.5494) loss_scale 4096.0000 (2671.8940) mem 16715MB [2024-08-10 09:33:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][160/625] eta 0:03:40 lr 0.000990 wd 0.0500 time 0.4649 (0.4735) data time 0.0008 (0.0037) model time 0.4641 (0.4691) loss 3.0927 (3.0351) grad_norm 1.9057 (1.5674) loss_scale 4096.0000 (2760.3478) mem 16715MB [2024-08-10 09:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][170/625] eta 0:03:35 lr 0.000990 wd 0.0500 time 0.4637 (0.4729) data time 0.0008 (0.0036) model time 0.4629 (0.4685) loss 2.6705 (3.0403) grad_norm 1.1246 (1.5602) loss_scale 4096.0000 (2838.4561) mem 16715MB [2024-08-10 09:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][180/625] eta 0:03:30 lr 0.000990 wd 0.0500 time 0.4652 (0.4724) data time 0.0008 (0.0034) model time 0.4645 (0.4681) loss 3.1562 (3.0435) grad_norm 1.1648 (1.5598) loss_scale 4096.0000 (2907.9337) mem 16715MB [2024-08-10 09:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][190/625] eta 0:03:25 lr 0.000990 wd 0.0500 time 0.4708 (0.4720) data time 0.0010 (0.0033) model time 0.4698 (0.4677) loss 2.8165 (3.0489) grad_norm 1.9891 (1.5639) loss_scale 4096.0000 (2970.1361) mem 16715MB [2024-08-10 09:33:55 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [97/300][200/625] eta 0:03:20 lr 0.000990 wd 0.0500 time 0.4642 (0.4716) data time 0.0010 (0.0032) model time 0.4632 (0.4674) loss 3.4274 (3.0585) grad_norm 1.5645 (1.5539) loss_scale 4096.0000 (3026.1493) mem 16715MB [2024-08-10 09:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][210/625] eta 0:03:15 lr 0.000990 wd 0.0500 time 0.4632 (0.4711) data time 0.0008 (0.0031) model time 0.4624 (0.4670) loss 3.9384 (3.0669) grad_norm 1.3673 (1.5464) loss_scale 4096.0000 (3076.8531) mem 16715MB [2024-08-10 09:34:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][220/625] eta 0:03:11 lr 0.000990 wd 0.0500 time 0.4629 (0.4718) data time 0.0010 (0.0030) model time 0.4620 (0.4681) loss 3.4431 (3.0641) grad_norm 1.3824 (1.5470) loss_scale 4096.0000 (3122.9683) mem 16715MB [2024-08-10 09:34:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][230/625] eta 0:03:06 lr 0.000990 wd 0.0500 time 0.4697 (0.4716) data time 0.0009 (0.0029) model time 0.4689 (0.4680) loss 3.5408 (3.0660) grad_norm 1.6346 (1.5546) loss_scale 4096.0000 (3165.0909) mem 16715MB [2024-08-10 09:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][240/625] eta 0:03:01 lr 0.000990 wd 0.0500 time 0.4655 (0.4714) data time 0.0008 (0.0028) model time 0.4647 (0.4678) loss 3.2074 (3.0701) grad_norm 2.0820 (1.5659) loss_scale 4096.0000 (3203.7178) mem 16715MB [2024-08-10 09:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][250/625] eta 0:02:56 lr 0.000990 wd 0.0500 time 0.4658 (0.4713) data time 0.0011 (0.0028) model time 0.4647 (0.4678) loss 3.5394 (3.0835) grad_norm 1.3658 (1.5662) loss_scale 4096.0000 (3239.2669) mem 16715MB [2024-08-10 09:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][260/625] eta 0:02:51 lr 0.000990 wd 0.0500 time 0.4605 (0.4711) data time 0.0008 (0.0027) model time 0.4597 (0.4676) loss 3.5308 (3.0858) grad_norm 1.5045 (1.5629) loss_scale 4096.0000 
(3272.0920) mem 16715MB [2024-08-10 09:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][270/625] eta 0:02:47 lr 0.000990 wd 0.0500 time 0.4674 (0.4708) data time 0.0008 (0.0026) model time 0.4666 (0.4674) loss 3.7224 (3.0854) grad_norm 1.6832 (1.5591) loss_scale 4096.0000 (3302.4945) mem 16715MB [2024-08-10 09:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][280/625] eta 0:02:42 lr 0.000989 wd 0.0500 time 0.4607 (0.4705) data time 0.0009 (0.0026) model time 0.4599 (0.4672) loss 2.3508 (3.0813) grad_norm 1.3920 (1.5545) loss_scale 4096.0000 (3330.7331) mem 16715MB [2024-08-10 09:34:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][290/625] eta 0:02:37 lr 0.000989 wd 0.0500 time 0.4608 (0.4703) data time 0.0009 (0.0025) model time 0.4599 (0.4670) loss 3.2731 (3.0949) grad_norm 1.7936 (1.5577) loss_scale 4096.0000 (3357.0309) mem 16715MB [2024-08-10 09:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][300/625] eta 0:02:32 lr 0.000989 wd 0.0500 time 0.4655 (0.4701) data time 0.0010 (0.0025) model time 0.4645 (0.4669) loss 3.2868 (3.0894) grad_norm 1.5982 (1.5599) loss_scale 4096.0000 (3381.5814) mem 16715MB [2024-08-10 09:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][310/625] eta 0:02:28 lr 0.000989 wd 0.0500 time 0.4615 (0.4700) data time 0.0009 (0.0024) model time 0.4606 (0.4668) loss 2.3408 (3.0889) grad_norm 1.2007 (1.5512) loss_scale 4096.0000 (3404.5531) mem 16715MB [2024-08-10 09:34:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][320/625] eta 0:02:23 lr 0.000989 wd 0.0500 time 0.4602 (0.4698) data time 0.0011 (0.0024) model time 0.4592 (0.4667) loss 3.1549 (3.0803) grad_norm 1.7681 (1.5454) loss_scale 4096.0000 (3426.0935) mem 16715MB [2024-08-10 09:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][330/625] eta 0:02:18 lr 0.000989 wd 0.0500 time 0.4650 (0.4697) data time 0.0007 (0.0024) model time 
0.4643 (0.4666) loss 3.2353 (3.0854) grad_norm 1.5295 (1.5446) loss_scale 4096.0000 (3446.3323) mem 16715MB [2024-08-10 09:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][340/625] eta 0:02:13 lr 0.000989 wd 0.0500 time 0.4598 (0.4694) data time 0.0008 (0.0023) model time 0.4590 (0.4663) loss 2.8892 (3.0819) grad_norm 1.2474 (1.5439) loss_scale 4096.0000 (3465.3842) mem 16715MB [2024-08-10 09:35:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][350/625] eta 0:02:09 lr 0.000989 wd 0.0500 time 0.4641 (0.4693) data time 0.0009 (0.0023) model time 0.4632 (0.4663) loss 2.4445 (3.0810) grad_norm 1.3198 (1.5400) loss_scale 4096.0000 (3483.3504) mem 16715MB [2024-08-10 09:35:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][360/625] eta 0:02:04 lr 0.000989 wd 0.0500 time 0.4645 (0.4697) data time 0.0009 (0.0023) model time 0.4635 (0.4668) loss 3.6215 (3.0853) grad_norm 1.6452 (1.5418) loss_scale 4096.0000 (3500.3213) mem 16715MB [2024-08-10 09:35:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][370/625] eta 0:01:59 lr 0.000989 wd 0.0500 time 0.4649 (0.4695) data time 0.0007 (0.0022) model time 0.4642 (0.4667) loss 3.6722 (3.0834) grad_norm 1.1587 (1.5419) loss_scale 4096.0000 (3516.3774) mem 16715MB [2024-08-10 09:35:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][380/625] eta 0:01:55 lr 0.000989 wd 0.0500 time 0.4634 (0.4695) data time 0.0008 (0.0022) model time 0.4626 (0.4667) loss 3.5882 (3.0813) grad_norm 1.1298 (1.5373) loss_scale 4096.0000 (3531.5906) mem 16715MB [2024-08-10 09:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][390/625] eta 0:01:50 lr 0.000989 wd 0.0500 time 0.4638 (0.4694) data time 0.0010 (0.0022) model time 0.4627 (0.4666) loss 3.1194 (3.0789) grad_norm 1.4865 (1.5548) loss_scale 4096.0000 (3546.0256) mem 16715MB [2024-08-10 09:35:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][400/625] eta 0:01:45 
lr 0.000989 wd 0.0500 time 0.4667 (0.4694) data time 0.0010 (0.0021) model time 0.4657 (0.4666) loss 2.2911 (3.0754) grad_norm 1.0354 (1.5517) loss_scale 4096.0000 (3559.7406) mem 16715MB [2024-08-10 09:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][410/625] eta 0:01:40 lr 0.000988 wd 0.0500 time 0.4647 (0.4693) data time 0.0009 (0.0021) model time 0.4638 (0.4666) loss 3.9927 (3.0796) grad_norm 2.8327 (1.5536) loss_scale 4096.0000 (3572.7883) mem 16715MB [2024-08-10 09:35:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][420/625] eta 0:01:36 lr 0.000988 wd 0.0500 time 0.4654 (0.4697) data time 0.0008 (0.0021) model time 0.4646 (0.4671) loss 3.0988 (3.0850) grad_norm 1.3337 (1.5557) loss_scale 4096.0000 (3585.2162) mem 16715MB [2024-08-10 09:35:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][430/625] eta 0:01:31 lr 0.000988 wd 0.0500 time 0.4645 (0.4696) data time 0.0008 (0.0021) model time 0.4637 (0.4670) loss 2.7682 (3.0807) grad_norm 1.0369 (1.5499) loss_scale 4096.0000 (3597.0673) mem 16715MB [2024-08-10 09:35:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][440/625] eta 0:01:27 lr 0.000988 wd 0.0500 time 0.4643 (0.4703) data time 0.0008 (0.0020) model time 0.4636 (0.4679) loss 3.6967 (3.0809) grad_norm 2.2832 (1.5525) loss_scale 4096.0000 (3608.3810) mem 16715MB [2024-08-10 09:35:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][450/625] eta 0:01:22 lr 0.000988 wd 0.0500 time 0.4695 (0.4703) data time 0.0009 (0.0020) model time 0.4686 (0.4678) loss 3.4656 (3.0828) grad_norm 1.3081 (1.5545) loss_scale 4096.0000 (3619.1929) mem 16715MB [2024-08-10 09:35:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][460/625] eta 0:01:17 lr 0.000988 wd 0.0500 time 0.4735 (0.4702) data time 0.0009 (0.0020) model time 0.4726 (0.4678) loss 2.8484 (3.0854) grad_norm 1.9135 (1.5553) loss_scale 4096.0000 (3629.5358) mem 16715MB [2024-08-10 09:36:01 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][470/625] eta 0:01:12 lr 0.000988 wd 0.0500 time 0.4600 (0.4701) data time 0.0013 (0.0020) model time 0.4587 (0.4677) loss 3.4056 (3.0912) grad_norm 1.3648 (1.5516) loss_scale 4096.0000 (3639.4395) mem 16715MB [2024-08-10 09:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][480/625] eta 0:01:08 lr 0.000988 wd 0.0500 time 0.4611 (0.4700) data time 0.0009 (0.0020) model time 0.4603 (0.4676) loss 2.9904 (3.0913) grad_norm 1.6551 (1.5502) loss_scale 4096.0000 (3648.9314) mem 16715MB [2024-08-10 09:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][490/625] eta 0:01:03 lr 0.000988 wd 0.0500 time 0.4679 (0.4699) data time 0.0009 (0.0019) model time 0.4670 (0.4676) loss 3.7297 (3.0911) grad_norm 1.1283 (1.5511) loss_scale 4096.0000 (3658.0367) mem 16715MB [2024-08-10 09:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][500/625] eta 0:00:58 lr 0.000988 wd 0.0500 time 0.4687 (0.4698) data time 0.0009 (0.0019) model time 0.4677 (0.4674) loss 2.5943 (3.0902) grad_norm 2.3654 (1.5487) loss_scale 4096.0000 (3666.7784) mem 16715MB [2024-08-10 09:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][510/625] eta 0:00:54 lr 0.000988 wd 0.0500 time 0.4617 (0.4700) data time 0.0011 (0.0019) model time 0.4606 (0.4677) loss 2.8675 (3.0908) grad_norm 1.9852 (1.5476) loss_scale 4096.0000 (3675.1781) mem 16715MB [2024-08-10 09:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][520/625] eta 0:00:49 lr 0.000988 wd 0.0500 time 0.4884 (0.4700) data time 0.0008 (0.0019) model time 0.4877 (0.4677) loss 3.7532 (3.0943) grad_norm 2.3874 (1.5544) loss_scale 4096.0000 (3683.2553) mem 16715MB [2024-08-10 09:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][530/625] eta 0:00:44 lr 0.000987 wd 0.0500 time 0.4611 (0.4699) data time 0.0008 (0.0019) model time 0.4602 (0.4676) loss 2.5599 (3.0964) grad_norm 
1.5688 (1.5516) loss_scale 4096.0000 (3691.0282) mem 16715MB [2024-08-10 09:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][540/625] eta 0:00:39 lr 0.000987 wd 0.0500 time 0.4681 (0.4698) data time 0.0010 (0.0019) model time 0.4670 (0.4676) loss 2.9175 (3.0980) grad_norm 1.4079 (1.5533) loss_scale 4096.0000 (3698.5139) mem 16715MB [2024-08-10 09:36:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][550/625] eta 0:00:35 lr 0.000987 wd 0.0500 time 0.4639 (0.4697) data time 0.0008 (0.0019) model time 0.4631 (0.4675) loss 3.6415 (3.0997) grad_norm 1.7003 (1.5534) loss_scale 4096.0000 (3705.7278) mem 16715MB [2024-08-10 09:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][560/625] eta 0:00:30 lr 0.000987 wd 0.0500 time 0.4581 (0.4696) data time 0.0008 (0.0018) model time 0.4573 (0.4674) loss 3.4807 (3.1019) grad_norm 1.2884 (1.5497) loss_scale 4096.0000 (3712.6845) mem 16715MB [2024-08-10 09:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][570/625] eta 0:00:25 lr 0.000987 wd 0.0500 time 0.4686 (0.4695) data time 0.0007 (0.0018) model time 0.4679 (0.4673) loss 3.2912 (3.1026) grad_norm 1.4762 (1.5526) loss_scale 4096.0000 (3719.3975) mem 16715MB [2024-08-10 09:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][580/625] eta 0:00:21 lr 0.000987 wd 0.0500 time 0.4638 (0.4693) data time 0.0010 (0.0018) model time 0.4628 (0.4672) loss 3.3289 (3.1055) grad_norm 1.4539 (1.5495) loss_scale 4096.0000 (3725.8795) mem 16715MB [2024-08-10 09:36:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][590/625] eta 0:00:16 lr 0.000987 wd 0.0500 time 0.4653 (0.4696) data time 0.0009 (0.0018) model time 0.4644 (0.4675) loss 3.1255 (3.1055) grad_norm 1.1243 (1.5427) loss_scale 4096.0000 (3732.1421) mem 16715MB [2024-08-10 09:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][600/625] eta 0:00:11 lr 0.000987 wd 0.0500 time 0.4667 (0.4695) data 
time 0.0010 (0.0018) model time 0.4657 (0.4674) loss 3.3250 (3.1067) grad_norm 1.0928 (1.5420) loss_scale 4096.0000 (3738.1963) mem 16715MB [2024-08-10 09:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][610/625] eta 0:00:07 lr 0.000987 wd 0.0500 time 0.4619 (0.4697) data time 0.0005 (0.0018) model time 0.4614 (0.4676) loss 2.4814 (3.1049) grad_norm 1.3514 (1.5389) loss_scale 4096.0000 (3744.0524) mem 16715MB [2024-08-10 09:37:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [97/300][620/625] eta 0:00:02 lr 0.000987 wd 0.0500 time 0.4615 (0.4696) data time 0.0005 (0.0018) model time 0.4610 (0.4675) loss 3.1214 (3.1110) grad_norm 1.3208 (1.5368) loss_scale 4096.0000 (3749.7198) mem 16715MB [2024-08-10 09:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 97 training takes 0:04:53 [2024-08-10 09:37:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:37:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5938 (0.5938) Acc@1 87.793 (87.793) Acc@5 97.998 (97.998) Mem 16715MB [2024-08-10 09:37:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.9590 (0.7220) Acc@1 77.441 (84.286) Acc@5 95.068 (97.203) Mem 16715MB [2024-08-10 09:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0762 (0.8526) Acc@1 74.268 (81.006) Acc@5 93.311 (95.652) Mem 16715MB [2024-08-10 09:37:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.738 Acc@5 95.661 [2024-08-10 09:37:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-10 09:37:19 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.74% [2024-08-10 09:37:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 09:37:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 09:37:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.5063 (0.5063) Acc@1 88.623 (88.623) Acc@5 98.438 (98.438) Mem 16715MB [2024-08-10 09:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.8198 (0.6325) Acc@1 80.225 (85.875) Acc@5 95.752 (97.528) Mem 16715MB [2024-08-10 09:37:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9316 (0.7486) Acc@1 76.221 (82.706) Acc@5 94.775 (96.301) Mem 16715MB [2024-08-10 09:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.420 Acc@5 96.339 [2024-08-10 09:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 09:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.42% [2024-08-10 09:37:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:37:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 09:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][0/625] eta 0:08:51 lr 0.000987 wd 0.0500 time 0.8499 (0.8499) data time 0.4323 (0.4323) model time 0.0000 (0.0000) loss 3.5568 (3.5568) grad_norm 1.6677 (1.6677) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][10/625] eta 0:05:19 lr 0.000987 wd 0.0500 time 0.4666 (0.5191) data time 0.0008 (0.0403) model time 0.0000 (0.0000) loss 3.3897 (3.2579) grad_norm 1.0410 (1.4677) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][20/625] eta 0:04:58 lr 0.000987 wd 0.0500 time 0.4620 (0.4930) data time 0.0011 (0.0216) model time 0.0000 (0.0000) loss 3.2170 (3.2367) grad_norm 1.9704 (1.4516) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][30/625] eta 0:04:48 lr 0.000986 wd 0.0500 time 0.4716 (0.4848) data time 0.0009 (0.0150) model time 0.0000 (0.0000) loss 3.2274 (3.1948) grad_norm 1.2380 (1.6583) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][40/625] eta 0:04:41 lr 0.000986 wd 0.0500 time 0.4670 (0.4804) data time 0.0010 (0.0116) model time 0.0000 (0.0000) loss 3.3707 (3.2412) grad_norm 1.5630 (1.6303) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][50/625] eta 0:04:36 lr 0.000986 wd 0.0500 time 0.6101 (0.4807) data time 0.0010 (0.0095) model time 0.0000 (0.0000) loss 2.8427 (3.2515) grad_norm 2.0291 (1.5863) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][60/625] eta 0:04:29 lr 0.000986 wd 0.0500 time 0.4687 (0.4775) data time 0.0010 (0.0082) model time 0.4676 (0.4600) loss 2.6330 (3.2140) 
grad_norm 1.2637 (1.5343) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][70/625] eta 0:04:23 lr 0.000986 wd 0.0500 time 0.4619 (0.4757) data time 0.0008 (0.0071) model time 0.4611 (0.4617) loss 2.9843 (3.1679) grad_norm 1.9352 (1.5274) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][80/625] eta 0:04:18 lr 0.000986 wd 0.0500 time 0.4651 (0.4743) data time 0.0008 (0.0064) model time 0.4643 (0.4623) loss 3.8996 (3.1616) grad_norm 1.7605 (1.5368) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][90/625] eta 0:04:13 lr 0.000986 wd 0.0500 time 0.4677 (0.4734) data time 0.0011 (0.0058) model time 0.4666 (0.4630) loss 3.1311 (3.1583) grad_norm 1.3792 (1.5429) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][100/625] eta 0:04:08 lr 0.000986 wd 0.0500 time 0.4671 (0.4726) data time 0.0008 (0.0054) model time 0.4663 (0.4633) loss 2.6616 (3.1451) grad_norm 1.1389 (1.5703) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][110/625] eta 0:04:03 lr 0.000986 wd 0.0500 time 0.4683 (0.4721) data time 0.0010 (0.0050) model time 0.4673 (0.4637) loss 3.0684 (3.1282) grad_norm 1.2046 (1.5878) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][120/625] eta 0:03:58 lr 0.000986 wd 0.0500 time 0.4554 (0.4716) data time 0.0007 (0.0046) model time 0.4547 (0.4639) loss 2.5596 (3.1174) grad_norm 1.1579 (1.5686) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][130/625] eta 0:03:53 lr 0.000986 wd 0.0500 time 0.4720 
(0.4726) data time 0.0010 (0.0044) model time 0.4710 (0.4664) loss 3.1404 (3.1112) grad_norm 1.8820 (1.5497) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][140/625] eta 0:03:49 lr 0.000986 wd 0.0500 time 0.4677 (0.4722) data time 0.0012 (0.0041) model time 0.4665 (0.4663) loss 2.7108 (3.1279) grad_norm 2.0724 (1.5806) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][150/625] eta 0:03:44 lr 0.000985 wd 0.0500 time 0.4673 (0.4717) data time 0.0008 (0.0039) model time 0.4666 (0.4661) loss 2.6830 (3.1313) grad_norm 0.8554 (1.5724) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][160/625] eta 0:03:39 lr 0.000985 wd 0.0500 time 0.4634 (0.4712) data time 0.0010 (0.0038) model time 0.4624 (0.4657) loss 3.5843 (3.1337) grad_norm 1.6475 (1.5786) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][170/625] eta 0:03:34 lr 0.000985 wd 0.0500 time 0.4650 (0.4707) data time 0.0008 (0.0036) model time 0.4642 (0.4654) loss 3.5591 (3.1385) grad_norm 0.9102 (1.5585) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][180/625] eta 0:03:29 lr 0.000985 wd 0.0500 time 0.4637 (0.4704) data time 0.0010 (0.0035) model time 0.4627 (0.4653) loss 2.3589 (3.1263) grad_norm 1.9754 (1.5492) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:38:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][190/625] eta 0:03:24 lr 0.000985 wd 0.0500 time 0.4599 (0.4701) data time 0.0010 (0.0033) model time 0.4588 (0.4651) loss 3.2483 (3.1352) grad_norm 1.2482 (1.5488) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [98/300][200/625] eta 0:03:20 lr 0.000985 wd 0.0500 time 0.4629 (0.4707) data time 0.0008 (0.0032) model time 0.4621 (0.4663) loss 2.6756 (3.1327) grad_norm 1.2779 (1.5449) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][210/625] eta 0:03:15 lr 0.000985 wd 0.0500 time 0.4695 (0.4703) data time 0.0008 (0.0031) model time 0.4687 (0.4659) loss 3.7487 (3.1340) grad_norm 1.4436 (1.5436) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][220/625] eta 0:03:10 lr 0.000985 wd 0.0500 time 0.4627 (0.4698) data time 0.0011 (0.0030) model time 0.4615 (0.4655) loss 3.2887 (3.1442) grad_norm 1.7041 (1.5566) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][230/625] eta 0:03:05 lr 0.000985 wd 0.0500 time 0.4584 (0.4696) data time 0.0008 (0.0029) model time 0.4576 (0.4653) loss 4.0077 (3.1525) grad_norm 1.2640 (1.5523) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][240/625] eta 0:03:00 lr 0.000985 wd 0.0500 time 0.4610 (0.4693) data time 0.0010 (0.0029) model time 0.4600 (0.4651) loss 3.8467 (3.1573) grad_norm 1.7299 (1.5478) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][250/625] eta 0:02:55 lr 0.000985 wd 0.0500 time 0.4739 (0.4690) data time 0.0008 (0.0028) model time 0.4732 (0.4650) loss 2.5178 (3.1630) grad_norm 1.4113 (1.5420) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][260/625] eta 0:02:51 lr 0.000985 wd 0.0500 time 0.4603 (0.4689) data time 0.0010 (0.0027) model time 0.4592 (0.4649) loss 2.4781 (3.1587) grad_norm 1.3708 (1.5350) loss_scale 4096.0000 
(4096.0000) mem 16715MB [2024-08-10 09:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][270/625] eta 0:02:46 lr 0.000984 wd 0.0500 time 0.6889 (0.4698) data time 0.0008 (0.0027) model time 0.6881 (0.4661) loss 2.4589 (3.1555) grad_norm 1.2547 (1.5253) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][280/625] eta 0:02:41 lr 0.000984 wd 0.0500 time 0.4625 (0.4694) data time 0.0010 (0.0026) model time 0.4615 (0.4657) loss 2.9688 (3.1568) grad_norm 1.6486 (1.5310) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][290/625] eta 0:02:37 lr 0.000984 wd 0.0500 time 0.4603 (0.4692) data time 0.0008 (0.0026) model time 0.4596 (0.4656) loss 3.3015 (3.1608) grad_norm 1.9747 (1.5350) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][300/625] eta 0:02:32 lr 0.000984 wd 0.0500 time 0.6706 (0.4696) data time 0.0008 (0.0025) model time 0.6698 (0.4663) loss 3.3012 (3.1524) grad_norm 1.3470 (1.5310) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][310/625] eta 0:02:27 lr 0.000984 wd 0.0500 time 0.4647 (0.4695) data time 0.0009 (0.0025) model time 0.4637 (0.4662) loss 2.4969 (3.1435) grad_norm 1.7573 (1.5342) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:39:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][320/625] eta 0:02:23 lr 0.000984 wd 0.0500 time 0.4593 (0.4695) data time 0.0012 (0.0025) model time 0.4582 (0.4662) loss 3.2808 (3.1383) grad_norm 1.5296 (1.5324) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][330/625] eta 0:02:18 lr 0.000984 wd 0.0500 time 0.5327 (0.4695) data time 0.0010 (0.0024) model time 
0.5317 (0.4664) loss 3.2214 (3.1383) grad_norm 1.7388 (1.5424) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][340/625] eta 0:02:13 lr 0.000984 wd 0.0500 time 0.4595 (0.4694) data time 0.0010 (0.0024) model time 0.4585 (0.4663) loss 3.5105 (3.1447) grad_norm 1.6560 (1.5447) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][350/625] eta 0:02:09 lr 0.000984 wd 0.0500 time 0.4645 (0.4693) data time 0.0010 (0.0023) model time 0.4635 (0.4663) loss 2.9362 (3.1524) grad_norm 1.3576 (1.5495) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][360/625] eta 0:02:04 lr 0.000984 wd 0.0500 time 0.4720 (0.4693) data time 0.0010 (0.0024) model time 0.4710 (0.4663) loss 3.1713 (3.1533) grad_norm 1.5772 (1.5572) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][370/625] eta 0:01:59 lr 0.000984 wd 0.0500 time 0.4631 (0.4693) data time 0.0008 (0.0023) model time 0.4623 (0.4663) loss 3.8095 (3.1571) grad_norm 1.4083 (1.5564) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][380/625] eta 0:01:55 lr 0.000984 wd 0.0500 time 0.4662 (0.4694) data time 0.0010 (0.0023) model time 0.4652 (0.4665) loss 2.3966 (3.1569) grad_norm 1.7284 (1.5575) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][390/625] eta 0:01:50 lr 0.000983 wd 0.0500 time 0.4807 (0.4694) data time 0.0010 (0.0023) model time 0.4797 (0.4665) loss 3.0834 (3.1542) grad_norm 1.8179 (1.5557) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][400/625] eta 0:01:45 
lr 0.000983 wd 0.0500 time 0.4666 (0.4696) data time 0.0008 (0.0023) model time 0.4657 (0.4667) loss 1.9203 (3.1454) grad_norm 1.4445 (1.5616) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][410/625] eta 0:01:40 lr 0.000983 wd 0.0500 time 0.5053 (0.4696) data time 0.0011 (0.0023) model time 0.5042 (0.4668) loss 3.5833 (3.1479) grad_norm 1.4987 (1.5710) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][420/625] eta 0:01:36 lr 0.000983 wd 0.0500 time 0.4682 (0.4706) data time 0.0011 (0.0023) model time 0.4672 (0.4679) loss 3.1417 (3.1475) grad_norm 1.3801 (1.5647) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][430/625] eta 0:01:31 lr 0.000983 wd 0.0500 time 0.4711 (0.4706) data time 0.0008 (0.0023) model time 0.4703 (0.4679) loss 3.0250 (3.1559) grad_norm 1.5544 (1.5651) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][440/625] eta 0:01:27 lr 0.000983 wd 0.0500 time 0.4607 (0.4706) data time 0.0008 (0.0024) model time 0.4599 (0.4679) loss 3.9865 (3.1509) grad_norm 1.0329 (1.5616) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:40:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][450/625] eta 0:01:22 lr 0.000983 wd 0.0500 time 0.4597 (0.4704) data time 0.0008 (0.0023) model time 0.4589 (0.4677) loss 2.7557 (3.1528) grad_norm 1.6737 (1.5574) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][460/625] eta 0:01:17 lr 0.000983 wd 0.0500 time 0.4717 (0.4703) data time 0.0010 (0.0023) model time 0.4708 (0.4676) loss 3.4367 (3.1533) grad_norm 1.0373 (1.5544) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:07 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][470/625] eta 0:01:12 lr 0.000983 wd 0.0500 time 0.4624 (0.4706) data time 0.0008 (0.0023) model time 0.4616 (0.4679) loss 3.1349 (3.1576) grad_norm 1.4797 (1.5525) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][480/625] eta 0:01:08 lr 0.000983 wd 0.0500 time 0.4631 (0.4705) data time 0.0008 (0.0023) model time 0.4623 (0.4679) loss 2.7006 (3.1620) grad_norm 1.5368 (1.5518) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][490/625] eta 0:01:03 lr 0.000983 wd 0.0500 time 0.5993 (0.4707) data time 0.0008 (0.0022) model time 0.5985 (0.4681) loss 3.3366 (3.1616) grad_norm 1.4427 (1.5511) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][500/625] eta 0:00:58 lr 0.000983 wd 0.0500 time 0.4623 (0.4706) data time 0.0011 (0.0022) model time 0.4612 (0.4681) loss 3.3324 (3.1634) grad_norm 1.2591 (1.5502) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][510/625] eta 0:00:54 lr 0.000982 wd 0.0500 time 0.4661 (0.4705) data time 0.0008 (0.0022) model time 0.4653 (0.4680) loss 3.0572 (3.1630) grad_norm 1.3654 (1.5490) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][520/625] eta 0:00:49 lr 0.000982 wd 0.0500 time 0.4636 (0.4703) data time 0.0011 (0.0022) model time 0.4625 (0.4678) loss 3.2580 (3.1602) grad_norm 1.7529 (1.5509) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][530/625] eta 0:00:44 lr 0.000982 wd 0.0500 time 0.4780 (0.4702) data time 0.0008 (0.0022) model time 0.4772 (0.4677) loss 3.8847 (3.1612) grad_norm 
1.8823 (1.5539) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][540/625] eta 0:00:39 lr 0.000982 wd 0.0500 time 0.4669 (0.4701) data time 0.0010 (0.0021) model time 0.4659 (0.4676) loss 3.8655 (3.1684) grad_norm 1.4145 (1.5526) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][550/625] eta 0:00:35 lr 0.000982 wd 0.0500 time 0.4662 (0.4701) data time 0.0009 (0.0021) model time 0.4653 (0.4676) loss 3.3480 (3.1702) grad_norm 1.6237 (1.5537) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][560/625] eta 0:00:30 lr 0.000982 wd 0.0500 time 0.4667 (0.4700) data time 0.0008 (0.0021) model time 0.4659 (0.4676) loss 3.0224 (3.1648) grad_norm 1.7690 (1.5535) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][570/625] eta 0:00:25 lr 0.000982 wd 0.0500 time 0.4665 (0.4703) data time 0.0007 (0.0021) model time 0.4657 (0.4679) loss 3.0588 (3.1581) grad_norm 1.5682 (1.5546) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:41:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][580/625] eta 0:00:21 lr 0.000982 wd 0.0500 time 0.4602 (0.4702) data time 0.0008 (0.0021) model time 0.4594 (0.4678) loss 2.5051 (3.1587) grad_norm 2.3908 (1.5609) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][590/625] eta 0:00:16 lr 0.000982 wd 0.0500 time 0.4611 (0.4701) data time 0.0011 (0.0020) model time 0.4600 (0.4677) loss 2.5073 (3.1591) grad_norm 1.4610 (1.5600) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][600/625] eta 0:00:11 lr 0.000982 wd 0.0500 time 0.4607 (0.4699) data 
time 0.0008 (0.0020) model time 0.4599 (0.4676) loss 2.0843 (3.1581) grad_norm 1.2914 (1.5607) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][610/625] eta 0:00:07 lr 0.000982 wd 0.0500 time 0.4570 (0.4697) data time 0.0005 (0.0020) model time 0.4564 (0.4674) loss 3.6511 (3.1611) grad_norm 1.3392 (1.5562) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [98/300][620/625] eta 0:00:02 lr 0.000982 wd 0.0500 time 0.4611 (0.4696) data time 0.0007 (0.0020) model time 0.4604 (0.4673) loss 3.5777 (3.1601) grad_norm 1.4133 (1.5593) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:19 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 98 training takes 0:04:53 [2024-08-10 09:42:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:42:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5571 (0.5571) Acc@1 86.963 (86.963) Acc@5 98.438 (98.438) Mem 16715MB [2024-08-10 09:42:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.9629 (0.7042) Acc@1 76.904 (84.455) Acc@5 94.092 (97.128) Mem 16715MB [2024-08-10 09:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0459 (0.8403) Acc@1 74.463 (81.087) Acc@5 93.848 (95.657) Mem 16715MB [2024-08-10 09:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.748 Acc@5 95.689 [2024-08-10 09:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.7% [2024-08-10 09:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.75% [2024-08-10 09:42:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 09:42:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 09:42:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.569 (0.569) Loss 0.5049 (0.5049) Acc@1 88.721 (88.721) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:42:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.165) Loss 0.8179 (0.6313) Acc@1 80.225 (85.875) Acc@5 95.752 (97.554) Mem 16715MB [2024-08-10 09:42:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.143) Loss 0.9297 (0.7470) Acc@1 76.367 (82.722) Acc@5 94.727 (96.301) Mem 16715MB [2024-08-10 09:42:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.436 Acc@5 96.345 [2024-08-10 09:42:29 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 09:42:29 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.44% [2024-08-10 09:42:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:42:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 09:42:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][0/625] eta 0:11:44 lr 0.000982 wd 0.0500 time 1.1278 (1.1278) data time 0.4838 (0.4838) model time 0.0000 (0.0000) loss 2.9122 (2.9122) grad_norm 1.3627 (1.3627) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][10/625] eta 0:05:19 lr 0.000981 wd 0.0500 time 0.4592 (0.5192) data time 0.0010 (0.0450) model time 0.0000 (0.0000) loss 3.0598 (2.7741) grad_norm 1.7941 (1.4948) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][20/625] eta 0:04:58 lr 0.000981 wd 0.0500 time 0.4701 (0.4937) data time 0.0008 (0.0241) model time 0.0000 (0.0000) loss 3.6650 (3.0119) grad_norm 3.4839 (1.7041) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][30/625] eta 0:04:48 lr 0.000981 wd 0.0500 time 0.4549 (0.4840) data time 0.0012 (0.0167) model time 0.0000 (0.0000) loss 2.6147 (3.0404) grad_norm 1.2462 (1.6471) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][40/625] eta 0:04:40 lr 0.000981 wd 0.0500 time 0.4674 (0.4787) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 2.8367 (3.0793) grad_norm 1.6549 (1.5730) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:42:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][50/625] eta 0:04:35 lr 0.000981 wd 0.0500 time 0.4637 (0.4795) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 3.0648 (3.1001) grad_norm 1.4011 (1.5678) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][60/625] eta 0:04:29 lr 0.000981 wd 0.0500 time 0.4657 (0.4771) data time 0.0009 (0.0090) model time 0.4649 (0.4642) loss 2.8901 (3.1320) 
grad_norm 1.8529 (1.5671) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][70/625] eta 0:04:23 lr 0.000981 wd 0.0500 time 0.4643 (0.4751) data time 0.0008 (0.0079) model time 0.4635 (0.4630) loss 3.3372 (3.1698) grad_norm 2.3296 (1.5788) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][80/625] eta 0:04:20 lr 0.000981 wd 0.0500 time 0.4633 (0.4775) data time 0.0008 (0.0070) model time 0.4625 (0.4731) loss 3.7792 (3.1709) grad_norm 1.4863 (1.6059) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][90/625] eta 0:04:14 lr 0.000981 wd 0.0500 time 0.4660 (0.4760) data time 0.0008 (0.0064) model time 0.4653 (0.4705) loss 2.0429 (3.1406) grad_norm 1.1777 (1.5972) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][100/625] eta 0:04:09 lr 0.000981 wd 0.0500 time 0.4636 (0.4748) data time 0.0007 (0.0058) model time 0.4629 (0.4689) loss 2.2543 (3.0903) grad_norm 1.2609 (1.5849) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][110/625] eta 0:04:04 lr 0.000981 wd 0.0500 time 0.4637 (0.4738) data time 0.0008 (0.0054) model time 0.4629 (0.4680) loss 4.0913 (3.0869) grad_norm 1.1887 (1.5681) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][120/625] eta 0:03:59 lr 0.000981 wd 0.0500 time 0.4720 (0.4752) data time 0.0008 (0.0050) model time 0.4712 (0.4711) loss 2.6624 (3.0765) grad_norm 1.8183 (1.5668) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][130/625] eta 0:03:54 lr 0.000980 wd 0.0500 time 0.4675 
(0.4745) data time 0.0010 (0.0047) model time 0.4665 (0.4702) loss 2.8444 (3.1031) grad_norm 1.1976 (1.5531) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][140/625] eta 0:03:49 lr 0.000980 wd 0.0500 time 0.4656 (0.4739) data time 0.0010 (0.0045) model time 0.4646 (0.4697) loss 3.4605 (3.1336) grad_norm 1.6085 (1.5549) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][150/625] eta 0:03:44 lr 0.000980 wd 0.0500 time 0.4664 (0.4734) data time 0.0008 (0.0043) model time 0.4656 (0.4693) loss 2.9824 (3.1251) grad_norm 1.5518 (1.5478) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][160/625] eta 0:03:39 lr 0.000980 wd 0.0500 time 0.4681 (0.4729) data time 0.0011 (0.0041) model time 0.4671 (0.4688) loss 3.4629 (3.1283) grad_norm 1.8768 (1.5611) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][170/625] eta 0:03:34 lr 0.000980 wd 0.0500 time 0.4624 (0.4723) data time 0.0008 (0.0039) model time 0.4616 (0.4681) loss 2.4364 (3.1270) grad_norm 1.6920 (1.5549) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:43:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][180/625] eta 0:03:29 lr 0.000980 wd 0.0500 time 0.4622 (0.4718) data time 0.0008 (0.0038) model time 0.4614 (0.4676) loss 2.3003 (3.1270) grad_norm 1.1385 (1.5443) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][190/625] eta 0:03:24 lr 0.000980 wd 0.0500 time 0.4606 (0.4713) data time 0.0010 (0.0037) model time 0.4596 (0.4671) loss 3.6649 (3.1387) grad_norm 1.2633 (1.5380) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [99/300][200/625] eta 0:03:20 lr 0.000980 wd 0.0500 time 0.4673 (0.4709) data time 0.0008 (0.0035) model time 0.4665 (0.4669) loss 3.1627 (3.1260) grad_norm 1.6440 (1.5292) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][210/625] eta 0:03:15 lr 0.000980 wd 0.0500 time 0.4618 (0.4706) data time 0.0010 (0.0034) model time 0.4609 (0.4666) loss 3.1329 (3.1235) grad_norm 1.2085 (1.5236) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][220/625] eta 0:03:10 lr 0.000980 wd 0.0500 time 0.4653 (0.4704) data time 0.0008 (0.0033) model time 0.4645 (0.4665) loss 2.0959 (3.1076) grad_norm 2.2486 (1.5154) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][230/625] eta 0:03:06 lr 0.000980 wd 0.0500 time 0.4638 (0.4709) data time 0.0008 (0.0032) model time 0.4630 (0.4673) loss 2.7191 (3.1054) grad_norm 1.5496 (1.5275) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][240/625] eta 0:03:01 lr 0.000980 wd 0.0500 time 0.4651 (0.4710) data time 0.0008 (0.0031) model time 0.4643 (0.4676) loss 3.1435 (3.1030) grad_norm 1.0925 (1.5426) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][250/625] eta 0:02:56 lr 0.000979 wd 0.0500 time 0.4599 (0.4707) data time 0.0008 (0.0030) model time 0.4591 (0.4673) loss 3.5696 (3.1040) grad_norm 1.5293 (1.5395) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][260/625] eta 0:02:51 lr 0.000979 wd 0.0500 time 0.4563 (0.4705) data time 0.0010 (0.0030) model time 0.4553 (0.4671) loss 3.0709 (3.1010) grad_norm 1.3196 (1.5447) loss_scale 4096.0000 
(4096.0000) mem 16715MB [2024-08-10 09:44:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][270/625] eta 0:02:47 lr 0.000979 wd 0.0500 time 0.5153 (0.4705) data time 0.0009 (0.0029) model time 0.5143 (0.4673) loss 2.6028 (3.0951) grad_norm 1.3111 (1.5406) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][280/625] eta 0:02:42 lr 0.000979 wd 0.0500 time 0.4669 (0.4703) data time 0.0010 (0.0029) model time 0.4659 (0.4671) loss 3.5589 (3.1016) grad_norm 2.3459 (1.5432) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][290/625] eta 0:02:37 lr 0.000979 wd 0.0500 time 0.4634 (0.4703) data time 0.0008 (0.0028) model time 0.4627 (0.4671) loss 3.1139 (3.0984) grad_norm 1.7379 (1.5507) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][300/625] eta 0:02:33 lr 0.000979 wd 0.0500 time 0.4630 (0.4711) data time 0.0009 (0.0027) model time 0.4621 (0.4682) loss 2.3023 (3.0939) grad_norm 1.3270 (1.5477) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][310/625] eta 0:02:28 lr 0.000979 wd 0.0500 time 0.4621 (0.4708) data time 0.0009 (0.0027) model time 0.4612 (0.4680) loss 3.6206 (3.0953) grad_norm 1.4264 (1.5414) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][320/625] eta 0:02:23 lr 0.000979 wd 0.0500 time 0.4733 (0.4707) data time 0.0008 (0.0026) model time 0.4725 (0.4679) loss 3.3079 (3.0959) grad_norm 1.3015 (1.5340) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][330/625] eta 0:02:18 lr 0.000979 wd 0.0500 time 0.4614 (0.4704) data time 0.0008 (0.0026) model time 
0.4606 (0.4676) loss 2.8968 (3.0991) grad_norm 2.2079 (1.5361) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][340/625] eta 0:02:13 lr 0.000979 wd 0.0500 time 0.4648 (0.4702) data time 0.0008 (0.0026) model time 0.4640 (0.4674) loss 2.2672 (3.0895) grad_norm 1.4928 (1.5304) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][350/625] eta 0:02:09 lr 0.000979 wd 0.0500 time 0.4733 (0.4700) data time 0.0012 (0.0025) model time 0.4722 (0.4673) loss 2.2818 (3.0822) grad_norm 1.2471 (1.5331) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][360/625] eta 0:02:04 lr 0.000979 wd 0.0500 time 0.4627 (0.4699) data time 0.0008 (0.0025) model time 0.4619 (0.4671) loss 3.1626 (3.0819) grad_norm 1.3564 (1.5356) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][370/625] eta 0:01:59 lr 0.000978 wd 0.0500 time 0.4602 (0.4702) data time 0.0010 (0.0024) model time 0.4592 (0.4676) loss 3.3668 (3.0843) grad_norm 1.6838 (1.5357) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][380/625] eta 0:01:55 lr 0.000978 wd 0.0500 time 0.4640 (0.4700) data time 0.0008 (0.0024) model time 0.4633 (0.4674) loss 3.4089 (3.0898) grad_norm 1.3809 (1.5310) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][390/625] eta 0:01:50 lr 0.000978 wd 0.0500 time 0.4640 (0.4698) data time 0.0008 (0.0024) model time 0.4632 (0.4672) loss 3.4258 (3.0885) grad_norm 2.0810 (1.5338) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 09:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][400/625] eta 0:01:45 
lr 0.000978 wd 0.0500 time 0.4683 (0.4696) data time 0.0008 (0.0023) model time 0.4675 (0.4670) loss 3.3691 (3.0852) grad_norm 2.3120 (inf) loss_scale 2048.0000 (4070.4638) mem 16715MB [2024-08-10 09:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][410/625] eta 0:01:40 lr 0.000978 wd 0.0500 time 0.4505 (0.4693) data time 0.0008 (0.0023) model time 0.4497 (0.4668) loss 3.2992 (3.0905) grad_norm 1.4510 (inf) loss_scale 2048.0000 (4021.2555) mem 16715MB [2024-08-10 09:45:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][420/625] eta 0:01:36 lr 0.000978 wd 0.0500 time 0.4638 (0.4692) data time 0.0010 (0.0023) model time 0.4627 (0.4666) loss 3.0367 (3.0934) grad_norm 1.8420 (inf) loss_scale 2048.0000 (3974.3848) mem 16715MB [2024-08-10 09:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][430/625] eta 0:01:31 lr 0.000978 wd 0.0500 time 0.4687 (0.4691) data time 0.0010 (0.0022) model time 0.4677 (0.4665) loss 2.6309 (3.0932) grad_norm 1.3495 (inf) loss_scale 2048.0000 (3929.6891) mem 16715MB [2024-08-10 09:45:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][440/625] eta 0:01:26 lr 0.000978 wd 0.0500 time 0.4632 (0.4695) data time 0.0010 (0.0022) model time 0.4622 (0.4670) loss 2.9232 (3.0956) grad_norm 1.1801 (inf) loss_scale 2048.0000 (3887.0204) mem 16715MB [2024-08-10 09:46:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][450/625] eta 0:01:22 lr 0.000978 wd 0.0500 time 0.4554 (0.4697) data time 0.0008 (0.0022) model time 0.4546 (0.4673) loss 2.8115 (3.0982) grad_norm 1.2957 (inf) loss_scale 2048.0000 (3846.2439) mem 16715MB [2024-08-10 09:46:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][460/625] eta 0:01:17 lr 0.000978 wd 0.0500 time 0.4617 (0.4704) data time 0.0011 (0.0022) model time 0.4607 (0.4682) loss 2.6156 (3.1026) grad_norm 1.8768 (inf) loss_scale 2048.0000 (3807.2364) mem 16715MB [2024-08-10 09:46:13 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [99/300][470/625] eta 0:01:12 lr 0.000978 wd 0.0500 time 0.4651 (0.4703) data time 0.0008 (0.0021) model time 0.4643 (0.4680) loss 2.3153 (3.1033) grad_norm 2.3148 (inf) loss_scale 2048.0000 (3769.8854) mem 16715MB [2024-08-10 09:46:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][480/625] eta 0:01:08 lr 0.000978 wd 0.0500 time 0.4583 (0.4701) data time 0.0008 (0.0021) model time 0.4575 (0.4679) loss 1.9867 (3.1048) grad_norm 1.3010 (inf) loss_scale 2048.0000 (3734.0873) mem 16715MB [2024-08-10 09:46:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][490/625] eta 0:01:03 lr 0.000977 wd 0.0500 time 0.4650 (0.4700) data time 0.0008 (0.0021) model time 0.4642 (0.4678) loss 3.1173 (3.1067) grad_norm 1.0774 (inf) loss_scale 2048.0000 (3699.7475) mem 16715MB [2024-08-10 09:46:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][500/625] eta 0:00:58 lr 0.000977 wd 0.0500 time 0.4665 (0.4699) data time 0.0009 (0.0021) model time 0.4656 (0.4677) loss 3.6891 (3.1110) grad_norm 1.5312 (inf) loss_scale 2048.0000 (3666.7784) mem 16715MB [2024-08-10 09:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][510/625] eta 0:00:54 lr 0.000977 wd 0.0500 time 0.4690 (0.4698) data time 0.0009 (0.0021) model time 0.4680 (0.4676) loss 3.0944 (3.1068) grad_norm 1.0851 (inf) loss_scale 2048.0000 (3635.0998) mem 16715MB [2024-08-10 09:46:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][520/625] eta 0:00:49 lr 0.000977 wd 0.0500 time 0.4609 (0.4697) data time 0.0008 (0.0020) model time 0.4601 (0.4676) loss 3.4932 (3.1038) grad_norm 1.3582 (inf) loss_scale 2048.0000 (3604.6372) mem 16715MB [2024-08-10 09:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][530/625] eta 0:00:44 lr 0.000977 wd 0.0500 time 0.4717 (0.4696) data time 0.0010 (0.0020) model time 0.4707 (0.4675) loss 2.7689 (3.0984) grad_norm 1.1893 (inf) loss_scale 2048.0000 
(3575.3220) mem 16715MB [2024-08-10 09:46:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][540/625] eta 0:00:39 lr 0.000977 wd 0.0500 time 0.4559 (0.4695) data time 0.0010 (0.0020) model time 0.4550 (0.4673) loss 2.8654 (3.1022) grad_norm 2.1681 (inf) loss_scale 2048.0000 (3547.0906) mem 16715MB [2024-08-10 09:46:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][550/625] eta 0:00:35 lr 0.000977 wd 0.0500 time 0.4616 (0.4693) data time 0.0008 (0.0020) model time 0.4608 (0.4672) loss 3.9094 (3.1026) grad_norm 1.5147 (inf) loss_scale 2048.0000 (3519.8838) mem 16715MB [2024-08-10 09:46:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][560/625] eta 0:00:30 lr 0.000977 wd 0.0500 time 0.4671 (0.4693) data time 0.0010 (0.0020) model time 0.4662 (0.4671) loss 2.9607 (3.1037) grad_norm 1.1407 (inf) loss_scale 2048.0000 (3493.6471) mem 16715MB [2024-08-10 09:46:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][570/625] eta 0:00:25 lr 0.000977 wd 0.0500 time 0.4683 (0.4692) data time 0.0010 (0.0020) model time 0.4672 (0.4671) loss 3.3279 (3.1071) grad_norm 1.0453 (inf) loss_scale 2048.0000 (3468.3292) mem 16715MB [2024-08-10 09:47:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][580/625] eta 0:00:21 lr 0.000977 wd 0.0500 time 0.4657 (0.4692) data time 0.0007 (0.0019) model time 0.4650 (0.4671) loss 3.6242 (3.1035) grad_norm 1.7263 (inf) loss_scale 2048.0000 (3443.8830) mem 16715MB [2024-08-10 09:47:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][590/625] eta 0:00:16 lr 0.000977 wd 0.0500 time 0.4619 (0.4691) data time 0.0008 (0.0019) model time 0.4611 (0.4670) loss 3.5027 (3.1061) grad_norm 2.0515 (inf) loss_scale 2048.0000 (3420.2640) mem 16715MB [2024-08-10 09:47:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][600/625] eta 0:00:11 lr 0.000977 wd 0.0500 time 0.4621 (0.4690) data time 0.0010 (0.0019) model time 0.4611 (0.4669) loss 
3.4551 (3.1054) grad_norm 2.0146 (inf) loss_scale 2048.0000 (3397.4309) mem 16715MB [2024-08-10 09:47:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][610/625] eta 0:00:07 lr 0.000976 wd 0.0500 time 0.4613 (0.4692) data time 0.0005 (0.0019) model time 0.4608 (0.4671) loss 3.3543 (3.1102) grad_norm 1.0902 (inf) loss_scale 2048.0000 (3375.3453) mem 16715MB [2024-08-10 09:47:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [99/300][620/625] eta 0:00:02 lr 0.000976 wd 0.0500 time 0.4628 (0.4690) data time 0.0006 (0.0019) model time 0.4622 (0.4670) loss 2.7183 (3.1067) grad_norm 1.1819 (inf) loss_scale 2048.0000 (3353.9710) mem 16715MB [2024-08-10 09:47:24 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 99 training takes 0:04:53 [2024-08-10 09:47:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:47:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:47:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.576 (0.576) Loss 0.5986 (0.5986) Acc@1 87.012 (87.012) Acc@5 98.096 (98.096) Mem 16715MB [2024-08-10 09:47:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.166) Loss 0.9692 (0.7302) Acc@1 77.783 (84.082) Acc@5 94.727 (97.017) Mem 16715MB [2024-08-10 09:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.143) Loss 1.0664 (0.8583) Acc@1 74.463 (80.757) Acc@5 93.066 (95.552) Mem 16715MB [2024-08-10 09:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.596 Acc@5 95.599 [2024-08-10 09:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-08-10 09:47:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.942 (0.942) Loss 0.5034 (0.5034) Acc@1 88.721 (88.721) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 09:47:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.201) Loss 0.8164 (0.6300) Acc@1 80.127 (85.866) Acc@5 95.801 (97.616) Mem 16715MB [2024-08-10 09:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.161) Loss 0.9268 (0.7453) Acc@1 76.709 (82.715) Acc@5 94.824 (96.361) Mem 16715MB [2024-08-10 09:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.438 Acc@5 96.393 [2024-08-10 09:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 09:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.44% [2024-08-10 09:47:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:47:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 09:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][0/625] eta 0:08:35 lr 0.000976 wd 0.0500 time 0.8241 (0.8241) data time 0.4062 (0.4062) model time 0.0000 (0.0000) loss 3.3763 (3.3763) grad_norm 1.4063 (1.4063) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][10/625] eta 0:05:05 lr 0.000976 wd 0.0500 time 0.4645 (0.4968) data time 0.0010 (0.0379) model time 0.0000 (0.0000) loss 3.3607 (3.0939) grad_norm 1.6709 (1.5806) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:47:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][20/625] eta 0:04:51 lr 0.000976 wd 0.0500 time 0.4619 (0.4815) data time 0.0007 (0.0203) model time 0.0000 (0.0000) loss 3.1176 (2.9960) grad_norm 1.5066 (1.4677) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:47:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][30/625] eta 0:04:43 lr 0.000976 wd 0.0500 time 0.4643 (0.4760) data time 0.0008 (0.0141) model time 0.0000 (0.0000) loss 2.7655 (2.9807) grad_norm 1.2845 (1.4197) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:47:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][40/625] eta 0:04:39 lr 0.000976 wd 0.0500 time 0.4626 (0.4782) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 3.7960 (3.0184) grad_norm 1.8181 (1.4328) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:47:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][50/625] eta 0:04:33 lr 0.000976 wd 0.0500 time 0.4601 (0.4753) data time 0.0011 (0.0089) model time 0.0000 (0.0000) loss 3.2071 (3.0325) grad_norm 1.7756 (1.4377) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][60/625] eta 0:04:27 lr 0.000976 wd 0.0500 time 0.4575 (0.4729) data time 0.0011 (0.0077) model time 0.4563 (0.4598) loss 3.3582 
(3.0843) grad_norm 1.3258 (1.4507) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][70/625] eta 0:04:23 lr 0.000976 wd 0.0500 time 0.4631 (0.4742) data time 0.0010 (0.0067) model time 0.4621 (0.4701) loss 3.5226 (3.1274) grad_norm 1.9141 (1.4812) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][80/625] eta 0:04:17 lr 0.000976 wd 0.0500 time 0.4644 (0.4733) data time 0.0010 (0.0060) model time 0.4634 (0.4688) loss 2.7215 (3.1244) grad_norm 1.4939 (1.4757) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][90/625] eta 0:04:12 lr 0.000976 wd 0.0500 time 0.4635 (0.4724) data time 0.0008 (0.0055) model time 0.4627 (0.4676) loss 3.6559 (3.1268) grad_norm 1.5521 (1.4772) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][100/625] eta 0:04:07 lr 0.000976 wd 0.0500 time 0.4620 (0.4717) data time 0.0008 (0.0050) model time 0.4612 (0.4669) loss 2.9151 (3.1411) grad_norm 1.4070 (1.5001) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][110/625] eta 0:04:02 lr 0.000975 wd 0.0500 time 0.4650 (0.4709) data time 0.0008 (0.0047) model time 0.4642 (0.4661) loss 2.7280 (3.1461) grad_norm 1.3401 (1.5211) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][120/625] eta 0:03:57 lr 0.000975 wd 0.0500 time 0.4638 (0.4702) data time 0.0010 (0.0044) model time 0.4627 (0.4655) loss 3.6970 (3.1544) grad_norm 1.4855 (1.5277) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][130/625] eta 0:03:52 lr 0.000975 wd 0.0500 
time 0.4639 (0.4696) data time 0.0008 (0.0041) model time 0.4631 (0.4649) loss 2.9420 (3.1377) grad_norm 1.5366 (1.5376) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][140/625] eta 0:03:47 lr 0.000975 wd 0.0500 time 0.4624 (0.4691) data time 0.0008 (0.0039) model time 0.4616 (0.4645) loss 3.5750 (3.1585) grad_norm 1.0319 (1.5557) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][150/625] eta 0:03:43 lr 0.000975 wd 0.0500 time 0.4486 (0.4700) data time 0.0009 (0.0037) model time 0.4477 (0.4663) loss 2.8896 (3.1664) grad_norm 1.5235 (1.5496) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][160/625] eta 0:03:38 lr 0.000975 wd 0.0500 time 0.4647 (0.4697) data time 0.0010 (0.0036) model time 0.4637 (0.4661) loss 3.0125 (3.1790) grad_norm 2.5177 (1.5744) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][170/625] eta 0:03:33 lr 0.000975 wd 0.0500 time 0.4637 (0.4695) data time 0.0008 (0.0034) model time 0.4629 (0.4660) loss 3.8653 (3.1854) grad_norm 1.9006 (1.6091) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][180/625] eta 0:03:28 lr 0.000975 wd 0.0500 time 0.4639 (0.4691) data time 0.0010 (0.0033) model time 0.4629 (0.4656) loss 2.6174 (3.1707) grad_norm 1.7783 (1.6378) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][190/625] eta 0:03:23 lr 0.000975 wd 0.0500 time 0.4623 (0.4687) data time 0.0011 (0.0032) model time 0.4612 (0.4653) loss 3.4987 (3.1645) grad_norm 1.5137 (1.6314) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:09 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [100/300][200/625] eta 0:03:19 lr 0.000975 wd 0.0500 time 0.4602 (0.4683) data time 0.0008 (0.0031) model time 0.4594 (0.4649) loss 3.4729 (3.1656) grad_norm 1.6561 (1.6123) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][210/625] eta 0:03:14 lr 0.000975 wd 0.0500 time 0.4700 (0.4694) data time 0.0011 (0.0030) model time 0.4689 (0.4664) loss 3.2739 (3.1649) grad_norm 1.4541 (1.6116) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][220/625] eta 0:03:09 lr 0.000975 wd 0.0500 time 0.4639 (0.4690) data time 0.0011 (0.0029) model time 0.4628 (0.4661) loss 3.0690 (3.1768) grad_norm 1.4266 (1.6151) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][230/625] eta 0:03:05 lr 0.000974 wd 0.0500 time 0.4675 (0.4698) data time 0.0008 (0.0028) model time 0.4668 (0.4671) loss 3.0067 (3.1749) grad_norm 1.5602 (1.6124) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][240/625] eta 0:03:00 lr 0.000974 wd 0.0500 time 0.4698 (0.4695) data time 0.0009 (0.0027) model time 0.4689 (0.4669) loss 2.5632 (3.1666) grad_norm 2.0525 (1.6104) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][250/625] eta 0:02:55 lr 0.000974 wd 0.0500 time 0.4575 (0.4692) data time 0.0008 (0.0027) model time 0.4567 (0.4666) loss 3.4451 (3.1578) grad_norm 1.7899 (1.6023) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][260/625] eta 0:02:51 lr 0.000974 wd 0.0500 time 0.4651 (0.4690) data time 0.0011 (0.0026) model time 0.4640 (0.4665) loss 2.9867 (3.1621) grad_norm 1.4268 
(1.5988) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][270/625] eta 0:02:46 lr 0.000974 wd 0.0500 time 0.4621 (0.4688) data time 0.0008 (0.0025) model time 0.4613 (0.4662) loss 4.1501 (3.1766) grad_norm 1.0270 (1.5889) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][280/625] eta 0:02:41 lr 0.000974 wd 0.0500 time 0.4541 (0.4685) data time 0.0011 (0.0025) model time 0.4530 (0.4660) loss 2.6066 (3.1616) grad_norm 1.1594 (1.5762) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][290/625] eta 0:02:36 lr 0.000974 wd 0.0500 time 0.4669 (0.4684) data time 0.0010 (0.0024) model time 0.4659 (0.4659) loss 3.8631 (3.1559) grad_norm 1.3431 (1.5942) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][300/625] eta 0:02:32 lr 0.000974 wd 0.0500 time 0.4627 (0.4682) data time 0.0011 (0.0024) model time 0.4617 (0.4657) loss 2.9483 (3.1483) grad_norm 1.1814 (1.5899) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][310/625] eta 0:02:27 lr 0.000974 wd 0.0500 time 0.4629 (0.4681) data time 0.0011 (0.0024) model time 0.4617 (0.4657) loss 3.2253 (3.1485) grad_norm 0.9297 (1.5780) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][320/625] eta 0:02:22 lr 0.000974 wd 0.0500 time 0.4630 (0.4680) data time 0.0008 (0.0023) model time 0.4621 (0.4656) loss 4.1475 (3.1538) grad_norm 2.0557 (1.5701) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][330/625] eta 0:02:18 lr 0.000974 wd 0.0500 time 0.4635 (0.4678) data 
time 0.0011 (0.0023) model time 0.4624 (0.4654) loss 3.6563 (3.1534) grad_norm 1.0939 (1.5652) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][340/625] eta 0:02:13 lr 0.000974 wd 0.0500 time 0.4561 (0.4677) data time 0.0008 (0.0022) model time 0.4553 (0.4653) loss 3.0205 (3.1496) grad_norm 2.9585 (1.5661) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][350/625] eta 0:02:08 lr 0.000973 wd 0.0500 time 0.6379 (0.4680) data time 0.0012 (0.0022) model time 0.6367 (0.4657) loss 3.6346 (3.1560) grad_norm 1.2850 (1.5619) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][360/625] eta 0:02:04 lr 0.000973 wd 0.0500 time 0.4639 (0.4684) data time 0.0010 (0.0022) model time 0.4629 (0.4662) loss 2.6565 (3.1537) grad_norm 2.1868 (1.5606) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][370/625] eta 0:01:59 lr 0.000973 wd 0.0500 time 0.4641 (0.4682) data time 0.0010 (0.0021) model time 0.4631 (0.4660) loss 2.8532 (3.1549) grad_norm 1.6645 (1.5628) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][380/625] eta 0:01:54 lr 0.000973 wd 0.0500 time 0.4642 (0.4681) data time 0.0009 (0.0021) model time 0.4633 (0.4659) loss 3.7377 (3.1538) grad_norm 1.2981 (1.5724) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][390/625] eta 0:01:49 lr 0.000973 wd 0.0500 time 0.4661 (0.4680) data time 0.0008 (0.0021) model time 0.4653 (0.4658) loss 3.5805 (3.1546) grad_norm 1.3781 (1.5761) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [100/300][400/625] eta 0:01:45 lr 0.000973 wd 0.0500 time 0.4699 (0.4679) data time 0.0010 (0.0021) model time 0.4689 (0.4658) loss 2.6277 (3.1562) grad_norm 1.7138 (1.5687) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][410/625] eta 0:01:40 lr 0.000973 wd 0.0500 time 0.4730 (0.4683) data time 0.0009 (0.0020) model time 0.4721 (0.4662) loss 3.1913 (3.1533) grad_norm 1.6968 (1.5658) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][420/625] eta 0:01:35 lr 0.000973 wd 0.0500 time 0.4636 (0.4682) data time 0.0011 (0.0020) model time 0.4625 (0.4662) loss 3.5721 (3.1558) grad_norm 1.5777 (1.5671) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:50:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][430/625] eta 0:01:31 lr 0.000973 wd 0.0500 time 0.4621 (0.4681) data time 0.0007 (0.0020) model time 0.4614 (0.4661) loss 3.3458 (3.1550) grad_norm 1.3832 (1.5638) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][440/625] eta 0:01:26 lr 0.000973 wd 0.0500 time 0.4606 (0.4680) data time 0.0009 (0.0020) model time 0.4597 (0.4660) loss 3.4480 (3.1600) grad_norm 1.7547 (1.5684) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][450/625] eta 0:01:21 lr 0.000973 wd 0.0500 time 0.4599 (0.4685) data time 0.0011 (0.0020) model time 0.4588 (0.4666) loss 3.2517 (3.1618) grad_norm 1.4812 (1.5742) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][460/625] eta 0:01:17 lr 0.000973 wd 0.0500 time 0.4676 (0.4685) data time 0.0008 (0.0019) model time 0.4669 (0.4666) loss 4.1935 (3.1614) grad_norm 1.1537 (1.5791) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 09:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][470/625] eta 0:01:12 lr 0.000972 wd 0.0500 time 0.4660 (0.4685) data time 0.0008 (0.0019) model time 0.4651 (0.4666) loss 3.7050 (3.1664) grad_norm 1.1104 (1.5802) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][480/625] eta 0:01:07 lr 0.000972 wd 0.0500 time 0.4614 (0.4684) data time 0.0008 (0.0019) model time 0.4605 (0.4665) loss 3.4002 (3.1667) grad_norm 1.5281 (1.5735) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][490/625] eta 0:01:03 lr 0.000972 wd 0.0500 time 0.4610 (0.4687) data time 0.0008 (0.0019) model time 0.4602 (0.4668) loss 3.6741 (3.1636) grad_norm 1.1183 (1.5764) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][500/625] eta 0:00:58 lr 0.000972 wd 0.0500 time 0.4673 (0.4687) data time 0.0010 (0.0019) model time 0.4663 (0.4668) loss 3.2723 (3.1673) grad_norm 1.5280 (1.5774) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][510/625] eta 0:00:53 lr 0.000972 wd 0.0500 time 0.4729 (0.4686) data time 0.0010 (0.0018) model time 0.4719 (0.4668) loss 3.3670 (3.1720) grad_norm 1.2457 (1.5732) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][520/625] eta 0:00:49 lr 0.000972 wd 0.0500 time 0.4657 (0.4686) data time 0.0008 (0.0018) model time 0.4649 (0.4668) loss 3.7411 (3.1691) grad_norm 1.3554 (1.5695) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][530/625] eta 0:00:44 lr 0.000972 wd 0.0500 time 0.4635 (0.4686) data time 0.0011 (0.0018) model 
time 0.4624 (0.4668) loss 3.1512 (3.1728) grad_norm 1.5242 (1.5677) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][540/625] eta 0:00:39 lr 0.000972 wd 0.0500 time 0.4611 (0.4685) data time 0.0008 (0.0018) model time 0.4603 (0.4667) loss 3.5858 (3.1779) grad_norm 1.2458 (1.5696) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][550/625] eta 0:00:35 lr 0.000972 wd 0.0500 time 0.4617 (0.4684) data time 0.0008 (0.0018) model time 0.4608 (0.4667) loss 2.8139 (3.1783) grad_norm 1.0274 (1.5673) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:51:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][560/625] eta 0:00:30 lr 0.000972 wd 0.0500 time 0.4619 (0.4683) data time 0.0008 (0.0018) model time 0.4611 (0.4666) loss 3.4867 (3.1824) grad_norm 1.2648 (1.5678) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][570/625] eta 0:00:25 lr 0.000972 wd 0.0500 time 0.4620 (0.4682) data time 0.0008 (0.0018) model time 0.4612 (0.4664) loss 2.7121 (3.1804) grad_norm 1.5792 (1.5699) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][580/625] eta 0:00:21 lr 0.000971 wd 0.0500 time 0.4643 (0.4681) data time 0.0009 (0.0017) model time 0.4635 (0.4664) loss 2.8568 (3.1781) grad_norm 1.0761 (1.5671) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][590/625] eta 0:00:16 lr 0.000971 wd 0.0500 time 0.4613 (0.4685) data time 0.0008 (0.0017) model time 0.4605 (0.4668) loss 3.7724 (3.1801) grad_norm 1.4396 (1.5638) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][600/625] 
eta 0:00:11 lr 0.000971 wd 0.0500 time 0.4705 (0.4687) data time 0.0008 (0.0017) model time 0.4697 (0.4670) loss 2.1388 (3.1803) grad_norm 2.3083 (1.5639) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][610/625] eta 0:00:07 lr 0.000971 wd 0.0500 time 0.4611 (0.4686) data time 0.0008 (0.0017) model time 0.4603 (0.4669) loss 3.3971 (3.1812) grad_norm 1.7898 (1.5667) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [100/300][620/625] eta 0:00:02 lr 0.000971 wd 0.0500 time 0.4600 (0.4685) data time 0.0007 (0.0017) model time 0.4593 (0.4668) loss 3.1072 (3.1789) grad_norm 1.1851 (1.5662) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 100 training takes 0:04:52 [2024-08-10 09:52:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:52:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5732 (0.5732) Acc@1 87.842 (87.842) Acc@5 98.096 (98.096) Mem 16715MB
[2024-08-10 09:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.9204 (0.7014) Acc@1 77.930 (84.255) Acc@5 95.166 (97.177) Mem 16715MB
[2024-08-10 09:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0225 (0.8338) Acc@1 74.951 (80.894) Acc@5 93.799 (95.703) Mem 16715MB
[2024-08-10 09:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.574 Acc@5 95.703
[2024-08-10 09:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6%
[2024-08-10 09:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.891 (0.891) Loss 0.5029 (0.5029) Acc@1 88.770 (88.770) Acc@5 98.486 (98.486) Mem 16715MB
[2024-08-10 09:52:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.197) Loss 0.8120 (0.6285) Acc@1 80.225 (85.955) Acc@5 95.801 (97.599) Mem 16715MB
[2024-08-10 09:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.9238 (0.7436) Acc@1 76.758 (82.806) Acc@5 94.873 (96.356) Mem 16715MB
[2024-08-10 09:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.518 Acc@5 96.397
[2024-08-10 09:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-10 09:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.52%
[2024-08-10 09:52:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 09:52:39 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 09:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][0/625] eta 0:08:16 lr 0.000971 wd 0.0500 time 0.7936 (0.7936) data time 0.3853 (0.3853) model time 0.0000 (0.0000) loss 3.0514 (3.0514) grad_norm 1.1040 (1.1040) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][10/625] eta 0:05:05 lr 0.000971 wd 0.0500 time 0.4718 (0.4964) data time 0.0008 (0.0364) model time 0.0000 (0.0000) loss 3.6291 (3.0110) grad_norm 1.5528 (1.3198) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][20/625] eta 0:04:51 lr 0.000971 wd 0.0500 time 0.4696 (0.4813) data time 0.0008 (0.0195) model time 0.0000 (0.0000) loss 3.4183 (3.0733) grad_norm 1.4117 (1.3808) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][30/625] eta 0:04:47 lr 0.000971 wd 0.0500 time 0.4665 (0.4824) data time 0.0010 (0.0136) model time 0.0000 (0.0000) loss 2.2910 (3.0767) grad_norm 1.0529 (1.3822) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][40/625] eta 0:04:40 lr 0.000971 wd 0.0500 time 0.4671 (0.4787) data time 0.0010 (0.0107) model time 0.0000 (0.0000) loss 3.2327 (3.0955) grad_norm 1.5837 (1.3888) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][50/625] eta 0:04:34 lr 0.000971 wd 0.0500 time 0.4672 (0.4768) data time 0.0011 (0.0088) model time 0.0000 (0.0000) loss 2.8937 (3.1333) grad_norm 1.0242 (1.3634) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][60/625] eta 0:04:28 lr 0.000971 wd 0.0500 time 0.4624 (0.4751) data time 0.0008 (0.0078) model time 0.4616 (0.4636) loss 3.0670 
(3.1750) grad_norm 1.7551 (1.3926) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][70/625] eta 0:04:22 lr 0.000971 wd 0.0500 time 0.4612 (0.4735) data time 0.0008 (0.0068) model time 0.4603 (0.4632) loss 2.9656 (3.1851) grad_norm 1.4934 (1.4057) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][80/625] eta 0:04:17 lr 0.000970 wd 0.0500 time 0.4613 (0.4722) data time 0.0011 (0.0061) model time 0.4602 (0.4629) loss 3.3898 (3.1724) grad_norm 4.2761 (1.4687) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][90/625] eta 0:04:12 lr 0.000970 wd 0.0500 time 0.4591 (0.4714) data time 0.0011 (0.0056) model time 0.4580 (0.4628) loss 2.5681 (3.1601) grad_norm 1.1551 (1.4924) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][100/625] eta 0:04:07 lr 0.000970 wd 0.0500 time 0.4665 (0.4710) data time 0.0007 (0.0052) model time 0.4658 (0.4636) loss 3.5117 (3.1605) grad_norm 0.9275 (1.4815) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][110/625] eta 0:04:02 lr 0.000970 wd 0.0500 time 0.4627 (0.4707) data time 0.0010 (0.0049) model time 0.4617 (0.4639) loss 3.1345 (3.1448) grad_norm 1.3342 (1.4701) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][120/625] eta 0:03:57 lr 0.000970 wd 0.0500 time 0.4651 (0.4701) data time 0.0010 (0.0045) model time 0.4641 (0.4637) loss 1.6707 (3.1364) grad_norm 2.1476 (1.4906) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][130/625] eta 0:03:53 lr 0.000970 wd 0.0500 
time 0.4638 (0.4717) data time 0.0008 (0.0043) model time 0.4631 (0.4670) loss 4.0005 (3.1226) grad_norm 1.6581 (1.4812) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][140/625] eta 0:03:48 lr 0.000970 wd 0.0500 time 0.4586 (0.4711) data time 0.0009 (0.0041) model time 0.4578 (0.4665) loss 2.2365 (3.0965) grad_norm 1.2762 (1.4766) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][150/625] eta 0:03:44 lr 0.000970 wd 0.0500 time 0.4640 (0.4724) data time 0.0007 (0.0039) model time 0.4633 (0.4687) loss 2.9927 (3.0785) grad_norm 1.5750 (1.4823) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:53:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][160/625] eta 0:03:39 lr 0.000970 wd 0.0500 time 0.4621 (0.4721) data time 0.0010 (0.0037) model time 0.4611 (0.4686) loss 2.4135 (3.0719) grad_norm 1.2059 (1.4786) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][170/625] eta 0:03:35 lr 0.000970 wd 0.0500 time 0.4635 (0.4729) data time 0.0008 (0.0036) model time 0.4627 (0.4698) loss 3.7700 (3.0693) grad_norm 1.8058 (1.4850) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][180/625] eta 0:03:30 lr 0.000970 wd 0.0500 time 0.4651 (0.4723) data time 0.0007 (0.0034) model time 0.4644 (0.4693) loss 3.4199 (3.0739) grad_norm 1.1988 (1.4950) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][190/625] eta 0:03:25 lr 0.000970 wd 0.0500 time 0.4700 (0.4723) data time 0.0010 (0.0033) model time 0.4690 (0.4693) loss 2.3399 (3.0724) grad_norm 1.6177 (1.4949) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:14 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [101/300][200/625] eta 0:03:20 lr 0.000969 wd 0.0500 time 0.4631 (0.4722) data time 0.0010 (0.0032) model time 0.4621 (0.4693) loss 2.5387 (3.0663) grad_norm 1.7552 (1.4903) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][210/625] eta 0:03:15 lr 0.000969 wd 0.0500 time 0.4642 (0.4719) data time 0.0010 (0.0031) model time 0.4632 (0.4690) loss 2.1631 (3.0632) grad_norm 1.3858 (1.4844) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][220/625] eta 0:03:11 lr 0.000969 wd 0.0500 time 0.4998 (0.4717) data time 0.0011 (0.0030) model time 0.4987 (0.4690) loss 2.8779 (3.0580) grad_norm 1.1019 (1.4807) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][230/625] eta 0:03:06 lr 0.000969 wd 0.0500 time 0.4585 (0.4715) data time 0.0010 (0.0029) model time 0.4574 (0.4687) loss 2.9932 (3.0542) grad_norm 1.0910 (1.4762) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][240/625] eta 0:03:01 lr 0.000969 wd 0.0500 time 0.4653 (0.4713) data time 0.0011 (0.0028) model time 0.4642 (0.4686) loss 3.4427 (3.0677) grad_norm 1.2376 (1.4758) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][250/625] eta 0:02:56 lr 0.000969 wd 0.0500 time 0.4823 (0.4711) data time 0.0009 (0.0028) model time 0.4814 (0.4684) loss 3.4316 (3.0754) grad_norm 1.1308 (1.4717) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][260/625] eta 0:02:51 lr 0.000969 wd 0.0500 time 0.4626 (0.4709) data time 0.0011 (0.0028) model time 0.4614 (0.4682) loss 2.7858 (3.0774) grad_norm 1.8921 
(1.4718) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][270/625] eta 0:02:47 lr 0.000969 wd 0.0500 time 0.4665 (0.4707) data time 0.0012 (0.0027) model time 0.4654 (0.4679) loss 2.8880 (3.0616) grad_norm 2.4953 (1.4787) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][280/625] eta 0:02:42 lr 0.000969 wd 0.0500 time 0.4656 (0.4704) data time 0.0011 (0.0027) model time 0.4646 (0.4677) loss 3.5754 (3.0662) grad_norm 1.5861 (1.4974) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:54:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][290/625] eta 0:02:37 lr 0.000969 wd 0.0500 time 0.4608 (0.4702) data time 0.0012 (0.0026) model time 0.4596 (0.4675) loss 3.4155 (3.0719) grad_norm 1.7293 (1.4975) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][300/625] eta 0:02:32 lr 0.000969 wd 0.0500 time 0.4646 (0.4700) data time 0.0011 (0.0026) model time 0.4635 (0.4674) loss 2.8045 (3.0782) grad_norm 2.1096 (1.4982) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][310/625] eta 0:02:28 lr 0.000969 wd 0.0500 time 0.4665 (0.4699) data time 0.0011 (0.0025) model time 0.4654 (0.4673) loss 3.1273 (3.0822) grad_norm 1.2088 (1.5283) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][320/625] eta 0:02:23 lr 0.000968 wd 0.0500 time 0.4650 (0.4698) data time 0.0011 (0.0025) model time 0.4639 (0.4672) loss 3.8070 (3.0880) grad_norm 1.5129 (1.5216) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][330/625] eta 0:02:18 lr 0.000968 wd 0.0500 time 0.4667 (0.4698) data 
time 0.0011 (0.0025) model time 0.4656 (0.4672) loss 3.5780 (3.0896) grad_norm 1.3721 (1.5180) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][340/625] eta 0:02:13 lr 0.000968 wd 0.0500 time 0.4656 (0.4697) data time 0.0008 (0.0024) model time 0.4648 (0.4671) loss 2.4182 (3.0888) grad_norm 1.1584 (1.5156) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][350/625] eta 0:02:09 lr 0.000968 wd 0.0500 time 0.4631 (0.4695) data time 0.0008 (0.0024) model time 0.4623 (0.4670) loss 2.7495 (3.0860) grad_norm 1.2876 (1.5138) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][360/625] eta 0:02:04 lr 0.000968 wd 0.0500 time 0.4622 (0.4694) data time 0.0009 (0.0024) model time 0.4612 (0.4669) loss 2.6140 (3.0923) grad_norm 1.9772 (1.5226) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][370/625] eta 0:01:59 lr 0.000968 wd 0.0500 time 0.4652 (0.4697) data time 0.0011 (0.0023) model time 0.4642 (0.4673) loss 2.9355 (3.0911) grad_norm 1.2373 (1.5259) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][380/625] eta 0:01:55 lr 0.000968 wd 0.0500 time 0.4611 (0.4695) data time 0.0009 (0.0023) model time 0.4602 (0.4671) loss 2.2284 (3.0895) grad_norm 1.4550 (1.5217) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][390/625] eta 0:01:50 lr 0.000968 wd 0.0500 time 0.4594 (0.4698) data time 0.0010 (0.0023) model time 0.4585 (0.4675) loss 3.1051 (3.0951) grad_norm 2.2619 (1.5222) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [101/300][400/625] eta 0:01:45 lr 0.000968 wd 0.0500 time 0.4795 (0.4698) data time 0.0007 (0.0022) model time 0.4788 (0.4675) loss 3.7437 (3.0980) grad_norm 1.4242 (1.5200) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][410/625] eta 0:01:40 lr 0.000968 wd 0.0500 time 0.4710 (0.4698) data time 0.0008 (0.0022) model time 0.4701 (0.4675) loss 2.7398 (3.0987) grad_norm 1.5040 (1.5186) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:55:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][420/625] eta 0:01:36 lr 0.000968 wd 0.0500 time 0.4731 (0.4697) data time 0.0012 (0.0022) model time 0.4719 (0.4674) loss 2.6057 (3.0954) grad_norm 1.2489 (1.5153) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][430/625] eta 0:01:31 lr 0.000967 wd 0.0500 time 0.4610 (0.4695) data time 0.0008 (0.0021) model time 0.4602 (0.4673) loss 2.9004 (3.0986) grad_norm 3.0345 (1.5347) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][440/625] eta 0:01:26 lr 0.000967 wd 0.0500 time 0.4646 (0.4694) data time 0.0011 (0.0021) model time 0.4635 (0.4671) loss 3.4361 (3.1018) grad_norm 1.5160 (1.5348) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][450/625] eta 0:01:22 lr 0.000967 wd 0.0500 time 0.4624 (0.4692) data time 0.0009 (0.0021) model time 0.4615 (0.4670) loss 2.1800 (3.1029) grad_norm 1.5915 (1.5379) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][460/625] eta 0:01:17 lr 0.000967 wd 0.0500 time 0.4581 (0.4691) data time 0.0011 (0.0021) model time 0.4571 (0.4669) loss 3.3882 (3.1095) grad_norm 1.2254 (1.5367) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 09:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][470/625] eta 0:01:12 lr 0.000967 wd 0.0500 time 0.4663 (0.4695) data time 0.0010 (0.0021) model time 0.4653 (0.4674) loss 3.0882 (3.1069) grad_norm 1.0216 (1.5327) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][480/625] eta 0:01:08 lr 0.000967 wd 0.0500 time 0.4709 (0.4698) data time 0.0011 (0.0020) model time 0.4699 (0.4678) loss 3.4191 (3.1071) grad_norm 1.2944 (1.5336) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][490/625] eta 0:01:03 lr 0.000967 wd 0.0500 time 0.4693 (0.4697) data time 0.0010 (0.0020) model time 0.4683 (0.4677) loss 3.6174 (3.1084) grad_norm 1.1429 (1.5298) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][500/625] eta 0:00:58 lr 0.000967 wd 0.0500 time 0.4634 (0.4696) data time 0.0008 (0.0020) model time 0.4626 (0.4675) loss 3.1084 (3.1053) grad_norm 1.2696 (1.5282) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][510/625] eta 0:00:53 lr 0.000967 wd 0.0500 time 0.4657 (0.4694) data time 0.0008 (0.0020) model time 0.4649 (0.4674) loss 1.9481 (3.0988) grad_norm 1.5891 (1.5252) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][520/625] eta 0:00:49 lr 0.000967 wd 0.0500 time 0.4622 (0.4693) data time 0.0010 (0.0020) model time 0.4611 (0.4673) loss 3.4315 (3.0964) grad_norm 1.5736 (1.5248) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][530/625] eta 0:00:44 lr 0.000967 wd 0.0500 time 0.4695 (0.4695) data time 0.0008 (0.0019) model 
time 0.4687 (0.4676) loss 3.1873 (3.1000) grad_norm 1.2261 (1.5225) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][540/625] eta 0:00:39 lr 0.000967 wd 0.0500 time 0.4692 (0.4695) data time 0.0010 (0.0019) model time 0.4682 (0.4675) loss 3.5743 (3.1031) grad_norm 1.3844 (1.5224) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:56:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][550/625] eta 0:00:35 lr 0.000966 wd 0.0500 time 0.4641 (0.4694) data time 0.0010 (0.0019) model time 0.4631 (0.4675) loss 3.1292 (3.1011) grad_norm 1.4978 (1.5216) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][560/625] eta 0:00:30 lr 0.000966 wd 0.0500 time 0.4572 (0.4695) data time 0.0009 (0.0019) model time 0.4563 (0.4676) loss 3.5242 (3.1040) grad_norm 1.7857 (1.5296) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][570/625] eta 0:00:25 lr 0.000966 wd 0.0500 time 0.4759 (0.4694) data time 0.0008 (0.0019) model time 0.4751 (0.4675) loss 3.7896 (3.1064) grad_norm 1.6130 (1.5250) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][580/625] eta 0:00:21 lr 0.000966 wd 0.0500 time 0.4617 (0.4693) data time 0.0010 (0.0019) model time 0.4607 (0.4674) loss 2.4871 (3.1034) grad_norm 1.2070 (1.5200) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][590/625] eta 0:00:16 lr 0.000966 wd 0.0500 time 0.4622 (0.4692) data time 0.0011 (0.0019) model time 0.4611 (0.4673) loss 3.1102 (3.1066) grad_norm 1.3013 (1.5175) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][600/625] 
eta 0:00:11 lr 0.000966 wd 0.0500 time 0.4635 (0.4691) data time 0.0008 (0.0019) model time 0.4627 (0.4672) loss 3.3610 (3.1117) grad_norm 1.8336 (1.5181) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][610/625] eta 0:00:07 lr 0.000966 wd 0.0500 time 0.4613 (0.4690) data time 0.0007 (0.0018) model time 0.4605 (0.4671) loss 3.5453 (3.1113) grad_norm 2.0200 (1.5195) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [101/300][620/625] eta 0:00:02 lr 0.000966 wd 0.0500 time 0.4641 (0.4689) data time 0.0005 (0.0018) model time 0.4636 (0.4670) loss 3.0622 (3.1129) grad_norm 2.8042 (1.5204) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 101 training takes 0:04:53 [2024-08-10 09:57:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 09:57:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 09:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.594 (0.594) Loss 0.5610 (0.5610) Acc@1 88.086 (88.086) Acc@5 98.096 (98.096) Mem 16715MB [2024-08-10 09:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.168) Loss 0.9062 (0.6982) Acc@1 78.271 (84.322) Acc@5 95.361 (97.075) Mem 16715MB [2024-08-10 09:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.144) Loss 1.0449 (0.8273) Acc@1 75.195 (81.080) Acc@5 93.506 (95.657) Mem 16715MB [2024-08-10 09:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.852 Acc@5 95.681 [2024-08-10 09:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-08-10 09:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.85% [2024-08-10 09:57:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 09:57:39 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 09:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.540 (0.540) Loss 0.5029 (0.5029) Acc@1 88.721 (88.721) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 09:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.162) Loss 0.8120 (0.6277) Acc@1 80.371 (85.982) Acc@5 95.752 (97.612) Mem 16715MB [2024-08-10 09:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.9209 (0.7423) Acc@1 76.855 (82.859) Acc@5 95.020 (96.384) Mem 16715MB [2024-08-10 09:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.576 Acc@5 96.431 [2024-08-10 09:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-10 09:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.58% [2024-08-10 09:57:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 09:57:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 09:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][0/625] eta 0:08:50 lr 0.000966 wd 0.0500 time 0.8481 (0.8481) data time 0.4410 (0.4410) model time 0.0000 (0.0000) loss 3.1225 (3.1225) grad_norm 1.6458 (1.6458) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][10/625] eta 0:05:05 lr 0.000966 wd 0.0500 time 0.4654 (0.4973) data time 0.0009 (0.0410) model time 0.0000 (0.0000) loss 3.5885 (3.1611) grad_norm 1.6954 (1.3212) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][20/625] eta 0:04:50 lr 0.000966 wd 0.0500 time 0.4619 (0.4806) data time 0.0011 (0.0220) model time 0.0000 (0.0000) loss 3.2372 (3.0961) grad_norm 1.7153 (1.3045) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][30/625] eta 0:04:43 lr 0.000966 wd 0.0500 time 0.4695 (0.4757) data time 0.0010 (0.0153) model time 0.0000 (0.0000) loss 2.5370 (3.0642) grad_norm 1.8073 (1.3956) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][40/625] eta 0:04:36 lr 0.000965 wd 0.0500 time 0.4821 (0.4735) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 3.5587 (3.0942) grad_norm 2.9234 (1.4576) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][50/625] eta 0:04:31 lr 0.000965 wd 0.0500 time 0.4714 (0.4724) data time 0.0010 (0.0097) model time 0.0000 (0.0000) loss 3.2623 (3.1185) grad_norm 1.6431 (1.5308) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][60/625] eta 0:04:28 lr 0.000965 wd 0.0500 time 0.4610 (0.4757) data time 0.0009 (0.0083) model time 0.4601 (0.4910) loss 2.5207 
(3.0920) grad_norm 1.2057 (1.5011) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][70/625] eta 0:04:23 lr 0.000965 wd 0.0500 time 0.4676 (0.4744) data time 0.0011 (0.0073) model time 0.4665 (0.4783) loss 3.3763 (3.1264) grad_norm 1.3324 (1.5007) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][80/625] eta 0:04:17 lr 0.000965 wd 0.0500 time 0.4635 (0.4733) data time 0.0008 (0.0065) model time 0.4627 (0.4737) loss 2.4015 (3.1042) grad_norm 1.3547 (1.5016) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][90/625] eta 0:04:12 lr 0.000965 wd 0.0500 time 0.4777 (0.4725) data time 0.0010 (0.0059) model time 0.4767 (0.4715) loss 2.9815 (3.1089) grad_norm 1.4578 (1.4877) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][100/625] eta 0:04:07 lr 0.000965 wd 0.0500 time 0.4585 (0.4718) data time 0.0009 (0.0055) model time 0.4576 (0.4700) loss 2.7060 (3.1218) grad_norm 1.7400 (1.4796) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][110/625] eta 0:04:03 lr 0.000965 wd 0.0500 time 0.4640 (0.4731) data time 0.0011 (0.0051) model time 0.4629 (0.4724) loss 3.3488 (3.1194) grad_norm 1.1634 (1.4809) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][120/625] eta 0:03:58 lr 0.000965 wd 0.0500 time 0.4651 (0.4726) data time 0.0008 (0.0048) model time 0.4643 (0.4715) loss 2.5158 (3.1270) grad_norm 3.2972 (1.5106) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][130/625] eta 0:03:54 lr 0.000965 wd 0.0500 
time 0.4702 (0.4735) data time 0.0010 (0.0045) model time 0.4692 (0.4731) loss 2.2207 (3.1040) grad_norm 1.0262 (1.5197) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][140/625] eta 0:03:49 lr 0.000965 wd 0.0500 time 0.4633 (0.4732) data time 0.0009 (0.0042) model time 0.4624 (0.4724) loss 2.3854 (3.1059) grad_norm 1.3209 (1.5186) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:58:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][150/625] eta 0:03:44 lr 0.000965 wd 0.0500 time 0.4660 (0.4729) data time 0.0010 (0.0040) model time 0.4650 (0.4721) loss 3.3681 (3.0893) grad_norm 1.7995 (1.5308) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][160/625] eta 0:03:40 lr 0.000964 wd 0.0500 time 0.4113 (0.4734) data time 0.0010 (0.0038) model time 0.4103 (0.4727) loss 3.1711 (3.0861) grad_norm 1.0472 (1.5303) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][170/625] eta 0:03:35 lr 0.000964 wd 0.0500 time 0.4635 (0.4728) data time 0.0010 (0.0037) model time 0.4625 (0.4718) loss 2.3573 (3.0742) grad_norm 1.3934 (1.5259) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][180/625] eta 0:03:30 lr 0.000964 wd 0.0500 time 0.4633 (0.4723) data time 0.0010 (0.0036) model time 0.4623 (0.4711) loss 3.4375 (3.0914) grad_norm 1.5826 (1.5193) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][190/625] eta 0:03:25 lr 0.000964 wd 0.0500 time 0.4656 (0.4719) data time 0.0010 (0.0034) model time 0.4646 (0.4705) loss 3.3527 (3.0973) grad_norm 2.4728 (1.5286) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:19 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [102/300][200/625] eta 0:03:20 lr 0.000964 wd 0.0500 time 0.4599 (0.4715) data time 0.0011 (0.0033) model time 0.4588 (0.4701) loss 1.8423 (3.1048) grad_norm 1.3305 (1.5300) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][210/625] eta 0:03:15 lr 0.000964 wd 0.0500 time 0.4667 (0.4712) data time 0.0007 (0.0032) model time 0.4659 (0.4697) loss 2.9604 (3.1076) grad_norm 2.0532 (1.5708) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][220/625] eta 0:03:10 lr 0.000964 wd 0.0500 time 0.4649 (0.4709) data time 0.0009 (0.0031) model time 0.4639 (0.4693) loss 3.6089 (3.0985) grad_norm 1.7691 (1.5663) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][230/625] eta 0:03:05 lr 0.000964 wd 0.0500 time 0.4593 (0.4706) data time 0.0009 (0.0030) model time 0.4584 (0.4689) loss 3.2172 (3.0997) grad_norm 1.0942 (1.5565) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][240/625] eta 0:03:01 lr 0.000964 wd 0.0500 time 0.4606 (0.4703) data time 0.0009 (0.0030) model time 0.4598 (0.4686) loss 3.5178 (3.1031) grad_norm 1.0802 (1.5526) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][250/625] eta 0:02:56 lr 0.000964 wd 0.0500 time 0.4630 (0.4708) data time 0.0011 (0.0029) model time 0.4620 (0.4692) loss 3.5780 (3.0956) grad_norm 1.5255 (1.5499) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][260/625] eta 0:02:51 lr 0.000964 wd 0.0500 time 0.4668 (0.4706) data time 0.0008 (0.0028) model time 0.4660 (0.4690) loss 3.1435 (3.0951) grad_norm 1.0520 
(1.5548) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][270/625] eta 0:02:46 lr 0.000964 wd 0.0500 time 0.4671 (0.4704) data time 0.0008 (0.0028) model time 0.4664 (0.4688) loss 3.2565 (3.0988) grad_norm 2.2522 (1.5626) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 09:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][280/625] eta 0:02:42 lr 0.000963 wd 0.0500 time 0.4651 (0.4703) data time 0.0011 (0.0027) model time 0.4639 (0.4687) loss 2.6217 (3.0905) grad_norm 2.0139 (1.5672) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][290/625] eta 0:02:37 lr 0.000963 wd 0.0500 time 0.4620 (0.4701) data time 0.0010 (0.0027) model time 0.4611 (0.4684) loss 2.5224 (3.0906) grad_norm 1.5773 (1.5612) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][300/625] eta 0:02:32 lr 0.000963 wd 0.0500 time 0.4630 (0.4698) data time 0.0007 (0.0026) model time 0.4623 (0.4681) loss 2.2026 (3.0836) grad_norm 1.7263 (1.5534) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][310/625] eta 0:02:27 lr 0.000963 wd 0.0500 time 0.4612 (0.4695) data time 0.0009 (0.0025) model time 0.4604 (0.4678) loss 3.3734 (3.0874) grad_norm 1.3144 (1.5559) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][320/625] eta 0:02:23 lr 0.000963 wd 0.0500 time 0.4608 (0.4693) data time 0.0011 (0.0025) model time 0.4597 (0.4676) loss 3.7027 (3.0970) grad_norm 2.0369 (1.5559) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][330/625] eta 0:02:18 lr 0.000963 wd 0.0500 time 0.4672 (0.4691) data 
time 0.0012 (0.0025) model time 0.4659 (0.4674) loss 2.6806 (3.0934) grad_norm 2.1063 (1.5778) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][340/625] eta 0:02:13 lr 0.000963 wd 0.0500 time 0.4651 (0.4691) data time 0.0010 (0.0024) model time 0.4641 (0.4673) loss 3.2320 (3.0947) grad_norm 1.1509 (1.5833) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][350/625] eta 0:02:09 lr 0.000963 wd 0.0500 time 0.4672 (0.4695) data time 0.0009 (0.0024) model time 0.4663 (0.4679) loss 2.3354 (3.0988) grad_norm 1.0474 (1.5749) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][360/625] eta 0:02:04 lr 0.000963 wd 0.0500 time 0.4612 (0.4694) data time 0.0008 (0.0023) model time 0.4604 (0.4678) loss 2.2551 (3.0963) grad_norm 1.5114 (1.5709) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][370/625] eta 0:01:59 lr 0.000963 wd 0.0500 time 0.4645 (0.4693) data time 0.0010 (0.0023) model time 0.4635 (0.4677) loss 3.1252 (3.1014) grad_norm 1.5521 (1.5734) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][380/625] eta 0:01:54 lr 0.000963 wd 0.0500 time 0.4649 (0.4692) data time 0.0007 (0.0023) model time 0.4642 (0.4675) loss 2.5328 (3.1017) grad_norm 1.4587 (1.5797) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][390/625] eta 0:01:50 lr 0.000963 wd 0.0500 time 0.4630 (0.4694) data time 0.0011 (0.0022) model time 0.4619 (0.4679) loss 3.3282 (3.1092) grad_norm 1.2141 (1.5799) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [102/300][400/625] eta 0:01:45 lr 0.000962 wd 0.0500 time 0.4628 (0.4693) data time 0.0009 (0.0022) model time 0.4619 (0.4677) loss 3.5318 (3.1134) grad_norm 1.5678 (1.5770) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:00:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][410/625] eta 0:01:40 lr 0.000962 wd 0.0500 time 0.4617 (0.4691) data time 0.0008 (0.0022) model time 0.4610 (0.4676) loss 3.5799 (3.1083) grad_norm 2.0516 (1.5792) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][420/625] eta 0:01:36 lr 0.000962 wd 0.0500 time 0.4650 (0.4691) data time 0.0010 (0.0022) model time 0.4640 (0.4675) loss 3.5712 (3.1078) grad_norm 1.5845 (1.5760) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][430/625] eta 0:01:31 lr 0.000962 wd 0.0500 time 0.4711 (0.4690) data time 0.0008 (0.0021) model time 0.4703 (0.4674) loss 3.4943 (3.1087) grad_norm 1.2539 (1.5704) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][440/625] eta 0:01:26 lr 0.000962 wd 0.0500 time 0.4662 (0.4694) data time 0.0009 (0.0021) model time 0.4653 (0.4679) loss 3.5884 (3.1129) grad_norm 1.6929 (1.5695) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][450/625] eta 0:01:22 lr 0.000962 wd 0.0500 time 0.4655 (0.4693) data time 0.0008 (0.0021) model time 0.4648 (0.4678) loss 3.3591 (3.1176) grad_norm 1.4930 (1.5700) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][460/625] eta 0:01:17 lr 0.000962 wd 0.0500 time 0.4584 (0.4692) data time 0.0012 (0.0021) model time 0.4572 (0.4677) loss 3.1041 (3.1163) grad_norm 1.6167 (1.5802) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 10:01:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][470/625] eta 0:01:12 lr 0.000962 wd 0.0500 time 0.4797 (0.4696) data time 0.0010 (0.0020) model time 0.4787 (0.4682) loss 3.2865 (3.1119) grad_norm 1.7222 (1.5765) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][480/625] eta 0:01:08 lr 0.000962 wd 0.0500 time 0.4646 (0.4696) data time 0.0010 (0.0020) model time 0.4637 (0.4681) loss 2.9991 (3.1153) grad_norm 1.2619 (1.5725) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][490/625] eta 0:01:03 lr 0.000962 wd 0.0500 time 0.4633 (0.4695) data time 0.0012 (0.0020) model time 0.4621 (0.4680) loss 3.2782 (3.1157) grad_norm 1.0544 (1.5710) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][500/625] eta 0:00:58 lr 0.000962 wd 0.0500 time 0.4656 (0.4698) data time 0.0010 (0.0020) model time 0.4646 (0.4684) loss 3.4200 (3.1193) grad_norm 1.1012 (1.5697) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][510/625] eta 0:00:54 lr 0.000961 wd 0.0500 time 0.4672 (0.4698) data time 0.0007 (0.0020) model time 0.4665 (0.4684) loss 2.2549 (3.1154) grad_norm 1.7145 (1.5655) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][520/625] eta 0:00:49 lr 0.000961 wd 0.0500 time 0.4642 (0.4697) data time 0.0009 (0.0020) model time 0.4633 (0.4682) loss 2.4353 (3.1169) grad_norm 1.3769 (1.5656) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][530/625] eta 0:00:44 lr 0.000961 wd 0.0500 time 0.4673 (0.4700) data time 0.0010 (0.0020) model 
time 0.4663 (0.4685) loss 2.9314 (3.1167) grad_norm 1.4198 (1.5675) loss_scale 4096.0000 (2086.5687) mem 16715MB [2024-08-10 10:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][540/625] eta 0:00:39 lr 0.000961 wd 0.0500 time 0.4672 (0.4699) data time 0.0010 (0.0020) model time 0.4662 (0.4685) loss 2.9717 (3.1209) grad_norm 1.9057 (1.5702) loss_scale 4096.0000 (2123.7116) mem 16715MB [2024-08-10 10:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][550/625] eta 0:00:35 lr 0.000961 wd 0.0500 time 0.4675 (0.4698) data time 0.0010 (0.0020) model time 0.4665 (0.4684) loss 3.1272 (3.1212) grad_norm 1.3342 (1.5733) loss_scale 4096.0000 (2159.5064) mem 16715MB [2024-08-10 10:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][560/625] eta 0:00:30 lr 0.000961 wd 0.0500 time 0.4628 (0.4698) data time 0.0010 (0.0019) model time 0.4618 (0.4684) loss 3.2646 (3.1218) grad_norm 1.5694 (1.5737) loss_scale 4096.0000 (2194.0250) mem 16715MB [2024-08-10 10:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][570/625] eta 0:00:25 lr 0.000961 wd 0.0500 time 0.4615 (0.4697) data time 0.0008 (0.0019) model time 0.4607 (0.4683) loss 3.6637 (3.1226) grad_norm 1.1950 (1.5740) loss_scale 4096.0000 (2227.3345) mem 16715MB [2024-08-10 10:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][580/625] eta 0:00:21 lr 0.000961 wd 0.0500 time 0.4588 (0.4696) data time 0.0010 (0.0019) model time 0.4579 (0.4682) loss 3.4066 (3.1223) grad_norm 1.2731 (1.5701) loss_scale 4096.0000 (2259.4974) mem 16715MB [2024-08-10 10:02:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][590/625] eta 0:00:16 lr 0.000961 wd 0.0500 time 0.4607 (0.4695) data time 0.0010 (0.0019) model time 0.4597 (0.4681) loss 3.2644 (3.1195) grad_norm 1.2945 (1.5653) loss_scale 4096.0000 (2290.5719) mem 16715MB [2024-08-10 10:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][600/625] 
eta 0:00:11 lr 0.000961 wd 0.0500 time 0.4824 (0.4695) data time 0.0008 (0.0019) model time 0.4816 (0.4681) loss 2.8267 (3.1192) grad_norm 1.2608 (1.5609) loss_scale 4096.0000 (2320.6123) mem 16715MB [2024-08-10 10:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][610/625] eta 0:00:07 lr 0.000961 wd 0.0500 time 0.4585 (0.4699) data time 0.0006 (0.0019) model time 0.4579 (0.4684) loss 3.6608 (3.1186) grad_norm 1.2167 (1.5557) loss_scale 4096.0000 (2349.6694) mem 16715MB [2024-08-10 10:02:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [102/300][620/625] eta 0:00:02 lr 0.000961 wd 0.0500 time 0.4595 (0.4697) data time 0.0007 (0.0019) model time 0.4588 (0.4682) loss 2.4223 (3.1187) grad_norm 1.4445 (1.5551) loss_scale 4096.0000 (2377.7907) mem 16715MB [2024-08-10 10:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 102 training takes 0:04:53 [2024-08-10 10:02:38 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:02:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.530 (0.530) Loss 0.5576 (0.5576) Acc@1 88.086 (88.086) Acc@5 98.340 (98.340) Mem 16715MB [2024-08-10 10:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.9092 (0.6927) Acc@1 78.174 (84.650) Acc@5 94.824 (97.164) Mem 16715MB [2024-08-10 10:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0352 (0.8284) Acc@1 74.512 (81.210) Acc@5 93.799 (95.673) Mem 16715MB [2024-08-10 10:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.892 Acc@5 95.657 [2024-08-10 10:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-08-10 10:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.89% [2024-08-10 10:02:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 10:02:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 10:02:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.539 (0.539) Loss 0.5010 (0.5010) Acc@1 88.672 (88.672) Acc@5 98.438 (98.438) Mem 16715MB [2024-08-10 10:02:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.162) Loss 0.8101 (0.6266) Acc@1 80.127 (86.031) Acc@5 95.801 (97.607) Mem 16715MB [2024-08-10 10:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 0.9180 (0.7409) Acc@1 77.051 (82.915) Acc@5 95.068 (96.391) Mem 16715MB [2024-08-10 10:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.654 Acc@5 96.435 [2024-08-10 10:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 10:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.65% [2024-08-10 10:02:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 10:02:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 10:02:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][0/625] eta 0:09:11 lr 0.000961 wd 0.0500 time 0.8825 (0.8825) data time 0.4718 (0.4718) model time 0.0000 (0.0000) loss 3.4465 (3.4465) grad_norm 1.3238 (1.3238) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:02:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][10/625] eta 0:05:21 lr 0.000960 wd 0.0500 time 0.4642 (0.5223) data time 0.0007 (0.0439) model time 0.0000 (0.0000) loss 3.6474 (3.0409) grad_norm 1.8453 (1.7175) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][20/625] eta 0:05:00 lr 0.000960 wd 0.0500 time 0.4683 (0.4962) data time 0.0010 (0.0235) model time 0.0000 (0.0000) loss 3.0670 (3.1638) grad_norm 1.6201 (1.8200) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:03:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][30/625] eta 0:04:49 lr 0.000960 wd 0.0500 time 0.4607 (0.4868) data time 0.0008 (0.0163) model time 0.0000 (0.0000) loss 2.2423 (3.1828) grad_norm 1.4445 (1.7303) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][40/625] eta 0:04:42 lr 0.000960 wd 0.0500 time 0.4698 (0.4822) data time 0.0010 (0.0127) model time 0.0000 (0.0000) loss 3.2872 (3.1801) grad_norm 2.4783 (1.6758) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:03:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][50/625] eta 0:04:35 lr 0.000960 wd 0.0500 time 0.4580 (0.4784) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 3.5196 (3.1321) grad_norm 1.2106 (1.6228) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][60/625] eta 0:04:28 lr 0.000960 wd 0.0500 time 0.4635 (0.4758) data time 0.0010 (0.0089) model time 0.4624 (0.4614) loss 2.4085 
(3.1484) grad_norm 1.3068 (1.5860) loss_scale 4096.0000 (4096.0000) mem 16715MB [2024-08-10 10:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][70/625] eta 0:04:23 lr 0.000960 wd 0.0500 time 0.4519 (0.4752) data time 0.0008 (0.0078) model time 0.4510 (0.4659) loss 3.3130 (3.1573) grad_norm inf (inf) loss_scale 2048.0000 (4067.1549) mem 16715MB [2024-08-10 10:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][80/625] eta 0:04:18 lr 0.000960 wd 0.0500 time 0.4688 (0.4741) data time 0.0011 (0.0069) model time 0.4677 (0.4655) loss 2.4746 (3.1774) grad_norm 1.2858 (inf) loss_scale 2048.0000 (3817.8765) mem 16715MB [2024-08-10 10:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][90/625] eta 0:04:14 lr 0.000960 wd 0.0500 time 0.4694 (0.4750) data time 0.0009 (0.0063) model time 0.4685 (0.4694) loss 3.4963 (3.1972) grad_norm 2.3016 (inf) loss_scale 2048.0000 (3623.3846) mem 16715MB [2024-08-10 10:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][100/625] eta 0:04:08 lr 0.000960 wd 0.0500 time 0.4621 (0.4738) data time 0.0010 (0.0058) model time 0.4611 (0.4680) loss 3.5708 (3.1749) grad_norm 1.3675 (inf) loss_scale 2048.0000 (3467.4059) mem 16715MB [2024-08-10 10:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][110/625] eta 0:04:03 lr 0.000960 wd 0.0500 time 0.4628 (0.4727) data time 0.0008 (0.0054) model time 0.4620 (0.4667) loss 3.7668 (3.1652) grad_norm 1.6344 (inf) loss_scale 2048.0000 (3339.5315) mem 16715MB [2024-08-10 10:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][120/625] eta 0:03:58 lr 0.000959 wd 0.0500 time 0.4654 (0.4718) data time 0.0010 (0.0050) model time 0.4644 (0.4658) loss 2.3821 (3.1393) grad_norm 1.7294 (inf) loss_scale 2048.0000 (3232.7934) mem 16715MB [2024-08-10 10:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][130/625] eta 0:03:53 lr 0.000959 wd 0.0500 time 0.4585 (0.4710) 
data time 0.0007 (0.0047) model time 0.4577 (0.4652) loss 2.7366 (3.1010) grad_norm 2.0825 (inf) loss_scale 2048.0000 (3142.3511) mem 16715MB [2024-08-10 10:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][140/625] eta 0:03:48 lr 0.000959 wd 0.0500 time 0.4638 (0.4705) data time 0.0009 (0.0044) model time 0.4629 (0.4650) loss 2.8376 (3.1038) grad_norm 2.0112 (inf) loss_scale 2048.0000 (3064.7376) mem 16715MB [2024-08-10 10:04:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][150/625] eta 0:03:43 lr 0.000959 wd 0.0500 time 0.4598 (0.4702) data time 0.0008 (0.0042) model time 0.4590 (0.4649) loss 3.6077 (3.0968) grad_norm 1.9664 (inf) loss_scale 2048.0000 (2997.4040) mem 16715MB [2024-08-10 10:04:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][160/625] eta 0:03:38 lr 0.000959 wd 0.0500 time 0.4650 (0.4699) data time 0.0010 (0.0040) model time 0.4640 (0.4648) loss 3.5484 (3.1035) grad_norm 2.3917 (inf) loss_scale 2048.0000 (2938.4348) mem 16715MB [2024-08-10 10:04:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][170/625] eta 0:03:33 lr 0.000959 wd 0.0500 time 0.4670 (0.4695) data time 0.0009 (0.0038) model time 0.4661 (0.4646) loss 3.4764 (3.0981) grad_norm 1.6908 (inf) loss_scale 2048.0000 (2886.3626) mem 16715MB [2024-08-10 10:04:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][180/625] eta 0:03:28 lr 0.000959 wd 0.0500 time 0.4639 (0.4692) data time 0.0010 (0.0037) model time 0.4629 (0.4645) loss 2.4720 (3.0863) grad_norm 1.0683 (inf) loss_scale 2048.0000 (2840.0442) mem 16715MB [2024-08-10 10:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][190/625] eta 0:03:23 lr 0.000959 wd 0.0500 time 0.4687 (0.4689) data time 0.0011 (0.0035) model time 0.4676 (0.4644) loss 3.7275 (3.0887) grad_norm 1.0907 (inf) loss_scale 2048.0000 (2798.5759) mem 16715MB [2024-08-10 10:04:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[103/300][200/625] eta 0:03:19 lr 0.000959 wd 0.0500 time 0.4602 (0.4694) data time 0.0008 (0.0034) model time 0.4594 (0.4652) loss 3.9383 (3.1032) grad_norm 1.2994 (inf) loss_scale 2048.0000 (2761.2338) mem 16715MB [2024-08-10 10:04:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][210/625] eta 0:03:15 lr 0.000959 wd 0.0500 time 0.7012 (0.4703) data time 0.0008 (0.0033) model time 0.7004 (0.4666) loss 2.5822 (3.0951) grad_norm 1.4374 (inf) loss_scale 2048.0000 (2727.4313) mem 16715MB [2024-08-10 10:04:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][220/625] eta 0:03:10 lr 0.000959 wd 0.0500 time 0.4660 (0.4701) data time 0.0010 (0.0032) model time 0.4650 (0.4665) loss 3.2999 (3.1040) grad_norm 1.4933 (inf) loss_scale 2048.0000 (2696.6878) mem 16715MB [2024-08-10 10:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][230/625] eta 0:03:05 lr 0.000959 wd 0.0500 time 0.4652 (0.4700) data time 0.0011 (0.0031) model time 0.4641 (0.4665) loss 3.3917 (3.1035) grad_norm 1.0778 (inf) loss_scale 2048.0000 (2668.6061) mem 16715MB [2024-08-10 10:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][240/625] eta 0:03:00 lr 0.000958 wd 0.0500 time 0.4676 (0.4697) data time 0.0010 (0.0030) model time 0.4666 (0.4663) loss 3.5774 (3.1039) grad_norm 1.3173 (inf) loss_scale 2048.0000 (2642.8548) mem 16715MB [2024-08-10 10:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][250/625] eta 0:02:56 lr 0.000958 wd 0.0500 time 0.4638 (0.4695) data time 0.0008 (0.0029) model time 0.4630 (0.4661) loss 3.6401 (3.1028) grad_norm 2.3126 (inf) loss_scale 2048.0000 (2619.1554) mem 16715MB [2024-08-10 10:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][260/625] eta 0:02:51 lr 0.000958 wd 0.0500 time 0.4634 (0.4692) data time 0.0008 (0.0029) model time 0.4627 (0.4659) loss 3.8519 (3.1001) grad_norm 1.0800 (inf) loss_scale 2048.0000 (2597.2720) mem 16715MB [2024-08-10 
10:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][270/625] eta 0:02:46 lr 0.000958 wd 0.0500 time 0.4626 (0.4690) data time 0.0008 (0.0028) model time 0.4618 (0.4657) loss 3.4114 (3.0997) grad_norm 2.6144 (inf) loss_scale 2048.0000 (2577.0037) mem 16715MB [2024-08-10 10:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][280/625] eta 0:02:41 lr 0.000958 wd 0.0500 time 0.4660 (0.4687) data time 0.0008 (0.0028) model time 0.4652 (0.4655) loss 3.6037 (3.1113) grad_norm 2.6117 (inf) loss_scale 2048.0000 (2558.1779) mem 16715MB [2024-08-10 10:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][290/625] eta 0:02:37 lr 0.000958 wd 0.0500 time 0.4585 (0.4693) data time 0.0009 (0.0027) model time 0.4576 (0.4662) loss 2.6137 (3.1045) grad_norm 1.6685 (inf) loss_scale 2048.0000 (2540.6460) mem 16715MB [2024-08-10 10:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][300/625] eta 0:02:32 lr 0.000958 wd 0.0500 time 0.4660 (0.4691) data time 0.0009 (0.0026) model time 0.4651 (0.4661) loss 3.1469 (3.1023) grad_norm 1.6529 (inf) loss_scale 2048.0000 (2524.2791) mem 16715MB [2024-08-10 10:05:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][310/625] eta 0:02:27 lr 0.000958 wd 0.0500 time 0.4623 (0.4690) data time 0.0010 (0.0026) model time 0.4613 (0.4661) loss 3.1612 (3.1034) grad_norm 1.6995 (inf) loss_scale 2048.0000 (2508.9646) mem 16715MB [2024-08-10 10:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][320/625] eta 0:02:23 lr 0.000958 wd 0.0500 time 0.4772 (0.4689) data time 0.0010 (0.0025) model time 0.4762 (0.4660) loss 3.0919 (3.1035) grad_norm 1.5482 (inf) loss_scale 2048.0000 (2494.6044) mem 16715MB [2024-08-10 10:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][330/625] eta 0:02:18 lr 0.000958 wd 0.0500 time 0.4655 (0.4688) data time 0.0010 (0.0025) model time 0.4644 (0.4659) loss 3.5502 (3.1065) grad_norm 
1.4591 (inf) loss_scale 2048.0000 (2481.1118) mem 16715MB [2024-08-10 10:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][340/625] eta 0:02:13 lr 0.000958 wd 0.0500 time 0.4613 (0.4686) data time 0.0007 (0.0025) model time 0.4606 (0.4658) loss 3.2199 (3.1131) grad_norm 1.3285 (inf) loss_scale 2048.0000 (2468.4106) mem 16715MB [2024-08-10 10:05:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][350/625] eta 0:02:08 lr 0.000958 wd 0.0500 time 0.4773 (0.4685) data time 0.0010 (0.0024) model time 0.4764 (0.4657) loss 2.7454 (3.1168) grad_norm 1.1080 (inf) loss_scale 2048.0000 (2456.4330) mem 16715MB [2024-08-10 10:05:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][360/625] eta 0:02:04 lr 0.000957 wd 0.0500 time 0.4612 (0.4690) data time 0.0010 (0.0024) model time 0.4603 (0.4663) loss 3.4221 (3.1183) grad_norm 1.4268 (inf) loss_scale 2048.0000 (2445.1191) mem 16715MB [2024-08-10 10:05:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][370/625] eta 0:01:59 lr 0.000957 wd 0.0500 time 0.4683 (0.4689) data time 0.0013 (0.0023) model time 0.4670 (0.4663) loss 2.3162 (3.1151) grad_norm 2.3578 (inf) loss_scale 2048.0000 (2434.4151) mem 16715MB [2024-08-10 10:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][380/625] eta 0:01:54 lr 0.000957 wd 0.0500 time 0.4652 (0.4689) data time 0.0008 (0.0023) model time 0.4644 (0.4663) loss 4.0003 (3.1129) grad_norm 1.2187 (inf) loss_scale 2048.0000 (2424.2730) mem 16715MB [2024-08-10 10:05:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][390/625] eta 0:01:50 lr 0.000957 wd 0.0500 time 0.4670 (0.4688) data time 0.0010 (0.0023) model time 0.4660 (0.4663) loss 2.8048 (3.1112) grad_norm 1.4613 (inf) loss_scale 2048.0000 (2414.6496) mem 16715MB [2024-08-10 10:05:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][400/625] eta 0:01:45 lr 0.000957 wd 0.0500 time 0.4639 (0.4687) data time 0.0008 
(0.0023) model time 0.4631 (0.4662) loss 2.4343 (3.1153) grad_norm 3.3187 (inf) loss_scale 2048.0000 (2405.5062) mem 16715MB [2024-08-10 10:06:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][410/625] eta 0:01:40 lr 0.000957 wd 0.0500 time 0.4586 (0.4686) data time 0.0010 (0.0022) model time 0.4576 (0.4661) loss 2.1670 (3.1144) grad_norm 3.5096 (inf) loss_scale 2048.0000 (2396.8078) mem 16715MB [2024-08-10 10:06:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][420/625] eta 0:01:36 lr 0.000957 wd 0.0500 time 0.4633 (0.4689) data time 0.0008 (0.0022) model time 0.4625 (0.4665) loss 2.9074 (3.1153) grad_norm 1.2576 (inf) loss_scale 2048.0000 (2388.5226) mem 16715MB [2024-08-10 10:06:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][430/625] eta 0:01:31 lr 0.000957 wd 0.0500 time 0.4658 (0.4699) data time 0.0008 (0.0022) model time 0.4650 (0.4677) loss 2.1041 (3.1141) grad_norm 1.5346 (inf) loss_scale 2048.0000 (2380.6218) mem 16715MB [2024-08-10 10:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][440/625] eta 0:01:26 lr 0.000957 wd 0.0500 time 0.4675 (0.4699) data time 0.0010 (0.0022) model time 0.4665 (0.4677) loss 3.5617 (3.1070) grad_norm 2.1409 (inf) loss_scale 2048.0000 (2373.0794) mem 16715MB [2024-08-10 10:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][450/625] eta 0:01:22 lr 0.000957 wd 0.0500 time 0.4658 (0.4699) data time 0.0009 (0.0022) model time 0.4649 (0.4676) loss 3.1151 (3.1055) grad_norm 1.4157 (inf) loss_scale 2048.0000 (2365.8714) mem 16715MB [2024-08-10 10:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][460/625] eta 0:01:17 lr 0.000957 wd 0.0500 time 0.4624 (0.4698) data time 0.0008 (0.0022) model time 0.4617 (0.4676) loss 2.4104 (3.1066) grad_norm 1.4271 (inf) loss_scale 1024.0000 (2354.5336) mem 16715MB [2024-08-10 10:06:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][470/625] eta 
0:01:12 lr 0.000956 wd 0.0500 time 0.4600 (0.4699) data time 0.0008 (0.0022) model time 0.4591 (0.4677) loss 2.0111 (3.1028) grad_norm 1.7970 (inf) loss_scale 1024.0000 (2326.2845) mem 16715MB [2024-08-10 10:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][480/625] eta 0:01:08 lr 0.000956 wd 0.0500 time 0.4737 (0.4698) data time 0.0007 (0.0022) model time 0.4730 (0.4675) loss 3.8302 (3.1052) grad_norm 2.5197 (inf) loss_scale 1024.0000 (2299.2100) mem 16715MB [2024-08-10 10:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][490/625] eta 0:01:03 lr 0.000956 wd 0.0500 time 0.4617 (0.4696) data time 0.0008 (0.0021) model time 0.4609 (0.4674) loss 2.9098 (3.1038) grad_norm 1.4464 (inf) loss_scale 1024.0000 (2273.2383) mem 16715MB [2024-08-10 10:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][500/625] eta 0:00:58 lr 0.000956 wd 0.0500 time 0.4708 (0.4695) data time 0.0010 (0.0021) model time 0.4697 (0.4673) loss 3.3636 (3.0990) grad_norm 1.4571 (inf) loss_scale 1024.0000 (2248.3034) mem 16715MB [2024-08-10 10:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][510/625] eta 0:00:53 lr 0.000956 wd 0.0500 time 0.4637 (0.4694) data time 0.0011 (0.0021) model time 0.4626 (0.4672) loss 3.7076 (3.0999) grad_norm 1.2438 (inf) loss_scale 1024.0000 (2224.3444) mem 16715MB [2024-08-10 10:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][520/625] eta 0:00:49 lr 0.000956 wd 0.0500 time 0.4653 (0.4693) data time 0.0009 (0.0021) model time 0.4643 (0.4671) loss 2.9818 (3.1038) grad_norm 2.0862 (inf) loss_scale 1024.0000 (2201.3052) mem 16715MB [2024-08-10 10:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][530/625] eta 0:00:44 lr 0.000956 wd 0.0500 time 0.4637 (0.4692) data time 0.0008 (0.0021) model time 0.4629 (0.4671) loss 2.6386 (3.1015) grad_norm 1.3042 (inf) loss_scale 1024.0000 (2179.1337) mem 16715MB [2024-08-10 10:07:04 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][540/625] eta 0:00:39 lr 0.000956 wd 0.0500 time 0.4642 (0.4691) data time 0.0008 (0.0021) model time 0.4634 (0.4670) loss 2.2451 (3.1006) grad_norm 1.5281 (inf) loss_scale 1024.0000 (2157.7819) mem 16715MB [2024-08-10 10:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][550/625] eta 0:00:35 lr 0.000956 wd 0.0500 time 0.4644 (0.4690) data time 0.0008 (0.0020) model time 0.4636 (0.4669) loss 2.0249 (3.0989) grad_norm 1.2750 (inf) loss_scale 1024.0000 (2137.2051) mem 16715MB [2024-08-10 10:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][560/625] eta 0:00:30 lr 0.000956 wd 0.0500 time 0.4677 (0.4690) data time 0.0008 (0.0020) model time 0.4669 (0.4669) loss 2.9169 (3.0999) grad_norm 1.8425 (inf) loss_scale 1024.0000 (2117.3619) mem 16715MB [2024-08-10 10:07:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][570/625] eta 0:00:25 lr 0.000956 wd 0.0500 time 0.4641 (0.4693) data time 0.0010 (0.0020) model time 0.4631 (0.4672) loss 3.7947 (3.0987) grad_norm 1.2628 (inf) loss_scale 1024.0000 (2098.2137) mem 16715MB [2024-08-10 10:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][580/625] eta 0:00:21 lr 0.000956 wd 0.0500 time 0.4644 (0.4692) data time 0.0008 (0.0020) model time 0.4636 (0.4671) loss 2.8586 (3.0983) grad_norm 1.2681 (inf) loss_scale 1024.0000 (2079.7246) mem 16715MB [2024-08-10 10:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][590/625] eta 0:00:16 lr 0.000955 wd 0.0500 time 0.4594 (0.4691) data time 0.0008 (0.0020) model time 0.4586 (0.4671) loss 3.4415 (3.0985) grad_norm 1.3648 (inf) loss_scale 1024.0000 (2061.8613) mem 16715MB [2024-08-10 10:07:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][600/625] eta 0:00:11 lr 0.000955 wd 0.0500 time 0.4624 (0.4691) data time 0.0008 (0.0020) model time 0.4616 (0.4670) loss 3.6466 (3.0977) grad_norm 1.9178 (inf) 
loss_scale 1024.0000 (2044.5923) mem 16715MB [2024-08-10 10:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][610/625] eta 0:00:07 lr 0.000955 wd 0.0500 time 0.4581 (0.4690) data time 0.0008 (0.0020) model time 0.4573 (0.4670) loss 3.5389 (3.0991) grad_norm 1.5749 (inf) loss_scale 1024.0000 (2027.8887) mem 16715MB [2024-08-10 10:07:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [103/300][620/625] eta 0:00:02 lr 0.000955 wd 0.0500 time 0.4597 (0.4690) data time 0.0008 (0.0019) model time 0.4589 (0.4670) loss 2.9691 (3.0976) grad_norm 1.4400 (inf) loss_scale 1024.0000 (2011.7230) mem 16715MB [2024-08-10 10:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 103 training takes 0:04:53 [2024-08-10 10:07:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:07:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:07:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.5210 (0.5210) Acc@1 88.135 (88.135) Acc@5 98.389 (98.389) Mem 16715MB [2024-08-10 10:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8970 (0.6927) Acc@1 77.832 (84.339) Acc@5 95.801 (97.243) Mem 16715MB [2024-08-10 10:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0625 (0.8275) Acc@1 74.121 (80.831) Acc@5 93.359 (95.738) Mem 16715MB [2024-08-10 10:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.482 Acc@5 95.693 [2024-08-10 10:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.5% [2024-08-10 10:07:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.925 (0.925) Loss 0.5005 (0.5005) Acc@1 88.770 (88.770) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 10:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.197) Loss 0.8081 (0.6261) Acc@1 80.127 (86.057) Acc@5 95.801 (97.607) Mem 16715MB [2024-08-10 10:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.160) Loss 0.9160 (0.7398) Acc@1 77.051 (82.952) Acc@5 95.020 (96.408) Mem 16715MB [2024-08-10 10:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.698 Acc@5 96.455 [2024-08-10 10:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 10:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.70% [2024-08-10 10:07:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 10:07:54 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 10:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][0/625] eta 0:09:10 lr 0.000955 wd 0.0500 time 0.8808 (0.8808) data time 0.4772 (0.4772) model time 0.0000 (0.0000) loss 3.7878 (3.7878) grad_norm 1.6941 (1.6941) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:07:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][10/625] eta 0:05:11 lr 0.000955 wd 0.0500 time 0.4669 (0.5063) data time 0.0008 (0.0444) model time 0.0000 (0.0000) loss 3.4770 (3.2073) grad_norm 1.2717 (1.3268) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][20/625] eta 0:04:57 lr 0.000955 wd 0.0500 time 0.4614 (0.4918) data time 0.0010 (0.0242) model time 0.0000 (0.0000) loss 3.5595 (3.1396) grad_norm 1.3822 (1.2903) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][30/625] eta 0:04:55 lr 0.000955 wd 0.0500 time 0.4657 (0.4972) data time 0.0009 (0.0170) model time 0.0000 (0.0000) loss 2.1561 (3.1199) grad_norm 1.2164 (1.4199) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][40/625] eta 0:04:46 lr 0.000955 wd 0.0500 time 0.4625 (0.4903) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 3.6621 (3.0753) grad_norm 1.3696 (1.4389) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][50/625] eta 0:04:39 lr 0.000955 wd 0.0500 time 0.4618 (0.4858) data time 0.0009 (0.0110) model time 0.0000 (0.0000) loss 2.7348 (3.0846) grad_norm 1.5929 (1.4364) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][60/625] eta 0:04:32 lr 0.000955 wd 0.0500 time 0.4677 (0.4827) data time 0.0007 (0.0094) model time 0.4669 (0.4659) loss 3.1019 
(3.0903) grad_norm 1.1696 (1.4244) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][70/625] eta 0:04:26 lr 0.000955 wd 0.0500 time 0.4745 (0.4808) data time 0.0008 (0.0084) model time 0.4737 (0.4668) loss 3.4469 (3.0714) grad_norm 1.7196 (1.4823) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][80/625] eta 0:04:21 lr 0.000954 wd 0.0500 time 0.4634 (0.4790) data time 0.0010 (0.0074) model time 0.4624 (0.4661) loss 3.3567 (3.0927) grad_norm 1.2250 (1.4627) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][90/625] eta 0:04:16 lr 0.000954 wd 0.0500 time 0.4953 (0.4787) data time 0.0008 (0.0071) model time 0.4945 (0.4676) loss 4.0407 (3.0873) grad_norm 1.9890 (1.4856) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][100/625] eta 0:04:10 lr 0.000954 wd 0.0500 time 0.4671 (0.4778) data time 0.0011 (0.0067) model time 0.4660 (0.4675) loss 2.6084 (3.0762) grad_norm 1.9133 (1.4859) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][110/625] eta 0:04:05 lr 0.000954 wd 0.0500 time 0.4630 (0.4776) data time 0.0010 (0.0062) model time 0.4620 (0.4687) loss 3.1799 (3.0684) grad_norm 1.4327 (1.4684) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][120/625] eta 0:04:01 lr 0.000954 wd 0.0500 time 0.4618 (0.4783) data time 0.0008 (0.0058) model time 0.4610 (0.4709) loss 2.7450 (3.0609) grad_norm 1.5500 (1.4630) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:08:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][130/625] eta 0:03:56 lr 0.000954 wd 0.0500 
time 0.4633 (0.4775) data time 0.0010 (0.0056) model time 0.4623 (0.4701) loss 2.6231 (3.0346) grad_norm 2.0265 (1.4787) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][140/625] eta 0:03:51 lr 0.000954 wd 0.0500 time 0.4641 (0.4782) data time 0.0010 (0.0053) model time 0.4631 (0.4719) loss 2.1524 (3.0368) grad_norm 1.3688 (1.4781) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][150/625] eta 0:03:46 lr 0.000954 wd 0.0500 time 0.4667 (0.4772) data time 0.0010 (0.0050) model time 0.4658 (0.4710) loss 3.1656 (3.0521) grad_norm 2.1352 (1.5167) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][160/625] eta 0:03:41 lr 0.000954 wd 0.0500 time 0.4670 (0.4766) data time 0.0008 (0.0048) model time 0.4661 (0.4704) loss 1.9494 (3.0656) grad_norm 1.2827 (1.5344) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][170/625] eta 0:03:36 lr 0.000954 wd 0.0500 time 0.4654 (0.4763) data time 0.0010 (0.0046) model time 0.4644 (0.4704) loss 2.7219 (3.0655) grad_norm 1.7451 (1.5427) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][180/625] eta 0:03:31 lr 0.000954 wd 0.0500 time 0.4735 (0.4758) data time 0.0009 (0.0044) model time 0.4726 (0.4701) loss 2.9594 (3.0711) grad_norm 1.0868 (1.5376) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][190/625] eta 0:03:26 lr 0.000954 wd 0.0500 time 0.4613 (0.4755) data time 0.0010 (0.0042) model time 0.4602 (0.4700) loss 2.3194 (3.0633) grad_norm 1.2579 (1.5338) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:29 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [104/300][200/625] eta 0:03:21 lr 0.000953 wd 0.0500 time 0.4617 (0.4750) data time 0.0010 (0.0041) model time 0.4607 (0.4696) loss 2.8753 (3.0643) grad_norm 1.8688 (1.5359) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][210/625] eta 0:03:16 lr 0.000953 wd 0.0500 time 0.4655 (0.4744) data time 0.0008 (0.0040) model time 0.4647 (0.4691) loss 3.2894 (3.0727) grad_norm 2.8108 (1.5586) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][220/625] eta 0:03:11 lr 0.000953 wd 0.0500 time 0.4631 (0.4739) data time 0.0010 (0.0038) model time 0.4621 (0.4687) loss 3.2302 (3.0728) grad_norm 1.4422 (1.5586) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][230/625] eta 0:03:07 lr 0.000953 wd 0.0500 time 0.4657 (0.4735) data time 0.0010 (0.0037) model time 0.4647 (0.4684) loss 3.0287 (3.0695) grad_norm 1.7628 (1.5582) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][240/625] eta 0:03:02 lr 0.000953 wd 0.0500 time 0.4622 (0.4731) data time 0.0008 (0.0036) model time 0.4614 (0.4681) loss 1.9334 (3.0683) grad_norm 0.9563 (1.5592) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][250/625] eta 0:02:57 lr 0.000953 wd 0.0500 time 0.4672 (0.4729) data time 0.0008 (0.0035) model time 0.4664 (0.4680) loss 3.1130 (3.0694) grad_norm 1.0894 (1.5591) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:09:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][260/625] eta 0:02:52 lr 0.000953 wd 0.0500 time 0.4600 (0.4726) data time 0.0008 (0.0034) model time 0.4593 (0.4679) loss 3.9538 (3.0726) grad_norm 1.5083 
(1.5574) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][270/625] eta 0:02:47 lr 0.000953 wd 0.0500 time 0.4596 (0.4724) data time 0.0009 (0.0033) model time 0.4588 (0.4678) loss 3.9586 (3.0861) grad_norm 1.1008 (1.5549) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][280/625] eta 0:02:42 lr 0.000953 wd 0.0500 time 0.4663 (0.4722) data time 0.0008 (0.0032) model time 0.4655 (0.4677) loss 3.3962 (3.0927) grad_norm 1.7910 (1.5558) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][290/625] eta 0:02:38 lr 0.000953 wd 0.0500 time 0.4634 (0.4719) data time 0.0008 (0.0032) model time 0.4626 (0.4674) loss 2.1939 (3.0875) grad_norm 1.5385 (1.5510) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][300/625] eta 0:02:33 lr 0.000953 wd 0.0500 time 0.4662 (0.4715) data time 0.0008 (0.0031) model time 0.4654 (0.4672) loss 2.0087 (3.0824) grad_norm 1.6624 (1.5507) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][310/625] eta 0:02:28 lr 0.000952 wd 0.0500 time 0.4653 (0.4713) data time 0.0007 (0.0030) model time 0.4646 (0.4671) loss 2.7140 (3.0831) grad_norm 1.5244 (1.5427) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][320/625] eta 0:02:23 lr 0.000952 wd 0.0500 time 0.4644 (0.4712) data time 0.0009 (0.0030) model time 0.4635 (0.4671) loss 3.3427 (3.0931) grad_norm 2.0314 (1.5655) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][330/625] eta 0:02:18 lr 0.000952 wd 0.0500 time 0.4873 (0.4712) data 
time 0.0009 (0.0029) model time 0.4864 (0.4671) loss 3.2747 (3.1003) grad_norm 1.9785 (1.6010) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][340/625] eta 0:02:14 lr 0.000952 wd 0.0500 time 0.4648 (0.4710) data time 0.0008 (0.0028) model time 0.4640 (0.4670) loss 3.7575 (3.1087) grad_norm 1.2685 (1.5975) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][350/625] eta 0:02:09 lr 0.000952 wd 0.0500 time 0.4638 (0.4708) data time 0.0008 (0.0028) model time 0.4630 (0.4669) loss 2.9812 (3.1151) grad_norm 1.2333 (1.5928) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][360/625] eta 0:02:04 lr 0.000952 wd 0.0500 time 0.4617 (0.4712) data time 0.0010 (0.0028) model time 0.4607 (0.4673) loss 2.4875 (3.1106) grad_norm 0.9084 (1.5863) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][370/625] eta 0:02:00 lr 0.000952 wd 0.0500 time 0.4630 (0.4717) data time 0.0008 (0.0028) model time 0.4622 (0.4680) loss 3.6785 (3.1084) grad_norm 2.3891 (1.5879) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][380/625] eta 0:01:55 lr 0.000952 wd 0.0500 time 0.4741 (0.4715) data time 0.0009 (0.0027) model time 0.4732 (0.4678) loss 2.7384 (3.1061) grad_norm 3.0423 (1.5940) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:10:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][390/625] eta 0:01:50 lr 0.000952 wd 0.0500 time 0.4631 (0.4713) data time 0.0010 (0.0027) model time 0.4621 (0.4677) loss 3.1753 (3.0981) grad_norm 1.2028 (1.5906) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [104/300][400/625] eta 0:01:45 lr 0.000952 wd 0.0500 time 0.4612 (0.4710) data time 0.0010 (0.0026) model time 0.4602 (0.4675) loss 2.8649 (3.1018) grad_norm 1.3025 (1.5849) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][410/625] eta 0:01:41 lr 0.000952 wd 0.0500 time 0.4650 (0.4709) data time 0.0012 (0.0026) model time 0.4638 (0.4674) loss 3.8337 (3.1074) grad_norm 2.0760 (1.5858) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][420/625] eta 0:01:36 lr 0.000952 wd 0.0500 time 0.4598 (0.4708) data time 0.0007 (0.0026) model time 0.4590 (0.4673) loss 3.7629 (3.1144) grad_norm 1.4265 (1.5806) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][430/625] eta 0:01:31 lr 0.000951 wd 0.0500 time 0.4655 (0.4707) data time 0.0013 (0.0025) model time 0.4642 (0.4673) loss 1.9148 (3.1095) grad_norm 1.0370 (1.5761) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][440/625] eta 0:01:27 lr 0.000951 wd 0.0500 time 0.4632 (0.4705) data time 0.0009 (0.0025) model time 0.4623 (0.4671) loss 3.1416 (3.1159) grad_norm 1.2240 (1.5747) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][450/625] eta 0:01:22 lr 0.000951 wd 0.0500 time 0.4588 (0.4704) data time 0.0008 (0.0025) model time 0.4580 (0.4670) loss 4.2846 (3.1200) grad_norm 1.6074 (1.5820) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][460/625] eta 0:01:17 lr 0.000951 wd 0.0500 time 0.4645 (0.4707) data time 0.0008 (0.0024) model time 0.4638 (0.4675) loss 2.6698 (3.1210) grad_norm 1.9611 (1.5878) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 10:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][470/625] eta 0:01:12 lr 0.000951 wd 0.0500 time 0.4656 (0.4706) data time 0.0008 (0.0024) model time 0.4648 (0.4674) loss 3.6591 (3.1213) grad_norm 1.3683 (1.5902) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][480/625] eta 0:01:08 lr 0.000951 wd 0.0500 time 0.4643 (0.4708) data time 0.0011 (0.0024) model time 0.4632 (0.4677) loss 2.2045 (3.1148) grad_norm 1.9700 (1.5875) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][490/625] eta 0:01:03 lr 0.000951 wd 0.0500 time 0.4590 (0.4706) data time 0.0008 (0.0023) model time 0.4582 (0.4676) loss 3.6595 (3.1203) grad_norm 1.7100 (1.5827) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][500/625] eta 0:00:58 lr 0.000951 wd 0.0500 time 0.4570 (0.4705) data time 0.0008 (0.0023) model time 0.4562 (0.4674) loss 2.4399 (3.1163) grad_norm 2.2495 (1.5839) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][510/625] eta 0:00:54 lr 0.000951 wd 0.0500 time 0.4649 (0.4703) data time 0.0011 (0.0023) model time 0.4638 (0.4673) loss 3.3961 (3.1129) grad_norm 1.9144 (1.5902) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][520/625] eta 0:00:49 lr 0.000951 wd 0.0500 time 0.4560 (0.4701) data time 0.0009 (0.0023) model time 0.4551 (0.4671) loss 3.6710 (3.1168) grad_norm 1.8117 (1.5913) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][530/625] eta 0:00:44 lr 0.000951 wd 0.0500 time 0.4564 (0.4700) data time 0.0007 (0.0022) model 
time 0.4557 (0.4670) loss 2.5072 (3.1170) grad_norm 1.2753 (1.5889) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][540/625] eta 0:00:39 lr 0.000950 wd 0.0500 time 0.4598 (0.4698) data time 0.0010 (0.0022) model time 0.4588 (0.4669) loss 2.9124 (3.1125) grad_norm 1.3429 (1.5865) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][550/625] eta 0:00:35 lr 0.000950 wd 0.0500 time 0.4624 (0.4699) data time 0.0007 (0.0022) model time 0.4617 (0.4670) loss 2.0802 (3.1169) grad_norm 1.3456 (1.5890) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][560/625] eta 0:00:30 lr 0.000950 wd 0.0500 time 0.4656 (0.4701) data time 0.0007 (0.0022) model time 0.4649 (0.4673) loss 2.5571 (3.1137) grad_norm 1.1356 (1.5839) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][570/625] eta 0:00:25 lr 0.000950 wd 0.0500 time 0.4598 (0.4700) data time 0.0010 (0.0022) model time 0.4587 (0.4672) loss 2.7932 (3.1153) grad_norm 1.4465 (1.5824) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][580/625] eta 0:00:21 lr 0.000950 wd 0.0500 time 0.4575 (0.4699) data time 0.0008 (0.0021) model time 0.4567 (0.4671) loss 3.9027 (3.1148) grad_norm 1.6988 (1.5854) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][590/625] eta 0:00:16 lr 0.000950 wd 0.0500 time 0.4587 (0.4697) data time 0.0010 (0.0021) model time 0.4577 (0.4670) loss 3.2718 (3.1122) grad_norm 1.2676 (1.5831) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][600/625] 
eta 0:00:11 lr 0.000950 wd 0.0500 time 0.4659 (0.4696) data time 0.0007 (0.0021) model time 0.4652 (0.4669) loss 2.2497 (3.1089) grad_norm 1.7114 (1.5810) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][610/625] eta 0:00:07 lr 0.000950 wd 0.0500 time 0.4658 (0.4696) data time 0.0005 (0.0021) model time 0.4652 (0.4668) loss 3.3334 (3.1049) grad_norm 1.4895 (1.5819) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [104/300][620/625] eta 0:00:02 lr 0.000950 wd 0.0500 time 0.4616 (0.4694) data time 0.0007 (0.0021) model time 0.4609 (0.4667) loss 2.2698 (3.1046) grad_norm 1.2899 (1.5845) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 104 training takes 0:04:53 [2024-08-10 10:12:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:12:49 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.5781 (0.5781) Acc@1 85.986 (85.986) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 10:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.9307 (0.6967) Acc@1 77.734 (84.579) Acc@5 94.629 (97.217) Mem 16715MB [2024-08-10 10:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0371 (0.8310) Acc@1 75.146 (81.197) Acc@5 94.287 (95.761) Mem 16715MB [2024-08-10 10:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.776 Acc@5 95.701 [2024-08-10 10:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8% [2024-08-10 10:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.810 (0.810) Loss 0.4993 (0.4993) Acc@1 88.770 (88.770) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 10:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.192) Loss 0.8062 (0.6253) Acc@1 80.176 (85.977) Acc@5 95.850 (97.607) Mem 16715MB [2024-08-10 10:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.157) Loss 0.9146 (0.7386) Acc@1 77.197 (82.919) Acc@5 94.971 (96.438) Mem 16715MB [2024-08-10 10:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.648 Acc@5 96.485 [2024-08-10 10:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-10 10:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][0/625] eta 0:13:48 lr 0.000950 wd 0.0500 time 1.3260 (1.3260) data time 0.7312 (0.7312) model time 0.0000 (0.0000) loss 2.0811 (2.0811) grad_norm 1.1679 (1.1679) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][10/625] eta 0:05:33 lr 0.000950 wd 0.0500 time 0.4609 (0.5418) data 
time 0.0008 (0.0675) model time 0.0000 (0.0000) loss 3.1144 (3.1920) grad_norm 1.1782 (1.3213) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][20/625] eta 0:05:04 lr 0.000950 wd 0.0500 time 0.4593 (0.5039) data time 0.0010 (0.0359) model time 0.0000 (0.0000) loss 2.9632 (3.0821) grad_norm 1.1590 (1.3521) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][30/625] eta 0:04:52 lr 0.000949 wd 0.0500 time 0.4661 (0.4911) data time 0.0010 (0.0247) model time 0.0000 (0.0000) loss 3.4170 (3.1392) grad_norm 1.9077 (1.5713) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][40/625] eta 0:04:46 lr 0.000949 wd 0.0500 time 0.4607 (0.4897) data time 0.0010 (0.0189) model time 0.0000 (0.0000) loss 3.3685 (3.1359) grad_norm 1.6361 (1.5627) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][50/625] eta 0:04:38 lr 0.000949 wd 0.0500 time 0.4626 (0.4850) data time 0.0010 (0.0154) model time 0.0000 (0.0000) loss 3.1253 (3.1663) grad_norm 1.3919 (1.5484) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][60/625] eta 0:04:32 lr 0.000949 wd 0.0500 time 0.4639 (0.4818) data time 0.0010 (0.0131) model time 0.4629 (0.4640) loss 3.2923 (3.1397) grad_norm 0.9444 (1.5124) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][70/625] eta 0:04:27 lr 0.000949 wd 0.0500 time 0.4660 (0.4819) data time 0.0007 (0.0114) model time 0.4653 (0.4730) loss 3.4391 (3.1676) grad_norm 1.4553 (1.5034) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [105/300][80/625] eta 0:04:22 lr 0.000949 wd 0.0500 time 0.4598 (0.4814) data time 0.0007 (0.0101) model time 0.4591 (0.4742) loss 3.8404 (3.1545) grad_norm 1.9350 (1.5717) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][90/625] eta 0:04:16 lr 0.000949 wd 0.0500 time 0.4626 (0.4793) data time 0.0011 (0.0091) model time 0.4615 (0.4709) loss 2.1776 (3.1153) grad_norm 1.1706 (1.5698) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][100/625] eta 0:04:10 lr 0.000949 wd 0.0500 time 0.4650 (0.4775) data time 0.0008 (0.0083) model time 0.4643 (0.4688) loss 2.8961 (3.1278) grad_norm 1.1732 (1.5675) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][110/625] eta 0:04:05 lr 0.000949 wd 0.0500 time 0.4760 (0.4763) data time 0.0009 (0.0076) model time 0.4751 (0.4679) loss 2.7746 (3.1484) grad_norm 1.5379 (1.5597) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][120/625] eta 0:04:00 lr 0.000949 wd 0.0500 time 0.4599 (0.4754) data time 0.0008 (0.0071) model time 0.4591 (0.4673) loss 3.3058 (3.1501) grad_norm 1.7219 (1.5652) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][130/625] eta 0:03:55 lr 0.000949 wd 0.0500 time 0.4639 (0.4764) data time 0.0008 (0.0066) model time 0.4631 (0.4699) loss 3.7790 (3.1388) grad_norm 2.6395 (1.5819) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][140/625] eta 0:03:50 lr 0.000949 wd 0.0500 time 0.4678 (0.4756) data time 0.0011 (0.0062) model time 0.4667 (0.4692) loss 3.1986 (3.1124) grad_norm 1.1152 (1.5617) loss_scale 1024.0000 (1024.0000) 
mem 16715MB [2024-08-10 10:14:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][150/625] eta 0:03:45 lr 0.000948 wd 0.0500 time 0.4640 (0.4747) data time 0.0007 (0.0059) model time 0.4633 (0.4684) loss 3.3389 (3.1244) grad_norm 1.3171 (1.5516) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][160/625] eta 0:03:40 lr 0.000948 wd 0.0500 time 0.4693 (0.4741) data time 0.0008 (0.0056) model time 0.4685 (0.4679) loss 3.1275 (3.1284) grad_norm 1.5083 (1.5621) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][170/625] eta 0:03:35 lr 0.000948 wd 0.0500 time 0.4624 (0.4734) data time 0.0008 (0.0053) model time 0.4617 (0.4675) loss 3.3959 (3.1193) grad_norm 1.6051 (1.5635) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][180/625] eta 0:03:30 lr 0.000948 wd 0.0500 time 0.4688 (0.4728) data time 0.0008 (0.0051) model time 0.4680 (0.4670) loss 2.2937 (3.1113) grad_norm 1.3304 (1.5552) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][190/625] eta 0:03:25 lr 0.000948 wd 0.0500 time 0.4638 (0.4724) data time 0.0011 (0.0049) model time 0.4627 (0.4667) loss 3.7543 (3.1231) grad_norm 1.0051 (1.5507) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][200/625] eta 0:03:20 lr 0.000948 wd 0.0500 time 0.4657 (0.4720) data time 0.0010 (0.0047) model time 0.4647 (0.4665) loss 2.8495 (3.1127) grad_norm 1.9257 (1.5527) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][210/625] eta 0:03:15 lr 0.000948 wd 0.0500 time 0.4669 (0.4716) data time 0.0011 (0.0045) model time 0.4658 
(0.4663) loss 3.7757 (3.1216) grad_norm 2.2265 (1.5525) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][220/625] eta 0:03:10 lr 0.000948 wd 0.0500 time 0.4679 (0.4713) data time 0.0008 (0.0044) model time 0.4671 (0.4661) loss 3.9175 (3.1167) grad_norm 1.4592 (1.5510) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][230/625] eta 0:03:06 lr 0.000948 wd 0.0500 time 0.4607 (0.4715) data time 0.0008 (0.0042) model time 0.4599 (0.4667) loss 3.4552 (3.1173) grad_norm 1.2366 (1.5501) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][240/625] eta 0:03:01 lr 0.000948 wd 0.0500 time 0.4660 (0.4712) data time 0.0008 (0.0041) model time 0.4652 (0.4664) loss 3.2148 (3.1162) grad_norm 1.3673 (1.5482) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][250/625] eta 0:02:56 lr 0.000948 wd 0.0500 time 0.4652 (0.4709) data time 0.0010 (0.0040) model time 0.4642 (0.4662) loss 3.3963 (3.1196) grad_norm 1.7638 (1.5455) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][260/625] eta 0:02:51 lr 0.000947 wd 0.0500 time 0.4616 (0.4709) data time 0.0011 (0.0039) model time 0.4605 (0.4664) loss 2.9691 (3.1194) grad_norm 1.2905 (1.5447) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][270/625] eta 0:02:47 lr 0.000947 wd 0.0500 time 0.4664 (0.4707) data time 0.0010 (0.0038) model time 0.4653 (0.4663) loss 3.1530 (3.1153) grad_norm 1.1611 (1.5390) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][280/625] eta 0:02:42 
lr 0.000947 wd 0.0500 time 0.4610 (0.4705) data time 0.0011 (0.0037) model time 0.4599 (0.4662) loss 3.4401 (3.1236) grad_norm 1.5329 (1.5417) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][290/625] eta 0:02:37 lr 0.000947 wd 0.0500 time 0.4620 (0.4703) data time 0.0008 (0.0036) model time 0.4612 (0.4661) loss 3.5537 (3.1145) grad_norm 1.4090 (1.5406) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][300/625] eta 0:02:32 lr 0.000947 wd 0.0500 time 0.4616 (0.4701) data time 0.0010 (0.0035) model time 0.4606 (0.4660) loss 3.1044 (3.1167) grad_norm 1.0910 (1.5338) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][310/625] eta 0:02:28 lr 0.000947 wd 0.0500 time 0.4633 (0.4699) data time 0.0011 (0.0034) model time 0.4623 (0.4659) loss 3.1087 (3.1124) grad_norm 1.6776 (1.5374) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][320/625] eta 0:02:23 lr 0.000947 wd 0.0500 time 0.4646 (0.4697) data time 0.0008 (0.0034) model time 0.4637 (0.4657) loss 3.8451 (3.1069) grad_norm 1.4437 (1.5357) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][330/625] eta 0:02:18 lr 0.000947 wd 0.0500 time 0.4657 (0.4696) data time 0.0008 (0.0033) model time 0.4649 (0.4657) loss 3.2064 (3.0986) grad_norm 1.3028 (1.5326) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][340/625] eta 0:02:13 lr 0.000947 wd 0.0500 time 0.4682 (0.4695) data time 0.0008 (0.0032) model time 0.4674 (0.4657) loss 3.3859 (3.1053) grad_norm 1.5480 (1.5279) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][350/625] eta 0:02:09 lr 0.000947 wd 0.0500 time 0.4662 (0.4699) data time 0.0007 (0.0032) model time 0.4655 (0.4663) loss 2.3622 (3.1066) grad_norm 1.7300 (1.5250) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][360/625] eta 0:02:04 lr 0.000947 wd 0.0500 time 0.4641 (0.4698) data time 0.0011 (0.0031) model time 0.4630 (0.4662) loss 3.3099 (3.1058) grad_norm 1.5052 (1.5289) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][370/625] eta 0:01:59 lr 0.000947 wd 0.0500 time 0.4606 (0.4696) data time 0.0009 (0.0031) model time 0.4597 (0.4661) loss 3.2116 (3.1094) grad_norm 2.1562 (1.5295) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][380/625] eta 0:01:55 lr 0.000946 wd 0.0500 time 0.4608 (0.4694) data time 0.0008 (0.0030) model time 0.4600 (0.4659) loss 2.9859 (3.1073) grad_norm 1.3392 (1.5277) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:15:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][390/625] eta 0:01:50 lr 0.000946 wd 0.0500 time 0.4602 (0.4692) data time 0.0008 (0.0030) model time 0.4594 (0.4658) loss 3.4869 (3.1070) grad_norm 1.5607 (1.5215) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][400/625] eta 0:01:45 lr 0.000946 wd 0.0500 time 0.4672 (0.4691) data time 0.0008 (0.0029) model time 0.4664 (0.4657) loss 3.5594 (3.1022) grad_norm 1.1983 (1.5218) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][410/625] eta 0:01:40 lr 0.000946 wd 0.0500 time 0.4608 (0.4694) data time 0.0010 (0.0029) model time 0.4599 (0.4660) loss 3.0030 (3.1052) 
grad_norm 1.4812 (1.5200) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][420/625] eta 0:01:36 lr 0.000946 wd 0.0500 time 0.4627 (0.4693) data time 0.0012 (0.0028) model time 0.4615 (0.4660) loss 3.3801 (3.1052) grad_norm 1.7131 (1.5314) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][430/625] eta 0:01:31 lr 0.000946 wd 0.0500 time 0.4627 (0.4692) data time 0.0010 (0.0028) model time 0.4617 (0.4659) loss 3.1299 (3.1028) grad_norm 1.8873 (1.5314) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][440/625] eta 0:01:26 lr 0.000946 wd 0.0500 time 0.4639 (0.4690) data time 0.0010 (0.0027) model time 0.4629 (0.4659) loss 3.3463 (3.1044) grad_norm 1.3374 (1.5317) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][450/625] eta 0:01:22 lr 0.000946 wd 0.0500 time 0.4632 (0.4693) data time 0.0007 (0.0027) model time 0.4625 (0.4662) loss 3.2122 (3.0973) grad_norm 1.5806 (1.5286) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][460/625] eta 0:01:17 lr 0.000946 wd 0.0500 time 0.4616 (0.4692) data time 0.0010 (0.0027) model time 0.4606 (0.4661) loss 3.2525 (3.1020) grad_norm 2.2412 (1.5308) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][470/625] eta 0:01:12 lr 0.000946 wd 0.0500 time 0.4606 (0.4690) data time 0.0012 (0.0026) model time 0.4594 (0.4660) loss 2.4196 (3.1044) grad_norm 2.2025 (1.5296) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][480/625] eta 0:01:08 lr 0.000946 wd 0.0500 time 
0.4633 (0.4693) data time 0.0011 (0.0026) model time 0.4622 (0.4664) loss 3.2717 (3.1033) grad_norm 3.0086 (1.5378) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][490/625] eta 0:01:03 lr 0.000945 wd 0.0500 time 0.4672 (0.4693) data time 0.0008 (0.0026) model time 0.4664 (0.4663) loss 3.2371 (3.1076) grad_norm 1.3567 (1.5485) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][500/625] eta 0:00:58 lr 0.000945 wd 0.0500 time 0.4627 (0.4697) data time 0.0008 (0.0025) model time 0.4619 (0.4668) loss 2.7710 (3.1025) grad_norm 1.4196 (1.5456) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:16:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][510/625] eta 0:00:54 lr 0.000945 wd 0.0500 time 0.4665 (0.4696) data time 0.0010 (0.0025) model time 0.4656 (0.4668) loss 3.4989 (3.1003) grad_norm 1.6152 (1.5422) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][520/625] eta 0:00:49 lr 0.000945 wd 0.0500 time 0.4663 (0.4695) data time 0.0009 (0.0025) model time 0.4653 (0.4667) loss 3.2853 (3.1026) grad_norm 1.9279 (1.5442) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][530/625] eta 0:00:44 lr 0.000945 wd 0.0500 time 0.4639 (0.4694) data time 0.0010 (0.0025) model time 0.4629 (0.4666) loss 3.6207 (3.1033) grad_norm 1.7657 (1.5459) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][540/625] eta 0:00:39 lr 0.000945 wd 0.0500 time 0.4668 (0.4693) data time 0.0010 (0.0024) model time 0.4658 (0.4665) loss 3.2388 (3.1064) grad_norm 1.7227 (1.5468) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:15 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [105/300][550/625] eta 0:00:35 lr 0.000945 wd 0.0500 time 0.4628 (0.4692) data time 0.0010 (0.0024) model time 0.4618 (0.4665) loss 2.8709 (3.1061) grad_norm 1.1177 (1.5468) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][560/625] eta 0:00:30 lr 0.000945 wd 0.0500 time 0.4649 (0.4691) data time 0.0010 (0.0024) model time 0.4639 (0.4664) loss 3.6671 (3.1061) grad_norm 1.5903 (1.5434) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][570/625] eta 0:00:25 lr 0.000945 wd 0.0500 time 0.4639 (0.4691) data time 0.0008 (0.0024) model time 0.4631 (0.4664) loss 3.3754 (3.1066) grad_norm 1.5656 (1.5424) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][580/625] eta 0:00:21 lr 0.000945 wd 0.0500 time 0.4601 (0.4690) data time 0.0011 (0.0024) model time 0.4589 (0.4663) loss 3.0397 (3.1028) grad_norm 1.3007 (1.5445) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][590/625] eta 0:00:16 lr 0.000945 wd 0.0500 time 0.4618 (0.4692) data time 0.0010 (0.0023) model time 0.4608 (0.4666) loss 3.4218 (3.1062) grad_norm 1.3043 (1.5438) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][600/625] eta 0:00:11 lr 0.000944 wd 0.0500 time 0.4623 (0.4693) data time 0.0011 (0.0023) model time 0.4612 (0.4667) loss 3.4692 (3.1047) grad_norm 2.0513 (1.5427) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][610/625] eta 0:00:07 lr 0.000944 wd 0.0500 time 0.4605 (0.4692) data time 0.0005 (0.0023) model time 0.4600 (0.4667) loss 2.6122 (3.1047) grad_norm 1.5322 
(1.5456) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [105/300][620/625] eta 0:00:02 lr 0.000944 wd 0.0500 time 0.4601 (0.4691) data time 0.0005 (0.0023) model time 0.4596 (0.4665) loss 3.7366 (3.1074) grad_norm 1.8895 (1.5516) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:17:49 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 105 training takes 0:04:53 [2024-08-10 10:17:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:17:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 10:17:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.562 (0.562) Loss 0.5786 (0.5786) Acc@1 87.549 (87.549) Acc@5 98.193 (98.193) Mem 16715MB [2024-08-10 10:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.166) Loss 0.9014 (0.7090) Acc@1 79.248 (84.641) Acc@5 95.117 (97.093) Mem 16715MB [2024-08-10 10:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.143) Loss 1.0303 (0.8473) Acc@1 75.439 (80.978) Acc@5 93.457 (95.585) Mem 16715MB [2024-08-10 10:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.612 Acc@5 95.597 [2024-08-10 10:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.6% [2024-08-10 10:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.815 (0.815) Loss 0.4983 (0.4983) Acc@1 88.721 (88.721) Acc@5 98.584 (98.584) Mem 16715MB [2024-08-10 10:17:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.192) Loss 0.8018 (0.6247) Acc@1 80.127 (86.000) Acc@5 95.850 (97.656) Mem 16715MB [2024-08-10 10:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 
0.9131 (0.7375) Acc@1 77.197 (82.985) Acc@5 95.166 (96.484) Mem 16715MB [2024-08-10 10:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.690 Acc@5 96.515 [2024-08-10 10:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 10:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][0/625] eta 0:20:21 lr 0.000944 wd 0.0500 time 1.9547 (1.9547) data time 0.6447 (0.6447) model time 0.0000 (0.0000) loss 3.6051 (3.6051) grad_norm 1.4223 (1.4223) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][10/625] eta 0:06:08 lr 0.000944 wd 0.0500 time 0.4613 (0.5991) data time 0.0011 (0.0596) model time 0.0000 (0.0000) loss 3.3347 (2.7901) grad_norm 1.9632 (1.7528) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][20/625] eta 0:05:23 lr 0.000944 wd 0.0500 time 0.4626 (0.5354) data time 0.0008 (0.0317) model time 0.0000 (0.0000) loss 2.6249 (2.8620) grad_norm 1.5558 (1.9150) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][30/625] eta 0:05:04 lr 0.000944 wd 0.0500 time 0.4662 (0.5121) data time 0.0008 (0.0218) model time 0.0000 (0.0000) loss 2.6118 (2.9809) grad_norm 1.4223 (1.8589) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][40/625] eta 0:04:52 lr 0.000944 wd 0.0500 time 0.4641 (0.5003) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 3.4857 (3.0363) grad_norm 1.3173 (1.7521) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][50/625] eta 0:04:43 lr 0.000944 wd 0.0500 time 0.4616 (0.4933) data time 0.0011 (0.0137) model time 0.0000 (0.0000) loss 
3.3125 (3.0172) grad_norm 2.0675 (1.7012) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][60/625] eta 0:04:36 lr 0.000944 wd 0.0500 time 0.4854 (0.4888) data time 0.0010 (0.0116) model time 0.4843 (0.4651) loss 3.2823 (3.0621) grad_norm 0.9866 (1.6541) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][70/625] eta 0:04:30 lr 0.000944 wd 0.0500 time 0.4676 (0.4866) data time 0.0010 (0.0103) model time 0.4666 (0.4678) loss 3.5412 (3.0638) grad_norm 1.1966 (1.6542) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][80/625] eta 0:04:24 lr 0.000944 wd 0.0500 time 0.4620 (0.4846) data time 0.0010 (0.0092) model time 0.4610 (0.4683) loss 2.9004 (3.0671) grad_norm 1.1893 (1.6365) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][90/625] eta 0:04:19 lr 0.000943 wd 0.0500 time 0.4616 (0.4855) data time 0.0008 (0.0083) model time 0.4607 (0.4741) loss 2.4777 (3.0541) grad_norm 1.0755 (1.6128) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][100/625] eta 0:04:14 lr 0.000943 wd 0.0500 time 0.4651 (0.4846) data time 0.0011 (0.0078) model time 0.4640 (0.4740) loss 3.4760 (3.0493) grad_norm 2.3525 (1.7450) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][110/625] eta 0:04:08 lr 0.000943 wd 0.0500 time 0.4743 (0.4831) data time 0.0010 (0.0073) model time 0.4733 (0.4727) loss 3.5008 (3.0349) grad_norm 1.0484 (1.7158) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][120/625] eta 0:04:03 lr 0.000943 wd 
0.0500 time 0.4723 (0.4817) data time 0.0008 (0.0067) model time 0.4715 (0.4715) loss 3.5445 (3.0460) grad_norm 1.6556 (1.6930) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][130/625] eta 0:03:58 lr 0.000943 wd 0.0500 time 0.4690 (0.4819) data time 0.0008 (0.0063) model time 0.4682 (0.4730) loss 3.6560 (3.0544) grad_norm 1.5277 (1.7000) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][140/625] eta 0:03:53 lr 0.000943 wd 0.0500 time 0.4637 (0.4807) data time 0.0008 (0.0060) model time 0.4628 (0.4720) loss 2.1775 (3.0552) grad_norm 1.5269 (1.6780) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][150/625] eta 0:03:47 lr 0.000943 wd 0.0500 time 0.4671 (0.4797) data time 0.0008 (0.0056) model time 0.4663 (0.4713) loss 1.9082 (3.0426) grad_norm 1.3114 (1.6740) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][160/625] eta 0:03:42 lr 0.000943 wd 0.0500 time 0.4691 (0.4794) data time 0.0008 (0.0054) model time 0.4682 (0.4714) loss 3.1238 (3.0468) grad_norm 1.3021 (1.6519) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][170/625] eta 0:03:38 lr 0.000943 wd 0.0500 time 0.4747 (0.4798) data time 0.0009 (0.0051) model time 0.4738 (0.4726) loss 2.5170 (3.0428) grad_norm 1.5575 (1.6390) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][180/625] eta 0:03:33 lr 0.000943 wd 0.0500 time 0.4602 (0.4791) data time 0.0010 (0.0049) model time 0.4592 (0.4720) loss 3.2289 (3.0586) grad_norm 1.2868 (1.6215) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:30 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][190/625] eta 0:03:28 lr 0.000943 wd 0.0500 time 0.4594 (0.4789) data time 0.0008 (0.0047) model time 0.4587 (0.4721) loss 2.4430 (3.0548) grad_norm 1.4159 (1.6113) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][200/625] eta 0:03:23 lr 0.000943 wd 0.0500 time 0.4571 (0.4790) data time 0.0008 (0.0049) model time 0.4563 (0.4723) loss 2.2373 (3.0613) grad_norm 1.1307 (1.6066) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][210/625] eta 0:03:18 lr 0.000942 wd 0.0500 time 0.4680 (0.4784) data time 0.0010 (0.0047) model time 0.4670 (0.4718) loss 3.8112 (3.0615) grad_norm 1.4565 (1.5991) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][220/625] eta 0:03:13 lr 0.000942 wd 0.0500 time 0.4609 (0.4782) data time 0.0010 (0.0048) model time 0.4599 (0.4715) loss 3.4700 (3.0660) grad_norm 1.6846 (1.6073) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][230/625] eta 0:03:08 lr 0.000942 wd 0.0500 time 0.4626 (0.4777) data time 0.0010 (0.0047) model time 0.4616 (0.4711) loss 2.5742 (3.0649) grad_norm 1.4186 (1.5991) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][240/625] eta 0:03:03 lr 0.000942 wd 0.0500 time 0.4561 (0.4772) data time 0.0011 (0.0045) model time 0.4550 (0.4708) loss 2.2267 (3.0706) grad_norm 1.2510 (1.6006) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:19:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][250/625] eta 0:02:58 lr 0.000942 wd 0.0500 time 0.5007 (0.4768) data time 0.0008 (0.0044) model time 0.4999 (0.4706) loss 2.3193 (3.0695) 
grad_norm 1.2428 (1.5903) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][260/625] eta 0:02:53 lr 0.000942 wd 0.0500 time 0.4584 (0.4764) data time 0.0009 (0.0043) model time 0.4576 (0.4702) loss 2.0779 (3.0626) grad_norm 1.3957 (1.5826) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][270/625] eta 0:02:48 lr 0.000942 wd 0.0500 time 0.4646 (0.4760) data time 0.0010 (0.0042) model time 0.4636 (0.4700) loss 2.8181 (3.0644) grad_norm 1.7513 (1.5789) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][280/625] eta 0:02:44 lr 0.000942 wd 0.0500 time 0.4618 (0.4756) data time 0.0009 (0.0041) model time 0.4609 (0.4697) loss 3.5030 (3.0585) grad_norm 2.9008 (1.5840) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][290/625] eta 0:02:39 lr 0.000942 wd 0.0500 time 0.4689 (0.4753) data time 0.0010 (0.0040) model time 0.4679 (0.4696) loss 3.0886 (3.0595) grad_norm 1.4804 (1.5883) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][300/625] eta 0:02:34 lr 0.000942 wd 0.0500 time 0.4687 (0.4750) data time 0.0008 (0.0039) model time 0.4678 (0.4694) loss 2.2928 (3.0584) grad_norm 2.1954 (1.5811) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][310/625] eta 0:02:29 lr 0.000942 wd 0.0500 time 0.4656 (0.4752) data time 0.0010 (0.0038) model time 0.4645 (0.4699) loss 2.1480 (3.0533) grad_norm 1.2164 (1.5772) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][320/625] eta 0:02:24 lr 0.000941 wd 0.0500 time 
0.4594 (0.4749) data time 0.0011 (0.0037) model time 0.4582 (0.4696) loss 2.6416 (3.0481) grad_norm 1.2439 (1.5736) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][330/625] eta 0:02:20 lr 0.000941 wd 0.0500 time 0.4678 (0.4746) data time 0.0008 (0.0036) model time 0.4671 (0.4694) loss 3.5038 (3.0496) grad_norm 1.4217 (1.5685) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][340/625] eta 0:02:15 lr 0.000941 wd 0.0500 time 0.4612 (0.4753) data time 0.0011 (0.0036) model time 0.4601 (0.4704) loss 3.1081 (3.0580) grad_norm 1.1644 (1.5597) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][350/625] eta 0:02:10 lr 0.000941 wd 0.0500 time 0.4680 (0.4750) data time 0.0008 (0.0035) model time 0.4673 (0.4701) loss 2.1413 (3.0579) grad_norm 1.8912 (1.5784) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][360/625] eta 0:02:05 lr 0.000941 wd 0.0500 time 0.4648 (0.4747) data time 0.0008 (0.0034) model time 0.4639 (0.4699) loss 3.2144 (3.0612) grad_norm 1.1596 (1.5801) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][370/625] eta 0:02:00 lr 0.000941 wd 0.0500 time 0.4667 (0.4745) data time 0.0011 (0.0034) model time 0.4656 (0.4698) loss 3.3467 (3.0602) grad_norm 1.5697 (1.5780) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:20:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][380/625] eta 0:01:56 lr 0.000941 wd 0.0500 time 0.4646 (0.4742) data time 0.0009 (0.0033) model time 0.4638 (0.4696) loss 3.6958 (3.0613) grad_norm 1.3794 (1.5791) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [106/300][390/625] eta 0:01:51 lr 0.000941 wd 0.0500 time 0.4649 (0.4739) data time 0.0011 (0.0032) model time 0.4638 (0.4693) loss 2.7608 (3.0589) grad_norm 1.6345 (1.5795) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][400/625] eta 0:01:46 lr 0.000941 wd 0.0500 time 0.4613 (0.4735) data time 0.0011 (0.0032) model time 0.4602 (0.4690) loss 3.4079 (3.0657) grad_norm 1.4373 (1.5843) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][410/625] eta 0:01:41 lr 0.000941 wd 0.0500 time 0.4615 (0.4736) data time 0.0011 (0.0031) model time 0.4605 (0.4691) loss 2.2725 (3.0675) grad_norm 2.3682 (1.5895) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][420/625] eta 0:01:37 lr 0.000941 wd 0.0500 time 0.4603 (0.4733) data time 0.0008 (0.0031) model time 0.4595 (0.4689) loss 3.5344 (3.0764) grad_norm 2.2157 (1.5883) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][430/625] eta 0:01:32 lr 0.000940 wd 0.0500 time 0.4644 (0.4736) data time 0.0011 (0.0031) model time 0.4633 (0.4694) loss 3.0088 (3.0799) grad_norm 1.3364 (1.5859) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][440/625] eta 0:01:27 lr 0.000940 wd 0.0500 time 0.4608 (0.4734) data time 0.0011 (0.0030) model time 0.4597 (0.4692) loss 3.2530 (3.0811) grad_norm 1.4301 (1.5838) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][450/625] eta 0:01:22 lr 0.000940 wd 0.0500 time 0.4682 (0.4732) data time 0.0011 (0.0030) model time 0.4671 (0.4690) loss 3.1986 (3.0813) grad_norm 2.1762 
(1.5838) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][460/625] eta 0:01:18 lr 0.000940 wd 0.0500 time 0.4599 (0.4730) data time 0.0009 (0.0029) model time 0.4590 (0.4689) loss 2.8087 (3.0802) grad_norm 1.6991 (1.5826) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][470/625] eta 0:01:13 lr 0.000940 wd 0.0500 time 0.4613 (0.4732) data time 0.0010 (0.0029) model time 0.4603 (0.4692) loss 3.3426 (3.0824) grad_norm 1.3046 (1.5785) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][480/625] eta 0:01:08 lr 0.000940 wd 0.0500 time 0.4647 (0.4731) data time 0.0007 (0.0029) model time 0.4640 (0.4691) loss 2.7069 (3.0825) grad_norm 1.1387 (1.5757) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][490/625] eta 0:01:03 lr 0.000940 wd 0.0500 time 0.4684 (0.4729) data time 0.0008 (0.0028) model time 0.4676 (0.4690) loss 3.5141 (3.0830) grad_norm 1.1798 (1.5755) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:21:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][500/625] eta 0:00:59 lr 0.000940 wd 0.0500 time 0.4610 (0.4727) data time 0.0008 (0.0028) model time 0.4602 (0.4688) loss 2.9020 (3.0846) grad_norm 1.2480 (1.5717) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][510/625] eta 0:00:54 lr 0.000940 wd 0.0500 time 0.4610 (0.4725) data time 0.0008 (0.0028) model time 0.4602 (0.4687) loss 3.7553 (3.0891) grad_norm 3.2647 (1.5769) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][520/625] eta 0:00:49 lr 0.000940 wd 0.0500 time 0.4638 (0.4723) data 
time 0.0008 (0.0027) model time 0.4630 (0.4685) loss 3.9603 (3.0923) grad_norm 3.1694 (1.5830) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][530/625] eta 0:00:44 lr 0.000940 wd 0.0500 time 0.4661 (0.4726) data time 0.0010 (0.0027) model time 0.4651 (0.4689) loss 2.1867 (3.0908) grad_norm 1.7682 (1.5850) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][540/625] eta 0:00:40 lr 0.000940 wd 0.0500 time 0.4629 (0.4726) data time 0.0009 (0.0027) model time 0.4620 (0.4689) loss 3.2627 (3.0938) grad_norm 1.0780 (1.5853) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][550/625] eta 0:00:35 lr 0.000939 wd 0.0500 time 0.4627 (0.4724) data time 0.0013 (0.0026) model time 0.4614 (0.4687) loss 3.2922 (3.0983) grad_norm 1.6465 (1.5835) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][560/625] eta 0:00:30 lr 0.000939 wd 0.0500 time 0.4593 (0.4725) data time 0.0011 (0.0026) model time 0.4582 (0.4689) loss 3.4972 (3.1022) grad_norm 1.8857 (1.5840) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][570/625] eta 0:00:25 lr 0.000939 wd 0.0500 time 0.4636 (0.4725) data time 0.0013 (0.0026) model time 0.4624 (0.4689) loss 3.5348 (3.1033) grad_norm 1.1227 (1.5788) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][580/625] eta 0:00:21 lr 0.000939 wd 0.0500 time 0.4670 (0.4724) data time 0.0009 (0.0026) model time 0.4661 (0.4688) loss 3.7003 (3.1068) grad_norm 1.9682 (1.5747) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:22:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [106/300][590/625] eta 0:00:16 lr 0.000939 wd 0.0500 time 0.4630 (0.4723) data time 0.0010 (0.0025) model time 0.4620 (0.4688) loss 3.6422 (3.1106) grad_norm 1.8518 (1.5774) loss_scale 2048.0000 (1036.1286) mem 16715MB [2024-08-10 10:22:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][600/625] eta 0:00:11 lr 0.000939 wd 0.0500 time 0.4615 (0.4722) data time 0.0009 (0.0025) model time 0.4606 (0.4687) loss 2.9170 (3.1129) grad_norm 1.4653 (1.5777) loss_scale 2048.0000 (1052.9651) mem 16715MB [2024-08-10 10:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][610/625] eta 0:00:07 lr 0.000939 wd 0.0500 time 0.4595 (0.4720) data time 0.0005 (0.0025) model time 0.4590 (0.4686) loss 2.7020 (3.1159) grad_norm 1.6413 (1.5878) loss_scale 2048.0000 (1069.2504) mem 16715MB [2024-08-10 10:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [106/300][620/625] eta 0:00:02 lr 0.000939 wd 0.0500 time 0.4621 (0.4722) data time 0.0005 (0.0025) model time 0.4616 (0.4688) loss 3.8925 (3.1151) grad_norm 1.8461 (1.5887) loss_scale 2048.0000 (1085.0113) mem 16715MB [2024-08-10 10:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 106 training takes 0:04:55 [2024-08-10 10:22:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:22:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:22:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 0.5454 (0.5454) Acc@1 87.549 (87.549) Acc@5 98.047 (98.047) Mem 16715MB
[2024-08-10 10:22:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.164) Loss 0.8892 (0.6763) Acc@1 78.906 (84.357) Acc@5 94.531 (97.079) Mem 16715MB
[2024-08-10 10:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.143) Loss 0.9907 (0.8104) Acc@1 75.879 (81.015) Acc@5 94.238 (95.624) Mem 16715MB
[2024-08-10 10:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.808 Acc@5 95.653
[2024-08-10 10:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8%
[2024-08-10 10:23:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.886 (0.886) Loss 0.4971 (0.4971) Acc@1 88.672 (88.672) Acc@5 98.535 (98.535) Mem 16715MB
[2024-08-10 10:23:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.197) Loss 0.7998 (0.6239) Acc@1 80.371 (86.009) Acc@5 95.850 (97.661) Mem 16715MB
[2024-08-10 10:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.9106 (0.7362) Acc@1 77.490 (83.022) Acc@5 95.117 (96.494) Mem 16715MB
[2024-08-10 10:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.704 Acc@5 96.519
[2024-08-10 10:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7%
[2024-08-10 10:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.70%
[2024-08-10 10:23:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 10:23:04 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 10:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][0/625] eta 0:09:05 lr 0.000939 wd 0.0500 time 0.8722 (0.8722) data time 0.4560 (0.4560) model time 0.0000 (0.0000) loss 3.7776 (3.7776) grad_norm 1.3683 (1.3683) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][10/625] eta 0:05:11 lr 0.000939 wd 0.0500 time 0.4721 (0.5070) data time 0.0008 (0.0433) model time 0.0000 (0.0000) loss 3.3016 (3.1906) grad_norm 1.9214 (1.4026) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][20/625] eta 0:04:55 lr 0.000939 wd 0.0500 time 0.4659 (0.4891) data time 0.0010 (0.0231) model time 0.0000 (0.0000) loss 3.1865 (3.1431) grad_norm 1.4542 (1.6302) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][30/625] eta 0:04:46 lr 0.000939 wd 0.0500 time 0.4684 (0.4823) data time 0.0010 (0.0161) model time 0.0000 (0.0000) loss 2.9140 (3.0834) grad_norm 1.1656 (1.6780) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][40/625] eta 0:04:40 lr 0.000938 wd 0.0500 time 0.4659 (0.4797) data time 0.0007 (0.0124) model time 0.0000 (0.0000) loss 2.5933 (3.0578) grad_norm 3.8737 (1.6787) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][50/625] eta 0:04:35 lr 0.000938 wd 0.0500 time 0.4626 (0.4794) data time 0.0010 (0.0102) model time 0.0000 (0.0000) loss 3.3657 (3.0594) grad_norm 1.3428 (1.7083) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][60/625] eta 0:04:31 lr 0.000938 wd 0.0500 time 0.4588 (0.4798) data time 0.0010 (0.0087) model time 0.4578 (0.4809) loss 2.6830 
(3.0507) grad_norm 1.6784 (1.6987) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][70/625] eta 0:04:25 lr 0.000938 wd 0.0500 time 0.4618 (0.4775) data time 0.0009 (0.0076) model time 0.4609 (0.4717) loss 2.5453 (3.0497) grad_norm 1.3559 (1.6709) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][80/625] eta 0:04:19 lr 0.000938 wd 0.0500 time 0.4708 (0.4764) data time 0.0007 (0.0068) model time 0.4701 (0.4703) loss 3.9141 (3.0844) grad_norm 1.0736 (1.6485) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][90/625] eta 0:04:14 lr 0.000938 wd 0.0500 time 0.4648 (0.4752) data time 0.0007 (0.0062) model time 0.4641 (0.4688) loss 1.9382 (3.0968) grad_norm 1.2188 (1.6338) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][100/625] eta 0:04:09 lr 0.000938 wd 0.0500 time 0.4696 (0.4746) data time 0.0011 (0.0057) model time 0.4685 (0.4686) loss 3.4045 (3.1089) grad_norm 1.3084 (1.6272) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:23:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][110/625] eta 0:04:05 lr 0.000938 wd 0.0500 time 0.4657 (0.4759) data time 0.0008 (0.0053) model time 0.4649 (0.4718) loss 3.8481 (3.1228) grad_norm 2.5805 (1.6549) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][120/625] eta 0:04:00 lr 0.000938 wd 0.0500 time 0.4576 (0.4762) data time 0.0009 (0.0049) model time 0.4567 (0.4727) loss 3.2472 (3.1183) grad_norm 1.6295 (1.6439) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][130/625] eta 0:03:55 lr 0.000938 wd 0.0500 
time 0.4598 (0.4749) data time 0.0008 (0.0046) model time 0.4591 (0.4710) loss 3.4192 (3.1063) grad_norm 1.2041 (1.6311) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][140/625] eta 0:03:50 lr 0.000938 wd 0.0500 time 0.4564 (0.4758) data time 0.0010 (0.0044) model time 0.4554 (0.4727) loss 2.5565 (3.0830) grad_norm 1.2493 (1.6307) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][150/625] eta 0:03:45 lr 0.000937 wd 0.0500 time 0.4702 (0.4749) data time 0.0007 (0.0041) model time 0.4695 (0.4715) loss 4.0656 (3.0810) grad_norm 1.5784 (1.6621) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][160/625] eta 0:03:40 lr 0.000937 wd 0.0500 time 0.4626 (0.4741) data time 0.0009 (0.0040) model time 0.4617 (0.4706) loss 3.1116 (3.0719) grad_norm 1.5503 (1.6641) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][170/625] eta 0:03:35 lr 0.000937 wd 0.0500 time 0.4646 (0.4736) data time 0.0008 (0.0038) model time 0.4638 (0.4701) loss 3.3116 (3.0842) grad_norm 1.5657 (1.6518) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][180/625] eta 0:03:30 lr 0.000937 wd 0.0500 time 0.4677 (0.4732) data time 0.0007 (0.0036) model time 0.4670 (0.4697) loss 3.8252 (3.0944) grad_norm 1.6354 (1.6762) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][190/625] eta 0:03:25 lr 0.000937 wd 0.0500 time 0.4659 (0.4728) data time 0.0007 (0.0035) model time 0.4651 (0.4693) loss 3.6585 (3.0892) grad_norm 1.3247 (1.6579) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [107/300][200/625] eta 0:03:20 lr 0.000937 wd 0.0500 time 0.4613 (0.4722) data time 0.0009 (0.0034) model time 0.4604 (0.4687) loss 3.0087 (3.0970) grad_norm 1.2011 (1.6593) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][210/625] eta 0:03:15 lr 0.000937 wd 0.0500 time 0.4577 (0.4718) data time 0.0007 (0.0032) model time 0.4570 (0.4684) loss 3.1868 (3.0929) grad_norm 1.2820 (1.6708) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][220/625] eta 0:03:10 lr 0.000937 wd 0.0500 time 0.4629 (0.4714) data time 0.0009 (0.0032) model time 0.4620 (0.4680) loss 3.1799 (3.1042) grad_norm 1.4285 (1.6583) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][230/625] eta 0:03:06 lr 0.000937 wd 0.0500 time 0.4680 (0.4712) data time 0.0007 (0.0031) model time 0.4672 (0.4679) loss 3.7593 (3.1037) grad_norm 1.5556 (1.6531) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:24:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][240/625] eta 0:03:01 lr 0.000937 wd 0.0500 time 0.4626 (0.4710) data time 0.0007 (0.0030) model time 0.4619 (0.4677) loss 2.0954 (3.0857) grad_norm 1.3052 (1.6455) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][250/625] eta 0:02:56 lr 0.000937 wd 0.0500 time 0.4649 (0.4713) data time 0.0010 (0.0029) model time 0.4639 (0.4682) loss 2.9069 (3.0875) grad_norm 1.2964 (1.6354) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][260/625] eta 0:02:51 lr 0.000936 wd 0.0500 time 0.4663 (0.4712) data time 0.0008 (0.0028) model time 0.4655 (0.4681) loss 3.1489 (3.0903) grad_norm 1.5059 
(1.6350) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][270/625] eta 0:02:47 lr 0.000936 wd 0.0500 time 0.4645 (0.4709) data time 0.0011 (0.0028) model time 0.4634 (0.4679) loss 3.0025 (3.0902) grad_norm 1.2921 (1.6333) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][280/625] eta 0:02:42 lr 0.000936 wd 0.0500 time 0.4578 (0.4706) data time 0.0011 (0.0027) model time 0.4568 (0.4676) loss 3.5277 (3.0909) grad_norm 2.0513 (1.6231) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][290/625] eta 0:02:37 lr 0.000936 wd 0.0500 time 0.4640 (0.4709) data time 0.0009 (0.0027) model time 0.4632 (0.4680) loss 3.7582 (3.0934) grad_norm 1.1677 (1.6232) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][300/625] eta 0:02:32 lr 0.000936 wd 0.0500 time 0.4664 (0.4707) data time 0.0011 (0.0026) model time 0.4653 (0.4679) loss 3.2498 (3.0901) grad_norm 1.7541 (1.6181) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][310/625] eta 0:02:28 lr 0.000936 wd 0.0500 time 0.4638 (0.4706) data time 0.0010 (0.0026) model time 0.4627 (0.4678) loss 2.9318 (3.0882) grad_norm 1.2414 (1.6052) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][320/625] eta 0:02:23 lr 0.000936 wd 0.0500 time 0.4653 (0.4704) data time 0.0009 (0.0025) model time 0.4644 (0.4677) loss 3.1773 (3.0839) grad_norm 1.7976 (1.6024) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][330/625] eta 0:02:18 lr 0.000936 wd 0.0500 time 0.4632 (0.4708) data 
time 0.0010 (0.0025) model time 0.4622 (0.4682) loss 2.7585 (3.0770) grad_norm 2.1223 (1.6030) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][340/625] eta 0:02:14 lr 0.000936 wd 0.0500 time 0.4576 (0.4706) data time 0.0011 (0.0024) model time 0.4565 (0.4680) loss 3.0619 (3.0805) grad_norm 1.1736 (1.5986) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][350/625] eta 0:02:09 lr 0.000936 wd 0.0500 time 0.4631 (0.4704) data time 0.0012 (0.0024) model time 0.4619 (0.4678) loss 3.2243 (3.0805) grad_norm 1.3949 (1.5938) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][360/625] eta 0:02:04 lr 0.000936 wd 0.0500 time 0.4687 (0.4702) data time 0.0009 (0.0024) model time 0.4678 (0.4676) loss 2.8167 (3.0812) grad_norm 1.4303 (1.5914) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:25:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][370/625] eta 0:01:59 lr 0.000935 wd 0.0500 time 0.4592 (0.4700) data time 0.0011 (0.0023) model time 0.4581 (0.4675) loss 3.3297 (3.0828) grad_norm 1.1514 (1.6029) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][380/625] eta 0:01:55 lr 0.000935 wd 0.0500 time 0.4605 (0.4699) data time 0.0008 (0.0023) model time 0.4597 (0.4674) loss 3.8872 (3.0855) grad_norm 1.3484 (1.5993) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][390/625] eta 0:01:50 lr 0.000935 wd 0.0500 time 0.4673 (0.4698) data time 0.0010 (0.0023) model time 0.4663 (0.4673) loss 3.2917 (3.0874) grad_norm 1.3360 (1.5967) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [107/300][400/625] eta 0:01:45 lr 0.000935 wd 0.0500 time 0.4619 (0.4696) data time 0.0009 (0.0022) model time 0.4610 (0.4672) loss 3.1804 (3.0890) grad_norm 1.3965 (1.5960) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][410/625] eta 0:01:40 lr 0.000935 wd 0.0500 time 0.4601 (0.4695) data time 0.0009 (0.0022) model time 0.4592 (0.4670) loss 2.4565 (3.0841) grad_norm 1.0470 (1.5903) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][420/625] eta 0:01:36 lr 0.000935 wd 0.0500 time 0.4633 (0.4693) data time 0.0007 (0.0022) model time 0.4626 (0.4669) loss 3.5556 (3.0842) grad_norm 1.2454 (1.5863) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][430/625] eta 0:01:31 lr 0.000935 wd 0.0500 time 0.4618 (0.4692) data time 0.0011 (0.0021) model time 0.4608 (0.4668) loss 2.7375 (3.0811) grad_norm 2.2358 (1.5841) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][440/625] eta 0:01:26 lr 0.000935 wd 0.0500 time 0.4634 (0.4690) data time 0.0010 (0.0021) model time 0.4624 (0.4666) loss 3.2842 (3.0823) grad_norm 2.3107 (1.5905) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][450/625] eta 0:01:22 lr 0.000935 wd 0.0500 time 0.4631 (0.4692) data time 0.0010 (0.0021) model time 0.4620 (0.4668) loss 3.1825 (3.0837) grad_norm 4.0562 (inf) loss_scale 1024.0000 (2043.4590) mem 16715MB [2024-08-10 10:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][460/625] eta 0:01:17 lr 0.000935 wd 0.0500 time 0.4658 (0.4691) data time 0.0011 (0.0021) model time 0.4647 (0.4668) loss 2.8269 (3.0823) grad_norm 1.4471 (inf) loss_scale 1024.0000 (2021.3449) mem 
16715MB [2024-08-10 10:26:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][470/625] eta 0:01:12 lr 0.000935 wd 0.0500 time 0.4644 (0.4700) data time 0.0008 (0.0021) model time 0.4636 (0.4678) loss 2.9305 (3.0877) grad_norm 1.1353 (inf) loss_scale 1024.0000 (2000.1699) mem 16715MB [2024-08-10 10:26:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][480/625] eta 0:01:08 lr 0.000935 wd 0.0500 time 0.4621 (0.4708) data time 0.0009 (0.0020) model time 0.4612 (0.4687) loss 3.8232 (3.0921) grad_norm 1.3127 (inf) loss_scale 1024.0000 (1979.8753) mem 16715MB [2024-08-10 10:26:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][490/625] eta 0:01:03 lr 0.000934 wd 0.0500 time 0.4625 (0.4706) data time 0.0007 (0.0020) model time 0.4617 (0.4685) loss 3.0601 (3.0951) grad_norm 1.3458 (inf) loss_scale 1024.0000 (1960.4073) mem 16715MB [2024-08-10 10:27:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][500/625] eta 0:00:58 lr 0.000934 wd 0.0500 time 0.4608 (0.4705) data time 0.0011 (0.0020) model time 0.4597 (0.4684) loss 3.2683 (3.0985) grad_norm 1.6199 (inf) loss_scale 1024.0000 (1941.7166) mem 16715MB [2024-08-10 10:27:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][510/625] eta 0:00:54 lr 0.000934 wd 0.0500 time 0.4644 (0.4704) data time 0.0008 (0.0020) model time 0.4635 (0.4683) loss 2.9145 (3.1004) grad_norm 1.4422 (inf) loss_scale 1024.0000 (1923.7573) mem 16715MB [2024-08-10 10:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][520/625] eta 0:00:49 lr 0.000934 wd 0.0500 time 0.4658 (0.4702) data time 0.0009 (0.0020) model time 0.4649 (0.4682) loss 3.5610 (3.0995) grad_norm 1.6651 (inf) loss_scale 1024.0000 (1906.4875) mem 16715MB [2024-08-10 10:27:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][530/625] eta 0:00:44 lr 0.000934 wd 0.0500 time 0.4644 (0.4701) data time 0.0010 (0.0020) model time 0.4634 (0.4681) loss 3.7679 
(3.0994) grad_norm 1.5806 (inf) loss_scale 1024.0000 (1889.8682) mem 16715MB [2024-08-10 10:27:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][540/625] eta 0:00:39 lr 0.000934 wd 0.0500 time 0.4660 (0.4700) data time 0.0010 (0.0019) model time 0.4650 (0.4680) loss 2.9445 (3.0975) grad_norm 1.2389 (inf) loss_scale 1024.0000 (1873.8632) mem 16715MB [2024-08-10 10:27:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][550/625] eta 0:00:35 lr 0.000934 wd 0.0500 time 0.4625 (0.4699) data time 0.0010 (0.0019) model time 0.4615 (0.4679) loss 3.4713 (3.0919) grad_norm 2.9088 (inf) loss_scale 1024.0000 (1858.4392) mem 16715MB [2024-08-10 10:27:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][560/625] eta 0:00:30 lr 0.000934 wd 0.0500 time 0.4597 (0.4698) data time 0.0007 (0.0019) model time 0.4590 (0.4678) loss 3.7567 (3.0971) grad_norm 1.3248 (inf) loss_scale 1024.0000 (1843.5651) mem 16715MB [2024-08-10 10:27:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][570/625] eta 0:00:25 lr 0.000934 wd 0.0500 time 0.4635 (0.4697) data time 0.0010 (0.0019) model time 0.4625 (0.4677) loss 3.8404 (3.0975) grad_norm 1.2775 (inf) loss_scale 1024.0000 (1829.2119) mem 16715MB [2024-08-10 10:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][580/625] eta 0:00:21 lr 0.000934 wd 0.0500 time 0.4603 (0.4696) data time 0.0010 (0.0019) model time 0.4593 (0.4676) loss 3.0334 (3.1009) grad_norm 1.4497 (inf) loss_scale 1024.0000 (1815.3528) mem 16715MB [2024-08-10 10:27:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][590/625] eta 0:00:16 lr 0.000934 wd 0.0500 time 0.4667 (0.4696) data time 0.0007 (0.0019) model time 0.4659 (0.4676) loss 3.2131 (3.1062) grad_norm 1.4845 (inf) loss_scale 1024.0000 (1801.9628) mem 16715MB [2024-08-10 10:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][600/625] eta 0:00:11 lr 0.000933 wd 0.0500 time 0.4696 (0.4695) 
data time 0.0009 (0.0018) model time 0.4687 (0.4675) loss 2.7895 (3.1065) grad_norm 0.9137 (inf) loss_scale 1024.0000 (1789.0183) mem 16715MB [2024-08-10 10:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][610/625] eta 0:00:07 lr 0.000933 wd 0.0500 time 0.6277 (0.4700) data time 0.0005 (0.0019) model time 0.6272 (0.4681) loss 2.0915 (3.1024) grad_norm 1.8576 (inf) loss_scale 1024.0000 (1776.4975) mem 16715MB [2024-08-10 10:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [107/300][620/625] eta 0:00:02 lr 0.000933 wd 0.0500 time 0.4605 (0.4699) data time 0.0008 (0.0018) model time 0.4597 (0.4680) loss 3.3736 (3.1040) grad_norm 1.1606 (inf) loss_scale 1024.0000 (1764.3800) mem 16715MB [2024-08-10 10:27:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 107 training takes 0:04:53 [2024-08-10 10:27:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:28:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.5894 (0.5894) Acc@1 87.988 (87.988) Acc@5 98.291 (98.291) Mem 16715MB [2024-08-10 10:28:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.162) Loss 0.9810 (0.7356) Acc@1 75.586 (83.936) Acc@5 94.238 (97.017) Mem 16715MB [2024-08-10 10:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 1.0684 (0.8584) Acc@1 74.268 (80.966) Acc@5 93.408 (95.598) Mem 16715MB [2024-08-10 10:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.804 Acc@5 95.591 [2024-08-10 10:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8% [2024-08-10 10:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.873 (0.873) Loss 0.4963 (0.4963) Acc@1 88.672 (88.672) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 10:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.197) Loss 0.7993 (0.6232) Acc@1 80.469 (86.009) Acc@5 95.801 (97.670) Mem 16715MB [2024-08-10 10:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.9111 (0.7351) Acc@1 77.441 (83.024) Acc@5 95.166 (96.484) Mem 16715MB [2024-08-10 10:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.716 Acc@5 96.507 [2024-08-10 10:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 10:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.72% [2024-08-10 10:28:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 10:28:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 10:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][0/625] eta 0:08:29 lr 0.000933 wd 0.0500 time 0.8151 (0.8151) data time 0.4035 (0.4035) model time 0.0000 (0.0000) loss 4.1315 (4.1315) grad_norm 1.0130 (1.0130) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][10/625] eta 0:05:14 lr 0.000933 wd 0.0500 time 0.4539 (0.5106) data time 0.0010 (0.0377) model time 0.0000 (0.0000) loss 3.0788 (3.0860) grad_norm 1.4122 (1.2756) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][20/625] eta 0:04:55 lr 0.000933 wd 0.0500 time 0.4686 (0.4883) data time 0.0010 (0.0203) model time 0.0000 (0.0000) loss 3.6501 (3.0505) grad_norm 1.0779 (1.3256) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][30/625] eta 0:04:49 lr 0.000933 wd 0.0500 time 0.4671 (0.4858) data time 0.0010 (0.0141) model time 0.0000 (0.0000) loss 3.2406 (2.9887) grad_norm 1.2343 (1.3699) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][40/625] eta 0:04:41 lr 0.000933 wd 0.0500 time 0.4736 (0.4818) data time 0.0010 (0.0109) model time 0.0000 (0.0000) loss 3.4369 (3.0443) grad_norm 1.7884 (1.6008) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][50/625] eta 0:04:35 lr 0.000933 wd 0.0500 time 0.4629 (0.4793) data time 0.0009 (0.0090) model time 0.0000 (0.0000) loss 2.2221 (3.1068) grad_norm 2.0397 (1.6490) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][60/625] eta 0:04:30 lr 0.000933 wd 0.0500 time 0.4688 (0.4779) data time 0.0009 (0.0077) model time 0.4679 (0.4701) loss 2.1772 
(3.0946) grad_norm 1.5155 (1.6975) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][70/625] eta 0:04:26 lr 0.000933 wd 0.0500 time 0.4595 (0.4799) data time 0.0010 (0.0069) model time 0.4584 (0.4797) loss 3.2172 (3.0713) grad_norm 1.4153 (1.6785) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][80/625] eta 0:04:21 lr 0.000933 wd 0.0500 time 0.4605 (0.4804) data time 0.0010 (0.0062) model time 0.4595 (0.4807) loss 3.5179 (3.0886) grad_norm 1.6722 (1.6648) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][90/625] eta 0:04:15 lr 0.000932 wd 0.0500 time 0.4659 (0.4785) data time 0.0008 (0.0056) model time 0.4651 (0.4762) loss 3.0180 (3.0887) grad_norm 1.3241 (1.6334) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:28:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][100/625] eta 0:04:10 lr 0.000932 wd 0.0500 time 0.4661 (0.4772) data time 0.0008 (0.0052) model time 0.4653 (0.4737) loss 2.2427 (3.0797) grad_norm 1.9279 (1.6333) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][110/625] eta 0:04:05 lr 0.000932 wd 0.0500 time 0.4682 (0.4764) data time 0.0007 (0.0048) model time 0.4675 (0.4726) loss 3.2531 (3.0628) grad_norm 1.4733 (1.6142) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][120/625] eta 0:04:00 lr 0.000932 wd 0.0500 time 0.4754 (0.4756) data time 0.0009 (0.0045) model time 0.4745 (0.4717) loss 2.7448 (3.0565) grad_norm 1.6202 (1.5897) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][130/625] eta 0:03:55 lr 0.000932 wd 0.0500 
time 0.4651 (0.4750) data time 0.0007 (0.0042) model time 0.4643 (0.4710) loss 2.7926 (3.0667) grad_norm 1.5227 (1.6109) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][140/625] eta 0:03:49 lr 0.000932 wd 0.0500 time 0.4605 (0.4742) data time 0.0011 (0.0040) model time 0.4594 (0.4701) loss 3.1163 (3.0752) grad_norm 0.9578 (1.6033) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][150/625] eta 0:03:44 lr 0.000932 wd 0.0500 time 0.4640 (0.4734) data time 0.0010 (0.0038) model time 0.4630 (0.4693) loss 3.6029 (3.0838) grad_norm 1.5504 (1.6002) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][160/625] eta 0:03:39 lr 0.000932 wd 0.0500 time 0.4640 (0.4728) data time 0.0007 (0.0036) model time 0.4633 (0.4686) loss 2.4673 (3.0821) grad_norm 2.0406 (1.6141) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][170/625] eta 0:03:34 lr 0.000932 wd 0.0500 time 0.4557 (0.4721) data time 0.0008 (0.0035) model time 0.4550 (0.4679) loss 3.1318 (3.0917) grad_norm 1.3352 (1.6155) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][180/625] eta 0:03:29 lr 0.000932 wd 0.0500 time 0.4651 (0.4718) data time 0.0008 (0.0034) model time 0.4643 (0.4677) loss 3.6093 (3.1156) grad_norm 1.3792 (1.5992) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][190/625] eta 0:03:25 lr 0.000932 wd 0.0500 time 0.4651 (0.4714) data time 0.0010 (0.0032) model time 0.4641 (0.4674) loss 3.4734 (3.1109) grad_norm 2.6527 (1.5847) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:43 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [108/300][200/625] eta 0:03:20 lr 0.000931 wd 0.0500 time 0.4736 (0.4713) data time 0.0011 (0.0031) model time 0.4725 (0.4674) loss 2.8429 (3.1119) grad_norm 1.3530 (1.5950) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][210/625] eta 0:03:15 lr 0.000931 wd 0.0500 time 0.6600 (0.4719) data time 0.0008 (0.0030) model time 0.6592 (0.4684) loss 2.8265 (3.0911) grad_norm 1.1589 (1.5849) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][220/625] eta 0:03:11 lr 0.000931 wd 0.0500 time 0.4649 (0.4728) data time 0.0010 (0.0029) model time 0.4639 (0.4697) loss 1.9761 (3.0838) grad_norm 1.1007 (1.5765) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][230/625] eta 0:03:06 lr 0.000931 wd 0.0500 time 0.4590 (0.4731) data time 0.0010 (0.0029) model time 0.4581 (0.4703) loss 3.8808 (3.0898) grad_norm 1.6141 (1.5750) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][240/625] eta 0:03:02 lr 0.000931 wd 0.0500 time 0.4663 (0.4727) data time 0.0008 (0.0028) model time 0.4655 (0.4699) loss 3.7022 (3.0907) grad_norm 1.3424 (1.5715) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][250/625] eta 0:02:57 lr 0.000931 wd 0.0500 time 0.4608 (0.4723) data time 0.0010 (0.0027) model time 0.4598 (0.4694) loss 3.2152 (3.0964) grad_norm 1.1914 (1.5652) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][260/625] eta 0:02:52 lr 0.000931 wd 0.0500 time 0.4696 (0.4728) data time 0.0007 (0.0027) model time 0.4689 (0.4701) loss 3.3577 (3.1074) grad_norm 2.2014 
(1.5814) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][270/625] eta 0:02:47 lr 0.000931 wd 0.0500 time 0.4686 (0.4726) data time 0.0010 (0.0026) model time 0.4676 (0.4699) loss 2.9793 (3.1026) grad_norm 1.9930 (1.6174) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][280/625] eta 0:02:42 lr 0.000931 wd 0.0500 time 0.4595 (0.4723) data time 0.0010 (0.0025) model time 0.4585 (0.4697) loss 3.4263 (3.1015) grad_norm 1.6670 (1.6349) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][290/625] eta 0:02:38 lr 0.000931 wd 0.0500 time 0.4771 (0.4720) data time 0.0007 (0.0025) model time 0.4764 (0.4694) loss 3.3192 (3.1142) grad_norm 2.3556 (1.6271) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][300/625] eta 0:02:33 lr 0.000931 wd 0.0500 time 0.4669 (0.4717) data time 0.0010 (0.0024) model time 0.4659 (0.4691) loss 3.2673 (3.1189) grad_norm 1.6199 (1.6187) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][310/625] eta 0:02:28 lr 0.000930 wd 0.0500 time 0.4717 (0.4715) data time 0.0013 (0.0024) model time 0.4704 (0.4688) loss 2.8658 (3.1183) grad_norm 2.3241 (1.6161) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][320/625] eta 0:02:23 lr 0.000930 wd 0.0500 time 0.4632 (0.4712) data time 0.0007 (0.0024) model time 0.4625 (0.4686) loss 4.2418 (3.1220) grad_norm 1.3882 (1.6104) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][330/625] eta 0:02:18 lr 0.000930 wd 0.0500 time 0.4690 (0.4711) data 
time 0.0010 (0.0023) model time 0.4680 (0.4685) loss 2.9837 (3.1064) grad_norm 1.2528 (1.6078) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][340/625] eta 0:02:14 lr 0.000930 wd 0.0500 time 0.4648 (0.4709) data time 0.0008 (0.0023) model time 0.4641 (0.4683) loss 3.3422 (3.1025) grad_norm 2.4585 (1.6131) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][350/625] eta 0:02:09 lr 0.000930 wd 0.0500 time 0.4695 (0.4708) data time 0.0010 (0.0022) model time 0.4685 (0.4682) loss 3.2190 (3.1028) grad_norm 2.0874 (1.6180) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][360/625] eta 0:02:05 lr 0.000930 wd 0.0500 time 0.4583 (0.4721) data time 0.0007 (0.0022) model time 0.4576 (0.4699) loss 2.9549 (3.0973) grad_norm 1.5349 (1.6211) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][370/625] eta 0:02:00 lr 0.000930 wd 0.0500 time 0.4677 (0.4720) data time 0.0008 (0.0022) model time 0.4669 (0.4698) loss 3.0117 (3.0943) grad_norm 1.1547 (1.6343) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][380/625] eta 0:01:55 lr 0.000930 wd 0.0500 time 0.4665 (0.4723) data time 0.0010 (0.0021) model time 0.4656 (0.4701) loss 3.3804 (3.0950) grad_norm 1.0448 (1.6288) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][390/625] eta 0:01:50 lr 0.000930 wd 0.0500 time 0.4718 (0.4721) data time 0.0008 (0.0021) model time 0.4709 (0.4700) loss 3.4157 (3.0927) grad_norm 1.6264 (1.6261) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [108/300][400/625] eta 0:01:46 lr 0.000930 wd 0.0500 time 0.4664 (0.4720) data time 0.0010 (0.0021) model time 0.4655 (0.4699) loss 2.9065 (3.0903) grad_norm 1.3888 (1.6232) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][410/625] eta 0:01:41 lr 0.000930 wd 0.0500 time 0.4755 (0.4719) data time 0.0008 (0.0021) model time 0.4747 (0.4697) loss 3.6385 (3.0884) grad_norm 1.5574 (1.6203) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][420/625] eta 0:01:36 lr 0.000929 wd 0.0500 time 0.4653 (0.4722) data time 0.0008 (0.0020) model time 0.4645 (0.4701) loss 2.1473 (3.0929) grad_norm 1.3110 (1.6184) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][430/625] eta 0:01:32 lr 0.000929 wd 0.0500 time 0.4725 (0.4721) data time 0.0009 (0.0020) model time 0.4715 (0.4700) loss 3.3193 (3.0934) grad_norm 1.5211 (1.6218) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][440/625] eta 0:01:27 lr 0.000929 wd 0.0500 time 0.4686 (0.4723) data time 0.0010 (0.0020) model time 0.4676 (0.4703) loss 2.5336 (3.0916) grad_norm 1.3812 (1.6238) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][450/625] eta 0:01:22 lr 0.000929 wd 0.0500 time 0.4704 (0.4722) data time 0.0008 (0.0020) model time 0.4696 (0.4702) loss 2.3508 (3.0864) grad_norm 1.0245 (1.6275) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][460/625] eta 0:01:17 lr 0.000929 wd 0.0500 time 0.4613 (0.4720) data time 0.0008 (0.0020) model time 0.4605 (0.4700) loss 3.5097 (3.0859) grad_norm 1.2322 (1.6202) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 10:31:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][470/625] eta 0:01:13 lr 0.000929 wd 0.0500 time 0.4741 (0.4719) data time 0.0008 (0.0019) model time 0.4733 (0.4699) loss 2.6074 (3.0789) grad_norm 1.0957 (1.6151) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][480/625] eta 0:01:08 lr 0.000929 wd 0.0500 time 0.4748 (0.4723) data time 0.0008 (0.0019) model time 0.4740 (0.4704) loss 2.6781 (3.0839) grad_norm 1.5087 (1.6208) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][490/625] eta 0:01:03 lr 0.000929 wd 0.0500 time 0.4636 (0.4722) data time 0.0010 (0.0019) model time 0.4626 (0.4703) loss 2.9327 (3.0878) grad_norm 1.7256 (1.6291) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][500/625] eta 0:00:59 lr 0.000929 wd 0.0500 time 0.4672 (0.4720) data time 0.0007 (0.0019) model time 0.4664 (0.4701) loss 2.4766 (3.0900) grad_norm 2.4897 (1.6319) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][510/625] eta 0:00:54 lr 0.000929 wd 0.0500 time 0.4629 (0.4719) data time 0.0011 (0.0019) model time 0.4618 (0.4699) loss 3.3322 (3.0889) grad_norm 1.8049 (1.6305) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][520/625] eta 0:00:49 lr 0.000929 wd 0.0500 time 0.4623 (0.4717) data time 0.0010 (0.0019) model time 0.4613 (0.4697) loss 2.9998 (3.0933) grad_norm 1.3419 (1.6340) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][530/625] eta 0:00:44 lr 0.000929 wd 0.0500 time 0.4683 (0.4715) data time 0.0008 (0.0018) model 
time 0.4675 (0.4696) loss 3.3105 (3.0982) grad_norm 1.7245 (1.6315) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][540/625] eta 0:00:40 lr 0.000928 wd 0.0500 time 0.4629 (0.4713) data time 0.0009 (0.0018) model time 0.4619 (0.4694) loss 3.4012 (3.0969) grad_norm 1.4767 (1.6289) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][550/625] eta 0:00:35 lr 0.000928 wd 0.0500 time 0.4605 (0.4712) data time 0.0010 (0.0018) model time 0.4595 (0.4693) loss 3.1809 (3.1005) grad_norm 5.1634 (1.6363) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][560/625] eta 0:00:30 lr 0.000928 wd 0.0500 time 0.4742 (0.4711) data time 0.0010 (0.0018) model time 0.4732 (0.4692) loss 3.5435 (3.1030) grad_norm 1.0304 (1.6332) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][570/625] eta 0:00:25 lr 0.000928 wd 0.0500 time 0.4651 (0.4711) data time 0.0007 (0.0018) model time 0.4643 (0.4692) loss 3.0100 (3.1031) grad_norm 1.5026 (1.6267) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][580/625] eta 0:00:21 lr 0.000928 wd 0.0500 time 0.4633 (0.4710) data time 0.0009 (0.0018) model time 0.4623 (0.4691) loss 3.2093 (3.0990) grad_norm 1.8462 (1.6219) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][590/625] eta 0:00:16 lr 0.000928 wd 0.0500 time 0.4599 (0.4711) data time 0.0007 (0.0018) model time 0.4592 (0.4692) loss 3.6135 (3.0990) grad_norm 1.8366 (1.6318) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][600/625] 
eta 0:00:11 lr 0.000928 wd 0.0500 time 0.4609 (0.4710) data time 0.0008 (0.0017) model time 0.4601 (0.4691) loss 3.5880 (3.1008) grad_norm 1.7719 (1.6350) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][610/625] eta 0:00:07 lr 0.000928 wd 0.0500 time 0.4605 (0.4710) data time 0.0005 (0.0017) model time 0.4600 (0.4691) loss 3.7433 (3.1059) grad_norm 1.1814 (1.6343) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [108/300][620/625] eta 0:00:02 lr 0.000928 wd 0.0500 time 0.4594 (0.4708) data time 0.0008 (0.0017) model time 0.4586 (0.4690) loss 2.4885 (3.1051) grad_norm 1.2261 (1.6300) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 108 training takes 0:04:54 [2024-08-10 10:33:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:33:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5659 (0.5659) Acc@1 87.793 (87.793) Acc@5 98.145 (98.145) Mem 16715MB [2024-08-10 10:33:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.9146 (0.7065) Acc@1 77.246 (84.224) Acc@5 95.264 (97.168) Mem 16715MB [2024-08-10 10:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.0576 (0.8303) Acc@1 74.707 (81.066) Acc@5 93.701 (95.726) Mem 16715MB [2024-08-10 10:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.768 Acc@5 95.725 [2024-08-10 10:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8% [2024-08-10 10:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.941 (0.941) Loss 0.4961 (0.4961) Acc@1 88.672 (88.672) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 10:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.200) Loss 0.7983 (0.6226) Acc@1 80.371 (86.071) Acc@5 95.947 (97.652) Mem 16715MB [2024-08-10 10:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.161) Loss 0.9106 (0.7341) Acc@1 77.393 (83.061) Acc@5 95.166 (96.484) Mem 16715MB [2024-08-10 10:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.762 Acc@5 96.513 [2024-08-10 10:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-10 10:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.76% [2024-08-10 10:33:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 10:33:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 10:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][0/625] eta 0:08:37 lr 0.000928 wd 0.0500 time 0.8285 (0.8285) data time 0.4241 (0.4241) model time 0.0000 (0.0000) loss 3.2192 (3.2192) grad_norm 1.1502 (1.1502) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][10/625] eta 0:05:07 lr 0.000928 wd 0.0500 time 0.4700 (0.4996) data time 0.0008 (0.0395) model time 0.0000 (0.0000) loss 3.8371 (3.1656) grad_norm 1.1080 (1.4967) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][20/625] eta 0:04:52 lr 0.000927 wd 0.0500 time 0.4704 (0.4831) data time 0.0010 (0.0212) model time 0.0000 (0.0000) loss 3.3080 (3.2396) grad_norm 1.3974 (1.4759) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][30/625] eta 0:04:51 lr 0.000927 wd 0.0500 time 0.4662 (0.4899) data time 0.0008 (0.0147) model time 0.0000 (0.0000) loss 2.0624 (3.1569) grad_norm 2.2023 (1.8470) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][40/625] eta 0:04:42 lr 0.000927 wd 0.0500 time 0.4637 (0.4834) data time 0.0010 (0.0114) model time 0.0000 (0.0000) loss 3.2075 (3.1498) grad_norm 1.6356 (1.8775) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][50/625] eta 0:04:35 lr 0.000927 wd 0.0500 time 0.4621 (0.4795) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 3.8260 (3.1414) grad_norm 1.4603 (1.7597) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][60/625] eta 0:04:29 lr 0.000927 wd 0.0500 time 0.4674 (0.4772) data time 0.0008 (0.0080) model time 0.4666 (0.4645) loss 3.8388 
(3.1696) grad_norm 1.7824 (1.7182) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][70/625] eta 0:04:24 lr 0.000927 wd 0.0500 time 0.4636 (0.4760) data time 0.0008 (0.0070) model time 0.4628 (0.4659) loss 3.4142 (3.1461) grad_norm 1.3380 (1.6698) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][80/625] eta 0:04:18 lr 0.000927 wd 0.0500 time 0.4671 (0.4750) data time 0.0010 (0.0063) model time 0.4661 (0.4663) loss 3.2535 (3.1413) grad_norm 2.4645 (1.6446) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:33:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][90/625] eta 0:04:13 lr 0.000927 wd 0.0500 time 0.4641 (0.4741) data time 0.0007 (0.0057) model time 0.4634 (0.4661) loss 2.7040 (3.1217) grad_norm 1.2866 (1.6191) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][100/625] eta 0:04:08 lr 0.000927 wd 0.0500 time 0.4656 (0.4732) data time 0.0010 (0.0052) model time 0.4646 (0.4657) loss 2.9799 (3.0982) grad_norm 1.2201 (1.6465) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][110/625] eta 0:04:03 lr 0.000927 wd 0.0500 time 0.4653 (0.4724) data time 0.0008 (0.0049) model time 0.4645 (0.4652) loss 2.3122 (3.1046) grad_norm 2.2189 (1.6531) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][120/625] eta 0:03:58 lr 0.000927 wd 0.0500 time 0.4628 (0.4718) data time 0.0010 (0.0045) model time 0.4618 (0.4651) loss 3.2177 (3.0875) grad_norm 1.4001 (1.6446) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][130/625] eta 0:03:53 lr 0.000926 wd 0.0500 
time 0.4672 (0.4725) data time 0.0007 (0.0043) model time 0.4664 (0.4670) loss 3.8382 (3.0944) grad_norm 1.5059 (1.6492) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][140/625] eta 0:03:48 lr 0.000926 wd 0.0500 time 0.4645 (0.4721) data time 0.0009 (0.0040) model time 0.4636 (0.4669) loss 3.6817 (3.0949) grad_norm 1.3769 (1.6436) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][150/625] eta 0:03:44 lr 0.000926 wd 0.0500 time 0.4086 (0.4728) data time 0.0011 (0.0038) model time 0.4075 (0.4683) loss 2.9710 (3.1065) grad_norm 1.3190 (1.6284) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][160/625] eta 0:03:39 lr 0.000926 wd 0.0500 time 0.4656 (0.4723) data time 0.0011 (0.0037) model time 0.4646 (0.4679) loss 2.3793 (3.0971) grad_norm 1.3029 (1.6093) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][170/625] eta 0:03:34 lr 0.000926 wd 0.0500 time 0.4697 (0.4719) data time 0.0010 (0.0035) model time 0.4687 (0.4676) loss 3.3048 (3.1010) grad_norm 1.7054 (1.6137) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][180/625] eta 0:03:29 lr 0.000926 wd 0.0500 time 0.4711 (0.4715) data time 0.0010 (0.0034) model time 0.4701 (0.4673) loss 3.5901 (3.1134) grad_norm 1.6091 (1.6140) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][190/625] eta 0:03:25 lr 0.000926 wd 0.0500 time 0.4638 (0.4722) data time 0.0010 (0.0032) model time 0.4628 (0.4686) loss 3.0029 (3.1173) grad_norm 2.0575 (1.6031) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:49 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [109/300][200/625] eta 0:03:20 lr 0.000926 wd 0.0500 time 0.4657 (0.4718) data time 0.0009 (0.0031) model time 0.4648 (0.4682) loss 3.5083 (3.1109) grad_norm 1.3066 (1.5990) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][210/625] eta 0:03:15 lr 0.000926 wd 0.0500 time 0.4637 (0.4716) data time 0.0009 (0.0030) model time 0.4628 (0.4681) loss 3.2197 (3.1068) grad_norm 1.2857 (1.5890) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][220/625] eta 0:03:10 lr 0.000926 wd 0.0500 time 0.4672 (0.4714) data time 0.0008 (0.0029) model time 0.4664 (0.4679) loss 2.8639 (3.1085) grad_norm 1.5818 (1.5982) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][230/625] eta 0:03:06 lr 0.000926 wd 0.0500 time 0.4610 (0.4711) data time 0.0010 (0.0029) model time 0.4601 (0.4676) loss 3.4366 (3.1037) grad_norm 2.9152 (1.6110) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][240/625] eta 0:03:01 lr 0.000925 wd 0.0500 time 0.4591 (0.4707) data time 0.0010 (0.0028) model time 0.4581 (0.4673) loss 3.3260 (3.0974) grad_norm 1.6177 (1.6205) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][250/625] eta 0:02:56 lr 0.000925 wd 0.0500 time 0.4665 (0.4704) data time 0.0008 (0.0027) model time 0.4658 (0.4670) loss 2.5121 (3.0996) grad_norm 1.4148 (1.6145) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][260/625] eta 0:02:51 lr 0.000925 wd 0.0500 time 0.4609 (0.4700) data time 0.0008 (0.0027) model time 0.4601 (0.4666) loss 2.7576 (3.0934) grad_norm 1.5806 
(1.6085) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][270/625] eta 0:02:46 lr 0.000925 wd 0.0500 time 0.4639 (0.4697) data time 0.0011 (0.0026) model time 0.4628 (0.4664) loss 1.8222 (3.0929) grad_norm 1.6345 (1.5997) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][280/625] eta 0:02:41 lr 0.000925 wd 0.0500 time 0.4651 (0.4695) data time 0.0009 (0.0026) model time 0.4642 (0.4662) loss 2.5818 (3.0890) grad_norm 1.0588 (1.5936) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][290/625] eta 0:02:37 lr 0.000925 wd 0.0500 time 0.4647 (0.4693) data time 0.0010 (0.0025) model time 0.4637 (0.4661) loss 3.5454 (3.0947) grad_norm 1.2535 (1.5868) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][300/625] eta 0:02:32 lr 0.000925 wd 0.0500 time 0.6474 (0.4697) data time 0.0011 (0.0024) model time 0.6463 (0.4667) loss 3.2563 (3.0859) grad_norm 1.9072 (1.5870) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][310/625] eta 0:02:27 lr 0.000925 wd 0.0500 time 0.4561 (0.4694) data time 0.0008 (0.0024) model time 0.4553 (0.4664) loss 3.7169 (3.0956) grad_norm 1.5520 (1.6091) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][320/625] eta 0:02:23 lr 0.000925 wd 0.0500 time 0.4630 (0.4692) data time 0.0010 (0.0024) model time 0.4620 (0.4662) loss 3.2353 (3.0939) grad_norm 1.3994 (1.6269) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][330/625] eta 0:02:18 lr 0.000925 wd 0.0500 time 0.4637 (0.4689) data 
time 0.0010 (0.0023) model time 0.4627 (0.4660) loss 3.4736 (3.0944) grad_norm 2.2283 (1.6315) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][340/625] eta 0:02:13 lr 0.000925 wd 0.0500 time 0.4612 (0.4687) data time 0.0008 (0.0023) model time 0.4604 (0.4658) loss 3.5922 (3.0951) grad_norm 1.1227 (1.6276) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:35:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][350/625] eta 0:02:08 lr 0.000925 wd 0.0500 time 0.4612 (0.4686) data time 0.0008 (0.0022) model time 0.4604 (0.4657) loss 3.6015 (3.1018) grad_norm 1.6708 (1.6257) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][360/625] eta 0:02:04 lr 0.000924 wd 0.0500 time 0.4655 (0.4684) data time 0.0008 (0.0022) model time 0.4646 (0.4655) loss 3.6327 (3.0930) grad_norm 2.2496 (1.6277) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][370/625] eta 0:01:59 lr 0.000924 wd 0.0500 time 0.4666 (0.4694) data time 0.0010 (0.0022) model time 0.4656 (0.4667) loss 3.1491 (3.0953) grad_norm 1.5880 (1.6317) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][380/625] eta 0:01:54 lr 0.000924 wd 0.0500 time 0.4606 (0.4692) data time 0.0010 (0.0022) model time 0.4596 (0.4665) loss 3.4498 (3.0996) grad_norm 1.3361 (1.6278) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][390/625] eta 0:01:50 lr 0.000924 wd 0.0500 time 0.4525 (0.4690) data time 0.0011 (0.0021) model time 0.4514 (0.4664) loss 2.0241 (3.0991) grad_norm 1.1550 (1.6183) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [109/300][400/625] eta 0:01:45 lr 0.000924 wd 0.0500 time 0.4578 (0.4689) data time 0.0010 (0.0021) model time 0.4567 (0.4663) loss 2.8650 (3.1014) grad_norm 1.2251 (1.6127) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][410/625] eta 0:01:40 lr 0.000924 wd 0.0500 time 0.4587 (0.4697) data time 0.0009 (0.0021) model time 0.4579 (0.4672) loss 3.3374 (3.0995) grad_norm 1.4820 (1.6127) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][420/625] eta 0:01:36 lr 0.000924 wd 0.0500 time 0.4671 (0.4695) data time 0.0010 (0.0021) model time 0.4661 (0.4670) loss 3.4166 (3.0990) grad_norm 1.4932 (1.6269) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][430/625] eta 0:01:31 lr 0.000924 wd 0.0500 time 0.4735 (0.4695) data time 0.0007 (0.0021) model time 0.4728 (0.4670) loss 3.1450 (3.0965) grad_norm 1.9439 (1.6300) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][440/625] eta 0:01:26 lr 0.000924 wd 0.0500 time 0.4653 (0.4695) data time 0.0007 (0.0020) model time 0.4646 (0.4671) loss 2.7753 (3.0937) grad_norm 1.3509 (1.6279) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][450/625] eta 0:01:22 lr 0.000924 wd 0.0500 time 0.4634 (0.4694) data time 0.0010 (0.0020) model time 0.4625 (0.4670) loss 3.7278 (3.0954) grad_norm 1.6776 (1.6214) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:36:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][460/625] eta 0:01:17 lr 0.000924 wd 0.0500 time 0.4807 (0.4694) data time 0.0009 (0.0020) model time 0.4797 (0.4670) loss 3.1670 (3.0940) grad_norm 1.3822 (1.6171) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 10:36:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][470/625] eta 0:01:12 lr 0.000923 wd 0.0500 time 0.4615 (0.4697) data time 0.0009 (0.0020) model time 0.4606 (0.4674) loss 2.0925 (3.0924) grad_norm 1.5738 (1.6134) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][480/625] eta 0:01:08 lr 0.000923 wd 0.0500 time 0.4648 (0.4696) data time 0.0009 (0.0020) model time 0.4639 (0.4673) loss 3.3564 (3.0995) grad_norm 1.6801 (1.6085) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][490/625] eta 0:01:03 lr 0.000923 wd 0.0500 time 0.5568 (0.4701) data time 0.0008 (0.0020) model time 0.5560 (0.4679) loss 2.3562 (3.0985) grad_norm 1.2731 (1.6078) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][500/625] eta 0:00:58 lr 0.000923 wd 0.0500 time 0.4732 (0.4700) data time 0.0008 (0.0019) model time 0.4724 (0.4678) loss 2.2287 (3.1012) grad_norm 1.2960 (1.6038) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][510/625] eta 0:00:54 lr 0.000923 wd 0.0500 time 0.4647 (0.4699) data time 0.0007 (0.0019) model time 0.4641 (0.4678) loss 3.3706 (3.1034) grad_norm 1.1476 (1.6019) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][520/625] eta 0:00:49 lr 0.000923 wd 0.0500 time 0.4600 (0.4700) data time 0.0009 (0.0019) model time 0.4590 (0.4679) loss 3.0776 (3.1043) grad_norm 1.4059 (1.5975) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][530/625] eta 0:00:44 lr 0.000923 wd 0.0500 time 0.4624 (0.4699) data time 0.0008 (0.0019) model 
time 0.4616 (0.4678) loss 3.3131 (3.1066) grad_norm 1.6890 (1.5959) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][540/625] eta 0:00:39 lr 0.000923 wd 0.0500 time 0.4645 (0.4699) data time 0.0009 (0.0019) model time 0.4636 (0.4678) loss 3.2820 (3.1085) grad_norm 1.3804 (1.5923) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][550/625] eta 0:00:35 lr 0.000923 wd 0.0500 time 0.4614 (0.4700) data time 0.0008 (0.0019) model time 0.4607 (0.4679) loss 3.3602 (3.1095) grad_norm 1.8715 (1.5946) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][560/625] eta 0:00:30 lr 0.000923 wd 0.0500 time 0.4599 (0.4709) data time 0.0009 (0.0019) model time 0.4590 (0.4689) loss 2.1102 (3.1049) grad_norm 1.3812 (1.5982) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][570/625] eta 0:00:25 lr 0.000923 wd 0.0500 time 0.4631 (0.4709) data time 0.0010 (0.0019) model time 0.4622 (0.4689) loss 3.4309 (3.0994) grad_norm 1.0583 (1.5950) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][580/625] eta 0:00:21 lr 0.000922 wd 0.0500 time 0.4783 (0.4710) data time 0.0011 (0.0019) model time 0.4772 (0.4690) loss 3.3255 (3.1013) grad_norm 1.1296 (1.5927) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][590/625] eta 0:00:16 lr 0.000922 wd 0.0500 time 0.4606 (0.4710) data time 0.0011 (0.0018) model time 0.4595 (0.4691) loss 3.2891 (3.0988) grad_norm 1.1704 (1.5940) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][600/625] 
eta 0:00:11 lr 0.000922 wd 0.0500 time 0.4611 (0.4710) data time 0.0008 (0.0018) model time 0.4602 (0.4690) loss 3.3147 (3.0970) grad_norm 1.5907 (1.5941) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][610/625] eta 0:00:07 lr 0.000922 wd 0.0500 time 0.4742 (0.4709) data time 0.0008 (0.0018) model time 0.4734 (0.4690) loss 3.4310 (3.0979) grad_norm 1.1629 (1.5940) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [109/300][620/625] eta 0:00:02 lr 0.000922 wd 0.0500 time 0.4572 (0.4707) data time 0.0005 (0.0018) model time 0.4566 (0.4688) loss 3.4117 (3.0989) grad_norm 1.2204 (1.5901) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 109 training takes 0:04:54 [2024-08-10 10:38:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:38:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:38:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.5469 (0.5469) Acc@1 88.086 (88.086) Acc@5 98.340 (98.340) Mem 16715MB
[2024-08-10 10:38:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.8770 (0.6835) Acc@1 79.053 (84.615) Acc@5 95.459 (97.257) Mem 16715MB
[2024-08-10 10:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0137 (0.8174) Acc@1 75.098 (81.264) Acc@5 94.189 (95.787) Mem 16715MB
[2024-08-10 10:38:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.956 Acc@5 95.765
[2024-08-10 10:38:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-08-10 10:38:14 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.96%
[2024-08-10 10:38:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 10:38:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 10:38:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.4956 (0.4956) Acc@1 88.770 (88.770) Acc@5 98.584 (98.584) Mem 16715MB
[2024-08-10 10:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.167) Loss 0.7979 (0.6213) Acc@1 80.322 (86.119) Acc@5 95.898 (97.670) Mem 16715MB
[2024-08-10 10:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.143) Loss 0.9106 (0.7327) Acc@1 77.246 (83.126) Acc@5 95.117 (96.468) Mem 16715MB
[2024-08-10 10:38:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.823 Acc@5 96.507
[2024-08-10 10:38:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-10 10:38:19 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.82%
[2024-08-10 10:38:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 10:38:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 10:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][0/625] eta 0:09:40 lr 0.000922 wd 0.0500 time 0.9283 (0.9283) data time 0.5133 (0.5133) model time 0.0000 (0.0000) loss 2.1725 (2.1725) grad_norm 1.2055 (1.2055) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][10/625] eta 0:05:12 lr 0.000922 wd 0.0500 time 0.4645 (0.5087) data time 0.0010 (0.0490) model time 0.0000 (0.0000) loss 3.2396 (2.9304) grad_norm 1.4701 (1.6230) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][20/625] eta 0:04:55 lr 0.000922 wd 0.0500 time 0.4613 (0.4879) data time 0.0010 (0.0261) model time 0.0000 (0.0000) loss 3.4144 (3.0147) grad_norm 1.4220 (1.5390) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][30/625] eta 0:04:45 lr 0.000922 wd 0.0500 time 0.4650 (0.4806) data time 0.0008 (0.0181) model time 0.0000 (0.0000) loss 2.9693 (2.9947) grad_norm 1.3162 (1.6168) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][40/625] eta 0:04:38 lr 0.000922 wd 0.0500 time 0.4633 (0.4764) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 3.2521 (2.9996) grad_norm 1.2114 (1.5587) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][50/625] eta 0:04:34 lr 0.000922 wd 0.0500 time 0.4610 (0.4766) data time 0.0010 (0.0114) model time 0.0000 (0.0000) loss 3.0553 (3.0119) grad_norm 1.0480 (1.5256) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][60/625] eta 0:04:27 lr 0.000921 wd 0.0500 time 0.4620 (0.4743) data time 0.0010 (0.0097) model time 0.4610 (0.4614) loss 3.2543 
(3.0032) grad_norm 1.5078 (1.5252) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][70/625] eta 0:04:22 lr 0.000921 wd 0.0500 time 0.4644 (0.4731) data time 0.0007 (0.0085) model time 0.4636 (0.4629) loss 2.3987 (3.0329) grad_norm 1.7312 (1.5204) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:38:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][80/625] eta 0:04:18 lr 0.000921 wd 0.0500 time 0.4613 (0.4746) data time 0.0008 (0.0076) model time 0.4605 (0.4702) loss 3.3465 (2.9949) grad_norm 1.5812 (1.5040) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][90/625] eta 0:04:13 lr 0.000921 wd 0.0500 time 0.4603 (0.4734) data time 0.0011 (0.0068) model time 0.4591 (0.4682) loss 3.0852 (2.9996) grad_norm 1.8231 (1.5110) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][100/625] eta 0:04:07 lr 0.000921 wd 0.0500 time 0.4521 (0.4722) data time 0.0010 (0.0063) model time 0.4511 (0.4667) loss 3.3263 (3.0099) grad_norm 1.1874 (1.6731) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][110/625] eta 0:04:02 lr 0.000921 wd 0.0500 time 0.4635 (0.4715) data time 0.0011 (0.0058) model time 0.4624 (0.4660) loss 2.6426 (3.0318) grad_norm 1.2754 (1.6668) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][120/625] eta 0:03:59 lr 0.000921 wd 0.0500 time 0.4682 (0.4746) data time 0.0011 (0.0054) model time 0.4671 (0.4721) loss 3.1513 (3.0561) grad_norm 1.0226 (1.6683) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][130/625] eta 0:03:56 lr 0.000921 wd 0.0500 
time 0.4579 (0.4780) data time 0.0008 (0.0051) model time 0.4571 (0.4778) loss 3.4889 (3.0471) grad_norm 1.0006 (1.7229) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][140/625] eta 0:03:51 lr 0.000921 wd 0.0500 time 0.4605 (0.4768) data time 0.0009 (0.0048) model time 0.4597 (0.4759) loss 2.4603 (3.0545) grad_norm 1.3265 (1.7004) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][150/625] eta 0:03:46 lr 0.000921 wd 0.0500 time 0.4632 (0.4760) data time 0.0011 (0.0046) model time 0.4621 (0.4746) loss 3.2041 (3.0656) grad_norm 1.2221 (1.6922) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][160/625] eta 0:03:41 lr 0.000921 wd 0.0500 time 0.4621 (0.4753) data time 0.0010 (0.0044) model time 0.4611 (0.4735) loss 2.8239 (3.0774) grad_norm 1.4051 (1.6687) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][170/625] eta 0:03:35 lr 0.000920 wd 0.0500 time 0.4649 (0.4747) data time 0.0010 (0.0042) model time 0.4639 (0.4727) loss 3.5580 (3.0717) grad_norm 1.8382 (1.6578) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][180/625] eta 0:03:30 lr 0.000920 wd 0.0500 time 0.4604 (0.4741) data time 0.0011 (0.0040) model time 0.4593 (0.4719) loss 2.3388 (3.0572) grad_norm 1.4740 (1.6579) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][190/625] eta 0:03:25 lr 0.000920 wd 0.0500 time 0.4610 (0.4735) data time 0.0010 (0.0039) model time 0.4600 (0.4712) loss 2.6325 (3.0627) grad_norm 0.9133 (1.6512) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:39:56 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [110/300][200/625] eta 0:03:21 lr 0.000920 wd 0.0500 time 0.4591 (0.4730) data time 0.0009 (0.0037) model time 0.4582 (0.4706) loss 2.5604 (3.0623) grad_norm 1.9024 (1.6493) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][210/625] eta 0:03:16 lr 0.000920 wd 0.0500 time 0.4781 (0.4726) data time 0.0010 (0.0036) model time 0.4771 (0.4701) loss 3.1284 (3.0677) grad_norm 1.1448 (1.6351) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][220/625] eta 0:03:11 lr 0.000920 wd 0.0500 time 0.4679 (0.4720) data time 0.0008 (0.0035) model time 0.4671 (0.4695) loss 3.1636 (3.0768) grad_norm 1.1688 (1.6356) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][230/625] eta 0:03:06 lr 0.000920 wd 0.0500 time 0.4643 (0.4719) data time 0.0012 (0.0034) model time 0.4631 (0.4694) loss 2.7801 (3.0639) grad_norm 1.4663 (1.6485) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][240/625] eta 0:03:01 lr 0.000920 wd 0.0500 time 0.4642 (0.4718) data time 0.0008 (0.0033) model time 0.4634 (0.4693) loss 3.1224 (3.0693) grad_norm 1.5678 (1.6431) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][250/625] eta 0:02:56 lr 0.000920 wd 0.0500 time 0.4628 (0.4715) data time 0.0010 (0.0032) model time 0.4617 (0.4691) loss 3.3233 (3.0851) grad_norm 1.1976 (1.6326) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][260/625] eta 0:02:51 lr 0.000920 wd 0.0500 time 0.4644 (0.4712) data time 0.0009 (0.0031) model time 0.4635 (0.4688) loss 2.6046 (3.0802) grad_norm 1.4825 
(1.6322) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][270/625] eta 0:02:47 lr 0.000920 wd 0.0500 time 0.4639 (0.4709) data time 0.0008 (0.0031) model time 0.4631 (0.4684) loss 3.4053 (3.0786) grad_norm 1.2772 (1.6285) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][280/625] eta 0:02:42 lr 0.000919 wd 0.0500 time 0.4634 (0.4707) data time 0.0007 (0.0030) model time 0.4626 (0.4682) loss 3.6513 (3.0848) grad_norm 1.0353 (1.6206) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][290/625] eta 0:02:37 lr 0.000919 wd 0.0500 time 0.4650 (0.4704) data time 0.0010 (0.0029) model time 0.4640 (0.4680) loss 3.2980 (3.0898) grad_norm 1.5332 (1.6232) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][300/625] eta 0:02:33 lr 0.000919 wd 0.0500 time 0.4748 (0.4709) data time 0.0011 (0.0029) model time 0.4737 (0.4685) loss 3.6464 (3.0943) grad_norm 1.2396 (1.6269) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][310/625] eta 0:02:28 lr 0.000919 wd 0.0500 time 0.4639 (0.4707) data time 0.0010 (0.0028) model time 0.4629 (0.4685) loss 2.4326 (3.0972) grad_norm 1.6760 (1.6218) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][320/625] eta 0:02:23 lr 0.000919 wd 0.0500 time 0.4618 (0.4706) data time 0.0008 (0.0028) model time 0.4611 (0.4683) loss 2.7212 (3.0941) grad_norm 1.5683 (1.6154) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][330/625] eta 0:02:18 lr 0.000919 wd 0.0500 time 0.4602 (0.4704) data 
time 0.0010 (0.0027) model time 0.4592 (0.4681) loss 2.7488 (3.0947) grad_norm 1.3417 (1.6192) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][340/625] eta 0:02:13 lr 0.000919 wd 0.0500 time 0.4648 (0.4701) data time 0.0008 (0.0027) model time 0.4640 (0.4679) loss 3.4860 (3.0986) grad_norm 2.2681 (1.6224) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][350/625] eta 0:02:09 lr 0.000919 wd 0.0500 time 0.4590 (0.4709) data time 0.0010 (0.0026) model time 0.4580 (0.4688) loss 2.7087 (3.0977) grad_norm 1.4947 (1.6269) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][360/625] eta 0:02:04 lr 0.000919 wd 0.0500 time 0.4684 (0.4712) data time 0.0010 (0.0026) model time 0.4674 (0.4692) loss 3.2957 (3.1017) grad_norm 1.0490 (1.6247) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][370/625] eta 0:02:00 lr 0.000919 wd 0.0500 time 0.4651 (0.4710) data time 0.0010 (0.0025) model time 0.4641 (0.4690) loss 2.6556 (3.1000) grad_norm 1.4155 (1.6219) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][380/625] eta 0:01:55 lr 0.000919 wd 0.0500 time 0.4620 (0.4709) data time 0.0008 (0.0025) model time 0.4613 (0.4689) loss 2.6375 (3.0930) grad_norm 1.5924 (1.6170) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][390/625] eta 0:01:50 lr 0.000918 wd 0.0500 time 0.4625 (0.4708) data time 0.0008 (0.0025) model time 0.4617 (0.4688) loss 3.3360 (3.1004) grad_norm 1.1575 (1.6123) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [110/300][400/625] eta 0:01:45 lr 0.000918 wd 0.0500 time 0.4663 (0.4707) data time 0.0010 (0.0024) model time 0.4653 (0.4687) loss 2.9972 (3.1008) grad_norm 1.2253 (1.6122) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][410/625] eta 0:01:41 lr 0.000918 wd 0.0500 time 0.4599 (0.4705) data time 0.0011 (0.0024) model time 0.4588 (0.4685) loss 2.6800 (3.1070) grad_norm 1.3023 (1.6068) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][420/625] eta 0:01:36 lr 0.000918 wd 0.0500 time 0.4745 (0.4703) data time 0.0010 (0.0024) model time 0.4735 (0.4684) loss 3.0708 (3.1069) grad_norm 1.1279 (1.6015) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][430/625] eta 0:01:31 lr 0.000918 wd 0.0500 time 0.4610 (0.4701) data time 0.0008 (0.0023) model time 0.4602 (0.4682) loss 3.4689 (3.1072) grad_norm 1.6354 (1.5982) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][440/625] eta 0:01:26 lr 0.000918 wd 0.0500 time 0.4684 (0.4700) data time 0.0010 (0.0023) model time 0.4674 (0.4680) loss 2.9160 (3.1113) grad_norm 1.7139 (1.5984) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][450/625] eta 0:01:22 lr 0.000918 wd 0.0500 time 0.7306 (0.4709) data time 0.0009 (0.0023) model time 0.7297 (0.4690) loss 2.5630 (3.1107) grad_norm 1.2869 (1.5929) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:41:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][460/625] eta 0:01:17 lr 0.000918 wd 0.0500 time 0.4685 (0.4711) data time 0.0010 (0.0022) model time 0.4675 (0.4693) loss 2.9914 (3.1100) grad_norm 1.6684 (1.5910) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 10:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][470/625] eta 0:01:13 lr 0.000918 wd 0.0500 time 0.4598 (0.4713) data time 0.0010 (0.0022) model time 0.4587 (0.4696) loss 3.1375 (3.1099) grad_norm 1.2706 (1.5908) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][480/625] eta 0:01:08 lr 0.000918 wd 0.0500 time 0.4567 (0.4711) data time 0.0009 (0.0022) model time 0.4559 (0.4694) loss 2.7380 (3.1097) grad_norm 1.1800 (1.5873) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][490/625] eta 0:01:03 lr 0.000918 wd 0.0500 time 0.4652 (0.4710) data time 0.0008 (0.0022) model time 0.4643 (0.4692) loss 3.5272 (3.1091) grad_norm 2.1168 (1.5866) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][500/625] eta 0:00:58 lr 0.000917 wd 0.0500 time 0.4620 (0.4720) data time 0.0010 (0.0022) model time 0.4610 (0.4703) loss 3.4324 (3.1071) grad_norm 1.1036 (1.5828) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][510/625] eta 0:00:54 lr 0.000917 wd 0.0500 time 0.4614 (0.4718) data time 0.0011 (0.0021) model time 0.4602 (0.4701) loss 3.1043 (3.1083) grad_norm 1.4998 (1.5799) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][520/625] eta 0:00:49 lr 0.000917 wd 0.0500 time 0.4627 (0.4717) data time 0.0012 (0.0021) model time 0.4615 (0.4700) loss 3.2622 (3.1094) grad_norm 1.8800 (1.5780) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][530/625] eta 0:00:44 lr 0.000917 wd 0.0500 time 0.4622 (0.4716) data time 0.0010 (0.0021) model 
time 0.4612 (0.4699) loss 3.2472 (3.1081) grad_norm 1.0590 (1.5777) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][540/625] eta 0:00:40 lr 0.000917 wd 0.0500 time 0.4655 (0.4715) data time 0.0010 (0.0021) model time 0.4645 (0.4698) loss 3.1433 (3.1086) grad_norm 1.3283 (1.5765) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][550/625] eta 0:00:35 lr 0.000917 wd 0.0500 time 0.4663 (0.4713) data time 0.0010 (0.0021) model time 0.4653 (0.4697) loss 2.6585 (3.1075) grad_norm 2.2070 (1.5815) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][560/625] eta 0:00:30 lr 0.000917 wd 0.0500 time 0.4625 (0.4712) data time 0.0010 (0.0020) model time 0.4615 (0.4696) loss 2.2301 (3.1026) grad_norm 1.2508 (1.5766) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][570/625] eta 0:00:25 lr 0.000917 wd 0.0500 time 0.4623 (0.4711) data time 0.0010 (0.0020) model time 0.4614 (0.4695) loss 3.0540 (3.1050) grad_norm 1.2960 (1.5753) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:42:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][580/625] eta 0:00:21 lr 0.000917 wd 0.0500 time 0.4657 (0.4710) data time 0.0009 (0.0020) model time 0.4648 (0.4693) loss 2.9022 (3.1035) grad_norm 1.4266 (1.5721) loss_scale 2048.0000 (1036.3373) mem 16715MB [2024-08-10 10:42:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][590/625] eta 0:00:16 lr 0.000917 wd 0.0500 time 0.4646 (0.4708) data time 0.0007 (0.0020) model time 0.4639 (0.4692) loss 3.0198 (3.1064) grad_norm 2.2524 (1.5701) loss_scale 2048.0000 (1053.4552) mem 16715MB [2024-08-10 10:43:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][600/625] 
eta 0:00:11 lr 0.000917 wd 0.0500 time 0.4680 (0.4708) data time 0.0010 (0.0020) model time 0.4670 (0.4691) loss 2.6236 (3.1056) grad_norm 2.0688 (1.5731) loss_scale 2048.0000 (1070.0033) mem 16715MB [2024-08-10 10:43:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][610/625] eta 0:00:07 lr 0.000917 wd 0.0500 time 0.4618 (0.4710) data time 0.0007 (0.0020) model time 0.4611 (0.4694) loss 2.4473 (3.1115) grad_norm 1.6501 (1.5707) loss_scale 2048.0000 (1086.0098) mem 16715MB [2024-08-10 10:43:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [110/300][620/625] eta 0:00:02 lr 0.000916 wd 0.0500 time 0.4604 (0.4709) data time 0.0005 (0.0019) model time 0.4599 (0.4692) loss 3.7654 (3.1080) grad_norm 1.6752 (1.5675) loss_scale 2048.0000 (1101.5008) mem 16715MB [2024-08-10 10:43:15 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 110 training takes 0:04:54 [2024-08-10 10:43:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:43:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5762 (0.5762) Acc@1 87.598 (87.598) Acc@5 97.949 (97.949) Mem 16715MB
[2024-08-10 10:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.8936 (0.6961) Acc@1 77.344 (84.433) Acc@5 94.873 (97.172) Mem 16715MB
[2024-08-10 10:43:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.0166 (0.8238) Acc@1 75.439 (81.159) Acc@5 94.043 (95.752) Mem 16715MB
[2024-08-10 10:43:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.958 Acc@5 95.729
[2024-08-10 10:43:20 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0%
[2024-08-10 10:43:20 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 80.96%
[2024-08-10 10:43:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 10:43:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 10:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.4954 (0.4954) Acc@1 88.721 (88.721) Acc@5 98.486 (98.486) Mem 16715MB
[2024-08-10 10:43:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.7979 (0.6202) Acc@1 80.469 (86.155) Acc@5 95.996 (97.687) Mem 16715MB
[2024-08-10 10:43:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9102 (0.7312) Acc@1 77.393 (83.185) Acc@5 95.312 (96.508) Mem 16715MB
[2024-08-10 10:43:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.875 Acc@5 96.533
[2024-08-10 10:43:25 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-10 10:43:25 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.88%
[2024-08-10 10:43:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 10:43:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 10:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][0/625] eta 0:08:46 lr 0.000916 wd 0.0500 time 0.8427 (0.8427) data time 0.4360 (0.4360) model time 0.0000 (0.0000) loss 4.0688 (4.0688) grad_norm 1.6509 (1.6509) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 10:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][10/625] eta 0:05:05 lr 0.000916 wd 0.0500 time 0.4639 (0.4968) data time 0.0009 (0.0407) model time 0.0000 (0.0000) loss 3.1140 (3.2470) grad_norm 1.8931 (inf) loss_scale 1024.0000 (1489.4545) mem 16715MB [2024-08-10 10:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][20/625] eta 0:04:51 lr 0.000916 wd 0.0500 time 0.4635 (0.4814) data time 0.0009 (0.0218) model time 0.0000 (0.0000) loss 2.7525 (3.1868) grad_norm 2.5984 (inf) loss_scale 1024.0000 (1267.8095) mem 16715MB [2024-08-10 10:43:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][30/625] eta 0:04:47 lr 0.000916 wd 0.0500 time 0.4662 (0.4836) data time 0.0008 (0.0151) model time 0.0000 (0.0000) loss 2.5131 (3.1508) grad_norm 1.4959 (inf) loss_scale 1024.0000 (1189.1613) mem 16715MB [2024-08-10 10:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][40/625] eta 0:04:40 lr 0.000916 wd 0.0500 time 0.4608 (0.4787) data time 0.0010 (0.0117) model time 0.0000 (0.0000) loss 3.0853 (3.1964) grad_norm 1.8790 (inf) loss_scale 1024.0000 (1148.8780) mem 16715MB [2024-08-10 10:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][50/625] eta 0:04:33 lr 0.000916 wd 0.0500 time 0.4651 (0.4764) data time 0.0010 (0.0096) model time 0.0000 (0.0000) loss 3.1546 (3.1540) grad_norm 1.3661 (inf) loss_scale 1024.0000 (1124.3922) mem 16715MB [2024-08-10 10:43:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][60/625] eta 0:04:28 lr 0.000916 wd 0.0500 time 0.4667 (0.4745) data time 0.0008 (0.0082) model time 0.4659 (0.4636) loss 2.3045 (3.1061) 
grad_norm 1.3647 (inf) loss_scale 1024.0000 (1107.9344) mem 16715MB [2024-08-10 10:44:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][70/625] eta 0:04:23 lr 0.000916 wd 0.0500 time 0.4624 (0.4754) data time 0.0008 (0.0072) model time 0.4616 (0.4717) loss 3.5546 (3.0961) grad_norm 1.3316 (inf) loss_scale 1024.0000 (1096.1127) mem 16715MB [2024-08-10 10:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][80/625] eta 0:04:18 lr 0.000916 wd 0.0500 time 0.4610 (0.4737) data time 0.0011 (0.0065) model time 0.4600 (0.4682) loss 3.1491 (3.0789) grad_norm 1.4293 (inf) loss_scale 1024.0000 (1087.2099) mem 16715MB [2024-08-10 10:44:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][90/625] eta 0:04:15 lr 0.000916 wd 0.0500 time 0.4562 (0.4768) data time 0.0008 (0.0059) model time 0.4553 (0.4763) loss 2.8379 (3.0912) grad_norm 1.3904 (inf) loss_scale 1024.0000 (1080.2637) mem 16715MB [2024-08-10 10:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][100/625] eta 0:04:09 lr 0.000915 wd 0.0500 time 0.4601 (0.4754) data time 0.0010 (0.0054) model time 0.4590 (0.4733) loss 3.4786 (3.0924) grad_norm 0.9006 (inf) loss_scale 1024.0000 (1074.6931) mem 16715MB [2024-08-10 10:44:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][110/625] eta 0:04:04 lr 0.000915 wd 0.0500 time 0.4651 (0.4743) data time 0.0008 (0.0050) model time 0.4642 (0.4715) loss 2.8858 (3.0846) grad_norm 1.0974 (inf) loss_scale 1024.0000 (1070.1261) mem 16715MB [2024-08-10 10:44:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][120/625] eta 0:03:59 lr 0.000915 wd 0.0500 time 0.4656 (0.4750) data time 0.0011 (0.0047) model time 0.4645 (0.4730) loss 3.6308 (3.0905) grad_norm 1.5794 (inf) loss_scale 1024.0000 (1066.3140) mem 16715MB [2024-08-10 10:44:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][130/625] eta 0:03:54 lr 0.000915 wd 0.0500 time 0.4598 (0.4742) data time 
0.0008 (0.0044) model time 0.4590 (0.4717) loss 3.3506 (3.0692) grad_norm 1.3934 (inf) loss_scale 1024.0000 (1063.0840) mem 16715MB [2024-08-10 10:44:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][140/625] eta 0:03:49 lr 0.000915 wd 0.0500 time 0.4607 (0.4734) data time 0.0013 (0.0042) model time 0.4594 (0.4705) loss 3.2439 (3.0596) grad_norm 1.6439 (inf) loss_scale 1024.0000 (1060.3121) mem 16715MB [2024-08-10 10:44:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][150/625] eta 0:03:44 lr 0.000915 wd 0.0500 time 0.4614 (0.4727) data time 0.0011 (0.0040) model time 0.4603 (0.4698) loss 3.7123 (3.0679) grad_norm 1.3450 (inf) loss_scale 1024.0000 (1057.9073) mem 16715MB [2024-08-10 10:44:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][160/625] eta 0:03:39 lr 0.000915 wd 0.0500 time 0.4616 (0.4722) data time 0.0008 (0.0038) model time 0.4608 (0.4691) loss 3.4595 (3.0725) grad_norm 1.1198 (inf) loss_scale 1024.0000 (1055.8012) mem 16715MB [2024-08-10 10:44:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][170/625] eta 0:03:34 lr 0.000915 wd 0.0500 time 0.4798 (0.4718) data time 0.0008 (0.0036) model time 0.4790 (0.4688) loss 2.7800 (3.0607) grad_norm 1.4893 (inf) loss_scale 1024.0000 (1053.9415) mem 16715MB [2024-08-10 10:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][180/625] eta 0:03:29 lr 0.000915 wd 0.0500 time 0.4624 (0.4714) data time 0.0008 (0.0035) model time 0.4616 (0.4684) loss 3.4804 (3.0541) grad_norm 2.0962 (inf) loss_scale 1024.0000 (1052.2873) mem 16715MB [2024-08-10 10:44:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][190/625] eta 0:03:24 lr 0.000915 wd 0.0500 time 0.4649 (0.4712) data time 0.0010 (0.0034) model time 0.4639 (0.4682) loss 3.3662 (3.0559) grad_norm 1.7333 (inf) loss_scale 1024.0000 (1050.8063) mem 16715MB [2024-08-10 10:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][200/625] 
eta 0:03:20 lr 0.000915 wd 0.0500 time 0.4621 (0.4709) data time 0.0010 (0.0033) model time 0.4610 (0.4678) loss 3.4838 (3.0645) grad_norm 2.1522 (inf) loss_scale 1024.0000 (1049.4726) mem 16715MB [2024-08-10 10:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][210/625] eta 0:03:15 lr 0.000914 wd 0.0500 time 0.4578 (0.4711) data time 0.0008 (0.0032) model time 0.4570 (0.4683) loss 2.2230 (3.0628) grad_norm 2.0692 (inf) loss_scale 1024.0000 (1048.2654) mem 16715MB [2024-08-10 10:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][220/625] eta 0:03:11 lr 0.000914 wd 0.0500 time 0.4730 (0.4716) data time 0.0008 (0.0031) model time 0.4722 (0.4691) loss 3.0098 (3.0482) grad_norm 1.3624 (inf) loss_scale 1024.0000 (1047.1674) mem 16715MB [2024-08-10 10:45:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][230/625] eta 0:03:06 lr 0.000914 wd 0.0500 time 0.4640 (0.4713) data time 0.0009 (0.0030) model time 0.4631 (0.4688) loss 2.1853 (3.0473) grad_norm 1.3860 (inf) loss_scale 1024.0000 (1046.1645) mem 16715MB [2024-08-10 10:45:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][240/625] eta 0:03:01 lr 0.000914 wd 0.0500 time 0.4538 (0.4709) data time 0.0010 (0.0029) model time 0.4528 (0.4683) loss 3.0736 (3.0500) grad_norm 1.4816 (inf) loss_scale 1024.0000 (1045.2448) mem 16715MB [2024-08-10 10:45:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][250/625] eta 0:02:56 lr 0.000914 wd 0.0500 time 0.4652 (0.4708) data time 0.0012 (0.0028) model time 0.4640 (0.4682) loss 2.6847 (3.0470) grad_norm 1.3705 (inf) loss_scale 1024.0000 (1044.3984) mem 16715MB [2024-08-10 10:45:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][260/625] eta 0:02:51 lr 0.000914 wd 0.0500 time 0.4643 (0.4706) data time 0.0007 (0.0028) model time 0.4635 (0.4681) loss 2.4755 (3.0513) grad_norm 1.1958 (inf) loss_scale 1024.0000 (1043.6169) mem 16715MB [2024-08-10 10:45:35 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][270/625] eta 0:02:47 lr 0.000914 wd 0.0500 time 0.4587 (0.4705) data time 0.0009 (0.0027) model time 0.4578 (0.4680) loss 2.0800 (3.0471) grad_norm 2.2875 (inf) loss_scale 1024.0000 (1042.8930) mem 16715MB [2024-08-10 10:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][280/625] eta 0:02:42 lr 0.000914 wd 0.0500 time 0.4645 (0.4704) data time 0.0010 (0.0027) model time 0.4635 (0.4680) loss 3.2188 (3.0460) grad_norm 1.9048 (inf) loss_scale 1024.0000 (1042.2206) mem 16715MB [2024-08-10 10:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][290/625] eta 0:02:37 lr 0.000914 wd 0.0500 time 0.4617 (0.4701) data time 0.0008 (0.0026) model time 0.4609 (0.4677) loss 2.5335 (3.0448) grad_norm 1.3775 (inf) loss_scale 1024.0000 (1041.5945) mem 16715MB [2024-08-10 10:45:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][300/625] eta 0:02:32 lr 0.000914 wd 0.0500 time 0.4626 (0.4700) data time 0.0008 (0.0026) model time 0.4618 (0.4675) loss 3.5748 (3.0471) grad_norm 1.5542 (inf) loss_scale 1024.0000 (1041.0100) mem 16715MB [2024-08-10 10:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][310/625] eta 0:02:27 lr 0.000914 wd 0.0500 time 0.4607 (0.4697) data time 0.0010 (0.0025) model time 0.4597 (0.4673) loss 3.0066 (3.0435) grad_norm 1.3270 (inf) loss_scale 1024.0000 (1040.4630) mem 16715MB [2024-08-10 10:45:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][320/625] eta 0:02:23 lr 0.000913 wd 0.0500 time 0.4655 (0.4696) data time 0.0010 (0.0025) model time 0.4644 (0.4672) loss 2.8719 (3.0378) grad_norm 1.7213 (inf) loss_scale 1024.0000 (1039.9502) mem 16715MB [2024-08-10 10:46:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][330/625] eta 0:02:18 lr 0.000913 wd 0.0500 time 0.4625 (0.4695) data time 0.0010 (0.0024) model time 0.4615 (0.4672) loss 3.1159 (3.0402) grad_norm 6.1078 (inf) 
loss_scale 1024.0000 (1039.4683) mem 16715MB [2024-08-10 10:46:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][340/625] eta 0:02:13 lr 0.000913 wd 0.0500 time 0.4689 (0.4694) data time 0.0010 (0.0024) model time 0.4679 (0.4670) loss 3.0929 (3.0449) grad_norm 1.1262 (inf) loss_scale 1024.0000 (1039.0147) mem 16715MB [2024-08-10 10:46:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][350/625] eta 0:02:09 lr 0.000913 wd 0.0500 time 0.4642 (0.4693) data time 0.0010 (0.0024) model time 0.4632 (0.4669) loss 3.3270 (3.0560) grad_norm 1.1779 (inf) loss_scale 1024.0000 (1038.5869) mem 16715MB [2024-08-10 10:46:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][360/625] eta 0:02:04 lr 0.000913 wd 0.0500 time 0.4624 (0.4695) data time 0.0011 (0.0023) model time 0.4614 (0.4673) loss 3.0631 (3.0552) grad_norm 1.5681 (inf) loss_scale 1024.0000 (1038.1828) mem 16715MB [2024-08-10 10:46:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][370/625] eta 0:01:59 lr 0.000913 wd 0.0500 time 0.4637 (0.4694) data time 0.0007 (0.0023) model time 0.4630 (0.4672) loss 4.2613 (3.0583) grad_norm 1.8095 (inf) loss_scale 1024.0000 (1037.8005) mem 16715MB [2024-08-10 10:46:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][380/625] eta 0:01:54 lr 0.000913 wd 0.0500 time 0.4616 (0.4693) data time 0.0008 (0.0023) model time 0.4609 (0.4670) loss 2.4184 (3.0551) grad_norm 1.3153 (inf) loss_scale 1024.0000 (1037.4383) mem 16715MB [2024-08-10 10:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][390/625] eta 0:01:50 lr 0.000913 wd 0.0500 time 0.4609 (0.4691) data time 0.0008 (0.0022) model time 0.4601 (0.4669) loss 3.7443 (3.0615) grad_norm 2.0270 (inf) loss_scale 1024.0000 (1037.0946) mem 16715MB [2024-08-10 10:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][400/625] eta 0:01:45 lr 0.000913 wd 0.0500 time 0.4664 (0.4691) data time 0.0008 (0.0022) model 
time 0.4656 (0.4669) loss 3.0635 (3.0668) grad_norm 2.3910 (inf) loss_scale 1024.0000 (1036.7681) mem 16715MB [2024-08-10 10:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][410/625] eta 0:01:40 lr 0.000913 wd 0.0500 time 0.4671 (0.4694) data time 0.0008 (0.0022) model time 0.4662 (0.4673) loss 2.4816 (3.0724) grad_norm 1.6799 (inf) loss_scale 1024.0000 (1036.4574) mem 16715MB [2024-08-10 10:46:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][420/625] eta 0:01:36 lr 0.000913 wd 0.0500 time 0.4660 (0.4693) data time 0.0009 (0.0021) model time 0.4651 (0.4672) loss 3.3463 (3.0704) grad_norm 1.2958 (inf) loss_scale 1024.0000 (1036.1615) mem 16715MB [2024-08-10 10:46:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][430/625] eta 0:01:31 lr 0.000912 wd 0.0500 time 0.4641 (0.4701) data time 0.0008 (0.0021) model time 0.4634 (0.4681) loss 3.5668 (3.0731) grad_norm 1.1311 (inf) loss_scale 1024.0000 (1035.8794) mem 16715MB [2024-08-10 10:46:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][440/625] eta 0:01:27 lr 0.000912 wd 0.0500 time 0.4600 (0.4705) data time 0.0009 (0.0021) model time 0.4590 (0.4686) loss 3.6606 (3.0770) grad_norm 1.4986 (inf) loss_scale 1024.0000 (1035.6100) mem 16715MB [2024-08-10 10:46:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][450/625] eta 0:01:22 lr 0.000912 wd 0.0500 time 0.4634 (0.4703) data time 0.0009 (0.0021) model time 0.4625 (0.4684) loss 2.0827 (3.0775) grad_norm 1.2689 (inf) loss_scale 1024.0000 (1035.3525) mem 16715MB [2024-08-10 10:47:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][460/625] eta 0:01:17 lr 0.000912 wd 0.0500 time 0.4625 (0.4705) data time 0.0010 (0.0020) model time 0.4614 (0.4687) loss 2.3409 (3.0749) grad_norm 1.3379 (inf) loss_scale 1024.0000 (1035.1063) mem 16715MB [2024-08-10 10:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][470/625] eta 0:01:12 lr 
0.000912 wd 0.0500 time 0.4613 (0.4704) data time 0.0010 (0.0020) model time 0.4603 (0.4686) loss 2.8478 (3.0759) grad_norm 1.6434 (inf) loss_scale 1024.0000 (1034.8705) mem 16715MB [2024-08-10 10:47:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][480/625] eta 0:01:08 lr 0.000912 wd 0.0500 time 0.4621 (0.4703) data time 0.0011 (0.0020) model time 0.4610 (0.4685) loss 3.2801 (3.0803) grad_norm 1.0307 (inf) loss_scale 1024.0000 (1034.6445) mem 16715MB [2024-08-10 10:47:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][490/625] eta 0:01:03 lr 0.000912 wd 0.0500 time 0.4671 (0.4704) data time 0.0008 (0.0020) model time 0.4663 (0.4685) loss 2.2300 (3.0746) grad_norm 1.3027 (inf) loss_scale 1024.0000 (1034.4277) mem 16715MB [2024-08-10 10:47:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][500/625] eta 0:00:58 lr 0.000912 wd 0.0500 time 0.4603 (0.4702) data time 0.0008 (0.0020) model time 0.4595 (0.4685) loss 2.3267 (3.0741) grad_norm 1.1691 (inf) loss_scale 1024.0000 (1034.2196) mem 16715MB [2024-08-10 10:47:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][510/625] eta 0:00:54 lr 0.000912 wd 0.0500 time 0.4643 (0.4701) data time 0.0009 (0.0020) model time 0.4634 (0.4683) loss 2.1809 (3.0647) grad_norm 1.5843 (inf) loss_scale 1024.0000 (1034.0196) mem 16715MB [2024-08-10 10:47:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][520/625] eta 0:00:49 lr 0.000912 wd 0.0500 time 0.4603 (0.4700) data time 0.0010 (0.0019) model time 0.4593 (0.4682) loss 3.2948 (3.0637) grad_norm 1.8493 (inf) loss_scale 1024.0000 (1033.8273) mem 16715MB [2024-08-10 10:47:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][530/625] eta 0:00:44 lr 0.000912 wd 0.0500 time 0.4616 (0.4699) data time 0.0011 (0.0019) model time 0.4605 (0.4681) loss 3.0903 (3.0624) grad_norm 2.2548 (inf) loss_scale 1024.0000 (1033.6422) mem 16715MB [2024-08-10 10:47:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [111/300][540/625] eta 0:00:39 lr 0.000911 wd 0.0500 time 0.4670 (0.4698) data time 0.0010 (0.0019) model time 0.4660 (0.4680) loss 3.4578 (3.0669) grad_norm 1.6412 (inf) loss_scale 1024.0000 (1033.4640) mem 16715MB [2024-08-10 10:47:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][550/625] eta 0:00:35 lr 0.000911 wd 0.0500 time 0.4605 (0.4697) data time 0.0011 (0.0019) model time 0.4594 (0.4679) loss 3.5304 (3.0709) grad_norm 2.8796 (inf) loss_scale 1024.0000 (1033.2922) mem 16715MB [2024-08-10 10:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][560/625] eta 0:00:30 lr 0.000911 wd 0.0500 time 0.4633 (0.4696) data time 0.0011 (0.0019) model time 0.4623 (0.4678) loss 3.2506 (3.0751) grad_norm 1.4878 (inf) loss_scale 1024.0000 (1033.1266) mem 16715MB [2024-08-10 10:47:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][570/625] eta 0:00:25 lr 0.000911 wd 0.0500 time 0.4618 (0.4695) data time 0.0011 (0.0019) model time 0.4607 (0.4677) loss 2.5711 (3.0769) grad_norm 1.6340 (inf) loss_scale 1024.0000 (1032.9667) mem 16715MB [2024-08-10 10:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][580/625] eta 0:00:21 lr 0.000911 wd 0.0500 time 0.4649 (0.4694) data time 0.0011 (0.0019) model time 0.4638 (0.4676) loss 3.0423 (3.0740) grad_norm 1.9741 (inf) loss_scale 1024.0000 (1032.8124) mem 16715MB [2024-08-10 10:48:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][590/625] eta 0:00:16 lr 0.000911 wd 0.0500 time 0.4561 (0.4696) data time 0.0011 (0.0018) model time 0.4550 (0.4679) loss 2.7983 (3.0761) grad_norm 1.5918 (inf) loss_scale 1024.0000 (1032.6633) mem 16715MB [2024-08-10 10:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][600/625] eta 0:00:11 lr 0.000911 wd 0.0500 time 0.4614 (0.4696) data time 0.0008 (0.0018) model time 0.4606 (0.4679) loss 2.8039 (3.0796) grad_norm 1.0463 (inf) loss_scale 
1024.0000 (1032.5191) mem 16715MB [2024-08-10 10:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][610/625] eta 0:00:07 lr 0.000911 wd 0.0500 time 0.4598 (0.4695) data time 0.0009 (0.0018) model time 0.4589 (0.4678) loss 3.0977 (3.0752) grad_norm 1.3468 (inf) loss_scale 1024.0000 (1032.3797) mem 16715MB [2024-08-10 10:48:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [111/300][620/625] eta 0:00:02 lr 0.000911 wd 0.0500 time 0.4635 (0.4698) data time 0.0007 (0.0018) model time 0.4627 (0.4681) loss 2.6926 (3.0751) grad_norm 1.2709 (inf) loss_scale 1024.0000 (1032.2448) mem 16715MB [2024-08-10 10:48:21 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 111 training takes 0:04:53 [2024-08-10 10:48:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:48:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.5879 (0.5879) Acc@1 86.768 (86.768) Acc@5 98.242 (98.242) Mem 16715MB [2024-08-10 10:48:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8501 (0.6895) Acc@1 79.883 (84.610) Acc@5 95.557 (97.190) Mem 16715MB [2024-08-10 10:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0254 (0.8187) Acc@1 75.098 (81.180) Acc@5 93.408 (95.754) Mem 16715MB [2024-08-10 10:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.948 Acc@5 95.739 [2024-08-10 10:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.9% [2024-08-10 10:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.912 (0.912) Loss 0.4963 (0.4963) Acc@1 88.867 (88.867) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 10:48:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.199) Loss 0.7959 (0.6194) Acc@1 80.225 (86.151) Acc@5 96.094 (97.718) Mem 16715MB [2024-08-10 10:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 0.9102 (0.7303) Acc@1 77.441 (83.161) Acc@5 95.264 (96.529) Mem 16715MB [2024-08-10 10:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.853 Acc@5 96.549 [2024-08-10 10:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-10 10:48:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][0/625] eta 0:14:43 lr 0.000911 wd 0.0500 time 1.4132 (1.4132) data time 0.7200 (0.7200) model time 0.0000 (0.0000) loss 2.0187 (2.0187) grad_norm 1.3359 (1.3359) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][10/625] eta 0:05:38 lr 0.000911 wd 0.0500 time 0.4616 (0.5503) data 
time 0.0010 (0.0665) model time 0.0000 (0.0000) loss 3.5848 (2.9885) grad_norm 1.2461 (1.6315) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][20/625] eta 0:05:07 lr 0.000910 wd 0.0500 time 0.4573 (0.5085) data time 0.0011 (0.0354) model time 0.0000 (0.0000) loss 3.5176 (3.0883) grad_norm 1.1219 (1.4781) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][30/625] eta 0:04:56 lr 0.000910 wd 0.0500 time 0.4602 (0.4989) data time 0.0008 (0.0243) model time 0.0000 (0.0000) loss 2.4146 (3.0519) grad_norm 1.1916 (1.4205) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][40/625] eta 0:04:46 lr 0.000910 wd 0.0500 time 0.4614 (0.4897) data time 0.0015 (0.0186) model time 0.0000 (0.0000) loss 3.3296 (3.0339) grad_norm 1.4119 (1.4603) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][50/625] eta 0:04:38 lr 0.000910 wd 0.0500 time 0.4667 (0.4846) data time 0.0008 (0.0152) model time 0.0000 (0.0000) loss 2.5806 (3.0514) grad_norm 1.2241 (1.4483) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][60/625] eta 0:04:31 lr 0.000910 wd 0.0500 time 0.4720 (0.4814) data time 0.0008 (0.0129) model time 0.4712 (0.4644) loss 3.6238 (3.0586) grad_norm 1.4529 (1.4568) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][70/625] eta 0:04:25 lr 0.000910 wd 0.0500 time 0.4661 (0.4792) data time 0.0010 (0.0112) model time 0.4651 (0.4644) loss 2.8069 (3.0130) grad_norm 1.4478 (1.4841) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [112/300][80/625] eta 0:04:20 lr 0.000910 wd 0.0500 time 0.4645 (0.4775) data time 0.0010 (0.0100) model time 0.4635 (0.4646) loss 3.3373 (3.0116) grad_norm 1.3923 (1.4991) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][90/625] eta 0:04:14 lr 0.000910 wd 0.0500 time 0.4660 (0.4761) data time 0.0010 (0.0090) model time 0.4650 (0.4642) loss 3.2534 (2.9920) grad_norm 1.5138 (1.5266) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][100/625] eta 0:04:09 lr 0.000910 wd 0.0500 time 0.4674 (0.4748) data time 0.0010 (0.0082) model time 0.4664 (0.4639) loss 2.7655 (3.0188) grad_norm 2.4011 (1.5944) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][110/625] eta 0:04:03 lr 0.000910 wd 0.0500 time 0.4602 (0.4737) data time 0.0008 (0.0075) model time 0.4594 (0.4635) loss 2.7498 (3.0093) grad_norm 1.5459 (1.6030) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][120/625] eta 0:03:58 lr 0.000910 wd 0.0500 time 0.4678 (0.4729) data time 0.0008 (0.0070) model time 0.4670 (0.4634) loss 3.0688 (2.9992) grad_norm 1.5811 (1.5886) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][130/625] eta 0:03:54 lr 0.000909 wd 0.0500 time 0.4685 (0.4741) data time 0.0009 (0.0065) model time 0.4676 (0.4665) loss 2.7287 (2.9870) grad_norm 1.4250 (1.5902) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][140/625] eta 0:03:50 lr 0.000909 wd 0.0500 time 0.4618 (0.4761) data time 0.0008 (0.0061) model time 0.4610 (0.4704) loss 3.7186 (2.9977) grad_norm 1.4172 (1.5727) loss_scale 1024.0000 (1024.0000) 
mem 16715MB [2024-08-10 10:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][150/625] eta 0:03:45 lr 0.000909 wd 0.0500 time 0.4641 (0.4754) data time 0.0010 (0.0058) model time 0.4631 (0.4696) loss 3.2386 (3.0043) grad_norm 1.9994 (1.5724) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][160/625] eta 0:03:40 lr 0.000909 wd 0.0500 time 0.4619 (0.4746) data time 0.0010 (0.0055) model time 0.4609 (0.4690) loss 2.9018 (3.0140) grad_norm 1.6559 (1.5878) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][170/625] eta 0:03:35 lr 0.000909 wd 0.0500 time 0.4635 (0.4745) data time 0.0009 (0.0053) model time 0.4625 (0.4692) loss 3.2523 (2.9962) grad_norm 1.1120 (1.5999) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][180/625] eta 0:03:30 lr 0.000909 wd 0.0500 time 0.4629 (0.4739) data time 0.0007 (0.0050) model time 0.4622 (0.4686) loss 2.3874 (2.9919) grad_norm 2.0955 (1.6160) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][190/625] eta 0:03:25 lr 0.000909 wd 0.0500 time 0.4641 (0.4733) data time 0.0008 (0.0048) model time 0.4634 (0.4682) loss 3.6058 (2.9865) grad_norm 1.2843 (1.6189) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][200/625] eta 0:03:20 lr 0.000909 wd 0.0500 time 0.4645 (0.4728) data time 0.0010 (0.0046) model time 0.4635 (0.4678) loss 3.2818 (2.9934) grad_norm 1.5904 (1.6133) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][210/625] eta 0:03:16 lr 0.000909 wd 0.0500 time 0.4640 (0.4725) data time 0.0010 (0.0045) model time 0.4630 
(0.4676) loss 3.2742 (3.0028) grad_norm 1.5164 (1.6158) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][220/625] eta 0:03:11 lr 0.000909 wd 0.0500 time 0.4692 (0.4722) data time 0.0010 (0.0043) model time 0.4682 (0.4675) loss 3.2517 (3.0100) grad_norm 2.2617 (1.6471) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][230/625] eta 0:03:06 lr 0.000909 wd 0.0500 time 0.4632 (0.4719) data time 0.0011 (0.0042) model time 0.4621 (0.4673) loss 2.6341 (3.0063) grad_norm 1.7667 (1.6462) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][240/625] eta 0:03:01 lr 0.000908 wd 0.0500 time 0.4671 (0.4717) data time 0.0010 (0.0040) model time 0.4660 (0.4672) loss 3.1330 (3.0079) grad_norm 1.6947 (1.6370) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][250/625] eta 0:02:56 lr 0.000908 wd 0.0500 time 0.4686 (0.4714) data time 0.0008 (0.0039) model time 0.4678 (0.4671) loss 3.9459 (3.0168) grad_norm 1.3512 (1.6373) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][260/625] eta 0:02:52 lr 0.000908 wd 0.0500 time 0.4664 (0.4712) data time 0.0007 (0.0038) model time 0.4657 (0.4670) loss 3.0827 (3.0211) grad_norm 1.0871 (1.6319) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][270/625] eta 0:02:47 lr 0.000908 wd 0.0500 time 0.4663 (0.4710) data time 0.0010 (0.0037) model time 0.4653 (0.4668) loss 2.7986 (3.0209) grad_norm 2.1196 (1.6277) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][280/625] eta 0:02:42 
lr 0.000908 wd 0.0500 time 0.4704 (0.4709) data time 0.0010 (0.0036) model time 0.4694 (0.4668) loss 3.1040 (3.0199) grad_norm 1.3931 (1.6264) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][290/625] eta 0:02:37 lr 0.000908 wd 0.0500 time 0.4703 (0.4710) data time 0.0010 (0.0036) model time 0.4693 (0.4670) loss 3.4891 (3.0300) grad_norm 1.4506 (1.6162) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][300/625] eta 0:02:33 lr 0.000908 wd 0.0500 time 0.4682 (0.4709) data time 0.0007 (0.0035) model time 0.4675 (0.4670) loss 2.9279 (3.0369) grad_norm 0.9926 (1.6123) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][310/625] eta 0:02:28 lr 0.000908 wd 0.0500 time 0.4733 (0.4708) data time 0.0010 (0.0034) model time 0.4723 (0.4670) loss 3.2223 (3.0373) grad_norm 1.6472 (1.6155) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][320/625] eta 0:02:23 lr 0.000908 wd 0.0500 time 0.4678 (0.4706) data time 0.0008 (0.0033) model time 0.4670 (0.4669) loss 2.3666 (3.0367) grad_norm 1.7628 (1.6200) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][330/625] eta 0:02:18 lr 0.000908 wd 0.0500 time 0.4614 (0.4710) data time 0.0009 (0.0033) model time 0.4605 (0.4675) loss 3.2830 (3.0460) grad_norm 1.4482 (1.6155) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][340/625] eta 0:02:14 lr 0.000908 wd 0.0500 time 0.4633 (0.4708) data time 0.0007 (0.0032) model time 0.4626 (0.4673) loss 2.5272 (3.0471) grad_norm 1.8995 (1.6076) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:15 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][350/625] eta 0:02:09 lr 0.000907 wd 0.0500 time 0.4681 (0.4708) data time 0.0010 (0.0032) model time 0.4671 (0.4673) loss 3.0910 (3.0435) grad_norm 1.6123 (1.6092) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][360/625] eta 0:02:04 lr 0.000907 wd 0.0500 time 0.4622 (0.4707) data time 0.0009 (0.0031) model time 0.4614 (0.4673) loss 2.4526 (3.0461) grad_norm 2.2975 (1.6112) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][370/625] eta 0:02:00 lr 0.000907 wd 0.0500 time 0.4687 (0.4711) data time 0.0010 (0.0031) model time 0.4677 (0.4678) loss 3.4458 (3.0405) grad_norm 1.1261 (1.6053) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][380/625] eta 0:01:55 lr 0.000907 wd 0.0500 time 0.4645 (0.4709) data time 0.0007 (0.0030) model time 0.4638 (0.4677) loss 3.6819 (3.0458) grad_norm 1.5963 (1.6033) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][390/625] eta 0:01:50 lr 0.000907 wd 0.0500 time 0.4613 (0.4712) data time 0.0007 (0.0030) model time 0.4606 (0.4681) loss 3.8262 (3.0540) grad_norm 1.4329 (1.6049) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][400/625] eta 0:01:45 lr 0.000907 wd 0.0500 time 0.4583 (0.4709) data time 0.0007 (0.0029) model time 0.4576 (0.4678) loss 3.5620 (3.0586) grad_norm 1.6590 (1.6130) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][410/625] eta 0:01:41 lr 0.000907 wd 0.0500 time 0.4635 (0.4707) data time 0.0010 (0.0029) model time 0.4625 (0.4676) loss 3.3869 (3.0557) 
grad_norm 2.3305 (1.6154) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][420/625] eta 0:01:36 lr 0.000907 wd 0.0500 time 0.4666 (0.4706) data time 0.0008 (0.0028) model time 0.4658 (0.4675) loss 1.9776 (3.0588) grad_norm 1.8596 (1.6138) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][430/625] eta 0:01:31 lr 0.000907 wd 0.0500 time 0.4824 (0.4706) data time 0.0008 (0.0028) model time 0.4816 (0.4676) loss 2.6285 (3.0555) grad_norm 1.3985 (1.6089) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:51:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][440/625] eta 0:01:27 lr 0.000907 wd 0.0500 time 0.4631 (0.4705) data time 0.0009 (0.0028) model time 0.4622 (0.4675) loss 2.9739 (3.0604) grad_norm 1.5078 (1.6070) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][450/625] eta 0:01:22 lr 0.000907 wd 0.0500 time 0.4610 (0.4703) data time 0.0009 (0.0027) model time 0.4601 (0.4674) loss 3.5606 (3.0665) grad_norm 1.7619 (1.6105) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][460/625] eta 0:01:17 lr 0.000906 wd 0.0500 time 0.4648 (0.4703) data time 0.0008 (0.0027) model time 0.4640 (0.4675) loss 3.6439 (3.0631) grad_norm 1.3700 (1.6072) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][470/625] eta 0:01:12 lr 0.000906 wd 0.0500 time 0.4636 (0.4707) data time 0.0010 (0.0026) model time 0.4626 (0.4679) loss 3.1769 (3.0678) grad_norm 1.8896 (1.6019) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][480/625] eta 0:01:08 lr 0.000906 wd 0.0500 time 
0.4617 (0.4710) data time 0.0010 (0.0026) model time 0.4607 (0.4683) loss 2.3537 (3.0671) grad_norm 2.9766 (1.6132) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][490/625] eta 0:01:03 lr 0.000906 wd 0.0500 time 0.4657 (0.4709) data time 0.0010 (0.0026) model time 0.4648 (0.4682) loss 3.3318 (3.0686) grad_norm 1.5389 (1.6221) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][500/625] eta 0:00:58 lr 0.000906 wd 0.0500 time 0.4619 (0.4708) data time 0.0007 (0.0025) model time 0.4612 (0.4681) loss 3.6258 (3.0729) grad_norm 1.4066 (1.6171) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][510/625] eta 0:00:54 lr 0.000906 wd 0.0500 time 0.4676 (0.4708) data time 0.0010 (0.0025) model time 0.4666 (0.4681) loss 3.7334 (3.0715) grad_norm 1.1080 (1.6134) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][520/625] eta 0:00:49 lr 0.000906 wd 0.0500 time 0.4624 (0.4709) data time 0.0008 (0.0025) model time 0.4616 (0.4682) loss 2.3432 (3.0704) grad_norm 1.9552 (1.6100) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][530/625] eta 0:00:44 lr 0.000906 wd 0.0500 time 0.4686 (0.4711) data time 0.0009 (0.0025) model time 0.4677 (0.4685) loss 2.5532 (3.0687) grad_norm 1.3023 (1.6083) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][540/625] eta 0:00:40 lr 0.000906 wd 0.0500 time 0.4593 (0.4709) data time 0.0008 (0.0025) model time 0.4585 (0.4684) loss 3.0822 (3.0683) grad_norm 1.1081 (1.6085) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:49 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [112/300][550/625] eta 0:00:35 lr 0.000906 wd 0.0500 time 0.4620 (0.4708) data time 0.0010 (0.0024) model time 0.4611 (0.4682) loss 3.4191 (3.0729) grad_norm 1.9336 (1.6126) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][560/625] eta 0:00:30 lr 0.000906 wd 0.0500 time 0.4627 (0.4708) data time 0.0010 (0.0024) model time 0.4617 (0.4682) loss 3.3986 (3.0763) grad_norm 1.7111 (1.6148) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][570/625] eta 0:00:25 lr 0.000905 wd 0.0500 time 0.4689 (0.4707) data time 0.0007 (0.0024) model time 0.4682 (0.4682) loss 3.5201 (3.0754) grad_norm 1.0788 (1.6237) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][580/625] eta 0:00:21 lr 0.000905 wd 0.0500 time 0.4633 (0.4706) data time 0.0007 (0.0024) model time 0.4626 (0.4681) loss 2.2901 (3.0816) grad_norm 1.5783 (1.6227) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][590/625] eta 0:00:16 lr 0.000905 wd 0.0500 time 0.4631 (0.4705) data time 0.0007 (0.0023) model time 0.4623 (0.4680) loss 3.8778 (3.0823) grad_norm 1.2096 (1.6218) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][600/625] eta 0:00:11 lr 0.000905 wd 0.0500 time 0.4632 (0.4703) data time 0.0010 (0.0023) model time 0.4622 (0.4679) loss 3.1466 (3.0800) grad_norm 1.7834 (1.6248) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][610/625] eta 0:00:07 lr 0.000905 wd 0.0500 time 0.4647 (0.4702) data time 0.0007 (0.0023) model time 0.4640 (0.4678) loss 2.9689 (3.0732) grad_norm 1.1246 
(1.6227) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [112/300][620/625] eta 0:00:02 lr 0.000905 wd 0.0500 time 0.4589 (0.4701) data time 0.0007 (0.0023) model time 0.4582 (0.4677) loss 3.4611 (3.0759) grad_norm 1.3788 (1.6184) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 112 training takes 0:04:53 [2024-08-10 10:53:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:53:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 10:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5820 (0.5820) Acc@1 88.037 (88.037) Acc@5 98.242 (98.242) Mem 16715MB [2024-08-10 10:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.9673 (0.7252) Acc@1 78.369 (84.446) Acc@5 94.678 (97.159) Mem 16715MB [2024-08-10 10:53:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0430 (0.8537) Acc@1 74.609 (81.259) Acc@5 94.043 (95.736) Mem 16715MB [2024-08-10 10:53:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.956 Acc@5 95.713 [2024-08-10 10:53:29 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-08-10 10:53:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.891 (0.891) Loss 0.4951 (0.4951) Acc@1 88.818 (88.818) Acc@5 98.438 (98.438) Mem 16715MB [2024-08-10 10:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.198) Loss 0.7969 (0.6191) Acc@1 80.420 (86.146) Acc@5 96.094 (97.701) Mem 16715MB [2024-08-10 10:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 
0.9097 (0.7296) Acc@1 77.588 (83.143) Acc@5 95.312 (96.545) Mem 16715MB [2024-08-10 10:53:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.859 Acc@5 96.559 [2024-08-10 10:53:33 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-10 10:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][0/625] eta 0:13:49 lr 0.000905 wd 0.0500 time 1.3266 (1.3266) data time 0.7686 (0.7686) model time 0.0000 (0.0000) loss 3.2271 (3.2271) grad_norm 1.0417 (1.0417) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][10/625] eta 0:05:35 lr 0.000905 wd 0.0500 time 0.4692 (0.5462) data time 0.0008 (0.0709) model time 0.0000 (0.0000) loss 3.4315 (3.2659) grad_norm 2.2236 (1.4924) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][20/625] eta 0:05:08 lr 0.000905 wd 0.0500 time 0.4699 (0.5097) data time 0.0008 (0.0377) model time 0.0000 (0.0000) loss 2.7305 (3.2696) grad_norm 1.6254 (1.5625) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][30/625] eta 0:04:55 lr 0.000905 wd 0.0500 time 0.4651 (0.4962) data time 0.0010 (0.0258) model time 0.0000 (0.0000) loss 3.0500 (3.2034) grad_norm 1.3184 (1.4959) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][40/625] eta 0:04:45 lr 0.000905 wd 0.0500 time 0.4639 (0.4885) data time 0.0008 (0.0198) model time 0.0000 (0.0000) loss 2.3844 (3.1211) grad_norm 1.5365 (1.4948) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][50/625] eta 0:04:41 lr 0.000904 wd 0.0500 time 0.4668 (0.4888) data time 0.0010 (0.0161) model time 0.0000 (0.0000) loss 
3.4841 (3.1395) grad_norm 1.3420 (1.4787) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][60/625] eta 0:04:35 lr 0.000904 wd 0.0500 time 0.4611 (0.4873) data time 0.0011 (0.0137) model time 0.4600 (0.4783) loss 3.4938 (3.1083) grad_norm 2.6324 (1.5431) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][70/625] eta 0:04:30 lr 0.000904 wd 0.0500 time 0.4743 (0.4867) data time 0.0008 (0.0119) model time 0.4735 (0.4802) loss 2.6432 (3.0631) grad_norm 1.3898 (1.5382) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][80/625] eta 0:04:23 lr 0.000904 wd 0.0500 time 0.4651 (0.4841) data time 0.0008 (0.0105) model time 0.4643 (0.4750) loss 3.9014 (3.0951) grad_norm 1.2551 (1.5325) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][90/625] eta 0:04:17 lr 0.000904 wd 0.0500 time 0.4641 (0.4820) data time 0.0010 (0.0095) model time 0.4631 (0.4722) loss 3.2563 (3.0706) grad_norm 1.5516 (1.5202) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][100/625] eta 0:04:12 lr 0.000904 wd 0.0500 time 0.4919 (0.4808) data time 0.0012 (0.0087) model time 0.4907 (0.4716) loss 2.8358 (3.0833) grad_norm 1.9095 (1.5109) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][110/625] eta 0:04:07 lr 0.000904 wd 0.0500 time 0.4678 (0.4808) data time 0.0010 (0.0080) model time 0.4668 (0.4730) loss 3.0834 (3.0751) grad_norm 1.1993 (1.5088) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][120/625] eta 0:04:02 lr 0.000904 wd 
0.0500 time 0.4687 (0.4801) data time 0.0007 (0.0074) model time 0.4679 (0.4726) loss 4.1617 (3.0932) grad_norm 1.7638 (1.5252) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][130/625] eta 0:03:57 lr 0.000904 wd 0.0500 time 0.4653 (0.4800) data time 0.0007 (0.0069) model time 0.4646 (0.4734) loss 3.3456 (3.1086) grad_norm 1.4996 (1.5177) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][140/625] eta 0:03:52 lr 0.000904 wd 0.0500 time 0.4636 (0.4793) data time 0.0010 (0.0066) model time 0.4626 (0.4728) loss 3.0810 (3.0923) grad_norm 2.3165 (1.5306) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][150/625] eta 0:03:47 lr 0.000904 wd 0.0500 time 0.4632 (0.4787) data time 0.0010 (0.0062) model time 0.4621 (0.4725) loss 2.9466 (3.0898) grad_norm 1.1717 (1.5609) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][160/625] eta 0:03:42 lr 0.000903 wd 0.0500 time 0.4682 (0.4781) data time 0.0010 (0.0059) model time 0.4672 (0.4719) loss 2.1036 (3.0755) grad_norm 1.9981 (1.5507) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][170/625] eta 0:03:37 lr 0.000903 wd 0.0500 time 0.4647 (0.4776) data time 0.0011 (0.0056) model time 0.4636 (0.4717) loss 3.0961 (3.0863) grad_norm 1.3252 (1.5930) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][180/625] eta 0:03:32 lr 0.000903 wd 0.0500 time 0.4620 (0.4769) data time 0.0011 (0.0054) model time 0.4609 (0.4711) loss 3.0313 (3.0863) grad_norm 1.1616 (1.5796) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:04 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][190/625] eta 0:03:27 lr 0.000903 wd 0.0500 time 0.4611 (0.4764) data time 0.0011 (0.0052) model time 0.4600 (0.4707) loss 3.1852 (3.0816) grad_norm 1.9713 (1.5701) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][200/625] eta 0:03:22 lr 0.000903 wd 0.0500 time 0.4681 (0.4760) data time 0.0010 (0.0050) model time 0.4671 (0.4705) loss 3.3293 (3.0852) grad_norm 1.2481 (1.5599) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][210/625] eta 0:03:17 lr 0.000903 wd 0.0500 time 0.4618 (0.4754) data time 0.0010 (0.0048) model time 0.4607 (0.4699) loss 3.2741 (3.0955) grad_norm 1.5862 (1.5533) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][220/625] eta 0:03:12 lr 0.000903 wd 0.0500 time 0.4656 (0.4749) data time 0.0011 (0.0046) model time 0.4645 (0.4696) loss 3.5995 (3.1038) grad_norm 2.2941 (1.5499) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][230/625] eta 0:03:07 lr 0.000903 wd 0.0500 time 0.4670 (0.4745) data time 0.0010 (0.0045) model time 0.4659 (0.4693) loss 2.6758 (3.1121) grad_norm 1.7559 (1.5572) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][240/625] eta 0:03:02 lr 0.000903 wd 0.0500 time 0.4668 (0.4748) data time 0.0011 (0.0043) model time 0.4657 (0.4699) loss 2.8168 (3.1017) grad_norm 1.3436 (1.5676) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][250/625] eta 0:02:57 lr 0.000903 wd 0.0500 time 0.4687 (0.4745) data time 0.0011 (0.0042) model time 0.4677 (0.4697) loss 2.6323 (3.0938) 
grad_norm 1.5874 (1.5662) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][260/625] eta 0:02:53 lr 0.000903 wd 0.0500 time 0.4621 (0.4747) data time 0.0008 (0.0041) model time 0.4613 (0.4701) loss 3.3527 (3.0969) grad_norm 1.8535 (1.5674) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][270/625] eta 0:02:48 lr 0.000902 wd 0.0500 time 0.4627 (0.4743) data time 0.0011 (0.0040) model time 0.4616 (0.4698) loss 2.7603 (3.0951) grad_norm 1.6122 (1.5719) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][280/625] eta 0:02:43 lr 0.000902 wd 0.0500 time 0.4641 (0.4739) data time 0.0010 (0.0039) model time 0.4631 (0.4694) loss 2.9483 (3.1049) grad_norm 1.1680 (1.5757) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][290/625] eta 0:02:38 lr 0.000902 wd 0.0500 time 0.4592 (0.4735) data time 0.0008 (0.0038) model time 0.4584 (0.4691) loss 3.4716 (3.1112) grad_norm 1.5082 (1.5720) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][300/625] eta 0:02:33 lr 0.000902 wd 0.0500 time 0.4743 (0.4733) data time 0.0010 (0.0037) model time 0.4734 (0.4690) loss 3.2091 (3.1087) grad_norm 1.2251 (1.5643) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][310/625] eta 0:02:29 lr 0.000902 wd 0.0500 time 0.4687 (0.4730) data time 0.0009 (0.0036) model time 0.4679 (0.4688) loss 2.4582 (3.1100) grad_norm 1.7233 (1.5624) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][320/625] eta 0:02:24 lr 0.000902 wd 0.0500 time 
0.4649 (0.4728) data time 0.0008 (0.0035) model time 0.4642 (0.4687) loss 2.6935 (3.1069) grad_norm 1.5608 (1.5623) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][330/625] eta 0:02:19 lr 0.000902 wd 0.0500 time 0.4635 (0.4726) data time 0.0008 (0.0034) model time 0.4626 (0.4685) loss 2.5290 (3.1073) grad_norm 1.3959 (1.5567) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][340/625] eta 0:02:14 lr 0.000902 wd 0.0500 time 0.4676 (0.4723) data time 0.0009 (0.0034) model time 0.4666 (0.4683) loss 3.3874 (3.1121) grad_norm 1.3090 (1.5568) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][350/625] eta 0:02:09 lr 0.000902 wd 0.0500 time 0.4625 (0.4725) data time 0.0011 (0.0033) model time 0.4615 (0.4686) loss 3.2519 (3.1134) grad_norm 1.5989 (1.5602) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][360/625] eta 0:02:05 lr 0.000902 wd 0.0500 time 0.4611 (0.4723) data time 0.0008 (0.0032) model time 0.4603 (0.4684) loss 3.6131 (3.1094) grad_norm 1.3960 (1.5568) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][370/625] eta 0:02:00 lr 0.000902 wd 0.0500 time 0.4626 (0.4721) data time 0.0010 (0.0032) model time 0.4616 (0.4683) loss 2.5040 (3.1117) grad_norm 2.0775 (1.5603) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][380/625] eta 0:01:55 lr 0.000901 wd 0.0500 time 0.4641 (0.4719) data time 0.0010 (0.0031) model time 0.4631 (0.4682) loss 2.5893 (3.1121) grad_norm 2.3486 (1.5665) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:37 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [113/300][390/625] eta 0:01:50 lr 0.000901 wd 0.0500 time 0.4653 (0.4723) data time 0.0009 (0.0031) model time 0.4645 (0.4687) loss 3.9164 (3.1160) grad_norm 1.2879 (1.5688) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][400/625] eta 0:01:46 lr 0.000901 wd 0.0500 time 0.4622 (0.4721) data time 0.0010 (0.0030) model time 0.4612 (0.4685) loss 3.5447 (3.1191) grad_norm 1.6230 (1.5727) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][410/625] eta 0:01:41 lr 0.000901 wd 0.0500 time 0.4613 (0.4718) data time 0.0008 (0.0030) model time 0.4605 (0.4683) loss 3.9894 (3.1208) grad_norm 1.7847 (1.5667) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][420/625] eta 0:01:36 lr 0.000901 wd 0.0500 time 0.4676 (0.4717) data time 0.0008 (0.0029) model time 0.4669 (0.4682) loss 1.9000 (3.1165) grad_norm 1.3322 (1.5662) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][430/625] eta 0:01:31 lr 0.000901 wd 0.0500 time 0.4620 (0.4715) data time 0.0010 (0.0029) model time 0.4610 (0.4680) loss 3.4289 (3.1164) grad_norm 1.6165 (1.5721) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][440/625] eta 0:01:27 lr 0.000901 wd 0.0500 time 0.4603 (0.4717) data time 0.0008 (0.0028) model time 0.4595 (0.4683) loss 4.3571 (3.1162) grad_norm 1.9588 (1.5721) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][450/625] eta 0:01:22 lr 0.000901 wd 0.0500 time 0.4612 (0.4715) data time 0.0010 (0.0028) model time 0.4602 (0.4682) loss 2.9641 (3.1150) grad_norm 1.4994 
(1.5740) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][460/625] eta 0:01:17 lr 0.000901 wd 0.0500 time 0.4636 (0.4719) data time 0.0010 (0.0028) model time 0.4626 (0.4687) loss 3.5033 (3.1210) grad_norm 1.1609 (1.5711) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][470/625] eta 0:01:13 lr 0.000901 wd 0.0500 time 0.4658 (0.4717) data time 0.0010 (0.0027) model time 0.4647 (0.4686) loss 3.2629 (3.1199) grad_norm 1.2386 (1.5706) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][480/625] eta 0:01:08 lr 0.000900 wd 0.0500 time 0.4624 (0.4720) data time 0.0010 (0.0027) model time 0.4614 (0.4689) loss 2.9464 (3.1194) grad_norm 1.8773 (1.5736) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][490/625] eta 0:01:03 lr 0.000900 wd 0.0500 time 0.4610 (0.4718) data time 0.0009 (0.0027) model time 0.4600 (0.4687) loss 3.6853 (3.1146) grad_norm 1.5501 (1.5769) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][500/625] eta 0:00:58 lr 0.000900 wd 0.0500 time 0.4621 (0.4719) data time 0.0010 (0.0026) model time 0.4611 (0.4689) loss 2.5600 (3.1117) grad_norm 1.5015 (1.5810) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][510/625] eta 0:00:54 lr 0.000900 wd 0.0500 time 0.4648 (0.4717) data time 0.0008 (0.0026) model time 0.4640 (0.4687) loss 2.0815 (3.1090) grad_norm 1.2117 (1.5772) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][520/625] eta 0:00:49 lr 0.000900 wd 0.0500 time 0.4672 (0.4716) data 
time 0.0008 (0.0026) model time 0.4664 (0.4687) loss 3.0747 (3.1060) grad_norm 1.6622 (1.5775) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][530/625] eta 0:00:44 lr 0.000900 wd 0.0500 time 0.4733 (0.4716) data time 0.0011 (0.0025) model time 0.4722 (0.4686) loss 3.3129 (3.1037) grad_norm 2.1351 (1.5819) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][540/625] eta 0:00:40 lr 0.000900 wd 0.0500 time 0.4620 (0.4715) data time 0.0008 (0.0025) model time 0.4612 (0.4686) loss 2.0036 (3.0959) grad_norm 1.3255 (1.5884) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][550/625] eta 0:00:35 lr 0.000900 wd 0.0500 time 0.4712 (0.4714) data time 0.0011 (0.0025) model time 0.4702 (0.4685) loss 3.0458 (3.0996) grad_norm 1.5838 (1.5913) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][560/625] eta 0:00:30 lr 0.000900 wd 0.0500 time 0.4594 (0.4712) data time 0.0007 (0.0025) model time 0.4586 (0.4684) loss 2.2331 (3.0968) grad_norm 2.0590 (1.5924) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][570/625] eta 0:00:25 lr 0.000900 wd 0.0500 time 0.4612 (0.4711) data time 0.0010 (0.0024) model time 0.4602 (0.4683) loss 3.2906 (3.0973) grad_norm 1.5015 (1.5925) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][580/625] eta 0:00:21 lr 0.000900 wd 0.0500 time 0.4624 (0.4711) data time 0.0010 (0.0024) model time 0.4614 (0.4683) loss 3.0406 (3.0989) grad_norm 1.8730 (1.5950) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [113/300][590/625] eta 0:00:16 lr 0.000899 wd 0.0500 time 0.4674 (0.4710) data time 0.0010 (0.0024) model time 0.4664 (0.4683) loss 2.3564 (3.1003) grad_norm 2.5113 (1.5993) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][600/625] eta 0:00:11 lr 0.000899 wd 0.0500 time 0.4648 (0.4710) data time 0.0008 (0.0024) model time 0.4640 (0.4682) loss 3.5501 (3.1020) grad_norm 1.4592 (1.5971) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][610/625] eta 0:00:07 lr 0.000899 wd 0.0500 time 0.4632 (0.4713) data time 0.0005 (0.0024) model time 0.4627 (0.4686) loss 2.8977 (3.1024) grad_norm 1.2426 (1.5974) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [113/300][620/625] eta 0:00:02 lr 0.000899 wd 0.0500 time 0.4609 (0.4712) data time 0.0005 (0.0023) model time 0.4604 (0.4685) loss 2.1244 (3.0968) grad_norm 1.4203 (1.5949) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:27 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 113 training takes 0:04:54 [2024-08-10 10:58:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 10:58:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 10:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.813 (0.813) Loss 0.5605 (0.5605) Acc@1 88.232 (88.232) Acc@5 98.291 (98.291) Mem 16715MB [2024-08-10 10:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.188) Loss 0.9438 (0.7021) Acc@1 76.904 (84.455) Acc@5 94.678 (97.230) Mem 16715MB [2024-08-10 10:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.155) Loss 1.0566 (0.8321) Acc@1 73.193 (81.159) Acc@5 93.896 (95.810) Mem 16715MB [2024-08-10 10:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.030 Acc@5 95.799 [2024-08-10 10:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.0% [2024-08-10 10:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.03% [2024-08-10 10:58:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 10:58:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 10:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.4954 (0.4954) Acc@1 88.818 (88.818) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 10:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.7964 (0.6186) Acc@1 80.469 (86.195) Acc@5 96.191 (97.727) Mem 16715MB [2024-08-10 10:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9087 (0.7291) Acc@1 77.393 (83.164) Acc@5 95.459 (96.573) Mem 16715MB [2024-08-10 10:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.885 Acc@5 96.587 [2024-08-10 10:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-10 10:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.89% [2024-08-10 10:58:38 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 10:58:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 10:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][0/625] eta 0:08:23 lr 0.000899 wd 0.0500 time 0.8061 (0.8061) data time 0.3977 (0.3977) model time 0.0000 (0.0000) loss 3.7652 (3.7652) grad_norm 1.6559 (1.6559) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][10/625] eta 0:05:14 lr 0.000899 wd 0.0500 time 0.4631 (0.5113) data time 0.0010 (0.0371) model time 0.0000 (0.0000) loss 3.3002 (2.8865) grad_norm 1.8589 (1.6764) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][20/625] eta 0:04:55 lr 0.000899 wd 0.0500 time 0.4621 (0.4886) data time 0.0008 (0.0199) model time 0.0000 (0.0000) loss 2.7479 (2.8968) grad_norm 1.1808 (1.5831) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][30/625] eta 0:04:46 lr 0.000899 wd 0.0500 time 0.4813 (0.4816) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 3.6709 (2.9727) grad_norm 1.7533 (1.5576) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:58:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][40/625] eta 0:04:39 lr 0.000899 wd 0.0500 time 0.4679 (0.4778) data time 0.0008 (0.0107) model time 0.0000 (0.0000) loss 3.5668 (3.0314) grad_norm 1.1317 (1.5671) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][50/625] eta 0:04:33 lr 0.000899 wd 0.0500 time 0.4627 (0.4752) data time 0.0011 (0.0088) model time 0.0000 (0.0000) loss 2.7225 (3.0297) grad_norm 1.1365 (1.5453) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][60/625] eta 0:04:29 lr 0.000899 wd 0.0500 time 0.4671 (0.4764) data time 0.0011 (0.0076) model time 0.4661 (0.4818) loss 3.0792 
(3.0535) grad_norm 1.5617 (1.5399) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][70/625] eta 0:04:23 lr 0.000898 wd 0.0500 time 0.4611 (0.4747) data time 0.0008 (0.0066) model time 0.4603 (0.4725) loss 3.2109 (3.0547) grad_norm 1.0013 (1.5195) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][80/625] eta 0:04:17 lr 0.000898 wd 0.0500 time 0.4620 (0.4733) data time 0.0008 (0.0059) model time 0.4613 (0.4692) loss 1.7745 (3.0708) grad_norm 1.3857 (1.4903) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][90/625] eta 0:04:13 lr 0.000898 wd 0.0500 time 0.4657 (0.4740) data time 0.0011 (0.0054) model time 0.4646 (0.4714) loss 2.4003 (3.0775) grad_norm 1.5096 (1.5151) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][100/625] eta 0:04:08 lr 0.000898 wd 0.0500 time 0.4641 (0.4733) data time 0.0010 (0.0050) model time 0.4631 (0.4704) loss 3.4536 (3.0729) grad_norm 2.3965 (1.5359) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][110/625] eta 0:04:03 lr 0.000898 wd 0.0500 time 0.4651 (0.4726) data time 0.0010 (0.0046) model time 0.4641 (0.4694) loss 2.8298 (3.0928) grad_norm 2.3453 (1.5283) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][120/625] eta 0:03:58 lr 0.000898 wd 0.0500 time 0.4614 (0.4719) data time 0.0008 (0.0043) model time 0.4606 (0.4685) loss 3.7227 (3.0946) grad_norm 2.1139 (1.5248) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 10:59:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][130/625] eta 0:03:53 lr 0.000898 wd 0.0500 
time 0.4609 (0.4713) data time 0.0008 (0.0041) model time 0.4601 (0.4678) loss 3.4522 (3.1125) grad_norm 2.2142 (1.5325) loss_scale 2048.0000 (1031.8168) mem 16715MB [2024-08-10 10:59:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][140/625] eta 0:03:48 lr 0.000898 wd 0.0500 time 0.4595 (0.4706) data time 0.0011 (0.0039) model time 0.4584 (0.4669) loss 3.1895 (3.0965) grad_norm 1.6880 (1.5382) loss_scale 2048.0000 (1103.8865) mem 16715MB [2024-08-10 10:59:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][150/625] eta 0:03:43 lr 0.000898 wd 0.0500 time 0.4569 (0.4700) data time 0.0008 (0.0037) model time 0.4561 (0.4663) loss 2.6999 (3.0786) grad_norm 1.0234 (1.5253) loss_scale 2048.0000 (1166.4106) mem 16715MB [2024-08-10 10:59:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][160/625] eta 0:03:38 lr 0.000898 wd 0.0500 time 0.4687 (0.4695) data time 0.0010 (0.0035) model time 0.4677 (0.4658) loss 1.9917 (3.0724) grad_norm 1.4851 (1.5140) loss_scale 2048.0000 (1221.1677) mem 16715MB [2024-08-10 11:00:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][170/625] eta 0:03:33 lr 0.000898 wd 0.0500 time 0.4651 (0.4691) data time 0.0011 (0.0034) model time 0.4641 (0.4655) loss 3.6815 (3.0840) grad_norm 1.3069 (1.4995) loss_scale 2048.0000 (1269.5205) mem 16715MB [2024-08-10 11:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][180/625] eta 0:03:28 lr 0.000897 wd 0.0500 time 0.4619 (0.4690) data time 0.0011 (0.0032) model time 0.4608 (0.4655) loss 2.5619 (3.0787) grad_norm 1.6471 (1.5092) loss_scale 2048.0000 (1312.5304) mem 16715MB [2024-08-10 11:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][190/625] eta 0:03:24 lr 0.000897 wd 0.0500 time 0.4679 (0.4697) data time 0.0009 (0.0031) model time 0.4670 (0.4667) loss 3.9885 (3.0734) grad_norm 1.2052 (1.4996) loss_scale 2048.0000 (1351.0366) mem 16715MB [2024-08-10 11:00:14 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [114/300][200/625] eta 0:03:19 lr 0.000897 wd 0.0500 time 0.4679 (0.4699) data time 0.0009 (0.0030) model time 0.4670 (0.4670) loss 2.0536 (3.0526) grad_norm 1.9798 (1.5068) loss_scale 2048.0000 (1385.7114) mem 16715MB [2024-08-10 11:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][210/625] eta 0:03:15 lr 0.000897 wd 0.0500 time 0.4652 (0.4706) data time 0.0009 (0.0029) model time 0.4644 (0.4681) loss 3.1720 (3.0504) grad_norm 1.6792 (1.5129) loss_scale 2048.0000 (1417.0995) mem 16715MB [2024-08-10 11:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][220/625] eta 0:03:10 lr 0.000897 wd 0.0500 time 0.4615 (0.4703) data time 0.0008 (0.0029) model time 0.4607 (0.4677) loss 2.4794 (3.0529) grad_norm 1.1162 (1.5055) loss_scale 2048.0000 (1445.6471) mem 16715MB [2024-08-10 11:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][230/625] eta 0:03:05 lr 0.000897 wd 0.0500 time 0.4614 (0.4699) data time 0.0009 (0.0028) model time 0.4606 (0.4673) loss 2.9790 (3.0509) grad_norm 1.2054 (1.5069) loss_scale 2048.0000 (1471.7229) mem 16715MB [2024-08-10 11:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][240/625] eta 0:03:00 lr 0.000897 wd 0.0500 time 0.4657 (0.4697) data time 0.0010 (0.0027) model time 0.4646 (0.4671) loss 3.2682 (3.0527) grad_norm 2.4600 (1.5313) loss_scale 2048.0000 (1495.6349) mem 16715MB [2024-08-10 11:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][250/625] eta 0:02:56 lr 0.000897 wd 0.0500 time 0.4646 (0.4696) data time 0.0010 (0.0026) model time 0.4636 (0.4671) loss 3.1161 (3.0487) grad_norm 0.9912 (1.5254) loss_scale 2048.0000 (1517.6414) mem 16715MB [2024-08-10 11:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][260/625] eta 0:02:51 lr 0.000897 wd 0.0500 time 0.4636 (0.4694) data time 0.0008 (0.0026) model time 0.4629 (0.4669) loss 3.2270 (3.0518) grad_norm 1.3346 
(1.5191) loss_scale 2048.0000 (1537.9617) mem 16715MB [2024-08-10 11:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][270/625] eta 0:02:46 lr 0.000897 wd 0.0500 time 0.4629 (0.4692) data time 0.0008 (0.0025) model time 0.4621 (0.4668) loss 3.0727 (3.0483) grad_norm 2.0151 (1.5209) loss_scale 2048.0000 (1556.7823) mem 16715MB [2024-08-10 11:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][280/625] eta 0:02:41 lr 0.000897 wd 0.0500 time 0.4639 (0.4690) data time 0.0008 (0.0025) model time 0.4631 (0.4666) loss 3.4356 (3.0495) grad_norm 1.3162 (1.5273) loss_scale 2048.0000 (1574.2633) mem 16715MB [2024-08-10 11:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][290/625] eta 0:02:37 lr 0.000896 wd 0.0500 time 0.4661 (0.4688) data time 0.0008 (0.0025) model time 0.4653 (0.4664) loss 3.1094 (3.0521) grad_norm 1.5478 (1.5254) loss_scale 2048.0000 (1590.5430) mem 16715MB [2024-08-10 11:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][300/625] eta 0:02:32 lr 0.000896 wd 0.0500 time 0.4586 (0.4686) data time 0.0009 (0.0024) model time 0.4577 (0.4662) loss 3.0491 (3.0453) grad_norm 1.7426 (1.5242) loss_scale 2048.0000 (1605.7409) mem 16715MB [2024-08-10 11:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][310/625] eta 0:02:27 lr 0.000896 wd 0.0500 time 0.4628 (0.4685) data time 0.0012 (0.0024) model time 0.4616 (0.4661) loss 3.4341 (3.0511) grad_norm 1.4732 (1.5255) loss_scale 2048.0000 (1619.9614) mem 16715MB [2024-08-10 11:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][320/625] eta 0:02:22 lr 0.000896 wd 0.0500 time 0.4648 (0.4684) data time 0.0008 (0.0023) model time 0.4640 (0.4660) loss 1.9083 (3.0484) grad_norm 1.6688 (1.5267) loss_scale 2048.0000 (1633.2960) mem 16715MB [2024-08-10 11:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][330/625] eta 0:02:18 lr 0.000896 wd 0.0500 time 0.4727 (0.4684) data 
time 0.0010 (0.0023) model time 0.4717 (0.4660) loss 3.5051 (3.0553) grad_norm 1.6036 (1.5269) loss_scale 2048.0000 (1645.8248) mem 16715MB [2024-08-10 11:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][340/625] eta 0:02:13 lr 0.000896 wd 0.0500 time 0.4648 (0.4683) data time 0.0010 (0.0022) model time 0.4637 (0.4659) loss 3.2932 (3.0611) grad_norm 2.1583 (1.5391) loss_scale 2048.0000 (1657.6188) mem 16715MB [2024-08-10 11:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][350/625] eta 0:02:08 lr 0.000896 wd 0.0500 time 0.4628 (0.4681) data time 0.0011 (0.0022) model time 0.4616 (0.4658) loss 3.2738 (3.0588) grad_norm 1.8054 (1.5380) loss_scale 2048.0000 (1668.7407) mem 16715MB [2024-08-10 11:01:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][360/625] eta 0:02:04 lr 0.000896 wd 0.0500 time 0.4629 (0.4685) data time 0.0010 (0.0022) model time 0.4619 (0.4664) loss 3.0045 (3.0545) grad_norm 1.5917 (1.5397) loss_scale 2048.0000 (1679.2465) mem 16715MB [2024-08-10 11:01:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][370/625] eta 0:01:59 lr 0.000896 wd 0.0500 time 0.4626 (0.4684) data time 0.0010 (0.0022) model time 0.4615 (0.4662) loss 2.5286 (3.0522) grad_norm 1.0125 (1.5448) loss_scale 2048.0000 (1689.1860) mem 16715MB [2024-08-10 11:01:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][380/625] eta 0:01:54 lr 0.000896 wd 0.0500 time 0.4604 (0.4682) data time 0.0008 (0.0021) model time 0.4597 (0.4660) loss 3.6381 (3.0492) grad_norm 1.1907 (1.5474) loss_scale 2048.0000 (1698.6037) mem 16715MB [2024-08-10 11:01:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][390/625] eta 0:01:50 lr 0.000896 wd 0.0500 time 0.4687 (0.4686) data time 0.0010 (0.0021) model time 0.4676 (0.4665) loss 3.2411 (3.0504) grad_norm 1.8303 (1.5433) loss_scale 2048.0000 (1707.5396) mem 16715MB [2024-08-10 11:01:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [114/300][400/625] eta 0:01:45 lr 0.000895 wd 0.0500 time 0.4571 (0.4685) data time 0.0009 (0.0021) model time 0.4562 (0.4664) loss 3.8752 (3.0504) grad_norm 1.3277 (1.5402) loss_scale 2048.0000 (1716.0299) mem 16715MB [2024-08-10 11:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][410/625] eta 0:01:40 lr 0.000895 wd 0.0500 time 0.4620 (0.4692) data time 0.0011 (0.0020) model time 0.4609 (0.4673) loss 2.7426 (3.0521) grad_norm 1.2937 (1.5398) loss_scale 2048.0000 (1724.1071) mem 16715MB [2024-08-10 11:01:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][420/625] eta 0:01:36 lr 0.000895 wd 0.0500 time 0.4556 (0.4695) data time 0.0008 (0.0020) model time 0.4547 (0.4677) loss 3.4854 (3.0530) grad_norm 1.7460 (1.5362) loss_scale 2048.0000 (1731.8005) mem 16715MB [2024-08-10 11:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][430/625] eta 0:01:31 lr 0.000895 wd 0.0500 time 0.4595 (0.4697) data time 0.0011 (0.0020) model time 0.4584 (0.4679) loss 3.4355 (3.0567) grad_norm 1.5868 (1.5344) loss_scale 2048.0000 (1739.1369) mem 16715MB [2024-08-10 11:02:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][440/625] eta 0:01:26 lr 0.000895 wd 0.0500 time 0.4644 (0.4695) data time 0.0009 (0.0020) model time 0.4635 (0.4677) loss 3.8763 (3.0526) grad_norm 1.3370 (1.5365) loss_scale 2048.0000 (1746.1406) mem 16715MB [2024-08-10 11:02:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][450/625] eta 0:01:22 lr 0.000895 wd 0.0500 time 0.4561 (0.4694) data time 0.0011 (0.0020) model time 0.4550 (0.4675) loss 2.8564 (3.0541) grad_norm 1.1746 (1.5374) loss_scale 2048.0000 (1752.8337) mem 16715MB [2024-08-10 11:02:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][460/625] eta 0:01:17 lr 0.000895 wd 0.0500 time 0.4607 (0.4692) data time 0.0008 (0.0019) model time 0.4600 (0.4674) loss 3.5833 (3.0504) grad_norm 2.3157 (1.5391) loss_scale 2048.0000 
(1759.2364) mem 16715MB [2024-08-10 11:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][470/625] eta 0:01:12 lr 0.000895 wd 0.0500 time 0.4726 (0.4691) data time 0.0011 (0.0019) model time 0.4715 (0.4673) loss 3.6070 (3.0526) grad_norm 1.5687 (1.5517) loss_scale 2048.0000 (1765.3673) mem 16715MB [2024-08-10 11:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][480/625] eta 0:01:08 lr 0.000895 wd 0.0500 time 0.4610 (0.4691) data time 0.0011 (0.0019) model time 0.4600 (0.4672) loss 3.1812 (3.0546) grad_norm 2.4485 (1.5573) loss_scale 2048.0000 (1771.2432) mem 16715MB [2024-08-10 11:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][490/625] eta 0:01:03 lr 0.000895 wd 0.0500 time 0.4661 (0.4690) data time 0.0008 (0.0019) model time 0.4653 (0.4672) loss 2.2042 (3.0520) grad_norm 1.7220 (1.5731) loss_scale 2048.0000 (1776.8798) mem 16715MB [2024-08-10 11:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][500/625] eta 0:00:58 lr 0.000894 wd 0.0500 time 0.4655 (0.4689) data time 0.0008 (0.0019) model time 0.4647 (0.4671) loss 3.7812 (3.0559) grad_norm 1.2155 (1.5690) loss_scale 2048.0000 (1782.2914) mem 16715MB [2024-08-10 11:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][510/625] eta 0:00:53 lr 0.000894 wd 0.0500 time 0.4606 (0.4688) data time 0.0010 (0.0019) model time 0.4595 (0.4670) loss 2.9103 (3.0528) grad_norm 1.2602 (1.5688) loss_scale 2048.0000 (1787.4912) mem 16715MB [2024-08-10 11:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][520/625] eta 0:00:49 lr 0.000894 wd 0.0500 time 0.4607 (0.4687) data time 0.0008 (0.0018) model time 0.4599 (0.4669) loss 3.6351 (3.0558) grad_norm 2.2499 (1.5666) loss_scale 2048.0000 (1792.4914) mem 16715MB [2024-08-10 11:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][530/625] eta 0:00:44 lr 0.000894 wd 0.0500 time 0.4656 (0.4686) data time 0.0008 (0.0018) model 
time 0.4648 (0.4668) loss 3.3118 (3.0542) grad_norm 1.3503 (1.5676) loss_scale 2048.0000 (1797.3032) mem 16715MB [2024-08-10 11:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][540/625] eta 0:00:39 lr 0.000894 wd 0.0500 time 0.4639 (0.4685) data time 0.0013 (0.0018) model time 0.4626 (0.4667) loss 3.2066 (3.0547) grad_norm 1.3913 (1.5746) loss_scale 2048.0000 (1801.9372) mem 16715MB [2024-08-10 11:02:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][550/625] eta 0:00:35 lr 0.000894 wd 0.0500 time 0.4687 (0.4684) data time 0.0010 (0.0018) model time 0.4678 (0.4666) loss 3.8212 (3.0585) grad_norm 1.1951 (1.5740) loss_scale 2048.0000 (1806.4029) mem 16715MB [2024-08-10 11:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][560/625] eta 0:00:30 lr 0.000894 wd 0.0500 time 0.4637 (0.4690) data time 0.0010 (0.0018) model time 0.4627 (0.4673) loss 2.9632 (3.0594) grad_norm 1.4571 (1.5735) loss_scale 2048.0000 (1810.7094) mem 16715MB [2024-08-10 11:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][570/625] eta 0:00:25 lr 0.000894 wd 0.0500 time 0.4671 (0.4692) data time 0.0009 (0.0018) model time 0.4662 (0.4675) loss 3.2496 (3.0576) grad_norm 1.3295 (1.5712) loss_scale 2048.0000 (1814.8651) mem 16715MB [2024-08-10 11:03:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][580/625] eta 0:00:21 lr 0.000894 wd 0.0500 time 0.4645 (0.4692) data time 0.0008 (0.0018) model time 0.4637 (0.4676) loss 3.1790 (3.0586) grad_norm 1.3195 (1.5729) loss_scale 2048.0000 (1818.8778) mem 16715MB [2024-08-10 11:03:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][590/625] eta 0:00:16 lr 0.000894 wd 0.0500 time 0.4595 (0.4691) data time 0.0009 (0.0017) model time 0.4587 (0.4675) loss 2.8310 (3.0623) grad_norm 1.7911 (1.5804) loss_scale 2048.0000 (1822.7547) mem 16715MB [2024-08-10 11:03:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][600/625] 
eta 0:00:11 lr 0.000894 wd 0.0500 time 0.4593 (0.4691) data time 0.0008 (0.0017) model time 0.4585 (0.4674) loss 3.5727 (3.0644) grad_norm 1.2942 (1.5773) loss_scale 2048.0000 (1826.5025) mem 16715MB [2024-08-10 11:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][610/625] eta 0:00:07 lr 0.000893 wd 0.0500 time 0.4605 (0.4691) data time 0.0006 (0.0017) model time 0.4599 (0.4674) loss 1.9659 (3.0584) grad_norm 1.4167 (1.5785) loss_scale 2048.0000 (1830.1277) mem 16715MB [2024-08-10 11:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [114/300][620/625] eta 0:00:02 lr 0.000893 wd 0.0500 time 0.4600 (0.4691) data time 0.0007 (0.0017) model time 0.4592 (0.4674) loss 1.8362 (3.0570) grad_norm 1.3746 (1.5817) loss_scale 2048.0000 (1833.6361) mem 16715MB [2024-08-10 11:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 114 training takes 0:04:53 [2024-08-10 11:03:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:03:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:03:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.5601 (0.5601) Acc@1 87.891 (87.891) Acc@5 98.145 (98.145) Mem 16715MB
[2024-08-10 11:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.9209 (0.7116) Acc@1 78.516 (84.641) Acc@5 95.557 (97.208) Mem 16715MB
[2024-08-10 11:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.142) Loss 1.0488 (0.8464) Acc@1 74.512 (81.178) Acc@5 93.555 (95.703) Mem 16715MB
[2024-08-10 11:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 80.842 Acc@5 95.661
[2024-08-10 11:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 80.8%
[2024-08-10 11:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.917 (0.917) Loss 0.4941 (0.4941) Acc@1 88.965 (88.965) Acc@5 98.584 (98.584) Mem 16715MB
[2024-08-10 11:03:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.198) Loss 0.7964 (0.6180) Acc@1 80.469 (86.244) Acc@5 96.240 (97.718) Mem 16715MB
[2024-08-10 11:03:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 0.9097 (0.7283) Acc@1 77.197 (83.208) Acc@5 95.459 (96.577) Mem 16715MB
[2024-08-10 11:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.925 Acc@5 96.599
[2024-08-10 11:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-10 11:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.93%
[2024-08-10 11:03:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 11:03:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 11:03:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][0/625] eta 0:08:23 lr 0.000893 wd 0.0500 time 0.8056 (0.8056) data time 0.3969 (0.3969) model time 0.0000 (0.0000) loss 3.3474 (3.3474) grad_norm 1.3067 (1.3067) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][10/625] eta 0:05:03 lr 0.000893 wd 0.0500 time 0.4633 (0.4940) data time 0.0008 (0.0370) model time 0.0000 (0.0000) loss 3.4748 (3.4369) grad_norm 1.4805 (1.3169) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][20/625] eta 0:04:49 lr 0.000893 wd 0.0500 time 0.4517 (0.4792) data time 0.0011 (0.0199) model time 0.0000 (0.0000) loss 3.6420 (3.2514) grad_norm 2.1465 (1.4262) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][30/625] eta 0:04:48 lr 0.000893 wd 0.0500 time 0.5244 (0.4841) data time 0.0010 (0.0138) model time 0.0000 (0.0000) loss 3.2627 (3.2127) grad_norm 1.4212 (1.4651) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][40/625] eta 0:04:39 lr 0.000893 wd 0.0500 time 0.4607 (0.4786) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 1.7341 (3.1010) grad_norm 1.5320 (1.4596) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][50/625] eta 0:04:33 lr 0.000893 wd 0.0500 time 0.4701 (0.4759) data time 0.0010 (0.0095) model time 0.0000 (0.0000) loss 2.6110 (3.0879) grad_norm 1.6694 (1.4736) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][60/625] eta 0:04:27 lr 0.000893 wd 0.0500 time 0.4636 (0.4740) data time 0.0008 (0.0081) model time 0.4628 (0.4634) loss 3.7780 
(3.1129) grad_norm 1.4150 (1.5133) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][70/625] eta 0:04:22 lr 0.000893 wd 0.0500 time 0.4545 (0.4728) data time 0.0011 (0.0071) model time 0.4534 (0.4639) loss 3.3756 (3.0743) grad_norm 1.9486 (1.5266) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][80/625] eta 0:04:17 lr 0.000893 wd 0.0500 time 0.4583 (0.4716) data time 0.0008 (0.0064) model time 0.4575 (0.4633) loss 2.4434 (3.0430) grad_norm 1.2866 (1.5023) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][90/625] eta 0:04:11 lr 0.000892 wd 0.0500 time 0.4669 (0.4706) data time 0.0010 (0.0059) model time 0.4659 (0.4627) loss 3.4055 (3.0494) grad_norm 1.6707 (1.4856) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][100/625] eta 0:04:07 lr 0.000892 wd 0.0500 time 0.4677 (0.4708) data time 0.0008 (0.0054) model time 0.4670 (0.4644) loss 1.8702 (2.9903) grad_norm 1.2703 (1.4744) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][110/625] eta 0:04:02 lr 0.000892 wd 0.0500 time 0.4627 (0.4701) data time 0.0008 (0.0050) model time 0.4619 (0.4639) loss 3.1604 (3.0112) grad_norm 1.6740 (1.4947) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][120/625] eta 0:03:59 lr 0.000892 wd 0.0500 time 0.4638 (0.4743) data time 0.0010 (0.0047) model time 0.4627 (0.4719) loss 3.2367 (2.9952) grad_norm 1.2174 (1.4913) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][130/625] eta 0:03:54 lr 0.000892 wd 0.0500 
time 0.4686 (0.4736) data time 0.0011 (0.0044) model time 0.4676 (0.4709) loss 3.4333 (3.0044) grad_norm 2.0798 (1.4864) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][140/625] eta 0:03:49 lr 0.000892 wd 0.0500 time 0.4649 (0.4740) data time 0.0011 (0.0042) model time 0.4639 (0.4717) loss 2.9222 (2.9996) grad_norm 1.4181 (1.4747) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][150/625] eta 0:03:45 lr 0.000892 wd 0.0500 time 0.5149 (0.4737) data time 0.0008 (0.0040) model time 0.5141 (0.4714) loss 3.1883 (3.0088) grad_norm 1.5171 (1.5080) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][160/625] eta 0:03:39 lr 0.000892 wd 0.0500 time 0.4599 (0.4730) data time 0.0010 (0.0038) model time 0.4589 (0.4704) loss 2.8093 (3.0117) grad_norm 2.7737 (1.5230) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][170/625] eta 0:03:35 lr 0.000892 wd 0.0500 time 0.4631 (0.4727) data time 0.0008 (0.0037) model time 0.4623 (0.4701) loss 3.1705 (3.0313) grad_norm 1.5952 (1.5368) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][180/625] eta 0:03:30 lr 0.000892 wd 0.0500 time 0.4656 (0.4722) data time 0.0009 (0.0035) model time 0.4646 (0.4695) loss 3.7064 (3.0328) grad_norm 1.8460 (1.5541) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][190/625] eta 0:03:25 lr 0.000892 wd 0.0500 time 0.4681 (0.4718) data time 0.0008 (0.0034) model time 0.4673 (0.4691) loss 2.2252 (3.0212) grad_norm 1.8867 (1.5484) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:18 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [115/300][200/625] eta 0:03:20 lr 0.000891 wd 0.0500 time 0.4718 (0.4715) data time 0.0008 (0.0033) model time 0.4710 (0.4689) loss 2.7705 (3.0185) grad_norm 1.3544 (1.5570) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][210/625] eta 0:03:15 lr 0.000891 wd 0.0500 time 0.4775 (0.4714) data time 0.0008 (0.0032) model time 0.4767 (0.4688) loss 3.2450 (3.0287) grad_norm 1.2862 (1.5542) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][220/625] eta 0:03:10 lr 0.000891 wd 0.0500 time 0.4651 (0.4713) data time 0.0010 (0.0031) model time 0.4641 (0.4687) loss 2.8963 (3.0212) grad_norm 1.1628 (1.5584) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][230/625] eta 0:03:05 lr 0.000891 wd 0.0500 time 0.4509 (0.4709) data time 0.0010 (0.0030) model time 0.4499 (0.4683) loss 3.2733 (3.0238) grad_norm 1.6167 (1.5567) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][240/625] eta 0:03:01 lr 0.000891 wd 0.0500 time 0.4631 (0.4706) data time 0.0007 (0.0029) model time 0.4623 (0.4680) loss 2.8677 (3.0157) grad_norm 1.2154 (1.5657) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][250/625] eta 0:02:56 lr 0.000891 wd 0.0500 time 0.4652 (0.4703) data time 0.0010 (0.0028) model time 0.4642 (0.4678) loss 3.3171 (3.0217) grad_norm 1.6131 (1.5598) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][260/625] eta 0:02:51 lr 0.000891 wd 0.0500 time 0.4669 (0.4701) data time 0.0007 (0.0028) model time 0.4661 (0.4676) loss 2.7272 (3.0352) grad_norm 1.7970 
(1.5619) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][270/625] eta 0:02:46 lr 0.000891 wd 0.0500 time 0.4659 (0.4700) data time 0.0008 (0.0027) model time 0.4651 (0.4676) loss 3.5738 (3.0364) grad_norm 1.4552 (1.5602) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][280/625] eta 0:02:42 lr 0.000891 wd 0.0500 time 0.4646 (0.4700) data time 0.0007 (0.0026) model time 0.4639 (0.4675) loss 2.5613 (3.0435) grad_norm 1.0677 (1.5550) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][290/625] eta 0:02:37 lr 0.000891 wd 0.0500 time 0.4657 (0.4698) data time 0.0011 (0.0026) model time 0.4647 (0.4674) loss 3.4897 (3.0457) grad_norm 1.7934 (1.5569) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][300/625] eta 0:02:32 lr 0.000891 wd 0.0500 time 0.4624 (0.4697) data time 0.0010 (0.0025) model time 0.4614 (0.4673) loss 3.3947 (3.0483) grad_norm 1.2243 (1.5515) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][310/625] eta 0:02:27 lr 0.000890 wd 0.0500 time 0.4596 (0.4695) data time 0.0010 (0.0025) model time 0.4585 (0.4671) loss 3.2793 (3.0569) grad_norm 1.9827 (1.5534) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][320/625] eta 0:02:23 lr 0.000890 wd 0.0500 time 0.4665 (0.4698) data time 0.0010 (0.0024) model time 0.4655 (0.4676) loss 2.9245 (3.0628) grad_norm 1.7612 (1.5644) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][330/625] eta 0:02:18 lr 0.000890 wd 0.0500 time 0.4789 (0.4697) data 
time 0.0011 (0.0024) model time 0.4778 (0.4675) loss 3.5581 (3.0626) grad_norm 1.5602 (1.5666) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][340/625] eta 0:02:13 lr 0.000890 wd 0.0500 time 0.4689 (0.4696) data time 0.0011 (0.0024) model time 0.4678 (0.4673) loss 2.8238 (3.0626) grad_norm 2.4887 (1.5738) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][350/625] eta 0:02:09 lr 0.000890 wd 0.0500 time 0.4711 (0.4696) data time 0.0011 (0.0024) model time 0.4701 (0.4674) loss 3.4380 (3.0629) grad_norm 3.1801 (1.5914) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][360/625] eta 0:02:04 lr 0.000890 wd 0.0500 time 0.4881 (0.4696) data time 0.0009 (0.0023) model time 0.4872 (0.4674) loss 3.7437 (3.0626) grad_norm 1.2252 (1.5863) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][370/625] eta 0:01:59 lr 0.000890 wd 0.0500 time 0.4589 (0.4703) data time 0.0012 (0.0023) model time 0.4577 (0.4682) loss 3.7579 (3.0678) grad_norm 1.1102 (1.5794) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][380/625] eta 0:01:55 lr 0.000890 wd 0.0500 time 0.4678 (0.4702) data time 0.0009 (0.0023) model time 0.4670 (0.4681) loss 2.3194 (3.0675) grad_norm 1.3936 (1.6021) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][390/625] eta 0:01:50 lr 0.000890 wd 0.0500 time 0.4659 (0.4700) data time 0.0012 (0.0023) model time 0.4647 (0.4680) loss 3.1906 (3.0728) grad_norm 1.4081 (1.5992) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [115/300][400/625] eta 0:01:45 lr 0.000890 wd 0.0500 time 0.4622 (0.4699) data time 0.0008 (0.0022) model time 0.4614 (0.4678) loss 2.8732 (3.0702) grad_norm 1.4258 (1.5976) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:06:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][410/625] eta 0:01:40 lr 0.000889 wd 0.0500 time 0.4663 (0.4698) data time 0.0011 (0.0022) model time 0.4652 (0.4677) loss 3.5571 (3.0754) grad_norm 1.3915 (1.6022) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][420/625] eta 0:01:36 lr 0.000889 wd 0.0500 time 0.4547 (0.4696) data time 0.0008 (0.0022) model time 0.4539 (0.4676) loss 3.4477 (3.0780) grad_norm 1.5859 (1.6008) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][430/625] eta 0:01:31 lr 0.000889 wd 0.0500 time 0.4626 (0.4695) data time 0.0010 (0.0022) model time 0.4616 (0.4675) loss 2.9578 (3.0722) grad_norm 1.2063 (1.5956) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][440/625] eta 0:01:26 lr 0.000889 wd 0.0500 time 0.4725 (0.4694) data time 0.0012 (0.0021) model time 0.4713 (0.4674) loss 3.6638 (3.0744) grad_norm 2.0029 (1.5927) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][450/625] eta 0:01:22 lr 0.000889 wd 0.0500 time 0.6616 (0.4697) data time 0.0010 (0.0021) model time 0.6606 (0.4677) loss 3.3795 (3.0757) grad_norm 1.6483 (1.6005) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][460/625] eta 0:01:17 lr 0.000889 wd 0.0500 time 0.6935 (0.4705) data time 0.0008 (0.0021) model time 0.6927 (0.4686) loss 3.2717 (3.0744) grad_norm 1.9939 (1.5989) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 11:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][470/625] eta 0:01:12 lr 0.000889 wd 0.0500 time 0.4609 (0.4702) data time 0.0008 (0.0021) model time 0.4601 (0.4683) loss 2.0623 (3.0738) grad_norm 1.2478 (1.5977) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][480/625] eta 0:01:08 lr 0.000889 wd 0.0500 time 0.4648 (0.4704) data time 0.0012 (0.0020) model time 0.4636 (0.4686) loss 2.9867 (3.0713) grad_norm 1.4218 (1.5952) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][490/625] eta 0:01:03 lr 0.000889 wd 0.0500 time 0.4703 (0.4703) data time 0.0010 (0.0020) model time 0.4693 (0.4684) loss 2.6216 (3.0674) grad_norm 1.6128 (1.5920) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][500/625] eta 0:00:58 lr 0.000889 wd 0.0500 time 0.4647 (0.4701) data time 0.0009 (0.0020) model time 0.4638 (0.4683) loss 2.9908 (3.0651) grad_norm 1.1613 (1.5881) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][510/625] eta 0:00:54 lr 0.000889 wd 0.0500 time 0.4627 (0.4700) data time 0.0010 (0.0020) model time 0.4617 (0.4682) loss 2.8327 (3.0658) grad_norm 1.5191 (1.5844) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][520/625] eta 0:00:49 lr 0.000888 wd 0.0500 time 0.4625 (0.4699) data time 0.0010 (0.0020) model time 0.4615 (0.4681) loss 3.2597 (3.0646) grad_norm 1.2873 (1.5844) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][530/625] eta 0:00:44 lr 0.000888 wd 0.0500 time 0.4580 (0.4697) data time 0.0011 (0.0020) model 
time 0.4569 (0.4679) loss 2.8689 (3.0660) grad_norm 1.1388 (1.5805) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:07:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][540/625] eta 0:00:39 lr 0.000888 wd 0.0500 time 0.4636 (0.4696) data time 0.0008 (0.0019) model time 0.4629 (0.4678) loss 3.1774 (3.0645) grad_norm 2.3941 (1.5824) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][550/625] eta 0:00:35 lr 0.000888 wd 0.0500 time 0.4623 (0.4695) data time 0.0010 (0.0019) model time 0.4612 (0.4677) loss 3.5628 (3.0612) grad_norm 1.4003 (1.5859) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][560/625] eta 0:00:30 lr 0.000888 wd 0.0500 time 0.4642 (0.4698) data time 0.0008 (0.0019) model time 0.4634 (0.4680) loss 3.9126 (3.0663) grad_norm 1.3483 (1.5831) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][570/625] eta 0:00:25 lr 0.000888 wd 0.0500 time 0.4672 (0.4697) data time 0.0008 (0.0019) model time 0.4664 (0.4680) loss 3.0359 (3.0697) grad_norm 1.8893 (1.5820) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][580/625] eta 0:00:21 lr 0.000888 wd 0.0500 time 0.4651 (0.4696) data time 0.0010 (0.0019) model time 0.4641 (0.4679) loss 3.1703 (3.0685) grad_norm 2.1004 (1.5883) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][590/625] eta 0:00:16 lr 0.000888 wd 0.0500 time 0.4570 (0.4695) data time 0.0009 (0.0019) model time 0.4561 (0.4677) loss 3.5624 (3.0707) grad_norm 1.5728 (1.5938) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][600/625] 
eta 0:00:11 lr 0.000888 wd 0.0500 time 0.4645 (0.4694) data time 0.0011 (0.0019) model time 0.4634 (0.4676) loss 3.1558 (3.0707) grad_norm 1.4566 (1.5921) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][610/625] eta 0:00:07 lr 0.000888 wd 0.0500 time 0.4580 (0.4693) data time 0.0005 (0.0018) model time 0.4575 (0.4675) loss 3.1446 (3.0705) grad_norm 1.2433 (1.5918) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [115/300][620/625] eta 0:00:02 lr 0.000888 wd 0.0500 time 0.4629 (0.4691) data time 0.0005 (0.0018) model time 0.4624 (0.4673) loss 2.6208 (3.0704) grad_norm 1.9326 (1.5897) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 115 training takes 0:04:53 [2024-08-10 11:08:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:08:39 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:08:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.5498 (0.5498) Acc@1 87.939 (87.939) Acc@5 97.998 (97.998) Mem 16715MB [2024-08-10 11:08:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.9038 (0.6802) Acc@1 78.955 (84.668) Acc@5 95.117 (97.252) Mem 16715MB [2024-08-10 11:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.142) Loss 0.9946 (0.8101) Acc@1 75.635 (81.424) Acc@5 93.994 (95.815) Mem 16715MB [2024-08-10 11:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.136 Acc@5 95.781 [2024-08-10 11:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-10 11:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.14% [2024-08-10 11:08:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 11:08:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 11:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.4924 (0.4924) Acc@1 89.111 (89.111) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 11:08:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.7959 (0.6173) Acc@1 80.566 (86.262) Acc@5 96.289 (97.749) Mem 16715MB [2024-08-10 11:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9072 (0.7272) Acc@1 77.441 (83.252) Acc@5 95.605 (96.608) Mem 16715MB [2024-08-10 11:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.959 Acc@5 96.623 [2024-08-10 11:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-10 11:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 82.96% [2024-08-10 11:08:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:08:49 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][0/625] eta 0:08:06 lr 0.000887 wd 0.0500 time 0.7782 (0.7782) data time 0.3674 (0.3674) model time 0.0000 (0.0000) loss 3.4889 (3.4889) grad_norm 1.3290 (1.3290) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][10/625] eta 0:05:04 lr 0.000887 wd 0.0500 time 0.4702 (0.4952) data time 0.0010 (0.0344) model time 0.0000 (0.0000) loss 2.9548 (3.1759) grad_norm 1.0891 (1.4792) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:08:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][20/625] eta 0:04:56 lr 0.000887 wd 0.0500 time 0.4644 (0.4893) data time 0.0010 (0.0185) model time 0.0000 (0.0000) loss 2.9629 (3.0925) grad_norm 2.8286 (1.6681) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][30/625] eta 0:04:49 lr 0.000887 wd 0.0500 time 0.4668 (0.4871) data time 0.0010 (0.0128) model time 0.0000 (0.0000) loss 3.5039 (3.1089) grad_norm 1.2535 (1.6226) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][40/625] eta 0:04:43 lr 0.000887 wd 0.0500 time 0.4662 (0.4852) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 3.6037 (3.1278) grad_norm 1.2834 (1.5598) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][50/625] eta 0:04:36 lr 0.000887 wd 0.0500 time 0.4633 (0.4806) data time 0.0010 (0.0082) model time 0.0000 (0.0000) loss 2.7936 (3.0785) grad_norm 1.2685 (1.5963) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][60/625] eta 0:04:29 lr 0.000887 wd 0.0500 time 0.4608 (0.4773) data time 0.0010 (0.0070) model time 0.4598 (0.4597) loss 2.8212 
(3.0895) grad_norm 2.5443 (1.6429) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][70/625] eta 0:04:25 lr 0.000887 wd 0.0500 time 0.4650 (0.4777) data time 0.0008 (0.0062) model time 0.4643 (0.4690) loss 1.9611 (3.0771) grad_norm 1.3250 (1.6113) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][80/625] eta 0:04:19 lr 0.000887 wd 0.0500 time 0.4605 (0.4760) data time 0.0008 (0.0056) model time 0.4597 (0.4671) loss 3.4769 (3.0977) grad_norm 1.3093 (1.5913) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][90/625] eta 0:04:13 lr 0.000887 wd 0.0500 time 0.4664 (0.4747) data time 0.0010 (0.0051) model time 0.4654 (0.4662) loss 3.0650 (3.0951) grad_norm 1.7197 (1.5884) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][100/625] eta 0:04:08 lr 0.000887 wd 0.0500 time 0.4590 (0.4735) data time 0.0010 (0.0047) model time 0.4579 (0.4651) loss 2.6447 (3.0646) grad_norm 1.5798 (1.5813) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][110/625] eta 0:04:03 lr 0.000886 wd 0.0500 time 0.4516 (0.4722) data time 0.0008 (0.0043) model time 0.4509 (0.4640) loss 3.3152 (3.0406) grad_norm 1.1522 (1.5601) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][120/625] eta 0:03:58 lr 0.000886 wd 0.0500 time 0.4605 (0.4714) data time 0.0010 (0.0041) model time 0.4595 (0.4636) loss 2.7936 (3.0400) grad_norm 1.3055 (1.5707) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][130/625] eta 0:03:53 lr 0.000886 wd 0.0500 
time 0.4580 (0.4724) data time 0.0008 (0.0038) model time 0.4572 (0.4661) loss 1.8076 (3.0414) grad_norm 1.2498 (1.5609) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][140/625] eta 0:03:48 lr 0.000886 wd 0.0500 time 0.4607 (0.4717) data time 0.0009 (0.0037) model time 0.4598 (0.4657) loss 3.2525 (3.0424) grad_norm 1.7730 (1.5645) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][150/625] eta 0:03:43 lr 0.000886 wd 0.0500 time 0.4703 (0.4713) data time 0.0011 (0.0035) model time 0.4692 (0.4654) loss 2.7750 (3.0400) grad_norm 1.3265 (1.5853) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][160/625] eta 0:03:38 lr 0.000886 wd 0.0500 time 0.4741 (0.4709) data time 0.0010 (0.0033) model time 0.4731 (0.4653) loss 3.1667 (3.0474) grad_norm 1.7991 (1.5901) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][170/625] eta 0:03:34 lr 0.000886 wd 0.0500 time 0.4651 (0.4706) data time 0.0007 (0.0032) model time 0.4644 (0.4652) loss 2.9624 (3.0620) grad_norm 2.0613 (1.5942) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][180/625] eta 0:03:29 lr 0.000886 wd 0.0500 time 0.4621 (0.4701) data time 0.0010 (0.0031) model time 0.4611 (0.4649) loss 2.9556 (3.0598) grad_norm 1.2877 (1.5895) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][190/625] eta 0:03:24 lr 0.000886 wd 0.0500 time 0.4651 (0.4697) data time 0.0008 (0.0030) model time 0.4643 (0.4646) loss 1.9435 (3.0545) grad_norm 1.7486 (1.5979) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:23 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [116/300][200/625] eta 0:03:19 lr 0.000886 wd 0.0500 time 0.4644 (0.4693) data time 0.0010 (0.0029) model time 0.4634 (0.4644) loss 3.2719 (3.0466) grad_norm 1.8755 (1.6084) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][210/625] eta 0:03:14 lr 0.000886 wd 0.0500 time 0.4663 (0.4691) data time 0.0010 (0.0028) model time 0.4653 (0.4643) loss 3.2314 (3.0566) grad_norm 1.3555 (1.6072) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][220/625] eta 0:03:10 lr 0.000885 wd 0.0500 time 0.4666 (0.4696) data time 0.0008 (0.0027) model time 0.4658 (0.4652) loss 2.3052 (3.0667) grad_norm 1.4556 (1.5986) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][230/625] eta 0:03:05 lr 0.000885 wd 0.0500 time 0.4654 (0.4698) data time 0.0008 (0.0026) model time 0.4646 (0.4657) loss 3.4832 (3.0709) grad_norm 1.2280 (1.5911) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][240/625] eta 0:03:01 lr 0.000885 wd 0.0500 time 0.4102 (0.4703) data time 0.0009 (0.0026) model time 0.4094 (0.4665) loss 2.6976 (3.0742) grad_norm 1.3861 (1.5961) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][250/625] eta 0:02:56 lr 0.000885 wd 0.0500 time 0.4664 (0.4700) data time 0.0007 (0.0025) model time 0.4657 (0.4662) loss 3.5102 (3.0667) grad_norm 1.5149 (1.5899) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][260/625] eta 0:02:51 lr 0.000885 wd 0.0500 time 0.4630 (0.4701) data time 0.0008 (0.0024) model time 0.4622 (0.4665) loss 3.4693 (3.0683) grad_norm 1.5429 
(1.6013) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:10:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][270/625] eta 0:02:46 lr 0.000885 wd 0.0500 time 0.4555 (0.4698) data time 0.0010 (0.0024) model time 0.4546 (0.4662) loss 3.4012 (3.0761) grad_norm 1.3710 (1.5976) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][280/625] eta 0:02:41 lr 0.000885 wd 0.0500 time 0.4591 (0.4695) data time 0.0007 (0.0023) model time 0.4584 (0.4659) loss 3.2522 (3.0762) grad_norm 1.4687 (1.5882) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][290/625] eta 0:02:37 lr 0.000885 wd 0.0500 time 0.4632 (0.4693) data time 0.0007 (0.0023) model time 0.4624 (0.4658) loss 2.7112 (3.0740) grad_norm 1.4099 (1.5841) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][300/625] eta 0:02:32 lr 0.000885 wd 0.0500 time 0.4692 (0.4692) data time 0.0010 (0.0023) model time 0.4681 (0.4658) loss 3.6527 (3.0798) grad_norm 1.6605 (1.5871) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][310/625] eta 0:02:27 lr 0.000885 wd 0.0500 time 0.4675 (0.4690) data time 0.0010 (0.0022) model time 0.4665 (0.4657) loss 3.0734 (3.0710) grad_norm 2.0130 (1.5883) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][320/625] eta 0:02:23 lr 0.000884 wd 0.0500 time 0.4587 (0.4689) data time 0.0009 (0.0022) model time 0.4578 (0.4656) loss 3.3038 (3.0745) grad_norm 1.4016 (1.5973) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][330/625] eta 0:02:18 lr 0.000884 wd 0.0500 time 0.4605 (0.4687) data 
time 0.0009 (0.0022) model time 0.4596 (0.4655) loss 2.8898 (3.0712) grad_norm 1.1418 (1.5989) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][340/625] eta 0:02:13 lr 0.000884 wd 0.0500 time 0.4640 (0.4686) data time 0.0008 (0.0021) model time 0.4632 (0.4655) loss 3.3845 (3.0750) grad_norm 1.2215 (1.5932) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][350/625] eta 0:02:09 lr 0.000884 wd 0.0500 time 0.4680 (0.4697) data time 0.0010 (0.0021) model time 0.4670 (0.4668) loss 3.2222 (3.0780) grad_norm 1.5663 (1.5881) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][360/625] eta 0:02:04 lr 0.000884 wd 0.0500 time 0.4683 (0.4696) data time 0.0008 (0.0021) model time 0.4675 (0.4668) loss 3.8126 (3.0722) grad_norm 1.5041 (1.5977) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][370/625] eta 0:01:59 lr 0.000884 wd 0.0500 time 0.4612 (0.4695) data time 0.0011 (0.0020) model time 0.4601 (0.4667) loss 3.1963 (3.0718) grad_norm 2.1098 (1.6086) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][380/625] eta 0:01:55 lr 0.000884 wd 0.0500 time 0.4642 (0.4695) data time 0.0010 (0.0020) model time 0.4632 (0.4667) loss 2.5101 (3.0792) grad_norm 1.3595 (1.6091) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][390/625] eta 0:01:50 lr 0.000884 wd 0.0500 time 0.4726 (0.4698) data time 0.0009 (0.0020) model time 0.4716 (0.4672) loss 3.6982 (3.0820) grad_norm 1.6617 (1.6033) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [116/300][400/625] eta 0:01:45 lr 0.000884 wd 0.0500 time 0.4692 (0.4699) data time 0.0008 (0.0020) model time 0.4684 (0.4673) loss 2.7865 (3.0795) grad_norm 1.9128 (1.6052) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][410/625] eta 0:01:41 lr 0.000884 wd 0.0500 time 0.4566 (0.4698) data time 0.0012 (0.0019) model time 0.4554 (0.4672) loss 3.1341 (3.0818) grad_norm 1.6407 (1.6039) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][420/625] eta 0:01:36 lr 0.000884 wd 0.0500 time 0.4635 (0.4697) data time 0.0008 (0.0020) model time 0.4626 (0.4671) loss 2.4496 (3.0831) grad_norm 2.1243 (1.6096) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][430/625] eta 0:01:31 lr 0.000883 wd 0.0500 time 0.4653 (0.4696) data time 0.0010 (0.0019) model time 0.4643 (0.4671) loss 3.3904 (3.0801) grad_norm 1.9313 (1.6126) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][440/625] eta 0:01:26 lr 0.000883 wd 0.0500 time 0.4631 (0.4700) data time 0.0008 (0.0019) model time 0.4623 (0.4675) loss 2.6570 (3.0787) grad_norm 1.8261 (1.6175) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][450/625] eta 0:01:22 lr 0.000883 wd 0.0500 time 0.4665 (0.4702) data time 0.0008 (0.0019) model time 0.4657 (0.4678) loss 2.2358 (3.0772) grad_norm 1.4592 (1.6182) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][460/625] eta 0:01:17 lr 0.000883 wd 0.0500 time 0.4686 (0.4701) data time 0.0008 (0.0019) model time 0.4678 (0.4677) loss 4.0566 (3.0737) grad_norm 1.2637 (1.6147) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 11:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][470/625] eta 0:01:12 lr 0.000883 wd 0.0500 time 0.4638 (0.4700) data time 0.0010 (0.0019) model time 0.4628 (0.4676) loss 2.8324 (3.0756) grad_norm 1.4380 (1.6130) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][480/625] eta 0:01:08 lr 0.000883 wd 0.0500 time 0.4627 (0.4703) data time 0.0008 (0.0019) model time 0.4619 (0.4679) loss 3.9192 (3.0785) grad_norm 1.6777 (1.6096) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][490/625] eta 0:01:03 lr 0.000883 wd 0.0500 time 0.4623 (0.4701) data time 0.0008 (0.0018) model time 0.4615 (0.4678) loss 2.9950 (3.0749) grad_norm 1.5915 (1.6077) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][500/625] eta 0:00:58 lr 0.000883 wd 0.0500 time 0.4550 (0.4709) data time 0.0008 (0.0018) model time 0.4542 (0.4687) loss 2.1017 (3.0695) grad_norm 2.1491 (1.6107) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][510/625] eta 0:00:54 lr 0.000883 wd 0.0500 time 0.4676 (0.4707) data time 0.0008 (0.0018) model time 0.4668 (0.4685) loss 3.7753 (3.0708) grad_norm 1.6940 (1.6120) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][520/625] eta 0:00:49 lr 0.000883 wd 0.0500 time 0.4655 (0.4706) data time 0.0008 (0.0018) model time 0.4647 (0.4685) loss 2.0838 (3.0623) grad_norm 1.7466 (1.6143) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:12:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][530/625] eta 0:00:44 lr 0.000882 wd 0.0500 time 0.4695 (0.4706) data time 0.0008 (0.0018) model 
time 0.4687 (0.4684) loss 3.1867 (3.0599) grad_norm 1.6019 (1.6125) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][540/625] eta 0:00:39 lr 0.000882 wd 0.0500 time 0.4693 (0.4705) data time 0.0007 (0.0018) model time 0.4686 (0.4683) loss 3.4820 (3.0602) grad_norm 1.3462 (1.6119) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][550/625] eta 0:00:35 lr 0.000882 wd 0.0500 time 0.4614 (0.4704) data time 0.0008 (0.0017) model time 0.4606 (0.4682) loss 2.0832 (3.0619) grad_norm 1.2670 (1.6099) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][560/625] eta 0:00:30 lr 0.000882 wd 0.0500 time 0.4596 (0.4702) data time 0.0011 (0.0017) model time 0.4585 (0.4681) loss 3.3057 (3.0641) grad_norm 1.6088 (1.6095) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][570/625] eta 0:00:25 lr 0.000882 wd 0.0500 time 0.4586 (0.4700) data time 0.0008 (0.0017) model time 0.4578 (0.4679) loss 2.4103 (3.0633) grad_norm 1.5945 (1.6068) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][580/625] eta 0:00:21 lr 0.000882 wd 0.0500 time 0.4647 (0.4699) data time 0.0009 (0.0017) model time 0.4638 (0.4678) loss 3.0224 (3.0640) grad_norm 1.1953 (1.6039) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][590/625] eta 0:00:16 lr 0.000882 wd 0.0500 time 0.4687 (0.4705) data time 0.0010 (0.0017) model time 0.4676 (0.4684) loss 3.3338 (3.0640) grad_norm 1.6217 (1.5993) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][600/625] 
eta 0:00:11 lr 0.000882 wd 0.0500 time 0.4630 (0.4704) data time 0.0011 (0.0017) model time 0.4619 (0.4684) loss 3.2045 (3.0639) grad_norm 1.6400 (1.6034) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][610/625] eta 0:00:07 lr 0.000882 wd 0.0500 time 0.4600 (0.4703) data time 0.0005 (0.0017) model time 0.4594 (0.4683) loss 2.9010 (3.0679) grad_norm 1.6132 (1.6021) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [116/300][620/625] eta 0:00:02 lr 0.000882 wd 0.0500 time 0.4597 (0.4702) data time 0.0005 (0.0017) model time 0.4592 (0.4681) loss 3.8075 (3.0729) grad_norm 1.1442 (1.5988) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 116 training takes 0:04:53 [2024-08-10 11:13:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:13:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 0.5830 (0.5830) Acc@1 87.402 (87.402) Acc@5 98.047 (98.047) Mem 16715MB [2024-08-10 11:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9321 (0.6934) Acc@1 77.002 (84.579) Acc@5 95.654 (97.288) Mem 16715MB [2024-08-10 11:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0205 (0.8152) Acc@1 76.074 (81.562) Acc@5 94.141 (95.896) Mem 16715MB [2024-08-10 11:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.290 Acc@5 95.891 [2024-08-10 11:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-08-10 11:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.29% [2024-08-10 11:13:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 11:13:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 11:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.530 (0.530) Loss 0.4924 (0.4924) Acc@1 89.062 (89.062) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 11:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.163) Loss 0.7949 (0.6163) Acc@1 80.713 (86.337) Acc@5 96.338 (97.749) Mem 16715MB [2024-08-10 11:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 0.9058 (0.7261) Acc@1 77.344 (83.310) Acc@5 95.605 (96.598) Mem 16715MB [2024-08-10 11:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.013 Acc@5 96.617 [2024-08-10 11:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-10 11:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.01% [2024-08-10 11:13:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:13:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][0/625] eta 0:08:19 lr 0.000882 wd 0.0500 time 0.7988 (0.7988) data time 0.3884 (0.3884) model time 0.0000 (0.0000) loss 1.8902 (1.8902) grad_norm 1.1570 (1.1570) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][10/625] eta 0:05:03 lr 0.000881 wd 0.0500 time 0.4631 (0.4942) data time 0.0011 (0.0364) model time 0.0000 (0.0000) loss 3.0736 (2.8201) grad_norm 1.5682 (1.8564) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][20/625] eta 0:04:50 lr 0.000881 wd 0.0500 time 0.4658 (0.4802) data time 0.0010 (0.0196) model time 0.0000 (0.0000) loss 3.1646 (2.9700) grad_norm 1.1329 (1.8529) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][30/625] eta 0:04:42 lr 0.000881 wd 0.0500 time 0.4642 (0.4751) data time 0.0011 (0.0137) model time 0.0000 (0.0000) loss 2.5048 (3.0082) grad_norm 1.8351 (1.7610) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][40/625] eta 0:04:36 lr 0.000881 wd 0.0500 time 0.4627 (0.4726) data time 0.0011 (0.0106) model time 0.0000 (0.0000) loss 3.2215 (3.0567) grad_norm 1.1314 (1.6912) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][50/625] eta 0:04:30 lr 0.000881 wd 0.0500 time 0.4583 (0.4706) data time 0.0008 (0.0087) model time 0.0000 (0.0000) loss 2.9210 (3.0583) grad_norm 1.8125 (1.6614) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][60/625] eta 0:04:25 lr 0.000881 wd 0.0500 time 0.4602 (0.4692) data time 0.0008 (0.0075) model time 0.4594 (0.4611) loss 2.8902 
(3.0326) grad_norm 2.2835 (1.7470) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][70/625] eta 0:04:20 lr 0.000881 wd 0.0500 time 0.4664 (0.4685) data time 0.0011 (0.0066) model time 0.4654 (0.4622) loss 3.5347 (3.0715) grad_norm 1.3095 (1.7124) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][80/625] eta 0:04:14 lr 0.000881 wd 0.0500 time 0.4607 (0.4677) data time 0.0008 (0.0059) model time 0.4599 (0.4616) loss 3.4211 (3.0856) grad_norm 1.6804 (1.6951) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][90/625] eta 0:04:11 lr 0.000881 wd 0.0500 time 0.4632 (0.4699) data time 0.0010 (0.0054) model time 0.4622 (0.4679) loss 3.5847 (3.0850) grad_norm 1.4586 (1.6956) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][100/625] eta 0:04:06 lr 0.000881 wd 0.0500 time 0.4675 (0.4695) data time 0.0008 (0.0049) model time 0.4667 (0.4674) loss 3.6685 (3.0904) grad_norm 4.3879 (1.7312) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][110/625] eta 0:04:01 lr 0.000881 wd 0.0500 time 0.4668 (0.4693) data time 0.0007 (0.0046) model time 0.4661 (0.4671) loss 2.6547 (3.0896) grad_norm 1.7113 (1.7258) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][120/625] eta 0:03:56 lr 0.000880 wd 0.0500 time 0.4632 (0.4690) data time 0.0011 (0.0043) model time 0.4622 (0.4667) loss 3.4118 (3.0795) grad_norm 1.4912 (1.7212) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:14:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][130/625] eta 0:03:53 lr 0.000880 wd 0.0500 
time 0.4602 (0.4715) data time 0.0010 (0.0041) model time 0.4591 (0.4710) loss 2.4599 (3.0899) grad_norm 1.0398 (1.6986) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][140/625] eta 0:03:48 lr 0.000880 wd 0.0500 time 0.4591 (0.4709) data time 0.0010 (0.0038) model time 0.4580 (0.4699) loss 3.1565 (3.0829) grad_norm 1.3091 (1.6839) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][150/625] eta 0:03:43 lr 0.000880 wd 0.0500 time 0.4622 (0.4704) data time 0.0010 (0.0037) model time 0.4612 (0.4692) loss 2.3229 (3.0894) grad_norm 1.3784 (1.6787) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][160/625] eta 0:03:38 lr 0.000880 wd 0.0500 time 0.4714 (0.4699) data time 0.0010 (0.0035) model time 0.4704 (0.4685) loss 3.3888 (3.0856) grad_norm 1.6430 (1.6811) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][170/625] eta 0:03:33 lr 0.000880 wd 0.0500 time 0.4639 (0.4696) data time 0.0008 (0.0034) model time 0.4631 (0.4681) loss 3.5821 (3.0861) grad_norm 1.4538 (1.6715) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][180/625] eta 0:03:28 lr 0.000880 wd 0.0500 time 0.4578 (0.4693) data time 0.0008 (0.0032) model time 0.4570 (0.4677) loss 3.5564 (3.0995) grad_norm 1.4370 (1.6579) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][190/625] eta 0:03:24 lr 0.000880 wd 0.0500 time 0.4687 (0.4691) data time 0.0011 (0.0031) model time 0.4676 (0.4675) loss 3.0074 (3.1259) grad_norm 1.7064 (1.6561) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:29 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [117/300][200/625] eta 0:03:19 lr 0.000880 wd 0.0500 time 0.4599 (0.4689) data time 0.0009 (0.0030) model time 0.4590 (0.4672) loss 3.4956 (3.1373) grad_norm 1.7057 (1.6426) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][210/625] eta 0:03:14 lr 0.000880 wd 0.0500 time 0.4614 (0.4685) data time 0.0012 (0.0029) model time 0.4602 (0.4667) loss 3.1649 (3.1387) grad_norm 1.3972 (1.6440) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][220/625] eta 0:03:09 lr 0.000880 wd 0.0500 time 0.4598 (0.4681) data time 0.0008 (0.0028) model time 0.4591 (0.4663) loss 2.1118 (3.1367) grad_norm 1.0649 (1.6579) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][230/625] eta 0:03:04 lr 0.000879 wd 0.0500 time 0.4608 (0.4679) data time 0.0008 (0.0028) model time 0.4600 (0.4660) loss 2.5925 (3.1227) grad_norm 1.3122 (1.6604) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][240/625] eta 0:03:00 lr 0.000879 wd 0.0500 time 0.4622 (0.4677) data time 0.0008 (0.0027) model time 0.4613 (0.4658) loss 3.6451 (3.1121) grad_norm 1.0746 (1.6499) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][250/625] eta 0:02:55 lr 0.000879 wd 0.0500 time 0.4601 (0.4675) data time 0.0008 (0.0026) model time 0.4593 (0.4657) loss 1.6144 (3.1032) grad_norm 1.7579 (1.6446) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][260/625] eta 0:02:50 lr 0.000879 wd 0.0500 time 0.4610 (0.4674) data time 0.0010 (0.0026) model time 0.4599 (0.4656) loss 3.4198 (3.1026) grad_norm 1.3453 
(1.6320) loss_scale 4096.0000 (2095.0805) mem 16715MB [2024-08-10 11:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][270/625] eta 0:02:45 lr 0.000879 wd 0.0500 time 0.4540 (0.4673) data time 0.0009 (0.0025) model time 0.4532 (0.4654) loss 3.5513 (3.1066) grad_norm 2.3335 (1.6325) loss_scale 4096.0000 (2168.9151) mem 16715MB [2024-08-10 11:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][280/625] eta 0:02:41 lr 0.000879 wd 0.0500 time 0.4611 (0.4671) data time 0.0010 (0.0025) model time 0.4601 (0.4653) loss 3.3881 (3.0940) grad_norm 1.2001 (1.6285) loss_scale 4096.0000 (2237.4947) mem 16715MB [2024-08-10 11:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][290/625] eta 0:02:36 lr 0.000879 wd 0.0500 time 0.4610 (0.4669) data time 0.0011 (0.0024) model time 0.4600 (0.4651) loss 3.1853 (3.0895) grad_norm 2.9266 (1.6379) loss_scale 4096.0000 (2301.3608) mem 16715MB [2024-08-10 11:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][300/625] eta 0:02:31 lr 0.000879 wd 0.0500 time 0.4627 (0.4667) data time 0.0008 (0.0024) model time 0.4619 (0.4648) loss 3.1660 (3.0933) grad_norm 1.3582 (1.6379) loss_scale 4096.0000 (2360.9834) mem 16715MB [2024-08-10 11:16:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][310/625] eta 0:02:27 lr 0.000879 wd 0.0500 time 0.4631 (0.4671) data time 0.0008 (0.0023) model time 0.4623 (0.4654) loss 3.3117 (3.0972) grad_norm 1.1855 (1.6323) loss_scale 4096.0000 (2416.7717) mem 16715MB [2024-08-10 11:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][320/625] eta 0:02:22 lr 0.000879 wd 0.0500 time 0.4684 (0.4671) data time 0.0010 (0.0023) model time 0.4673 (0.4653) loss 2.8781 (3.0973) grad_norm 1.3294 (1.6296) loss_scale 4096.0000 (2469.0841) mem 16715MB [2024-08-10 11:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][330/625] eta 0:02:17 lr 0.000878 wd 0.0500 time 0.4667 (0.4670) data 
time 0.0008 (0.0022) model time 0.4659 (0.4653) loss 2.0873 (3.0962) grad_norm 1.2749 (inf) loss_scale 2048.0000 (2462.5498) mem 16715MB [2024-08-10 11:16:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][340/625] eta 0:02:13 lr 0.000878 wd 0.0500 time 0.4489 (0.4674) data time 0.0009 (0.0022) model time 0.4480 (0.4657) loss 2.1747 (3.0968) grad_norm 1.4077 (inf) loss_scale 2048.0000 (2450.3930) mem 16715MB [2024-08-10 11:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][350/625] eta 0:02:08 lr 0.000878 wd 0.0500 time 0.4600 (0.4673) data time 0.0011 (0.0022) model time 0.4589 (0.4656) loss 3.6854 (3.0918) grad_norm 1.3154 (inf) loss_scale 2048.0000 (2438.9288) mem 16715MB [2024-08-10 11:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][360/625] eta 0:02:03 lr 0.000878 wd 0.0500 time 0.4659 (0.4671) data time 0.0010 (0.0021) model time 0.4649 (0.4655) loss 3.0006 (3.0869) grad_norm 2.0410 (inf) loss_scale 2048.0000 (2428.0997) mem 16715MB [2024-08-10 11:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][370/625] eta 0:01:59 lr 0.000878 wd 0.0500 time 0.4653 (0.4670) data time 0.0010 (0.0021) model time 0.4643 (0.4654) loss 2.4547 (3.0872) grad_norm 1.6926 (inf) loss_scale 2048.0000 (2417.8544) mem 16715MB [2024-08-10 11:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][380/625] eta 0:01:54 lr 0.000878 wd 0.0500 time 0.4672 (0.4669) data time 0.0010 (0.0021) model time 0.4662 (0.4653) loss 3.4290 (3.0843) grad_norm 1.7445 (inf) loss_scale 2048.0000 (2408.1470) mem 16715MB [2024-08-10 11:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][390/625] eta 0:01:49 lr 0.000878 wd 0.0500 time 0.4674 (0.4669) data time 0.0008 (0.0021) model time 0.4666 (0.4653) loss 3.3004 (3.0863) grad_norm 2.9044 (inf) loss_scale 2048.0000 (2398.9361) mem 16715MB [2024-08-10 11:17:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[117/300][400/625] eta 0:01:45 lr 0.000878 wd 0.0500 time 0.4653 (0.4669) data time 0.0007 (0.0020) model time 0.4646 (0.4653) loss 3.5066 (3.0893) grad_norm 1.9776 (inf) loss_scale 2048.0000 (2390.1845) mem 16715MB [2024-08-10 11:17:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][410/625] eta 0:01:40 lr 0.000878 wd 0.0500 time 0.4622 (0.4668) data time 0.0009 (0.0020) model time 0.4614 (0.4652) loss 3.4547 (3.0916) grad_norm 1.3058 (inf) loss_scale 2048.0000 (2381.8589) mem 16715MB [2024-08-10 11:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][420/625] eta 0:01:35 lr 0.000878 wd 0.0500 time 0.4635 (0.4667) data time 0.0008 (0.0020) model time 0.4626 (0.4651) loss 3.6873 (3.0936) grad_norm 1.1987 (inf) loss_scale 2048.0000 (2373.9287) mem 16715MB [2024-08-10 11:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][430/625] eta 0:01:31 lr 0.000878 wd 0.0500 time 0.4607 (0.4672) data time 0.0011 (0.0020) model time 0.4596 (0.4657) loss 1.9228 (3.0937) grad_norm 2.0050 (inf) loss_scale 2048.0000 (2366.3666) mem 16715MB [2024-08-10 11:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][440/625] eta 0:01:26 lr 0.000877 wd 0.0500 time 0.4631 (0.4671) data time 0.0008 (0.0019) model time 0.4623 (0.4656) loss 3.3963 (3.0895) grad_norm 2.7440 (inf) loss_scale 2048.0000 (2359.1474) mem 16715MB [2024-08-10 11:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][450/625] eta 0:01:21 lr 0.000877 wd 0.0500 time 0.4606 (0.4670) data time 0.0008 (0.0019) model time 0.4597 (0.4655) loss 3.7870 (3.0948) grad_norm 4.7272 (inf) loss_scale 2048.0000 (2352.2483) mem 16715MB [2024-08-10 11:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][460/625] eta 0:01:17 lr 0.000877 wd 0.0500 time 0.4644 (0.4670) data time 0.0007 (0.0019) model time 0.4637 (0.4654) loss 1.8027 (3.0931) grad_norm 1.2003 (inf) loss_scale 2048.0000 (2345.6486) mem 16715MB [2024-08-10 
11:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][470/625] eta 0:01:12 lr 0.000877 wd 0.0500 time 0.4709 (0.4678) data time 0.0009 (0.0019) model time 0.4700 (0.4663) loss 3.3782 (3.0937) grad_norm 1.3051 (inf) loss_scale 2048.0000 (2339.3291) mem 16715MB [2024-08-10 11:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][480/625] eta 0:01:07 lr 0.000877 wd 0.0500 time 0.4718 (0.4677) data time 0.0008 (0.0019) model time 0.4710 (0.4663) loss 3.6801 (3.0956) grad_norm 1.3165 (inf) loss_scale 2048.0000 (2333.2723) mem 16715MB [2024-08-10 11:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][490/625] eta 0:01:03 lr 0.000877 wd 0.0500 time 0.4646 (0.4677) data time 0.0010 (0.0019) model time 0.4636 (0.4663) loss 3.3939 (3.0944) grad_norm 1.3533 (inf) loss_scale 2048.0000 (2327.4623) mem 16715MB [2024-08-10 11:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][500/625] eta 0:00:58 lr 0.000877 wd 0.0500 time 0.4673 (0.4680) data time 0.0011 (0.0018) model time 0.4662 (0.4667) loss 3.4989 (3.0939) grad_norm 1.4308 (inf) loss_scale 2048.0000 (2321.8842) mem 16715MB [2024-08-10 11:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][510/625] eta 0:00:53 lr 0.000877 wd 0.0500 time 0.4664 (0.4679) data time 0.0008 (0.0018) model time 0.4656 (0.4666) loss 3.7569 (3.0897) grad_norm 1.5920 (inf) loss_scale 2048.0000 (2316.5245) mem 16715MB [2024-08-10 11:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][520/625] eta 0:00:49 lr 0.000877 wd 0.0500 time 0.4621 (0.4679) data time 0.0010 (0.0018) model time 0.4611 (0.4665) loss 3.2870 (3.0846) grad_norm 1.7154 (inf) loss_scale 2048.0000 (2311.3704) mem 16715MB [2024-08-10 11:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][530/625] eta 0:00:44 lr 0.000877 wd 0.0500 time 0.4661 (0.4679) data time 0.0008 (0.0018) model time 0.4653 (0.4666) loss 2.8094 (3.0857) grad_norm 
1.1960 (inf) loss_scale 2048.0000 (2306.4105) mem 16715MB [2024-08-10 11:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][540/625] eta 0:00:39 lr 0.000876 wd 0.0500 time 0.4572 (0.4678) data time 0.0011 (0.0018) model time 0.4561 (0.4665) loss 2.8516 (3.0871) grad_norm 1.4859 (inf) loss_scale 2048.0000 (2301.6340) mem 16715MB [2024-08-10 11:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][550/625] eta 0:00:35 lr 0.000876 wd 0.0500 time 0.4724 (0.4678) data time 0.0010 (0.0018) model time 0.4714 (0.4665) loss 2.6999 (3.0864) grad_norm 1.1283 (inf) loss_scale 2048.0000 (2297.0309) mem 16715MB [2024-08-10 11:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][560/625] eta 0:00:30 lr 0.000876 wd 0.0500 time 0.4631 (0.4678) data time 0.0010 (0.0018) model time 0.4621 (0.4664) loss 2.1058 (3.0802) grad_norm 1.4750 (inf) loss_scale 2048.0000 (2292.5918) mem 16715MB [2024-08-10 11:18:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][570/625] eta 0:00:25 lr 0.000876 wd 0.0500 time 0.4588 (0.4677) data time 0.0008 (0.0018) model time 0.4580 (0.4663) loss 3.2872 (3.0771) grad_norm 2.7549 (inf) loss_scale 2048.0000 (2288.3082) mem 16715MB [2024-08-10 11:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][580/625] eta 0:00:21 lr 0.000876 wd 0.0500 time 0.4627 (0.4676) data time 0.0008 (0.0017) model time 0.4619 (0.4662) loss 3.2483 (3.0781) grad_norm 1.6139 (inf) loss_scale 2048.0000 (2284.1721) mem 16715MB [2024-08-10 11:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][590/625] eta 0:00:16 lr 0.000876 wd 0.0500 time 0.4646 (0.4675) data time 0.0008 (0.0017) model time 0.4638 (0.4661) loss 1.9825 (3.0759) grad_norm 1.4989 (inf) loss_scale 2048.0000 (2280.1760) mem 16715MB [2024-08-10 11:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][600/625] eta 0:00:11 lr 0.000876 wd 0.0500 time 0.4615 (0.4674) data time 0.0008 
(0.0017) model time 0.4608 (0.4661) loss 3.2767 (3.0752) grad_norm 1.3125 (inf) loss_scale 2048.0000 (2276.3128) mem 16715MB [2024-08-10 11:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][610/625] eta 0:00:07 lr 0.000876 wd 0.0500 time 0.4621 (0.4674) data time 0.0007 (0.0017) model time 0.4614 (0.4660) loss 3.1106 (3.0761) grad_norm 1.2847 (inf) loss_scale 2048.0000 (2272.5761) mem 16715MB [2024-08-10 11:18:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [117/300][620/625] eta 0:00:02 lr 0.000876 wd 0.0500 time 0.4606 (0.4676) data time 0.0005 (0.0017) model time 0.4600 (0.4663) loss 2.2787 (3.0770) grad_norm 1.6736 (inf) loss_scale 2048.0000 (2268.9597) mem 16715MB [2024-08-10 11:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 117 training takes 0:04:52 [2024-08-10 11:18:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:18:49 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.579 (0.579) Loss 0.5532 (0.5532) Acc@1 87.061 (87.061) Acc@5 98.145 (98.145) Mem 16715MB
[2024-08-10 11:18:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.169) Loss 0.8926 (0.6863) Acc@1 78.027 (84.544) Acc@5 95.459 (97.195) Mem 16715MB
[2024-08-10 11:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.145) Loss 1.0166 (0.8184) Acc@1 74.854 (81.341) Acc@5 94.336 (95.766) Mem 16715MB
[2024-08-10 11:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.052 Acc@5 95.745
[2024-08-10 11:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1%
[2024-08-10 11:18:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.939 (0.939) Loss 0.4910 (0.4910) Acc@1 89.062 (89.062) Acc@5 98.584 (98.584) Mem 16715MB
[2024-08-10 11:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.200) Loss 0.7925 (0.6152) Acc@1 80.566 (86.315) Acc@5 96.533 (97.772) Mem 16715MB
[2024-08-10 11:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.161) Loss 0.9048 (0.7251) Acc@1 77.441 (83.329) Acc@5 95.361 (96.619) Mem 16715MB
[2024-08-10 11:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.043 Acc@5 96.639
[2024-08-10 11:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.0%
[2024-08-10 11:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.04%
[2024-08-10 11:18:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 11:18:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 11:18:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][0/625] eta 0:08:52 lr 0.000876 wd 0.0500 time 0.8512 (0.8512) data time 0.4417 (0.4417) model time 0.0000 (0.0000) loss 3.7752 (3.7752) grad_norm 0.9488 (0.9488) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:19:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][10/625] eta 0:05:06 lr 0.000876 wd 0.0500 time 0.4667 (0.4990) data time 0.0008 (0.0411) model time 0.0000 (0.0000) loss 3.6193 (2.8557) grad_norm 1.4398 (1.2837) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:19:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][20/625] eta 0:04:51 lr 0.000875 wd 0.0500 time 0.4676 (0.4818) data time 0.0008 (0.0220) model time 0.0000 (0.0000) loss 3.1642 (2.9823) grad_norm 1.2508 (1.3612) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:19:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][30/625] eta 0:04:45 lr 0.000875 wd 0.0500 time 0.4533 (0.4792) data time 0.0007 (0.0152) model time 0.0000 (0.0000) loss 3.4362 (3.1463) grad_norm inf (inf) loss_scale 1024.0000 (2014.9677) mem 16715MB [2024-08-10 11:19:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][40/625] eta 0:04:38 lr 0.000875 wd 0.0500 time 0.4573 (0.4756) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 3.4777 (3.1725) grad_norm 1.2386 (inf) loss_scale 1024.0000 (1773.2683) mem 16715MB [2024-08-10 11:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][50/625] eta 0:04:35 lr 0.000875 wd 0.0500 time 0.4774 (0.4796) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 2.9949 (3.1904) grad_norm 1.9330 (inf) loss_scale 1024.0000 (1626.3529) mem 16715MB [2024-08-10 11:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][60/625] eta 0:04:31 lr 0.000875 wd 0.0500 time 0.4679 (0.4812) data time 0.0008 (0.0085) model time 0.4671 (0.4878) loss 3.1601 (3.1753) 
grad_norm 1.3080 (inf) loss_scale 1024.0000 (1527.6066) mem 16715MB [2024-08-10 11:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][70/625] eta 0:04:25 lr 0.000875 wd 0.0500 time 0.4636 (0.4792) data time 0.0010 (0.0075) model time 0.4626 (0.4771) loss 3.1318 (3.1349) grad_norm 1.7576 (inf) loss_scale 1024.0000 (1456.6761) mem 16715MB [2024-08-10 11:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][80/625] eta 0:04:20 lr 0.000875 wd 0.0500 time 0.4655 (0.4776) data time 0.0008 (0.0067) model time 0.4647 (0.4731) loss 2.1603 (3.1209) grad_norm 1.3311 (inf) loss_scale 1024.0000 (1403.2593) mem 16715MB [2024-08-10 11:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][90/625] eta 0:04:14 lr 0.000875 wd 0.0500 time 0.4672 (0.4763) data time 0.0007 (0.0060) model time 0.4664 (0.4710) loss 3.5867 (3.1028) grad_norm 1.3922 (inf) loss_scale 1024.0000 (1361.5824) mem 16715MB [2024-08-10 11:19:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][100/625] eta 0:04:09 lr 0.000875 wd 0.0500 time 0.4625 (0.4752) data time 0.0011 (0.0056) model time 0.4614 (0.4695) loss 3.4913 (3.1109) grad_norm 1.1756 (inf) loss_scale 1024.0000 (1328.1584) mem 16715MB [2024-08-10 11:19:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][110/625] eta 0:04:04 lr 0.000875 wd 0.0500 time 0.4658 (0.4750) data time 0.0010 (0.0052) model time 0.4648 (0.4699) loss 3.1968 (3.1059) grad_norm 1.7330 (inf) loss_scale 1024.0000 (1300.7568) mem 16715MB [2024-08-10 11:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][120/625] eta 0:03:59 lr 0.000875 wd 0.0500 time 0.4658 (0.4743) data time 0.0008 (0.0048) model time 0.4650 (0.4693) loss 3.5123 (3.1163) grad_norm 1.5285 (inf) loss_scale 1024.0000 (1277.8843) mem 16715MB [2024-08-10 11:20:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][130/625] eta 0:03:54 lr 0.000874 wd 0.0500 time 0.4696 (0.4740) data time 
0.0010 (0.0046) model time 0.4686 (0.4692) loss 3.1324 (3.0975) grad_norm 1.3939 (inf) loss_scale 1024.0000 (1258.5038) mem 16715MB [2024-08-10 11:20:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][140/625] eta 0:03:50 lr 0.000874 wd 0.0500 time 0.4675 (0.4751) data time 0.0011 (0.0043) model time 0.4664 (0.4714) loss 3.2001 (3.0972) grad_norm 1.0478 (inf) loss_scale 1024.0000 (1241.8723) mem 16715MB [2024-08-10 11:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][150/625] eta 0:03:45 lr 0.000874 wd 0.0500 time 0.4658 (0.4745) data time 0.0010 (0.0042) model time 0.4648 (0.4707) loss 3.1567 (3.0942) grad_norm 1.5609 (inf) loss_scale 1024.0000 (1227.4437) mem 16715MB [2024-08-10 11:20:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][160/625] eta 0:03:40 lr 0.000874 wd 0.0500 time 0.4711 (0.4739) data time 0.0011 (0.0040) model time 0.4700 (0.4701) loss 3.3553 (3.0942) grad_norm 1.2150 (inf) loss_scale 1024.0000 (1214.8075) mem 16715MB [2024-08-10 11:20:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][170/625] eta 0:03:35 lr 0.000874 wd 0.0500 time 0.4696 (0.4734) data time 0.0010 (0.0038) model time 0.4686 (0.4696) loss 3.4394 (3.0861) grad_norm 1.4055 (inf) loss_scale 1024.0000 (1203.6491) mem 16715MB [2024-08-10 11:20:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][180/625] eta 0:03:30 lr 0.000874 wd 0.0500 time 0.4607 (0.4730) data time 0.0010 (0.0037) model time 0.4597 (0.4691) loss 3.4389 (3.0923) grad_norm 1.5320 (inf) loss_scale 1024.0000 (1193.7238) mem 16715MB [2024-08-10 11:20:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][190/625] eta 0:03:25 lr 0.000874 wd 0.0500 time 0.4638 (0.4726) data time 0.0010 (0.0036) model time 0.4628 (0.4688) loss 3.1912 (3.0887) grad_norm 1.4171 (inf) loss_scale 1024.0000 (1184.8377) mem 16715MB [2024-08-10 11:20:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][200/625] 
eta 0:03:20 lr 0.000874 wd 0.0500 time 0.4662 (0.4725) data time 0.0008 (0.0034) model time 0.4654 (0.4688) loss 2.7241 (3.0783) grad_norm 1.3899 (inf) loss_scale 1024.0000 (1176.8358) mem 16715MB [2024-08-10 11:20:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][210/625] eta 0:03:16 lr 0.000874 wd 0.0500 time 0.4667 (0.4728) data time 0.0010 (0.0035) model time 0.4657 (0.4693) loss 2.9342 (3.0738) grad_norm 1.3024 (inf) loss_scale 1024.0000 (1169.5924) mem 16715MB [2024-08-10 11:20:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][220/625] eta 0:03:11 lr 0.000874 wd 0.0500 time 0.4620 (0.4725) data time 0.0008 (0.0034) model time 0.4612 (0.4689) loss 2.6574 (3.0741) grad_norm 1.2938 (inf) loss_scale 1024.0000 (1163.0045) mem 16715MB [2024-08-10 11:20:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][230/625] eta 0:03:06 lr 0.000873 wd 0.0500 time 0.4648 (0.4722) data time 0.0011 (0.0033) model time 0.4637 (0.4687) loss 2.0510 (3.0675) grad_norm 1.3222 (inf) loss_scale 1024.0000 (1156.9870) mem 16715MB [2024-08-10 11:20:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][240/625] eta 0:03:01 lr 0.000873 wd 0.0500 time 0.4630 (0.4727) data time 0.0012 (0.0032) model time 0.4618 (0.4694) loss 2.8844 (3.0705) grad_norm 1.9746 (inf) loss_scale 1024.0000 (1151.4689) mem 16715MB [2024-08-10 11:20:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][250/625] eta 0:02:57 lr 0.000873 wd 0.0500 time 0.4851 (0.4735) data time 0.0008 (0.0031) model time 0.4844 (0.4705) loss 2.0146 (3.0616) grad_norm 1.2989 (inf) loss_scale 1024.0000 (1146.3904) mem 16715MB [2024-08-10 11:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][260/625] eta 0:02:52 lr 0.000873 wd 0.0500 time 0.4665 (0.4733) data time 0.0012 (0.0030) model time 0.4653 (0.4704) loss 2.7018 (3.0753) grad_norm 1.6044 (inf) loss_scale 1024.0000 (1141.7011) mem 16715MB [2024-08-10 11:21:06 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][270/625] eta 0:02:47 lr 0.000873 wd 0.0500 time 0.4604 (0.4731) data time 0.0008 (0.0030) model time 0.4597 (0.4703) loss 3.6255 (3.0859) grad_norm 1.8778 (inf) loss_scale 1024.0000 (1137.3579) mem 16715MB [2024-08-10 11:21:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][280/625] eta 0:02:43 lr 0.000873 wd 0.0500 time 0.6247 (0.4736) data time 0.0008 (0.0029) model time 0.6238 (0.4709) loss 1.9935 (3.0785) grad_norm 1.6795 (inf) loss_scale 1024.0000 (1133.3238) mem 16715MB [2024-08-10 11:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][290/625] eta 0:02:38 lr 0.000873 wd 0.0500 time 0.4713 (0.4734) data time 0.0011 (0.0028) model time 0.4703 (0.4708) loss 2.6033 (3.0688) grad_norm 1.5046 (inf) loss_scale 1024.0000 (1129.5670) mem 16715MB [2024-08-10 11:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][300/625] eta 0:02:33 lr 0.000873 wd 0.0500 time 0.4673 (0.4731) data time 0.0008 (0.0028) model time 0.4665 (0.4704) loss 3.6346 (3.0684) grad_norm 1.2864 (inf) loss_scale 1024.0000 (1126.0598) mem 16715MB [2024-08-10 11:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][310/625] eta 0:02:28 lr 0.000873 wd 0.0500 time 0.4673 (0.4729) data time 0.0010 (0.0027) model time 0.4663 (0.4703) loss 3.2047 (3.0686) grad_norm 1.4721 (inf) loss_scale 1024.0000 (1122.7781) mem 16715MB [2024-08-10 11:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][320/625] eta 0:02:24 lr 0.000873 wd 0.0500 time 0.4833 (0.4727) data time 0.0007 (0.0027) model time 0.4826 (0.4701) loss 2.8520 (3.0643) grad_norm 2.1824 (inf) loss_scale 1024.0000 (1119.7009) mem 16715MB [2024-08-10 11:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][330/625] eta 0:02:19 lr 0.000873 wd 0.0500 time 0.4659 (0.4730) data time 0.0008 (0.0026) model time 0.4651 (0.4705) loss 2.4529 (3.0623) grad_norm 1.4581 (inf) 
loss_scale 1024.0000 (1116.8097) mem 16715MB [2024-08-10 11:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][340/625] eta 0:02:14 lr 0.000872 wd 0.0500 time 0.4645 (0.4728) data time 0.0007 (0.0026) model time 0.4637 (0.4703) loss 2.6547 (3.0538) grad_norm 1.7575 (inf) loss_scale 1024.0000 (1114.0880) mem 16715MB [2024-08-10 11:21:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][350/625] eta 0:02:09 lr 0.000872 wd 0.0500 time 0.4655 (0.4725) data time 0.0008 (0.0025) model time 0.4647 (0.4700) loss 2.1402 (3.0474) grad_norm 1.9510 (inf) loss_scale 1024.0000 (1111.5214) mem 16715MB [2024-08-10 11:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][360/625] eta 0:02:05 lr 0.000872 wd 0.0500 time 0.4654 (0.4723) data time 0.0010 (0.0025) model time 0.4644 (0.4698) loss 2.8641 (3.0395) grad_norm 1.5067 (inf) loss_scale 1024.0000 (1109.0970) mem 16715MB [2024-08-10 11:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][370/625] eta 0:02:00 lr 0.000872 wd 0.0500 time 0.4675 (0.4721) data time 0.0010 (0.0025) model time 0.4665 (0.4697) loss 3.0227 (3.0425) grad_norm 1.0688 (inf) loss_scale 1024.0000 (1106.8032) mem 16715MB [2024-08-10 11:21:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][380/625] eta 0:01:55 lr 0.000872 wd 0.0500 time 0.4666 (0.4719) data time 0.0010 (0.0024) model time 0.4657 (0.4694) loss 1.9219 (3.0377) grad_norm 1.4858 (inf) loss_scale 1024.0000 (1104.6299) mem 16715MB [2024-08-10 11:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][390/625] eta 0:01:50 lr 0.000872 wd 0.0500 time 0.4599 (0.4716) data time 0.0008 (0.0024) model time 0.4591 (0.4692) loss 3.6040 (3.0426) grad_norm 4.4318 (inf) loss_scale 1024.0000 (1102.5678) mem 16715MB [2024-08-10 11:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][400/625] eta 0:01:46 lr 0.000872 wd 0.0500 time 0.4684 (0.4718) data time 0.0010 (0.0024) model 
time 0.4674 (0.4694) loss 3.0037 (3.0400) grad_norm 1.6179 (inf) loss_scale 1024.0000 (1100.6085) mem 16715MB [2024-08-10 11:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][410/625] eta 0:01:41 lr 0.000872 wd 0.0500 time 0.4652 (0.4717) data time 0.0010 (0.0023) model time 0.4642 (0.4693) loss 3.4935 (3.0416) grad_norm 1.1072 (inf) loss_scale 1024.0000 (1098.7445) mem 16715MB [2024-08-10 11:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][420/625] eta 0:01:36 lr 0.000872 wd 0.0500 time 0.4775 (0.4716) data time 0.0008 (0.0023) model time 0.4767 (0.4692) loss 2.8674 (3.0400) grad_norm 1.2481 (inf) loss_scale 1024.0000 (1096.9691) mem 16715MB [2024-08-10 11:22:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][430/625] eta 0:01:31 lr 0.000872 wd 0.0500 time 0.4643 (0.4714) data time 0.0008 (0.0023) model time 0.4635 (0.4691) loss 3.2934 (3.0413) grad_norm 1.3125 (inf) loss_scale 1024.0000 (1095.2761) mem 16715MB [2024-08-10 11:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][440/625] eta 0:01:27 lr 0.000871 wd 0.0500 time 0.4789 (0.4713) data time 0.0010 (0.0022) model time 0.4778 (0.4690) loss 2.6720 (3.0511) grad_norm 1.2285 (inf) loss_scale 1024.0000 (1093.6599) mem 16715MB [2024-08-10 11:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][450/625] eta 0:01:22 lr 0.000871 wd 0.0500 time 0.4642 (0.4712) data time 0.0009 (0.0022) model time 0.4633 (0.4689) loss 3.1249 (3.0483) grad_norm 1.8171 (inf) loss_scale 1024.0000 (1092.1153) mem 16715MB [2024-08-10 11:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][460/625] eta 0:01:17 lr 0.000871 wd 0.0500 time 0.4630 (0.4715) data time 0.0010 (0.0022) model time 0.4620 (0.4692) loss 2.6783 (3.0483) grad_norm 2.4564 (inf) loss_scale 1024.0000 (1090.6377) mem 16715MB [2024-08-10 11:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][470/625] eta 0:01:13 lr 
0.000871 wd 0.0500 time 0.4610 (0.4720) data time 0.0011 (0.0022) model time 0.4599 (0.4699) loss 3.1691 (3.0492) grad_norm 1.3698 (inf) loss_scale 1024.0000 (1089.2229) mem 16715MB [2024-08-10 11:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][480/625] eta 0:01:08 lr 0.000871 wd 0.0500 time 0.4592 (0.4724) data time 0.0008 (0.0021) model time 0.4584 (0.4703) loss 3.4850 (3.0476) grad_norm 1.7949 (inf) loss_scale 1024.0000 (1087.8669) mem 16715MB [2024-08-10 11:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][490/625] eta 0:01:03 lr 0.000871 wd 0.0500 time 0.4666 (0.4722) data time 0.0008 (0.0021) model time 0.4658 (0.4702) loss 1.6039 (3.0431) grad_norm 1.3416 (inf) loss_scale 1024.0000 (1086.5662) mem 16715MB [2024-08-10 11:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][500/625] eta 0:00:59 lr 0.000871 wd 0.0500 time 0.4654 (0.4721) data time 0.0009 (0.0021) model time 0.4646 (0.4701) loss 3.1883 (3.0412) grad_norm 1.2913 (inf) loss_scale 1024.0000 (1085.3174) mem 16715MB [2024-08-10 11:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][510/625] eta 0:00:54 lr 0.000871 wd 0.0500 time 0.4722 (0.4720) data time 0.0009 (0.0021) model time 0.4714 (0.4700) loss 2.9768 (3.0385) grad_norm 1.3947 (inf) loss_scale 1024.0000 (1084.1174) mem 16715MB [2024-08-10 11:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][520/625] eta 0:00:49 lr 0.000871 wd 0.0500 time 0.4503 (0.4718) data time 0.0013 (0.0021) model time 0.4491 (0.4698) loss 3.5311 (3.0390) grad_norm 2.4526 (inf) loss_scale 1024.0000 (1082.9635) mem 16715MB [2024-08-10 11:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][530/625] eta 0:00:44 lr 0.000871 wd 0.0500 time 0.4593 (0.4716) data time 0.0010 (0.0020) model time 0.4582 (0.4696) loss 3.5992 (3.0418) grad_norm 2.2825 (inf) loss_scale 1024.0000 (1081.8531) mem 16715MB [2024-08-10 11:23:13 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [118/300][540/625] eta 0:00:40 lr 0.000871 wd 0.0500 time 0.4604 (0.4714) data time 0.0008 (0.0020) model time 0.4595 (0.4694) loss 3.7954 (3.0456) grad_norm 1.1943 (inf) loss_scale 1024.0000 (1080.7837) mem 16715MB [2024-08-10 11:23:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][550/625] eta 0:00:35 lr 0.000870 wd 0.0500 time 0.4681 (0.4713) data time 0.0013 (0.0020) model time 0.4668 (0.4693) loss 2.9211 (3.0491) grad_norm 1.3435 (inf) loss_scale 1024.0000 (1079.7532) mem 16715MB [2024-08-10 11:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][560/625] eta 0:00:30 lr 0.000870 wd 0.0500 time 0.4661 (0.4712) data time 0.0011 (0.0020) model time 0.4651 (0.4692) loss 3.1709 (3.0530) grad_norm 1.8183 (inf) loss_scale 1024.0000 (1078.7594) mem 16715MB [2024-08-10 11:23:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][570/625] eta 0:00:25 lr 0.000870 wd 0.0500 time 0.4612 (0.4711) data time 0.0009 (0.0020) model time 0.4602 (0.4691) loss 2.1950 (3.0487) grad_norm 1.3685 (inf) loss_scale 1024.0000 (1077.8004) mem 16715MB [2024-08-10 11:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][580/625] eta 0:00:21 lr 0.000870 wd 0.0500 time 0.4635 (0.4711) data time 0.0013 (0.0020) model time 0.4621 (0.4690) loss 3.2881 (3.0537) grad_norm 1.2768 (inf) loss_scale 1024.0000 (1076.8744) mem 16715MB [2024-08-10 11:23:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][590/625] eta 0:00:16 lr 0.000870 wd 0.0500 time 0.4606 (0.4709) data time 0.0009 (0.0019) model time 0.4598 (0.4689) loss 3.1526 (3.0494) grad_norm 1.6553 (inf) loss_scale 1024.0000 (1075.9797) mem 16715MB [2024-08-10 11:23:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][600/625] eta 0:00:11 lr 0.000870 wd 0.0500 time 0.4574 (0.4708) data time 0.0009 (0.0019) model time 0.4565 (0.4688) loss 3.7148 (3.0461) grad_norm 2.1054 (inf) loss_scale 
1024.0000 (1075.1148) mem 16715MB [2024-08-10 11:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][610/625] eta 0:00:07 lr 0.000870 wd 0.0500 time 0.4594 (0.4712) data time 0.0008 (0.0019) model time 0.4587 (0.4693) loss 3.4014 (3.0460) grad_norm 3.0973 (inf) loss_scale 1024.0000 (1074.2782) mem 16715MB [2024-08-10 11:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [118/300][620/625] eta 0:00:02 lr 0.000870 wd 0.0500 time 0.4617 (0.4711) data time 0.0006 (0.0019) model time 0.4611 (0.4691) loss 3.7565 (3.0469) grad_norm 1.3621 (inf) loss_scale 1024.0000 (1073.4686) mem 16715MB [2024-08-10 11:23:53 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 118 training takes 0:04:54 [2024-08-10 11:23:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:23:54 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:23:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5210 (0.5210) Acc@1 88.281 (88.281) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 11:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.161) Loss 0.8916 (0.6614) Acc@1 77.393 (84.881) Acc@5 94.922 (97.288) Mem 16715MB [2024-08-10 11:23:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0449 (0.7939) Acc@1 73.877 (81.534) Acc@5 94.141 (95.878) Mem 16715MB [2024-08-10 11:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.290 Acc@5 95.859 [2024-08-10 11:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.3% [2024-08-10 11:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.29% [2024-08-10 11:23:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 11:24:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 11:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.4888 (0.4888) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:24:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.164) Loss 0.7905 (0.6139) Acc@1 80.859 (86.421) Acc@5 96.436 (97.758) Mem 16715MB [2024-08-10 11:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.142) Loss 0.9053 (0.7237) Acc@1 77.344 (83.396) Acc@5 95.312 (96.622) Mem 16715MB [2024-08-10 11:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.089 Acc@5 96.645 [2024-08-10 11:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 11:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.09% [2024-08-10 11:24:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:24:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:24:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][0/625] eta 0:08:56 lr 0.000870 wd 0.0500 time 0.8587 (0.8587) data time 0.4481 (0.4481) model time 0.0000 (0.0000) loss 3.5036 (3.5036) grad_norm 1.3001 (1.3001) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][10/625] eta 0:05:08 lr 0.000870 wd 0.0500 time 0.4674 (0.5020) data time 0.0008 (0.0417) model time 0.0000 (0.0000) loss 2.0925 (3.0091) grad_norm 1.6786 (1.6715) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][20/625] eta 0:04:52 lr 0.000870 wd 0.0500 time 0.4599 (0.4831) data time 0.0011 (0.0223) model time 0.0000 (0.0000) loss 2.3519 (3.0110) grad_norm 1.1247 (1.6231) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][30/625] eta 0:04:43 lr 0.000869 wd 0.0500 time 0.4616 (0.4770) data time 0.0010 (0.0154) model time 0.0000 (0.0000) loss 3.4460 (3.0546) grad_norm 1.1825 (1.5840) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][40/625] eta 0:04:37 lr 0.000869 wd 0.0500 time 0.4628 (0.4736) data time 0.0010 (0.0119) model time 0.0000 (0.0000) loss 2.3462 (3.0274) grad_norm 1.5885 (1.5705) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][50/625] eta 0:04:31 lr 0.000869 wd 0.0500 time 0.4625 (0.4719) data time 0.0009 (0.0098) model time 0.0000 (0.0000) loss 3.2721 (3.0589) grad_norm 1.2044 (1.5447) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][60/625] eta 0:04:25 lr 0.000869 wd 0.0500 time 0.4667 (0.4706) data time 0.0007 (0.0084) model time 0.4660 (0.4629) loss 3.3733 
(3.0533) grad_norm 1.2212 (1.5786) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][70/625] eta 0:04:22 lr 0.000869 wd 0.0500 time 0.4670 (0.4734) data time 0.0010 (0.0073) model time 0.4660 (0.4762) loss 3.1060 (3.0681) grad_norm 1.5363 (1.5488) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][80/625] eta 0:04:18 lr 0.000869 wd 0.0500 time 0.4670 (0.4743) data time 0.0010 (0.0065) model time 0.4659 (0.4774) loss 2.6851 (3.0156) grad_norm 2.3024 (1.5572) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][90/625] eta 0:04:13 lr 0.000869 wd 0.0500 time 0.4647 (0.4734) data time 0.0010 (0.0059) model time 0.4637 (0.4743) loss 3.4560 (3.0096) grad_norm 1.4421 (1.5677) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][100/625] eta 0:04:08 lr 0.000869 wd 0.0500 time 0.4636 (0.4725) data time 0.0008 (0.0055) model time 0.4628 (0.4721) loss 2.9602 (2.9839) grad_norm 1.6747 (1.5631) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][110/625] eta 0:04:02 lr 0.000869 wd 0.0500 time 0.4613 (0.4717) data time 0.0010 (0.0051) model time 0.4603 (0.4705) loss 3.2771 (2.9842) grad_norm 1.3753 (1.5975) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][120/625] eta 0:03:57 lr 0.000869 wd 0.0500 time 0.4629 (0.4710) data time 0.0010 (0.0047) model time 0.4619 (0.4694) loss 3.1104 (2.9860) grad_norm 1.7326 (1.5846) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][130/625] eta 0:03:52 lr 0.000868 wd 0.0500 
time 0.4617 (0.4703) data time 0.0007 (0.0044) model time 0.4610 (0.4683) loss 3.3518 (2.9910) grad_norm 1.4408 (1.5849) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][140/625] eta 0:03:47 lr 0.000868 wd 0.0500 time 0.4649 (0.4700) data time 0.0008 (0.0042) model time 0.4641 (0.4679) loss 3.2426 (2.9998) grad_norm 2.5456 (1.6052) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][150/625] eta 0:03:43 lr 0.000868 wd 0.0500 time 0.4679 (0.4697) data time 0.0008 (0.0040) model time 0.4671 (0.4676) loss 2.5397 (2.9783) grad_norm 2.0832 (1.6388) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][160/625] eta 0:03:38 lr 0.000868 wd 0.0500 time 0.4641 (0.4695) data time 0.0012 (0.0038) model time 0.4629 (0.4674) loss 3.2538 (2.9826) grad_norm 1.1020 (1.6303) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][170/625] eta 0:03:33 lr 0.000868 wd 0.0500 time 0.4602 (0.4692) data time 0.0007 (0.0037) model time 0.4595 (0.4670) loss 2.9176 (2.9778) grad_norm 1.9233 (1.6274) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][180/625] eta 0:03:28 lr 0.000868 wd 0.0500 time 0.4598 (0.4689) data time 0.0008 (0.0035) model time 0.4590 (0.4667) loss 3.1439 (2.9731) grad_norm 2.0857 (1.6293) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][190/625] eta 0:03:23 lr 0.000868 wd 0.0500 time 0.4582 (0.4685) data time 0.0008 (0.0034) model time 0.4574 (0.4663) loss 1.8607 (2.9666) grad_norm 2.6551 (1.6273) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [119/300][200/625] eta 0:03:19 lr 0.000868 wd 0.0500 time 0.4638 (0.4683) data time 0.0010 (0.0033) model time 0.4627 (0.4661) loss 3.3628 (2.9653) grad_norm 1.3895 (1.6801) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][210/625] eta 0:03:14 lr 0.000868 wd 0.0500 time 0.6271 (0.4698) data time 0.0011 (0.0032) model time 0.6260 (0.4681) loss 2.7338 (2.9694) grad_norm 1.0272 (1.6676) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][220/625] eta 0:03:10 lr 0.000868 wd 0.0500 time 0.4666 (0.4700) data time 0.0010 (0.0031) model time 0.4656 (0.4684) loss 2.3704 (2.9660) grad_norm 1.8151 (1.6592) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][230/625] eta 0:03:05 lr 0.000868 wd 0.0500 time 0.4670 (0.4699) data time 0.0008 (0.0030) model time 0.4662 (0.4683) loss 2.9938 (2.9727) grad_norm 1.4369 (1.6575) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:25:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][240/625] eta 0:03:00 lr 0.000867 wd 0.0500 time 0.4605 (0.4697) data time 0.0011 (0.0029) model time 0.4595 (0.4681) loss 3.0659 (2.9720) grad_norm 1.3224 (1.6487) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][250/625] eta 0:02:56 lr 0.000867 wd 0.0500 time 0.4641 (0.4696) data time 0.0008 (0.0028) model time 0.4632 (0.4679) loss 2.7848 (2.9683) grad_norm 2.0530 (1.6480) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][260/625] eta 0:02:51 lr 0.000867 wd 0.0500 time 0.4648 (0.4701) data time 0.0008 (0.0028) model time 0.4640 (0.4686) loss 2.0240 (2.9705) grad_norm 1.4644 
(1.6547) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][270/625] eta 0:02:46 lr 0.000867 wd 0.0500 time 0.4657 (0.4698) data time 0.0010 (0.0027) model time 0.4647 (0.4683) loss 3.1866 (2.9725) grad_norm 1.2556 (1.6605) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][280/625] eta 0:02:42 lr 0.000867 wd 0.0500 time 0.4627 (0.4696) data time 0.0011 (0.0026) model time 0.4617 (0.4680) loss 2.5400 (2.9722) grad_norm 1.3205 (1.6587) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][290/625] eta 0:02:37 lr 0.000867 wd 0.0500 time 0.4661 (0.4694) data time 0.0010 (0.0026) model time 0.4650 (0.4679) loss 3.5012 (2.9802) grad_norm 2.6247 (1.6600) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][300/625] eta 0:02:32 lr 0.000867 wd 0.0500 time 0.4655 (0.4693) data time 0.0008 (0.0025) model time 0.4647 (0.4677) loss 2.9726 (2.9827) grad_norm 1.7748 (1.6658) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][310/625] eta 0:02:27 lr 0.000867 wd 0.0500 time 0.4665 (0.4692) data time 0.0011 (0.0025) model time 0.4654 (0.4676) loss 2.9186 (2.9932) grad_norm 2.3293 (1.6705) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][320/625] eta 0:02:23 lr 0.000867 wd 0.0500 time 0.4633 (0.4690) data time 0.0011 (0.0024) model time 0.4622 (0.4674) loss 3.3731 (2.9890) grad_norm 1.6015 (1.6619) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][330/625] eta 0:02:18 lr 0.000867 wd 0.0500 time 0.4094 (0.4693) data 
time 0.0009 (0.0024) model time 0.4085 (0.4678) loss 3.8835 (2.9942) grad_norm 1.3334 (1.6528) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][340/625] eta 0:02:13 lr 0.000866 wd 0.0500 time 0.4646 (0.4691) data time 0.0012 (0.0024) model time 0.4634 (0.4676) loss 3.1586 (2.9940) grad_norm 1.8750 (1.6453) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][350/625] eta 0:02:08 lr 0.000866 wd 0.0500 time 0.4681 (0.4689) data time 0.0009 (0.0023) model time 0.4672 (0.4674) loss 3.3559 (2.9951) grad_norm 1.2349 (1.6422) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][360/625] eta 0:02:04 lr 0.000866 wd 0.0500 time 0.4651 (0.4703) data time 0.0010 (0.0023) model time 0.4641 (0.4690) loss 3.5141 (2.9885) grad_norm 1.4106 (1.6409) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:26:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][370/625] eta 0:01:59 lr 0.000866 wd 0.0500 time 0.4717 (0.4702) data time 0.0009 (0.0023) model time 0.4708 (0.4689) loss 3.2084 (2.9953) grad_norm 1.7896 (1.6416) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][380/625] eta 0:01:55 lr 0.000866 wd 0.0500 time 0.4694 (0.4701) data time 0.0009 (0.0022) model time 0.4686 (0.4687) loss 3.7113 (2.9974) grad_norm 1.4774 (1.6388) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][390/625] eta 0:01:50 lr 0.000866 wd 0.0500 time 0.4641 (0.4699) data time 0.0008 (0.0022) model time 0.4632 (0.4685) loss 2.7430 (2.9994) grad_norm 1.5920 (1.6309) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [119/300][400/625] eta 0:01:45 lr 0.000866 wd 0.0500 time 0.4632 (0.4697) data time 0.0009 (0.0022) model time 0.4623 (0.4683) loss 3.3446 (3.0027) grad_norm 1.5676 (1.6360) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][410/625] eta 0:01:40 lr 0.000866 wd 0.0500 time 0.4654 (0.4695) data time 0.0008 (0.0022) model time 0.4646 (0.4681) loss 3.2921 (3.0037) grad_norm 1.7800 (1.6423) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][420/625] eta 0:01:36 lr 0.000866 wd 0.0500 time 0.4672 (0.4697) data time 0.0010 (0.0021) model time 0.4662 (0.4683) loss 2.2046 (3.0048) grad_norm 1.1042 (1.6368) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][430/625] eta 0:01:31 lr 0.000866 wd 0.0500 time 0.4641 (0.4696) data time 0.0010 (0.0021) model time 0.4630 (0.4682) loss 3.3519 (3.0004) grad_norm 1.6325 (1.6364) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][440/625] eta 0:01:26 lr 0.000866 wd 0.0500 time 0.4650 (0.4695) data time 0.0008 (0.0021) model time 0.4642 (0.4681) loss 3.8857 (3.0033) grad_norm 2.9930 (1.6345) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][450/625] eta 0:01:22 lr 0.000865 wd 0.0500 time 0.4675 (0.4694) data time 0.0008 (0.0021) model time 0.4667 (0.4680) loss 2.5639 (3.0066) grad_norm 2.6653 (1.6372) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][460/625] eta 0:01:17 lr 0.000865 wd 0.0500 time 0.4638 (0.4693) data time 0.0010 (0.0020) model time 0.4628 (0.4679) loss 3.5163 (3.0142) grad_norm 1.0647 (1.6371) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 11:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][470/625] eta 0:01:12 lr 0.000865 wd 0.0500 time 0.4631 (0.4691) data time 0.0009 (0.0020) model time 0.4623 (0.4677) loss 3.3987 (3.0155) grad_norm 2.0520 (1.6460) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][480/625] eta 0:01:08 lr 0.000865 wd 0.0500 time 0.4652 (0.4695) data time 0.0008 (0.0020) model time 0.4644 (0.4682) loss 3.2188 (3.0212) grad_norm 1.1131 (1.6460) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][490/625] eta 0:01:03 lr 0.000865 wd 0.0500 time 0.4627 (0.4694) data time 0.0010 (0.0020) model time 0.4617 (0.4680) loss 2.4994 (3.0238) grad_norm 2.6687 (1.6476) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][500/625] eta 0:00:58 lr 0.000865 wd 0.0500 time 0.4649 (0.4692) data time 0.0010 (0.0020) model time 0.4639 (0.4679) loss 3.3133 (3.0266) grad_norm 1.7087 (1.6491) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][510/625] eta 0:00:53 lr 0.000865 wd 0.0500 time 0.4607 (0.4692) data time 0.0009 (0.0019) model time 0.4598 (0.4678) loss 3.0974 (3.0244) grad_norm 1.6337 (1.6482) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][520/625] eta 0:00:49 lr 0.000865 wd 0.0500 time 0.4689 (0.4695) data time 0.0011 (0.0019) model time 0.4678 (0.4681) loss 2.8810 (3.0261) grad_norm 1.2927 (1.6456) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][530/625] eta 0:00:44 lr 0.000865 wd 0.0500 time 0.4647 (0.4694) data time 0.0008 (0.0019) model 
time 0.4640 (0.4680) loss 2.9358 (3.0261) grad_norm 1.1905 (1.6429) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][540/625] eta 0:00:39 lr 0.000865 wd 0.0500 time 0.4665 (0.4693) data time 0.0011 (0.0019) model time 0.4654 (0.4680) loss 3.1205 (3.0301) grad_norm 1.1946 (1.6418) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][550/625] eta 0:00:35 lr 0.000864 wd 0.0500 time 0.4637 (0.4692) data time 0.0009 (0.0019) model time 0.4629 (0.4679) loss 3.3779 (3.0301) grad_norm 1.8041 (1.6412) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][560/625] eta 0:00:30 lr 0.000864 wd 0.0500 time 0.4609 (0.4691) data time 0.0009 (0.0019) model time 0.4600 (0.4677) loss 3.8004 (3.0313) grad_norm 1.8166 (1.6379) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][570/625] eta 0:00:25 lr 0.000864 wd 0.0500 time 0.4622 (0.4690) data time 0.0009 (0.0019) model time 0.4613 (0.4676) loss 3.9158 (3.0354) grad_norm 2.1337 (1.6406) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][580/625] eta 0:00:21 lr 0.000864 wd 0.0500 time 0.4679 (0.4689) data time 0.0008 (0.0018) model time 0.4671 (0.4676) loss 3.0400 (3.0372) grad_norm 1.8813 (1.6441) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][590/625] eta 0:00:16 lr 0.000864 wd 0.0500 time 0.4684 (0.4688) data time 0.0008 (0.0018) model time 0.4676 (0.4675) loss 3.5363 (3.0366) grad_norm 1.7234 (1.6475) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][600/625] 
eta 0:00:11 lr 0.000864 wd 0.0500 time 0.4644 (0.4688) data time 0.0008 (0.0018) model time 0.4636 (0.4674) loss 3.4389 (3.0392) grad_norm 1.9075 (1.6475) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][610/625] eta 0:00:07 lr 0.000864 wd 0.0500 time 0.4574 (0.4688) data time 0.0008 (0.0018) model time 0.4566 (0.4674) loss 3.5870 (3.0381) grad_norm 1.3898 (1.6441) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [119/300][620/625] eta 0:00:02 lr 0.000864 wd 0.0500 time 0.4594 (0.4686) data time 0.0005 (0.0018) model time 0.4589 (0.4673) loss 3.0614 (3.0432) grad_norm 1.0343 (1.6420) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:28:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 119 training takes 0:04:53 [2024-08-10 11:28:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:29:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5454 (0.5454) Acc@1 88.232 (88.232) Acc@5 98.047 (98.047) Mem 16715MB [2024-08-10 11:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.8804 (0.6851) Acc@1 78.760 (84.495) Acc@5 95.557 (97.266) Mem 16715MB [2024-08-10 11:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 0.9570 (0.8055) Acc@1 76.807 (81.462) Acc@5 94.238 (95.982) Mem 16715MB [2024-08-10 11:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.138 Acc@5 95.985 [2024-08-10 11:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-10 11:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.863 (0.863) Loss 0.4875 (0.4875) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.194) Loss 0.7900 (0.6133) Acc@1 80.811 (86.381) Acc@5 96.436 (97.749) Mem 16715MB [2024-08-10 11:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.9043 (0.7226) Acc@1 77.588 (83.394) Acc@5 95.312 (96.631) Mem 16715MB [2024-08-10 11:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.077 Acc@5 96.651 [2024-08-10 11:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 11:29:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][0/625] eta 0:14:13 lr 0.000864 wd 0.0500 time 1.3654 (1.3654) data time 0.6398 (0.6398) model time 0.0000 (0.0000) loss 3.2480 (3.2480) grad_norm 1.1760 (1.1760) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][10/625] eta 0:05:36 lr 0.000864 wd 0.0500 time 0.4637 (0.5469) data 
time 0.0010 (0.0596) model time 0.0000 (0.0000) loss 3.2777 (3.1781) grad_norm 2.6023 (1.6115) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][20/625] eta 0:05:07 lr 0.000864 wd 0.0500 time 0.4453 (0.5079) data time 0.0008 (0.0317) model time 0.0000 (0.0000) loss 3.3987 (3.1197) grad_norm 1.9632 (2.0213) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][30/625] eta 0:05:04 lr 0.000863 wd 0.0500 time 0.4638 (0.5115) data time 0.0008 (0.0218) model time 0.0000 (0.0000) loss 3.1988 (3.1357) grad_norm 1.3505 (1.9742) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][40/625] eta 0:04:52 lr 0.000863 wd 0.0500 time 0.4649 (0.5001) data time 0.0008 (0.0168) model time 0.0000 (0.0000) loss 3.8441 (3.1550) grad_norm 1.4928 (1.9142) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][50/625] eta 0:04:43 lr 0.000863 wd 0.0500 time 0.4653 (0.4932) data time 0.0011 (0.0137) model time 0.0000 (0.0000) loss 3.2397 (3.0467) grad_norm 1.2900 (1.8465) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][60/625] eta 0:04:37 lr 0.000863 wd 0.0500 time 0.4644 (0.4916) data time 0.0010 (0.0116) model time 0.4634 (0.4827) loss 2.4738 (3.0393) grad_norm 1.3370 (1.8089) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][70/625] eta 0:04:30 lr 0.000863 wd 0.0500 time 0.4656 (0.4879) data time 0.0009 (0.0101) model time 0.4647 (0.4733) loss 3.2801 (3.0361) grad_norm 1.3073 (1.7424) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [120/300][80/625] eta 0:04:24 lr 0.000863 wd 0.0500 time 0.4759 (0.4856) data time 0.0010 (0.0090) model time 0.4749 (0.4716) loss 3.3356 (3.0508) grad_norm 1.9861 (1.7307) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][90/625] eta 0:04:18 lr 0.000863 wd 0.0500 time 0.4635 (0.4834) data time 0.0010 (0.0083) model time 0.4625 (0.4695) loss 3.2812 (3.0335) grad_norm 1.6894 (1.7408) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][100/625] eta 0:04:12 lr 0.000863 wd 0.0500 time 0.4621 (0.4818) data time 0.0010 (0.0076) model time 0.4611 (0.4688) loss 2.9903 (3.0531) grad_norm 1.8138 (1.7195) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][110/625] eta 0:04:07 lr 0.000863 wd 0.0500 time 0.4768 (0.4804) data time 0.0008 (0.0070) model time 0.4760 (0.4682) loss 3.5021 (3.0695) grad_norm 1.8513 (1.7125) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][120/625] eta 0:04:01 lr 0.000863 wd 0.0500 time 0.4589 (0.4790) data time 0.0010 (0.0066) model time 0.4579 (0.4672) loss 3.2174 (3.0947) grad_norm 0.9273 (1.7046) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][130/625] eta 0:03:56 lr 0.000862 wd 0.0500 time 0.4657 (0.4779) data time 0.0008 (0.0062) model time 0.4649 (0.4668) loss 3.2596 (3.0760) grad_norm 1.4865 (1.6816) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][140/625] eta 0:03:51 lr 0.000862 wd 0.0500 time 0.4642 (0.4770) data time 0.0008 (0.0059) model time 0.4633 (0.4664) loss 3.3100 (3.0792) grad_norm 1.6609 (1.6792) loss_scale 1024.0000 (1024.0000) 
mem 16715MB [2024-08-10 11:30:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][150/625] eta 0:03:46 lr 0.000862 wd 0.0500 time 0.4656 (0.4762) data time 0.0011 (0.0056) model time 0.4645 (0.4661) loss 2.7975 (3.0717) grad_norm 1.7845 (1.7224) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][160/625] eta 0:03:41 lr 0.000862 wd 0.0500 time 0.4695 (0.4756) data time 0.0008 (0.0053) model time 0.4687 (0.4659) loss 2.6885 (3.0752) grad_norm 1.1987 (1.7211) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][170/625] eta 0:03:36 lr 0.000862 wd 0.0500 time 0.4635 (0.4750) data time 0.0010 (0.0051) model time 0.4625 (0.4659) loss 2.8964 (3.0708) grad_norm 1.5259 (1.7065) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][180/625] eta 0:03:31 lr 0.000862 wd 0.0500 time 0.4590 (0.4745) data time 0.0010 (0.0049) model time 0.4580 (0.4658) loss 3.3591 (3.0716) grad_norm 1.6069 (1.6960) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][190/625] eta 0:03:26 lr 0.000862 wd 0.0500 time 0.4651 (0.4752) data time 0.0008 (0.0047) model time 0.4643 (0.4674) loss 2.6003 (3.0669) grad_norm 2.0979 (1.6905) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][200/625] eta 0:03:21 lr 0.000862 wd 0.0500 time 0.4621 (0.4747) data time 0.0010 (0.0045) model time 0.4611 (0.4670) loss 2.8287 (3.0548) grad_norm 1.1268 (1.6813) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][210/625] eta 0:03:16 lr 0.000862 wd 0.0500 time 0.4640 (0.4741) data time 0.0008 (0.0043) model time 0.4632 
(0.4667) loss 1.9562 (3.0413) grad_norm 1.9401 (1.6862) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][220/625] eta 0:03:11 lr 0.000862 wd 0.0500 time 0.4590 (0.4736) data time 0.0010 (0.0042) model time 0.4580 (0.4665) loss 3.1834 (3.0345) grad_norm 1.4116 (1.6912) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:30:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][230/625] eta 0:03:06 lr 0.000862 wd 0.0500 time 0.4673 (0.4732) data time 0.0008 (0.0040) model time 0.4665 (0.4663) loss 4.1547 (3.0554) grad_norm 1.2522 (1.6792) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][240/625] eta 0:03:02 lr 0.000861 wd 0.0500 time 0.4670 (0.4728) data time 0.0010 (0.0039) model time 0.4660 (0.4661) loss 3.2774 (3.0638) grad_norm 1.3301 (1.6644) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][250/625] eta 0:02:57 lr 0.000861 wd 0.0500 time 0.4642 (0.4725) data time 0.0008 (0.0038) model time 0.4634 (0.4659) loss 3.7307 (3.0645) grad_norm 1.9598 (1.6566) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][260/625] eta 0:02:52 lr 0.000861 wd 0.0500 time 0.4632 (0.4722) data time 0.0008 (0.0037) model time 0.4624 (0.4658) loss 3.7130 (3.0702) grad_norm 1.3631 (1.6632) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][270/625] eta 0:02:47 lr 0.000861 wd 0.0500 time 0.4782 (0.4719) data time 0.0011 (0.0036) model time 0.4772 (0.4657) loss 2.9585 (3.0705) grad_norm 1.7369 (1.6772) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][280/625] eta 0:02:42 
lr 0.000861 wd 0.0500 time 0.4629 (0.4715) data time 0.0009 (0.0035) model time 0.4621 (0.4655) loss 2.8239 (3.0784) grad_norm 1.1606 (1.6676) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][290/625] eta 0:02:37 lr 0.000861 wd 0.0500 time 0.4631 (0.4712) data time 0.0011 (0.0034) model time 0.4620 (0.4653) loss 2.9785 (3.0771) grad_norm 1.2109 (1.6574) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][300/625] eta 0:02:33 lr 0.000861 wd 0.0500 time 0.6094 (0.4714) data time 0.0008 (0.0033) model time 0.6086 (0.4657) loss 2.8554 (3.0810) grad_norm 1.8057 (1.6637) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][310/625] eta 0:02:28 lr 0.000861 wd 0.0500 time 0.4631 (0.4712) data time 0.0011 (0.0033) model time 0.4620 (0.4657) loss 2.8080 (3.0753) grad_norm 1.7239 (1.6749) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][320/625] eta 0:02:23 lr 0.000861 wd 0.0500 time 0.4671 (0.4711) data time 0.0011 (0.0032) model time 0.4660 (0.4657) loss 2.9000 (3.0751) grad_norm 1.2959 (1.6695) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][330/625] eta 0:02:18 lr 0.000861 wd 0.0500 time 0.4664 (0.4710) data time 0.0008 (0.0031) model time 0.4656 (0.4657) loss 2.2617 (3.0661) grad_norm 1.5654 (1.6648) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][340/625] eta 0:02:14 lr 0.000860 wd 0.0500 time 0.4668 (0.4708) data time 0.0008 (0.0031) model time 0.4659 (0.4656) loss 3.4836 (3.0674) grad_norm 2.0610 (1.6702) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:52 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][350/625] eta 0:02:09 lr 0.000860 wd 0.0500 time 0.4599 (0.4706) data time 0.0011 (0.0030) model time 0.4588 (0.4655) loss 3.0160 (3.0669) grad_norm 1.6076 (1.6763) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:31:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][360/625] eta 0:02:04 lr 0.000860 wd 0.0500 time 0.4647 (0.4704) data time 0.0010 (0.0030) model time 0.4637 (0.4654) loss 3.3267 (3.0716) grad_norm 1.4004 (1.6694) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][370/625] eta 0:02:00 lr 0.000860 wd 0.0500 time 0.4587 (0.4712) data time 0.0011 (0.0029) model time 0.4576 (0.4665) loss 2.5857 (3.0694) grad_norm 1.5563 (1.6627) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][380/625] eta 0:01:55 lr 0.000860 wd 0.0500 time 0.4701 (0.4710) data time 0.0008 (0.0029) model time 0.4693 (0.4664) loss 2.5975 (3.0719) grad_norm 1.6074 (1.6585) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][390/625] eta 0:01:50 lr 0.000860 wd 0.0500 time 0.4622 (0.4709) data time 0.0008 (0.0028) model time 0.4614 (0.4664) loss 3.5103 (3.0702) grad_norm 1.4559 (1.6514) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][400/625] eta 0:01:46 lr 0.000860 wd 0.0500 time 0.4684 (0.4712) data time 0.0011 (0.0028) model time 0.4673 (0.4668) loss 3.4181 (3.0779) grad_norm 1.4538 (1.6486) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][410/625] eta 0:01:41 lr 0.000860 wd 0.0500 time 0.4611 (0.4720) data time 0.0011 (0.0027) model time 0.4600 (0.4678) loss 3.2476 (3.0695) 
grad_norm 1.4745 (1.6427) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][420/625] eta 0:01:36 lr 0.000860 wd 0.0500 time 0.4637 (0.4717) data time 0.0010 (0.0027) model time 0.4627 (0.4676) loss 3.5261 (3.0701) grad_norm 1.1043 (1.6356) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][430/625] eta 0:01:31 lr 0.000860 wd 0.0500 time 0.4557 (0.4715) data time 0.0009 (0.0027) model time 0.4548 (0.4674) loss 2.3135 (3.0741) grad_norm 1.4415 (1.6302) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][440/625] eta 0:01:27 lr 0.000859 wd 0.0500 time 0.4656 (0.4713) data time 0.0010 (0.0026) model time 0.4646 (0.4673) loss 2.7054 (3.0742) grad_norm 1.6837 (1.6267) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][450/625] eta 0:01:22 lr 0.000859 wd 0.0500 time 0.4636 (0.4711) data time 0.0010 (0.0026) model time 0.4625 (0.4671) loss 2.9020 (3.0726) grad_norm 1.5754 (1.6240) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][460/625] eta 0:01:17 lr 0.000859 wd 0.0500 time 0.4777 (0.4710) data time 0.0010 (0.0026) model time 0.4768 (0.4670) loss 2.7132 (3.0717) grad_norm 1.5706 (1.6215) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][470/625] eta 0:01:12 lr 0.000859 wd 0.0500 time 0.4645 (0.4709) data time 0.0008 (0.0025) model time 0.4637 (0.4670) loss 3.4095 (3.0688) grad_norm 1.1727 (1.6303) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][480/625] eta 0:01:08 lr 0.000859 wd 0.0500 time 
0.4627 (0.4708) data time 0.0009 (0.0025) model time 0.4618 (0.4670) loss 3.0862 (3.0725) grad_norm 1.7552 (1.6301) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:32:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][490/625] eta 0:01:03 lr 0.000859 wd 0.0500 time 0.5218 (0.4708) data time 0.0008 (0.0025) model time 0.5210 (0.4671) loss 3.6849 (3.0678) grad_norm 2.1633 (1.6357) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][500/625] eta 0:00:58 lr 0.000859 wd 0.0500 time 0.4666 (0.4708) data time 0.0008 (0.0024) model time 0.4657 (0.4671) loss 2.1643 (3.0666) grad_norm 1.8143 (1.6451) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][510/625] eta 0:00:54 lr 0.000859 wd 0.0500 time 0.4663 (0.4707) data time 0.0010 (0.0024) model time 0.4653 (0.4670) loss 3.7335 (3.0675) grad_norm 1.3855 (1.6479) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][520/625] eta 0:00:49 lr 0.000859 wd 0.0500 time 0.4691 (0.4705) data time 0.0011 (0.0024) model time 0.4680 (0.4669) loss 3.1029 (3.0692) grad_norm 1.3784 (1.6422) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][530/625] eta 0:00:44 lr 0.000859 wd 0.0500 time 0.4692 (0.4704) data time 0.0010 (0.0024) model time 0.4681 (0.4668) loss 3.4411 (3.0738) grad_norm 1.6290 (1.6377) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][540/625] eta 0:00:39 lr 0.000859 wd 0.0500 time 0.4663 (0.4703) data time 0.0010 (0.0023) model time 0.4653 (0.4668) loss 3.2962 (3.0728) grad_norm 1.2005 (1.6335) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:26 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [120/300][550/625] eta 0:00:35 lr 0.000858 wd 0.0500 time 0.4823 (0.4702) data time 0.0010 (0.0023) model time 0.4813 (0.4667) loss 3.4967 (3.0693) grad_norm 1.1791 (1.6294) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][560/625] eta 0:00:30 lr 0.000858 wd 0.0500 time 0.4616 (0.4712) data time 0.0008 (0.0023) model time 0.4608 (0.4679) loss 3.0452 (3.0702) grad_norm 2.9737 (1.6294) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][570/625] eta 0:00:25 lr 0.000858 wd 0.0500 time 0.4631 (0.4711) data time 0.0010 (0.0023) model time 0.4621 (0.4678) loss 3.3272 (3.0725) grad_norm 1.3224 (1.6300) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][580/625] eta 0:00:21 lr 0.000858 wd 0.0500 time 0.4664 (0.4711) data time 0.0008 (0.0023) model time 0.4655 (0.4678) loss 1.9420 (3.0678) grad_norm 1.4124 (1.6295) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][590/625] eta 0:00:16 lr 0.000858 wd 0.0500 time 0.4642 (0.4711) data time 0.0011 (0.0022) model time 0.4631 (0.4679) loss 3.1450 (3.0653) grad_norm 2.0301 (1.6267) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][600/625] eta 0:00:11 lr 0.000858 wd 0.0500 time 0.4897 (0.4711) data time 0.0008 (0.0022) model time 0.4889 (0.4679) loss 2.8032 (3.0678) grad_norm 2.3134 (1.6415) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:33:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][610/625] eta 0:00:07 lr 0.000858 wd 0.0500 time 0.4598 (0.4710) data time 0.0007 (0.0022) model time 0.4590 (0.4678) loss 2.9861 (3.0650) grad_norm 1.2461 
(1.6396) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [120/300][620/625] eta 0:00:02 lr 0.000858 wd 0.0500 time 0.4591 (0.4709) data time 0.0007 (0.0022) model time 0.4584 (0.4678) loss 3.3199 (3.0677) grad_norm 1.2543 (1.6383) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 120 training takes 0:04:54 [2024-08-10 11:34:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:34:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 11:34:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5713 (0.5713) Acc@1 87.842 (87.842) Acc@5 98.145 (98.145) Mem 16715MB [2024-08-10 11:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.164) Loss 0.9116 (0.6928) Acc@1 78.076 (84.637) Acc@5 95.215 (97.306) Mem 16715MB [2024-08-10 11:34:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0547 (0.8248) Acc@1 74.316 (81.324) Acc@5 93.799 (95.936) Mem 16715MB [2024-08-10 11:34:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.146 Acc@5 95.913 [2024-08-10 11:34:07 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-10 11:34:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.848 (0.848) Loss 0.4854 (0.4854) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:34:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.194) Loss 0.7881 (0.6127) Acc@1 80.957 (86.395) Acc@5 96.387 (97.736) Mem 16715MB [2024-08-10 11:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.158) Loss 
0.9023 (0.7217) Acc@1 77.637 (83.422) Acc@5 95.264 (96.626) Mem 16715MB [2024-08-10 11:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.095 Acc@5 96.647 [2024-08-10 11:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 11:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.10% [2024-08-10 11:34:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:34:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 11:34:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][0/625] eta 0:08:20 lr 0.000858 wd 0.0500 time 0.8009 (0.8009) data time 0.3781 (0.3781) model time 0.0000 (0.0000) loss 2.6598 (2.6598) grad_norm 1.0709 (1.0709) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][10/625] eta 0:05:04 lr 0.000858 wd 0.0500 time 0.4647 (0.4955) data time 0.0011 (0.0354) model time 0.0000 (0.0000) loss 2.8904 (2.8933) grad_norm 1.6843 (1.6011) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][20/625] eta 0:04:50 lr 0.000858 wd 0.0500 time 0.4608 (0.4806) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 2.3329 (2.8178) grad_norm 2.3002 (1.6681) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][30/625] eta 0:04:43 lr 0.000857 wd 0.0500 time 0.4626 (0.4763) data time 0.0010 (0.0132) model time 0.0000 (0.0000) loss 3.2004 (2.9466) grad_norm 2.2708 (1.7216) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][40/625] eta 0:04:37 lr 
0.000857 wd 0.0500 time 0.4692 (0.4739) data time 0.0010 (0.0103) model time 0.0000 (0.0000) loss 3.2430 (3.0730) grad_norm 1.1977 (1.7433) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][50/625] eta 0:04:31 lr 0.000857 wd 0.0500 time 0.4638 (0.4721) data time 0.0010 (0.0085) model time 0.0000 (0.0000) loss 2.8971 (3.0443) grad_norm 1.1662 (1.8015) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][60/625] eta 0:04:26 lr 0.000857 wd 0.0500 time 0.4602 (0.4718) data time 0.0008 (0.0072) model time 0.4594 (0.4691) loss 2.6563 (3.0650) grad_norm 1.6316 (1.7198) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][70/625] eta 0:04:21 lr 0.000857 wd 0.0500 time 0.4640 (0.4707) data time 0.0008 (0.0064) model time 0.4632 (0.4661) loss 2.8739 (3.0780) grad_norm 1.3937 (1.7027) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][80/625] eta 0:04:16 lr 0.000857 wd 0.0500 time 0.4628 (0.4705) data time 0.0008 (0.0057) model time 0.4620 (0.4667) loss 3.5806 (3.0738) grad_norm 1.2912 (1.6498) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][90/625] eta 0:04:11 lr 0.000857 wd 0.0500 time 0.4608 (0.4701) data time 0.0008 (0.0052) model time 0.4599 (0.4664) loss 1.5476 (3.0703) grad_norm 1.0105 (1.6300) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][100/625] eta 0:04:07 lr 0.000857 wd 0.0500 time 0.4089 (0.4717) data time 0.0010 (0.0049) model time 0.4079 (0.4701) loss 3.3924 (3.0981) grad_norm 1.6915 (1.6459) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:04 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][110/625] eta 0:04:02 lr 0.000857 wd 0.0500 time 0.4589 (0.4712) data time 0.0008 (0.0046) model time 0.4581 (0.4692) loss 2.2422 (3.1001) grad_norm 2.0046 (1.6531) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][120/625] eta 0:03:59 lr 0.000857 wd 0.0500 time 0.4625 (0.4745) data time 0.0008 (0.0043) model time 0.4617 (0.4750) loss 3.3601 (3.0983) grad_norm 1.4771 (1.6584) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][130/625] eta 0:03:55 lr 0.000856 wd 0.0500 time 0.4623 (0.4764) data time 0.0009 (0.0040) model time 0.4614 (0.4779) loss 2.2888 (3.0935) grad_norm 1.7817 (1.6849) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][140/625] eta 0:03:50 lr 0.000856 wd 0.0500 time 0.4608 (0.4758) data time 0.0008 (0.0039) model time 0.4600 (0.4766) loss 3.2505 (3.0957) grad_norm 1.5490 (1.7011) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][150/625] eta 0:03:45 lr 0.000856 wd 0.0500 time 0.4651 (0.4751) data time 0.0010 (0.0037) model time 0.4640 (0.4753) loss 3.5708 (3.1016) grad_norm 1.4889 (1.7061) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][160/625] eta 0:03:40 lr 0.000856 wd 0.0500 time 0.4657 (0.4743) data time 0.0009 (0.0035) model time 0.4648 (0.4741) loss 3.0814 (3.0731) grad_norm 1.1670 (1.6820) loss_scale 2048.0000 (1062.1615) mem 16715MB [2024-08-10 11:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][170/625] eta 0:03:35 lr 0.000856 wd 0.0500 time 0.4656 (0.4740) data time 0.0009 (0.0034) model time 0.4647 (0.4735) loss 3.6729 (3.0629) 
grad_norm 1.2088 (1.6718) loss_scale 2048.0000 (1119.8129) mem 16715MB [2024-08-10 11:35:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][180/625] eta 0:03:30 lr 0.000856 wd 0.0500 time 0.5009 (0.4736) data time 0.0008 (0.0033) model time 0.5001 (0.4730) loss 2.4693 (3.0486) grad_norm 1.4548 (1.7277) loss_scale 2048.0000 (1171.0939) mem 16715MB [2024-08-10 11:35:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][190/625] eta 0:03:25 lr 0.000856 wd 0.0500 time 0.4639 (0.4732) data time 0.0008 (0.0032) model time 0.4631 (0.4723) loss 2.2109 (3.0432) grad_norm 1.4214 (1.7271) loss_scale 2048.0000 (1217.0052) mem 16715MB [2024-08-10 11:35:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][200/625] eta 0:03:20 lr 0.000856 wd 0.0500 time 0.4628 (0.4728) data time 0.0007 (0.0030) model time 0.4620 (0.4718) loss 3.8922 (3.0491) grad_norm 1.1246 (1.7185) loss_scale 2048.0000 (1258.3483) mem 16715MB [2024-08-10 11:35:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][210/625] eta 0:03:16 lr 0.000856 wd 0.0500 time 0.4710 (0.4724) data time 0.0010 (0.0030) model time 0.4700 (0.4713) loss 3.5791 (3.0603) grad_norm 1.2493 (1.7040) loss_scale 2048.0000 (1295.7725) mem 16715MB [2024-08-10 11:35:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][220/625] eta 0:03:11 lr 0.000856 wd 0.0500 time 0.4609 (0.4720) data time 0.0008 (0.0029) model time 0.4601 (0.4708) loss 3.4735 (3.0665) grad_norm 1.5084 (1.6986) loss_scale 2048.0000 (1329.8100) mem 16715MB [2024-08-10 11:36:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][230/625] eta 0:03:06 lr 0.000855 wd 0.0500 time 0.4642 (0.4716) data time 0.0011 (0.0028) model time 0.4631 (0.4703) loss 3.2674 (3.0588) grad_norm 1.0953 (1.6928) loss_scale 2048.0000 (1360.9004) mem 16715MB [2024-08-10 11:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][240/625] eta 0:03:01 lr 0.000855 wd 0.0500 time 
0.4641 (0.4712) data time 0.0010 (0.0027) model time 0.4630 (0.4698) loss 3.4238 (3.0617) grad_norm 1.6381 (1.6939) loss_scale 2048.0000 (1389.4108) mem 16715MB [2024-08-10 11:36:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][250/625] eta 0:02:56 lr 0.000855 wd 0.0500 time 0.4681 (0.4708) data time 0.0008 (0.0027) model time 0.4673 (0.4693) loss 3.0513 (3.0554) grad_norm 1.6200 (1.7028) loss_scale 2048.0000 (1415.6494) mem 16715MB [2024-08-10 11:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][260/625] eta 0:02:51 lr 0.000855 wd 0.0500 time 0.4592 (0.4705) data time 0.0008 (0.0026) model time 0.4584 (0.4690) loss 3.6557 (3.0537) grad_norm 1.5561 (1.6922) loss_scale 2048.0000 (1439.8774) mem 16715MB [2024-08-10 11:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][270/625] eta 0:02:46 lr 0.000855 wd 0.0500 time 0.4682 (0.4703) data time 0.0009 (0.0025) model time 0.4673 (0.4688) loss 4.2300 (3.0586) grad_norm 2.2144 (1.7198) loss_scale 2048.0000 (1462.3173) mem 16715MB [2024-08-10 11:36:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][280/625] eta 0:02:42 lr 0.000855 wd 0.0500 time 0.4609 (0.4701) data time 0.0011 (0.0025) model time 0.4598 (0.4685) loss 3.2112 (3.0594) grad_norm 1.1155 (1.7136) loss_scale 2048.0000 (1483.1601) mem 16715MB [2024-08-10 11:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][290/625] eta 0:02:37 lr 0.000855 wd 0.0500 time 0.4632 (0.4700) data time 0.0008 (0.0024) model time 0.4624 (0.4684) loss 3.5865 (3.0507) grad_norm 1.4267 (1.7084) loss_scale 2048.0000 (1502.5704) mem 16715MB [2024-08-10 11:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][300/625] eta 0:02:32 lr 0.000855 wd 0.0500 time 0.4632 (0.4704) data time 0.0010 (0.0024) model time 0.4621 (0.4688) loss 2.0874 (3.0525) grad_norm 1.4152 (1.7034) loss_scale 2048.0000 (1520.6910) mem 16715MB [2024-08-10 11:36:38 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [121/300][310/625] eta 0:02:28 lr 0.000855 wd 0.0500 time 0.4599 (0.4702) data time 0.0011 (0.0024) model time 0.4589 (0.4686) loss 2.5177 (3.0537) grad_norm 1.6315 (1.7022) loss_scale 2048.0000 (1537.6463) mem 16715MB [2024-08-10 11:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][320/625] eta 0:02:23 lr 0.000855 wd 0.0500 time 0.4091 (0.4706) data time 0.0011 (0.0023) model time 0.4080 (0.4691) loss 3.3680 (3.0554) grad_norm 0.8983 (1.6942) loss_scale 2048.0000 (1553.5452) mem 16715MB [2024-08-10 11:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][330/625] eta 0:02:18 lr 0.000855 wd 0.0500 time 0.4667 (0.4705) data time 0.0009 (0.0023) model time 0.4657 (0.4690) loss 3.7559 (3.0535) grad_norm 2.6898 (1.6982) loss_scale 2048.0000 (1568.4834) mem 16715MB [2024-08-10 11:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][340/625] eta 0:02:14 lr 0.000854 wd 0.0500 time 0.4720 (0.4704) data time 0.0009 (0.0022) model time 0.4711 (0.4689) loss 4.0319 (3.0604) grad_norm 1.4451 (1.6908) loss_scale 2048.0000 (1582.5455) mem 16715MB [2024-08-10 11:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][350/625] eta 0:02:09 lr 0.000854 wd 0.0500 time 0.4650 (0.4708) data time 0.0008 (0.0022) model time 0.4642 (0.4694) loss 2.9314 (3.0575) grad_norm 1.0425 (1.6783) loss_scale 2048.0000 (1595.8063) mem 16715MB [2024-08-10 11:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][360/625] eta 0:02:04 lr 0.000854 wd 0.0500 time 0.4601 (0.4706) data time 0.0009 (0.0022) model time 0.4592 (0.4692) loss 1.9179 (3.0547) grad_norm 2.0664 (1.6731) loss_scale 2048.0000 (1608.3324) mem 16715MB [2024-08-10 11:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][370/625] eta 0:01:59 lr 0.000854 wd 0.0500 time 0.4631 (0.4704) data time 0.0008 (0.0021) model time 0.4624 (0.4690) loss 2.7671 (3.0553) grad_norm 2.2682 
(1.6664) loss_scale 2048.0000 (1620.1833) mem 16715MB [2024-08-10 11:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][380/625] eta 0:01:55 lr 0.000854 wd 0.0500 time 0.4569 (0.4702) data time 0.0010 (0.0021) model time 0.4559 (0.4688) loss 3.2132 (3.0530) grad_norm 1.6806 (1.6673) loss_scale 2048.0000 (1631.4121) mem 16715MB [2024-08-10 11:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][390/625] eta 0:01:50 lr 0.000854 wd 0.0500 time 0.4635 (0.4701) data time 0.0009 (0.0021) model time 0.4627 (0.4686) loss 1.9043 (3.0529) grad_norm 1.4022 (1.6672) loss_scale 2048.0000 (1642.0665) mem 16715MB [2024-08-10 11:37:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][400/625] eta 0:01:45 lr 0.000854 wd 0.0500 time 0.4649 (0.4700) data time 0.0010 (0.0021) model time 0.4639 (0.4685) loss 3.1734 (3.0577) grad_norm 2.0636 (1.6673) loss_scale 2048.0000 (1652.1895) mem 16715MB [2024-08-10 11:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][410/625] eta 0:01:41 lr 0.000854 wd 0.0500 time 0.4634 (0.4699) data time 0.0011 (0.0021) model time 0.4623 (0.4684) loss 2.1038 (3.0537) grad_norm 1.2766 (1.6641) loss_scale 2048.0000 (1661.8200) mem 16715MB [2024-08-10 11:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][420/625] eta 0:01:36 lr 0.000854 wd 0.0500 time 0.4605 (0.4698) data time 0.0008 (0.0020) model time 0.4597 (0.4683) loss 3.3082 (3.0528) grad_norm 1.0920 (1.6558) loss_scale 2048.0000 (1670.9929) mem 16715MB [2024-08-10 11:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][430/625] eta 0:01:31 lr 0.000854 wd 0.0500 time 0.4645 (0.4697) data time 0.0008 (0.0020) model time 0.4636 (0.4682) loss 3.0775 (3.0561) grad_norm 1.4330 (1.6522) loss_scale 2048.0000 (1679.7401) mem 16715MB [2024-08-10 11:37:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][440/625] eta 0:01:26 lr 0.000853 wd 0.0500 time 0.4644 (0.4696) data 
time 0.0008 (0.0020) model time 0.4636 (0.4681) loss 2.7044 (3.0533) grad_norm 1.3871 (1.6509) loss_scale 2048.0000 (1688.0907) mem 16715MB [2024-08-10 11:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][450/625] eta 0:01:22 lr 0.000853 wd 0.0500 time 0.7122 (0.4703) data time 0.0010 (0.0020) model time 0.7112 (0.4690) loss 2.6366 (3.0543) grad_norm 1.7016 (1.6444) loss_scale 2048.0000 (1696.0710) mem 16715MB [2024-08-10 11:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][460/625] eta 0:01:17 lr 0.000853 wd 0.0500 time 0.4613 (0.4701) data time 0.0007 (0.0019) model time 0.4605 (0.4688) loss 3.3536 (3.0507) grad_norm 1.2911 (1.6457) loss_scale 2048.0000 (1703.7050) mem 16715MB [2024-08-10 11:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][470/625] eta 0:01:12 lr 0.000853 wd 0.0500 time 0.4633 (0.4703) data time 0.0008 (0.0019) model time 0.4625 (0.4690) loss 3.3261 (3.0503) grad_norm 1.5722 (1.6416) loss_scale 2048.0000 (1711.0149) mem 16715MB [2024-08-10 11:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][480/625] eta 0:01:08 lr 0.000853 wd 0.0500 time 0.4693 (0.4703) data time 0.0009 (0.0019) model time 0.4684 (0.4689) loss 3.6210 (3.0520) grad_norm 1.1402 (1.6351) loss_scale 2048.0000 (1718.0208) mem 16715MB [2024-08-10 11:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][490/625] eta 0:01:03 lr 0.000853 wd 0.0500 time 0.4629 (0.4702) data time 0.0010 (0.0019) model time 0.4619 (0.4688) loss 3.3891 (3.0558) grad_norm 1.8339 (1.6328) loss_scale 2048.0000 (1724.7413) mem 16715MB [2024-08-10 11:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][500/625] eta 0:00:58 lr 0.000853 wd 0.0500 time 0.4652 (0.4705) data time 0.0008 (0.0019) model time 0.4644 (0.4691) loss 3.4708 (3.0559) grad_norm 1.5316 (1.6368) loss_scale 2048.0000 (1731.1936) mem 16715MB [2024-08-10 11:38:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [121/300][510/625] eta 0:00:54 lr 0.000853 wd 0.0500 time 0.4624 (0.4703) data time 0.0010 (0.0019) model time 0.4613 (0.4690) loss 3.1133 (3.0565) grad_norm 2.0929 (1.6364) loss_scale 2048.0000 (1737.3933) mem 16715MB [2024-08-10 11:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][520/625] eta 0:00:49 lr 0.000853 wd 0.0500 time 0.4594 (0.4703) data time 0.0008 (0.0019) model time 0.4586 (0.4689) loss 2.1562 (3.0499) grad_norm 1.9534 (1.6345) loss_scale 2048.0000 (1743.3551) mem 16715MB [2024-08-10 11:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][530/625] eta 0:00:44 lr 0.000853 wd 0.0500 time 0.4796 (0.4702) data time 0.0009 (0.0018) model time 0.4787 (0.4688) loss 3.1796 (3.0520) grad_norm 1.0457 (1.6312) loss_scale 2048.0000 (1749.0923) mem 16715MB [2024-08-10 11:38:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][540/625] eta 0:00:39 lr 0.000852 wd 0.0500 time 0.4616 (0.4700) data time 0.0011 (0.0018) model time 0.4606 (0.4687) loss 3.2884 (3.0516) grad_norm 1.0773 (1.6330) loss_scale 2048.0000 (1754.6174) mem 16715MB [2024-08-10 11:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][550/625] eta 0:00:35 lr 0.000852 wd 0.0500 time 0.4666 (0.4700) data time 0.0008 (0.0018) model time 0.4658 (0.4686) loss 2.4936 (3.0510) grad_norm 1.4751 (1.6315) loss_scale 2048.0000 (1759.9419) mem 16715MB [2024-08-10 11:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][560/625] eta 0:00:30 lr 0.000852 wd 0.0500 time 0.4677 (0.4699) data time 0.0008 (0.0018) model time 0.4669 (0.4685) loss 2.7036 (3.0515) grad_norm 1.3143 (1.6276) loss_scale 2048.0000 (1765.0766) mem 16715MB [2024-08-10 11:38:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][570/625] eta 0:00:25 lr 0.000852 wd 0.0500 time 0.4678 (0.4698) data time 0.0010 (0.0018) model time 0.4668 (0.4685) loss 3.5457 (3.0501) grad_norm 1.1719 (1.6306) loss_scale 2048.0000 
(1770.0315) mem 16715MB [2024-08-10 11:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][580/625] eta 0:00:21 lr 0.000852 wd 0.0500 time 0.4637 (0.4697) data time 0.0010 (0.0018) model time 0.4627 (0.4684) loss 3.7232 (3.0530) grad_norm 1.3484 (1.6263) loss_scale 2048.0000 (1774.8158) mem 16715MB [2024-08-10 11:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][590/625] eta 0:00:16 lr 0.000852 wd 0.0500 time 0.4596 (0.4696) data time 0.0008 (0.0018) model time 0.4589 (0.4682) loss 3.1792 (3.0533) grad_norm 1.5960 (1.6234) loss_scale 2048.0000 (1779.4382) mem 16715MB [2024-08-10 11:38:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][600/625] eta 0:00:11 lr 0.000852 wd 0.0500 time 0.4603 (0.4695) data time 0.0010 (0.0018) model time 0.4593 (0.4681) loss 2.4490 (3.0503) grad_norm 1.5235 (1.6274) loss_scale 2048.0000 (1783.9068) mem 16715MB [2024-08-10 11:38:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][610/625] eta 0:00:07 lr 0.000852 wd 0.0500 time 0.4592 (0.4694) data time 0.0005 (0.0018) model time 0.4587 (0.4680) loss 3.9350 (3.0547) grad_norm 1.6693 (1.6264) loss_scale 2048.0000 (1788.2291) mem 16715MB [2024-08-10 11:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [121/300][620/625] eta 0:00:02 lr 0.000852 wd 0.0500 time 0.4650 (0.4693) data time 0.0007 (0.0018) model time 0.4643 (0.4679) loss 2.0224 (3.0516) grad_norm 2.0030 (1.6273) loss_scale 2048.0000 (1792.4122) mem 16715MB [2024-08-10 11:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 121 training takes 0:04:53 [2024-08-10 11:39:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:39:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:39:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.511 (0.511) Loss 0.5493 (0.5493) Acc@1 88.086 (88.086) Acc@5 98.145 (98.145) Mem 16715MB [2024-08-10 11:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.8813 (0.6750) Acc@1 78.809 (84.823) Acc@5 95.117 (97.354) Mem 16715MB [2024-08-10 11:39:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.9917 (0.7964) Acc@1 74.268 (81.676) Acc@5 94.482 (95.989) Mem 16715MB [2024-08-10 11:39:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.422 Acc@5 95.975 [2024-08-10 11:39:11 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-10 11:39:11 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.42% [2024-08-10 11:39:11 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 11:39:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 11:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.4836 (0.4836) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.7866 (0.6121) Acc@1 81.006 (86.386) Acc@5 96.289 (97.732) Mem 16715MB [2024-08-10 11:39:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9004 (0.7209) Acc@1 77.539 (83.429) Acc@5 95.410 (96.649) Mem 16715MB [2024-08-10 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.117 Acc@5 96.669 [2024-08-10 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 11:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.12% [2024-08-10 11:39:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:39:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][0/625] eta 0:08:18 lr 0.000852 wd 0.0500 time 0.7983 (0.7983) data time 0.3908 (0.3908) model time 0.0000 (0.0000) loss 3.0575 (3.0575) grad_norm 1.5347 (1.5347) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][10/625] eta 0:05:04 lr 0.000852 wd 0.0500 time 0.4656 (0.4956) data time 0.0010 (0.0365) model time 0.0000 (0.0000) loss 3.1568 (3.0157) grad_norm 1.4287 (1.6050) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][20/625] eta 0:04:55 lr 0.000851 wd 0.0500 time 0.4070 (0.4892) data time 0.0011 (0.0196) model time 0.0000 (0.0000) loss 3.0474 (3.0185) grad_norm 2.2624 (1.7236) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][30/625] eta 0:04:50 lr 0.000851 wd 0.0500 time 0.4612 (0.4885) data time 0.0010 (0.0137) model time 0.0000 (0.0000) loss 3.0916 (3.0873) grad_norm 2.2712 (1.9418) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][40/625] eta 0:04:41 lr 0.000851 wd 0.0500 time 0.4624 (0.4820) data time 0.0010 (0.0106) model time 0.0000 (0.0000) loss 2.8443 (3.0475) grad_norm 1.9493 (1.9140) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][50/625] eta 0:04:34 lr 0.000851 wd 0.0500 time 0.4622 (0.4782) data time 0.0008 (0.0087) model time 0.0000 (0.0000) loss 2.8870 (3.0731) grad_norm 1.4727 (1.7937) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][60/625] eta 0:04:28 lr 0.000851 wd 0.0500 time 0.4607 (0.4760) data time 0.0008 (0.0075) model time 0.4599 (0.4634) loss 1.9850 
(3.0239) grad_norm 1.4836 (1.7394) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][70/625] eta 0:04:24 lr 0.000851 wd 0.0500 time 0.4749 (0.4768) data time 0.0008 (0.0066) model time 0.4742 (0.4720) loss 2.4701 (3.0288) grad_norm 1.8185 (1.6830) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:39:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][80/625] eta 0:04:19 lr 0.000851 wd 0.0500 time 0.4611 (0.4755) data time 0.0010 (0.0059) model time 0.4601 (0.4696) loss 3.2885 (3.0102) grad_norm 1.5373 (1.7079) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][90/625] eta 0:04:14 lr 0.000851 wd 0.0500 time 0.4623 (0.4764) data time 0.0008 (0.0054) model time 0.4615 (0.4729) loss 3.4215 (3.0080) grad_norm 1.6880 (1.7371) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][100/625] eta 0:04:09 lr 0.000851 wd 0.0500 time 0.4639 (0.4750) data time 0.0008 (0.0049) model time 0.4631 (0.4706) loss 2.8424 (3.0095) grad_norm 1.7633 (1.7173) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][110/625] eta 0:04:04 lr 0.000851 wd 0.0500 time 0.4650 (0.4739) data time 0.0011 (0.0046) model time 0.4640 (0.4692) loss 3.3888 (3.0214) grad_norm 1.6195 (1.6993) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][120/625] eta 0:03:58 lr 0.000850 wd 0.0500 time 0.4690 (0.4731) data time 0.0009 (0.0043) model time 0.4682 (0.4682) loss 3.5668 (3.0172) grad_norm 1.7495 (1.6859) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][130/625] eta 0:03:53 lr 0.000850 wd 0.0500 
time 0.4690 (0.4725) data time 0.0008 (0.0041) model time 0.4681 (0.4678) loss 3.6533 (3.0296) grad_norm 1.0525 (1.6686) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][140/625] eta 0:03:48 lr 0.000850 wd 0.0500 time 0.4666 (0.4720) data time 0.0009 (0.0039) model time 0.4658 (0.4674) loss 3.3791 (3.0227) grad_norm 1.0591 (1.6729) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][150/625] eta 0:03:44 lr 0.000850 wd 0.0500 time 0.4656 (0.4716) data time 0.0011 (0.0037) model time 0.4645 (0.4672) loss 3.3843 (3.0255) grad_norm 1.5639 (1.6523) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][160/625] eta 0:03:39 lr 0.000850 wd 0.0500 time 0.4660 (0.4714) data time 0.0008 (0.0036) model time 0.4652 (0.4670) loss 3.7142 (3.0276) grad_norm 1.6320 (1.6383) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][170/625] eta 0:03:34 lr 0.000850 wd 0.0500 time 0.4584 (0.4710) data time 0.0010 (0.0034) model time 0.4574 (0.4667) loss 3.3283 (3.0308) grad_norm 1.8193 (1.6270) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][180/625] eta 0:03:29 lr 0.000850 wd 0.0500 time 0.4649 (0.4706) data time 0.0008 (0.0033) model time 0.4641 (0.4664) loss 4.0972 (3.0410) grad_norm 1.3974 (1.6197) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][190/625] eta 0:03:24 lr 0.000850 wd 0.0500 time 0.4677 (0.4703) data time 0.0007 (0.0032) model time 0.4670 (0.4662) loss 2.6019 (3.0459) grad_norm 1.7700 (1.6257) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:52 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [122/300][200/625] eta 0:03:19 lr 0.000850 wd 0.0500 time 0.4591 (0.4700) data time 0.0008 (0.0031) model time 0.4584 (0.4660) loss 3.4608 (3.0537) grad_norm 1.7140 (1.6310) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][210/625] eta 0:03:14 lr 0.000850 wd 0.0500 time 0.4638 (0.4698) data time 0.0011 (0.0030) model time 0.4627 (0.4659) loss 3.1946 (3.0473) grad_norm 1.1087 (1.6426) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][220/625] eta 0:03:10 lr 0.000850 wd 0.0500 time 0.4653 (0.4705) data time 0.0009 (0.0029) model time 0.4645 (0.4670) loss 2.7526 (3.0416) grad_norm 1.1663 (1.6328) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][230/625] eta 0:03:05 lr 0.000849 wd 0.0500 time 0.4646 (0.4703) data time 0.0007 (0.0028) model time 0.4638 (0.4669) loss 2.9257 (3.0350) grad_norm 1.6126 (1.6312) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][240/625] eta 0:03:00 lr 0.000849 wd 0.0500 time 0.4597 (0.4699) data time 0.0010 (0.0027) model time 0.4587 (0.4666) loss 3.3114 (3.0378) grad_norm 1.5285 (1.6240) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][250/625] eta 0:02:56 lr 0.000849 wd 0.0500 time 0.4607 (0.4703) data time 0.0010 (0.0027) model time 0.4597 (0.4671) loss 2.6869 (3.0364) grad_norm 1.2518 (1.6269) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][260/625] eta 0:02:51 lr 0.000849 wd 0.0500 time 0.4636 (0.4700) data time 0.0011 (0.0026) model time 0.4625 (0.4668) loss 2.2606 (3.0317) grad_norm 2.8036 
(1.6537) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][270/625] eta 0:02:46 lr 0.000849 wd 0.0500 time 0.4662 (0.4697) data time 0.0007 (0.0026) model time 0.4654 (0.4666) loss 3.6218 (3.0311) grad_norm 1.1553 (1.6497) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][280/625] eta 0:02:41 lr 0.000849 wd 0.0500 time 0.4647 (0.4695) data time 0.0009 (0.0025) model time 0.4638 (0.4664) loss 2.5745 (3.0239) grad_norm 1.7626 (1.6443) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][290/625] eta 0:02:37 lr 0.000849 wd 0.0500 time 0.4650 (0.4693) data time 0.0009 (0.0025) model time 0.4640 (0.4663) loss 3.2015 (3.0300) grad_norm 1.4290 (1.6396) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][300/625] eta 0:02:32 lr 0.000849 wd 0.0500 time 0.4614 (0.4692) data time 0.0011 (0.0024) model time 0.4603 (0.4662) loss 2.7620 (3.0338) grad_norm 1.7377 (1.6414) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][310/625] eta 0:02:27 lr 0.000849 wd 0.0500 time 0.4590 (0.4690) data time 0.0008 (0.0024) model time 0.4583 (0.4661) loss 2.9989 (3.0311) grad_norm 1.5131 (1.6505) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][320/625] eta 0:02:22 lr 0.000849 wd 0.0500 time 0.4663 (0.4687) data time 0.0008 (0.0023) model time 0.4655 (0.4658) loss 3.5994 (3.0317) grad_norm 1.2790 (1.6439) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][330/625] eta 0:02:18 lr 0.000848 wd 0.0500 time 0.4672 (0.4685) data 
time 0.0008 (0.0023) model time 0.4664 (0.4657) loss 3.7818 (3.0347) grad_norm 1.7247 (1.6370) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][340/625] eta 0:02:13 lr 0.000848 wd 0.0500 time 0.4599 (0.4683) data time 0.0008 (0.0022) model time 0.4591 (0.4655) loss 2.4867 (3.0317) grad_norm 1.4095 (1.6377) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][350/625] eta 0:02:08 lr 0.000848 wd 0.0500 time 0.4656 (0.4682) data time 0.0010 (0.0022) model time 0.4647 (0.4653) loss 2.2568 (3.0241) grad_norm 1.1152 (1.6408) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][360/625] eta 0:02:04 lr 0.000848 wd 0.0500 time 0.4658 (0.4680) data time 0.0007 (0.0022) model time 0.4651 (0.4652) loss 2.9355 (3.0264) grad_norm 5.3149 (1.6477) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][370/625] eta 0:01:59 lr 0.000848 wd 0.0500 time 0.4643 (0.4678) data time 0.0010 (0.0022) model time 0.4633 (0.4651) loss 3.5397 (3.0303) grad_norm 2.0403 (1.6590) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][380/625] eta 0:01:54 lr 0.000848 wd 0.0500 time 0.4643 (0.4677) data time 0.0008 (0.0021) model time 0.4635 (0.4650) loss 3.7599 (3.0301) grad_norm 1.6496 (1.6546) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][390/625] eta 0:01:49 lr 0.000848 wd 0.0500 time 0.4592 (0.4680) data time 0.0008 (0.0021) model time 0.4585 (0.4654) loss 3.2907 (3.0258) grad_norm 1.6209 (1.6470) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [122/300][400/625] eta 0:01:45 lr 0.000848 wd 0.0500 time 0.4609 (0.4679) data time 0.0007 (0.0021) model time 0.4602 (0.4653) loss 3.7088 (3.0266) grad_norm 1.5137 (1.6471) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][410/625] eta 0:01:40 lr 0.000848 wd 0.0500 time 0.4655 (0.4681) data time 0.0011 (0.0021) model time 0.4644 (0.4656) loss 3.2315 (3.0282) grad_norm 1.2613 (1.6411) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][420/625] eta 0:01:35 lr 0.000848 wd 0.0500 time 0.4639 (0.4680) data time 0.0008 (0.0020) model time 0.4631 (0.4655) loss 2.7795 (3.0246) grad_norm 1.0906 (1.6369) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][430/625] eta 0:01:31 lr 0.000847 wd 0.0500 time 0.4640 (0.4685) data time 0.0008 (0.0020) model time 0.4632 (0.4660) loss 3.4293 (3.0244) grad_norm 1.6293 (1.6327) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][440/625] eta 0:01:26 lr 0.000847 wd 0.0500 time 0.4672 (0.4689) data time 0.0007 (0.0020) model time 0.4665 (0.4666) loss 3.0970 (3.0269) grad_norm 1.9713 (1.6344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][450/625] eta 0:01:22 lr 0.000847 wd 0.0500 time 0.4566 (0.4688) data time 0.0010 (0.0020) model time 0.4556 (0.4665) loss 1.9214 (3.0237) grad_norm 2.1228 (1.6397) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:42:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][460/625] eta 0:01:17 lr 0.000847 wd 0.0500 time 0.4700 (0.4688) data time 0.0008 (0.0019) model time 0.4692 (0.4665) loss 3.2361 (3.0206) grad_norm 1.5345 (1.6359) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 11:42:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][470/625] eta 0:01:12 lr 0.000847 wd 0.0500 time 0.4690 (0.4688) data time 0.0008 (0.0019) model time 0.4682 (0.4665) loss 3.2395 (3.0244) grad_norm 1.8931 (1.6340) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][480/625] eta 0:01:07 lr 0.000847 wd 0.0500 time 0.4605 (0.4686) data time 0.0009 (0.0019) model time 0.4596 (0.4664) loss 3.5786 (3.0265) grad_norm 1.2661 (1.6329) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][490/625] eta 0:01:03 lr 0.000847 wd 0.0500 time 0.4681 (0.4685) data time 0.0010 (0.0019) model time 0.4671 (0.4663) loss 2.9580 (3.0268) grad_norm 1.3585 (1.6308) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][500/625] eta 0:00:58 lr 0.000847 wd 0.0500 time 0.4657 (0.4685) data time 0.0011 (0.0019) model time 0.4646 (0.4662) loss 2.7359 (3.0295) grad_norm 1.4466 (1.6281) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][510/625] eta 0:00:53 lr 0.000847 wd 0.0500 time 0.4653 (0.4684) data time 0.0007 (0.0019) model time 0.4645 (0.4662) loss 3.7289 (3.0304) grad_norm 2.2301 (1.6365) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][520/625] eta 0:00:49 lr 0.000847 wd 0.0500 time 0.4739 (0.4684) data time 0.0008 (0.0018) model time 0.4731 (0.4662) loss 3.5218 (3.0271) grad_norm 1.6671 (1.6402) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][530/625] eta 0:00:44 lr 0.000846 wd 0.0500 time 0.4705 (0.4683) data time 0.0010 (0.0018) model 
time 0.4694 (0.4662) loss 3.0100 (3.0253) grad_norm 2.9136 (1.6424) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][540/625] eta 0:00:39 lr 0.000846 wd 0.0500 time 0.4617 (0.4682) data time 0.0008 (0.0018) model time 0.4610 (0.4661) loss 3.3035 (3.0249) grad_norm 1.3372 (1.6415) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][550/625] eta 0:00:35 lr 0.000846 wd 0.0500 time 0.4637 (0.4682) data time 0.0008 (0.0018) model time 0.4629 (0.4661) loss 3.5235 (3.0297) grad_norm 1.2321 (1.6375) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][560/625] eta 0:00:30 lr 0.000846 wd 0.0500 time 0.4641 (0.4681) data time 0.0010 (0.0018) model time 0.4631 (0.4660) loss 3.3134 (3.0295) grad_norm 1.0154 (1.6329) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][570/625] eta 0:00:25 lr 0.000846 wd 0.0500 time 0.4667 (0.4681) data time 0.0011 (0.0018) model time 0.4656 (0.4660) loss 3.3220 (3.0281) grad_norm 2.2820 (1.6352) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][580/625] eta 0:00:21 lr 0.000846 wd 0.0500 time 0.4605 (0.4680) data time 0.0008 (0.0018) model time 0.4596 (0.4660) loss 3.4138 (3.0316) grad_norm 2.9276 (1.6382) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][590/625] eta 0:00:16 lr 0.000846 wd 0.0500 time 0.4650 (0.4684) data time 0.0010 (0.0017) model time 0.4639 (0.4664) loss 3.1895 (3.0363) grad_norm 2.2127 (1.6374) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:43:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][600/625] 
eta 0:00:11 lr 0.000846 wd 0.0500 time 0.4638 (0.4684) data time 0.0007 (0.0017) model time 0.4630 (0.4665) loss 3.4645 (3.0376) grad_norm 1.8747 (1.6333) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][610/625] eta 0:00:07 lr 0.000846 wd 0.0500 time 0.4619 (0.4684) data time 0.0005 (0.0017) model time 0.4614 (0.4664) loss 1.7797 (3.0334) grad_norm 2.0264 (1.6419) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [122/300][620/625] eta 0:00:02 lr 0.000846 wd 0.0500 time 0.4560 (0.4685) data time 0.0007 (0.0017) model time 0.4553 (0.4665) loss 2.8951 (3.0349) grad_norm 1.6937 (1.6440) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 122 training takes 0:04:52 [2024-08-10 11:44:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:44:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:44:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.535 (0.535) Loss 0.5557 (0.5557) Acc@1 87.939 (87.939) Acc@5 97.852 (97.852) Mem 16715MB [2024-08-10 11:44:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.9414 (0.6983) Acc@1 77.490 (84.388) Acc@5 94.775 (97.221) Mem 16715MB [2024-08-10 11:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.9707 (0.8195) Acc@1 76.953 (81.390) Acc@5 94.189 (95.833) Mem 16715MB [2024-08-10 11:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.064 Acc@5 95.853 [2024-08-10 11:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.1% [2024-08-10 11:44:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.884 (0.884) Loss 0.4832 (0.4832) Acc@1 89.209 (89.209) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:44:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.7837 (0.6115) Acc@1 81.152 (86.386) Acc@5 96.387 (97.718) Mem 16715MB [2024-08-10 11:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.159) Loss 0.8984 (0.7201) Acc@1 77.539 (83.415) Acc@5 95.410 (96.636) Mem 16715MB [2024-08-10 11:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.121 Acc@5 96.649 [2024-08-10 11:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 11:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.12% [2024-08-10 11:44:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:44:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][0/625] eta 0:10:55 lr 0.000846 wd 0.0500 time 1.0482 (1.0482) data time 0.4017 (0.4017) model time 0.0000 (0.0000) loss 3.3908 (3.3908) grad_norm 1.4969 (1.4969) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][10/625] eta 0:05:18 lr 0.000845 wd 0.0500 time 0.4629 (0.5184) data time 0.0011 (0.0375) model time 0.0000 (0.0000) loss 2.9775 (3.2721) grad_norm 1.8353 (1.5224) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][20/625] eta 0:04:58 lr 0.000845 wd 0.0500 time 0.4669 (0.4935) data time 0.0008 (0.0202) model time 0.0000 (0.0000) loss 2.4201 (3.0966) grad_norm 1.4086 (1.4281) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][30/625] eta 0:04:48 lr 0.000845 wd 0.0500 time 0.4633 (0.4846) data time 0.0011 (0.0140) model time 0.0000 (0.0000) loss 3.0806 (3.0390) grad_norm 1.4346 (1.6703) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][40/625] eta 0:04:40 lr 0.000845 wd 0.0500 time 0.4629 (0.4799) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 3.2019 (3.0248) grad_norm 2.5163 (1.6839) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][50/625] eta 0:04:34 lr 0.000845 wd 0.0500 time 0.4632 (0.4765) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 3.8331 (2.9975) grad_norm 1.5979 (1.6611) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][60/625] eta 0:04:27 lr 0.000845 wd 0.0500 time 0.4569 (0.4743) data time 0.0010 (0.0077) model time 0.4559 (0.4618) loss 3.1177 
(3.0245) grad_norm 1.4409 (1.6468) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][70/625] eta 0:04:22 lr 0.000845 wd 0.0500 time 0.4623 (0.4727) data time 0.0008 (0.0068) model time 0.4614 (0.4617) loss 3.5391 (3.0296) grad_norm 2.0315 (1.6285) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][80/625] eta 0:04:17 lr 0.000845 wd 0.0500 time 0.4601 (0.4716) data time 0.0008 (0.0061) model time 0.4593 (0.4621) loss 2.3826 (3.0154) grad_norm 1.2292 (1.6176) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][90/625] eta 0:04:11 lr 0.000845 wd 0.0500 time 0.4631 (0.4708) data time 0.0011 (0.0055) model time 0.4620 (0.4623) loss 3.4477 (2.9975) grad_norm 0.8604 (1.5864) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][100/625] eta 0:04:06 lr 0.000845 wd 0.0500 time 0.4594 (0.4700) data time 0.0008 (0.0051) model time 0.4586 (0.4622) loss 3.8096 (3.0056) grad_norm 1.3403 (1.5740) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][110/625] eta 0:04:01 lr 0.000844 wd 0.0500 time 0.4663 (0.4695) data time 0.0010 (0.0047) model time 0.4653 (0.4624) loss 3.3371 (3.0053) grad_norm 1.3090 (1.6414) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][120/625] eta 0:03:56 lr 0.000844 wd 0.0500 time 0.4627 (0.4690) data time 0.0010 (0.0044) model time 0.4617 (0.4624) loss 2.8937 (3.0030) grad_norm 1.6977 (1.6692) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][130/625] eta 0:03:52 lr 0.000844 wd 0.0500 
time 0.4627 (0.4703) data time 0.0009 (0.0042) model time 0.4618 (0.4651) loss 2.5334 (3.0055) grad_norm 1.4489 (1.6602) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][140/625] eta 0:03:48 lr 0.000844 wd 0.0500 time 0.4578 (0.4711) data time 0.0012 (0.0039) model time 0.4566 (0.4670) loss 2.9475 (2.9999) grad_norm 1.5730 (1.6435) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][150/625] eta 0:03:43 lr 0.000844 wd 0.0500 time 0.4636 (0.4705) data time 0.0009 (0.0038) model time 0.4627 (0.4663) loss 3.0968 (2.9978) grad_norm 1.5541 (1.6344) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][160/625] eta 0:03:38 lr 0.000844 wd 0.0500 time 0.4623 (0.4701) data time 0.0008 (0.0036) model time 0.4615 (0.4660) loss 2.1683 (2.9953) grad_norm 1.0566 (1.6402) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][170/625] eta 0:03:33 lr 0.000844 wd 0.0500 time 0.4643 (0.4703) data time 0.0010 (0.0035) model time 0.4634 (0.4665) loss 3.2874 (3.0051) grad_norm 1.6137 (1.6233) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][180/625] eta 0:03:29 lr 0.000844 wd 0.0500 time 0.4656 (0.4700) data time 0.0008 (0.0033) model time 0.4648 (0.4664) loss 2.2644 (2.9970) grad_norm 1.4409 (1.6075) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][190/625] eta 0:03:24 lr 0.000844 wd 0.0500 time 0.4694 (0.4697) data time 0.0009 (0.0032) model time 0.4684 (0.4661) loss 3.0123 (3.0098) grad_norm 1.9103 (1.6073) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:45:55 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [123/300][200/625] eta 0:03:19 lr 0.000844 wd 0.0500 time 0.4635 (0.4693) data time 0.0009 (0.0031) model time 0.4626 (0.4658) loss 3.4059 (3.0199) grad_norm 1.2967 (1.6005) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][210/625] eta 0:03:14 lr 0.000844 wd 0.0500 time 0.4686 (0.4691) data time 0.0010 (0.0030) model time 0.4676 (0.4656) loss 2.9667 (3.0287) grad_norm 1.5900 (1.6011) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][220/625] eta 0:03:09 lr 0.000843 wd 0.0500 time 0.4679 (0.4689) data time 0.0008 (0.0029) model time 0.4671 (0.4654) loss 2.4499 (3.0243) grad_norm 1.8150 (1.5929) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][230/625] eta 0:03:05 lr 0.000843 wd 0.0500 time 0.4636 (0.4687) data time 0.0008 (0.0028) model time 0.4628 (0.4654) loss 3.3852 (3.0288) grad_norm 1.4593 (1.5905) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 11:46:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][240/625] eta 0:03:00 lr 0.000843 wd 0.0500 time 0.4659 (0.4686) data time 0.0008 (0.0028) model time 0.4651 (0.4653) loss 3.7404 (3.0349) grad_norm 1.2502 (inf) loss_scale 1024.0000 (2022.5062) mem 16715MB [2024-08-10 11:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][250/625] eta 0:02:55 lr 0.000843 wd 0.0500 time 0.4618 (0.4686) data time 0.0011 (0.0027) model time 0.4607 (0.4655) loss 3.2423 (3.0358) grad_norm 1.6216 (inf) loss_scale 1024.0000 (1982.7251) mem 16715MB [2024-08-10 11:46:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][260/625] eta 0:02:50 lr 0.000843 wd 0.0500 time 0.4585 (0.4684) data time 0.0008 (0.0026) model time 0.4577 (0.4653) loss 3.1057 (3.0421) grad_norm 1.1644 (inf) 
loss_scale 1024.0000 (1945.9923) mem 16715MB [2024-08-10 11:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][270/625] eta 0:02:46 lr 0.000843 wd 0.0500 time 0.4647 (0.4682) data time 0.0007 (0.0026) model time 0.4640 (0.4652) loss 3.5885 (3.0348) grad_norm 1.0248 (inf) loss_scale 1024.0000 (1911.9705) mem 16715MB [2024-08-10 11:46:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][280/625] eta 0:02:41 lr 0.000843 wd 0.0500 time 0.4659 (0.4682) data time 0.0010 (0.0025) model time 0.4649 (0.4652) loss 3.6753 (3.0412) grad_norm 1.0435 (inf) loss_scale 1024.0000 (1880.3701) mem 16715MB [2024-08-10 11:46:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][290/625] eta 0:02:36 lr 0.000843 wd 0.0500 time 0.4616 (0.4680) data time 0.0008 (0.0025) model time 0.4608 (0.4651) loss 3.6613 (3.0404) grad_norm 1.4929 (inf) loss_scale 1024.0000 (1850.9416) mem 16715MB [2024-08-10 11:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][300/625] eta 0:02:32 lr 0.000843 wd 0.0500 time 0.4745 (0.4679) data time 0.0007 (0.0024) model time 0.4737 (0.4651) loss 2.3949 (3.0368) grad_norm 1.3353 (inf) loss_scale 1024.0000 (1823.4684) mem 16715MB [2024-08-10 11:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][310/625] eta 0:02:27 lr 0.000843 wd 0.0500 time 0.4749 (0.4679) data time 0.0007 (0.0024) model time 0.4742 (0.4651) loss 3.6884 (3.0447) grad_norm 1.6716 (inf) loss_scale 1024.0000 (1797.7621) mem 16715MB [2024-08-10 11:46:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][320/625] eta 0:02:22 lr 0.000842 wd 0.0500 time 0.4644 (0.4679) data time 0.0010 (0.0023) model time 0.4634 (0.4652) loss 3.1531 (3.0526) grad_norm 1.6103 (inf) loss_scale 1024.0000 (1773.6573) mem 16715MB [2024-08-10 11:46:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][330/625] eta 0:02:18 lr 0.000842 wd 0.0500 time 0.4642 (0.4680) data time 0.0010 (0.0023) model 
time 0.4631 (0.4654) loss 2.3273 (3.0484) grad_norm 1.2902 (inf) loss_scale 1024.0000 (1751.0091) mem 16715MB [2024-08-10 11:47:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][340/625] eta 0:02:13 lr 0.000842 wd 0.0500 time 0.4638 (0.4685) data time 0.0007 (0.0023) model time 0.4631 (0.4659) loss 3.1849 (3.0507) grad_norm 2.3193 (inf) loss_scale 1024.0000 (1729.6891) mem 16715MB [2024-08-10 11:47:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][350/625] eta 0:02:08 lr 0.000842 wd 0.0500 time 0.4621 (0.4683) data time 0.0010 (0.0022) model time 0.4611 (0.4658) loss 3.5943 (3.0516) grad_norm 1.8160 (inf) loss_scale 1024.0000 (1709.5840) mem 16715MB [2024-08-10 11:47:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][360/625] eta 0:02:04 lr 0.000842 wd 0.0500 time 0.4729 (0.4683) data time 0.0008 (0.0022) model time 0.4721 (0.4659) loss 3.2551 (3.0580) grad_norm 1.1426 (inf) loss_scale 1024.0000 (1690.5928) mem 16715MB [2024-08-10 11:47:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][370/625] eta 0:01:59 lr 0.000842 wd 0.0500 time 0.4702 (0.4683) data time 0.0008 (0.0022) model time 0.4694 (0.4658) loss 2.5692 (3.0521) grad_norm 1.7795 (inf) loss_scale 1024.0000 (1672.6253) mem 16715MB [2024-08-10 11:47:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][380/625] eta 0:01:54 lr 0.000842 wd 0.0500 time 0.4641 (0.4682) data time 0.0008 (0.0022) model time 0.4634 (0.4658) loss 3.1059 (3.0539) grad_norm 1.4462 (inf) loss_scale 1024.0000 (1655.6010) mem 16715MB [2024-08-10 11:47:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][390/625] eta 0:01:50 lr 0.000842 wd 0.0500 time 0.4671 (0.4686) data time 0.0010 (0.0021) model time 0.4661 (0.4663) loss 3.0342 (3.0581) grad_norm 1.2590 (inf) loss_scale 1024.0000 (1639.4476) mem 16715MB [2024-08-10 11:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][400/625] eta 0:01:45 lr 
0.000842 wd 0.0500 time 0.4645 (0.4685) data time 0.0010 (0.0021) model time 0.4635 (0.4663) loss 3.2916 (3.0620) grad_norm 1.9117 (inf) loss_scale 1024.0000 (1624.0998) mem 16715MB [2024-08-10 11:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][410/625] eta 0:01:40 lr 0.000842 wd 0.0500 time 0.4656 (0.4685) data time 0.0011 (0.0021) model time 0.4645 (0.4663) loss 2.2195 (3.0596) grad_norm 1.4018 (inf) loss_scale 1024.0000 (1609.4988) mem 16715MB [2024-08-10 11:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][420/625] eta 0:01:36 lr 0.000841 wd 0.0500 time 0.4624 (0.4684) data time 0.0011 (0.0021) model time 0.4614 (0.4662) loss 3.3561 (3.0638) grad_norm 1.2101 (inf) loss_scale 1024.0000 (1595.5914) mem 16715MB [2024-08-10 11:47:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][430/625] eta 0:01:31 lr 0.000841 wd 0.0500 time 0.4640 (0.4683) data time 0.0008 (0.0020) model time 0.4632 (0.4661) loss 2.7629 (3.0641) grad_norm 2.9592 (inf) loss_scale 1024.0000 (1582.3295) mem 16715MB [2024-08-10 11:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][440/625] eta 0:01:26 lr 0.000841 wd 0.0500 time 0.4662 (0.4682) data time 0.0010 (0.0020) model time 0.4652 (0.4660) loss 3.3792 (3.0632) grad_norm 3.0674 (inf) loss_scale 1024.0000 (1569.6689) mem 16715MB [2024-08-10 11:47:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][450/625] eta 0:01:21 lr 0.000841 wd 0.0500 time 0.4627 (0.4682) data time 0.0010 (0.0020) model time 0.4616 (0.4660) loss 3.3499 (3.0680) grad_norm 1.9484 (inf) loss_scale 1024.0000 (1557.5698) mem 16715MB [2024-08-10 11:47:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][460/625] eta 0:01:17 lr 0.000841 wd 0.0500 time 0.4725 (0.4681) data time 0.0009 (0.0020) model time 0.4715 (0.4660) loss 2.1428 (3.0635) grad_norm 1.4420 (inf) loss_scale 1024.0000 (1545.9957) mem 16715MB [2024-08-10 11:48:02 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [123/300][470/625] eta 0:01:12 lr 0.000841 wd 0.0500 time 0.4665 (0.4686) data time 0.0008 (0.0020) model time 0.4657 (0.4666) loss 2.3191 (3.0588) grad_norm 1.2313 (inf) loss_scale 1024.0000 (1534.9130) mem 16715MB [2024-08-10 11:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][480/625] eta 0:01:08 lr 0.000841 wd 0.0500 time 0.4572 (0.4691) data time 0.0010 (0.0019) model time 0.4561 (0.4671) loss 2.5923 (3.0584) grad_norm 1.5975 (inf) loss_scale 1024.0000 (1524.2911) mem 16715MB [2024-08-10 11:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][490/625] eta 0:01:03 lr 0.000841 wd 0.0500 time 0.4650 (0.4690) data time 0.0010 (0.0019) model time 0.4640 (0.4670) loss 3.2590 (3.0619) grad_norm 1.5011 (inf) loss_scale 1024.0000 (1514.1018) mem 16715MB [2024-08-10 11:48:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][500/625] eta 0:00:58 lr 0.000841 wd 0.0500 time 0.4623 (0.4689) data time 0.0010 (0.0019) model time 0.4613 (0.4669) loss 2.7789 (3.0613) grad_norm 1.7656 (inf) loss_scale 1024.0000 (1504.3194) mem 16715MB [2024-08-10 11:48:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][510/625] eta 0:00:53 lr 0.000841 wd 0.0500 time 0.4605 (0.4688) data time 0.0010 (0.0019) model time 0.4594 (0.4669) loss 3.1914 (3.0616) grad_norm 1.2273 (inf) loss_scale 1024.0000 (1494.9198) mem 16715MB [2024-08-10 11:48:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][520/625] eta 0:00:49 lr 0.000840 wd 0.0500 time 0.4664 (0.4687) data time 0.0010 (0.0019) model time 0.4654 (0.4668) loss 3.3774 (3.0593) grad_norm 2.4785 (inf) loss_scale 1024.0000 (1485.8810) mem 16715MB [2024-08-10 11:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][530/625] eta 0:00:44 lr 0.000840 wd 0.0500 time 0.4634 (0.4691) data time 0.0008 (0.0019) model time 0.4626 (0.4672) loss 2.1810 (3.0606) grad_norm 1.1835 (inf) loss_scale 
1024.0000 (1477.1827) mem 16715MB [2024-08-10 11:48:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][540/625] eta 0:00:39 lr 0.000840 wd 0.0500 time 0.4673 (0.4691) data time 0.0011 (0.0019) model time 0.4662 (0.4672) loss 3.1756 (3.0608) grad_norm 1.4333 (inf) loss_scale 1024.0000 (1468.8059) mem 16715MB [2024-08-10 11:48:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][550/625] eta 0:00:35 lr 0.000840 wd 0.0500 time 0.4630 (0.4690) data time 0.0008 (0.0018) model time 0.4622 (0.4671) loss 3.4584 (3.0602) grad_norm 1.3124 (inf) loss_scale 1024.0000 (1460.7332) mem 16715MB [2024-08-10 11:48:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][560/625] eta 0:00:30 lr 0.000840 wd 0.0500 time 0.4607 (0.4689) data time 0.0010 (0.0018) model time 0.4597 (0.4670) loss 3.7220 (3.0650) grad_norm 1.8534 (inf) loss_scale 1024.0000 (1452.9483) mem 16715MB [2024-08-10 11:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][570/625] eta 0:00:25 lr 0.000840 wd 0.0500 time 0.4604 (0.4688) data time 0.0007 (0.0018) model time 0.4596 (0.4670) loss 2.6024 (3.0667) grad_norm 1.2433 (inf) loss_scale 1024.0000 (1445.4361) mem 16715MB [2024-08-10 11:48:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][580/625] eta 0:00:21 lr 0.000840 wd 0.0500 time 0.4661 (0.4688) data time 0.0010 (0.0018) model time 0.4651 (0.4669) loss 3.3617 (3.0685) grad_norm 1.8397 (inf) loss_scale 1024.0000 (1438.1824) mem 16715MB [2024-08-10 11:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][590/625] eta 0:00:16 lr 0.000840 wd 0.0500 time 0.4585 (0.4687) data time 0.0011 (0.0018) model time 0.4575 (0.4669) loss 3.0311 (3.0642) grad_norm 1.8011 (inf) loss_scale 1024.0000 (1431.1743) mem 16715MB [2024-08-10 11:49:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][600/625] eta 0:00:11 lr 0.000840 wd 0.0500 time 0.4628 (0.4686) data time 0.0010 (0.0018) model time 
0.4617 (0.4668) loss 3.1614 (3.0628) grad_norm 1.2973 (inf) loss_scale 1024.0000 (1424.3993) mem 16715MB [2024-08-10 11:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][610/625] eta 0:00:07 lr 0.000840 wd 0.0500 time 0.4627 (0.4686) data time 0.0005 (0.0018) model time 0.4622 (0.4667) loss 3.7521 (3.0604) grad_norm 1.5660 (inf) loss_scale 1024.0000 (1417.8462) mem 16715MB [2024-08-10 11:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [123/300][620/625] eta 0:00:02 lr 0.000840 wd 0.0500 time 0.4635 (0.4685) data time 0.0008 (0.0017) model time 0.4627 (0.4667) loss 2.5939 (3.0573) grad_norm 1.0457 (inf) loss_scale 1024.0000 (1411.5040) mem 16715MB [2024-08-10 11:49:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 123 training takes 0:04:52 [2024-08-10 11:49:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:49:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.5625 (0.5625) Acc@1 87.939 (87.939) Acc@5 98.242 (98.242) Mem 16715MB [2024-08-10 11:49:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.9062 (0.6945) Acc@1 77.588 (84.717) Acc@5 95.361 (97.275) Mem 16715MB [2024-08-10 11:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.143) Loss 1.0127 (0.8165) Acc@1 75.146 (81.441) Acc@5 94.336 (95.901) Mem 16715MB [2024-08-10 11:49:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.212 Acc@5 95.887 [2024-08-10 11:49:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.2% [2024-08-10 11:49:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.829 (0.829) Loss 0.4827 (0.4827) Acc@1 89.209 (89.209) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.191) Loss 0.7837 (0.6110) Acc@1 80.762 (86.333) Acc@5 96.387 (97.732) Mem 16715MB [2024-08-10 11:49:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.156) Loss 0.8979 (0.7192) Acc@1 77.686 (83.394) Acc@5 95.264 (96.640) Mem 16715MB [2024-08-10 11:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.113 Acc@5 96.653 [2024-08-10 11:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 11:49:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][0/625] eta 0:13:46 lr 0.000839 wd 0.0500 time 1.3225 (1.3225) data time 0.5232 (0.5232) model time 0.0000 (0.0000) loss 3.2658 (3.2658) grad_norm 1.4446 (1.4446) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][10/625] eta 0:05:34 lr 0.000839 wd 0.0500 time 0.4649 (0.5444) data 
time 0.0009 (0.0504) model time 0.0000 (0.0000) loss 3.5081 (3.0173) grad_norm 3.0628 (2.1460) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][20/625] eta 0:05:06 lr 0.000839 wd 0.0500 time 0.4658 (0.5063) data time 0.0010 (0.0270) model time 0.0000 (0.0000) loss 2.9096 (2.9234) grad_norm 1.2380 (1.9143) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][30/625] eta 0:04:53 lr 0.000839 wd 0.0500 time 0.4685 (0.4936) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 3.7806 (2.9914) grad_norm 1.7759 (1.7567) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][40/625] eta 0:04:45 lr 0.000839 wd 0.0500 time 0.4646 (0.4873) data time 0.0011 (0.0143) model time 0.0000 (0.0000) loss 2.7986 (3.0115) grad_norm 1.6711 (1.7037) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][50/625] eta 0:04:40 lr 0.000839 wd 0.0500 time 0.4762 (0.4882) data time 0.0008 (0.0117) model time 0.0000 (0.0000) loss 3.5237 (3.0291) grad_norm 1.1429 (1.6605) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][60/625] eta 0:04:35 lr 0.000839 wd 0.0500 time 0.4089 (0.4880) data time 0.0011 (0.0102) model time 0.4078 (0.4848) loss 3.6617 (3.0532) grad_norm 2.2475 (1.6348) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:49:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][70/625] eta 0:04:30 lr 0.000839 wd 0.0500 time 0.4574 (0.4877) data time 0.0008 (0.0089) model time 0.4565 (0.4847) loss 1.8551 (3.0225) grad_norm 3.4251 (1.8280) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [124/300][80/625] eta 0:04:24 lr 0.000839 wd 0.0500 time 0.4634 (0.4844) data time 0.0010 (0.0079) model time 0.4625 (0.4766) loss 3.5109 (3.0367) grad_norm 1.7417 (1.8334) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][90/625] eta 0:04:17 lr 0.000839 wd 0.0500 time 0.4648 (0.4821) data time 0.0010 (0.0072) model time 0.4638 (0.4729) loss 3.4953 (3.0289) grad_norm 1.9541 (1.7961) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][100/625] eta 0:04:12 lr 0.000838 wd 0.0500 time 0.4661 (0.4803) data time 0.0008 (0.0066) model time 0.4653 (0.4709) loss 3.4446 (3.0402) grad_norm 1.5517 (1.7501) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][110/625] eta 0:04:07 lr 0.000838 wd 0.0500 time 0.4644 (0.4803) data time 0.0011 (0.0061) model time 0.4633 (0.4724) loss 3.7660 (3.0501) grad_norm 1.8391 (1.7202) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][120/625] eta 0:04:01 lr 0.000838 wd 0.0500 time 0.4642 (0.4791) data time 0.0010 (0.0057) model time 0.4631 (0.4712) loss 2.4519 (3.0490) grad_norm 1.7714 (1.7083) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][130/625] eta 0:03:56 lr 0.000838 wd 0.0500 time 0.4615 (0.4781) data time 0.0011 (0.0053) model time 0.4605 (0.4704) loss 2.7961 (3.0386) grad_norm 1.4453 (1.6936) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][140/625] eta 0:03:51 lr 0.000838 wd 0.0500 time 0.4634 (0.4770) data time 0.0011 (0.0050) model time 0.4624 (0.4694) loss 3.1596 (3.0432) grad_norm 1.0063 (1.6750) loss_scale 1024.0000 (1024.0000) 
mem 16715MB [2024-08-10 11:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][150/625] eta 0:03:46 lr 0.000838 wd 0.0500 time 0.4626 (0.4759) data time 0.0008 (0.0048) model time 0.4619 (0.4685) loss 3.0245 (3.0403) grad_norm 1.5091 (1.6710) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][160/625] eta 0:03:40 lr 0.000838 wd 0.0500 time 0.4569 (0.4752) data time 0.0010 (0.0045) model time 0.4559 (0.4679) loss 3.1872 (3.0459) grad_norm 1.4141 (1.6823) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][170/625] eta 0:03:35 lr 0.000838 wd 0.0500 time 0.4692 (0.4744) data time 0.0010 (0.0043) model time 0.4681 (0.4674) loss 2.4506 (3.0406) grad_norm 1.1294 (1.6694) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][180/625] eta 0:03:30 lr 0.000838 wd 0.0500 time 0.4692 (0.4738) data time 0.0010 (0.0041) model time 0.4682 (0.4670) loss 3.2138 (3.0329) grad_norm 1.1867 (1.6445) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][190/625] eta 0:03:25 lr 0.000838 wd 0.0500 time 0.4632 (0.4734) data time 0.0008 (0.0040) model time 0.4624 (0.4668) loss 2.1982 (3.0336) grad_norm 1.4024 (1.6400) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:50:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][200/625] eta 0:03:21 lr 0.000837 wd 0.0500 time 0.4669 (0.4730) data time 0.0009 (0.0038) model time 0.4660 (0.4666) loss 2.4489 (3.0209) grad_norm 1.5175 (1.6303) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][210/625] eta 0:03:16 lr 0.000837 wd 0.0500 time 0.4516 (0.4726) data time 0.0007 (0.0037) model time 0.4509 
(0.4665) loss 3.5219 (3.0255) grad_norm 1.3086 (1.6143) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][220/625] eta 0:03:11 lr 0.000837 wd 0.0500 time 0.4605 (0.4720) data time 0.0008 (0.0036) model time 0.4597 (0.4660) loss 3.7509 (3.0276) grad_norm 1.7512 (1.6129) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][230/625] eta 0:03:06 lr 0.000837 wd 0.0500 time 0.4621 (0.4716) data time 0.0008 (0.0035) model time 0.4613 (0.4657) loss 3.8159 (3.0366) grad_norm 3.1733 (1.6518) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][240/625] eta 0:03:01 lr 0.000837 wd 0.0500 time 0.4603 (0.4721) data time 0.0010 (0.0034) model time 0.4593 (0.4666) loss 2.8441 (3.0310) grad_norm 1.3855 (1.6670) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][250/625] eta 0:02:56 lr 0.000837 wd 0.0500 time 0.4659 (0.4718) data time 0.0008 (0.0033) model time 0.4650 (0.4665) loss 2.8906 (3.0384) grad_norm 1.4599 (1.6702) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][260/625] eta 0:02:52 lr 0.000837 wd 0.0500 time 0.4652 (0.4722) data time 0.0011 (0.0032) model time 0.4641 (0.4672) loss 2.4442 (3.0407) grad_norm 1.8173 (1.6649) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][270/625] eta 0:02:47 lr 0.000837 wd 0.0500 time 0.4696 (0.4720) data time 0.0013 (0.0031) model time 0.4683 (0.4671) loss 2.9704 (3.0332) grad_norm 1.2329 (1.6611) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][280/625] eta 0:02:42 
lr 0.000837 wd 0.0500 time 0.4648 (0.4718) data time 0.0010 (0.0030) model time 0.4638 (0.4670) loss 3.5651 (3.0391) grad_norm 1.5792 (1.6575) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][290/625] eta 0:02:37 lr 0.000837 wd 0.0500 time 0.4616 (0.4716) data time 0.0011 (0.0030) model time 0.4605 (0.4669) loss 2.5709 (3.0344) grad_norm 1.5909 (1.6701) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][300/625] eta 0:02:33 lr 0.000837 wd 0.0500 time 0.4641 (0.4713) data time 0.0008 (0.0029) model time 0.4633 (0.4668) loss 3.4344 (3.0406) grad_norm 1.2153 (1.6628) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][310/625] eta 0:02:28 lr 0.000836 wd 0.0500 time 0.4672 (0.4711) data time 0.0010 (0.0029) model time 0.4662 (0.4666) loss 3.2613 (3.0436) grad_norm 1.1434 (1.6564) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][320/625] eta 0:02:23 lr 0.000836 wd 0.0500 time 0.4585 (0.4708) data time 0.0010 (0.0028) model time 0.4574 (0.4664) loss 3.6456 (3.0351) grad_norm 1.7504 (1.6549) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:51:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][330/625] eta 0:02:18 lr 0.000836 wd 0.0500 time 0.4694 (0.4707) data time 0.0010 (0.0027) model time 0.4684 (0.4664) loss 2.8706 (3.0371) grad_norm 2.3181 (1.6571) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][340/625] eta 0:02:14 lr 0.000836 wd 0.0500 time 0.4665 (0.4705) data time 0.0011 (0.0027) model time 0.4654 (0.4663) loss 2.9351 (3.0315) grad_norm 1.7553 (1.6572) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:08 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][350/625] eta 0:02:09 lr 0.000836 wd 0.0500 time 0.4642 (0.4704) data time 0.0011 (0.0026) model time 0.4631 (0.4663) loss 2.8600 (3.0389) grad_norm 1.7689 (1.6569) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][360/625] eta 0:02:04 lr 0.000836 wd 0.0500 time 0.4648 (0.4702) data time 0.0011 (0.0026) model time 0.4637 (0.4662) loss 3.4180 (3.0447) grad_norm 1.8078 (1.6651) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][370/625] eta 0:01:59 lr 0.000836 wd 0.0500 time 0.4599 (0.4700) data time 0.0008 (0.0026) model time 0.4591 (0.4660) loss 2.6395 (3.0459) grad_norm 1.5524 (1.6673) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][380/625] eta 0:01:55 lr 0.000836 wd 0.0500 time 0.4633 (0.4699) data time 0.0010 (0.0025) model time 0.4623 (0.4660) loss 2.7620 (3.0479) grad_norm 1.7914 (1.6691) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][390/625] eta 0:01:50 lr 0.000836 wd 0.0500 time 0.4628 (0.4697) data time 0.0008 (0.0025) model time 0.4620 (0.4658) loss 2.6700 (3.0469) grad_norm 1.3011 (1.6652) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][400/625] eta 0:01:45 lr 0.000836 wd 0.0500 time 0.4660 (0.4701) data time 0.0009 (0.0025) model time 0.4651 (0.4663) loss 3.6236 (3.0504) grad_norm 2.2367 (1.6889) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][410/625] eta 0:01:41 lr 0.000835 wd 0.0500 time 0.4617 (0.4700) data time 0.0008 (0.0024) model time 0.4609 (0.4663) loss 3.8500 (3.0573) 
grad_norm 1.5823 (1.6911) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][420/625] eta 0:01:36 lr 0.000835 wd 0.0500 time 0.4701 (0.4699) data time 0.0007 (0.0024) model time 0.4694 (0.4663) loss 2.4667 (3.0596) grad_norm 1.6148 (1.6832) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][430/625] eta 0:01:31 lr 0.000835 wd 0.0500 time 0.4589 (0.4698) data time 0.0008 (0.0024) model time 0.4582 (0.4662) loss 3.0582 (3.0584) grad_norm 1.4653 (1.6796) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][440/625] eta 0:01:26 lr 0.000835 wd 0.0500 time 0.4656 (0.4700) data time 0.0008 (0.0023) model time 0.4649 (0.4666) loss 3.0478 (3.0560) grad_norm 2.3148 (1.6851) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][450/625] eta 0:01:22 lr 0.000835 wd 0.0500 time 0.4627 (0.4699) data time 0.0010 (0.0023) model time 0.4617 (0.4665) loss 2.7243 (3.0570) grad_norm 1.5463 (1.6973) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][460/625] eta 0:01:17 lr 0.000835 wd 0.0500 time 0.4669 (0.4704) data time 0.0007 (0.0023) model time 0.4662 (0.4670) loss 3.8687 (3.0559) grad_norm 1.4109 (1.6936) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][470/625] eta 0:01:12 lr 0.000835 wd 0.0500 time 0.4671 (0.4702) data time 0.0008 (0.0022) model time 0.4663 (0.4670) loss 2.3368 (3.0595) grad_norm 1.4217 (1.6875) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][480/625] eta 0:01:08 lr 0.000835 wd 0.0500 time 
0.4672 (0.4706) data time 0.0010 (0.0022) model time 0.4662 (0.4674) loss 3.5487 (3.0583) grad_norm 2.2307 (1.6833) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][490/625] eta 0:01:03 lr 0.000835 wd 0.0500 time 0.4636 (0.4705) data time 0.0007 (0.0022) model time 0.4628 (0.4673) loss 2.7770 (3.0590) grad_norm 1.2975 (1.6791) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][500/625] eta 0:00:58 lr 0.000835 wd 0.0500 time 0.4583 (0.4703) data time 0.0007 (0.0022) model time 0.4576 (0.4672) loss 3.4989 (3.0592) grad_norm 4.4669 (1.6865) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][510/625] eta 0:00:54 lr 0.000834 wd 0.0500 time 0.4656 (0.4702) data time 0.0011 (0.0021) model time 0.4646 (0.4671) loss 3.3599 (3.0588) grad_norm 1.3716 (1.6875) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][520/625] eta 0:00:49 lr 0.000834 wd 0.0500 time 0.4633 (0.4701) data time 0.0010 (0.0021) model time 0.4623 (0.4670) loss 3.2081 (3.0571) grad_norm 2.1725 (1.6879) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][530/625] eta 0:00:44 lr 0.000834 wd 0.0500 time 0.4625 (0.4699) data time 0.0010 (0.0021) model time 0.4616 (0.4669) loss 3.6289 (3.0582) grad_norm 1.7524 (1.6882) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][540/625] eta 0:00:39 lr 0.000834 wd 0.0500 time 0.4686 (0.4698) data time 0.0010 (0.0021) model time 0.4676 (0.4668) loss 3.4623 (3.0588) grad_norm 1.2198 (1.6832) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [124/300][550/625] eta 0:00:35 lr 0.000834 wd 0.0500 time 0.4649 (0.4698) data time 0.0008 (0.0021) model time 0.4641 (0.4668) loss 4.1302 (3.0638) grad_norm 1.3754 (1.6803) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][560/625] eta 0:00:30 lr 0.000834 wd 0.0500 time 0.4645 (0.4697) data time 0.0010 (0.0021) model time 0.4635 (0.4668) loss 3.0436 (3.0631) grad_norm 1.3575 (1.6760) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][570/625] eta 0:00:25 lr 0.000834 wd 0.0500 time 0.4641 (0.4696) data time 0.0009 (0.0020) model time 0.4631 (0.4667) loss 2.8710 (3.0653) grad_norm 1.3298 (1.6729) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:53:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][580/625] eta 0:00:21 lr 0.000834 wd 0.0500 time 0.4649 (0.4695) data time 0.0007 (0.0020) model time 0.4642 (0.4667) loss 3.3733 (3.0657) grad_norm 1.4059 (1.6715) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][590/625] eta 0:00:16 lr 0.000834 wd 0.0500 time 0.4631 (0.4696) data time 0.0007 (0.0020) model time 0.4624 (0.4667) loss 2.4686 (3.0642) grad_norm 2.3717 (1.6682) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][600/625] eta 0:00:11 lr 0.000834 wd 0.0500 time 0.4602 (0.4694) data time 0.0010 (0.0020) model time 0.4592 (0.4666) loss 3.3757 (3.0648) grad_norm 2.1607 (1.6657) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][610/625] eta 0:00:07 lr 0.000833 wd 0.0500 time 0.4607 (0.4697) data time 0.0007 (0.0020) model time 0.4600 (0.4670) loss 3.3798 (3.0657) grad_norm 1.2672 
(1.6677) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [124/300][620/625] eta 0:00:02 lr 0.000833 wd 0.0500 time 0.4614 (0.4696) data time 0.0005 (0.0020) model time 0.4609 (0.4669) loss 2.9220 (3.0639) grad_norm 1.3417 (1.6683) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 124 training takes 0:04:53 [2024-08-10 11:54:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:54:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 11:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5547 (0.5547) Acc@1 87.061 (87.061) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 11:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.162) Loss 0.8979 (0.6703) Acc@1 77.295 (84.695) Acc@5 95.312 (97.328) Mem 16715MB [2024-08-10 11:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0068 (0.7911) Acc@1 75.195 (81.689) Acc@5 94.531 (95.998) Mem 16715MB [2024-08-10 11:54:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.454 Acc@5 95.985 [2024-08-10 11:54:22 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-10 11:54:22 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.45% [2024-08-10 11:54:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 11:54:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 11:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 0.4829 (0.4829) Acc@1 89.209 (89.209) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.7842 (0.6107) Acc@1 80.762 (86.421) Acc@5 96.387 (97.758) Mem 16715MB [2024-08-10 11:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 0.8984 (0.7184) Acc@1 77.637 (83.461) Acc@5 95.117 (96.640) Mem 16715MB [2024-08-10 11:54:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.159 Acc@5 96.651 [2024-08-10 11:54:27 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 11:54:27 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.16% [2024-08-10 11:54:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:54:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][0/625] eta 0:08:26 lr 0.000833 wd 0.0500 time 0.8097 (0.8097) data time 0.4024 (0.4024) model time 0.0000 (0.0000) loss 2.2319 (2.2319) grad_norm 1.1354 (1.1354) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][10/625] eta 0:05:14 lr 0.000833 wd 0.0500 time 0.4681 (0.5110) data time 0.0010 (0.0376) model time 0.0000 (0.0000) loss 3.6423 (3.2306) grad_norm 1.7582 (1.3365) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][20/625] eta 0:04:56 lr 0.000833 wd 0.0500 time 0.4672 (0.4896) data time 0.0010 (0.0202) model time 0.0000 (0.0000) loss 3.3948 (3.1480) grad_norm 1.9183 (1.4763) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][30/625] eta 0:04:46 lr 0.000833 wd 0.0500 time 0.4668 (0.4814) data time 0.0008 (0.0140) model time 0.0000 (0.0000) loss 1.8296 (3.1121) grad_norm 1.1374 (1.4635) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][40/625] eta 0:04:39 lr 0.000833 wd 0.0500 time 0.4636 (0.4776) data time 0.0011 (0.0109) model time 0.0000 (0.0000) loss 1.9061 (3.0376) grad_norm 1.3814 (1.6032) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][50/625] eta 0:04:33 lr 0.000833 wd 0.0500 time 0.4657 (0.4752) data time 0.0010 (0.0090) model time 0.0000 (0.0000) loss 2.7813 (3.0153) grad_norm 1.3019 (1.6402) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][60/625] eta 0:04:29 lr 0.000833 wd 0.0500 time 0.4127 (0.4765) data time 0.0008 (0.0077) model time 0.4119 (0.4823) loss 2.8319 
(2.9938) grad_norm 1.3967 (1.6130) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][70/625] eta 0:04:23 lr 0.000833 wd 0.0500 time 0.4633 (0.4749) data time 0.0010 (0.0067) model time 0.4623 (0.4730) loss 2.2637 (2.9984) grad_norm 1.6402 (1.6443) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][80/625] eta 0:04:18 lr 0.000833 wd 0.0500 time 0.4636 (0.4736) data time 0.0008 (0.0060) model time 0.4628 (0.4699) loss 3.5673 (3.0093) grad_norm 1.6727 (1.6690) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][90/625] eta 0:04:13 lr 0.000832 wd 0.0500 time 0.4686 (0.4732) data time 0.0007 (0.0061) model time 0.4679 (0.4682) loss 3.2808 (3.0258) grad_norm 1.4246 (1.6549) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][100/625] eta 0:04:07 lr 0.000832 wd 0.0500 time 0.4647 (0.4723) data time 0.0009 (0.0056) model time 0.4638 (0.4673) loss 2.9788 (3.0386) grad_norm 1.5962 (1.6523) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][110/625] eta 0:04:02 lr 0.000832 wd 0.0500 time 0.4588 (0.4716) data time 0.0013 (0.0052) model time 0.4575 (0.4667) loss 2.2491 (3.0296) grad_norm 1.6606 (1.6285) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][120/625] eta 0:03:57 lr 0.000832 wd 0.0500 time 0.4698 (0.4710) data time 0.0010 (0.0048) model time 0.4688 (0.4661) loss 3.1055 (3.0402) grad_norm 1.8184 (1.6291) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][130/625] eta 0:03:52 lr 0.000832 wd 0.0500 
time 0.4672 (0.4707) data time 0.0008 (0.0046) model time 0.4664 (0.4660) loss 3.8445 (3.0636) grad_norm 1.3372 (1.6231) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][140/625] eta 0:03:48 lr 0.000832 wd 0.0500 time 0.4705 (0.4705) data time 0.0008 (0.0043) model time 0.4697 (0.4661) loss 3.6304 (3.0749) grad_norm 1.2267 (1.6348) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][150/625] eta 0:03:43 lr 0.000832 wd 0.0500 time 0.4666 (0.4702) data time 0.0009 (0.0041) model time 0.4657 (0.4660) loss 3.5261 (3.0754) grad_norm 1.4772 (1.6366) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][160/625] eta 0:03:38 lr 0.000832 wd 0.0500 time 0.4600 (0.4699) data time 0.0008 (0.0039) model time 0.4592 (0.4659) loss 2.8878 (3.0658) grad_norm 1.9688 (1.6513) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][170/625] eta 0:03:33 lr 0.000832 wd 0.0500 time 0.4907 (0.4698) data time 0.0008 (0.0038) model time 0.4899 (0.4660) loss 3.5669 (3.0773) grad_norm 8.9717 (1.6903) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][180/625] eta 0:03:28 lr 0.000832 wd 0.0500 time 0.4568 (0.4695) data time 0.0008 (0.0036) model time 0.4560 (0.4658) loss 2.1861 (3.0691) grad_norm 2.8694 (1.7179) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:55:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][190/625] eta 0:03:24 lr 0.000831 wd 0.0500 time 0.4638 (0.4702) data time 0.0008 (0.0036) model time 0.4631 (0.4668) loss 2.8700 (3.0662) grad_norm 1.4865 (1.7225) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:03 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [125/300][200/625] eta 0:03:19 lr 0.000831 wd 0.0500 time 0.4671 (0.4703) data time 0.0010 (0.0034) model time 0.4662 (0.4671) loss 3.0644 (3.0689) grad_norm 1.6289 (1.7243) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][210/625] eta 0:03:15 lr 0.000831 wd 0.0500 time 0.4661 (0.4713) data time 0.0009 (0.0033) model time 0.4653 (0.4685) loss 3.1204 (3.0582) grad_norm 1.5219 (1.7138) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][220/625] eta 0:03:10 lr 0.000831 wd 0.0500 time 0.5154 (0.4712) data time 0.0008 (0.0032) model time 0.5146 (0.4685) loss 2.4285 (3.0596) grad_norm 2.0186 (1.7176) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][230/625] eta 0:03:06 lr 0.000831 wd 0.0500 time 0.4640 (0.4710) data time 0.0011 (0.0031) model time 0.4629 (0.4683) loss 3.0877 (3.0588) grad_norm 3.0897 (1.7225) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][240/625] eta 0:03:01 lr 0.000831 wd 0.0500 time 0.4673 (0.4709) data time 0.0010 (0.0031) model time 0.4663 (0.4681) loss 3.3878 (3.0610) grad_norm 2.9250 (1.7226) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][250/625] eta 0:02:56 lr 0.000831 wd 0.0500 time 0.4650 (0.4706) data time 0.0010 (0.0031) model time 0.4639 (0.4679) loss 3.2833 (3.0660) grad_norm 1.5226 (1.7177) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][260/625] eta 0:02:51 lr 0.000831 wd 0.0500 time 0.4643 (0.4704) data time 0.0008 (0.0030) model time 0.4635 (0.4677) loss 3.2279 (3.0707) grad_norm 1.2093 
(1.7059) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][270/625] eta 0:02:46 lr 0.000831 wd 0.0500 time 0.4647 (0.4702) data time 0.0008 (0.0029) model time 0.4639 (0.4675) loss 3.6941 (3.0729) grad_norm 1.7014 (1.6961) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][280/625] eta 0:02:42 lr 0.000831 wd 0.0500 time 0.4740 (0.4701) data time 0.0010 (0.0029) model time 0.4730 (0.4675) loss 3.3511 (3.0749) grad_norm 1.5004 (1.7033) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][290/625] eta 0:02:37 lr 0.000830 wd 0.0500 time 0.4634 (0.4700) data time 0.0008 (0.0028) model time 0.4626 (0.4674) loss 2.7596 (3.0717) grad_norm 1.4563 (1.6991) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][300/625] eta 0:02:32 lr 0.000830 wd 0.0500 time 0.4650 (0.4698) data time 0.0010 (0.0027) model time 0.4641 (0.4673) loss 2.2597 (3.0694) grad_norm 1.4357 (1.7015) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][310/625] eta 0:02:27 lr 0.000830 wd 0.0500 time 0.4667 (0.4697) data time 0.0008 (0.0027) model time 0.4659 (0.4672) loss 2.3555 (3.0688) grad_norm 2.2916 (1.7039) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][320/625] eta 0:02:23 lr 0.000830 wd 0.0500 time 0.4626 (0.4695) data time 0.0008 (0.0026) model time 0.4618 (0.4670) loss 3.3133 (3.0743) grad_norm 1.6486 (1.7035) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][330/625] eta 0:02:18 lr 0.000830 wd 0.0500 time 0.4569 (0.4693) data 
time 0.0008 (0.0026) model time 0.4561 (0.4668) loss 2.2720 (3.0673) grad_norm 1.5578 (1.6918) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][340/625] eta 0:02:13 lr 0.000830 wd 0.0500 time 0.4646 (0.4692) data time 0.0008 (0.0025) model time 0.4638 (0.4667) loss 2.2892 (3.0609) grad_norm 2.4921 (1.7174) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][350/625] eta 0:02:08 lr 0.000830 wd 0.0500 time 0.4592 (0.4690) data time 0.0008 (0.0025) model time 0.4584 (0.4666) loss 3.5891 (3.0599) grad_norm 1.2988 (1.7129) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][360/625] eta 0:02:04 lr 0.000830 wd 0.0500 time 0.4665 (0.4696) data time 0.0008 (0.0025) model time 0.4657 (0.4673) loss 2.1036 (3.0588) grad_norm 1.7124 (1.7057) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][370/625] eta 0:01:59 lr 0.000830 wd 0.0500 time 0.4596 (0.4695) data time 0.0008 (0.0024) model time 0.4588 (0.4672) loss 3.3669 (3.0569) grad_norm 1.6340 (1.7024) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][380/625] eta 0:01:54 lr 0.000830 wd 0.0500 time 0.4634 (0.4694) data time 0.0010 (0.0024) model time 0.4624 (0.4671) loss 3.5302 (3.0579) grad_norm 1.0583 (1.6933) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][390/625] eta 0:01:50 lr 0.000829 wd 0.0500 time 0.4670 (0.4692) data time 0.0008 (0.0024) model time 0.4662 (0.4670) loss 2.9743 (3.0612) grad_norm 1.2565 (1.6908) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [125/300][400/625] eta 0:01:45 lr 0.000829 wd 0.0500 time 0.4663 (0.4695) data time 0.0011 (0.0023) model time 0.4652 (0.4673) loss 3.0685 (3.0669) grad_norm 1.0172 (1.6845) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][410/625] eta 0:01:41 lr 0.000829 wd 0.0500 time 0.4637 (0.4702) data time 0.0009 (0.0023) model time 0.4627 (0.4682) loss 3.9235 (3.0690) grad_norm 1.6040 (1.6794) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][420/625] eta 0:01:36 lr 0.000829 wd 0.0500 time 0.4630 (0.4705) data time 0.0008 (0.0023) model time 0.4622 (0.4685) loss 2.3231 (3.0712) grad_norm 1.7752 (1.6758) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][430/625] eta 0:01:31 lr 0.000829 wd 0.0500 time 0.4684 (0.4703) data time 0.0008 (0.0022) model time 0.4676 (0.4683) loss 1.9978 (3.0648) grad_norm 1.2743 (1.6819) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:57:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][440/625] eta 0:01:26 lr 0.000829 wd 0.0500 time 0.4675 (0.4703) data time 0.0007 (0.0022) model time 0.4667 (0.4683) loss 3.9161 (3.0683) grad_norm 1.4096 (1.6771) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][450/625] eta 0:01:22 lr 0.000829 wd 0.0500 time 0.4629 (0.4701) data time 0.0008 (0.0022) model time 0.4620 (0.4681) loss 3.3372 (3.0690) grad_norm 1.4699 (1.6757) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][460/625] eta 0:01:17 lr 0.000829 wd 0.0500 time 0.4605 (0.4699) data time 0.0008 (0.0022) model time 0.4597 (0.4679) loss 3.1575 (3.0705) grad_norm 1.4968 (1.6771) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 11:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][470/625] eta 0:01:12 lr 0.000829 wd 0.0500 time 0.4640 (0.4698) data time 0.0010 (0.0021) model time 0.4630 (0.4678) loss 2.9493 (3.0673) grad_norm 1.1259 (1.6789) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][480/625] eta 0:01:08 lr 0.000829 wd 0.0500 time 0.4663 (0.4697) data time 0.0007 (0.0021) model time 0.4656 (0.4677) loss 3.8322 (3.0673) grad_norm 1.8516 (1.6762) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][490/625] eta 0:01:03 lr 0.000828 wd 0.0500 time 0.4655 (0.4695) data time 0.0008 (0.0021) model time 0.4647 (0.4676) loss 3.5770 (3.0620) grad_norm 1.3224 (1.6726) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][500/625] eta 0:00:58 lr 0.000828 wd 0.0500 time 0.4659 (0.4695) data time 0.0010 (0.0021) model time 0.4649 (0.4675) loss 3.3898 (3.0647) grad_norm 2.5944 (1.6760) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][510/625] eta 0:00:53 lr 0.000828 wd 0.0500 time 0.4654 (0.4694) data time 0.0008 (0.0021) model time 0.4646 (0.4674) loss 1.8138 (3.0659) grad_norm 1.5886 (1.6792) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][520/625] eta 0:00:49 lr 0.000828 wd 0.0500 time 0.4708 (0.4693) data time 0.0010 (0.0020) model time 0.4698 (0.4674) loss 3.2481 (3.0675) grad_norm 1.6426 (1.6759) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][530/625] eta 0:00:44 lr 0.000828 wd 0.0500 time 0.4644 (0.4692) data time 0.0011 (0.0020) model 
time 0.4633 (0.4673) loss 2.7518 (3.0636) grad_norm 1.4615 (1.6733) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][540/625] eta 0:00:39 lr 0.000828 wd 0.0500 time 0.4593 (0.4691) data time 0.0010 (0.0020) model time 0.4583 (0.4672) loss 3.2088 (3.0620) grad_norm 1.0196 (1.6685) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][550/625] eta 0:00:35 lr 0.000828 wd 0.0500 time 0.4641 (0.4690) data time 0.0008 (0.0020) model time 0.4634 (0.4671) loss 2.9813 (3.0652) grad_norm 1.4058 (1.6698) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][560/625] eta 0:00:30 lr 0.000828 wd 0.0500 time 0.4666 (0.4696) data time 0.0008 (0.0020) model time 0.4658 (0.4678) loss 3.0969 (3.0662) grad_norm 2.2006 (1.6728) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][570/625] eta 0:00:25 lr 0.000828 wd 0.0500 time 0.4651 (0.4698) data time 0.0008 (0.0020) model time 0.4643 (0.4680) loss 1.9769 (3.0653) grad_norm 1.2749 (1.6735) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][580/625] eta 0:00:21 lr 0.000828 wd 0.0500 time 0.4692 (0.4698) data time 0.0010 (0.0019) model time 0.4682 (0.4680) loss 3.4247 (3.0711) grad_norm 1.2807 (1.6723) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][590/625] eta 0:00:16 lr 0.000827 wd 0.0500 time 0.4724 (0.4698) data time 0.0008 (0.0019) model time 0.4717 (0.4681) loss 2.3848 (3.0692) grad_norm 2.1794 (1.6815) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][600/625] 
eta 0:00:11 lr 0.000827 wd 0.0500 time 0.4631 (0.4698) data time 0.0012 (0.0019) model time 0.4620 (0.4680) loss 3.7675 (3.0715) grad_norm 1.5617 (1.6824) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][610/625] eta 0:00:07 lr 0.000827 wd 0.0500 time 0.4621 (0.4697) data time 0.0005 (0.0019) model time 0.4615 (0.4679) loss 3.3480 (3.0715) grad_norm 1.5203 (1.6777) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [125/300][620/625] eta 0:00:02 lr 0.000827 wd 0.0500 time 0.4600 (0.4695) data time 0.0005 (0.0019) model time 0.4595 (0.4678) loss 3.2962 (3.0717) grad_norm 1.1120 (1.6719) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 125 training takes 0:04:53 [2024-08-10 11:59:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 11:59:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 11:59:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.535 (0.535) Loss 0.5308 (0.5308) Acc@1 87.988 (87.988) Acc@5 98.584 (98.584) Mem 16715MB [2024-08-10 11:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.8770 (0.6711) Acc@1 79.395 (84.863) Acc@5 95.410 (97.377) Mem 16715MB [2024-08-10 11:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0166 (0.8022) Acc@1 75.146 (81.713) Acc@5 94.189 (95.945) Mem 16715MB [2024-08-10 11:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.516 Acc@5 95.955 [2024-08-10 11:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-10 11:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.52% [2024-08-10 11:59:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 11:59:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 11:59:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.4822 (0.4822) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 11:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.7808 (0.6098) Acc@1 80.908 (86.430) Acc@5 96.484 (97.781) Mem 16715MB [2024-08-10 11:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.8989 (0.7175) Acc@1 77.490 (83.477) Acc@5 95.215 (96.654) Mem 16715MB [2024-08-10 11:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.191 Acc@5 96.671 [2024-08-10 11:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 11:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.19% [2024-08-10 11:59:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 11:59:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 11:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][0/625] eta 0:08:37 lr 0.000827 wd 0.0500 time 0.8283 (0.8283) data time 0.4219 (0.4219) model time 0.0000 (0.0000) loss 3.4940 (3.4940) grad_norm 1.1816 (1.1816) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][10/625] eta 0:05:07 lr 0.000827 wd 0.0500 time 0.4614 (0.4999) data time 0.0008 (0.0394) model time 0.0000 (0.0000) loss 3.4940 (3.1214) grad_norm 1.9226 (1.5390) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][20/625] eta 0:04:52 lr 0.000827 wd 0.0500 time 0.4664 (0.4841) data time 0.0011 (0.0211) model time 0.0000 (0.0000) loss 2.3561 (3.0982) grad_norm 1.1644 (1.7749) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][30/625] eta 0:04:49 lr 0.000827 wd 0.0500 time 0.4697 (0.4863) data time 0.0008 (0.0147) model time 0.0000 (0.0000) loss 4.0412 (3.1985) grad_norm 1.6932 (2.2486) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][40/625] eta 0:04:41 lr 0.000827 wd 0.0500 time 0.4626 (0.4808) data time 0.0010 (0.0113) model time 0.0000 (0.0000) loss 3.4964 (3.1577) grad_norm 1.3442 (2.1194) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 11:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][50/625] eta 0:04:34 lr 0.000827 wd 0.0500 time 0.4611 (0.4774) data time 0.0007 (0.0093) model time 0.0000 (0.0000) loss 2.5357 (3.0226) grad_norm 1.5377 (1.9902) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][60/625] eta 0:04:29 lr 0.000827 wd 0.0500 time 0.4098 (0.4776) data time 0.0009 (0.0080) model time 0.4089 (0.4776) loss 3.4884 
(3.0353) grad_norm 1.6622 (1.9390) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][70/625] eta 0:04:24 lr 0.000826 wd 0.0500 time 0.4594 (0.4757) data time 0.0011 (0.0070) model time 0.4583 (0.4702) loss 2.5378 (3.0314) grad_norm 1.4421 (1.8535) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][80/625] eta 0:04:18 lr 0.000826 wd 0.0500 time 0.4638 (0.4743) data time 0.0009 (0.0063) model time 0.4629 (0.4677) loss 3.2150 (3.0767) grad_norm 1.7692 (1.8066) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][90/625] eta 0:04:13 lr 0.000826 wd 0.0500 time 0.4612 (0.4732) data time 0.0010 (0.0057) model time 0.4602 (0.4666) loss 3.3316 (3.0758) grad_norm 1.6511 (1.7760) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][100/625] eta 0:04:08 lr 0.000826 wd 0.0500 time 0.4631 (0.4724) data time 0.0010 (0.0053) model time 0.4621 (0.4662) loss 3.0921 (3.0675) grad_norm 1.2203 (1.7502) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][110/625] eta 0:04:02 lr 0.000826 wd 0.0500 time 0.4633 (0.4717) data time 0.0008 (0.0049) model time 0.4625 (0.4658) loss 3.5541 (3.0578) grad_norm 1.6785 (1.7221) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][120/625] eta 0:04:00 lr 0.000826 wd 0.0500 time 0.4639 (0.4756) data time 0.0008 (0.0046) model time 0.4631 (0.4732) loss 3.8681 (3.0583) grad_norm 1.9594 (1.6986) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][130/625] eta 0:03:54 lr 0.000826 wd 0.0500 
time 0.4638 (0.4747) data time 0.0010 (0.0043) model time 0.4628 (0.4718) loss 3.3572 (3.0527) grad_norm 1.7232 (1.6695) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][140/625] eta 0:03:49 lr 0.000826 wd 0.0500 time 0.4709 (0.4739) data time 0.0008 (0.0041) model time 0.4701 (0.4708) loss 3.1083 (3.0459) grad_norm 1.3476 (1.6476) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][150/625] eta 0:03:44 lr 0.000826 wd 0.0500 time 0.4634 (0.4732) data time 0.0009 (0.0039) model time 0.4625 (0.4700) loss 3.2532 (3.0513) grad_norm 2.3400 (1.6406) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][160/625] eta 0:03:39 lr 0.000826 wd 0.0500 time 0.4691 (0.4728) data time 0.0010 (0.0037) model time 0.4680 (0.4696) loss 3.0532 (3.0439) grad_norm 1.5418 (1.6309) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][170/625] eta 0:03:34 lr 0.000825 wd 0.0500 time 0.4661 (0.4724) data time 0.0011 (0.0035) model time 0.4650 (0.4692) loss 3.7253 (3.0420) grad_norm 1.3325 (1.6314) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:00:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][180/625] eta 0:03:30 lr 0.000825 wd 0.0500 time 0.4662 (0.4720) data time 0.0010 (0.0034) model time 0.4653 (0.4688) loss 3.0328 (3.0474) grad_norm 6.1661 (1.6529) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][190/625] eta 0:03:25 lr 0.000825 wd 0.0500 time 0.4654 (0.4717) data time 0.0008 (0.0033) model time 0.4646 (0.4685) loss 2.4811 (3.0461) grad_norm 1.7610 (1.6524) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:09 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [126/300][200/625] eta 0:03:20 lr 0.000825 wd 0.0500 time 0.4606 (0.4713) data time 0.0010 (0.0032) model time 0.4595 (0.4682) loss 3.1817 (3.0499) grad_norm 1.0843 (1.6445) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][210/625] eta 0:03:15 lr 0.000825 wd 0.0500 time 0.4623 (0.4712) data time 0.0010 (0.0031) model time 0.4613 (0.4681) loss 2.5684 (3.0424) grad_norm 1.3143 (1.6337) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][220/625] eta 0:03:10 lr 0.000825 wd 0.0500 time 0.4748 (0.4710) data time 0.0008 (0.0030) model time 0.4740 (0.4680) loss 2.2868 (3.0414) grad_norm 1.1185 (1.6335) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][230/625] eta 0:03:05 lr 0.000825 wd 0.0500 time 0.4642 (0.4709) data time 0.0010 (0.0029) model time 0.4632 (0.4679) loss 2.5192 (3.0355) grad_norm 1.4009 (1.6335) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][240/625] eta 0:03:01 lr 0.000825 wd 0.0500 time 0.4615 (0.4708) data time 0.0009 (0.0028) model time 0.4606 (0.4680) loss 3.9209 (3.0389) grad_norm 1.4226 (1.6260) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][250/625] eta 0:02:56 lr 0.000825 wd 0.0500 time 0.4653 (0.4708) data time 0.0008 (0.0027) model time 0.4644 (0.4680) loss 2.9036 (3.0347) grad_norm 2.3031 (1.6269) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][260/625] eta 0:02:51 lr 0.000825 wd 0.0500 time 0.4689 (0.4707) data time 0.0008 (0.0027) model time 0.4681 (0.4680) loss 3.4768 (3.0343) grad_norm 1.9861 
(1.6201) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][270/625] eta 0:02:47 lr 0.000824 wd 0.0500 time 0.4593 (0.4705) data time 0.0008 (0.0026) model time 0.4585 (0.4678) loss 3.4151 (3.0287) grad_norm 1.6759 (1.6168) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][280/625] eta 0:02:42 lr 0.000824 wd 0.0500 time 0.4687 (0.4706) data time 0.0009 (0.0026) model time 0.4678 (0.4680) loss 2.4645 (3.0257) grad_norm 1.3597 (1.6098) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][290/625] eta 0:02:37 lr 0.000824 wd 0.0500 time 0.4655 (0.4705) data time 0.0008 (0.0025) model time 0.4647 (0.4679) loss 3.2877 (3.0276) grad_norm 1.3424 (1.6019) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:01:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][300/625] eta 0:02:32 lr 0.000824 wd 0.0500 time 0.4678 (0.4704) data time 0.0009 (0.0025) model time 0.4669 (0.4679) loss 3.2466 (3.0321) grad_norm 1.2412 (1.6066) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][310/625] eta 0:02:28 lr 0.000824 wd 0.0500 time 0.5274 (0.4706) data time 0.0007 (0.0025) model time 0.5267 (0.4682) loss 3.1618 (3.0369) grad_norm 1.6170 (1.6695) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:02:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][320/625] eta 0:02:23 lr 0.000824 wd 0.0500 time 0.4657 (0.4705) data time 0.0008 (0.0024) model time 0.4649 (0.4681) loss 2.8953 (3.0397) grad_norm 1.6676 (1.6714) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:02:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][330/625] eta 0:02:18 lr 0.000824 wd 0.0500 time 0.4566 (0.4705) data 
time 0.0010 (0.0024) model time 0.4556 (0.4682) loss 3.1161 (3.0336) grad_norm 1.2260 (1.6676) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:02:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][340/625] eta 0:02:14 lr 0.000824 wd 0.0500 time 0.4648 (0.4704) data time 0.0011 (0.0023) model time 0.4637 (0.4681) loss 2.3335 (3.0301) grad_norm 1.4846 (1.6688) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:02:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][350/625] eta 0:02:09 lr 0.000824 wd 0.0500 time 0.4661 (0.4703) data time 0.0010 (0.0023) model time 0.4651 (0.4680) loss 2.4160 (3.0210) grad_norm 1.3311 (1.6722) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:02:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][360/625] eta 0:02:04 lr 0.000824 wd 0.0500 time 0.4593 (0.4702) data time 0.0007 (0.0023) model time 0.4586 (0.4679) loss 3.4810 (3.0167) grad_norm 1.5550 (1.6747) loss_scale 2048.0000 (1026.8366) mem 16715MB [2024-08-10 12:02:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][370/625] eta 0:02:00 lr 0.000823 wd 0.0500 time 0.4611 (0.4708) data time 0.0010 (0.0022) model time 0.4601 (0.4686) loss 3.5474 (3.0209) grad_norm 1.3982 (1.6729) loss_scale 2048.0000 (1054.3612) mem 16715MB [2024-08-10 12:02:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][380/625] eta 0:01:55 lr 0.000823 wd 0.0500 time 0.5087 (0.4708) data time 0.0008 (0.0022) model time 0.5079 (0.4687) loss 2.1517 (3.0187) grad_norm 3.7100 (1.6736) loss_scale 2048.0000 (1080.4409) mem 16715MB [2024-08-10 12:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][390/625] eta 0:01:50 lr 0.000823 wd 0.0500 time 0.4608 (0.4707) data time 0.0011 (0.0022) model time 0.4597 (0.4686) loss 2.7736 (3.0159) grad_norm 1.5589 (1.6729) loss_scale 2048.0000 (1105.1867) mem 16715MB [2024-08-10 12:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [126/300][400/625] eta 0:01:46 lr 0.000823 wd 0.0500 time 0.4692 (0.4711) data time 0.0010 (0.0022) model time 0.4682 (0.4691) loss 2.5315 (3.0130) grad_norm 1.0796 (1.6786) loss_scale 2048.0000 (1128.6983) mem 16715MB [2024-08-10 12:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][410/625] eta 0:01:41 lr 0.000823 wd 0.0500 time 0.4775 (0.4710) data time 0.0011 (0.0021) model time 0.4764 (0.4689) loss 3.2655 (3.0174) grad_norm 2.8697 (1.6838) loss_scale 2048.0000 (1151.0657) mem 16715MB [2024-08-10 12:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][420/625] eta 0:01:36 lr 0.000823 wd 0.0500 time 0.4655 (0.4708) data time 0.0010 (0.0021) model time 0.4645 (0.4688) loss 3.5345 (3.0197) grad_norm 1.5726 (1.6820) loss_scale 2048.0000 (1172.3705) mem 16715MB [2024-08-10 12:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][430/625] eta 0:01:31 lr 0.000823 wd 0.0500 time 0.4717 (0.4707) data time 0.0011 (0.0021) model time 0.4706 (0.4687) loss 3.1764 (3.0210) grad_norm 1.2975 (1.6825) loss_scale 2048.0000 (1192.6868) mem 16715MB [2024-08-10 12:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][440/625] eta 0:01:27 lr 0.000823 wd 0.0500 time 0.4660 (0.4706) data time 0.0011 (0.0021) model time 0.4650 (0.4686) loss 2.8307 (3.0203) grad_norm 1.8182 (1.6832) loss_scale 2048.0000 (1212.0816) mem 16715MB [2024-08-10 12:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][450/625] eta 0:01:22 lr 0.000823 wd 0.0500 time 0.6758 (0.4709) data time 0.0008 (0.0020) model time 0.6750 (0.4690) loss 3.6723 (3.0278) grad_norm 2.2809 (1.6862) loss_scale 2048.0000 (1230.6164) mem 16715MB [2024-08-10 12:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][460/625] eta 0:01:17 lr 0.000823 wd 0.0500 time 0.4678 (0.4712) data time 0.0007 (0.0020) model time 0.4671 (0.4693) loss 3.4316 (3.0258) grad_norm 1.4427 (1.6871) loss_scale 2048.0000 
(1248.3471) mem 16715MB [2024-08-10 12:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][470/625] eta 0:01:13 lr 0.000822 wd 0.0500 time 0.4624 (0.4710) data time 0.0009 (0.0020) model time 0.4615 (0.4691) loss 3.2084 (3.0239) grad_norm 1.0974 (1.6801) loss_scale 2048.0000 (1265.3248) mem 16715MB [2024-08-10 12:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][480/625] eta 0:01:08 lr 0.000822 wd 0.0500 time 0.4623 (0.4709) data time 0.0008 (0.0020) model time 0.4615 (0.4690) loss 3.2769 (3.0316) grad_norm 1.0629 (1.6752) loss_scale 2048.0000 (1281.5967) mem 16715MB [2024-08-10 12:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][490/625] eta 0:01:03 lr 0.000822 wd 0.0500 time 0.4662 (0.4708) data time 0.0008 (0.0020) model time 0.4654 (0.4689) loss 3.2121 (3.0319) grad_norm 1.4556 (1.6752) loss_scale 2048.0000 (1297.2057) mem 16715MB [2024-08-10 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][500/625] eta 0:00:58 lr 0.000822 wd 0.0500 time 0.4644 (0.4708) data time 0.0007 (0.0020) model time 0.4637 (0.4689) loss 3.2813 (3.0306) grad_norm 1.3806 (1.6706) loss_scale 2048.0000 (1312.1916) mem 16715MB [2024-08-10 12:03:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][510/625] eta 0:00:54 lr 0.000822 wd 0.0500 time 0.4618 (0.4708) data time 0.0010 (0.0020) model time 0.4607 (0.4689) loss 2.6785 (3.0227) grad_norm 1.4669 (1.6695) loss_scale 2048.0000 (1326.5910) mem 16715MB [2024-08-10 12:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][520/625] eta 0:00:49 lr 0.000822 wd 0.0500 time 0.4659 (0.4707) data time 0.0010 (0.0020) model time 0.4649 (0.4688) loss 3.3805 (3.0264) grad_norm 1.3892 (1.6654) loss_scale 2048.0000 (1340.4376) mem 16715MB [2024-08-10 12:03:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][530/625] eta 0:00:44 lr 0.000822 wd 0.0500 time 0.4657 (0.4708) data time 0.0008 (0.0020) model 
time 0.4649 (0.4689) loss 3.7050 (3.0236) grad_norm 2.0479 (1.6632) loss_scale 2048.0000 (1353.7627) mem 16715MB [2024-08-10 12:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][540/625] eta 0:00:40 lr 0.000822 wd 0.0500 time 0.4553 (0.4708) data time 0.0007 (0.0019) model time 0.4545 (0.4689) loss 3.2147 (3.0235) grad_norm 1.7671 (1.6631) loss_scale 2048.0000 (1366.5952) mem 16715MB [2024-08-10 12:03:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][550/625] eta 0:00:35 lr 0.000822 wd 0.0500 time 0.4723 (0.4707) data time 0.0009 (0.0019) model time 0.4713 (0.4688) loss 3.1621 (3.0258) grad_norm 1.8796 (1.6602) loss_scale 2048.0000 (1378.9619) mem 16715MB [2024-08-10 12:03:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][560/625] eta 0:00:30 lr 0.000822 wd 0.0500 time 0.4619 (0.4710) data time 0.0010 (0.0019) model time 0.4609 (0.4691) loss 3.4525 (3.0253) grad_norm 1.0298 (1.6587) loss_scale 2048.0000 (1390.8877) mem 16715MB [2024-08-10 12:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][570/625] eta 0:00:25 lr 0.000821 wd 0.0500 time 0.4644 (0.4708) data time 0.0010 (0.0019) model time 0.4634 (0.4690) loss 3.3104 (3.0243) grad_norm 1.3608 (1.6548) loss_scale 2048.0000 (1402.3958) mem 16715MB [2024-08-10 12:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][580/625] eta 0:00:21 lr 0.000821 wd 0.0500 time 0.4563 (0.4707) data time 0.0009 (0.0019) model time 0.4554 (0.4689) loss 2.5201 (3.0272) grad_norm 1.4448 (1.6505) loss_scale 2048.0000 (1413.5077) mem 16715MB [2024-08-10 12:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][590/625] eta 0:00:16 lr 0.000821 wd 0.0500 time 0.4631 (0.4709) data time 0.0011 (0.0019) model time 0.4620 (0.4690) loss 2.8756 (3.0297) grad_norm 2.0527 (1.6549) loss_scale 2048.0000 (1424.2437) mem 16715MB [2024-08-10 12:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][600/625] 
eta 0:00:11 lr 0.000821 wd 0.0500 time 0.4621 (0.4709) data time 0.0011 (0.0019) model time 0.4611 (0.4690) loss 3.0923 (3.0308) grad_norm 1.3395 (1.6565) loss_scale 2048.0000 (1434.6223) mem 16715MB [2024-08-10 12:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][610/625] eta 0:00:07 lr 0.000821 wd 0.0500 time 0.4586 (0.4708) data time 0.0008 (0.0019) model time 0.4578 (0.4689) loss 3.4355 (3.0342) grad_norm 3.4440 (1.6608) loss_scale 2048.0000 (1444.6612) mem 16715MB [2024-08-10 12:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [126/300][620/625] eta 0:00:02 lr 0.000821 wd 0.0500 time 0.4589 (0.4707) data time 0.0007 (0.0020) model time 0.4581 (0.4688) loss 3.3397 (3.0363) grad_norm 2.2800 (1.6655) loss_scale 2048.0000 (1454.3768) mem 16715MB [2024-08-10 12:04:28 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 126 training takes 0:04:54 [2024-08-10 12:04:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:04:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.5508 (0.5508) Acc@1 87.793 (87.793) Acc@5 97.998 (97.998) Mem 16715MB [2024-08-10 12:04:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.162) Loss 0.9336 (0.6805) Acc@1 77.979 (84.925) Acc@5 95.068 (97.221) Mem 16715MB [2024-08-10 12:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.141) Loss 0.9858 (0.8019) Acc@1 76.318 (81.875) Acc@5 94.238 (95.905) Mem 16715MB [2024-08-10 12:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.548 Acc@5 95.873 [2024-08-10 12:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-10 12:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.55% [2024-08-10 12:04:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 12:04:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 12:04:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.4824 (0.4824) Acc@1 89.062 (89.062) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 12:04:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.161) Loss 0.7808 (0.6090) Acc@1 80.908 (86.412) Acc@5 96.387 (97.794) Mem 16715MB [2024-08-10 12:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.8955 (0.7165) Acc@1 77.490 (83.466) Acc@5 95.117 (96.670) Mem 16715MB [2024-08-10 12:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.187 Acc@5 96.687 [2024-08-10 12:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 12:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][0/625] eta 0:13:50 lr 0.000821 wd 0.0500 time 1.3284 (1.3284) data time 0.8451 (0.8451) model time 0.0000 (0.0000) loss 3.1840 (3.1840) grad_norm 1.9560 (1.9560) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][10/625] eta 0:05:33 lr 0.000821 wd 0.0500 time 0.4631 (0.5415) data time 0.0010 (0.0778) model time 0.0000 (0.0000) loss 3.1221 (3.2226) grad_norm 1.6401 (2.0226) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:04:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][20/625] eta 0:05:07 lr 0.000821 wd 0.0500 time 0.4699 (0.5079) data time 0.0010 (0.0413) model time 0.0000 (0.0000) loss 3.3911 (3.2573) grad_norm 1.4627 (1.8753) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:04:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][30/625] eta 0:04:58 lr 0.000821 wd 0.0500 time 0.4719 (0.5017) data time 0.0008 (0.0284) model time 0.0000 (0.0000) loss 3.1074 (3.1871) grad_norm 1.6419 (1.7307) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:04:59 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [127/300][40/625] eta 0:04:51 lr 0.000821 wd 0.0500 time 0.4652 (0.4982) data time 0.0013 (0.0217) model time 0.0000 (0.0000) loss 2.7536 (3.1179) grad_norm 2.3732 (1.6697) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][50/625] eta 0:04:43 lr 0.000820 wd 0.0500 time 0.4701 (0.4922) data time 0.0008 (0.0177) model time 0.0000 (0.0000) loss 3.0663 (3.1221) grad_norm 1.3493 (1.6298) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][60/625] eta 0:04:35 lr 0.000820 wd 0.0500 time 0.4613 (0.4879) data time 0.0010 (0.0149) model time 0.4603 (0.4644) loss 3.1625 (3.1288) grad_norm 2.2920 (1.7060) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][70/625] eta 0:04:29 lr 0.000820 wd 0.0500 time 0.4622 (0.4847) data time 0.0010 (0.0130) model time 0.4612 (0.4645) loss 2.6631 (3.1503) grad_norm 1.6414 (1.7023) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][80/625] eta 0:04:22 lr 0.000820 wd 0.0500 time 0.4633 (0.4825) data time 0.0008 (0.0115) model time 0.4625 (0.4650) loss 3.5852 (3.1307) grad_norm 1.4179 (1.6922) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][90/625] eta 0:04:17 lr 0.000820 wd 0.0500 time 0.4675 (0.4806) data time 0.0007 (0.0104) model time 0.4667 (0.4648) loss 2.6641 (3.1240) grad_norm 1.4574 (1.6744) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][100/625] eta 0:04:11 lr 0.000820 wd 0.0500 time 0.4644 (0.4792) data time 0.0008 (0.0094) model time 0.4636 (0.4648) loss 3.4772 (3.0961) grad_norm 1.5773 (1.6726) 
loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][110/625] eta 0:04:06 lr 0.000820 wd 0.0500 time 0.4670 (0.4788) data time 0.0011 (0.0087) model time 0.4659 (0.4664) loss 3.0827 (3.1003) grad_norm 1.3948 (1.7052) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][120/625] eta 0:04:01 lr 0.000820 wd 0.0500 time 0.4634 (0.4780) data time 0.0011 (0.0081) model time 0.4624 (0.4666) loss 3.1931 (3.0920) grad_norm 2.0922 (1.7138) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][130/625] eta 0:03:57 lr 0.000820 wd 0.0500 time 0.4650 (0.4790) data time 0.0008 (0.0075) model time 0.4642 (0.4695) loss 3.2126 (3.1013) grad_norm 1.3549 (1.7248) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][140/625] eta 0:03:51 lr 0.000820 wd 0.0500 time 0.4634 (0.4781) data time 0.0010 (0.0071) model time 0.4624 (0.4691) loss 3.2033 (3.0795) grad_norm 1.4340 (1.7295) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][150/625] eta 0:03:46 lr 0.000819 wd 0.0500 time 0.4618 (0.4773) data time 0.0010 (0.0067) model time 0.4608 (0.4686) loss 2.0728 (3.0632) grad_norm 1.2766 (1.7303) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][160/625] eta 0:03:41 lr 0.000819 wd 0.0500 time 0.4621 (0.4765) data time 0.0008 (0.0063) model time 0.4613 (0.4682) loss 3.5895 (3.0623) grad_norm 2.4063 (1.7329) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][170/625] eta 0:03:36 lr 0.000819 wd 0.0500 time 0.4696 (0.4760) data time 
0.0008 (0.0060) model time 0.4688 (0.4680) loss 2.6673 (3.0587) grad_norm 2.1589 (1.7499) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][180/625] eta 0:03:31 lr 0.000819 wd 0.0500 time 0.4578 (0.4754) data time 0.0008 (0.0057) model time 0.4570 (0.4678) loss 2.5781 (3.0609) grad_norm 1.5113 (1.7664) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][190/625] eta 0:03:26 lr 0.000819 wd 0.0500 time 0.4666 (0.4749) data time 0.0008 (0.0056) model time 0.4658 (0.4675) loss 3.6704 (3.0585) grad_norm 2.1279 (1.7786) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][200/625] eta 0:03:21 lr 0.000819 wd 0.0500 time 0.4608 (0.4744) data time 0.0011 (0.0053) model time 0.4597 (0.4672) loss 3.0589 (3.0627) grad_norm 1.2936 (1.7728) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][210/625] eta 0:03:16 lr 0.000819 wd 0.0500 time 0.4692 (0.4739) data time 0.0010 (0.0051) model time 0.4682 (0.4669) loss 3.3064 (3.0714) grad_norm 1.9328 (1.7569) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][220/625] eta 0:03:12 lr 0.000819 wd 0.0500 time 0.4622 (0.4741) data time 0.0008 (0.0049) model time 0.4614 (0.4676) loss 3.5240 (3.0643) grad_norm 1.5691 (1.7371) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][230/625] eta 0:03:07 lr 0.000819 wd 0.0500 time 0.4674 (0.4739) data time 0.0007 (0.0048) model time 0.4667 (0.4676) loss 3.2956 (3.0667) grad_norm 2.0219 (1.7171) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [127/300][240/625] eta 0:03:02 lr 0.000819 wd 0.0500 time 0.4610 (0.4735) data time 0.0016 (0.0046) model time 0.4594 (0.4673) loss 3.2983 (3.0756) grad_norm 1.2066 (1.7067) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][250/625] eta 0:02:57 lr 0.000818 wd 0.0500 time 0.4631 (0.4732) data time 0.0007 (0.0045) model time 0.4624 (0.4672) loss 3.6496 (3.0848) grad_norm 1.1960 (1.7092) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][260/625] eta 0:02:52 lr 0.000818 wd 0.0500 time 0.4639 (0.4729) data time 0.0010 (0.0044) model time 0.4629 (0.4670) loss 2.7137 (3.0862) grad_norm 1.4449 (1.7208) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][270/625] eta 0:02:47 lr 0.000818 wd 0.0500 time 0.4627 (0.4726) data time 0.0008 (0.0042) model time 0.4619 (0.4669) loss 3.8646 (3.0843) grad_norm 1.8017 (1.7211) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][280/625] eta 0:02:42 lr 0.000818 wd 0.0500 time 0.4644 (0.4722) data time 0.0008 (0.0041) model time 0.4636 (0.4667) loss 3.6375 (3.0883) grad_norm 1.0145 (1.7117) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:06:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][290/625] eta 0:02:38 lr 0.000818 wd 0.0500 time 0.4638 (0.4720) data time 0.0008 (0.0040) model time 0.4630 (0.4665) loss 3.0489 (3.0936) grad_norm 1.2354 (1.7066) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][300/625] eta 0:02:33 lr 0.000818 wd 0.0500 time 0.4639 (0.4717) data time 0.0008 (0.0039) model time 0.4631 (0.4664) loss 3.0298 (3.0947) grad_norm 1.2165 (1.7006) loss_scale 2048.0000 
(2048.0000) mem 16715MB [2024-08-10 12:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][310/625] eta 0:02:28 lr 0.000818 wd 0.0500 time 0.4630 (0.4714) data time 0.0011 (0.0038) model time 0.4620 (0.4663) loss 3.3129 (3.1008) grad_norm 2.0070 (1.6963) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][320/625] eta 0:02:23 lr 0.000818 wd 0.0500 time 0.4686 (0.4713) data time 0.0008 (0.0037) model time 0.4678 (0.4662) loss 3.5266 (3.0968) grad_norm 1.6623 (1.6885) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][330/625] eta 0:02:19 lr 0.000818 wd 0.0500 time 0.4622 (0.4715) data time 0.0010 (0.0037) model time 0.4612 (0.4667) loss 2.9420 (3.1024) grad_norm 1.2813 (1.6847) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][340/625] eta 0:02:14 lr 0.000818 wd 0.0500 time 0.4653 (0.4713) data time 0.0008 (0.0036) model time 0.4645 (0.4665) loss 1.9819 (3.1030) grad_norm 1.8777 (1.6784) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][350/625] eta 0:02:09 lr 0.000817 wd 0.0500 time 0.4668 (0.4715) data time 0.0008 (0.0035) model time 0.4660 (0.4669) loss 2.9166 (3.1005) grad_norm 1.0791 (1.6747) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][360/625] eta 0:02:04 lr 0.000817 wd 0.0500 time 0.4581 (0.4712) data time 0.0010 (0.0034) model time 0.4571 (0.4667) loss 3.3863 (3.0981) grad_norm 1.3916 (1.6714) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][370/625] eta 0:02:00 lr 0.000817 wd 0.0500 time 0.4635 (0.4710) data time 0.0013 (0.0034) model 
time 0.4622 (0.4665) loss 3.1948 (3.0951) grad_norm 1.5368 (1.6831) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][380/625] eta 0:01:55 lr 0.000817 wd 0.0500 time 0.4622 (0.4708) data time 0.0011 (0.0033) model time 0.4612 (0.4663) loss 2.9583 (3.0899) grad_norm 2.3420 (1.6817) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][390/625] eta 0:01:50 lr 0.000817 wd 0.0500 time 0.4613 (0.4706) data time 0.0008 (0.0033) model time 0.4606 (0.4662) loss 3.5229 (3.0870) grad_norm 1.2349 (1.6738) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][400/625] eta 0:01:45 lr 0.000817 wd 0.0500 time 0.4658 (0.4705) data time 0.0010 (0.0032) model time 0.4648 (0.4662) loss 3.6374 (3.0810) grad_norm 1.2801 (1.6649) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][410/625] eta 0:01:41 lr 0.000817 wd 0.0500 time 0.4669 (0.4703) data time 0.0009 (0.0032) model time 0.4660 (0.4661) loss 3.6273 (3.0806) grad_norm 1.2820 (1.6573) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:07:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][420/625] eta 0:01:36 lr 0.000817 wd 0.0500 time 0.4609 (0.4702) data time 0.0010 (0.0031) model time 0.4600 (0.4660) loss 2.6180 (3.0792) grad_norm 1.6260 (1.6747) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][430/625] eta 0:01:31 lr 0.000817 wd 0.0500 time 0.4579 (0.4699) data time 0.0008 (0.0031) model time 0.4572 (0.4658) loss 2.2303 (3.0805) grad_norm 1.2819 (inf) loss_scale 1024.0000 (2033.7448) mem 16715MB [2024-08-10 12:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][440/625] eta 
0:01:26 lr 0.000817 wd 0.0500 time 0.4639 (0.4702) data time 0.0007 (0.0030) model time 0.4632 (0.4662) loss 3.3775 (3.0816) grad_norm 1.3555 (inf) loss_scale 1024.0000 (2010.8481) mem 16715MB [2024-08-10 12:08:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][450/625] eta 0:01:22 lr 0.000816 wd 0.0500 time 0.4621 (0.4704) data time 0.0008 (0.0030) model time 0.4614 (0.4665) loss 3.5501 (3.0835) grad_norm 1.7957 (inf) loss_scale 1024.0000 (1988.9667) mem 16715MB [2024-08-10 12:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][460/625] eta 0:01:17 lr 0.000816 wd 0.0500 time 0.4652 (0.4702) data time 0.0010 (0.0029) model time 0.4642 (0.4664) loss 2.4349 (3.0851) grad_norm 1.7220 (inf) loss_scale 1024.0000 (1968.0347) mem 16715MB [2024-08-10 12:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][470/625] eta 0:01:12 lr 0.000816 wd 0.0500 time 0.6942 (0.4706) data time 0.0010 (0.0029) model time 0.6932 (0.4669) loss 2.8321 (3.0851) grad_norm 1.2277 (inf) loss_scale 1024.0000 (1947.9915) mem 16715MB [2024-08-10 12:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][480/625] eta 0:01:08 lr 0.000816 wd 0.0500 time 0.4780 (0.4705) data time 0.0008 (0.0029) model time 0.4772 (0.4668) loss 2.4523 (3.0879) grad_norm 1.6715 (inf) loss_scale 1024.0000 (1928.7817) mem 16715MB [2024-08-10 12:08:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][490/625] eta 0:01:03 lr 0.000816 wd 0.0500 time 0.4637 (0.4704) data time 0.0008 (0.0028) model time 0.4629 (0.4667) loss 1.9986 (3.0812) grad_norm 1.4404 (inf) loss_scale 1024.0000 (1910.3544) mem 16715MB [2024-08-10 12:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][500/625] eta 0:00:58 lr 0.000816 wd 0.0500 time 0.4727 (0.4708) data time 0.0010 (0.0028) model time 0.4718 (0.4673) loss 3.0598 (3.0766) grad_norm 1.7960 (inf) loss_scale 1024.0000 (1892.6627) mem 16715MB [2024-08-10 12:08:39 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][510/625] eta 0:00:54 lr 0.000816 wd 0.0500 time 0.4691 (0.4706) data time 0.0010 (0.0027) model time 0.4681 (0.4672) loss 3.1043 (3.0776) grad_norm 1.2323 (inf) loss_scale 1024.0000 (1875.6634) mem 16715MB [2024-08-10 12:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][520/625] eta 0:00:49 lr 0.000816 wd 0.0500 time 0.4630 (0.4705) data time 0.0010 (0.0027) model time 0.4620 (0.4670) loss 3.5965 (3.0719) grad_norm 1.2306 (inf) loss_scale 512.0000 (1849.4894) mem 16715MB [2024-08-10 12:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][530/625] eta 0:00:44 lr 0.000816 wd 0.0500 time 0.4646 (0.4703) data time 0.0009 (0.0027) model time 0.4637 (0.4669) loss 3.6974 (3.0778) grad_norm 1.3377 (inf) loss_scale 512.0000 (1824.3013) mem 16715MB [2024-08-10 12:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][540/625] eta 0:00:39 lr 0.000816 wd 0.0500 time 0.4701 (0.4702) data time 0.0010 (0.0027) model time 0.4690 (0.4668) loss 3.2839 (3.0820) grad_norm 1.4811 (inf) loss_scale 512.0000 (1800.0444) mem 16715MB [2024-08-10 12:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][550/625] eta 0:00:35 lr 0.000815 wd 0.0500 time 0.4715 (0.4702) data time 0.0008 (0.0026) model time 0.4707 (0.4668) loss 3.6015 (3.0779) grad_norm 1.6758 (inf) loss_scale 512.0000 (1776.6679) mem 16715MB [2024-08-10 12:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][560/625] eta 0:00:30 lr 0.000815 wd 0.0500 time 0.4605 (0.4701) data time 0.0008 (0.0026) model time 0.4597 (0.4668) loss 3.0934 (3.0790) grad_norm 1.4970 (inf) loss_scale 512.0000 (1754.1248) mem 16715MB [2024-08-10 12:09:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][570/625] eta 0:00:25 lr 0.000815 wd 0.0500 time 0.4642 (0.4700) data time 0.0010 (0.0026) model time 0.4632 (0.4667) loss 2.6999 (3.0771) grad_norm 2.1997 (inf) 
loss_scale 512.0000 (1732.3713) mem 16715MB [2024-08-10 12:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][580/625] eta 0:00:21 lr 0.000815 wd 0.0500 time 0.4650 (0.4699) data time 0.0008 (0.0025) model time 0.4642 (0.4667) loss 3.0115 (3.0755) grad_norm 1.5000 (inf) loss_scale 512.0000 (1711.3666) mem 16715MB [2024-08-10 12:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][590/625] eta 0:00:16 lr 0.000815 wd 0.0500 time 0.4687 (0.4704) data time 0.0010 (0.0025) model time 0.4676 (0.4672) loss 3.3527 (3.0807) grad_norm 1.6588 (inf) loss_scale 512.0000 (1691.0728) mem 16715MB [2024-08-10 12:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][600/625] eta 0:00:11 lr 0.000815 wd 0.0500 time 0.4666 (0.4703) data time 0.0012 (0.0025) model time 0.4655 (0.4672) loss 3.2870 (3.0761) grad_norm 1.8072 (inf) loss_scale 512.0000 (1671.4542) mem 16715MB [2024-08-10 12:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][610/625] eta 0:00:07 lr 0.000815 wd 0.0500 time 0.4625 (0.4702) data time 0.0008 (0.0025) model time 0.4618 (0.4671) loss 2.7370 (3.0735) grad_norm 1.5661 (inf) loss_scale 512.0000 (1652.4779) mem 16715MB [2024-08-10 12:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [127/300][620/625] eta 0:00:02 lr 0.000815 wd 0.0500 time 0.4620 (0.4701) data time 0.0005 (0.0025) model time 0.4614 (0.4670) loss 3.2923 (3.0738) grad_norm 1.8924 (inf) loss_scale 512.0000 (1634.1127) mem 16715MB [2024-08-10 12:09:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 127 training takes 0:04:53 [2024-08-10 12:09:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:09:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.5405 (0.5405) Acc@1 88.135 (88.135) Acc@5 98.291 (98.291) Mem 16715MB [2024-08-10 12:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.9009 (0.6724) Acc@1 78.467 (84.996) Acc@5 95.264 (97.283) Mem 16715MB [2024-08-10 12:09:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0176 (0.8011) Acc@1 75.488 (81.682) Acc@5 93.701 (95.924) Mem 16715MB [2024-08-10 12:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.448 Acc@5 95.899 [2024-08-10 12:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-10 12:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.806 (0.806) Loss 0.4812 (0.4812) Acc@1 88.965 (88.965) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 12:09:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.192) Loss 0.7798 (0.6085) Acc@1 81.104 (86.412) Acc@5 96.289 (97.789) Mem 16715MB [2024-08-10 12:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 0.8945 (0.7157) Acc@1 77.783 (83.522) Acc@5 95.068 (96.659) Mem 16715MB [2024-08-10 12:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.233 Acc@5 96.675 [2024-08-10 12:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 12:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.23% [2024-08-10 12:09:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:09:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 12:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][0/625] eta 0:08:25 lr 0.000815 wd 0.0500 time 0.8089 (0.8089) data time 0.3983 (0.3983) model time 0.0000 (0.0000) loss 3.3876 (3.3876) grad_norm 1.8256 (1.8256) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][10/625] eta 0:05:05 lr 0.000815 wd 0.0500 time 0.4627 (0.4961) data time 0.0012 (0.0372) model time 0.0000 (0.0000) loss 3.1434 (3.2619) grad_norm 1.6934 (1.4937) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][20/625] eta 0:04:50 lr 0.000815 wd 0.0500 time 0.4633 (0.4808) data time 0.0008 (0.0200) model time 0.0000 (0.0000) loss 2.9909 (3.0822) grad_norm 1.7926 (1.5615) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][30/625] eta 0:04:42 lr 0.000814 wd 0.0500 time 0.4637 (0.4751) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 3.1611 (3.0149) grad_norm 1.5915 (1.5811) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][40/625] eta 0:04:36 lr 0.000814 wd 0.0500 time 0.4757 (0.4722) data time 0.0010 (0.0108) model time 0.0000 (0.0000) loss 3.0444 (3.0216) grad_norm 1.1089 (1.5763) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][50/625] eta 0:04:32 lr 0.000814 wd 0.0500 time 0.4639 (0.4739) data time 0.0010 (0.0089) model time 0.0000 (0.0000) loss 3.4063 (3.0588) grad_norm 1.9346 (1.6940) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][60/625] eta 0:04:26 lr 0.000814 wd 0.0500 time 0.4669 (0.4725) data time 0.0010 (0.0076) model time 0.4659 (0.4643) loss 3.4851 (3.0700) 
grad_norm 1.5316 (1.7470) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][70/625] eta 0:04:21 lr 0.000814 wd 0.0500 time 0.4643 (0.4713) data time 0.0008 (0.0067) model time 0.4635 (0.4638) loss 4.0269 (3.0796) grad_norm 2.2185 (1.7367) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][80/625] eta 0:04:16 lr 0.000814 wd 0.0500 time 0.4633 (0.4702) data time 0.0010 (0.0060) model time 0.4623 (0.4630) loss 3.3814 (3.0798) grad_norm 1.6276 (1.6952) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][90/625] eta 0:04:12 lr 0.000814 wd 0.0500 time 0.4552 (0.4717) data time 0.0011 (0.0054) model time 0.4542 (0.4679) loss 2.9714 (3.0936) grad_norm 1.1708 (1.6620) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][100/625] eta 0:04:07 lr 0.000814 wd 0.0500 time 0.4667 (0.4708) data time 0.0013 (0.0050) model time 0.4653 (0.4667) loss 3.5027 (3.1120) grad_norm 1.4871 (1.6540) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][110/625] eta 0:04:02 lr 0.000814 wd 0.0500 time 0.4594 (0.4701) data time 0.0009 (0.0046) model time 0.4586 (0.4658) loss 2.0104 (3.0797) grad_norm 1.8095 (1.6402) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][120/625] eta 0:03:57 lr 0.000814 wd 0.0500 time 0.4645 (0.4696) data time 0.0010 (0.0044) model time 0.4635 (0.4654) loss 3.2556 (3.0549) grad_norm 1.3645 (1.6409) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][130/625] eta 0:03:53 lr 0.000813 wd 0.0500 time 0.4609 (0.4718) data 
time 0.0008 (0.0041) model time 0.4601 (0.4694) loss 3.6979 (3.0344) grad_norm 1.3718 (1.6462) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][140/625] eta 0:03:48 lr 0.000813 wd 0.0500 time 0.4705 (0.4715) data time 0.0011 (0.0039) model time 0.4694 (0.4691) loss 3.2853 (3.0476) grad_norm 0.9739 (1.6565) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][150/625] eta 0:03:43 lr 0.000813 wd 0.0500 time 0.4659 (0.4711) data time 0.0008 (0.0037) model time 0.4651 (0.4686) loss 3.4046 (3.0502) grad_norm 1.4473 (1.6638) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][160/625] eta 0:03:38 lr 0.000813 wd 0.0500 time 0.4656 (0.4705) data time 0.0008 (0.0036) model time 0.4648 (0.4679) loss 2.3281 (3.0510) grad_norm 1.1578 (1.6464) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][170/625] eta 0:03:33 lr 0.000813 wd 0.0500 time 0.4626 (0.4701) data time 0.0009 (0.0034) model time 0.4617 (0.4675) loss 3.4488 (3.0575) grad_norm 1.3612 (1.6280) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][180/625] eta 0:03:29 lr 0.000813 wd 0.0500 time 0.4628 (0.4697) data time 0.0008 (0.0033) model time 0.4620 (0.4670) loss 3.1199 (3.0412) grad_norm 1.0235 (1.6195) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][190/625] eta 0:03:24 lr 0.000813 wd 0.0500 time 0.4630 (0.4694) data time 0.0008 (0.0032) model time 0.4622 (0.4666) loss 3.4558 (3.0426) grad_norm 1.5447 (1.6177) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[128/300][200/625] eta 0:03:19 lr 0.000813 wd 0.0500 time 0.4635 (0.4691) data time 0.0008 (0.0030) model time 0.4628 (0.4665) loss 2.7957 (3.0398) grad_norm 1.2862 (1.6162) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][210/625] eta 0:03:14 lr 0.000813 wd 0.0500 time 0.4624 (0.4690) data time 0.0010 (0.0030) model time 0.4614 (0.4664) loss 2.9580 (3.0428) grad_norm 1.8573 (1.6088) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][220/625] eta 0:03:09 lr 0.000813 wd 0.0500 time 0.4688 (0.4689) data time 0.0008 (0.0029) model time 0.4680 (0.4663) loss 2.4343 (3.0463) grad_norm 1.4753 (1.6119) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][230/625] eta 0:03:05 lr 0.000812 wd 0.0500 time 0.4654 (0.4687) data time 0.0008 (0.0028) model time 0.4646 (0.4661) loss 3.0574 (3.0453) grad_norm 1.2659 (1.6094) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][240/625] eta 0:03:00 lr 0.000812 wd 0.0500 time 0.4670 (0.4686) data time 0.0007 (0.0027) model time 0.4663 (0.4661) loss 3.3063 (3.0491) grad_norm 1.9501 (1.6276) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][250/625] eta 0:02:55 lr 0.000812 wd 0.0500 time 0.4587 (0.4684) data time 0.0010 (0.0027) model time 0.4578 (0.4659) loss 3.2718 (3.0512) grad_norm 1.1456 (1.6248) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][260/625] eta 0:02:50 lr 0.000812 wd 0.0500 time 0.4609 (0.4683) data time 0.0008 (0.0026) model time 0.4600 (0.4658) loss 3.7347 (3.0531) grad_norm 1.5788 (1.6469) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][270/625] eta 0:02:46 lr 0.000812 wd 0.0500 time 0.4760 (0.4682) data time 0.0010 (0.0025) model time 0.4750 (0.4659) loss 3.4154 (3.0554) grad_norm 1.1062 (1.6397) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][280/625] eta 0:02:41 lr 0.000812 wd 0.0500 time 0.4655 (0.4681) data time 0.0010 (0.0025) model time 0.4645 (0.4658) loss 2.7480 (3.0602) grad_norm 1.8095 (1.6369) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][290/625] eta 0:02:36 lr 0.000812 wd 0.0500 time 0.4637 (0.4680) data time 0.0010 (0.0024) model time 0.4627 (0.4657) loss 3.3993 (3.0660) grad_norm 1.4677 (1.6353) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][300/625] eta 0:02:32 lr 0.000812 wd 0.0500 time 0.4615 (0.4678) data time 0.0009 (0.0024) model time 0.4606 (0.4655) loss 2.2573 (3.0718) grad_norm 1.5715 (1.6357) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][310/625] eta 0:02:27 lr 0.000812 wd 0.0500 time 0.4656 (0.4676) data time 0.0008 (0.0023) model time 0.4648 (0.4653) loss 3.1707 (3.0648) grad_norm 1.2183 (1.6356) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][320/625] eta 0:02:22 lr 0.000812 wd 0.0500 time 0.4600 (0.4676) data time 0.0008 (0.0023) model time 0.4592 (0.4653) loss 3.9497 (3.0665) grad_norm 2.3694 (1.6317) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][330/625] eta 0:02:17 lr 0.000811 wd 0.0500 time 0.4790 (0.4675) data time 0.0008 (0.0023) model time 0.4782 (0.4652) loss 3.4250 
(3.0644) grad_norm 1.8497 (1.6370) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][340/625] eta 0:02:13 lr 0.000811 wd 0.0500 time 0.4734 (0.4675) data time 0.0009 (0.0023) model time 0.4725 (0.4653) loss 3.4323 (3.0705) grad_norm 1.4817 (1.6360) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][350/625] eta 0:02:08 lr 0.000811 wd 0.0500 time 0.4693 (0.4675) data time 0.0009 (0.0022) model time 0.4684 (0.4653) loss 3.6017 (3.0683) grad_norm 1.1447 (1.6341) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][360/625] eta 0:02:03 lr 0.000811 wd 0.0500 time 0.4638 (0.4676) data time 0.0010 (0.0022) model time 0.4629 (0.4654) loss 2.3347 (3.0714) grad_norm 0.9881 (1.6366) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][370/625] eta 0:01:59 lr 0.000811 wd 0.0500 time 0.4677 (0.4675) data time 0.0009 (0.0022) model time 0.4669 (0.4654) loss 3.1304 (3.0796) grad_norm 1.3250 (1.6343) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][380/625] eta 0:01:54 lr 0.000811 wd 0.0500 time 0.4658 (0.4679) data time 0.0009 (0.0021) model time 0.4649 (0.4659) loss 3.0236 (3.0764) grad_norm 1.6673 (1.6337) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][390/625] eta 0:01:49 lr 0.000811 wd 0.0500 time 0.4654 (0.4679) data time 0.0010 (0.0021) model time 0.4643 (0.4659) loss 3.3122 (3.0763) grad_norm 1.7978 (1.6313) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][400/625] eta 0:01:45 lr 0.000811 wd 0.0500 time 0.4616 
(0.4677) data time 0.0008 (0.0021) model time 0.4608 (0.4657) loss 2.7781 (3.0804) grad_norm 1.4930 (1.6287) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:12:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][410/625] eta 0:01:40 lr 0.000811 wd 0.0500 time 0.4725 (0.4677) data time 0.0008 (0.0021) model time 0.4716 (0.4657) loss 2.6264 (3.0786) grad_norm 1.5462 (1.6248) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][420/625] eta 0:01:35 lr 0.000811 wd 0.0500 time 0.4659 (0.4677) data time 0.0010 (0.0021) model time 0.4649 (0.4657) loss 2.3657 (3.0800) grad_norm 1.2983 (1.6260) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][430/625] eta 0:01:31 lr 0.000810 wd 0.0500 time 0.4672 (0.4684) data time 0.0007 (0.0021) model time 0.4665 (0.4664) loss 3.4667 (3.0736) grad_norm 0.9968 (1.6264) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][440/625] eta 0:01:26 lr 0.000810 wd 0.0500 time 0.4648 (0.4684) data time 0.0008 (0.0021) model time 0.4640 (0.4665) loss 2.8732 (3.0746) grad_norm 2.1203 (1.6299) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][450/625] eta 0:01:21 lr 0.000810 wd 0.0500 time 0.4617 (0.4683) data time 0.0009 (0.0021) model time 0.4608 (0.4664) loss 2.1134 (3.0740) grad_norm 1.7563 (1.6518) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][460/625] eta 0:01:17 lr 0.000810 wd 0.0500 time 0.4625 (0.4683) data time 0.0010 (0.0021) model time 0.4614 (0.4664) loss 2.9649 (3.0731) grad_norm 1.1557 (1.6465) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [128/300][470/625] eta 0:01:12 lr 0.000810 wd 0.0500 time 0.4642 (0.4690) data time 0.0007 (0.0020) model time 0.4634 (0.4672) loss 2.4581 (3.0685) grad_norm 1.1760 (1.6418) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][480/625] eta 0:01:07 lr 0.000810 wd 0.0500 time 0.4686 (0.4689) data time 0.0010 (0.0020) model time 0.4676 (0.4671) loss 3.2424 (3.0723) grad_norm 1.8177 (1.6433) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][490/625] eta 0:01:03 lr 0.000810 wd 0.0500 time 0.4640 (0.4689) data time 0.0010 (0.0020) model time 0.4630 (0.4671) loss 3.1844 (3.0707) grad_norm 1.1515 (1.6382) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][500/625] eta 0:00:58 lr 0.000810 wd 0.0500 time 0.4627 (0.4688) data time 0.0010 (0.0020) model time 0.4617 (0.4671) loss 3.1937 (3.0704) grad_norm 1.2146 (1.6306) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][510/625] eta 0:00:53 lr 0.000810 wd 0.0500 time 0.4635 (0.4688) data time 0.0010 (0.0020) model time 0.4625 (0.4670) loss 3.3504 (3.0718) grad_norm 1.1831 (1.6283) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][520/625] eta 0:00:49 lr 0.000810 wd 0.0500 time 0.4621 (0.4688) data time 0.0011 (0.0020) model time 0.4610 (0.4670) loss 3.0837 (3.0710) grad_norm 1.4481 (1.6256) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][530/625] eta 0:00:44 lr 0.000809 wd 0.0500 time 0.4658 (0.4687) data time 0.0008 (0.0020) model time 0.4650 (0.4669) loss 3.1197 (3.0715) grad_norm 1.7773 (1.6234) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:13:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][540/625] eta 0:00:39 lr 0.000809 wd 0.0500 time 0.4641 (0.4686) data time 0.0008 (0.0019) model time 0.4633 (0.4669) loss 2.7466 (3.0681) grad_norm 1.5085 (1.6215) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][550/625] eta 0:00:35 lr 0.000809 wd 0.0500 time 0.4614 (0.4685) data time 0.0010 (0.0019) model time 0.4604 (0.4668) loss 3.4026 (3.0653) grad_norm 2.3511 (1.6228) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][560/625] eta 0:00:30 lr 0.000809 wd 0.0500 time 0.4662 (0.4684) data time 0.0012 (0.0019) model time 0.4650 (0.4667) loss 3.3422 (3.0618) grad_norm 1.7169 (1.6268) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][570/625] eta 0:00:25 lr 0.000809 wd 0.0500 time 0.4698 (0.4686) data time 0.0008 (0.0019) model time 0.4690 (0.4668) loss 3.5893 (3.0629) grad_norm 1.9376 (1.6416) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][580/625] eta 0:00:21 lr 0.000809 wd 0.0500 time 0.4658 (0.4686) data time 0.0012 (0.0019) model time 0.4646 (0.4668) loss 3.2125 (3.0637) grad_norm 1.8929 (1.6695) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][590/625] eta 0:00:16 lr 0.000809 wd 0.0500 time 0.4679 (0.4685) data time 0.0008 (0.0019) model time 0.4671 (0.4668) loss 3.1620 (3.0631) grad_norm 0.9895 (1.6700) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][600/625] eta 0:00:11 lr 0.000809 wd 0.0500 time 0.4655 (0.4685) data time 0.0009 (0.0019) model time 0.4646 (0.4668) loss 2.3147 
(3.0601) grad_norm 1.7569 (1.6721) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][610/625] eta 0:00:07 lr 0.000809 wd 0.0500 time 0.4628 (0.4684) data time 0.0008 (0.0019) model time 0.4621 (0.4667) loss 3.0004 (3.0585) grad_norm 2.6425 (1.6703) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [128/300][620/625] eta 0:00:02 lr 0.000809 wd 0.0500 time 0.4571 (0.4686) data time 0.0006 (0.0018) model time 0.4566 (0.4669) loss 3.1320 (3.0575) grad_norm 1.6023 (1.6699) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:36 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 128 training takes 0:04:52 [2024-08-10 12:14:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:14:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5464 (0.5464) Acc@1 88.770 (88.770) Acc@5 98.096 (98.096) Mem 16715MB [2024-08-10 12:14:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.8862 (0.6885) Acc@1 79.004 (84.912) Acc@5 95.752 (97.341) Mem 16715MB [2024-08-10 12:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0664 (0.8082) Acc@1 74.316 (81.664) Acc@5 93.848 (96.010) Mem 16715MB [2024-08-10 12:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.408 Acc@5 95.987 [2024-08-10 12:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.4% [2024-08-10 12:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.796 (0.796) Loss 0.4805 (0.4805) Acc@1 89.062 (89.062) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 12:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.190) Loss 0.7798 (0.6082) Acc@1 81.104 (86.497) Acc@5 96.240 (97.789) Mem 16715MB [2024-08-10 12:14:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.156) Loss 0.8940 (0.7151) Acc@1 77.734 (83.584) Acc@5 95.068 (96.677) Mem 16715MB [2024-08-10 12:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.295 Acc@5 96.697 [2024-08-10 12:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 12:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.30% [2024-08-10 12:14:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:14:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 12:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][0/625] eta 0:08:21 lr 0.000808 wd 0.0500 time 0.8019 (0.8019) data time 0.3858 (0.3858) model time 0.0000 (0.0000) loss 3.0604 (3.0604) grad_norm 1.2365 (1.2365) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][10/625] eta 0:05:05 lr 0.000808 wd 0.0500 time 0.4755 (0.4974) data time 0.0008 (0.0361) model time 0.0000 (0.0000) loss 2.8746 (3.0000) grad_norm 1.5912 (1.4898) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:14:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][20/625] eta 0:04:51 lr 0.000808 wd 0.0500 time 0.4599 (0.4817) data time 0.0010 (0.0196) model time 0.0000 (0.0000) loss 3.5001 (3.0627) grad_norm 1.7715 (1.6334) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][30/625] eta 0:04:43 lr 0.000808 wd 0.0500 time 0.4581 (0.4766) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 3.8008 (3.1289) grad_norm 1.2478 (1.6430) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][40/625] eta 0:04:37 lr 0.000808 wd 0.0500 time 0.4718 (0.4737) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 3.2941 (3.0809) grad_norm 2.2440 (1.7828) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][50/625] eta 0:04:33 lr 0.000808 wd 0.0500 time 0.4730 (0.4762) data time 0.0010 (0.0087) model time 0.0000 (0.0000) loss 3.3601 (3.1127) grad_norm 1.8810 (2.1347) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][60/625] eta 0:04:29 lr 0.000808 wd 0.0500 time 0.4626 (0.4763) data time 0.0008 (0.0075) model time 0.4618 (0.4759) loss 3.2143 (3.1237) 
grad_norm 1.3026 (2.0373) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][70/625] eta 0:04:23 lr 0.000808 wd 0.0500 time 0.4665 (0.4747) data time 0.0008 (0.0066) model time 0.4657 (0.4699) loss 3.3051 (3.1607) grad_norm 1.6161 (1.9819) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][80/625] eta 0:04:18 lr 0.000808 wd 0.0500 time 0.4792 (0.4742) data time 0.0012 (0.0059) model time 0.4780 (0.4699) loss 3.3715 (3.1428) grad_norm 1.4453 (1.9193) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][90/625] eta 0:04:13 lr 0.000808 wd 0.0500 time 0.4660 (0.4741) data time 0.0011 (0.0054) model time 0.4649 (0.4704) loss 2.0257 (3.1403) grad_norm 1.2057 (1.9922) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][100/625] eta 0:04:08 lr 0.000807 wd 0.0500 time 0.4595 (0.4736) data time 0.0007 (0.0049) model time 0.4588 (0.4700) loss 3.8602 (3.1614) grad_norm 1.4481 (1.9589) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][110/625] eta 0:04:03 lr 0.000807 wd 0.0500 time 0.4588 (0.4726) data time 0.0010 (0.0046) model time 0.4577 (0.4686) loss 3.2417 (3.1493) grad_norm 1.6806 (1.9635) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][120/625] eta 0:03:58 lr 0.000807 wd 0.0500 time 0.4590 (0.4716) data time 0.0010 (0.0043) model time 0.4581 (0.4672) loss 3.6046 (3.1562) grad_norm 1.2287 (1.9153) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][130/625] eta 0:03:53 lr 0.000807 wd 0.0500 time 0.4609 (0.4708) data 
time 0.0011 (0.0041) model time 0.4599 (0.4663) loss 3.2286 (3.1520) grad_norm 1.6817 (1.8805) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][140/625] eta 0:03:48 lr 0.000807 wd 0.0500 time 0.4648 (0.4718) data time 0.0008 (0.0039) model time 0.4640 (0.4682) loss 3.5547 (3.1541) grad_norm 1.3815 (1.8472) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][150/625] eta 0:03:44 lr 0.000807 wd 0.0500 time 0.4645 (0.4721) data time 0.0008 (0.0037) model time 0.4638 (0.4690) loss 3.5576 (3.1413) grad_norm 1.3800 (1.8212) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][160/625] eta 0:03:39 lr 0.000807 wd 0.0500 time 0.4621 (0.4716) data time 0.0007 (0.0035) model time 0.4613 (0.4684) loss 2.8361 (3.1404) grad_norm 1.6248 (1.8087) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][170/625] eta 0:03:34 lr 0.000807 wd 0.0500 time 0.4652 (0.4713) data time 0.0011 (0.0035) model time 0.4641 (0.4679) loss 3.2914 (3.1356) grad_norm 1.2649 (1.7864) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][180/625] eta 0:03:29 lr 0.000807 wd 0.0500 time 0.4734 (0.4708) data time 0.0009 (0.0034) model time 0.4725 (0.4674) loss 2.3885 (3.1261) grad_norm 1.3011 (1.7683) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][190/625] eta 0:03:24 lr 0.000807 wd 0.0500 time 0.4631 (0.4703) data time 0.0008 (0.0033) model time 0.4623 (0.4669) loss 3.1787 (3.1288) grad_norm 1.0153 (1.7466) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[129/300][200/625] eta 0:03:19 lr 0.000806 wd 0.0500 time 0.4643 (0.4699) data time 0.0008 (0.0032) model time 0.4635 (0.4664) loss 3.6591 (3.1246) grad_norm 1.6229 (1.7424) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][210/625] eta 0:03:14 lr 0.000806 wd 0.0500 time 0.4610 (0.4695) data time 0.0010 (0.0031) model time 0.4600 (0.4661) loss 3.4531 (3.1179) grad_norm 2.4103 (1.7545) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][220/625] eta 0:03:10 lr 0.000806 wd 0.0500 time 0.4707 (0.4693) data time 0.0008 (0.0030) model time 0.4699 (0.4660) loss 2.4813 (3.1211) grad_norm 1.8737 (1.7695) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][230/625] eta 0:03:05 lr 0.000806 wd 0.0500 time 0.4637 (0.4691) data time 0.0008 (0.0029) model time 0.4629 (0.4658) loss 2.9911 (3.1144) grad_norm 3.0602 (1.7607) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][240/625] eta 0:03:00 lr 0.000806 wd 0.0500 time 0.4609 (0.4696) data time 0.0009 (0.0028) model time 0.4600 (0.4666) loss 2.2639 (3.1054) grad_norm 1.4500 (1.7543) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][250/625] eta 0:02:56 lr 0.000806 wd 0.0500 time 0.4617 (0.4696) data time 0.0010 (0.0028) model time 0.4607 (0.4666) loss 3.3004 (3.1142) grad_norm 1.6449 (1.7473) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][260/625] eta 0:02:51 lr 0.000806 wd 0.0500 time 0.4611 (0.4693) data time 0.0009 (0.0027) model time 0.4603 (0.4664) loss 3.7464 (3.1187) grad_norm 1.8431 (1.7391) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:16:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][270/625] eta 0:02:46 lr 0.000806 wd 0.0500 time 0.4595 (0.4690) data time 0.0008 (0.0026) model time 0.4587 (0.4661) loss 2.1879 (3.1212) grad_norm 1.2513 (1.7325) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][280/625] eta 0:02:41 lr 0.000806 wd 0.0500 time 0.4581 (0.4687) data time 0.0008 (0.0026) model time 0.4573 (0.4658) loss 2.9314 (3.1210) grad_norm 2.3290 (1.7301) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][290/625] eta 0:02:36 lr 0.000806 wd 0.0500 time 0.4618 (0.4685) data time 0.0008 (0.0025) model time 0.4610 (0.4656) loss 3.4155 (3.1266) grad_norm 1.4139 (1.7221) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][300/625] eta 0:02:32 lr 0.000805 wd 0.0500 time 0.4870 (0.4686) data time 0.0008 (0.0025) model time 0.4862 (0.4658) loss 3.2821 (3.1227) grad_norm 1.4038 (1.7155) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][310/625] eta 0:02:27 lr 0.000805 wd 0.0500 time 0.4636 (0.4690) data time 0.0010 (0.0024) model time 0.4626 (0.4664) loss 3.0339 (3.1182) grad_norm 0.9979 (1.7054) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][320/625] eta 0:02:23 lr 0.000805 wd 0.0500 time 0.4604 (0.4692) data time 0.0010 (0.0025) model time 0.4594 (0.4666) loss 3.5891 (3.1102) grad_norm 1.4968 (1.7029) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][330/625] eta 0:02:18 lr 0.000805 wd 0.0500 time 0.4652 (0.4692) data time 0.0010 (0.0024) model time 0.4642 (0.4667) loss 3.1577 
(3.1111) grad_norm 2.7576 (1.7148) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][340/625] eta 0:02:13 lr 0.000805 wd 0.0500 time 0.4631 (0.4690) data time 0.0008 (0.0024) model time 0.4623 (0.4665) loss 3.4656 (3.1126) grad_norm 2.2229 (1.7239) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][350/625] eta 0:02:08 lr 0.000805 wd 0.0500 time 0.4636 (0.4689) data time 0.0010 (0.0024) model time 0.4626 (0.4664) loss 3.3218 (3.1124) grad_norm 1.3320 (1.7293) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][360/625] eta 0:02:04 lr 0.000805 wd 0.0500 time 0.4648 (0.4688) data time 0.0008 (0.0023) model time 0.4640 (0.4663) loss 2.6232 (3.1132) grad_norm 1.8214 (1.7286) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][370/625] eta 0:01:59 lr 0.000805 wd 0.0500 time 0.4642 (0.4688) data time 0.0008 (0.0023) model time 0.4634 (0.4664) loss 3.4687 (3.1123) grad_norm 1.9129 (1.7225) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][380/625] eta 0:01:54 lr 0.000805 wd 0.0500 time 0.4686 (0.4688) data time 0.0009 (0.0023) model time 0.4677 (0.4664) loss 3.6204 (3.1062) grad_norm 1.1823 (1.7175) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][390/625] eta 0:01:50 lr 0.000805 wd 0.0500 time 0.4653 (0.4688) data time 0.0008 (0.0022) model time 0.4645 (0.4664) loss 3.4619 (3.1110) grad_norm 1.8812 (1.7085) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][400/625] eta 0:01:45 lr 0.000804 wd 0.0500 time 0.4619 
(0.4686) data time 0.0011 (0.0022) model time 0.4608 (0.4663) loss 2.8528 (3.1037) grad_norm 1.1984 (1.7000) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][410/625] eta 0:01:40 lr 0.000804 wd 0.0500 time 0.4591 (0.4685) data time 0.0010 (0.0022) model time 0.4581 (0.4661) loss 3.6665 (3.1042) grad_norm 1.3984 (1.6928) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][420/625] eta 0:01:35 lr 0.000804 wd 0.0500 time 0.4593 (0.4683) data time 0.0010 (0.0022) model time 0.4582 (0.4659) loss 3.2418 (3.1022) grad_norm 2.1964 (1.6978) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][430/625] eta 0:01:31 lr 0.000804 wd 0.0500 time 0.4605 (0.4681) data time 0.0008 (0.0021) model time 0.4597 (0.4657) loss 2.8282 (3.1013) grad_norm 1.2553 (1.7009) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][440/625] eta 0:01:26 lr 0.000804 wd 0.0500 time 0.4656 (0.4680) data time 0.0012 (0.0021) model time 0.4644 (0.4657) loss 3.2192 (3.0992) grad_norm 1.2304 (1.6972) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][450/625] eta 0:01:21 lr 0.000804 wd 0.0500 time 0.4660 (0.4679) data time 0.0011 (0.0021) model time 0.4650 (0.4656) loss 2.3778 (3.0980) grad_norm 1.3106 (1.6954) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][460/625] eta 0:01:17 lr 0.000804 wd 0.0500 time 0.4680 (0.4686) data time 0.0010 (0.0021) model time 0.4670 (0.4665) loss 3.2247 (3.0967) grad_norm 1.2676 (1.7005) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [129/300][470/625] eta 0:01:12 lr 0.000804 wd 0.0500 time 0.4577 (0.4689) data time 0.0011 (0.0021) model time 0.4566 (0.4667) loss 3.0165 (3.0958) grad_norm 1.3839 (1.6946) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][480/625] eta 0:01:08 lr 0.000804 wd 0.0500 time 0.4633 (0.4692) data time 0.0011 (0.0020) model time 0.4622 (0.4672) loss 2.0395 (3.0953) grad_norm 1.8396 (1.6902) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][490/625] eta 0:01:03 lr 0.000804 wd 0.0500 time 0.4641 (0.4691) data time 0.0010 (0.0020) model time 0.4631 (0.4670) loss 3.2660 (3.0912) grad_norm 1.4895 (1.7104) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][500/625] eta 0:00:58 lr 0.000803 wd 0.0500 time 0.4596 (0.4690) data time 0.0008 (0.0020) model time 0.4588 (0.4669) loss 1.8576 (3.0830) grad_norm 1.6809 (1.7091) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][510/625] eta 0:00:53 lr 0.000803 wd 0.0500 time 0.4661 (0.4689) data time 0.0010 (0.0020) model time 0.4651 (0.4669) loss 3.5809 (3.0859) grad_norm 1.1519 (1.7032) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][520/625] eta 0:00:49 lr 0.000803 wd 0.0500 time 0.4691 (0.4688) data time 0.0009 (0.0020) model time 0.4681 (0.4668) loss 3.2721 (3.0878) grad_norm 1.2873 (1.6986) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][530/625] eta 0:00:44 lr 0.000803 wd 0.0500 time 0.4621 (0.4688) data time 0.0010 (0.0019) model time 0.4612 (0.4667) loss 2.9239 (3.0893) grad_norm 2.0933 (1.6958) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:19:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][540/625] eta 0:00:39 lr 0.000803 wd 0.0500 time 0.4645 (0.4687) data time 0.0008 (0.0019) model time 0.4638 (0.4667) loss 3.8575 (3.0867) grad_norm 1.1266 (1.6961) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][550/625] eta 0:00:35 lr 0.000803 wd 0.0500 time 0.4638 (0.4686) data time 0.0010 (0.0019) model time 0.4628 (0.4666) loss 3.0597 (3.0870) grad_norm 1.4205 (1.6931) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][560/625] eta 0:00:30 lr 0.000803 wd 0.0500 time 0.4611 (0.4685) data time 0.0007 (0.0019) model time 0.4604 (0.4665) loss 2.2079 (3.0869) grad_norm 1.8111 (1.6950) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][570/625] eta 0:00:25 lr 0.000803 wd 0.0500 time 0.4587 (0.4684) data time 0.0008 (0.0019) model time 0.4578 (0.4665) loss 3.0523 (3.0837) grad_norm 2.8763 (1.6962) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][580/625] eta 0:00:21 lr 0.000803 wd 0.0500 time 0.4599 (0.4683) data time 0.0010 (0.0019) model time 0.4589 (0.4664) loss 3.1255 (3.0849) grad_norm 1.4769 (1.6972) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][590/625] eta 0:00:16 lr 0.000803 wd 0.0500 time 0.4637 (0.4683) data time 0.0007 (0.0019) model time 0.4630 (0.4663) loss 2.4975 (3.0874) grad_norm 1.1895 (1.6923) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][600/625] eta 0:00:11 lr 0.000802 wd 0.0500 time 0.4633 (0.4682) data time 0.0008 (0.0018) model time 0.4625 (0.4662) loss 1.9042 
(3.0847) grad_norm 1.4597 (1.6863) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][610/625] eta 0:00:07 lr 0.000802 wd 0.0500 time 0.4594 (0.4687) data time 0.0008 (0.0018) model time 0.4587 (0.4668) loss 3.6456 (3.0841) grad_norm 1.2678 (1.6866) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [129/300][620/625] eta 0:00:02 lr 0.000802 wd 0.0500 time 0.4604 (0.4685) data time 0.0005 (0.0018) model time 0.4599 (0.4666) loss 3.7772 (3.0822) grad_norm 1.4914 (1.6834) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 129 training takes 0:04:52 [2024-08-10 12:19:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:19:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5796 (0.5796) Acc@1 88.232 (88.232) Acc@5 98.047 (98.047) Mem 16715MB [2024-08-10 12:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.9458 (0.7004) Acc@1 77.344 (84.917) Acc@5 94.922 (97.164) Mem 16715MB [2024-08-10 12:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0264 (0.8167) Acc@1 76.074 (81.862) Acc@5 94.336 (95.975) Mem 16715MB [2024-08-10 12:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.616 Acc@5 95.983 [2024-08-10 12:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-08-10 12:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.62% [2024-08-10 12:19:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 12:19:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 12:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.4810 (0.4810) Acc@1 89.111 (89.111) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 12:19:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.159) Loss 0.7788 (0.6077) Acc@1 81.250 (86.532) Acc@5 96.240 (97.794) Mem 16715MB [2024-08-10 12:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.8921 (0.7143) Acc@1 78.076 (83.633) Acc@5 95.166 (96.673) Mem 16715MB [2024-08-10 12:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.331 Acc@5 96.697 [2024-08-10 12:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 12:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.33% [2024-08-10 12:19:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:19:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 12:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][0/625] eta 0:09:02 lr 0.000802 wd 0.0500 time 0.8683 (0.8683) data time 0.4588 (0.4588) model time 0.0000 (0.0000) loss 3.2444 (3.2444) grad_norm 1.4172 (1.4172) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:19:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][10/625] eta 0:05:07 lr 0.000802 wd 0.0500 time 0.4623 (0.4995) data time 0.0010 (0.0427) model time 0.0000 (0.0000) loss 2.2086 (2.9890) grad_norm 1.9446 (1.5483) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][20/625] eta 0:04:58 lr 0.000802 wd 0.0500 time 0.4664 (0.4928) data time 0.0008 (0.0229) model time 0.0000 (0.0000) loss 1.8089 (2.8592) grad_norm 1.7987 (1.4772) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][30/625] eta 0:04:48 lr 0.000802 wd 0.0500 time 0.4636 (0.4847) data time 0.0011 (0.0159) model time 0.0000 (0.0000) loss 3.1718 (2.8708) grad_norm 1.0692 (1.5296) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][40/625] eta 0:04:41 lr 0.000802 wd 0.0500 time 0.4705 (0.4805) data time 0.0010 (0.0122) model time 0.0000 (0.0000) loss 2.9583 (2.9042) grad_norm 1.3859 (1.5445) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][50/625] eta 0:04:34 lr 0.000802 wd 0.0500 time 0.4646 (0.4774) data time 0.0010 (0.0100) model time 0.0000 (0.0000) loss 3.2914 (2.9815) grad_norm 1.6991 (1.5440) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][60/625] eta 0:04:28 lr 0.000802 wd 0.0500 time 0.4654 (0.4752) data time 0.0010 (0.0086) model time 0.4644 (0.4629) loss 2.1513 (2.9969) 
grad_norm 1.7043 (1.5337) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][70/625] eta 0:04:24 lr 0.000801 wd 0.0500 time 0.4593 (0.4767) data time 0.0007 (0.0075) model time 0.4586 (0.4741) loss 2.9279 (2.9915) grad_norm 2.1446 (1.5284) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][80/625] eta 0:04:18 lr 0.000801 wd 0.0500 time 0.4669 (0.4748) data time 0.0011 (0.0067) model time 0.4659 (0.4694) loss 3.3178 (2.9885) grad_norm 1.2296 (1.5383) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][90/625] eta 0:04:13 lr 0.000801 wd 0.0500 time 0.4589 (0.4734) data time 0.0013 (0.0061) model time 0.4576 (0.4673) loss 3.4055 (2.9816) grad_norm 1.6784 (1.5693) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][100/625] eta 0:04:08 lr 0.000801 wd 0.0500 time 0.4722 (0.4726) data time 0.0008 (0.0056) model time 0.4714 (0.4667) loss 3.0109 (3.0010) grad_norm 1.5842 (1.5586) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][110/625] eta 0:04:03 lr 0.000801 wd 0.0500 time 0.4648 (0.4720) data time 0.0008 (0.0052) model time 0.4640 (0.4664) loss 2.8471 (3.0068) grad_norm 1.2436 (1.5556) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][120/625] eta 0:03:58 lr 0.000801 wd 0.0500 time 0.4638 (0.4714) data time 0.0008 (0.0049) model time 0.4630 (0.4659) loss 2.7020 (3.0171) grad_norm 1.3335 (1.5533) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][130/625] eta 0:03:53 lr 0.000801 wd 0.0500 time 0.4588 (0.4708) data 
time 0.0010 (0.0046) model time 0.4577 (0.4655) loss 2.5812 (3.0033) grad_norm 1.3847 (1.5396) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][140/625] eta 0:03:48 lr 0.000801 wd 0.0500 time 0.4760 (0.4702) data time 0.0009 (0.0043) model time 0.4752 (0.4651) loss 3.2647 (3.0126) grad_norm 1.5509 (1.5361) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][150/625] eta 0:03:43 lr 0.000801 wd 0.0500 time 0.4524 (0.4696) data time 0.0008 (0.0041) model time 0.4516 (0.4646) loss 3.7935 (3.0403) grad_norm 2.0824 (1.5506) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][160/625] eta 0:03:38 lr 0.000801 wd 0.0500 time 0.4631 (0.4690) data time 0.0008 (0.0040) model time 0.4623 (0.4639) loss 2.4548 (3.0403) grad_norm 1.4646 (1.5647) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][170/625] eta 0:03:33 lr 0.000800 wd 0.0500 time 0.4669 (0.4687) data time 0.0009 (0.0038) model time 0.4660 (0.4639) loss 3.9863 (3.0489) grad_norm 1.0335 (1.5661) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][180/625] eta 0:03:28 lr 0.000800 wd 0.0500 time 0.4668 (0.4685) data time 0.0008 (0.0037) model time 0.4660 (0.4639) loss 2.6225 (3.0579) grad_norm 1.6127 (1.5560) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][190/625] eta 0:03:23 lr 0.000800 wd 0.0500 time 0.4630 (0.4684) data time 0.0011 (0.0035) model time 0.4619 (0.4641) loss 3.2326 (3.0596) grad_norm 1.8938 (1.5632) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[130/300][200/625] eta 0:03:19 lr 0.000800 wd 0.0500 time 0.4653 (0.4683) data time 0.0008 (0.0034) model time 0.4644 (0.4641) loss 2.6733 (3.0518) grad_norm 1.6246 (1.5541) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][210/625] eta 0:03:14 lr 0.000800 wd 0.0500 time 0.4635 (0.4690) data time 0.0011 (0.0033) model time 0.4625 (0.4652) loss 3.2480 (3.0403) grad_norm 1.2175 (1.5472) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][220/625] eta 0:03:10 lr 0.000800 wd 0.0500 time 0.4672 (0.4693) data time 0.0008 (0.0032) model time 0.4664 (0.4657) loss 3.4410 (3.0291) grad_norm 1.3054 (1.5369) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][230/625] eta 0:03:05 lr 0.000800 wd 0.0500 time 0.4684 (0.4690) data time 0.0010 (0.0031) model time 0.4674 (0.4654) loss 3.3757 (3.0230) grad_norm 1.1371 (1.5392) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][240/625] eta 0:03:00 lr 0.000800 wd 0.0500 time 0.4650 (0.4695) data time 0.0011 (0.0031) model time 0.4640 (0.4662) loss 3.3142 (3.0240) grad_norm 1.5963 (1.5333) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][250/625] eta 0:02:56 lr 0.000800 wd 0.0500 time 0.4608 (0.4695) data time 0.0007 (0.0030) model time 0.4601 (0.4664) loss 2.0775 (3.0154) grad_norm 1.4661 (1.5394) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][260/625] eta 0:02:51 lr 0.000800 wd 0.0500 time 0.4636 (0.4702) data time 0.0008 (0.0029) model time 0.4629 (0.4673) loss 3.7812 (3.0278) grad_norm 1.3926 (1.5412) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][270/625] eta 0:02:46 lr 0.000799 wd 0.0500 time 0.4679 (0.4700) data time 0.0008 (0.0029) model time 0.4671 (0.4672) loss 3.2059 (3.0295) grad_norm 1.3873 (1.5388) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][280/625] eta 0:02:42 lr 0.000799 wd 0.0500 time 0.4621 (0.4698) data time 0.0008 (0.0028) model time 0.4613 (0.4669) loss 1.9301 (3.0174) grad_norm 1.5420 (1.5478) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][290/625] eta 0:02:37 lr 0.000799 wd 0.0500 time 0.4624 (0.4698) data time 0.0010 (0.0027) model time 0.4614 (0.4670) loss 3.0117 (3.0121) grad_norm 2.2806 (1.5652) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][300/625] eta 0:02:32 lr 0.000799 wd 0.0500 time 0.4628 (0.4696) data time 0.0008 (0.0027) model time 0.4620 (0.4668) loss 2.2384 (3.0096) grad_norm 2.2393 (1.5969) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][310/625] eta 0:02:27 lr 0.000799 wd 0.0500 time 0.4674 (0.4693) data time 0.0008 (0.0026) model time 0.4667 (0.4666) loss 3.7737 (3.0051) grad_norm 1.5575 (1.6010) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][320/625] eta 0:02:23 lr 0.000799 wd 0.0500 time 0.4651 (0.4692) data time 0.0009 (0.0026) model time 0.4642 (0.4665) loss 1.7966 (3.0047) grad_norm 2.6647 (1.6111) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][330/625] eta 0:02:18 lr 0.000799 wd 0.0500 time 0.4648 (0.4692) data time 0.0008 (0.0025) model time 0.4640 (0.4665) loss 3.1573 
(3.0087) grad_norm 1.4443 (1.6101) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][340/625] eta 0:02:13 lr 0.000799 wd 0.0500 time 0.4589 (0.4690) data time 0.0008 (0.0025) model time 0.4580 (0.4664) loss 2.2499 (3.0072) grad_norm 1.9825 (1.6157) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][350/625] eta 0:02:08 lr 0.000799 wd 0.0500 time 0.4692 (0.4688) data time 0.0008 (0.0025) model time 0.4684 (0.4662) loss 2.8360 (3.0195) grad_norm 1.3685 (1.6164) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][360/625] eta 0:02:04 lr 0.000799 wd 0.0500 time 0.4617 (0.4695) data time 0.0008 (0.0024) model time 0.4609 (0.4671) loss 2.8889 (3.0211) grad_norm 1.8776 (1.6142) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][370/625] eta 0:01:59 lr 0.000798 wd 0.0500 time 0.4604 (0.4693) data time 0.0008 (0.0024) model time 0.4596 (0.4668) loss 2.8248 (3.0283) grad_norm 1.2605 (1.6124) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][380/625] eta 0:01:54 lr 0.000798 wd 0.0500 time 0.4656 (0.4690) data time 0.0008 (0.0023) model time 0.4648 (0.4666) loss 3.5315 (3.0332) grad_norm 1.8992 (1.6121) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][390/625] eta 0:01:50 lr 0.000798 wd 0.0500 time 0.4686 (0.4694) data time 0.0007 (0.0023) model time 0.4679 (0.4671) loss 2.3158 (3.0338) grad_norm 1.2338 (1.6061) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][400/625] eta 0:01:45 lr 0.000798 wd 0.0500 time 0.4715 
(0.4693) data time 0.0008 (0.0023) model time 0.4707 (0.4670) loss 3.2825 (3.0305) grad_norm 1.6443 (1.6057) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][410/625] eta 0:01:40 lr 0.000798 wd 0.0500 time 0.4619 (0.4693) data time 0.0013 (0.0022) model time 0.4606 (0.4670) loss 3.6355 (3.0354) grad_norm 1.7533 (1.6033) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][420/625] eta 0:01:36 lr 0.000798 wd 0.0500 time 0.4634 (0.4692) data time 0.0008 (0.0022) model time 0.4626 (0.4670) loss 3.5329 (3.0350) grad_norm 1.8396 (1.6189) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][430/625] eta 0:01:31 lr 0.000798 wd 0.0500 time 0.4669 (0.4691) data time 0.0007 (0.0022) model time 0.4662 (0.4669) loss 3.9143 (3.0355) grad_norm 2.1739 (1.6341) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][440/625] eta 0:01:26 lr 0.000798 wd 0.0500 time 0.4653 (0.4690) data time 0.0010 (0.0022) model time 0.4643 (0.4668) loss 3.1949 (3.0391) grad_norm 1.3025 (1.6405) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][450/625] eta 0:01:22 lr 0.000798 wd 0.0500 time 0.4651 (0.4688) data time 0.0010 (0.0021) model time 0.4641 (0.4666) loss 3.1350 (3.0393) grad_norm 1.2550 (1.6352) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][460/625] eta 0:01:17 lr 0.000798 wd 0.0500 time 0.4642 (0.4688) data time 0.0008 (0.0021) model time 0.4634 (0.4666) loss 2.9468 (3.0348) grad_norm 1.4240 (1.6525) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [130/300][470/625] eta 0:01:12 lr 0.000797 wd 0.0500 time 0.4588 (0.4687) data time 0.0007 (0.0021) model time 0.4581 (0.4665) loss 2.0841 (3.0252) grad_norm 1.7738 (1.6511) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][480/625] eta 0:01:08 lr 0.000797 wd 0.0500 time 0.4640 (0.4691) data time 0.0009 (0.0021) model time 0.4631 (0.4670) loss 3.0325 (3.0290) grad_norm 1.8079 (1.6480) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][490/625] eta 0:01:03 lr 0.000797 wd 0.0500 time 0.4610 (0.4690) data time 0.0010 (0.0021) model time 0.4599 (0.4669) loss 2.8786 (3.0319) grad_norm 2.2432 (1.6448) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][500/625] eta 0:00:58 lr 0.000797 wd 0.0500 time 0.4632 (0.4689) data time 0.0011 (0.0020) model time 0.4621 (0.4668) loss 3.1932 (3.0305) grad_norm 2.4077 (1.6462) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][510/625] eta 0:00:53 lr 0.000797 wd 0.0500 time 0.4689 (0.4688) data time 0.0010 (0.0020) model time 0.4679 (0.4668) loss 3.4117 (3.0319) grad_norm 2.5702 (1.6485) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][520/625] eta 0:00:49 lr 0.000797 wd 0.0500 time 0.4583 (0.4687) data time 0.0010 (0.0020) model time 0.4573 (0.4667) loss 2.8330 (3.0332) grad_norm 1.8776 (1.6557) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][530/625] eta 0:00:44 lr 0.000797 wd 0.0500 time 0.4666 (0.4687) data time 0.0010 (0.0020) model time 0.4656 (0.4666) loss 1.8471 (3.0325) grad_norm 2.1938 (1.6522) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][540/625] eta 0:00:39 lr 0.000797 wd 0.0500 time 0.4651 (0.4686) data time 0.0009 (0.0020) model time 0.4642 (0.4665) loss 3.7898 (3.0323) grad_norm 1.4612 (1.6502) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][550/625] eta 0:00:35 lr 0.000797 wd 0.0500 time 0.4631 (0.4685) data time 0.0011 (0.0019) model time 0.4620 (0.4665) loss 2.7677 (3.0280) grad_norm 1.3694 (1.6491) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][560/625] eta 0:00:30 lr 0.000797 wd 0.0500 time 0.4483 (0.4684) data time 0.0011 (0.0020) model time 0.4472 (0.4664) loss 2.4227 (3.0309) grad_norm 1.4354 (1.6486) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][570/625] eta 0:00:25 lr 0.000796 wd 0.0500 time 0.4644 (0.4683) data time 0.0011 (0.0019) model time 0.4633 (0.4663) loss 2.1757 (3.0355) grad_norm 1.5534 (1.6560) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][580/625] eta 0:00:21 lr 0.000796 wd 0.0500 time 0.4597 (0.4683) data time 0.0008 (0.0019) model time 0.4589 (0.4662) loss 3.5676 (3.0349) grad_norm 1.8511 (1.6540) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][590/625] eta 0:00:16 lr 0.000796 wd 0.0500 time 0.4699 (0.4682) data time 0.0008 (0.0019) model time 0.4691 (0.4662) loss 3.5421 (3.0368) grad_norm 5.7909 (1.6610) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][600/625] eta 0:00:11 lr 0.000796 wd 0.0500 time 0.4673 (0.4682) data time 0.0011 (0.0019) model time 0.4663 (0.4662) loss 2.7164 
(3.0338) grad_norm 1.5009 (1.6702) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][610/625] eta 0:00:07 lr 0.000796 wd 0.0500 time 0.4610 (0.4681) data time 0.0006 (0.0019) model time 0.4604 (0.4661) loss 1.8606 (3.0361) grad_norm 1.5765 (1.6700) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [130/300][620/625] eta 0:00:02 lr 0.000796 wd 0.0500 time 0.4628 (0.4680) data time 0.0008 (0.0019) model time 0.4620 (0.4661) loss 3.2158 (3.0387) grad_norm 1.4617 (1.6688) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 130 training takes 0:04:52 [2024-08-10 12:24:44 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:24:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:24:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.530 (0.530) Loss 0.5757 (0.5757) Acc@1 88.623 (88.623) Acc@5 98.047 (98.047) Mem 16715MB [2024-08-10 12:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.163) Loss 0.9199 (0.7077) Acc@1 78.174 (84.988) Acc@5 95.557 (97.248) Mem 16715MB [2024-08-10 12:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0381 (0.8262) Acc@1 75.928 (81.913) Acc@5 93.945 (95.996) Mem 16715MB [2024-08-10 12:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.696 Acc@5 96.005 [2024-08-10 12:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-10 12:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.70% [2024-08-10 12:24:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 12:24:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 12:24:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.4812 (0.4812) Acc@1 89.209 (89.209) Acc@5 98.682 (98.682) Mem 16715MB [2024-08-10 12:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.160) Loss 0.7773 (0.6069) Acc@1 81.396 (86.577) Acc@5 96.338 (97.798) Mem 16715MB [2024-08-10 12:24:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.8901 (0.7133) Acc@1 77.783 (83.652) Acc@5 95.312 (96.698) Mem 16715MB [2024-08-10 12:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.349 Acc@5 96.717 [2024-08-10 12:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 12:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.35% [2024-08-10 12:24:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:24:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 12:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][0/625] eta 0:08:40 lr 0.000796 wd 0.0500 time 0.8329 (0.8329) data time 0.4173 (0.4173) model time 0.0000 (0.0000) loss 3.1572 (3.1572) grad_norm 1.9790 (1.9790) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][10/625] eta 0:05:06 lr 0.000796 wd 0.0500 time 0.4620 (0.4983) data time 0.0010 (0.0389) model time 0.0000 (0.0000) loss 3.1065 (2.9547) grad_norm 1.3641 (1.6650) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][20/625] eta 0:04:51 lr 0.000796 wd 0.0500 time 0.4617 (0.4815) data time 0.0008 (0.0210) model time 0.0000 (0.0000) loss 3.0713 (3.0780) grad_norm 1.0694 (1.5546) loss_scale 1024.0000 (755.8095) mem 16715MB [2024-08-10 12:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][30/625] eta 0:04:49 lr 0.000796 wd 0.0500 time 0.4560 (0.4873) data time 0.0010 (0.0146) model time 0.0000 (0.0000) loss 3.1576 (3.1015) grad_norm 1.9147 (1.7606) loss_scale 1024.0000 (842.3226) mem 16715MB [2024-08-10 12:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][40/625] eta 0:04:41 lr 0.000795 wd 0.0500 time 0.4665 (0.4818) data time 0.0009 (0.0113) model time 0.0000 (0.0000) loss 2.7189 (3.0686) grad_norm 2.4518 (1.8387) loss_scale 1024.0000 (886.6341) mem 16715MB [2024-08-10 12:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][50/625] eta 0:04:35 lr 0.000795 wd 0.0500 time 0.4715 (0.4786) data time 0.0012 (0.0093) model time 0.0000 (0.0000) loss 2.7770 (3.0885) grad_norm 2.1812 (1.9249) loss_scale 1024.0000 (913.5686) mem 16715MB [2024-08-10 12:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][60/625] eta 0:04:29 lr 0.000795 wd 0.0500 time 0.4811 (0.4770) data time 0.0008 (0.0079) model time 0.4803 (0.4672) loss 1.9760 (3.0253) 
grad_norm 1.4962 (1.9149) loss_scale 1024.0000 (931.6721) mem 16715MB [2024-08-10 12:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][70/625] eta 0:04:23 lr 0.000795 wd 0.0500 time 0.4641 (0.4753) data time 0.0010 (0.0070) model time 0.4631 (0.4657) loss 2.9671 (2.9991) grad_norm 1.0993 (1.8408) loss_scale 1024.0000 (944.6761) mem 16715MB [2024-08-10 12:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][80/625] eta 0:04:18 lr 0.000795 wd 0.0500 time 0.4628 (0.4739) data time 0.0010 (0.0062) model time 0.4618 (0.4649) loss 2.9359 (3.0184) grad_norm 1.4577 (1.7840) loss_scale 1024.0000 (954.4691) mem 16715MB [2024-08-10 12:25:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][90/625] eta 0:04:12 lr 0.000795 wd 0.0500 time 0.4613 (0.4726) data time 0.0010 (0.0057) model time 0.4602 (0.4639) loss 3.2575 (3.0229) grad_norm 2.0204 (1.8113) loss_scale 1024.0000 (962.1099) mem 16715MB [2024-08-10 12:25:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][100/625] eta 0:04:07 lr 0.000795 wd 0.0500 time 0.4609 (0.4715) data time 0.0011 (0.0052) model time 0.4598 (0.4632) loss 3.2360 (3.0152) grad_norm 1.1407 (1.8099) loss_scale 1024.0000 (968.2376) mem 16715MB [2024-08-10 12:25:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][110/625] eta 0:04:02 lr 0.000795 wd 0.0500 time 0.4641 (0.4707) data time 0.0011 (0.0049) model time 0.4630 (0.4628) loss 2.7489 (2.9878) grad_norm 2.4610 (1.8010) loss_scale 1024.0000 (973.2613) mem 16715MB [2024-08-10 12:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][120/625] eta 0:03:57 lr 0.000795 wd 0.0500 time 0.4530 (0.4700) data time 0.0008 (0.0046) model time 0.4522 (0.4625) loss 3.2607 (2.9880) grad_norm 1.5359 (1.7835) loss_scale 1024.0000 (977.4545) mem 16715MB [2024-08-10 12:25:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][130/625] eta 0:03:52 lr 0.000795 wd 0.0500 time 0.4618 
(0.4696) data time 0.0008 (0.0043) model time 0.4611 (0.4627) loss 3.4324 (2.9709) grad_norm 1.0527 (1.7526) loss_scale 1024.0000 (981.0076) mem 16715MB [2024-08-10 12:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][140/625] eta 0:03:47 lr 0.000794 wd 0.0500 time 0.4598 (0.4691) data time 0.0010 (0.0041) model time 0.4588 (0.4626) loss 3.4253 (2.9787) grad_norm 1.3769 (1.7309) loss_scale 1024.0000 (984.0567) mem 16715MB [2024-08-10 12:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][150/625] eta 0:03:42 lr 0.000794 wd 0.0500 time 0.4618 (0.4687) data time 0.0011 (0.0039) model time 0.4607 (0.4625) loss 3.0721 (2.9712) grad_norm 1.6844 (1.7233) loss_scale 1024.0000 (986.7020) mem 16715MB [2024-08-10 12:26:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][160/625] eta 0:03:37 lr 0.000794 wd 0.0500 time 0.4528 (0.4683) data time 0.0009 (0.0037) model time 0.4519 (0.4624) loss 3.3781 (2.9736) grad_norm 2.0459 (1.7340) loss_scale 1024.0000 (989.0186) mem 16715MB [2024-08-10 12:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][170/625] eta 0:03:33 lr 0.000794 wd 0.0500 time 0.4579 (0.4685) data time 0.0008 (0.0035) model time 0.4570 (0.4632) loss 4.2062 (3.0004) grad_norm 1.1615 (1.7191) loss_scale 1024.0000 (991.0643) mem 16715MB [2024-08-10 12:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][180/625] eta 0:03:28 lr 0.000794 wd 0.0500 time 0.4663 (0.4682) data time 0.0011 (0.0034) model time 0.4653 (0.4630) loss 3.2793 (3.0193) grad_norm 1.3316 (1.7108) loss_scale 1024.0000 (992.8840) mem 16715MB [2024-08-10 12:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][190/625] eta 0:03:24 lr 0.000794 wd 0.0500 time 0.4655 (0.4692) data time 0.0010 (0.0033) model time 0.4645 (0.4646) loss 2.8972 (3.0056) grad_norm 1.7070 (1.7055) loss_scale 1024.0000 (994.5131) mem 16715MB [2024-08-10 12:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [131/300][200/625] eta 0:03:19 lr 0.000794 wd 0.0500 time 0.4660 (0.4689) data time 0.0010 (0.0032) model time 0.4650 (0.4645) loss 3.0807 (3.0250) grad_norm 1.6624 (1.7105) loss_scale 1024.0000 (995.9801) mem 16715MB [2024-08-10 12:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][210/625] eta 0:03:14 lr 0.000794 wd 0.0500 time 0.4546 (0.4687) data time 0.0010 (0.0031) model time 0.4535 (0.4645) loss 3.5768 (3.0322) grad_norm 2.0634 (1.7047) loss_scale 1024.0000 (997.3081) mem 16715MB [2024-08-10 12:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][220/625] eta 0:03:09 lr 0.000794 wd 0.0500 time 0.4650 (0.4685) data time 0.0010 (0.0030) model time 0.4640 (0.4644) loss 3.0785 (3.0359) grad_norm 1.1647 (1.7058) loss_scale 1024.0000 (998.5158) mem 16715MB [2024-08-10 12:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][230/625] eta 0:03:04 lr 0.000794 wd 0.0500 time 0.4648 (0.4682) data time 0.0009 (0.0029) model time 0.4639 (0.4641) loss 3.9079 (3.0431) grad_norm 1.4796 (1.6942) loss_scale 1024.0000 (999.6190) mem 16715MB [2024-08-10 12:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][240/625] eta 0:03:00 lr 0.000793 wd 0.0500 time 0.4638 (0.4679) data time 0.0010 (0.0028) model time 0.4628 (0.4640) loss 3.1147 (3.0381) grad_norm 1.2355 (1.6875) loss_scale 1024.0000 (1000.6307) mem 16715MB [2024-08-10 12:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][250/625] eta 0:02:55 lr 0.000793 wd 0.0500 time 0.4659 (0.4677) data time 0.0007 (0.0028) model time 0.4652 (0.4639) loss 2.8771 (3.0332) grad_norm 1.3176 (1.6770) loss_scale 1024.0000 (1001.5618) mem 16715MB [2024-08-10 12:26:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][260/625] eta 0:02:50 lr 0.000793 wd 0.0500 time 0.4656 (0.4676) data time 0.0009 (0.0027) model time 0.4647 (0.4638) loss 3.7645 (3.0393) grad_norm 2.6979 (1.6865) loss_scale 1024.0000 
(1002.4215) mem 16715MB [2024-08-10 12:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][270/625] eta 0:02:45 lr 0.000793 wd 0.0500 time 0.4647 (0.4675) data time 0.0011 (0.0026) model time 0.4636 (0.4638) loss 3.6502 (3.0425) grad_norm 1.3335 (1.6934) loss_scale 1024.0000 (1003.2177) mem 16715MB [2024-08-10 12:27:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][280/625] eta 0:02:41 lr 0.000793 wd 0.0500 time 0.4652 (0.4675) data time 0.0010 (0.0026) model time 0.4642 (0.4640) loss 2.8638 (3.0400) grad_norm 1.1012 (1.6822) loss_scale 1024.0000 (1003.9573) mem 16715MB [2024-08-10 12:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][290/625] eta 0:02:36 lr 0.000793 wd 0.0500 time 0.4659 (0.4675) data time 0.0008 (0.0025) model time 0.4651 (0.4641) loss 3.4271 (3.0409) grad_norm 1.7563 (1.6712) loss_scale 1024.0000 (1004.6460) mem 16715MB [2024-08-10 12:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][300/625] eta 0:02:31 lr 0.000793 wd 0.0500 time 0.4744 (0.4676) data time 0.0008 (0.0025) model time 0.4736 (0.4642) loss 2.8757 (3.0458) grad_norm 1.3234 (1.6610) loss_scale 1024.0000 (1005.2890) mem 16715MB [2024-08-10 12:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][310/625] eta 0:02:27 lr 0.000793 wd 0.0500 time 0.4706 (0.4675) data time 0.0008 (0.0024) model time 0.4698 (0.4643) loss 3.2873 (3.0493) grad_norm 1.4206 (1.6579) loss_scale 1024.0000 (1005.8907) mem 16715MB [2024-08-10 12:27:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][320/625] eta 0:02:22 lr 0.000793 wd 0.0500 time 0.4616 (0.4674) data time 0.0010 (0.0024) model time 0.4606 (0.4642) loss 3.4504 (3.0466) grad_norm 1.8320 (1.6551) loss_scale 1024.0000 (1006.4548) mem 16715MB [2024-08-10 12:27:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][330/625] eta 0:02:17 lr 0.000793 wd 0.0500 time 0.4632 (0.4674) data time 0.0010 (0.0023) model 
time 0.4622 (0.4643) loss 3.2275 (3.0504) grad_norm 1.2507 (1.6473) loss_scale 1024.0000 (1006.9849) mem 16715MB [2024-08-10 12:27:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][340/625] eta 0:02:13 lr 0.000792 wd 0.0500 time 0.4686 (0.4674) data time 0.0008 (0.0023) model time 0.4678 (0.4643) loss 2.2053 (3.0409) grad_norm 1.8734 (1.6602) loss_scale 1024.0000 (1007.4839) mem 16715MB [2024-08-10 12:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][350/625] eta 0:02:08 lr 0.000792 wd 0.0500 time 0.4699 (0.4674) data time 0.0008 (0.0023) model time 0.4691 (0.4644) loss 3.6240 (3.0355) grad_norm 1.9559 (1.6581) loss_scale 1024.0000 (1007.9544) mem 16715MB [2024-08-10 12:27:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][360/625] eta 0:02:03 lr 0.000792 wd 0.0500 time 0.4670 (0.4673) data time 0.0008 (0.0022) model time 0.4662 (0.4644) loss 3.6728 (3.0369) grad_norm 1.5847 (1.6681) loss_scale 1024.0000 (1008.3989) mem 16715MB [2024-08-10 12:27:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][370/625] eta 0:01:59 lr 0.000792 wd 0.0500 time 0.4625 (0.4682) data time 0.0008 (0.0022) model time 0.4617 (0.4655) loss 3.4172 (3.0333) grad_norm 1.3372 (1.6597) loss_scale 1024.0000 (1008.8194) mem 16715MB [2024-08-10 12:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][380/625] eta 0:01:54 lr 0.000792 wd 0.0500 time 0.4639 (0.4681) data time 0.0010 (0.0022) model time 0.4629 (0.4654) loss 2.5345 (3.0345) grad_norm 1.6639 (1.6557) loss_scale 1024.0000 (1009.2178) mem 16715MB [2024-08-10 12:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][390/625] eta 0:01:50 lr 0.000792 wd 0.0500 time 0.4626 (0.4684) data time 0.0007 (0.0021) model time 0.4619 (0.4658) loss 3.6074 (3.0279) grad_norm 1.7678 (1.6582) loss_scale 1024.0000 (1009.5959) mem 16715MB [2024-08-10 12:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][400/625] 
eta 0:01:45 lr 0.000792 wd 0.0500 time 0.4579 (0.4682) data time 0.0011 (0.0021) model time 0.4568 (0.4657) loss 3.1992 (3.0250) grad_norm 1.9854 (1.6602) loss_scale 1024.0000 (1009.9551) mem 16715MB [2024-08-10 12:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][410/625] eta 0:01:40 lr 0.000792 wd 0.0500 time 0.4600 (0.4691) data time 0.0008 (0.0021) model time 0.4591 (0.4667) loss 2.5547 (3.0262) grad_norm 1.4609 (1.6616) loss_scale 1024.0000 (1010.2968) mem 16715MB [2024-08-10 12:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][420/625] eta 0:01:36 lr 0.000792 wd 0.0500 time 0.4617 (0.4690) data time 0.0010 (0.0020) model time 0.4607 (0.4667) loss 2.7649 (3.0178) grad_norm 1.5187 (1.6558) loss_scale 1024.0000 (1010.6223) mem 16715MB [2024-08-10 12:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][430/625] eta 0:01:31 lr 0.000792 wd 0.0500 time 0.4648 (0.4690) data time 0.0012 (0.0020) model time 0.4636 (0.4666) loss 2.6801 (3.0209) grad_norm 1.1045 (1.6515) loss_scale 1024.0000 (1010.9327) mem 16715MB [2024-08-10 12:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][440/625] eta 0:01:26 lr 0.000791 wd 0.0500 time 0.4692 (0.4688) data time 0.0008 (0.0020) model time 0.4685 (0.4665) loss 3.4196 (3.0234) grad_norm 1.2117 (1.6537) loss_scale 1024.0000 (1011.2290) mem 16715MB [2024-08-10 12:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][450/625] eta 0:01:22 lr 0.000791 wd 0.0500 time 0.4632 (0.4687) data time 0.0010 (0.0020) model time 0.4622 (0.4664) loss 2.7754 (3.0239) grad_norm 2.1658 (1.6609) loss_scale 1024.0000 (1011.5122) mem 16715MB [2024-08-10 12:28:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][460/625] eta 0:01:17 lr 0.000791 wd 0.0500 time 0.4723 (0.4686) data time 0.0008 (0.0020) model time 0.4715 (0.4663) loss 3.3164 (3.0178) grad_norm 1.6855 (1.6621) loss_scale 1024.0000 (1011.7831) mem 16715MB [2024-08-10 
12:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][470/625] eta 0:01:12 lr 0.000791 wd 0.0500 time 0.4658 (0.4685) data time 0.0008 (0.0020) model time 0.4650 (0.4662) loss 3.9085 (3.0182) grad_norm 1.5001 (1.6601) loss_scale 1024.0000 (1012.0425) mem 16715MB [2024-08-10 12:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][480/625] eta 0:01:07 lr 0.000791 wd 0.0500 time 0.4647 (0.4684) data time 0.0012 (0.0019) model time 0.4635 (0.4662) loss 3.3971 (3.0142) grad_norm 4.3870 (1.6680) loss_scale 1024.0000 (1012.2911) mem 16715MB [2024-08-10 12:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][490/625] eta 0:01:03 lr 0.000791 wd 0.0500 time 0.4661 (0.4684) data time 0.0008 (0.0019) model time 0.4653 (0.4662) loss 2.7658 (3.0106) grad_norm 1.2021 (1.6744) loss_scale 1024.0000 (1012.5295) mem 16715MB [2024-08-10 12:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][500/625] eta 0:00:58 lr 0.000791 wd 0.0500 time 0.4638 (0.4684) data time 0.0010 (0.0019) model time 0.4628 (0.4661) loss 3.3121 (3.0169) grad_norm 2.3417 (1.6798) loss_scale 1024.0000 (1012.7585) mem 16715MB [2024-08-10 12:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][510/625] eta 0:00:53 lr 0.000791 wd 0.0500 time 0.4728 (0.4683) data time 0.0009 (0.0019) model time 0.4719 (0.4661) loss 3.5520 (3.0182) grad_norm 1.4056 (1.6765) loss_scale 1024.0000 (1012.9785) mem 16715MB [2024-08-10 12:29:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][520/625] eta 0:00:49 lr 0.000791 wd 0.0500 time 0.4687 (0.4683) data time 0.0011 (0.0019) model time 0.4677 (0.4661) loss 2.7234 (3.0172) grad_norm 1.2002 (1.6710) loss_scale 1024.0000 (1013.1900) mem 16715MB [2024-08-10 12:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][530/625] eta 0:00:44 lr 0.000791 wd 0.0500 time 0.4625 (0.4682) data time 0.0009 (0.0019) model time 0.4616 (0.4660) loss 2.7276 
(3.0185) grad_norm 1.2534 (1.6672) loss_scale 1024.0000 (1013.3936) mem 16715MB [2024-08-10 12:29:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][540/625] eta 0:00:39 lr 0.000790 wd 0.0500 time 0.4654 (0.4684) data time 0.0010 (0.0018) model time 0.4644 (0.4663) loss 3.2041 (3.0176) grad_norm 1.2751 (1.6684) loss_scale 1024.0000 (1013.5896) mem 16715MB [2024-08-10 12:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][550/625] eta 0:00:35 lr 0.000790 wd 0.0500 time 0.4644 (0.4684) data time 0.0008 (0.0018) model time 0.4636 (0.4663) loss 3.5744 (3.0221) grad_norm 1.7478 (1.6654) loss_scale 1024.0000 (1013.7786) mem 16715MB [2024-08-10 12:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][560/625] eta 0:00:30 lr 0.000790 wd 0.0500 time 0.4684 (0.4695) data time 0.0007 (0.0018) model time 0.4677 (0.4675) loss 3.0055 (3.0249) grad_norm 1.7203 (1.6635) loss_scale 1024.0000 (1013.9608) mem 16715MB [2024-08-10 12:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][570/625] eta 0:00:25 lr 0.000790 wd 0.0500 time 0.4660 (0.4694) data time 0.0010 (0.0018) model time 0.4650 (0.4674) loss 3.2713 (3.0287) grad_norm 1.7864 (1.6648) loss_scale 1024.0000 (1014.1366) mem 16715MB [2024-08-10 12:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][580/625] eta 0:00:21 lr 0.000790 wd 0.0500 time 0.4672 (0.4693) data time 0.0008 (0.0018) model time 0.4664 (0.4674) loss 3.8140 (3.0318) grad_norm 1.2401 (1.6612) loss_scale 1024.0000 (1014.3064) mem 16715MB [2024-08-10 12:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][590/625] eta 0:00:16 lr 0.000790 wd 0.0500 time 0.4618 (0.4693) data time 0.0007 (0.0018) model time 0.4611 (0.4674) loss 3.7934 (3.0335) grad_norm 1.7712 (1.6618) loss_scale 1024.0000 (1014.4704) mem 16715MB [2024-08-10 12:29:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][600/625] eta 0:00:11 lr 0.000790 wd 0.0500 
time 0.4643 (0.4692) data time 0.0008 (0.0018) model time 0.4635 (0.4673) loss 3.1583 (3.0335) grad_norm 2.2617 (1.6613) loss_scale 1024.0000 (1014.6290) mem 16715MB [2024-08-10 12:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][610/625] eta 0:00:07 lr 0.000790 wd 0.0500 time 0.4656 (0.4692) data time 0.0007 (0.0018) model time 0.4648 (0.4672) loss 3.0131 (3.0281) grad_norm 3.6418 (1.6629) loss_scale 1024.0000 (1014.7823) mem 16715MB [2024-08-10 12:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [131/300][620/625] eta 0:00:02 lr 0.000790 wd 0.0500 time 0.4619 (0.4690) data time 0.0007 (0.0017) model time 0.4611 (0.4671) loss 2.5847 (3.0246) grad_norm 1.5623 (1.6613) loss_scale 1024.0000 (1014.9308) mem 16715MB [2024-08-10 12:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 131 training takes 0:04:53 [2024-08-10 12:29:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:29:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.5864 (0.5864) Acc@1 87.598 (87.598) Acc@5 98.193 (98.193) Mem 16715MB [2024-08-10 12:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.9038 (0.6981) Acc@1 77.881 (84.863) Acc@5 95.508 (97.292) Mem 16715MB [2024-08-10 12:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 1.0000 (0.8176) Acc@1 76.562 (81.762) Acc@5 94.385 (95.977) Mem 16715MB [2024-08-10 12:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.458 Acc@5 95.985 [2024-08-10 12:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-10 12:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.829 (0.829) Loss 0.4817 (0.4817) Acc@1 89.404 (89.404) Acc@5 98.682 (98.682) Mem 16715MB [2024-08-10 12:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.7744 (0.6066) Acc@1 81.396 (86.626) Acc@5 96.484 (97.820) Mem 16715MB [2024-08-10 12:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.158) Loss 0.8896 (0.7125) Acc@1 77.930 (83.717) Acc@5 95.312 (96.731) Mem 16715MB [2024-08-10 12:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.405 Acc@5 96.739 [2024-08-10 12:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 12:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.41% [2024-08-10 12:29:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:30:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 12:30:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][0/625] eta 0:09:22 lr 0.000790 wd 0.0500 time 0.9004 (0.9004) data time 0.4855 (0.4855) model time 0.0000 (0.0000) loss 2.8132 (2.8132) grad_norm 1.3625 (1.3625) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][10/625] eta 0:05:13 lr 0.000789 wd 0.0500 time 0.4579 (0.5090) data time 0.0008 (0.0451) model time 0.0000 (0.0000) loss 3.4425 (3.1652) grad_norm 1.1937 (1.4334) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][20/625] eta 0:04:57 lr 0.000789 wd 0.0500 time 0.4608 (0.4910) data time 0.0010 (0.0241) model time 0.0000 (0.0000) loss 2.9792 (3.0231) grad_norm 1.8714 (1.6970) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][30/625] eta 0:04:46 lr 0.000789 wd 0.0500 time 0.4572 (0.4822) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 3.7920 (2.8792) grad_norm 2.4103 (1.8529) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][40/625] eta 0:04:40 lr 0.000789 wd 0.0500 time 0.4627 (0.4788) data time 0.0010 (0.0129) model time 0.0000 (0.0000) loss 3.3813 (2.9304) grad_norm 1.4455 (1.7308) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][50/625] eta 0:04:34 lr 0.000789 wd 0.0500 time 0.4525 (0.4768) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 3.5098 (2.9779) grad_norm 1.7384 (1.7070) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][60/625] eta 0:04:28 lr 0.000789 wd 0.0500 time 0.4653 (0.4749) data time 0.0008 (0.0090) model time 0.4645 (0.4640) loss 1.8887 
(2.9450) grad_norm 1.5786 (1.6924) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][70/625] eta 0:04:24 lr 0.000789 wd 0.0500 time 0.4619 (0.4759) data time 0.0008 (0.0079) model time 0.4611 (0.4726) loss 3.1103 (2.9292) grad_norm 1.3554 (1.6780) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][80/625] eta 0:04:18 lr 0.000789 wd 0.0500 time 0.4634 (0.4747) data time 0.0009 (0.0071) model time 0.4625 (0.4700) loss 2.3640 (2.9382) grad_norm 2.4568 (1.8524) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][90/625] eta 0:04:13 lr 0.000789 wd 0.0500 time 0.4708 (0.4744) data time 0.0011 (0.0064) model time 0.4697 (0.4703) loss 2.6693 (2.9367) grad_norm 1.6288 (1.9314) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][100/625] eta 0:04:08 lr 0.000789 wd 0.0500 time 0.4585 (0.4740) data time 0.0012 (0.0059) model time 0.4573 (0.4700) loss 2.6206 (2.9405) grad_norm 1.1702 (1.8947) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][110/625] eta 0:04:03 lr 0.000788 wd 0.0500 time 0.4670 (0.4734) data time 0.0010 (0.0055) model time 0.4659 (0.4692) loss 2.4711 (2.9450) grad_norm 2.9316 (1.9009) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:30:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][120/625] eta 0:04:00 lr 0.000788 wd 0.0500 time 0.4565 (0.4765) data time 0.0010 (0.0052) model time 0.4555 (0.4750) loss 3.2817 (2.9501) grad_norm 1.3775 (1.8730) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][130/625] eta 0:03:56 lr 0.000788 wd 0.0500 
time 0.4591 (0.4774) data time 0.0009 (0.0048) model time 0.4582 (0.4766) loss 3.6551 (2.9518) grad_norm 3.3694 (1.8801) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][140/625] eta 0:03:51 lr 0.000788 wd 0.0500 time 0.4633 (0.4764) data time 0.0008 (0.0046) model time 0.4624 (0.4751) loss 2.8648 (2.9647) grad_norm 1.7123 (1.8844) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][150/625] eta 0:03:46 lr 0.000788 wd 0.0500 time 0.4695 (0.4761) data time 0.0008 (0.0043) model time 0.4687 (0.4746) loss 2.8578 (2.9700) grad_norm 1.6253 (1.8558) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][160/625] eta 0:03:41 lr 0.000788 wd 0.0500 time 0.5504 (0.4758) data time 0.0010 (0.0041) model time 0.5494 (0.4742) loss 3.0500 (2.9636) grad_norm 1.3564 (1.8320) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][170/625] eta 0:03:36 lr 0.000788 wd 0.0500 time 0.4599 (0.4756) data time 0.0008 (0.0040) model time 0.4591 (0.4739) loss 3.4274 (2.9737) grad_norm 2.5379 (1.8184) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][180/625] eta 0:03:31 lr 0.000788 wd 0.0500 time 0.4617 (0.4751) data time 0.0010 (0.0038) model time 0.4607 (0.4732) loss 3.1065 (2.9621) grad_norm 1.5049 (1.8068) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][190/625] eta 0:03:26 lr 0.000788 wd 0.0500 time 0.4603 (0.4744) data time 0.0010 (0.0037) model time 0.4593 (0.4724) loss 3.1048 (2.9633) grad_norm 2.0942 (1.8000) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:36 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [132/300][200/625] eta 0:03:21 lr 0.000788 wd 0.0500 time 0.4622 (0.4740) data time 0.0011 (0.0035) model time 0.4612 (0.4720) loss 2.8362 (2.9610) grad_norm 14.2989 (1.8559) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][210/625] eta 0:03:16 lr 0.000787 wd 0.0500 time 0.4674 (0.4737) data time 0.0011 (0.0034) model time 0.4663 (0.4716) loss 3.0952 (2.9700) grad_norm 1.6474 (1.8542) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][220/625] eta 0:03:11 lr 0.000787 wd 0.0500 time 0.4665 (0.4735) data time 0.0010 (0.0033) model time 0.4655 (0.4714) loss 2.5982 (2.9671) grad_norm 1.6783 (1.8754) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][230/625] eta 0:03:06 lr 0.000787 wd 0.0500 time 0.4653 (0.4734) data time 0.0008 (0.0032) model time 0.4644 (0.4713) loss 3.8744 (2.9755) grad_norm 2.9694 (1.8867) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][240/625] eta 0:03:02 lr 0.000787 wd 0.0500 time 0.4668 (0.4730) data time 0.0008 (0.0031) model time 0.4660 (0.4709) loss 1.9710 (2.9745) grad_norm 1.1678 (1.8730) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:31:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][250/625] eta 0:02:57 lr 0.000787 wd 0.0500 time 0.4643 (0.4729) data time 0.0011 (0.0030) model time 0.4633 (0.4708) loss 2.9981 (2.9641) grad_norm 1.1038 (1.8540) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][260/625] eta 0:02:52 lr 0.000787 wd 0.0500 time 0.4602 (0.4726) data time 0.0008 (0.0030) model time 0.4594 (0.4704) loss 2.7467 (2.9739) grad_norm 1.2162 
(1.8358) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][270/625] eta 0:02:47 lr 0.000787 wd 0.0500 time 0.4600 (0.4722) data time 0.0011 (0.0029) model time 0.4589 (0.4700) loss 2.0683 (2.9595) grad_norm 1.3341 (1.8256) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][280/625] eta 0:02:42 lr 0.000787 wd 0.0500 time 0.4649 (0.4719) data time 0.0008 (0.0028) model time 0.4641 (0.4697) loss 3.3839 (2.9621) grad_norm 7.6735 (1.8371) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][290/625] eta 0:02:37 lr 0.000787 wd 0.0500 time 0.4634 (0.4716) data time 0.0011 (0.0028) model time 0.4623 (0.4693) loss 3.3210 (2.9616) grad_norm 1.9121 (1.8468) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][300/625] eta 0:02:33 lr 0.000787 wd 0.0500 time 0.4641 (0.4714) data time 0.0008 (0.0027) model time 0.4633 (0.4691) loss 3.0587 (2.9509) grad_norm 1.4031 (1.8332) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][310/625] eta 0:02:28 lr 0.000786 wd 0.0500 time 0.4635 (0.4712) data time 0.0011 (0.0027) model time 0.4624 (0.4689) loss 3.5868 (2.9600) grad_norm 2.2273 (1.8274) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][320/625] eta 0:02:23 lr 0.000786 wd 0.0500 time 0.4623 (0.4709) data time 0.0010 (0.0026) model time 0.4613 (0.4686) loss 3.2747 (2.9608) grad_norm 1.2996 (1.8142) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][330/625] eta 0:02:18 lr 0.000786 wd 0.0500 time 0.4616 (0.4706) data 
time 0.0010 (0.0026) model time 0.4606 (0.4683) loss 2.6422 (2.9599) grad_norm 1.7657 (1.8091) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][340/625] eta 0:02:14 lr 0.000786 wd 0.0500 time 0.4605 (0.4703) data time 0.0009 (0.0025) model time 0.4596 (0.4680) loss 3.3922 (2.9598) grad_norm 1.6323 (1.8004) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][350/625] eta 0:02:09 lr 0.000786 wd 0.0500 time 0.4614 (0.4706) data time 0.0011 (0.0025) model time 0.4603 (0.4684) loss 3.5039 (2.9657) grad_norm 1.7733 (1.8407) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][360/625] eta 0:02:04 lr 0.000786 wd 0.0500 time 0.4654 (0.4704) data time 0.0008 (0.0025) model time 0.4646 (0.4682) loss 3.2835 (2.9708) grad_norm 1.3836 (1.8294) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:32:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][370/625] eta 0:01:59 lr 0.000786 wd 0.0500 time 0.4650 (0.4703) data time 0.0011 (0.0024) model time 0.4639 (0.4681) loss 3.2830 (2.9724) grad_norm 1.5553 (1.8166) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][380/625] eta 0:01:55 lr 0.000786 wd 0.0500 time 0.4671 (0.4701) data time 0.0008 (0.0024) model time 0.4663 (0.4680) loss 2.2946 (2.9782) grad_norm 1.2895 (1.8117) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][390/625] eta 0:01:50 lr 0.000786 wd 0.0500 time 0.4627 (0.4699) data time 0.0010 (0.0023) model time 0.4616 (0.4678) loss 3.2306 (2.9813) grad_norm 1.4384 (1.8113) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [132/300][400/625] eta 0:01:45 lr 0.000785 wd 0.0500 time 0.4606 (0.4701) data time 0.0010 (0.0023) model time 0.4596 (0.4680) loss 3.0354 (2.9865) grad_norm 1.2788 (1.8092) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][410/625] eta 0:01:41 lr 0.000785 wd 0.0500 time 0.4633 (0.4699) data time 0.0009 (0.0023) model time 0.4625 (0.4679) loss 2.2861 (2.9820) grad_norm 1.8056 (1.8002) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][420/625] eta 0:01:36 lr 0.000785 wd 0.0500 time 0.4800 (0.4698) data time 0.0011 (0.0023) model time 0.4789 (0.4677) loss 3.2676 (2.9763) grad_norm 2.7227 (1.8022) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][430/625] eta 0:01:31 lr 0.000785 wd 0.0500 time 0.4678 (0.4697) data time 0.0007 (0.0022) model time 0.4671 (0.4676) loss 3.3626 (2.9806) grad_norm 1.0540 (1.7928) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][440/625] eta 0:01:26 lr 0.000785 wd 0.0500 time 0.4665 (0.4696) data time 0.0010 (0.0022) model time 0.4654 (0.4675) loss 3.3794 (2.9882) grad_norm 1.5759 (1.7856) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][450/625] eta 0:01:22 lr 0.000785 wd 0.0500 time 0.7178 (0.4700) data time 0.0008 (0.0022) model time 0.7170 (0.4680) loss 2.2582 (2.9920) grad_norm 1.2174 (1.7792) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][460/625] eta 0:01:17 lr 0.000785 wd 0.0500 time 0.4712 (0.4699) data time 0.0008 (0.0022) model time 0.4704 (0.4679) loss 3.4971 (2.9955) grad_norm 1.6361 (1.7762) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 12:33:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][470/625] eta 0:01:12 lr 0.000785 wd 0.0500 time 0.4644 (0.4698) data time 0.0011 (0.0021) model time 0.4633 (0.4678) loss 2.3445 (2.9930) grad_norm 1.5360 (1.7734) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][480/625] eta 0:01:08 lr 0.000785 wd 0.0500 time 0.4636 (0.4696) data time 0.0007 (0.0021) model time 0.4629 (0.4677) loss 2.8960 (2.9953) grad_norm 1.7319 (1.7670) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][490/625] eta 0:01:03 lr 0.000785 wd 0.0500 time 0.4810 (0.4696) data time 0.0008 (0.0021) model time 0.4802 (0.4677) loss 2.7416 (2.9925) grad_norm 1.4460 (1.7656) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][500/625] eta 0:00:58 lr 0.000784 wd 0.0500 time 0.4632 (0.4699) data time 0.0008 (0.0021) model time 0.4624 (0.4680) loss 3.3663 (2.9912) grad_norm 1.0052 (1.7621) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][510/625] eta 0:00:54 lr 0.000784 wd 0.0500 time 0.4642 (0.4698) data time 0.0010 (0.0020) model time 0.4632 (0.4680) loss 3.2164 (2.9922) grad_norm 1.7871 (1.7572) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][520/625] eta 0:00:49 lr 0.000784 wd 0.0500 time 0.4623 (0.4698) data time 0.0011 (0.0020) model time 0.4612 (0.4679) loss 2.7601 (2.9943) grad_norm 1.1356 (1.7499) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][530/625] eta 0:00:44 lr 0.000784 wd 0.0500 time 0.4723 (0.4698) data time 0.0008 (0.0020) model 
time 0.4716 (0.4679) loss 2.3307 (2.9964) grad_norm 2.9523 (1.7658) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][540/625] eta 0:00:39 lr 0.000784 wd 0.0500 time 0.4591 (0.4696) data time 0.0011 (0.0020) model time 0.4580 (0.4678) loss 2.3461 (2.9933) grad_norm 1.5613 (1.7659) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][550/625] eta 0:00:35 lr 0.000784 wd 0.0500 time 0.4573 (0.4695) data time 0.0010 (0.0020) model time 0.4563 (0.4677) loss 3.8189 (2.9954) grad_norm 1.2022 (1.7608) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][560/625] eta 0:00:30 lr 0.000784 wd 0.0500 time 0.4692 (0.4694) data time 0.0008 (0.0020) model time 0.4684 (0.4676) loss 2.1324 (2.9973) grad_norm 1.6994 (1.7569) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][570/625] eta 0:00:25 lr 0.000784 wd 0.0500 time 0.4724 (0.4693) data time 0.0011 (0.0019) model time 0.4714 (0.4675) loss 3.0706 (3.0012) grad_norm 1.6833 (1.7665) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][580/625] eta 0:00:21 lr 0.000784 wd 0.0500 time 0.4686 (0.4693) data time 0.0008 (0.0019) model time 0.4678 (0.4675) loss 1.9115 (3.0010) grad_norm 1.6790 (1.7699) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][590/625] eta 0:00:16 lr 0.000784 wd 0.0500 time 0.4644 (0.4694) data time 0.0010 (0.0019) model time 0.4634 (0.4676) loss 2.7852 (3.0024) grad_norm 2.4608 (1.7719) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][600/625] 
eta 0:00:11 lr 0.000783 wd 0.0500 time 0.4685 (0.4693) data time 0.0010 (0.0019) model time 0.4674 (0.4675) loss 2.2309 (3.0019) grad_norm 1.7571 (1.7722) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][610/625] eta 0:00:07 lr 0.000783 wd 0.0500 time 0.4610 (0.4693) data time 0.0007 (0.0019) model time 0.4603 (0.4675) loss 2.9616 (3.0020) grad_norm 1.5468 (1.7721) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [132/300][620/625] eta 0:00:02 lr 0.000783 wd 0.0500 time 0.4604 (0.4691) data time 0.0005 (0.0019) model time 0.4599 (0.4674) loss 3.5721 (3.0018) grad_norm 1.5738 (1.7671) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:34:54 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 132 training takes 0:04:53 [2024-08-10 12:34:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:34:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:34:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5425 (0.5425) Acc@1 87.842 (87.842) Acc@5 98.389 (98.389) Mem 16715MB
[2024-08-10 12:34:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8716 (0.6660) Acc@1 78.516 (84.735) Acc@5 96.094 (97.363) Mem 16715MB
[2024-08-10 12:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9810 (0.7867) Acc@1 75.635 (81.845) Acc@5 94.580 (96.087) Mem 16715MB
[2024-08-10 12:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.644 Acc@5 96.081
[2024-08-10 12:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6%
[2024-08-10 12:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.4824 (0.4824) Acc@1 89.453 (89.453) Acc@5 98.682 (98.682) Mem 16715MB
[2024-08-10 12:35:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.7759 (0.6061) Acc@1 81.543 (86.634) Acc@5 96.484 (97.816) Mem 16715MB
[2024-08-10 12:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.159) Loss 0.8887 (0.7122) Acc@1 77.783 (83.710) Acc@5 95.410 (96.745) Mem 16715MB
[2024-08-10 12:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.411 Acc@5 96.749
[2024-08-10 12:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4%
[2024-08-10 12:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.41%
[2024-08-10 12:35:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 12:35:04 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 12:35:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][0/625] eta 0:08:21 lr 0.000783 wd 0.0500 time 0.8025 (0.8025) data time 0.3863 (0.3863) model time 0.0000 (0.0000) loss 3.2207 (3.2207) grad_norm 1.3029 (1.3029) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][10/625] eta 0:05:04 lr 0.000783 wd 0.0500 time 0.4649 (0.4955) data time 0.0011 (0.0361) model time 0.0000 (0.0000) loss 3.0518 (3.0329) grad_norm 1.8367 (1.6306) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][20/625] eta 0:04:51 lr 0.000783 wd 0.0500 time 0.4618 (0.4815) data time 0.0011 (0.0194) model time 0.0000 (0.0000) loss 3.1023 (3.0298) grad_norm 1.5085 (1.6323) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][30/625] eta 0:04:48 lr 0.000783 wd 0.0500 time 0.4690 (0.4842) data time 0.0012 (0.0135) model time 0.0000 (0.0000) loss 3.3147 (3.0157) grad_norm 1.1479 (1.8441) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][40/625] eta 0:04:40 lr 0.000783 wd 0.0500 time 0.4665 (0.4800) data time 0.0009 (0.0104) model time 0.0000 (0.0000) loss 1.6097 (2.9678) grad_norm 1.7023 (1.9391) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][50/625] eta 0:04:34 lr 0.000783 wd 0.0500 time 0.4647 (0.4771) data time 0.0008 (0.0086) model time 0.0000 (0.0000) loss 3.0420 (2.9738) grad_norm 1.6195 (1.8366) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][60/625] eta 0:04:28 lr 0.000783 wd 0.0500 time 0.4723 (0.4750) data time 0.0011 (0.0074) model time 0.4712 (0.4630) loss 3.0176 
(2.9538) grad_norm 1.5396 (1.8076) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][70/625] eta 0:04:22 lr 0.000782 wd 0.0500 time 0.4619 (0.4734) data time 0.0010 (0.0065) model time 0.4609 (0.4628) loss 2.4014 (2.9009) grad_norm 1.3016 (1.7575) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][80/625] eta 0:04:17 lr 0.000782 wd 0.0500 time 0.4611 (0.4721) data time 0.0008 (0.0058) model time 0.4602 (0.4626) loss 3.2477 (2.9176) grad_norm 1.2324 (1.7171) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][90/625] eta 0:04:13 lr 0.000782 wd 0.0500 time 0.4606 (0.4737) data time 0.0009 (0.0053) model time 0.4598 (0.4683) loss 2.5376 (2.8944) grad_norm 8.1773 (1.7812) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][100/625] eta 0:04:08 lr 0.000782 wd 0.0500 time 0.4615 (0.4727) data time 0.0010 (0.0049) model time 0.4605 (0.4671) loss 2.5182 (2.8913) grad_norm 2.0633 (1.8285) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:35:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][110/625] eta 0:04:03 lr 0.000782 wd 0.0500 time 0.4623 (0.4734) data time 0.0010 (0.0045) model time 0.4612 (0.4692) loss 2.9723 (2.9041) grad_norm 1.6136 (1.8088) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][120/625] eta 0:03:58 lr 0.000782 wd 0.0500 time 0.4671 (0.4725) data time 0.0009 (0.0042) model time 0.4662 (0.4681) loss 2.0382 (2.9206) grad_norm 1.2832 (1.7834) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][130/625] eta 0:03:53 lr 0.000782 wd 0.0500 
time 0.4638 (0.4716) data time 0.0011 (0.0040) model time 0.4627 (0.4671) loss 3.0609 (2.9200) grad_norm 1.4631 (1.7765) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][140/625] eta 0:03:48 lr 0.000782 wd 0.0500 time 0.4631 (0.4709) data time 0.0011 (0.0038) model time 0.4621 (0.4663) loss 2.7484 (2.9477) grad_norm 2.0174 (1.7724) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][150/625] eta 0:03:43 lr 0.000782 wd 0.0500 time 0.4640 (0.4702) data time 0.0008 (0.0036) model time 0.4631 (0.4656) loss 3.1161 (2.9667) grad_norm 10.5126 (1.8105) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][160/625] eta 0:03:38 lr 0.000782 wd 0.0500 time 0.4650 (0.4698) data time 0.0011 (0.0034) model time 0.4639 (0.4653) loss 3.1084 (2.9892) grad_norm 3.2053 (1.8486) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][170/625] eta 0:03:33 lr 0.000781 wd 0.0500 time 0.4638 (0.4694) data time 0.0009 (0.0033) model time 0.4629 (0.4651) loss 2.8491 (2.9880) grad_norm 1.7445 (1.8521) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][180/625] eta 0:03:28 lr 0.000781 wd 0.0500 time 0.4590 (0.4691) data time 0.0010 (0.0032) model time 0.4580 (0.4649) loss 2.2863 (2.9941) grad_norm 1.4326 (1.8427) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][190/625] eta 0:03:23 lr 0.000781 wd 0.0500 time 0.4728 (0.4689) data time 0.0010 (0.0031) model time 0.4718 (0.4648) loss 2.6216 (2.9858) grad_norm 1.4054 (1.8301) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [133/300][200/625] eta 0:03:19 lr 0.000781 wd 0.0500 time 0.4646 (0.4687) data time 0.0010 (0.0030) model time 0.4635 (0.4647) loss 3.1595 (2.9914) grad_norm 1.6055 (1.8208) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][210/625] eta 0:03:14 lr 0.000781 wd 0.0500 time 0.4614 (0.4684) data time 0.0010 (0.0029) model time 0.4605 (0.4646) loss 2.0112 (2.9944) grad_norm 1.2780 (1.8114) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][220/625] eta 0:03:10 lr 0.000781 wd 0.0500 time 0.4646 (0.4692) data time 0.0010 (0.0028) model time 0.4637 (0.4658) loss 2.9360 (3.0019) grad_norm 1.5437 (1.7901) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][230/625] eta 0:03:05 lr 0.000781 wd 0.0500 time 0.4646 (0.4690) data time 0.0007 (0.0027) model time 0.4639 (0.4657) loss 2.3952 (3.0003) grad_norm 1.1381 (1.7786) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][240/625] eta 0:03:00 lr 0.000781 wd 0.0500 time 0.4694 (0.4689) data time 0.0008 (0.0026) model time 0.4687 (0.4656) loss 3.6902 (2.9964) grad_norm 1.4044 (1.7715) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][250/625] eta 0:02:55 lr 0.000781 wd 0.0500 time 0.4677 (0.4688) data time 0.0008 (0.0026) model time 0.4669 (0.4656) loss 3.5145 (2.9893) grad_norm 1.4652 (1.7729) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][260/625] eta 0:02:51 lr 0.000781 wd 0.0500 time 0.4667 (0.4686) data time 0.0008 (0.0025) model time 0.4659 (0.4655) loss 3.2851 (2.9918) grad_norm 1.6177 
(1.7850) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][270/625] eta 0:02:46 lr 0.000780 wd 0.0500 time 0.4602 (0.4685) data time 0.0010 (0.0024) model time 0.4591 (0.4654) loss 2.7534 (2.9833) grad_norm 1.4902 (1.7777) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][280/625] eta 0:02:41 lr 0.000780 wd 0.0500 time 0.4660 (0.4684) data time 0.0008 (0.0024) model time 0.4652 (0.4654) loss 3.0391 (2.9868) grad_norm 1.6150 (1.7639) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][290/625] eta 0:02:36 lr 0.000780 wd 0.0500 time 0.4611 (0.4682) data time 0.0010 (0.0023) model time 0.4601 (0.4653) loss 2.7465 (2.9907) grad_norm 1.9668 (1.7673) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][300/625] eta 0:02:32 lr 0.000780 wd 0.0500 time 0.4678 (0.4682) data time 0.0008 (0.0023) model time 0.4670 (0.4653) loss 3.4451 (2.9938) grad_norm 1.3973 (1.7628) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][310/625] eta 0:02:27 lr 0.000780 wd 0.0500 time 0.4695 (0.4681) data time 0.0008 (0.0023) model time 0.4687 (0.4653) loss 3.4762 (2.9998) grad_norm 1.5775 (1.7622) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][320/625] eta 0:02:22 lr 0.000780 wd 0.0500 time 0.4655 (0.4681) data time 0.0008 (0.0022) model time 0.4647 (0.4653) loss 3.6482 (3.0061) grad_norm 2.0669 (1.7648) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][330/625] eta 0:02:18 lr 0.000780 wd 0.0500 time 0.4610 (0.4685) data 
time 0.0009 (0.0022) model time 0.4601 (0.4659) loss 2.3567 (3.0039) grad_norm 1.5912 (1.7580) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][340/625] eta 0:02:13 lr 0.000780 wd 0.0500 time 0.4656 (0.4684) data time 0.0009 (0.0022) model time 0.4647 (0.4659) loss 2.7732 (2.9999) grad_norm 1.3900 (1.7492) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][350/625] eta 0:02:08 lr 0.000780 wd 0.0500 time 0.4669 (0.4684) data time 0.0010 (0.0021) model time 0.4659 (0.4658) loss 3.0678 (3.0049) grad_norm 1.7297 (1.7430) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][360/625] eta 0:02:04 lr 0.000780 wd 0.0500 time 0.4642 (0.4683) data time 0.0011 (0.0021) model time 0.4631 (0.4658) loss 3.0196 (3.0044) grad_norm 1.4097 (1.7357) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][370/625] eta 0:01:59 lr 0.000779 wd 0.0500 time 0.4607 (0.4682) data time 0.0009 (0.0021) model time 0.4597 (0.4657) loss 3.0712 (3.0064) grad_norm 2.0866 (1.7304) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][380/625] eta 0:01:54 lr 0.000779 wd 0.0500 time 0.4676 (0.4681) data time 0.0008 (0.0020) model time 0.4668 (0.4657) loss 2.2015 (3.0074) grad_norm 1.7527 (1.7535) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][390/625] eta 0:01:49 lr 0.000779 wd 0.0500 time 0.4711 (0.4681) data time 0.0009 (0.0020) model time 0.4702 (0.4657) loss 3.5745 (3.0100) grad_norm 1.5430 (1.7503) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [133/300][400/625] eta 0:01:45 lr 0.000779 wd 0.0500 time 0.4627 (0.4681) data time 0.0008 (0.0020) model time 0.4619 (0.4658) loss 3.4923 (3.0166) grad_norm 1.2282 (1.7393) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][410/625] eta 0:01:40 lr 0.000779 wd 0.0500 time 0.4617 (0.4681) data time 0.0010 (0.0020) model time 0.4607 (0.4658) loss 3.0064 (3.0220) grad_norm 1.9339 (1.7390) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][420/625] eta 0:01:35 lr 0.000779 wd 0.0500 time 0.4622 (0.4680) data time 0.0008 (0.0019) model time 0.4615 (0.4657) loss 2.5876 (3.0239) grad_norm 1.6552 (1.7308) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][430/625] eta 0:01:31 lr 0.000779 wd 0.0500 time 0.4721 (0.4684) data time 0.0010 (0.0019) model time 0.4710 (0.4662) loss 2.8522 (3.0178) grad_norm 1.9530 (1.7256) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][440/625] eta 0:01:26 lr 0.000779 wd 0.0500 time 0.4659 (0.4689) data time 0.0008 (0.0019) model time 0.4651 (0.4667) loss 2.9106 (3.0163) grad_norm 1.9642 (1.7222) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][450/625] eta 0:01:22 lr 0.000779 wd 0.0500 time 0.5220 (0.4689) data time 0.0008 (0.0019) model time 0.5213 (0.4668) loss 3.5628 (3.0187) grad_norm 1.3328 (1.7211) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][460/625] eta 0:01:17 lr 0.000779 wd 0.0500 time 0.4638 (0.4689) data time 0.0008 (0.0019) model time 0.4631 (0.4669) loss 2.3650 (3.0124) grad_norm 2.1406 (1.7215) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 12:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][470/625] eta 0:01:12 lr 0.000778 wd 0.0500 time 0.4081 (0.4693) data time 0.0011 (0.0018) model time 0.4070 (0.4673) loss 2.4626 (3.0099) grad_norm 1.3864 (1.7290) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][480/625] eta 0:01:08 lr 0.000778 wd 0.0500 time 0.4677 (0.4694) data time 0.0010 (0.0018) model time 0.4667 (0.4674) loss 3.1239 (3.0076) grad_norm 1.4305 (1.7292) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][490/625] eta 0:01:03 lr 0.000778 wd 0.0500 time 0.4648 (0.4693) data time 0.0010 (0.0018) model time 0.4637 (0.4674) loss 2.9905 (3.0077) grad_norm 1.2021 (1.7258) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:38:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][500/625] eta 0:00:58 lr 0.000778 wd 0.0500 time 0.4596 (0.4692) data time 0.0011 (0.0018) model time 0.4585 (0.4673) loss 2.2725 (3.0057) grad_norm 1.4852 (1.7218) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][510/625] eta 0:00:53 lr 0.000778 wd 0.0500 time 0.4713 (0.4691) data time 0.0012 (0.0018) model time 0.4701 (0.4672) loss 3.3439 (3.0061) grad_norm 1.6070 (1.7187) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][520/625] eta 0:00:49 lr 0.000778 wd 0.0500 time 0.4679 (0.4690) data time 0.0008 (0.0018) model time 0.4671 (0.4671) loss 3.2727 (3.0093) grad_norm 1.8825 (1.7458) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][530/625] eta 0:00:44 lr 0.000778 wd 0.0500 time 0.4602 (0.4690) data time 0.0008 (0.0018) model 
time 0.4594 (0.4671) loss 2.6586 (3.0054) grad_norm 1.9047 (1.7468) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][540/625] eta 0:00:39 lr 0.000778 wd 0.0500 time 0.4696 (0.4690) data time 0.0009 (0.0018) model time 0.4687 (0.4671) loss 3.5593 (3.0065) grad_norm 1.0564 (1.7560) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][550/625] eta 0:00:35 lr 0.000778 wd 0.0500 time 0.4712 (0.4690) data time 0.0012 (0.0018) model time 0.4700 (0.4671) loss 3.0210 (3.0055) grad_norm 2.2323 (1.7593) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][560/625] eta 0:00:30 lr 0.000777 wd 0.0500 time 0.4667 (0.4690) data time 0.0008 (0.0017) model time 0.4658 (0.4671) loss 3.3914 (3.0067) grad_norm 1.3502 (1.7558) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][570/625] eta 0:00:25 lr 0.000777 wd 0.0500 time 0.4603 (0.4689) data time 0.0010 (0.0017) model time 0.4593 (0.4670) loss 3.5058 (3.0126) grad_norm 1.4028 (1.7581) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][580/625] eta 0:00:21 lr 0.000777 wd 0.0500 time 0.4636 (0.4689) data time 0.0010 (0.0017) model time 0.4626 (0.4670) loss 3.4829 (3.0168) grad_norm 1.3824 (1.7538) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][590/625] eta 0:00:16 lr 0.000777 wd 0.0500 time 0.4580 (0.4691) data time 0.0008 (0.0017) model time 0.4572 (0.4673) loss 3.0751 (3.0167) grad_norm 2.5804 (1.7510) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][600/625] 
eta 0:00:11 lr 0.000777 wd 0.0500 time 0.4688 (0.4691) data time 0.0008 (0.0017) model time 0.4680 (0.4673) loss 3.4481 (3.0183) grad_norm 1.1589 (1.7513) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][610/625] eta 0:00:07 lr 0.000777 wd 0.0500 time 0.4627 (0.4690) data time 0.0007 (0.0017) model time 0.4620 (0.4672) loss 2.8948 (3.0199) grad_norm 1.0955 (1.7450) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [133/300][620/625] eta 0:00:02 lr 0.000777 wd 0.0500 time 0.4630 (0.4691) data time 0.0005 (0.0017) model time 0.4625 (0.4674) loss 3.1435 (3.0194) grad_norm 1.5127 (1.7408) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 133 training takes 0:04:53 [2024-08-10 12:39:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:39:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5361 (0.5361) Acc@1 89.258 (89.258) Acc@5 98.389 (98.389) Mem 16715MB
[2024-08-10 12:40:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8657 (0.6748) Acc@1 80.322 (85.449) Acc@5 95.361 (97.421) Mem 16715MB
[2024-08-10 12:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 1.0098 (0.7990) Acc@1 76.025 (82.161) Acc@5 94.629 (96.080) Mem 16715MB
[2024-08-10 12:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.864 Acc@5 96.077
[2024-08-10 12:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9%
[2024-08-10 12:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.86%
[2024-08-10 12:40:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 12:40:04 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 12:40:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.4814 (0.4814) Acc@1 89.453 (89.453) Acc@5 98.682 (98.682) Mem 16715MB [2024-08-10 12:40:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.162) Loss 0.7759 (0.6060) Acc@1 81.445 (86.688) Acc@5 96.582 (97.816) Mem 16715MB [2024-08-10 12:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.8892 (0.7118) Acc@1 77.930 (83.729) Acc@5 95.410 (96.729) Mem 16715MB [2024-08-10 12:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.411 Acc@5 96.741 [2024-08-10 12:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 12:40:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][0/625] eta 0:13:04 lr 0.000777 wd 0.0500 time 1.2544 (1.2544) data time 0.6612 (0.6612) model time 0.0000 (0.0000) loss 3.0030 (3.0030) grad_norm 1.1788 (1.1788) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][10/625] eta 0:05:30 lr 0.000777 wd 0.0500 time 0.4630 (0.5374) data time 0.0010 (0.0611) model time 0.0000 (0.0000) loss 3.3098 (3.2555) grad_norm 2.1705 (2.4902) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][20/625] eta 0:05:04 lr 0.000777 wd 0.0500 time 0.4676 (0.5032) data time 0.0011 (0.0325) model time 0.0000 (0.0000) loss 3.2005 (3.2473) grad_norm 1.4497 (2.0206) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][30/625] eta 0:04:51 lr 0.000777 wd 0.0500 time 0.4636 (0.4906) data time 0.0008 (0.0223) model time 0.0000 (0.0000) loss 3.6074 (3.1992) grad_norm 1.1388 (1.7801) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:28 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [134/300][40/625] eta 0:04:45 lr 0.000776 wd 0.0500 time 0.4640 (0.4889) data time 0.0007 (0.0171) model time 0.0000 (0.0000) loss 2.4063 (3.1618) grad_norm 2.1977 (1.7348) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][50/625] eta 0:04:38 lr 0.000776 wd 0.0500 time 0.4660 (0.4847) data time 0.0009 (0.0140) model time 0.0000 (0.0000) loss 3.1286 (3.1357) grad_norm 1.2563 (1.7674) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][60/625] eta 0:04:32 lr 0.000776 wd 0.0500 time 0.4680 (0.4815) data time 0.0011 (0.0119) model time 0.4670 (0.4642) loss 2.9084 (3.0947) grad_norm 1.3316 (1.6791) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][70/625] eta 0:04:25 lr 0.000776 wd 0.0500 time 0.4627 (0.4791) data time 0.0008 (0.0103) model time 0.4619 (0.4637) loss 3.5022 (3.1255) grad_norm 2.6833 (1.7431) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][80/625] eta 0:04:20 lr 0.000776 wd 0.0500 time 0.4613 (0.4772) data time 0.0008 (0.0092) model time 0.4606 (0.4635) loss 3.2691 (3.1442) grad_norm 1.7217 (1.7472) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][90/625] eta 0:04:14 lr 0.000776 wd 0.0500 time 0.4622 (0.4757) data time 0.0010 (0.0083) model time 0.4612 (0.4632) loss 3.1946 (3.1283) grad_norm 1.5159 (1.7316) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][100/625] eta 0:04:09 lr 0.000776 wd 0.0500 time 0.4608 (0.4744) data time 0.0010 (0.0076) model time 0.4598 (0.4628) loss 2.4534 (3.0938) grad_norm 1.3750 (1.7257) 
loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:41:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][110/625] eta 0:04:03 lr 0.000776 wd 0.0500 time 0.4675 (0.4735) data time 0.0009 (0.0070) model time 0.4665 (0.4630) loss 2.2603 (3.0667) grad_norm 1.6906 (1.7034) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:41:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][120/625] eta 0:03:58 lr 0.000776 wd 0.0500 time 0.4658 (0.4731) data time 0.0007 (0.0065) model time 0.4650 (0.4635) loss 3.1824 (3.0624) grad_norm 1.8305 (1.6866) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:41:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][130/625] eta 0:03:54 lr 0.000776 wd 0.0500 time 0.4655 (0.4745) data time 0.0010 (0.0061) model time 0.4645 (0.4670) loss 3.0762 (3.0636) grad_norm 1.2517 (1.6552) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 12:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][140/625] eta 0:03:50 lr 0.000775 wd 0.0500 time 0.4673 (0.4756) data time 0.0011 (0.0057) model time 0.4662 (0.4694) loss 2.8559 (3.0545) grad_norm 1.3706 (1.6401) loss_scale 2048.0000 (1060.3121) mem 16715MB [2024-08-10 12:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][150/625] eta 0:03:45 lr 0.000775 wd 0.0500 time 0.4636 (0.4749) data time 0.0011 (0.0054) model time 0.4625 (0.4689) loss 2.2650 (3.0419) grad_norm 1.3416 (1.6423) loss_scale 2048.0000 (1125.7219) mem 16715MB [2024-08-10 12:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][160/625] eta 0:03:40 lr 0.000775 wd 0.0500 time 0.4602 (0.4744) data time 0.0009 (0.0051) model time 0.4593 (0.4686) loss 2.7850 (3.0353) grad_norm 1.5665 (1.6396) loss_scale 2048.0000 (1183.0062) mem 16715MB [2024-08-10 12:41:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][170/625] eta 0:03:35 lr 0.000775 wd 0.0500 time 0.4624 (0.4739) data time 
0.0011 (0.0049) model time 0.4613 (0.4682) loss 3.1029 (3.0271) grad_norm 1.2308 (1.6325) loss_scale 2048.0000 (1233.5906) mem 16715MB [2024-08-10 12:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][180/625] eta 0:03:30 lr 0.000775 wd 0.0500 time 0.4688 (0.4735) data time 0.0008 (0.0047) model time 0.4680 (0.4681) loss 2.8323 (3.0139) grad_norm 1.5487 (1.6237) loss_scale 2048.0000 (1278.5856) mem 16715MB [2024-08-10 12:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][190/625] eta 0:03:25 lr 0.000775 wd 0.0500 time 0.4627 (0.4731) data time 0.0009 (0.0045) model time 0.4618 (0.4678) loss 3.3430 (3.0119) grad_norm 2.5234 (1.6228) loss_scale 2048.0000 (1318.8691) mem 16715MB [2024-08-10 12:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][200/625] eta 0:03:20 lr 0.000775 wd 0.0500 time 0.4643 (0.4728) data time 0.0007 (0.0044) model time 0.4636 (0.4677) loss 2.5024 (3.0031) grad_norm 2.3885 (1.6243) loss_scale 2048.0000 (1355.1443) mem 16715MB [2024-08-10 12:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][210/625] eta 0:03:16 lr 0.000775 wd 0.0500 time 0.4660 (0.4725) data time 0.0010 (0.0042) model time 0.4650 (0.4675) loss 3.2545 (3.0076) grad_norm 1.6358 (1.6191) loss_scale 2048.0000 (1387.9810) mem 16715MB [2024-08-10 12:41:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][220/625] eta 0:03:11 lr 0.000775 wd 0.0500 time 0.4652 (0.4721) data time 0.0011 (0.0041) model time 0.4641 (0.4673) loss 3.3037 (2.9894) grad_norm 2.2262 (1.6149) loss_scale 2048.0000 (1417.8462) mem 16715MB [2024-08-10 12:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][230/625] eta 0:03:06 lr 0.000774 wd 0.0500 time 0.4620 (0.4718) data time 0.0011 (0.0039) model time 0.4609 (0.4670) loss 3.1858 (2.9842) grad_norm 1.5365 (1.6138) loss_scale 2048.0000 (1445.1255) mem 16715MB [2024-08-10 12:42:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [134/300][240/625] eta 0:03:01 lr 0.000774 wd 0.0500 time 0.4614 (0.4714) data time 0.0007 (0.0038) model time 0.4607 (0.4668) loss 2.1312 (2.9868) grad_norm 1.0685 (1.6159) loss_scale 2048.0000 (1470.1411) mem 16715MB [2024-08-10 12:42:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][250/625] eta 0:02:56 lr 0.000774 wd 0.0500 time 0.4605 (0.4710) data time 0.0008 (0.0037) model time 0.4598 (0.4664) loss 2.6093 (2.9899) grad_norm 1.3684 (1.6206) loss_scale 2048.0000 (1493.1633) mem 16715MB [2024-08-10 12:42:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][260/625] eta 0:02:51 lr 0.000774 wd 0.0500 time 0.4668 (0.4708) data time 0.0007 (0.0036) model time 0.4660 (0.4664) loss 3.5034 (2.9978) grad_norm 1.4232 (1.6311) loss_scale 2048.0000 (1514.4215) mem 16715MB [2024-08-10 12:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][270/625] eta 0:02:47 lr 0.000774 wd 0.0500 time 0.4685 (0.4707) data time 0.0010 (0.0035) model time 0.4675 (0.4664) loss 3.0001 (2.9948) grad_norm 1.5000 (1.6259) loss_scale 2048.0000 (1534.1107) mem 16715MB [2024-08-10 12:42:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][280/625] eta 0:02:42 lr 0.000774 wd 0.0500 time 0.4608 (0.4705) data time 0.0011 (0.0034) model time 0.4597 (0.4663) loss 2.8780 (2.9888) grad_norm 1.3524 (1.6261) loss_scale 2048.0000 (1552.3986) mem 16715MB [2024-08-10 12:42:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][290/625] eta 0:02:37 lr 0.000774 wd 0.0500 time 0.4585 (0.4703) data time 0.0008 (0.0033) model time 0.4577 (0.4661) loss 3.5042 (2.9990) grad_norm 1.8562 (1.6235) loss_scale 2048.0000 (1569.4296) mem 16715MB [2024-08-10 12:42:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][300/625] eta 0:02:32 lr 0.000774 wd 0.0500 time 0.4609 (0.4700) data time 0.0009 (0.0032) model time 0.4600 (0.4660) loss 3.1016 (3.0038) grad_norm 2.3180 (1.6263) loss_scale 2048.0000 
(1585.3289) mem 16715MB [2024-08-10 12:42:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][310/625] eta 0:02:27 lr 0.000774 wd 0.0500 time 0.4631 (0.4698) data time 0.0008 (0.0032) model time 0.4623 (0.4658) loss 3.2121 (3.0084) grad_norm 1.4197 (1.6305) loss_scale 2048.0000 (1600.2058) mem 16715MB [2024-08-10 12:42:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][320/625] eta 0:02:23 lr 0.000774 wd 0.0500 time 0.4617 (0.4702) data time 0.0010 (0.0031) model time 0.4607 (0.4664) loss 3.1947 (3.0119) grad_norm 1.3066 (1.6260) loss_scale 2048.0000 (1614.1558) mem 16715MB [2024-08-10 12:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][330/625] eta 0:02:18 lr 0.000773 wd 0.0500 time 0.4648 (0.4700) data time 0.0008 (0.0030) model time 0.4640 (0.4663) loss 3.5451 (3.0119) grad_norm 2.5810 (1.6284) loss_scale 2048.0000 (1627.2628) mem 16715MB [2024-08-10 12:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][340/625] eta 0:02:13 lr 0.000773 wd 0.0500 time 0.4627 (0.4699) data time 0.0008 (0.0030) model time 0.4619 (0.4662) loss 3.3465 (3.0180) grad_norm 1.5080 (1.6311) loss_scale 2048.0000 (1639.6012) mem 16715MB [2024-08-10 12:42:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][350/625] eta 0:02:09 lr 0.000773 wd 0.0500 time 0.4646 (0.4698) data time 0.0011 (0.0029) model time 0.4635 (0.4663) loss 2.6444 (3.0139) grad_norm 1.4712 (1.6298) loss_scale 2048.0000 (1651.2365) mem 16715MB [2024-08-10 12:42:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][360/625] eta 0:02:04 lr 0.000773 wd 0.0500 time 0.4676 (0.4697) data time 0.0008 (0.0029) model time 0.4667 (0.4662) loss 3.4551 (3.0055) grad_norm 1.1388 (1.6232) loss_scale 2048.0000 (1662.2271) mem 16715MB [2024-08-10 12:43:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][370/625] eta 0:01:59 lr 0.000773 wd 0.0500 time 0.4661 (0.4697) data time 0.0011 (0.0028) model 
time 0.4651 (0.4663) loss 3.1321 (3.0103) grad_norm 1.5421 (1.6421) loss_scale 2048.0000 (1672.6253) mem 16715MB [2024-08-10 12:43:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][380/625] eta 0:01:55 lr 0.000773 wd 0.0500 time 0.4670 (0.4701) data time 0.0010 (0.0028) model time 0.4660 (0.4668) loss 2.9192 (3.0105) grad_norm 1.3008 (1.6478) loss_scale 2048.0000 (1682.4777) mem 16715MB [2024-08-10 12:43:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][390/625] eta 0:01:50 lr 0.000773 wd 0.0500 time 0.4601 (0.4699) data time 0.0010 (0.0027) model time 0.4592 (0.4667) loss 2.4896 (3.0120) grad_norm 1.4079 (1.6445) loss_scale 2048.0000 (1691.8261) mem 16715MB [2024-08-10 12:43:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][400/625] eta 0:01:45 lr 0.000773 wd 0.0500 time 0.4699 (0.4698) data time 0.0008 (0.0027) model time 0.4691 (0.4666) loss 3.5880 (3.0158) grad_norm 2.1603 (1.6557) loss_scale 2048.0000 (1700.7082) mem 16715MB [2024-08-10 12:43:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][410/625] eta 0:01:41 lr 0.000773 wd 0.0500 time 0.4685 (0.4698) data time 0.0011 (0.0026) model time 0.4674 (0.4667) loss 3.0540 (3.0148) grad_norm 1.7374 (1.6613) loss_scale 2048.0000 (1709.1582) mem 16715MB [2024-08-10 12:43:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][420/625] eta 0:01:36 lr 0.000773 wd 0.0500 time 0.4695 (0.4698) data time 0.0010 (0.0026) model time 0.4686 (0.4667) loss 3.4348 (3.0168) grad_norm 2.0483 (1.6675) loss_scale 2048.0000 (1717.2067) mem 16715MB [2024-08-10 12:43:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][430/625] eta 0:01:31 lr 0.000772 wd 0.0500 time 0.4941 (0.4698) data time 0.0011 (0.0026) model time 0.4929 (0.4667) loss 3.5633 (3.0258) grad_norm 2.0169 (1.6730) loss_scale 2048.0000 (1724.8817) mem 16715MB [2024-08-10 12:43:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][440/625] 
eta 0:01:26 lr 0.000772 wd 0.0500 time 0.4626 (0.4698) data time 0.0008 (0.0026) model time 0.4618 (0.4667) loss 3.4456 (3.0252) grad_norm 3.4736 (1.6758) loss_scale 2048.0000 (1732.2086) mem 16715MB [2024-08-10 12:43:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][450/625] eta 0:01:22 lr 0.000772 wd 0.0500 time 0.4625 (0.4698) data time 0.0011 (0.0026) model time 0.4615 (0.4668) loss 3.5347 (3.0327) grad_norm 1.5091 (1.6961) loss_scale 2048.0000 (1739.2106) mem 16715MB [2024-08-10 12:43:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][460/625] eta 0:01:17 lr 0.000772 wd 0.0500 time 0.4656 (0.4697) data time 0.0010 (0.0025) model time 0.4646 (0.4667) loss 3.0558 (3.0296) grad_norm 1.6154 (1.6984) loss_scale 2048.0000 (1745.9089) mem 16715MB [2024-08-10 12:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][470/625] eta 0:01:12 lr 0.000772 wd 0.0500 time 0.4671 (0.4701) data time 0.0010 (0.0025) model time 0.4661 (0.4672) loss 3.0856 (3.0229) grad_norm 1.3283 (1.7016) loss_scale 2048.0000 (1752.3227) mem 16715MB [2024-08-10 12:43:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][480/625] eta 0:01:08 lr 0.000772 wd 0.0500 time 0.4673 (0.4704) data time 0.0008 (0.0025) model time 0.4666 (0.4675) loss 3.8947 (3.0273) grad_norm 1.6902 (1.6993) loss_scale 2048.0000 (1758.4699) mem 16715MB [2024-08-10 12:43:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][490/625] eta 0:01:03 lr 0.000772 wd 0.0500 time 0.4646 (0.4703) data time 0.0008 (0.0025) model time 0.4638 (0.4675) loss 2.3347 (3.0203) grad_norm 1.1425 (1.6985) loss_scale 2048.0000 (1764.3666) mem 16715MB [2024-08-10 12:44:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][500/625] eta 0:00:58 lr 0.000772 wd 0.0500 time 0.4649 (0.4702) data time 0.0010 (0.0024) model time 0.4639 (0.4675) loss 2.7201 (3.0217) grad_norm 1.6093 (1.6932) loss_scale 2048.0000 (1770.0279) mem 16715MB [2024-08-10 
12:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][510/625] eta 0:00:54 lr 0.000772 wd 0.0500 time 0.4687 (0.4703) data time 0.0008 (0.0024) model time 0.4679 (0.4676) loss 2.1783 (3.0192) grad_norm 1.7886 (1.6996) loss_scale 2048.0000 (1775.4677) mem 16715MB [2024-08-10 12:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][520/625] eta 0:00:49 lr 0.000772 wd 0.0500 time 0.4653 (0.4702) data time 0.0008 (0.0024) model time 0.4645 (0.4675) loss 2.5534 (3.0223) grad_norm 1.4453 (1.7036) loss_scale 2048.0000 (1780.6987) mem 16715MB [2024-08-10 12:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][530/625] eta 0:00:44 lr 0.000771 wd 0.0500 time 0.4639 (0.4701) data time 0.0012 (0.0023) model time 0.4627 (0.4674) loss 3.2723 (3.0216) grad_norm 1.7216 (1.7030) loss_scale 2048.0000 (1785.7326) mem 16715MB [2024-08-10 12:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][540/625] eta 0:00:39 lr 0.000771 wd 0.0500 time 0.4632 (0.4700) data time 0.0009 (0.0023) model time 0.4623 (0.4674) loss 2.0587 (3.0139) grad_norm 1.4858 (1.7031) loss_scale 2048.0000 (1790.5804) mem 16715MB [2024-08-10 12:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][550/625] eta 0:00:35 lr 0.000771 wd 0.0500 time 0.4722 (0.4699) data time 0.0011 (0.0023) model time 0.4712 (0.4673) loss 3.4227 (3.0176) grad_norm 1.5537 (1.7103) loss_scale 2048.0000 (1795.2523) mem 16715MB [2024-08-10 12:44:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][560/625] eta 0:00:30 lr 0.000771 wd 0.0500 time 0.4645 (0.4699) data time 0.0011 (0.0023) model time 0.4634 (0.4673) loss 2.8331 (3.0133) grad_norm 1.4092 (1.7115) loss_scale 2048.0000 (1799.7576) mem 16715MB [2024-08-10 12:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][570/625] eta 0:00:25 lr 0.000771 wd 0.0500 time 0.4640 (0.4700) data time 0.0010 (0.0023) model time 0.4630 (0.4674) loss 3.2899 
(3.0165) grad_norm 2.9780 (1.7121) loss_scale 2048.0000 (1804.1051) mem 16715MB [2024-08-10 12:44:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][580/625] eta 0:00:21 lr 0.000771 wd 0.0500 time 0.4668 (0.4700) data time 0.0008 (0.0022) model time 0.4660 (0.4674) loss 2.5249 (3.0187) grad_norm 3.4113 (1.7162) loss_scale 2048.0000 (1808.3029) mem 16715MB [2024-08-10 12:44:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][590/625] eta 0:00:16 lr 0.000771 wd 0.0500 time 0.4643 (0.4699) data time 0.0008 (0.0022) model time 0.4635 (0.4674) loss 2.2218 (3.0109) grad_norm 1.3393 (1.7178) loss_scale 2048.0000 (1812.3587) mem 16715MB [2024-08-10 12:44:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][600/625] eta 0:00:11 lr 0.000771 wd 0.0500 time 0.4608 (0.4698) data time 0.0009 (0.0022) model time 0.4599 (0.4673) loss 2.8350 (3.0109) grad_norm 2.6677 (1.7220) loss_scale 2048.0000 (1816.2795) mem 16715MB [2024-08-10 12:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][610/625] eta 0:00:07 lr 0.000771 wd 0.0500 time 0.4629 (0.4697) data time 0.0005 (0.0022) model time 0.4623 (0.4672) loss 3.3870 (3.0144) grad_norm 1.3798 (1.7187) loss_scale 2048.0000 (1820.0720) mem 16715MB [2024-08-10 12:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [134/300][620/625] eta 0:00:02 lr 0.000770 wd 0.0500 time 0.4625 (0.4696) data time 0.0005 (0.0022) model time 0.4620 (0.4671) loss 3.4176 (3.0186) grad_norm 1.6346 (1.7165) loss_scale 2048.0000 (1823.7424) mem 16715MB [2024-08-10 12:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 134 training takes 0:04:53 [2024-08-10 12:45:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:45:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 12:45:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.524 (0.524) Loss 0.5444 (0.5444) Acc@1 87.744 (87.744) Acc@5 98.389 (98.389) Mem 16715MB [2024-08-10 12:45:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.8555 (0.6582) Acc@1 80.078 (85.170) Acc@5 95.264 (97.408) Mem 16715MB [2024-08-10 12:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0059 (0.7870) Acc@1 76.465 (81.862) Acc@5 93.945 (96.054) Mem 16715MB [2024-08-10 12:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.478 Acc@5 96.023 [2024-08-10 12:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.5% [2024-08-10 12:45:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.866 (0.866) Loss 0.4807 (0.4807) Acc@1 89.502 (89.502) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 12:45:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.201) Loss 0.7739 (0.6052) Acc@1 81.543 (86.683) Acc@5 96.680 (97.829) Mem 16715MB [2024-08-10 12:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.161) Loss 0.8887 (0.7108) Acc@1 78.076 (83.743) Acc@5 95.361 (96.742) Mem 16715MB [2024-08-10 12:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.423 Acc@5 96.751 [2024-08-10 12:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 12:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.42% [2024-08-10 12:45:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:45:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 12:45:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][0/625] eta 0:08:38 lr 0.000770 wd 0.0500 time 0.8302 (0.8302) data time 0.4220 (0.4220) model time 0.0000 (0.0000) loss 2.0653 (2.0653) grad_norm 1.4076 (1.4076) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][10/625] eta 0:05:13 lr 0.000770 wd 0.0500 time 0.4604 (0.5099) data time 0.0007 (0.0392) model time 0.0000 (0.0000) loss 2.4777 (2.7217) grad_norm 1.7143 (1.6988) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][20/625] eta 0:04:56 lr 0.000770 wd 0.0500 time 0.4568 (0.4895) data time 0.0007 (0.0210) model time 0.0000 (0.0000) loss 2.9573 (2.8840) grad_norm 1.6290 (1.5918) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:45:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][30/625] eta 0:04:46 lr 0.000770 wd 0.0500 time 0.4647 (0.4814) data time 0.0010 (0.0147) model time 0.0000 (0.0000) loss 1.8543 (2.9222) grad_norm 1.5404 (1.6201) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][40/625] eta 0:04:39 lr 0.000770 wd 0.0500 time 0.4631 (0.4785) data time 0.0010 (0.0114) model time 0.0000 (0.0000) loss 3.1460 (2.9812) grad_norm 1.2877 (1.6697) loss_scale 2048.0000 (2048.0000) mem 16715MB [2024-08-10 12:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][50/625] eta 0:04:36 lr 0.000770 wd 0.0500 time 0.4634 (0.4804) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 2.7400 (2.9341) grad_norm 1.8748 (inf) loss_scale 1024.0000 (1927.5294) mem 16715MB [2024-08-10 12:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][60/625] eta 0:04:30 lr 0.000770 wd 0.0500 time 0.4595 (0.4783) data time 0.0008 (0.0081) model time 0.4587 (0.4655) loss 3.5868 
(2.9429) grad_norm 1.6864 (inf) loss_scale 1024.0000 (1779.4098) mem 16715MB [2024-08-10 12:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][70/625] eta 0:04:26 lr 0.000770 wd 0.0500 time 0.4736 (0.4794) data time 0.0008 (0.0071) model time 0.4728 (0.4752) loss 3.4006 (2.9354) grad_norm 1.6734 (inf) loss_scale 1024.0000 (1673.0141) mem 16715MB [2024-08-10 12:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][80/625] eta 0:04:20 lr 0.000770 wd 0.0500 time 0.4648 (0.4782) data time 0.0009 (0.0064) model time 0.4639 (0.4730) loss 1.9868 (2.9134) grad_norm 1.5977 (inf) loss_scale 1024.0000 (1592.8889) mem 16715MB [2024-08-10 12:45:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][90/625] eta 0:04:16 lr 0.000770 wd 0.0500 time 0.4673 (0.4790) data time 0.0010 (0.0058) model time 0.4663 (0.4759) loss 3.1735 (2.8939) grad_norm 1.1448 (inf) loss_scale 1024.0000 (1530.3736) mem 16715MB [2024-08-10 12:46:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][100/625] eta 0:04:10 lr 0.000769 wd 0.0500 time 0.4667 (0.4775) data time 0.0010 (0.0053) model time 0.4657 (0.4733) loss 3.3191 (2.8986) grad_norm 1.8139 (inf) loss_scale 1024.0000 (1480.2376) mem 16715MB [2024-08-10 12:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][110/625] eta 0:04:05 lr 0.000769 wd 0.0500 time 0.4639 (0.4764) data time 0.0008 (0.0049) model time 0.4631 (0.4718) loss 3.0527 (2.9165) grad_norm 1.7721 (inf) loss_scale 1024.0000 (1439.1351) mem 16715MB [2024-08-10 12:46:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][120/625] eta 0:04:00 lr 0.000769 wd 0.0500 time 0.4646 (0.4754) data time 0.0008 (0.0046) model time 0.4638 (0.4706) loss 2.2323 (2.9157) grad_norm 1.8024 (inf) loss_scale 1024.0000 (1404.8264) mem 16715MB [2024-08-10 12:46:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][130/625] eta 0:03:54 lr 0.000769 wd 0.0500 time 0.4658 (0.4745) 
data time 0.0011 (0.0043) model time 0.4647 (0.4696) loss 2.9030 (2.9111) grad_norm 2.0240 (inf) loss_scale 1024.0000 (1375.7557) mem 16715MB [2024-08-10 12:46:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][140/625] eta 0:03:49 lr 0.000769 wd 0.0500 time 0.4641 (0.4738) data time 0.0008 (0.0041) model time 0.4633 (0.4690) loss 3.4608 (2.9381) grad_norm 1.6148 (inf) loss_scale 1024.0000 (1350.8085) mem 16715MB [2024-08-10 12:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][150/625] eta 0:03:44 lr 0.000769 wd 0.0500 time 0.4673 (0.4734) data time 0.0008 (0.0039) model time 0.4665 (0.4687) loss 3.7712 (2.9502) grad_norm 1.8075 (inf) loss_scale 1024.0000 (1329.1656) mem 16715MB [2024-08-10 12:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][160/625] eta 0:03:39 lr 0.000769 wd 0.0500 time 0.4689 (0.4730) data time 0.0008 (0.0037) model time 0.4680 (0.4685) loss 2.4056 (2.9617) grad_norm 1.6003 (inf) loss_scale 1024.0000 (1310.2112) mem 16715MB [2024-08-10 12:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][170/625] eta 0:03:34 lr 0.000769 wd 0.0500 time 0.4584 (0.4724) data time 0.0008 (0.0035) model time 0.4576 (0.4680) loss 2.2982 (2.9412) grad_norm 2.0589 (inf) loss_scale 1024.0000 (1293.4737) mem 16715MB [2024-08-10 12:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][180/625] eta 0:03:30 lr 0.000769 wd 0.0500 time 0.4609 (0.4719) data time 0.0010 (0.0034) model time 0.4599 (0.4675) loss 2.2705 (2.9259) grad_norm 1.4479 (inf) loss_scale 1024.0000 (1278.5856) mem 16715MB [2024-08-10 12:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][190/625] eta 0:03:25 lr 0.000768 wd 0.0500 time 0.4636 (0.4715) data time 0.0010 (0.0033) model time 0.4625 (0.4672) loss 2.9216 (2.9297) grad_norm 1.2953 (inf) loss_scale 1024.0000 (1265.2565) mem 16715MB [2024-08-10 12:46:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[135/300][200/625] eta 0:03:20 lr 0.000768 wd 0.0500 time 0.4702 (0.4713) data time 0.0011 (0.0032) model time 0.4691 (0.4671) loss 2.8579 (2.9305) grad_norm 1.8889 (inf) loss_scale 1024.0000 (1253.2537) mem 16715MB [2024-08-10 12:46:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][210/625] eta 0:03:15 lr 0.000768 wd 0.0500 time 0.4674 (0.4710) data time 0.0010 (0.0031) model time 0.4664 (0.4670) loss 3.1601 (2.9359) grad_norm 2.4606 (inf) loss_scale 1024.0000 (1242.3886) mem 16715MB [2024-08-10 12:46:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][220/625] eta 0:03:10 lr 0.000768 wd 0.0500 time 0.4648 (0.4710) data time 0.0008 (0.0030) model time 0.4640 (0.4671) loss 2.7781 (2.9351) grad_norm 2.8134 (inf) loss_scale 1024.0000 (1232.5068) mem 16715MB [2024-08-10 12:47:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][230/625] eta 0:03:05 lr 0.000768 wd 0.0500 time 0.4640 (0.4708) data time 0.0010 (0.0029) model time 0.4630 (0.4670) loss 3.4900 (2.9488) grad_norm 1.5495 (inf) loss_scale 1024.0000 (1223.4805) mem 16715MB [2024-08-10 12:47:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][240/625] eta 0:03:01 lr 0.000768 wd 0.0500 time 0.4581 (0.4713) data time 0.0008 (0.0028) model time 0.4573 (0.4678) loss 4.4343 (2.9566) grad_norm 1.9153 (inf) loss_scale 1024.0000 (1215.2033) mem 16715MB [2024-08-10 12:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][250/625] eta 0:02:56 lr 0.000768 wd 0.0500 time 0.4647 (0.4711) data time 0.0010 (0.0027) model time 0.4637 (0.4676) loss 2.9530 (2.9622) grad_norm 1.6706 (inf) loss_scale 1024.0000 (1207.5857) mem 16715MB [2024-08-10 12:47:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][260/625] eta 0:02:52 lr 0.000768 wd 0.0500 time 0.4834 (0.4714) data time 0.0008 (0.0027) model time 0.4826 (0.4682) loss 2.7542 (2.9690) grad_norm 2.4849 (inf) loss_scale 1024.0000 (1200.5517) mem 16715MB [2024-08-10 
12:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][270/625] eta 0:02:47 lr 0.000768 wd 0.0500 time 0.4694 (0.4713) data time 0.0011 (0.0027) model time 0.4683 (0.4680) loss 3.5034 (2.9674) grad_norm 1.6002 (inf) loss_scale 1024.0000 (1194.0369) mem 16715MB [2024-08-10 12:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][280/625] eta 0:02:42 lr 0.000768 wd 0.0500 time 0.4652 (0.4712) data time 0.0008 (0.0026) model time 0.4644 (0.4679) loss 2.5673 (2.9581) grad_norm 0.9278 (inf) loss_scale 1024.0000 (1187.9858) mem 16715MB [2024-08-10 12:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][290/625] eta 0:02:37 lr 0.000767 wd 0.0500 time 0.4656 (0.4710) data time 0.0008 (0.0026) model time 0.4648 (0.4679) loss 3.3108 (2.9630) grad_norm 1.3068 (inf) loss_scale 1024.0000 (1182.3505) mem 16715MB [2024-08-10 12:47:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][300/625] eta 0:02:33 lr 0.000767 wd 0.0500 time 0.4628 (0.4709) data time 0.0008 (0.0025) model time 0.4620 (0.4678) loss 2.0618 (2.9573) grad_norm 2.5377 (inf) loss_scale 1024.0000 (1177.0897) mem 16715MB [2024-08-10 12:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][310/625] eta 0:02:28 lr 0.000767 wd 0.0500 time 0.4636 (0.4718) data time 0.0008 (0.0025) model time 0.4628 (0.4689) loss 3.3208 (2.9574) grad_norm 1.6945 (inf) loss_scale 1024.0000 (1172.1672) mem 16715MB [2024-08-10 12:47:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][320/625] eta 0:02:23 lr 0.000767 wd 0.0500 time 0.4666 (0.4718) data time 0.0008 (0.0024) model time 0.4657 (0.4690) loss 3.0203 (2.9554) grad_norm 1.4505 (inf) loss_scale 1024.0000 (1167.5514) mem 16715MB [2024-08-10 12:47:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][330/625] eta 0:02:19 lr 0.000767 wd 0.0500 time 0.4648 (0.4717) data time 0.0011 (0.0024) model time 0.4637 (0.4689) loss 2.9823 (2.9536) grad_norm 
1.4922 (inf) loss_scale 1024.0000 (1163.2145) mem 16715MB [2024-08-10 12:47:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][340/625] eta 0:02:14 lr 0.000767 wd 0.0500 time 0.4646 (0.4714) data time 0.0008 (0.0024) model time 0.4638 (0.4687) loss 2.7707 (2.9543) grad_norm 1.6689 (inf) loss_scale 1024.0000 (1159.1320) mem 16715MB [2024-08-10 12:47:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][350/625] eta 0:02:09 lr 0.000767 wd 0.0500 time 0.4679 (0.4713) data time 0.0008 (0.0023) model time 0.4671 (0.4686) loss 3.1930 (2.9563) grad_norm 1.0754 (inf) loss_scale 1024.0000 (1155.2821) mem 16715MB [2024-08-10 12:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][360/625] eta 0:02:04 lr 0.000767 wd 0.0500 time 0.4632 (0.4712) data time 0.0010 (0.0023) model time 0.4622 (0.4685) loss 3.5017 (2.9652) grad_norm 1.7219 (inf) loss_scale 1024.0000 (1151.6454) mem 16715MB [2024-08-10 12:48:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][370/625] eta 0:02:00 lr 0.000767 wd 0.0500 time 0.4588 (0.4709) data time 0.0011 (0.0023) model time 0.4577 (0.4683) loss 2.8922 (2.9645) grad_norm 1.3745 (inf) loss_scale 1024.0000 (1148.2049) mem 16715MB [2024-08-10 12:48:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][380/625] eta 0:01:55 lr 0.000767 wd 0.0500 time 0.4622 (0.4708) data time 0.0008 (0.0022) model time 0.4615 (0.4681) loss 2.8283 (2.9631) grad_norm 1.4733 (inf) loss_scale 1024.0000 (1144.9449) mem 16715MB [2024-08-10 12:48:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][390/625] eta 0:01:50 lr 0.000766 wd 0.0500 time 0.4647 (0.4706) data time 0.0009 (0.0022) model time 0.4638 (0.4680) loss 2.9540 (2.9678) grad_norm 1.5712 (inf) loss_scale 1024.0000 (1141.8517) mem 16715MB [2024-08-10 12:48:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][400/625] eta 0:01:45 lr 0.000766 wd 0.0500 time 0.4628 (0.4704) data time 0.0009 
(0.0022) model time 0.4619 (0.4678) loss 3.1217 (2.9653) grad_norm 2.3766 (inf) loss_scale 512.0000 (1126.1446) mem 16715MB [2024-08-10 12:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][410/625] eta 0:01:41 lr 0.000766 wd 0.0500 time 0.4629 (0.4702) data time 0.0012 (0.0021) model time 0.4617 (0.4677) loss 3.2341 (2.9685) grad_norm 2.0400 (inf) loss_scale 512.0000 (1111.2019) mem 16715MB [2024-08-10 12:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][420/625] eta 0:01:36 lr 0.000766 wd 0.0500 time 0.4663 (0.4701) data time 0.0010 (0.0021) model time 0.4653 (0.4676) loss 3.2171 (2.9699) grad_norm 1.3791 (inf) loss_scale 512.0000 (1096.9691) mem 16715MB [2024-08-10 12:48:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][430/625] eta 0:01:31 lr 0.000766 wd 0.0500 time 0.4634 (0.4700) data time 0.0008 (0.0021) model time 0.4626 (0.4675) loss 3.2411 (2.9674) grad_norm 1.6772 (inf) loss_scale 512.0000 (1083.3968) mem 16715MB [2024-08-10 12:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][440/625] eta 0:01:26 lr 0.000766 wd 0.0500 time 0.4669 (0.4699) data time 0.0008 (0.0021) model time 0.4661 (0.4674) loss 2.6358 (2.9679) grad_norm 1.6270 (inf) loss_scale 512.0000 (1070.4399) mem 16715MB [2024-08-10 12:48:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][450/625] eta 0:01:22 lr 0.000766 wd 0.0500 time 0.4669 (0.4698) data time 0.0010 (0.0021) model time 0.4659 (0.4674) loss 3.1568 (2.9711) grad_norm 1.4532 (inf) loss_scale 512.0000 (1058.0576) mem 16715MB [2024-08-10 12:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][460/625] eta 0:01:17 lr 0.000766 wd 0.0500 time 0.4618 (0.4710) data time 0.0007 (0.0020) model time 0.4611 (0.4687) loss 2.5233 (2.9734) grad_norm 4.2769 (inf) loss_scale 512.0000 (1046.2126) mem 16715MB [2024-08-10 12:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][470/625] eta 0:01:12 
lr 0.000766 wd 0.0500 time 0.4613 (0.4708) data time 0.0008 (0.0020) model time 0.4605 (0.4685) loss 3.6628 (2.9803) grad_norm 1.6285 (inf) loss_scale 512.0000 (1034.8705) mem 16715MB [2024-08-10 12:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][480/625] eta 0:01:08 lr 0.000766 wd 0.0500 time 0.4608 (0.4711) data time 0.0008 (0.0020) model time 0.4600 (0.4689) loss 3.3803 (2.9840) grad_norm 2.1333 (inf) loss_scale 512.0000 (1024.0000) mem 16715MB [2024-08-10 12:49:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][490/625] eta 0:01:03 lr 0.000765 wd 0.0500 time 0.4612 (0.4710) data time 0.0011 (0.0020) model time 0.4601 (0.4688) loss 2.4193 (2.9825) grad_norm 1.2363 (inf) loss_scale 512.0000 (1013.5723) mem 16715MB [2024-08-10 12:49:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][500/625] eta 0:00:58 lr 0.000765 wd 0.0500 time 0.4626 (0.4709) data time 0.0010 (0.0020) model time 0.4617 (0.4687) loss 2.4432 (2.9837) grad_norm 1.1826 (inf) loss_scale 512.0000 (1003.5609) mem 16715MB [2024-08-10 12:49:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][510/625] eta 0:00:54 lr 0.000765 wd 0.0500 time 0.4626 (0.4708) data time 0.0011 (0.0019) model time 0.4615 (0.4686) loss 3.4932 (2.9813) grad_norm 1.9539 (inf) loss_scale 512.0000 (993.9413) mem 16715MB [2024-08-10 12:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][520/625] eta 0:00:49 lr 0.000765 wd 0.0500 time 0.4626 (0.4707) data time 0.0008 (0.0019) model time 0.4618 (0.4685) loss 3.7018 (2.9775) grad_norm 1.9371 (inf) loss_scale 512.0000 (984.6910) mem 16715MB [2024-08-10 12:49:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][530/625] eta 0:00:44 lr 0.000765 wd 0.0500 time 0.4593 (0.4705) data time 0.0007 (0.0019) model time 0.4586 (0.4684) loss 2.4412 (2.9736) grad_norm 2.0132 (inf) loss_scale 512.0000 (975.7891) mem 16715MB [2024-08-10 12:49:27 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [135/300][540/625] eta 0:00:39 lr 0.000765 wd 0.0500 time 0.4652 (0.4704) data time 0.0010 (0.0019) model time 0.4642 (0.4682) loss 1.7682 (2.9696) grad_norm 2.0784 (inf) loss_scale 512.0000 (967.2163) mem 16715MB [2024-08-10 12:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][550/625] eta 0:00:35 lr 0.000765 wd 0.0500 time 0.4645 (0.4703) data time 0.0010 (0.0019) model time 0.4635 (0.4681) loss 3.0995 (2.9746) grad_norm 2.4011 (inf) loss_scale 512.0000 (958.9546) mem 16715MB [2024-08-10 12:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][560/625] eta 0:00:30 lr 0.000765 wd 0.0500 time 0.4635 (0.4701) data time 0.0010 (0.0019) model time 0.4624 (0.4680) loss 3.4931 (2.9752) grad_norm 1.6465 (inf) loss_scale 512.0000 (950.9875) mem 16715MB [2024-08-10 12:49:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][570/625] eta 0:00:25 lr 0.000765 wd 0.0500 time 0.4720 (0.4700) data time 0.0011 (0.0018) model time 0.4710 (0.4679) loss 2.6675 (2.9775) grad_norm 1.7825 (inf) loss_scale 512.0000 (943.2995) mem 16715MB [2024-08-10 12:49:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][580/625] eta 0:00:21 lr 0.000764 wd 0.0500 time 0.4652 (0.4699) data time 0.0008 (0.0018) model time 0.4644 (0.4678) loss 3.3183 (2.9821) grad_norm 1.4394 (inf) loss_scale 512.0000 (935.8761) mem 16715MB [2024-08-10 12:49:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][590/625] eta 0:00:16 lr 0.000764 wd 0.0500 time 0.4694 (0.4699) data time 0.0011 (0.0018) model time 0.4683 (0.4678) loss 2.8602 (2.9798) grad_norm 1.2051 (inf) loss_scale 512.0000 (928.7039) mem 16715MB [2024-08-10 12:49:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][600/625] eta 0:00:11 lr 0.000764 wd 0.0500 time 0.4752 (0.4698) data time 0.0010 (0.0018) model time 0.4742 (0.4678) loss 3.4156 (2.9828) grad_norm 1.5272 (inf) loss_scale 512.0000 (921.7704) 
mem 16715MB [2024-08-10 12:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][610/625] eta 0:00:07 lr 0.000764 wd 0.0500 time 0.4592 (0.4701) data time 0.0007 (0.0018) model time 0.4585 (0.4681) loss 3.0110 (2.9829) grad_norm 1.5443 (inf) loss_scale 512.0000 (915.0638) mem 16715MB [2024-08-10 12:50:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [135/300][620/625] eta 0:00:02 lr 0.000764 wd 0.0500 time 0.4639 (0.4699) data time 0.0007 (0.0018) model time 0.4631 (0.4679) loss 3.1377 (2.9843) grad_norm 1.6145 (inf) loss_scale 512.0000 (908.5733) mem 16715MB [2024-08-10 12:50:06 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 135 training takes 0:04:53 [2024-08-10 12:50:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 12:50:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 12:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5283 (0.5283) Acc@1 88.672 (88.672) Acc@5 98.340 (98.340) Mem 16715MB [2024-08-10 12:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.9136 (0.6635) Acc@1 78.662 (85.245) Acc@5 95.264 (97.301) Mem 16715MB [2024-08-10 12:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 1.0146 (0.7894) Acc@1 75.781 (82.078) Acc@5 94.531 (96.022) Mem 16715MB [2024-08-10 12:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.816 Acc@5 96.053 [2024-08-10 12:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-10 12:50:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.892 (0.892) Loss 0.4802 (0.4802) Acc@1 89.355 (89.355) Acc@5 98.584 (98.584) Mem 16715MB [2024-08-10 12:50:14 vssm_base_ms_e300] 
(main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.201) Loss 0.7734 (0.6047) Acc@1 81.641 (86.714) Acc@5 96.729 (97.825) Mem 16715MB [2024-08-10 12:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.161) Loss 0.8877 (0.7099) Acc@1 78.027 (83.773) Acc@5 95.459 (96.749) Mem 16715MB [2024-08-10 12:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.439 Acc@5 96.763 [2024-08-10 12:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 12:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.44% [2024-08-10 12:50:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 12:50:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 12:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][0/625] eta 0:08:50 lr 0.000764 wd 0.0500 time 0.8496 (0.8496) data time 0.4440 (0.4440) model time 0.0000 (0.0000) loss 3.3673 (3.3673) grad_norm 1.3408 (1.3408) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][10/625] eta 0:05:07 lr 0.000764 wd 0.0500 time 0.4681 (0.5004) data time 0.0009 (0.0413) model time 0.0000 (0.0000) loss 3.3031 (3.2711) grad_norm 1.7174 (1.8387) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][20/625] eta 0:04:52 lr 0.000764 wd 0.0500 time 0.4637 (0.4832) data time 0.0011 (0.0221) model time 0.0000 (0.0000) loss 3.1860 (3.1907) grad_norm 1.4113 (1.5964) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][30/625] eta 0:04:44 lr 0.000764 wd 0.0500 time 0.4744 (0.4779) data time 
0.0009 (0.0153) model time 0.0000 (0.0000) loss 2.5995 (3.1218) grad_norm 1.8626 (1.6676) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][40/625] eta 0:04:40 lr 0.000764 wd 0.0500 time 0.4624 (0.4790) data time 0.0009 (0.0118) model time 0.0000 (0.0000) loss 2.5170 (3.0426) grad_norm 1.5578 (1.6998) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][50/625] eta 0:04:33 lr 0.000764 wd 0.0500 time 0.4604 (0.4758) data time 0.0009 (0.0097) model time 0.0000 (0.0000) loss 3.3663 (3.0146) grad_norm 2.1666 (1.6730) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][60/625] eta 0:04:27 lr 0.000763 wd 0.0500 time 0.4614 (0.4737) data time 0.0011 (0.0083) model time 0.4603 (0.4616) loss 3.6215 (3.0140) grad_norm 1.4001 (1.6157) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][70/625] eta 0:04:23 lr 0.000763 wd 0.0500 time 0.6264 (0.4746) data time 0.0009 (0.0073) model time 0.6255 (0.4703) loss 3.5480 (3.0278) grad_norm 1.8206 (1.6111) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][80/625] eta 0:04:18 lr 0.000763 wd 0.0500 time 0.4713 (0.4736) data time 0.0011 (0.0065) model time 0.4702 (0.4687) loss 3.4366 (3.0430) grad_norm 1.6154 (1.6318) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][90/625] eta 0:04:12 lr 0.000763 wd 0.0500 time 0.4662 (0.4729) data time 0.0008 (0.0059) model time 0.4654 (0.4681) loss 3.2107 (3.0450) grad_norm 1.3548 (1.6749) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[136/300][100/625] eta 0:04:07 lr 0.000763 wd 0.0500 time 0.4641 (0.4722) data time 0.0008 (0.0054) model time 0.4633 (0.4675) loss 2.8409 (3.0387) grad_norm 2.1182 (1.6890) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][110/625] eta 0:04:02 lr 0.000763 wd 0.0500 time 0.4696 (0.4716) data time 0.0011 (0.0050) model time 0.4685 (0.4670) loss 3.0011 (3.0297) grad_norm 1.7871 (1.7109) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][120/625] eta 0:03:57 lr 0.000763 wd 0.0500 time 0.4646 (0.4710) data time 0.0007 (0.0047) model time 0.4639 (0.4664) loss 2.6345 (3.0316) grad_norm 1.5372 (1.7998) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][130/625] eta 0:03:52 lr 0.000763 wd 0.0500 time 0.4633 (0.4704) data time 0.0008 (0.0044) model time 0.4625 (0.4660) loss 3.0110 (3.0209) grad_norm 1.3700 (1.7832) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][140/625] eta 0:03:47 lr 0.000763 wd 0.0500 time 0.4605 (0.4699) data time 0.0009 (0.0041) model time 0.4597 (0.4656) loss 2.5118 (3.0102) grad_norm 2.0339 (1.7669) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][150/625] eta 0:03:43 lr 0.000762 wd 0.0500 time 0.4679 (0.4695) data time 0.0008 (0.0039) model time 0.4671 (0.4653) loss 2.2213 (3.0126) grad_norm 1.7361 (1.7535) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][160/625] eta 0:03:38 lr 0.000762 wd 0.0500 time 0.4655 (0.4692) data time 0.0011 (0.0038) model time 0.4644 (0.4651) loss 3.3605 (3.0307) grad_norm 1.2485 (1.7533) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:51:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][170/625] eta 0:03:33 lr 0.000762 wd 0.0500 time 0.4599 (0.4689) data time 0.0011 (0.0036) model time 0.4588 (0.4649) loss 2.6333 (3.0326) grad_norm 2.6650 (1.7567) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][180/625] eta 0:03:28 lr 0.000762 wd 0.0500 time 0.4584 (0.4686) data time 0.0008 (0.0035) model time 0.4576 (0.4647) loss 2.4034 (3.0397) grad_norm 1.7868 (1.7691) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][190/625] eta 0:03:24 lr 0.000762 wd 0.0500 time 0.4658 (0.4693) data time 0.0010 (0.0033) model time 0.4648 (0.4659) loss 3.3611 (3.0444) grad_norm 1.3248 (1.7541) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][200/625] eta 0:03:19 lr 0.000762 wd 0.0500 time 0.4619 (0.4690) data time 0.0008 (0.0032) model time 0.4610 (0.4657) loss 3.0399 (3.0282) grad_norm 1.9620 (1.7600) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][210/625] eta 0:03:14 lr 0.000762 wd 0.0500 time 0.4670 (0.4698) data time 0.0008 (0.0031) model time 0.4662 (0.4668) loss 2.8945 (3.0255) grad_norm 1.3098 (1.7485) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][220/625] eta 0:03:10 lr 0.000762 wd 0.0500 time 0.4642 (0.4694) data time 0.0007 (0.0030) model time 0.4635 (0.4665) loss 3.6567 (3.0173) grad_norm 1.5619 (1.7437) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][230/625] eta 0:03:05 lr 0.000762 wd 0.0500 time 0.4674 (0.4692) data time 0.0010 (0.0029) model time 0.4664 (0.4663) loss 3.1706 
(3.0246) grad_norm 1.3101 (1.7471) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][240/625] eta 0:03:00 lr 0.000762 wd 0.0500 time 0.4640 (0.4690) data time 0.0010 (0.0028) model time 0.4630 (0.4661) loss 2.6315 (3.0284) grad_norm 5.1319 (1.7653) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][250/625] eta 0:02:55 lr 0.000761 wd 0.0500 time 0.4621 (0.4688) data time 0.0009 (0.0028) model time 0.4612 (0.4660) loss 3.0774 (3.0219) grad_norm 2.0981 (1.7948) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][260/625] eta 0:02:51 lr 0.000761 wd 0.0500 time 0.4620 (0.4686) data time 0.0008 (0.0027) model time 0.4612 (0.4658) loss 3.0618 (3.0178) grad_norm 1.3291 (1.7980) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][270/625] eta 0:02:46 lr 0.000761 wd 0.0500 time 0.4687 (0.4683) data time 0.0008 (0.0026) model time 0.4679 (0.4656) loss 2.3213 (3.0167) grad_norm 1.4683 (1.7835) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][280/625] eta 0:02:41 lr 0.000761 wd 0.0500 time 0.4644 (0.4681) data time 0.0010 (0.0026) model time 0.4634 (0.4653) loss 3.1258 (3.0196) grad_norm 1.2773 (1.7772) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][290/625] eta 0:02:36 lr 0.000761 wd 0.0500 time 0.4583 (0.4679) data time 0.0008 (0.0025) model time 0.4575 (0.4652) loss 3.6276 (3.0158) grad_norm 1.3408 (1.7634) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][300/625] eta 0:02:32 lr 0.000761 wd 0.0500 time 0.4643 
(0.4678) data time 0.0008 (0.0025) model time 0.4635 (0.4651) loss 2.6658 (3.0154) grad_norm 1.6375 (1.7529) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][310/625] eta 0:02:27 lr 0.000761 wd 0.0500 time 0.4632 (0.4677) data time 0.0008 (0.0024) model time 0.4624 (0.4651) loss 3.3504 (3.0090) grad_norm 1.1906 (1.7454) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][320/625] eta 0:02:22 lr 0.000761 wd 0.0500 time 0.4674 (0.4676) data time 0.0010 (0.0024) model time 0.4664 (0.4650) loss 3.1090 (3.0063) grad_norm 1.1393 (1.7392) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][330/625] eta 0:02:17 lr 0.000761 wd 0.0500 time 0.4677 (0.4675) data time 0.0010 (0.0024) model time 0.4667 (0.4650) loss 3.3164 (3.0153) grad_norm 1.4326 (1.7376) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:52:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][340/625] eta 0:02:13 lr 0.000761 wd 0.0500 time 0.4608 (0.4674) data time 0.0010 (0.0023) model time 0.4598 (0.4650) loss 3.3797 (3.0204) grad_norm 1.5255 (1.7369) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][350/625] eta 0:02:08 lr 0.000760 wd 0.0500 time 0.4602 (0.4674) data time 0.0010 (0.0023) model time 0.4592 (0.4649) loss 3.2660 (3.0184) grad_norm 1.4832 (1.7270) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][360/625] eta 0:02:03 lr 0.000760 wd 0.0500 time 0.4627 (0.4679) data time 0.0008 (0.0022) model time 0.4619 (0.4656) loss 3.7538 (3.0157) grad_norm 1.3643 (1.7186) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [136/300][370/625] eta 0:01:59 lr 0.000760 wd 0.0500 time 0.4669 (0.4678) data time 0.0010 (0.0022) model time 0.4659 (0.4655) loss 2.4600 (3.0124) grad_norm 1.2931 (1.7127) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][380/625] eta 0:01:54 lr 0.000760 wd 0.0500 time 0.4711 (0.4683) data time 0.0010 (0.0022) model time 0.4701 (0.4661) loss 3.2449 (3.0076) grad_norm 1.4192 (1.7198) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][390/625] eta 0:01:50 lr 0.000760 wd 0.0500 time 0.4641 (0.4682) data time 0.0009 (0.0021) model time 0.4633 (0.4660) loss 2.3058 (2.9990) grad_norm 1.4739 (1.7188) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][400/625] eta 0:01:45 lr 0.000760 wd 0.0500 time 0.4672 (0.4681) data time 0.0011 (0.0021) model time 0.4661 (0.4660) loss 2.9058 (3.0048) grad_norm 2.0534 (1.7174) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][410/625] eta 0:01:40 lr 0.000760 wd 0.0500 time 0.4637 (0.4693) data time 0.0009 (0.0021) model time 0.4628 (0.4674) loss 3.4439 (3.0014) grad_norm 1.9325 (1.7198) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][420/625] eta 0:01:36 lr 0.000760 wd 0.0500 time 0.4637 (0.4692) data time 0.0008 (0.0021) model time 0.4629 (0.4672) loss 2.6349 (3.0009) grad_norm 2.3102 (1.7229) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][430/625] eta 0:01:31 lr 0.000760 wd 0.0500 time 0.4584 (0.4690) data time 0.0008 (0.0020) model time 0.4575 (0.4671) loss 3.3994 (3.0017) grad_norm 1.1592 (1.7174) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][440/625] eta 0:01:26 lr 0.000759 wd 0.0500 time 0.4631 (0.4689) data time 0.0009 (0.0020) model time 0.4623 (0.4669) loss 2.8446 (3.0059) grad_norm 1.7924 (1.7139) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][450/625] eta 0:01:22 lr 0.000759 wd 0.0500 time 0.4617 (0.4688) data time 0.0008 (0.0020) model time 0.4608 (0.4669) loss 2.6573 (3.0034) grad_norm 1.5049 (1.7119) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][460/625] eta 0:01:17 lr 0.000759 wd 0.0500 time 0.4659 (0.4687) data time 0.0008 (0.0020) model time 0.4651 (0.4668) loss 3.3636 (3.0061) grad_norm 1.4531 (1.7077) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][470/625] eta 0:01:12 lr 0.000759 wd 0.0500 time 0.4656 (0.4686) data time 0.0010 (0.0020) model time 0.4646 (0.4668) loss 2.7709 (3.0057) grad_norm 1.6164 (1.7021) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][480/625] eta 0:01:07 lr 0.000759 wd 0.0500 time 0.4587 (0.4685) data time 0.0010 (0.0019) model time 0.4578 (0.4666) loss 2.9865 (3.0034) grad_norm 1.4793 (1.7020) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][490/625] eta 0:01:03 lr 0.000759 wd 0.0500 time 0.4576 (0.4683) data time 0.0009 (0.0019) model time 0.4567 (0.4665) loss 3.1405 (3.0077) grad_norm 1.2893 (1.6974) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][500/625] eta 0:00:58 lr 0.000759 wd 0.0500 time 0.4615 (0.4682) data time 0.0008 (0.0019) model time 0.4607 (0.4663) loss 3.6594 
(3.0046) grad_norm 1.7113 (1.6975) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][510/625] eta 0:00:53 lr 0.000759 wd 0.0500 time 0.4646 (0.4681) data time 0.0008 (0.0019) model time 0.4638 (0.4662) loss 2.4918 (3.0105) grad_norm 1.5754 (1.6978) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][520/625] eta 0:00:49 lr 0.000759 wd 0.0500 time 0.4665 (0.4680) data time 0.0009 (0.0019) model time 0.4656 (0.4662) loss 2.3451 (3.0068) grad_norm 1.4964 (1.6965) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][530/625] eta 0:00:44 lr 0.000759 wd 0.0500 time 0.4636 (0.4680) data time 0.0011 (0.0019) model time 0.4625 (0.4661) loss 3.1841 (3.0081) grad_norm 1.6760 (1.7002) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][540/625] eta 0:00:39 lr 0.000758 wd 0.0500 time 0.4534 (0.4680) data time 0.0011 (0.0018) model time 0.4523 (0.4661) loss 3.0507 (3.0117) grad_norm 1.4421 (1.6948) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][550/625] eta 0:00:35 lr 0.000758 wd 0.0500 time 0.4647 (0.4679) data time 0.0011 (0.0018) model time 0.4636 (0.4661) loss 3.9135 (3.0120) grad_norm 1.5675 (1.6934) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][560/625] eta 0:00:30 lr 0.000758 wd 0.0500 time 0.6112 (0.4688) data time 0.0010 (0.0018) model time 0.6102 (0.4671) loss 2.8343 (3.0147) grad_norm 1.5068 (1.6904) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][570/625] eta 0:00:25 lr 0.000758 wd 0.0500 time 0.4592 
(0.4686) data time 0.0011 (0.0018) model time 0.4581 (0.4669) loss 2.8160 (3.0173) grad_norm 1.6133 (1.6904) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][580/625] eta 0:00:21 lr 0.000758 wd 0.0500 time 0.4587 (0.4685) data time 0.0011 (0.0018) model time 0.4577 (0.4668) loss 3.0493 (3.0175) grad_norm 1.2928 (1.6885) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][590/625] eta 0:00:16 lr 0.000758 wd 0.0500 time 0.4755 (0.4685) data time 0.0011 (0.0018) model time 0.4744 (0.4668) loss 2.6687 (3.0138) grad_norm 2.1792 (1.6907) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][600/625] eta 0:00:11 lr 0.000758 wd 0.0500 time 0.4738 (0.4686) data time 0.0007 (0.0018) model time 0.4730 (0.4669) loss 2.2867 (3.0111) grad_norm 1.6533 (1.6903) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][610/625] eta 0:00:07 lr 0.000758 wd 0.0500 time 0.4646 (0.4686) data time 0.0006 (0.0018) model time 0.4641 (0.4669) loss 3.0985 (3.0141) grad_norm 6.6854 (1.7065) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [136/300][620/625] eta 0:00:02 lr 0.000758 wd 0.0500 time 0.4594 (0.4685) data time 0.0005 (0.0017) model time 0.4589 (0.4668) loss 2.9984 (3.0105) grad_norm 1.5318 (1.7104) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 136 training takes 0:04:52 [2024-08-10 12:55:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-10 12:55:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 12:55:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.532 (0.532) Loss 0.5684 (0.5684) Acc@1 87.988 (87.988) Acc@5 98.340 (98.340) Mem 16715MB [2024-08-10 12:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.8809 (0.6816) Acc@1 77.344 (84.899) Acc@5 95.508 (97.519) Mem 16715MB [2024-08-10 12:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 0.9717 (0.8089) Acc@1 76.904 (81.799) Acc@5 94.434 (96.043) Mem 16715MB [2024-08-10 12:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.602 Acc@5 96.063 [2024-08-10 12:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.6% [2024-08-10 12:55:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.864 (0.864) Loss 0.4795 (0.4795) Acc@1 89.453 (89.453) Acc@5 98.633 (98.633) Mem 16715MB [2024-08-10 12:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.195) Loss 0.7715 (0.6038) Acc@1 81.738 (86.754) Acc@5 96.729 (97.829) Mem 16715MB [2024-08-10 12:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.8857 (0.7088) Acc@1 78.125 (83.803) Acc@5 95.508 (96.731) Mem 16715MB [2024-08-10 12:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.467 Acc@5 96.743 [2024-08-10 12:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 12:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.47% [2024-08-10 12:55:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... 
[2024-08-10 12:55:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 12:55:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][0/625] eta 0:09:02 lr 0.000758 wd 0.0500 time 0.8687 (0.8687) data time 0.4616 (0.4616) model time 0.0000 (0.0000) loss 3.9189 (3.9189) grad_norm 1.2288 (1.2288) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][10/625] eta 0:05:07 lr 0.000757 wd 0.0500 time 0.4662 (0.5008) data time 0.0010 (0.0429) model time 0.0000 (0.0000) loss 1.9426 (3.0265) grad_norm 1.0373 (1.4022) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][20/625] eta 0:04:52 lr 0.000757 wd 0.0500 time 0.4594 (0.4830) data time 0.0010 (0.0229) model time 0.0000 (0.0000) loss 3.5952 (3.0535) grad_norm 1.6991 (1.5020) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][30/625] eta 0:04:48 lr 0.000757 wd 0.0500 time 0.4758 (0.4857) data time 0.0009 (0.0159) model time 0.0000 (0.0000) loss 3.6402 (3.0502) grad_norm 1.2964 (1.4958) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][40/625] eta 0:04:41 lr 0.000757 wd 0.0500 time 0.4661 (0.4812) data time 0.0010 (0.0122) model time 0.0000 (0.0000) loss 3.0517 (2.9753) grad_norm 1.1243 (1.4753) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][50/625] eta 0:04:35 lr 0.000757 wd 0.0500 time 0.4682 (0.4786) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 3.0041 (2.9982) grad_norm 1.3732 (1.4697) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[137/300][60/625] eta 0:04:29 lr 0.000757 wd 0.0500 time 0.4639 (0.4768) data time 0.0012 (0.0085) model time 0.4627 (0.4667) loss 2.9736 (3.0256) grad_norm 1.6438 (1.4822) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][70/625] eta 0:04:23 lr 0.000757 wd 0.0500 time 0.4548 (0.4756) data time 0.0011 (0.0075) model time 0.4537 (0.4668) loss 3.3236 (3.0012) grad_norm 2.2165 (1.5399) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:55:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][80/625] eta 0:04:18 lr 0.000757 wd 0.0500 time 0.4662 (0.4744) data time 0.0007 (0.0067) model time 0.4655 (0.4661) loss 3.5851 (3.0214) grad_norm 1.6190 (1.5767) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][90/625] eta 0:04:13 lr 0.000757 wd 0.0500 time 0.4612 (0.4743) data time 0.0008 (0.0061) model time 0.4604 (0.4678) loss 2.4460 (2.9994) grad_norm 0.9311 (1.6048) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][100/625] eta 0:04:08 lr 0.000757 wd 0.0500 time 0.4655 (0.4735) data time 0.0010 (0.0055) model time 0.4645 (0.4672) loss 3.1895 (2.9966) grad_norm 1.5996 (1.6134) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][110/625] eta 0:04:03 lr 0.000756 wd 0.0500 time 0.4672 (0.4735) data time 0.0007 (0.0051) model time 0.4665 (0.4681) loss 2.8399 (2.9857) grad_norm 1.0884 (1.6131) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][120/625] eta 0:04:00 lr 0.000756 wd 0.0500 time 0.4870 (0.4762) data time 0.0010 (0.0048) model time 0.4860 (0.4735) loss 3.1143 (3.0030) grad_norm 1.4890 (1.6061) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][130/625] eta 0:03:55 lr 0.000756 wd 0.0500 time 0.4612 (0.4754) data time 0.0010 (0.0045) model time 0.4602 (0.4724) loss 2.2936 (3.0058) grad_norm 1.2168 (1.6023) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][140/625] eta 0:03:50 lr 0.000756 wd 0.0500 time 0.4596 (0.4748) data time 0.0012 (0.0043) model time 0.4584 (0.4716) loss 3.0801 (2.9795) grad_norm 1.8828 (1.6065) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][150/625] eta 0:03:45 lr 0.000756 wd 0.0500 time 0.4761 (0.4742) data time 0.0010 (0.0041) model time 0.4751 (0.4709) loss 3.0388 (2.9755) grad_norm 2.6802 (1.6518) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][160/625] eta 0:03:40 lr 0.000756 wd 0.0500 time 0.4575 (0.4739) data time 0.0008 (0.0040) model time 0.4567 (0.4704) loss 3.5074 (2.9610) grad_norm 2.6059 (1.7185) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][170/625] eta 0:03:35 lr 0.000756 wd 0.0500 time 0.4620 (0.4742) data time 0.0011 (0.0039) model time 0.4609 (0.4711) loss 3.0024 (2.9659) grad_norm 1.2097 (1.7517) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][180/625] eta 0:03:30 lr 0.000756 wd 0.0500 time 0.4675 (0.4739) data time 0.0011 (0.0037) model time 0.4665 (0.4708) loss 2.6511 (2.9629) grad_norm 1.2056 (1.7408) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][190/625] eta 0:03:25 lr 0.000756 wd 0.0500 time 0.4603 (0.4735) data time 0.0008 (0.0036) model time 0.4595 (0.4705) loss 2.0720 
(2.9600) grad_norm 1.7192 (1.7185) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][200/625] eta 0:03:21 lr 0.000756 wd 0.0500 time 0.4638 (0.4732) data time 0.0011 (0.0034) model time 0.4627 (0.4701) loss 2.7150 (2.9649) grad_norm 1.0271 (1.7128) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][210/625] eta 0:03:16 lr 0.000755 wd 0.0500 time 0.4629 (0.4728) data time 0.0011 (0.0033) model time 0.4619 (0.4697) loss 2.8220 (2.9697) grad_norm 1.2503 (1.6957) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][220/625] eta 0:03:11 lr 0.000755 wd 0.0500 time 0.4650 (0.4723) data time 0.0010 (0.0032) model time 0.4640 (0.4692) loss 3.0537 (2.9716) grad_norm 1.2886 (1.6945) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][230/625] eta 0:03:06 lr 0.000755 wd 0.0500 time 0.4621 (0.4719) data time 0.0012 (0.0032) model time 0.4610 (0.4688) loss 3.1560 (2.9848) grad_norm 2.0787 (1.6985) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][240/625] eta 0:03:01 lr 0.000755 wd 0.0500 time 0.4671 (0.4717) data time 0.0008 (0.0031) model time 0.4663 (0.4686) loss 2.6884 (2.9865) grad_norm 2.3048 (1.7186) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][250/625] eta 0:02:56 lr 0.000755 wd 0.0500 time 0.4714 (0.4715) data time 0.0011 (0.0030) model time 0.4703 (0.4685) loss 3.3606 (2.9933) grad_norm 1.2938 (1.7169) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][260/625] eta 0:02:52 lr 0.000755 wd 0.0500 time 0.4690 
(0.4714) data time 0.0008 (0.0029) model time 0.4681 (0.4684) loss 3.7994 (3.0041) grad_norm 2.0299 (1.7041) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][270/625] eta 0:02:47 lr 0.000755 wd 0.0500 time 0.4616 (0.4713) data time 0.0008 (0.0029) model time 0.4608 (0.4683) loss 3.1690 (3.0067) grad_norm 1.3483 (1.7095) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][280/625] eta 0:02:42 lr 0.000755 wd 0.0500 time 0.4637 (0.4710) data time 0.0011 (0.0028) model time 0.4626 (0.4682) loss 3.1961 (3.0160) grad_norm 1.1633 (1.7042) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][290/625] eta 0:02:37 lr 0.000755 wd 0.0500 time 0.4609 (0.4708) data time 0.0009 (0.0027) model time 0.4600 (0.4680) loss 2.1318 (3.0106) grad_norm 2.2111 (1.7075) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][300/625] eta 0:02:32 lr 0.000754 wd 0.0500 time 0.4640 (0.4707) data time 0.0008 (0.0027) model time 0.4632 (0.4679) loss 3.4360 (3.0145) grad_norm 1.5373 (1.7002) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][310/625] eta 0:02:28 lr 0.000754 wd 0.0500 time 0.4658 (0.4710) data time 0.0008 (0.0026) model time 0.4650 (0.4683) loss 2.9277 (3.0171) grad_norm 1.5704 (1.6948) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][320/625] eta 0:02:23 lr 0.000754 wd 0.0500 time 0.4704 (0.4708) data time 0.0008 (0.0026) model time 0.4696 (0.4681) loss 2.3324 (3.0119) grad_norm 1.3137 (1.6847) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:57:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [137/300][330/625] eta 0:02:18 lr 0.000754 wd 0.0500 time 0.4655 (0.4706) data time 0.0008 (0.0025) model time 0.4647 (0.4680) loss 3.3437 (3.0120) grad_norm 1.4597 (1.6786) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][340/625] eta 0:02:14 lr 0.000754 wd 0.0500 time 0.4658 (0.4706) data time 0.0010 (0.0025) model time 0.4648 (0.4680) loss 2.7491 (3.0153) grad_norm 1.9636 (1.6966) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][350/625] eta 0:02:09 lr 0.000754 wd 0.0500 time 0.4629 (0.4704) data time 0.0008 (0.0025) model time 0.4622 (0.4678) loss 2.3360 (3.0137) grad_norm 1.6940 (1.6944) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][360/625] eta 0:02:04 lr 0.000754 wd 0.0500 time 0.4631 (0.4703) data time 0.0009 (0.0024) model time 0.4623 (0.4677) loss 2.8257 (3.0150) grad_norm 1.3368 (1.6898) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][370/625] eta 0:02:00 lr 0.000754 wd 0.0500 time 0.4589 (0.4707) data time 0.0011 (0.0024) model time 0.4577 (0.4683) loss 2.9461 (3.0210) grad_norm 1.1554 (1.6868) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][380/625] eta 0:01:55 lr 0.000754 wd 0.0500 time 0.4619 (0.4705) data time 0.0011 (0.0023) model time 0.4609 (0.4681) loss 2.7698 (3.0107) grad_norm 2.0130 (1.6861) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][390/625] eta 0:01:50 lr 0.000754 wd 0.0500 time 0.4585 (0.4708) data time 0.0009 (0.0023) model time 0.4576 (0.4685) loss 2.6101 (3.0144) grad_norm 1.5904 (1.6827) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 12:58:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][400/625] eta 0:01:45 lr 0.000753 wd 0.0500 time 0.4695 (0.4707) data time 0.0010 (0.0023) model time 0.4685 (0.4684) loss 3.2084 (3.0146) grad_norm 1.4867 (1.6783) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][410/625] eta 0:01:41 lr 0.000753 wd 0.0500 time 0.4646 (0.4706) data time 0.0011 (0.0023) model time 0.4635 (0.4683) loss 3.4882 (3.0158) grad_norm 1.2186 (1.6754) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][420/625] eta 0:01:36 lr 0.000753 wd 0.0500 time 0.4654 (0.4705) data time 0.0007 (0.0022) model time 0.4647 (0.4682) loss 2.6687 (3.0136) grad_norm 1.2420 (1.6731) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][430/625] eta 0:01:31 lr 0.000753 wd 0.0500 time 0.4626 (0.4703) data time 0.0008 (0.0022) model time 0.4619 (0.4681) loss 3.5769 (3.0132) grad_norm 1.4174 (1.6763) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][440/625] eta 0:01:26 lr 0.000753 wd 0.0500 time 0.4612 (0.4702) data time 0.0009 (0.0022) model time 0.4603 (0.4680) loss 3.1648 (3.0079) grad_norm 1.8799 (1.6803) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][450/625] eta 0:01:22 lr 0.000753 wd 0.0500 time 0.6778 (0.4709) data time 0.0007 (0.0021) model time 0.6771 (0.4688) loss 2.2736 (3.0054) grad_norm 1.4621 (1.6823) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:58:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][460/625] eta 0:01:17 lr 0.000753 wd 0.0500 time 0.4614 (0.4707) data time 0.0008 (0.0021) model time 0.4606 (0.4686) loss 2.7514 
(3.0043) grad_norm 1.6676 (1.6864) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][470/625] eta 0:01:12 lr 0.000753 wd 0.0500 time 0.4624 (0.4705) data time 0.0007 (0.0021) model time 0.4617 (0.4684) loss 2.9762 (3.0065) grad_norm 1.7574 (1.6971) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][480/625] eta 0:01:08 lr 0.000753 wd 0.0500 time 0.4643 (0.4706) data time 0.0009 (0.0021) model time 0.4634 (0.4685) loss 3.3179 (3.0085) grad_norm 1.5975 (1.6911) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][490/625] eta 0:01:03 lr 0.000753 wd 0.0500 time 0.4645 (0.4704) data time 0.0008 (0.0021) model time 0.4637 (0.4683) loss 3.7486 (3.0102) grad_norm 1.7538 (1.6900) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][500/625] eta 0:00:58 lr 0.000752 wd 0.0500 time 0.4620 (0.4704) data time 0.0008 (0.0021) model time 0.4612 (0.4684) loss 2.9965 (3.0101) grad_norm 1.9999 (1.6926) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][510/625] eta 0:00:54 lr 0.000752 wd 0.0500 time 0.4631 (0.4703) data time 0.0010 (0.0020) model time 0.4621 (0.4682) loss 3.3049 (3.0115) grad_norm 1.6317 (1.7008) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][520/625] eta 0:00:49 lr 0.000752 wd 0.0500 time 0.4598 (0.4701) data time 0.0010 (0.0020) model time 0.4588 (0.4681) loss 2.8210 (3.0093) grad_norm 1.3058 (1.7016) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][530/625] eta 0:00:44 lr 0.000752 wd 0.0500 time 0.4618 
(0.4700) data time 0.0008 (0.0020) model time 0.4610 (0.4679) loss 2.2361 (3.0140) grad_norm 1.5894 (1.7002) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][540/625] eta 0:00:39 lr 0.000752 wd 0.0500 time 0.4659 (0.4702) data time 0.0011 (0.0020) model time 0.4648 (0.4682) loss 1.8432 (3.0126) grad_norm 1.7689 (1.6988) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][550/625] eta 0:00:35 lr 0.000752 wd 0.0500 time 0.4689 (0.4701) data time 0.0010 (0.0020) model time 0.4679 (0.4681) loss 3.1903 (3.0107) grad_norm 1.7743 (1.6983) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][560/625] eta 0:00:30 lr 0.000752 wd 0.0500 time 0.4655 (0.4704) data time 0.0011 (0.0019) model time 0.4644 (0.4685) loss 2.7595 (3.0107) grad_norm 1.3413 (1.7004) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][570/625] eta 0:00:25 lr 0.000752 wd 0.0500 time 0.4621 (0.4704) data time 0.0011 (0.0020) model time 0.4610 (0.4684) loss 3.4957 (3.0118) grad_norm 1.6967 (1.6959) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][580/625] eta 0:00:21 lr 0.000752 wd 0.0500 time 0.4624 (0.4703) data time 0.0009 (0.0019) model time 0.4615 (0.4683) loss 3.1921 (3.0118) grad_norm 1.2586 (1.6920) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 12:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][590/625] eta 0:00:16 lr 0.000752 wd 0.0500 time 0.4647 (0.4701) data time 0.0008 (0.0019) model time 0.4639 (0.4682) loss 3.2073 (3.0146) grad_norm 2.9049 (1.6968) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [137/300][600/625] eta 0:00:11 lr 0.000751 wd 0.0500 time 0.4616 (0.4700) data time 0.0008 (0.0019) model time 0.4607 (0.4681) loss 3.3801 (3.0182) grad_norm 2.7195 (1.7015) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][610/625] eta 0:00:07 lr 0.000751 wd 0.0500 time 0.4576 (0.4699) data time 0.0007 (0.0019) model time 0.4569 (0.4680) loss 3.4535 (3.0176) grad_norm 1.4591 (1.7003) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [137/300][620/625] eta 0:00:02 lr 0.000751 wd 0.0500 time 0.4583 (0.4698) data time 0.0007 (0.0019) model time 0.4575 (0.4678) loss 2.5342 (3.0173) grad_norm 2.0321 (1.6983) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 137 training takes 0:04:53 [2024-08-10 13:00:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:00:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.5122 (0.5122) Acc@1 88.623 (88.623) Acc@5 98.486 (98.486) Mem 16715MB [2024-08-10 13:00:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.121 (0.163) Loss 0.8623 (0.6453) Acc@1 79.395 (85.525) Acc@5 95.605 (97.390) Mem 16715MB [2024-08-10 13:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 1.0068 (0.7660) Acc@1 74.463 (82.208) Acc@5 93.848 (96.140) Mem 16715MB [2024-08-10 13:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.850 Acc@5 96.113 [2024-08-10 13:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-10 13:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.866 (0.866) Loss 0.4783 (0.4783) Acc@1 89.551 (89.551) Acc@5 98.682 (98.682) Mem 16715MB [2024-08-10 13:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.195) Loss 0.7690 (0.6026) Acc@1 81.836 (86.732) Acc@5 96.631 (97.820) Mem 16715MB [2024-08-10 13:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.8848 (0.7077) Acc@1 78.027 (83.819) Acc@5 95.557 (96.729) Mem 16715MB [2024-08-10 13:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.495 Acc@5 96.753 [2024-08-10 13:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 13:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.50% [2024-08-10 13:00:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:00:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 13:00:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][0/625] eta 0:08:47 lr 0.000751 wd 0.0500 time 0.8447 (0.8447) data time 0.4290 (0.4290) model time 0.0000 (0.0000) loss 2.7287 (2.7287) grad_norm 1.8488 (1.8488) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][10/625] eta 0:05:07 lr 0.000751 wd 0.0500 time 0.4689 (0.4994) data time 0.0009 (0.0400) model time 0.0000 (0.0000) loss 3.2001 (2.8712) grad_norm 1.4439 (1.5365) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][20/625] eta 0:04:51 lr 0.000751 wd 0.0500 time 0.4652 (0.4818) data time 0.0008 (0.0214) model time 0.0000 (0.0000) loss 3.1233 (2.9852) grad_norm 1.3975 (1.5732) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][30/625] eta 0:04:47 lr 0.000751 wd 0.0500 time 0.4609 (0.4827) data time 0.0009 (0.0149) model time 0.0000 (0.0000) loss 2.6359 (3.0193) grad_norm 1.4100 (1.5498) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][40/625] eta 0:04:39 lr 0.000751 wd 0.0500 time 0.4553 (0.4776) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 3.3266 (3.0472) grad_norm 1.2209 (1.5387) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][50/625] eta 0:04:33 lr 0.000751 wd 0.0500 time 0.4653 (0.4750) data time 0.0009 (0.0094) model time 0.0000 (0.0000) loss 2.7710 (3.0054) grad_norm 1.6705 (1.5840) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][60/625] eta 0:04:27 lr 0.000751 wd 0.0500 time 0.4724 (0.4736) data time 0.0010 (0.0081) model time 0.4714 (0.4655) loss 2.4189 (2.9812) 
grad_norm 1.4900 (1.6078) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:00:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][70/625] eta 0:04:22 lr 0.000750 wd 0.0500 time 0.4620 (0.4726) data time 0.0010 (0.0071) model time 0.4610 (0.4653) loss 3.0991 (2.9171) grad_norm 1.4358 (1.6562) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][80/625] eta 0:04:17 lr 0.000750 wd 0.0500 time 0.4613 (0.4717) data time 0.0010 (0.0063) model time 0.4603 (0.4650) loss 3.5201 (2.9218) grad_norm 1.4749 (1.6378) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][90/625] eta 0:04:12 lr 0.000750 wd 0.0500 time 0.4659 (0.4712) data time 0.0011 (0.0057) model time 0.4649 (0.4652) loss 3.3924 (2.9268) grad_norm 1.3444 (1.6242) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][100/625] eta 0:04:06 lr 0.000750 wd 0.0500 time 0.4561 (0.4703) data time 0.0009 (0.0053) model time 0.4552 (0.4644) loss 2.9026 (2.9379) grad_norm 1.3204 (1.6001) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][110/625] eta 0:04:02 lr 0.000750 wd 0.0500 time 0.4627 (0.4711) data time 0.0008 (0.0049) model time 0.4619 (0.4668) loss 2.7510 (2.9303) grad_norm 1.8257 (1.6868) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][120/625] eta 0:03:57 lr 0.000750 wd 0.0500 time 0.4660 (0.4705) data time 0.0008 (0.0046) model time 0.4652 (0.4662) loss 1.9466 (2.9489) grad_norm 1.2570 (1.6810) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][130/625] eta 0:03:53 lr 0.000750 wd 0.0500 time 0.4698 (0.4720) data 
time 0.0008 (0.0043) model time 0.4689 (0.4690) loss 1.9281 (2.9367) grad_norm 1.2913 (1.6754) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][140/625] eta 0:03:48 lr 0.000750 wd 0.0500 time 0.4695 (0.4715) data time 0.0009 (0.0041) model time 0.4686 (0.4685) loss 2.5050 (2.9353) grad_norm 1.6108 (1.6984) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][150/625] eta 0:03:43 lr 0.000750 wd 0.0500 time 0.4624 (0.4711) data time 0.0008 (0.0039) model time 0.4615 (0.4680) loss 2.4107 (2.9361) grad_norm 1.7231 (1.6751) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][160/625] eta 0:03:38 lr 0.000749 wd 0.0500 time 0.4666 (0.4707) data time 0.0010 (0.0037) model time 0.4656 (0.4677) loss 3.2246 (2.9352) grad_norm 1.2021 (1.6732) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][170/625] eta 0:03:33 lr 0.000749 wd 0.0500 time 0.4590 (0.4702) data time 0.0010 (0.0035) model time 0.4580 (0.4671) loss 3.1897 (2.9479) grad_norm 1.8770 (1.6756) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][180/625] eta 0:03:29 lr 0.000749 wd 0.0500 time 0.4644 (0.4698) data time 0.0010 (0.0034) model time 0.4634 (0.4667) loss 3.2689 (2.9544) grad_norm 1.4613 (1.6994) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][190/625] eta 0:03:24 lr 0.000749 wd 0.0500 time 0.4612 (0.4695) data time 0.0011 (0.0033) model time 0.4600 (0.4665) loss 2.8626 (2.9561) grad_norm 2.6074 (1.7087) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[138/300][200/625] eta 0:03:19 lr 0.000749 wd 0.0500 time 0.4709 (0.4694) data time 0.0008 (0.0032) model time 0.4702 (0.4664) loss 3.7778 (2.9704) grad_norm 1.4073 (1.6946) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][210/625] eta 0:03:14 lr 0.000749 wd 0.0500 time 0.4827 (0.4693) data time 0.0010 (0.0031) model time 0.4817 (0.4665) loss 2.6537 (2.9749) grad_norm 1.5153 (1.6935) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][220/625] eta 0:03:10 lr 0.000749 wd 0.0500 time 0.4632 (0.4698) data time 0.0010 (0.0030) model time 0.4622 (0.4672) loss 3.2573 (2.9747) grad_norm 1.3693 (1.6830) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][230/625] eta 0:03:05 lr 0.000749 wd 0.0500 time 0.4732 (0.4696) data time 0.0009 (0.0029) model time 0.4723 (0.4671) loss 3.0220 (2.9748) grad_norm 1.8863 (1.6748) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][240/625] eta 0:03:00 lr 0.000749 wd 0.0500 time 0.4647 (0.4695) data time 0.0009 (0.0028) model time 0.4639 (0.4671) loss 3.6672 (2.9750) grad_norm 2.0555 (1.6654) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][250/625] eta 0:02:56 lr 0.000749 wd 0.0500 time 0.4663 (0.4694) data time 0.0008 (0.0027) model time 0.4655 (0.4669) loss 3.5154 (2.9777) grad_norm 2.4308 (1.6706) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][260/625] eta 0:02:51 lr 0.000748 wd 0.0500 time 0.4591 (0.4693) data time 0.0009 (0.0027) model time 0.4582 (0.4669) loss 2.3576 (2.9732) grad_norm 2.7724 (1.6691) loss_scale 512.0000 (512.0000) mem 16715MB 
[2024-08-10 13:02:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][270/625] eta 0:02:46 lr 0.000748 wd 0.0500 time 0.4687 (0.4693) data time 0.0012 (0.0026) model time 0.4674 (0.4670) loss 2.5592 (2.9714) grad_norm 1.2216 (1.6575) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][280/625] eta 0:02:41 lr 0.000748 wd 0.0500 time 0.4675 (0.4693) data time 0.0008 (0.0026) model time 0.4667 (0.4670) loss 3.4202 (2.9670) grad_norm 1.6601 (1.6544) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][290/625] eta 0:02:37 lr 0.000748 wd 0.0500 time 0.4689 (0.4694) data time 0.0008 (0.0025) model time 0.4681 (0.4671) loss 3.6973 (2.9668) grad_norm 1.7751 (1.6578) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][300/625] eta 0:02:32 lr 0.000748 wd 0.0500 time 0.4624 (0.4693) data time 0.0008 (0.0025) model time 0.4617 (0.4672) loss 2.7738 (2.9673) grad_norm 1.9739 (1.6641) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][310/625] eta 0:02:27 lr 0.000748 wd 0.0500 time 0.4750 (0.4693) data time 0.0010 (0.0024) model time 0.4740 (0.4672) loss 2.7598 (2.9630) grad_norm 1.5703 (1.6570) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][320/625] eta 0:02:23 lr 0.000748 wd 0.0500 time 0.4747 (0.4693) data time 0.0010 (0.0024) model time 0.4737 (0.4672) loss 3.3821 (2.9706) grad_norm 1.5503 (1.7031) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][330/625] eta 0:02:18 lr 0.000748 wd 0.0500 time 0.4672 (0.4693) data time 0.0008 (0.0023) model time 0.4665 (0.4672) loss 2.2563 
(2.9620) grad_norm 1.6627 (1.7100) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][340/625] eta 0:02:13 lr 0.000748 wd 0.0500 time 0.4621 (0.4692) data time 0.0008 (0.0023) model time 0.4613 (0.4671) loss 3.6002 (2.9637) grad_norm 1.8521 (1.7209) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][350/625] eta 0:02:09 lr 0.000748 wd 0.0500 time 0.4671 (0.4697) data time 0.0010 (0.0023) model time 0.4661 (0.4678) loss 3.3766 (2.9627) grad_norm 1.3153 (1.7159) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][360/625] eta 0:02:04 lr 0.000747 wd 0.0500 time 0.6405 (0.4702) data time 0.0011 (0.0022) model time 0.6394 (0.4683) loss 2.1001 (2.9566) grad_norm 1.7504 (1.7140) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][370/625] eta 0:01:59 lr 0.000747 wd 0.0500 time 0.4682 (0.4701) data time 0.0009 (0.0022) model time 0.4673 (0.4683) loss 3.3665 (2.9567) grad_norm 1.7538 (1.7100) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][380/625] eta 0:01:55 lr 0.000747 wd 0.0500 time 0.4615 (0.4700) data time 0.0008 (0.0022) model time 0.4607 (0.4682) loss 2.5912 (2.9541) grad_norm 1.3643 (1.7039) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][390/625] eta 0:01:50 lr 0.000747 wd 0.0500 time 0.4641 (0.4698) data time 0.0008 (0.0021) model time 0.4633 (0.4680) loss 2.5459 (2.9519) grad_norm 1.4600 (1.6996) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][400/625] eta 0:01:45 lr 0.000747 wd 0.0500 time 0.4716 
(0.4697) data time 0.0010 (0.0021) model time 0.4706 (0.4679) loss 2.5833 (2.9517) grad_norm 1.4918 (1.6921) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][410/625] eta 0:01:40 lr 0.000747 wd 0.0500 time 0.4611 (0.4696) data time 0.0011 (0.0021) model time 0.4600 (0.4678) loss 2.9458 (2.9490) grad_norm 1.6666 (1.6930) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][420/625] eta 0:01:36 lr 0.000747 wd 0.0500 time 0.4633 (0.4694) data time 0.0008 (0.0020) model time 0.4625 (0.4676) loss 3.7496 (2.9522) grad_norm 1.7549 (1.6841) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][430/625] eta 0:01:31 lr 0.000747 wd 0.0500 time 0.4656 (0.4695) data time 0.0009 (0.0021) model time 0.4647 (0.4677) loss 2.8720 (2.9557) grad_norm 2.0348 (1.6841) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][440/625] eta 0:01:27 lr 0.000747 wd 0.0500 time 0.4809 (0.4704) data time 0.0008 (0.0020) model time 0.4802 (0.4687) loss 3.4661 (2.9563) grad_norm 2.3783 (1.6837) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:03:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][450/625] eta 0:01:22 lr 0.000746 wd 0.0500 time 0.4668 (0.4703) data time 0.0008 (0.0020) model time 0.4660 (0.4686) loss 3.7982 (2.9626) grad_norm 1.6582 (1.6833) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][460/625] eta 0:01:17 lr 0.000746 wd 0.0500 time 0.4629 (0.4702) data time 0.0008 (0.0020) model time 0.4621 (0.4685) loss 2.4839 (2.9601) grad_norm 1.9673 (1.6798) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [138/300][470/625] eta 0:01:12 lr 0.000746 wd 0.0500 time 0.4715 (0.4701) data time 0.0009 (0.0020) model time 0.4706 (0.4684) loss 3.4645 (2.9617) grad_norm 1.4296 (1.6763) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][480/625] eta 0:01:08 lr 0.000746 wd 0.0500 time 0.4605 (0.4699) data time 0.0008 (0.0020) model time 0.4598 (0.4682) loss 2.9489 (2.9664) grad_norm 1.8358 (1.6758) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][490/625] eta 0:01:03 lr 0.000746 wd 0.0500 time 0.4654 (0.4698) data time 0.0008 (0.0020) model time 0.4645 (0.4681) loss 3.2406 (2.9706) grad_norm 1.3180 (1.6749) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][500/625] eta 0:00:58 lr 0.000746 wd 0.0500 time 0.4635 (0.4701) data time 0.0011 (0.0020) model time 0.4625 (0.4685) loss 2.5156 (2.9693) grad_norm 1.5159 (1.6829) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][510/625] eta 0:00:54 lr 0.000746 wd 0.0500 time 0.4700 (0.4701) data time 0.0008 (0.0019) model time 0.4692 (0.4685) loss 2.5852 (2.9698) grad_norm 1.6020 (1.6857) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 13:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][520/625] eta 0:00:49 lr 0.000746 wd 0.0500 time 0.4623 (0.4702) data time 0.0008 (0.0019) model time 0.4615 (0.4686) loss 2.3538 (2.9698) grad_norm 1.6153 (1.6999) loss_scale 1024.0000 (516.9136) mem 16715MB [2024-08-10 13:04:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][530/625] eta 0:00:44 lr 0.000746 wd 0.0500 time 0.4596 (0.4701) data time 0.0011 (0.0019) model time 0.4585 (0.4685) loss 2.8220 (2.9721) grad_norm 1.0956 (1.6946) loss_scale 1024.0000 (526.4633) mem 
16715MB [2024-08-10 13:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][540/625] eta 0:00:39 lr 0.000746 wd 0.0500 time 0.4669 (0.4701) data time 0.0008 (0.0019) model time 0.4661 (0.4685) loss 3.0350 (2.9725) grad_norm 1.4094 (1.6940) loss_scale 1024.0000 (535.6599) mem 16715MB [2024-08-10 13:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][550/625] eta 0:00:35 lr 0.000745 wd 0.0500 time 0.4122 (0.4702) data time 0.0009 (0.0019) model time 0.4113 (0.4686) loss 2.3428 (2.9718) grad_norm 2.4750 (1.7026) loss_scale 1024.0000 (544.5227) mem 16715MB [2024-08-10 13:04:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][560/625] eta 0:00:30 lr 0.000745 wd 0.0500 time 0.4648 (0.4701) data time 0.0011 (0.0019) model time 0.4637 (0.4685) loss 3.3991 (2.9770) grad_norm 1.5823 (1.7107) loss_scale 1024.0000 (553.0695) mem 16715MB [2024-08-10 13:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][570/625] eta 0:00:25 lr 0.000745 wd 0.0500 time 0.4640 (0.4701) data time 0.0012 (0.0018) model time 0.4628 (0.4685) loss 3.3521 (2.9772) grad_norm 1.7923 (1.7131) loss_scale 1024.0000 (561.3170) mem 16715MB [2024-08-10 13:04:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][580/625] eta 0:00:21 lr 0.000745 wd 0.0500 time 0.4682 (0.4700) data time 0.0010 (0.0018) model time 0.4672 (0.4684) loss 3.4331 (2.9806) grad_norm 1.2177 (1.7094) loss_scale 1024.0000 (569.2806) mem 16715MB [2024-08-10 13:05:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][590/625] eta 0:00:16 lr 0.000745 wd 0.0500 time 0.4731 (0.4704) data time 0.0008 (0.0018) model time 0.4723 (0.4688) loss 2.0242 (2.9816) grad_norm 1.7446 (1.7107) loss_scale 1024.0000 (576.9746) mem 16715MB [2024-08-10 13:05:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][600/625] eta 0:00:11 lr 0.000745 wd 0.0500 time 0.4658 (0.4703) data time 0.0010 (0.0018) model time 0.4647 (0.4687) 
loss 3.6638 (2.9854) grad_norm 1.4084 (1.7050) loss_scale 1024.0000 (584.4126) mem 16715MB [2024-08-10 13:05:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][610/625] eta 0:00:07 lr 0.000745 wd 0.0500 time 0.4601 (0.4702) data time 0.0007 (0.0018) model time 0.4594 (0.4686) loss 2.1793 (2.9858) grad_norm 1.4315 (1.6993) loss_scale 1024.0000 (591.6072) mem 16715MB [2024-08-10 13:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [138/300][620/625] eta 0:00:02 lr 0.000745 wd 0.0500 time 0.4571 (0.4700) data time 0.0005 (0.0018) model time 0.4566 (0.4685) loss 3.0704 (2.9833) grad_norm 1.2256 (1.7165) loss_scale 1024.0000 (598.5700) mem 16715MB [2024-08-10 13:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 138 training takes 0:04:53 [2024-08-10 13:05:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:05:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:05:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.528 (0.528) Loss 0.5054 (0.5054) Acc@1 89.062 (89.062) Acc@5 98.535 (98.535) Mem 16715MB [2024-08-10 13:05:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8774 (0.6485) Acc@1 78.662 (85.152) Acc@5 95.361 (97.461) Mem 16715MB [2024-08-10 13:05:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9624 (0.7770) Acc@1 75.977 (82.024) Acc@5 94.678 (96.105) Mem 16715MB [2024-08-10 13:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.730 Acc@5 96.097 [2024-08-10 13:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-10 13:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.906 (0.906) Loss 0.4771 (0.4771) Acc@1 89.551 (89.551) Acc@5 98.682 (98.682) Mem 16715MB [2024-08-10 13:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.199) Loss 0.7671 (0.6014) Acc@1 81.982 (86.799) Acc@5 96.729 (97.843) Mem 16715MB [2024-08-10 13:05:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 0.8843 (0.7067) Acc@1 78.027 (83.884) Acc@5 95.557 (96.749) Mem 16715MB [2024-08-10 13:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.571 Acc@5 96.765 [2024-08-10 13:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-10 13:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.57% [2024-08-10 13:05:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:05:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 13:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][0/625] eta 0:09:16 lr 0.000745 wd 0.0500 time 0.8899 (0.8899) data time 0.4841 (0.4841) model time 0.0000 (0.0000) loss 2.6689 (2.6689) grad_norm 1.4511 (1.4511) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:05:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][10/625] eta 0:05:19 lr 0.000745 wd 0.0500 time 0.4629 (0.5199) data time 0.0010 (0.0449) model time 0.0000 (0.0000) loss 3.0853 (2.7376) grad_norm 1.6775 (1.6924) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:05:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][20/625] eta 0:04:58 lr 0.000744 wd 0.0500 time 0.4623 (0.4937) data time 0.0009 (0.0241) model time 0.0000 (0.0000) loss 2.7225 (2.7023) grad_norm 2.6111 (1.6421) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:05:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][30/625] eta 0:04:48 lr 0.000744 wd 0.0500 time 0.4670 (0.4843) data time 0.0008 (0.0166) model time 0.0000 (0.0000) loss 3.1636 (2.7674) grad_norm 1.5190 (1.5520) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:05:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][40/625] eta 0:04:40 lr 0.000744 wd 0.0500 time 0.4590 (0.4788) data time 0.0010 (0.0128) model time 0.0000 (0.0000) loss 3.4825 (2.8286) grad_norm 1.9746 (1.8019) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:05:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][50/625] eta 0:04:33 lr 0.000744 wd 0.0500 time 0.4606 (0.4754) data time 0.0010 (0.0105) model time 0.0000 (0.0000) loss 2.7150 (2.8515) grad_norm 1.2464 (1.7415) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:05:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][60/625] eta 0:04:27 lr 0.000744 wd 0.0500 time 0.4670 (0.4736) data time 0.0008 (0.0089) model time 0.4662 (0.4633) loss 3.7416 
(2.8896) grad_norm 1.3584 (1.7304) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][70/625] eta 0:04:23 lr 0.000744 wd 0.0500 time 0.5967 (0.4740) data time 0.0010 (0.0078) model time 0.5957 (0.4692) loss 3.3327 (2.9071) grad_norm 1.8015 (1.7055) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][80/625] eta 0:04:17 lr 0.000744 wd 0.0500 time 0.4644 (0.4718) data time 0.0007 (0.0070) model time 0.4637 (0.4646) loss 2.7365 (2.8986) grad_norm 1.6947 (1.7314) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][90/625] eta 0:04:13 lr 0.000744 wd 0.0500 time 0.4615 (0.4735) data time 0.0008 (0.0063) model time 0.4608 (0.4699) loss 2.0423 (2.9237) grad_norm 1.3966 (1.7495) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][100/625] eta 0:04:08 lr 0.000744 wd 0.0500 time 0.4638 (0.4726) data time 0.0009 (0.0058) model time 0.4629 (0.4687) loss 2.4763 (2.9032) grad_norm 1.3261 (1.7116) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][110/625] eta 0:04:03 lr 0.000744 wd 0.0500 time 0.4736 (0.4721) data time 0.0010 (0.0054) model time 0.4726 (0.4683) loss 1.7321 (2.9026) grad_norm 1.4171 (1.7030) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][120/625] eta 0:03:58 lr 0.000743 wd 0.0500 time 0.4591 (0.4716) data time 0.0012 (0.0050) model time 0.4579 (0.4677) loss 3.3328 (2.9157) grad_norm 1.3404 (1.6923) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][130/625] eta 0:03:53 lr 0.000743 wd 0.0500 
time 0.4622 (0.4723) data time 0.0010 (0.0048) model time 0.4613 (0.4692) loss 2.5504 (2.9385) grad_norm 1.7506 (1.6866) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][140/625] eta 0:03:48 lr 0.000743 wd 0.0500 time 0.4582 (0.4716) data time 0.0008 (0.0045) model time 0.4573 (0.4683) loss 3.1585 (2.9425) grad_norm 1.7022 (1.6733) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][150/625] eta 0:03:43 lr 0.000743 wd 0.0500 time 0.4709 (0.4712) data time 0.0010 (0.0043) model time 0.4699 (0.4680) loss 2.6862 (2.9371) grad_norm 1.8275 (1.6686) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][160/625] eta 0:03:38 lr 0.000743 wd 0.0500 time 0.4677 (0.4709) data time 0.0010 (0.0041) model time 0.4666 (0.4677) loss 2.7831 (2.9397) grad_norm 1.5823 (1.6544) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][170/625] eta 0:03:34 lr 0.000743 wd 0.0500 time 0.4628 (0.4705) data time 0.0008 (0.0039) model time 0.4620 (0.4674) loss 2.5617 (2.9464) grad_norm 1.6564 (1.6472) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][180/625] eta 0:03:29 lr 0.000743 wd 0.0500 time 0.4618 (0.4702) data time 0.0008 (0.0037) model time 0.4610 (0.4671) loss 2.0566 (2.9401) grad_norm 1.3285 (1.6453) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][190/625] eta 0:03:24 lr 0.000743 wd 0.0500 time 0.4682 (0.4699) data time 0.0008 (0.0036) model time 0.4674 (0.4669) loss 2.3452 (2.9449) grad_norm 2.5599 (1.6717) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [139/300][200/625] eta 0:03:19 lr 0.000743 wd 0.0500 time 0.4587 (0.4705) data time 0.0008 (0.0035) model time 0.4578 (0.4677) loss 2.1159 (2.9424) grad_norm 1.3098 (1.6840) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][210/625] eta 0:03:15 lr 0.000742 wd 0.0500 time 0.4691 (0.4701) data time 0.0008 (0.0033) model time 0.4682 (0.4674) loss 3.3813 (2.9514) grad_norm 1.8525 (1.6892) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][220/625] eta 0:03:10 lr 0.000742 wd 0.0500 time 0.4686 (0.4698) data time 0.0008 (0.0032) model time 0.4678 (0.4671) loss 1.9740 (2.9578) grad_norm 2.9689 (1.6887) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][230/625] eta 0:03:05 lr 0.000742 wd 0.0500 time 0.4676 (0.4697) data time 0.0011 (0.0031) model time 0.4665 (0.4670) loss 3.0212 (2.9608) grad_norm 1.8498 (1.7065) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][240/625] eta 0:03:00 lr 0.000742 wd 0.0500 time 0.4674 (0.4696) data time 0.0010 (0.0031) model time 0.4663 (0.4670) loss 2.4286 (2.9579) grad_norm 1.2931 (1.6978) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][250/625] eta 0:02:56 lr 0.000742 wd 0.0500 time 0.4678 (0.4695) data time 0.0010 (0.0030) model time 0.4668 (0.4669) loss 2.5384 (2.9517) grad_norm 3.8683 (1.6982) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][260/625] eta 0:02:51 lr 0.000742 wd 0.0500 time 0.4631 (0.4692) data time 0.0010 (0.0029) model time 0.4621 (0.4667) loss 3.4137 (2.9528) grad_norm 1.6481 
(1.6934) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][270/625] eta 0:02:46 lr 0.000742 wd 0.0500 time 0.4570 (0.4691) data time 0.0012 (0.0028) model time 0.4558 (0.4665) loss 3.0163 (2.9566) grad_norm 1.0070 (1.6901) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][280/625] eta 0:02:41 lr 0.000742 wd 0.0500 time 0.4637 (0.4689) data time 0.0007 (0.0028) model time 0.4630 (0.4664) loss 1.8939 (2.9585) grad_norm 1.5933 (1.6837) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][290/625] eta 0:02:37 lr 0.000742 wd 0.0500 time 0.6966 (0.4695) data time 0.0010 (0.0027) model time 0.6956 (0.4671) loss 3.5922 (2.9534) grad_norm 1.5040 (1.6788) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][300/625] eta 0:02:32 lr 0.000742 wd 0.0500 time 0.4680 (0.4691) data time 0.0010 (0.0026) model time 0.4671 (0.4667) loss 2.3132 (2.9585) grad_norm 1.3820 (1.6704) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][310/625] eta 0:02:27 lr 0.000741 wd 0.0500 time 0.4636 (0.4689) data time 0.0008 (0.0026) model time 0.4628 (0.4666) loss 2.2777 (2.9574) grad_norm 1.6027 (1.6646) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][320/625] eta 0:02:22 lr 0.000741 wd 0.0500 time 0.4629 (0.4688) data time 0.0007 (0.0025) model time 0.4622 (0.4665) loss 3.2235 (2.9619) grad_norm 1.3222 (1.6647) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][330/625] eta 0:02:18 lr 0.000741 wd 0.0500 time 0.4615 (0.4686) data 
time 0.0007 (0.0025) model time 0.4608 (0.4663) loss 2.6029 (2.9566) grad_norm 1.5685 (1.6788) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][340/625] eta 0:02:13 lr 0.000741 wd 0.0500 time 0.4647 (0.4685) data time 0.0008 (0.0024) model time 0.4639 (0.4663) loss 2.1023 (2.9542) grad_norm 1.4041 (1.6752) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][350/625] eta 0:02:08 lr 0.000741 wd 0.0500 time 0.4607 (0.4684) data time 0.0008 (0.0024) model time 0.4599 (0.4661) loss 3.0821 (2.9561) grad_norm 1.6266 (1.6711) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][360/625] eta 0:02:04 lr 0.000741 wd 0.0500 time 0.4597 (0.4682) data time 0.0008 (0.0024) model time 0.4589 (0.4660) loss 3.5428 (2.9560) grad_norm 1.7924 (1.6736) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][370/625] eta 0:01:59 lr 0.000741 wd 0.0500 time 0.4660 (0.4681) data time 0.0010 (0.0023) model time 0.4650 (0.4659) loss 3.3060 (2.9556) grad_norm 1.6100 (1.6824) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][380/625] eta 0:01:54 lr 0.000741 wd 0.0500 time 0.4648 (0.4680) data time 0.0010 (0.0023) model time 0.4638 (0.4659) loss 2.5147 (2.9539) grad_norm 1.3348 (1.6887) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][390/625] eta 0:01:49 lr 0.000741 wd 0.0500 time 0.4642 (0.4680) data time 0.0010 (0.0023) model time 0.4632 (0.4658) loss 2.5996 (2.9527) grad_norm 1.4415 (1.6858) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [139/300][400/625] eta 0:01:45 lr 0.000741 wd 0.0500 time 0.4636 (0.4679) data time 0.0008 (0.0022) model time 0.4628 (0.4658) loss 2.8886 (2.9595) grad_norm 1.4700 (1.6777) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][410/625] eta 0:01:40 lr 0.000740 wd 0.0500 time 0.4661 (0.4678) data time 0.0010 (0.0022) model time 0.4651 (0.4657) loss 3.2426 (2.9630) grad_norm 1.7062 (1.6723) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][420/625] eta 0:01:35 lr 0.000740 wd 0.0500 time 0.4703 (0.4681) data time 0.0008 (0.0022) model time 0.4695 (0.4661) loss 3.2513 (2.9602) grad_norm 1.7013 (1.6725) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][430/625] eta 0:01:31 lr 0.000740 wd 0.0500 time 0.4594 (0.4685) data time 0.0007 (0.0021) model time 0.4587 (0.4666) loss 3.4479 (2.9647) grad_norm 2.0587 (1.6811) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:08:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][440/625] eta 0:01:26 lr 0.000740 wd 0.0500 time 0.4730 (0.4688) data time 0.0008 (0.0021) model time 0.4722 (0.4670) loss 3.5594 (2.9682) grad_norm 1.3496 (1.6882) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][450/625] eta 0:01:22 lr 0.000740 wd 0.0500 time 0.4620 (0.4688) data time 0.0008 (0.0021) model time 0.4612 (0.4669) loss 2.2060 (2.9668) grad_norm 2.6098 (1.6879) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][460/625] eta 0:01:17 lr 0.000740 wd 0.0500 time 0.4587 (0.4686) data time 0.0010 (0.0021) model time 0.4576 (0.4668) loss 2.6012 (2.9618) grad_norm 1.7599 (1.6871) loss_scale 1024.0000 
(1024.0000) mem 16715MB [2024-08-10 13:09:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][470/625] eta 0:01:12 lr 0.000740 wd 0.0500 time 0.4657 (0.4689) data time 0.0007 (0.0020) model time 0.4650 (0.4671) loss 3.2236 (2.9600) grad_norm 1.1278 (1.6869) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][480/625] eta 0:01:07 lr 0.000740 wd 0.0500 time 0.4607 (0.4688) data time 0.0007 (0.0020) model time 0.4599 (0.4670) loss 3.5992 (2.9597) grad_norm 1.4514 (1.6837) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][490/625] eta 0:01:03 lr 0.000740 wd 0.0500 time 0.4641 (0.4687) data time 0.0007 (0.0020) model time 0.4634 (0.4669) loss 3.6322 (2.9606) grad_norm 1.2414 (1.6845) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][500/625] eta 0:00:58 lr 0.000739 wd 0.0500 time 0.4627 (0.4686) data time 0.0008 (0.0020) model time 0.4620 (0.4668) loss 3.4215 (2.9624) grad_norm 1.9694 (1.6922) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][510/625] eta 0:00:53 lr 0.000739 wd 0.0500 time 0.4660 (0.4685) data time 0.0008 (0.0020) model time 0.4652 (0.4667) loss 2.7523 (2.9572) grad_norm 1.3428 (1.7005) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][520/625] eta 0:00:49 lr 0.000739 wd 0.0500 time 0.4785 (0.4685) data time 0.0010 (0.0019) model time 0.4775 (0.4667) loss 2.0518 (2.9519) grad_norm 1.8694 (1.6982) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][530/625] eta 0:00:44 lr 0.000739 wd 0.0500 time 0.4672 (0.4684) data time 0.0010 (0.0019) model 
time 0.4661 (0.4667) loss 2.7859 (2.9484) grad_norm 1.5246 (1.7099) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][540/625] eta 0:00:39 lr 0.000739 wd 0.0500 time 0.4645 (0.4684) data time 0.0008 (0.0019) model time 0.4637 (0.4666) loss 2.8830 (2.9471) grad_norm 1.5068 (1.7119) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][550/625] eta 0:00:35 lr 0.000739 wd 0.0500 time 0.4639 (0.4683) data time 0.0008 (0.0019) model time 0.4631 (0.4666) loss 3.5683 (2.9441) grad_norm 1.3144 (1.7089) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][560/625] eta 0:00:30 lr 0.000739 wd 0.0500 time 0.4659 (0.4682) data time 0.0010 (0.0019) model time 0.4649 (0.4665) loss 3.2620 (2.9444) grad_norm 1.3719 (1.7174) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:09:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][570/625] eta 0:00:25 lr 0.000739 wd 0.0500 time 0.4639 (0.4685) data time 0.0008 (0.0019) model time 0.4631 (0.4668) loss 3.6102 (2.9488) grad_norm 1.7809 (1.7148) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][580/625] eta 0:00:21 lr 0.000739 wd 0.0500 time 0.4587 (0.4684) data time 0.0008 (0.0018) model time 0.4579 (0.4667) loss 2.9574 (2.9536) grad_norm 1.3164 (1.7096) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][590/625] eta 0:00:16 lr 0.000739 wd 0.0500 time 0.4645 (0.4683) data time 0.0009 (0.0018) model time 0.4636 (0.4666) loss 3.0162 (2.9567) grad_norm 1.8672 (1.7080) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][600/625] 
eta 0:00:11 lr 0.000738 wd 0.0500 time 0.4726 (0.4683) data time 0.0009 (0.0018) model time 0.4717 (0.4666) loss 3.3349 (2.9585) grad_norm 1.3261 (1.7063) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][610/625] eta 0:00:07 lr 0.000738 wd 0.0500 time 0.4596 (0.4683) data time 0.0005 (0.0018) model time 0.4591 (0.4666) loss 3.2698 (2.9595) grad_norm 1.7414 (1.7030) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [139/300][620/625] eta 0:00:02 lr 0.000738 wd 0.0500 time 0.4619 (0.4684) data time 0.0007 (0.0018) model time 0.4612 (0.4668) loss 3.0111 (2.9597) grad_norm 1.4255 (1.7184) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 139 training takes 0:04:52 [2024-08-10 13:10:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:10:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:10:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.540 (0.540) Loss 0.5396 (0.5396) Acc@1 88.232 (88.232) Acc@5 98.291 (98.291) Mem 16715MB
[2024-08-10 13:10:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.121 (0.165) Loss 0.8584 (0.6675) Acc@1 79.150 (85.227) Acc@5 95.361 (97.350) Mem 16715MB
[2024-08-10 13:10:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.143) Loss 0.9688 (0.7901) Acc@1 76.367 (82.055) Acc@5 94.238 (96.057) Mem 16715MB
[2024-08-10 13:10:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.838 Acc@5 96.041
[2024-08-10 13:10:27 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8%
[2024-08-10 13:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.953 (0.953) Loss 0.4761 (0.4761) Acc@1 89.551 (89.551) Acc@5 98.682 (98.682) Mem 16715MB
[2024-08-10 13:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.201) Loss 0.7651 (0.6008) Acc@1 81.787 (86.808) Acc@5 96.777 (97.847) Mem 16715MB
[2024-08-10 13:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.161) Loss 0.8828 (0.7061) Acc@1 77.930 (83.891) Acc@5 95.654 (96.766) Mem 16715MB
[2024-08-10 13:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.577 Acc@5 96.779
[2024-08-10 13:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-10 13:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.58%
[2024-08-10 13:10:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 13:10:33 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 13:10:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][0/625] eta 0:08:53 lr 0.000738 wd 0.0500 time 0.8530 (0.8530) data time 0.4427 (0.4427) model time 0.0000 (0.0000) loss 3.4040 (3.4040) grad_norm 1.6591 (1.6591) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][10/625] eta 0:05:07 lr 0.000738 wd 0.0500 time 0.4680 (0.4992) data time 0.0008 (0.0412) model time 0.0000 (0.0000) loss 2.8731 (2.9703) grad_norm 1.3819 (1.7006) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][20/625] eta 0:04:51 lr 0.000738 wd 0.0500 time 0.4576 (0.4821) data time 0.0010 (0.0220) model time 0.0000 (0.0000) loss 2.6745 (2.9761) grad_norm 2.0026 (1.6701) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][30/625] eta 0:04:47 lr 0.000738 wd 0.0500 time 0.4662 (0.4826) data time 0.0010 (0.0152) model time 0.0000 (0.0000) loss 3.3792 (3.0683) grad_norm 1.6616 (1.6554) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][40/625] eta 0:04:40 lr 0.000738 wd 0.0500 time 0.4656 (0.4791) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 2.9718 (2.9952) grad_norm 2.0412 (1.6607) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:10:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][50/625] eta 0:04:36 lr 0.000738 wd 0.0500 time 0.4843 (0.4810) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 3.6318 (3.0052) grad_norm 1.6902 (1.6406) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:11:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][60/625] eta 0:04:30 lr 0.000738 wd 0.0500 time 0.4587 (0.4795) data time 0.0010 (0.0085) model time 0.4578 (0.4691) loss 3.0701 
(3.0213) grad_norm 2.1470 (1.7087) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:11:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][70/625] eta 0:04:25 lr 0.000737 wd 0.0500 time 0.4736 (0.4775) data time 0.0008 (0.0075) model time 0.4728 (0.4667) loss 3.0254 (3.0170) grad_norm 1.2849 (1.7200) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:11:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][80/625] eta 0:04:19 lr 0.000737 wd 0.0500 time 0.4638 (0.4768) data time 0.0008 (0.0071) model time 0.4630 (0.4668) loss 2.5930 (2.9898) grad_norm 1.4108 (1.6874) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][90/625] eta 0:04:14 lr 0.000737 wd 0.0500 time 0.4666 (0.4755) data time 0.0011 (0.0065) model time 0.4654 (0.4661) loss 3.6981 (2.9696) grad_norm 2.0053 (1.7138) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:11:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][100/625] eta 0:04:09 lr 0.000737 wd 0.0500 time 0.4682 (0.4745) data time 0.0008 (0.0059) model time 0.4674 (0.4657) loss 2.4244 (2.9551) grad_norm 2.3695 (1.7717) loss_scale 1024.0000 (1024.0000) mem 16715MB [2024-08-10 13:11:23 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 13:11:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:11:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:13:11 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 13:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 13:13:25 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 13:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-10 13:13:34 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-10 13:13:37 vssm_base_ms_e300] (utils.py 30): INFO resuming model:
[2024-08-10 13:13:39 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema:
[2024-08-10 13:13:39 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 140)
[2024-08-10 13:13:39 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-08-10 13:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][110/625] eta 0:32:09 lr 0.000737 wd 0.0500 time 0.4393 (3.7460) data time 0.0008 (0.1792) model time 0.4384 (3.5668) loss 3.4764 (3.3794) grad_norm 1.8436 (2.0318) loss_scale 1024.0000 (1024.0000) mem 16695MB
[2024-08-10 13:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][120/625] eta 0:14:09 lr 0.000737 wd 0.0500 time 0.4571 (1.6826) data time 0.0009 (0.0678) model time 0.4561 (1.6148) loss 3.2095 (3.2258) grad_norm 1.5645 (1.7877) loss_scale 1024.0000 (1024.0000) mem 16695MB
[2024-08-10 13:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][130/625] eta 0:09:56 lr 0.000737 wd 0.0500 time 0.4402 (1.2050) data time 0.0007 (0.0421) model time 0.4395 (1.1630) loss 2.8809 (3.1843) grad_norm 
1.2889 (1.7102) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][140/625] eta 0:08:04 lr 0.000737 wd 0.0500 time 0.4397 (0.9997) data time 0.0009 (0.0307) model time 0.4388 (0.9691) loss 3.2650 (3.1794) grad_norm 1.7712 (1.7045) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][150/625] eta 0:06:59 lr 0.000737 wd 0.0500 time 0.4486 (0.8833) data time 0.0010 (0.0242) model time 0.4476 (0.8590) loss 2.9060 (3.1628) grad_norm 1.8266 (1.7461) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][160/625] eta 0:06:14 lr 0.000737 wd 0.0500 time 0.4448 (0.8047) data time 0.0007 (0.0201) model time 0.4441 (0.7846) loss 3.5775 (3.1624) grad_norm 1.7639 (1.7263) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][170/625] eta 0:05:41 lr 0.000736 wd 0.0500 time 0.4404 (0.7499) data time 0.0007 (0.0172) model time 0.4397 (0.7328) loss 2.5852 (3.1349) grad_norm 2.4601 (1.7513) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][180/625] eta 0:05:15 lr 0.000736 wd 0.0500 time 0.4410 (0.7098) data time 0.0010 (0.0150) model time 0.4400 (0.6947) loss 3.1992 (3.0999) grad_norm 1.6850 (1.7447) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][190/625] eta 0:04:55 lr 0.000736 wd 0.0500 time 0.4524 (0.6790) data time 0.0007 (0.0134) model time 0.4518 (0.6656) loss 2.4031 (3.0686) grad_norm 1.5122 (1.7164) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][200/625] eta 0:04:38 lr 0.000736 wd 0.0500 time 0.4416 
(0.6542) data time 0.0007 (0.0121) model time 0.4409 (0.6421) loss 3.2101 (3.0709) grad_norm 1.8193 (1.7175) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][210/625] eta 0:04:23 lr 0.000736 wd 0.0500 time 0.4394 (0.6341) data time 0.0010 (0.0110) model time 0.4385 (0.6231) loss 3.2814 (3.0967) grad_norm 1.8390 (1.7131) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][220/625] eta 0:04:10 lr 0.000736 wd 0.0500 time 0.4407 (0.6176) data time 0.0006 (0.0102) model time 0.4401 (0.6074) loss 3.4169 (3.0900) grad_norm 1.2927 (1.6895) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][230/625] eta 0:03:58 lr 0.000736 wd 0.0500 time 0.4421 (0.6037) data time 0.0007 (0.0094) model time 0.4414 (0.5942) loss 2.1181 (3.0804) grad_norm 1.3407 (1.6691) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][240/625] eta 0:03:47 lr 0.000736 wd 0.0500 time 0.4447 (0.5919) data time 0.0008 (0.0088) model time 0.4439 (0.5831) loss 2.2056 (3.0730) grad_norm 1.5762 (1.6571) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][250/625] eta 0:03:38 lr 0.000736 wd 0.0500 time 0.4401 (0.5820) data time 0.0007 (0.0083) model time 0.4394 (0.5737) loss 2.4771 (3.0539) grad_norm 1.5917 (1.7201) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][260/625] eta 0:03:29 lr 0.000735 wd 0.0500 time 0.4410 (0.5730) data time 0.0008 (0.0078) model time 0.4402 (0.5652) loss 3.3005 (3.0508) grad_norm 2.7788 (1.7333) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:17 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [140/300][270/625] eta 0:03:20 lr 0.000735 wd 0.0500 time 0.4427 (0.5650) data time 0.0009 (0.0074) model time 0.4418 (0.5576) loss 2.8336 (3.0488) grad_norm 1.4101 (1.7189) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][280/625] eta 0:03:12 lr 0.000735 wd 0.0500 time 0.4402 (0.5581) data time 0.0006 (0.0070) model time 0.4395 (0.5511) loss 3.3330 (3.0454) grad_norm 1.3113 (1.7051) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][290/625] eta 0:03:05 lr 0.000735 wd 0.0500 time 0.4425 (0.5527) data time 0.0010 (0.0067) model time 0.4416 (0.5460) loss 2.5203 (3.0385) grad_norm 1.4280 (1.7197) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][300/625] eta 0:02:57 lr 0.000735 wd 0.0500 time 0.4449 (0.5471) data time 0.0008 (0.0064) model time 0.4441 (0.5407) loss 2.5691 (3.0323) grad_norm 1.8741 (1.7233) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][310/625] eta 0:02:50 lr 0.000735 wd 0.0500 time 0.4437 (0.5420) data time 0.0007 (0.0061) model time 0.4430 (0.5358) loss 2.1535 (3.0222) grad_norm 1.9205 (1.7284) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][320/625] eta 0:02:43 lr 0.000735 wd 0.0500 time 0.4394 (0.5374) data time 0.0007 (0.0059) model time 0.4387 (0.5315) loss 2.0280 (3.0177) grad_norm 1.7955 (1.7192) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][330/625] eta 0:02:37 lr 0.000735 wd 0.0500 time 0.4416 (0.5331) data time 0.0006 (0.0057) model time 0.4410 (0.5274) loss 2.9340 (3.0236) grad_norm 1.7411 
(1.7155) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][340/625] eta 0:02:30 lr 0.000735 wd 0.0500 time 0.4377 (0.5292) data time 0.0007 (0.0055) model time 0.4370 (0.5238) loss 3.4425 (3.0198) grad_norm 1.5275 (1.7116) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][350/625] eta 0:02:24 lr 0.000735 wd 0.0500 time 0.4423 (0.5257) data time 0.0006 (0.0053) model time 0.4416 (0.5204) loss 1.7164 (3.0143) grad_norm 1.5270 (1.7202) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][360/625] eta 0:02:18 lr 0.000734 wd 0.0500 time 0.4411 (0.5224) data time 0.0008 (0.0051) model time 0.4403 (0.5173) loss 2.5128 (3.0063) grad_norm 1.7302 (1.7256) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][370/625] eta 0:02:12 lr 0.000734 wd 0.0500 time 0.4470 (0.5194) data time 0.0006 (0.0050) model time 0.4464 (0.5145) loss 3.2817 (3.0049) grad_norm 2.1519 (1.7276) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][380/625] eta 0:02:06 lr 0.000734 wd 0.0500 time 0.4393 (0.5166) data time 0.0010 (0.0048) model time 0.4383 (0.5118) loss 3.0204 (3.0069) grad_norm 2.0564 (1.7444) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][390/625] eta 0:02:00 lr 0.000734 wd 0.0500 time 0.4409 (0.5140) data time 0.0009 (0.0047) model time 0.4400 (0.5093) loss 2.5773 (3.0025) grad_norm 1.5503 (1.7343) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][400/625] eta 0:01:55 lr 0.000734 wd 0.0500 time 0.4425 (0.5116) data 
time 0.0006 (0.0045) model time 0.4419 (0.5070) loss 2.2091 (2.9997) grad_norm 1.9919 (1.7472) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][410/625] eta 0:01:49 lr 0.000734 wd 0.0500 time 0.4442 (0.5093) data time 0.0007 (0.0044) model time 0.4435 (0.5049) loss 2.0022 (2.9918) grad_norm 1.7533 (1.7637) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][420/625] eta 0:01:43 lr 0.000734 wd 0.0500 time 0.4433 (0.5072) data time 0.0007 (0.0043) model time 0.4427 (0.5029) loss 3.4845 (2.9991) grad_norm 1.4633 (1.7580) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][430/625] eta 0:01:38 lr 0.000734 wd 0.0500 time 0.4381 (0.5052) data time 0.0008 (0.0042) model time 0.4373 (0.5010) loss 3.2837 (3.0079) grad_norm 1.0532 (1.7602) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][440/625] eta 0:01:33 lr 0.000734 wd 0.0500 time 0.4470 (0.5034) data time 0.0009 (0.0041) model time 0.4461 (0.4993) loss 2.9175 (3.0015) grad_norm 1.6618 (1.7574) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][450/625] eta 0:01:27 lr 0.000733 wd 0.0500 time 0.4428 (0.5017) data time 0.0009 (0.0040) model time 0.4419 (0.4976) loss 3.2622 (2.9998) grad_norm 1.5885 (1.7518) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][460/625] eta 0:01:22 lr 0.000733 wd 0.0500 time 0.4423 (0.5000) data time 0.0007 (0.0039) model time 0.4416 (0.4961) loss 2.0998 (3.0008) grad_norm 1.6152 (1.7467) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [140/300][470/625] eta 0:01:17 lr 0.000733 wd 0.0500 time 0.4421 (0.4990) data time 0.0009 (0.0039) model time 0.4411 (0.4952) loss 3.3140 (3.0007) grad_norm 1.2495 (1.7396) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][480/625] eta 0:01:12 lr 0.000733 wd 0.0500 time 0.4401 (0.4975) data time 0.0006 (0.0038) model time 0.4395 (0.4937) loss 1.8538 (2.9945) grad_norm 1.7053 (1.7397) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][490/625] eta 0:01:06 lr 0.000733 wd 0.0500 time 0.4395 (0.4961) data time 0.0010 (0.0037) model time 0.4385 (0.4924) loss 3.0169 (2.9898) grad_norm 1.3783 (1.7348) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][500/625] eta 0:01:01 lr 0.000733 wd 0.0500 time 0.4419 (0.4947) data time 0.0007 (0.0036) model time 0.4413 (0.4911) loss 3.5444 (2.9934) grad_norm 1.8903 (1.7304) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][510/625] eta 0:00:56 lr 0.000733 wd 0.0500 time 0.4539 (0.4935) data time 0.0006 (0.0036) model time 0.4533 (0.4899) loss 3.1807 (2.9963) grad_norm 1.3447 (1.7384) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][520/625] eta 0:00:51 lr 0.000733 wd 0.0500 time 0.4418 (0.4923) data time 0.0007 (0.0035) model time 0.4411 (0.4888) loss 2.4224 (2.9969) grad_norm 1.4875 (1.7343) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][530/625] eta 0:00:46 lr 0.000733 wd 0.0500 time 0.4423 (0.4911) data time 0.0009 (0.0034) model time 0.4413 (0.4877) loss 3.1141 (2.9949) grad_norm 1.4107 (1.7305) loss_scale 1024.0000 
(1024.0000) mem 16695MB [2024-08-10 13:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][540/625] eta 0:00:41 lr 0.000733 wd 0.0500 time 0.4412 (0.4900) data time 0.0009 (0.0034) model time 0.4404 (0.4866) loss 2.6187 (3.0018) grad_norm 1.4085 (1.7240) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][550/625] eta 0:00:36 lr 0.000732 wd 0.0500 time 0.4450 (0.4889) data time 0.0006 (0.0033) model time 0.4445 (0.4856) loss 3.3305 (3.0034) grad_norm 1.3207 (1.7156) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][560/625] eta 0:00:31 lr 0.000732 wd 0.0500 time 0.4441 (0.4879) data time 0.0006 (0.0033) model time 0.4434 (0.4846) loss 2.6341 (2.9988) grad_norm 1.4515 (1.7100) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][570/625] eta 0:00:26 lr 0.000732 wd 0.0500 time 0.4419 (0.4869) data time 0.0007 (0.0032) model time 0.4413 (0.4837) loss 2.5644 (2.9930) grad_norm 1.6444 (1.7089) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][580/625] eta 0:00:21 lr 0.000732 wd 0.0500 time 0.4394 (0.4860) data time 0.0008 (0.0032) model time 0.4386 (0.4828) loss 2.4758 (2.9895) grad_norm 0.9715 (1.7084) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][590/625] eta 0:00:16 lr 0.000732 wd 0.0500 time 0.4416 (0.4851) data time 0.0009 (0.0031) model time 0.4407 (0.4819) loss 3.1957 (2.9937) grad_norm 1.2938 (1.7084) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][600/625] eta 0:00:12 lr 0.000732 wd 0.0500 time 0.4391 (0.4842) data time 0.0007 (0.0031) model 
time 0.4384 (0.4811) loss 2.7146 (2.9929) grad_norm 1.5341 (1.7058) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][610/625] eta 0:00:07 lr 0.000732 wd 0.0500 time 0.4365 (0.4834) data time 0.0006 (0.0030) model time 0.4359 (0.4803) loss 3.3476 (2.9962) grad_norm 1.1877 (1.7033) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [140/300][620/625] eta 0:00:02 lr 0.000732 wd 0.0500 time 0.4365 (0.4825) data time 0.0007 (0.0030) model time 0.4358 (0.4795) loss 2.5956 (3.0037) grad_norm 1.2559 (1.6964) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 13:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 140 training takes 0:04:10 [2024-08-10 13:17:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:17:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5381 (0.5381) Acc@1 88.184 (88.184) Acc@5 98.242 (98.242) Mem 16695MB [2024-08-10 13:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8408 (0.6605) Acc@1 79.883 (85.241) Acc@5 95.996 (97.470) Mem 16695MB [2024-08-10 13:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9795 (0.7789) Acc@1 75.195 (82.250) Acc@5 94.678 (96.203) Mem 16695MB [2024-08-10 13:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.930 Acc@5 96.179 [2024-08-10 13:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-10 13:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.93% [2024-08-10 13:18:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 13:18:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 13:18:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.4756 (0.4756) Acc@1 89.502 (89.502) Acc@5 98.682 (98.682) Mem 16695MB [2024-08-10 13:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.152) Loss 0.7656 (0.6004) Acc@1 81.787 (86.839) Acc@5 96.777 (97.834) Mem 16695MB [2024-08-10 13:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.8813 (0.7056) Acc@1 78.076 (83.922) Acc@5 95.703 (96.761) Mem 16695MB [2024-08-10 13:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.609 Acc@5 96.769 [2024-08-10 13:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-10 13:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.61% [2024-08-10 13:18:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:18:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 13:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][0/625] eta 0:08:48 lr 0.000732 wd 0.0500 time 0.8450 (0.8450) data time 0.3437 (0.3437) model time 0.0000 (0.0000) loss 2.8625 (2.8625) grad_norm 2.0214 (2.0214) loss_scale 1024.0000 (1024.0000) mem 16704MB [2024-08-10 13:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][10/625] eta 0:04:54 lr 0.000732 wd 0.0500 time 0.4442 (0.4781) data time 0.0008 (0.0321) model time 0.0000 (0.0000) loss 3.1107 (2.7642) grad_norm 1.6465 (1.5548) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][20/625] eta 0:04:39 lr 0.000731 wd 0.0500 time 0.4420 (0.4613) data time 0.0008 (0.0172) model time 0.0000 (0.0000) loss 3.2041 (2.8462) grad_norm 1.4755 (1.5366) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][30/625] eta 0:04:30 lr 0.000731 wd 0.0500 time 0.4426 (0.4549) data time 0.0006 (0.0119) model time 0.0000 (0.0000) loss 3.3352 (2.8900) grad_norm 2.4184 (1.5350) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][40/625] eta 0:04:24 lr 0.000731 wd 0.0500 time 0.4455 (0.4522) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 3.0067 (2.9883) grad_norm 1.8783 (1.5993) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][50/625] eta 0:04:18 lr 0.000731 wd 0.0500 time 0.4370 (0.4503) data time 0.0011 (0.0076) model time 0.0000 (0.0000) loss 3.0613 (2.9981) grad_norm 1.4207 (1.6508) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][60/625] eta 0:04:15 lr 0.000731 wd 0.0500 time 0.6677 (0.4526) data time 0.0006 (0.0065) model time 0.6671 (0.4637) loss 3.2664 
(2.9926) grad_norm 1.7187 (1.6574) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][70/625] eta 0:04:10 lr 0.000731 wd 0.0500 time 0.4525 (0.4513) data time 0.0008 (0.0057) model time 0.4517 (0.4531) loss 3.2534 (2.9885) grad_norm 1.4315 (1.6802) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][80/625] eta 0:04:05 lr 0.000731 wd 0.0500 time 0.4411 (0.4503) data time 0.0008 (0.0051) model time 0.4402 (0.4494) loss 2.1799 (2.9723) grad_norm 1.3743 (1.6647) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:18:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][90/625] eta 0:04:00 lr 0.000731 wd 0.0500 time 0.4444 (0.4496) data time 0.0009 (0.0047) model time 0.4436 (0.4477) loss 2.9252 (2.9801) grad_norm 1.3270 (1.6412) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][100/625] eta 0:03:55 lr 0.000731 wd 0.0500 time 0.4434 (0.4490) data time 0.0006 (0.0043) model time 0.4428 (0.4468) loss 3.3386 (2.9938) grad_norm 1.3911 (1.6212) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][110/625] eta 0:03:50 lr 0.000731 wd 0.0500 time 0.4421 (0.4485) data time 0.0008 (0.0040) model time 0.4413 (0.4461) loss 3.1867 (2.9966) grad_norm 1.8745 (1.6152) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][120/625] eta 0:03:46 lr 0.000730 wd 0.0500 time 0.4512 (0.4482) data time 0.0006 (0.0037) model time 0.4506 (0.4458) loss 1.9968 (2.9750) grad_norm 1.3061 (1.5876) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][130/625] eta 0:03:41 lr 0.000730 wd 0.0500 
time 0.4397 (0.4477) data time 0.0007 (0.0035) model time 0.4390 (0.4451) loss 3.0722 (2.9690) grad_norm 1.4709 (1.5813) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][140/625] eta 0:03:36 lr 0.000730 wd 0.0500 time 0.4399 (0.4474) data time 0.0007 (0.0033) model time 0.4392 (0.4449) loss 1.9089 (2.9408) grad_norm 1.4641 (1.5901) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][150/625] eta 0:03:32 lr 0.000730 wd 0.0500 time 0.4530 (0.4472) data time 0.0006 (0.0032) model time 0.4524 (0.4448) loss 3.4271 (2.9563) grad_norm 1.5095 (1.6084) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][160/625] eta 0:03:27 lr 0.000730 wd 0.0500 time 0.4452 (0.4470) data time 0.0006 (0.0030) model time 0.4446 (0.4445) loss 3.9135 (2.9622) grad_norm 1.9566 (1.6027) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][170/625] eta 0:03:23 lr 0.000730 wd 0.0500 time 0.4429 (0.4467) data time 0.0009 (0.0029) model time 0.4420 (0.4443) loss 2.1537 (2.9488) grad_norm 1.5322 (1.6027) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][180/625] eta 0:03:18 lr 0.000730 wd 0.0500 time 0.4424 (0.4466) data time 0.0009 (0.0028) model time 0.4415 (0.4443) loss 3.6785 (2.9399) grad_norm 2.0639 (1.5982) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][190/625] eta 0:03:14 lr 0.000730 wd 0.0500 time 0.4437 (0.4465) data time 0.0008 (0.0027) model time 0.4429 (0.4442) loss 2.0956 (2.9414) grad_norm 1.8178 (1.6278) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [141/300][200/625] eta 0:03:10 lr 0.000730 wd 0.0500 time 0.4463 (0.4471) data time 0.0006 (0.0026) model time 0.4456 (0.4452) loss 3.5936 (2.9379) grad_norm 1.5824 (1.6277) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][210/625] eta 0:03:05 lr 0.000729 wd 0.0500 time 0.4414 (0.4469) data time 0.0009 (0.0025) model time 0.4406 (0.4449) loss 3.2792 (2.9444) grad_norm 1.3083 (1.6163) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][220/625] eta 0:03:00 lr 0.000729 wd 0.0500 time 0.4500 (0.4467) data time 0.0008 (0.0025) model time 0.4492 (0.4447) loss 3.4401 (2.9618) grad_norm 1.8577 (1.6205) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][230/625] eta 0:02:56 lr 0.000729 wd 0.0500 time 0.4492 (0.4467) data time 0.0008 (0.0024) model time 0.4484 (0.4447) loss 3.1459 (2.9568) grad_norm 3.0578 (1.6446) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][240/625] eta 0:02:51 lr 0.000729 wd 0.0500 time 0.4443 (0.4466) data time 0.0009 (0.0023) model time 0.4433 (0.4447) loss 3.1675 (2.9568) grad_norm 2.0247 (1.6421) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][250/625] eta 0:02:47 lr 0.000729 wd 0.0500 time 0.6051 (0.4471) data time 0.0007 (0.0023) model time 0.6045 (0.4454) loss 3.7308 (2.9616) grad_norm 1.4107 (1.6380) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][260/625] eta 0:02:43 lr 0.000729 wd 0.0500 time 0.4401 (0.4470) data time 0.0007 (0.0022) model time 0.4394 (0.4452) loss 3.3743 (2.9675) grad_norm 1.4593 
(1.6332) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][270/625] eta 0:02:38 lr 0.000729 wd 0.0500 time 0.4401 (0.4468) data time 0.0008 (0.0022) model time 0.4392 (0.4450) loss 2.8068 (2.9670) grad_norm 1.7531 (1.6391) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][280/625] eta 0:02:34 lr 0.000729 wd 0.0500 time 0.4414 (0.4467) data time 0.0008 (0.0022) model time 0.4406 (0.4449) loss 3.8546 (2.9712) grad_norm 1.4738 (1.6335) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][290/625] eta 0:02:29 lr 0.000729 wd 0.0500 time 0.4409 (0.4465) data time 0.0006 (0.0021) model time 0.4402 (0.4448) loss 2.2721 (2.9569) grad_norm 1.6381 (1.6641) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][300/625] eta 0:02:25 lr 0.000729 wd 0.0500 time 0.4449 (0.4464) data time 0.0007 (0.0021) model time 0.4442 (0.4447) loss 3.3950 (2.9611) grad_norm 1.2734 (1.6581) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][310/625] eta 0:02:20 lr 0.000728 wd 0.0500 time 0.4443 (0.4464) data time 0.0010 (0.0020) model time 0.4433 (0.4447) loss 2.4380 (2.9557) grad_norm 1.2491 (1.6523) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][320/625] eta 0:02:16 lr 0.000728 wd 0.0500 time 0.4445 (0.4463) data time 0.0006 (0.0020) model time 0.4439 (0.4446) loss 3.1582 (2.9555) grad_norm 1.4191 (1.6528) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][330/625] eta 0:02:11 lr 0.000728 wd 0.0500 time 0.4423 (0.4462) data 
time 0.0007 (0.0020) model time 0.4417 (0.4446) loss 3.5360 (2.9535) grad_norm 2.5159 (1.6552) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][340/625] eta 0:02:07 lr 0.000728 wd 0.0500 time 0.4437 (0.4462) data time 0.0009 (0.0019) model time 0.4429 (0.4445) loss 3.5134 (2.9570) grad_norm 1.8726 (1.6603) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][350/625] eta 0:02:02 lr 0.000728 wd 0.0500 time 0.4430 (0.4461) data time 0.0006 (0.0019) model time 0.4424 (0.4445) loss 2.8198 (2.9584) grad_norm 1.3742 (1.6584) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:20:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][360/625] eta 0:01:58 lr 0.000728 wd 0.0500 time 0.4438 (0.4460) data time 0.0007 (0.0019) model time 0.4431 (0.4444) loss 3.4701 (2.9547) grad_norm 2.4011 (1.6637) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][370/625] eta 0:01:53 lr 0.000728 wd 0.0500 time 0.4454 (0.4460) data time 0.0008 (0.0019) model time 0.4446 (0.4444) loss 2.2015 (2.9519) grad_norm 1.7703 (1.6631) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][380/625] eta 0:01:49 lr 0.000728 wd 0.0500 time 0.4408 (0.4460) data time 0.0008 (0.0018) model time 0.4400 (0.4444) loss 3.4139 (2.9538) grad_norm 1.1317 (1.6623) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][390/625] eta 0:01:44 lr 0.000728 wd 0.0500 time 0.4425 (0.4463) data time 0.0008 (0.0018) model time 0.4417 (0.4448) loss 2.4946 (2.9542) grad_norm 2.5151 (1.6684) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [141/300][400/625] eta 0:01:40 lr 0.000727 wd 0.0500 time 0.4430 (0.4463) data time 0.0006 (0.0018) model time 0.4424 (0.4448) loss 3.6109 (2.9581) grad_norm 1.7707 (1.6736) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][410/625] eta 0:01:35 lr 0.000727 wd 0.0500 time 0.4439 (0.4462) data time 0.0010 (0.0018) model time 0.4429 (0.4447) loss 3.5871 (2.9664) grad_norm 1.0300 (1.6653) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][420/625] eta 0:01:31 lr 0.000727 wd 0.0500 time 0.4400 (0.4462) data time 0.0006 (0.0017) model time 0.4394 (0.4446) loss 3.2043 (2.9644) grad_norm 1.5811 (1.6635) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][430/625] eta 0:01:26 lr 0.000727 wd 0.0500 time 0.4455 (0.4461) data time 0.0008 (0.0017) model time 0.4446 (0.4446) loss 2.5440 (2.9563) grad_norm 1.8567 (1.6604) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][440/625] eta 0:01:22 lr 0.000727 wd 0.0500 time 0.4426 (0.4460) data time 0.0008 (0.0017) model time 0.4418 (0.4445) loss 3.3854 (2.9564) grad_norm 1.5576 (1.6646) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][450/625] eta 0:01:18 lr 0.000727 wd 0.0500 time 0.4323 (0.4459) data time 0.0009 (0.0017) model time 0.4314 (0.4444) loss 3.1773 (2.9577) grad_norm 1.1690 (1.6650) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][460/625] eta 0:01:13 lr 0.000727 wd 0.0500 time 0.4412 (0.4459) data time 0.0010 (0.0017) model time 0.4402 (0.4444) loss 3.4869 (2.9591) grad_norm 2.0016 (1.6639) loss_scale 1024.0000 
(1024.0000) mem 16699MB [2024-08-10 13:21:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][470/625] eta 0:01:09 lr 0.000727 wd 0.0500 time 0.6652 (0.4463) data time 0.0009 (0.0017) model time 0.6644 (0.4449) loss 3.0803 (2.9590) grad_norm 1.4669 (1.6601) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][480/625] eta 0:01:04 lr 0.000727 wd 0.0500 time 0.4424 (0.4462) data time 0.0008 (0.0017) model time 0.4416 (0.4448) loss 2.5518 (2.9593) grad_norm 1.4289 (1.6744) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:21:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][490/625] eta 0:01:00 lr 0.000727 wd 0.0500 time 0.4399 (0.4461) data time 0.0009 (0.0016) model time 0.4390 (0.4447) loss 3.4670 (2.9654) grad_norm 1.6389 (1.6799) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][500/625] eta 0:00:55 lr 0.000726 wd 0.0500 time 0.4655 (0.4462) data time 0.0006 (0.0016) model time 0.4649 (0.4447) loss 3.1497 (2.9667) grad_norm 1.7428 (1.6829) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][510/625] eta 0:00:51 lr 0.000726 wd 0.0500 time 0.4401 (0.4462) data time 0.0009 (0.0016) model time 0.4393 (0.4447) loss 3.3397 (2.9612) grad_norm 1.8925 (1.6808) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][520/625] eta 0:00:46 lr 0.000726 wd 0.0500 time 0.4399 (0.4461) data time 0.0007 (0.0017) model time 0.4392 (0.4446) loss 3.4192 (2.9673) grad_norm 3.2667 (1.6901) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][530/625] eta 0:00:42 lr 0.000726 wd 0.0500 time 0.4420 (0.4461) data time 0.0009 (0.0017) model 
time 0.4411 (0.4446) loss 2.0982 (2.9691) grad_norm 1.3145 (1.6908) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][540/625] eta 0:00:37 lr 0.000726 wd 0.0500 time 0.4403 (0.4460) data time 0.0006 (0.0017) model time 0.4397 (0.4445) loss 2.4139 (2.9717) grad_norm 1.8121 (1.6910) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][550/625] eta 0:00:33 lr 0.000726 wd 0.0500 time 0.4432 (0.4460) data time 0.0009 (0.0017) model time 0.4423 (0.4445) loss 3.4725 (2.9718) grad_norm 2.0554 (1.6896) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][560/625] eta 0:00:28 lr 0.000726 wd 0.0500 time 0.4416 (0.4459) data time 0.0008 (0.0016) model time 0.4407 (0.4444) loss 2.0556 (2.9717) grad_norm 1.7084 (1.6870) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][570/625] eta 0:00:24 lr 0.000726 wd 0.0500 time 0.4414 (0.4459) data time 0.0010 (0.0016) model time 0.4404 (0.4444) loss 2.9890 (2.9710) grad_norm 1.3827 (1.6933) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][580/625] eta 0:00:20 lr 0.000726 wd 0.0500 time 0.4419 (0.4459) data time 0.0008 (0.0016) model time 0.4412 (0.4444) loss 3.3701 (2.9696) grad_norm 1.4493 (1.6943) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][590/625] eta 0:00:15 lr 0.000726 wd 0.0500 time 0.4422 (0.4458) data time 0.0007 (0.0016) model time 0.4415 (0.4443) loss 3.4833 (2.9687) grad_norm 1.3656 (1.6946) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][600/625] 
eta 0:00:11 lr 0.000725 wd 0.0500 time 0.4444 (0.4458) data time 0.0008 (0.0016) model time 0.4436 (0.4443) loss 2.1413 (2.9704) grad_norm 1.1299 (1.6912) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][610/625] eta 0:00:06 lr 0.000725 wd 0.0500 time 0.4426 (0.4458) data time 0.0004 (0.0016) model time 0.4422 (0.4444) loss 3.3690 (2.9742) grad_norm 2.3196 (1.6888) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [141/300][620/625] eta 0:00:02 lr 0.000725 wd 0.0500 time 0.4371 (0.4460) data time 0.0006 (0.0016) model time 0.4365 (0.4446) loss 2.2153 (2.9709) grad_norm 1.5395 (1.6876) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:22:56 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 141 training takes 0:04:38 [2024-08-10 13:22:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:22:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5732 (0.5732) Acc@1 87.695 (87.695) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-10 13:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8589 (0.6792) Acc@1 79.785 (85.320) Acc@5 95.898 (97.505) Mem 16699MB [2024-08-10 13:23:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9761 (0.7986) Acc@1 77.002 (82.206) Acc@5 94.629 (96.194) Mem 16699MB [2024-08-10 13:23:01 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.850 Acc@5 96.177 [2024-08-10 13:23:01 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-10 13:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.923 (0.923) Loss 0.4744 (0.4744) Acc@1 89.453 (89.453) Acc@5 98.682 (98.682) Mem 16699MB [2024-08-10 13:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.193) Loss 0.7656 (0.5998) Acc@1 81.836 (86.892) Acc@5 96.729 (97.834) Mem 16699MB [2024-08-10 13:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.157) Loss 0.8813 (0.7046) Acc@1 78.174 (83.959) Acc@5 95.605 (96.745) Mem 16699MB [2024-08-10 13:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.641 Acc@5 96.759 [2024-08-10 13:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-10 13:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.64% [2024-08-10 13:23:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:23:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 13:23:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][0/625] eta 0:07:51 lr 0.000725 wd 0.0500 time 0.7545 (0.7545) data time 0.3668 (0.3668) model time 0.0000 (0.0000) loss 2.6563 (2.6563) grad_norm 1.3494 (1.3494) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][10/625] eta 0:04:49 lr 0.000725 wd 0.0500 time 0.4445 (0.4705) data time 0.0006 (0.0342) model time 0.0000 (0.0000) loss 3.2577 (2.9706) grad_norm 1.9645 (1.6184) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:23:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][20/625] eta 0:04:36 lr 0.000725 wd 0.0500 time 0.4366 (0.4575) data time 0.0006 (0.0185) model time 0.0000 (0.0000) loss 3.4221 (2.8717) grad_norm 1.3647 (1.5617) loss_scale 2048.0000 (1267.8095) mem 16699MB [2024-08-10 13:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][30/625] eta 0:04:29 lr 0.000725 wd 0.0500 time 0.4410 (0.4526) data time 0.0010 (0.0128) model time 0.0000 (0.0000) loss 3.2290 (2.9730) grad_norm 1.3287 (1.5210) loss_scale 2048.0000 (1519.4839) mem 16699MB [2024-08-10 13:23:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][40/625] eta 0:04:23 lr 0.000725 wd 0.0500 time 0.4428 (0.4502) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 3.1623 (3.0450) grad_norm 2.9315 (1.5618) loss_scale 2048.0000 (1648.3902) mem 16699MB [2024-08-10 13:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][50/625] eta 0:04:18 lr 0.000725 wd 0.0500 time 0.4418 (0.4489) data time 0.0008 (0.0081) model time 0.0000 (0.0000) loss 2.9342 (3.0434) grad_norm 1.1264 (1.5756) loss_scale 2048.0000 (1726.7451) mem 16699MB [2024-08-10 13:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][60/625] eta 0:04:13 lr 0.000725 wd 0.0500 time 0.4418 (0.4479) data time 0.0006 (0.0070) model time 0.4412 (0.4417) loss 3.1543 
(3.0677) grad_norm 1.8691 (1.5781) loss_scale 2048.0000 (1779.4098) mem 16699MB [2024-08-10 13:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][70/625] eta 0:04:08 lr 0.000724 wd 0.0500 time 0.4429 (0.4472) data time 0.0006 (0.0061) model time 0.4423 (0.4416) loss 2.3337 (3.0661) grad_norm 1.3683 (1.6575) loss_scale 2048.0000 (1817.2394) mem 16699MB [2024-08-10 13:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][80/625] eta 0:04:03 lr 0.000724 wd 0.0500 time 0.4384 (0.4464) data time 0.0006 (0.0055) model time 0.4377 (0.4410) loss 2.1423 (3.0316) grad_norm 3.2619 (1.7062) loss_scale 2048.0000 (1845.7284) mem 16699MB [2024-08-10 13:23:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][90/625] eta 0:03:58 lr 0.000724 wd 0.0500 time 0.4376 (0.4458) data time 0.0007 (0.0050) model time 0.4370 (0.4408) loss 2.3072 (3.0049) grad_norm 1.7466 (1.8031) loss_scale 2048.0000 (1867.9560) mem 16699MB [2024-08-10 13:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][100/625] eta 0:03:53 lr 0.000724 wd 0.0500 time 0.4434 (0.4453) data time 0.0009 (0.0046) model time 0.4425 (0.4407) loss 3.1460 (3.0071) grad_norm 1.6629 (1.8030) loss_scale 2048.0000 (1885.7822) mem 16699MB [2024-08-10 13:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][110/625] eta 0:03:49 lr 0.000724 wd 0.0500 time 0.4394 (0.4451) data time 0.0006 (0.0043) model time 0.4388 (0.4409) loss 3.2275 (2.9918) grad_norm 1.3210 (1.7859) loss_scale 2048.0000 (1900.3964) mem 16699MB [2024-08-10 13:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][120/625] eta 0:03:44 lr 0.000724 wd 0.0500 time 0.4485 (0.4451) data time 0.0006 (0.0040) model time 0.4479 (0.4413) loss 3.3898 (2.9952) grad_norm 1.8501 (1.7654) loss_scale 2048.0000 (1912.5950) mem 16699MB [2024-08-10 13:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][130/625] eta 0:03:40 lr 0.000724 wd 0.0500 
time 0.4403 (0.4450) data time 0.0009 (0.0037) model time 0.4395 (0.4415) loss 3.0828 (3.0014) grad_norm 1.7403 (1.7781) loss_scale 2048.0000 (1922.9313) mem 16699MB [2024-08-10 13:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][140/625] eta 0:03:35 lr 0.000724 wd 0.0500 time 0.4410 (0.4449) data time 0.0007 (0.0035) model time 0.4403 (0.4417) loss 3.3692 (3.0233) grad_norm 1.9742 (1.8095) loss_scale 2048.0000 (1931.8014) mem 16699MB [2024-08-10 13:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][150/625] eta 0:03:31 lr 0.000724 wd 0.0500 time 0.4393 (0.4448) data time 0.0007 (0.0034) model time 0.4386 (0.4417) loss 2.9579 (3.0293) grad_norm 1.7441 (1.8129) loss_scale 2048.0000 (1939.4967) mem 16699MB [2024-08-10 13:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][160/625] eta 0:03:26 lr 0.000723 wd 0.0500 time 0.4431 (0.4447) data time 0.0006 (0.0032) model time 0.4425 (0.4418) loss 2.6733 (3.0315) grad_norm 1.8738 (1.8018) loss_scale 2048.0000 (1946.2360) mem 16699MB [2024-08-10 13:24:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][170/625] eta 0:03:22 lr 0.000723 wd 0.0500 time 0.4402 (0.4445) data time 0.0009 (0.0031) model time 0.4393 (0.4417) loss 2.9928 (3.0148) grad_norm 1.6149 (1.7871) loss_scale 2048.0000 (1952.1871) mem 16699MB [2024-08-10 13:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][180/625] eta 0:03:17 lr 0.000723 wd 0.0500 time 0.4462 (0.4446) data time 0.0009 (0.0030) model time 0.4453 (0.4419) loss 3.1250 (3.0074) grad_norm 2.9106 (1.7857) loss_scale 2048.0000 (1957.4807) mem 16699MB [2024-08-10 13:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][190/625] eta 0:03:14 lr 0.000723 wd 0.0500 time 0.6105 (0.4473) data time 0.0009 (0.0029) model time 0.6097 (0.4458) loss 2.2599 (3.0022) grad_norm 2.0701 (1.7796) loss_scale 2048.0000 (1962.2199) mem 16699MB [2024-08-10 13:24:36 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [142/300][200/625] eta 0:03:10 lr 0.000723 wd 0.0500 time 0.4350 (0.4474) data time 0.0009 (0.0028) model time 0.4341 (0.4460) loss 3.1625 (3.0073) grad_norm 1.6605 (1.7804) loss_scale 2048.0000 (1966.4876) mem 16699MB [2024-08-10 13:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][210/625] eta 0:03:05 lr 0.000723 wd 0.0500 time 0.4436 (0.4474) data time 0.0009 (0.0027) model time 0.4428 (0.4459) loss 3.0907 (3.0173) grad_norm 1.3421 (1.7679) loss_scale 2048.0000 (1970.3507) mem 16699MB [2024-08-10 13:24:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][220/625] eta 0:03:01 lr 0.000723 wd 0.0500 time 0.4429 (0.4473) data time 0.0010 (0.0027) model time 0.4419 (0.4458) loss 3.1611 (3.0240) grad_norm 1.3042 (1.7483) loss_scale 2048.0000 (1973.8643) mem 16699MB [2024-08-10 13:24:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][230/625] eta 0:02:56 lr 0.000723 wd 0.0500 time 0.4386 (0.4471) data time 0.0009 (0.0026) model time 0.4377 (0.4455) loss 2.8526 (3.0325) grad_norm 2.9518 (1.7413) loss_scale 2048.0000 (1977.0736) mem 16699MB [2024-08-10 13:24:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][240/625] eta 0:02:52 lr 0.000723 wd 0.0500 time 0.4488 (0.4470) data time 0.0008 (0.0026) model time 0.4481 (0.4453) loss 2.6678 (3.0295) grad_norm 1.4297 (1.7478) loss_scale 2048.0000 (1980.0166) mem 16699MB [2024-08-10 13:24:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][250/625] eta 0:02:47 lr 0.000723 wd 0.0500 time 0.4639 (0.4470) data time 0.0006 (0.0026) model time 0.4633 (0.4453) loss 2.2311 (3.0266) grad_norm 1.2976 (1.7339) loss_scale 2048.0000 (1982.7251) mem 16699MB [2024-08-10 13:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][260/625] eta 0:02:43 lr 0.000722 wd 0.0500 time 0.4449 (0.4469) data time 0.0008 (0.0026) model time 0.4441 (0.4452) loss 2.7644 (3.0344) grad_norm 1.3701 
(1.7235) loss_scale 2048.0000 (1985.2261) mem 16699MB [2024-08-10 13:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][270/625] eta 0:02:38 lr 0.000722 wd 0.0500 time 0.4393 (0.4472) data time 0.0007 (0.0025) model time 0.4387 (0.4455) loss 2.0860 (3.0336) grad_norm 1.4385 (1.7170) loss_scale 2048.0000 (1987.5424) mem 16699MB [2024-08-10 13:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][280/625] eta 0:02:34 lr 0.000722 wd 0.0500 time 0.4624 (0.4471) data time 0.0007 (0.0025) model time 0.4617 (0.4455) loss 2.6745 (3.0292) grad_norm 1.7364 (1.7104) loss_scale 2048.0000 (1989.6940) mem 16699MB [2024-08-10 13:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][290/625] eta 0:02:29 lr 0.000722 wd 0.0500 time 0.4417 (0.4472) data time 0.0008 (0.0024) model time 0.4409 (0.4456) loss 3.3646 (3.0293) grad_norm 1.1171 (1.7071) loss_scale 2048.0000 (1991.6976) mem 16699MB [2024-08-10 13:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][300/625] eta 0:02:25 lr 0.000722 wd 0.0500 time 0.4417 (0.4471) data time 0.0008 (0.0024) model time 0.4408 (0.4455) loss 3.4294 (3.0306) grad_norm 1.5509 (1.7096) loss_scale 2048.0000 (1993.5681) mem 16699MB [2024-08-10 13:25:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][310/625] eta 0:02:20 lr 0.000722 wd 0.0500 time 0.4382 (0.4471) data time 0.0006 (0.0023) model time 0.4376 (0.4456) loss 3.3366 (3.0305) grad_norm 1.6121 (1.7211) loss_scale 2048.0000 (1995.3183) mem 16699MB [2024-08-10 13:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][320/625] eta 0:02:16 lr 0.000722 wd 0.0500 time 0.4431 (0.4470) data time 0.0009 (0.0023) model time 0.4423 (0.4455) loss 2.0868 (3.0314) grad_norm 1.2932 (1.7189) loss_scale 2048.0000 (1996.9595) mem 16699MB [2024-08-10 13:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][330/625] eta 0:02:11 lr 0.000722 wd 0.0500 time 0.4431 (0.4469) data 
time 0.0009 (0.0022) model time 0.4422 (0.4454) loss 2.1875 (3.0229) grad_norm 1.6127 (1.7181) loss_scale 2048.0000 (1998.5015) mem 16699MB [2024-08-10 13:25:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][340/625] eta 0:02:07 lr 0.000722 wd 0.0500 time 0.4467 (0.4469) data time 0.0009 (0.0022) model time 0.4458 (0.4454) loss 3.0282 (3.0128) grad_norm 1.9110 (1.7154) loss_scale 2048.0000 (1999.9531) mem 16699MB [2024-08-10 13:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][350/625] eta 0:02:02 lr 0.000721 wd 0.0500 time 0.4400 (0.4468) data time 0.0009 (0.0021) model time 0.4391 (0.4453) loss 2.9669 (3.0038) grad_norm 1.3505 (1.7091) loss_scale 2048.0000 (2001.3219) mem 16699MB [2024-08-10 13:25:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][360/625] eta 0:01:58 lr 0.000721 wd 0.0500 time 0.4457 (0.4467) data time 0.0009 (0.0021) model time 0.4449 (0.4452) loss 3.3326 (3.0085) grad_norm 1.3898 (1.7081) loss_scale 2048.0000 (2002.6150) mem 16699MB [2024-08-10 13:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][370/625] eta 0:01:53 lr 0.000721 wd 0.0500 time 0.4421 (0.4466) data time 0.0008 (0.0021) model time 0.4413 (0.4451) loss 2.8272 (3.0101) grad_norm 1.5112 (1.7024) loss_scale 2048.0000 (2003.8383) mem 16699MB [2024-08-10 13:25:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][380/625] eta 0:01:49 lr 0.000721 wd 0.0500 time 0.3903 (0.4467) data time 0.0006 (0.0020) model time 0.3897 (0.4452) loss 3.1853 (3.0152) grad_norm 2.1628 (1.7089) loss_scale 2048.0000 (2004.9974) mem 16699MB [2024-08-10 13:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][390/625] eta 0:01:44 lr 0.000721 wd 0.0500 time 0.4420 (0.4466) data time 0.0008 (0.0020) model time 0.4412 (0.4451) loss 3.2747 (3.0107) grad_norm 1.3881 (1.7095) loss_scale 2048.0000 (2006.0972) mem 16699MB [2024-08-10 13:26:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [142/300][400/625] eta 0:01:40 lr 0.000721 wd 0.0500 time 0.4464 (0.4465) data time 0.0006 (0.0020) model time 0.4458 (0.4450) loss 2.6110 (3.0076) grad_norm 2.0861 (1.7077) loss_scale 2048.0000 (2007.1421) mem 16699MB [2024-08-10 13:26:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][410/625] eta 0:01:36 lr 0.000721 wd 0.0500 time 0.4443 (0.4472) data time 0.0009 (0.0020) model time 0.4434 (0.4459) loss 3.3170 (3.0082) grad_norm 1.4285 (1.7051) loss_scale 2048.0000 (2008.1363) mem 16699MB [2024-08-10 13:26:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][420/625] eta 0:01:31 lr 0.000721 wd 0.0500 time 0.4418 (0.4472) data time 0.0007 (0.0019) model time 0.4411 (0.4458) loss 2.8112 (3.0015) grad_norm 1.8214 (1.7090) loss_scale 2048.0000 (2009.0831) mem 16699MB [2024-08-10 13:26:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][430/625] eta 0:01:27 lr 0.000721 wd 0.0500 time 0.4448 (0.4471) data time 0.0007 (0.0019) model time 0.4441 (0.4457) loss 2.6135 (3.0007) grad_norm 1.7224 (1.7043) loss_scale 2048.0000 (2009.9861) mem 16699MB [2024-08-10 13:26:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][440/625] eta 0:01:22 lr 0.000721 wd 0.0500 time 0.4429 (0.4470) data time 0.0006 (0.0019) model time 0.4423 (0.4457) loss 3.2745 (2.9976) grad_norm 1.0666 (1.7012) loss_scale 2048.0000 (2010.8481) mem 16699MB [2024-08-10 13:26:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][450/625] eta 0:01:18 lr 0.000720 wd 0.0500 time 0.4420 (0.4469) data time 0.0007 (0.0019) model time 0.4413 (0.4456) loss 2.6441 (2.9987) grad_norm 1.1661 (1.6931) loss_scale 2048.0000 (2011.6718) mem 16699MB [2024-08-10 13:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][460/625] eta 0:01:13 lr 0.000720 wd 0.0500 time 0.4403 (0.4468) data time 0.0008 (0.0018) model time 0.4395 (0.4455) loss 2.8199 (3.0022) grad_norm 1.2576 (1.7118) loss_scale 2048.0000 
(2012.4599) mem 16699MB [2024-08-10 13:26:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][470/625] eta 0:01:09 lr 0.000720 wd 0.0500 time 0.4415 (0.4467) data time 0.0009 (0.0018) model time 0.4406 (0.4454) loss 3.1773 (3.0019) grad_norm 1.3761 (1.7078) loss_scale 2048.0000 (2013.2144) mem 16699MB [2024-08-10 13:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][480/625] eta 0:01:04 lr 0.000720 wd 0.0500 time 0.4475 (0.4467) data time 0.0007 (0.0019) model time 0.4467 (0.4453) loss 3.8011 (3.0007) grad_norm 1.4854 (1.7052) loss_scale 2048.0000 (2013.9376) mem 16699MB [2024-08-10 13:26:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][490/625] eta 0:01:00 lr 0.000720 wd 0.0500 time 0.4438 (0.4467) data time 0.0009 (0.0018) model time 0.4429 (0.4454) loss 3.2689 (3.0041) grad_norm 2.1768 (1.7065) loss_scale 2048.0000 (2014.6314) mem 16699MB [2024-08-10 13:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][500/625] eta 0:00:55 lr 0.000720 wd 0.0500 time 0.4403 (0.4468) data time 0.0007 (0.0018) model time 0.4396 (0.4454) loss 2.0092 (2.9968) grad_norm 1.6340 (1.7084) loss_scale 2048.0000 (2015.2974) mem 16699MB [2024-08-10 13:26:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][510/625] eta 0:00:51 lr 0.000720 wd 0.0500 time 0.4384 (0.4467) data time 0.0010 (0.0018) model time 0.4374 (0.4453) loss 2.3964 (2.9874) grad_norm 2.0916 (1.7068) loss_scale 2048.0000 (2015.9374) mem 16699MB [2024-08-10 13:26:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][520/625] eta 0:00:46 lr 0.000720 wd 0.0500 time 0.4407 (0.4466) data time 0.0007 (0.0018) model time 0.4400 (0.4453) loss 3.1966 (2.9845) grad_norm 1.7209 (1.7099) loss_scale 2048.0000 (2016.5528) mem 16699MB [2024-08-10 13:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][530/625] eta 0:00:42 lr 0.000720 wd 0.0500 time 0.4421 (0.4466) data time 0.0007 (0.0018) model 
time 0.4415 (0.4453) loss 3.6029 (2.9856) grad_norm 8.5208 (1.7207) loss_scale 2048.0000 (2017.1450) mem 16699MB [2024-08-10 13:27:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][540/625] eta 0:00:37 lr 0.000720 wd 0.0500 time 0.4420 (0.4467) data time 0.0006 (0.0018) model time 0.4414 (0.4453) loss 3.4645 (2.9871) grad_norm 1.6596 (1.7223) loss_scale 2048.0000 (2017.7153) mem 16699MB [2024-08-10 13:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][550/625] eta 0:00:33 lr 0.000719 wd 0.0500 time 0.6309 (0.4473) data time 0.0006 (0.0018) model time 0.6302 (0.4461) loss 3.5344 (2.9897) grad_norm 1.5005 (1.7247) loss_scale 2048.0000 (2018.2650) mem 16699MB [2024-08-10 13:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][560/625] eta 0:00:29 lr 0.000719 wd 0.0500 time 0.4412 (0.4473) data time 0.0007 (0.0018) model time 0.4405 (0.4461) loss 3.3461 (2.9940) grad_norm 1.7927 (1.7249) loss_scale 2048.0000 (2018.7950) mem 16699MB [2024-08-10 13:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][570/625] eta 0:00:24 lr 0.000719 wd 0.0500 time 0.4476 (0.4475) data time 0.0008 (0.0018) model time 0.4468 (0.4462) loss 3.2717 (2.9935) grad_norm 1.3510 (1.7193) loss_scale 2048.0000 (2019.3065) mem 16699MB [2024-08-10 13:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][580/625] eta 0:00:20 lr 0.000719 wd 0.0500 time 0.4430 (0.4474) data time 0.0006 (0.0017) model time 0.4424 (0.4462) loss 3.3353 (2.9935) grad_norm 1.7341 (1.7294) loss_scale 2048.0000 (2019.8003) mem 16699MB [2024-08-10 13:27:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][590/625] eta 0:00:15 lr 0.000719 wd 0.0500 time 0.4416 (0.4475) data time 0.0009 (0.0017) model time 0.4407 (0.4462) loss 2.9816 (2.9926) grad_norm 1.6693 (1.7434) loss_scale 2048.0000 (2020.2775) mem 16699MB [2024-08-10 13:27:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][600/625] 
eta 0:00:11 lr 0.000719 wd 0.0500 time 0.3880 (0.4477) data time 0.0011 (0.0017) model time 0.3869 (0.4464) loss 2.6044 (2.9921) grad_norm 1.1894 (1.7419) loss_scale 2048.0000 (2020.7388) mem 16699MB [2024-08-10 13:27:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][610/625] eta 0:00:06 lr 0.000719 wd 0.0500 time 0.4446 (0.4477) data time 0.0006 (0.0017) model time 0.4440 (0.4464) loss 2.7754 (2.9886) grad_norm 1.5257 (1.7382) loss_scale 2048.0000 (2021.1849) mem 16699MB [2024-08-10 13:27:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [142/300][620/625] eta 0:00:02 lr 0.000719 wd 0.0500 time 0.4432 (0.4475) data time 0.0006 (0.0017) model time 0.4426 (0.4462) loss 3.0232 (2.9903) grad_norm 1.3165 (1.7359) loss_scale 2048.0000 (2021.6167) mem 16699MB [2024-08-10 13:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 142 training takes 0:04:39 [2024-08-10 13:27:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:27:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:27:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5361 (0.5361) Acc@1 87.793 (87.793) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-10 13:27:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8257 (0.6577) Acc@1 79.688 (85.116) Acc@5 96.143 (97.474) Mem 16699MB [2024-08-10 13:27:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9526 (0.7761) Acc@1 77.734 (82.247) Acc@5 94.141 (96.145) Mem 16699MB [2024-08-10 13:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.928 Acc@5 96.137 [2024-08-10 13:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-10 13:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.772 (0.772) Loss 0.4739 (0.4739) Acc@1 89.404 (89.404) Acc@5 98.682 (98.682) Mem 16699MB [2024-08-10 13:27:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.181) Loss 0.7656 (0.5992) Acc@1 81.787 (86.874) Acc@5 96.631 (97.820) Mem 16699MB [2024-08-10 13:27:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.150) Loss 0.8779 (0.7038) Acc@1 78.223 (83.954) Acc@5 95.703 (96.735) Mem 16699MB [2024-08-10 13:27:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.613 Acc@5 96.757 [2024-08-10 13:27:54 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-10 13:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][0/625] eta 0:12:54 lr 0.000719 wd 0.0500 time 1.2390 (1.2390) data time 0.5500 (0.5500) model time 0.0000 (0.0000) loss 3.0890 (3.0890) grad_norm 2.4004 (2.4004) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][10/625] eta 0:05:18 lr 0.000719 wd 0.0500 time 0.4469 (0.5173) data 
time 0.0006 (0.0509) model time 0.0000 (0.0000) loss 3.5684 (3.1061) grad_norm 1.6043 (1.6383) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][20/625] eta 0:04:51 lr 0.000718 wd 0.0500 time 0.4411 (0.4822) data time 0.0007 (0.0271) model time 0.0000 (0.0000) loss 2.9689 (3.0296) grad_norm 1.1411 (1.5547) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][30/625] eta 0:04:39 lr 0.000718 wd 0.0500 time 0.4413 (0.4703) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 2.4456 (3.0591) grad_norm 1.7509 (1.5477) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][40/625] eta 0:04:31 lr 0.000718 wd 0.0500 time 0.4475 (0.4636) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 2.3775 (3.0202) grad_norm 1.2582 (1.5486) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][50/625] eta 0:04:24 lr 0.000718 wd 0.0500 time 0.4447 (0.4593) data time 0.0008 (0.0117) model time 0.0000 (0.0000) loss 3.1660 (3.0312) grad_norm 1.3501 (1.5544) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][60/625] eta 0:04:17 lr 0.000718 wd 0.0500 time 0.4452 (0.4566) data time 0.0009 (0.0100) model time 0.4443 (0.4421) loss 3.1629 (3.0407) grad_norm 2.0642 (1.6148) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][70/625] eta 0:04:12 lr 0.000718 wd 0.0500 time 0.4440 (0.4547) data time 0.0006 (0.0087) model time 0.4434 (0.4421) loss 2.4360 (3.0187) grad_norm 1.7871 (1.7262) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [143/300][80/625] eta 0:04:07 lr 0.000718 wd 0.0500 time 0.4447 (0.4533) data time 0.0006 (0.0077) model time 0.4440 (0.4422) loss 1.7935 (2.9849) grad_norm 2.1426 (1.7165) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][90/625] eta 0:04:01 lr 0.000718 wd 0.0500 time 0.4453 (0.4522) data time 0.0007 (0.0070) model time 0.4446 (0.4423) loss 3.0921 (2.9799) grad_norm 1.2429 (1.7029) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][100/625] eta 0:03:56 lr 0.000718 wd 0.0500 time 0.4417 (0.4512) data time 0.0010 (0.0064) model time 0.4407 (0.4421) loss 3.2700 (2.9556) grad_norm 2.2656 (1.7148) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][110/625] eta 0:03:52 lr 0.000717 wd 0.0500 time 0.4385 (0.4516) data time 0.0009 (0.0059) model time 0.4377 (0.4443) loss 3.3105 (2.9813) grad_norm 2.3457 (1.7236) loss_scale 2048.0000 (2048.0000) mem 16699MB [2024-08-10 13:28:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][120/625] eta 0:03:48 lr 0.000717 wd 0.0500 time 0.4404 (0.4524) data time 0.0009 (0.0055) model time 0.4395 (0.4465) loss 2.8407 (2.9747) grad_norm 1.8560 (inf) loss_scale 1024.0000 (2005.6860) mem 16699MB [2024-08-10 13:28:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][130/625] eta 0:03:43 lr 0.000717 wd 0.0500 time 0.4460 (0.4516) data time 0.0006 (0.0051) model time 0.4453 (0.4458) loss 3.4199 (2.9716) grad_norm 2.4795 (inf) loss_scale 1024.0000 (1930.7481) mem 16699MB [2024-08-10 13:28:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][140/625] eta 0:03:38 lr 0.000717 wd 0.0500 time 0.4431 (0.4511) data time 0.0010 (0.0048) model time 0.4421 (0.4456) loss 3.4361 (2.9815) grad_norm 2.4822 (inf) loss_scale 1024.0000 (1866.4397) mem 
16699MB [2024-08-10 13:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][150/625] eta 0:03:34 lr 0.000717 wd 0.0500 time 0.4425 (0.4506) data time 0.0006 (0.0046) model time 0.4418 (0.4453) loss 2.8459 (2.9539) grad_norm 2.3497 (inf) loss_scale 1024.0000 (1810.6490) mem 16699MB [2024-08-10 13:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][160/625] eta 0:03:29 lr 0.000717 wd 0.0500 time 0.4418 (0.4502) data time 0.0006 (0.0043) model time 0.4411 (0.4451) loss 2.7500 (2.9488) grad_norm 1.2377 (inf) loss_scale 1024.0000 (1761.7888) mem 16699MB [2024-08-10 13:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][170/625] eta 0:03:24 lr 0.000717 wd 0.0500 time 0.4422 (0.4497) data time 0.0006 (0.0041) model time 0.4416 (0.4448) loss 3.4309 (2.9470) grad_norm 1.9198 (inf) loss_scale 1024.0000 (1718.6433) mem 16699MB [2024-08-10 13:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][180/625] eta 0:03:19 lr 0.000717 wd 0.0500 time 0.4413 (0.4492) data time 0.0008 (0.0039) model time 0.4404 (0.4444) loss 2.7228 (2.9462) grad_norm 1.1254 (inf) loss_scale 1024.0000 (1680.2652) mem 16699MB [2024-08-10 13:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][190/625] eta 0:03:15 lr 0.000717 wd 0.0500 time 0.4405 (0.4488) data time 0.0009 (0.0038) model time 0.4396 (0.4441) loss 2.6986 (2.9386) grad_norm 1.9112 (inf) loss_scale 1024.0000 (1645.9058) mem 16699MB [2024-08-10 13:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][200/625] eta 0:03:10 lr 0.000717 wd 0.0500 time 0.4701 (0.4487) data time 0.0009 (0.0036) model time 0.4692 (0.4442) loss 2.7728 (2.9362) grad_norm 1.4177 (inf) loss_scale 1024.0000 (1614.9652) mem 16699MB [2024-08-10 13:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][210/625] eta 0:03:06 lr 0.000716 wd 0.0500 time 0.4577 (0.4485) data time 0.0009 (0.0035) model time 0.4568 (0.4442) loss 2.1748 
(2.9327) grad_norm 1.1622 (inf) loss_scale 1024.0000 (1586.9573) mem 16699MB [2024-08-10 13:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][220/625] eta 0:03:01 lr 0.000716 wd 0.0500 time 0.4464 (0.4483) data time 0.0006 (0.0034) model time 0.4457 (0.4441) loss 3.4072 (2.9292) grad_norm 1.3365 (inf) loss_scale 1024.0000 (1561.4842) mem 16699MB [2024-08-10 13:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][230/625] eta 0:02:57 lr 0.000716 wd 0.0500 time 0.4416 (0.4481) data time 0.0008 (0.0033) model time 0.4408 (0.4441) loss 3.0809 (2.9275) grad_norm 1.6524 (inf) loss_scale 1024.0000 (1538.2165) mem 16699MB [2024-08-10 13:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][240/625] eta 0:02:52 lr 0.000716 wd 0.0500 time 0.4400 (0.4480) data time 0.0007 (0.0032) model time 0.4393 (0.4441) loss 3.0352 (2.9319) grad_norm 2.0365 (inf) loss_scale 1024.0000 (1516.8797) mem 16699MB [2024-08-10 13:29:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][250/625] eta 0:02:47 lr 0.000716 wd 0.0500 time 0.4437 (0.4478) data time 0.0009 (0.0031) model time 0.4428 (0.4440) loss 2.5762 (2.9312) grad_norm 1.6962 (inf) loss_scale 1024.0000 (1497.2430) mem 16699MB [2024-08-10 13:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][260/625] eta 0:02:43 lr 0.000716 wd 0.0500 time 0.4462 (0.4477) data time 0.0009 (0.0030) model time 0.4453 (0.4440) loss 3.2921 (2.9391) grad_norm 1.2548 (inf) loss_scale 1024.0000 (1479.1111) mem 16699MB [2024-08-10 13:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][270/625] eta 0:02:38 lr 0.000716 wd 0.0500 time 0.4450 (0.4475) data time 0.0008 (0.0029) model time 0.4442 (0.4439) loss 3.2279 (2.9448) grad_norm 1.2273 (inf) loss_scale 1024.0000 (1462.3173) mem 16699MB [2024-08-10 13:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][280/625] eta 0:02:34 lr 0.000716 wd 0.0500 time 0.4530 (0.4475) 
data time 0.0006 (0.0029) model time 0.4524 (0.4440) loss 3.1697 (2.9435) grad_norm 1.3173 (inf) loss_scale 1024.0000 (1446.7189) mem 16699MB [2024-08-10 13:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][290/625] eta 0:02:29 lr 0.000716 wd 0.0500 time 0.4474 (0.4475) data time 0.0009 (0.0028) model time 0.4465 (0.4441) loss 2.9330 (2.9417) grad_norm 3.2065 (inf) loss_scale 1024.0000 (1432.1924) mem 16699MB [2024-08-10 13:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][300/625] eta 0:02:25 lr 0.000715 wd 0.0500 time 0.4453 (0.4477) data time 0.0008 (0.0027) model time 0.4445 (0.4444) loss 3.5706 (2.9470) grad_norm 1.7156 (inf) loss_scale 1024.0000 (1418.6312) mem 16699MB [2024-08-10 13:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][310/625] eta 0:02:21 lr 0.000715 wd 0.0500 time 0.4421 (0.4477) data time 0.0006 (0.0027) model time 0.4415 (0.4444) loss 3.0996 (2.9487) grad_norm 1.7701 (inf) loss_scale 1024.0000 (1405.9421) mem 16699MB [2024-08-10 13:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][320/625] eta 0:02:16 lr 0.000715 wd 0.0500 time 0.4463 (0.4476) data time 0.0009 (0.0027) model time 0.4454 (0.4445) loss 3.0178 (2.9586) grad_norm 1.8481 (inf) loss_scale 1024.0000 (1394.0436) mem 16699MB [2024-08-10 13:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][330/625] eta 0:02:12 lr 0.000715 wd 0.0500 time 0.4417 (0.4475) data time 0.0007 (0.0026) model time 0.4410 (0.4444) loss 3.2186 (2.9581) grad_norm 1.6665 (inf) loss_scale 1024.0000 (1382.8640) mem 16699MB [2024-08-10 13:30:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][340/625] eta 0:02:07 lr 0.000715 wd 0.0500 time 0.4445 (0.4474) data time 0.0009 (0.0026) model time 0.4436 (0.4443) loss 3.3451 (2.9637) grad_norm 1.7580 (inf) loss_scale 1024.0000 (1372.3402) mem 16699MB [2024-08-10 13:30:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[143/300][350/625] eta 0:02:02 lr 0.000715 wd 0.0500 time 0.4436 (0.4472) data time 0.0008 (0.0025) model time 0.4427 (0.4442) loss 2.9481 (2.9608) grad_norm 1.2714 (inf) loss_scale 1024.0000 (1362.4160) mem 16699MB [2024-08-10 13:30:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][360/625] eta 0:01:58 lr 0.000715 wd 0.0500 time 0.4425 (0.4472) data time 0.0006 (0.0025) model time 0.4419 (0.4442) loss 3.4153 (2.9638) grad_norm 1.6361 (inf) loss_scale 1024.0000 (1353.0416) mem 16699MB [2024-08-10 13:30:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][370/625] eta 0:01:54 lr 0.000715 wd 0.0500 time 0.4407 (0.4472) data time 0.0007 (0.0024) model time 0.4400 (0.4443) loss 2.7130 (2.9690) grad_norm 1.5039 (inf) loss_scale 1024.0000 (1344.1725) mem 16699MB [2024-08-10 13:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][380/625] eta 0:01:49 lr 0.000715 wd 0.0500 time 0.4422 (0.4471) data time 0.0007 (0.0024) model time 0.4416 (0.4443) loss 3.2215 (2.9670) grad_norm 1.8927 (inf) loss_scale 1024.0000 (1335.7690) mem 16699MB [2024-08-10 13:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][390/625] eta 0:01:45 lr 0.000715 wd 0.0500 time 0.4390 (0.4471) data time 0.0009 (0.0024) model time 0.4381 (0.4444) loss 3.2272 (2.9681) grad_norm 2.9385 (inf) loss_scale 1024.0000 (1327.7954) mem 16699MB [2024-08-10 13:30:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][400/625] eta 0:01:40 lr 0.000714 wd 0.0500 time 0.4380 (0.4470) data time 0.0007 (0.0023) model time 0.4373 (0.4442) loss 3.5695 (2.9771) grad_norm 1.7049 (inf) loss_scale 1024.0000 (1320.2195) mem 16699MB [2024-08-10 13:30:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][410/625] eta 0:01:36 lr 0.000714 wd 0.0500 time 0.4466 (0.4469) data time 0.0009 (0.0023) model time 0.4457 (0.4442) loss 2.3535 (2.9722) grad_norm 2.4222 (inf) loss_scale 1024.0000 (1313.0122) mem 16699MB [2024-08-10 
13:31:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][420/625] eta 0:01:31 lr 0.000714 wd 0.0500 time 0.4411 (0.4468) data time 0.0007 (0.0023) model time 0.4404 (0.4442) loss 3.4244 (2.9777) grad_norm 1.6177 (inf) loss_scale 1024.0000 (1306.1473) mem 16699MB [2024-08-10 13:31:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][430/625] eta 0:01:27 lr 0.000714 wd 0.0500 time 0.4453 (0.4467) data time 0.0009 (0.0022) model time 0.4444 (0.4441) loss 3.3290 (2.9725) grad_norm 1.6307 (inf) loss_scale 1024.0000 (1299.6009) mem 16699MB [2024-08-10 13:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][440/625] eta 0:01:22 lr 0.000714 wd 0.0500 time 0.4434 (0.4470) data time 0.0007 (0.0022) model time 0.4428 (0.4444) loss 2.1549 (2.9714) grad_norm 2.1591 (inf) loss_scale 1024.0000 (1293.3515) mem 16699MB [2024-08-10 13:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][450/625] eta 0:01:18 lr 0.000714 wd 0.0500 time 0.4430 (0.4474) data time 0.0007 (0.0022) model time 0.4423 (0.4449) loss 2.4632 (2.9680) grad_norm 1.9043 (inf) loss_scale 1024.0000 (1287.3792) mem 16699MB [2024-08-10 13:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][460/625] eta 0:01:13 lr 0.000714 wd 0.0500 time 0.4424 (0.4473) data time 0.0007 (0.0021) model time 0.4417 (0.4448) loss 2.7307 (2.9660) grad_norm 1.0842 (inf) loss_scale 1024.0000 (1281.6659) mem 16699MB [2024-08-10 13:31:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][470/625] eta 0:01:09 lr 0.000714 wd 0.0500 time 0.4433 (0.4472) data time 0.0009 (0.0021) model time 0.4425 (0.4447) loss 2.9694 (2.9621) grad_norm 1.7713 (inf) loss_scale 1024.0000 (1276.1953) mem 16699MB [2024-08-10 13:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][480/625] eta 0:01:04 lr 0.000714 wd 0.0500 time 0.4577 (0.4471) data time 0.0009 (0.0021) model time 0.4569 (0.4447) loss 3.0150 (2.9601) grad_norm 
2.6197 (inf) loss_scale 1024.0000 (1270.9522) mem 16699MB [2024-08-10 13:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][490/625] eta 0:01:00 lr 0.000713 wd 0.0500 time 0.4428 (0.4470) data time 0.0008 (0.0021) model time 0.4419 (0.4446) loss 3.1444 (2.9598) grad_norm 1.8840 (inf) loss_scale 1024.0000 (1265.9226) mem 16699MB [2024-08-10 13:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][500/625] eta 0:00:55 lr 0.000713 wd 0.0500 time 0.4437 (0.4469) data time 0.0008 (0.0020) model time 0.4429 (0.4446) loss 3.1290 (2.9615) grad_norm 1.7814 (inf) loss_scale 1024.0000 (1261.0938) mem 16699MB [2024-08-10 13:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][510/625] eta 0:00:51 lr 0.000713 wd 0.0500 time 0.4477 (0.4468) data time 0.0007 (0.0020) model time 0.4470 (0.4445) loss 3.5784 (2.9641) grad_norm 1.2842 (inf) loss_scale 1024.0000 (1256.4540) mem 16699MB [2024-08-10 13:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][520/625] eta 0:00:46 lr 0.000713 wd 0.0500 time 0.4449 (0.4468) data time 0.0007 (0.0020) model time 0.4442 (0.4445) loss 3.3952 (2.9651) grad_norm 1.5040 (inf) loss_scale 1024.0000 (1251.9923) mem 16699MB [2024-08-10 13:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][530/625] eta 0:00:42 lr 0.000713 wd 0.0500 time 0.4403 (0.4467) data time 0.0008 (0.0020) model time 0.4394 (0.4444) loss 2.9105 (2.9636) grad_norm 2.7522 (inf) loss_scale 1024.0000 (1247.6987) mem 16699MB [2024-08-10 13:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][540/625] eta 0:00:37 lr 0.000713 wd 0.0500 time 0.4418 (0.4466) data time 0.0006 (0.0020) model time 0.4412 (0.4444) loss 2.4842 (2.9604) grad_norm 2.3377 (inf) loss_scale 1024.0000 (1243.5638) mem 16699MB [2024-08-10 13:32:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][550/625] eta 0:00:33 lr 0.000713 wd 0.0500 time 0.4442 (0.4466) data time 0.0007 
(0.0019) model time 0.4435 (0.4443) loss 2.5722 (2.9651) grad_norm 1.2367 (inf) loss_scale 1024.0000 (1239.5789) mem 16699MB [2024-08-10 13:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][560/625] eta 0:00:29 lr 0.000713 wd 0.0500 time 0.4383 (0.4465) data time 0.0010 (0.0019) model time 0.4373 (0.4442) loss 3.4155 (2.9677) grad_norm 1.9282 (inf) loss_scale 1024.0000 (1235.7362) mem 16699MB [2024-08-10 13:32:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][570/625] eta 0:00:24 lr 0.000713 wd 0.0500 time 0.4407 (0.4464) data time 0.0009 (0.0019) model time 0.4398 (0.4442) loss 2.9949 (2.9682) grad_norm 2.2674 (inf) loss_scale 1024.0000 (1232.0280) mem 16699MB [2024-08-10 13:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][580/625] eta 0:00:20 lr 0.000713 wd 0.0500 time 0.4422 (0.4463) data time 0.0007 (0.0019) model time 0.4415 (0.4442) loss 1.8570 (2.9708) grad_norm 1.5328 (inf) loss_scale 1024.0000 (1228.4475) mem 16699MB [2024-08-10 13:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][590/625] eta 0:00:15 lr 0.000712 wd 0.0500 time 0.4419 (0.4463) data time 0.0006 (0.0019) model time 0.4412 (0.4441) loss 3.0724 (2.9692) grad_norm 1.2493 (inf) loss_scale 1024.0000 (1224.9882) mem 16699MB [2024-08-10 13:32:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][600/625] eta 0:00:11 lr 0.000712 wd 0.0500 time 0.4403 (0.4463) data time 0.0009 (0.0019) model time 0.4394 (0.4441) loss 3.0325 (2.9677) grad_norm 2.5959 (inf) loss_scale 1024.0000 (1221.6439) mem 16699MB [2024-08-10 13:32:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][610/625] eta 0:00:06 lr 0.000712 wd 0.0500 time 0.4375 (0.4462) data time 0.0004 (0.0018) model time 0.4370 (0.4440) loss 3.9077 (2.9693) grad_norm 1.6988 (inf) loss_scale 1024.0000 (1218.4092) mem 16699MB [2024-08-10 13:32:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [143/300][620/625] eta 
0:00:02 lr 0.000712 wd 0.0500 time 0.4358 (0.4461) data time 0.0004 (0.0018) model time 0.4354 (0.4439) loss 2.8361 (2.9649) grad_norm 4.9986 (inf) loss_scale 1024.0000 (1215.2786) mem 16699MB [2024-08-10 13:32:33 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 143 training takes 0:04:38 [2024-08-10 13:32:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:32:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 13:32:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5288 (0.5288) Acc@1 88.623 (88.623) Acc@5 98.438 (98.438) Mem 16699MB [2024-08-10 13:32:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8584 (0.6604) Acc@1 79.346 (85.196) Acc@5 95.703 (97.368) Mem 16699MB [2024-08-10 13:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9858 (0.7880) Acc@1 75.684 (81.955) Acc@5 94.385 (96.019) Mem 16699MB [2024-08-10 13:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.712 Acc@5 96.045 [2024-08-10 13:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.7% [2024-08-10 13:32:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.942 (0.942) Loss 0.4739 (0.4739) Acc@1 89.453 (89.453) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-10 13:32:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.194) Loss 0.7642 (0.5988) Acc@1 81.641 (86.883) Acc@5 96.729 (97.847) Mem 16699MB [2024-08-10 13:32:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.156) Loss 0.8760 (0.7032) Acc@1 78.418 (83.987) Acc@5 95.703 (96.742) Mem 16699MB [2024-08-10 13:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.653 
Acc@5 96.767 [2024-08-10 13:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 13:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.65% [2024-08-10 13:32:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:32:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 13:32:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][0/625] eta 0:07:48 lr 0.000712 wd 0.0500 time 0.7502 (0.7502) data time 0.3627 (0.3627) model time 0.0000 (0.0000) loss 3.2895 (3.2895) grad_norm 1.9206 (1.9206) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:32:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][10/625] eta 0:04:49 lr 0.000712 wd 0.0500 time 0.4439 (0.4711) data time 0.0006 (0.0339) model time 0.0000 (0.0000) loss 3.5089 (3.0673) grad_norm 1.5777 (1.7369) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:32:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][20/625] eta 0:04:43 lr 0.000712 wd 0.0500 time 0.6572 (0.4683) data time 0.0007 (0.0182) model time 0.0000 (0.0000) loss 2.1418 (2.9597) grad_norm 1.2602 (1.6579) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:32:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][30/625] eta 0:04:33 lr 0.000712 wd 0.0500 time 0.4441 (0.4605) data time 0.0006 (0.0126) model time 0.0000 (0.0000) loss 1.8154 (2.9003) grad_norm 1.4721 (1.6724) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][40/625] eta 0:04:27 lr 0.000712 wd 0.0500 time 0.4389 (0.4567) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 3.2285 (2.9698) grad_norm 2.4740 (1.6879) loss_scale 
1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][50/625] eta 0:04:21 lr 0.000712 wd 0.0500 time 0.4464 (0.4542) data time 0.0006 (0.0080) model time 0.0000 (0.0000) loss 2.9675 (2.9810) grad_norm 1.5902 (1.6955) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][60/625] eta 0:04:15 lr 0.000711 wd 0.0500 time 0.4484 (0.4525) data time 0.0006 (0.0068) model time 0.4478 (0.4433) loss 3.0785 (3.0433) grad_norm 1.6221 (1.6789) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][70/625] eta 0:04:10 lr 0.000711 wd 0.0500 time 0.4423 (0.4511) data time 0.0007 (0.0060) model time 0.4416 (0.4424) loss 2.0671 (2.9780) grad_norm 1.1034 (1.6481) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][80/625] eta 0:04:05 lr 0.000711 wd 0.0500 time 0.4391 (0.4500) data time 0.0006 (0.0053) model time 0.4385 (0.4419) loss 2.4618 (2.9793) grad_norm 1.8108 (1.6424) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][90/625] eta 0:04:00 lr 0.000711 wd 0.0500 time 0.4429 (0.4491) data time 0.0006 (0.0049) model time 0.4423 (0.4418) loss 3.4608 (2.9995) grad_norm 1.9469 (1.6601) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][100/625] eta 0:03:55 lr 0.000711 wd 0.0500 time 0.4445 (0.4485) data time 0.0008 (0.0045) model time 0.4437 (0.4418) loss 2.4834 (2.9902) grad_norm 2.0482 (1.6651) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][110/625] eta 0:03:50 lr 0.000711 wd 0.0500 time 0.4438 (0.4480) data time 0.0006 (0.0041) 
model time 0.4431 (0.4419) loss 2.6663 (2.9783) grad_norm 1.9457 (1.7162) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][120/625] eta 0:03:46 lr 0.000711 wd 0.0500 time 0.4452 (0.4478) data time 0.0008 (0.0039) model time 0.4444 (0.4422) loss 2.4216 (2.9913) grad_norm 1.7135 (1.7028) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][130/625] eta 0:03:41 lr 0.000711 wd 0.0500 time 0.4415 (0.4474) data time 0.0006 (0.0036) model time 0.4409 (0.4422) loss 3.3595 (2.9981) grad_norm 1.9745 (1.6994) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][140/625] eta 0:03:37 lr 0.000711 wd 0.0500 time 0.3916 (0.4474) data time 0.0009 (0.0034) model time 0.3908 (0.4427) loss 2.9346 (3.0015) grad_norm 1.2965 (1.6846) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][150/625] eta 0:03:32 lr 0.000710 wd 0.0500 time 0.4416 (0.4471) data time 0.0010 (0.0033) model time 0.4406 (0.4426) loss 2.6829 (3.0004) grad_norm 1.7507 (1.6789) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:33:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][160/625] eta 0:03:27 lr 0.000710 wd 0.0500 time 0.4444 (0.4469) data time 0.0008 (0.0031) model time 0.4436 (0.4427) loss 3.3118 (3.0180) grad_norm 1.9980 (1.6852) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][170/625] eta 0:03:23 lr 0.000710 wd 0.0500 time 0.4417 (0.4468) data time 0.0009 (0.0030) model time 0.4409 (0.4428) loss 3.2382 (3.0358) grad_norm 1.4501 (1.6721) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[144/300][180/625] eta 0:03:18 lr 0.000710 wd 0.0500 time 0.4461 (0.4467) data time 0.0008 (0.0029) model time 0.4452 (0.4429) loss 3.0234 (3.0386) grad_norm 1.9488 (1.6926) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][190/625] eta 0:03:14 lr 0.000710 wd 0.0500 time 0.4436 (0.4466) data time 0.0009 (0.0028) model time 0.4427 (0.4429) loss 3.2999 (3.0310) grad_norm 1.1697 (1.6949) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][200/625] eta 0:03:09 lr 0.000710 wd 0.0500 time 0.4433 (0.4464) data time 0.0008 (0.0027) model time 0.4424 (0.4428) loss 3.2407 (3.0248) grad_norm 1.6228 (1.6879) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][210/625] eta 0:03:05 lr 0.000710 wd 0.0500 time 0.6053 (0.4469) data time 0.0010 (0.0026) model time 0.6043 (0.4436) loss 2.7082 (3.0212) grad_norm 1.0367 (1.6783) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][220/625] eta 0:03:00 lr 0.000710 wd 0.0500 time 0.4445 (0.4466) data time 0.0008 (0.0025) model time 0.4436 (0.4435) loss 2.5916 (3.0155) grad_norm 1.3660 (1.6769) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][230/625] eta 0:02:56 lr 0.000710 wd 0.0500 time 0.4417 (0.4465) data time 0.0009 (0.0025) model time 0.4407 (0.4434) loss 3.1254 (3.0169) grad_norm 1.5002 (1.6749) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][240/625] eta 0:02:51 lr 0.000710 wd 0.0500 time 0.4450 (0.4464) data time 0.0006 (0.0024) model time 0.4443 (0.4434) loss 3.1721 (3.0210) grad_norm 1.4028 (1.6851) loss_scale 1024.0000 (1024.0000) mem 
16699MB [2024-08-10 13:34:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][250/625] eta 0:02:47 lr 0.000709 wd 0.0500 time 0.4441 (0.4463) data time 0.0006 (0.0023) model time 0.4435 (0.4433) loss 2.0676 (3.0221) grad_norm 3.1637 (1.7060) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][260/625] eta 0:02:42 lr 0.000709 wd 0.0500 time 0.4422 (0.4462) data time 0.0007 (0.0023) model time 0.4414 (0.4433) loss 3.4891 (3.0192) grad_norm 2.0043 (1.7120) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][270/625] eta 0:02:38 lr 0.000709 wd 0.0500 time 0.4398 (0.4460) data time 0.0009 (0.0022) model time 0.4389 (0.4432) loss 3.1611 (3.0216) grad_norm 1.7456 (1.7051) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][280/625] eta 0:02:33 lr 0.000709 wd 0.0500 time 0.4424 (0.4459) data time 0.0009 (0.0022) model time 0.4415 (0.4431) loss 3.0672 (3.0211) grad_norm 1.4856 (1.6997) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][290/625] eta 0:02:29 lr 0.000709 wd 0.0500 time 0.4555 (0.4458) data time 0.0007 (0.0021) model time 0.4548 (0.4431) loss 3.2529 (3.0231) grad_norm 1.7318 (1.7026) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:34:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][300/625] eta 0:02:24 lr 0.000709 wd 0.0500 time 0.4401 (0.4457) data time 0.0010 (0.0021) model time 0.4391 (0.4431) loss 2.7432 (3.0236) grad_norm 1.1986 (1.7114) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][310/625] eta 0:02:20 lr 0.000709 wd 0.0500 time 0.4459 (0.4456) data time 0.0007 (0.0021) model time 0.4452 
(0.4430) loss 3.3896 (3.0294) grad_norm 1.8510 (1.7197) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][320/625] eta 0:02:15 lr 0.000709 wd 0.0500 time 0.4543 (0.4455) data time 0.0006 (0.0020) model time 0.4537 (0.4430) loss 3.6168 (3.0334) grad_norm 1.4761 (1.7219) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][330/625] eta 0:02:11 lr 0.000709 wd 0.0500 time 0.4441 (0.4454) data time 0.0007 (0.0020) model time 0.4435 (0.4429) loss 2.8160 (3.0329) grad_norm 1.6301 (1.7173) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][340/625] eta 0:02:06 lr 0.000708 wd 0.0500 time 0.4399 (0.4454) data time 0.0009 (0.0020) model time 0.4390 (0.4429) loss 1.9741 (3.0288) grad_norm 1.1664 (1.7153) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][350/625] eta 0:02:02 lr 0.000708 wd 0.0500 time 0.4444 (0.4453) data time 0.0009 (0.0019) model time 0.4435 (0.4428) loss 3.6771 (3.0280) grad_norm 1.2315 (1.7091) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][360/625] eta 0:01:58 lr 0.000708 wd 0.0500 time 0.3894 (0.4457) data time 0.0010 (0.0019) model time 0.3884 (0.4434) loss 3.0820 (3.0204) grad_norm 1.7565 (1.7191) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][370/625] eta 0:01:53 lr 0.000708 wd 0.0500 time 0.4446 (0.4456) data time 0.0006 (0.0019) model time 0.4440 (0.4433) loss 3.2747 (3.0233) grad_norm 1.3127 (1.7108) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][380/625] eta 0:01:49 
lr 0.000708 wd 0.0500 time 0.4409 (0.4455) data time 0.0007 (0.0019) model time 0.4402 (0.4432) loss 3.0742 (3.0262) grad_norm 2.3166 (1.7185) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][390/625] eta 0:01:44 lr 0.000708 wd 0.0500 time 0.4449 (0.4455) data time 0.0008 (0.0018) model time 0.4440 (0.4433) loss 2.9253 (3.0247) grad_norm 1.7205 (1.7241) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:35:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][400/625] eta 0:01:40 lr 0.000708 wd 0.0500 time 0.4463 (0.4454) data time 0.0008 (0.0018) model time 0.4455 (0.4432) loss 2.3881 (3.0190) grad_norm 1.6048 (inf) loss_scale 512.0000 (1011.2319) mem 16699MB [2024-08-10 13:35:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][410/625] eta 0:01:35 lr 0.000708 wd 0.0500 time 0.4415 (0.4454) data time 0.0008 (0.0018) model time 0.4408 (0.4432) loss 3.2134 (3.0207) grad_norm 1.6666 (inf) loss_scale 512.0000 (999.0852) mem 16699MB [2024-08-10 13:35:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][420/625] eta 0:01:31 lr 0.000708 wd 0.0500 time 0.4440 (0.4453) data time 0.0008 (0.0018) model time 0.4432 (0.4432) loss 3.1009 (3.0248) grad_norm 1.6306 (inf) loss_scale 512.0000 (987.5154) mem 16699MB [2024-08-10 13:35:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][430/625] eta 0:01:26 lr 0.000708 wd 0.0500 time 0.6510 (0.4457) data time 0.0008 (0.0018) model time 0.6502 (0.4437) loss 3.6195 (3.0265) grad_norm 1.9259 (inf) loss_scale 512.0000 (976.4826) mem 16699MB [2024-08-10 13:36:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][440/625] eta 0:01:22 lr 0.000707 wd 0.0500 time 0.4413 (0.4457) data time 0.0008 (0.0017) model time 0.4404 (0.4436) loss 3.1841 (3.0277) grad_norm 1.2376 (inf) loss_scale 512.0000 (965.9501) mem 16699MB [2024-08-10 13:36:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [144/300][450/625] eta 0:01:17 lr 0.000707 wd 0.0500 time 0.4440 (0.4456) data time 0.0010 (0.0017) model time 0.4430 (0.4436) loss 2.9677 (3.0235) grad_norm 1.6571 (inf) loss_scale 512.0000 (955.8847) mem 16699MB [2024-08-10 13:36:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][460/625] eta 0:01:13 lr 0.000707 wd 0.0500 time 0.4435 (0.4456) data time 0.0006 (0.0017) model time 0.4429 (0.4436) loss 2.0618 (3.0209) grad_norm 1.3397 (inf) loss_scale 512.0000 (946.2560) mem 16699MB [2024-08-10 13:36:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][470/625] eta 0:01:09 lr 0.000707 wd 0.0500 time 0.4452 (0.4455) data time 0.0009 (0.0017) model time 0.4443 (0.4436) loss 3.1505 (3.0151) grad_norm 2.0723 (inf) loss_scale 512.0000 (937.0361) mem 16699MB [2024-08-10 13:36:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][480/625] eta 0:01:04 lr 0.000707 wd 0.0500 time 0.4497 (0.4455) data time 0.0008 (0.0017) model time 0.4489 (0.4435) loss 3.2646 (3.0132) grad_norm 1.6016 (inf) loss_scale 512.0000 (928.1996) mem 16699MB [2024-08-10 13:36:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][490/625] eta 0:01:00 lr 0.000707 wd 0.0500 time 0.4432 (0.4454) data time 0.0008 (0.0017) model time 0.4424 (0.4435) loss 2.4243 (3.0068) grad_norm 1.4514 (inf) loss_scale 512.0000 (919.7230) mem 16699MB [2024-08-10 13:36:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][500/625] eta 0:00:55 lr 0.000707 wd 0.0500 time 0.4433 (0.4454) data time 0.0010 (0.0016) model time 0.4423 (0.4434) loss 2.9221 (3.0070) grad_norm 1.1854 (inf) loss_scale 512.0000 (911.5848) mem 16699MB [2024-08-10 13:36:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][510/625] eta 0:00:51 lr 0.000707 wd 0.0500 time 0.4425 (0.4456) data time 0.0008 (0.0016) model time 0.4418 (0.4437) loss 2.7093 (3.0048) grad_norm 1.6054 (inf) loss_scale 512.0000 (903.7652) 
mem 16699MB [2024-08-10 13:36:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][520/625] eta 0:00:46 lr 0.000707 wd 0.0500 time 0.4432 (0.4455) data time 0.0006 (0.0016) model time 0.4425 (0.4436) loss 3.5309 (3.0060) grad_norm 1.8017 (inf) loss_scale 512.0000 (896.2457) mem 16699MB [2024-08-10 13:36:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][530/625] eta 0:00:42 lr 0.000706 wd 0.0500 time 0.4434 (0.4454) data time 0.0007 (0.0016) model time 0.4427 (0.4436) loss 3.4954 (3.0037) grad_norm 1.5647 (inf) loss_scale 512.0000 (889.0094) mem 16699MB [2024-08-10 13:36:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][540/625] eta 0:00:37 lr 0.000706 wd 0.0500 time 0.4445 (0.4454) data time 0.0009 (0.0016) model time 0.4436 (0.4435) loss 3.2991 (3.0093) grad_norm 1.6345 (inf) loss_scale 512.0000 (882.0407) mem 16699MB [2024-08-10 13:36:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][550/625] eta 0:00:33 lr 0.000706 wd 0.0500 time 0.4444 (0.4453) data time 0.0007 (0.0016) model time 0.4437 (0.4435) loss 2.8723 (3.0087) grad_norm 4.2319 (inf) loss_scale 512.0000 (875.3249) mem 16699MB [2024-08-10 13:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][560/625] eta 0:00:28 lr 0.000706 wd 0.0500 time 0.4436 (0.4453) data time 0.0007 (0.0016) model time 0.4430 (0.4435) loss 3.7618 (3.0047) grad_norm 1.4903 (inf) loss_scale 512.0000 (868.8485) mem 16699MB [2024-08-10 13:36:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][570/625] eta 0:00:24 lr 0.000706 wd 0.0500 time 0.4438 (0.4453) data time 0.0008 (0.0016) model time 0.4430 (0.4435) loss 3.3615 (3.0035) grad_norm 1.5289 (inf) loss_scale 512.0000 (862.5989) mem 16699MB [2024-08-10 13:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][580/625] eta 0:00:20 lr 0.000706 wd 0.0500 time 0.4415 (0.4456) data time 0.0009 (0.0016) model time 0.4406 (0.4438) loss 3.0785 (3.0068) 
grad_norm 1.6339 (inf) loss_scale 512.0000 (856.5645) mem 16699MB [2024-08-10 13:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][590/625] eta 0:00:15 lr 0.000706 wd 0.0500 time 0.4429 (0.4455) data time 0.0009 (0.0016) model time 0.4419 (0.4438) loss 3.3786 (3.0031) grad_norm 1.2389 (inf) loss_scale 512.0000 (850.7343) mem 16699MB [2024-08-10 13:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][600/625] eta 0:00:11 lr 0.000706 wd 0.0500 time 0.4583 (0.4455) data time 0.0006 (0.0015) model time 0.4577 (0.4438) loss 2.8433 (3.0027) grad_norm 1.3950 (inf) loss_scale 512.0000 (845.0982) mem 16699MB [2024-08-10 13:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][610/625] eta 0:00:06 lr 0.000706 wd 0.0500 time 0.4379 (0.4455) data time 0.0006 (0.0015) model time 0.4373 (0.4437) loss 2.9950 (3.0032) grad_norm 1.1612 (inf) loss_scale 512.0000 (839.6465) mem 16699MB [2024-08-10 13:37:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [144/300][620/625] eta 0:00:02 lr 0.000706 wd 0.0500 time 0.4403 (0.4454) data time 0.0003 (0.0015) model time 0.4400 (0.4436) loss 3.7319 (3.0053) grad_norm 1.1228 (inf) loss_scale 512.0000 (834.3704) mem 16699MB [2024-08-10 13:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 144 training takes 0:04:38 [2024-08-10 13:37:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:37:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5200 (0.5200) Acc@1 88.525 (88.525) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-10 13:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8677 (0.6546) Acc@1 78.857 (85.405) Acc@5 95.264 (97.417) Mem 16699MB [2024-08-10 13:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 1.0312 (0.7796) Acc@1 75.146 (82.227) Acc@5 94.092 (96.096) Mem 16699MB [2024-08-10 13:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.852 Acc@5 96.085 [2024-08-10 13:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-10 13:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.847 (0.847) Loss 0.4746 (0.4746) Acc@1 89.404 (89.404) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 13:37:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.187) Loss 0.7627 (0.5981) Acc@1 81.689 (86.910) Acc@5 96.826 (97.856) Mem 16699MB [2024-08-10 13:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.8740 (0.7026) Acc@1 78.564 (84.008) Acc@5 95.605 (96.763) Mem 16699MB [2024-08-10 13:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.683 Acc@5 96.783 [2024-08-10 13:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 13:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.68% [2024-08-10 13:37:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:37:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 13:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][0/625] eta 0:07:55 lr 0.000705 wd 0.0500 time 0.7605 (0.7605) data time 0.3653 (0.3653) model time 0.0000 (0.0000) loss 3.4202 (3.4202) grad_norm 1.5103 (1.5103) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:37:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][10/625] eta 0:04:50 lr 0.000705 wd 0.0500 time 0.4494 (0.4726) data time 0.0007 (0.0341) model time 0.0000 (0.0000) loss 3.5240 (2.9079) grad_norm 1.6179 (1.6551) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][20/625] eta 0:04:37 lr 0.000705 wd 0.0500 time 0.4401 (0.4581) data time 0.0007 (0.0183) model time 0.0000 (0.0000) loss 1.8110 (2.9426) grad_norm 1.6251 (1.5310) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:37:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][30/625] eta 0:04:29 lr 0.000705 wd 0.0500 time 0.4442 (0.4533) data time 0.0009 (0.0127) model time 0.0000 (0.0000) loss 3.2081 (2.9906) grad_norm 1.5198 (2.0239) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:37:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][40/625] eta 0:04:23 lr 0.000705 wd 0.0500 time 0.4417 (0.4508) data time 0.0026 (0.0099) model time 0.0000 (0.0000) loss 3.1247 (2.9703) grad_norm 1.4082 (1.9880) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:37:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][50/625] eta 0:04:18 lr 0.000705 wd 0.0500 time 0.4408 (0.4493) data time 0.0009 (0.0081) model time 0.0000 (0.0000) loss 2.8778 (2.9370) grad_norm 1.6352 (1.9641) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][60/625] eta 0:04:15 lr 0.000705 wd 0.0500 time 0.4474 (0.4514) data time 0.0008 (0.0070) model time 0.4466 (0.4610) loss 2.5919 (2.9272) 
grad_norm 1.6365 (1.9182) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][70/625] eta 0:04:10 lr 0.000705 wd 0.0500 time 0.4450 (0.4506) data time 0.0010 (0.0061) model time 0.4440 (0.4528) loss 3.2547 (2.9080) grad_norm 1.8368 (1.9157) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][80/625] eta 0:04:05 lr 0.000705 wd 0.0500 time 0.4411 (0.4497) data time 0.0007 (0.0055) model time 0.4405 (0.4492) loss 3.5930 (2.9438) grad_norm 1.7125 (1.8800) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][90/625] eta 0:04:00 lr 0.000705 wd 0.0500 time 0.4487 (0.4491) data time 0.0008 (0.0050) model time 0.4479 (0.4478) loss 2.8039 (2.9352) grad_norm 4.8136 (1.8967) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][100/625] eta 0:03:55 lr 0.000704 wd 0.0500 time 0.4432 (0.4484) data time 0.0006 (0.0046) model time 0.4426 (0.4464) loss 3.2973 (2.9542) grad_norm 1.4819 (1.8996) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][110/625] eta 0:03:50 lr 0.000704 wd 0.0500 time 0.4434 (0.4480) data time 0.0009 (0.0043) model time 0.4425 (0.4458) loss 2.9809 (2.9515) grad_norm 1.4144 (1.8668) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][120/625] eta 0:03:46 lr 0.000704 wd 0.0500 time 0.4444 (0.4477) data time 0.0008 (0.0040) model time 0.4436 (0.4456) loss 2.9720 (2.9264) grad_norm 2.2603 (1.9086) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][130/625] eta 0:03:42 lr 0.000704 wd 0.0500 time 0.4423 (0.4490) data 
time 0.0007 (0.0037) model time 0.4415 (0.4479) loss 1.8225 (2.9351) grad_norm 1.3153 (1.8740) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][140/625] eta 0:03:37 lr 0.000704 wd 0.0500 time 0.4421 (0.4486) data time 0.0007 (0.0035) model time 0.4414 (0.4472) loss 2.3292 (2.9318) grad_norm 2.5588 (1.8634) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][150/625] eta 0:03:32 lr 0.000704 wd 0.0500 time 0.4421 (0.4483) data time 0.0008 (0.0034) model time 0.4413 (0.4468) loss 3.0760 (2.9426) grad_norm 1.6271 (1.8496) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][160/625] eta 0:03:28 lr 0.000704 wd 0.0500 time 0.4407 (0.4478) data time 0.0009 (0.0032) model time 0.4398 (0.4462) loss 3.1886 (2.9456) grad_norm 1.5649 (1.8467) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][170/625] eta 0:03:23 lr 0.000704 wd 0.0500 time 0.4411 (0.4476) data time 0.0009 (0.0031) model time 0.4402 (0.4459) loss 3.2139 (2.9340) grad_norm 1.5657 (1.8368) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][180/625] eta 0:03:19 lr 0.000704 wd 0.0500 time 0.4416 (0.4473) data time 0.0006 (0.0030) model time 0.4409 (0.4456) loss 3.0995 (2.9427) grad_norm 1.9352 (1.8220) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:38:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][190/625] eta 0:03:14 lr 0.000704 wd 0.0500 time 0.4440 (0.4473) data time 0.0007 (0.0030) model time 0.4433 (0.4455) loss 3.3563 (2.9449) grad_norm 1.7985 (1.8238) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[145/300][200/625] eta 0:03:10 lr 0.000703 wd 0.0500 time 0.4478 (0.4472) data time 0.0006 (0.0029) model time 0.4472 (0.4454) loss 3.9011 (2.9534) grad_norm 2.1096 (1.8350) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][210/625] eta 0:03:05 lr 0.000703 wd 0.0500 time 0.4417 (0.4470) data time 0.0008 (0.0028) model time 0.4408 (0.4452) loss 3.1691 (2.9595) grad_norm 1.3126 (1.8294) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][220/625] eta 0:03:00 lr 0.000703 wd 0.0500 time 0.4431 (0.4469) data time 0.0006 (0.0027) model time 0.4425 (0.4451) loss 3.3520 (2.9732) grad_norm 1.6502 (1.8150) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][230/625] eta 0:02:56 lr 0.000703 wd 0.0500 time 0.4468 (0.4468) data time 0.0006 (0.0026) model time 0.4462 (0.4450) loss 3.3472 (2.9665) grad_norm 2.0810 (1.8197) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][240/625] eta 0:02:51 lr 0.000703 wd 0.0500 time 0.4447 (0.4466) data time 0.0006 (0.0026) model time 0.4441 (0.4449) loss 2.2255 (2.9659) grad_norm 2.3532 (1.8120) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][250/625] eta 0:02:47 lr 0.000703 wd 0.0500 time 0.4415 (0.4465) data time 0.0009 (0.0025) model time 0.4406 (0.4447) loss 2.7376 (2.9673) grad_norm 1.3916 (1.7958) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][260/625] eta 0:02:42 lr 0.000703 wd 0.0500 time 0.4467 (0.4465) data time 0.0009 (0.0024) model time 0.4458 (0.4447) loss 3.2913 (2.9632) grad_norm 2.5691 (1.7932) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 13:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][270/625] eta 0:02:38 lr 0.000703 wd 0.0500 time 0.4413 (0.4464) data time 0.0009 (0.0024) model time 0.4404 (0.4447) loss 2.9924 (2.9633) grad_norm 3.1237 (1.8073) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][280/625] eta 0:02:33 lr 0.000703 wd 0.0500 time 0.4426 (0.4464) data time 0.0006 (0.0024) model time 0.4420 (0.4446) loss 2.1852 (2.9692) grad_norm 1.4905 (1.8064) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][290/625] eta 0:02:29 lr 0.000702 wd 0.0500 time 0.4433 (0.4463) data time 0.0007 (0.0023) model time 0.4426 (0.4445) loss 3.7943 (2.9834) grad_norm 1.0701 (1.8103) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][300/625] eta 0:02:25 lr 0.000702 wd 0.0500 time 0.4382 (0.4462) data time 0.0009 (0.0023) model time 0.4373 (0.4444) loss 3.2122 (2.9845) grad_norm 1.9342 (1.8081) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][310/625] eta 0:02:20 lr 0.000702 wd 0.0500 time 0.4424 (0.4461) data time 0.0007 (0.0022) model time 0.4417 (0.4444) loss 2.5989 (2.9764) grad_norm 1.5292 (1.8005) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][320/625] eta 0:02:16 lr 0.000702 wd 0.0500 time 0.4441 (0.4459) data time 0.0006 (0.0022) model time 0.4435 (0.4442) loss 3.3247 (2.9722) grad_norm 1.4684 (1.7902) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:39:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][330/625] eta 0:02:11 lr 0.000702 wd 0.0500 time 0.4436 (0.4459) data time 0.0009 (0.0021) model time 0.4428 (0.4442) loss 3.2593 
(2.9741) grad_norm 1.9096 (1.7927) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][340/625] eta 0:02:07 lr 0.000702 wd 0.0500 time 0.4448 (0.4458) data time 0.0009 (0.0021) model time 0.4439 (0.4441) loss 3.1316 (2.9729) grad_norm 1.5373 (1.7873) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][350/625] eta 0:02:02 lr 0.000702 wd 0.0500 time 0.4408 (0.4458) data time 0.0009 (0.0021) model time 0.4399 (0.4442) loss 2.9590 (2.9729) grad_norm 1.5812 (1.7814) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][360/625] eta 0:01:58 lr 0.000702 wd 0.0500 time 0.4394 (0.4458) data time 0.0008 (0.0020) model time 0.4386 (0.4441) loss 3.1146 (2.9732) grad_norm 1.2010 (1.7713) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][370/625] eta 0:01:53 lr 0.000702 wd 0.0500 time 0.4404 (0.4457) data time 0.0008 (0.0020) model time 0.4396 (0.4440) loss 3.0721 (2.9757) grad_norm 1.3451 (1.7739) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][380/625] eta 0:01:49 lr 0.000702 wd 0.0500 time 0.4365 (0.4456) data time 0.0006 (0.0020) model time 0.4359 (0.4439) loss 2.2226 (2.9693) grad_norm 1.7816 (1.7744) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][390/625] eta 0:01:44 lr 0.000701 wd 0.0500 time 0.4414 (0.4460) data time 0.0008 (0.0020) model time 0.4406 (0.4444) loss 3.4804 (2.9746) grad_norm 1.9290 (1.7748) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][400/625] eta 0:01:40 lr 0.000701 wd 0.0500 time 0.4427 
(0.4459) data time 0.0009 (0.0019) model time 0.4418 (0.4444) loss 2.7975 (2.9737) grad_norm 1.4555 (1.7683) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][410/625] eta 0:01:35 lr 0.000701 wd 0.0500 time 0.4359 (0.4458) data time 0.0007 (0.0019) model time 0.4352 (0.4442) loss 2.5290 (2.9730) grad_norm 1.4577 (1.7631) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][420/625] eta 0:01:31 lr 0.000701 wd 0.0500 time 0.4435 (0.4457) data time 0.0008 (0.0019) model time 0.4427 (0.4442) loss 2.9903 (2.9678) grad_norm 1.4179 (1.7638) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][430/625] eta 0:01:26 lr 0.000701 wd 0.0500 time 0.4411 (0.4456) data time 0.0006 (0.0019) model time 0.4404 (0.4441) loss 2.5928 (2.9665) grad_norm 1.8575 (1.7573) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][440/625] eta 0:01:22 lr 0.000701 wd 0.0500 time 0.4396 (0.4455) data time 0.0009 (0.0018) model time 0.4387 (0.4440) loss 2.9826 (2.9667) grad_norm 1.7675 (1.7560) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][450/625] eta 0:01:17 lr 0.000701 wd 0.0500 time 0.4400 (0.4455) data time 0.0008 (0.0018) model time 0.4392 (0.4440) loss 2.6156 (2.9604) grad_norm 2.1751 (1.7931) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][460/625] eta 0:01:13 lr 0.000701 wd 0.0500 time 0.4427 (0.4461) data time 0.0007 (0.0018) model time 0.4420 (0.4446) loss 3.4654 (2.9592) grad_norm 1.7211 (1.8044) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [145/300][470/625] eta 0:01:09 lr 0.000701 wd 0.0500 time 0.4425 (0.4461) data time 0.0007 (0.0018) model time 0.4417 (0.4447) loss 3.1605 (2.9592) grad_norm 1.5690 (1.8005) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][480/625] eta 0:01:04 lr 0.000700 wd 0.0500 time 0.5012 (0.4462) data time 0.0006 (0.0018) model time 0.5006 (0.4448) loss 2.9798 (2.9587) grad_norm 1.2921 (1.7966) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][490/625] eta 0:01:00 lr 0.000700 wd 0.0500 time 0.4423 (0.4463) data time 0.0006 (0.0018) model time 0.4416 (0.4449) loss 2.7946 (2.9613) grad_norm 1.6078 (1.7980) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][500/625] eta 0:00:55 lr 0.000700 wd 0.0500 time 0.4420 (0.4463) data time 0.0007 (0.0018) model time 0.4414 (0.4449) loss 2.4784 (2.9592) grad_norm 1.1938 (1.7920) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][510/625] eta 0:00:51 lr 0.000700 wd 0.0500 time 0.4410 (0.4462) data time 0.0006 (0.0018) model time 0.4404 (0.4448) loss 2.3001 (2.9601) grad_norm 1.4282 (1.7857) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][520/625] eta 0:00:46 lr 0.000700 wd 0.0500 time 0.4382 (0.4463) data time 0.0006 (0.0018) model time 0.4376 (0.4449) loss 3.0181 (2.9584) grad_norm 1.7678 (1.7858) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][530/625] eta 0:00:42 lr 0.000700 wd 0.0500 time 0.4347 (0.4462) data time 0.0010 (0.0017) model time 0.4337 (0.4448) loss 1.9393 (2.9542) grad_norm 2.0613 (1.7987) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 13:41:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][540/625] eta 0:00:37 lr 0.000700 wd 0.0500 time 0.4435 (0.4462) data time 0.0006 (0.0017) model time 0.4429 (0.4448) loss 2.7185 (2.9583) grad_norm 1.1141 (1.7958) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][550/625] eta 0:00:33 lr 0.000700 wd 0.0500 time 0.4402 (0.4461) data time 0.0008 (0.0017) model time 0.4394 (0.4448) loss 2.7687 (2.9563) grad_norm 1.3977 (1.7944) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][560/625] eta 0:00:28 lr 0.000700 wd 0.0500 time 0.4499 (0.4461) data time 0.0007 (0.0017) model time 0.4492 (0.4447) loss 3.5954 (2.9615) grad_norm 2.0269 (1.8182) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][570/625] eta 0:00:24 lr 0.000700 wd 0.0500 time 0.4455 (0.4461) data time 0.0010 (0.0017) model time 0.4445 (0.4447) loss 3.1689 (2.9626) grad_norm 1.2264 (1.8159) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][580/625] eta 0:00:20 lr 0.000699 wd 0.0500 time 0.4400 (0.4462) data time 0.0007 (0.0017) model time 0.4393 (0.4448) loss 2.4112 (2.9617) grad_norm 1.4682 (1.8161) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][590/625] eta 0:00:15 lr 0.000699 wd 0.0500 time 0.4411 (0.4461) data time 0.0006 (0.0017) model time 0.4404 (0.4448) loss 3.5455 (2.9616) grad_norm 1.4412 (1.8123) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][600/625] eta 0:00:11 lr 0.000699 wd 0.0500 time 0.4430 (0.4461) data time 0.0009 (0.0017) model time 0.4421 (0.4447) loss 2.8361 
(2.9642) grad_norm 1.2420 (1.8067) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][610/625] eta 0:00:06 lr 0.000699 wd 0.0500 time 0.4392 (0.4460) data time 0.0004 (0.0017) model time 0.4387 (0.4447) loss 2.9585 (2.9666) grad_norm 1.0065 (1.8015) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [145/300][620/625] eta 0:00:02 lr 0.000699 wd 0.0500 time 0.4409 (0.4459) data time 0.0006 (0.0016) model time 0.4402 (0.4446) loss 3.3638 (2.9658) grad_norm 1.8720 (1.7946) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 145 training takes 0:04:38 [2024-08-10 13:42:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:42:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.5137 (0.5137) Acc@1 88.867 (88.867) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 13:42:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.155) Loss 0.9077 (0.6665) Acc@1 77.734 (85.249) Acc@5 95.117 (97.461) Mem 16699MB [2024-08-10 13:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.137) Loss 0.9727 (0.7876) Acc@1 77.295 (82.180) Acc@5 94.482 (96.184) Mem 16699MB [2024-08-10 13:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.932 Acc@5 96.157 [2024-08-10 13:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-10 13:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.93% [2024-08-10 13:42:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 13:42:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 13:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.503 (0.503) Loss 0.4739 (0.4739) Acc@1 89.502 (89.502) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 13:42:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.154) Loss 0.7637 (0.5977) Acc@1 81.787 (86.905) Acc@5 96.680 (97.838) Mem 16699MB [2024-08-10 13:42:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.8745 (0.7023) Acc@1 78.369 (83.975) Acc@5 95.703 (96.756) Mem 16699MB [2024-08-10 13:42:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.669 Acc@5 96.777 [2024-08-10 13:42:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 13:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][0/625] eta 0:13:06 lr 0.000699 wd 0.0500 time 1.2586 (1.2586) data time 0.6120 (0.6120) model time 0.0000 (0.0000) loss 3.0480 (3.0480) grad_norm 2.2690 (2.2690) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][10/625] eta 0:05:19 lr 0.000699 wd 0.0500 time 0.4459 (0.5188) data time 0.0008 (0.0564) model time 0.0000 (0.0000) loss 3.4600 (3.0835) grad_norm 2.0225 (1.7477) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][20/625] eta 0:04:51 lr 0.000699 wd 0.0500 time 0.4391 (0.4824) data time 0.0007 (0.0300) model time 0.0000 (0.0000) loss 3.7460 (3.0823) grad_norm 1.4533 (1.7255) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][30/625] eta 0:04:39 lr 0.000699 wd 0.0500 time 0.4463 (0.4700) data time 0.0007 (0.0206) model time 0.0000 (0.0000) loss 2.9141 (3.0618) grad_norm 1.5165 (1.6763) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [146/300][40/625] eta 0:04:30 lr 0.000699 wd 0.0500 time 0.4418 (0.4632) data time 0.0007 (0.0158) model time 0.0000 (0.0000) loss 2.9179 (3.0448) grad_norm 1.5974 (1.6137) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][50/625] eta 0:04:26 lr 0.000698 wd 0.0500 time 0.4444 (0.4634) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 2.0574 (3.0376) grad_norm 2.7931 (1.6418) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][60/625] eta 0:04:19 lr 0.000698 wd 0.0500 time 0.4437 (0.4601) data time 0.0007 (0.0110) model time 0.4429 (0.4425) loss 2.4912 (3.0095) grad_norm 2.6452 (1.6405) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][70/625] eta 0:04:14 lr 0.000698 wd 0.0500 time 0.4405 (0.4579) data time 0.0008 (0.0095) model time 0.4396 (0.4432) loss 3.2990 (3.0137) grad_norm 2.9463 (1.6697) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:42:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][80/625] eta 0:04:08 lr 0.000698 wd 0.0500 time 0.4435 (0.4560) data time 0.0009 (0.0085) model time 0.4426 (0.4425) loss 2.7546 (3.0111) grad_norm 2.4670 (1.7724) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][90/625] eta 0:04:03 lr 0.000698 wd 0.0500 time 0.4405 (0.4544) data time 0.0009 (0.0076) model time 0.4396 (0.4419) loss 2.3755 (3.0039) grad_norm 1.4020 (1.7391) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][100/625] eta 0:03:58 lr 0.000698 wd 0.0500 time 0.3918 (0.4545) data time 0.0010 (0.0070) model time 0.3909 (0.4445) loss 2.0284 (3.0218) grad_norm 2.2887 (1.7783) loss_scale 
512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][110/625] eta 0:03:53 lr 0.000698 wd 0.0500 time 0.4441 (0.4534) data time 0.0009 (0.0064) model time 0.4433 (0.4440) loss 3.5213 (3.0127) grad_norm 1.6890 (1.7964) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][120/625] eta 0:03:48 lr 0.000698 wd 0.0500 time 0.4386 (0.4524) data time 0.0006 (0.0060) model time 0.4379 (0.4436) loss 3.4396 (3.0156) grad_norm 1.5246 (1.8223) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][130/625] eta 0:03:43 lr 0.000698 wd 0.0500 time 0.4384 (0.4518) data time 0.0010 (0.0056) model time 0.4374 (0.4434) loss 3.2938 (3.0085) grad_norm 1.1751 (1.8199) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][140/625] eta 0:03:38 lr 0.000697 wd 0.0500 time 0.4442 (0.4512) data time 0.0009 (0.0053) model time 0.4433 (0.4433) loss 3.2149 (3.0115) grad_norm 2.3797 (1.8251) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][150/625] eta 0:03:34 lr 0.000697 wd 0.0500 time 0.4429 (0.4507) data time 0.0006 (0.0050) model time 0.4423 (0.4433) loss 3.3361 (3.0123) grad_norm 1.3706 (1.8346) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][160/625] eta 0:03:29 lr 0.000697 wd 0.0500 time 0.4426 (0.4503) data time 0.0010 (0.0047) model time 0.4416 (0.4433) loss 2.8193 (3.0139) grad_norm 2.2182 (1.8737) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][170/625] eta 0:03:24 lr 0.000697 wd 0.0500 time 0.4438 (0.4498) data time 0.0007 (0.0045) model time 
0.4431 (0.4431) loss 2.8085 (2.9995) grad_norm 1.2471 (1.8909) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][180/625] eta 0:03:19 lr 0.000697 wd 0.0500 time 0.4403 (0.4494) data time 0.0009 (0.0043) model time 0.4394 (0.4430) loss 3.3555 (3.0078) grad_norm 2.6882 (1.8730) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][190/625] eta 0:03:15 lr 0.000697 wd 0.0500 time 0.4390 (0.4490) data time 0.0006 (0.0041) model time 0.4384 (0.4428) loss 1.7607 (2.9940) grad_norm 2.1015 (1.8884) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][200/625] eta 0:03:10 lr 0.000697 wd 0.0500 time 0.4444 (0.4487) data time 0.0007 (0.0039) model time 0.4437 (0.4428) loss 2.4219 (2.9854) grad_norm 1.3751 (1.8789) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][210/625] eta 0:03:06 lr 0.000697 wd 0.0500 time 0.4407 (0.4485) data time 0.0009 (0.0038) model time 0.4399 (0.4428) loss 3.2414 (2.9920) grad_norm 1.7707 (1.8657) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:43:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][220/625] eta 0:03:01 lr 0.000697 wd 0.0500 time 0.4422 (0.4482) data time 0.0007 (0.0037) model time 0.4415 (0.4427) loss 2.5641 (2.9850) grad_norm 1.6257 (1.8475) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][230/625] eta 0:02:56 lr 0.000696 wd 0.0500 time 0.4420 (0.4480) data time 0.0008 (0.0036) model time 0.4412 (0.4427) loss 2.8154 (2.9862) grad_norm 1.3015 (1.8266) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][240/625] eta 0:02:52 lr 
0.000696 wd 0.0500 time 0.4412 (0.4485) data time 0.0009 (0.0034) model time 0.4403 (0.4436) loss 3.0944 (2.9826) grad_norm 1.7472 (1.8230) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][250/625] eta 0:02:48 lr 0.000696 wd 0.0500 time 0.4431 (0.4483) data time 0.0009 (0.0033) model time 0.4422 (0.4435) loss 2.5170 (2.9799) grad_norm 1.6367 (1.8132) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][260/625] eta 0:02:43 lr 0.000696 wd 0.0500 time 0.4427 (0.4481) data time 0.0009 (0.0032) model time 0.4418 (0.4435) loss 3.2411 (2.9766) grad_norm 1.3963 (1.8053) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][270/625] eta 0:02:39 lr 0.000696 wd 0.0500 time 0.4488 (0.4479) data time 0.0007 (0.0032) model time 0.4481 (0.4435) loss 3.5906 (2.9776) grad_norm 1.6649 (1.7994) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][280/625] eta 0:02:34 lr 0.000696 wd 0.0500 time 0.4511 (0.4478) data time 0.0007 (0.0031) model time 0.4504 (0.4435) loss 2.1019 (2.9741) grad_norm 1.5923 (1.7908) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][290/625] eta 0:02:29 lr 0.000696 wd 0.0500 time 0.4399 (0.4476) data time 0.0008 (0.0030) model time 0.4390 (0.4434) loss 2.7265 (2.9737) grad_norm 1.9329 (1.7888) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][300/625] eta 0:02:25 lr 0.000696 wd 0.0500 time 0.4437 (0.4475) data time 0.0006 (0.0029) model time 0.4431 (0.4434) loss 3.3782 (2.9705) grad_norm 1.3651 (1.7818) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [146/300][310/625] eta 0:02:20 lr 0.000696 wd 0.0500 time 0.4436 (0.4474) data time 0.0008 (0.0029) model time 0.4428 (0.4433) loss 3.0446 (2.9660) grad_norm 1.3643 (1.7705) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][320/625] eta 0:02:16 lr 0.000696 wd 0.0500 time 0.4407 (0.4472) data time 0.0009 (0.0028) model time 0.4398 (0.4433) loss 3.3322 (2.9685) grad_norm 1.1132 (1.7667) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][330/625] eta 0:02:12 lr 0.000695 wd 0.0500 time 0.4371 (0.4475) data time 0.0010 (0.0027) model time 0.4362 (0.4437) loss 2.6800 (2.9684) grad_norm 1.5227 (1.7625) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][340/625] eta 0:02:07 lr 0.000695 wd 0.0500 time 0.4420 (0.4473) data time 0.0006 (0.0027) model time 0.4414 (0.4436) loss 3.6098 (2.9716) grad_norm 1.2861 (1.7597) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:44:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][350/625] eta 0:02:02 lr 0.000695 wd 0.0500 time 0.4452 (0.4472) data time 0.0006 (0.0026) model time 0.4446 (0.4436) loss 2.5529 (2.9700) grad_norm 1.7576 (1.7765) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][360/625] eta 0:01:58 lr 0.000695 wd 0.0500 time 0.4429 (0.4472) data time 0.0009 (0.0026) model time 0.4420 (0.4436) loss 2.2096 (2.9698) grad_norm 1.7488 (1.7698) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][370/625] eta 0:01:54 lr 0.000695 wd 0.0500 time 0.4412 (0.4471) data time 0.0009 (0.0025) model time 0.4403 (0.4436) loss 3.1436 (2.9675) grad_norm 1.4682 (1.7645) loss_scale 
512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][380/625] eta 0:01:49 lr 0.000695 wd 0.0500 time 0.4490 (0.4470) data time 0.0006 (0.0025) model time 0.4484 (0.4436) loss 3.0272 (2.9721) grad_norm 1.2405 (1.7579) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][390/625] eta 0:01:45 lr 0.000695 wd 0.0500 time 0.4400 (0.4469) data time 0.0006 (0.0025) model time 0.4393 (0.4436) loss 3.2820 (2.9702) grad_norm 2.6090 (1.7546) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][400/625] eta 0:01:40 lr 0.000695 wd 0.0500 time 0.4381 (0.4468) data time 0.0010 (0.0024) model time 0.4371 (0.4435) loss 2.9870 (2.9747) grad_norm 2.3060 (1.7529) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][410/625] eta 0:01:36 lr 0.000695 wd 0.0500 time 0.4465 (0.4468) data time 0.0007 (0.0024) model time 0.4458 (0.4435) loss 3.3481 (2.9827) grad_norm 1.7407 (1.7620) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][420/625] eta 0:01:31 lr 0.000694 wd 0.0500 time 0.4434 (0.4467) data time 0.0009 (0.0024) model time 0.4425 (0.4435) loss 2.3738 (2.9786) grad_norm 1.4439 (1.7578) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][430/625] eta 0:01:27 lr 0.000694 wd 0.0500 time 0.4483 (0.4466) data time 0.0006 (0.0023) model time 0.4477 (0.4434) loss 2.1191 (2.9760) grad_norm 1.4001 (1.7507) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][440/625] eta 0:01:22 lr 0.000694 wd 0.0500 time 0.4405 (0.4465) data time 0.0009 (0.0023) model time 
0.4396 (0.4434) loss 3.0751 (2.9697) grad_norm 2.2741 (1.7498) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][450/625] eta 0:01:18 lr 0.000694 wd 0.0500 time 0.4451 (0.4465) data time 0.0008 (0.0023) model time 0.4443 (0.4434) loss 3.2047 (2.9708) grad_norm 1.4351 (1.7511) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][460/625] eta 0:01:13 lr 0.000694 wd 0.0500 time 0.4409 (0.4469) data time 0.0008 (0.0022) model time 0.4401 (0.4439) loss 2.9036 (2.9629) grad_norm 1.7169 (1.7514) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][470/625] eta 0:01:09 lr 0.000694 wd 0.0500 time 0.4407 (0.4471) data time 0.0006 (0.0022) model time 0.4401 (0.4442) loss 2.6852 (2.9644) grad_norm 1.3863 (1.7480) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][480/625] eta 0:01:04 lr 0.000694 wd 0.0500 time 0.4447 (0.4470) data time 0.0006 (0.0022) model time 0.4440 (0.4441) loss 2.8202 (2.9651) grad_norm 1.4646 (1.7427) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:45:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][490/625] eta 0:01:00 lr 0.000694 wd 0.0500 time 0.4461 (0.4469) data time 0.0009 (0.0021) model time 0.4452 (0.4441) loss 3.2072 (2.9644) grad_norm 1.8497 (1.7483) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][500/625] eta 0:00:55 lr 0.000694 wd 0.0500 time 0.4398 (0.4468) data time 0.0007 (0.0021) model time 0.4392 (0.4441) loss 2.2013 (2.9647) grad_norm 1.1262 (1.7482) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][510/625] eta 0:00:51 lr 
0.000694 wd 0.0500 time 0.4444 (0.4468) data time 0.0007 (0.0021) model time 0.4437 (0.4440) loss 2.6737 (2.9658) grad_norm 1.4802 (1.7537) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][520/625] eta 0:00:46 lr 0.000693 wd 0.0500 time 0.4459 (0.4467) data time 0.0007 (0.0021) model time 0.4453 (0.4440) loss 3.2520 (2.9670) grad_norm 1.8078 (1.7577) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][530/625] eta 0:00:42 lr 0.000693 wd 0.0500 time 0.4429 (0.4467) data time 0.0008 (0.0021) model time 0.4421 (0.4440) loss 3.2327 (2.9697) grad_norm 1.5815 (1.7588) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][540/625] eta 0:00:37 lr 0.000693 wd 0.0500 time 0.4432 (0.4466) data time 0.0009 (0.0020) model time 0.4423 (0.4440) loss 2.2034 (2.9715) grad_norm 2.2366 (1.7588) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][550/625] eta 0:00:33 lr 0.000693 wd 0.0500 time 0.4424 (0.4465) data time 0.0006 (0.0020) model time 0.4417 (0.4439) loss 3.4986 (2.9691) grad_norm 1.6114 (1.7635) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][560/625] eta 0:00:29 lr 0.000693 wd 0.0500 time 0.4420 (0.4465) data time 0.0010 (0.0020) model time 0.4409 (0.4439) loss 2.9964 (2.9689) grad_norm 1.6477 (1.7643) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][570/625] eta 0:00:24 lr 0.000693 wd 0.0500 time 0.4425 (0.4464) data time 0.0007 (0.0020) model time 0.4418 (0.4439) loss 3.5254 (2.9725) grad_norm 2.0106 (1.7648) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [146/300][580/625] eta 0:00:20 lr 0.000693 wd 0.0500 time 0.4491 (0.4464) data time 0.0009 (0.0020) model time 0.4481 (0.4439) loss 2.0529 (2.9722) grad_norm 2.4950 (1.7647) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][590/625] eta 0:00:15 lr 0.000693 wd 0.0500 time 0.4496 (0.4464) data time 0.0007 (0.0019) model time 0.4489 (0.4439) loss 2.5142 (2.9692) grad_norm 1.5369 (1.7616) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][600/625] eta 0:00:11 lr 0.000693 wd 0.0500 time 0.4649 (0.4467) data time 0.0006 (0.0019) model time 0.4642 (0.4442) loss 2.5761 (2.9667) grad_norm 1.1687 (1.7598) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][610/625] eta 0:00:06 lr 0.000692 wd 0.0500 time 0.4378 (0.4466) data time 0.0004 (0.0019) model time 0.4374 (0.4442) loss 3.7276 (2.9710) grad_norm 2.7544 (1.7774) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [146/300][620/625] eta 0:00:02 lr 0.000692 wd 0.0500 time 0.4363 (0.4465) data time 0.0004 (0.0019) model time 0.4360 (0.4441) loss 3.0609 (2.9696) grad_norm 1.4135 (1.7793) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:46:59 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 146 training takes 0:04:39 [2024-08-10 13:46:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:47:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:47:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.5449 (0.5449) Acc@1 87.939 (87.939) Acc@5 98.242 (98.242) Mem 16699MB [2024-08-10 13:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.154) Loss 0.9204 (0.6669) Acc@1 78.174 (85.494) Acc@5 95.068 (97.359) Mem 16699MB [2024-08-10 13:47:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.9824 (0.7987) Acc@1 77.051 (82.173) Acc@5 94.580 (96.003) Mem 16699MB [2024-08-10 13:47:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.852 Acc@5 96.071 [2024-08-10 13:47:04 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.9% [2024-08-10 13:47:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.937 (0.937) Loss 0.4736 (0.4736) Acc@1 89.502 (89.502) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 13:47:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.194) Loss 0.7637 (0.5975) Acc@1 82.031 (86.950) Acc@5 96.631 (97.847) Mem 16699MB [2024-08-10 13:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.156) Loss 0.8735 (0.7020) Acc@1 78.369 (83.987) Acc@5 95.752 (96.754) Mem 16699MB [2024-08-10 13:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.681 Acc@5 96.785 [2024-08-10 13:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 13:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][0/625] eta 0:13:22 lr 0.000692 wd 0.0500 time 1.2835 (1.2835) data time 0.6540 (0.6540) model time 0.0000 (0.0000) loss 2.7508 (2.7508) grad_norm 2.8923 (2.8923) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][10/625] eta 0:05:19 lr 0.000692 wd 0.0500 time 0.4420 (0.5192) data 
time 0.0010 (0.0604) model time 0.0000 (0.0000) loss 2.4696 (3.0050) grad_norm 1.7735 (1.9644) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][20/625] eta 0:04:52 lr 0.000692 wd 0.0500 time 0.4419 (0.4835) data time 0.0008 (0.0321) model time 0.0000 (0.0000) loss 3.2290 (3.0204) grad_norm 1.8922 (2.1104) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][30/625] eta 0:04:40 lr 0.000692 wd 0.0500 time 0.4474 (0.4709) data time 0.0006 (0.0220) model time 0.0000 (0.0000) loss 2.2512 (2.9898) grad_norm 1.6296 (1.9443) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][40/625] eta 0:04:31 lr 0.000692 wd 0.0500 time 0.4417 (0.4641) data time 0.0008 (0.0169) model time 0.0000 (0.0000) loss 3.4401 (2.9624) grad_norm 1.5249 (1.8942) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][50/625] eta 0:04:24 lr 0.000692 wd 0.0500 time 0.4397 (0.4595) data time 0.0007 (0.0137) model time 0.0000 (0.0000) loss 3.4580 (3.0196) grad_norm 1.6712 (1.8722) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][60/625] eta 0:04:18 lr 0.000692 wd 0.0500 time 0.4436 (0.4568) data time 0.0007 (0.0117) model time 0.4429 (0.4416) loss 2.5080 (2.9836) grad_norm 1.3761 (1.8377) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][70/625] eta 0:04:12 lr 0.000692 wd 0.0500 time 0.4503 (0.4549) data time 0.0009 (0.0101) model time 0.4494 (0.4423) loss 3.1224 (2.9751) grad_norm 2.4773 (1.8998) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[147/300][80/625] eta 0:04:08 lr 0.000691 wd 0.0500 time 0.4450 (0.4552) data time 0.0009 (0.0090) model time 0.4441 (0.4470) loss 3.3870 (2.9521) grad_norm 1.5988 (1.9146) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][90/625] eta 0:04:02 lr 0.000691 wd 0.0500 time 0.4408 (0.4539) data time 0.0007 (0.0081) model time 0.4401 (0.4458) loss 1.8311 (2.9648) grad_norm 2.0005 (1.8711) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][100/625] eta 0:03:57 lr 0.000691 wd 0.0500 time 0.4419 (0.4529) data time 0.0009 (0.0074) model time 0.4411 (0.4451) loss 3.3181 (2.9635) grad_norm 1.9164 (1.8442) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:47:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][110/625] eta 0:03:52 lr 0.000691 wd 0.0500 time 0.4449 (0.4520) data time 0.0007 (0.0068) model time 0.4443 (0.4446) loss 3.3995 (2.9553) grad_norm 2.4526 (1.9020) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][120/625] eta 0:03:47 lr 0.000691 wd 0.0500 time 0.4385 (0.4512) data time 0.0009 (0.0063) model time 0.4376 (0.4442) loss 3.3422 (2.9573) grad_norm 1.4885 (1.8681) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][130/625] eta 0:03:43 lr 0.000691 wd 0.0500 time 0.4393 (0.4506) data time 0.0009 (0.0059) model time 0.4384 (0.4439) loss 2.9923 (2.9629) grad_norm 1.6816 (1.8624) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][140/625] eta 0:03:39 lr 0.000691 wd 0.0500 time 0.4508 (0.4517) data time 0.0009 (0.0056) model time 0.4499 (0.4463) loss 3.4087 (2.9596) grad_norm 1.5372 (1.8439) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 13:48:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][150/625] eta 0:03:34 lr 0.000691 wd 0.0500 time 0.4441 (0.4511) data time 0.0006 (0.0053) model time 0.4435 (0.4459) loss 3.5507 (2.9537) grad_norm 1.3367 (1.8250) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][160/625] eta 0:03:29 lr 0.000691 wd 0.0500 time 0.4456 (0.4508) data time 0.0006 (0.0050) model time 0.4449 (0.4458) loss 3.8673 (2.9488) grad_norm 1.7782 (1.8431) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][170/625] eta 0:03:24 lr 0.000691 wd 0.0500 time 0.4438 (0.4504) data time 0.0009 (0.0048) model time 0.4429 (0.4456) loss 3.2421 (2.9586) grad_norm 2.1335 (1.8374) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][180/625] eta 0:03:20 lr 0.000690 wd 0.0500 time 0.4430 (0.4500) data time 0.0007 (0.0046) model time 0.4423 (0.4453) loss 2.0724 (2.9580) grad_norm 1.5597 (1.8309) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][190/625] eta 0:03:15 lr 0.000690 wd 0.0500 time 0.4443 (0.4496) data time 0.0009 (0.0044) model time 0.4434 (0.4450) loss 2.1815 (2.9519) grad_norm 1.0816 (1.8213) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][200/625] eta 0:03:10 lr 0.000690 wd 0.0500 time 0.4469 (0.4492) data time 0.0009 (0.0042) model time 0.4460 (0.4447) loss 3.4679 (2.9622) grad_norm 1.5756 (1.8154) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][210/625] eta 0:03:06 lr 0.000690 wd 0.0500 time 0.4434 (0.4495) data time 0.0007 (0.0041) model time 0.4427 (0.4454) loss 3.2967 
(2.9660) grad_norm 1.4602 (1.8033) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][220/625] eta 0:03:01 lr 0.000690 wd 0.0500 time 0.4430 (0.4493) data time 0.0006 (0.0039) model time 0.4424 (0.4453) loss 2.0483 (2.9509) grad_norm 2.1721 (1.7921) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][230/625] eta 0:02:57 lr 0.000690 wd 0.0500 time 0.4427 (0.4492) data time 0.0006 (0.0038) model time 0.4421 (0.4453) loss 3.3504 (2.9539) grad_norm 3.8151 (1.8082) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][240/625] eta 0:02:52 lr 0.000690 wd 0.0500 time 0.4432 (0.4489) data time 0.0009 (0.0037) model time 0.4423 (0.4451) loss 3.0562 (2.9631) grad_norm 1.5140 (1.8013) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][250/625] eta 0:02:48 lr 0.000690 wd 0.0500 time 0.4413 (0.4487) data time 0.0007 (0.0035) model time 0.4406 (0.4450) loss 3.5582 (2.9594) grad_norm 1.3732 (1.7900) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][260/625] eta 0:02:43 lr 0.000690 wd 0.0500 time 0.4417 (0.4485) data time 0.0006 (0.0035) model time 0.4411 (0.4449) loss 2.4529 (2.9605) grad_norm 1.1614 (1.7761) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][270/625] eta 0:02:39 lr 0.000689 wd 0.0500 time 0.4420 (0.4483) data time 0.0008 (0.0034) model time 0.4412 (0.4447) loss 3.2516 (2.9593) grad_norm 1.5821 (1.7657) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][280/625] eta 0:02:34 lr 0.000689 wd 0.0500 time 0.4484 
(0.4481) data time 0.0008 (0.0033) model time 0.4476 (0.4446) loss 2.8754 (2.9611) grad_norm 1.2337 (1.7496) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][290/625] eta 0:02:30 lr 0.000689 wd 0.0500 time 0.4440 (0.4479) data time 0.0009 (0.0032) model time 0.4430 (0.4445) loss 2.8062 (2.9633) grad_norm 1.7499 (1.7528) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][300/625] eta 0:02:25 lr 0.000689 wd 0.0500 time 0.4391 (0.4483) data time 0.0009 (0.0031) model time 0.4382 (0.4451) loss 2.0028 (2.9621) grad_norm 1.5728 (1.7452) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][310/625] eta 0:02:21 lr 0.000689 wd 0.0500 time 0.4477 (0.4482) data time 0.0008 (0.0030) model time 0.4468 (0.4451) loss 3.2469 (2.9669) grad_norm 2.3731 (1.7447) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][320/625] eta 0:02:16 lr 0.000689 wd 0.0500 time 0.4465 (0.4482) data time 0.0006 (0.0030) model time 0.4459 (0.4451) loss 3.2651 (2.9684) grad_norm 1.9380 (1.7420) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][330/625] eta 0:02:12 lr 0.000689 wd 0.0500 time 0.4442 (0.4480) data time 0.0009 (0.0029) model time 0.4433 (0.4450) loss 2.8569 (2.9666) grad_norm 1.5810 (1.7405) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][340/625] eta 0:02:07 lr 0.000689 wd 0.0500 time 0.4403 (0.4479) data time 0.0007 (0.0029) model time 0.4397 (0.4449) loss 2.8882 (2.9625) grad_norm 2.9323 (1.7617) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [147/300][350/625] eta 0:02:03 lr 0.000689 wd 0.0500 time 0.4429 (0.4484) data time 0.0009 (0.0028) model time 0.4419 (0.4455) loss 2.1534 (2.9596) grad_norm 1.5026 (1.7585) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][360/625] eta 0:01:58 lr 0.000689 wd 0.0500 time 0.4414 (0.4482) data time 0.0006 (0.0027) model time 0.4408 (0.4454) loss 2.1739 (2.9593) grad_norm 1.4633 (1.7461) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][370/625] eta 0:01:54 lr 0.000688 wd 0.0500 time 0.4452 (0.4481) data time 0.0007 (0.0027) model time 0.4445 (0.4453) loss 3.2300 (2.9626) grad_norm 1.6876 (1.7460) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:49:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][380/625] eta 0:01:49 lr 0.000688 wd 0.0500 time 0.4408 (0.4480) data time 0.0006 (0.0027) model time 0.4402 (0.4452) loss 2.9436 (2.9616) grad_norm 1.8900 (1.7476) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][390/625] eta 0:01:45 lr 0.000688 wd 0.0500 time 0.4461 (0.4479) data time 0.0009 (0.0026) model time 0.4452 (0.4452) loss 3.7147 (2.9690) grad_norm 1.9940 (1.7428) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][400/625] eta 0:01:40 lr 0.000688 wd 0.0500 time 0.4453 (0.4478) data time 0.0007 (0.0026) model time 0.4446 (0.4451) loss 2.8951 (2.9709) grad_norm 1.8724 (1.7408) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][410/625] eta 0:01:36 lr 0.000688 wd 0.0500 time 0.4458 (0.4477) data time 0.0007 (0.0025) model time 0.4451 (0.4450) loss 2.8158 (2.9707) grad_norm 1.5683 (1.7492) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 13:50:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][420/625] eta 0:01:31 lr 0.000688 wd 0.0500 time 0.4456 (0.4476) data time 0.0008 (0.0025) model time 0.4447 (0.4450) loss 3.0838 (2.9730) grad_norm 1.8055 (1.7530) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][430/625] eta 0:01:27 lr 0.000688 wd 0.0500 time 0.4454 (0.4475) data time 0.0007 (0.0025) model time 0.4448 (0.4449) loss 3.5078 (2.9817) grad_norm 1.4853 (1.7492) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][440/625] eta 0:01:22 lr 0.000688 wd 0.0500 time 0.4430 (0.4474) data time 0.0006 (0.0024) model time 0.4424 (0.4449) loss 2.9772 (2.9812) grad_norm 1.8404 (1.7636) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][450/625] eta 0:01:18 lr 0.000688 wd 0.0500 time 0.4467 (0.4476) data time 0.0008 (0.0024) model time 0.4459 (0.4452) loss 3.7936 (2.9874) grad_norm 1.9107 (1.7620) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][460/625] eta 0:01:13 lr 0.000687 wd 0.0500 time 0.4405 (0.4475) data time 0.0008 (0.0024) model time 0.4397 (0.4451) loss 2.3691 (2.9860) grad_norm 1.7838 (1.7613) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][470/625] eta 0:01:09 lr 0.000687 wd 0.0500 time 0.4425 (0.4479) data time 0.0008 (0.0023) model time 0.4417 (0.4455) loss 3.2030 (2.9904) grad_norm 1.4221 (1.7581) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][480/625] eta 0:01:04 lr 0.000687 wd 0.0500 time 0.4444 (0.4478) data time 0.0006 (0.0023) model time 0.4438 (0.4455) loss 2.8943 
(2.9904) grad_norm 1.3694 (1.7634) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][490/625] eta 0:01:00 lr 0.000687 wd 0.0500 time 0.4328 (0.4477) data time 0.0008 (0.0023) model time 0.4320 (0.4454) loss 2.6825 (2.9869) grad_norm 1.4714 (1.7557) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][500/625] eta 0:00:55 lr 0.000687 wd 0.0500 time 0.4472 (0.4476) data time 0.0007 (0.0022) model time 0.4465 (0.4453) loss 3.3234 (2.9865) grad_norm 1.9451 (1.7679) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][510/625] eta 0:00:51 lr 0.000687 wd 0.0500 time 0.4413 (0.4475) data time 0.0009 (0.0022) model time 0.4404 (0.4452) loss 3.1776 (2.9856) grad_norm 2.0454 (1.7700) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 13:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][520/625] eta 0:00:46 lr 0.000687 wd 0.0500 time 0.4399 (0.4474) data time 0.0007 (0.0022) model time 0.4392 (0.4452) loss 3.4429 (2.9853) grad_norm 1.8520 (1.7749) loss_scale 1024.0000 (516.9136) mem 16699MB [2024-08-10 13:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][530/625] eta 0:00:42 lr 0.000687 wd 0.0500 time 0.4473 (0.4473) data time 0.0006 (0.0022) model time 0.4466 (0.4451) loss 2.9434 (2.9857) grad_norm 1.2119 (1.7738) loss_scale 1024.0000 (526.4633) mem 16699MB [2024-08-10 13:51:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][540/625] eta 0:00:38 lr 0.000687 wd 0.0500 time 0.4434 (0.4473) data time 0.0006 (0.0021) model time 0.4428 (0.4451) loss 2.4445 (2.9864) grad_norm 1.5217 (1.7674) loss_scale 1024.0000 (535.6599) mem 16699MB [2024-08-10 13:51:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][550/625] eta 0:00:33 lr 0.000687 wd 0.0500 time 
0.4453 (0.4472) data time 0.0006 (0.0021) model time 0.4447 (0.4450) loss 1.8410 (2.9861) grad_norm 1.4011 (1.7640) loss_scale 1024.0000 (544.5227) mem 16699MB [2024-08-10 13:51:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][560/625] eta 0:00:29 lr 0.000686 wd 0.0500 time 0.4461 (0.4471) data time 0.0009 (0.0021) model time 0.4452 (0.4450) loss 3.2518 (2.9845) grad_norm 1.5813 (1.7651) loss_scale 1024.0000 (553.0695) mem 16699MB [2024-08-10 13:51:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][570/625] eta 0:00:24 lr 0.000686 wd 0.0500 time 0.4445 (0.4471) data time 0.0008 (0.0021) model time 0.4437 (0.4449) loss 3.3304 (2.9841) grad_norm 2.3846 (1.7672) loss_scale 1024.0000 (561.3170) mem 16699MB [2024-08-10 13:51:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][580/625] eta 0:00:20 lr 0.000686 wd 0.0500 time 0.4389 (0.4470) data time 0.0009 (0.0020) model time 0.4380 (0.4448) loss 3.2773 (2.9842) grad_norm 1.7531 (1.7865) loss_scale 1024.0000 (569.2806) mem 16699MB [2024-08-10 13:51:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][590/625] eta 0:00:15 lr 0.000686 wd 0.0500 time 0.4424 (0.4469) data time 0.0007 (0.0020) model time 0.4417 (0.4448) loss 2.9617 (2.9810) grad_norm 1.8716 (1.7856) loss_scale 1024.0000 (576.9746) mem 16699MB [2024-08-10 13:51:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][600/625] eta 0:00:11 lr 0.000686 wd 0.0500 time 0.4426 (0.4468) data time 0.0007 (0.0020) model time 0.4419 (0.4447) loss 1.9183 (2.9765) grad_norm 1.2536 (1.7850) loss_scale 1024.0000 (584.4126) mem 16699MB [2024-08-10 13:51:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [147/300][610/625] eta 0:00:06 lr 0.000686 wd 0.0500 time 0.4343 (0.4468) data time 0.0005 (0.0020) model time 0.4338 (0.4447) loss 3.3012 (2.9746) grad_norm 1.4126 (1.7836) loss_scale 1024.0000 (591.6072) mem 16699MB [2024-08-10 13:51:45 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [147/300][620/625] eta 0:00:02 lr 0.000686 wd 0.0500 time 0.4444 (0.4467) data time 0.0007 (0.0020) model time 0.4438 (0.4446) loss 3.5806 (2.9744) grad_norm 1.7222 (1.7841) loss_scale 1024.0000 (598.5700) mem 16699MB [2024-08-10 13:51:47 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 147 training takes 0:04:39 [2024-08-10 13:51:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:51:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 13:51:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5273 (0.5273) Acc@1 88.721 (88.721) Acc@5 98.096 (98.096) Mem 16699MB [2024-08-10 13:51:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8950 (0.6674) Acc@1 79.199 (85.276) Acc@5 95.361 (97.408) Mem 16699MB [2024-08-10 13:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9409 (0.7802) Acc@1 76.416 (82.294) Acc@5 95.166 (96.164) Mem 16699MB [2024-08-10 13:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.962 Acc@5 96.159 [2024-08-10 13:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-10 13:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.96% [2024-08-10 13:51:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 13:51:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 13:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 0.4736 (0.4736) Acc@1 89.502 (89.502) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 13:51:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.7651 (0.5976) Acc@1 82.080 (86.972) Acc@5 96.582 (97.883) Mem 16699MB [2024-08-10 13:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.135) Loss 0.8735 (0.7018) Acc@1 78.516 (84.024) Acc@5 95.801 (96.801) Mem 16699MB [2024-08-10 13:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.713 Acc@5 96.829 [2024-08-10 13:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 13:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.71% [2024-08-10 13:51:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 13:51:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 13:51:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][0/625] eta 0:07:35 lr 0.000686 wd 0.0500 time 0.7285 (0.7285) data time 0.3333 (0.3333) model time 0.0000 (0.0000) loss 2.6371 (2.6371) grad_norm 2.0081 (2.0081) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][10/625] eta 0:04:55 lr 0.000686 wd 0.0500 time 0.4397 (0.4802) data time 0.0006 (0.0311) model time 0.0000 (0.0000) loss 2.8769 (2.9418) grad_norm 1.4744 (1.7317) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][20/625] eta 0:04:39 lr 0.000686 wd 0.0500 time 0.4404 (0.4623) data time 0.0008 (0.0167) model time 0.0000 (0.0000) loss 3.3801 (2.9829) grad_norm 1.9114 (1.6780) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][30/625] eta 0:04:34 lr 0.000685 wd 0.0500 time 0.4409 (0.4617) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 2.5119 (2.9340) grad_norm 2.0458 (1.6270) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][40/625] eta 0:04:27 lr 0.000685 wd 0.0500 time 0.4416 (0.4570) data time 0.0006 (0.0090) model time 0.0000 (0.0000) loss 3.3951 (2.9693) grad_norm 1.9452 (1.6710) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][50/625] eta 0:04:21 lr 0.000685 wd 0.0500 time 0.4427 (0.4541) data time 0.0006 (0.0074) model time 0.0000 (0.0000) loss 3.4276 (2.9844) grad_norm 1.7180 (1.7081) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][60/625] eta 0:04:17 lr 0.000685 wd 0.0500 time 0.4436 (0.4558) data time 0.0009 (0.0064) model time 0.4427 (0.4635) loss 3.4248 
(2.9749) grad_norm 1.7638 (1.6814) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][70/625] eta 0:04:11 lr 0.000685 wd 0.0500 time 0.4437 (0.4539) data time 0.0008 (0.0056) model time 0.4429 (0.4523) loss 3.3468 (2.9561) grad_norm 2.7062 (1.8816) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][80/625] eta 0:04:06 lr 0.000685 wd 0.0500 time 0.4391 (0.4524) data time 0.0009 (0.0050) model time 0.4382 (0.4486) loss 2.9835 (2.9783) grad_norm 1.4753 (1.8350) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][90/625] eta 0:04:01 lr 0.000685 wd 0.0500 time 0.4398 (0.4514) data time 0.0006 (0.0046) model time 0.4391 (0.4470) loss 2.7557 (3.0146) grad_norm 1.2284 (1.7819) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][100/625] eta 0:03:56 lr 0.000685 wd 0.0500 time 0.4628 (0.4508) data time 0.0006 (0.0042) model time 0.4622 (0.4464) loss 3.8116 (3.0204) grad_norm 1.7205 (1.7657) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][110/625] eta 0:03:51 lr 0.000685 wd 0.0500 time 0.4511 (0.4502) data time 0.0006 (0.0039) model time 0.4504 (0.4460) loss 2.4878 (3.0296) grad_norm 2.5679 (1.7663) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][120/625] eta 0:03:47 lr 0.000684 wd 0.0500 time 0.4412 (0.4497) data time 0.0006 (0.0036) model time 0.4406 (0.4455) loss 3.4966 (3.0305) grad_norm 1.6702 (1.7606) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:52:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][130/625] eta 0:03:42 lr 0.000684 wd 0.0500 
time 0.4453 (0.4492) data time 0.0009 (0.0034) model time 0.4444 (0.4452) loss 2.8186 (3.0217) grad_norm 1.6182 (1.7585) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][140/625] eta 0:03:37 lr 0.000684 wd 0.0500 time 0.4412 (0.4487) data time 0.0010 (0.0033) model time 0.4402 (0.4448) loss 2.9701 (2.9994) grad_norm 1.4399 (1.7375) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][150/625] eta 0:03:32 lr 0.000684 wd 0.0500 time 0.4440 (0.4483) data time 0.0007 (0.0031) model time 0.4432 (0.4444) loss 3.0500 (3.0033) grad_norm 1.7396 (1.7368) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][160/625] eta 0:03:28 lr 0.000684 wd 0.0500 time 0.4432 (0.4479) data time 0.0007 (0.0030) model time 0.4425 (0.4441) loss 3.0436 (2.9991) grad_norm 2.5518 (1.7436) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][170/625] eta 0:03:23 lr 0.000684 wd 0.0500 time 0.4435 (0.4476) data time 0.0009 (0.0029) model time 0.4425 (0.4439) loss 2.8822 (2.9988) grad_norm 1.8863 (1.7599) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][180/625] eta 0:03:19 lr 0.000684 wd 0.0500 time 0.4403 (0.4473) data time 0.0009 (0.0028) model time 0.4394 (0.4437) loss 3.3493 (2.9970) grad_norm 1.4611 (1.7573) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][190/625] eta 0:03:14 lr 0.000684 wd 0.0500 time 0.4425 (0.4471) data time 0.0007 (0.0027) model time 0.4419 (0.4437) loss 2.8304 (2.9876) grad_norm 2.1999 (1.7785) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:28 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [148/300][200/625] eta 0:03:09 lr 0.000684 wd 0.0500 time 0.4424 (0.4469) data time 0.0009 (0.0026) model time 0.4415 (0.4435) loss 3.5241 (2.9821) grad_norm 2.0558 (1.7799) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][210/625] eta 0:03:05 lr 0.000684 wd 0.0500 time 0.4364 (0.4466) data time 0.0007 (0.0025) model time 0.4357 (0.4433) loss 2.2066 (2.9844) grad_norm 1.1230 (1.7658) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][220/625] eta 0:03:00 lr 0.000683 wd 0.0500 time 0.4393 (0.4464) data time 0.0009 (0.0024) model time 0.4384 (0.4431) loss 2.7922 (2.9817) grad_norm 1.4056 (1.7579) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][230/625] eta 0:02:56 lr 0.000683 wd 0.0500 time 0.3895 (0.4470) data time 0.0008 (0.0024) model time 0.3886 (0.4440) loss 3.2292 (2.9790) grad_norm 1.9789 (1.7581) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][240/625] eta 0:02:51 lr 0.000683 wd 0.0500 time 0.4403 (0.4467) data time 0.0008 (0.0023) model time 0.4394 (0.4438) loss 2.7563 (2.9824) grad_norm 1.5594 (1.7645) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][250/625] eta 0:02:47 lr 0.000683 wd 0.0500 time 0.4388 (0.4473) data time 0.0008 (0.0023) model time 0.4380 (0.4446) loss 2.7893 (2.9855) grad_norm 1.1789 (1.7577) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][260/625] eta 0:02:43 lr 0.000683 wd 0.0500 time 0.4409 (0.4471) data time 0.0009 (0.0022) model time 0.4400 (0.4444) loss 3.3066 (2.9850) grad_norm 1.6046 
(1.7453) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:53:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][270/625] eta 0:02:38 lr 0.000683 wd 0.0500 time 0.4400 (0.4469) data time 0.0009 (0.0022) model time 0.4391 (0.4442) loss 3.0332 (2.9948) grad_norm 1.2453 (1.7468) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][280/625] eta 0:02:34 lr 0.000683 wd 0.0500 time 0.4408 (0.4467) data time 0.0007 (0.0021) model time 0.4402 (0.4441) loss 3.3819 (2.9905) grad_norm 1.1542 (1.7416) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][290/625] eta 0:02:29 lr 0.000683 wd 0.0500 time 0.4380 (0.4465) data time 0.0006 (0.0021) model time 0.4374 (0.4440) loss 2.9627 (2.9926) grad_norm 2.5423 (1.7361) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][300/625] eta 0:02:25 lr 0.000683 wd 0.0500 time 0.4344 (0.4464) data time 0.0010 (0.0020) model time 0.4334 (0.4438) loss 3.7296 (2.9862) grad_norm 1.3356 (1.7297) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][310/625] eta 0:02:20 lr 0.000682 wd 0.0500 time 0.4634 (0.4463) data time 0.0008 (0.0020) model time 0.4625 (0.4438) loss 2.5015 (2.9842) grad_norm 1.3579 (1.7197) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][320/625] eta 0:02:16 lr 0.000682 wd 0.0500 time 0.4428 (0.4463) data time 0.0007 (0.0020) model time 0.4421 (0.4438) loss 3.1213 (2.9897) grad_norm 1.4700 (1.7388) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][330/625] eta 0:02:11 lr 0.000682 wd 0.0500 time 0.4441 (0.4463) data 
time 0.0006 (0.0020) model time 0.4435 (0.4439) loss 1.8073 (2.9871) grad_norm 1.3174 (1.7379) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][340/625] eta 0:02:07 lr 0.000682 wd 0.0500 time 0.4442 (0.4462) data time 0.0008 (0.0019) model time 0.4434 (0.4438) loss 3.2150 (2.9865) grad_norm 1.7760 (1.7454) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][350/625] eta 0:02:02 lr 0.000682 wd 0.0500 time 0.4411 (0.4461) data time 0.0008 (0.0019) model time 0.4403 (0.4438) loss 3.0134 (2.9969) grad_norm 1.9367 (1.7470) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][360/625] eta 0:01:58 lr 0.000682 wd 0.0500 time 0.4462 (0.4465) data time 0.0007 (0.0019) model time 0.4455 (0.4443) loss 1.9499 (2.9929) grad_norm 3.4697 (1.7566) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][370/625] eta 0:01:53 lr 0.000682 wd 0.0500 time 0.4466 (0.4464) data time 0.0009 (0.0019) model time 0.4457 (0.4442) loss 3.3332 (2.9966) grad_norm 1.5255 (1.7665) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][380/625] eta 0:01:49 lr 0.000682 wd 0.0500 time 0.4422 (0.4468) data time 0.0007 (0.0018) model time 0.4415 (0.4447) loss 3.6966 (2.9962) grad_norm 1.7158 (1.7664) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][390/625] eta 0:01:44 lr 0.000682 wd 0.0500 time 0.4413 (0.4467) data time 0.0006 (0.0018) model time 0.4406 (0.4446) loss 3.5582 (2.9977) grad_norm 2.0399 (1.7676) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [148/300][400/625] eta 0:01:40 lr 0.000682 wd 0.0500 time 0.4418 (0.4466) data time 0.0006 (0.0018) model time 0.4411 (0.4446) loss 3.0050 (2.9942) grad_norm 2.2613 (1.7700) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][410/625] eta 0:01:36 lr 0.000681 wd 0.0500 time 0.4493 (0.4467) data time 0.0007 (0.0019) model time 0.4486 (0.4446) loss 1.9087 (2.9951) grad_norm 2.0096 (1.7838) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][420/625] eta 0:01:31 lr 0.000681 wd 0.0500 time 0.4423 (0.4467) data time 0.0007 (0.0019) model time 0.4416 (0.4446) loss 2.6157 (2.9906) grad_norm 1.8234 (1.7763) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][430/625] eta 0:01:27 lr 0.000681 wd 0.0500 time 0.4403 (0.4466) data time 0.0009 (0.0019) model time 0.4394 (0.4445) loss 2.4778 (2.9917) grad_norm 2.0050 (1.7857) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][440/625] eta 0:01:22 lr 0.000681 wd 0.0500 time 0.4482 (0.4466) data time 0.0007 (0.0018) model time 0.4475 (0.4445) loss 3.4223 (2.9886) grad_norm 1.6747 (1.7921) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][450/625] eta 0:01:18 lr 0.000681 wd 0.0500 time 0.4450 (0.4466) data time 0.0006 (0.0018) model time 0.4444 (0.4445) loss 3.7251 (2.9941) grad_norm 1.0928 (1.7831) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][460/625] eta 0:01:13 lr 0.000681 wd 0.0500 time 0.4439 (0.4465) data time 0.0007 (0.0018) model time 0.4433 (0.4444) loss 2.4172 (2.9944) grad_norm 1.4363 (1.7755) loss_scale 1024.0000 
(1024.0000) mem 16699MB [2024-08-10 13:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][470/625] eta 0:01:09 lr 0.000681 wd 0.0500 time 0.4444 (0.4470) data time 0.0010 (0.0018) model time 0.4434 (0.4450) loss 2.8075 (2.9934) grad_norm 7.3689 (1.7837) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][480/625] eta 0:01:04 lr 0.000681 wd 0.0500 time 0.4415 (0.4469) data time 0.0006 (0.0018) model time 0.4409 (0.4449) loss 3.3133 (2.9918) grad_norm 1.4488 (1.7879) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][490/625] eta 0:01:00 lr 0.000681 wd 0.0500 time 0.4469 (0.4468) data time 0.0008 (0.0018) model time 0.4461 (0.4449) loss 2.6842 (2.9882) grad_norm 1.4594 (1.7865) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][500/625] eta 0:00:55 lr 0.000680 wd 0.0500 time 0.4445 (0.4468) data time 0.0007 (0.0017) model time 0.4438 (0.4448) loss 3.3956 (2.9881) grad_norm 1.1807 (1.7858) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][510/625] eta 0:00:51 lr 0.000680 wd 0.0500 time 0.4405 (0.4467) data time 0.0007 (0.0017) model time 0.4397 (0.4448) loss 2.9760 (2.9920) grad_norm 1.1606 (1.7798) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][520/625] eta 0:00:46 lr 0.000680 wd 0.0500 time 0.4436 (0.4466) data time 0.0006 (0.0017) model time 0.4430 (0.4447) loss 2.2106 (2.9920) grad_norm 1.6289 (1.7731) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][530/625] eta 0:00:42 lr 0.000680 wd 0.0500 time 0.4416 (0.4465) data time 0.0006 (0.0017) model 
time 0.4410 (0.4446) loss 2.0620 (2.9881) grad_norm 1.4524 (1.7726) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:55:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][540/625] eta 0:00:37 lr 0.000680 wd 0.0500 time 0.4473 (0.4465) data time 0.0009 (0.0017) model time 0.4464 (0.4446) loss 2.9871 (2.9903) grad_norm 1.4605 (1.7711) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][550/625] eta 0:00:33 lr 0.000680 wd 0.0500 time 0.4450 (0.4466) data time 0.0009 (0.0017) model time 0.4441 (0.4448) loss 3.4001 (2.9906) grad_norm 1.5919 (1.7714) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][560/625] eta 0:00:29 lr 0.000680 wd 0.0500 time 0.4426 (0.4466) data time 0.0009 (0.0017) model time 0.4417 (0.4448) loss 3.2090 (2.9901) grad_norm 1.5253 (1.7707) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][570/625] eta 0:00:24 lr 0.000680 wd 0.0500 time 0.4439 (0.4466) data time 0.0007 (0.0016) model time 0.4432 (0.4447) loss 3.6588 (2.9935) grad_norm 2.1209 (1.7724) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][580/625] eta 0:00:20 lr 0.000680 wd 0.0500 time 0.4430 (0.4465) data time 0.0009 (0.0016) model time 0.4421 (0.4447) loss 2.9415 (2.9920) grad_norm 2.9948 (1.7742) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][590/625] eta 0:00:15 lr 0.000679 wd 0.0500 time 0.4444 (0.4464) data time 0.0009 (0.0016) model time 0.4436 (0.4446) loss 3.2787 (2.9884) grad_norm 2.7001 (1.7781) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][600/625] 
eta 0:00:11 lr 0.000679 wd 0.0500 time 0.4473 (0.4464) data time 0.0009 (0.0016) model time 0.4464 (0.4446) loss 3.3052 (2.9908) grad_norm 1.8020 (1.7833) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][610/625] eta 0:00:06 lr 0.000679 wd 0.0500 time 0.4413 (0.4463) data time 0.0004 (0.0016) model time 0.4409 (0.4446) loss 3.5052 (2.9928) grad_norm 1.4168 (1.7770) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [148/300][620/625] eta 0:00:02 lr 0.000679 wd 0.0500 time 0.4365 (0.4465) data time 0.0004 (0.0016) model time 0.4361 (0.4448) loss 2.3753 (2.9886) grad_norm 1.4284 (1.7765) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:56:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 148 training takes 0:04:39 [2024-08-10 13:56:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 13:56:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 13:56:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.476 (0.476) Loss 0.5552 (0.5552) Acc@1 87.842 (87.842) Acc@5 98.340 (98.340) Mem 16699MB
[2024-08-10 13:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8730 (0.6717) Acc@1 80.469 (85.502) Acc@5 95.459 (97.452) Mem 16699MB
[2024-08-10 13:56:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9707 (0.7943) Acc@1 76.514 (82.280) Acc@5 94.971 (96.126) Mem 16699MB
[2024-08-10 13:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.974 Acc@5 96.125
[2024-08-10 13:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0%
[2024-08-10 13:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 81.97%
[2024-08-10 13:56:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 13:56:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 13:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.476 (0.476) Loss 0.4739 (0.4739) Acc@1 89.453 (89.453) Acc@5 98.828 (98.828) Mem 16699MB
[2024-08-10 13:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.7661 (0.5973) Acc@1 81.934 (86.936) Acc@5 96.631 (97.892) Mem 16699MB
[2024-08-10 13:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.8735 (0.7018) Acc@1 78.467 (84.003) Acc@5 95.947 (96.819) Mem 16699MB
[2024-08-10 13:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.687 Acc@5 96.839
[2024-08-10 13:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-10 13:56:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][0/625] eta 0:12:56 lr 0.000679 wd 0.0500 time 1.2431 (1.2431) data time 0.5429 (0.5429) model time 0.0000 (0.0000) loss 3.2529 (3.2529) grad_norm 1.1930 (1.1930) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 13:56:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][10/625] eta 0:05:17 lr 0.000679 wd 0.0500 time 0.4418 (0.5160) data time 0.0008 (0.0503) model time 0.0000 (0.0000) loss 3.3097 (2.8444) grad_norm 1.3986 (1.4158) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 13:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][20/625] eta 0:04:50 lr 0.000679 wd 0.0500 time 0.4401 (0.4806) data time 0.0007 (0.0268) model time 0.0000 (0.0000) loss 2.9709 (2.8073) grad_norm 1.8043 (1.5763) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 13:57:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][30/625] eta 0:04:38 lr 0.000679 wd 0.0500 time 0.4416 (0.4681) data time 0.0009 (0.0184) model time 0.0000 (0.0000) loss 2.9540 (2.7731) grad_norm 1.4193 (1.5273) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 13:57:05 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [149/300][40/625] eta 0:04:30 lr 0.000679 wd 0.0500 time 0.4420 (0.4617) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 2.2124 (2.7918) grad_norm 1.6617 (1.6107) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:57:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][50/625] eta 0:04:23 lr 0.000679 wd 0.0500 time 0.4424 (0.4577) data time 0.0006 (0.0116) model time 0.0000 (0.0000) loss 2.3297 (2.8633) grad_norm 3.2940 (1.7855) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:57:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][60/625] eta 0:04:17 lr 0.000678 wd 0.0500 time 0.4388 (0.4552) data time 0.0006 (0.0098) model time 0.4382 (0.4417) loss 3.3350 (2.8634) grad_norm 1.8627 (1.8253) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][70/625] eta 0:04:11 lr 0.000678 wd 0.0500 time 0.4416 (0.4534) data time 0.0007 (0.0086) model time 0.4409 (0.4414) loss 3.3691 (2.8946) grad_norm 1.4828 (1.7850) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 13:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][80/625] eta 0:04:06 lr 0.000678 wd 0.0500 time 0.4406 (0.4518) data time 0.0007 (0.0076) model time 0.4399 (0.4410) loss 2.1791 (2.8773) grad_norm 1.5328 (inf) loss_scale 512.0000 (998.7160) mem 16699MB [2024-08-10 13:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][90/625] eta 0:04:01 lr 0.000678 wd 0.0500 time 0.4422 (0.4508) data time 0.0008 (0.0069) model time 0.4414 (0.4411) loss 2.0242 (2.8850) grad_norm 1.3253 (inf) loss_scale 512.0000 (945.2308) mem 16699MB [2024-08-10 13:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][100/625] eta 0:03:56 lr 0.000678 wd 0.0500 time 0.4409 (0.4498) data time 0.0006 (0.0063) model time 0.4403 (0.4410) loss 2.5752 (2.8891) grad_norm 1.5860 (inf) loss_scale 
512.0000 (902.3366) mem 16699MB [2024-08-10 13:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][110/625] eta 0:03:51 lr 0.000678 wd 0.0500 time 0.4385 (0.4491) data time 0.0007 (0.0058) model time 0.4378 (0.4409) loss 2.7652 (2.9121) grad_norm 1.4047 (inf) loss_scale 512.0000 (867.1712) mem 16699MB [2024-08-10 13:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][120/625] eta 0:03:46 lr 0.000678 wd 0.0500 time 0.4412 (0.4487) data time 0.0007 (0.0054) model time 0.4406 (0.4412) loss 3.5073 (2.9388) grad_norm 1.6127 (inf) loss_scale 512.0000 (837.8182) mem 16699MB [2024-08-10 13:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][130/625] eta 0:03:42 lr 0.000678 wd 0.0500 time 0.4405 (0.4496) data time 0.0006 (0.0050) model time 0.4399 (0.4435) loss 3.3561 (2.9194) grad_norm 6.3267 (inf) loss_scale 512.0000 (812.9466) mem 16699MB [2024-08-10 13:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][140/625] eta 0:03:37 lr 0.000678 wd 0.0500 time 0.4421 (0.4490) data time 0.0008 (0.0048) model time 0.4413 (0.4433) loss 2.5170 (2.9316) grad_norm 1.4687 (inf) loss_scale 512.0000 (791.6028) mem 16699MB [2024-08-10 13:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][150/625] eta 0:03:33 lr 0.000678 wd 0.0500 time 0.4438 (0.4486) data time 0.0008 (0.0045) model time 0.4430 (0.4432) loss 3.1820 (2.9176) grad_norm 0.9282 (inf) loss_scale 512.0000 (773.0861) mem 16699MB [2024-08-10 13:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][160/625] eta 0:03:28 lr 0.000677 wd 0.0500 time 0.4404 (0.4482) data time 0.0008 (0.0043) model time 0.4396 (0.4430) loss 2.9222 (2.9242) grad_norm 1.3541 (inf) loss_scale 512.0000 (756.8696) mem 16699MB [2024-08-10 13:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][170/625] eta 0:03:23 lr 0.000677 wd 0.0500 time 0.4417 (0.4479) data time 0.0006 (0.0041) model time 0.4411 (0.4428) 
loss 3.2136 (2.9313) grad_norm 1.0883 (inf) loss_scale 512.0000 (742.5497) mem 16699MB [2024-08-10 13:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][180/625] eta 0:03:19 lr 0.000677 wd 0.0500 time 0.4437 (0.4476) data time 0.0006 (0.0039) model time 0.4431 (0.4428) loss 3.5303 (2.9310) grad_norm 1.3740 (inf) loss_scale 512.0000 (729.8122) mem 16699MB [2024-08-10 13:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][190/625] eta 0:03:15 lr 0.000677 wd 0.0500 time 0.4456 (0.4486) data time 0.0006 (0.0037) model time 0.4450 (0.4443) loss 2.8627 (2.9227) grad_norm 1.6068 (inf) loss_scale 512.0000 (718.4084) mem 16699MB [2024-08-10 13:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][200/625] eta 0:03:10 lr 0.000677 wd 0.0500 time 0.4483 (0.4484) data time 0.0008 (0.0036) model time 0.4475 (0.4443) loss 3.1959 (2.9210) grad_norm 1.8056 (inf) loss_scale 512.0000 (708.1393) mem 16699MB [2024-08-10 13:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][210/625] eta 0:03:06 lr 0.000677 wd 0.0500 time 0.4456 (0.4482) data time 0.0008 (0.0035) model time 0.4448 (0.4443) loss 2.2129 (2.9118) grad_norm 1.1710 (inf) loss_scale 512.0000 (698.8436) mem 16699MB [2024-08-10 13:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][220/625] eta 0:03:01 lr 0.000677 wd 0.0500 time 0.4455 (0.4480) data time 0.0006 (0.0034) model time 0.4449 (0.4442) loss 3.4286 (2.9188) grad_norm 1.9633 (inf) loss_scale 512.0000 (690.3891) mem 16699MB [2024-08-10 13:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][230/625] eta 0:02:56 lr 0.000677 wd 0.0500 time 0.4473 (0.4479) data time 0.0009 (0.0032) model time 0.4464 (0.4443) loss 2.8958 (2.9211) grad_norm 1.5931 (inf) loss_scale 512.0000 (682.6667) mem 16699MB [2024-08-10 13:58:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][240/625] eta 0:02:52 lr 0.000677 wd 0.0500 time 0.4445 (0.4477) 
data time 0.0011 (0.0031) model time 0.4434 (0.4441) loss 3.0859 (2.9319) grad_norm 1.3365 (inf) loss_scale 512.0000 (675.5851) mem 16699MB [2024-08-10 13:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][250/625] eta 0:02:47 lr 0.000676 wd 0.0500 time 0.4408 (0.4475) data time 0.0006 (0.0031) model time 0.4402 (0.4440) loss 2.0998 (2.9353) grad_norm 1.8006 (inf) loss_scale 512.0000 (669.0677) mem 16699MB [2024-08-10 13:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][260/625] eta 0:02:43 lr 0.000676 wd 0.0500 time 0.4437 (0.4473) data time 0.0006 (0.0030) model time 0.4431 (0.4439) loss 3.3701 (2.9329) grad_norm 1.3656 (inf) loss_scale 512.0000 (663.0498) mem 16699MB [2024-08-10 13:58:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][270/625] eta 0:02:38 lr 0.000676 wd 0.0500 time 0.4475 (0.4472) data time 0.0006 (0.0029) model time 0.4469 (0.4439) loss 3.1031 (2.9359) grad_norm 1.4406 (inf) loss_scale 512.0000 (657.4760) mem 16699MB [2024-08-10 13:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][280/625] eta 0:02:34 lr 0.000676 wd 0.0500 time 0.4436 (0.4471) data time 0.0007 (0.0029) model time 0.4430 (0.4438) loss 3.8083 (2.9375) grad_norm 2.1350 (inf) loss_scale 512.0000 (652.2989) mem 16699MB [2024-08-10 13:58:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][290/625] eta 0:02:29 lr 0.000676 wd 0.0500 time 0.4399 (0.4470) data time 0.0008 (0.0028) model time 0.4391 (0.4438) loss 3.2464 (2.9415) grad_norm 2.4421 (inf) loss_scale 512.0000 (647.4777) mem 16699MB [2024-08-10 13:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][300/625] eta 0:02:25 lr 0.000676 wd 0.0500 time 0.4432 (0.4468) data time 0.0010 (0.0028) model time 0.4422 (0.4436) loss 3.0621 (2.9374) grad_norm 1.4811 (inf) loss_scale 512.0000 (642.9767) mem 16699MB [2024-08-10 13:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][310/625] eta 
0:02:20 lr 0.000676 wd 0.0500 time 0.4430 (0.4466) data time 0.0009 (0.0027) model time 0.4421 (0.4435) loss 2.6832 (2.9451) grad_norm 1.8591 (inf) loss_scale 512.0000 (638.7653) mem 16699MB [2024-08-10 13:59:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][320/625] eta 0:02:16 lr 0.000676 wd 0.0500 time 0.4394 (0.4465) data time 0.0009 (0.0026) model time 0.4385 (0.4434) loss 3.0961 (2.9461) grad_norm 2.5296 (inf) loss_scale 512.0000 (634.8162) mem 16699MB [2024-08-10 13:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][330/625] eta 0:02:11 lr 0.000676 wd 0.0500 time 0.4441 (0.4464) data time 0.0006 (0.0026) model time 0.4434 (0.4433) loss 2.2436 (2.9459) grad_norm 0.9424 (inf) loss_scale 512.0000 (631.1057) mem 16699MB [2024-08-10 13:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][340/625] eta 0:02:07 lr 0.000676 wd 0.0500 time 0.4409 (0.4467) data time 0.0008 (0.0025) model time 0.4401 (0.4438) loss 3.4998 (2.9512) grad_norm 3.1780 (inf) loss_scale 512.0000 (627.6129) mem 16699MB [2024-08-10 13:59:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][350/625] eta 0:02:03 lr 0.000675 wd 0.0500 time 0.4450 (0.4474) data time 0.0007 (0.0025) model time 0.4444 (0.4447) loss 3.3903 (2.9611) grad_norm 1.4945 (inf) loss_scale 512.0000 (624.3191) mem 16699MB [2024-08-10 13:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][360/625] eta 0:01:58 lr 0.000675 wd 0.0500 time 0.4439 (0.4473) data time 0.0008 (0.0024) model time 0.4431 (0.4446) loss 2.8962 (2.9619) grad_norm 1.4979 (inf) loss_scale 512.0000 (621.2078) mem 16699MB [2024-08-10 13:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][370/625] eta 0:01:54 lr 0.000675 wd 0.0500 time 0.4447 (0.4471) data time 0.0009 (0.0024) model time 0.4439 (0.4445) loss 2.7552 (2.9634) grad_norm 1.9866 (inf) loss_scale 512.0000 (618.2642) mem 16699MB [2024-08-10 13:59:37 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [149/300][380/625] eta 0:01:49 lr 0.000675 wd 0.0500 time 0.4415 (0.4470) data time 0.0007 (0.0024) model time 0.4409 (0.4444) loss 3.6920 (2.9534) grad_norm 1.6368 (inf) loss_scale 512.0000 (615.4751) mem 16699MB [2024-08-10 13:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][390/625] eta 0:01:45 lr 0.000675 wd 0.0500 time 0.4381 (0.4468) data time 0.0007 (0.0023) model time 0.4374 (0.4443) loss 3.4704 (2.9604) grad_norm 1.2737 (inf) loss_scale 512.0000 (612.8286) mem 16699MB [2024-08-10 13:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][400/625] eta 0:01:40 lr 0.000675 wd 0.0500 time 0.4414 (0.4467) data time 0.0008 (0.0023) model time 0.4406 (0.4442) loss 3.3818 (2.9642) grad_norm 2.3911 (inf) loss_scale 512.0000 (610.3142) mem 16699MB [2024-08-10 13:59:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][410/625] eta 0:01:36 lr 0.000675 wd 0.0500 time 0.4433 (0.4474) data time 0.0008 (0.0023) model time 0.4425 (0.4451) loss 3.5949 (2.9654) grad_norm 1.2987 (inf) loss_scale 512.0000 (607.9221) mem 16699MB [2024-08-10 13:59:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][420/625] eta 0:01:31 lr 0.000675 wd 0.0500 time 0.4449 (0.4473) data time 0.0006 (0.0022) model time 0.4443 (0.4450) loss 3.6309 (2.9687) grad_norm 1.6446 (inf) loss_scale 512.0000 (605.6437) mem 16699MB [2024-08-10 13:59:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][430/625] eta 0:01:27 lr 0.000675 wd 0.0500 time 0.4474 (0.4472) data time 0.0009 (0.0022) model time 0.4465 (0.4449) loss 3.4436 (2.9681) grad_norm 1.4527 (inf) loss_scale 512.0000 (603.4710) mem 16699MB [2024-08-10 14:00:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][440/625] eta 0:01:22 lr 0.000674 wd 0.0500 time 0.4510 (0.4471) data time 0.0007 (0.0022) model time 0.4504 (0.4448) loss 2.7049 (2.9619) grad_norm 1.2762 (inf) loss_scale 512.0000 (601.3968) 
mem 16699MB [2024-08-10 14:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][450/625] eta 0:01:18 lr 0.000674 wd 0.0500 time 0.4403 (0.4471) data time 0.0006 (0.0021) model time 0.4397 (0.4448) loss 2.6770 (2.9624) grad_norm 1.8286 (inf) loss_scale 512.0000 (599.4146) mem 16699MB [2024-08-10 14:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][460/625] eta 0:01:13 lr 0.000674 wd 0.0500 time 0.4405 (0.4470) data time 0.0006 (0.0021) model time 0.4398 (0.4447) loss 3.3242 (2.9702) grad_norm 1.5667 (inf) loss_scale 512.0000 (597.5184) mem 16699MB [2024-08-10 14:00:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][470/625] eta 0:01:09 lr 0.000674 wd 0.0500 time 0.4411 (0.4469) data time 0.0007 (0.0021) model time 0.4404 (0.4446) loss 2.6969 (2.9693) grad_norm 1.7458 (inf) loss_scale 512.0000 (595.7028) mem 16699MB [2024-08-10 14:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][480/625] eta 0:01:04 lr 0.000674 wd 0.0500 time 0.4435 (0.4468) data time 0.0008 (0.0021) model time 0.4427 (0.4446) loss 3.4370 (2.9718) grad_norm 1.9324 (inf) loss_scale 512.0000 (593.9626) mem 16699MB [2024-08-10 14:00:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][490/625] eta 0:01:00 lr 0.000674 wd 0.0500 time 0.4431 (0.4471) data time 0.0007 (0.0020) model time 0.4424 (0.4450) loss 2.7599 (2.9726) grad_norm 1.4083 (inf) loss_scale 512.0000 (592.2933) mem 16699MB [2024-08-10 14:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][500/625] eta 0:00:55 lr 0.000674 wd 0.0500 time 0.4411 (0.4473) data time 0.0009 (0.0020) model time 0.4402 (0.4452) loss 3.2753 (2.9704) grad_norm 1.4298 (inf) loss_scale 512.0000 (590.6906) mem 16699MB [2024-08-10 14:00:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][510/625] eta 0:00:51 lr 0.000674 wd 0.0500 time 0.4425 (0.4473) data time 0.0007 (0.0020) model time 0.4419 (0.4452) loss 1.9110 (2.9663) 
grad_norm 3.0327 (inf) loss_scale 512.0000 (589.1507) mem 16699MB [2024-08-10 14:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][520/625] eta 0:00:46 lr 0.000674 wd 0.0500 time 0.4395 (0.4472) data time 0.0008 (0.0020) model time 0.4387 (0.4451) loss 2.2492 (2.9639) grad_norm 1.6978 (inf) loss_scale 512.0000 (587.6699) mem 16699MB [2024-08-10 14:00:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][530/625] eta 0:00:42 lr 0.000674 wd 0.0500 time 0.4414 (0.4472) data time 0.0009 (0.0020) model time 0.4405 (0.4451) loss 2.7657 (2.9644) grad_norm 1.6385 (inf) loss_scale 512.0000 (586.2448) mem 16699MB [2024-08-10 14:00:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][540/625] eta 0:00:38 lr 0.000673 wd 0.0500 time 0.4391 (0.4471) data time 0.0009 (0.0019) model time 0.4383 (0.4450) loss 2.9118 (2.9650) grad_norm 1.7696 (inf) loss_scale 512.0000 (584.8725) mem 16699MB [2024-08-10 14:00:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][550/625] eta 0:00:33 lr 0.000673 wd 0.0500 time 0.4422 (0.4474) data time 0.0009 (0.0019) model time 0.4413 (0.4454) loss 2.7419 (2.9698) grad_norm 1.8353 (inf) loss_scale 512.0000 (583.5499) mem 16699MB [2024-08-10 14:00:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][560/625] eta 0:00:29 lr 0.000673 wd 0.0500 time 0.4441 (0.4476) data time 0.0008 (0.0019) model time 0.4433 (0.4457) loss 3.4440 (2.9725) grad_norm 1.5711 (inf) loss_scale 512.0000 (582.2745) mem 16699MB [2024-08-10 14:01:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][570/625] eta 0:00:24 lr 0.000673 wd 0.0500 time 0.4445 (0.4475) data time 0.0006 (0.0019) model time 0.4439 (0.4456) loss 3.5389 (2.9763) grad_norm 1.7536 (inf) loss_scale 512.0000 (581.0438) mem 16699MB [2024-08-10 14:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][580/625] eta 0:00:20 lr 0.000673 wd 0.0500 time 0.4539 (0.4475) data time 0.0006 
(0.0019) model time 0.4533 (0.4455) loss 3.2647 (2.9736) grad_norm 1.9684 (inf) loss_scale 512.0000 (579.8554) mem 16699MB [2024-08-10 14:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][590/625] eta 0:00:15 lr 0.000673 wd 0.0500 time 0.4461 (0.4474) data time 0.0007 (0.0019) model time 0.4454 (0.4455) loss 3.0294 (2.9712) grad_norm 1.7669 (inf) loss_scale 512.0000 (578.7073) mem 16699MB [2024-08-10 14:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][600/625] eta 0:00:11 lr 0.000673 wd 0.0500 time 0.4415 (0.4473) data time 0.0007 (0.0018) model time 0.4408 (0.4454) loss 3.6006 (2.9719) grad_norm 1.7324 (inf) loss_scale 512.0000 (577.5973) mem 16699MB [2024-08-10 14:01:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][610/625] eta 0:00:06 lr 0.000673 wd 0.0500 time 0.4447 (0.4473) data time 0.0004 (0.0018) model time 0.4443 (0.4454) loss 3.4710 (2.9717) grad_norm 1.4759 (inf) loss_scale 512.0000 (576.5237) mem 16699MB [2024-08-10 14:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [149/300][620/625] eta 0:00:02 lr 0.000673 wd 0.0500 time 0.4367 (0.4471) data time 0.0004 (0.0018) model time 0.4362 (0.4452) loss 3.0274 (2.9689) grad_norm 1.5849 (inf) loss_scale 512.0000 (575.4847) mem 16699MB [2024-08-10 14:01:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 149 training takes 0:04:39 [2024-08-10 14:01:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:01:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.5376 (0.5376) Acc@1 88.574 (88.574) Acc@5 98.242 (98.242) Mem 16699MB [2024-08-10 14:01:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8467 (0.6607) Acc@1 79.492 (85.649) Acc@5 95.508 (97.559) Mem 16699MB [2024-08-10 14:01:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9590 (0.7804) Acc@1 77.246 (82.603) Acc@5 94.336 (96.205) Mem 16699MB [2024-08-10 14:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.284 Acc@5 96.203 [2024-08-10 14:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 14:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.28% [2024-08-10 14:01:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 14:01:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 14:01:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.4741 (0.4741) Acc@1 89.453 (89.453) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:01:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.7661 (0.5972) Acc@1 82.178 (87.012) Acc@5 96.631 (97.878) Mem 16699MB [2024-08-10 14:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.8726 (0.7017) Acc@1 78.418 (84.056) Acc@5 95.850 (96.808) Mem 16699MB [2024-08-10 14:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.731 Acc@5 96.809 [2024-08-10 14:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 14:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.73% [2024-08-10 14:01:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 14:01:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 14:01:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][0/625] eta 0:07:59 lr 0.000673 wd 0.0500 time 0.7676 (0.7676) data time 0.3808 (0.3808) model time 0.0000 (0.0000) loss 3.0232 (3.0232) grad_norm 1.4681 (1.4681) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:01:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][10/625] eta 0:04:51 lr 0.000672 wd 0.0500 time 0.4529 (0.4744) data time 0.0008 (0.0355) model time 0.0000 (0.0000) loss 3.3884 (2.9982) grad_norm 2.3458 (1.4527) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:01:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][20/625] eta 0:04:42 lr 0.000672 wd 0.0500 time 0.4426 (0.4669) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 3.4004 (2.9367) grad_norm 1.3128 (1.4844) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:01:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][30/625] eta 0:04:33 lr 0.000672 wd 0.0500 time 0.4415 (0.4592) data time 0.0008 (0.0132) model time 0.0000 (0.0000) loss 2.6806 (2.9723) grad_norm 1.5511 (1.5892) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:01:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][40/625] eta 0:04:26 lr 0.000672 wd 0.0500 time 0.4442 (0.4549) data time 0.0009 (0.0102) model time 0.0000 (0.0000) loss 3.5923 (2.9984) grad_norm 1.4916 (1.6530) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][50/625] eta 0:04:20 lr 0.000672 wd 0.0500 time 0.4459 (0.4527) data time 0.0007 (0.0084) model time 0.0000 (0.0000) loss 3.6731 (3.0255) grad_norm 1.3988 (1.6391) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][60/625] eta 0:04:14 lr 0.000672 wd 0.0500 time 0.4428 (0.4507) data time 0.0008 (0.0072) model time 0.4420 (0.4398) loss 2.6845 (3.0284) 
grad_norm 1.3932 (1.6337) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][70/625] eta 0:04:09 lr 0.000672 wd 0.0500 time 0.4419 (0.4493) data time 0.0008 (0.0063) model time 0.4411 (0.4398) loss 2.6916 (2.9551) grad_norm 1.5029 (1.6374) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][80/625] eta 0:04:04 lr 0.000672 wd 0.0500 time 0.4426 (0.4486) data time 0.0006 (0.0056) model time 0.4420 (0.4407) loss 2.0687 (2.9527) grad_norm 1.2265 (1.6644) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][90/625] eta 0:04:00 lr 0.000672 wd 0.0500 time 0.4447 (0.4501) data time 0.0006 (0.0051) model time 0.4441 (0.4458) loss 3.4693 (2.9487) grad_norm 1.3882 (1.6951) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][100/625] eta 0:03:56 lr 0.000671 wd 0.0500 time 0.4415 (0.4495) data time 0.0006 (0.0047) model time 0.4409 (0.4454) loss 3.5436 (2.9611) grad_norm 2.3104 (1.6847) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][110/625] eta 0:03:51 lr 0.000671 wd 0.0500 time 0.4443 (0.4489) data time 0.0008 (0.0043) model time 0.4435 (0.4448) loss 2.8111 (2.9758) grad_norm 2.0615 (1.7896) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][120/625] eta 0:03:47 lr 0.000671 wd 0.0500 time 0.4469 (0.4502) data time 0.0006 (0.0041) model time 0.4463 (0.4475) loss 3.7026 (3.0002) grad_norm 1.2083 (1.8002) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][130/625] eta 0:03:42 lr 0.000671 wd 0.0500 time 0.4445 (0.4497) data 
time 0.0009 (0.0038) model time 0.4436 (0.4469) loss 2.9691 (2.9952) grad_norm 1.2952 (1.8058) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][140/625] eta 0:03:37 lr 0.000671 wd 0.0500 time 0.4473 (0.4493) data time 0.0007 (0.0036) model time 0.4466 (0.4465) loss 2.5409 (2.9904) grad_norm 1.6078 (1.8391) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][150/625] eta 0:03:33 lr 0.000671 wd 0.0500 time 0.4399 (0.4489) data time 0.0008 (0.0034) model time 0.4390 (0.4461) loss 3.7002 (3.0104) grad_norm 1.8107 (1.8446) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][160/625] eta 0:03:28 lr 0.000671 wd 0.0500 time 0.4487 (0.4486) data time 0.0006 (0.0033) model time 0.4481 (0.4457) loss 3.5108 (3.0210) grad_norm 1.8861 (1.8384) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][170/625] eta 0:03:23 lr 0.000671 wd 0.0500 time 0.4419 (0.4483) data time 0.0006 (0.0031) model time 0.4413 (0.4455) loss 3.5566 (3.0269) grad_norm 1.8385 (1.8361) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:02:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][180/625] eta 0:03:19 lr 0.000671 wd 0.0500 time 0.4447 (0.4480) data time 0.0006 (0.0030) model time 0.4441 (0.4452) loss 2.2420 (3.0319) grad_norm 1.5549 (1.8304) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][190/625] eta 0:03:14 lr 0.000670 wd 0.0500 time 0.4411 (0.4477) data time 0.0009 (0.0029) model time 0.4402 (0.4450) loss 3.5533 (3.0288) grad_norm 3.5828 (1.8479) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[150/300][200/625] eta 0:03:10 lr 0.000670 wd 0.0500 time 0.4399 (0.4475) data time 0.0006 (0.0028) model time 0.4393 (0.4448) loss 3.3696 (3.0314) grad_norm 1.3956 (1.8568) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][210/625] eta 0:03:05 lr 0.000670 wd 0.0500 time 0.4391 (0.4472) data time 0.0009 (0.0027) model time 0.4382 (0.4446) loss 2.4816 (3.0226) grad_norm 1.9879 (1.8426) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][220/625] eta 0:03:01 lr 0.000670 wd 0.0500 time 0.4504 (0.4471) data time 0.0006 (0.0026) model time 0.4498 (0.4445) loss 3.5372 (3.0266) grad_norm 1.4948 (1.8377) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][230/625] eta 0:02:56 lr 0.000670 wd 0.0500 time 0.4424 (0.4469) data time 0.0008 (0.0026) model time 0.4416 (0.4444) loss 3.3200 (3.0293) grad_norm 1.8869 (1.8297) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][240/625] eta 0:02:52 lr 0.000670 wd 0.0500 time 0.4368 (0.4468) data time 0.0011 (0.0025) model time 0.4357 (0.4443) loss 3.2016 (3.0294) grad_norm 1.5922 (1.8134) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][250/625] eta 0:02:47 lr 0.000670 wd 0.0500 time 0.4405 (0.4466) data time 0.0009 (0.0024) model time 0.4396 (0.4442) loss 2.3313 (3.0254) grad_norm 1.4789 (1.8052) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][260/625] eta 0:02:42 lr 0.000670 wd 0.0500 time 0.4437 (0.4465) data time 0.0007 (0.0024) model time 0.4430 (0.4441) loss 3.2840 (3.0257) grad_norm 1.6929 (1.8027) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][270/625] eta 0:02:38 lr 0.000670 wd 0.0500 time 0.4400 (0.4463) data time 0.0008 (0.0023) model time 0.4392 (0.4439) loss 3.4973 (3.0231) grad_norm 1.3977 (1.7993) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][280/625] eta 0:02:33 lr 0.000670 wd 0.0500 time 0.4434 (0.4462) data time 0.0007 (0.0023) model time 0.4427 (0.4438) loss 3.0491 (3.0209) grad_norm 1.4414 (1.8034) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][290/625] eta 0:02:29 lr 0.000669 wd 0.0500 time 0.4455 (0.4461) data time 0.0011 (0.0022) model time 0.4444 (0.4438) loss 3.4864 (3.0152) grad_norm 1.3748 (1.8016) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][300/625] eta 0:02:24 lr 0.000669 wd 0.0500 time 0.4452 (0.4460) data time 0.0007 (0.0022) model time 0.4446 (0.4437) loss 1.9845 (3.0112) grad_norm 1.6388 (1.8043) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][310/625] eta 0:02:20 lr 0.000669 wd 0.0500 time 0.4432 (0.4459) data time 0.0006 (0.0021) model time 0.4426 (0.4437) loss 1.9206 (3.0058) grad_norm 1.9499 (1.8134) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][320/625] eta 0:02:15 lr 0.000669 wd 0.0500 time 0.4439 (0.4458) data time 0.0006 (0.0021) model time 0.4433 (0.4436) loss 1.7296 (3.0014) grad_norm 1.5469 (1.8042) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][330/625] eta 0:02:11 lr 0.000669 wd 0.0500 time 0.4422 (0.4458) data time 0.0009 (0.0021) model time 0.4413 (0.4436) loss 3.2430 
(2.9943) grad_norm 1.9532 (1.7951) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][340/625] eta 0:02:07 lr 0.000669 wd 0.0500 time 0.4443 (0.4457) data time 0.0006 (0.0020) model time 0.4437 (0.4435) loss 3.7195 (2.9891) grad_norm 2.0730 (1.7957) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][350/625] eta 0:02:02 lr 0.000669 wd 0.0500 time 0.4392 (0.4455) data time 0.0010 (0.0020) model time 0.4382 (0.4434) loss 3.0623 (2.9877) grad_norm 1.4568 (1.7954) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][360/625] eta 0:01:58 lr 0.000669 wd 0.0500 time 0.4441 (0.4460) data time 0.0007 (0.0020) model time 0.4434 (0.4439) loss 3.1354 (2.9850) grad_norm 1.3715 (1.7869) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][370/625] eta 0:01:53 lr 0.000669 wd 0.0500 time 0.4447 (0.4459) data time 0.0008 (0.0019) model time 0.4439 (0.4439) loss 2.7869 (2.9855) grad_norm 1.7249 (1.7799) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][380/625] eta 0:01:49 lr 0.000668 wd 0.0500 time 0.4414 (0.4458) data time 0.0010 (0.0019) model time 0.4404 (0.4438) loss 3.4594 (2.9846) grad_norm 2.2499 (1.7837) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][390/625] eta 0:01:44 lr 0.000668 wd 0.0500 time 0.4407 (0.4457) data time 0.0007 (0.0019) model time 0.4400 (0.4438) loss 3.2095 (2.9830) grad_norm 1.3705 (1.7782) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][400/625] eta 0:01:40 lr 0.000668 wd 0.0500 time 0.4425 
(0.4456) data time 0.0008 (0.0019) model time 0.4416 (0.4437) loss 2.9787 (2.9873) grad_norm 1.4403 (1.7719) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][410/625] eta 0:01:35 lr 0.000668 wd 0.0500 time 0.4408 (0.4455) data time 0.0010 (0.0018) model time 0.4398 (0.4436) loss 2.8221 (2.9883) grad_norm 1.1706 (1.7678) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][420/625] eta 0:01:31 lr 0.000668 wd 0.0500 time 0.4385 (0.4455) data time 0.0007 (0.0018) model time 0.4379 (0.4436) loss 3.0568 (2.9882) grad_norm 1.3662 (1.7632) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][430/625] eta 0:01:26 lr 0.000668 wd 0.0500 time 0.4401 (0.4458) data time 0.0006 (0.0018) model time 0.4395 (0.4440) loss 1.9745 (2.9886) grad_norm 1.3130 (1.7618) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][440/625] eta 0:01:22 lr 0.000668 wd 0.0500 time 0.4480 (0.4458) data time 0.0006 (0.0018) model time 0.4474 (0.4440) loss 3.0364 (2.9850) grad_norm 1.4052 (1.7538) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:04:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][450/625] eta 0:01:18 lr 0.000668 wd 0.0500 time 0.4420 (0.4462) data time 0.0006 (0.0018) model time 0.4414 (0.4445) loss 3.4840 (2.9864) grad_norm 2.8439 (1.7571) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][460/625] eta 0:01:13 lr 0.000668 wd 0.0500 time 0.4412 (0.4461) data time 0.0007 (0.0017) model time 0.4405 (0.4444) loss 2.2815 (2.9821) grad_norm 1.8419 (1.7579) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [150/300][470/625] eta 0:01:09 lr 0.000668 wd 0.0500 time 0.4444 (0.4461) data time 0.0007 (0.0017) model time 0.4437 (0.4444) loss 3.1121 (2.9820) grad_norm 2.0306 (1.7550) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][480/625] eta 0:01:04 lr 0.000667 wd 0.0500 time 0.4423 (0.4460) data time 0.0009 (0.0017) model time 0.4414 (0.4443) loss 3.3169 (2.9857) grad_norm 1.2053 (1.7647) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][490/625] eta 0:01:00 lr 0.000667 wd 0.0500 time 0.4376 (0.4459) data time 0.0008 (0.0017) model time 0.4368 (0.4442) loss 3.2649 (2.9838) grad_norm 2.1718 (1.7671) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][500/625] eta 0:00:55 lr 0.000667 wd 0.0500 time 0.4419 (0.4459) data time 0.0008 (0.0017) model time 0.4410 (0.4442) loss 3.0057 (2.9856) grad_norm 1.2357 (1.7709) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][510/625] eta 0:00:51 lr 0.000667 wd 0.0500 time 0.4411 (0.4458) data time 0.0008 (0.0017) model time 0.4403 (0.4442) loss 2.8490 (2.9814) grad_norm 1.5719 (1.7686) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][520/625] eta 0:00:46 lr 0.000667 wd 0.0500 time 0.4457 (0.4458) data time 0.0006 (0.0016) model time 0.4451 (0.4441) loss 3.2751 (2.9807) grad_norm 1.3849 (1.7762) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][530/625] eta 0:00:42 lr 0.000667 wd 0.0500 time 0.4464 (0.4457) data time 0.0007 (0.0016) model time 0.4457 (0.4441) loss 2.9850 (2.9779) grad_norm 1.1456 (1.7762) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:05:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][540/625] eta 0:00:37 lr 0.000667 wd 0.0500 time 0.5566 (0.4459) data time 0.0010 (0.0016) model time 0.5557 (0.4443) loss 3.0239 (2.9742) grad_norm 1.3754 (1.7746) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][550/625] eta 0:00:33 lr 0.000667 wd 0.0500 time 0.4406 (0.4457) data time 0.0006 (0.0016) model time 0.4400 (0.4441) loss 3.3934 (2.9720) grad_norm 1.5962 (1.7674) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][560/625] eta 0:00:28 lr 0.000667 wd 0.0500 time 0.4446 (0.4457) data time 0.0008 (0.0016) model time 0.4437 (0.4441) loss 3.3856 (2.9758) grad_norm 1.3665 (1.7634) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][570/625] eta 0:00:24 lr 0.000666 wd 0.0500 time 0.4425 (0.4456) data time 0.0007 (0.0016) model time 0.4418 (0.4440) loss 3.4421 (2.9769) grad_norm 3.9244 (1.7709) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][580/625] eta 0:00:20 lr 0.000666 wd 0.0500 time 0.4439 (0.4456) data time 0.0008 (0.0016) model time 0.4430 (0.4440) loss 2.5315 (2.9794) grad_norm 2.6051 (1.7810) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][590/625] eta 0:00:15 lr 0.000666 wd 0.0500 time 0.4425 (0.4456) data time 0.0006 (0.0016) model time 0.4419 (0.4440) loss 2.0459 (2.9801) grad_norm 2.3494 (1.7868) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][600/625] eta 0:00:11 lr 0.000666 wd 0.0500 time 0.4426 (0.4455) data time 0.0006 (0.0015) model time 0.4420 (0.4439) loss 2.4131 
(2.9764) grad_norm 1.5994 (1.7822) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][610/625] eta 0:00:06 lr 0.000666 wd 0.0500 time 0.4399 (0.4455) data time 0.0005 (0.0015) model time 0.4395 (0.4439) loss 3.5172 (2.9750) grad_norm 1.8240 (1.7777) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [150/300][620/625] eta 0:00:02 lr 0.000666 wd 0.0500 time 0.4405 (0.4456) data time 0.0004 (0.0015) model time 0.4401 (0.4441) loss 3.2153 (2.9729) grad_norm 1.6529 (1.7734) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:15 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 150 training takes 0:04:38 [2024-08-10 14:06:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:06:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5688 (0.5688) Acc@1 87.793 (87.793) Acc@5 98.242 (98.242) Mem 16699MB [2024-08-10 14:06:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.9019 (0.6826) Acc@1 78.125 (84.970) Acc@5 95.166 (97.363) Mem 16699MB [2024-08-10 14:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9775 (0.7995) Acc@1 77.930 (82.055) Acc@5 94.482 (96.157) Mem 16699MB [2024-08-10 14:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 81.804 Acc@5 96.155 [2024-08-10 14:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 81.8% [2024-08-10 14:06:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.788 (0.788) Loss 0.4736 (0.4736) Acc@1 89.551 (89.551) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.182) Loss 0.7651 (0.5964) Acc@1 82.275 (86.985) Acc@5 96.582 (97.865) Mem 16699MB [2024-08-10 14:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.151) Loss 0.8711 (0.7009) Acc@1 78.564 (84.043) Acc@5 95.996 (96.801) Mem 16699MB [2024-08-10 14:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.727 Acc@5 96.811 [2024-08-10 14:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 14:06:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][0/625] eta 0:13:24 lr 0.000666 wd 0.0500 time 1.2864 (1.2864) data time 0.7993 (0.7993) model time 0.0000 (0.0000) loss 2.0696 (2.0696) grad_norm 2.6519 (2.6519) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][10/625] eta 0:05:19 lr 0.000666 wd 0.0500 time 0.4428 (0.5190) data 
time 0.0008 (0.0735) model time 0.0000 (0.0000) loss 3.3913 (3.0349) grad_norm 1.1751 (1.7854) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][20/625] eta 0:04:58 lr 0.000666 wd 0.0500 time 0.4403 (0.4926) data time 0.0007 (0.0389) model time 0.0000 (0.0000) loss 3.6878 (3.0338) grad_norm 1.1114 (1.6519) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][30/625] eta 0:04:43 lr 0.000666 wd 0.0500 time 0.4395 (0.4763) data time 0.0006 (0.0267) model time 0.0000 (0.0000) loss 3.6064 (3.0289) grad_norm 1.7962 (1.6507) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][40/625] eta 0:04:33 lr 0.000665 wd 0.0500 time 0.4419 (0.4683) data time 0.0006 (0.0204) model time 0.0000 (0.0000) loss 3.7850 (3.0139) grad_norm 1.5258 (1.6792) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][50/625] eta 0:04:26 lr 0.000665 wd 0.0500 time 0.4427 (0.4636) data time 0.0006 (0.0165) model time 0.0000 (0.0000) loss 2.8959 (2.9916) grad_norm 3.1954 (1.7096) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][60/625] eta 0:04:19 lr 0.000665 wd 0.0500 time 0.4400 (0.4599) data time 0.0008 (0.0140) model time 0.4392 (0.4404) loss 2.7386 (2.9841) grad_norm 2.0099 (1.7144) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:06:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][70/625] eta 0:04:15 lr 0.000665 wd 0.0500 time 0.4398 (0.4595) data time 0.0006 (0.0121) model time 0.4391 (0.4483) loss 2.5615 (2.9756) grad_norm 2.0194 (1.7071) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[151/300][80/625] eta 0:04:09 lr 0.000665 wd 0.0500 time 0.4408 (0.4573) data time 0.0006 (0.0107) model time 0.4401 (0.4458) loss 2.2008 (2.9514) grad_norm 2.2175 (1.7031) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][90/625] eta 0:04:03 lr 0.000665 wd 0.0500 time 0.4414 (0.4557) data time 0.0006 (0.0096) model time 0.4408 (0.4449) loss 3.5196 (2.9430) grad_norm 2.4088 (1.7472) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][100/625] eta 0:03:58 lr 0.000665 wd 0.0500 time 0.4436 (0.4545) data time 0.0006 (0.0088) model time 0.4429 (0.4443) loss 2.2862 (2.9340) grad_norm 3.5831 (1.7770) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][110/625] eta 0:03:53 lr 0.000665 wd 0.0500 time 0.4409 (0.4534) data time 0.0009 (0.0081) model time 0.4401 (0.4439) loss 3.2820 (2.9398) grad_norm 1.3790 (1.7714) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][120/625] eta 0:03:48 lr 0.000665 wd 0.0500 time 0.4505 (0.4526) data time 0.0006 (0.0075) model time 0.4499 (0.4438) loss 3.3755 (2.9473) grad_norm 1.6423 (1.7611) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][130/625] eta 0:03:43 lr 0.000665 wd 0.0500 time 0.4427 (0.4519) data time 0.0006 (0.0070) model time 0.4421 (0.4437) loss 1.7052 (2.9377) grad_norm 2.4799 (1.7538) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][140/625] eta 0:03:39 lr 0.000664 wd 0.0500 time 0.4403 (0.4528) data time 0.0009 (0.0065) model time 0.4394 (0.4458) loss 1.8514 (2.9256) grad_norm 1.3104 (1.7544) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:07:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][150/625] eta 0:03:34 lr 0.000664 wd 0.0500 time 0.4435 (0.4521) data time 0.0009 (0.0062) model time 0.4427 (0.4453) loss 3.2965 (2.9265) grad_norm 2.0156 (1.7866) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][160/625] eta 0:03:29 lr 0.000664 wd 0.0500 time 0.4451 (0.4514) data time 0.0007 (0.0058) model time 0.4444 (0.4449) loss 2.9045 (2.9337) grad_norm 1.9419 (1.8254) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][170/625] eta 0:03:25 lr 0.000664 wd 0.0500 time 0.4394 (0.4509) data time 0.0009 (0.0056) model time 0.4385 (0.4447) loss 2.4857 (2.9416) grad_norm 1.2275 (1.8202) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][180/625] eta 0:03:20 lr 0.000664 wd 0.0500 time 0.4422 (0.4505) data time 0.0007 (0.0053) model time 0.4415 (0.4445) loss 3.7899 (2.9379) grad_norm 1.3201 (1.8045) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][190/625] eta 0:03:15 lr 0.000664 wd 0.0500 time 0.4436 (0.4501) data time 0.0009 (0.0051) model time 0.4427 (0.4443) loss 2.4912 (2.9257) grad_norm 1.5956 (1.7895) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][200/625] eta 0:03:11 lr 0.000664 wd 0.0500 time 0.4371 (0.4497) data time 0.0008 (0.0049) model time 0.4363 (0.4441) loss 3.3747 (2.9278) grad_norm 1.2901 (1.7740) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:07:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][210/625] eta 0:03:06 lr 0.000664 wd 0.0500 time 0.4409 (0.4501) data time 0.0006 (0.0047) model time 0.4403 (0.4449) loss 2.5658 
(2.9262) grad_norm 1.4540 (1.7653) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][220/625] eta 0:03:02 lr 0.000664 wd 0.0500 time 0.4398 (0.4497) data time 0.0008 (0.0045) model time 0.4390 (0.4447) loss 3.0461 (2.9208) grad_norm 1.6681 (1.7973) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][230/625] eta 0:02:57 lr 0.000663 wd 0.0500 time 0.4422 (0.4494) data time 0.0006 (0.0044) model time 0.4416 (0.4445) loss 2.5284 (2.9159) grad_norm 1.7965 (1.7987) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][240/625] eta 0:02:52 lr 0.000663 wd 0.0500 time 0.4493 (0.4492) data time 0.0008 (0.0042) model time 0.4485 (0.4445) loss 3.3225 (2.9111) grad_norm 2.9066 (1.7968) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][250/625] eta 0:02:48 lr 0.000663 wd 0.0500 time 0.4430 (0.4490) data time 0.0006 (0.0041) model time 0.4423 (0.4444) loss 2.9193 (2.9148) grad_norm 2.2032 (1.8005) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][260/625] eta 0:02:43 lr 0.000663 wd 0.0500 time 0.4428 (0.4489) data time 0.0006 (0.0040) model time 0.4422 (0.4444) loss 3.4544 (2.9161) grad_norm 1.3390 (1.7995) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][270/625] eta 0:02:39 lr 0.000663 wd 0.0500 time 0.4465 (0.4488) data time 0.0008 (0.0038) model time 0.4457 (0.4444) loss 2.6515 (2.9132) grad_norm 1.3030 (1.7943) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:08:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][280/625] eta 0:02:34 lr 0.000663 wd 0.0500 time 0.4390 
(0.4485) data time 0.0009 (0.0037) model time 0.4381 (0.4443) loss 2.1243 (2.9117) grad_norm 2.7284 (1.7955) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][290/625] eta 0:02:30 lr 0.000663 wd 0.0500 time 0.4383 (0.4484) data time 0.0008 (0.0036) model time 0.4375 (0.4443) loss 3.0827 (2.9117) grad_norm 1.8294 (1.7894) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:08:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][300/625] eta 0:02:25 lr 0.000663 wd 0.0500 time 0.4466 (0.4483) data time 0.0008 (0.0036) model time 0.4457 (0.4443) loss 2.9732 (2.9169) grad_norm 1.4358 (1.7919) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:08:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][310/625] eta 0:02:21 lr 0.000663 wd 0.0500 time 0.4427 (0.4485) data time 0.0008 (0.0035) model time 0.4419 (0.4447) loss 2.7347 (2.9095) grad_norm 2.3881 (1.7909) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][320/625] eta 0:02:16 lr 0.000662 wd 0.0500 time 0.4474 (0.4484) data time 0.0008 (0.0034) model time 0.4466 (0.4446) loss 2.6452 (2.9094) grad_norm 1.7167 (1.7887) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][330/625] eta 0:02:12 lr 0.000662 wd 0.0500 time 0.4441 (0.4486) data time 0.0008 (0.0033) model time 0.4433 (0.4449) loss 2.9681 (2.9155) grad_norm 1.7769 (1.7804) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][340/625] eta 0:02:07 lr 0.000662 wd 0.0500 time 0.4476 (0.4485) data time 0.0008 (0.0033) model time 0.4468 (0.4449) loss 2.0648 (2.9059) grad_norm 2.1690 (1.7817) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][350/625] eta 0:02:03 lr 0.000662 wd 0.0500 time 0.4468 (0.4484) data time 0.0009 (0.0032) model time 0.4459 (0.4449) loss 2.6070 (2.9064) grad_norm 1.2691 (1.7878) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][360/625] eta 0:01:58 lr 0.000662 wd 0.0500 time 0.4454 (0.4483) data time 0.0006 (0.0031) model time 0.4448 (0.4448) loss 3.3396 (2.9104) grad_norm 1.1828 (1.7802) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][370/625] eta 0:01:54 lr 0.000662 wd 0.0500 time 0.4428 (0.4481) data time 0.0009 (0.0031) model time 0.4419 (0.4448) loss 2.6293 (2.9115) grad_norm 2.0026 (1.7893) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][380/625] eta 0:01:49 lr 0.000662 wd 0.0500 time 0.4412 (0.4481) data time 0.0009 (0.0032) model time 0.4403 (0.4446) loss 2.1540 (2.9178) grad_norm 2.1860 (1.7893) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][390/625] eta 0:01:45 lr 0.000662 wd 0.0500 time 0.4404 (0.4481) data time 0.0009 (0.0032) model time 0.4396 (0.4447) loss 3.3169 (2.9297) grad_norm 2.0508 (1.7890) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][400/625] eta 0:01:40 lr 0.000662 wd 0.0500 time 0.4422 (0.4480) data time 0.0006 (0.0031) model time 0.4415 (0.4446) loss 3.4937 (2.9348) grad_norm 1.4538 (1.7842) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][410/625] eta 0:01:36 lr 0.000662 wd 0.0500 time 0.4522 (0.4484) data time 0.0009 (0.0031) model time 0.4513 (0.4450) loss 3.3736 (2.9300) grad_norm 1.1367 (1.7724) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][420/625] eta 0:01:31 lr 0.000661 wd 0.0500 time 0.4420 (0.4483) data time 0.0007 (0.0030) model time 0.4413 (0.4450) loss 2.7071 (2.9275) grad_norm 1.0237 (1.7617) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][430/625] eta 0:01:27 lr 0.000661 wd 0.0500 time 0.4424 (0.4486) data time 0.0009 (0.0030) model time 0.4416 (0.4455) loss 3.0298 (2.9308) grad_norm 1.4478 (1.7554) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][440/625] eta 0:01:22 lr 0.000661 wd 0.0500 time 0.4391 (0.4485) data time 0.0009 (0.0029) model time 0.4382 (0.4453) loss 2.5063 (2.9280) grad_norm 2.0871 (1.7700) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][450/625] eta 0:01:18 lr 0.000661 wd 0.0500 time 0.4428 (0.4483) data time 0.0006 (0.0029) model time 0.4422 (0.4453) loss 3.4047 (2.9334) grad_norm 2.1020 (1.7718) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][460/625] eta 0:01:13 lr 0.000661 wd 0.0500 time 0.4421 (0.4482) data time 0.0006 (0.0028) model time 0.4415 (0.4452) loss 3.6457 (2.9331) grad_norm 1.7049 (1.7831) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][470/625] eta 0:01:09 lr 0.000661 wd 0.0500 time 0.4406 (0.4481) data time 0.0008 (0.0028) model time 0.4398 (0.4451) loss 3.3563 (2.9360) grad_norm 1.6605 (1.7801) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:09:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][480/625] eta 0:01:05 lr 0.000661 wd 0.0500 time 0.4426 (0.4484) data time 0.0009 (0.0028) model time 0.4417 (0.4455) loss 3.0194 (2.9382) grad_norm 1.9302 (1.7774) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][490/625] eta 0:01:00 lr 0.000661 wd 0.0500 time 0.4424 (0.4483) data time 0.0006 (0.0027) model time 0.4418 (0.4454) loss 3.7779 (2.9409) grad_norm 1.5664 (1.7750) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][500/625] eta 0:00:56 lr 0.000661 wd 0.0500 time 0.4430 (0.4482) data time 0.0009 (0.0027) model time 0.4421 (0.4454) loss 2.9319 (2.9455) grad_norm 1.1675 (1.7685) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][510/625] eta 0:00:51 lr 0.000660 wd 0.0500 time 0.4390 (0.4481) data time 0.0007 (0.0027) model time 0.4382 (0.4453) loss 1.8448 (2.9432) grad_norm 1.4933 (1.7626) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][520/625] eta 0:00:47 lr 0.000660 wd 0.0500 time 0.4398 (0.4480) data time 0.0009 (0.0026) model time 0.4389 (0.4452) loss 3.2569 (2.9416) grad_norm 1.3812 (1.7597) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][530/625] eta 0:00:42 lr 0.000660 wd 0.0500 time 0.4407 (0.4479) data time 0.0006 (0.0026) model time 0.4401 (0.4451) loss 3.2079 (2.9394) grad_norm 1.6765 (1.7584) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][540/625] eta 0:00:38 lr 0.000660 wd 0.0500 time 0.4390 (0.4477) data time 0.0007 (0.0026) model time 0.4383 (0.4450) loss 3.0501 (2.9405) grad_norm 2.2132 (1.7670) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][550/625] eta 0:00:33 lr 0.000660 wd 0.0500 time 0.4442 (0.4476) data time 0.0008 (0.0025) model time 0.4434 (0.4449) loss 2.3689 (2.9388) grad_norm 2.3891 (1.7759) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][560/625] eta 0:00:29 lr 0.000660 wd 0.0500 time 0.4423 (0.4475) data time 0.0009 (0.0025) model time 0.4414 (0.4448) loss 3.3046 (2.9391) grad_norm 3.7918 (1.7856) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][570/625] eta 0:00:24 lr 0.000660 wd 0.0500 time 0.4434 (0.4474) data time 0.0007 (0.0025) model time 0.4428 (0.4448) loss 3.1901 (2.9374) grad_norm 1.8277 (1.7924) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][580/625] eta 0:00:20 lr 0.000660 wd 0.0500 time 0.4426 (0.4477) data time 0.0008 (0.0025) model time 0.4417 (0.4451) loss 3.2438 (2.9410) grad_norm 1.9883 (1.7918) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][590/625] eta 0:00:15 lr 0.000660 wd 0.0500 time 0.4426 (0.4476) data time 0.0007 (0.0024) model time 0.4419 (0.4450) loss 3.3386 (2.9421) grad_norm 1.8867 (1.7986) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][600/625] eta 0:00:11 lr 0.000660 wd 0.0500 time 0.4425 (0.4477) data time 0.0009 (0.0024) model time 0.4416 (0.4451) loss 3.3114 (2.9406) grad_norm 1.8838 (1.7941) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:10:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][610/625] eta 0:00:06 lr 0.000659 wd 0.0500 time 0.4368 (0.4476) data time 0.0004 (0.0024) model time 0.4364 (0.4450) loss 3.0113 (2.9406) grad_norm 1.4118 (1.7974) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [151/300][620/625] eta 0:00:02 lr 0.000659 wd 0.0500 time 0.4417 (0.4475) data time 0.0006 (0.0024) model time 0.4411 (0.4449) loss 3.1232 (2.9422) grad_norm 1.3069 (1.8005) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 151 training takes 0:04:39
[2024-08-10 14:11:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-10 14:11:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 14:11:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5532 (0.5532) Acc@1 87.939 (87.939) Acc@5 98.193 (98.193) Mem 16699MB
[2024-08-10 14:11:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8560 (0.6647) Acc@1 80.518 (85.551) Acc@5 95.557 (97.514) Mem 16699MB
[2024-08-10 14:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9902 (0.7868) Acc@1 76.660 (82.382) Acc@5 94.971 (96.291) Mem 16699MB
[2024-08-10 14:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.088 Acc@5 96.259
[2024-08-10 14:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1%
[2024-08-10 14:11:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.959 (0.959) Loss 0.4734 (0.4734) Acc@1 89.551 (89.551) Acc@5 98.828 (98.828) Mem 16699MB
[2024-08-10 14:11:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.196) Loss 0.7642 (0.5957) Acc@1 82.178 (86.950) Acc@5 96.631 (97.869) Mem 16699MB
[2024-08-10 14:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.158) Loss 0.8711 (0.7001) Acc@1 78.711 (84.015) Acc@5 95.947 (96.801) Mem 16699MB
[2024-08-10 14:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.709 Acc@5 96.811
[2024-08-10 14:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-10 14:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][0/625] eta 0:12:58 lr 0.000659 wd 0.0500 time 1.2463 (1.2463) data time 0.5980 (0.5980) model time 0.0000 (0.0000) loss 2.1518 (2.1518) grad_norm 1.4928 (1.4928) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][10/625] eta 0:05:17 lr 0.000659 wd 0.0500 time 0.4428 (0.5168) data time 0.0009 (0.0552) model time 0.0000 (0.0000) loss 2.8792 (2.7025) grad_norm 2.0445 (1.6651) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][20/625] eta 0:04:51 lr 0.000659 wd 0.0500 time 0.4412 (0.4814) data time 0.0008 (0.0294) model time 0.0000 (0.0000) loss 3.0138 (2.8564) grad_norm 1.1986 (1.5721) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][30/625] eta 0:04:39 lr 0.000659 wd 0.0500 time 0.4414 (0.4690) data time 0.0009 (0.0202) model time 0.0000 (0.0000) loss 3.1857 (2.9312) grad_norm 1.3483 (1.5473) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][40/625] eta 0:04:30 lr 0.000659 wd 0.0500 time 0.4432 (0.4625) data time 0.0007 (0.0155) model time 0.0000 (0.0000) loss 2.4215 (2.8876) grad_norm 1.7576 (1.5728) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][50/625] eta 0:04:24 lr 0.000659 wd 0.0500 time 0.4452 (0.4591) data time 0.0006 (0.0126) model time 0.0000 (0.0000) loss 2.9955 (2.8384) grad_norm 1.9690 (1.5873) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][60/625] eta 0:04:18 lr 0.000659 wd 0.0500 time 0.4445 (0.4569) data time 0.0008 (0.0107) model time 0.4437 (0.4444) loss 3.2096 (2.8426) grad_norm 2.1986 (1.6511) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][70/625] eta 0:04:14 lr 0.000659 wd 0.0500 time 0.4409 (0.4578) data time 0.0009 (0.0093) model time 0.4400 (0.4533) loss 2.6090 (2.8575) grad_norm 1.6234 (1.6772) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][80/625] eta 0:04:08 lr 0.000658 wd 0.0500 time 0.4435 (0.4560) data time 0.0007 (0.0083) model time 0.4427 (0.4497) loss 2.4094 (2.8822) grad_norm 1.4035 (1.6822) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][90/625] eta 0:04:03 lr 0.000658 wd 0.0500 time 0.4436 (0.4546) data time 0.0007 (0.0075) model time 0.4429 (0.4479) loss 2.7218 (2.8831) grad_norm 1.3254 (1.6901) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:11:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][100/625] eta 0:03:58 lr 0.000658 wd 0.0500 time 0.4399 (0.4534) data time 0.0007 (0.0068) model time 0.4392 (0.4467) loss 2.0120 (2.8688) grad_norm 1.2095 (1.6748) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][110/625] eta 0:03:53 lr 0.000658 wd 0.0500 time 0.4450 (0.4541) data time 0.0006 (0.0063) model time 0.4444 (0.4490) loss 2.8276 (2.8798) grad_norm 1.3566 (1.6495) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][120/625] eta 0:03:48 lr 0.000658 wd 0.0500 time 0.4389 (0.4531) data time 0.0006 (0.0058) model time 0.4383 (0.4479) loss 2.9287 (2.8787) grad_norm 1.5129 (1.6215) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][130/625] eta 0:03:44 lr 0.000658 wd 0.0500 time 0.4376 (0.4542) data time 0.0006 (0.0055) model time 0.4370 (0.4501) loss 3.3688 (2.8780) grad_norm 1.4051 (1.6068) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][140/625] eta 0:03:39 lr 0.000658 wd 0.0500 time 0.4455 (0.4534) data time 0.0008 (0.0051) model time 0.4447 (0.4492) loss 2.2959 (2.8723) grad_norm 1.3027 (1.6044) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][150/625] eta 0:03:35 lr 0.000658 wd 0.0500 time 0.4424 (0.4527) data time 0.0006 (0.0049) model time 0.4417 (0.4485) loss 2.7921 (2.8919) grad_norm 1.7304 (1.6309) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][160/625] eta 0:03:30 lr 0.000658 wd 0.0500 time 0.4421 (0.4520) data time 0.0008 (0.0046) model time 0.4413 (0.4478) loss 2.9245 (2.8835) grad_norm 3.0978 (1.6501) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][170/625] eta 0:03:25 lr 0.000657 wd 0.0500 time 0.4447 (0.4515) data time 0.0008 (0.0044) model time 0.4439 (0.4473) loss 2.8462 (2.8665) grad_norm 1.4519 (1.7183) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][180/625] eta 0:03:20 lr 0.000657 wd 0.0500 time 0.4471 (0.4509) data time 0.0008 (0.0042) model time 0.4463 (0.4468) loss 3.5043 (2.8732) grad_norm 1.6823 (1.7257) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][190/625] eta 0:03:15 lr 0.000657 wd 0.0500 time 0.4419 (0.4505) data time 0.0006 (0.0040) model time 0.4413 (0.4465) loss 4.1197 (2.8911) grad_norm 1.7582 (1.7309) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][200/625] eta 0:03:11 lr 0.000657 wd 0.0500 time 0.4457 (0.4502) data time 0.0006 (0.0039) model time 0.4451 (0.4462) loss 2.3488 (2.8906) grad_norm 1.9176 (1.7572) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 14:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][210/625] eta 0:03:06 lr 0.000657 wd 0.0500 time 0.4425 (0.4498) data time 0.0006 (0.0037) model time 0.4419 (0.4460) loss 3.1729 (2.8992) grad_norm 2.2359 (1.7555) loss_scale 1024.0000 (533.8389) mem 16699MB
[2024-08-10 14:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][220/625] eta 0:03:02 lr 0.000657 wd 0.0500 time 0.4442 (0.4496) data time 0.0006 (0.0036) model time 0.4436 (0.4458) loss 1.9736 (2.8902) grad_norm 1.6502 (1.7463) loss_scale 1024.0000 (556.0181) mem 16699MB
[2024-08-10 14:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][230/625] eta 0:02:57 lr 0.000657 wd 0.0500 time 0.4453 (0.4493) data time 0.0006 (0.0035) model time 0.4447 (0.4456) loss 2.2734 (2.8972) grad_norm 6.4309 (1.7607) loss_scale 1024.0000 (576.2771) mem 16699MB
[2024-08-10 14:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][240/625] eta 0:02:52 lr 0.000657 wd 0.0500 time 0.4408 (0.4491) data time 0.0006 (0.0034) model time 0.4402 (0.4455) loss 2.1364 (2.9045) grad_norm 1.8404 (1.7869) loss_scale 1024.0000 (594.8548) mem 16699MB
[2024-08-10 14:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][250/625] eta 0:02:48 lr 0.000657 wd 0.0500 time 0.4419 (0.4488) data time 0.0008 (0.0033) model time 0.4411 (0.4452) loss 2.9511 (2.9052) grad_norm 1.8336 (1.7775) loss_scale 1024.0000 (611.9522) mem 16699MB
[2024-08-10 14:13:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][260/625] eta 0:02:43 lr 0.000656 wd 0.0500 time 0.4363 (0.4491) data time 0.0007 (0.0032) model time 0.4357 (0.4458) loss 3.2828 (2.9062) grad_norm 1.6946 (1.7732) loss_scale 1024.0000 (627.7395) mem 16699MB
[2024-08-10 14:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][270/625] eta 0:02:39 lr 0.000656 wd 0.0500 time 0.4464 (0.4489) data time 0.0006 (0.0031) model time 0.4458 (0.4456) loss 3.5040 (2.9186) grad_norm 1.9910 (1.7706) loss_scale 1024.0000 (642.3616) mem 16699MB
[2024-08-10 14:13:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][280/625] eta 0:02:34 lr 0.000656 wd 0.0500 time 0.4450 (0.4486) data time 0.0007 (0.0030) model time 0.4443 (0.4454) loss 2.4069 (2.9166) grad_norm 3.0262 (1.7741) loss_scale 1024.0000 (655.9431) mem 16699MB
[2024-08-10 14:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][290/625] eta 0:02:30 lr 0.000656 wd 0.0500 time 0.4397 (0.4485) data time 0.0009 (0.0030) model time 0.4388 (0.4454) loss 3.1890 (2.9185) grad_norm 1.3881 (1.7634) loss_scale 1024.0000 (668.5911) mem 16699MB
[2024-08-10 14:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][300/625] eta 0:02:25 lr 0.000656 wd 0.0500 time 0.4414 (0.4483) data time 0.0009 (0.0029) model time 0.4405 (0.4452) loss 3.7512 (2.9255) grad_norm 1.7015 (1.7603) loss_scale 1024.0000 (680.3987) mem 16699MB
[2024-08-10 14:13:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][310/625] eta 0:02:21 lr 0.000656 wd 0.0500 time 0.4431 (0.4481) data time 0.0009 (0.0028) model time 0.4422 (0.4451) loss 3.0429 (2.9314) grad_norm 1.5156 (1.7608) loss_scale 1024.0000 (691.4469) mem 16699MB
[2024-08-10 14:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][320/625] eta 0:02:16 lr 0.000656 wd 0.0500 time 0.4413 (0.4480) data time 0.0007 (0.0028) model time 0.4406 (0.4450) loss 2.4475 (2.9273) grad_norm 1.5843 (1.7701) loss_scale 1024.0000 (701.8069) mem 16699MB
[2024-08-10 14:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][330/625] eta 0:02:12 lr 0.000656 wd 0.0500 time 0.4419 (0.4478) data time 0.0009 (0.0027) model time 0.4410 (0.4448) loss 3.2683 (2.9159) grad_norm 2.6537 (1.7955) loss_scale 1024.0000 (711.5408) mem 16699MB
[2024-08-10 14:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][340/625] eta 0:02:07 lr 0.000656 wd 0.0500 time 0.4427 (0.4477) data time 0.0009 (0.0027) model time 0.4418 (0.4447) loss 2.7728 (2.9180) grad_norm 2.4032 (1.8132) loss_scale 1024.0000 (720.7038) mem 16699MB
[2024-08-10 14:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][350/625] eta 0:02:03 lr 0.000656 wd 0.0500 time 0.4431 (0.4476) data time 0.0006 (0.0026) model time 0.4425 (0.4447) loss 3.5184 (2.9170) grad_norm 1.6061 (1.8119) loss_scale 1024.0000 (729.3447) mem 16699MB
[2024-08-10 14:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][360/625] eta 0:01:58 lr 0.000655 wd 0.0500 time 0.4394 (0.4475) data time 0.0009 (0.0026) model time 0.4386 (0.4446) loss 2.5589 (2.9178) grad_norm 1.4475 (1.8058) loss_scale 1024.0000 (737.5069) mem 16699MB
[2024-08-10 14:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][370/625] eta 0:01:54 lr 0.000655 wd 0.0500 time 0.4492 (0.4474) data time 0.0008 (0.0025) model time 0.4484 (0.4446) loss 3.7775 (2.9189) grad_norm 1.1992 (1.8030) loss_scale 1024.0000 (745.2291) mem 16699MB
[2024-08-10 14:14:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][380/625] eta 0:01:49 lr 0.000655 wd 0.0500 time 0.4410 (0.4473) data time 0.0006 (0.0025) model time 0.4403 (0.4446) loss 3.7202 (2.9200) grad_norm 1.2070 (1.7998) loss_scale 1024.0000 (752.5459) mem 16699MB
[2024-08-10 14:14:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][390/625] eta 0:01:45 lr 0.000655 wd 0.0500 time 0.4428 (0.4472) data time 0.0008 (0.0024) model time 0.4420 (0.4445) loss 2.6192 (2.9205) grad_norm 1.8362 (1.7989) loss_scale 1024.0000 (759.4885) mem 16699MB
[2024-08-10 14:14:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][400/625] eta 0:01:40 lr 0.000655 wd 0.0500 time 0.4434 (0.4471) data time 0.0007 (0.0024) model time 0.4427 (0.4444) loss 2.8618 (2.9248) grad_norm 3.8392 (1.8009) loss_scale 1024.0000 (766.0848) mem 16699MB
[2024-08-10 14:14:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][410/625] eta 0:01:36 lr 0.000655 wd 0.0500 time 0.4417 (0.4470) data time 0.0008 (0.0024) model time 0.4409 (0.4443) loss 3.0628 (2.9227) grad_norm 1.4798 (1.7973) loss_scale 1024.0000 (772.3601) mem 16699MB
[2024-08-10 14:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][420/625] eta 0:01:31 lr 0.000655 wd 0.0500 time 0.4441 (0.4469) data time 0.0006 (0.0023) model time 0.4435 (0.4443) loss 2.9989 (2.9235) grad_norm 1.4874 (1.7964) loss_scale 1024.0000 (778.3373) mem 16699MB
[2024-08-10 14:14:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][430/625] eta 0:01:27 lr 0.000655 wd 0.0500 time 0.4454 (0.4468) data time 0.0010 (0.0023) model time 0.4444 (0.4443) loss 3.4651 (2.9220) grad_norm 1.7044 (1.7936) loss_scale 1024.0000 (784.0371) mem 16699MB
[2024-08-10 14:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][440/625] eta 0:01:22 lr 0.000655 wd 0.0500 time 0.5816 (0.4471) data time 0.0007 (0.0023) model time 0.5808 (0.4446) loss 3.5339 (2.9232) grad_norm 2.5345 (1.7894) loss_scale 1024.0000 (789.4785) mem 16699MB
[2024-08-10 14:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][450/625] eta 0:01:18 lr 0.000654 wd 0.0500 time 0.4483 (0.4470) data time 0.0007 (0.0022) model time 0.4476 (0.4445) loss 2.0304 (2.9209) grad_norm 2.1992 (1.8078) loss_scale 1024.0000 (794.6785) mem 16699MB
[2024-08-10 14:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][460/625] eta 0:01:13 lr 0.000654 wd 0.0500 time 0.4438 (0.4474) data time 0.0008 (0.0022) model time 0.4430 (0.4450) loss 2.7635 (2.9201) grad_norm 1.1983 (1.8132) loss_scale 1024.0000 (799.6529) mem 16699MB
[2024-08-10 14:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][470/625] eta 0:01:09 lr 0.000654 wd 0.0500 time 0.4430 (0.4473) data time 0.0006 (0.0022) model time 0.4424 (0.4449) loss 3.2054 (2.9229) grad_norm 1.5171 (1.8069) loss_scale 1024.0000 (804.4161) mem 16699MB
[2024-08-10 14:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][480/625] eta 0:01:04 lr 0.000654 wd 0.0500 time 0.4466 (0.4476) data time 0.0006 (0.0021) model time 0.4460 (0.4453) loss 3.5331 (2.9277) grad_norm 1.2916 (1.8190) loss_scale 1024.0000 (808.9813) mem 16699MB
[2024-08-10 14:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][490/625] eta 0:01:00 lr 0.000654 wd 0.0500 time 0.4485 (0.4475) data time 0.0006 (0.0021) model time 0.4478 (0.4453) loss 2.6830 (2.9251) grad_norm 1.5591 (1.8166) loss_scale 1024.0000 (813.3605) mem 16699MB
[2024-08-10 14:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][500/625] eta 0:00:55 lr 0.000654 wd 0.0500 time 0.4455 (0.4475) data time 0.0008 (0.0021) model time 0.4447 (0.4452) loss 3.2029 (2.9281) grad_norm 3.2294 (1.8238) loss_scale 1024.0000 (817.5649) mem 16699MB
[2024-08-10 14:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][510/625] eta 0:00:51 lr 0.000654 wd 0.0500 time 0.4405 (0.4474) data time 0.0009 (0.0021) model time 0.4397 (0.4452) loss 3.5043 (2.9330) grad_norm 3.6132 (1.8272) loss_scale 1024.0000 (821.6047) mem 16699MB
[2024-08-10 14:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][520/625] eta 0:00:46 lr 0.000654 wd 0.0500 time 0.4393 (0.4473) data time 0.0009 (0.0021) model time 0.4384 (0.4451) loss 2.9915 (2.9307) grad_norm 1.5947 (1.8309) loss_scale 1024.0000 (825.4894) mem 16699MB
[2024-08-10 14:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][530/625] eta 0:00:42 lr 0.000654 wd 0.0500 time 0.4433 (0.4472) data time 0.0009 (0.0020) model time 0.4424 (0.4451) loss 3.1956 (2.9294) grad_norm 0.9310 (1.8284) loss_scale 1024.0000 (829.2279) mem 16699MB
[2024-08-10 14:15:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][540/625] eta 0:00:38 lr 0.000654 wd 0.0500 time 0.4377 (0.4471) data time 0.0008 (0.0020) model time 0.4369 (0.4450) loss 2.3634 (2.9292) grad_norm 1.0372 (1.8258) loss_scale 1024.0000 (832.8281) mem 16699MB
[2024-08-10 14:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][550/625] eta 0:00:33 lr 0.000653 wd 0.0500 time 0.4424 (0.4471) data time 0.0009 (0.0020) model time 0.4414 (0.4450) loss 3.1171 (2.9345) grad_norm 1.4621 (1.8224) loss_scale 1024.0000 (836.2976) mem 16699MB
[2024-08-10 14:15:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][560/625] eta 0:00:29 lr 0.000653 wd 0.0500 time 0.4458 (0.4470) data time 0.0009 (0.0020) model time 0.4449 (0.4449) loss 3.2716 (2.9323) grad_norm 1.2491 (1.8162) loss_scale 1024.0000 (839.6435) mem 16699MB
[2024-08-10 14:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][570/625] eta 0:00:24 lr 0.000653 wd 0.0500 time 0.4453 (0.4470) data time 0.0006 (0.0020) model time 0.4446 (0.4449) loss 1.9753 (2.9331) grad_norm 1.8042 (1.8145) loss_scale 1024.0000 (842.8722) mem 16699MB
[2024-08-10 14:15:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][580/625] eta 0:00:20 lr 0.000653 wd 0.0500 time 0.4432 (0.4470) data time 0.0008 (0.0019) model time 0.4424 (0.4449) loss 3.4427 (2.9333) grad_norm 1.5896 (1.8258) loss_scale 1024.0000 (845.9897) mem 16699MB
[2024-08-10 14:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][590/625] eta 0:00:15 lr 0.000653 wd 0.0500 time 0.4515 (0.4469) data time 0.0010 (0.0019) model time 0.4505 (0.4449) loss 2.5799 (2.9335) grad_norm 1.6560 (1.8221) loss_scale 1024.0000 (849.0017) mem 16699MB
[2024-08-10 14:15:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][600/625] eta 0:00:11 lr 0.000653 wd 0.0500 time 0.4453 (0.4469) data time 0.0009 (0.0019) model time 0.4444 (0.4448) loss 3.5161 (2.9337) grad_norm 1.7485 (1.8249) loss_scale 1024.0000 (851.9135) mem 16699MB
[2024-08-10 14:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][610/625] eta 0:00:06 lr 0.000653 wd 0.0500 time 0.4380 (0.4468) data time 0.0007 (0.0019) model time 0.4373 (0.4448) loss 2.7556 (2.9326) grad_norm 1.3188 (1.8273) loss_scale 1024.0000 (854.7300) mem 16699MB
[2024-08-10 14:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [152/300][620/625] eta 0:00:02 lr 0.000653 wd 0.0500 time 0.4394 (0.4467) data time 0.0005 (0.0019) model time 0.4390 (0.4446) loss 3.4469 (2.9331) grad_norm 1.4240 (1.8210) loss_scale 1024.0000 (857.4557) mem 16699MB
[2024-08-10 14:15:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 152 training takes 0:04:39
[2024-08-10 14:15:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-10 14:15:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 14:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5469 (0.5469) Acc@1 88.574 (88.574) Acc@5 98.584 (98.584) Mem 16699MB
[2024-08-10 14:15:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.153) Loss 0.8564 (0.6784) Acc@1 80.078 (85.431) Acc@5 95.801 (97.505) Mem 16699MB
[2024-08-10 14:15:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.136) Loss 0.9897 (0.7924) Acc@1 76.562 (82.489) Acc@5 94.922 (96.373) Mem 16699MB
[2024-08-10 14:15:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.142 Acc@5 96.327
[2024-08-10 14:15:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1%
[2024-08-10 14:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.842 (0.842) Loss 0.4736 (0.4736) Acc@1 89.600 (89.600) Acc@5 98.828 (98.828) Mem 16699MB
[2024-08-10 14:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.186) Loss 0.7632 (0.5952) Acc@1 82.031 (86.941) Acc@5 96.533 (97.829) Mem 16699MB
[2024-08-10 14:15:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.8687 (0.6996) Acc@1 78.613 (83.994) Acc@5 95.996 (96.780) Mem 16699MB
[2024-08-10 14:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.687 Acc@5 96.797
[2024-08-10 14:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-10 14:16:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][0/625] eta 0:13:39 lr 0.000653 wd 0.0500 time 1.3115 (1.3115) data time 0.4409 (0.4409) model time 0.0000 (0.0000) loss 2.3419 (2.3419) grad_norm 1.5749 (1.5749) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][10/625] eta 0:05:22 lr 0.000652 wd 0.0500 time 0.4455 (0.5240) data time 0.0009 (0.0409) model time 0.0000 (0.0000) loss 2.5364 (2.7365) grad_norm 1.3612 (1.5705) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][20/625] eta 0:04:54 lr 0.000652 wd 0.0500 time 0.4428 (0.4860) data time 0.0009 (0.0219) model time 0.0000 (0.0000) loss 2.8557 (2.8055) grad_norm 1.4285 (1.5295) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][30/625] eta 0:04:41 lr 0.000652 wd 0.0500 time 0.4383 (0.4725) data time 0.0007 (0.0151) model time 0.0000 (0.0000) loss 3.1838 (2.7660) grad_norm 1.4850 (1.5407) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][40/625] eta 0:04:34 lr 0.000652 wd 0.0500 time 0.6572 (0.4701) data time 0.0007 (0.0117) model time 0.0000 (0.0000) loss 3.2591 (2.7798) grad_norm 1.6952 (1.5485) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][50/625] eta 0:04:27 lr 0.000652 wd 0.0500 time 0.4449 (0.4646) data time 0.0006 (0.0096) model time 0.0000 (0.0000) loss 3.6911 (2.8344) grad_norm 1.5341 (1.5883) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][60/625] eta 0:04:20 lr 0.000652 wd 0.0500 time 0.4424 (0.4607) data time 0.0008 (0.0082) model time 0.4416 (0.4400) loss 3.2301 (2.8271) grad_norm 1.1986 (1.6257) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][70/625] eta 0:04:14 lr 0.000652 wd 0.0500 time 0.4411 (0.4582) data time 0.0007 (0.0071) model time 0.4404 (0.4410) loss 3.0289 (2.8592) grad_norm 1.1662 (1.6638) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][80/625] eta 0:04:08 lr 0.000652 wd 0.0500 time 0.4430 (0.4566) data time 0.0009 (0.0064) model time 0.4421 (0.4419) loss 3.1718 (2.8688) grad_norm 2.9060 (1.7335) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][90/625] eta 0:04:03 lr 0.000652 wd 0.0500 time 0.4467 (0.4552) data time 0.0006 (0.0058) model time 0.4460 (0.4421) loss 2.2287 (2.8552) grad_norm 2.2511 (1.8305) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][100/625] eta 0:03:58 lr 0.000652 wd 0.0500 time 0.4432 (0.4539) data time 0.0008 (0.0054) model time 0.4424 (0.4420) loss 3.3100 (2.8695) grad_norm 1.5691 (1.8061) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][110/625] eta 0:03:53 lr 0.000651 wd 0.0500 time 0.4443 (0.4529) data time 0.0009 (0.0050) model time 0.4434 (0.4419) loss 2.6111 (2.8757) grad_norm 1.9023 (1.7915) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][120/625] eta 0:03:48 lr 0.000651 wd 0.0500 time 0.4399 (0.4522) data time 0.0009 (0.0047) model time 0.4390 (0.4421) loss 3.2143 (2.8732) grad_norm 1.2911 (1.7774) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][130/625] eta 0:03:43 lr 0.000651 wd 0.0500 time 0.4416 (0.4514) data time 0.0009 (0.0044) model time 0.4407 (0.4418) loss 3.4176 (2.8839) grad_norm 1.1329 (1.7837) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 14:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][140/625] eta 0:03:38 lr 0.000651 wd 0.0500 time 0.4434 (0.4507) data time 0.0009 (0.0041) model time 0.4425 (0.4418) loss 3.2050 (2.8791) grad_norm 1.4705 (1.7505) loss_scale 1024.0000 (1024.0000)
mem 16699MB [2024-08-10 14:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][150/625] eta 0:03:33 lr 0.000651 wd 0.0500 time 0.4444 (0.4503) data time 0.0007 (0.0039) model time 0.4437 (0.4419) loss 3.0500 (2.8809) grad_norm 1.7436 (1.7302) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][160/625] eta 0:03:29 lr 0.000651 wd 0.0500 time 0.4383 (0.4498) data time 0.0009 (0.0037) model time 0.4374 (0.4418) loss 3.1107 (2.8866) grad_norm 1.8330 (1.7248) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][170/625] eta 0:03:24 lr 0.000651 wd 0.0500 time 0.4451 (0.4494) data time 0.0008 (0.0036) model time 0.4443 (0.4419) loss 3.1245 (2.8890) grad_norm 2.0050 (1.7324) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][180/625] eta 0:03:19 lr 0.000651 wd 0.0500 time 0.4445 (0.4491) data time 0.0010 (0.0034) model time 0.4435 (0.4420) loss 3.6641 (2.8900) grad_norm 1.9194 (1.7458) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][190/625] eta 0:03:15 lr 0.000651 wd 0.0500 time 0.4492 (0.4498) data time 0.0006 (0.0033) model time 0.4485 (0.4435) loss 3.3261 (2.9036) grad_norm 1.6492 (1.7394) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][200/625] eta 0:03:11 lr 0.000650 wd 0.0500 time 0.4449 (0.4496) data time 0.0010 (0.0032) model time 0.4439 (0.4435) loss 3.5147 (2.9045) grad_norm 1.9790 (1.7277) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][210/625] eta 0:03:06 lr 0.000650 wd 0.0500 time 0.4433 (0.4493) data time 0.0008 (0.0031) model time 0.4424 
(0.4435) loss 2.4051 (2.9010) grad_norm 1.1812 (1.7188) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][220/625] eta 0:03:01 lr 0.000650 wd 0.0500 time 0.4463 (0.4492) data time 0.0006 (0.0030) model time 0.4457 (0.4436) loss 3.5459 (2.8955) grad_norm 1.8002 (1.7205) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][230/625] eta 0:02:57 lr 0.000650 wd 0.0500 time 0.6373 (0.4499) data time 0.0008 (0.0029) model time 0.6365 (0.4447) loss 2.9430 (2.8927) grad_norm 1.5970 (1.7210) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][240/625] eta 0:02:53 lr 0.000650 wd 0.0500 time 0.4472 (0.4497) data time 0.0009 (0.0028) model time 0.4463 (0.4447) loss 2.7235 (2.9003) grad_norm 1.6549 (1.7128) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][250/625] eta 0:02:48 lr 0.000650 wd 0.0500 time 0.4445 (0.4494) data time 0.0009 (0.0027) model time 0.4436 (0.4446) loss 3.3965 (2.9102) grad_norm 1.6530 (1.7130) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:17:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][260/625] eta 0:02:43 lr 0.000650 wd 0.0500 time 0.4414 (0.4492) data time 0.0006 (0.0026) model time 0.4408 (0.4445) loss 3.0963 (2.9102) grad_norm 2.2361 (1.7179) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][270/625] eta 0:02:39 lr 0.000650 wd 0.0500 time 0.4426 (0.4489) data time 0.0009 (0.0026) model time 0.4417 (0.4443) loss 3.0833 (2.9188) grad_norm 1.5765 (1.7174) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][280/625] eta 0:02:34 
lr 0.000650 wd 0.0500 time 0.4397 (0.4487) data time 0.0007 (0.0025) model time 0.4391 (0.4442) loss 3.4525 (2.9181) grad_norm 1.6348 (1.7304) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][290/625] eta 0:02:30 lr 0.000650 wd 0.0500 time 0.4424 (0.4484) data time 0.0008 (0.0025) model time 0.4416 (0.4440) loss 2.9482 (2.9151) grad_norm 1.4413 (1.7287) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][300/625] eta 0:02:25 lr 0.000649 wd 0.0500 time 0.4404 (0.4488) data time 0.0009 (0.0024) model time 0.4395 (0.4446) loss 3.1506 (2.9182) grad_norm 1.3051 (1.7322) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][310/625] eta 0:02:21 lr 0.000649 wd 0.0500 time 0.4458 (0.4486) data time 0.0006 (0.0024) model time 0.4452 (0.4445) loss 2.3367 (2.9180) grad_norm 1.8898 (1.7378) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][320/625] eta 0:02:16 lr 0.000649 wd 0.0500 time 0.4392 (0.4484) data time 0.0007 (0.0023) model time 0.4386 (0.4444) loss 2.4571 (2.9172) grad_norm 2.4226 (1.7569) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][330/625] eta 0:02:12 lr 0.000649 wd 0.0500 time 0.4406 (0.4482) data time 0.0009 (0.0023) model time 0.4397 (0.4442) loss 3.0592 (2.9142) grad_norm 2.1107 (1.7830) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][340/625] eta 0:02:07 lr 0.000649 wd 0.0500 time 0.4410 (0.4480) data time 0.0007 (0.0022) model time 0.4403 (0.4442) loss 3.3953 (2.9215) grad_norm 1.9659 (1.7841) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:37 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][350/625] eta 0:02:03 lr 0.000649 wd 0.0500 time 0.4379 (0.4479) data time 0.0007 (0.0022) model time 0.4372 (0.4441) loss 2.7339 (2.9227) grad_norm 1.2587 (1.7766) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][360/625] eta 0:01:58 lr 0.000649 wd 0.0500 time 0.4405 (0.4477) data time 0.0007 (0.0022) model time 0.4398 (0.4440) loss 2.8769 (2.9255) grad_norm 2.2200 (1.7732) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][370/625] eta 0:01:54 lr 0.000649 wd 0.0500 time 0.4423 (0.4476) data time 0.0007 (0.0021) model time 0.4416 (0.4439) loss 2.8694 (2.9291) grad_norm 1.9272 (1.7683) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][380/625] eta 0:01:49 lr 0.000649 wd 0.0500 time 0.4422 (0.4475) data time 0.0008 (0.0021) model time 0.4415 (0.4439) loss 3.1930 (2.9351) grad_norm 1.6564 (1.7756) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][390/625] eta 0:01:45 lr 0.000648 wd 0.0500 time 0.4443 (0.4474) data time 0.0006 (0.0021) model time 0.4436 (0.4439) loss 3.3567 (2.9381) grad_norm 1.5431 (1.7971) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:18:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][400/625] eta 0:01:40 lr 0.000648 wd 0.0500 time 0.4458 (0.4474) data time 0.0009 (0.0020) model time 0.4450 (0.4439) loss 3.0567 (2.9322) grad_norm 1.4342 (1.7978) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][410/625] eta 0:01:36 lr 0.000648 wd 0.0500 time 0.4410 (0.4476) data time 0.0007 (0.0020) model time 0.4403 (0.4443) loss 3.3126 (2.9326) 
grad_norm 1.4692 (1.7961) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][420/625] eta 0:01:31 lr 0.000648 wd 0.0500 time 0.4419 (0.4475) data time 0.0008 (0.0020) model time 0.4411 (0.4442) loss 3.3022 (2.9324) grad_norm 1.6342 (1.7895) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][430/625] eta 0:01:27 lr 0.000648 wd 0.0500 time 0.4740 (0.4475) data time 0.0007 (0.0020) model time 0.4733 (0.4443) loss 3.0872 (2.9363) grad_norm 1.4184 (1.7810) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][440/625] eta 0:01:22 lr 0.000648 wd 0.0500 time 0.4439 (0.4475) data time 0.0010 (0.0019) model time 0.4429 (0.4443) loss 2.6072 (2.9394) grad_norm 1.8012 (1.7878) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][450/625] eta 0:01:18 lr 0.000648 wd 0.0500 time 0.6661 (0.4479) data time 0.0007 (0.0019) model time 0.6654 (0.4448) loss 2.0240 (2.9343) grad_norm 1.2860 (1.7902) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][460/625] eta 0:01:13 lr 0.000648 wd 0.0500 time 0.4405 (0.4478) data time 0.0008 (0.0019) model time 0.4397 (0.4448) loss 3.1938 (2.9373) grad_norm 2.0021 (1.7907) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][470/625] eta 0:01:09 lr 0.000648 wd 0.0500 time 0.4470 (0.4477) data time 0.0009 (0.0019) model time 0.4461 (0.4447) loss 2.9853 (2.9380) grad_norm 1.4703 (1.7847) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][480/625] eta 0:01:04 lr 0.000648 wd 0.0500 time 
0.4445 (0.4476) data time 0.0008 (0.0019) model time 0.4436 (0.4446) loss 3.2227 (2.9421) grad_norm 1.4551 (1.7791) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][490/625] eta 0:01:00 lr 0.000647 wd 0.0500 time 0.4407 (0.4476) data time 0.0008 (0.0019) model time 0.4400 (0.4447) loss 3.3062 (2.9471) grad_norm 1.7679 (1.7791) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][500/625] eta 0:00:55 lr 0.000647 wd 0.0500 time 0.4398 (0.4475) data time 0.0008 (0.0018) model time 0.4389 (0.4446) loss 3.2418 (2.9454) grad_norm 3.9683 (1.7945) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][510/625] eta 0:00:51 lr 0.000647 wd 0.0500 time 0.4485 (0.4475) data time 0.0009 (0.0018) model time 0.4477 (0.4446) loss 2.9150 (2.9470) grad_norm 3.9182 (1.8049) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][520/625] eta 0:00:46 lr 0.000647 wd 0.0500 time 0.4404 (0.4474) data time 0.0010 (0.0018) model time 0.4394 (0.4445) loss 3.0536 (2.9529) grad_norm 3.2646 (1.8054) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:19:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][530/625] eta 0:00:42 lr 0.000647 wd 0.0500 time 0.4415 (0.4473) data time 0.0009 (0.0018) model time 0.4407 (0.4445) loss 3.5015 (2.9556) grad_norm 1.6678 (1.8102) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][540/625] eta 0:00:38 lr 0.000647 wd 0.0500 time 0.4453 (0.4472) data time 0.0009 (0.0018) model time 0.4444 (0.4445) loss 3.1777 (2.9579) grad_norm 1.0995 (1.8049) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:06 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [153/300][550/625] eta 0:00:33 lr 0.000647 wd 0.0500 time 0.4458 (0.4472) data time 0.0009 (0.0018) model time 0.4449 (0.4444) loss 3.4290 (2.9576) grad_norm 1.3364 (1.8005) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][560/625] eta 0:00:29 lr 0.000647 wd 0.0500 time 0.4413 (0.4474) data time 0.0006 (0.0017) model time 0.4407 (0.4447) loss 3.3509 (2.9567) grad_norm 1.6439 (1.7956) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][570/625] eta 0:00:24 lr 0.000647 wd 0.0500 time 0.4420 (0.4473) data time 0.0006 (0.0017) model time 0.4414 (0.4447) loss 3.8411 (2.9563) grad_norm 1.6685 (1.8019) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][580/625] eta 0:00:20 lr 0.000646 wd 0.0500 time 0.4418 (0.4472) data time 0.0009 (0.0017) model time 0.4409 (0.4446) loss 3.4304 (2.9613) grad_norm 1.6897 (1.7991) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][590/625] eta 0:00:15 lr 0.000646 wd 0.0500 time 0.4432 (0.4472) data time 0.0006 (0.0017) model time 0.4426 (0.4445) loss 3.4118 (2.9645) grad_norm 1.4929 (1.8033) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][600/625] eta 0:00:11 lr 0.000646 wd 0.0500 time 0.4411 (0.4474) data time 0.0006 (0.0017) model time 0.4405 (0.4448) loss 3.4059 (2.9675) grad_norm 1.2193 (1.7980) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][610/625] eta 0:00:06 lr 0.000646 wd 0.0500 time 0.4367 (0.4473) data time 0.0006 (0.0017) model time 0.4361 (0.4448) loss 2.9245 (2.9684) grad_norm 1.3893 
(1.7898) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [153/300][620/625] eta 0:00:02 lr 0.000646 wd 0.0500 time 0.4347 (0.4472) data time 0.0006 (0.0017) model time 0.4341 (0.4447) loss 3.3004 (2.9670) grad_norm 3.1444 (1.7924) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 153 training takes 0:04:39 [2024-08-10 14:20:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:20:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 14:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5557 (0.5557) Acc@1 88.135 (88.135) Acc@5 97.998 (97.998) Mem 16699MB [2024-08-10 14:20:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8794 (0.6641) Acc@1 79.639 (85.445) Acc@5 95.410 (97.372) Mem 16699MB [2024-08-10 14:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9780 (0.7877) Acc@1 76.514 (82.424) Acc@5 95.166 (96.082) Mem 16699MB [2024-08-10 14:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.154 Acc@5 96.095 [2024-08-10 14:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-10 14:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.977 (0.977) Loss 0.4744 (0.4744) Acc@1 89.404 (89.404) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:20:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.197) Loss 0.7646 (0.5949) Acc@1 82.178 (86.932) Acc@5 96.533 (97.829) Mem 16699MB [2024-08-10 14:20:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.159) Loss 
0.8687 (0.6993) Acc@1 78.467 (84.005) Acc@5 96.094 (96.773) Mem 16699MB [2024-08-10 14:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.697 Acc@5 96.797 [2024-08-10 14:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 14:20:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][0/625] eta 0:13:17 lr 0.000646 wd 0.0500 time 1.2761 (1.2761) data time 0.5491 (0.5491) model time 0.0000 (0.0000) loss 3.4581 (3.4581) grad_norm 1.5927 (1.5927) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][10/625] eta 0:05:18 lr 0.000646 wd 0.0500 time 0.4489 (0.5186) data time 0.0009 (0.0507) model time 0.0000 (0.0000) loss 2.0391 (2.7445) grad_norm 1.4161 (1.5499) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 14:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][20/625] eta 0:04:52 lr 0.000646 wd 0.0500 time 0.4438 (0.4828) data time 0.0007 (0.0270) model time 0.0000 (0.0000) loss 3.3324 (2.7407) grad_norm 3.5334 (inf) loss_scale 512.0000 (877.7143) mem 16699MB [2024-08-10 14:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][30/625] eta 0:04:39 lr 0.000646 wd 0.0500 time 0.4421 (0.4702) data time 0.0009 (0.0186) model time 0.0000 (0.0000) loss 3.0394 (2.7935) grad_norm 1.7731 (inf) loss_scale 512.0000 (759.7419) mem 16699MB [2024-08-10 14:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][40/625] eta 0:04:33 lr 0.000646 wd 0.0500 time 0.4436 (0.4682) data time 0.0007 (0.0143) model time 0.0000 (0.0000) loss 3.0042 (2.7548) grad_norm 1.5220 (inf) loss_scale 512.0000 (699.3171) mem 16699MB [2024-08-10 14:21:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][50/625] eta 0:04:26 lr 0.000645 wd 0.0500 time 0.4440 (0.4634) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 2.7504 (2.7761) 
grad_norm 2.0106 (inf) loss_scale 512.0000 (662.5882) mem 16699MB [2024-08-10 14:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][60/625] eta 0:04:19 lr 0.000645 wd 0.0500 time 0.4431 (0.4601) data time 0.0009 (0.0099) model time 0.4422 (0.4421) loss 2.4428 (2.8185) grad_norm 2.0142 (inf) loss_scale 512.0000 (637.9016) mem 16699MB [2024-08-10 14:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][70/625] eta 0:04:14 lr 0.000645 wd 0.0500 time 0.4416 (0.4577) data time 0.0008 (0.0086) model time 0.4408 (0.4424) loss 3.3144 (2.8310) grad_norm 1.4994 (inf) loss_scale 512.0000 (620.1690) mem 16699MB [2024-08-10 14:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][80/625] eta 0:04:08 lr 0.000645 wd 0.0500 time 0.4435 (0.4558) data time 0.0009 (0.0077) model time 0.4426 (0.4419) loss 2.6457 (2.8597) grad_norm 1.7864 (inf) loss_scale 512.0000 (606.8148) mem 16699MB [2024-08-10 14:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][90/625] eta 0:04:03 lr 0.000645 wd 0.0500 time 0.4436 (0.4545) data time 0.0006 (0.0070) model time 0.4429 (0.4423) loss 2.9724 (2.8665) grad_norm 2.1655 (inf) loss_scale 512.0000 (596.3956) mem 16699MB [2024-08-10 14:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][100/625] eta 0:03:58 lr 0.000645 wd 0.0500 time 0.4404 (0.4534) data time 0.0010 (0.0063) model time 0.4394 (0.4423) loss 2.5766 (2.8962) grad_norm 1.7892 (inf) loss_scale 512.0000 (588.0396) mem 16699MB [2024-08-10 14:21:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][110/625] eta 0:03:53 lr 0.000645 wd 0.0500 time 0.4475 (0.4527) data time 0.0010 (0.0059) model time 0.4465 (0.4426) loss 3.3374 (2.9136) grad_norm 1.4726 (inf) loss_scale 512.0000 (581.1892) mem 16699MB [2024-08-10 14:21:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][120/625] eta 0:03:49 lr 0.000645 wd 0.0500 time 0.4380 (0.4535) data time 0.0009 (0.0055) 
model time 0.4370 (0.4453) loss 3.3096 (2.9184) grad_norm 2.5134 (inf) loss_scale 512.0000 (575.4711) mem 16699MB [2024-08-10 14:21:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][130/625] eta 0:03:44 lr 0.000645 wd 0.0500 time 0.4538 (0.4529) data time 0.0007 (0.0051) model time 0.4531 (0.4451) loss 2.9131 (2.9350) grad_norm 1.6154 (inf) loss_scale 512.0000 (570.6260) mem 16699MB [2024-08-10 14:21:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][140/625] eta 0:03:40 lr 0.000644 wd 0.0500 time 0.4424 (0.4537) data time 0.0010 (0.0048) model time 0.4414 (0.4472) loss 3.0133 (2.9343) grad_norm 2.3501 (inf) loss_scale 512.0000 (566.4681) mem 16699MB [2024-08-10 14:21:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][150/625] eta 0:03:35 lr 0.000644 wd 0.0500 time 0.4419 (0.4531) data time 0.0009 (0.0046) model time 0.4410 (0.4468) loss 3.2752 (2.9489) grad_norm 1.4334 (inf) loss_scale 512.0000 (562.8609) mem 16699MB [2024-08-10 14:22:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][160/625] eta 0:03:30 lr 0.000644 wd 0.0500 time 0.4430 (0.4524) data time 0.0007 (0.0044) model time 0.4424 (0.4464) loss 2.8799 (2.9421) grad_norm 1.6543 (inf) loss_scale 512.0000 (559.7019) mem 16699MB [2024-08-10 14:22:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][170/625] eta 0:03:25 lr 0.000644 wd 0.0500 time 0.4438 (0.4520) data time 0.0006 (0.0042) model time 0.4431 (0.4462) loss 3.1377 (2.9445) grad_norm 2.0195 (inf) loss_scale 512.0000 (556.9123) mem 16699MB [2024-08-10 14:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][180/625] eta 0:03:20 lr 0.000644 wd 0.0500 time 0.4475 (0.4516) data time 0.0009 (0.0040) model time 0.4466 (0.4459) loss 2.8880 (2.9431) grad_norm 3.7553 (inf) loss_scale 512.0000 (554.4309) mem 16699MB [2024-08-10 14:22:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][190/625] eta 0:03:16 lr 0.000644 wd 
0.0500 time 0.4411 (0.4512) data time 0.0007 (0.0039) model time 0.4403 (0.4457) loss 2.9592 (2.9466) grad_norm 3.0553 (inf) loss_scale 512.0000 (552.2094) mem 16699MB [2024-08-10 14:22:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][200/625] eta 0:03:11 lr 0.000644 wd 0.0500 time 0.4396 (0.4508) data time 0.0007 (0.0037) model time 0.4389 (0.4455) loss 3.2538 (2.9527) grad_norm 1.4101 (inf) loss_scale 512.0000 (550.2090) mem 16699MB [2024-08-10 14:22:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][210/625] eta 0:03:06 lr 0.000644 wd 0.0500 time 0.4562 (0.4505) data time 0.0009 (0.0036) model time 0.4553 (0.4453) loss 2.5338 (2.9498) grad_norm 1.5425 (inf) loss_scale 512.0000 (548.3981) mem 16699MB [2024-08-10 14:22:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][220/625] eta 0:03:02 lr 0.000644 wd 0.0500 time 0.4401 (0.4501) data time 0.0009 (0.0035) model time 0.4392 (0.4450) loss 3.1163 (2.9508) grad_norm 2.1103 (inf) loss_scale 512.0000 (546.7511) mem 16699MB [2024-08-10 14:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][230/625] eta 0:02:57 lr 0.000644 wd 0.0500 time 0.4400 (0.4501) data time 0.0009 (0.0034) model time 0.4391 (0.4453) loss 3.1399 (2.9532) grad_norm 1.9361 (inf) loss_scale 512.0000 (545.2468) mem 16699MB [2024-08-10 14:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][240/625] eta 0:02:53 lr 0.000643 wd 0.0500 time 0.4439 (0.4498) data time 0.0008 (0.0033) model time 0.4431 (0.4452) loss 2.5888 (2.9487) grad_norm 1.8504 (inf) loss_scale 512.0000 (543.8672) mem 16699MB [2024-08-10 14:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][250/625] eta 0:02:48 lr 0.000643 wd 0.0500 time 0.4433 (0.4496) data time 0.0007 (0.0032) model time 0.4426 (0.4450) loss 2.8549 (2.9369) grad_norm 1.5000 (inf) loss_scale 512.0000 (542.5976) mem 16699MB [2024-08-10 14:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [154/300][260/625] eta 0:02:44 lr 0.000643 wd 0.0500 time 0.4406 (0.4494) data time 0.0006 (0.0031) model time 0.4400 (0.4449) loss 3.6920 (2.9310) grad_norm 1.8647 (inf) loss_scale 512.0000 (541.4253) mem 16699MB [2024-08-10 14:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][270/625] eta 0:02:39 lr 0.000643 wd 0.0500 time 0.4400 (0.4492) data time 0.0008 (0.0030) model time 0.4392 (0.4448) loss 2.9825 (2.9365) grad_norm 1.2540 (inf) loss_scale 512.0000 (540.3395) mem 16699MB [2024-08-10 14:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][280/625] eta 0:02:34 lr 0.000643 wd 0.0500 time 0.4426 (0.4491) data time 0.0007 (0.0030) model time 0.4420 (0.4449) loss 3.2669 (2.9363) grad_norm 2.2679 (inf) loss_scale 512.0000 (539.3310) mem 16699MB [2024-08-10 14:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][290/625] eta 0:02:30 lr 0.000643 wd 0.0500 time 0.4431 (0.4490) data time 0.0008 (0.0029) model time 0.4423 (0.4449) loss 3.0298 (2.9403) grad_norm 1.3654 (inf) loss_scale 512.0000 (538.3918) mem 16699MB [2024-08-10 14:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][300/625] eta 0:02:25 lr 0.000643 wd 0.0500 time 0.4445 (0.4488) data time 0.0008 (0.0028) model time 0.4438 (0.4448) loss 2.9684 (2.9379) grad_norm 1.8350 (inf) loss_scale 512.0000 (537.5150) mem 16699MB [2024-08-10 14:23:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][310/625] eta 0:02:21 lr 0.000643 wd 0.0500 time 0.4458 (0.4489) data time 0.0008 (0.0028) model time 0.4450 (0.4450) loss 3.0395 (2.9347) grad_norm 1.2871 (inf) loss_scale 512.0000 (536.6945) mem 16699MB [2024-08-10 14:23:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][320/625] eta 0:02:16 lr 0.000643 wd 0.0500 time 0.4441 (0.4488) data time 0.0007 (0.0027) model time 0.4435 (0.4450) loss 3.4189 (2.9485) grad_norm 1.3024 (inf) loss_scale 512.0000 (535.9252) mem 16699MB [2024-08-10 14:23:16 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][330/625] eta 0:02:12 lr 0.000642 wd 0.0500 time 0.4430 (0.4488) data time 0.0009 (0.0028) model time 0.4422 (0.4450) loss 2.7051 (2.9518) grad_norm 1.7503 (inf) loss_scale 512.0000 (535.2024) mem 16699MB [2024-08-10 14:23:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][340/625] eta 0:02:07 lr 0.000642 wd 0.0500 time 0.4414 (0.4486) data time 0.0008 (0.0027) model time 0.4405 (0.4449) loss 2.7726 (2.9458) grad_norm 1.5950 (inf) loss_scale 512.0000 (534.5220) mem 16699MB [2024-08-10 14:23:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][350/625] eta 0:02:03 lr 0.000642 wd 0.0500 time 0.4422 (0.4485) data time 0.0008 (0.0027) model time 0.4413 (0.4449) loss 2.6704 (2.9469) grad_norm 2.1774 (inf) loss_scale 512.0000 (533.8803) mem 16699MB [2024-08-10 14:23:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][360/625] eta 0:01:58 lr 0.000642 wd 0.0500 time 0.4387 (0.4484) data time 0.0007 (0.0026) model time 0.4381 (0.4448) loss 2.5633 (2.9393) grad_norm 1.8656 (inf) loss_scale 512.0000 (533.2742) mem 16699MB [2024-08-10 14:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][370/625] eta 0:01:54 lr 0.000642 wd 0.0500 time 0.4497 (0.4488) data time 0.0009 (0.0026) model time 0.4487 (0.4453) loss 2.5285 (2.9307) grad_norm 1.9826 (inf) loss_scale 512.0000 (532.7008) mem 16699MB [2024-08-10 14:23:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][380/625] eta 0:01:49 lr 0.000642 wd 0.0500 time 0.4432 (0.4486) data time 0.0009 (0.0025) model time 0.4423 (0.4452) loss 3.3302 (2.9349) grad_norm 1.8128 (inf) loss_scale 512.0000 (532.1575) mem 16699MB [2024-08-10 14:23:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][390/625] eta 0:01:45 lr 0.000642 wd 0.0500 time 0.4431 (0.4485) data time 0.0009 (0.0025) model time 0.4421 (0.4452) loss 3.6091 (2.9367) grad_norm 1.8429 (inf) loss_scale 
512.0000 (531.6419) mem 16699MB [2024-08-10 14:23:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][400/625] eta 0:01:40 lr 0.000642 wd 0.0500 time 0.4428 (0.4484) data time 0.0007 (0.0024) model time 0.4421 (0.4451) loss 2.5257 (2.9317) grad_norm 1.4563 (inf) loss_scale 512.0000 (531.1521) mem 16699MB [2024-08-10 14:23:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][410/625] eta 0:01:36 lr 0.000642 wd 0.0500 time 0.4625 (0.4483) data time 0.0007 (0.0024) model time 0.4618 (0.4451) loss 3.5016 (2.9328) grad_norm 1.7494 (inf) loss_scale 512.0000 (530.6861) mem 16699MB [2024-08-10 14:23:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][420/625] eta 0:01:31 lr 0.000641 wd 0.0500 time 0.4473 (0.4482) data time 0.0006 (0.0024) model time 0.4467 (0.4451) loss 2.5299 (2.9303) grad_norm 1.5851 (inf) loss_scale 512.0000 (530.2423) mem 16699MB [2024-08-10 14:24:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][430/625] eta 0:01:27 lr 0.000641 wd 0.0500 time 0.4417 (0.4481) data time 0.0007 (0.0023) model time 0.4410 (0.4450) loss 3.4444 (2.9340) grad_norm 1.6188 (inf) loss_scale 512.0000 (529.8190) mem 16699MB [2024-08-10 14:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][440/625] eta 0:01:22 lr 0.000641 wd 0.0500 time 0.4426 (0.4481) data time 0.0008 (0.0023) model time 0.4418 (0.4450) loss 3.0928 (2.9345) grad_norm 1.5892 (inf) loss_scale 512.0000 (529.4150) mem 16699MB [2024-08-10 14:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][450/625] eta 0:01:18 lr 0.000641 wd 0.0500 time 0.6435 (0.4484) data time 0.0007 (0.0023) model time 0.6429 (0.4455) loss 1.7386 (2.9343) grad_norm 1.1038 (inf) loss_scale 512.0000 (529.0288) mem 16699MB [2024-08-10 14:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][460/625] eta 0:01:13 lr 0.000641 wd 0.0500 time 0.4485 (0.4483) data time 0.0008 (0.0022) model time 0.4477 (0.4454) 
loss 3.0373 (2.9353) grad_norm 1.5600 (inf) loss_scale 512.0000 (528.6594) mem 16699MB [2024-08-10 14:24:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][470/625] eta 0:01:09 lr 0.000641 wd 0.0500 time 0.4418 (0.4487) data time 0.0006 (0.0022) model time 0.4412 (0.4458) loss 1.6031 (2.9333) grad_norm 2.5979 (inf) loss_scale 512.0000 (528.3057) mem 16699MB [2024-08-10 14:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][480/625] eta 0:01:05 lr 0.000641 wd 0.0500 time 0.4448 (0.4486) data time 0.0008 (0.0022) model time 0.4440 (0.4457) loss 2.8493 (2.9314) grad_norm 1.8062 (inf) loss_scale 512.0000 (527.9667) mem 16699MB [2024-08-10 14:24:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][490/625] eta 0:01:00 lr 0.000641 wd 0.0500 time 0.4425 (0.4484) data time 0.0007 (0.0022) model time 0.4418 (0.4456) loss 3.2601 (2.9327) grad_norm 1.6634 (inf) loss_scale 512.0000 (527.6415) mem 16699MB [2024-08-10 14:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][500/625] eta 0:00:56 lr 0.000641 wd 0.0500 time 0.4424 (0.4483) data time 0.0008 (0.0021) model time 0.4416 (0.4455) loss 2.8068 (2.9302) grad_norm 1.4355 (inf) loss_scale 512.0000 (527.3293) mem 16699MB [2024-08-10 14:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][510/625] eta 0:00:51 lr 0.000641 wd 0.0500 time 0.4410 (0.4482) data time 0.0006 (0.0021) model time 0.4404 (0.4454) loss 3.3644 (2.9296) grad_norm 1.2389 (inf) loss_scale 512.0000 (527.0294) mem 16699MB [2024-08-10 14:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][520/625] eta 0:00:47 lr 0.000640 wd 0.0500 time 0.4451 (0.4481) data time 0.0006 (0.0021) model time 0.4444 (0.4454) loss 3.5096 (2.9314) grad_norm 1.7670 (inf) loss_scale 512.0000 (526.7409) mem 16699MB [2024-08-10 14:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][530/625] eta 0:00:42 lr 0.000640 wd 0.0500 time 0.4445 (0.4480) 
data time 0.0009 (0.0021) model time 0.4436 (0.4453) loss 2.0958 (2.9258) grad_norm 1.4603 (inf) loss_scale 512.0000 (526.4633) mem 16699MB [2024-08-10 14:24:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][540/625] eta 0:00:38 lr 0.000640 wd 0.0500 time 0.4427 (0.4479) data time 0.0009 (0.0020) model time 0.4418 (0.4452) loss 2.6927 (2.9236) grad_norm 1.7485 (inf) loss_scale 512.0000 (526.1959) mem 16699MB [2024-08-10 14:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][550/625] eta 0:00:33 lr 0.000640 wd 0.0500 time 0.4404 (0.4478) data time 0.0007 (0.0020) model time 0.4397 (0.4452) loss 3.9945 (2.9240) grad_norm 2.6266 (inf) loss_scale 512.0000 (525.9383) mem 16699MB [2024-08-10 14:24:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][560/625] eta 0:00:29 lr 0.000640 wd 0.0500 time 0.4458 (0.4479) data time 0.0009 (0.0020) model time 0.4448 (0.4453) loss 2.5091 (2.9234) grad_norm 1.3250 (inf) loss_scale 512.0000 (525.6898) mem 16699MB [2024-08-10 14:25:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][570/625] eta 0:00:24 lr 0.000640 wd 0.0500 time 0.4459 (0.4478) data time 0.0006 (0.0020) model time 0.4453 (0.4452) loss 3.5556 (2.9249) grad_norm 1.7459 (inf) loss_scale 512.0000 (525.4501) mem 16699MB [2024-08-10 14:25:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][580/625] eta 0:00:20 lr 0.000640 wd 0.0500 time 0.4421 (0.4477) data time 0.0009 (0.0020) model time 0.4413 (0.4452) loss 2.5980 (2.9271) grad_norm 1.2391 (inf) loss_scale 512.0000 (525.2186) mem 16699MB [2024-08-10 14:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][590/625] eta 0:00:15 lr 0.000640 wd 0.0500 time 0.4386 (0.4476) data time 0.0008 (0.0020) model time 0.4378 (0.4451) loss 2.8683 (2.9249) grad_norm 1.1463 (inf) loss_scale 512.0000 (524.9949) mem 16699MB [2024-08-10 14:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][600/625] eta 
0:00:11 lr 0.000640 wd 0.0500 time 0.4365 (0.4475) data time 0.0009 (0.0019) model time 0.4357 (0.4450) loss 3.3448 (2.9256) grad_norm 1.6991 (inf) loss_scale 512.0000 (524.7787) mem 16699MB [2024-08-10 14:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][610/625] eta 0:00:06 lr 0.000639 wd 0.0500 time 0.4384 (0.4475) data time 0.0004 (0.0019) model time 0.4379 (0.4450) loss 2.1108 (2.9268) grad_norm 1.2501 (inf) loss_scale 512.0000 (524.5696) mem 16699MB [2024-08-10 14:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [154/300][620/625] eta 0:00:02 lr 0.000639 wd 0.0500 time 0.4395 (0.4473) data time 0.0004 (0.0019) model time 0.4391 (0.4449) loss 2.8846 (2.9280) grad_norm 3.0943 (inf) loss_scale 512.0000 (524.3671) mem 16699MB [2024-08-10 14:25:27 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 154 training takes 0:04:39 [2024-08-10 14:25:27 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:25:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:25:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.498 (0.498) Loss 0.5273 (0.5273) Acc@1 88.721 (88.721) Acc@5 98.486 (98.486) Mem 16699MB [2024-08-10 14:25:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.154) Loss 0.8472 (0.6514) Acc@1 80.273 (85.427) Acc@5 95.117 (97.443) Mem 16699MB [2024-08-10 14:25:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.136) Loss 0.9746 (0.7755) Acc@1 76.367 (82.359) Acc@5 94.727 (96.203) Mem 16699MB [2024-08-10 14:25:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.008 Acc@5 96.173 [2024-08-10 14:25:32 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.0% [2024-08-10 14:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.807 (0.807) Loss 0.4746 (0.4746) Acc@1 89.404 (89.404) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:25:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.184) Loss 0.7651 (0.5948) Acc@1 82.031 (86.936) Acc@5 96.436 (97.820) Mem 16699MB [2024-08-10 14:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8682 (0.6989) Acc@1 78.369 (84.063) Acc@5 96.045 (96.780) Mem 16699MB [2024-08-10 14:25:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.745 Acc@5 96.803 [2024-08-10 14:25:36 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-10 14:25:36 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.75% [2024-08-10 14:25:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 14:25:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 14:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][0/625] eta 0:08:39 lr 0.000639 wd 0.0500 time 0.8312 (0.8312) data time 0.4431 (0.4431) model time 0.0000 (0.0000) loss 2.0517 (2.0517) grad_norm 1.4904 (1.4904) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][10/625] eta 0:04:53 lr 0.000639 wd 0.0500 time 0.4433 (0.4778) data time 0.0009 (0.0412) model time 0.0000 (0.0000) loss 3.1217 (3.0827) grad_norm 1.7850 (1.5442) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][20/625] eta 0:04:39 lr 0.000639 wd 0.0500 time 0.4601 (0.4620) data time 0.0007 (0.0220) model time 0.0000 (0.0000) loss 1.8701 (2.9461) grad_norm 1.3938 (1.4959) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][30/625] eta 0:04:34 lr 0.000639 wd 0.0500 time 0.4427 (0.4622) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 3.2121 (2.9025) grad_norm 1.9878 (1.6249) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:25:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][40/625] eta 0:04:27 lr 0.000639 wd 0.0500 time 0.4484 (0.4580) data time 0.0008 (0.0117) model time 0.0000 (0.0000) loss 3.0367 (2.9086) grad_norm 1.7634 (1.6609) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][50/625] eta 0:04:21 lr 0.000639 wd 0.0500 time 0.4431 (0.4554) data time 0.0008 (0.0096) model time 0.0000 (0.0000) loss 3.4300 (2.9044) grad_norm 1.2605 (1.6392) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][60/625] eta 0:04:18 lr 0.000639 wd 0.0500 time 0.4429 (0.4569) data time 0.0009 (0.0081) model time 0.4420 (0.4633) loss 2.1450 (2.8861) 
grad_norm 1.4565 (1.6448) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][70/625] eta 0:04:12 lr 0.000639 wd 0.0500 time 0.4411 (0.4549) data time 0.0007 (0.0071) model time 0.4405 (0.4527) loss 2.9713 (2.8881) grad_norm 1.7601 (1.6820) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][80/625] eta 0:04:07 lr 0.000638 wd 0.0500 time 0.4416 (0.4541) data time 0.0008 (0.0064) model time 0.4408 (0.4510) loss 3.1525 (2.8778) grad_norm 1.8664 (1.7550) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][90/625] eta 0:04:02 lr 0.000638 wd 0.0500 time 0.4410 (0.4530) data time 0.0006 (0.0058) model time 0.4404 (0.4490) loss 3.3507 (2.8979) grad_norm 1.9017 (1.7405) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][100/625] eta 0:03:57 lr 0.000638 wd 0.0500 time 0.4420 (0.4519) data time 0.0008 (0.0053) model time 0.4412 (0.4473) loss 3.1648 (2.9176) grad_norm 1.6209 (1.7239) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][110/625] eta 0:03:52 lr 0.000638 wd 0.0500 time 0.4422 (0.4510) data time 0.0007 (0.0049) model time 0.4415 (0.4463) loss 3.7691 (2.9183) grad_norm 1.6048 (1.7298) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][120/625] eta 0:03:47 lr 0.000638 wd 0.0500 time 0.4479 (0.4505) data time 0.0006 (0.0045) model time 0.4473 (0.4460) loss 3.2826 (2.9244) grad_norm 1.7648 (1.7747) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][130/625] eta 0:03:42 lr 0.000638 wd 0.0500 time 0.4408 (0.4500) data 
time 0.0008 (0.0043) model time 0.4400 (0.4456) loss 3.7096 (2.9334) grad_norm 1.1348 (1.7637) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][140/625] eta 0:03:38 lr 0.000638 wd 0.0500 time 0.4437 (0.4495) data time 0.0009 (0.0040) model time 0.4428 (0.4453) loss 3.4339 (2.9432) grad_norm 1.2870 (1.7457) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][150/625] eta 0:03:33 lr 0.000638 wd 0.0500 time 0.4409 (0.4491) data time 0.0008 (0.0038) model time 0.4400 (0.4450) loss 3.2461 (2.9484) grad_norm 1.4827 (1.7387) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][160/625] eta 0:03:28 lr 0.000638 wd 0.0500 time 0.4401 (0.4486) data time 0.0006 (0.0036) model time 0.4395 (0.4445) loss 3.2352 (2.9529) grad_norm 2.0103 (1.7316) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][170/625] eta 0:03:23 lr 0.000637 wd 0.0500 time 0.4458 (0.4482) data time 0.0006 (0.0035) model time 0.4452 (0.4441) loss 3.1130 (2.9523) grad_norm 1.4471 (1.7243) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][180/625] eta 0:03:19 lr 0.000637 wd 0.0500 time 0.4428 (0.4478) data time 0.0006 (0.0034) model time 0.4421 (0.4439) loss 2.8146 (2.9571) grad_norm 2.6444 (1.7541) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][190/625] eta 0:03:14 lr 0.000637 wd 0.0500 time 0.4396 (0.4475) data time 0.0006 (0.0032) model time 0.4390 (0.4437) loss 2.4378 (2.9487) grad_norm 1.3887 (1.7647) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[155/300][200/625] eta 0:03:10 lr 0.000637 wd 0.0500 time 0.4542 (0.4473) data time 0.0008 (0.0031) model time 0.4534 (0.4436) loss 3.1544 (2.9576) grad_norm 0.9546 (1.7574) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][210/625] eta 0:03:05 lr 0.000637 wd 0.0500 time 0.4436 (0.4471) data time 0.0008 (0.0030) model time 0.4428 (0.4435) loss 3.1816 (2.9647) grad_norm 1.5913 (1.7757) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][220/625] eta 0:03:01 lr 0.000637 wd 0.0500 time 0.4394 (0.4475) data time 0.0008 (0.0029) model time 0.4386 (0.4442) loss 1.9116 (2.9665) grad_norm 1.5903 (1.7830) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][230/625] eta 0:02:56 lr 0.000637 wd 0.0500 time 0.4436 (0.4473) data time 0.0006 (0.0028) model time 0.4430 (0.4441) loss 2.8264 (2.9689) grad_norm 2.1727 (1.8009) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][240/625] eta 0:02:52 lr 0.000637 wd 0.0500 time 0.4429 (0.4471) data time 0.0009 (0.0028) model time 0.4421 (0.4439) loss 2.6052 (2.9458) grad_norm 1.8211 (1.8032) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][250/625] eta 0:02:47 lr 0.000637 wd 0.0500 time 0.4419 (0.4477) data time 0.0006 (0.0027) model time 0.4413 (0.4448) loss 3.1465 (2.9419) grad_norm 1.2485 (1.7972) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][260/625] eta 0:02:43 lr 0.000637 wd 0.0500 time 0.4397 (0.4474) data time 0.0008 (0.0026) model time 0.4389 (0.4446) loss 2.8902 (2.9432) grad_norm 1.1567 (1.7861) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:27:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][270/625] eta 0:02:38 lr 0.000636 wd 0.0500 time 0.4454 (0.4473) data time 0.0009 (0.0025) model time 0.4446 (0.4445) loss 3.5774 (2.9526) grad_norm 2.0970 (1.7869) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][280/625] eta 0:02:34 lr 0.000636 wd 0.0500 time 0.4431 (0.4472) data time 0.0009 (0.0025) model time 0.4422 (0.4444) loss 3.1600 (2.9487) grad_norm 2.2728 (1.7931) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][290/625] eta 0:02:29 lr 0.000636 wd 0.0500 time 0.4456 (0.4470) data time 0.0006 (0.0024) model time 0.4450 (0.4443) loss 2.3445 (2.9516) grad_norm 1.6056 (1.7943) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][300/625] eta 0:02:25 lr 0.000636 wd 0.0500 time 0.4404 (0.4474) data time 0.0007 (0.0024) model time 0.4397 (0.4448) loss 2.3049 (2.9511) grad_norm 1.6460 (1.7812) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][310/625] eta 0:02:20 lr 0.000636 wd 0.0500 time 0.4411 (0.4472) data time 0.0009 (0.0023) model time 0.4402 (0.4447) loss 2.4555 (2.9488) grad_norm 2.1438 (1.7783) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][320/625] eta 0:02:16 lr 0.000636 wd 0.0500 time 0.4466 (0.4471) data time 0.0006 (0.0023) model time 0.4460 (0.4446) loss 3.2319 (2.9479) grad_norm 2.0291 (1.7768) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][330/625] eta 0:02:11 lr 0.000636 wd 0.0500 time 0.4402 (0.4470) data time 0.0006 (0.0022) model time 0.4396 (0.4445) loss 3.5814 
(2.9452) grad_norm 1.4440 (1.7856) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][340/625] eta 0:02:07 lr 0.000636 wd 0.0500 time 0.4446 (0.4469) data time 0.0006 (0.0022) model time 0.4440 (0.4445) loss 3.3944 (2.9455) grad_norm 1.5538 (1.7841) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][350/625] eta 0:02:02 lr 0.000636 wd 0.0500 time 0.4419 (0.4468) data time 0.0009 (0.0022) model time 0.4410 (0.4444) loss 3.1632 (2.9405) grad_norm 1.1941 (1.7784) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][360/625] eta 0:01:58 lr 0.000635 wd 0.0500 time 0.4449 (0.4467) data time 0.0008 (0.0021) model time 0.4442 (0.4443) loss 3.2987 (2.9454) grad_norm 1.9452 (1.7802) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][370/625] eta 0:01:53 lr 0.000635 wd 0.0500 time 0.4460 (0.4466) data time 0.0006 (0.0021) model time 0.4454 (0.4443) loss 1.9913 (2.9382) grad_norm 1.7073 (1.7819) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][380/625] eta 0:01:49 lr 0.000635 wd 0.0500 time 0.4428 (0.4465) data time 0.0007 (0.0021) model time 0.4422 (0.4442) loss 3.4733 (2.9375) grad_norm 1.4266 (1.7950) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][390/625] eta 0:01:44 lr 0.000635 wd 0.0500 time 0.4430 (0.4464) data time 0.0006 (0.0020) model time 0.4423 (0.4441) loss 3.7219 (2.9417) grad_norm 1.8883 (1.7955) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][400/625] eta 0:01:40 lr 0.000635 wd 0.0500 time 0.4416 
(0.4463) data time 0.0008 (0.0020) model time 0.4408 (0.4441) loss 3.1673 (2.9461) grad_norm 1.5705 (1.7914) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][410/625] eta 0:01:35 lr 0.000635 wd 0.0500 time 0.4400 (0.4462) data time 0.0009 (0.0020) model time 0.4391 (0.4440) loss 2.6657 (2.9515) grad_norm 1.5660 (1.7922) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][420/625] eta 0:01:31 lr 0.000635 wd 0.0500 time 0.4488 (0.4462) data time 0.0006 (0.0020) model time 0.4481 (0.4440) loss 2.6798 (2.9552) grad_norm 2.0895 (1.7898) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][430/625] eta 0:01:26 lr 0.000635 wd 0.0500 time 0.4446 (0.4461) data time 0.0008 (0.0019) model time 0.4438 (0.4440) loss 2.1655 (2.9564) grad_norm 15.8424 (1.8321) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][440/625] eta 0:01:22 lr 0.000635 wd 0.0500 time 0.4406 (0.4464) data time 0.0008 (0.0019) model time 0.4398 (0.4443) loss 2.9619 (2.9596) grad_norm 1.4373 (1.8438) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:28:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][450/625] eta 0:01:18 lr 0.000635 wd 0.0500 time 0.4385 (0.4466) data time 0.0006 (0.0019) model time 0.4379 (0.4446) loss 2.6163 (2.9581) grad_norm 1.4321 (1.8363) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][460/625] eta 0:01:13 lr 0.000634 wd 0.0500 time 0.4428 (0.4465) data time 0.0007 (0.0019) model time 0.4420 (0.4445) loss 3.6867 (2.9586) grad_norm 1.5131 (1.8292) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [155/300][470/625] eta 0:01:09 lr 0.000634 wd 0.0500 time 0.4400 (0.4473) data time 0.0009 (0.0019) model time 0.4392 (0.4454) loss 2.7678 (2.9578) grad_norm 1.4692 (1.8265) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][480/625] eta 0:01:04 lr 0.000634 wd 0.0500 time 0.4442 (0.4472) data time 0.0007 (0.0018) model time 0.4435 (0.4453) loss 2.7751 (2.9599) grad_norm 1.3513 (1.8239) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][490/625] eta 0:01:00 lr 0.000634 wd 0.0500 time 0.4439 (0.4471) data time 0.0009 (0.0018) model time 0.4431 (0.4452) loss 3.0524 (2.9592) grad_norm 1.3712 (1.8224) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][500/625] eta 0:00:55 lr 0.000634 wd 0.0500 time 0.4417 (0.4470) data time 0.0010 (0.0018) model time 0.4408 (0.4451) loss 3.0167 (2.9569) grad_norm 1.9060 (1.8234) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][510/625] eta 0:00:51 lr 0.000634 wd 0.0500 time 0.4436 (0.4469) data time 0.0006 (0.0018) model time 0.4429 (0.4450) loss 2.5751 (2.9555) grad_norm 1.5399 (1.8261) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][520/625] eta 0:00:46 lr 0.000634 wd 0.0500 time 0.4401 (0.4468) data time 0.0008 (0.0018) model time 0.4393 (0.4450) loss 3.3679 (2.9597) grad_norm 1.4262 (1.8224) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][530/625] eta 0:00:42 lr 0.000634 wd 0.0500 time 0.4407 (0.4467) data time 0.0008 (0.0017) model time 0.4399 (0.4449) loss 3.0185 (2.9620) grad_norm 1.8642 (1.8211) loss_scale 512.0000 (512.0000) mem 
16699MB [2024-08-10 14:29:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][540/625] eta 0:00:37 lr 0.000634 wd 0.0500 time 0.4368 (0.4466) data time 0.0009 (0.0017) model time 0.4359 (0.4448) loss 2.7623 (2.9576) grad_norm 1.2366 (1.8193) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][550/625] eta 0:00:33 lr 0.000633 wd 0.0500 time 0.4440 (0.4466) data time 0.0006 (0.0017) model time 0.4434 (0.4448) loss 2.5772 (2.9583) grad_norm 2.1131 (1.8214) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][560/625] eta 0:00:29 lr 0.000633 wd 0.0500 time 0.4446 (0.4466) data time 0.0009 (0.0017) model time 0.4438 (0.4448) loss 2.6743 (2.9583) grad_norm 1.3273 (1.8230) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][570/625] eta 0:00:24 lr 0.000633 wd 0.0500 time 0.4544 (0.4465) data time 0.0009 (0.0017) model time 0.4535 (0.4447) loss 2.8191 (2.9564) grad_norm 2.3191 (1.8226) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][580/625] eta 0:00:20 lr 0.000633 wd 0.0500 time 0.4420 (0.4465) data time 0.0008 (0.0017) model time 0.4412 (0.4447) loss 3.1688 (2.9538) grad_norm 2.8209 (1.8276) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][590/625] eta 0:00:15 lr 0.000633 wd 0.0500 time 0.4389 (0.4467) data time 0.0009 (0.0017) model time 0.4380 (0.4450) loss 3.5080 (2.9531) grad_norm 1.7978 (1.8237) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][600/625] eta 0:00:11 lr 0.000633 wd 0.0500 time 0.4345 (0.4466) data time 0.0010 (0.0016) model time 0.4335 (0.4449) loss 
2.1286 (2.9517) grad_norm 1.7977 (1.8191) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][610/625] eta 0:00:06 lr 0.000633 wd 0.0500 time 0.4370 (0.4466) data time 0.0006 (0.0016) model time 0.4363 (0.4449) loss 3.2249 (2.9518) grad_norm 1.7347 (1.8134) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [155/300][620/625] eta 0:00:02 lr 0.000633 wd 0.0500 time 0.4384 (0.4470) data time 0.0004 (0.0016) model time 0.4380 (0.4454) loss 3.0880 (2.9524) grad_norm 1.3048 (1.8090) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 155 training takes 0:04:39 [2024-08-10 14:30:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:30:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:30:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5137 (0.5137) Acc@1 88.379 (88.379) Acc@5 98.633 (98.633) Mem 16699MB [2024-08-10 14:30:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8647 (0.6449) Acc@1 80.371 (85.582) Acc@5 95.215 (97.532) Mem 16699MB [2024-08-10 14:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.9390 (0.7674) Acc@1 77.637 (82.640) Acc@5 94.629 (96.298) Mem 16699MB [2024-08-10 14:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.338 Acc@5 96.285 [2024-08-10 14:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 14:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.34% [2024-08-10 14:30:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 14:30:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 14:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.4749 (0.4749) Acc@1 89.404 (89.404) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 14:30:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.7627 (0.5944) Acc@1 82.031 (86.932) Acc@5 96.484 (97.838) Mem 16699MB [2024-08-10 14:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.8677 (0.6984) Acc@1 78.467 (84.119) Acc@5 95.752 (96.789) Mem 16699MB [2024-08-10 14:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.799 Acc@5 96.809 [2024-08-10 14:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-10 14:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.80% [2024-08-10 14:30:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 14:30:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 14:30:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][0/625] eta 0:08:05 lr 0.000633 wd 0.0500 time 0.7772 (0.7772) data time 0.3824 (0.3824) model time 0.0000 (0.0000) loss 3.2590 (3.2590) grad_norm 1.3007 (1.3007) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][10/625] eta 0:05:00 lr 0.000633 wd 0.0500 time 0.4434 (0.4889) data time 0.0008 (0.0356) model time 0.0000 (0.0000) loss 3.0749 (2.9205) grad_norm 2.2347 (3.2210) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][20/625] eta 0:04:42 lr 0.000632 wd 0.0500 time 0.4388 (0.4675) data time 0.0007 (0.0191) model time 0.0000 (0.0000) loss 2.8222 (2.9130) grad_norm 3.5326 (2.8608) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][30/625] eta 0:04:33 lr 0.000632 wd 0.0500 time 0.4446 (0.4598) data time 0.0008 (0.0132) model time 0.0000 (0.0000) loss 2.7962 (2.9301) grad_norm 1.3975 (2.4681) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][40/625] eta 0:04:26 lr 0.000632 wd 0.0500 time 0.4363 (0.4556) data time 0.0008 (0.0102) model time 0.0000 (0.0000) loss 3.0632 (2.9070) grad_norm 5.8115 (2.3619) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][50/625] eta 0:04:20 lr 0.000632 wd 0.0500 time 0.4399 (0.4530) data time 0.0008 (0.0084) model time 0.0000 (0.0000) loss 3.3240 (2.9321) grad_norm 3.2709 (2.3147) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:30:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][60/625] eta 0:04:15 lr 0.000632 wd 0.0500 time 0.4381 (0.4517) data time 0.0009 (0.0072) model time 0.4372 (0.4440) loss 2.9100 (2.9052) 
grad_norm 1.9092 (2.2321) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][70/625] eta 0:04:10 lr 0.000632 wd 0.0500 time 0.4433 (0.4507) data time 0.0007 (0.0063) model time 0.4427 (0.4439) loss 3.3858 (2.8985) grad_norm 1.5529 (2.1830) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][80/625] eta 0:04:05 lr 0.000632 wd 0.0500 time 0.4417 (0.4498) data time 0.0007 (0.0056) model time 0.4410 (0.4434) loss 3.0304 (2.9002) grad_norm 1.6000 (2.1173) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][90/625] eta 0:04:00 lr 0.000632 wd 0.0500 time 0.4447 (0.4491) data time 0.0009 (0.0051) model time 0.4438 (0.4432) loss 2.4652 (2.9023) grad_norm 1.2012 (2.0433) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][100/625] eta 0:03:55 lr 0.000632 wd 0.0500 time 0.4401 (0.4485) data time 0.0009 (0.0047) model time 0.4392 (0.4429) loss 2.7579 (2.9001) grad_norm 1.4226 (2.0615) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][110/625] eta 0:03:50 lr 0.000631 wd 0.0500 time 0.4466 (0.4480) data time 0.0006 (0.0044) model time 0.4460 (0.4428) loss 2.5652 (2.8901) grad_norm 1.1522 (2.0398) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][120/625] eta 0:03:46 lr 0.000631 wd 0.0500 time 0.4460 (0.4476) data time 0.0006 (0.0041) model time 0.4454 (0.4428) loss 2.2372 (2.9040) grad_norm 1.4766 (1.9889) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][130/625] eta 0:03:42 lr 0.000631 wd 0.0500 time 0.4429 (0.4489) data 
time 0.0008 (0.0038) model time 0.4421 (0.4453) loss 2.7085 (2.9152) grad_norm 1.7008 (1.9746) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][140/625] eta 0:03:37 lr 0.000631 wd 0.0500 time 0.4441 (0.4485) data time 0.0007 (0.0036) model time 0.4434 (0.4451) loss 2.2475 (2.9315) grad_norm 1.3496 (1.9484) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][150/625] eta 0:03:32 lr 0.000631 wd 0.0500 time 0.4487 (0.4483) data time 0.0006 (0.0034) model time 0.4481 (0.4450) loss 2.3705 (2.9377) grad_norm 2.2710 (1.9558) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][160/625] eta 0:03:28 lr 0.000631 wd 0.0500 time 0.4426 (0.4481) data time 0.0008 (0.0033) model time 0.4418 (0.4449) loss 3.1813 (2.9411) grad_norm 1.4106 (1.9331) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][170/625] eta 0:03:23 lr 0.000631 wd 0.0500 time 0.4422 (0.4479) data time 0.0008 (0.0031) model time 0.4414 (0.4448) loss 2.8528 (2.9488) grad_norm 1.4855 (1.9285) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][180/625] eta 0:03:19 lr 0.000631 wd 0.0500 time 0.4457 (0.4476) data time 0.0006 (0.0030) model time 0.4451 (0.4446) loss 3.1266 (2.9534) grad_norm 1.7155 (1.9453) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][190/625] eta 0:03:15 lr 0.000631 wd 0.0500 time 0.4353 (0.4493) data time 0.0010 (0.0029) model time 0.4342 (0.4470) loss 3.2464 (2.9582) grad_norm 1.8687 (1.9406) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:31:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[156/300][200/625] eta 0:03:10 lr 0.000631 wd 0.0500 time 0.4427 (0.4489) data time 0.0006 (0.0028) model time 0.4421 (0.4467) loss 3.5773 (2.9567) grad_norm 1.6744 (1.9279) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][210/625] eta 0:03:06 lr 0.000630 wd 0.0500 time 0.4415 (0.4486) data time 0.0008 (0.0027) model time 0.4407 (0.4464) loss 3.3832 (2.9542) grad_norm 1.9172 (1.9232) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][220/625] eta 0:03:01 lr 0.000630 wd 0.0500 time 0.4438 (0.4485) data time 0.0008 (0.0026) model time 0.4430 (0.4463) loss 3.2429 (2.9590) grad_norm 2.0405 (1.9213) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][230/625] eta 0:02:57 lr 0.000630 wd 0.0500 time 0.4400 (0.4485) data time 0.0009 (0.0025) model time 0.4392 (0.4464) loss 2.5181 (2.9555) grad_norm 1.3237 (1.9048) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][240/625] eta 0:02:52 lr 0.000630 wd 0.0500 time 0.4403 (0.4483) data time 0.0008 (0.0025) model time 0.4395 (0.4461) loss 3.4510 (2.9575) grad_norm 2.7903 (1.9403) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][250/625] eta 0:02:48 lr 0.000630 wd 0.0500 time 0.4446 (0.4481) data time 0.0010 (0.0024) model time 0.4436 (0.4460) loss 2.2153 (2.9536) grad_norm 1.5672 (1.9367) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][260/625] eta 0:02:43 lr 0.000630 wd 0.0500 time 0.4469 (0.4480) data time 0.0006 (0.0023) model time 0.4463 (0.4460) loss 2.1426 (2.9498) grad_norm 2.1015 (1.9238) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:32:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][270/625] eta 0:02:39 lr 0.000630 wd 0.0500 time 0.4534 (0.4479) data time 0.0008 (0.0023) model time 0.4526 (0.4458) loss 2.9716 (2.9486) grad_norm 1.6142 (1.9197) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][280/625] eta 0:02:34 lr 0.000630 wd 0.0500 time 0.4466 (0.4477) data time 0.0006 (0.0022) model time 0.4460 (0.4457) loss 2.3330 (2.9547) grad_norm 1.8213 (1.9134) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][290/625] eta 0:02:29 lr 0.000630 wd 0.0500 time 0.4428 (0.4476) data time 0.0006 (0.0022) model time 0.4422 (0.4456) loss 1.9992 (2.9498) grad_norm 1.6189 (1.9049) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][300/625] eta 0:02:25 lr 0.000629 wd 0.0500 time 0.4408 (0.4475) data time 0.0008 (0.0022) model time 0.4401 (0.4454) loss 2.8291 (2.9595) grad_norm 1.0517 (1.8882) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][310/625] eta 0:02:20 lr 0.000629 wd 0.0500 time 0.4442 (0.4473) data time 0.0009 (0.0021) model time 0.4432 (0.4454) loss 3.0084 (2.9628) grad_norm 1.5850 (1.8757) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][320/625] eta 0:02:16 lr 0.000629 wd 0.0500 time 0.4440 (0.4473) data time 0.0009 (0.0021) model time 0.4431 (0.4453) loss 3.1095 (2.9604) grad_norm 2.0709 (1.8756) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][330/625] eta 0:02:11 lr 0.000629 wd 0.0500 time 0.4423 (0.4471) data time 0.0009 (0.0020) model time 0.4414 (0.4452) loss 2.5996 
(2.9532) grad_norm 1.8495 (1.8708) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][340/625] eta 0:02:07 lr 0.000629 wd 0.0500 time 0.4420 (0.4470) data time 0.0007 (0.0020) model time 0.4412 (0.4451) loss 3.3999 (2.9532) grad_norm 1.5602 (1.8662) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][350/625] eta 0:02:02 lr 0.000629 wd 0.0500 time 0.4447 (0.4469) data time 0.0008 (0.0020) model time 0.4439 (0.4450) loss 2.9427 (2.9516) grad_norm 2.0361 (1.8675) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][360/625] eta 0:01:58 lr 0.000629 wd 0.0500 time 0.4491 (0.4469) data time 0.0006 (0.0019) model time 0.4484 (0.4450) loss 2.9167 (2.9625) grad_norm 1.8281 (1.8705) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][370/625] eta 0:01:53 lr 0.000629 wd 0.0500 time 0.4458 (0.4468) data time 0.0008 (0.0019) model time 0.4449 (0.4449) loss 2.1060 (2.9659) grad_norm 1.5368 (1.8645) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][380/625] eta 0:01:49 lr 0.000629 wd 0.0500 time 0.4460 (0.4472) data time 0.0006 (0.0019) model time 0.4454 (0.4454) loss 2.9154 (2.9599) grad_norm 1.4230 (1.8594) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][390/625] eta 0:01:45 lr 0.000628 wd 0.0500 time 0.4486 (0.4471) data time 0.0008 (0.0019) model time 0.4478 (0.4454) loss 3.1309 (2.9638) grad_norm 1.5924 (1.8536) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][400/625] eta 0:01:40 lr 0.000628 wd 0.0500 time 0.4399 
(0.4471) data time 0.0006 (0.0018) model time 0.4393 (0.4453) loss 3.7361 (2.9664) grad_norm 1.6638 (1.8633) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][410/625] eta 0:01:36 lr 0.000628 wd 0.0500 time 0.4441 (0.4481) data time 0.0007 (0.0018) model time 0.4434 (0.4465) loss 3.2560 (2.9703) grad_norm 3.3324 (1.8718) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][420/625] eta 0:01:31 lr 0.000628 wd 0.0500 time 0.4417 (0.4480) data time 0.0006 (0.0018) model time 0.4412 (0.4464) loss 2.0816 (2.9637) grad_norm 1.5374 (1.8804) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][430/625] eta 0:01:27 lr 0.000628 wd 0.0500 time 0.4416 (0.4479) data time 0.0006 (0.0018) model time 0.4410 (0.4464) loss 2.9139 (2.9622) grad_norm 2.0484 (1.8763) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][440/625] eta 0:01:22 lr 0.000628 wd 0.0500 time 0.4439 (0.4478) data time 0.0008 (0.0017) model time 0.4431 (0.4463) loss 2.7868 (2.9618) grad_norm 1.9989 (1.8684) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][450/625] eta 0:01:18 lr 0.000628 wd 0.0500 time 0.4418 (0.4477) data time 0.0008 (0.0017) model time 0.4410 (0.4462) loss 3.0546 (2.9662) grad_norm 2.2523 (1.8669) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][460/625] eta 0:01:13 lr 0.000628 wd 0.0500 time 0.4388 (0.4476) data time 0.0008 (0.0017) model time 0.4379 (0.4460) loss 2.8765 (2.9689) grad_norm 1.9921 (1.8628) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [156/300][470/625] eta 0:01:09 lr 0.000628 wd 0.0500 time 0.4420 (0.4479) data time 0.0006 (0.0017) model time 0.4414 (0.4464) loss 3.2950 (2.9705) grad_norm 1.5240 (1.8577) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][480/625] eta 0:01:04 lr 0.000628 wd 0.0500 time 0.4400 (0.4477) data time 0.0008 (0.0017) model time 0.4392 (0.4462) loss 2.9927 (2.9661) grad_norm 1.7179 (1.8543) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][490/625] eta 0:01:00 lr 0.000627 wd 0.0500 time 0.4420 (0.4476) data time 0.0006 (0.0017) model time 0.4414 (0.4461) loss 2.2556 (2.9604) grad_norm 1.0932 (1.8468) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][500/625] eta 0:00:55 lr 0.000627 wd 0.0500 time 0.4404 (0.4475) data time 0.0009 (0.0016) model time 0.4395 (0.4461) loss 3.1976 (2.9622) grad_norm 2.2062 (1.8505) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][510/625] eta 0:00:51 lr 0.000627 wd 0.0500 time 0.4409 (0.4474) data time 0.0006 (0.0016) model time 0.4403 (0.4460) loss 3.9506 (2.9587) grad_norm 1.7755 (1.8500) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][520/625] eta 0:00:46 lr 0.000627 wd 0.0500 time 0.4475 (0.4474) data time 0.0006 (0.0016) model time 0.4469 (0.4459) loss 3.2916 (2.9598) grad_norm 1.7364 (1.8436) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][530/625] eta 0:00:42 lr 0.000627 wd 0.0500 time 0.4429 (0.4473) data time 0.0007 (0.0016) model time 0.4422 (0.4458) loss 3.2663 (2.9583) grad_norm 1.8701 (1.8532) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:34:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][540/625] eta 0:00:38 lr 0.000627 wd 0.0500 time 0.4397 (0.4472) data time 0.0007 (0.0016) model time 0.4390 (0.4457) loss 2.9130 (2.9574) grad_norm 2.1482 (1.8508) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][550/625] eta 0:00:33 lr 0.000627 wd 0.0500 time 0.6138 (0.4481) data time 0.0009 (0.0016) model time 0.6129 (0.4467) loss 3.0739 (2.9556) grad_norm 1.3941 (1.8468) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][560/625] eta 0:00:29 lr 0.000627 wd 0.0500 time 0.4501 (0.4480) data time 0.0006 (0.0016) model time 0.4495 (0.4466) loss 3.0991 (2.9585) grad_norm 2.3485 (1.8490) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][570/625] eta 0:00:24 lr 0.000627 wd 0.0500 time 0.4425 (0.4479) data time 0.0007 (0.0016) model time 0.4418 (0.4466) loss 3.4925 (2.9539) grad_norm 2.0315 (1.8860) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][580/625] eta 0:00:20 lr 0.000626 wd 0.0500 time 0.4440 (0.4478) data time 0.0009 (0.0015) model time 0.4431 (0.4465) loss 3.0331 (2.9573) grad_norm 2.2924 (1.8858) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][590/625] eta 0:00:15 lr 0.000626 wd 0.0500 time 0.4430 (0.4478) data time 0.0009 (0.0015) model time 0.4421 (0.4464) loss 3.3291 (2.9561) grad_norm 1.2248 (1.8791) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:34:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][600/625] eta 0:00:11 lr 0.000626 wd 0.0500 time 0.4429 (0.4477) data time 0.0007 (0.0015) model time 0.4423 (0.4463) loss 2.7554 
(2.9544) grad_norm 1.1901 (1.8734) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][610/625] eta 0:00:06 lr 0.000626 wd 0.0500 time 0.4389 (0.4476) data time 0.0004 (0.0015) model time 0.4385 (0.4463) loss 2.4304 (2.9544) grad_norm 1.4726 (1.8692) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [156/300][620/625] eta 0:00:02 lr 0.000626 wd 0.0500 time 0.4392 (0.4475) data time 0.0004 (0.0015) model time 0.4388 (0.4461) loss 2.5188 (2.9557) grad_norm 1.2851 (1.8608) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 156 training takes 0:04:39 [2024-08-10 14:35:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:35:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:35:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.495 (0.495) Loss 0.5264 (0.5264) Acc@1 88.818 (88.818) Acc@5 98.389 (98.389) Mem 16699MB [2024-08-10 14:35:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.121 (0.155) Loss 0.8643 (0.6463) Acc@1 79.541 (85.756) Acc@5 95.654 (97.430) Mem 16699MB [2024-08-10 14:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.137) Loss 0.9355 (0.7662) Acc@1 77.246 (82.743) Acc@5 95.117 (96.257) Mem 16699MB [2024-08-10 14:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.438 Acc@5 96.233 [2024-08-10 14:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 14:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.44% [2024-08-10 14:35:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 14:35:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 14:35:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.4739 (0.4739) Acc@1 89.160 (89.160) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:35:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.149) Loss 0.7622 (0.5935) Acc@1 82.080 (86.981) Acc@5 96.533 (97.860) Mem 16699MB [2024-08-10 14:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.8657 (0.6974) Acc@1 78.223 (84.131) Acc@5 95.654 (96.815) Mem 16699MB [2024-08-10 14:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.827 [2024-08-10 14:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-10 14:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.82% [2024-08-10 14:35:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 14:35:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 14:35:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][0/625] eta 0:08:32 lr 0.000626 wd 0.0500 time 0.8194 (0.8194) data time 0.4273 (0.4273) model time 0.0000 (0.0000) loss 3.4792 (3.4792) grad_norm 1.5301 (1.5301) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][10/625] eta 0:04:53 lr 0.000626 wd 0.0500 time 0.4432 (0.4766) data time 0.0009 (0.0397) model time 0.0000 (0.0000) loss 2.9911 (3.0098) grad_norm 1.7214 (1.7711) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][20/625] eta 0:04:38 lr 0.000626 wd 0.0500 time 0.4456 (0.4607) data time 0.0008 (0.0213) model time 0.0000 (0.0000) loss 2.7099 (2.8746) grad_norm 1.4620 (1.8875) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][30/625] eta 0:04:30 lr 0.000626 wd 0.0500 time 0.4496 (0.4553) data time 0.0007 (0.0147) model time 0.0000 (0.0000) loss 3.6154 (2.9168) grad_norm 1.9743 (1.8402) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][40/625] eta 0:04:24 lr 0.000626 wd 0.0500 time 0.4454 (0.4525) data time 0.0006 (0.0114) model time 0.0000 (0.0000) loss 2.4194 (2.8717) grad_norm 2.2691 (1.8132) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][50/625] eta 0:04:21 lr 0.000625 wd 0.0500 time 0.4426 (0.4547) data time 0.0006 (0.0093) model time 0.0000 (0.0000) loss 3.6091 (2.8247) grad_norm 1.9222 (1.8115) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][60/625] eta 0:04:15 lr 0.000625 wd 0.0500 time 0.4443 (0.4528) data time 0.0009 (0.0079) model time 0.4434 (0.4426) loss 3.1522 (2.8338) 
grad_norm 2.0796 (1.8770) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][70/625] eta 0:04:10 lr 0.000625 wd 0.0500 time 0.4421 (0.4514) data time 0.0007 (0.0070) model time 0.4413 (0.4420) loss 2.5201 (2.8264) grad_norm 1.2985 (1.8821) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:35:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][80/625] eta 0:04:05 lr 0.000625 wd 0.0500 time 0.4450 (0.4502) data time 0.0006 (0.0062) model time 0.4444 (0.4417) loss 2.7270 (2.8226) grad_norm 1.1810 (1.8469) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:36:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][90/625] eta 0:04:00 lr 0.000625 wd 0.0500 time 0.4595 (0.4497) data time 0.0009 (0.0056) model time 0.4587 (0.4424) loss 2.3404 (2.8321) grad_norm 1.2829 (1.7970) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:36:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][100/625] eta 0:03:56 lr 0.000625 wd 0.0500 time 0.4394 (0.4496) data time 0.0007 (0.0052) model time 0.4387 (0.4434) loss 2.5933 (2.8304) grad_norm 1.8198 (1.7723) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:36:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][110/625] eta 0:03:51 lr 0.000625 wd 0.0500 time 0.4430 (0.4489) data time 0.0007 (0.0048) model time 0.4423 (0.4430) loss 3.2660 (2.8471) grad_norm 1.9734 (inf) loss_scale 256.0000 (505.0811) mem 16699MB [2024-08-10 14:36:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][120/625] eta 0:03:48 lr 0.000625 wd 0.0500 time 0.4436 (0.4516) data time 0.0009 (0.0045) model time 0.4427 (0.4484) loss 2.8567 (2.8698) grad_norm 2.5635 (inf) loss_scale 256.0000 (484.4959) mem 16699MB [2024-08-10 14:36:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][130/625] eta 0:03:43 lr 0.000625 wd 0.0500 time 0.4433 (0.4510) data time 
0.0009 (0.0042) model time 0.4424 (0.4478) loss 2.3860 (2.8701) grad_norm 1.2620 (inf) loss_scale 256.0000 (467.0534) mem 16699MB [2024-08-10 14:36:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][140/625] eta 0:03:38 lr 0.000624 wd 0.0500 time 0.4636 (0.4505) data time 0.0009 (0.0040) model time 0.4628 (0.4473) loss 3.2413 (2.8735) grad_norm 1.2141 (inf) loss_scale 256.0000 (452.0851) mem 16699MB [2024-08-10 14:36:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][150/625] eta 0:03:33 lr 0.000624 wd 0.0500 time 0.4362 (0.4500) data time 0.0008 (0.0038) model time 0.4354 (0.4467) loss 3.2217 (2.8790) grad_norm 1.1947 (inf) loss_scale 256.0000 (439.0993) mem 16699MB [2024-08-10 14:36:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][160/625] eta 0:03:29 lr 0.000624 wd 0.0500 time 0.4422 (0.4506) data time 0.0009 (0.0036) model time 0.4413 (0.4478) loss 3.3681 (2.8822) grad_norm 1.4644 (inf) loss_scale 256.0000 (427.7267) mem 16699MB [2024-08-10 14:36:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][170/625] eta 0:03:24 lr 0.000624 wd 0.0500 time 0.4415 (0.4503) data time 0.0009 (0.0034) model time 0.4406 (0.4475) loss 2.5950 (2.8875) grad_norm 1.7648 (inf) loss_scale 256.0000 (417.6842) mem 16699MB [2024-08-10 14:36:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][180/625] eta 0:03:20 lr 0.000624 wd 0.0500 time 0.4425 (0.4503) data time 0.0007 (0.0033) model time 0.4418 (0.4477) loss 3.2145 (2.8907) grad_norm 1.6828 (inf) loss_scale 256.0000 (408.7514) mem 16699MB [2024-08-10 14:36:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][190/625] eta 0:03:15 lr 0.000624 wd 0.0500 time 0.4511 (0.4501) data time 0.0009 (0.0032) model time 0.4502 (0.4475) loss 3.4863 (2.8933) grad_norm 2.7446 (inf) loss_scale 256.0000 (400.7539) mem 16699MB [2024-08-10 14:36:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][200/625] eta 0:03:11 
lr 0.000624 wd 0.0500 time 0.4387 (0.4497) data time 0.0006 (0.0031) model time 0.4381 (0.4471) loss 3.0393 (2.8896) grad_norm 1.3700 (inf) loss_scale 256.0000 (393.5522) mem 16699MB [2024-08-10 14:36:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][210/625] eta 0:03:06 lr 0.000624 wd 0.0500 time 0.4417 (0.4496) data time 0.0009 (0.0029) model time 0.4408 (0.4470) loss 2.5379 (2.8912) grad_norm 1.1176 (inf) loss_scale 256.0000 (387.0332) mem 16699MB [2024-08-10 14:36:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][220/625] eta 0:03:01 lr 0.000624 wd 0.0500 time 0.4532 (0.4493) data time 0.0007 (0.0029) model time 0.4525 (0.4468) loss 3.1884 (2.8954) grad_norm 1.2299 (inf) loss_scale 256.0000 (381.1041) mem 16699MB [2024-08-10 14:37:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][230/625] eta 0:02:57 lr 0.000624 wd 0.0500 time 0.4395 (0.4495) data time 0.0007 (0.0028) model time 0.4388 (0.4471) loss 3.3192 (2.9028) grad_norm 1.6452 (inf) loss_scale 256.0000 (375.6883) mem 16699MB [2024-08-10 14:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][240/625] eta 0:02:53 lr 0.000623 wd 0.0500 time 0.4417 (0.4499) data time 0.0007 (0.0027) model time 0.4410 (0.4477) loss 3.0276 (2.9075) grad_norm 1.8756 (inf) loss_scale 256.0000 (370.7220) mem 16699MB [2024-08-10 14:37:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][250/625] eta 0:02:48 lr 0.000623 wd 0.0500 time 0.4408 (0.4496) data time 0.0007 (0.0026) model time 0.4402 (0.4474) loss 3.5542 (2.9054) grad_norm 1.3291 (inf) loss_scale 256.0000 (366.1514) mem 16699MB [2024-08-10 14:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][260/625] eta 0:02:44 lr 0.000623 wd 0.0500 time 0.4427 (0.4494) data time 0.0007 (0.0026) model time 0.4419 (0.4472) loss 2.4961 (2.9022) grad_norm 1.7563 (inf) loss_scale 256.0000 (361.9310) mem 16699MB [2024-08-10 14:37:21 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [157/300][270/625] eta 0:02:39 lr 0.000623 wd 0.0500 time 0.4396 (0.4491) data time 0.0008 (0.0025) model time 0.4388 (0.4469) loss 2.9798 (2.9041) grad_norm 1.5405 (inf) loss_scale 256.0000 (358.0221) mem 16699MB [2024-08-10 14:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][280/625] eta 0:02:34 lr 0.000623 wd 0.0500 time 0.4401 (0.4488) data time 0.0009 (0.0025) model time 0.4392 (0.4466) loss 3.4847 (2.9026) grad_norm 1.2614 (inf) loss_scale 256.0000 (354.3915) mem 16699MB [2024-08-10 14:37:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][290/625] eta 0:02:30 lr 0.000623 wd 0.0500 time 0.4389 (0.4485) data time 0.0009 (0.0024) model time 0.4380 (0.4463) loss 3.1729 (2.9029) grad_norm 2.0536 (inf) loss_scale 256.0000 (351.0103) mem 16699MB [2024-08-10 14:37:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][300/625] eta 0:02:25 lr 0.000623 wd 0.0500 time 0.4448 (0.4483) data time 0.0008 (0.0023) model time 0.4439 (0.4461) loss 3.0440 (2.8997) grad_norm 1.5305 (inf) loss_scale 256.0000 (347.8538) mem 16699MB [2024-08-10 14:37:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][310/625] eta 0:02:21 lr 0.000623 wd 0.0500 time 0.4434 (0.4482) data time 0.0008 (0.0023) model time 0.4426 (0.4460) loss 3.0892 (2.8944) grad_norm 1.6860 (inf) loss_scale 256.0000 (344.9003) mem 16699MB [2024-08-10 14:37:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][320/625] eta 0:02:16 lr 0.000623 wd 0.0500 time 0.4435 (0.4481) data time 0.0009 (0.0023) model time 0.4426 (0.4459) loss 3.3668 (2.9001) grad_norm 1.5196 (inf) loss_scale 256.0000 (342.1308) mem 16699MB [2024-08-10 14:37:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][330/625] eta 0:02:12 lr 0.000622 wd 0.0500 time 0.4445 (0.4480) data time 0.0009 (0.0022) model time 0.4436 (0.4458) loss 3.0340 (2.9030) grad_norm 1.3159 (inf) loss_scale 256.0000 (339.5287) 
mem 16699MB [2024-08-10 14:37:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][340/625] eta 0:02:07 lr 0.000622 wd 0.0500 time 0.4407 (0.4478) data time 0.0006 (0.0022) model time 0.4401 (0.4457) loss 3.1385 (2.9055) grad_norm 1.2547 (inf) loss_scale 256.0000 (337.0792) mem 16699MB [2024-08-10 14:37:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][350/625] eta 0:02:03 lr 0.000622 wd 0.0500 time 0.4478 (0.4477) data time 0.0008 (0.0022) model time 0.4469 (0.4456) loss 1.6509 (2.9066) grad_norm 1.4992 (inf) loss_scale 256.0000 (334.7692) mem 16699MB [2024-08-10 14:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][360/625] eta 0:01:58 lr 0.000622 wd 0.0500 time 0.4414 (0.4476) data time 0.0007 (0.0021) model time 0.4407 (0.4455) loss 3.4794 (2.9111) grad_norm 1.3581 (inf) loss_scale 256.0000 (332.5873) mem 16699MB [2024-08-10 14:38:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][370/625] eta 0:01:54 lr 0.000622 wd 0.0500 time 0.4397 (0.4474) data time 0.0008 (0.0021) model time 0.4389 (0.4453) loss 3.1374 (2.9126) grad_norm 1.6439 (inf) loss_scale 256.0000 (330.5229) mem 16699MB [2024-08-10 14:38:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][380/625] eta 0:01:49 lr 0.000622 wd 0.0500 time 0.3890 (0.4474) data time 0.0009 (0.0021) model time 0.3882 (0.4454) loss 3.1282 (2.9049) grad_norm 1.8132 (inf) loss_scale 256.0000 (328.5669) mem 16699MB [2024-08-10 14:38:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][390/625] eta 0:01:45 lr 0.000622 wd 0.0500 time 0.4435 (0.4474) data time 0.0006 (0.0020) model time 0.4428 (0.4453) loss 1.9826 (2.9081) grad_norm 1.5806 (inf) loss_scale 256.0000 (326.7110) mem 16699MB [2024-08-10 14:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][400/625] eta 0:01:40 lr 0.000622 wd 0.0500 time 0.4436 (0.4474) data time 0.0008 (0.0020) model time 0.4428 (0.4454) loss 2.4644 (2.9016) 
grad_norm 2.1953 (inf) loss_scale 256.0000 (324.9476) mem 16699MB [2024-08-10 14:38:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][410/625] eta 0:01:36 lr 0.000622 wd 0.0500 time 0.4402 (0.4473) data time 0.0008 (0.0020) model time 0.4394 (0.4453) loss 2.4044 (2.9007) grad_norm 1.6304 (inf) loss_scale 256.0000 (323.2701) mem 16699MB [2024-08-10 14:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][420/625] eta 0:01:31 lr 0.000622 wd 0.0500 time 0.4400 (0.4472) data time 0.0010 (0.0019) model time 0.4390 (0.4452) loss 2.6770 (2.8996) grad_norm 1.4106 (inf) loss_scale 256.0000 (321.6722) mem 16699MB [2024-08-10 14:38:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][430/625] eta 0:01:27 lr 0.000621 wd 0.0500 time 0.4412 (0.4471) data time 0.0006 (0.0019) model time 0.4406 (0.4451) loss 3.7657 (2.9042) grad_norm 1.7293 (inf) loss_scale 256.0000 (320.1485) mem 16699MB [2024-08-10 14:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][440/625] eta 0:01:22 lr 0.000621 wd 0.0500 time 0.4396 (0.4470) data time 0.0010 (0.0019) model time 0.4387 (0.4450) loss 2.9237 (2.9043) grad_norm 1.2856 (inf) loss_scale 256.0000 (318.6939) mem 16699MB [2024-08-10 14:38:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][450/625] eta 0:01:18 lr 0.000621 wd 0.0500 time 0.4415 (0.4477) data time 0.0006 (0.0019) model time 0.4409 (0.4459) loss 2.3469 (2.9067) grad_norm 1.6597 (inf) loss_scale 256.0000 (317.3038) mem 16699MB [2024-08-10 14:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][460/625] eta 0:01:13 lr 0.000621 wd 0.0500 time 0.4440 (0.4481) data time 0.0007 (0.0019) model time 0.4433 (0.4463) loss 2.8428 (2.9122) grad_norm 1.1463 (inf) loss_scale 256.0000 (315.9740) mem 16699MB [2024-08-10 14:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][470/625] eta 0:01:09 lr 0.000621 wd 0.0500 time 0.4409 (0.4479) data time 0.0009 
(0.0018) model time 0.4400 (0.4462) loss 2.0311 (2.9086) grad_norm 2.3918 (inf) loss_scale 256.0000 (314.7006) mem 16699MB [2024-08-10 14:38:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][480/625] eta 0:01:04 lr 0.000621 wd 0.0500 time 0.4409 (0.4478) data time 0.0006 (0.0018) model time 0.4403 (0.4461) loss 2.9716 (2.9096) grad_norm 1.5499 (inf) loss_scale 256.0000 (313.4802) mem 16699MB [2024-08-10 14:38:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][490/625] eta 0:01:00 lr 0.000621 wd 0.0500 time 0.4450 (0.4477) data time 0.0006 (0.0018) model time 0.4444 (0.4460) loss 3.2461 (2.9152) grad_norm 1.6529 (inf) loss_scale 256.0000 (312.3096) mem 16699MB [2024-08-10 14:39:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][500/625] eta 0:00:55 lr 0.000621 wd 0.0500 time 0.4424 (0.4476) data time 0.0009 (0.0018) model time 0.4415 (0.4459) loss 3.1441 (2.9206) grad_norm 1.4351 (inf) loss_scale 256.0000 (311.1856) mem 16699MB [2024-08-10 14:39:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][510/625] eta 0:00:51 lr 0.000621 wd 0.0500 time 0.4389 (0.4475) data time 0.0006 (0.0018) model time 0.4383 (0.4458) loss 3.1947 (2.9222) grad_norm 2.9107 (inf) loss_scale 256.0000 (310.1057) mem 16699MB [2024-08-10 14:39:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][520/625] eta 0:00:46 lr 0.000620 wd 0.0500 time 0.4409 (0.4474) data time 0.0009 (0.0018) model time 0.4401 (0.4457) loss 3.3007 (2.9211) grad_norm 2.0997 (inf) loss_scale 256.0000 (309.0672) mem 16699MB [2024-08-10 14:39:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][530/625] eta 0:00:42 lr 0.000620 wd 0.0500 time 0.4420 (0.4477) data time 0.0006 (0.0017) model time 0.4414 (0.4460) loss 2.5465 (2.9226) grad_norm 1.8443 (inf) loss_scale 256.0000 (308.0678) mem 16699MB [2024-08-10 14:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][540/625] eta 0:00:38 lr 
0.000620 wd 0.0500 time 0.4606 (0.4476) data time 0.0008 (0.0017) model time 0.4597 (0.4460) loss 2.8910 (2.9210) grad_norm 1.4140 (inf) loss_scale 256.0000 (307.1054) mem 16699MB [2024-08-10 14:39:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][550/625] eta 0:00:33 lr 0.000620 wd 0.0500 time 0.4411 (0.4476) data time 0.0006 (0.0017) model time 0.4405 (0.4459) loss 2.7479 (2.9259) grad_norm 1.6260 (inf) loss_scale 256.0000 (306.1779) mem 16699MB [2024-08-10 14:39:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][560/625] eta 0:00:29 lr 0.000620 wd 0.0500 time 0.4418 (0.4475) data time 0.0007 (0.0017) model time 0.4411 (0.4458) loss 2.2634 (2.9225) grad_norm 1.3935 (inf) loss_scale 256.0000 (305.2834) mem 16699MB [2024-08-10 14:39:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][570/625] eta 0:00:24 lr 0.000620 wd 0.0500 time 0.4410 (0.4474) data time 0.0008 (0.0017) model time 0.4402 (0.4457) loss 3.5209 (2.9223) grad_norm 2.0708 (inf) loss_scale 256.0000 (304.4203) mem 16699MB [2024-08-10 14:39:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][580/625] eta 0:00:20 lr 0.000620 wd 0.0500 time 0.4448 (0.4473) data time 0.0007 (0.0017) model time 0.4442 (0.4457) loss 2.1807 (2.9230) grad_norm 1.1800 (inf) loss_scale 256.0000 (303.5869) mem 16699MB [2024-08-10 14:39:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][590/625] eta 0:00:15 lr 0.000620 wd 0.0500 time 0.4417 (0.4472) data time 0.0006 (0.0016) model time 0.4411 (0.4456) loss 3.0365 (2.9214) grad_norm 1.9045 (inf) loss_scale 256.0000 (302.7817) mem 16699MB [2024-08-10 14:39:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][600/625] eta 0:00:11 lr 0.000620 wd 0.0500 time 0.4424 (0.4472) data time 0.0008 (0.0016) model time 0.4416 (0.4456) loss 3.1828 (2.9213) grad_norm 1.2945 (inf) loss_scale 256.0000 (302.0033) mem 16699MB [2024-08-10 14:39:52 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [157/300][610/625] eta 0:00:06 lr 0.000619 wd 0.0500 time 0.4410 (0.4474) data time 0.0004 (0.0016) model time 0.4406 (0.4458) loss 3.1025 (2.9239) grad_norm 1.7128 (inf) loss_scale 256.0000 (301.2504) mem 16699MB [2024-08-10 14:39:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [157/300][620/625] eta 0:00:02 lr 0.000619 wd 0.0500 time 0.4363 (0.4473) data time 0.0004 (0.0016) model time 0.4358 (0.4457) loss 2.3821 (2.9223) grad_norm 1.3896 (inf) loss_scale 256.0000 (300.5217) mem 16699MB [2024-08-10 14:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 157 training takes 0:04:39 [2024-08-10 14:39:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:40:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 14:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5322 (0.5322) Acc@1 87.695 (87.695) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-10 14:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8345 (0.6527) Acc@1 80.078 (85.352) Acc@5 95.508 (97.488) Mem 16699MB [2024-08-10 14:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.8901 (0.7654) Acc@1 78.320 (82.589) Acc@5 95.312 (96.289) Mem 16699MB [2024-08-10 14:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.196 Acc@5 96.239 [2024-08-10 14:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2% [2024-08-10 14:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.814 (0.814) Loss 0.4724 (0.4724) Acc@1 89.307 (89.307) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:40:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.185) Loss 0.7622 
(0.5929) Acc@1 82.129 (86.998) Acc@5 96.533 (97.865) Mem 16699MB [2024-08-10 14:40:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.152) Loss 0.8647 (0.6968) Acc@1 78.125 (84.133) Acc@5 95.703 (96.819) Mem 16699MB [2024-08-10 14:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.817 Acc@5 96.833 [2024-08-10 14:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-10 14:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][0/625] eta 0:12:52 lr 0.000619 wd 0.0500 time 1.2358 (1.2358) data time 0.8327 (0.8327) model time 0.0000 (0.0000) loss 3.0148 (3.0148) grad_norm 1.4613 (1.4613) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][10/625] eta 0:05:16 lr 0.000619 wd 0.0500 time 0.4432 (0.5145) data time 0.0010 (0.0765) model time 0.0000 (0.0000) loss 3.0990 (2.8601) grad_norm 0.9582 (1.7310) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][20/625] eta 0:04:56 lr 0.000619 wd 0.0500 time 0.4396 (0.4904) data time 0.0006 (0.0405) model time 0.0000 (0.0000) loss 3.6241 (2.9460) grad_norm 1.2119 (1.6828) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][30/625] eta 0:04:46 lr 0.000619 wd 0.0500 time 0.4433 (0.4810) data time 0.0008 (0.0277) model time 0.0000 (0.0000) loss 3.1929 (2.8609) grad_norm 1.4960 (1.6067) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][40/625] eta 0:04:35 lr 0.000619 wd 0.0500 time 0.4402 (0.4716) data time 0.0007 (0.0212) model time 0.0000 (0.0000) loss 1.8318 (2.8558) grad_norm 1.3870 (1.6003) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [158/300][50/625] eta 0:04:28 lr 0.000619 wd 0.0500 time 0.4439 (0.4662) data time 0.0007 (0.0173) model time 0.0000 (0.0000) loss 3.5204 (2.8633) grad_norm 2.2309 (1.6898) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][60/625] eta 0:04:21 lr 0.000619 wd 0.0500 time 0.4457 (0.4628) data time 0.0008 (0.0146) model time 0.4448 (0.4444) loss 1.6763 (2.8458) grad_norm 1.1305 (1.7526) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][70/625] eta 0:04:15 lr 0.000619 wd 0.0500 time 0.4408 (0.4601) data time 0.0006 (0.0127) model time 0.4401 (0.4437) loss 2.4745 (2.8689) grad_norm 1.3815 (1.7310) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][80/625] eta 0:04:09 lr 0.000618 wd 0.0500 time 0.4551 (0.4582) data time 0.0006 (0.0112) model time 0.4545 (0.4438) loss 3.9612 (2.8661) grad_norm 1.7062 (1.7235) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][90/625] eta 0:04:04 lr 0.000618 wd 0.0500 time 0.4427 (0.4566) data time 0.0009 (0.0101) model time 0.4418 (0.4434) loss 2.6439 (2.9112) grad_norm 1.6471 (1.7233) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][100/625] eta 0:03:58 lr 0.000618 wd 0.0500 time 0.4428 (0.4551) data time 0.0009 (0.0092) model time 0.4419 (0.4430) loss 2.9608 (2.8934) grad_norm 1.6504 (1.7021) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][110/625] eta 0:03:54 lr 0.000618 wd 0.0500 time 0.4438 (0.4546) data time 0.0008 (0.0084) model time 0.4430 (0.4439) loss 2.0959 (2.8831) grad_norm 1.4732 (1.7374) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][120/625] eta 0:03:49 lr 0.000618 wd 0.0500 time 0.4454 (0.4538) data time 0.0009 (0.0078) model time 0.4446 (0.4438) loss 3.5654 (2.8914) grad_norm 2.3985 (1.7436) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][130/625] eta 0:03:44 lr 0.000618 wd 0.0500 time 0.4419 (0.4530) data time 0.0007 (0.0073) model time 0.4412 (0.4437) loss 2.4014 (2.8868) grad_norm 1.7760 (1.7773) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][140/625] eta 0:03:39 lr 0.000618 wd 0.0500 time 0.4421 (0.4523) data time 0.0008 (0.0068) model time 0.4413 (0.4435) loss 2.8974 (2.8801) grad_norm 1.3896 (1.7685) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][150/625] eta 0:03:34 lr 0.000618 wd 0.0500 time 0.4488 (0.4517) data time 0.0008 (0.0064) model time 0.4479 (0.4434) loss 2.9648 (2.8881) grad_norm 1.4969 (1.7439) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][160/625] eta 0:03:29 lr 0.000618 wd 0.0500 time 0.4394 (0.4510) data time 0.0006 (0.0061) model time 0.4388 (0.4431) loss 2.9642 (2.8937) grad_norm 1.9115 (1.7511) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][170/625] eta 0:03:24 lr 0.000618 wd 0.0500 time 0.4411 (0.4505) data time 0.0009 (0.0058) model time 0.4402 (0.4429) loss 3.0622 (2.9012) grad_norm 2.9170 (1.7577) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][180/625] eta 0:03:20 lr 0.000617 wd 0.0500 time 0.4446 (0.4502) data time 0.0008 (0.0055) model time 
0.4438 (0.4430) loss 2.8620 (2.9048) grad_norm 1.8487 (1.7587) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][190/625] eta 0:03:15 lr 0.000617 wd 0.0500 time 0.4455 (0.4499) data time 0.0009 (0.0053) model time 0.4446 (0.4431) loss 2.9374 (2.9061) grad_norm 1.7967 (1.7446) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][200/625] eta 0:03:11 lr 0.000617 wd 0.0500 time 0.4431 (0.4496) data time 0.0008 (0.0050) model time 0.4423 (0.4431) loss 3.2509 (2.9109) grad_norm 2.2016 (1.7532) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][210/625] eta 0:03:06 lr 0.000617 wd 0.0500 time 0.4487 (0.4503) data time 0.0009 (0.0049) model time 0.4479 (0.4443) loss 2.8617 (2.9186) grad_norm 2.1992 (1.7691) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][220/625] eta 0:03:02 lr 0.000617 wd 0.0500 time 0.4397 (0.4503) data time 0.0006 (0.0047) model time 0.4391 (0.4447) loss 3.5727 (2.9279) grad_norm 1.3975 (1.7627) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][230/625] eta 0:02:57 lr 0.000617 wd 0.0500 time 0.4410 (0.4500) data time 0.0006 (0.0045) model time 0.4404 (0.4445) loss 3.4227 (2.9263) grad_norm 1.3908 (1.7508) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][240/625] eta 0:02:53 lr 0.000617 wd 0.0500 time 0.4445 (0.4497) data time 0.0007 (0.0044) model time 0.4438 (0.4444) loss 2.5750 (2.9230) grad_norm 1.6442 (1.7561) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][250/625] eta 0:02:48 lr 
0.000617 wd 0.0500 time 0.4429 (0.4494) data time 0.0007 (0.0042) model time 0.4422 (0.4442) loss 2.5569 (2.9160) grad_norm 1.4393 (1.7611) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][260/625] eta 0:02:43 lr 0.000617 wd 0.0500 time 0.4433 (0.4492) data time 0.0007 (0.0041) model time 0.4426 (0.4442) loss 1.7583 (2.9127) grad_norm 1.3315 (1.7646) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][270/625] eta 0:02:39 lr 0.000616 wd 0.0500 time 0.4536 (0.4490) data time 0.0006 (0.0040) model time 0.4530 (0.4441) loss 3.3522 (2.9182) grad_norm 1.5445 (1.7655) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][280/625] eta 0:02:34 lr 0.000616 wd 0.0500 time 0.4399 (0.4488) data time 0.0010 (0.0039) model time 0.4389 (0.4440) loss 3.3881 (2.9263) grad_norm 2.0305 (1.7617) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][290/625] eta 0:02:30 lr 0.000616 wd 0.0500 time 0.4452 (0.4487) data time 0.0007 (0.0038) model time 0.4445 (0.4441) loss 2.3891 (2.9265) grad_norm 1.3449 (1.7538) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][300/625] eta 0:02:25 lr 0.000616 wd 0.0500 time 0.4409 (0.4485) data time 0.0008 (0.0037) model time 0.4401 (0.4440) loss 3.4634 (2.9225) grad_norm 1.8426 (1.7465) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][310/625] eta 0:02:21 lr 0.000616 wd 0.0500 time 0.4393 (0.4483) data time 0.0007 (0.0036) model time 0.4387 (0.4439) loss 2.6656 (2.9251) grad_norm 1.3461 (1.7399) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [158/300][320/625] eta 0:02:16 lr 0.000616 wd 0.0500 time 0.4406 (0.4482) data time 0.0008 (0.0035) model time 0.4398 (0.4438) loss 3.2581 (2.9346) grad_norm 1.9312 (1.7344) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][330/625] eta 0:02:12 lr 0.000616 wd 0.0500 time 0.4494 (0.4485) data time 0.0009 (0.0034) model time 0.4485 (0.4444) loss 2.9882 (2.9296) grad_norm 2.2110 (1.7362) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][340/625] eta 0:02:07 lr 0.000616 wd 0.0500 time 0.4419 (0.4484) data time 0.0007 (0.0033) model time 0.4412 (0.4443) loss 3.4095 (2.9324) grad_norm 1.7581 (1.7283) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][350/625] eta 0:02:03 lr 0.000616 wd 0.0500 time 0.4436 (0.4482) data time 0.0007 (0.0033) model time 0.4429 (0.4443) loss 3.3319 (2.9377) grad_norm 1.8380 (1.7303) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][360/625] eta 0:01:58 lr 0.000615 wd 0.0500 time 0.4428 (0.4486) data time 0.0007 (0.0032) model time 0.4422 (0.4449) loss 3.2251 (2.9466) grad_norm 1.2363 (1.7311) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][370/625] eta 0:01:54 lr 0.000615 wd 0.0500 time 0.4387 (0.4485) data time 0.0009 (0.0032) model time 0.4378 (0.4447) loss 3.0030 (2.9484) grad_norm 2.0232 (1.7341) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:42:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][380/625] eta 0:01:49 lr 0.000615 wd 0.0500 time 0.4443 (0.4484) data time 0.0008 (0.0031) model time 0.4435 (0.4447) loss 2.8102 (2.9441) grad_norm 1.7523 (1.7401) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][390/625] eta 0:01:45 lr 0.000615 wd 0.0500 time 0.4415 (0.4482) data time 0.0009 (0.0030) model time 0.4406 (0.4446) loss 2.6404 (2.9437) grad_norm 2.0612 (1.7397) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][400/625] eta 0:01:40 lr 0.000615 wd 0.0500 time 0.4431 (0.4481) data time 0.0009 (0.0030) model time 0.4423 (0.4446) loss 2.9759 (2.9453) grad_norm 1.3662 (1.7367) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][410/625] eta 0:01:36 lr 0.000615 wd 0.0500 time 0.4455 (0.4480) data time 0.0007 (0.0029) model time 0.4448 (0.4445) loss 2.3926 (2.9443) grad_norm 2.3239 (1.7443) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][420/625] eta 0:01:31 lr 0.000615 wd 0.0500 time 0.4331 (0.4479) data time 0.0007 (0.0029) model time 0.4324 (0.4445) loss 3.5617 (2.9430) grad_norm 1.2722 (1.7451) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][430/625] eta 0:01:27 lr 0.000615 wd 0.0500 time 0.4432 (0.4483) data time 0.0009 (0.0028) model time 0.4423 (0.4450) loss 3.0338 (2.9431) grad_norm 1.7376 (1.7452) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][440/625] eta 0:01:22 lr 0.000615 wd 0.0500 time 0.4439 (0.4486) data time 0.0009 (0.0028) model time 0.4430 (0.4454) loss 2.6466 (2.9352) grad_norm 1.7065 (1.7376) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][450/625] eta 0:01:18 lr 0.000615 wd 0.0500 time 0.4426 (0.4485) data time 0.0006 (0.0028) model time 
0.4420 (0.4453) loss 3.6517 (2.9321) grad_norm 1.9866 (1.7374) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][460/625] eta 0:01:13 lr 0.000614 wd 0.0500 time 0.4435 (0.4484) data time 0.0009 (0.0027) model time 0.4427 (0.4453) loss 3.4859 (2.9347) grad_norm 1.0986 (1.7461) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][470/625] eta 0:01:09 lr 0.000614 wd 0.0500 time 0.4406 (0.4483) data time 0.0010 (0.0027) model time 0.4397 (0.4452) loss 3.0198 (2.9327) grad_norm 1.3939 (1.7483) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][480/625] eta 0:01:05 lr 0.000614 wd 0.0500 time 0.4403 (0.4485) data time 0.0009 (0.0026) model time 0.4394 (0.4455) loss 2.5618 (2.9289) grad_norm 1.6101 (1.7606) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][490/625] eta 0:01:00 lr 0.000614 wd 0.0500 time 0.4395 (0.4484) data time 0.0009 (0.0026) model time 0.4386 (0.4454) loss 3.5212 (2.9308) grad_norm 1.9236 (1.7687) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][500/625] eta 0:00:56 lr 0.000614 wd 0.0500 time 0.4423 (0.4483) data time 0.0006 (0.0026) model time 0.4417 (0.4454) loss 3.4151 (2.9322) grad_norm 2.5807 (1.7737) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:43:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][510/625] eta 0:00:51 lr 0.000614 wd 0.0500 time 0.4413 (0.4482) data time 0.0008 (0.0025) model time 0.4405 (0.4453) loss 2.8846 (2.9298) grad_norm 1.5265 (1.7695) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][520/625] eta 0:00:47 lr 
0.000614 wd 0.0500 time 0.4412 (0.4481) data time 0.0010 (0.0025) model time 0.4402 (0.4452) loss 2.0132 (2.9335) grad_norm 1.3336 (1.7651) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][530/625] eta 0:00:42 lr 0.000614 wd 0.0500 time 0.4364 (0.4480) data time 0.0008 (0.0025) model time 0.4355 (0.4452) loss 2.8265 (2.9307) grad_norm 2.5289 (1.7865) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][540/625] eta 0:00:38 lr 0.000614 wd 0.0500 time 0.4439 (0.4479) data time 0.0009 (0.0024) model time 0.4430 (0.4451) loss 2.6327 (2.9299) grad_norm 1.0968 (1.7847) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][550/625] eta 0:00:33 lr 0.000613 wd 0.0500 time 0.4439 (0.4478) data time 0.0008 (0.0024) model time 0.4431 (0.4450) loss 3.3149 (2.9326) grad_norm 1.9860 (1.7846) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][560/625] eta 0:00:29 lr 0.000613 wd 0.0500 time 0.4417 (0.4477) data time 0.0009 (0.0024) model time 0.4408 (0.4449) loss 3.2138 (2.9331) grad_norm 1.8191 (1.7824) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][570/625] eta 0:00:24 lr 0.000613 wd 0.0500 time 0.4410 (0.4477) data time 0.0006 (0.0024) model time 0.4404 (0.4449) loss 2.2339 (2.9312) grad_norm 2.5661 (1.7856) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][580/625] eta 0:00:20 lr 0.000613 wd 0.0500 time 0.6301 (0.4483) data time 0.0006 (0.0023) model time 0.6295 (0.4457) loss 2.3168 (2.9252) grad_norm 2.5302 (1.8165) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:32 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [158/300][590/625] eta 0:00:15 lr 0.000613 wd 0.0500 time 0.4398 (0.4482) data time 0.0006 (0.0023) model time 0.4392 (0.4456) loss 3.5408 (2.9274) grad_norm 1.2564 (1.8131) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][600/625] eta 0:00:11 lr 0.000613 wd 0.0500 time 0.4434 (0.4482) data time 0.0009 (0.0023) model time 0.4425 (0.4456) loss 2.5296 (2.9272) grad_norm 1.6684 (1.8071) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][610/625] eta 0:00:06 lr 0.000613 wd 0.0500 time 0.4391 (0.4481) data time 0.0004 (0.0023) model time 0.4387 (0.4455) loss 3.3191 (2.9234) grad_norm 1.3385 (1.8043) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [158/300][620/625] eta 0:00:02 lr 0.000613 wd 0.0500 time 0.4402 (0.4479) data time 0.0007 (0.0023) model time 0.4395 (0.4454) loss 3.2392 (2.9249) grad_norm 2.9544 (1.8030) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:44:47 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 158 training takes 0:04:39 [2024-08-10 14:44:47 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:44:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5420 (0.5420) Acc@1 88.281 (88.281) Acc@5 98.682 (98.682) Mem 16699MB
[2024-08-10 14:44:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.152) Loss 0.8418 (0.6579) Acc@1 80.322 (85.667) Acc@5 95.605 (97.461) Mem 16699MB
[2024-08-10 14:44:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.9595 (0.7815) Acc@1 76.611 (82.513) Acc@5 94.922 (96.245) Mem 16699MB
[2024-08-10 14:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.212 Acc@5 96.243
[2024-08-10 14:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.2%
[2024-08-10 14:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.771 (0.771) Loss 0.4717 (0.4717) Acc@1 89.307 (89.307) Acc@5 98.828 (98.828) Mem 16699MB
[2024-08-10 14:44:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.182) Loss 0.7622 (0.5930) Acc@1 81.836 (86.976) Acc@5 96.533 (97.883) Mem 16699MB
[2024-08-10 14:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.151) Loss 0.8633 (0.6966) Acc@1 78.320 (84.152) Acc@5 95.898 (96.840) Mem 16699MB
[2024-08-10 14:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.831 Acc@5 96.857
[2024-08-10 14:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-10 14:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.83%
[2024-08-10 14:44:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 14:44:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 14:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][0/625] eta 0:08:15 lr 0.000613 wd 0.0500 time 0.7932 (0.7932) data time 0.4033 (0.4033) model time 0.0000 (0.0000) loss 2.7478 (2.7478) grad_norm 2.3382 (2.3382) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][10/625] eta 0:04:52 lr 0.000613 wd 0.0500 time 0.4456 (0.4750) data time 0.0006 (0.0376) model time 0.0000 (0.0000) loss 2.8849 (2.9979) grad_norm 1.7047 (1.6532) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][20/625] eta 0:04:38 lr 0.000612 wd 0.0500 time 0.4403 (0.4602) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 2.7096 (2.9731) grad_norm 1.3293 (1.7469) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][30/625] eta 0:04:33 lr 0.000612 wd 0.0500 time 0.4470 (0.4603) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 3.1956 (2.9342) grad_norm 1.5127 (1.7351) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][40/625] eta 0:04:26 lr 0.000612 wd 0.0500 time 0.4454 (0.4559) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 3.5436 (2.9746) grad_norm 2.4620 (1.7467) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][50/625] eta 0:04:22 lr 0.000612 wd 0.0500 time 0.4427 (0.4565) data time 0.0006 (0.0088) model time 0.0000 (0.0000) loss 3.6063 (2.9652) grad_norm 1.6468 (1.7758) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][60/625] eta 0:04:16 lr 0.000612 wd 0.0500 time 0.4382 (0.4538) data time 0.0008 (0.0075) model time 0.4373 (0.4393) loss 3.1398 (2.9557) 
grad_norm 3.5398 (1.9290) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][70/625] eta 0:04:10 lr 0.000612 wd 0.0500 time 0.4536 (0.4521) data time 0.0006 (0.0066) model time 0.4530 (0.4403) loss 2.9207 (2.9333) grad_norm 2.0432 (2.0448) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][80/625] eta 0:04:05 lr 0.000612 wd 0.0500 time 0.4432 (0.4509) data time 0.0007 (0.0059) model time 0.4425 (0.4406) loss 3.4250 (2.9402) grad_norm 1.3715 (2.0074) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][90/625] eta 0:04:00 lr 0.000612 wd 0.0500 time 0.4416 (0.4499) data time 0.0006 (0.0053) model time 0.4409 (0.4407) loss 3.0298 (2.9096) grad_norm 1.1567 (1.9733) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][100/625] eta 0:03:55 lr 0.000612 wd 0.0500 time 0.4391 (0.4491) data time 0.0007 (0.0049) model time 0.4384 (0.4407) loss 3.2725 (2.9136) grad_norm 2.3213 (1.9512) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][110/625] eta 0:03:50 lr 0.000611 wd 0.0500 time 0.4417 (0.4485) data time 0.0007 (0.0045) model time 0.4410 (0.4408) loss 3.3852 (2.9134) grad_norm 1.7185 (1.9119) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][120/625] eta 0:03:46 lr 0.000611 wd 0.0500 time 0.4393 (0.4479) data time 0.0010 (0.0042) model time 0.4384 (0.4408) loss 2.7959 (2.8875) grad_norm 5.5219 (1.9289) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:45:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][130/625] eta 0:03:42 lr 0.000611 wd 0.0500 time 0.4401 (0.4504) data 
time 0.0006 (0.0040) model time 0.4395 (0.4457) loss 2.9606 (2.9179) grad_norm 2.1129 (1.9279) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][140/625] eta 0:03:38 lr 0.000611 wd 0.0500 time 0.4370 (0.4499) data time 0.0007 (0.0037) model time 0.4364 (0.4453) loss 2.9082 (2.9190) grad_norm 1.2722 (1.9012) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][150/625] eta 0:03:33 lr 0.000611 wd 0.0500 time 0.4513 (0.4494) data time 0.0006 (0.0036) model time 0.4507 (0.4450) loss 2.6111 (2.9146) grad_norm 1.3921 (1.8979) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][160/625] eta 0:03:28 lr 0.000611 wd 0.0500 time 0.4426 (0.4490) data time 0.0008 (0.0034) model time 0.4418 (0.4447) loss 1.9559 (2.9103) grad_norm 1.5935 (1.8826) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][170/625] eta 0:03:24 lr 0.000611 wd 0.0500 time 0.4446 (0.4486) data time 0.0009 (0.0032) model time 0.4437 (0.4444) loss 2.7578 (2.9071) grad_norm 1.1953 (1.8703) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][180/625] eta 0:03:19 lr 0.000611 wd 0.0500 time 0.4419 (0.4482) data time 0.0008 (0.0031) model time 0.4411 (0.4441) loss 2.3531 (2.9097) grad_norm 1.6662 (1.8514) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][190/625] eta 0:03:14 lr 0.000611 wd 0.0500 time 0.4370 (0.4478) data time 0.0008 (0.0030) model time 0.4362 (0.4437) loss 3.1604 (2.9087) grad_norm 2.2960 (1.8771) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[159/300][200/625] eta 0:03:10 lr 0.000611 wd 0.0500 time 0.4451 (0.4474) data time 0.0008 (0.0029) model time 0.4442 (0.4435) loss 2.7525 (2.9039) grad_norm 1.7249 (1.8708) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][210/625] eta 0:03:05 lr 0.000610 wd 0.0500 time 0.4407 (0.4471) data time 0.0008 (0.0028) model time 0.4399 (0.4432) loss 3.4109 (2.9078) grad_norm 1.0123 (1.8515) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][220/625] eta 0:03:00 lr 0.000610 wd 0.0500 time 0.4410 (0.4468) data time 0.0007 (0.0027) model time 0.4402 (0.4430) loss 2.4377 (2.8956) grad_norm 2.0631 (1.8389) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][230/625] eta 0:02:56 lr 0.000610 wd 0.0500 time 0.4454 (0.4466) data time 0.0008 (0.0026) model time 0.4446 (0.4429) loss 3.2393 (2.9048) grad_norm 1.5093 (1.8413) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][240/625] eta 0:02:51 lr 0.000610 wd 0.0500 time 0.4485 (0.4464) data time 0.0007 (0.0026) model time 0.4478 (0.4428) loss 2.9265 (2.9078) grad_norm 1.3841 (1.8584) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][250/625] eta 0:02:47 lr 0.000610 wd 0.0500 time 0.4444 (0.4462) data time 0.0008 (0.0025) model time 0.4437 (0.4427) loss 1.8193 (2.8975) grad_norm 2.5616 (1.8697) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][260/625] eta 0:02:42 lr 0.000610 wd 0.0500 time 0.4422 (0.4461) data time 0.0008 (0.0024) model time 0.4413 (0.4427) loss 2.9042 (2.9012) grad_norm 1.7528 (1.8704) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 14:46:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][270/625] eta 0:02:38 lr 0.000610 wd 0.0500 time 0.4481 (0.4460) data time 0.0006 (0.0024) model time 0.4475 (0.4426) loss 3.9185 (2.9147) grad_norm 1.9127 (1.9127) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][280/625] eta 0:02:33 lr 0.000610 wd 0.0500 time 0.4407 (0.4459) data time 0.0008 (0.0023) model time 0.4399 (0.4426) loss 3.4663 (2.9135) grad_norm 1.4146 (1.9101) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][290/625] eta 0:02:29 lr 0.000610 wd 0.0500 time 0.4494 (0.4458) data time 0.0006 (0.0023) model time 0.4488 (0.4426) loss 3.1635 (2.9119) grad_norm 4.2061 (1.9095) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][300/625] eta 0:02:24 lr 0.000609 wd 0.0500 time 0.4442 (0.4457) data time 0.0006 (0.0022) model time 0.4436 (0.4426) loss 3.6150 (2.9183) grad_norm 1.3032 (1.9117) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][310/625] eta 0:02:20 lr 0.000609 wd 0.0500 time 0.4436 (0.4456) data time 0.0008 (0.0022) model time 0.4428 (0.4426) loss 3.0945 (2.9210) grad_norm 2.8497 (1.9229) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][320/625] eta 0:02:15 lr 0.000609 wd 0.0500 time 0.4479 (0.4456) data time 0.0009 (0.0022) model time 0.4470 (0.4426) loss 3.0909 (2.9198) grad_norm 1.6873 (1.9209) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][330/625] eta 0:02:11 lr 0.000609 wd 0.0500 time 0.4433 (0.4455) data time 0.0008 (0.0021) model time 0.4424 (0.4426) loss 2.4766 
(2.9084) grad_norm 1.3786 (1.9120) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][340/625] eta 0:02:06 lr 0.000609 wd 0.0500 time 0.4435 (0.4455) data time 0.0008 (0.0021) model time 0.4427 (0.4426) loss 2.7374 (2.9087) grad_norm 2.3983 (1.9128) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][350/625] eta 0:02:02 lr 0.000609 wd 0.0500 time 0.4439 (0.4454) data time 0.0006 (0.0021) model time 0.4432 (0.4427) loss 3.1445 (2.9054) grad_norm 1.6303 (1.9133) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][360/625] eta 0:01:58 lr 0.000609 wd 0.0500 time 0.4431 (0.4454) data time 0.0007 (0.0020) model time 0.4424 (0.4426) loss 2.8304 (2.9037) grad_norm 2.4193 (1.9063) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][370/625] eta 0:01:53 lr 0.000609 wd 0.0500 time 0.4423 (0.4458) data time 0.0009 (0.0020) model time 0.4414 (0.4432) loss 3.2218 (2.8984) grad_norm 1.8889 (1.9106) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][380/625] eta 0:01:49 lr 0.000609 wd 0.0500 time 0.6140 (0.4462) data time 0.0008 (0.0020) model time 0.6132 (0.4437) loss 3.2000 (2.9003) grad_norm 1.3921 (1.9017) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][390/625] eta 0:01:44 lr 0.000609 wd 0.0500 time 0.4424 (0.4461) data time 0.0008 (0.0019) model time 0.4416 (0.4437) loss 2.5134 (2.8964) grad_norm 1.4631 (1.8980) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][400/625] eta 0:01:40 lr 0.000608 wd 0.0500 time 0.4401 
(0.4461) data time 0.0007 (0.0019) model time 0.4394 (0.4436) loss 3.1014 (2.8990) grad_norm 1.9741 (1.9092) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][410/625] eta 0:01:35 lr 0.000608 wd 0.0500 time 0.4424 (0.4460) data time 0.0009 (0.0019) model time 0.4415 (0.4436) loss 2.8323 (2.8968) grad_norm 1.4423 (1.9175) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][420/625] eta 0:01:31 lr 0.000608 wd 0.0500 time 0.4416 (0.4460) data time 0.0007 (0.0019) model time 0.4409 (0.4436) loss 2.6307 (2.8940) grad_norm 1.8344 (1.9135) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][430/625] eta 0:01:26 lr 0.000608 wd 0.0500 time 0.4395 (0.4459) data time 0.0008 (0.0018) model time 0.4387 (0.4436) loss 3.0628 (2.8908) grad_norm 1.6132 (1.9127) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][440/625] eta 0:01:22 lr 0.000608 wd 0.0500 time 0.4449 (0.4459) data time 0.0009 (0.0018) model time 0.4441 (0.4436) loss 3.1186 (2.8920) grad_norm 1.4154 (1.9071) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][450/625] eta 0:01:18 lr 0.000608 wd 0.0500 time 0.4425 (0.4458) data time 0.0006 (0.0018) model time 0.4419 (0.4436) loss 2.7318 (2.8895) grad_norm 2.5523 (1.8999) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][460/625] eta 0:01:13 lr 0.000608 wd 0.0500 time 0.6167 (0.4467) data time 0.0007 (0.0018) model time 0.6161 (0.4446) loss 2.6148 (2.8911) grad_norm 2.6015 (1.8997) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [159/300][470/625] eta 0:01:09 lr 0.000608 wd 0.0500 time 0.4436 (0.4466) data time 0.0006 (0.0018) model time 0.4430 (0.4445) loss 2.5666 (2.8943) grad_norm 1.7581 (1.9000) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][480/625] eta 0:01:04 lr 0.000608 wd 0.0500 time 0.4394 (0.4465) data time 0.0007 (0.0017) model time 0.4387 (0.4444) loss 3.0335 (2.8974) grad_norm 1.2862 (1.8990) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][490/625] eta 0:01:00 lr 0.000607 wd 0.0500 time 0.4409 (0.4464) data time 0.0007 (0.0017) model time 0.4403 (0.4444) loss 2.9278 (2.8997) grad_norm 1.4164 (1.8926) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][500/625] eta 0:00:55 lr 0.000607 wd 0.0500 time 0.4463 (0.4464) data time 0.0009 (0.0017) model time 0.4454 (0.4443) loss 1.9704 (2.8973) grad_norm 2.0693 (1.8901) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][510/625] eta 0:00:51 lr 0.000607 wd 0.0500 time 0.4422 (0.4463) data time 0.0006 (0.0017) model time 0.4416 (0.4443) loss 4.0168 (2.8996) grad_norm 1.7676 (1.8924) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][520/625] eta 0:00:46 lr 0.000607 wd 0.0500 time 0.4435 (0.4463) data time 0.0009 (0.0017) model time 0.4426 (0.4443) loss 3.3894 (2.9017) grad_norm 1.3629 (1.8871) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][530/625] eta 0:00:42 lr 0.000607 wd 0.0500 time 0.4418 (0.4462) data time 0.0010 (0.0017) model time 0.4408 (0.4442) loss 3.1812 (2.9043) grad_norm 3.1274 (1.8883) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 14:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][540/625] eta 0:00:37 lr 0.000607 wd 0.0500 time 0.4404 (0.4461) data time 0.0007 (0.0016) model time 0.4398 (0.4442) loss 3.3449 (2.9056) grad_norm 2.5806 (1.8905) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][550/625] eta 0:00:33 lr 0.000607 wd 0.0500 time 0.4419 (0.4461) data time 0.0009 (0.0016) model time 0.4411 (0.4441) loss 3.4515 (2.9057) grad_norm 1.2342 (1.8903) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][560/625] eta 0:00:29 lr 0.000607 wd 0.0500 time 0.4434 (0.4462) data time 0.0006 (0.0016) model time 0.4428 (0.4443) loss 1.7925 (2.9062) grad_norm 1.8905 (1.8894) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][570/625] eta 0:00:24 lr 0.000607 wd 0.0500 time 0.3930 (0.4463) data time 0.0008 (0.0016) model time 0.3922 (0.4444) loss 3.0342 (2.9069) grad_norm 1.5889 (1.8851) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][580/625] eta 0:00:20 lr 0.000606 wd 0.0500 time 0.4408 (0.4463) data time 0.0009 (0.0016) model time 0.4399 (0.4444) loss 3.2040 (2.9081) grad_norm 1.5658 (1.8816) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][590/625] eta 0:00:15 lr 0.000606 wd 0.0500 time 0.4436 (0.4462) data time 0.0006 (0.0016) model time 0.4430 (0.4443) loss 2.5821 (2.9059) grad_norm 1.4125 (1.8833) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][600/625] eta 0:00:11 lr 0.000606 wd 0.0500 time 0.4442 (0.4462) data time 0.0009 (0.0016) model time 0.4433 (0.4443) loss 2.1530 
(2.8992) grad_norm 1.8114 (1.8780) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][610/625] eta 0:00:06 lr 0.000606 wd 0.0500 time 0.4385 (0.4461) data time 0.0004 (0.0016) model time 0.4381 (0.4443) loss 3.5178 (2.9016) grad_norm 1.2974 (1.8708) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [159/300][620/625] eta 0:00:02 lr 0.000606 wd 0.0500 time 0.4372 (0.4460) data time 0.0004 (0.0016) model time 0.4368 (0.4442) loss 2.2919 (2.8990) grad_norm 2.0762 (1.8668) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 159 training takes 0:04:38 [2024-08-10 14:49:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:49:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:49:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5234 (0.5234) Acc@1 89.014 (89.014) Acc@5 98.535 (98.535) Mem 16699MB [2024-08-10 14:49:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.151) Loss 0.8374 (0.6552) Acc@1 79.932 (85.747) Acc@5 95.410 (97.621) Mem 16699MB [2024-08-10 14:49:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9312 (0.7708) Acc@1 78.125 (82.731) Acc@5 94.678 (96.298) Mem 16699MB [2024-08-10 14:49:40 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.350 Acc@5 96.279 [2024-08-10 14:49:40 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 14:49:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.901 (0.901) Loss 0.4714 (0.4714) Acc@1 89.258 (89.258) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.191) Loss 0.7617 (0.5926) Acc@1 82.129 (86.985) Acc@5 96.484 (97.852) Mem 16699MB [2024-08-10 14:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.155) Loss 0.8623 (0.6962) Acc@1 78.320 (84.154) Acc@5 95.752 (96.826) Mem 16699MB [2024-08-10 14:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.841 Acc@5 96.845 [2024-08-10 14:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-10 14:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.84% [2024-08-10 14:49:44 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 14:49:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 14:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][0/625] eta 0:08:23 lr 0.000606 wd 0.0500 time 0.8049 (0.8049) data time 0.4169 (0.4169) model time 0.0000 (0.0000) loss 3.3777 (3.3777) grad_norm 1.3764 (1.3764) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][10/625] eta 0:04:53 lr 0.000606 wd 0.0500 time 0.4440 (0.4769) data time 0.0008 (0.0390) model time 0.0000 (0.0000) loss 2.9218 (2.8093) grad_norm 1.2114 (1.7283) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:49:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][20/625] eta 0:04:38 lr 0.000606 wd 0.0500 time 0.4389 (0.4605) data time 0.0007 (0.0209) model time 0.0000 (0.0000) loss 2.2778 (2.8684) grad_norm 1.3271 (1.7011) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][30/625] eta 0:04:31 lr 0.000606 wd 0.0500 time 0.4427 (0.4559) data time 0.0006 (0.0144) model time 0.0000 (0.0000) loss 2.9805 (2.8623) grad_norm 1.5334 (1.6525) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][40/625] eta 0:04:28 lr 0.000606 wd 0.0500 time 0.4400 (0.4583) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 3.0547 (2.8174) grad_norm 3.0298 (1.6764) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][50/625] eta 0:04:25 lr 0.000605 wd 0.0500 time 0.4422 (0.4617) data time 0.0008 (0.0093) model time 0.0000 (0.0000) loss 2.8949 (2.8173) grad_norm 1.3583 (1.6281) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][60/625] eta 0:04:20 lr 0.000605 wd 0.0500 time 0.4429 (0.4602) data time 0.0008 (0.0079) model time 0.4420 (0.4514) loss 3.6273 (2.7989) 
grad_norm 1.8359 (1.6449) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][70/625] eta 0:04:13 lr 0.000605 wd 0.0500 time 0.4407 (0.4576) data time 0.0009 (0.0069) model time 0.4398 (0.4462) loss 3.2669 (2.8413) grad_norm 2.0643 (1.6748) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][80/625] eta 0:04:08 lr 0.000605 wd 0.0500 time 0.4442 (0.4562) data time 0.0006 (0.0062) model time 0.4436 (0.4458) loss 3.0595 (2.8581) grad_norm 1.8514 (1.6966) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][90/625] eta 0:04:04 lr 0.000605 wd 0.0500 time 0.5722 (0.4562) data time 0.0009 (0.0057) model time 0.5713 (0.4481) loss 3.5550 (2.8537) grad_norm 1.6444 (1.7377) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][100/625] eta 0:03:58 lr 0.000605 wd 0.0500 time 0.4397 (0.4543) data time 0.0007 (0.0052) model time 0.4390 (0.4458) loss 3.3994 (2.8592) grad_norm 3.8433 (1.8222) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][110/625] eta 0:03:53 lr 0.000605 wd 0.0500 time 0.4414 (0.4536) data time 0.0009 (0.0048) model time 0.4406 (0.4458) loss 3.3595 (2.8662) grad_norm 1.7256 (2.0263) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][120/625] eta 0:03:48 lr 0.000605 wd 0.0500 time 0.4435 (0.4528) data time 0.0008 (0.0045) model time 0.4427 (0.4453) loss 2.6468 (2.8655) grad_norm 1.4816 (1.9818) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][130/625] eta 0:03:44 lr 0.000605 wd 0.0500 time 0.4383 (0.4534) data 
time 0.0006 (0.0043) model time 0.4377 (0.4470) loss 2.6250 (2.8787) grad_norm 1.9668 (2.0178) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][140/625] eta 0:03:39 lr 0.000605 wd 0.0500 time 0.4421 (0.4526) data time 0.0007 (0.0040) model time 0.4414 (0.4464) loss 2.4158 (2.8812) grad_norm 1.8737 (2.0234) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][150/625] eta 0:03:34 lr 0.000604 wd 0.0500 time 0.4428 (0.4520) data time 0.0007 (0.0038) model time 0.4421 (0.4460) loss 2.4553 (2.8892) grad_norm 2.1972 (2.0111) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:50:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][160/625] eta 0:03:29 lr 0.000604 wd 0.0500 time 0.4426 (0.4514) data time 0.0006 (0.0036) model time 0.4419 (0.4456) loss 2.9853 (2.8888) grad_norm 1.3387 (1.9872) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][170/625] eta 0:03:25 lr 0.000604 wd 0.0500 time 0.4461 (0.4510) data time 0.0007 (0.0035) model time 0.4454 (0.4454) loss 3.1930 (2.8886) grad_norm 0.9430 (1.9664) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][180/625] eta 0:03:20 lr 0.000604 wd 0.0500 time 0.4412 (0.4506) data time 0.0009 (0.0033) model time 0.4403 (0.4453) loss 3.5558 (2.8933) grad_norm 1.5693 (1.9709) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][190/625] eta 0:03:15 lr 0.000604 wd 0.0500 time 0.4399 (0.4502) data time 0.0010 (0.0032) model time 0.4389 (0.4451) loss 2.8185 (2.8960) grad_norm 2.2570 (1.9520) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[160/300][200/625] eta 0:03:11 lr 0.000604 wd 0.0500 time 0.4411 (0.4498) data time 0.0007 (0.0031) model time 0.4405 (0.4448) loss 3.6636 (2.9003) grad_norm 1.2693 (1.9320) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][210/625] eta 0:03:06 lr 0.000604 wd 0.0500 time 0.4441 (0.4495) data time 0.0008 (0.0030) model time 0.4432 (0.4446) loss 3.2435 (2.9007) grad_norm 1.2804 (1.9122) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][220/625] eta 0:03:01 lr 0.000604 wd 0.0500 time 0.4416 (0.4492) data time 0.0006 (0.0029) model time 0.4410 (0.4444) loss 3.4005 (2.9024) grad_norm 2.1990 (1.9204) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][230/625] eta 0:02:57 lr 0.000604 wd 0.0500 time 0.4453 (0.4497) data time 0.0006 (0.0028) model time 0.4447 (0.4453) loss 3.2397 (2.9038) grad_norm 2.6189 (1.9356) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 14:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][240/625] eta 0:02:53 lr 0.000603 wd 0.0500 time 0.4471 (0.4498) data time 0.0006 (0.0027) model time 0.4465 (0.4456) loss 2.6724 (2.9038) grad_norm 1.9172 (1.9333) loss_scale 512.0000 (264.4979) mem 16699MB [2024-08-10 14:51:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][250/625] eta 0:02:48 lr 0.000603 wd 0.0500 time 0.4422 (0.4495) data time 0.0009 (0.0027) model time 0.4413 (0.4454) loss 2.5020 (2.9047) grad_norm 1.6169 (1.9332) loss_scale 512.0000 (274.3586) mem 16699MB [2024-08-10 14:51:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][260/625] eta 0:02:43 lr 0.000603 wd 0.0500 time 0.4448 (0.4493) data time 0.0006 (0.0026) model time 0.4442 (0.4453) loss 3.4117 (2.9086) grad_norm 2.1871 (1.9495) loss_scale 512.0000 (283.4636) mem 16699MB 
[2024-08-10 14:51:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][270/625] eta 0:02:39 lr 0.000603 wd 0.0500 time 0.4467 (0.4490) data time 0.0008 (0.0025) model time 0.4460 (0.4451) loss 3.0482 (2.9067) grad_norm 1.7004 (1.9408) loss_scale 512.0000 (291.8967) mem 16699MB [2024-08-10 14:51:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][280/625] eta 0:02:34 lr 0.000603 wd 0.0500 time 0.4423 (0.4489) data time 0.0006 (0.0025) model time 0.4417 (0.4450) loss 3.3844 (2.9181) grad_norm 1.4631 (1.9563) loss_scale 512.0000 (299.7295) mem 16699MB [2024-08-10 14:51:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][290/625] eta 0:02:30 lr 0.000603 wd 0.0500 time 0.4519 (0.4487) data time 0.0009 (0.0024) model time 0.4510 (0.4449) loss 3.1610 (2.9152) grad_norm 1.2276 (1.9461) loss_scale 512.0000 (307.0241) mem 16699MB [2024-08-10 14:52:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][300/625] eta 0:02:25 lr 0.000603 wd 0.0500 time 0.4439 (0.4485) data time 0.0007 (0.0024) model time 0.4433 (0.4448) loss 1.9732 (2.9206) grad_norm 1.3403 (1.9355) loss_scale 512.0000 (313.8339) mem 16699MB [2024-08-10 14:52:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][310/625] eta 0:02:21 lr 0.000603 wd 0.0500 time 0.6674 (0.4491) data time 0.0009 (0.0023) model time 0.6665 (0.4457) loss 3.1128 (2.9228) grad_norm 1.8837 (1.9434) loss_scale 512.0000 (320.2058) mem 16699MB [2024-08-10 14:52:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][320/625] eta 0:02:16 lr 0.000603 wd 0.0500 time 0.4428 (0.4488) data time 0.0007 (0.0023) model time 0.4421 (0.4454) loss 3.5679 (2.9261) grad_norm 2.0004 (1.9334) loss_scale 512.0000 (326.1807) mem 16699MB [2024-08-10 14:52:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][330/625] eta 0:02:12 lr 0.000602 wd 0.0500 time 0.4382 (0.4487) data time 0.0007 (0.0023) model time 0.4374 (0.4453) loss 2.1044 
(2.9311) grad_norm 2.2529 (1.9352) loss_scale 512.0000 (331.7946) mem 16699MB [2024-08-10 14:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][340/625] eta 0:02:07 lr 0.000602 wd 0.0500 time 0.4396 (0.4485) data time 0.0011 (0.0022) model time 0.4385 (0.4452) loss 2.9483 (2.9338) grad_norm 1.3210 (1.9329) loss_scale 512.0000 (337.0792) mem 16699MB [2024-08-10 14:52:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][350/625] eta 0:02:03 lr 0.000602 wd 0.0500 time 0.4425 (0.4492) data time 0.0006 (0.0022) model time 0.4418 (0.4462) loss 1.7724 (2.9267) grad_norm 28.0024 (2.0016) loss_scale 512.0000 (342.0627) mem 16699MB [2024-08-10 14:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][360/625] eta 0:01:59 lr 0.000602 wd 0.0500 time 0.4344 (0.4491) data time 0.0009 (0.0021) model time 0.4335 (0.4460) loss 3.1860 (2.9282) grad_norm 1.3090 (2.0099) loss_scale 512.0000 (346.7701) mem 16699MB [2024-08-10 14:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][370/625] eta 0:01:54 lr 0.000602 wd 0.0500 time 0.4450 (0.4489) data time 0.0006 (0.0021) model time 0.4443 (0.4459) loss 3.0401 (2.9286) grad_norm 1.2694 (1.9992) loss_scale 512.0000 (351.2237) mem 16699MB [2024-08-10 14:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][380/625] eta 0:01:49 lr 0.000602 wd 0.0500 time 0.4411 (0.4488) data time 0.0008 (0.0021) model time 0.4403 (0.4458) loss 2.3219 (2.9269) grad_norm 1.5152 (1.9958) loss_scale 512.0000 (355.4436) mem 16699MB [2024-08-10 14:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][390/625] eta 0:01:45 lr 0.000602 wd 0.0500 time 0.4480 (0.4486) data time 0.0008 (0.0021) model time 0.4473 (0.4457) loss 3.2713 (2.9264) grad_norm 2.1331 (2.0093) loss_scale 512.0000 (359.4476) mem 16699MB [2024-08-10 14:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][400/625] eta 0:01:40 lr 0.000602 wd 0.0500 time 0.4434 
(0.4485) data time 0.0007 (0.0020) model time 0.4427 (0.4456) loss 3.5099 (2.9237) grad_norm 1.8200 (2.0078) loss_scale 512.0000 (363.2519) mem 16699MB [2024-08-10 14:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][410/625] eta 0:01:36 lr 0.000602 wd 0.0500 time 0.4401 (0.4483) data time 0.0009 (0.0020) model time 0.4392 (0.4454) loss 2.8148 (2.9172) grad_norm 2.4618 (2.0042) loss_scale 512.0000 (366.8710) mem 16699MB [2024-08-10 14:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][420/625] eta 0:01:31 lr 0.000602 wd 0.0500 time 0.4400 (0.4481) data time 0.0009 (0.0020) model time 0.4391 (0.4453) loss 3.6936 (2.9243) grad_norm 2.0989 (2.0049) loss_scale 512.0000 (370.3183) mem 16699MB [2024-08-10 14:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][430/625] eta 0:01:27 lr 0.000601 wd 0.0500 time 0.4386 (0.4480) data time 0.0007 (0.0019) model time 0.4379 (0.4452) loss 3.6095 (2.9233) grad_norm 1.6152 (2.0008) loss_scale 512.0000 (373.6056) mem 16699MB [2024-08-10 14:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][440/625] eta 0:01:22 lr 0.000601 wd 0.0500 time 0.4446 (0.4478) data time 0.0010 (0.0019) model time 0.4436 (0.4451) loss 3.0279 (2.9260) grad_norm 1.3035 (1.9914) loss_scale 512.0000 (376.7438) mem 16699MB [2024-08-10 14:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][450/625] eta 0:01:18 lr 0.000601 wd 0.0500 time 0.4424 (0.4483) data time 0.0009 (0.0019) model time 0.4415 (0.4456) loss 3.1718 (2.9233) grad_norm 1.6433 (2.0059) loss_scale 512.0000 (379.7428) mem 16699MB [2024-08-10 14:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][460/625] eta 0:01:14 lr 0.000601 wd 0.0500 time 0.4416 (0.4489) data time 0.0007 (0.0019) model time 0.4409 (0.4463) loss 2.1581 (2.9238) grad_norm 1.8148 (1.9998) loss_scale 512.0000 (382.6117) mem 16699MB [2024-08-10 14:53:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [160/300][470/625] eta 0:01:09 lr 0.000601 wd 0.0500 time 0.4447 (0.4488) data time 0.0006 (0.0019) model time 0.4441 (0.4462) loss 4.0120 (2.9221) grad_norm 1.1389 (1.9890) loss_scale 512.0000 (385.3588) mem 16699MB [2024-08-10 14:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][480/625] eta 0:01:05 lr 0.000601 wd 0.0500 time 0.4449 (0.4486) data time 0.0008 (0.0018) model time 0.4440 (0.4461) loss 2.7637 (2.9232) grad_norm 2.2974 (1.9850) loss_scale 512.0000 (387.9917) mem 16699MB [2024-08-10 14:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][490/625] eta 0:01:00 lr 0.000601 wd 0.0500 time 0.4544 (0.4486) data time 0.0008 (0.0018) model time 0.4536 (0.4461) loss 2.8356 (2.9249) grad_norm 1.4000 (1.9725) loss_scale 512.0000 (390.5173) mem 16699MB [2024-08-10 14:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][500/625] eta 0:00:56 lr 0.000601 wd 0.0500 time 0.4441 (0.4492) data time 0.0009 (0.0018) model time 0.4432 (0.4468) loss 3.1982 (2.9240) grad_norm 1.4929 (1.9595) loss_scale 512.0000 (392.9421) mem 16699MB [2024-08-10 14:53:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][510/625] eta 0:00:51 lr 0.000601 wd 0.0500 time 0.4411 (0.4491) data time 0.0007 (0.0018) model time 0.4404 (0.4467) loss 2.8350 (2.9268) grad_norm 7.7027 (1.9643) loss_scale 512.0000 (395.2720) mem 16699MB [2024-08-10 14:53:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][520/625] eta 0:00:47 lr 0.000600 wd 0.0500 time 0.4441 (0.4490) data time 0.0009 (0.0018) model time 0.4432 (0.4467) loss 2.9793 (2.9300) grad_norm 2.0533 (1.9690) loss_scale 512.0000 (397.5125) mem 16699MB [2024-08-10 14:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][530/625] eta 0:00:42 lr 0.000600 wd 0.0500 time 0.4441 (0.4489) data time 0.0008 (0.0018) model time 0.4433 (0.4466) loss 2.9925 (2.9329) grad_norm 2.0812 (1.9774) loss_scale 512.0000 (399.6685) mem 16699MB 
[2024-08-10 14:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][540/625] eta 0:00:38 lr 0.000600 wd 0.0500 time 0.4422 (0.4488) data time 0.0007 (0.0017) model time 0.4415 (0.4466) loss 3.0990 (2.9308) grad_norm 1.5148 (1.9817) loss_scale 512.0000 (401.7449) mem 16699MB [2024-08-10 14:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][550/625] eta 0:00:33 lr 0.000600 wd 0.0500 time 0.4422 (0.4488) data time 0.0008 (0.0017) model time 0.4414 (0.4465) loss 3.5106 (2.9291) grad_norm 2.1770 (1.9745) loss_scale 512.0000 (403.7459) mem 16699MB [2024-08-10 14:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][560/625] eta 0:00:29 lr 0.000600 wd 0.0500 time 0.4436 (0.4487) data time 0.0008 (0.0017) model time 0.4427 (0.4464) loss 2.9494 (2.9309) grad_norm 2.1265 (1.9742) loss_scale 512.0000 (405.6756) mem 16699MB [2024-08-10 14:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][570/625] eta 0:00:24 lr 0.000600 wd 0.0500 time 0.4435 (0.4486) data time 0.0008 (0.0017) model time 0.4426 (0.4463) loss 3.3235 (2.9334) grad_norm 1.9014 (1.9752) loss_scale 512.0000 (407.5377) mem 16699MB [2024-08-10 14:54:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][580/625] eta 0:00:20 lr 0.000600 wd 0.0500 time 0.4414 (0.4485) data time 0.0009 (0.0017) model time 0.4405 (0.4463) loss 3.0953 (2.9318) grad_norm 1.6165 (1.9687) loss_scale 512.0000 (409.3356) mem 16699MB [2024-08-10 14:54:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][590/625] eta 0:00:15 lr 0.000600 wd 0.0500 time 0.4446 (0.4484) data time 0.0006 (0.0017) model time 0.4440 (0.4462) loss 3.2724 (2.9300) grad_norm 1.6905 (1.9662) loss_scale 512.0000 (411.0728) mem 16699MB [2024-08-10 14:54:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][600/625] eta 0:00:11 lr 0.000600 wd 0.0500 time 0.4428 (0.4486) data time 0.0006 (0.0017) model time 0.4421 (0.4465) loss 2.1575 
(2.9255) grad_norm 2.8810 (1.9704) loss_scale 512.0000 (412.7521) mem 16699MB [2024-08-10 14:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][610/625] eta 0:00:06 lr 0.000599 wd 0.0500 time 0.4431 (0.4488) data time 0.0004 (0.0016) model time 0.4426 (0.4467) loss 3.4246 (2.9248) grad_norm 1.5926 (1.9681) loss_scale 512.0000 (414.3764) mem 16699MB [2024-08-10 14:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [160/300][620/625] eta 0:00:02 lr 0.000599 wd 0.0500 time 0.4391 (0.4487) data time 0.0009 (0.0016) model time 0.4382 (0.4466) loss 3.2693 (2.9262) grad_norm 1.5593 (1.9609) loss_scale 512.0000 (415.9485) mem 16699MB [2024-08-10 14:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 160 training takes 0:04:40 [2024-08-10 14:54:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:54:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 14:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.499 (0.499) Loss 0.5488 (0.5488) Acc@1 88.281 (88.281) Acc@5 98.438 (98.438) Mem 16699MB [2024-08-10 14:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.155) Loss 0.8853 (0.6705) Acc@1 77.783 (85.329) Acc@5 95.361 (97.354) Mem 16699MB [2024-08-10 14:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9458 (0.7873) Acc@1 76.465 (82.373) Acc@5 94.873 (96.154) Mem 16699MB [2024-08-10 14:54:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.068 Acc@5 96.137 [2024-08-10 14:54:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.1% [2024-08-10 14:54:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.894 (0.894) Loss 0.4722 (0.4722) Acc@1 89.258 (89.258) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 14:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.191) Loss 0.7622 (0.5925) Acc@1 81.934 (86.981) Acc@5 96.436 (97.874) Mem 16699MB [2024-08-10 14:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.155) Loss 0.8608 (0.6961) Acc@1 78.418 (84.140) Acc@5 95.752 (96.845) Mem 16699MB [2024-08-10 14:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.835 Acc@5 96.855 [2024-08-10 14:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-10 14:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][0/625] eta 0:13:28 lr 0.000599 wd 0.0500 time 1.2929 (1.2929) data time 0.4661 (0.4661) model time 0.0000 (0.0000) loss 3.0837 (3.0837) grad_norm 1.2520 (1.2520) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:54:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][10/625] eta 0:05:19 lr 0.000599 wd 0.0500 time 0.4459 (0.5197) data 
time 0.0005 (0.0432) model time 0.0000 (0.0000) loss 1.9003 (2.8817) grad_norm 1.9888 (1.4682) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][20/625] eta 0:04:52 lr 0.000599 wd 0.0500 time 0.4423 (0.4829) data time 0.0006 (0.0230) model time 0.0000 (0.0000) loss 2.7570 (2.9548) grad_norm 1.3864 (1.5528) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:54:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][30/625] eta 0:04:39 lr 0.000599 wd 0.0500 time 0.4508 (0.4703) data time 0.0007 (0.0159) model time 0.0000 (0.0000) loss 2.7206 (3.0086) grad_norm 1.4888 (1.6133) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:54:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][40/625] eta 0:04:31 lr 0.000599 wd 0.0500 time 0.4443 (0.4637) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 2.0068 (2.9555) grad_norm 1.8639 (1.5858) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:54:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][50/625] eta 0:04:24 lr 0.000599 wd 0.0500 time 0.4424 (0.4600) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 2.5009 (2.9155) grad_norm 1.6760 (1.6755) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][60/625] eta 0:04:18 lr 0.000599 wd 0.0500 time 0.4486 (0.4574) data time 0.0008 (0.0085) model time 0.4478 (0.4436) loss 3.1424 (2.8619) grad_norm 1.9552 (1.7464) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][70/625] eta 0:04:13 lr 0.000599 wd 0.0500 time 0.4402 (0.4562) data time 0.0009 (0.0075) model time 0.4393 (0.4457) loss 2.2166 (2.8752) grad_norm 2.0653 (1.7652) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[161/300][80/625] eta 0:04:07 lr 0.000598 wd 0.0500 time 0.4403 (0.4544) data time 0.0006 (0.0066) model time 0.4397 (0.4440) loss 3.3136 (2.8999) grad_norm 1.9955 (1.7854) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][90/625] eta 0:04:03 lr 0.000598 wd 0.0500 time 0.4411 (0.4549) data time 0.0008 (0.0060) model time 0.4403 (0.4475) loss 2.9515 (2.9124) grad_norm 4.0042 (1.8371) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][100/625] eta 0:03:58 lr 0.000598 wd 0.0500 time 0.4429 (0.4536) data time 0.0008 (0.0055) model time 0.4422 (0.4463) loss 3.2140 (2.9009) grad_norm 1.7820 (1.8480) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][110/625] eta 0:03:53 lr 0.000598 wd 0.0500 time 0.4464 (0.4528) data time 0.0007 (0.0051) model time 0.4457 (0.4459) loss 2.3644 (2.9043) grad_norm 1.8264 (1.8346) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][120/625] eta 0:03:48 lr 0.000598 wd 0.0500 time 0.4421 (0.4521) data time 0.0008 (0.0047) model time 0.4413 (0.4455) loss 1.8540 (2.9035) grad_norm 1.5085 (1.8334) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][130/625] eta 0:03:43 lr 0.000598 wd 0.0500 time 0.4429 (0.4515) data time 0.0008 (0.0044) model time 0.4421 (0.4452) loss 2.9199 (2.9157) grad_norm 1.7564 (1.8736) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][140/625] eta 0:03:39 lr 0.000598 wd 0.0500 time 0.4428 (0.4524) data time 0.0008 (0.0042) model time 0.4420 (0.4472) loss 2.9179 (2.9148) grad_norm 1.4563 (1.8632) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][150/625] eta 0:03:34 lr 0.000598 wd 0.0500 time 0.4413 (0.4517) data time 0.0007 (0.0040) model time 0.4405 (0.4466) loss 2.2651 (2.9027) grad_norm 1.4106 (1.8565) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][160/625] eta 0:03:29 lr 0.000598 wd 0.0500 time 0.4409 (0.4511) data time 0.0009 (0.0038) model time 0.4400 (0.4461) loss 3.2754 (2.9119) grad_norm 1.6355 (1.8387) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][170/625] eta 0:03:25 lr 0.000598 wd 0.0500 time 0.4424 (0.4506) data time 0.0006 (0.0036) model time 0.4418 (0.4457) loss 2.7165 (2.9064) grad_norm 1.4439 (1.8313) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][180/625] eta 0:03:20 lr 0.000597 wd 0.0500 time 0.4425 (0.4502) data time 0.0010 (0.0035) model time 0.4416 (0.4455) loss 3.1545 (2.9126) grad_norm 1.3540 (1.8130) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][190/625] eta 0:03:15 lr 0.000597 wd 0.0500 time 0.4448 (0.4499) data time 0.0006 (0.0033) model time 0.4442 (0.4453) loss 3.7621 (2.9246) grad_norm 5.7158 (1.8361) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][200/625] eta 0:03:11 lr 0.000597 wd 0.0500 time 0.4426 (0.4495) data time 0.0006 (0.0032) model time 0.4420 (0.4450) loss 2.5912 (2.9303) grad_norm 1.9097 (1.8397) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][210/625] eta 0:03:06 lr 0.000597 wd 0.0500 time 0.4414 (0.4500) data time 0.0006 (0.0031) model time 0.4408 (0.4459) loss 3.3197 
(2.9332) grad_norm 1.3209 (1.8428) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][220/625] eta 0:03:02 lr 0.000597 wd 0.0500 time 0.4388 (0.4496) data time 0.0006 (0.0030) model time 0.4382 (0.4456) loss 2.2769 (2.9263) grad_norm 1.5139 (1.8453) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][230/625] eta 0:02:57 lr 0.000597 wd 0.0500 time 0.4399 (0.4492) data time 0.0008 (0.0029) model time 0.4391 (0.4453) loss 3.1228 (2.9189) grad_norm 1.8934 (1.8302) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][240/625] eta 0:02:52 lr 0.000597 wd 0.0500 time 0.4382 (0.4489) data time 0.0006 (0.0028) model time 0.4375 (0.4451) loss 3.1087 (2.9192) grad_norm 2.6513 (1.8306) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][250/625] eta 0:02:48 lr 0.000597 wd 0.0500 time 0.4421 (0.4486) data time 0.0009 (0.0027) model time 0.4412 (0.4449) loss 2.1300 (2.9114) grad_norm 1.5148 (1.8404) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][260/625] eta 0:02:43 lr 0.000597 wd 0.0500 time 0.4406 (0.4484) data time 0.0006 (0.0027) model time 0.4400 (0.4447) loss 3.5451 (2.9087) grad_norm 2.5170 (1.8804) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][270/625] eta 0:02:39 lr 0.000596 wd 0.0500 time 0.4406 (0.4482) data time 0.0010 (0.0026) model time 0.4396 (0.4445) loss 3.0726 (2.9111) grad_norm 1.6167 (1.8746) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][280/625] eta 0:02:34 lr 0.000596 wd 0.0500 time 0.4448 
(0.4480) data time 0.0006 (0.0025) model time 0.4442 (0.4445) loss 2.8129 (2.9114) grad_norm 1.4253 (1.8592) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][290/625] eta 0:02:30 lr 0.000596 wd 0.0500 time 0.4403 (0.4484) data time 0.0009 (0.0025) model time 0.4394 (0.4451) loss 1.9984 (2.8997) grad_norm 1.6083 (1.8640) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][300/625] eta 0:02:25 lr 0.000596 wd 0.0500 time 0.4449 (0.4481) data time 0.0008 (0.0024) model time 0.4442 (0.4448) loss 3.1351 (2.8964) grad_norm 1.4669 (1.8638) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][310/625] eta 0:02:21 lr 0.000596 wd 0.0500 time 0.4424 (0.4479) data time 0.0007 (0.0024) model time 0.4418 (0.4447) loss 3.0919 (2.8947) grad_norm 2.1837 (1.8690) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:56:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][320/625] eta 0:02:16 lr 0.000596 wd 0.0500 time 0.4420 (0.4478) data time 0.0006 (0.0023) model time 0.4414 (0.4446) loss 3.1969 (2.8946) grad_norm 1.7484 (1.8756) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][330/625] eta 0:02:12 lr 0.000596 wd 0.0500 time 0.4476 (0.4476) data time 0.0009 (0.0023) model time 0.4467 (0.4445) loss 3.1442 (2.8958) grad_norm 1.1165 (1.8608) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][340/625] eta 0:02:07 lr 0.000596 wd 0.0500 time 0.4413 (0.4475) data time 0.0006 (0.0023) model time 0.4407 (0.4444) loss 2.1205 (2.8966) grad_norm 2.5417 (1.8537) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [161/300][350/625] eta 0:02:03 lr 0.000596 wd 0.0500 time 0.4423 (0.4473) data time 0.0008 (0.0022) model time 0.4415 (0.4443) loss 3.0152 (2.9018) grad_norm 1.8576 (1.8520) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][360/625] eta 0:01:58 lr 0.000595 wd 0.0500 time 0.4421 (0.4477) data time 0.0008 (0.0022) model time 0.4413 (0.4447) loss 3.1780 (2.9125) grad_norm 1.3889 (1.8500) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][370/625] eta 0:01:54 lr 0.000595 wd 0.0500 time 0.4420 (0.4475) data time 0.0009 (0.0021) model time 0.4411 (0.4446) loss 2.7442 (2.9103) grad_norm 1.6736 (1.8440) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][380/625] eta 0:01:49 lr 0.000595 wd 0.0500 time 0.4452 (0.4474) data time 0.0006 (0.0021) model time 0.4446 (0.4446) loss 2.4289 (2.8990) grad_norm 1.6216 (1.8427) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][390/625] eta 0:01:45 lr 0.000595 wd 0.0500 time 0.4421 (0.4473) data time 0.0008 (0.0021) model time 0.4413 (0.4445) loss 1.9374 (2.8881) grad_norm 2.0156 (1.8389) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][400/625] eta 0:01:40 lr 0.000595 wd 0.0500 time 0.4422 (0.4472) data time 0.0008 (0.0020) model time 0.4415 (0.4445) loss 2.8163 (2.8892) grad_norm 1.3130 (1.8364) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][410/625] eta 0:01:36 lr 0.000595 wd 0.0500 time 0.4441 (0.4472) data time 0.0009 (0.0020) model time 0.4432 (0.4445) loss 2.8591 (2.8921) grad_norm 1.4073 (1.8269) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 14:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][420/625] eta 0:01:31 lr 0.000595 wd 0.0500 time 0.4445 (0.4471) data time 0.0008 (0.0020) model time 0.4436 (0.4444) loss 2.5031 (2.8984) grad_norm 7.6067 (1.8403) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][430/625] eta 0:01:27 lr 0.000595 wd 0.0500 time 0.4408 (0.4474) data time 0.0006 (0.0020) model time 0.4402 (0.4449) loss 2.2861 (2.8990) grad_norm 1.7230 (1.8374) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][440/625] eta 0:01:22 lr 0.000595 wd 0.0500 time 0.4435 (0.4477) data time 0.0006 (0.0019) model time 0.4429 (0.4452) loss 2.5247 (2.8971) grad_norm 2.1440 (1.8339) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:57:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][450/625] eta 0:01:18 lr 0.000595 wd 0.0500 time 0.4417 (0.4476) data time 0.0009 (0.0019) model time 0.4408 (0.4451) loss 2.7175 (2.9003) grad_norm 2.1280 (1.8353) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][460/625] eta 0:01:13 lr 0.000594 wd 0.0500 time 0.4411 (0.4475) data time 0.0006 (0.0019) model time 0.4405 (0.4450) loss 2.7504 (2.9006) grad_norm 1.6845 (1.8562) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][470/625] eta 0:01:09 lr 0.000594 wd 0.0500 time 0.4418 (0.4479) data time 0.0009 (0.0019) model time 0.4409 (0.4455) loss 3.1550 (2.8996) grad_norm 2.0101 (1.8505) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][480/625] eta 0:01:04 lr 0.000594 wd 0.0500 time 0.4526 (0.4478) data time 0.0006 (0.0019) model time 0.4520 (0.4454) loss 3.3801 
(2.9018) grad_norm 1.6574 (1.8561) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][490/625] eta 0:01:00 lr 0.000594 wd 0.0500 time 0.4339 (0.4477) data time 0.0006 (0.0018) model time 0.4332 (0.4454) loss 3.4438 (2.9056) grad_norm 1.8679 (1.8649) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][500/625] eta 0:00:55 lr 0.000594 wd 0.0500 time 0.4415 (0.4476) data time 0.0009 (0.0018) model time 0.4407 (0.4453) loss 2.3104 (2.9006) grad_norm 1.6955 (1.8620) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][510/625] eta 0:00:51 lr 0.000594 wd 0.0500 time 0.4400 (0.4475) data time 0.0008 (0.0018) model time 0.4392 (0.4452) loss 2.6910 (2.8977) grad_norm 4.2315 (1.8680) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][520/625] eta 0:00:46 lr 0.000594 wd 0.0500 time 0.4412 (0.4474) data time 0.0007 (0.0018) model time 0.4405 (0.4451) loss 2.9039 (2.8979) grad_norm 1.4134 (1.8715) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][530/625] eta 0:00:42 lr 0.000594 wd 0.0500 time 0.4421 (0.4473) data time 0.0007 (0.0018) model time 0.4415 (0.4450) loss 2.0165 (2.8981) grad_norm 2.8385 (1.8834) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][540/625] eta 0:00:38 lr 0.000594 wd 0.0500 time 0.4400 (0.4472) data time 0.0008 (0.0018) model time 0.4391 (0.4449) loss 2.8826 (2.8989) grad_norm 1.8017 (1.8852) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][550/625] eta 0:00:33 lr 0.000593 wd 0.0500 time 0.4445 
(0.4471) data time 0.0007 (0.0018) model time 0.4438 (0.4449) loss 2.8825 (2.8954) grad_norm 1.6266 (1.8882) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][560/625] eta 0:00:29 lr 0.000593 wd 0.0500 time 0.4428 (0.4470) data time 0.0008 (0.0017) model time 0.4420 (0.4448) loss 3.3391 (2.8977) grad_norm 2.1180 (1.8854) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][570/625] eta 0:00:24 lr 0.000593 wd 0.0500 time 0.4433 (0.4470) data time 0.0008 (0.0017) model time 0.4425 (0.4448) loss 2.7097 (2.8979) grad_norm 1.9524 (1.8834) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][580/625] eta 0:00:20 lr 0.000593 wd 0.0500 time 0.4425 (0.4469) data time 0.0008 (0.0017) model time 0.4417 (0.4447) loss 2.1314 (2.8937) grad_norm 1.6731 (1.8802) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:58:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][590/625] eta 0:00:15 lr 0.000593 wd 0.0500 time 0.4421 (0.4468) data time 0.0009 (0.0017) model time 0.4413 (0.4447) loss 3.3455 (2.8930) grad_norm 5.3236 (1.8825) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][600/625] eta 0:00:11 lr 0.000593 wd 0.0500 time 0.4417 (0.4467) data time 0.0006 (0.0017) model time 0.4410 (0.4446) loss 3.1052 (2.8918) grad_norm 1.8254 (1.8790) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [161/300][610/625] eta 0:00:06 lr 0.000593 wd 0.0500 time 0.4399 (0.4467) data time 0.0005 (0.0017) model time 0.4394 (0.4445) loss 2.8945 (2.8960) grad_norm 1.6834 (1.8750) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [161/300][620/625] eta 0:00:02 lr 0.000593 wd 0.0500 time 0.4395 (0.4468) data time 0.0004 (0.0017) model time 0.4392 (0.4447) loss 2.4373 (2.8928) grad_norm 1.1979 (1.8742) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 161 training takes 0:04:39 [2024-08-10 14:59:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 14:59:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 14:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5249 (0.5249) Acc@1 88.330 (88.330) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-10 14:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8501 (0.6437) Acc@1 79.150 (85.711) Acc@5 95.459 (97.510) Mem 16699MB [2024-08-10 14:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.135) Loss 0.9180 (0.7598) Acc@1 78.027 (82.650) Acc@5 94.971 (96.270) Mem 16699MB [2024-08-10 14:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.406 Acc@5 96.231 [2024-08-10 14:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 14:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.830 (0.830) Loss 0.4722 (0.4722) Acc@1 89.209 (89.209) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-10 14:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.187) Loss 0.7603 (0.5923) Acc@1 81.787 (86.954) Acc@5 96.484 (97.892) Mem 16699MB [2024-08-10 14:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.8579 (0.6961) Acc@1 78.418 (84.154) Acc@5 95.752 (96.852) Mem 16699MB [2024-08-10 14:59:22 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 83.847 Acc@5 96.861 [2024-08-10 14:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-10 14:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.85% [2024-08-10 14:59:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 14:59:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 14:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][0/625] eta 0:08:11 lr 0.000593 wd 0.0500 time 0.7857 (0.7857) data time 0.3900 (0.3900) model time 0.0000 (0.0000) loss 3.2091 (3.2091) grad_norm 1.6832 (1.6832) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][10/625] eta 0:04:51 lr 0.000593 wd 0.0500 time 0.4487 (0.4741) data time 0.0006 (0.0363) model time 0.0000 (0.0000) loss 2.3610 (2.4864) grad_norm 5.4391 (2.1229) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][20/625] eta 0:04:38 lr 0.000592 wd 0.0500 time 0.4405 (0.4595) data time 0.0008 (0.0194) model time 0.0000 (0.0000) loss 3.1680 (2.7174) grad_norm 1.4100 (1.9253) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][30/625] eta 0:04:36 lr 0.000592 wd 0.0500 time 0.4393 (0.4646) data time 0.0008 (0.0135) model time 0.0000 (0.0000) loss 3.1475 (2.8129) grad_norm 1.2995 (1.8431) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][40/625] eta 0:04:28 lr 0.000592 wd 0.0500 time 0.4398 (0.4590) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 2.8093 (2.8778) 
grad_norm 1.1743 (1.7417) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][50/625] eta 0:04:21 lr 0.000592 wd 0.0500 time 0.4367 (0.4556) data time 0.0009 (0.0085) model time 0.0000 (0.0000) loss 3.2972 (2.8610) grad_norm 2.1787 (1.7720) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][60/625] eta 0:04:18 lr 0.000592 wd 0.0500 time 0.4416 (0.4575) data time 0.0010 (0.0073) model time 0.4405 (0.4659) loss 2.5427 (2.8465) grad_norm 1.7918 (1.7969) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 14:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][70/625] eta 0:04:12 lr 0.000592 wd 0.0500 time 0.4423 (0.4553) data time 0.0010 (0.0064) model time 0.4413 (0.4536) loss 2.7052 (2.8650) grad_norm 1.6105 (1.7788) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][80/625] eta 0:04:07 lr 0.000592 wd 0.0500 time 0.4403 (0.4539) data time 0.0008 (0.0057) model time 0.4394 (0.4501) loss 2.9715 (2.8802) grad_norm 1.6877 (1.8058) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][90/625] eta 0:04:02 lr 0.000592 wd 0.0500 time 0.4389 (0.4526) data time 0.0006 (0.0052) model time 0.4382 (0.4479) loss 2.9011 (2.8643) grad_norm 1.8809 (1.8247) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][100/625] eta 0:03:57 lr 0.000592 wd 0.0500 time 0.4407 (0.4515) data time 0.0007 (0.0048) model time 0.4400 (0.4463) loss 3.6585 (2.8829) grad_norm 1.8133 (1.8327) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][110/625] eta 0:03:52 lr 0.000591 wd 0.0500 time 0.4411 (0.4505) data 
time 0.0006 (0.0044) model time 0.4404 (0.4453) loss 3.7309 (2.8969) grad_norm 2.6283 (1.8769) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][120/625] eta 0:03:47 lr 0.000591 wd 0.0500 time 0.4406 (0.4498) data time 0.0008 (0.0041) model time 0.4398 (0.4446) loss 2.4426 (2.9095) grad_norm 1.3843 (1.8818) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][130/625] eta 0:03:42 lr 0.000591 wd 0.0500 time 0.4431 (0.4493) data time 0.0006 (0.0039) model time 0.4424 (0.4443) loss 3.5866 (2.9111) grad_norm 1.4051 (1.8564) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][140/625] eta 0:03:38 lr 0.000591 wd 0.0500 time 0.4468 (0.4503) data time 0.0007 (0.0037) model time 0.4461 (0.4464) loss 3.5457 (2.9167) grad_norm 1.2655 (1.8377) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][150/625] eta 0:03:33 lr 0.000591 wd 0.0500 time 0.4475 (0.4499) data time 0.0008 (0.0035) model time 0.4467 (0.4461) loss 2.7354 (2.9135) grad_norm 1.5316 (1.8389) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][160/625] eta 0:03:29 lr 0.000591 wd 0.0500 time 0.4407 (0.4496) data time 0.0008 (0.0033) model time 0.4398 (0.4459) loss 2.9431 (2.9131) grad_norm 2.3997 (1.8344) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][170/625] eta 0:03:24 lr 0.000591 wd 0.0500 time 0.4403 (0.4493) data time 0.0007 (0.0032) model time 0.4396 (0.4456) loss 2.1065 (2.8985) grad_norm 1.7078 (1.8326) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[162/300][180/625] eta 0:03:19 lr 0.000591 wd 0.0500 time 0.4411 (0.4489) data time 0.0009 (0.0031) model time 0.4402 (0.4454) loss 3.1270 (2.8939) grad_norm 1.5176 (1.8264) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][190/625] eta 0:03:15 lr 0.000591 wd 0.0500 time 0.4429 (0.4486) data time 0.0009 (0.0030) model time 0.4421 (0.4451) loss 3.0441 (2.8854) grad_norm 1.8070 (1.8257) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][200/625] eta 0:03:10 lr 0.000591 wd 0.0500 time 0.4434 (0.4483) data time 0.0008 (0.0029) model time 0.4425 (0.4449) loss 3.2046 (2.8883) grad_norm 1.3113 (1.8156) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:00:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][210/625] eta 0:03:05 lr 0.000590 wd 0.0500 time 0.4433 (0.4480) data time 0.0007 (0.0028) model time 0.4427 (0.4447) loss 3.1922 (2.8994) grad_norm 2.4752 (1.8030) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][220/625] eta 0:03:01 lr 0.000590 wd 0.0500 time 0.4423 (0.4479) data time 0.0007 (0.0027) model time 0.4415 (0.4446) loss 2.3304 (2.8982) grad_norm 2.4914 (1.8109) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][230/625] eta 0:02:56 lr 0.000590 wd 0.0500 time 0.4425 (0.4477) data time 0.0008 (0.0026) model time 0.4416 (0.4445) loss 2.6812 (2.8998) grad_norm 1.7253 (1.8065) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][240/625] eta 0:02:52 lr 0.000590 wd 0.0500 time 0.4393 (0.4475) data time 0.0007 (0.0025) model time 0.4387 (0.4443) loss 3.1412 (2.8962) grad_norm 1.7924 (1.7996) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:01:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][250/625] eta 0:02:48 lr 0.000590 wd 0.0500 time 0.4401 (0.4480) data time 0.0007 (0.0025) model time 0.4394 (0.4452) loss 2.2262 (2.8970) grad_norm 1.4605 (1.8024) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][260/625] eta 0:02:43 lr 0.000590 wd 0.0500 time 0.4431 (0.4478) data time 0.0007 (0.0024) model time 0.4424 (0.4450) loss 2.3129 (2.8942) grad_norm 1.7406 (1.7915) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][270/625] eta 0:02:38 lr 0.000590 wd 0.0500 time 0.4473 (0.4476) data time 0.0006 (0.0024) model time 0.4467 (0.4448) loss 1.9667 (2.8995) grad_norm 3.1325 (1.8023) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][280/625] eta 0:02:34 lr 0.000590 wd 0.0500 time 0.4392 (0.4475) data time 0.0009 (0.0023) model time 0.4383 (0.4447) loss 3.3082 (2.8947) grad_norm 1.3772 (1.8084) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][290/625] eta 0:02:29 lr 0.000590 wd 0.0500 time 0.4440 (0.4473) data time 0.0007 (0.0023) model time 0.4434 (0.4446) loss 2.3390 (2.8905) grad_norm 1.0108 (1.8109) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][300/625] eta 0:02:25 lr 0.000589 wd 0.0500 time 0.4421 (0.4471) data time 0.0006 (0.0022) model time 0.4415 (0.4444) loss 1.9412 (2.8906) grad_norm 2.1502 (1.8176) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][310/625] eta 0:02:20 lr 0.000589 wd 0.0500 time 0.4406 (0.4469) data time 0.0009 (0.0022) model time 0.4396 (0.4443) loss 2.6451 
(2.8856) grad_norm 1.3863 (1.8088) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][320/625] eta 0:02:16 lr 0.000589 wd 0.0500 time 0.4410 (0.4467) data time 0.0008 (0.0021) model time 0.4402 (0.4441) loss 2.9533 (2.8736) grad_norm 1.7121 (1.8077) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][330/625] eta 0:02:11 lr 0.000589 wd 0.0500 time 0.4428 (0.4466) data time 0.0007 (0.0021) model time 0.4421 (0.4440) loss 1.7663 (2.8725) grad_norm 1.1851 (1.8062) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:01:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][340/625] eta 0:02:07 lr 0.000589 wd 0.0500 time 0.4413 (0.4464) data time 0.0009 (0.0021) model time 0.4404 (0.4439) loss 2.6551 (2.8749) grad_norm 2.0505 (1.8008) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][350/625] eta 0:02:02 lr 0.000589 wd 0.0500 time 0.4424 (0.4463) data time 0.0008 (0.0020) model time 0.4416 (0.4438) loss 3.0379 (2.8811) grad_norm 1.3732 (1.7957) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][360/625] eta 0:01:58 lr 0.000589 wd 0.0500 time 0.4413 (0.4468) data time 0.0009 (0.0020) model time 0.4405 (0.4444) loss 2.9857 (2.8873) grad_norm 1.9733 (1.7978) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][370/625] eta 0:01:54 lr 0.000589 wd 0.0500 time 0.4431 (0.4471) data time 0.0007 (0.0020) model time 0.4424 (0.4448) loss 3.2364 (2.8852) grad_norm 2.6492 (1.8536) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][380/625] eta 0:01:49 lr 0.000589 wd 0.0500 time 0.4441 
(0.4469) data time 0.0008 (0.0019) model time 0.4433 (0.4447) loss 2.9704 (2.8888) grad_norm 3.2145 (1.8676) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][390/625] eta 0:01:44 lr 0.000589 wd 0.0500 time 0.4402 (0.4468) data time 0.0006 (0.0019) model time 0.4396 (0.4445) loss 3.1615 (2.8882) grad_norm 1.6881 (1.8668) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][400/625] eta 0:01:40 lr 0.000588 wd 0.0500 time 0.4367 (0.4467) data time 0.0007 (0.0019) model time 0.4360 (0.4444) loss 3.0795 (2.8879) grad_norm 1.8107 (1.8573) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][410/625] eta 0:01:36 lr 0.000588 wd 0.0500 time 0.4397 (0.4466) data time 0.0010 (0.0019) model time 0.4387 (0.4443) loss 2.0414 (2.8865) grad_norm 1.4974 (1.8509) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][420/625] eta 0:01:31 lr 0.000588 wd 0.0500 time 0.4438 (0.4465) data time 0.0007 (0.0018) model time 0.4431 (0.4443) loss 3.4689 (2.8902) grad_norm 2.0451 (1.8534) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][430/625] eta 0:01:27 lr 0.000588 wd 0.0500 time 0.4453 (0.4464) data time 0.0011 (0.0018) model time 0.4442 (0.4443) loss 3.7120 (2.8963) grad_norm 1.4517 (1.8624) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][440/625] eta 0:01:22 lr 0.000588 wd 0.0500 time 0.4405 (0.4463) data time 0.0011 (0.0018) model time 0.4393 (0.4442) loss 3.0178 (2.9000) grad_norm 1.4466 (1.8563) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [162/300][450/625] eta 0:01:18 lr 0.000588 wd 0.0500 time 0.4388 (0.4463) data time 0.0009 (0.0018) model time 0.4379 (0.4441) loss 3.5250 (2.9024) grad_norm 2.2756 (1.8512) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][460/625] eta 0:01:13 lr 0.000588 wd 0.0500 time 0.4468 (0.4462) data time 0.0008 (0.0018) model time 0.4460 (0.4441) loss 2.7565 (2.9033) grad_norm 2.3426 (1.8854) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][470/625] eta 0:01:09 lr 0.000588 wd 0.0500 time 0.4433 (0.4468) data time 0.0007 (0.0018) model time 0.4427 (0.4447) loss 2.3581 (2.8985) grad_norm 1.9460 (1.8879) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:02:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][480/625] eta 0:01:04 lr 0.000588 wd 0.0500 time 0.4396 (0.4470) data time 0.0009 (0.0017) model time 0.4387 (0.4451) loss 3.0130 (2.8991) grad_norm 1.5991 (1.8905) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][490/625] eta 0:01:00 lr 0.000587 wd 0.0500 time 0.4461 (0.4469) data time 0.0006 (0.0017) model time 0.4454 (0.4450) loss 2.1081 (2.8994) grad_norm 1.3924 (1.8833) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][500/625] eta 0:00:55 lr 0.000587 wd 0.0500 time 0.4400 (0.4468) data time 0.0006 (0.0017) model time 0.4393 (0.4449) loss 2.9829 (2.9044) grad_norm 1.7130 (1.8828) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][510/625] eta 0:00:51 lr 0.000587 wd 0.0500 time 0.4422 (0.4468) data time 0.0008 (0.0017) model time 0.4413 (0.4448) loss 2.6512 (2.9075) grad_norm 1.2969 (1.8774) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][520/625] eta 0:00:46 lr 0.000587 wd 0.0500 time 0.4528 (0.4467) data time 0.0010 (0.0017) model time 0.4518 (0.4449) loss 2.7620 (2.9071) grad_norm 1.6432 (1.8805) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][530/625] eta 0:00:42 lr 0.000587 wd 0.0500 time 0.4434 (0.4467) data time 0.0008 (0.0017) model time 0.4425 (0.4448) loss 3.0264 (2.9025) grad_norm 1.3643 (1.8733) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][540/625] eta 0:00:37 lr 0.000587 wd 0.0500 time 0.4467 (0.4467) data time 0.0008 (0.0016) model time 0.4459 (0.4448) loss 3.1257 (2.9058) grad_norm 1.6228 (1.8695) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][550/625] eta 0:00:33 lr 0.000587 wd 0.0500 time 0.4428 (0.4467) data time 0.0009 (0.0016) model time 0.4419 (0.4449) loss 2.7185 (2.9107) grad_norm 2.2927 (1.8704) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][560/625] eta 0:00:29 lr 0.000587 wd 0.0500 time 0.4446 (0.4468) data time 0.0006 (0.0016) model time 0.4440 (0.4450) loss 3.2942 (2.9092) grad_norm 1.2363 (1.8678) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][570/625] eta 0:00:24 lr 0.000587 wd 0.0500 time 0.4415 (0.4467) data time 0.0007 (0.0016) model time 0.4408 (0.4449) loss 3.2720 (2.9052) grad_norm 2.0104 (1.8662) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][580/625] eta 0:00:20 lr 0.000586 wd 0.0500 time 0.4411 (0.4467) data time 0.0006 (0.0016) model time 0.4405 (0.4449) loss 2.3102 
(2.9028) grad_norm 1.2806 (1.8649) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][590/625] eta 0:00:15 lr 0.000586 wd 0.0500 time 0.4423 (0.4469) data time 0.0006 (0.0016) model time 0.4416 (0.4451) loss 3.5997 (2.9007) grad_norm 1.6995 (1.8616) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][600/625] eta 0:00:11 lr 0.000586 wd 0.0500 time 0.4396 (0.4468) data time 0.0007 (0.0016) model time 0.4390 (0.4451) loss 3.0603 (2.9026) grad_norm 1.3676 (1.8553) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:03:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][610/625] eta 0:00:06 lr 0.000586 wd 0.0500 time 0.4438 (0.4469) data time 0.0006 (0.0016) model time 0.4432 (0.4452) loss 3.0606 (2.9033) grad_norm 2.6103 (1.8575) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [162/300][620/625] eta 0:00:02 lr 0.000586 wd 0.0500 time 0.4380 (0.4474) data time 0.0006 (0.0016) model time 0.4373 (0.4457) loss 2.0978 (2.9010) grad_norm 1.9556 (1.8634) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 162 training takes 0:04:39 [2024-08-10 15:04:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:04:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:04:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5381 (0.5381) Acc@1 87.842 (87.842) Acc@5 98.486 (98.486) Mem 16699MB [2024-08-10 15:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8955 (0.6538) Acc@1 78.467 (85.662) Acc@5 95.508 (97.581) Mem 16699MB [2024-08-10 15:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9927 (0.7724) Acc@1 75.928 (82.696) Acc@5 94.482 (96.275) Mem 16699MB [2024-08-10 15:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.346 Acc@5 96.239 [2024-08-10 15:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.3% [2024-08-10 15:04:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.4727 (0.4727) Acc@1 89.307 (89.307) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:04:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.189) Loss 0.7617 (0.5927) Acc@1 81.836 (87.003) Acc@5 96.484 (97.860) Mem 16699MB [2024-08-10 15:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.154) Loss 0.8564 (0.6957) Acc@1 78.467 (84.194) Acc@5 95.703 (96.842) Mem 16699MB [2024-08-10 15:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.885 Acc@5 96.851 [2024-08-10 15:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.89% [2024-08-10 15:04:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:04:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 15:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][0/625] eta 0:07:51 lr 0.000586 wd 0.0500 time 0.7539 (0.7539) data time 0.3649 (0.3649) model time 0.0000 (0.0000) loss 3.4554 (3.4554) grad_norm 1.3419 (1.3419) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][10/625] eta 0:04:54 lr 0.000586 wd 0.0500 time 0.4417 (0.4785) data time 0.0009 (0.0342) model time 0.0000 (0.0000) loss 2.3161 (2.8306) grad_norm 1.4863 (1.5122) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][20/625] eta 0:04:40 lr 0.000586 wd 0.0500 time 0.4353 (0.4642) data time 0.0006 (0.0183) model time 0.0000 (0.0000) loss 3.4229 (2.9501) grad_norm 1.1665 (1.5935) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][30/625] eta 0:04:34 lr 0.000586 wd 0.0500 time 0.4428 (0.4606) data time 0.0007 (0.0134) model time 0.0000 (0.0000) loss 3.3720 (2.9377) grad_norm 7.5966 (1.8195) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][40/625] eta 0:04:27 lr 0.000586 wd 0.0500 time 0.4473 (0.4572) data time 0.0006 (0.0104) model time 0.0000 (0.0000) loss 2.2851 (2.8629) grad_norm 1.7280 (1.8466) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][50/625] eta 0:04:21 lr 0.000585 wd 0.0500 time 0.4371 (0.4548) data time 0.0009 (0.0085) model time 0.0000 (0.0000) loss 3.0768 (2.9005) grad_norm 1.4735 (1.8070) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][60/625] eta 0:04:15 lr 0.000585 wd 0.0500 time 0.4422 (0.4528) data time 0.0008 (0.0075) model time 0.4414 (0.4399) loss 3.1003 (2.9083) 
grad_norm 1.7278 (1.7463) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][70/625] eta 0:04:12 lr 0.000585 wd 0.0500 time 0.4433 (0.4541) data time 0.0009 (0.0066) model time 0.4424 (0.4508) loss 3.5137 (2.9174) grad_norm 2.1606 (1.7855) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][80/625] eta 0:04:07 lr 0.000585 wd 0.0500 time 0.4431 (0.4538) data time 0.0008 (0.0059) model time 0.4423 (0.4506) loss 3.5305 (2.9343) grad_norm 2.6677 (1.8192) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][90/625] eta 0:04:02 lr 0.000585 wd 0.0500 time 0.4439 (0.4526) data time 0.0008 (0.0053) model time 0.4430 (0.4484) loss 3.0227 (2.9032) grad_norm 2.2630 (1.9149) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:04:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][100/625] eta 0:03:57 lr 0.000585 wd 0.0500 time 0.4401 (0.4517) data time 0.0009 (0.0049) model time 0.4392 (0.4473) loss 2.9821 (2.9146) grad_norm 2.1007 (1.9503) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][110/625] eta 0:03:52 lr 0.000585 wd 0.0500 time 0.4412 (0.4510) data time 0.0008 (0.0045) model time 0.4404 (0.4465) loss 2.7561 (2.9237) grad_norm 1.8751 (1.9515) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][120/625] eta 0:03:47 lr 0.000585 wd 0.0500 time 0.4407 (0.4503) data time 0.0008 (0.0042) model time 0.4399 (0.4460) loss 3.1110 (2.9159) grad_norm 3.5930 (1.9723) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][130/625] eta 0:03:43 lr 0.000585 wd 0.0500 time 0.4443 (0.4511) data 
time 0.0009 (0.0040) model time 0.4435 (0.4476) loss 2.7833 (2.9331) grad_norm 1.7985 (1.9855) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][140/625] eta 0:03:38 lr 0.000585 wd 0.0500 time 0.4449 (0.4505) data time 0.0007 (0.0038) model time 0.4442 (0.4470) loss 3.1998 (2.9377) grad_norm 1.9955 (1.9832) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][150/625] eta 0:03:33 lr 0.000584 wd 0.0500 time 0.4422 (0.4501) data time 0.0009 (0.0036) model time 0.4414 (0.4467) loss 2.9749 (2.9264) grad_norm 1.4496 (1.9592) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][160/625] eta 0:03:29 lr 0.000584 wd 0.0500 time 0.4424 (0.4497) data time 0.0006 (0.0034) model time 0.4418 (0.4463) loss 3.2084 (2.9363) grad_norm 1.3629 (1.9339) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][170/625] eta 0:03:24 lr 0.000584 wd 0.0500 time 0.4406 (0.4494) data time 0.0008 (0.0033) model time 0.4399 (0.4461) loss 1.9664 (2.9289) grad_norm 1.1965 (1.9053) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][180/625] eta 0:03:19 lr 0.000584 wd 0.0500 time 0.4410 (0.4490) data time 0.0008 (0.0031) model time 0.4403 (0.4457) loss 3.4022 (2.9251) grad_norm 1.2782 (1.8838) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][190/625] eta 0:03:15 lr 0.000584 wd 0.0500 time 0.4441 (0.4490) data time 0.0010 (0.0030) model time 0.4431 (0.4459) loss 3.2542 (2.9311) grad_norm 2.2353 (1.8860) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[163/300][200/625] eta 0:03:10 lr 0.000584 wd 0.0500 time 0.4405 (0.4487) data time 0.0006 (0.0029) model time 0.4399 (0.4456) loss 3.6150 (2.9466) grad_norm 1.4124 (1.8801) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][210/625] eta 0:03:06 lr 0.000584 wd 0.0500 time 0.4410 (0.4484) data time 0.0006 (0.0028) model time 0.4403 (0.4453) loss 2.5640 (2.9436) grad_norm 1.3943 (1.8661) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][220/625] eta 0:03:01 lr 0.000584 wd 0.0500 time 0.4451 (0.4490) data time 0.0008 (0.0027) model time 0.4444 (0.4463) loss 2.9135 (2.9488) grad_norm 1.3889 (1.8571) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][230/625] eta 0:02:57 lr 0.000584 wd 0.0500 time 0.4420 (0.4488) data time 0.0007 (0.0026) model time 0.4413 (0.4461) loss 2.9303 (2.9452) grad_norm 1.9452 (1.8450) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][240/625] eta 0:02:52 lr 0.000583 wd 0.0500 time 0.4701 (0.4488) data time 0.0006 (0.0026) model time 0.4695 (0.4462) loss 3.3608 (2.9406) grad_norm 1.6103 (1.8321) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][250/625] eta 0:02:48 lr 0.000583 wd 0.0500 time 0.4456 (0.4486) data time 0.0007 (0.0025) model time 0.4449 (0.4461) loss 3.3241 (2.9424) grad_norm 1.1060 (1.8162) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][260/625] eta 0:02:43 lr 0.000583 wd 0.0500 time 0.4397 (0.4491) data time 0.0006 (0.0024) model time 0.4391 (0.4467) loss 3.0181 (2.9390) grad_norm 1.6105 (1.8036) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:06:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][270/625] eta 0:02:39 lr 0.000583 wd 0.0500 time 0.4416 (0.4488) data time 0.0009 (0.0024) model time 0.4407 (0.4465) loss 2.8681 (2.9387) grad_norm 1.5907 (1.8027) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][280/625] eta 0:02:34 lr 0.000583 wd 0.0500 time 0.4374 (0.4486) data time 0.0006 (0.0023) model time 0.4368 (0.4462) loss 2.9760 (2.9323) grad_norm 4.0344 (1.8143) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][290/625] eta 0:02:30 lr 0.000583 wd 0.0500 time 0.4438 (0.4484) data time 0.0007 (0.0023) model time 0.4432 (0.4461) loss 3.5844 (2.9354) grad_norm 2.0076 (1.8201) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][300/625] eta 0:02:25 lr 0.000583 wd 0.0500 time 0.4469 (0.4487) data time 0.0007 (0.0022) model time 0.4462 (0.4465) loss 3.0333 (2.9329) grad_norm 1.6645 (1.8257) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][310/625] eta 0:02:21 lr 0.000583 wd 0.0500 time 0.4512 (0.4486) data time 0.0008 (0.0022) model time 0.4504 (0.4464) loss 2.7574 (2.9281) grad_norm 1.7458 (1.8187) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][320/625] eta 0:02:16 lr 0.000583 wd 0.0500 time 0.4449 (0.4484) data time 0.0007 (0.0021) model time 0.4442 (0.4462) loss 3.3368 (2.9323) grad_norm 1.5938 (1.8124) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][330/625] eta 0:02:12 lr 0.000582 wd 0.0500 time 0.4441 (0.4482) data time 0.0006 (0.0021) model time 0.4435 (0.4460) loss 2.4705 
(2.9351) grad_norm 1.8979 (1.8146) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][340/625] eta 0:02:07 lr 0.000582 wd 0.0500 time 0.4373 (0.4480) data time 0.0007 (0.0021) model time 0.4366 (0.4458) loss 3.5339 (2.9433) grad_norm 2.8841 (1.8177) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][350/625] eta 0:02:03 lr 0.000582 wd 0.0500 time 0.4397 (0.4484) data time 0.0009 (0.0021) model time 0.4389 (0.4464) loss 2.9647 (2.9473) grad_norm 1.4790 (1.8240) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:06:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][360/625] eta 0:01:58 lr 0.000582 wd 0.0500 time 0.4441 (0.4489) data time 0.0006 (0.0020) model time 0.4436 (0.4470) loss 3.1495 (2.9448) grad_norm 1.7387 (1.8219) loss_scale 1024.0000 (516.2548) mem 16699MB [2024-08-10 15:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][370/625] eta 0:01:54 lr 0.000582 wd 0.0500 time 0.4418 (0.4488) data time 0.0006 (0.0020) model time 0.4412 (0.4469) loss 1.9111 (2.9371) grad_norm 1.5077 (1.8176) loss_scale 1024.0000 (529.9407) mem 16699MB [2024-08-10 15:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][380/625] eta 0:01:49 lr 0.000582 wd 0.0500 time 0.4367 (0.4486) data time 0.0011 (0.0020) model time 0.4356 (0.4467) loss 3.0071 (2.9379) grad_norm 1.4330 (1.8168) loss_scale 1024.0000 (542.9081) mem 16699MB [2024-08-10 15:07:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][390/625] eta 0:01:45 lr 0.000582 wd 0.0500 time 0.4406 (0.4485) data time 0.0009 (0.0019) model time 0.4397 (0.4466) loss 2.6782 (2.9408) grad_norm 1.8285 (1.8225) loss_scale 1024.0000 (555.2123) mem 16699MB [2024-08-10 15:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][400/625] eta 0:01:40 lr 0.000582 wd 0.0500 time 
0.4391 (0.4483) data time 0.0007 (0.0019) model time 0.4385 (0.4464) loss 2.9505 (2.9365) grad_norm 1.4643 (1.8193) loss_scale 1024.0000 (566.9027) mem 16699MB [2024-08-10 15:07:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][410/625] eta 0:01:36 lr 0.000582 wd 0.0500 time 0.4433 (0.4486) data time 0.0009 (0.0019) model time 0.4423 (0.4467) loss 3.3334 (2.9317) grad_norm 2.6588 (1.8805) loss_scale 1024.0000 (578.0243) mem 16699MB [2024-08-10 15:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][420/625] eta 0:01:31 lr 0.000582 wd 0.0500 time 0.4408 (0.4484) data time 0.0007 (0.0019) model time 0.4401 (0.4466) loss 2.5462 (2.9264) grad_norm 1.5426 (1.8817) loss_scale 1024.0000 (588.6176) mem 16699MB [2024-08-10 15:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][430/625] eta 0:01:27 lr 0.000581 wd 0.0500 time 0.4373 (0.4483) data time 0.0009 (0.0018) model time 0.4365 (0.4464) loss 3.1025 (2.9141) grad_norm 2.7825 (1.8911) loss_scale 1024.0000 (598.7193) mem 16699MB [2024-08-10 15:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][440/625] eta 0:01:22 lr 0.000581 wd 0.0500 time 0.6547 (0.4486) data time 0.0006 (0.0018) model time 0.6541 (0.4469) loss 2.8063 (2.9151) grad_norm 1.6506 (1.9057) loss_scale 1024.0000 (608.3628) mem 16699MB [2024-08-10 15:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][450/625] eta 0:01:18 lr 0.000581 wd 0.0500 time 0.4393 (0.4484) data time 0.0008 (0.0018) model time 0.4385 (0.4466) loss 2.0174 (2.9135) grad_norm 1.1921 (1.9009) loss_scale 1024.0000 (617.5787) mem 16699MB [2024-08-10 15:07:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][460/625] eta 0:01:13 lr 0.000581 wd 0.0500 time 0.4416 (0.4482) data time 0.0007 (0.0018) model time 0.4409 (0.4465) loss 3.4546 (2.9115) grad_norm 1.2506 (1.8937) loss_scale 1024.0000 (626.3948) mem 16699MB [2024-08-10 15:07:45 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [163/300][470/625] eta 0:01:09 lr 0.000581 wd 0.0500 time 0.4404 (0.4481) data time 0.0008 (0.0018) model time 0.4396 (0.4464) loss 2.6502 (2.9125) grad_norm 1.4202 (1.8928) loss_scale 1024.0000 (634.8365) mem 16699MB [2024-08-10 15:07:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][480/625] eta 0:01:05 lr 0.000581 wd 0.0500 time 0.4420 (0.4484) data time 0.0008 (0.0017) model time 0.4412 (0.4467) loss 3.1873 (2.9152) grad_norm 1.2858 (1.8851) loss_scale 1024.0000 (642.9272) mem 16699MB [2024-08-10 15:07:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][490/625] eta 0:01:00 lr 0.000581 wd 0.0500 time 0.4411 (0.4482) data time 0.0006 (0.0017) model time 0.4405 (0.4465) loss 2.3700 (2.9116) grad_norm 1.4451 (1.8828) loss_scale 1024.0000 (650.6884) mem 16699MB [2024-08-10 15:07:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][500/625] eta 0:00:56 lr 0.000581 wd 0.0500 time 0.4413 (0.4487) data time 0.0007 (0.0017) model time 0.4407 (0.4471) loss 2.3862 (2.9096) grad_norm 2.3577 (1.8817) loss_scale 1024.0000 (658.1397) mem 16699MB [2024-08-10 15:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][510/625] eta 0:00:51 lr 0.000581 wd 0.0500 time 0.4424 (0.4486) data time 0.0008 (0.0017) model time 0.4416 (0.4470) loss 1.9253 (2.9074) grad_norm 1.5395 (1.8893) loss_scale 1024.0000 (665.2994) mem 16699MB [2024-08-10 15:08:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][520/625] eta 0:00:47 lr 0.000580 wd 0.0500 time 0.4453 (0.4485) data time 0.0006 (0.0017) model time 0.4446 (0.4469) loss 3.1727 (2.9102) grad_norm 1.0588 (1.8986) loss_scale 1024.0000 (672.1843) mem 16699MB [2024-08-10 15:08:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][530/625] eta 0:00:42 lr 0.000580 wd 0.0500 time 0.4422 (0.4484) data time 0.0008 (0.0017) model time 0.4414 (0.4468) loss 2.8478 (2.9112) grad_norm 1.1003 (1.9010) 
loss_scale 1024.0000 (678.8098) mem 16699MB [2024-08-10 15:08:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][540/625] eta 0:00:38 lr 0.000580 wd 0.0500 time 0.4455 (0.4484) data time 0.0007 (0.0016) model time 0.4448 (0.4468) loss 1.8319 (2.9117) grad_norm 1.7955 (1.8990) loss_scale 1024.0000 (685.1904) mem 16699MB [2024-08-10 15:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][550/625] eta 0:00:33 lr 0.000580 wd 0.0500 time 0.4418 (0.4486) data time 0.0008 (0.0016) model time 0.4410 (0.4470) loss 2.1753 (2.9093) grad_norm 1.5321 (1.8935) loss_scale 1024.0000 (691.3394) mem 16699MB [2024-08-10 15:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][560/625] eta 0:00:29 lr 0.000580 wd 0.0500 time 0.4476 (0.4485) data time 0.0006 (0.0016) model time 0.4469 (0.4469) loss 2.3863 (2.9058) grad_norm 1.4935 (1.8852) loss_scale 1024.0000 (697.2692) mem 16699MB [2024-08-10 15:08:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][570/625] eta 0:00:24 lr 0.000580 wd 0.0500 time 0.4414 (0.4484) data time 0.0007 (0.0016) model time 0.4407 (0.4468) loss 2.7120 (2.9056) grad_norm 2.2327 (1.8825) loss_scale 1024.0000 (702.9912) mem 16699MB [2024-08-10 15:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][580/625] eta 0:00:20 lr 0.000580 wd 0.0500 time 0.4425 (0.4483) data time 0.0008 (0.0016) model time 0.4417 (0.4468) loss 2.3743 (2.9087) grad_norm 1.3981 (1.8770) loss_scale 1024.0000 (708.5164) mem 16699MB [2024-08-10 15:08:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][590/625] eta 0:00:15 lr 0.000580 wd 0.0500 time 0.4426 (0.4482) data time 0.0006 (0.0016) model time 0.4419 (0.4467) loss 3.4708 (2.9085) grad_norm 1.6362 (1.8788) loss_scale 1024.0000 (713.8545) mem 16699MB [2024-08-10 15:08:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][600/625] eta 0:00:11 lr 0.000580 wd 0.0500 time 0.4409 (0.4481) data time 0.0009 
(0.0016) model time 0.4401 (0.4466) loss 3.0774 (2.9077) grad_norm 1.4867 (1.8742) loss_scale 1024.0000 (719.0150) mem 16699MB [2024-08-10 15:08:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][610/625] eta 0:00:06 lr 0.000580 wd 0.0500 time 0.4336 (0.4480) data time 0.0007 (0.0016) model time 0.4330 (0.4465) loss 2.2559 (2.9093) grad_norm 4.1408 (1.8872) loss_scale 1024.0000 (724.0065) mem 16699MB [2024-08-10 15:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [163/300][620/625] eta 0:00:02 lr 0.000579 wd 0.0500 time 0.4361 (0.4479) data time 0.0006 (0.0016) model time 0.4355 (0.4463) loss 1.8951 (2.9077) grad_norm 1.4951 (1.8898) loss_scale 1024.0000 (728.8374) mem 16699MB [2024-08-10 15:08:54 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 163 training takes 0:04:40 [2024-08-10 15:08:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:08:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:08:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5049 (0.5049) Acc@1 88.428 (88.428) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-10 15:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8525 (0.6307) Acc@1 79.541 (85.751) Acc@5 95.557 (97.603) Mem 16699MB [2024-08-10 15:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9170 (0.7486) Acc@1 76.807 (82.799) Acc@5 95.654 (96.333) Mem 16699MB [2024-08-10 15:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.460 Acc@5 96.339 [2024-08-10 15:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-08-10 15:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.46% [2024-08-10 15:08:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 15:09:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 15:09:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.4739 (0.4739) Acc@1 89.307 (89.307) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-10 15:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.7622 (0.5924) Acc@1 81.543 (87.012) Acc@5 96.387 (97.843) Mem 16699MB [2024-08-10 15:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.8560 (0.6954) Acc@1 78.369 (84.194) Acc@5 95.801 (96.845) Mem 16699MB [2024-08-10 15:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.887 Acc@5 96.859 [2024-08-10 15:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.89% [2024-08-10 15:09:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:09:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 15:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][0/625] eta 0:07:57 lr 0.000579 wd 0.0500 time 0.7644 (0.7644) data time 0.3679 (0.3679) model time 0.0000 (0.0000) loss 1.8452 (1.8452) grad_norm 1.8675 (1.8675) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][10/625] eta 0:04:49 lr 0.000579 wd 0.0500 time 0.4388 (0.4707) data time 0.0007 (0.0343) model time 0.0000 (0.0000) loss 3.4035 (2.8554) grad_norm 1.8823 (2.1228) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][20/625] eta 0:04:39 lr 0.000579 wd 0.0500 time 0.4416 (0.4612) data time 0.0008 (0.0184) model time 0.0000 (0.0000) loss 3.0906 (2.9273) grad_norm 2.4321 (2.3450) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][30/625] eta 0:04:35 lr 0.000579 wd 0.0500 time 0.6519 (0.4623) data time 0.0009 (0.0127) model time 0.0000 (0.0000) loss 2.0646 (2.9249) grad_norm 1.7931 (2.1774) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][40/625] eta 0:04:27 lr 0.000579 wd 0.0500 time 0.4486 (0.4579) data time 0.0006 (0.0099) model time 0.0000 (0.0000) loss 3.2589 (2.8923) grad_norm 1.6837 (2.4541) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][50/625] eta 0:04:21 lr 0.000579 wd 0.0500 time 0.4438 (0.4553) data time 0.0008 (0.0081) model time 0.0000 (0.0000) loss 3.5028 (2.9126) grad_norm 1.8552 (2.2932) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][60/625] eta 0:04:16 lr 0.000579 wd 0.0500 time 0.4458 (0.4534) data time 0.0006 (0.0069) model time 0.4452 (0.4429) loss 3.3285 
(2.9290) grad_norm 1.0621 (2.1533) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][70/625] eta 0:04:10 lr 0.000579 wd 0.0500 time 0.4425 (0.4522) data time 0.0010 (0.0061) model time 0.4415 (0.4436) loss 2.9644 (2.9331) grad_norm 1.5359 (2.0755) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][80/625] eta 0:04:05 lr 0.000578 wd 0.0500 time 0.4480 (0.4511) data time 0.0006 (0.0054) model time 0.4473 (0.4432) loss 3.3196 (2.9218) grad_norm 2.1966 (2.0433) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][90/625] eta 0:04:01 lr 0.000578 wd 0.0500 time 0.4402 (0.4521) data time 0.0007 (0.0049) model time 0.4395 (0.4472) loss 3.0982 (2.9362) grad_norm 2.4290 (2.0822) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][100/625] eta 0:03:56 lr 0.000578 wd 0.0500 time 0.4423 (0.4512) data time 0.0008 (0.0045) model time 0.4415 (0.4462) loss 2.2162 (2.9425) grad_norm 1.2529 (2.0677) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][110/625] eta 0:03:52 lr 0.000578 wd 0.0500 time 0.4460 (0.4505) data time 0.0010 (0.0042) model time 0.4450 (0.4456) loss 2.9151 (2.9568) grad_norm 1.8153 (2.0569) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:09:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][120/625] eta 0:03:47 lr 0.000578 wd 0.0500 time 0.4417 (0.4514) data time 0.0007 (0.0039) model time 0.4410 (0.4477) loss 2.1200 (2.9497) grad_norm 1.6021 (2.0353) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][130/625] eta 0:03:43 lr 0.000578 wd 0.0500 
time 0.4423 (0.4509) data time 0.0006 (0.0037) model time 0.4417 (0.4473) loss 3.1175 (2.9450) grad_norm 2.3138 (2.0318) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][140/625] eta 0:03:38 lr 0.000578 wd 0.0500 time 0.4406 (0.4505) data time 0.0007 (0.0035) model time 0.4399 (0.4469) loss 3.3038 (2.9440) grad_norm 1.6806 (2.0174) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][150/625] eta 0:03:33 lr 0.000578 wd 0.0500 time 0.4425 (0.4499) data time 0.0009 (0.0033) model time 0.4416 (0.4463) loss 2.9275 (2.9378) grad_norm 1.2666 (1.9880) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][160/625] eta 0:03:29 lr 0.000578 wd 0.0500 time 0.4452 (0.4495) data time 0.0006 (0.0032) model time 0.4445 (0.4459) loss 2.6279 (2.9313) grad_norm 1.5238 (1.9783) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][170/625] eta 0:03:24 lr 0.000578 wd 0.0500 time 0.4424 (0.4493) data time 0.0007 (0.0030) model time 0.4418 (0.4458) loss 2.6892 (2.9154) grad_norm 1.3682 (1.9763) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][180/625] eta 0:03:19 lr 0.000577 wd 0.0500 time 0.4402 (0.4490) data time 0.0006 (0.0029) model time 0.4396 (0.4456) loss 3.1456 (2.9203) grad_norm 3.3950 (1.9805) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][190/625] eta 0:03:15 lr 0.000577 wd 0.0500 time 0.4464 (0.4497) data time 0.0009 (0.0028) model time 0.4456 (0.4468) loss 3.0999 (2.9221) grad_norm 2.0269 (1.9814) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:35 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [164/300][200/625] eta 0:03:11 lr 0.000577 wd 0.0500 time 0.4370 (0.4495) data time 0.0007 (0.0027) model time 0.4363 (0.4467) loss 3.0646 (2.9232) grad_norm 1.5868 (1.9583) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][210/625] eta 0:03:06 lr 0.000577 wd 0.0500 time 0.4466 (0.4493) data time 0.0007 (0.0026) model time 0.4459 (0.4465) loss 2.5053 (2.9101) grad_norm 1.4178 (1.9440) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][220/625] eta 0:03:01 lr 0.000577 wd 0.0500 time 0.4447 (0.4491) data time 0.0009 (0.0026) model time 0.4439 (0.4464) loss 3.0480 (2.9200) grad_norm 2.6672 (1.9505) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][230/625] eta 0:02:57 lr 0.000577 wd 0.0500 time 0.4451 (0.4490) data time 0.0009 (0.0025) model time 0.4442 (0.4463) loss 3.3423 (2.9211) grad_norm 1.9141 (1.9527) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][240/625] eta 0:02:53 lr 0.000577 wd 0.0500 time 0.4442 (0.4494) data time 0.0010 (0.0024) model time 0.4432 (0.4470) loss 3.3814 (2.9094) grad_norm 1.3804 (1.9406) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:10:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][250/625] eta 0:02:48 lr 0.000577 wd 0.0500 time 0.4424 (0.4493) data time 0.0007 (0.0024) model time 0.4417 (0.4469) loss 3.7587 (2.9181) grad_norm 1.1633 (1.9330) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][260/625] eta 0:02:43 lr 0.000577 wd 0.0500 time 0.4450 (0.4491) data time 0.0009 (0.0023) model time 0.4441 (0.4467) loss 2.4467 (2.9227) grad_norm 1.5544 
(1.9316) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][270/625] eta 0:02:39 lr 0.000576 wd 0.0500 time 0.4445 (0.4490) data time 0.0009 (0.0023) model time 0.4437 (0.4467) loss 2.9155 (2.9231) grad_norm 1.6620 (1.9277) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][280/625] eta 0:02:34 lr 0.000576 wd 0.0500 time 0.4439 (0.4489) data time 0.0009 (0.0022) model time 0.4430 (0.4465) loss 3.1064 (2.9230) grad_norm 2.0967 (1.9213) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][290/625] eta 0:02:30 lr 0.000576 wd 0.0500 time 0.4407 (0.4487) data time 0.0007 (0.0022) model time 0.4400 (0.4463) loss 3.1918 (2.9262) grad_norm 1.7473 (1.9268) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][300/625] eta 0:02:25 lr 0.000576 wd 0.0500 time 0.4441 (0.4485) data time 0.0006 (0.0021) model time 0.4434 (0.4462) loss 3.1641 (2.9271) grad_norm 1.5079 (1.9208) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][310/625] eta 0:02:21 lr 0.000576 wd 0.0500 time 0.4518 (0.4483) data time 0.0009 (0.0021) model time 0.4509 (0.4461) loss 3.3269 (2.9246) grad_norm 1.1145 (1.9075) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][320/625] eta 0:02:16 lr 0.000576 wd 0.0500 time 0.4408 (0.4482) data time 0.0009 (0.0020) model time 0.4399 (0.4459) loss 2.1381 (2.9245) grad_norm 1.3612 (1.8946) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][330/625] eta 0:02:12 lr 0.000576 wd 0.0500 time 0.4435 (0.4480) data 
time 0.0007 (0.0020) model time 0.4427 (0.4458) loss 2.2233 (2.9232) grad_norm 1.3568 (1.8794) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][340/625] eta 0:02:07 lr 0.000576 wd 0.0500 time 0.4402 (0.4479) data time 0.0009 (0.0020) model time 0.4393 (0.4458) loss 2.2927 (2.9250) grad_norm 1.4417 (1.8751) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][350/625] eta 0:02:03 lr 0.000576 wd 0.0500 time 0.4410 (0.4478) data time 0.0009 (0.0019) model time 0.4401 (0.4456) loss 2.8932 (2.9247) grad_norm 1.7609 (1.8730) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][360/625] eta 0:01:58 lr 0.000576 wd 0.0500 time 0.4417 (0.4477) data time 0.0006 (0.0019) model time 0.4411 (0.4455) loss 3.3121 (2.9227) grad_norm 1.9540 (1.8750) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][370/625] eta 0:01:54 lr 0.000575 wd 0.0500 time 0.4401 (0.4481) data time 0.0006 (0.0019) model time 0.4394 (0.4461) loss 2.8655 (2.9177) grad_norm 2.5885 (1.8760) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][380/625] eta 0:01:49 lr 0.000575 wd 0.0500 time 0.6577 (0.4486) data time 0.0007 (0.0019) model time 0.6570 (0.4466) loss 3.7969 (2.9207) grad_norm 2.1141 (1.8739) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][390/625] eta 0:01:45 lr 0.000575 wd 0.0500 time 0.4468 (0.4483) data time 0.0008 (0.0018) model time 0.4461 (0.4464) loss 3.4106 (2.9204) grad_norm 2.6299 (1.8835) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [164/300][400/625] eta 0:01:40 lr 0.000575 wd 0.0500 time 0.4430 (0.4483) data time 0.0007 (0.0018) model time 0.4423 (0.4464) loss 3.0351 (2.9274) grad_norm 1.8674 (1.8822) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][410/625] eta 0:01:36 lr 0.000575 wd 0.0500 time 0.4465 (0.4490) data time 0.0007 (0.0018) model time 0.4458 (0.4472) loss 2.1807 (2.9213) grad_norm 2.5387 (1.9065) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][420/625] eta 0:01:32 lr 0.000575 wd 0.0500 time 0.4453 (0.4489) data time 0.0008 (0.0018) model time 0.4445 (0.4471) loss 2.9472 (2.9223) grad_norm 1.6819 (1.8972) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][430/625] eta 0:01:27 lr 0.000575 wd 0.0500 time 0.4448 (0.4492) data time 0.0007 (0.0018) model time 0.4441 (0.4475) loss 1.9172 (2.9169) grad_norm 1.5357 (1.8922) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][440/625] eta 0:01:23 lr 0.000575 wd 0.0500 time 0.4415 (0.4491) data time 0.0009 (0.0017) model time 0.4407 (0.4473) loss 2.2969 (2.9130) grad_norm 2.5149 (1.9000) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][450/625] eta 0:01:18 lr 0.000575 wd 0.0500 time 0.4437 (0.4493) data time 0.0006 (0.0017) model time 0.4430 (0.4477) loss 3.6511 (2.9150) grad_norm 1.8569 (1.9021) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][460/625] eta 0:01:14 lr 0.000574 wd 0.0500 time 0.4427 (0.4492) data time 0.0009 (0.0017) model time 0.4418 (0.4475) loss 3.1875 (2.9182) grad_norm 1.4333 (1.9012) loss_scale 1024.0000 
(1024.0000) mem 16699MB [2024-08-10 15:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][470/625] eta 0:01:09 lr 0.000574 wd 0.0500 time 0.4419 (0.4491) data time 0.0009 (0.0017) model time 0.4410 (0.4474) loss 2.5278 (2.9218) grad_norm 1.6378 (1.8964) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][480/625] eta 0:01:05 lr 0.000574 wd 0.0500 time 0.4411 (0.4490) data time 0.0008 (0.0017) model time 0.4402 (0.4473) loss 3.2362 (2.9253) grad_norm 2.1916 (1.8950) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][490/625] eta 0:01:00 lr 0.000574 wd 0.0500 time 0.4435 (0.4489) data time 0.0006 (0.0017) model time 0.4428 (0.4472) loss 3.2344 (2.9276) grad_norm 5.8718 (1.8966) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][500/625] eta 0:00:56 lr 0.000574 wd 0.0500 time 0.4420 (0.4488) data time 0.0009 (0.0016) model time 0.4411 (0.4471) loss 2.1599 (2.9161) grad_norm 1.8050 (1.9109) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][510/625] eta 0:00:51 lr 0.000574 wd 0.0500 time 0.4461 (0.4486) data time 0.0006 (0.0016) model time 0.4454 (0.4470) loss 2.8266 (2.9165) grad_norm 1.4133 (1.9076) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][520/625] eta 0:00:47 lr 0.000574 wd 0.0500 time 0.4413 (0.4485) data time 0.0010 (0.0016) model time 0.4403 (0.4469) loss 3.0666 (2.9179) grad_norm 1.8090 (1.8989) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][530/625] eta 0:00:42 lr 0.000574 wd 0.0500 time 0.4436 (0.4484) data time 0.0010 (0.0016) model 
time 0.4426 (0.4468) loss 1.9687 (2.9161) grad_norm 1.5146 (1.8929) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][540/625] eta 0:00:38 lr 0.000574 wd 0.0500 time 0.4458 (0.4484) data time 0.0009 (0.0016) model time 0.4450 (0.4467) loss 2.3500 (2.9127) grad_norm 1.6393 (1.8898) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][550/625] eta 0:00:33 lr 0.000573 wd 0.0500 time 0.4409 (0.4483) data time 0.0009 (0.0016) model time 0.4400 (0.4466) loss 3.0683 (2.9153) grad_norm 2.1996 (1.8853) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:13:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][560/625] eta 0:00:29 lr 0.000573 wd 0.0500 time 0.4411 (0.4488) data time 0.0008 (0.0016) model time 0.4402 (0.4472) loss 3.3487 (2.9141) grad_norm 1.8225 (1.8892) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:13:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][570/625] eta 0:00:24 lr 0.000573 wd 0.0500 time 0.4418 (0.4487) data time 0.0009 (0.0016) model time 0.4409 (0.4471) loss 2.9903 (2.9123) grad_norm 7.1410 (inf) loss_scale 512.0000 (1015.9299) mem 16699MB [2024-08-10 15:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][580/625] eta 0:00:20 lr 0.000573 wd 0.0500 time 0.4426 (0.4486) data time 0.0009 (0.0016) model time 0.4417 (0.4471) loss 3.3256 (2.9083) grad_norm 1.5652 (inf) loss_scale 512.0000 (1007.2565) mem 16699MB [2024-08-10 15:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][590/625] eta 0:00:15 lr 0.000573 wd 0.0500 time 0.4436 (0.4485) data time 0.0006 (0.0015) model time 0.4429 (0.4470) loss 2.2767 (2.9093) grad_norm 1.5054 (inf) loss_scale 512.0000 (998.8765) mem 16699MB [2024-08-10 15:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][600/625] eta 0:00:11 
lr 0.000573 wd 0.0500 time 0.4410 (0.4484) data time 0.0006 (0.0015) model time 0.4403 (0.4469) loss 2.7394 (2.9134) grad_norm 1.5781 (inf) loss_scale 512.0000 (990.7754) mem 16699MB [2024-08-10 15:13:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][610/625] eta 0:00:06 lr 0.000573 wd 0.0500 time 0.4415 (0.4484) data time 0.0006 (0.0015) model time 0.4409 (0.4468) loss 2.7827 (2.9140) grad_norm 1.9623 (inf) loss_scale 512.0000 (982.9394) mem 16699MB [2024-08-10 15:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [164/300][620/625] eta 0:00:02 lr 0.000573 wd 0.0500 time 0.4375 (0.4483) data time 0.0006 (0.0015) model time 0.4369 (0.4468) loss 2.7097 (2.9133) grad_norm 1.4058 (inf) loss_scale 512.0000 (975.3559) mem 16699MB [2024-08-10 15:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 164 training takes 0:04:40 [2024-08-10 15:13:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:13:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.5483 (0.5483) Acc@1 87.793 (87.793) Acc@5 98.682 (98.682) Mem 16699MB [2024-08-10 15:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.155) Loss 0.9028 (0.6680) Acc@1 78.369 (85.476) Acc@5 95.654 (97.545) Mem 16699MB [2024-08-10 15:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.137) Loss 0.9976 (0.7810) Acc@1 76.416 (82.659) Acc@5 94.922 (96.345) Mem 16699MB [2024-08-10 15:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.414 Acc@5 96.355 [2024-08-10 15:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 15:13:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.812 (0.812) Loss 0.4736 (0.4736) Acc@1 89.404 (89.404) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-10 15:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.185) Loss 0.7617 (0.5922) Acc@1 81.543 (87.034) Acc@5 96.436 (97.829) Mem 16699MB [2024-08-10 15:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8550 (0.6950) Acc@1 78.711 (84.189) Acc@5 95.801 (96.838) Mem 16699MB [2024-08-10 15:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.899 Acc@5 96.859 [2024-08-10 15:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.90% [2024-08-10 15:13:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:13:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 15:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][0/625] eta 0:08:25 lr 0.000573 wd 0.0500 time 0.8094 (0.8094) data time 0.4198 (0.4198) model time 0.0000 (0.0000) loss 3.2050 (3.2050) grad_norm 1.4308 (1.4308) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][10/625] eta 0:05:00 lr 0.000573 wd 0.0500 time 0.4443 (0.4893) data time 0.0006 (0.0389) model time 0.0000 (0.0000) loss 1.9606 (2.7667) grad_norm 1.6514 (1.6667) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][20/625] eta 0:04:48 lr 0.000572 wd 0.0500 time 0.6205 (0.4762) data time 0.0008 (0.0208) model time 0.0000 (0.0000) loss 3.1712 (2.8745) grad_norm 2.6639 (1.7413) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][30/625] eta 0:04:37 lr 0.000572 wd 0.0500 time 0.4455 (0.4657) data time 0.0007 (0.0144) model time 0.0000 (0.0000) loss 2.6308 (2.9056) grad_norm 1.2142 (1.7195) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][40/625] eta 0:04:29 lr 0.000572 wd 0.0500 time 0.4522 (0.4605) data time 0.0009 (0.0111) model time 0.0000 (0.0000) loss 2.8257 (2.9139) grad_norm 1.9433 (1.7135) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][50/625] eta 0:04:23 lr 0.000572 wd 0.0500 time 0.4456 (0.4575) data time 0.0006 (0.0091) model time 0.0000 (0.0000) loss 3.2354 (2.9441) grad_norm 1.7698 (1.8151) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][60/625] eta 0:04:17 lr 0.000572 wd 0.0500 time 0.4508 (0.4553) data time 0.0007 (0.0077) model time 0.4501 (0.4433) loss 1.6840 (2.9163) 
grad_norm 2.0059 (1.8101) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][70/625] eta 0:04:11 lr 0.000572 wd 0.0500 time 0.4403 (0.4537) data time 0.0009 (0.0068) model time 0.4394 (0.4429) loss 2.2275 (2.8984) grad_norm 1.3661 (1.8361) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][80/625] eta 0:04:06 lr 0.000572 wd 0.0500 time 0.4400 (0.4526) data time 0.0009 (0.0061) model time 0.4391 (0.4432) loss 3.0281 (2.8744) grad_norm 1.7152 (1.7951) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][90/625] eta 0:04:01 lr 0.000572 wd 0.0500 time 0.4480 (0.4518) data time 0.0007 (0.0055) model time 0.4473 (0.4436) loss 2.3221 (2.8759) grad_norm 1.4660 (1.7770) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][100/625] eta 0:03:56 lr 0.000572 wd 0.0500 time 0.4414 (0.4510) data time 0.0006 (0.0050) model time 0.4408 (0.4434) loss 3.3107 (2.8924) grad_norm 1.4240 (1.7775) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][110/625] eta 0:03:51 lr 0.000572 wd 0.0500 time 0.4475 (0.4504) data time 0.0006 (0.0047) model time 0.4468 (0.4434) loss 2.7565 (2.8984) grad_norm 1.2358 (1.7647) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][120/625] eta 0:03:48 lr 0.000571 wd 0.0500 time 0.4404 (0.4526) data time 0.0006 (0.0044) model time 0.4397 (0.4481) loss 3.5469 (2.9031) grad_norm 2.8824 (1.7918) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][130/625] eta 0:03:44 lr 0.000571 wd 0.0500 time 0.6733 (0.4537) data 
time 0.0008 (0.0041) model time 0.6725 (0.4504) loss 2.7623 (2.8907) grad_norm 1.8804 (1.7880) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][140/625] eta 0:03:40 lr 0.000571 wd 0.0500 time 0.4422 (0.4543) data time 0.0009 (0.0039) model time 0.4413 (0.4515) loss 3.2835 (2.9019) grad_norm 1.5230 (1.8607) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][150/625] eta 0:03:35 lr 0.000571 wd 0.0500 time 0.4443 (0.4536) data time 0.0008 (0.0037) model time 0.4435 (0.4507) loss 3.1484 (2.9044) grad_norm 2.2594 (1.8610) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][160/625] eta 0:03:30 lr 0.000571 wd 0.0500 time 0.4426 (0.4529) data time 0.0008 (0.0035) model time 0.4418 (0.4499) loss 2.5594 (2.9066) grad_norm 1.2388 (1.8495) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][170/625] eta 0:03:25 lr 0.000571 wd 0.0500 time 0.4418 (0.4523) data time 0.0006 (0.0033) model time 0.4412 (0.4492) loss 2.6548 (2.8971) grad_norm 1.9246 (1.8910) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][180/625] eta 0:03:21 lr 0.000571 wd 0.0500 time 0.4415 (0.4518) data time 0.0008 (0.0032) model time 0.4406 (0.4487) loss 3.4475 (2.9025) grad_norm 1.5618 (1.8754) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][190/625] eta 0:03:16 lr 0.000571 wd 0.0500 time 0.4506 (0.4514) data time 0.0009 (0.0031) model time 0.4497 (0.4482) loss 2.8831 (2.9026) grad_norm 1.8678 (1.8609) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[165/300][200/625] eta 0:03:11 lr 0.000571 wd 0.0500 time 0.4445 (0.4510) data time 0.0006 (0.0030) model time 0.4438 (0.4479) loss 2.8085 (2.9121) grad_norm 1.3419 (1.8700) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][210/625] eta 0:03:07 lr 0.000570 wd 0.0500 time 0.5448 (0.4511) data time 0.0009 (0.0029) model time 0.5439 (0.4481) loss 2.2067 (2.9024) grad_norm 2.1924 (1.8734) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][220/625] eta 0:03:02 lr 0.000570 wd 0.0500 time 0.4432 (0.4508) data time 0.0006 (0.0028) model time 0.4426 (0.4478) loss 2.1936 (2.9021) grad_norm 1.4589 (1.8932) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][230/625] eta 0:02:57 lr 0.000570 wd 0.0500 time 0.4445 (0.4504) data time 0.0008 (0.0027) model time 0.4437 (0.4475) loss 2.8227 (2.9092) grad_norm 1.9063 (1.8840) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][240/625] eta 0:02:53 lr 0.000570 wd 0.0500 time 0.4451 (0.4501) data time 0.0008 (0.0026) model time 0.4443 (0.4473) loss 3.0662 (2.9127) grad_norm 1.5074 (1.8656) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][250/625] eta 0:02:48 lr 0.000570 wd 0.0500 time 0.4436 (0.4499) data time 0.0007 (0.0026) model time 0.4430 (0.4470) loss 3.3784 (2.9069) grad_norm 1.6728 (1.8531) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][260/625] eta 0:02:44 lr 0.000570 wd 0.0500 time 0.4458 (0.4496) data time 0.0009 (0.0025) model time 0.4450 (0.4468) loss 2.9386 (2.9010) grad_norm 1.7513 (1.8413) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][270/625] eta 0:02:39 lr 0.000570 wd 0.0500 time 0.4434 (0.4494) data time 0.0006 (0.0024) model time 0.4427 (0.4467) loss 1.8987 (2.9016) grad_norm 1.5492 (1.8396) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][280/625] eta 0:02:35 lr 0.000570 wd 0.0500 time 0.4464 (0.4493) data time 0.0008 (0.0024) model time 0.4456 (0.4466) loss 3.2771 (2.9102) grad_norm 1.5233 (1.8297) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][290/625] eta 0:02:30 lr 0.000570 wd 0.0500 time 0.4462 (0.4491) data time 0.0007 (0.0024) model time 0.4455 (0.4464) loss 2.2277 (2.9057) grad_norm 1.7235 (1.8487) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][300/625] eta 0:02:25 lr 0.000570 wd 0.0500 time 0.4399 (0.4489) data time 0.0006 (0.0023) model time 0.4393 (0.4463) loss 2.2434 (2.9087) grad_norm 1.9492 (1.8655) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][310/625] eta 0:02:21 lr 0.000569 wd 0.0500 time 0.4454 (0.4488) data time 0.0008 (0.0023) model time 0.4445 (0.4462) loss 3.0535 (2.9129) grad_norm 2.5097 (1.8643) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][320/625] eta 0:02:16 lr 0.000569 wd 0.0500 time 0.4371 (0.4486) data time 0.0010 (0.0022) model time 0.4361 (0.4460) loss 2.9687 (2.9171) grad_norm 2.0443 (1.8534) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][330/625] eta 0:02:12 lr 0.000569 wd 0.0500 time 0.4498 (0.4485) data time 0.0007 (0.0022) model time 0.4491 (0.4460) loss 2.3779 
(2.9143) grad_norm 1.5073 (1.8456) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][340/625] eta 0:02:07 lr 0.000569 wd 0.0500 time 0.4427 (0.4490) data time 0.0009 (0.0021) model time 0.4418 (0.4466) loss 2.6334 (2.9195) grad_norm 1.8120 (1.8421) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][350/625] eta 0:02:03 lr 0.000569 wd 0.0500 time 0.4449 (0.4488) data time 0.0009 (0.0021) model time 0.4440 (0.4464) loss 2.9991 (2.9212) grad_norm 1.6306 (1.8478) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][360/625] eta 0:01:59 lr 0.000569 wd 0.0500 time 0.4396 (0.4496) data time 0.0007 (0.0021) model time 0.4389 (0.4474) loss 3.4101 (2.9257) grad_norm 1.6158 (1.8430) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][370/625] eta 0:01:54 lr 0.000569 wd 0.0500 time 0.4443 (0.4495) data time 0.0007 (0.0020) model time 0.4436 (0.4473) loss 3.3483 (2.9271) grad_norm 1.5133 (1.8368) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][380/625] eta 0:01:50 lr 0.000569 wd 0.0500 time 0.4424 (0.4494) data time 0.0008 (0.0020) model time 0.4415 (0.4472) loss 2.5675 (2.9254) grad_norm 1.9479 (1.8340) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][390/625] eta 0:01:45 lr 0.000569 wd 0.0500 time 0.4442 (0.4492) data time 0.0009 (0.0020) model time 0.4434 (0.4471) loss 2.4608 (2.9248) grad_norm 1.6670 (1.8365) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:16:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][400/625] eta 0:01:41 lr 0.000568 wd 0.0500 time 0.4434 
(0.4491) data time 0.0007 (0.0020) model time 0.4427 (0.4470) loss 1.7222 (2.9180) grad_norm 1.6048 (1.8337) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][410/625] eta 0:01:36 lr 0.000568 wd 0.0500 time 0.4443 (0.4491) data time 0.0007 (0.0020) model time 0.4437 (0.4469) loss 2.4608 (2.9085) grad_norm 2.2313 (1.8295) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][420/625] eta 0:01:32 lr 0.000568 wd 0.0500 time 0.4418 (0.4490) data time 0.0006 (0.0020) model time 0.4412 (0.4468) loss 3.6520 (2.9082) grad_norm 2.3392 (1.8273) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][430/625] eta 0:01:27 lr 0.000568 wd 0.0500 time 0.6398 (0.4493) data time 0.0009 (0.0019) model time 0.6389 (0.4472) loss 1.8537 (2.9024) grad_norm 2.0020 (1.8279) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][440/625] eta 0:01:23 lr 0.000568 wd 0.0500 time 0.4396 (0.4492) data time 0.0009 (0.0019) model time 0.4386 (0.4471) loss 2.2188 (2.9009) grad_norm 1.6427 (1.8278) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][450/625] eta 0:01:18 lr 0.000568 wd 0.0500 time 0.6400 (0.4495) data time 0.0008 (0.0019) model time 0.6392 (0.4475) loss 2.9542 (2.9010) grad_norm 2.3534 (1.8294) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][460/625] eta 0:01:14 lr 0.000568 wd 0.0500 time 0.4430 (0.4494) data time 0.0008 (0.0019) model time 0.4422 (0.4474) loss 3.4424 (2.8994) grad_norm 1.6979 (1.8264) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [165/300][470/625] eta 0:01:09 lr 0.000568 wd 0.0500 time 0.6175 (0.4497) data time 0.0008 (0.0019) model time 0.6167 (0.4478) loss 2.9101 (2.8927) grad_norm 1.2378 (1.8202) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][480/625] eta 0:01:05 lr 0.000568 wd 0.0500 time 0.4442 (0.4496) data time 0.0008 (0.0018) model time 0.4434 (0.4477) loss 3.1169 (2.8957) grad_norm 1.4385 (1.8145) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][490/625] eta 0:01:00 lr 0.000567 wd 0.0500 time 0.4455 (0.4496) data time 0.0009 (0.0018) model time 0.4446 (0.4478) loss 2.0833 (2.8958) grad_norm 2.2016 (1.8227) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][500/625] eta 0:00:56 lr 0.000567 wd 0.0500 time 0.4449 (0.4504) data time 0.0008 (0.0018) model time 0.4440 (0.4486) loss 3.3442 (2.8962) grad_norm 1.5105 (1.8242) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][510/625] eta 0:00:51 lr 0.000567 wd 0.0500 time 0.4437 (0.4503) data time 0.0009 (0.0018) model time 0.4428 (0.4485) loss 3.2088 (2.8946) grad_norm 1.9993 (1.8258) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][520/625] eta 0:00:47 lr 0.000567 wd 0.0500 time 0.4419 (0.4501) data time 0.0032 (0.0018) model time 0.4387 (0.4483) loss 2.2547 (2.8947) grad_norm 2.2048 (1.8268) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][530/625] eta 0:00:42 lr 0.000567 wd 0.0500 time 0.4394 (0.4501) data time 0.0007 (0.0017) model time 0.4387 (0.4483) loss 3.7730 (2.8982) grad_norm 1.5293 (1.8462) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][540/625] eta 0:00:38 lr 0.000567 wd 0.0500 time 0.4429 (0.4500) data time 0.0006 (0.0017) model time 0.4423 (0.4482) loss 3.5550 (2.8942) grad_norm 2.0783 (1.8471) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][550/625] eta 0:00:33 lr 0.000567 wd 0.0500 time 0.4392 (0.4499) data time 0.0006 (0.0017) model time 0.4386 (0.4481) loss 3.5070 (2.8935) grad_norm 1.6328 (1.8432) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][560/625] eta 0:00:29 lr 0.000567 wd 0.0500 time 0.4451 (0.4498) data time 0.0007 (0.0017) model time 0.4445 (0.4480) loss 3.4925 (2.8972) grad_norm 1.7344 (1.8507) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][570/625] eta 0:00:24 lr 0.000567 wd 0.0500 time 0.4453 (0.4497) data time 0.0009 (0.0017) model time 0.4444 (0.4480) loss 2.9517 (2.8969) grad_norm 2.5542 (1.8531) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][580/625] eta 0:00:20 lr 0.000567 wd 0.0500 time 0.4428 (0.4499) data time 0.0008 (0.0017) model time 0.4419 (0.4482) loss 2.1554 (2.8916) grad_norm 1.7249 (1.8471) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][590/625] eta 0:00:15 lr 0.000566 wd 0.0500 time 0.4399 (0.4498) data time 0.0009 (0.0017) model time 0.4390 (0.4481) loss 3.6190 (2.8935) grad_norm 1.5131 (1.8424) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][600/625] eta 0:00:11 lr 0.000566 wd 0.0500 time 0.4397 (0.4497) data time 0.0007 (0.0017) model time 0.4390 (0.4480) loss 2.3690 
(2.8938) grad_norm 1.7372 (1.8425) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][610/625] eta 0:00:06 lr 0.000566 wd 0.0500 time 0.4381 (0.4495) data time 0.0006 (0.0016) model time 0.4376 (0.4478) loss 2.8762 (2.8952) grad_norm 2.7233 (1.8430) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [165/300][620/625] eta 0:00:02 lr 0.000566 wd 0.0500 time 0.4404 (0.4494) data time 0.0006 (0.0016) model time 0.4399 (0.4477) loss 3.1679 (2.8996) grad_norm 1.8932 (1.8472) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 165 training takes 0:04:40 [2024-08-10 15:18:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:18:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5244 (0.5244) Acc@1 88.525 (88.525) Acc@5 98.193 (98.193) Mem 16699MB
[2024-08-10 15:18:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8311 (0.6511) Acc@1 79.590 (85.676) Acc@5 96.240 (97.514) Mem 16699MB
[2024-08-10 15:18:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9644 (0.7646) Acc@1 77.197 (82.703) Acc@5 94.434 (96.263) Mem 16699MB
[2024-08-10 15:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.448 Acc@5 96.277
[2024-08-10 15:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4%
[2024-08-10 15:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.846 (0.846) Loss 0.4729 (0.4729) Acc@1 89.307 (89.307) Acc@5 98.730 (98.730) Mem 16699MB
[2024-08-10 15:18:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.187) Loss 0.7627 (0.5922) Acc@1 81.836 (87.043) Acc@5 96.143 (97.843) Mem 16699MB
[2024-08-10 15:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.8555 (0.6948) Acc@1 78.516 (84.201) Acc@5 95.850 (96.861) Mem 16699MB
[2024-08-10 15:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.905 Acc@5 96.877
[2024-08-10 15:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-10 15:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.91%
[2024-08-10 15:18:44 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 15:18:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 15:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][0/625] eta 0:08:23 lr 0.000566 wd 0.0500 time 0.8060 (0.8060) data time 0.4152 (0.4152) model time 0.0000 (0.0000) loss 3.4202 (3.4202) grad_norm 1.6856 (1.6856) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][10/625] eta 0:04:53 lr 0.000566 wd 0.0500 time 0.4534 (0.4777) data time 0.0010 (0.0386) model time 0.0000 (0.0000) loss 2.9366 (2.9892) grad_norm 1.1958 (1.8441) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][20/625] eta 0:04:38 lr 0.000566 wd 0.0500 time 0.4421 (0.4611) data time 0.0007 (0.0207) model time 0.0000 (0.0000) loss 3.4256 (3.0050) grad_norm 1.4713 (1.6987) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][30/625] eta 0:04:34 lr 0.000566 wd 0.0500 time 0.4430 (0.4612) data time 0.0008 (0.0143) model time 0.0000 (0.0000) loss 2.6884 (2.9024) grad_norm 2.0277 (1.7500) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][40/625] eta 0:04:26 lr 0.000566 wd 0.0500 time 0.4390 (0.4562) data time 0.0006 (0.0110) model time 0.0000 (0.0000) loss 2.8141 (2.8402) grad_norm 1.5526 (1.7575) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][50/625] eta 0:04:20 lr 0.000566 wd 0.0500 time 0.4396 (0.4535) data time 0.0007 (0.0090) model time 0.0000 (0.0000) loss 3.0488 (2.8654) grad_norm 1.4435 (1.7512) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][60/625] eta 0:04:15 lr 0.000565 wd 0.0500 time 0.4448 (0.4530) data time 0.0007 (0.0077) model time 0.4441 (0.4494) loss 3.2049 (2.8688) 
grad_norm 1.6465 (1.7262) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][70/625] eta 0:04:12 lr 0.000565 wd 0.0500 time 0.4426 (0.4541) data time 0.0008 (0.0067) model time 0.4419 (0.4547) loss 3.0129 (2.8567) grad_norm 2.0724 (1.7491) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][80/625] eta 0:04:06 lr 0.000565 wd 0.0500 time 0.4445 (0.4528) data time 0.0009 (0.0060) model time 0.4437 (0.4506) loss 3.0971 (2.8808) grad_norm 2.0055 (1.7618) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][90/625] eta 0:04:03 lr 0.000565 wd 0.0500 time 0.6685 (0.4543) data time 0.0006 (0.0054) model time 0.6678 (0.4545) loss 2.0602 (2.8576) grad_norm 1.6217 (1.7370) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][100/625] eta 0:03:57 lr 0.000565 wd 0.0500 time 0.4466 (0.4533) data time 0.0009 (0.0050) model time 0.4457 (0.4522) loss 2.1008 (2.8594) grad_norm 1.3596 (1.7208) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][110/625] eta 0:03:52 lr 0.000565 wd 0.0500 time 0.4435 (0.4524) data time 0.0009 (0.0046) model time 0.4426 (0.4506) loss 2.9436 (2.8561) grad_norm 1.3779 (1.7170) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][120/625] eta 0:03:48 lr 0.000565 wd 0.0500 time 0.4384 (0.4516) data time 0.0010 (0.0043) model time 0.4374 (0.4493) loss 2.5888 (2.8621) grad_norm 1.6925 (1.7132) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][130/625] eta 0:03:43 lr 0.000565 wd 0.0500 time 0.4435 (0.4524) data 
time 0.0006 (0.0041) model time 0.4429 (0.4509) loss 1.9578 (2.8469) grad_norm 1.7065 (1.7412) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][140/625] eta 0:03:39 lr 0.000565 wd 0.0500 time 0.4439 (0.4518) data time 0.0007 (0.0038) model time 0.4432 (0.4500) loss 3.2818 (2.8596) grad_norm 1.5805 (1.7426) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][150/625] eta 0:03:34 lr 0.000564 wd 0.0500 time 0.4438 (0.4513) data time 0.0006 (0.0036) model time 0.4432 (0.4493) loss 2.5613 (2.8535) grad_norm 1.5584 (1.7402) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:19:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][160/625] eta 0:03:29 lr 0.000564 wd 0.0500 time 0.4410 (0.4508) data time 0.0008 (0.0035) model time 0.4401 (0.4487) loss 3.3354 (2.8509) grad_norm 1.9899 (1.7321) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][170/625] eta 0:03:24 lr 0.000564 wd 0.0500 time 0.4446 (0.4504) data time 0.0009 (0.0033) model time 0.4437 (0.4482) loss 3.7264 (2.8626) grad_norm 1.5366 (1.7386) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][180/625] eta 0:03:20 lr 0.000564 wd 0.0500 time 0.4482 (0.4501) data time 0.0011 (0.0032) model time 0.4471 (0.4478) loss 3.0153 (2.8594) grad_norm 1.5978 (1.7856) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][190/625] eta 0:03:15 lr 0.000564 wd 0.0500 time 0.4467 (0.4497) data time 0.0009 (0.0031) model time 0.4458 (0.4475) loss 2.9131 (2.8566) grad_norm 1.8778 (1.7928) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[166/300][200/625] eta 0:03:10 lr 0.000564 wd 0.0500 time 0.4417 (0.4494) data time 0.0007 (0.0030) model time 0.4410 (0.4471) loss 2.1006 (2.8501) grad_norm 1.9571 (1.7865) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][210/625] eta 0:03:06 lr 0.000564 wd 0.0500 time 0.4443 (0.4491) data time 0.0008 (0.0029) model time 0.4435 (0.4468) loss 2.8532 (2.8473) grad_norm 1.4687 (1.7695) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][220/625] eta 0:03:02 lr 0.000564 wd 0.0500 time 0.4401 (0.4496) data time 0.0006 (0.0028) model time 0.4395 (0.4475) loss 2.1651 (2.8467) grad_norm 1.9604 (1.7802) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][230/625] eta 0:02:57 lr 0.000564 wd 0.0500 time 0.4485 (0.4494) data time 0.0007 (0.0027) model time 0.4478 (0.4473) loss 3.3846 (2.8522) grad_norm 2.0614 (1.7880) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][240/625] eta 0:02:52 lr 0.000563 wd 0.0500 time 0.4444 (0.4491) data time 0.0006 (0.0026) model time 0.4437 (0.4470) loss 1.8693 (2.8445) grad_norm 2.6543 (1.8071) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][250/625] eta 0:02:48 lr 0.000563 wd 0.0500 time 0.4399 (0.4488) data time 0.0007 (0.0026) model time 0.4392 (0.4467) loss 3.0596 (2.8533) grad_norm 1.4406 (1.8184) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][260/625] eta 0:02:43 lr 0.000563 wd 0.0500 time 0.4444 (0.4489) data time 0.0009 (0.0025) model time 0.4435 (0.4469) loss 2.8176 (2.8535) grad_norm 2.0228 (1.8072) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:20:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][270/625] eta 0:02:39 lr 0.000563 wd 0.0500 time 0.4437 (0.4487) data time 0.0007 (0.0024) model time 0.4430 (0.4467) loss 3.1078 (2.8617) grad_norm 1.6788 (1.8286) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][280/625] eta 0:02:34 lr 0.000563 wd 0.0500 time 0.4435 (0.4491) data time 0.0006 (0.0024) model time 0.4429 (0.4472) loss 3.2256 (2.8612) grad_norm 1.3734 (1.8228) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:20:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][290/625] eta 0:02:30 lr 0.000563 wd 0.0500 time 0.4453 (0.4489) data time 0.0006 (0.0023) model time 0.4446 (0.4470) loss 2.5852 (2.8632) grad_norm 1.6508 (1.8345) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][300/625] eta 0:02:25 lr 0.000563 wd 0.0500 time 0.4456 (0.4488) data time 0.0008 (0.0023) model time 0.4448 (0.4469) loss 2.1469 (2.8724) grad_norm 1.5700 (1.8245) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][310/625] eta 0:02:21 lr 0.000563 wd 0.0500 time 0.4423 (0.4487) data time 0.0009 (0.0022) model time 0.4414 (0.4468) loss 2.8626 (2.8661) grad_norm 1.1951 (1.8158) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][320/625] eta 0:02:16 lr 0.000563 wd 0.0500 time 0.4451 (0.4485) data time 0.0008 (0.0022) model time 0.4442 (0.4467) loss 2.1289 (2.8590) grad_norm 1.9382 (1.8177) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][330/625] eta 0:02:12 lr 0.000563 wd 0.0500 time 0.4425 (0.4483) data time 0.0008 (0.0022) model time 0.4417 (0.4464) loss 2.6824 
(2.8633) grad_norm 2.7176 (1.8249) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][340/625] eta 0:02:07 lr 0.000562 wd 0.0500 time 0.4475 (0.4481) data time 0.0008 (0.0021) model time 0.4467 (0.4463) loss 3.0413 (2.8652) grad_norm 1.5627 (1.8277) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][350/625] eta 0:02:03 lr 0.000562 wd 0.0500 time 0.4392 (0.4479) data time 0.0010 (0.0021) model time 0.4382 (0.4461) loss 2.4046 (2.8633) grad_norm 1.3707 (1.8292) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][360/625] eta 0:01:58 lr 0.000562 wd 0.0500 time 0.4430 (0.4479) data time 0.0006 (0.0021) model time 0.4424 (0.4460) loss 3.1198 (2.8680) grad_norm 1.6465 (1.8237) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][370/625] eta 0:01:54 lr 0.000562 wd 0.0500 time 0.4443 (0.4478) data time 0.0010 (0.0020) model time 0.4434 (0.4459) loss 3.1743 (2.8716) grad_norm 2.0881 (1.8236) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][380/625] eta 0:01:49 lr 0.000562 wd 0.0500 time 0.4546 (0.4477) data time 0.0006 (0.0020) model time 0.4541 (0.4459) loss 3.4420 (2.8691) grad_norm 2.1888 (1.8268) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][390/625] eta 0:01:45 lr 0.000562 wd 0.0500 time 0.4434 (0.4476) data time 0.0010 (0.0020) model time 0.4424 (0.4458) loss 1.9289 (2.8637) grad_norm 1.2999 (1.8220) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][400/625] eta 0:01:40 lr 0.000562 wd 0.0500 time 0.4416 
(0.4475) data time 0.0007 (0.0019) model time 0.4409 (0.4457) loss 3.2965 (2.8621) grad_norm 1.8187 (1.8185) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][410/625] eta 0:01:36 lr 0.000562 wd 0.0500 time 0.4456 (0.4474) data time 0.0009 (0.0019) model time 0.4447 (0.4456) loss 2.6700 (2.8562) grad_norm 2.0745 (1.8170) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][420/625] eta 0:01:31 lr 0.000562 wd 0.0500 time 0.6593 (0.4478) data time 0.0006 (0.0019) model time 0.6587 (0.4461) loss 3.2089 (2.8543) grad_norm 1.3459 (1.8081) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][430/625] eta 0:01:27 lr 0.000561 wd 0.0500 time 0.4427 (0.4480) data time 0.0009 (0.0019) model time 0.4417 (0.4464) loss 2.8532 (2.8520) grad_norm 1.5950 (1.8012) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][440/625] eta 0:01:22 lr 0.000561 wd 0.0500 time 0.4456 (0.4484) data time 0.0006 (0.0018) model time 0.4450 (0.4468) loss 3.3145 (2.8590) grad_norm 1.4379 (1.8046) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][450/625] eta 0:01:18 lr 0.000561 wd 0.0500 time 0.4429 (0.4483) data time 0.0006 (0.0018) model time 0.4423 (0.4467) loss 2.2895 (2.8572) grad_norm 1.4513 (1.8128) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][460/625] eta 0:01:14 lr 0.000561 wd 0.0500 time 0.4437 (0.4486) data time 0.0006 (0.0018) model time 0.4430 (0.4471) loss 2.7326 (2.8584) grad_norm 1.2443 (1.8062) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [166/300][470/625] eta 0:01:09 lr 0.000561 wd 0.0500 time 0.4405 (0.4485) data time 0.0010 (0.0018) model time 0.4395 (0.4469) loss 3.2129 (2.8551) grad_norm 6.6117 (1.8142) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][480/625] eta 0:01:05 lr 0.000561 wd 0.0500 time 0.4446 (0.4487) data time 0.0007 (0.0018) model time 0.4439 (0.4472) loss 3.0990 (2.8554) grad_norm 1.7699 (1.8212) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][490/625] eta 0:01:00 lr 0.000561 wd 0.0500 time 0.4400 (0.4486) data time 0.0006 (0.0017) model time 0.4394 (0.4471) loss 1.7479 (2.8571) grad_norm 2.3996 (1.8276) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][500/625] eta 0:00:56 lr 0.000561 wd 0.0500 time 0.4427 (0.4484) data time 0.0008 (0.0017) model time 0.4419 (0.4470) loss 3.0960 (2.8574) grad_norm 2.4066 (1.8309) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][510/625] eta 0:00:51 lr 0.000561 wd 0.0500 time 0.4465 (0.4483) data time 0.0008 (0.0017) model time 0.4457 (0.4469) loss 3.3123 (2.8564) grad_norm 1.7682 (1.8334) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][520/625] eta 0:00:47 lr 0.000561 wd 0.0500 time 0.4396 (0.4483) data time 0.0007 (0.0017) model time 0.4389 (0.4468) loss 2.4748 (2.8557) grad_norm 1.8546 (1.8509) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][530/625] eta 0:00:42 lr 0.000560 wd 0.0500 time 0.4413 (0.4482) data time 0.0007 (0.0017) model time 0.4407 (0.4467) loss 2.1475 (2.8565) grad_norm 1.8486 (1.8589) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][540/625] eta 0:00:38 lr 0.000560 wd 0.0500 time 0.4441 (0.4481) data time 0.0006 (0.0017) model time 0.4435 (0.4466) loss 3.4445 (2.8590) grad_norm 1.3632 (1.8660) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][550/625] eta 0:00:33 lr 0.000560 wd 0.0500 time 0.4431 (0.4480) data time 0.0008 (0.0017) model time 0.4422 (0.4465) loss 3.2677 (2.8615) grad_norm 1.6977 (1.8621) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:22:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][560/625] eta 0:00:29 lr 0.000560 wd 0.0500 time 0.4396 (0.4479) data time 0.0006 (0.0016) model time 0.4390 (0.4464) loss 3.3305 (2.8665) grad_norm 1.4532 (1.8607) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][570/625] eta 0:00:24 lr 0.000560 wd 0.0500 time 0.4409 (0.4478) data time 0.0009 (0.0016) model time 0.4401 (0.4463) loss 2.2231 (2.8696) grad_norm 1.8733 (1.8584) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][580/625] eta 0:00:20 lr 0.000560 wd 0.0500 time 0.4407 (0.4477) data time 0.0007 (0.0016) model time 0.4400 (0.4462) loss 2.4281 (2.8668) grad_norm 1.3438 (1.8573) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][590/625] eta 0:00:15 lr 0.000560 wd 0.0500 time 0.4434 (0.4479) data time 0.0009 (0.0016) model time 0.4425 (0.4465) loss 2.8036 (2.8663) grad_norm 1.3716 (1.8533) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][600/625] eta 0:00:11 lr 0.000560 wd 0.0500 time 0.4434 (0.4479) data time 0.0008 (0.0016) model time 0.4426 (0.4465) loss 2.6786 
(2.8689) grad_norm 1.6604 (1.8479) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][610/625] eta 0:00:06 lr 0.000560 wd 0.0500 time 0.4374 (0.4478) data time 0.0004 (0.0016) model time 0.4370 (0.4464) loss 2.7652 (2.8723) grad_norm 1.5798 (1.8496) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [166/300][620/625] eta 0:00:02 lr 0.000559 wd 0.0500 time 0.4413 (0.4482) data time 0.0004 (0.0016) model time 0.4409 (0.4468) loss 3.3578 (2.8769) grad_norm 1.7071 (1.8519) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 166 training takes 0:04:40 [2024-08-10 15:23:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:23:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.486 (0.486) Loss 0.5103 (0.5103) Acc@1 88.672 (88.672) Acc@5 98.682 (98.682) Mem 16699MB
[2024-08-10 15:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.153) Loss 0.8628 (0.6508) Acc@1 79.980 (85.760) Acc@5 95.410 (97.541) Mem 16699MB
[2024-08-10 15:23:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.8984 (0.7598) Acc@1 77.734 (82.826) Acc@5 95.264 (96.340) Mem 16699MB
[2024-08-10 15:23:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.544 Acc@5 96.289
[2024-08-10 15:23:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5%
[2024-08-10 15:23:31 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.54%
[2024-08-10 15:23:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 15:23:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 15:23:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.4729 (0.4729) Acc@1 89.258 (89.258) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.7617 (0.5921) Acc@1 81.836 (87.056) Acc@5 96.191 (97.834) Mem 16699MB [2024-08-10 15:23:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.8555 (0.6946) Acc@1 78.418 (84.217) Acc@5 95.801 (96.849) Mem 16699MB [2024-08-10 15:23:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.905 Acc@5 96.867 [2024-08-10 15:23:36 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:23:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][0/625] eta 0:13:05 lr 0.000559 wd 0.0500 time 1.2573 (1.2573) data time 0.5790 (0.5790) model time 0.0000 (0.0000) loss 2.9446 (2.9446) grad_norm 1.3288 (1.3288) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][10/625] eta 0:05:18 lr 0.000559 wd 0.0500 time 0.4429 (0.5172) data time 0.0009 (0.0534) model time 0.0000 (0.0000) loss 2.8993 (3.1162) grad_norm 2.3900 (1.8650) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][20/625] eta 0:04:55 lr 0.000559 wd 0.0500 time 0.4440 (0.4886) data time 0.0009 (0.0284) model time 0.0000 (0.0000) loss 2.5330 (2.8516) grad_norm 1.7652 (1.8331) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][30/625] eta 0:04:42 lr 0.000559 wd 0.0500 time 0.4436 (0.4743) data time 0.0009 (0.0195) model time 0.0000 (0.0000) loss 2.2770 (2.7927) grad_norm 1.4249 (1.8193) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:55 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [167/300][40/625] eta 0:04:33 lr 0.000559 wd 0.0500 time 0.4427 (0.4669) data time 0.0007 (0.0150) model time 0.0000 (0.0000) loss 3.0390 (2.7707) grad_norm 1.7405 (2.0354) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:23:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][50/625] eta 0:04:27 lr 0.000559 wd 0.0500 time 0.4425 (0.4657) data time 0.0007 (0.0122) model time 0.0000 (0.0000) loss 1.8581 (2.7588) grad_norm 5.2612 (2.2481) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][60/625] eta 0:04:21 lr 0.000559 wd 0.0500 time 0.4424 (0.4622) data time 0.0009 (0.0104) model time 0.4415 (0.4435) loss 3.7909 (2.8226) grad_norm 1.7469 (2.1847) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][70/625] eta 0:04:14 lr 0.000559 wd 0.0500 time 0.4394 (0.4593) data time 0.0008 (0.0090) model time 0.4385 (0.4422) loss 2.6010 (2.8472) grad_norm 1.6037 (2.0967) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][80/625] eta 0:04:09 lr 0.000559 wd 0.0500 time 0.4424 (0.4573) data time 0.0010 (0.0080) model time 0.4415 (0.4421) loss 2.3194 (2.8659) grad_norm 1.2521 (2.0174) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][90/625] eta 0:04:03 lr 0.000558 wd 0.0500 time 0.4547 (0.4559) data time 0.0007 (0.0073) model time 0.4540 (0.4424) loss 3.4569 (2.8408) grad_norm 1.8025 (1.9669) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][100/625] eta 0:03:58 lr 0.000558 wd 0.0500 time 0.4455 (0.4547) data time 0.0008 (0.0066) model time 0.4447 (0.4426) loss 1.8380 (2.8430) grad_norm 1.8489 (1.9193) loss_scale 
512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][110/625] eta 0:03:53 lr 0.000558 wd 0.0500 time 0.4442 (0.4538) data time 0.0008 (0.0061) model time 0.4434 (0.4428) loss 2.9280 (2.8629) grad_norm 1.5835 (1.8989) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][120/625] eta 0:03:48 lr 0.000558 wd 0.0500 time 0.4405 (0.4529) data time 0.0006 (0.0057) model time 0.4399 (0.4427) loss 2.0315 (2.8519) grad_norm 1.9762 (1.8741) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][130/625] eta 0:03:44 lr 0.000558 wd 0.0500 time 0.4477 (0.4535) data time 0.0008 (0.0053) model time 0.4468 (0.4448) loss 2.5624 (2.8507) grad_norm 1.6760 (1.8491) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][140/625] eta 0:03:40 lr 0.000558 wd 0.0500 time 0.4375 (0.4543) data time 0.0008 (0.0050) model time 0.4367 (0.4470) loss 3.2164 (2.8423) grad_norm 1.6205 (1.8638) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][150/625] eta 0:03:35 lr 0.000558 wd 0.0500 time 0.4409 (0.4535) data time 0.0008 (0.0047) model time 0.4401 (0.4464) loss 3.1705 (2.8537) grad_norm 2.0388 (1.8688) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][160/625] eta 0:03:30 lr 0.000558 wd 0.0500 time 0.4403 (0.4530) data time 0.0009 (0.0045) model time 0.4394 (0.4462) loss 2.9958 (2.8413) grad_norm 1.5103 (1.8743) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][170/625] eta 0:03:25 lr 0.000558 wd 0.0500 time 0.4485 (0.4525) data time 0.0006 (0.0043) model time 
0.4478 (0.4460) loss 3.2186 (2.8386) grad_norm 1.4585 (1.8604) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][180/625] eta 0:03:21 lr 0.000557 wd 0.0500 time 0.4415 (0.4520) data time 0.0006 (0.0041) model time 0.4409 (0.4458) loss 3.6429 (2.8522) grad_norm 1.8068 (1.8465) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][190/625] eta 0:03:16 lr 0.000557 wd 0.0500 time 0.4426 (0.4521) data time 0.0008 (0.0039) model time 0.4418 (0.4463) loss 3.7460 (2.8543) grad_norm 1.6351 (1.8484) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][200/625] eta 0:03:11 lr 0.000557 wd 0.0500 time 0.4428 (0.4517) data time 0.0009 (0.0038) model time 0.4419 (0.4460) loss 3.2338 (2.8522) grad_norm 1.4178 (1.8547) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][210/625] eta 0:03:07 lr 0.000557 wd 0.0500 time 0.4394 (0.4513) data time 0.0006 (0.0036) model time 0.4388 (0.4458) loss 1.7828 (2.8483) grad_norm 1.4568 (1.8523) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][220/625] eta 0:03:02 lr 0.000557 wd 0.0500 time 0.4388 (0.4509) data time 0.0008 (0.0035) model time 0.4380 (0.4456) loss 2.9902 (2.8532) grad_norm 1.5294 (1.8510) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][230/625] eta 0:02:57 lr 0.000557 wd 0.0500 time 0.4460 (0.4506) data time 0.0006 (0.0034) model time 0.4455 (0.4454) loss 3.2122 (2.8539) grad_norm 1.6641 (1.8579) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][240/625] eta 0:02:53 lr 
0.000557 wd 0.0500 time 0.4404 (0.4507) data time 0.0008 (0.0033) model time 0.4396 (0.4457) loss 3.4680 (2.8556) grad_norm 2.3203 (1.8751) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][250/625] eta 0:02:48 lr 0.000557 wd 0.0500 time 0.4423 (0.4503) data time 0.0008 (0.0032) model time 0.4414 (0.4455) loss 2.7857 (2.8485) grad_norm 1.3412 (1.8691) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][260/625] eta 0:02:44 lr 0.000557 wd 0.0500 time 0.4403 (0.4500) data time 0.0008 (0.0031) model time 0.4395 (0.4453) loss 3.2386 (2.8452) grad_norm 1.9202 (1.8733) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][270/625] eta 0:02:39 lr 0.000557 wd 0.0500 time 0.4394 (0.4498) data time 0.0006 (0.0030) model time 0.4387 (0.4452) loss 3.3317 (2.8434) grad_norm 2.4641 (1.8688) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][280/625] eta 0:02:35 lr 0.000556 wd 0.0500 time 0.4472 (0.4495) data time 0.0008 (0.0030) model time 0.4464 (0.4451) loss 2.8644 (2.8414) grad_norm 1.5323 (1.8838) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][290/625] eta 0:02:30 lr 0.000556 wd 0.0500 time 0.4442 (0.4493) data time 0.0006 (0.0029) model time 0.4436 (0.4449) loss 3.1060 (2.8469) grad_norm 1.5707 (1.8908) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][300/625] eta 0:02:25 lr 0.000556 wd 0.0500 time 0.4493 (0.4491) data time 0.0010 (0.0028) model time 0.4483 (0.4448) loss 2.8834 (2.8427) grad_norm 1.5535 (1.8802) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:25:55 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [167/300][310/625] eta 0:02:21 lr 0.000556 wd 0.0500 time 0.4449 (0.4489) data time 0.0008 (0.0028) model time 0.4441 (0.4447) loss 3.1026 (2.8448) grad_norm 2.4011 (1.8837) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][320/625] eta 0:02:16 lr 0.000556 wd 0.0500 time 0.4448 (0.4488) data time 0.0006 (0.0027) model time 0.4442 (0.4447) loss 2.5169 (2.8468) grad_norm 2.4220 (1.8832) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][330/625] eta 0:02:12 lr 0.000556 wd 0.0500 time 0.4418 (0.4486) data time 0.0008 (0.0027) model time 0.4410 (0.4446) loss 3.0864 (2.8571) grad_norm 1.7857 (1.8889) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][340/625] eta 0:02:07 lr 0.000556 wd 0.0500 time 0.4433 (0.4485) data time 0.0006 (0.0026) model time 0.4428 (0.4445) loss 2.8219 (2.8580) grad_norm 1.2145 (1.8817) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][350/625] eta 0:02:03 lr 0.000556 wd 0.0500 time 0.4428 (0.4489) data time 0.0007 (0.0026) model time 0.4421 (0.4451) loss 2.6565 (2.8632) grad_norm 1.2871 (1.8693) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][360/625] eta 0:01:58 lr 0.000556 wd 0.0500 time 0.4395 (0.4487) data time 0.0009 (0.0025) model time 0.4386 (0.4450) loss 2.8861 (2.8589) grad_norm 1.3482 (1.8677) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][370/625] eta 0:01:54 lr 0.000555 wd 0.0500 time 0.4429 (0.4486) data time 0.0008 (0.0025) model time 0.4420 (0.4450) loss 3.1651 (2.8612) grad_norm 1.7341 (1.8582) loss_scale 
512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][380/625] eta 0:01:49 lr 0.000555 wd 0.0500 time 0.4448 (0.4485) data time 0.0006 (0.0024) model time 0.4443 (0.4450) loss 2.3161 (2.8609) grad_norm 1.9782 (1.8578) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][390/625] eta 0:01:45 lr 0.000555 wd 0.0500 time 0.4426 (0.4484) data time 0.0006 (0.0024) model time 0.4419 (0.4449) loss 3.5153 (2.8670) grad_norm 1.8734 (1.8551) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][400/625] eta 0:01:40 lr 0.000555 wd 0.0500 time 0.4445 (0.4483) data time 0.0009 (0.0024) model time 0.4437 (0.4449) loss 2.7096 (2.8673) grad_norm 1.9721 (1.8528) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][410/625] eta 0:01:36 lr 0.000555 wd 0.0500 time 0.4414 (0.4487) data time 0.0007 (0.0023) model time 0.4406 (0.4453) loss 2.1020 (2.8644) grad_norm 1.6780 (1.8476) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][420/625] eta 0:01:31 lr 0.000555 wd 0.0500 time 0.4431 (0.4485) data time 0.0008 (0.0023) model time 0.4422 (0.4452) loss 3.0747 (2.8576) grad_norm 2.5599 (1.8480) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][430/625] eta 0:01:27 lr 0.000555 wd 0.0500 time 0.4445 (0.4484) data time 0.0009 (0.0023) model time 0.4436 (0.4452) loss 3.1334 (2.8588) grad_norm 1.6986 (1.8600) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][440/625] eta 0:01:22 lr 0.000555 wd 0.0500 time 0.4397 (0.4482) data time 0.0008 (0.0022) model time 
0.4389 (0.4451) loss 3.4650 (2.8619) grad_norm 1.4869 (1.8596) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][450/625] eta 0:01:18 lr 0.000555 wd 0.0500 time 0.4407 (0.4482) data time 0.0007 (0.0022) model time 0.4400 (0.4450) loss 3.3034 (2.8611) grad_norm 1.6196 (1.8775) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][460/625] eta 0:01:13 lr 0.000555 wd 0.0500 time 0.4471 (0.4484) data time 0.0008 (0.0022) model time 0.4463 (0.4453) loss 2.7229 (2.8595) grad_norm 1.6308 (1.8716) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][470/625] eta 0:01:09 lr 0.000554 wd 0.0500 time 0.4409 (0.4487) data time 0.0008 (0.0021) model time 0.4401 (0.4457) loss 3.0374 (2.8600) grad_norm 1.1385 (1.8654) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][480/625] eta 0:01:05 lr 0.000554 wd 0.0500 time 0.4443 (0.4491) data time 0.0007 (0.0021) model time 0.4437 (0.4462) loss 2.9337 (2.8636) grad_norm 10.0421 (1.8780) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][490/625] eta 0:01:00 lr 0.000554 wd 0.0500 time 0.4419 (0.4490) data time 0.0006 (0.0021) model time 0.4413 (0.4461) loss 3.7559 (2.8665) grad_norm 1.6013 (1.8759) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][500/625] eta 0:00:56 lr 0.000554 wd 0.0500 time 0.4448 (0.4489) data time 0.0007 (0.0021) model time 0.4441 (0.4460) loss 1.7507 (2.8642) grad_norm 1.8689 (1.8808) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][510/625] eta 0:00:51 lr 
0.000554 wd 0.0500 time 0.4419 (0.4488) data time 0.0009 (0.0020) model time 0.4410 (0.4460) loss 3.3957 (2.8630) grad_norm 1.9444 (1.8849) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][520/625] eta 0:00:47 lr 0.000554 wd 0.0500 time 0.4439 (0.4487) data time 0.0006 (0.0020) model time 0.4433 (0.4459) loss 3.0479 (2.8686) grad_norm 1.1895 (1.8770) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][530/625] eta 0:00:42 lr 0.000554 wd 0.0500 time 0.4447 (0.4486) data time 0.0009 (0.0020) model time 0.4438 (0.4458) loss 3.0601 (2.8673) grad_norm 2.1935 (1.8813) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][540/625] eta 0:00:38 lr 0.000554 wd 0.0500 time 0.4428 (0.4486) data time 0.0008 (0.0020) model time 0.4420 (0.4460) loss 3.2460 (2.8693) grad_norm 1.5033 (1.8825) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][550/625] eta 0:00:33 lr 0.000554 wd 0.0500 time 0.6112 (0.4489) data time 0.0008 (0.0020) model time 0.6104 (0.4462) loss 2.7583 (2.8684) grad_norm 1.8487 (1.8958) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][560/625] eta 0:00:29 lr 0.000553 wd 0.0500 time 0.4455 (0.4488) data time 0.0009 (0.0019) model time 0.4446 (0.4462) loss 2.9610 (2.8707) grad_norm 5.8822 (1.9005) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][570/625] eta 0:00:24 lr 0.000553 wd 0.0500 time 0.4404 (0.4487) data time 0.0008 (0.0019) model time 0.4396 (0.4461) loss 2.3200 (2.8691) grad_norm 2.1250 (1.9026) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:27:56 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [167/300][580/625] eta 0:00:20 lr 0.000553 wd 0.0500 time 0.4412 (0.4486) data time 0.0006 (0.0019) model time 0.4406 (0.4460) loss 3.0095 (2.8677) grad_norm 1.4492 (1.8998) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][590/625] eta 0:00:15 lr 0.000553 wd 0.0500 time 0.4423 (0.4485) data time 0.0009 (0.0019) model time 0.4414 (0.4460) loss 3.1792 (2.8687) grad_norm 1.2075 (1.8938) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][600/625] eta 0:00:11 lr 0.000553 wd 0.0500 time 0.4452 (0.4487) data time 0.0006 (0.0019) model time 0.4446 (0.4462) loss 3.5184 (2.8700) grad_norm 1.4751 (1.8905) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][610/625] eta 0:00:06 lr 0.000553 wd 0.0500 time 0.4410 (0.4486) data time 0.0004 (0.0019) model time 0.4405 (0.4461) loss 1.9689 (2.8655) grad_norm 2.1430 (1.8893) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [167/300][620/625] eta 0:00:02 lr 0.000553 wd 0.0500 time 0.4442 (0.4485) data time 0.0004 (0.0018) model time 0.4438 (0.4461) loss 3.3834 (2.8645) grad_norm 1.2450 (1.8909) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 167 training takes 0:04:40 [2024-08-10 15:28:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:28:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:28:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.493 (0.493) Loss 0.5293 (0.5293) Acc@1 88.525 (88.525) Acc@5 98.682 (98.682) Mem 16699MB [2024-08-10 15:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.154) Loss 0.8950 (0.6618) Acc@1 78.027 (85.645) Acc@5 95.264 (97.474) Mem 16699MB [2024-08-10 15:28:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.136) Loss 0.9824 (0.7778) Acc@1 76.465 (82.673) Acc@5 94.678 (96.277) Mem 16699MB [2024-08-10 15:28:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.424 Acc@5 96.233 [2024-08-10 15:28:21 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 15:28:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.826 (0.826) Loss 0.4717 (0.4717) Acc@1 89.258 (89.258) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.185) Loss 0.7607 (0.5917) Acc@1 81.787 (87.092) Acc@5 96.289 (97.852) Mem 16699MB [2024-08-10 15:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.8564 (0.6944) Acc@1 78.467 (84.205) Acc@5 95.801 (96.859) Mem 16699MB [2024-08-10 15:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.907 Acc@5 96.881 [2024-08-10 15:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.91% [2024-08-10 15:28:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:28:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 15:28:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][0/625] eta 0:08:46 lr 0.000553 wd 0.0500 time 0.8428 (0.8428) data time 0.4575 (0.4575) model time 0.0000 (0.0000) loss 2.7309 (2.7309) grad_norm 2.4428 (2.4428) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][10/625] eta 0:04:54 lr 0.000553 wd 0.0500 time 0.4453 (0.4790) data time 0.0006 (0.0424) model time 0.0000 (0.0000) loss 3.4482 (3.0489) grad_norm 1.7665 (2.0282) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][20/625] eta 0:04:39 lr 0.000553 wd 0.0500 time 0.4417 (0.4618) data time 0.0008 (0.0226) model time 0.0000 (0.0000) loss 3.5669 (2.9774) grad_norm 1.5518 (1.9824) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][30/625] eta 0:04:31 lr 0.000552 wd 0.0500 time 0.4391 (0.4558) data time 0.0009 (0.0157) model time 0.0000 (0.0000) loss 3.4740 (3.0256) grad_norm 1.5050 (1.8379) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][40/625] eta 0:04:24 lr 0.000552 wd 0.0500 time 0.4423 (0.4527) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 2.6988 (2.9536) grad_norm 2.0793 (1.8394) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][50/625] eta 0:04:21 lr 0.000552 wd 0.0500 time 0.4429 (0.4546) data time 0.0009 (0.0099) model time 0.0000 (0.0000) loss 3.2682 (2.9534) grad_norm 1.2807 (1.8661) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][60/625] eta 0:04:16 lr 0.000552 wd 0.0500 time 0.4487 (0.4533) data time 0.0006 (0.0084) model time 0.4481 (0.4454) loss 2.1828 (2.8967) 
grad_norm 2.4319 (1.8521) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:28:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][70/625] eta 0:04:13 lr 0.000552 wd 0.0500 time 0.4440 (0.4559) data time 0.0009 (0.0074) model time 0.4430 (0.4581) loss 3.1473 (2.8903) grad_norm 1.9007 (1.8289) loss_scale 1024.0000 (576.9014) mem 16699MB [2024-08-10 15:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][80/625] eta 0:04:07 lr 0.000552 wd 0.0500 time 0.4446 (0.4543) data time 0.0008 (0.0066) model time 0.4438 (0.4529) loss 1.9624 (2.8739) grad_norm 2.8143 (1.8290) loss_scale 1024.0000 (632.0988) mem 16699MB [2024-08-10 15:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][90/625] eta 0:04:02 lr 0.000552 wd 0.0500 time 0.4382 (0.4531) data time 0.0006 (0.0059) model time 0.4376 (0.4502) loss 1.8681 (2.8813) grad_norm 2.9898 (1.8901) loss_scale 1024.0000 (675.1648) mem 16699MB [2024-08-10 15:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][100/625] eta 0:03:57 lr 0.000552 wd 0.0500 time 0.4443 (0.4521) data time 0.0007 (0.0054) model time 0.4436 (0.4485) loss 2.7262 (2.8839) grad_norm 1.8809 (1.9282) loss_scale 1024.0000 (709.7030) mem 16699MB [2024-08-10 15:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][110/625] eta 0:03:52 lr 0.000552 wd 0.0500 time 0.4479 (0.4513) data time 0.0009 (0.0050) model time 0.4470 (0.4476) loss 3.0708 (2.8642) grad_norm 1.2792 (1.9006) loss_scale 1024.0000 (738.0180) mem 16699MB [2024-08-10 15:29:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][120/625] eta 0:03:48 lr 0.000551 wd 0.0500 time 0.4455 (0.4524) data time 0.0009 (0.0047) model time 0.4446 (0.4498) loss 3.5173 (2.8758) grad_norm 1.3362 (1.8749) loss_scale 1024.0000 (761.6529) mem 16699MB [2024-08-10 15:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][130/625] eta 0:03:43 lr 0.000551 wd 0.0500 time 0.4419 
(0.4517) data time 0.0008 (0.0044) model time 0.4411 (0.4489) loss 3.6320 (2.8847) grad_norm 1.5469 (1.9150) loss_scale 1024.0000 (781.6794) mem 16699MB [2024-08-10 15:29:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][140/625] eta 0:03:38 lr 0.000551 wd 0.0500 time 0.4503 (0.4513) data time 0.0009 (0.0042) model time 0.4494 (0.4484) loss 3.2201 (2.8749) grad_norm 1.5468 (1.8954) loss_scale 1024.0000 (798.8652) mem 16699MB [2024-08-10 15:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][150/625] eta 0:03:34 lr 0.000551 wd 0.0500 time 0.4441 (0.4508) data time 0.0010 (0.0040) model time 0.4432 (0.4479) loss 2.0994 (2.8691) grad_norm 1.6013 (1.8660) loss_scale 1024.0000 (813.7748) mem 16699MB [2024-08-10 15:29:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][160/625] eta 0:03:29 lr 0.000551 wd 0.0500 time 0.4451 (0.4503) data time 0.0006 (0.0038) model time 0.4445 (0.4473) loss 2.2449 (2.8619) grad_norm 1.1464 (1.8422) loss_scale 1024.0000 (826.8323) mem 16699MB [2024-08-10 15:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][170/625] eta 0:03:24 lr 0.000551 wd 0.0500 time 0.4379 (0.4498) data time 0.0010 (0.0036) model time 0.4369 (0.4468) loss 3.4855 (2.8648) grad_norm 1.5196 (1.8413) loss_scale 1024.0000 (838.3626) mem 16699MB [2024-08-10 15:29:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][180/625] eta 0:03:20 lr 0.000551 wd 0.0500 time 0.4448 (0.4495) data time 0.0006 (0.0034) model time 0.4442 (0.4465) loss 2.9257 (2.8711) grad_norm 1.6431 (1.8346) loss_scale 1024.0000 (848.6188) mem 16699MB [2024-08-10 15:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][190/625] eta 0:03:15 lr 0.000551 wd 0.0500 time 0.4449 (0.4493) data time 0.0008 (0.0033) model time 0.4441 (0.4464) loss 3.0590 (2.8710) grad_norm 1.5713 (1.8386) loss_scale 1024.0000 (857.8010) mem 16699MB [2024-08-10 15:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [168/300][200/625] eta 0:03:10 lr 0.000551 wd 0.0500 time 0.4427 (0.4490) data time 0.0008 (0.0032) model time 0.4419 (0.4462) loss 2.8754 (2.8747) grad_norm 1.8461 (1.8252) loss_scale 1024.0000 (866.0697) mem 16699MB [2024-08-10 15:30:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][210/625] eta 0:03:06 lr 0.000551 wd 0.0500 time 0.4424 (0.4493) data time 0.0008 (0.0031) model time 0.4416 (0.4466) loss 2.6285 (2.8835) grad_norm 1.4336 (1.8271) loss_scale 1024.0000 (873.5545) mem 16699MB [2024-08-10 15:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][220/625] eta 0:03:01 lr 0.000550 wd 0.0500 time 0.4451 (0.4491) data time 0.0008 (0.0030) model time 0.4443 (0.4465) loss 3.0300 (2.8862) grad_norm 2.8529 (1.8207) loss_scale 1024.0000 (880.3620) mem 16699MB [2024-08-10 15:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][230/625] eta 0:02:57 lr 0.000550 wd 0.0500 time 0.4418 (0.4489) data time 0.0008 (0.0029) model time 0.4409 (0.4463) loss 2.1481 (2.8767) grad_norm 2.5441 (1.8365) loss_scale 1024.0000 (886.5801) mem 16699MB [2024-08-10 15:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][240/625] eta 0:02:52 lr 0.000550 wd 0.0500 time 0.4474 (0.4493) data time 0.0006 (0.0028) model time 0.4468 (0.4469) loss 3.4686 (2.8770) grad_norm 2.3475 (1.8433) loss_scale 1024.0000 (892.2822) mem 16699MB [2024-08-10 15:30:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][250/625] eta 0:02:48 lr 0.000550 wd 0.0500 time 0.4405 (0.4490) data time 0.0007 (0.0027) model time 0.4398 (0.4467) loss 2.5015 (2.8774) grad_norm 1.4902 (1.8454) loss_scale 1024.0000 (897.5299) mem 16699MB [2024-08-10 15:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][260/625] eta 0:02:44 lr 0.000550 wd 0.0500 time 0.4413 (0.4495) data time 0.0009 (0.0027) model time 0.4404 (0.4474) loss 3.2288 (2.8821) grad_norm 1.7103 (1.8455) loss_scale 1024.0000 
(902.3755) mem 16699MB [2024-08-10 15:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][270/625] eta 0:02:39 lr 0.000550 wd 0.0500 time 0.4433 (0.4493) data time 0.0008 (0.0026) model time 0.4424 (0.4472) loss 3.0684 (2.8836) grad_norm 2.1812 (1.8705) loss_scale 1024.0000 (906.8635) mem 16699MB [2024-08-10 15:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][280/625] eta 0:02:34 lr 0.000550 wd 0.0500 time 0.4429 (0.4492) data time 0.0006 (0.0025) model time 0.4423 (0.4470) loss 1.9079 (2.8857) grad_norm 1.8013 (1.8714) loss_scale 1024.0000 (911.0320) mem 16699MB [2024-08-10 15:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][290/625] eta 0:02:30 lr 0.000550 wd 0.0500 time 0.4390 (0.4496) data time 0.0007 (0.0025) model time 0.4383 (0.4476) loss 1.8111 (2.8762) grad_norm 1.6066 (1.8628) loss_scale 1024.0000 (914.9141) mem 16699MB [2024-08-10 15:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][300/625] eta 0:02:26 lr 0.000550 wd 0.0500 time 0.4402 (0.4493) data time 0.0006 (0.0024) model time 0.4395 (0.4473) loss 2.4713 (2.8597) grad_norm 1.8292 (1.8538) loss_scale 1024.0000 (918.5382) mem 16699MB [2024-08-10 15:30:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][310/625] eta 0:02:21 lr 0.000549 wd 0.0500 time 0.4465 (0.4491) data time 0.0007 (0.0024) model time 0.4458 (0.4471) loss 2.0926 (2.8674) grad_norm 1.7455 (1.8453) loss_scale 1024.0000 (921.9293) mem 16699MB [2024-08-10 15:30:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][320/625] eta 0:02:16 lr 0.000549 wd 0.0500 time 0.4430 (0.4489) data time 0.0006 (0.0023) model time 0.4424 (0.4470) loss 2.7042 (2.8692) grad_norm 1.8777 (1.8393) loss_scale 1024.0000 (925.1090) mem 16699MB [2024-08-10 15:30:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][330/625] eta 0:02:12 lr 0.000549 wd 0.0500 time 0.4388 (0.4488) data time 0.0008 (0.0023) model time 
0.4380 (0.4468) loss 2.1991 (2.8655) grad_norm 1.9533 (1.8404) loss_scale 1024.0000 (928.0967) mem 16699MB [2024-08-10 15:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][340/625] eta 0:02:07 lr 0.000549 wd 0.0500 time 0.4450 (0.4486) data time 0.0008 (0.0023) model time 0.4442 (0.4467) loss 3.0382 (2.8630) grad_norm 0.9357 (1.8292) loss_scale 1024.0000 (930.9091) mem 16699MB [2024-08-10 15:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][350/625] eta 0:02:03 lr 0.000549 wd 0.0500 time 0.4435 (0.4489) data time 0.0009 (0.0022) model time 0.4426 (0.4470) loss 3.0909 (2.8649) grad_norm 1.8233 (1.8538) loss_scale 1024.0000 (933.5613) mem 16699MB [2024-08-10 15:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][360/625] eta 0:01:58 lr 0.000549 wd 0.0500 time 0.4421 (0.4488) data time 0.0009 (0.0022) model time 0.4412 (0.4469) loss 3.0266 (2.8717) grad_norm 1.5394 (1.8557) loss_scale 1024.0000 (936.0665) mem 16699MB [2024-08-10 15:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][370/625] eta 0:01:54 lr 0.000549 wd 0.0500 time 0.4414 (0.4486) data time 0.0006 (0.0022) model time 0.4408 (0.4467) loss 1.6380 (2.8693) grad_norm 1.6638 (1.8469) loss_scale 1024.0000 (938.4367) mem 16699MB [2024-08-10 15:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][380/625] eta 0:01:49 lr 0.000549 wd 0.0500 time 0.4407 (0.4486) data time 0.0007 (0.0021) model time 0.4401 (0.4468) loss 2.4733 (2.8716) grad_norm 1.2654 (1.8367) loss_scale 1024.0000 (940.6824) mem 16699MB [2024-08-10 15:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][390/625] eta 0:01:45 lr 0.000549 wd 0.0500 time 0.4442 (0.4485) data time 0.0006 (0.0021) model time 0.4436 (0.4466) loss 3.1250 (2.8681) grad_norm 1.6444 (1.8294) loss_scale 1024.0000 (942.8133) mem 16699MB [2024-08-10 15:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][400/625] eta 0:01:40 
lr 0.000549 wd 0.0500 time 0.4419 (0.4483) data time 0.0007 (0.0021) model time 0.4413 (0.4465) loss 2.3703 (2.8663) grad_norm 1.6970 (1.8251) loss_scale 1024.0000 (944.8379) mem 16699MB [2024-08-10 15:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][410/625] eta 0:01:36 lr 0.000548 wd 0.0500 time 0.4441 (0.4482) data time 0.0009 (0.0021) model time 0.4431 (0.4464) loss 3.3778 (2.8688) grad_norm 2.2940 (1.8410) loss_scale 1024.0000 (946.7640) mem 16699MB [2024-08-10 15:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][420/625] eta 0:01:31 lr 0.000548 wd 0.0500 time 0.4422 (0.4482) data time 0.0007 (0.0020) model time 0.4415 (0.4463) loss 1.8417 (2.8671) grad_norm 2.5747 (1.8472) loss_scale 1024.0000 (948.5986) mem 16699MB [2024-08-10 15:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][430/625] eta 0:01:27 lr 0.000548 wd 0.0500 time 0.4431 (0.4484) data time 0.0006 (0.0020) model time 0.4425 (0.4466) loss 2.3703 (2.8656) grad_norm 1.2568 (1.8502) loss_scale 1024.0000 (950.3480) mem 16699MB [2024-08-10 15:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][440/625] eta 0:01:22 lr 0.000548 wd 0.0500 time 0.4416 (0.4483) data time 0.0008 (0.0020) model time 0.4408 (0.4465) loss 3.0368 (2.8642) grad_norm 1.7688 (1.8458) loss_scale 1024.0000 (952.0181) mem 16699MB [2024-08-10 15:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][450/625] eta 0:01:18 lr 0.000548 wd 0.0500 time 0.4448 (0.4487) data time 0.0007 (0.0020) model time 0.4441 (0.4470) loss 2.5674 (2.8624) grad_norm 2.7584 (1.8454) loss_scale 1024.0000 (953.6142) mem 16699MB [2024-08-10 15:31:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][460/625] eta 0:01:14 lr 0.000548 wd 0.0500 time 0.4418 (0.4490) data time 0.0006 (0.0019) model time 0.4412 (0.4474) loss 3.0689 (2.8611) grad_norm 1.2255 (1.8417) loss_scale 1024.0000 (955.1410) mem 16699MB [2024-08-10 15:31:57 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][470/625] eta 0:01:09 lr 0.000548 wd 0.0500 time 0.4446 (0.4489) data time 0.0006 (0.0019) model time 0.4439 (0.4473) loss 3.5870 (2.8604) grad_norm 2.1539 (1.8397) loss_scale 1024.0000 (956.6030) mem 16699MB [2024-08-10 15:32:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][480/625] eta 0:01:05 lr 0.000548 wd 0.0500 time 0.4433 (0.4494) data time 0.0007 (0.0019) model time 0.4427 (0.4478) loss 3.2878 (2.8633) grad_norm 1.4315 (1.8414) loss_scale 1024.0000 (958.0042) mem 16699MB [2024-08-10 15:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][490/625] eta 0:01:00 lr 0.000548 wd 0.0500 time 0.4431 (0.4493) data time 0.0007 (0.0019) model time 0.4425 (0.4477) loss 2.2174 (2.8569) grad_norm 1.6952 (1.8372) loss_scale 1024.0000 (959.3483) mem 16699MB [2024-08-10 15:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][500/625] eta 0:00:56 lr 0.000547 wd 0.0500 time 0.4426 (0.4491) data time 0.0009 (0.0019) model time 0.4417 (0.4476) loss 2.4198 (2.8584) grad_norm 1.2167 (1.8358) loss_scale 1024.0000 (960.6387) mem 16699MB [2024-08-10 15:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][510/625] eta 0:00:51 lr 0.000547 wd 0.0500 time 0.4435 (0.4490) data time 0.0008 (0.0018) model time 0.4426 (0.4475) loss 2.4922 (2.8617) grad_norm 1.5546 (1.8307) loss_scale 1024.0000 (961.8787) mem 16699MB [2024-08-10 15:32:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][520/625] eta 0:00:47 lr 0.000547 wd 0.0500 time 0.4425 (0.4489) data time 0.0007 (0.0018) model time 0.4418 (0.4474) loss 3.1330 (2.8664) grad_norm 3.3738 (1.8335) loss_scale 1024.0000 (963.0710) mem 16699MB [2024-08-10 15:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][530/625] eta 0:00:42 lr 0.000547 wd 0.0500 time 0.4441 (0.4488) data time 0.0007 (0.0018) model time 0.4434 (0.4473) loss 3.6065 (2.8689) grad_norm 
2.3007 (1.8339) loss_scale 1024.0000 (964.2185) mem 16699MB [2024-08-10 15:32:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][540/625] eta 0:00:38 lr 0.000547 wd 0.0500 time 0.4412 (0.4487) data time 0.0009 (0.0018) model time 0.4404 (0.4472) loss 2.9534 (2.8655) grad_norm 1.4586 (1.8338) loss_scale 1024.0000 (965.3235) mem 16699MB [2024-08-10 15:32:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][550/625] eta 0:00:33 lr 0.000547 wd 0.0500 time 0.4411 (0.4486) data time 0.0007 (0.0018) model time 0.4405 (0.4470) loss 2.3755 (2.8664) grad_norm 1.2335 (1.8292) loss_scale 1024.0000 (966.3884) mem 16699MB [2024-08-10 15:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][560/625] eta 0:00:29 lr 0.000547 wd 0.0500 time 0.4465 (0.4485) data time 0.0006 (0.0018) model time 0.4459 (0.4470) loss 2.9370 (2.8619) grad_norm 1.6146 (1.8266) loss_scale 1024.0000 (967.4153) mem 16699MB [2024-08-10 15:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][570/625] eta 0:00:24 lr 0.000547 wd 0.0500 time 0.4441 (0.4484) data time 0.0006 (0.0017) model time 0.4434 (0.4469) loss 3.1410 (2.8597) grad_norm 2.4435 (1.8300) loss_scale 1024.0000 (968.4063) mem 16699MB [2024-08-10 15:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][580/625] eta 0:00:20 lr 0.000547 wd 0.0500 time 0.4418 (0.4483) data time 0.0009 (0.0017) model time 0.4409 (0.4468) loss 3.3390 (2.8609) grad_norm 1.7712 (1.8294) loss_scale 1024.0000 (969.3632) mem 16699MB [2024-08-10 15:32:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][590/625] eta 0:00:15 lr 0.000546 wd 0.0500 time 0.4443 (0.4482) data time 0.0006 (0.0017) model time 0.4437 (0.4467) loss 3.0525 (2.8565) grad_norm 2.1451 (1.8450) loss_scale 1024.0000 (970.2876) mem 16699MB [2024-08-10 15:32:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][600/625] eta 0:00:11 lr 0.000546 wd 0.0500 time 0.4430 (0.4482) data 
time 0.0008 (0.0017) model time 0.4422 (0.4466) loss 3.1813 (2.8531) grad_norm 1.3742 (1.8422) loss_scale 1024.0000 (971.1814) mem 16699MB [2024-08-10 15:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][610/625] eta 0:00:06 lr 0.000546 wd 0.0500 time 0.4370 (0.4484) data time 0.0006 (0.0017) model time 0.4364 (0.4469) loss 2.5507 (2.8539) grad_norm 1.2309 (1.8359) loss_scale 1024.0000 (972.0458) mem 16699MB [2024-08-10 15:33:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [168/300][620/625] eta 0:00:02 lr 0.000546 wd 0.0500 time 0.4392 (0.4482) data time 0.0006 (0.0017) model time 0.4386 (0.4467) loss 2.4244 (2.8556) grad_norm 1.5701 (1.8339) loss_scale 1024.0000 (972.8824) mem 16699MB [2024-08-10 15:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 168 training takes 0:04:40 [2024-08-10 15:33:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:33:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.508 (0.508) Loss 0.5142 (0.5142) Acc@1 88.770 (88.770) Acc@5 98.633 (98.633) Mem 16699MB [2024-08-10 15:33:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.156) Loss 0.8325 (0.6382) Acc@1 80.127 (85.702) Acc@5 95.947 (97.554) Mem 16699MB [2024-08-10 15:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.137) Loss 0.9087 (0.7566) Acc@1 77.539 (82.801) Acc@5 94.824 (96.322) Mem 16699MB [2024-08-10 15:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.514 Acc@5 96.335 [2024-08-10 15:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.5% [2024-08-10 15:33:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.837 (0.837) Loss 0.4707 (0.4707) Acc@1 89.209 (89.209) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:33:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.187) Loss 0.7603 (0.5911) Acc@1 81.592 (87.007) Acc@5 96.143 (97.838) Mem 16699MB [2024-08-10 15:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.153) Loss 0.8564 (0.6937) Acc@1 78.564 (84.180) Acc@5 95.850 (96.847) Mem 16699MB [2024-08-10 15:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.883 Acc@5 96.869 [2024-08-10 15:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:33:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][0/625] eta 0:12:52 lr 0.000546 wd 0.0500 time 1.2363 (1.2363) data time 0.4683 (0.4683) model time 0.0000 (0.0000) loss 3.0814 (3.0814) grad_norm 1.4008 (1.4008) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][10/625] eta 0:05:19 lr 0.000546 wd 0.0500 time 0.4489 (0.5193) data 
time 0.0006 (0.0434) model time 0.0000 (0.0000) loss 2.8856 (2.8087) grad_norm 2.1233 (1.7729) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][20/625] eta 0:04:52 lr 0.000546 wd 0.0500 time 0.4438 (0.4842) data time 0.0007 (0.0232) model time 0.0000 (0.0000) loss 3.0247 (2.8640) grad_norm 2.2482 (1.8318) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][30/625] eta 0:04:46 lr 0.000546 wd 0.0500 time 0.4444 (0.4816) data time 0.0006 (0.0160) model time 0.0000 (0.0000) loss 3.1787 (2.8209) grad_norm 1.8376 (1.7598) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][40/625] eta 0:04:36 lr 0.000546 wd 0.0500 time 0.4421 (0.4721) data time 0.0008 (0.0123) model time 0.0000 (0.0000) loss 3.1378 (2.7938) grad_norm 1.8145 (1.7815) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][50/625] eta 0:04:28 lr 0.000546 wd 0.0500 time 0.4425 (0.4664) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 2.5705 (2.8156) grad_norm 2.9398 (1.8407) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][60/625] eta 0:04:23 lr 0.000545 wd 0.0500 time 0.4411 (0.4658) data time 0.0007 (0.0086) model time 0.4404 (0.4619) loss 2.0728 (2.8410) grad_norm 2.6504 (1.8705) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][70/625] eta 0:04:16 lr 0.000545 wd 0.0500 time 0.4483 (0.4629) data time 0.0006 (0.0075) model time 0.4476 (0.4529) loss 3.4297 (2.8634) grad_norm 3.4549 (1.9538) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [169/300][80/625] eta 0:04:11 lr 0.000545 wd 0.0500 time 0.4400 (0.4607) data time 0.0007 (0.0067) model time 0.4393 (0.4501) loss 3.5597 (2.9199) grad_norm 1.3931 (1.9285) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:33:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][90/625] eta 0:04:05 lr 0.000545 wd 0.0500 time 0.4424 (0.4589) data time 0.0008 (0.0060) model time 0.4416 (0.4485) loss 3.3288 (2.9141) grad_norm 1.8466 (1.9979) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][100/625] eta 0:04:00 lr 0.000545 wd 0.0500 time 0.4407 (0.4575) data time 0.0007 (0.0055) model time 0.4400 (0.4474) loss 3.3491 (2.9122) grad_norm 3.8335 (2.0117) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][110/625] eta 0:03:54 lr 0.000545 wd 0.0500 time 0.4413 (0.4562) data time 0.0007 (0.0051) model time 0.4406 (0.4465) loss 1.9285 (2.9079) grad_norm 1.7909 (2.0824) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][120/625] eta 0:03:49 lr 0.000545 wd 0.0500 time 0.4442 (0.4550) data time 0.0010 (0.0048) model time 0.4432 (0.4458) loss 2.6820 (2.9120) grad_norm 9.5330 (2.1662) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][130/625] eta 0:03:44 lr 0.000545 wd 0.0500 time 0.4444 (0.4541) data time 0.0008 (0.0045) model time 0.4436 (0.4453) loss 3.0548 (2.9206) grad_norm 1.6476 (2.1400) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][140/625] eta 0:03:39 lr 0.000545 wd 0.0500 time 0.4467 (0.4534) data time 0.0009 (0.0042) model time 0.4458 (0.4451) loss 2.8176 (2.9059) grad_norm 1.3342 (2.1145) loss_scale 1024.0000 (1024.0000) 
mem 16699MB [2024-08-10 15:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][150/625] eta 0:03:35 lr 0.000545 wd 0.0500 time 0.4423 (0.4528) data time 0.0008 (0.0040) model time 0.4415 (0.4448) loss 2.9443 (2.9125) grad_norm 2.0833 (2.1335) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][160/625] eta 0:03:30 lr 0.000544 wd 0.0500 time 0.4452 (0.4522) data time 0.0006 (0.0038) model time 0.4447 (0.4447) loss 3.2852 (2.9154) grad_norm 1.3489 (2.1345) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][170/625] eta 0:03:25 lr 0.000544 wd 0.0500 time 0.4417 (0.4516) data time 0.0008 (0.0037) model time 0.4410 (0.4444) loss 2.7118 (2.9173) grad_norm 2.1170 (2.1139) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][180/625] eta 0:03:20 lr 0.000544 wd 0.0500 time 0.4399 (0.4511) data time 0.0007 (0.0035) model time 0.4392 (0.4441) loss 3.0002 (2.9134) grad_norm 1.5057 (2.1055) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][190/625] eta 0:03:16 lr 0.000544 wd 0.0500 time 0.4419 (0.4515) data time 0.0006 (0.0034) model time 0.4413 (0.4452) loss 2.9954 (2.9175) grad_norm 1.1810 (2.0765) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][200/625] eta 0:03:11 lr 0.000544 wd 0.0500 time 0.4383 (0.4510) data time 0.0009 (0.0032) model time 0.4374 (0.4449) loss 3.1135 (2.9209) grad_norm 2.1979 (2.0627) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][210/625] eta 0:03:07 lr 0.000544 wd 0.0500 time 0.4512 (0.4517) data time 0.0008 (0.0031) model time 0.4505 
(0.4461) loss 3.0766 (2.9175) grad_norm 1.8333 (2.0753) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][220/625] eta 0:03:02 lr 0.000544 wd 0.0500 time 0.4438 (0.4516) data time 0.0007 (0.0030) model time 0.4431 (0.4463) loss 3.0822 (2.9291) grad_norm 2.2139 (2.0779) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][230/625] eta 0:02:58 lr 0.000544 wd 0.0500 time 0.4412 (0.4513) data time 0.0008 (0.0029) model time 0.4404 (0.4461) loss 2.7548 (2.9197) grad_norm 1.0761 (2.0687) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][240/625] eta 0:02:53 lr 0.000544 wd 0.0500 time 0.4433 (0.4509) data time 0.0005 (0.0029) model time 0.4428 (0.4459) loss 3.1843 (2.9334) grad_norm 1.3516 (2.0448) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][250/625] eta 0:02:48 lr 0.000543 wd 0.0500 time 0.4408 (0.4506) data time 0.0009 (0.0028) model time 0.4399 (0.4456) loss 2.9691 (2.9343) grad_norm 2.1421 (2.0254) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][260/625] eta 0:02:44 lr 0.000543 wd 0.0500 time 0.4411 (0.4503) data time 0.0008 (0.0027) model time 0.4403 (0.4454) loss 2.3947 (2.9382) grad_norm 1.6486 (2.0053) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][270/625] eta 0:02:39 lr 0.000543 wd 0.0500 time 0.4397 (0.4501) data time 0.0007 (0.0026) model time 0.4390 (0.4454) loss 2.0311 (2.9344) grad_norm 3.5739 (2.0137) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][280/625] eta 0:02:35 
lr 0.000543 wd 0.0500 time 0.4440 (0.4504) data time 0.0007 (0.0026) model time 0.4433 (0.4460) loss 2.8445 (2.9332) grad_norm 1.5808 (2.0068) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][290/625] eta 0:02:30 lr 0.000543 wd 0.0500 time 0.4438 (0.4503) data time 0.0007 (0.0025) model time 0.4431 (0.4459) loss 3.3570 (2.9308) grad_norm 2.2905 (2.0043) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][300/625] eta 0:02:26 lr 0.000543 wd 0.0500 time 0.4419 (0.4500) data time 0.0008 (0.0025) model time 0.4410 (0.4458) loss 3.3742 (2.9274) grad_norm 1.9205 (1.9984) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][310/625] eta 0:02:21 lr 0.000543 wd 0.0500 time 0.4419 (0.4498) data time 0.0007 (0.0024) model time 0.4412 (0.4457) loss 2.1611 (2.9286) grad_norm 2.1401 (2.0000) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][320/625] eta 0:02:17 lr 0.000543 wd 0.0500 time 0.4441 (0.4496) data time 0.0009 (0.0024) model time 0.4432 (0.4455) loss 1.8615 (2.9212) grad_norm 2.5971 (2.0119) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][330/625] eta 0:02:12 lr 0.000543 wd 0.0500 time 0.4408 (0.4494) data time 0.0007 (0.0023) model time 0.4401 (0.4453) loss 3.4608 (2.9177) grad_norm 2.2990 (2.0020) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][340/625] eta 0:02:08 lr 0.000543 wd 0.0500 time 0.4416 (0.4491) data time 0.0008 (0.0023) model time 0.4407 (0.4452) loss 3.0302 (2.9157) grad_norm 1.6578 (2.0070) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:52 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][350/625] eta 0:02:03 lr 0.000542 wd 0.0500 time 0.4443 (0.4490) data time 0.0006 (0.0023) model time 0.4437 (0.4451) loss 2.9697 (2.9134) grad_norm 1.3957 (1.9948) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:35:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][360/625] eta 0:01:59 lr 0.000542 wd 0.0500 time 0.4423 (0.4498) data time 0.0008 (0.0022) model time 0.4415 (0.4461) loss 3.7730 (2.9174) grad_norm 3.6404 (2.0053) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][370/625] eta 0:01:54 lr 0.000542 wd 0.0500 time 0.4458 (0.4496) data time 0.0006 (0.0022) model time 0.4451 (0.4460) loss 3.4550 (2.9250) grad_norm 1.7350 (2.0117) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][380/625] eta 0:01:50 lr 0.000542 wd 0.0500 time 0.4532 (0.4495) data time 0.0007 (0.0021) model time 0.4525 (0.4459) loss 2.4338 (2.9211) grad_norm 1.9917 (2.0057) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][390/625] eta 0:01:45 lr 0.000542 wd 0.0500 time 0.4414 (0.4493) data time 0.0007 (0.0021) model time 0.4407 (0.4458) loss 3.2256 (2.9217) grad_norm 4.2718 (2.0178) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][400/625] eta 0:01:41 lr 0.000542 wd 0.0500 time 0.4407 (0.4492) data time 0.0006 (0.0021) model time 0.4401 (0.4457) loss 3.2425 (2.9209) grad_norm 3.2194 (2.0177) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][410/625] eta 0:01:36 lr 0.000542 wd 0.0500 time 0.4418 (0.4495) data time 0.0011 (0.0021) model time 0.4407 (0.4462) loss 3.2564 (2.9176) 
grad_norm 1.2250 (2.0072) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][420/625] eta 0:01:32 lr 0.000542 wd 0.0500 time 0.3892 (0.4497) data time 0.0009 (0.0021) model time 0.3882 (0.4465) loss 2.5695 (2.9143) grad_norm 1.7255 (2.0017) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][430/625] eta 0:01:27 lr 0.000542 wd 0.0500 time 0.4478 (0.4496) data time 0.0008 (0.0020) model time 0.4470 (0.4464) loss 1.8630 (2.9099) grad_norm 2.0036 (2.0006) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][440/625] eta 0:01:23 lr 0.000541 wd 0.0500 time 0.4438 (0.4498) data time 0.0007 (0.0020) model time 0.4432 (0.4467) loss 3.0298 (2.9082) grad_norm 1.4966 (1.9962) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][450/625] eta 0:01:18 lr 0.000541 wd 0.0500 time 0.4440 (0.4497) data time 0.0009 (0.0020) model time 0.4431 (0.4466) loss 3.3701 (2.9032) grad_norm 1.7341 (1.9906) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][460/625] eta 0:01:14 lr 0.000541 wd 0.0500 time 0.4414 (0.4495) data time 0.0006 (0.0020) model time 0.4408 (0.4465) loss 2.1788 (2.9086) grad_norm 1.4374 (1.9848) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][470/625] eta 0:01:09 lr 0.000541 wd 0.0500 time 0.4443 (0.4494) data time 0.0008 (0.0019) model time 0.4434 (0.4464) loss 2.7473 (2.9068) grad_norm 1.9097 (1.9785) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][480/625] eta 0:01:05 lr 0.000541 wd 0.0500 time 
0.4462 (0.4493) data time 0.0006 (0.0019) model time 0.4456 (0.4463) loss 2.9923 (2.9093) grad_norm 1.6929 (1.9715) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:36:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][490/625] eta 0:01:00 lr 0.000541 wd 0.0500 time 0.4393 (0.4492) data time 0.0007 (0.0019) model time 0.4386 (0.4462) loss 2.9216 (2.9096) grad_norm 2.2074 (1.9640) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][500/625] eta 0:00:56 lr 0.000541 wd 0.0500 time 0.4450 (0.4491) data time 0.0008 (0.0019) model time 0.4442 (0.4461) loss 2.4328 (2.9114) grad_norm 1.8604 (1.9636) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][510/625] eta 0:00:51 lr 0.000541 wd 0.0500 time 0.4453 (0.4490) data time 0.0008 (0.0019) model time 0.4445 (0.4461) loss 2.5483 (2.9109) grad_norm 1.4079 (1.9587) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][520/625] eta 0:00:47 lr 0.000541 wd 0.0500 time 0.4436 (0.4489) data time 0.0007 (0.0018) model time 0.4429 (0.4460) loss 2.5819 (2.9039) grad_norm 1.4221 (1.9493) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][530/625] eta 0:00:42 lr 0.000540 wd 0.0500 time 0.4462 (0.4488) data time 0.0007 (0.0018) model time 0.4456 (0.4460) loss 3.3947 (2.9021) grad_norm 3.6189 (1.9693) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][540/625] eta 0:00:38 lr 0.000540 wd 0.0500 time 0.4426 (0.4487) data time 0.0010 (0.0018) model time 0.4416 (0.4459) loss 2.9300 (2.9035) grad_norm 1.5825 (1.9696) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:22 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [169/300][550/625] eta 0:00:33 lr 0.000540 wd 0.0500 time 0.4430 (0.4487) data time 0.0007 (0.0018) model time 0.4423 (0.4460) loss 3.5544 (2.9033) grad_norm 2.4168 (1.9679) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][560/625] eta 0:00:29 lr 0.000540 wd 0.0500 time 0.4383 (0.4490) data time 0.0009 (0.0018) model time 0.4374 (0.4463) loss 2.8790 (2.9019) grad_norm 1.3120 (1.9610) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][570/625] eta 0:00:24 lr 0.000540 wd 0.0500 time 0.4435 (0.4489) data time 0.0009 (0.0018) model time 0.4426 (0.4462) loss 2.9129 (2.8993) grad_norm 1.8749 (1.9540) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][580/625] eta 0:00:20 lr 0.000540 wd 0.0500 time 0.6148 (0.4491) data time 0.0008 (0.0017) model time 0.6140 (0.4465) loss 3.2366 (2.9002) grad_norm 1.5562 (1.9515) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][590/625] eta 0:00:15 lr 0.000540 wd 0.0500 time 0.4414 (0.4490) data time 0.0009 (0.0017) model time 0.4405 (0.4464) loss 2.8480 (2.9016) grad_norm 3.6104 (1.9529) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][600/625] eta 0:00:11 lr 0.000540 wd 0.0500 time 0.4466 (0.4489) data time 0.0009 (0.0017) model time 0.4457 (0.4463) loss 2.9670 (2.9002) grad_norm 2.1453 (1.9560) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][610/625] eta 0:00:06 lr 0.000540 wd 0.0500 time 0.4377 (0.4487) data time 0.0004 (0.0017) model time 0.4373 (0.4462) loss 2.2428 (2.8927) grad_norm 1.7135 
(1.9539) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [169/300][620/625] eta 0:00:02 lr 0.000540 wd 0.0500 time 0.4354 (0.4486) data time 0.0004 (0.0017) model time 0.4350 (0.4460) loss 3.4232 (2.8944) grad_norm 2.1250 (1.9556) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:37:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 169 training takes 0:04:40 [2024-08-10 15:37:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:37:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 15:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5498 (0.5498) Acc@1 88.330 (88.330) Acc@5 98.291 (98.291) Mem 16699MB [2024-08-10 15:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8608 (0.6566) Acc@1 79.980 (85.684) Acc@5 95.459 (97.559) Mem 16699MB [2024-08-10 15:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.9414 (0.7714) Acc@1 76.562 (82.678) Acc@5 94.775 (96.343) Mem 16699MB [2024-08-10 15:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.432 Acc@5 96.329 [2024-08-10 15:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.4% [2024-08-10 15:38:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.825 (0.825) Loss 0.4702 (0.4702) Acc@1 89.355 (89.355) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:38:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.185) Loss 0.7607 (0.5912) Acc@1 81.641 (87.083) Acc@5 96.045 (97.820) Mem 16699MB [2024-08-10 15:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.152) Loss 
0.8550 (0.6935) Acc@1 78.467 (84.231) Acc@5 95.801 (96.870) Mem 16699MB [2024-08-10 15:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.907 Acc@5 96.881 [2024-08-10 15:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.91% [2024-08-10 15:38:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:38:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 15:38:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][0/625] eta 0:07:43 lr 0.000539 wd 0.0500 time 0.7418 (0.7418) data time 0.3497 (0.3497) model time 0.0000 (0.0000) loss 2.2560 (2.2560) grad_norm 2.1408 (2.1408) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][10/625] eta 0:04:49 lr 0.000539 wd 0.0500 time 0.4440 (0.4707) data time 0.0006 (0.0327) model time 0.0000 (0.0000) loss 3.4575 (2.9190) grad_norm 1.9208 (2.0535) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][20/625] eta 0:04:42 lr 0.000539 wd 0.0500 time 0.6126 (0.4662) data time 0.0007 (0.0175) model time 0.0000 (0.0000) loss 3.2767 (2.9061) grad_norm 1.3297 (1.7348) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][30/625] eta 0:04:37 lr 0.000539 wd 0.0500 time 0.4414 (0.4656) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 3.2914 (2.8601) grad_norm 1.8433 (1.8130) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][40/625] eta 0:04:29 lr 
0.000539 wd 0.0500 time 0.4437 (0.4600) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 2.4374 (2.8479) grad_norm 1.4430 (1.8167) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][50/625] eta 0:04:22 lr 0.000539 wd 0.0500 time 0.4477 (0.4565) data time 0.0006 (0.0077) model time 0.0000 (0.0000) loss 3.1439 (2.9101) grad_norm 1.3858 (1.7815) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][60/625] eta 0:04:16 lr 0.000539 wd 0.0500 time 0.4423 (0.4541) data time 0.0008 (0.0066) model time 0.4415 (0.4411) loss 2.9900 (2.8793) grad_norm 1.8845 (1.7692) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][70/625] eta 0:04:11 lr 0.000539 wd 0.0500 time 0.4373 (0.4523) data time 0.0006 (0.0058) model time 0.4367 (0.4407) loss 2.5985 (2.8436) grad_norm 1.1581 (1.7418) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][80/625] eta 0:04:05 lr 0.000539 wd 0.0500 time 0.4414 (0.4509) data time 0.0006 (0.0052) model time 0.4408 (0.4405) loss 3.2854 (2.8533) grad_norm 1.4142 (1.7470) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][90/625] eta 0:04:00 lr 0.000539 wd 0.0500 time 0.4441 (0.4502) data time 0.0007 (0.0047) model time 0.4434 (0.4412) loss 3.4027 (2.8385) grad_norm 1.3684 (1.7397) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][100/625] eta 0:03:55 lr 0.000538 wd 0.0500 time 0.4443 (0.4494) data time 0.0008 (0.0043) model time 0.4435 (0.4413) loss 2.9464 (2.8298) grad_norm 1.6478 (1.8020) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:38:55 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][110/625] eta 0:03:51 lr 0.000538 wd 0.0500 time 0.4431 (0.4490) data time 0.0009 (0.0041) model time 0.4422 (0.4416) loss 2.5785 (2.8291) grad_norm 1.6973 (1.8280) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][120/625] eta 0:03:47 lr 0.000538 wd 0.0500 time 0.4439 (0.4503) data time 0.0008 (0.0038) model time 0.4431 (0.4448) loss 3.2073 (2.8312) grad_norm 1.1625 (1.8154) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][130/625] eta 0:03:43 lr 0.000538 wd 0.0500 time 0.4422 (0.4524) data time 0.0009 (0.0036) model time 0.4413 (0.4488) loss 3.2256 (2.8437) grad_norm 2.0746 (1.8185) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][140/625] eta 0:03:39 lr 0.000538 wd 0.0500 time 0.4394 (0.4516) data time 0.0010 (0.0034) model time 0.4385 (0.4480) loss 2.9863 (2.8575) grad_norm 1.5583 (1.8063) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][150/625] eta 0:03:34 lr 0.000538 wd 0.0500 time 0.4451 (0.4511) data time 0.0008 (0.0032) model time 0.4443 (0.4473) loss 3.1014 (2.8568) grad_norm 1.6352 (1.8052) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][160/625] eta 0:03:29 lr 0.000538 wd 0.0500 time 0.4440 (0.4506) data time 0.0009 (0.0031) model time 0.4431 (0.4469) loss 2.1219 (2.8537) grad_norm 1.3241 (1.8368) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][170/625] eta 0:03:24 lr 0.000538 wd 0.0500 time 0.4461 (0.4503) data time 0.0008 (0.0030) model time 0.4453 (0.4468) loss 2.9969 (2.8582) 
grad_norm 1.8086 (1.8268) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][180/625] eta 0:03:20 lr 0.000538 wd 0.0500 time 0.4407 (0.4500) data time 0.0008 (0.0029) model time 0.4399 (0.4466) loss 3.3006 (2.8531) grad_norm 1.4378 (1.8185) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][190/625] eta 0:03:15 lr 0.000537 wd 0.0500 time 0.4460 (0.4499) data time 0.0007 (0.0027) model time 0.4453 (0.4466) loss 1.9203 (2.8475) grad_norm 1.3473 (1.8231) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][200/625] eta 0:03:11 lr 0.000537 wd 0.0500 time 0.4425 (0.4496) data time 0.0007 (0.0026) model time 0.4418 (0.4464) loss 2.9294 (2.8456) grad_norm 2.0496 (1.8151) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][210/625] eta 0:03:06 lr 0.000537 wd 0.0500 time 0.4478 (0.4493) data time 0.0008 (0.0026) model time 0.4470 (0.4461) loss 3.0944 (2.8326) grad_norm 1.4599 (1.8022) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][220/625] eta 0:03:01 lr 0.000537 wd 0.0500 time 0.4422 (0.4492) data time 0.0008 (0.0025) model time 0.4414 (0.4461) loss 3.1838 (2.8421) grad_norm 1.5819 (1.8012) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][230/625] eta 0:02:57 lr 0.000537 wd 0.0500 time 0.4392 (0.4490) data time 0.0009 (0.0024) model time 0.4383 (0.4460) loss 3.0449 (2.8413) grad_norm 2.8814 (1.8222) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][240/625] eta 0:02:52 lr 0.000537 wd 0.0500 time 
0.4411 (0.4489) data time 0.0008 (0.0024) model time 0.4403 (0.4459) loss 3.1267 (2.8527) grad_norm 1.2138 (1.8163) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][250/625] eta 0:02:48 lr 0.000537 wd 0.0500 time 0.4419 (0.4487) data time 0.0008 (0.0023) model time 0.4411 (0.4458) loss 3.0277 (2.8642) grad_norm 2.2218 (1.8096) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][260/625] eta 0:02:43 lr 0.000537 wd 0.0500 time 0.4473 (0.4486) data time 0.0008 (0.0022) model time 0.4465 (0.4458) loss 2.9609 (2.8670) grad_norm 2.3877 (1.8117) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][270/625] eta 0:02:39 lr 0.000537 wd 0.0500 time 0.4526 (0.4485) data time 0.0006 (0.0022) model time 0.4520 (0.4458) loss 2.8028 (2.8566) grad_norm 1.9326 (1.8169) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][280/625] eta 0:02:34 lr 0.000537 wd 0.0500 time 0.4395 (0.4483) data time 0.0007 (0.0021) model time 0.4388 (0.4456) loss 3.2162 (2.8592) grad_norm 1.9640 (1.8279) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][290/625] eta 0:02:30 lr 0.000536 wd 0.0500 time 0.4418 (0.4481) data time 0.0006 (0.0021) model time 0.4412 (0.4455) loss 3.1716 (2.8579) grad_norm 1.7598 (1.8347) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][300/625] eta 0:02:25 lr 0.000536 wd 0.0500 time 0.4484 (0.4480) data time 0.0010 (0.0021) model time 0.4474 (0.4454) loss 2.9972 (2.8521) grad_norm 2.2323 (1.8272) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:24 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [170/300][310/625] eta 0:02:21 lr 0.000536 wd 0.0500 time 0.4455 (0.4479) data time 0.0007 (0.0020) model time 0.4448 (0.4453) loss 2.4282 (2.8468) grad_norm 1.4977 (1.8207) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][320/625] eta 0:02:16 lr 0.000536 wd 0.0500 time 0.4432 (0.4478) data time 0.0006 (0.0020) model time 0.4426 (0.4452) loss 3.2924 (2.8460) grad_norm 1.2581 (1.8326) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][330/625] eta 0:02:12 lr 0.000536 wd 0.0500 time 0.4429 (0.4477) data time 0.0006 (0.0020) model time 0.4423 (0.4451) loss 3.1135 (2.8554) grad_norm 1.5942 (1.8256) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][340/625] eta 0:02:07 lr 0.000536 wd 0.0500 time 0.4422 (0.4476) data time 0.0006 (0.0019) model time 0.4416 (0.4451) loss 2.1291 (2.8511) grad_norm 1.3171 (1.8217) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][350/625] eta 0:02:03 lr 0.000536 wd 0.0500 time 0.4428 (0.4481) data time 0.0008 (0.0019) model time 0.4420 (0.4457) loss 3.1470 (2.8494) grad_norm 1.2087 (1.8145) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][360/625] eta 0:01:58 lr 0.000536 wd 0.0500 time 0.4426 (0.4484) data time 0.0008 (0.0019) model time 0.4419 (0.4461) loss 2.4332 (2.8458) grad_norm 1.5346 (1.8087) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][370/625] eta 0:01:54 lr 0.000536 wd 0.0500 time 0.4398 (0.4487) data time 0.0007 (0.0018) model time 0.4391 (0.4466) loss 3.1402 (2.8482) grad_norm 2.2110 
(1.8227) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][380/625] eta 0:01:49 lr 0.000535 wd 0.0500 time 0.4446 (0.4486) data time 0.0007 (0.0018) model time 0.4440 (0.4464) loss 1.7132 (2.8476) grad_norm 1.9925 (1.8373) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][390/625] eta 0:01:45 lr 0.000535 wd 0.0500 time 0.4468 (0.4485) data time 0.0007 (0.0018) model time 0.4461 (0.4464) loss 2.5393 (2.8462) grad_norm 1.9753 (1.8429) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][400/625] eta 0:01:40 lr 0.000535 wd 0.0500 time 0.4434 (0.4484) data time 0.0009 (0.0018) model time 0.4425 (0.4463) loss 3.2196 (2.8417) grad_norm 2.4739 (1.8420) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][410/625] eta 0:01:36 lr 0.000535 wd 0.0500 time 0.4436 (0.4482) data time 0.0007 (0.0018) model time 0.4429 (0.4462) loss 3.2057 (2.8491) grad_norm 1.9584 (1.8413) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][420/625] eta 0:01:31 lr 0.000535 wd 0.0500 time 0.4424 (0.4481) data time 0.0008 (0.0017) model time 0.4416 (0.4461) loss 3.2471 (2.8497) grad_norm 1.6575 (1.8472) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][430/625] eta 0:01:27 lr 0.000535 wd 0.0500 time 0.4427 (0.4480) data time 0.0007 (0.0017) model time 0.4420 (0.4460) loss 3.7287 (2.8534) grad_norm 1.5628 (1.8445) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][440/625] eta 0:01:22 lr 0.000535 wd 0.0500 time 0.4524 (0.4479) data 
time 0.0009 (0.0017) model time 0.4515 (0.4459) loss 3.2561 (2.8578) grad_norm 1.2918 (1.8403) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][450/625] eta 0:01:18 lr 0.000535 wd 0.0500 time 0.4451 (0.4483) data time 0.0006 (0.0017) model time 0.4445 (0.4464) loss 2.8218 (2.8563) grad_norm 4.7075 (1.8444) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][460/625] eta 0:01:14 lr 0.000535 wd 0.0500 time 0.6154 (0.4486) data time 0.0006 (0.0017) model time 0.6148 (0.4467) loss 3.1494 (2.8634) grad_norm 1.1926 (1.8523) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][470/625] eta 0:01:09 lr 0.000535 wd 0.0500 time 0.4441 (0.4485) data time 0.0007 (0.0017) model time 0.4434 (0.4466) loss 1.8253 (2.8645) grad_norm 1.5850 (1.8615) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][480/625] eta 0:01:05 lr 0.000534 wd 0.0500 time 0.4447 (0.4484) data time 0.0006 (0.0016) model time 0.4441 (0.4465) loss 3.2178 (2.8637) grad_norm 1.5829 (1.8544) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][490/625] eta 0:01:00 lr 0.000534 wd 0.0500 time 0.4427 (0.4487) data time 0.0008 (0.0016) model time 0.4419 (0.4468) loss 2.8174 (2.8534) grad_norm 1.7722 (1.8447) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][500/625] eta 0:00:56 lr 0.000534 wd 0.0500 time 0.4411 (0.4489) data time 0.0006 (0.0016) model time 0.4405 (0.4471) loss 3.3068 (2.8532) grad_norm 1.2734 (1.8388) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [170/300][510/625] eta 0:00:51 lr 0.000534 wd 0.0500 time 0.4439 (0.4488) data time 0.0007 (0.0016) model time 0.4432 (0.4470) loss 2.5529 (2.8505) grad_norm 1.5479 (1.8362) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:41:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][520/625] eta 0:00:47 lr 0.000534 wd 0.0500 time 0.4426 (0.4486) data time 0.0006 (0.0016) model time 0.4419 (0.4469) loss 2.5216 (2.8484) grad_norm 1.7179 (1.8318) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][530/625] eta 0:00:42 lr 0.000534 wd 0.0500 time 0.4402 (0.4486) data time 0.0007 (0.0016) model time 0.4395 (0.4468) loss 3.2167 (2.8575) grad_norm 2.1579 (1.8440) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][540/625] eta 0:00:38 lr 0.000534 wd 0.0500 time 0.4430 (0.4485) data time 0.0007 (0.0016) model time 0.4423 (0.4467) loss 2.7465 (2.8593) grad_norm 1.5381 (1.8417) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][550/625] eta 0:00:33 lr 0.000534 wd 0.0500 time 0.4521 (0.4485) data time 0.0009 (0.0015) model time 0.4512 (0.4468) loss 3.1995 (2.8607) grad_norm 1.3345 (1.8363) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][560/625] eta 0:00:29 lr 0.000534 wd 0.0500 time 0.4402 (0.4486) data time 0.0009 (0.0015) model time 0.4394 (0.4469) loss 3.2121 (2.8618) grad_norm 1.7515 (1.8313) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][570/625] eta 0:00:24 lr 0.000533 wd 0.0500 time 0.4421 (0.4486) data time 0.0007 (0.0015) model time 0.4414 (0.4469) loss 3.3119 (2.8659) grad_norm 1.5753 (1.8298) loss_scale 1024.0000 
(1024.0000) mem 16699MB [2024-08-10 15:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][580/625] eta 0:00:20 lr 0.000533 wd 0.0500 time 0.4436 (0.4485) data time 0.0007 (0.0015) model time 0.4429 (0.4468) loss 2.8742 (2.8704) grad_norm 1.6169 (1.8282) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][590/625] eta 0:00:15 lr 0.000533 wd 0.0500 time 0.4390 (0.4484) data time 0.0007 (0.0015) model time 0.4384 (0.4467) loss 3.6587 (2.8740) grad_norm 1.7323 (1.8288) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][600/625] eta 0:00:11 lr 0.000533 wd 0.0500 time 0.4497 (0.4484) data time 0.0009 (0.0015) model time 0.4488 (0.4467) loss 1.8826 (2.8708) grad_norm 1.5286 (1.8279) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][610/625] eta 0:00:06 lr 0.000533 wd 0.0500 time 0.4389 (0.4483) data time 0.0004 (0.0015) model time 0.4385 (0.4466) loss 2.3370 (2.8701) grad_norm 1.4339 (1.8303) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [170/300][620/625] eta 0:00:02 lr 0.000533 wd 0.0500 time 0.4425 (0.4482) data time 0.0004 (0.0015) model time 0.4420 (0.4466) loss 2.8352 (2.8700) grad_norm 1.8317 (1.8554) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:42:45 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 170 training takes 0:04:40 [2024-08-10 15:42:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:42:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5122 (0.5122) Acc@1 89.209 (89.209) Acc@5 98.438 (98.438) Mem 16699MB [2024-08-10 15:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.152) Loss 0.8340 (0.6448) Acc@1 80.225 (85.778) Acc@5 95.557 (97.545) Mem 16699MB [2024-08-10 15:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9292 (0.7621) Acc@1 77.686 (82.750) Acc@5 95.020 (96.298) Mem 16699MB [2024-08-10 15:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.560 Acc@5 96.321 [2024-08-10 15:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-10 15:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.56% [2024-08-10 15:42:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 15:42:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 15:42:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.4695 (0.4695) Acc@1 89.258 (89.258) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:42:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.7617 (0.5908) Acc@1 81.445 (87.029) Acc@5 96.094 (97.820) Mem 16699MB [2024-08-10 15:42:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.8530 (0.6932) Acc@1 78.662 (84.212) Acc@5 95.947 (96.870) Mem 16699MB [2024-08-10 15:42:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.887 Acc@5 96.875 [2024-08-10 15:42:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:42:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][0/625] eta 0:13:40 lr 0.000533 wd 0.0500 time 1.3123 (1.3123) data time 0.6465 (0.6465) model time 0.0000 (0.0000) loss 3.1832 (3.1832) grad_norm 1.9552 (1.9552) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][10/625] eta 0:05:21 lr 0.000533 wd 0.0500 time 0.4429 (0.5235) data time 0.0006 (0.0596) model time 0.0000 (0.0000) loss 2.1429 (2.7218) grad_norm 1.3734 (1.8515) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][20/625] eta 0:04:53 lr 0.000533 wd 0.0500 time 0.4414 (0.4856) data time 0.0006 (0.0316) model time 0.0000 (0.0000) loss 2.6108 (2.8013) grad_norm 2.0218 (1.8778) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][30/625] eta 0:04:45 lr 0.000533 wd 0.0500 time 0.4444 (0.4790) data time 0.0007 (0.0217) model time 0.0000 (0.0000) loss 3.0253 (2.8225) grad_norm 1.4613 (1.8894) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:14 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [171/300][40/625] eta 0:04:35 lr 0.000532 wd 0.0500 time 0.4435 (0.4708) data time 0.0007 (0.0167) model time 0.0000 (0.0000) loss 2.8661 (2.8456) grad_norm 1.6132 (1.8386) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][50/625] eta 0:04:29 lr 0.000532 wd 0.0500 time 0.4430 (0.4693) data time 0.0009 (0.0136) model time 0.0000 (0.0000) loss 2.9906 (2.8405) grad_norm 2.0254 (1.7950) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][60/625] eta 0:04:22 lr 0.000532 wd 0.0500 time 0.4471 (0.4654) data time 0.0008 (0.0115) model time 0.4463 (0.4444) loss 1.6442 (2.8101) grad_norm 1.3703 (1.7822) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][70/625] eta 0:04:17 lr 0.000532 wd 0.0500 time 0.3939 (0.4634) data time 0.0009 (0.0100) model time 0.3930 (0.4474) loss 2.9087 (2.8164) grad_norm 1.5641 (1.7277) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][80/625] eta 0:04:11 lr 0.000532 wd 0.0500 time 0.4447 (0.4609) data time 0.0009 (0.0089) model time 0.4438 (0.4457) loss 3.1476 (2.8305) grad_norm 1.6839 (1.7346) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][90/625] eta 0:04:06 lr 0.000532 wd 0.0500 time 0.4419 (0.4606) data time 0.0009 (0.0080) model time 0.4410 (0.4485) loss 3.2052 (2.8391) grad_norm 2.8619 (1.8946) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][100/625] eta 0:04:00 lr 0.000532 wd 0.0500 time 0.4409 (0.4589) data time 0.0008 (0.0073) model time 0.4401 (0.4473) loss 3.0515 (2.8414) grad_norm 1.6136 (1.9121) 
loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][110/625] eta 0:03:55 lr 0.000532 wd 0.0500 time 0.4486 (0.4576) data time 0.0006 (0.0067) model time 0.4480 (0.4468) loss 3.0427 (2.8163) grad_norm 1.6379 (1.8893) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][120/625] eta 0:03:50 lr 0.000532 wd 0.0500 time 0.4431 (0.4564) data time 0.0006 (0.0063) model time 0.4424 (0.4461) loss 2.6752 (2.8092) grad_norm 1.2518 (1.8606) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][130/625] eta 0:03:46 lr 0.000531 wd 0.0500 time 0.4461 (0.4569) data time 0.0008 (0.0058) model time 0.4453 (0.4481) loss 2.7822 (2.8009) grad_norm 1.4842 (1.8396) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:43:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][140/625] eta 0:03:41 lr 0.000531 wd 0.0500 time 0.4417 (0.4560) data time 0.0008 (0.0055) model time 0.4409 (0.4475) loss 2.9316 (2.8205) grad_norm 2.5641 (1.8636) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][150/625] eta 0:03:36 lr 0.000531 wd 0.0500 time 0.4477 (0.4551) data time 0.0007 (0.0052) model time 0.4470 (0.4470) loss 2.9218 (2.8318) grad_norm 2.0948 (1.8795) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][160/625] eta 0:03:31 lr 0.000531 wd 0.0500 time 0.4475 (0.4545) data time 0.0006 (0.0049) model time 0.4469 (0.4467) loss 3.0386 (2.8369) grad_norm 1.7383 (1.9088) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][170/625] eta 0:03:26 lr 0.000531 wd 0.0500 time 0.4456 (0.4538) data time 
0.0006 (0.0047) model time 0.4450 (0.4463) loss 3.4195 (2.8445) grad_norm 1.7795 (1.8988) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][180/625] eta 0:03:21 lr 0.000531 wd 0.0500 time 0.4446 (0.4533) data time 0.0008 (0.0045) model time 0.4438 (0.4460) loss 3.1741 (2.8449) grad_norm 1.5054 (1.8827) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][190/625] eta 0:03:16 lr 0.000531 wd 0.0500 time 0.4431 (0.4528) data time 0.0006 (0.0043) model time 0.4425 (0.4458) loss 3.2579 (2.8544) grad_norm 1.5355 (1.8676) loss_scale 2048.0000 (1045.4450) mem 16699MB [2024-08-10 15:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][200/625] eta 0:03:12 lr 0.000531 wd 0.0500 time 0.4439 (0.4523) data time 0.0009 (0.0041) model time 0.4431 (0.4456) loss 2.3950 (2.8505) grad_norm 2.0813 (1.8564) loss_scale 2048.0000 (1095.3234) mem 16699MB [2024-08-10 15:44:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][210/625] eta 0:03:07 lr 0.000531 wd 0.0500 time 0.4450 (0.4519) data time 0.0007 (0.0040) model time 0.4443 (0.4455) loss 2.9205 (2.8611) grad_norm 1.7740 (1.8520) loss_scale 2048.0000 (1140.4739) mem 16699MB [2024-08-10 15:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][220/625] eta 0:03:03 lr 0.000531 wd 0.0500 time 0.4445 (0.4524) data time 0.0009 (0.0038) model time 0.4437 (0.4464) loss 2.8739 (2.8722) grad_norm 1.5696 (1.8496) loss_scale 2048.0000 (1181.5385) mem 16699MB [2024-08-10 15:44:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][230/625] eta 0:02:58 lr 0.000530 wd 0.0500 time 0.4377 (0.4519) data time 0.0009 (0.0037) model time 0.4367 (0.4461) loss 2.5472 (2.8694) grad_norm 1.2109 (1.8460) loss_scale 2048.0000 (1219.0476) mem 16699MB [2024-08-10 15:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [171/300][240/625] eta 0:02:53 lr 0.000530 wd 0.0500 time 0.4455 (0.4519) data time 0.0009 (0.0036) model time 0.4446 (0.4462) loss 3.0598 (2.8632) grad_norm 1.5591 (1.8661) loss_scale 2048.0000 (1253.4440) mem 16699MB [2024-08-10 15:44:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][250/625] eta 0:02:49 lr 0.000530 wd 0.0500 time 0.4404 (0.4515) data time 0.0008 (0.0035) model time 0.4397 (0.4461) loss 3.2391 (2.8701) grad_norm 1.2823 (1.8709) loss_scale 2048.0000 (1285.0996) mem 16699MB [2024-08-10 15:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][260/625] eta 0:02:44 lr 0.000530 wd 0.0500 time 0.4421 (0.4512) data time 0.0006 (0.0034) model time 0.4415 (0.4459) loss 3.2722 (2.8725) grad_norm 1.3937 (1.8682) loss_scale 2048.0000 (1314.3295) mem 16699MB [2024-08-10 15:44:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][270/625] eta 0:02:40 lr 0.000530 wd 0.0500 time 0.4464 (0.4510) data time 0.0006 (0.0033) model time 0.4457 (0.4458) loss 1.8541 (2.8737) grad_norm 1.3813 (1.8511) loss_scale 2048.0000 (1341.4022) mem 16699MB [2024-08-10 15:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][280/625] eta 0:02:35 lr 0.000530 wd 0.0500 time 0.4436 (0.4508) data time 0.0008 (0.0032) model time 0.4428 (0.4457) loss 2.7674 (2.8745) grad_norm 1.6719 (1.8451) loss_scale 2048.0000 (1366.5480) mem 16699MB [2024-08-10 15:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][290/625] eta 0:02:31 lr 0.000530 wd 0.0500 time 0.3921 (0.4510) data time 0.0007 (0.0031) model time 0.3913 (0.4463) loss 2.7776 (2.8746) grad_norm 1.7324 (inf) loss_scale 1024.0000 (1382.9278) mem 16699MB [2024-08-10 15:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][300/625] eta 0:02:26 lr 0.000530 wd 0.0500 time 0.4405 (0.4507) data time 0.0007 (0.0031) model time 0.4398 (0.4460) loss 3.1901 (2.8818) grad_norm 1.6365 (inf) loss_scale 1024.0000 (1371.0033) mem 
16699MB [2024-08-10 15:45:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][310/625] eta 0:02:21 lr 0.000530 wd 0.0500 time 0.4480 (0.4505) data time 0.0006 (0.0030) model time 0.4473 (0.4459) loss 3.2611 (2.8859) grad_norm 1.3233 (inf) loss_scale 1024.0000 (1359.8457) mem 16699MB [2024-08-10 15:45:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][320/625] eta 0:02:17 lr 0.000529 wd 0.0500 time 0.4507 (0.4503) data time 0.0007 (0.0029) model time 0.4500 (0.4458) loss 2.6524 (2.8799) grad_norm 1.6602 (inf) loss_scale 1024.0000 (1349.3832) mem 16699MB [2024-08-10 15:45:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][330/625] eta 0:02:12 lr 0.000529 wd 0.0500 time 0.4447 (0.4501) data time 0.0006 (0.0029) model time 0.4440 (0.4457) loss 2.5035 (2.8779) grad_norm 1.4369 (inf) loss_scale 1024.0000 (1339.5529) mem 16699MB [2024-08-10 15:45:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][340/625] eta 0:02:08 lr 0.000529 wd 0.0500 time 0.4440 (0.4500) data time 0.0007 (0.0028) model time 0.4433 (0.4457) loss 2.9793 (2.8804) grad_norm 2.0626 (inf) loss_scale 1024.0000 (1330.2991) mem 16699MB [2024-08-10 15:45:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][350/625] eta 0:02:03 lr 0.000529 wd 0.0500 time 0.4427 (0.4507) data time 0.0009 (0.0028) model time 0.4418 (0.4467) loss 2.6626 (2.8803) grad_norm 1.9495 (inf) loss_scale 1024.0000 (1321.5726) mem 16699MB [2024-08-10 15:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][360/625] eta 0:01:59 lr 0.000529 wd 0.0500 time 0.4432 (0.4505) data time 0.0007 (0.0027) model time 0.4425 (0.4465) loss 2.8426 (2.8791) grad_norm 2.3600 (inf) loss_scale 1024.0000 (1313.3296) mem 16699MB [2024-08-10 15:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][370/625] eta 0:01:54 lr 0.000529 wd 0.0500 time 0.4440 (0.4503) data time 0.0006 (0.0027) model time 0.4434 (0.4464) loss 2.1845 
(2.8805) grad_norm 1.7022 (inf) loss_scale 1024.0000 (1305.5310) mem 16699MB [2024-08-10 15:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][380/625] eta 0:01:50 lr 0.000529 wd 0.0500 time 0.4388 (0.4501) data time 0.0009 (0.0026) model time 0.4379 (0.4462) loss 3.0988 (2.8717) grad_norm 1.4694 (inf) loss_scale 1024.0000 (1298.1417) mem 16699MB [2024-08-10 15:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][390/625] eta 0:01:45 lr 0.000529 wd 0.0500 time 0.4436 (0.4500) data time 0.0007 (0.0026) model time 0.4429 (0.4461) loss 3.3943 (2.8678) grad_norm 2.4470 (inf) loss_scale 1024.0000 (1291.1304) mem 16699MB [2024-08-10 15:45:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][400/625] eta 0:01:41 lr 0.000529 wd 0.0500 time 0.4488 (0.4498) data time 0.0006 (0.0025) model time 0.4482 (0.4460) loss 3.4255 (2.8630) grad_norm 1.7574 (inf) loss_scale 1024.0000 (1284.4688) mem 16699MB [2024-08-10 15:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][410/625] eta 0:01:36 lr 0.000529 wd 0.0500 time 0.4380 (0.4496) data time 0.0008 (0.0025) model time 0.4372 (0.4459) loss 1.8088 (2.8634) grad_norm 1.6142 (inf) loss_scale 1024.0000 (1278.1314) mem 16699MB [2024-08-10 15:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][420/625] eta 0:01:32 lr 0.000528 wd 0.0500 time 0.4439 (0.4495) data time 0.0008 (0.0024) model time 0.4431 (0.4459) loss 3.5147 (2.8679) grad_norm 2.0174 (inf) loss_scale 1024.0000 (1272.0950) mem 16699MB [2024-08-10 15:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][430/625] eta 0:01:27 lr 0.000528 wd 0.0500 time 0.4447 (0.4498) data time 0.0006 (0.0024) model time 0.4441 (0.4463) loss 2.0261 (2.8681) grad_norm 1.5851 (inf) loss_scale 1024.0000 (1266.3387) mem 16699MB [2024-08-10 15:46:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][440/625] eta 0:01:23 lr 0.000528 wd 0.0500 time 0.4402 (0.4505) 
data time 0.0009 (0.0024) model time 0.4393 (0.4471) loss 3.0501 (2.8668) grad_norm 1.8492 (inf) loss_scale 1024.0000 (1260.8435) mem 16699MB [2024-08-10 15:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][450/625] eta 0:01:18 lr 0.000528 wd 0.0500 time 0.4415 (0.4503) data time 0.0007 (0.0023) model time 0.4407 (0.4470) loss 3.3131 (2.8672) grad_norm 1.8345 (inf) loss_scale 1024.0000 (1255.5920) mem 16699MB [2024-08-10 15:46:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][460/625] eta 0:01:14 lr 0.000528 wd 0.0500 time 0.4415 (0.4506) data time 0.0006 (0.0023) model time 0.4409 (0.4473) loss 3.2777 (2.8723) grad_norm 1.9661 (inf) loss_scale 1024.0000 (1250.5683) mem 16699MB [2024-08-10 15:46:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][470/625] eta 0:01:09 lr 0.000528 wd 0.0500 time 0.4525 (0.4504) data time 0.0007 (0.0023) model time 0.4518 (0.4472) loss 3.0280 (2.8733) grad_norm 1.8833 (inf) loss_scale 1024.0000 (1245.7580) mem 16699MB [2024-08-10 15:46:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][480/625] eta 0:01:05 lr 0.000528 wd 0.0500 time 0.4456 (0.4503) data time 0.0007 (0.0023) model time 0.4449 (0.4471) loss 2.8183 (2.8719) grad_norm 1.4220 (inf) loss_scale 1024.0000 (1241.1476) mem 16699MB [2024-08-10 15:46:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][490/625] eta 0:01:00 lr 0.000528 wd 0.0500 time 0.4420 (0.4502) data time 0.0009 (0.0022) model time 0.4411 (0.4470) loss 2.9959 (2.8708) grad_norm 2.7699 (inf) loss_scale 1024.0000 (1236.7251) mem 16699MB [2024-08-10 15:46:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][500/625] eta 0:00:56 lr 0.000528 wd 0.0500 time 0.4425 (0.4508) data time 0.0007 (0.0022) model time 0.4419 (0.4477) loss 3.1761 (2.8734) grad_norm 1.8157 (inf) loss_scale 1024.0000 (1232.4790) mem 16699MB [2024-08-10 15:46:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[171/300][510/625] eta 0:00:51 lr 0.000527 wd 0.0500 time 0.4490 (0.4506) data time 0.0009 (0.0022) model time 0.4481 (0.4476) loss 1.9708 (2.8696) grad_norm 2.0451 (inf) loss_scale 1024.0000 (1228.3992) mem 16699MB [2024-08-10 15:46:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][520/625] eta 0:00:47 lr 0.000527 wd 0.0500 time 0.4410 (0.4505) data time 0.0009 (0.0022) model time 0.4401 (0.4476) loss 3.2049 (2.8710) grad_norm 1.6279 (inf) loss_scale 1024.0000 (1224.4760) mem 16699MB [2024-08-10 15:46:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][530/625] eta 0:00:42 lr 0.000527 wd 0.0500 time 0.4408 (0.4504) data time 0.0009 (0.0021) model time 0.4399 (0.4475) loss 3.0326 (2.8730) grad_norm 1.7301 (inf) loss_scale 1024.0000 (1220.7006) mem 16699MB [2024-08-10 15:46:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][540/625] eta 0:00:38 lr 0.000527 wd 0.0500 time 0.4446 (0.4503) data time 0.0009 (0.0021) model time 0.4437 (0.4474) loss 2.9034 (2.8712) grad_norm 2.0298 (inf) loss_scale 1024.0000 (1217.0647) mem 16699MB [2024-08-10 15:47:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][550/625] eta 0:00:33 lr 0.000527 wd 0.0500 time 0.4410 (0.4502) data time 0.0009 (0.0021) model time 0.4401 (0.4473) loss 2.7215 (2.8707) grad_norm 1.9926 (inf) loss_scale 1024.0000 (1213.5608) mem 16699MB [2024-08-10 15:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][560/625] eta 0:00:29 lr 0.000527 wd 0.0500 time 0.4453 (0.4501) data time 0.0006 (0.0021) model time 0.4447 (0.4472) loss 2.4854 (2.8687) grad_norm 1.4373 (inf) loss_scale 1024.0000 (1210.1818) mem 16699MB [2024-08-10 15:47:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][570/625] eta 0:00:24 lr 0.000527 wd 0.0500 time 0.4399 (0.4499) data time 0.0008 (0.0021) model time 0.4391 (0.4471) loss 2.9366 (2.8670) grad_norm 1.5851 (inf) loss_scale 1024.0000 (1206.9212) mem 16699MB [2024-08-10 
15:47:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][580/625] eta 0:00:20 lr 0.000527 wd 0.0500 time 0.6421 (0.4502) data time 0.0008 (0.0020) model time 0.6413 (0.4474) loss 2.7777 (2.8650) grad_norm 1.9538 (inf) loss_scale 1024.0000 (1203.7728) mem 16699MB [2024-08-10 15:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][590/625] eta 0:00:15 lr 0.000527 wd 0.0500 time 0.4429 (0.4500) data time 0.0009 (0.0020) model time 0.4420 (0.4473) loss 2.9543 (2.8686) grad_norm 1.5906 (inf) loss_scale 1024.0000 (1200.7310) mem 16699MB [2024-08-10 15:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][600/625] eta 0:00:11 lr 0.000527 wd 0.0500 time 0.4460 (0.4499) data time 0.0006 (0.0020) model time 0.4454 (0.4472) loss 2.7598 (2.8698) grad_norm 1.7147 (inf) loss_scale 1024.0000 (1197.7903) mem 16699MB [2024-08-10 15:47:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][610/625] eta 0:00:06 lr 0.000526 wd 0.0500 time 0.4379 (0.4501) data time 0.0004 (0.0020) model time 0.4375 (0.4474) loss 3.6995 (2.8719) grad_norm 2.9309 (inf) loss_scale 1024.0000 (1194.9460) mem 16699MB [2024-08-10 15:47:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [171/300][620/625] eta 0:00:02 lr 0.000526 wd 0.0500 time 0.4406 (0.4500) data time 0.0004 (0.0020) model time 0.4402 (0.4474) loss 2.8620 (2.8725) grad_norm 1.8860 (inf) loss_scale 1024.0000 (1192.1932) mem 16699MB [2024-08-10 15:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 171 training takes 0:04:41 [2024-08-10 15:47:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:47:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.5234 (0.5234) Acc@1 88.623 (88.623) Acc@5 98.340 (98.340) Mem 16699MB [2024-08-10 15:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8540 (0.6443) Acc@1 79.980 (85.853) Acc@5 95.264 (97.452) Mem 16699MB [2024-08-10 15:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9326 (0.7566) Acc@1 77.002 (82.982) Acc@5 95.166 (96.284) Mem 16699MB [2024-08-10 15:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.684 Acc@5 96.295 [2024-08-10 15:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 15:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.68% [2024-08-10 15:47:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 15:47:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 15:47:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.479 (0.479) Loss 0.4695 (0.4695) Acc@1 89.209 (89.209) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 15:47:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.7612 (0.5906) Acc@1 81.396 (87.003) Acc@5 95.898 (97.820) Mem 16699MB [2024-08-10 15:47:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.8506 (0.6929) Acc@1 78.613 (84.191) Acc@5 95.996 (96.873) Mem 16699MB [2024-08-10 15:47:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.887 Acc@5 96.873 [2024-08-10 15:47:46 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][0/625] eta 0:12:14 lr 0.000526 wd 0.0500 time 1.1753 (1.1753) data time 0.6059 (0.6059) model time 0.0000 (0.0000) loss 3.0812 (3.0812) grad_norm 1.5577 (1.5577) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][10/625] eta 0:05:13 lr 0.000526 wd 0.0500 time 0.4444 (0.5103) data time 0.0008 (0.0560) model time 0.0000 (0.0000) loss 1.9517 (2.8779) grad_norm 1.8408 (1.8118) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][20/625] eta 0:04:49 lr 0.000526 wd 0.0500 time 0.4406 (0.4780) data time 0.0007 (0.0298) model time 0.0000 (0.0000) loss 3.4318 (2.9000) grad_norm 1.3060 (1.8985) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][30/625] eta 0:04:40 lr 0.000526 wd 0.0500 time 0.4387 (0.4719) data time 0.0006 (0.0205) model time 0.0000 (0.0000) loss 3.5501 (2.9539) grad_norm 2.1379 (2.1902) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:05 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [172/300][40/625] eta 0:04:31 lr 0.000526 wd 0.0500 time 0.4423 (0.4649) data time 0.0006 (0.0157) model time 0.0000 (0.0000) loss 3.5141 (2.9385) grad_norm 2.9136 (2.2198) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][50/625] eta 0:04:24 lr 0.000526 wd 0.0500 time 0.4418 (0.4606) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 3.5939 (2.9110) grad_norm 1.6407 (2.1342) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][60/625] eta 0:04:18 lr 0.000526 wd 0.0500 time 0.4425 (0.4578) data time 0.0008 (0.0109) model time 0.4416 (0.4425) loss 3.3354 (2.9214) grad_norm 2.1381 (2.0648) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][70/625] eta 0:04:12 lr 0.000526 wd 0.0500 time 0.4442 (0.4556) data time 0.0006 (0.0095) model time 0.4436 (0.4420) loss 3.4927 (2.8998) grad_norm 1.6783 (2.0044) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][80/625] eta 0:04:07 lr 0.000525 wd 0.0500 time 0.4418 (0.4542) data time 0.0008 (0.0084) model time 0.4411 (0.4422) loss 3.3086 (2.8858) grad_norm 1.3856 (1.9613) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][90/625] eta 0:04:03 lr 0.000525 wd 0.0500 time 0.4465 (0.4551) data time 0.0007 (0.0076) model time 0.4458 (0.4472) loss 3.0955 (2.8657) grad_norm 1.9076 (1.9948) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][100/625] eta 0:03:58 lr 0.000525 wd 0.0500 time 0.4400 (0.4539) data time 0.0007 (0.0069) model time 0.4393 (0.4460) loss 2.6276 (2.8617) grad_norm 3.4920 (2.1923) 
loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][110/625] eta 0:03:53 lr 0.000525 wd 0.0500 time 0.4438 (0.4528) data time 0.0006 (0.0064) model time 0.4432 (0.4452) loss 2.7493 (2.8703) grad_norm 2.2686 (2.1738) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][120/625] eta 0:03:48 lr 0.000525 wd 0.0500 time 0.4411 (0.4519) data time 0.0006 (0.0059) model time 0.4405 (0.4447) loss 3.2247 (2.8586) grad_norm 2.3561 (2.1617) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][130/625] eta 0:03:44 lr 0.000525 wd 0.0500 time 0.4424 (0.4528) data time 0.0009 (0.0055) model time 0.4415 (0.4470) loss 3.3401 (2.8665) grad_norm 1.5382 (2.1222) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][140/625] eta 0:03:39 lr 0.000525 wd 0.0500 time 0.4439 (0.4533) data time 0.0007 (0.0052) model time 0.4431 (0.4482) loss 2.5855 (2.8822) grad_norm 1.8122 (2.1003) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][150/625] eta 0:03:34 lr 0.000525 wd 0.0500 time 0.4417 (0.4526) data time 0.0007 (0.0049) model time 0.4410 (0.4476) loss 3.6939 (2.8871) grad_norm 1.1207 (2.0695) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][160/625] eta 0:03:30 lr 0.000525 wd 0.0500 time 0.4456 (0.4521) data time 0.0008 (0.0047) model time 0.4448 (0.4472) loss 2.8693 (2.8807) grad_norm 2.7039 (2.0439) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][170/625] eta 0:03:25 lr 0.000524 wd 0.0500 time 0.4407 (0.4515) data time 
0.0009 (0.0045) model time 0.4398 (0.4467) loss 3.0028 (2.8812) grad_norm 1.8635 (2.0232) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][180/625] eta 0:03:20 lr 0.000524 wd 0.0500 time 0.4452 (0.4510) data time 0.0009 (0.0043) model time 0.4443 (0.4464) loss 2.7935 (2.8801) grad_norm 2.0479 (2.0208) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][190/625] eta 0:03:16 lr 0.000524 wd 0.0500 time 0.4465 (0.4506) data time 0.0008 (0.0041) model time 0.4456 (0.4461) loss 3.0610 (2.8877) grad_norm 2.1555 (2.0181) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][200/625] eta 0:03:11 lr 0.000524 wd 0.0500 time 0.4444 (0.4503) data time 0.0008 (0.0039) model time 0.4436 (0.4459) loss 3.3389 (2.8942) grad_norm 2.0628 (2.0144) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][210/625] eta 0:03:07 lr 0.000524 wd 0.0500 time 0.4451 (0.4508) data time 0.0009 (0.0038) model time 0.4443 (0.4467) loss 3.2565 (2.8927) grad_norm 1.1538 (1.9908) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][220/625] eta 0:03:02 lr 0.000524 wd 0.0500 time 0.4421 (0.4505) data time 0.0009 (0.0036) model time 0.4412 (0.4465) loss 2.3931 (2.8798) grad_norm 1.6584 (1.9757) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][230/625] eta 0:02:57 lr 0.000524 wd 0.0500 time 0.4429 (0.4501) data time 0.0007 (0.0035) model time 0.4422 (0.4463) loss 3.3080 (2.8780) grad_norm 1.5989 (1.9730) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [172/300][240/625] eta 0:02:53 lr 0.000524 wd 0.0500 time 0.4431 (0.4498) data time 0.0006 (0.0034) model time 0.4425 (0.4460) loss 2.5156 (2.8748) grad_norm 1.8910 (1.9532) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][250/625] eta 0:02:48 lr 0.000524 wd 0.0500 time 0.4430 (0.4496) data time 0.0009 (0.0033) model time 0.4421 (0.4458) loss 2.8515 (2.8668) grad_norm 1.6203 (1.9419) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][260/625] eta 0:02:43 lr 0.000524 wd 0.0500 time 0.4427 (0.4493) data time 0.0006 (0.0032) model time 0.4421 (0.4457) loss 3.4286 (2.8695) grad_norm 2.6603 (1.9418) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][270/625] eta 0:02:39 lr 0.000523 wd 0.0500 time 0.4456 (0.4491) data time 0.0006 (0.0031) model time 0.4450 (0.4455) loss 2.7432 (2.8633) grad_norm 1.6652 (1.9594) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][280/625] eta 0:02:34 lr 0.000523 wd 0.0500 time 0.4444 (0.4489) data time 0.0006 (0.0031) model time 0.4438 (0.4454) loss 2.8593 (2.8661) grad_norm 1.9762 (1.9536) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][290/625] eta 0:02:30 lr 0.000523 wd 0.0500 time 0.4433 (0.4488) data time 0.0009 (0.0030) model time 0.4424 (0.4454) loss 2.6640 (2.8511) grad_norm 1.3310 (1.9489) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][300/625] eta 0:02:25 lr 0.000523 wd 0.0500 time 0.4454 (0.4486) data time 0.0006 (0.0029) model time 0.4448 (0.4453) loss 3.3135 (2.8526) grad_norm 1.5622 (1.9423) loss_scale 1024.0000 
(1024.0000) mem 16699MB [2024-08-10 15:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][310/625] eta 0:02:21 lr 0.000523 wd 0.0500 time 0.4464 (0.4485) data time 0.0007 (0.0029) model time 0.4458 (0.4452) loss 2.2266 (2.8509) grad_norm 1.5733 (1.9278) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][320/625] eta 0:02:16 lr 0.000523 wd 0.0500 time 0.4410 (0.4488) data time 0.0006 (0.0028) model time 0.4404 (0.4457) loss 2.1711 (2.8485) grad_norm 1.2556 (1.9325) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][330/625] eta 0:02:12 lr 0.000523 wd 0.0500 time 0.4442 (0.4487) data time 0.0007 (0.0027) model time 0.4435 (0.4456) loss 3.4790 (2.8508) grad_norm 1.5996 (1.9283) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][340/625] eta 0:02:07 lr 0.000523 wd 0.0500 time 0.4459 (0.4485) data time 0.0008 (0.0027) model time 0.4450 (0.4455) loss 2.4678 (2.8478) grad_norm 1.9228 (1.9412) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][350/625] eta 0:02:03 lr 0.000523 wd 0.0500 time 0.4407 (0.4484) data time 0.0008 (0.0026) model time 0.4399 (0.4454) loss 3.4632 (2.8566) grad_norm 1.8889 (1.9395) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][360/625] eta 0:01:59 lr 0.000522 wd 0.0500 time 0.4436 (0.4492) data time 0.0006 (0.0026) model time 0.4429 (0.4464) loss 2.1292 (2.8448) grad_norm 1.6313 (1.9421) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][370/625] eta 0:01:54 lr 0.000522 wd 0.0500 time 0.4439 (0.4491) data time 0.0006 (0.0025) model 
time 0.4432 (0.4463) loss 2.2889 (2.8460) grad_norm 1.8907 (1.9348) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][380/625] eta 0:01:49 lr 0.000522 wd 0.0500 time 0.4442 (0.4489) data time 0.0009 (0.0025) model time 0.4433 (0.4462) loss 2.4483 (2.8443) grad_norm 1.4534 (1.9295) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][390/625] eta 0:01:45 lr 0.000522 wd 0.0500 time 0.4418 (0.4488) data time 0.0008 (0.0024) model time 0.4411 (0.4461) loss 3.4143 (2.8527) grad_norm 1.9034 (1.9473) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][400/625] eta 0:01:40 lr 0.000522 wd 0.0500 time 0.4435 (0.4486) data time 0.0006 (0.0024) model time 0.4428 (0.4460) loss 3.2768 (2.8571) grad_norm 3.5250 (1.9539) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][410/625] eta 0:01:36 lr 0.000522 wd 0.0500 time 0.4456 (0.4485) data time 0.0007 (0.0024) model time 0.4449 (0.4459) loss 2.0940 (2.8485) grad_norm 1.4332 (1.9522) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][420/625] eta 0:01:31 lr 0.000522 wd 0.0500 time 0.4431 (0.4484) data time 0.0006 (0.0023) model time 0.4425 (0.4458) loss 2.2621 (2.8497) grad_norm 1.8480 (1.9467) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:50:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][430/625] eta 0:01:27 lr 0.000522 wd 0.0500 time 0.4455 (0.4487) data time 0.0006 (0.0023) model time 0.4449 (0.4462) loss 1.9769 (2.8478) grad_norm 1.9882 (1.9555) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][440/625] 
eta 0:01:22 lr 0.000522 wd 0.0500 time 0.4495 (0.4486) data time 0.0006 (0.0023) model time 0.4488 (0.4462) loss 3.7538 (2.8539) grad_norm 1.7049 (1.9577) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][450/625] eta 0:01:18 lr 0.000522 wd 0.0500 time 0.4467 (0.4485) data time 0.0006 (0.0022) model time 0.4461 (0.4461) loss 2.2539 (2.8529) grad_norm 1.7203 (1.9542) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][460/625] eta 0:01:14 lr 0.000521 wd 0.0500 time 0.6620 (0.4489) data time 0.0008 (0.0022) model time 0.6611 (0.4466) loss 2.8728 (2.8509) grad_norm 1.5842 (1.9492) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][470/625] eta 0:01:09 lr 0.000521 wd 0.0500 time 0.4406 (0.4488) data time 0.0007 (0.0022) model time 0.4399 (0.4464) loss 3.7608 (2.8489) grad_norm 1.7702 (1.9434) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][480/625] eta 0:01:05 lr 0.000521 wd 0.0500 time 0.4388 (0.4490) data time 0.0009 (0.0022) model time 0.4378 (0.4467) loss 3.2851 (2.8509) grad_norm 1.6555 (1.9369) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][490/625] eta 0:01:00 lr 0.000521 wd 0.0500 time 0.4482 (0.4489) data time 0.0008 (0.0021) model time 0.4474 (0.4466) loss 1.8284 (2.8496) grad_norm 1.8509 (1.9396) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][500/625] eta 0:00:56 lr 0.000521 wd 0.0500 time 0.4440 (0.4488) data time 0.0009 (0.0021) model time 0.4431 (0.4466) loss 3.3595 (2.8523) grad_norm 1.3683 (1.9374) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 
15:51:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][510/625] eta 0:00:51 lr 0.000521 wd 0.0500 time 0.4504 (0.4489) data time 0.0006 (0.0021) model time 0.4498 (0.4467) loss 1.9048 (2.8503) grad_norm 2.5895 (1.9474) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][520/625] eta 0:00:47 lr 0.000521 wd 0.0500 time 0.4413 (0.4488) data time 0.0007 (0.0021) model time 0.4406 (0.4466) loss 3.4006 (2.8513) grad_norm 5.6255 (1.9498) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][530/625] eta 0:00:42 lr 0.000521 wd 0.0500 time 0.4420 (0.4487) data time 0.0008 (0.0020) model time 0.4411 (0.4465) loss 2.8633 (2.8582) grad_norm 1.5501 (1.9506) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][540/625] eta 0:00:38 lr 0.000521 wd 0.0500 time 0.4425 (0.4486) data time 0.0006 (0.0020) model time 0.4418 (0.4464) loss 3.2609 (2.8589) grad_norm 2.8777 (1.9513) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][550/625] eta 0:00:33 lr 0.000520 wd 0.0500 time 0.4394 (0.4486) data time 0.0010 (0.0020) model time 0.4384 (0.4465) loss 3.2330 (2.8553) grad_norm 1.8274 (1.9459) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:51:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][560/625] eta 0:00:29 lr 0.000520 wd 0.0500 time 0.4457 (0.4485) data time 0.0007 (0.0020) model time 0.4450 (0.4464) loss 3.3574 (2.8581) grad_norm 4.2931 (1.9465) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][570/625] eta 0:00:24 lr 0.000520 wd 0.0500 time 0.4551 (0.4485) data time 0.0007 (0.0020) model time 0.4544 (0.4464) loss 2.9365 
(2.8591) grad_norm 6.7889 (1.9644) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][580/625] eta 0:00:20 lr 0.000520 wd 0.0500 time 0.4405 (0.4484) data time 0.0008 (0.0019) model time 0.4397 (0.4463) loss 1.6663 (2.8587) grad_norm 1.9944 (1.9740) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][590/625] eta 0:00:15 lr 0.000520 wd 0.0500 time 0.4481 (0.4483) data time 0.0006 (0.0019) model time 0.4475 (0.4463) loss 2.6343 (2.8599) grad_norm 1.3539 (1.9690) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][600/625] eta 0:00:11 lr 0.000520 wd 0.0500 time 0.4479 (0.4483) data time 0.0009 (0.0019) model time 0.4469 (0.4462) loss 3.2438 (2.8597) grad_norm 1.2833 (1.9634) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][610/625] eta 0:00:06 lr 0.000520 wd 0.0500 time 0.4362 (0.4482) data time 0.0004 (0.0019) model time 0.4358 (0.4462) loss 3.4091 (2.8596) grad_norm 2.4222 (1.9632) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [172/300][620/625] eta 0:00:02 lr 0.000520 wd 0.0500 time 0.4379 (0.4482) data time 0.0006 (0.0019) model time 0.4373 (0.4462) loss 2.3202 (2.8610) grad_norm 1.3234 (1.9601) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 172 training takes 0:04:40 [2024-08-10 15:52:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:52:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 15:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5044 (0.5044) Acc@1 89.160 (89.160) Acc@5 98.438 (98.438) Mem 16699MB [2024-08-10 15:52:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8452 (0.6441) Acc@1 78.906 (85.760) Acc@5 96.289 (97.683) Mem 16699MB [2024-08-10 15:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9468 (0.7637) Acc@1 77.295 (82.733) Acc@5 94.873 (96.426) Mem 16699MB [2024-08-10 15:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.558 Acc@5 96.405 [2024-08-10 15:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-10 15:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.961 (0.961) Loss 0.4697 (0.4697) Acc@1 89.209 (89.209) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-10 15:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.196) Loss 0.7607 (0.5905) Acc@1 81.396 (87.052) Acc@5 96.143 (97.838) Mem 16699MB [2024-08-10 15:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.158) Loss 0.8506 (0.6927) Acc@1 78.467 (84.259) Acc@5 95.996 (96.873) Mem 16699MB [2024-08-10 15:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.945 Acc@5 96.871 [2024-08-10 15:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-10 15:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.95% [2024-08-10 15:52:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:52:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 15:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][0/625] eta 0:08:04 lr 0.000520 wd 0.0500 time 0.7759 (0.7759) data time 0.3868 (0.3868) model time 0.0000 (0.0000) loss 2.9031 (2.9031) grad_norm 1.1645 (1.1645) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 15:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][10/625] eta 0:04:52 lr 0.000520 wd 0.0500 time 0.4476 (0.4748) data time 0.0008 (0.0360) model time 0.0000 (0.0000) loss 2.8930 (2.7317) grad_norm 1.8482 (inf) loss_scale 512.0000 (698.1818) mem 16699MB [2024-08-10 15:52:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][20/625] eta 0:04:38 lr 0.000519 wd 0.0500 time 0.4424 (0.4604) data time 0.0008 (0.0192) model time 0.0000 (0.0000) loss 3.0101 (2.8916) grad_norm 2.0950 (inf) loss_scale 512.0000 (609.5238) mem 16699MB [2024-08-10 15:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][30/625] eta 0:04:34 lr 0.000519 wd 0.0500 time 0.4448 (0.4606) data time 0.0007 (0.0133) model time 0.0000 (0.0000) loss 3.2307 (2.8451) grad_norm 1.9616 (inf) loss_scale 512.0000 (578.0645) mem 16699MB [2024-08-10 15:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][40/625] eta 0:04:26 lr 0.000519 wd 0.0500 time 0.4418 (0.4562) data time 0.0007 (0.0103) model time 0.0000 (0.0000) loss 3.0265 (2.8252) grad_norm 1.8121 (inf) loss_scale 512.0000 (561.9512) mem 16699MB [2024-08-10 15:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][50/625] eta 0:04:23 lr 0.000519 wd 0.0500 time 0.4396 (0.4578) data time 0.0007 (0.0085) model time 0.0000 (0.0000) loss 3.1737 (2.8969) grad_norm 1.4841 (inf) loss_scale 512.0000 (552.1569) mem 16699MB [2024-08-10 15:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][60/625] eta 0:04:17 lr 0.000519 wd 0.0500 time 0.4501 (0.4558) data time 0.0009 (0.0072) model time 0.4492 (0.4451) loss 3.3008 (2.9122) grad_norm 1.4739 
(inf) loss_scale 512.0000 (545.5738) mem 16699MB [2024-08-10 15:53:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][70/625] eta 0:04:13 lr 0.000519 wd 0.0500 time 0.4421 (0.4568) data time 0.0009 (0.0064) model time 0.4413 (0.4533) loss 2.7210 (2.9393) grad_norm 1.8068 (inf) loss_scale 512.0000 (540.8451) mem 16699MB [2024-08-10 15:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][80/625] eta 0:04:08 lr 0.000519 wd 0.0500 time 0.4449 (0.4551) data time 0.0006 (0.0057) model time 0.4442 (0.4497) loss 2.9048 (2.9362) grad_norm 1.8604 (inf) loss_scale 512.0000 (537.2840) mem 16699MB [2024-08-10 15:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][90/625] eta 0:04:03 lr 0.000519 wd 0.0500 time 0.4437 (0.4548) data time 0.0006 (0.0052) model time 0.4431 (0.4500) loss 2.7985 (2.9278) grad_norm 1.7537 (inf) loss_scale 512.0000 (534.5055) mem 16699MB [2024-08-10 15:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][100/625] eta 0:03:58 lr 0.000519 wd 0.0500 time 0.4410 (0.4538) data time 0.0006 (0.0048) model time 0.4404 (0.4488) loss 2.3568 (2.9427) grad_norm 2.6036 (inf) loss_scale 512.0000 (532.2772) mem 16699MB [2024-08-10 15:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][110/625] eta 0:03:53 lr 0.000519 wd 0.0500 time 0.4444 (0.4529) data time 0.0008 (0.0044) model time 0.4436 (0.4478) loss 2.9835 (2.9209) grad_norm 2.4009 (inf) loss_scale 512.0000 (530.4505) mem 16699MB [2024-08-10 15:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][120/625] eta 0:03:48 lr 0.000518 wd 0.0500 time 0.4445 (0.4520) data time 0.0009 (0.0041) model time 0.4437 (0.4469) loss 2.3712 (2.8933) grad_norm 2.3113 (inf) loss_scale 512.0000 (528.9256) mem 16699MB [2024-08-10 15:53:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][130/625] eta 0:03:43 lr 0.000518 wd 0.0500 time 0.4464 (0.4514) data time 0.0009 (0.0039) model time 
0.4456 (0.4463) loss 3.5541 (2.9084) grad_norm 1.6964 (inf) loss_scale 512.0000 (527.6336) mem 16699MB [2024-08-10 15:53:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][140/625] eta 0:03:39 lr 0.000518 wd 0.0500 time 0.4501 (0.4522) data time 0.0007 (0.0037) model time 0.4494 (0.4481) loss 2.5383 (2.9081) grad_norm 1.5791 (inf) loss_scale 512.0000 (526.5248) mem 16699MB [2024-08-10 15:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][150/625] eta 0:03:34 lr 0.000518 wd 0.0500 time 0.4459 (0.4519) data time 0.0009 (0.0035) model time 0.4450 (0.4479) loss 2.8851 (2.9092) grad_norm 1.5579 (inf) loss_scale 512.0000 (525.5629) mem 16699MB [2024-08-10 15:53:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][160/625] eta 0:03:29 lr 0.000518 wd 0.0500 time 0.4472 (0.4516) data time 0.0009 (0.0033) model time 0.4464 (0.4478) loss 2.8107 (2.9108) grad_norm 3.1178 (inf) loss_scale 512.0000 (524.7205) mem 16699MB [2024-08-10 15:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][170/625] eta 0:03:25 lr 0.000518 wd 0.0500 time 0.4456 (0.4512) data time 0.0008 (0.0032) model time 0.4448 (0.4475) loss 3.0446 (2.9055) grad_norm 1.4976 (inf) loss_scale 512.0000 (523.9766) mem 16699MB [2024-08-10 15:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][180/625] eta 0:03:20 lr 0.000518 wd 0.0500 time 0.4489 (0.4511) data time 0.0008 (0.0031) model time 0.4481 (0.4476) loss 2.7682 (2.9059) grad_norm 3.0001 (inf) loss_scale 512.0000 (523.3149) mem 16699MB [2024-08-10 15:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][190/625] eta 0:03:16 lr 0.000518 wd 0.0500 time 0.4385 (0.4511) data time 0.0010 (0.0030) model time 0.4376 (0.4477) loss 3.1846 (2.9092) grad_norm 1.6654 (inf) loss_scale 512.0000 (522.7225) mem 16699MB [2024-08-10 15:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][200/625] eta 0:03:11 lr 0.000518 wd 0.0500 time 
0.4424 (0.4509) data time 0.0006 (0.0029) model time 0.4417 (0.4476) loss 3.0542 (2.9050) grad_norm 1.8445 (inf) loss_scale 512.0000 (522.1891) mem 16699MB [2024-08-10 15:54:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][210/625] eta 0:03:06 lr 0.000517 wd 0.0500 time 0.4414 (0.4505) data time 0.0009 (0.0028) model time 0.4405 (0.4473) loss 2.8002 (2.8917) grad_norm 2.0782 (inf) loss_scale 512.0000 (521.7062) mem 16699MB [2024-08-10 15:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][220/625] eta 0:03:02 lr 0.000517 wd 0.0500 time 0.4484 (0.4503) data time 0.0008 (0.0027) model time 0.4475 (0.4471) loss 3.2394 (2.8816) grad_norm 1.8363 (inf) loss_scale 512.0000 (521.2670) mem 16699MB [2024-08-10 15:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][230/625] eta 0:02:57 lr 0.000517 wd 0.0500 time 0.4453 (0.4500) data time 0.0010 (0.0026) model time 0.4443 (0.4469) loss 2.7163 (2.8837) grad_norm 3.3448 (inf) loss_scale 512.0000 (520.8658) mem 16699MB [2024-08-10 15:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][240/625] eta 0:02:53 lr 0.000517 wd 0.0500 time 0.4469 (0.4506) data time 0.0006 (0.0025) model time 0.4462 (0.4478) loss 3.5522 (2.8846) grad_norm 1.5367 (inf) loss_scale 512.0000 (520.4979) mem 16699MB [2024-08-10 15:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][250/625] eta 0:02:48 lr 0.000517 wd 0.0500 time 0.4429 (0.4504) data time 0.0010 (0.0025) model time 0.4419 (0.4476) loss 2.8375 (2.8843) grad_norm 1.3692 (inf) loss_scale 512.0000 (520.1594) mem 16699MB [2024-08-10 15:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][260/625] eta 0:02:44 lr 0.000517 wd 0.0500 time 0.4463 (0.4504) data time 0.0007 (0.0024) model time 0.4457 (0.4477) loss 3.8345 (2.8904) grad_norm 1.7555 (inf) loss_scale 512.0000 (519.8467) mem 16699MB [2024-08-10 15:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[173/300][270/625] eta 0:02:39 lr 0.000517 wd 0.0500 time 0.4444 (0.4501) data time 0.0006 (0.0024) model time 0.4438 (0.4475) loss 1.8004 (2.8911) grad_norm 1.6364 (inf) loss_scale 512.0000 (519.5572) mem 16699MB [2024-08-10 15:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][280/625] eta 0:02:35 lr 0.000517 wd 0.0500 time 0.4445 (0.4499) data time 0.0007 (0.0023) model time 0.4438 (0.4472) loss 2.8843 (2.8830) grad_norm 1.4906 (inf) loss_scale 512.0000 (519.2883) mem 16699MB [2024-08-10 15:54:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][290/625] eta 0:02:30 lr 0.000517 wd 0.0500 time 0.4449 (0.4503) data time 0.0009 (0.0023) model time 0.4439 (0.4477) loss 3.2494 (2.8820) grad_norm 2.3194 (inf) loss_scale 512.0000 (519.0378) mem 16699MB [2024-08-10 15:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][300/625] eta 0:02:26 lr 0.000517 wd 0.0500 time 0.4430 (0.4500) data time 0.0007 (0.0022) model time 0.4423 (0.4475) loss 3.5403 (2.8859) grad_norm 1.9174 (inf) loss_scale 512.0000 (518.8040) mem 16699MB [2024-08-10 15:54:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][310/625] eta 0:02:21 lr 0.000516 wd 0.0500 time 0.4435 (0.4504) data time 0.0008 (0.0022) model time 0.4427 (0.4480) loss 3.3097 (2.8863) grad_norm 1.4070 (inf) loss_scale 512.0000 (518.5852) mem 16699MB [2024-08-10 15:55:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][320/625] eta 0:02:17 lr 0.000516 wd 0.0500 time 0.4452 (0.4507) data time 0.0009 (0.0021) model time 0.4443 (0.4484) loss 3.2213 (2.8911) grad_norm 2.8651 (inf) loss_scale 512.0000 (518.3801) mem 16699MB [2024-08-10 15:55:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][330/625] eta 0:02:12 lr 0.000516 wd 0.0500 time 0.4395 (0.4505) data time 0.0007 (0.0021) model time 0.4389 (0.4483) loss 1.6890 (2.8849) grad_norm 1.5277 (inf) loss_scale 512.0000 (518.1873) mem 16699MB [2024-08-10 15:55:10 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][340/625] eta 0:02:08 lr 0.000516 wd 0.0500 time 0.4435 (0.4503) data time 0.0008 (0.0021) model time 0.4427 (0.4481) loss 2.6750 (2.8832) grad_norm 1.8523 (inf) loss_scale 512.0000 (518.0059) mem 16699MB [2024-08-10 15:55:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][350/625] eta 0:02:03 lr 0.000516 wd 0.0500 time 0.4399 (0.4501) data time 0.0008 (0.0020) model time 0.4391 (0.4478) loss 3.5350 (2.8816) grad_norm 1.3249 (inf) loss_scale 512.0000 (517.8348) mem 16699MB [2024-08-10 15:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][360/625] eta 0:01:59 lr 0.000516 wd 0.0500 time 0.4451 (0.4499) data time 0.0006 (0.0020) model time 0.4444 (0.4477) loss 1.8983 (2.8823) grad_norm 1.4377 (inf) loss_scale 512.0000 (517.6731) mem 16699MB [2024-08-10 15:55:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][370/625] eta 0:01:54 lr 0.000516 wd 0.0500 time 0.4537 (0.4502) data time 0.0009 (0.0020) model time 0.4528 (0.4481) loss 2.7195 (2.8792) grad_norm 2.1318 (inf) loss_scale 512.0000 (517.5202) mem 16699MB [2024-08-10 15:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][380/625] eta 0:01:50 lr 0.000516 wd 0.0500 time 0.4393 (0.4500) data time 0.0008 (0.0019) model time 0.4385 (0.4479) loss 3.3805 (2.8814) grad_norm 1.8503 (inf) loss_scale 512.0000 (517.3753) mem 16699MB [2024-08-10 15:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][390/625] eta 0:01:45 lr 0.000516 wd 0.0500 time 0.4433 (0.4499) data time 0.0007 (0.0019) model time 0.4427 (0.4478) loss 2.3836 (2.8778) grad_norm 2.9141 (inf) loss_scale 512.0000 (517.2379) mem 16699MB [2024-08-10 15:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][400/625] eta 0:01:41 lr 0.000515 wd 0.0500 time 0.4441 (0.4498) data time 0.0009 (0.0019) model time 0.4432 (0.4477) loss 3.2346 (2.8774) grad_norm 1.7560 (inf) loss_scale 
512.0000 (517.1072) mem 16699MB [2024-08-10 15:55:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][410/625] eta 0:01:36 lr 0.000515 wd 0.0500 time 0.4469 (0.4496) data time 0.0007 (0.0019) model time 0.4461 (0.4476) loss 3.5536 (2.8790) grad_norm 1.6836 (inf) loss_scale 512.0000 (516.9830) mem 16699MB [2024-08-10 15:55:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][420/625] eta 0:01:32 lr 0.000515 wd 0.0500 time 0.4444 (0.4495) data time 0.0006 (0.0018) model time 0.4438 (0.4474) loss 2.3957 (2.8749) grad_norm 1.2672 (inf) loss_scale 512.0000 (516.8646) mem 16699MB [2024-08-10 15:55:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][430/625] eta 0:01:27 lr 0.000515 wd 0.0500 time 0.4405 (0.4493) data time 0.0006 (0.0018) model time 0.4399 (0.4473) loss 3.4181 (2.8739) grad_norm 1.6739 (inf) loss_scale 512.0000 (516.7517) mem 16699MB [2024-08-10 15:55:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][440/625] eta 0:01:23 lr 0.000515 wd 0.0500 time 0.4469 (0.4495) data time 0.0006 (0.0018) model time 0.4462 (0.4475) loss 2.0448 (2.8747) grad_norm 1.1137 (inf) loss_scale 512.0000 (516.6440) mem 16699MB [2024-08-10 15:55:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][450/625] eta 0:01:18 lr 0.000515 wd 0.0500 time 0.4465 (0.4494) data time 0.0006 (0.0018) model time 0.4458 (0.4475) loss 3.1697 (2.8736) grad_norm 2.1262 (inf) loss_scale 512.0000 (516.5410) mem 16699MB [2024-08-10 15:56:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][460/625] eta 0:01:14 lr 0.000515 wd 0.0500 time 0.4432 (0.4502) data time 0.0008 (0.0018) model time 0.4424 (0.4483) loss 2.8934 (2.8677) grad_norm 1.7188 (inf) loss_scale 512.0000 (516.4425) mem 16699MB [2024-08-10 15:56:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][470/625] eta 0:01:09 lr 0.000515 wd 0.0500 time 0.4483 (0.4501) data time 0.0008 (0.0017) model time 0.4475 (0.4482) 
loss 2.5244 (2.8665) grad_norm 1.2643 (inf) loss_scale 512.0000 (516.3482) mem 16699MB [2024-08-10 15:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][480/625] eta 0:01:05 lr 0.000515 wd 0.0500 time 0.4474 (0.4507) data time 0.0006 (0.0017) model time 0.4468 (0.4489) loss 2.8919 (2.8688) grad_norm 1.5639 (inf) loss_scale 512.0000 (516.2578) mem 16699MB [2024-08-10 15:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][490/625] eta 0:01:00 lr 0.000515 wd 0.0500 time 0.4411 (0.4505) data time 0.0009 (0.0017) model time 0.4401 (0.4488) loss 3.2058 (2.8705) grad_norm 3.3319 (inf) loss_scale 512.0000 (516.1711) mem 16699MB [2024-08-10 15:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][500/625] eta 0:00:56 lr 0.000514 wd 0.0500 time 0.4477 (0.4504) data time 0.0006 (0.0017) model time 0.4471 (0.4487) loss 3.1857 (2.8638) grad_norm 1.6523 (inf) loss_scale 512.0000 (516.0878) mem 16699MB [2024-08-10 15:56:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][510/625] eta 0:00:51 lr 0.000514 wd 0.0500 time 0.4424 (0.4504) data time 0.0009 (0.0017) model time 0.4415 (0.4487) loss 2.9454 (2.8615) grad_norm 1.2931 (inf) loss_scale 512.0000 (516.0078) mem 16699MB [2024-08-10 15:56:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][520/625] eta 0:00:47 lr 0.000514 wd 0.0500 time 0.4460 (0.4503) data time 0.0008 (0.0017) model time 0.4451 (0.4486) loss 3.1606 (2.8617) grad_norm 1.7731 (inf) loss_scale 512.0000 (515.9309) mem 16699MB [2024-08-10 15:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][530/625] eta 0:00:42 lr 0.000514 wd 0.0500 time 0.4426 (0.4502) data time 0.0009 (0.0016) model time 0.4417 (0.4484) loss 3.3150 (2.8601) grad_norm 1.4619 (inf) loss_scale 512.0000 (515.8569) mem 16699MB [2024-08-10 15:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][540/625] eta 0:00:38 lr 0.000514 wd 0.0500 time 0.4431 (0.4501) 
data time 0.0009 (0.0016) model time 0.4422 (0.4483) loss 2.8980 (2.8600) grad_norm 1.7311 (inf) loss_scale 512.0000 (515.7856) mem 16699MB [2024-08-10 15:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][550/625] eta 0:00:33 lr 0.000514 wd 0.0500 time 0.4402 (0.4500) data time 0.0006 (0.0016) model time 0.4396 (0.4482) loss 3.2830 (2.8586) grad_norm 1.7157 (inf) loss_scale 512.0000 (515.7169) mem 16699MB [2024-08-10 15:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][560/625] eta 0:00:29 lr 0.000514 wd 0.0500 time 0.4410 (0.4500) data time 0.0007 (0.0016) model time 0.4403 (0.4483) loss 2.3500 (2.8554) grad_norm 1.7880 (inf) loss_scale 512.0000 (515.6506) mem 16699MB [2024-08-10 15:56:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][570/625] eta 0:00:24 lr 0.000514 wd 0.0500 time 0.4401 (0.4499) data time 0.0009 (0.0016) model time 0.4392 (0.4482) loss 3.3221 (2.8583) grad_norm 1.9072 (inf) loss_scale 512.0000 (515.5867) mem 16699MB [2024-08-10 15:56:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][580/625] eta 0:00:20 lr 0.000514 wd 0.0500 time 0.4402 (0.4497) data time 0.0006 (0.0016) model time 0.4396 (0.4480) loss 2.7344 (2.8562) grad_norm 1.8858 (inf) loss_scale 512.0000 (515.5250) mem 16699MB [2024-08-10 15:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][590/625] eta 0:00:15 lr 0.000513 wd 0.0500 time 0.4456 (0.4496) data time 0.0008 (0.0016) model time 0.4448 (0.4479) loss 2.5107 (2.8531) grad_norm 5.6676 (inf) loss_scale 512.0000 (515.4653) mem 16699MB [2024-08-10 15:57:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][600/625] eta 0:00:11 lr 0.000513 wd 0.0500 time 0.4429 (0.4496) data time 0.0009 (0.0016) model time 0.4421 (0.4479) loss 2.9999 (2.8511) grad_norm 2.0088 (inf) loss_scale 512.0000 (515.4077) mem 16699MB [2024-08-10 15:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][610/625] eta 
0:00:06 lr 0.000513 wd 0.0500 time 0.4384 (0.4499) data time 0.0006 (0.0016) model time 0.4378 (0.4482) loss 2.6747 (2.8495) grad_norm 1.5668 (inf) loss_scale 512.0000 (515.3519) mem 16699MB [2024-08-10 15:57:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [173/300][620/625] eta 0:00:02 lr 0.000513 wd 0.0500 time 0.4383 (0.4497) data time 0.0006 (0.0015) model time 0.4377 (0.4480) loss 3.4500 (2.8490) grad_norm 1.7505 (inf) loss_scale 512.0000 (515.2979) mem 16699MB [2024-08-10 15:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 173 training takes 0:04:41 [2024-08-10 15:57:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 15:57:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 15:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5151 (0.5151) Acc@1 87.988 (87.988) Acc@5 98.438 (98.438) Mem 16699MB [2024-08-10 15:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8569 (0.6296) Acc@1 78.809 (85.689) Acc@5 95.703 (97.474) Mem 16699MB [2024-08-10 15:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.8921 (0.7399) Acc@1 78.223 (82.964) Acc@5 95.361 (96.291) Mem 16699MB [2024-08-10 15:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.662 Acc@5 96.309 [2024-08-10 15:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 15:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.845 (0.845) Loss 0.4695 (0.4695) Acc@1 89.258 (89.258) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 15:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.188) Loss 0.7617 (0.5902) Acc@1 81.250 (87.061) Acc@5 96.338 
(97.874) Mem 16699MB [2024-08-10 15:57:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.154) Loss 0.8477 (0.6924) Acc@1 78.662 (84.275) Acc@5 96.045 (96.912) Mem 16699MB [2024-08-10 15:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.953 Acc@5 96.909 [2024-08-10 15:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-10 15:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.95% [2024-08-10 15:57:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 15:57:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 15:57:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][0/625] eta 0:08:09 lr 0.000513 wd 0.0500 time 0.7837 (0.7837) data time 0.3949 (0.3949) model time 0.0000 (0.0000) loss 3.1724 (3.1724) grad_norm 5.1228 (5.1228) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:57:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][10/625] eta 0:04:51 lr 0.000513 wd 0.0500 time 0.4435 (0.4736) data time 0.0007 (0.0367) model time 0.0000 (0.0000) loss 2.2186 (2.9707) grad_norm 1.9720 (2.6057) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][20/625] eta 0:04:37 lr 0.000513 wd 0.0500 time 0.4375 (0.4583) data time 0.0009 (0.0199) model time 0.0000 (0.0000) loss 2.2380 (2.9002) grad_norm 1.5573 (2.2454) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][30/625] eta 0:04:33 lr 0.000513 wd 0.0500 time 0.4427 (0.4592) data time 0.0007 (0.0138) model time 0.0000 (0.0000) loss 2.9968 (2.8473) grad_norm 1.3930 (2.0458) loss_scale 512.0000 (512.0000) 
mem 16699MB [2024-08-10 15:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][40/625] eta 0:04:26 lr 0.000513 wd 0.0500 time 0.4447 (0.4555) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 3.2409 (2.8246) grad_norm 2.2202 (2.2456) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][50/625] eta 0:04:20 lr 0.000513 wd 0.0500 time 0.4410 (0.4531) data time 0.0006 (0.0087) model time 0.0000 (0.0000) loss 3.7940 (2.9067) grad_norm 1.3942 (2.1764) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:57:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][60/625] eta 0:04:15 lr 0.000512 wd 0.0500 time 0.4389 (0.4516) data time 0.0010 (0.0074) model time 0.4379 (0.4435) loss 2.1821 (2.8929) grad_norm 1.7753 (2.2442) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][70/625] eta 0:04:12 lr 0.000512 wd 0.0500 time 0.6281 (0.4556) data time 0.0006 (0.0065) model time 0.6275 (0.4612) loss 3.4642 (2.9252) grad_norm 1.8052 (2.2037) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][80/625] eta 0:04:07 lr 0.000512 wd 0.0500 time 0.4450 (0.4542) data time 0.0008 (0.0058) model time 0.4442 (0.4552) loss 3.3646 (2.9104) grad_norm 2.0582 (2.1400) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][90/625] eta 0:04:02 lr 0.000512 wd 0.0500 time 0.4431 (0.4536) data time 0.0008 (0.0052) model time 0.4423 (0.4534) loss 3.0295 (2.9026) grad_norm 1.7272 (2.1683) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][100/625] eta 0:03:57 lr 0.000512 wd 0.0500 time 0.4441 (0.4527) data time 0.0009 (0.0048) model time 0.4432 (0.4514) loss 2.5353 
(2.9078) grad_norm 1.4211 (2.1006) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][110/625] eta 0:03:52 lr 0.000512 wd 0.0500 time 0.4438 (0.4519) data time 0.0006 (0.0045) model time 0.4433 (0.4501) loss 2.5939 (2.8983) grad_norm 1.9525 (2.0494) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][120/625] eta 0:03:47 lr 0.000512 wd 0.0500 time 0.4405 (0.4513) data time 0.0008 (0.0042) model time 0.4396 (0.4491) loss 3.4901 (2.8992) grad_norm 1.3175 (2.0066) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][130/625] eta 0:03:43 lr 0.000512 wd 0.0500 time 0.4428 (0.4521) data time 0.0008 (0.0039) model time 0.4419 (0.4507) loss 3.2973 (2.9110) grad_norm 1.5586 (2.0123) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][140/625] eta 0:03:38 lr 0.000512 wd 0.0500 time 0.4421 (0.4514) data time 0.0007 (0.0037) model time 0.4415 (0.4496) loss 2.0213 (2.9128) grad_norm 1.7477 (2.0699) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][150/625] eta 0:03:34 lr 0.000511 wd 0.0500 time 0.4373 (0.4509) data time 0.0006 (0.0035) model time 0.4367 (0.4489) loss 3.4158 (2.8994) grad_norm 2.4850 (2.0793) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][160/625] eta 0:03:29 lr 0.000511 wd 0.0500 time 0.4537 (0.4504) data time 0.0006 (0.0033) model time 0.4531 (0.4483) loss 2.0604 (2.8929) grad_norm 1.3497 (2.0597) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][170/625] eta 0:03:24 lr 0.000511 wd 0.0500 time 0.4394 
(0.4499) data time 0.0006 (0.0032) model time 0.4388 (0.4477) loss 1.8693 (2.8539) grad_norm 1.3877 (2.0764) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][180/625] eta 0:03:20 lr 0.000511 wd 0.0500 time 0.4408 (0.4495) data time 0.0009 (0.0031) model time 0.4400 (0.4472) loss 3.0451 (2.8586) grad_norm 1.2576 (2.0530) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][190/625] eta 0:03:15 lr 0.000511 wd 0.0500 time 0.4404 (0.4498) data time 0.0008 (0.0030) model time 0.4396 (0.4477) loss 3.0571 (2.8656) grad_norm 1.5089 (2.0400) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:58:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][200/625] eta 0:03:10 lr 0.000511 wd 0.0500 time 0.4420 (0.4494) data time 0.0010 (0.0029) model time 0.4410 (0.4473) loss 2.6047 (2.8731) grad_norm 1.4814 (2.0232) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][210/625] eta 0:03:06 lr 0.000511 wd 0.0500 time 0.4403 (0.4502) data time 0.0007 (0.0028) model time 0.4396 (0.4484) loss 3.5335 (2.8786) grad_norm 2.5575 (2.0936) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][220/625] eta 0:03:02 lr 0.000511 wd 0.0500 time 0.4441 (0.4498) data time 0.0006 (0.0027) model time 0.4435 (0.4480) loss 2.9931 (2.8898) grad_norm 2.0768 (2.0852) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][230/625] eta 0:02:57 lr 0.000511 wd 0.0500 time 0.4406 (0.4496) data time 0.0006 (0.0026) model time 0.4400 (0.4477) loss 3.3573 (2.8869) grad_norm 1.6785 (2.0716) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [174/300][240/625] eta 0:02:52 lr 0.000511 wd 0.0500 time 0.4402 (0.4493) data time 0.0006 (0.0025) model time 0.4396 (0.4474) loss 3.3781 (2.8750) grad_norm 1.5423 (2.0645) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][250/625] eta 0:02:48 lr 0.000510 wd 0.0500 time 0.4448 (0.4491) data time 0.0009 (0.0025) model time 0.4439 (0.4472) loss 3.1201 (2.8789) grad_norm 1.4442 (2.0724) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][260/625] eta 0:02:44 lr 0.000510 wd 0.0500 time 0.4416 (0.4495) data time 0.0006 (0.0024) model time 0.4410 (0.4477) loss 2.2189 (2.8776) grad_norm 1.9242 (2.0679) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][270/625] eta 0:02:39 lr 0.000510 wd 0.0500 time 0.4428 (0.4493) data time 0.0008 (0.0024) model time 0.4420 (0.4475) loss 3.6071 (2.8781) grad_norm 1.5940 (2.0650) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][280/625] eta 0:02:34 lr 0.000510 wd 0.0500 time 0.4429 (0.4491) data time 0.0006 (0.0023) model time 0.4422 (0.4473) loss 2.1108 (2.8725) grad_norm 2.2985 (2.0672) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][290/625] eta 0:02:30 lr 0.000510 wd 0.0500 time 0.4410 (0.4488) data time 0.0007 (0.0023) model time 0.4403 (0.4470) loss 3.6300 (2.8728) grad_norm 1.7923 (2.0646) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][300/625] eta 0:02:25 lr 0.000510 wd 0.0500 time 0.4400 (0.4486) data time 0.0006 (0.0022) model time 0.4394 (0.4468) loss 3.6131 (2.8677) grad_norm 1.9333 (2.0565) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 15:59:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][310/625] eta 0:02:21 lr 0.000510 wd 0.0500 time 0.4415 (0.4488) data time 0.0007 (0.0022) model time 0.4408 (0.4471) loss 2.4447 (2.8663) grad_norm 3.8161 (2.0468) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][320/625] eta 0:02:16 lr 0.000510 wd 0.0500 time 0.4423 (0.4485) data time 0.0007 (0.0021) model time 0.4417 (0.4468) loss 3.5965 (2.8578) grad_norm 1.7956 (2.0399) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 15:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][330/625] eta 0:02:12 lr 0.000510 wd 0.0500 time 0.4476 (0.4484) data time 0.0008 (0.0021) model time 0.4467 (0.4467) loss 3.3954 (2.8649) grad_norm 2.2648 (2.0385) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][340/625] eta 0:02:07 lr 0.000509 wd 0.0500 time 0.4444 (0.4482) data time 0.0006 (0.0020) model time 0.4438 (0.4465) loss 2.4506 (2.8648) grad_norm 1.8929 (2.0461) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][350/625] eta 0:02:03 lr 0.000509 wd 0.0500 time 0.4527 (0.4488) data time 0.0006 (0.0020) model time 0.4521 (0.4472) loss 3.2882 (2.8588) grad_norm 1.3570 (2.0435) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][360/625] eta 0:01:59 lr 0.000509 wd 0.0500 time 0.4439 (0.4497) data time 0.0006 (0.0020) model time 0.4433 (0.4483) loss 3.3846 (2.8550) grad_norm 2.5285 (2.0429) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][370/625] eta 0:01:54 lr 0.000509 wd 0.0500 time 0.4417 (0.4495) data time 0.0007 (0.0020) model time 0.4410 (0.4480) loss 3.3280 
(2.8512) grad_norm 1.9112 (2.0340) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][380/625] eta 0:01:50 lr 0.000509 wd 0.0500 time 0.4489 (0.4493) data time 0.0008 (0.0019) model time 0.4481 (0.4479) loss 2.7624 (2.8508) grad_norm 1.1989 (2.0193) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][390/625] eta 0:01:45 lr 0.000509 wd 0.0500 time 0.4433 (0.4491) data time 0.0008 (0.0019) model time 0.4425 (0.4477) loss 1.9486 (2.8443) grad_norm 1.5665 (2.0106) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][400/625] eta 0:01:41 lr 0.000509 wd 0.0500 time 0.4474 (0.4490) data time 0.0006 (0.0019) model time 0.4468 (0.4475) loss 2.4724 (2.8390) grad_norm 1.2773 (1.9981) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][410/625] eta 0:01:36 lr 0.000509 wd 0.0500 time 0.4414 (0.4497) data time 0.0008 (0.0019) model time 0.4406 (0.4483) loss 2.8568 (2.8367) grad_norm 1.5672 (1.9924) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][420/625] eta 0:01:32 lr 0.000509 wd 0.0500 time 0.4450 (0.4495) data time 0.0008 (0.0018) model time 0.4442 (0.4482) loss 2.7643 (2.8388) grad_norm 1.7865 (2.0015) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][430/625] eta 0:01:27 lr 0.000509 wd 0.0500 time 0.4417 (0.4494) data time 0.0008 (0.0018) model time 0.4409 (0.4480) loss 3.1236 (2.8402) grad_norm 2.4877 (2.0087) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][440/625] eta 0:01:23 lr 0.000508 wd 0.0500 time 0.4414 
(0.4493) data time 0.0008 (0.0018) model time 0.4405 (0.4479) loss 3.3439 (2.8444) grad_norm 1.2691 (2.0072) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][450/625] eta 0:01:18 lr 0.000508 wd 0.0500 time 0.4424 (0.4492) data time 0.0006 (0.0018) model time 0.4418 (0.4478) loss 2.5621 (2.8393) grad_norm 1.6270 (1.9995) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][460/625] eta 0:01:14 lr 0.000508 wd 0.0500 time 0.4459 (0.4495) data time 0.0006 (0.0017) model time 0.4452 (0.4481) loss 3.3567 (2.8363) grad_norm 1.9403 (1.9960) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:00:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][470/625] eta 0:01:09 lr 0.000508 wd 0.0500 time 0.4413 (0.4494) data time 0.0008 (0.0017) model time 0.4405 (0.4480) loss 3.1083 (2.8313) grad_norm 1.7180 (1.9880) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][480/625] eta 0:01:05 lr 0.000508 wd 0.0500 time 0.4482 (0.4496) data time 0.0006 (0.0017) model time 0.4475 (0.4483) loss 3.2786 (2.8340) grad_norm 1.4930 (1.9862) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][490/625] eta 0:01:00 lr 0.000508 wd 0.0500 time 0.4412 (0.4496) data time 0.0009 (0.0017) model time 0.4403 (0.4483) loss 3.0317 (2.8356) grad_norm 2.0920 (1.9958) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][500/625] eta 0:00:56 lr 0.000508 wd 0.0500 time 0.4422 (0.4501) data time 0.0009 (0.0017) model time 0.4414 (0.4489) loss 2.9954 (2.8335) grad_norm 1.9866 (1.9906) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [174/300][510/625] eta 0:00:51 lr 0.000508 wd 0.0500 time 0.4447 (0.4500) data time 0.0006 (0.0017) model time 0.4440 (0.4487) loss 2.8822 (2.8336) grad_norm 1.8251 (1.9815) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][520/625] eta 0:00:47 lr 0.000508 wd 0.0500 time 0.4445 (0.4499) data time 0.0009 (0.0016) model time 0.4437 (0.4486) loss 2.8980 (2.8315) grad_norm 1.5164 (1.9785) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][530/625] eta 0:00:42 lr 0.000508 wd 0.0500 time 0.4426 (0.4497) data time 0.0009 (0.0016) model time 0.4418 (0.4485) loss 3.0815 (2.8369) grad_norm 3.1224 (1.9788) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][540/625] eta 0:00:38 lr 0.000507 wd 0.0500 time 0.4409 (0.4497) data time 0.0008 (0.0016) model time 0.4401 (0.4484) loss 3.5342 (2.8373) grad_norm 1.3448 (1.9733) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][550/625] eta 0:00:33 lr 0.000507 wd 0.0500 time 0.4478 (0.4497) data time 0.0007 (0.0016) model time 0.4471 (0.4485) loss 1.8250 (2.8387) grad_norm 1.6333 (1.9755) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][560/625] eta 0:00:29 lr 0.000507 wd 0.0500 time 0.4444 (0.4498) data time 0.0009 (0.0016) model time 0.4435 (0.4486) loss 3.5564 (2.8411) grad_norm 2.0666 (1.9716) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][570/625] eta 0:00:24 lr 0.000507 wd 0.0500 time 0.4487 (0.4497) data time 0.0007 (0.0016) model time 0.4480 (0.4485) loss 2.8769 (2.8441) grad_norm 1.5899 (1.9703) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-10 16:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][580/625] eta 0:00:20 lr 0.000507 wd 0.0500 time 0.4464 (0.4496) data time 0.0008 (0.0016) model time 0.4456 (0.4484) loss 2.3348 (2.8442) grad_norm 4.0239 (1.9688) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 16:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][590/625] eta 0:00:15 lr 0.000507 wd 0.0500 time 0.4446 (0.4495) data time 0.0006 (0.0016) model time 0.4439 (0.4483) loss 3.2492 (2.8442) grad_norm 1.1962 (1.9676) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 16:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][600/625] eta 0:00:11 lr 0.000507 wd 0.0500 time 0.4405 (0.4495) data time 0.0010 (0.0016) model time 0.4395 (0.4483) loss 2.6287 (2.8406) grad_norm 2.3938 (1.9681) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 16:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][610/625] eta 0:00:06 lr 0.000507 wd 0.0500 time 0.4412 (0.4494) data time 0.0005 (0.0015) model time 0.4407 (0.4482) loss 2.9951 (2.8414) grad_norm 2.0273 (1.9874) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 16:02:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [174/300][620/625] eta 0:00:02 lr 0.000507 wd 0.0500 time 0.4413 (0.4493) data time 0.0004 (0.0015) model time 0.4409 (0.4480) loss 2.4250 (2.8467) grad_norm 1.9829 (1.9897) loss_scale 512.0000 (512.0000) mem 16699MB
[2024-08-10 16:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 174 training takes 0:04:40
[2024-08-10 16:02:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-10 16:02:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 16:02:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.491 (0.491) Loss 0.5161 (0.5161) Acc@1 88.184 (88.184) Acc@5 98.486 (98.486) Mem 16699MB
[2024-08-10 16:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.154) Loss 0.8271 (0.6297) Acc@1 79.785 (85.902) Acc@5 96.289 (97.625) Mem 16699MB
[2024-08-10 16:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9453 (0.7451) Acc@1 77.295 (83.047) Acc@5 94.824 (96.405) Mem 16699MB
[2024-08-10 16:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.815 Acc@5 96.389
[2024-08-10 16:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8%
[2024-08-10 16:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.82%
[2024-08-10 16:02:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 16:02:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 16:02:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.4685 (0.4685) Acc@1 89.209 (89.209) Acc@5 98.828 (98.828) Mem 16699MB
[2024-08-10 16:02:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.7617 (0.5897) Acc@1 81.104 (87.038) Acc@5 96.289 (97.896) Mem 16699MB
[2024-08-10 16:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.8462 (0.6915) Acc@1 78.711 (84.273) Acc@5 95.947 (96.926) Mem 16699MB
[2024-08-10 16:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.979 Acc@5 96.921
[2024-08-10 16:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0%
[2024-08-10 16:02:18 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.98%
[2024-08-10 16:02:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 16:02:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 16:02:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][0/625] eta 0:08:18 lr 0.000507 wd 0.0500 time 0.7969 (0.7969) data time 0.4084 (0.4084) model time 0.0000 (0.0000) loss 2.3677 (2.3677) grad_norm 1.8686 (1.8686) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][10/625] eta 0:04:52 lr 0.000506 wd 0.0500 time 0.4455 (0.4760) data time 0.0009 (0.0380) model time 0.0000 (0.0000) loss 3.1121 (2.6595) grad_norm 3.3652 (2.1783) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][20/625] eta 0:04:38 lr 0.000506 wd 0.0500 time 0.4467 (0.4611) data time 0.0006 (0.0203) model time 0.0000 (0.0000) loss 1.9072 (2.6945) grad_norm 2.2435 (2.1037) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][30/625] eta 0:04:34 lr 0.000506 wd 0.0500 time 0.4450 (0.4622) data time 0.0006 (0.0141) model time 0.0000 (0.0000) loss 3.2205 (2.7674) grad_norm 1.4366 (1.9797) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][40/625] eta 0:04:29 lr 0.000506 wd 0.0500 time 0.4438 (0.4614) data time 0.0006 (0.0108) model time 0.0000 (0.0000) loss 2.8728 (2.8148) grad_norm 2.1699 (1.9437) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][50/625] eta 0:04:23 lr 0.000506 wd 0.0500 time 0.4510 (0.4583) data time 0.0008 (0.0089) model time 0.0000 (0.0000) loss 2.8670 (2.8692) grad_norm 1.8562 (2.0573) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][60/625] eta 0:04:17 lr 0.000506 wd 0.0500 time 0.4410 (0.4560) data time 0.0006 (0.0075) model time 0.4403 (0.4438) loss 1.9847 (2.8890) 
grad_norm 2.0157 (2.0041) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][70/625] eta 0:04:13 lr 0.000506 wd 0.0500 time 0.6138 (0.4571) data time 0.0007 (0.0066) model time 0.6131 (0.4534) loss 3.1069 (2.8530) grad_norm 1.4123 (1.9433) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:02:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][80/625] eta 0:04:08 lr 0.000506 wd 0.0500 time 0.4412 (0.4555) data time 0.0008 (0.0059) model time 0.4404 (0.4499) loss 3.1169 (2.8215) grad_norm 1.4579 (1.8969) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][90/625] eta 0:04:03 lr 0.000506 wd 0.0500 time 0.4443 (0.4558) data time 0.0007 (0.0054) model time 0.4436 (0.4519) loss 2.6240 (2.8127) grad_norm 1.9300 (1.8957) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:03:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][100/625] eta 0:03:58 lr 0.000505 wd 0.0500 time 0.4414 (0.4545) data time 0.0007 (0.0049) model time 0.4408 (0.4497) loss 2.9390 (2.8293) grad_norm 1.2259 (1.8629) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-10 16:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][110/625] eta 0:03:53 lr 0.000505 wd 0.0500 time 0.4377 (0.4532) data time 0.0010 (0.0046) model time 0.4367 (0.4480) loss 2.3342 (2.8332) grad_norm 1.5065 (inf) loss_scale 256.0000 (491.2432) mem 16699MB [2024-08-10 16:03:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][120/625] eta 0:03:49 lr 0.000505 wd 0.0500 time 0.4428 (0.4537) data time 0.0008 (0.0043) model time 0.4421 (0.4495) loss 3.3537 (2.8294) grad_norm 1.5254 (inf) loss_scale 256.0000 (471.8017) mem 16699MB [2024-08-10 16:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][130/625] eta 0:03:44 lr 0.000505 wd 0.0500 time 0.4401 (0.4528) data time 
0.0007 (0.0040) model time 0.4395 (0.4485) loss 3.3495 (2.8516) grad_norm 2.0641 (inf) loss_scale 256.0000 (455.3282) mem 16699MB [2024-08-10 16:03:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][140/625] eta 0:03:39 lr 0.000505 wd 0.0500 time 0.4609 (0.4523) data time 0.0007 (0.0038) model time 0.4602 (0.4481) loss 2.4239 (2.8337) grad_norm 1.5667 (inf) loss_scale 256.0000 (441.1915) mem 16699MB [2024-08-10 16:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][150/625] eta 0:03:34 lr 0.000505 wd 0.0500 time 0.4438 (0.4518) data time 0.0008 (0.0036) model time 0.4429 (0.4475) loss 2.1514 (2.8380) grad_norm 1.4708 (inf) loss_scale 256.0000 (428.9272) mem 16699MB [2024-08-10 16:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][160/625] eta 0:03:29 lr 0.000505 wd 0.0500 time 0.4451 (0.4513) data time 0.0009 (0.0034) model time 0.4441 (0.4472) loss 2.6663 (2.8537) grad_norm 1.5914 (inf) loss_scale 256.0000 (418.1863) mem 16699MB [2024-08-10 16:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][170/625] eta 0:03:25 lr 0.000505 wd 0.0500 time 0.4478 (0.4518) data time 0.0009 (0.0033) model time 0.4469 (0.4482) loss 2.8779 (2.8487) grad_norm 1.4762 (inf) loss_scale 256.0000 (408.7018) mem 16699MB [2024-08-10 16:03:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][180/625] eta 0:03:20 lr 0.000505 wd 0.0500 time 0.4508 (0.4514) data time 0.0008 (0.0032) model time 0.4499 (0.4478) loss 3.2302 (2.8451) grad_norm 2.6352 (inf) loss_scale 256.0000 (400.2652) mem 16699MB [2024-08-10 16:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][190/625] eta 0:03:16 lr 0.000505 wd 0.0500 time 0.4429 (0.4520) data time 0.0009 (0.0030) model time 0.4420 (0.4488) loss 3.0573 (2.8458) grad_norm 1.7456 (inf) loss_scale 256.0000 (392.7120) mem 16699MB [2024-08-10 16:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][200/625] eta 0:03:11 
lr 0.000504 wd 0.0500 time 0.4432 (0.4516) data time 0.0009 (0.0029) model time 0.4423 (0.4484) loss 2.6021 (2.8408) grad_norm 1.4071 (inf) loss_scale 256.0000 (385.9104) mem 16699MB [2024-08-10 16:03:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][210/625] eta 0:03:07 lr 0.000504 wd 0.0500 time 0.4412 (0.4512) data time 0.0009 (0.0028) model time 0.4403 (0.4481) loss 2.2905 (2.8514) grad_norm 1.4189 (inf) loss_scale 256.0000 (379.7536) mem 16699MB [2024-08-10 16:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][220/625] eta 0:03:02 lr 0.000504 wd 0.0500 time 0.4459 (0.4509) data time 0.0009 (0.0028) model time 0.4450 (0.4478) loss 2.1651 (2.8522) grad_norm 1.9221 (inf) loss_scale 256.0000 (374.1538) mem 16699MB [2024-08-10 16:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][230/625] eta 0:02:57 lr 0.000504 wd 0.0500 time 0.4446 (0.4506) data time 0.0006 (0.0027) model time 0.4440 (0.4475) loss 2.7455 (2.8524) grad_norm 2.4571 (inf) loss_scale 256.0000 (369.0390) mem 16699MB [2024-08-10 16:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][240/625] eta 0:02:53 lr 0.000504 wd 0.0500 time 0.4426 (0.4502) data time 0.0007 (0.0026) model time 0.4419 (0.4471) loss 2.6361 (2.8603) grad_norm 2.1029 (inf) loss_scale 256.0000 (364.3485) mem 16699MB [2024-08-10 16:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][250/625] eta 0:02:48 lr 0.000504 wd 0.0500 time 0.4396 (0.4499) data time 0.0007 (0.0025) model time 0.4389 (0.4468) loss 2.4880 (2.8605) grad_norm 1.7420 (inf) loss_scale 256.0000 (360.0319) mem 16699MB [2024-08-10 16:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][260/625] eta 0:02:44 lr 0.000504 wd 0.0500 time 0.4410 (0.4496) data time 0.0006 (0.0025) model time 0.4403 (0.4466) loss 1.5993 (2.8533) grad_norm 1.5841 (inf) loss_scale 256.0000 (356.0460) mem 16699MB [2024-08-10 16:04:21 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [175/300][270/625] eta 0:02:39 lr 0.000504 wd 0.0500 time 0.4455 (0.4494) data time 0.0006 (0.0024) model time 0.4448 (0.4465) loss 3.5822 (2.8461) grad_norm 2.4665 (inf) loss_scale 256.0000 (352.3542) mem 16699MB [2024-08-10 16:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][280/625] eta 0:02:34 lr 0.000504 wd 0.0500 time 0.4438 (0.4492) data time 0.0006 (0.0024) model time 0.4432 (0.4463) loss 2.3479 (2.8497) grad_norm 1.8958 (inf) loss_scale 256.0000 (348.9253) mem 16699MB [2024-08-10 16:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][290/625] eta 0:02:30 lr 0.000503 wd 0.0500 time 0.4424 (0.4490) data time 0.0007 (0.0023) model time 0.4417 (0.4462) loss 2.6734 (2.8549) grad_norm 1.1212 (inf) loss_scale 256.0000 (345.7320) mem 16699MB [2024-08-10 16:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][300/625] eta 0:02:25 lr 0.000503 wd 0.0500 time 0.4425 (0.4489) data time 0.0008 (0.0023) model time 0.4417 (0.4460) loss 2.9950 (2.8613) grad_norm 8.3623 (inf) loss_scale 256.0000 (342.7508) mem 16699MB [2024-08-10 16:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][310/625] eta 0:02:21 lr 0.000503 wd 0.0500 time 0.4497 (0.4487) data time 0.0008 (0.0022) model time 0.4488 (0.4459) loss 3.6718 (2.8603) grad_norm 1.9755 (inf) loss_scale 256.0000 (339.9614) mem 16699MB [2024-08-10 16:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][320/625] eta 0:02:16 lr 0.000503 wd 0.0500 time 0.4323 (0.4486) data time 0.0007 (0.0022) model time 0.4316 (0.4459) loss 3.4153 (2.8629) grad_norm 1.9532 (inf) loss_scale 256.0000 (337.3458) mem 16699MB [2024-08-10 16:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][330/625] eta 0:02:12 lr 0.000503 wd 0.0500 time 0.4422 (0.4484) data time 0.0009 (0.0022) model time 0.4413 (0.4457) loss 3.3563 (2.8637) grad_norm 2.2008 (inf) loss_scale 256.0000 (334.8882) 
mem 16699MB [2024-08-10 16:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][340/625] eta 0:02:07 lr 0.000503 wd 0.0500 time 0.4434 (0.4483) data time 0.0008 (0.0021) model time 0.4426 (0.4456) loss 2.7177 (2.8658) grad_norm 1.5529 (inf) loss_scale 256.0000 (332.5748) mem 16699MB [2024-08-10 16:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][350/625] eta 0:02:03 lr 0.000503 wd 0.0500 time 0.4394 (0.4481) data time 0.0006 (0.0021) model time 0.4388 (0.4455) loss 2.2696 (2.8599) grad_norm 1.5255 (inf) loss_scale 256.0000 (330.3932) mem 16699MB [2024-08-10 16:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][360/625] eta 0:01:58 lr 0.000503 wd 0.0500 time 0.4420 (0.4480) data time 0.0007 (0.0021) model time 0.4413 (0.4453) loss 2.3834 (2.8569) grad_norm 1.5081 (inf) loss_scale 256.0000 (328.3324) mem 16699MB [2024-08-10 16:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][370/625] eta 0:01:54 lr 0.000503 wd 0.0500 time 0.4433 (0.4484) data time 0.0007 (0.0020) model time 0.4426 (0.4459) loss 2.6728 (2.8527) grad_norm 1.9589 (inf) loss_scale 256.0000 (326.3827) mem 16699MB [2024-08-10 16:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][380/625] eta 0:01:49 lr 0.000503 wd 0.0500 time 0.4437 (0.4487) data time 0.0006 (0.0020) model time 0.4431 (0.4463) loss 3.3173 (2.8546) grad_norm 1.5650 (inf) loss_scale 256.0000 (324.5354) mem 16699MB [2024-08-10 16:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][390/625] eta 0:01:45 lr 0.000502 wd 0.0500 time 0.4416 (0.4488) data time 0.0007 (0.0020) model time 0.4409 (0.4465) loss 1.6497 (2.8513) grad_norm 1.7769 (inf) loss_scale 256.0000 (322.7826) mem 16699MB [2024-08-10 16:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][400/625] eta 0:01:40 lr 0.000502 wd 0.0500 time 0.4323 (0.4487) data time 0.0007 (0.0019) model time 0.4316 (0.4463) loss 3.5119 (2.8551) 
grad_norm 2.4052 (inf) loss_scale 256.0000 (321.1172) mem 16699MB [2024-08-10 16:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][410/625] eta 0:01:36 lr 0.000502 wd 0.0500 time 0.4392 (0.4498) data time 0.0008 (0.0019) model time 0.4383 (0.4477) loss 3.1189 (2.8581) grad_norm 1.2659 (inf) loss_scale 256.0000 (319.5328) mem 16699MB [2024-08-10 16:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][420/625] eta 0:01:32 lr 0.000502 wd 0.0500 time 0.4464 (0.4496) data time 0.0006 (0.0019) model time 0.4458 (0.4475) loss 2.0487 (2.8545) grad_norm 1.9121 (inf) loss_scale 256.0000 (318.0238) mem 16699MB [2024-08-10 16:05:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][430/625] eta 0:01:27 lr 0.000502 wd 0.0500 time 0.4452 (0.4499) data time 0.0008 (0.0019) model time 0.4443 (0.4479) loss 2.6363 (2.8483) grad_norm 1.5844 (inf) loss_scale 256.0000 (316.5847) mem 16699MB [2024-08-10 16:05:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][440/625] eta 0:01:23 lr 0.000502 wd 0.0500 time 0.4467 (0.4498) data time 0.0007 (0.0018) model time 0.4460 (0.4477) loss 1.7830 (2.8485) grad_norm 2.2334 (inf) loss_scale 256.0000 (315.2109) mem 16699MB [2024-08-10 16:05:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][450/625] eta 0:01:18 lr 0.000502 wd 0.0500 time 0.5842 (0.4499) data time 0.0009 (0.0018) model time 0.5833 (0.4480) loss 2.9124 (2.8480) grad_norm 1.3885 (inf) loss_scale 256.0000 (313.8980) mem 16699MB [2024-08-10 16:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][460/625] eta 0:01:14 lr 0.000502 wd 0.0500 time 0.4405 (0.4498) data time 0.0008 (0.0018) model time 0.4397 (0.4478) loss 2.3480 (2.8481) grad_norm 2.2401 (inf) loss_scale 256.0000 (312.6421) mem 16699MB [2024-08-10 16:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][470/625] eta 0:01:09 lr 0.000502 wd 0.0500 time 0.4456 (0.4496) data time 0.0006 
(0.0018) model time 0.4449 (0.4476) loss 2.9390 (2.8483) grad_norm 2.1981 (inf) loss_scale 256.0000 (311.4395) mem 16699MB [2024-08-10 16:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][480/625] eta 0:01:05 lr 0.000501 wd 0.0500 time 0.4422 (0.4495) data time 0.0006 (0.0018) model time 0.4416 (0.4475) loss 3.2091 (2.8513) grad_norm 1.2508 (inf) loss_scale 256.0000 (310.2869) mem 16699MB [2024-08-10 16:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][490/625] eta 0:01:00 lr 0.000501 wd 0.0500 time 0.4444 (0.4494) data time 0.0007 (0.0017) model time 0.4437 (0.4474) loss 2.9680 (2.8559) grad_norm 1.7502 (inf) loss_scale 256.0000 (309.1813) mem 16699MB [2024-08-10 16:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][500/625] eta 0:00:56 lr 0.000501 wd 0.0500 time 0.4450 (0.4493) data time 0.0006 (0.0017) model time 0.4444 (0.4473) loss 3.0374 (2.8629) grad_norm 1.8641 (inf) loss_scale 256.0000 (308.1198) mem 16699MB [2024-08-10 16:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][510/625] eta 0:00:51 lr 0.000501 wd 0.0500 time 0.4453 (0.4492) data time 0.0006 (0.0017) model time 0.4447 (0.4473) loss 3.6542 (2.8699) grad_norm 2.1725 (inf) loss_scale 256.0000 (307.0998) mem 16699MB [2024-08-10 16:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][520/625] eta 0:00:47 lr 0.000501 wd 0.0500 time 0.4483 (0.4491) data time 0.0006 (0.0017) model time 0.4477 (0.4472) loss 2.9229 (2.8670) grad_norm 1.5071 (inf) loss_scale 256.0000 (306.1190) mem 16699MB [2024-08-10 16:06:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][530/625] eta 0:00:42 lr 0.000501 wd 0.0500 time 0.6262 (0.4493) data time 0.0007 (0.0017) model time 0.6255 (0.4474) loss 2.8564 (2.8650) grad_norm 2.1831 (inf) loss_scale 256.0000 (305.1751) mem 16699MB [2024-08-10 16:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][540/625] eta 0:00:38 lr 
0.000501 wd 0.0500 time 0.4428 (0.4492) data time 0.0006 (0.0017) model time 0.4421 (0.4473) loss 3.2339 (2.8643) grad_norm 1.7080 (inf) loss_scale 256.0000 (304.2662) mem 16699MB [2024-08-10 16:06:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][550/625] eta 0:00:33 lr 0.000501 wd 0.0500 time 0.4518 (0.4491) data time 0.0008 (0.0017) model time 0.4510 (0.4472) loss 1.8122 (2.8635) grad_norm 1.6413 (inf) loss_scale 256.0000 (303.3902) mem 16699MB [2024-08-10 16:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][560/625] eta 0:00:29 lr 0.000501 wd 0.0500 time 0.4444 (0.4497) data time 0.0007 (0.0016) model time 0.4437 (0.4479) loss 3.0888 (2.8673) grad_norm 1.9066 (inf) loss_scale 256.0000 (302.5455) mem 16699MB [2024-08-10 16:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][570/625] eta 0:00:24 lr 0.000501 wd 0.0500 time 0.4448 (0.4497) data time 0.0006 (0.0016) model time 0.4442 (0.4479) loss 2.7864 (2.8701) grad_norm 1.8983 (inf) loss_scale 256.0000 (301.7303) mem 16699MB [2024-08-10 16:06:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][580/625] eta 0:00:20 lr 0.000500 wd 0.0500 time 0.4413 (0.4496) data time 0.0009 (0.0016) model time 0.4405 (0.4479) loss 2.4975 (2.8709) grad_norm 1.6698 (inf) loss_scale 256.0000 (300.9432) mem 16699MB [2024-08-10 16:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][590/625] eta 0:00:15 lr 0.000500 wd 0.0500 time 0.4409 (0.4495) data time 0.0009 (0.0016) model time 0.4400 (0.4478) loss 2.9202 (2.8701) grad_norm 1.3549 (inf) loss_scale 256.0000 (300.1827) mem 16699MB [2024-08-10 16:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][600/625] eta 0:00:11 lr 0.000500 wd 0.0500 time 0.4569 (0.4496) data time 0.0006 (0.0016) model time 0.4563 (0.4479) loss 2.9713 (2.8700) grad_norm 3.5977 (inf) loss_scale 256.0000 (299.4476) mem 16699MB [2024-08-10 16:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [175/300][610/625] eta 0:00:06 lr 0.000500 wd 0.0500 time 0.4374 (0.4495) data time 0.0006 (0.0016) model time 0.4367 (0.4478) loss 2.7803 (2.8709) grad_norm 1.8820 (inf) loss_scale 256.0000 (298.7365) mem 16699MB [2024-08-10 16:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [175/300][620/625] eta 0:00:02 lr 0.000500 wd 0.0500 time 0.4395 (0.4495) data time 0.0004 (0.0016) model time 0.4391 (0.4477) loss 2.6691 (2.8725) grad_norm 1.5551 (inf) loss_scale 256.0000 (298.0483) mem 16699MB [2024-08-10 16:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 175 training takes 0:04:40 [2024-08-10 16:07:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:07:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 16:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.5137 (0.5137) Acc@1 88.672 (88.672) Acc@5 98.633 (98.633) Mem 16699MB [2024-08-10 16:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.154) Loss 0.8716 (0.6472) Acc@1 79.150 (85.631) Acc@5 95.752 (97.563) Mem 16699MB [2024-08-10 16:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.136) Loss 0.9399 (0.7570) Acc@1 78.125 (82.968) Acc@5 94.873 (96.401) Mem 16699MB [2024-08-10 16:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.678 Acc@5 96.371 [2024-08-10 16:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 16:07:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.877 (0.877) Loss 0.4688 (0.4688) Acc@1 89.404 (89.404) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 16:07:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.190) Loss 0.7612 
(0.5892) Acc@1 81.104 (87.100) Acc@5 96.289 (97.900) Mem 16699MB [2024-08-10 16:07:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.155) Loss 0.8452 (0.6910) Acc@1 78.662 (84.328) Acc@5 95.898 (96.940) Mem 16699MB [2024-08-10 16:07:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.035 Acc@5 96.933 [2024-08-10 16:07:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-10 16:07:09 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.04% [2024-08-10 16:07:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:07:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 16:07:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][0/625] eta 0:08:01 lr 0.000500 wd 0.0500 time 0.7699 (0.7699) data time 0.3803 (0.3803) model time 0.0000 (0.0000) loss 2.9140 (2.9140) grad_norm 1.9481 (1.9481) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][10/625] eta 0:04:50 lr 0.000500 wd 0.0500 time 0.4439 (0.4727) data time 0.0008 (0.0355) model time 0.0000 (0.0000) loss 3.0453 (2.9602) grad_norm 2.7140 (1.9643) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][20/625] eta 0:04:37 lr 0.000500 wd 0.0500 time 0.4415 (0.4585) data time 0.0007 (0.0190) model time 0.0000 (0.0000) loss 3.4622 (2.9129) grad_norm 1.5427 (2.0436) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][30/625] eta 0:04:32 lr 0.000500 wd 0.0500 time 0.4440 (0.4584) data time 0.0007 (0.0132) model time 0.0000 (0.0000) loss 1.7699 (2.8474) grad_norm 
1.7771 (1.9280) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][40/625] eta 0:04:25 lr 0.000500 wd 0.0500 time 0.4408 (0.4547) data time 0.0009 (0.0102) model time 0.0000 (0.0000) loss 2.6403 (2.8929) grad_norm 1.6169 (1.9979) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][50/625] eta 0:04:19 lr 0.000499 wd 0.0500 time 0.4445 (0.4522) data time 0.0009 (0.0084) model time 0.0000 (0.0000) loss 3.3484 (2.9368) grad_norm 2.1745 (1.9921) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][60/625] eta 0:04:14 lr 0.000499 wd 0.0500 time 0.4414 (0.4504) data time 0.0008 (0.0072) model time 0.4405 (0.4403) loss 3.0630 (2.9328) grad_norm 2.8214 (2.0185) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][70/625] eta 0:04:09 lr 0.000499 wd 0.0500 time 0.4406 (0.4492) data time 0.0006 (0.0063) model time 0.4400 (0.4408) loss 2.3953 (2.9335) grad_norm 1.9605 (2.0031) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][80/625] eta 0:04:04 lr 0.000499 wd 0.0500 time 0.4407 (0.4486) data time 0.0007 (0.0057) model time 0.4400 (0.4416) loss 1.7497 (2.9100) grad_norm 1.8646 (2.0435) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][90/625] eta 0:04:00 lr 0.000499 wd 0.0500 time 0.4409 (0.4490) data time 0.0007 (0.0052) model time 0.4403 (0.4440) loss 2.0307 (2.9085) grad_norm 2.6631 (2.1458) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:07:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][100/625] eta 0:03:55 lr 0.000499 wd 0.0500 time 0.4440 (0.4486) data time 0.0007 
(0.0047) model time 0.4434 (0.4439) loss 2.6911 (2.8977) grad_norm 1.6458 (2.1294) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][110/625] eta 0:03:51 lr 0.000499 wd 0.0500 time 0.4420 (0.4495) data time 0.0009 (0.0044) model time 0.4412 (0.4462) loss 2.8932 (2.8933) grad_norm 1.2047 (2.0854) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][120/625] eta 0:03:48 lr 0.000499 wd 0.0500 time 0.4408 (0.4518) data time 0.0009 (0.0041) model time 0.4400 (0.4505) loss 2.8774 (2.9161) grad_norm 2.8492 (2.0625) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][130/625] eta 0:03:44 lr 0.000499 wd 0.0500 time 0.4419 (0.4527) data time 0.0006 (0.0039) model time 0.4412 (0.4521) loss 3.2955 (2.9089) grad_norm 1.6691 (2.0378) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][140/625] eta 0:03:39 lr 0.000498 wd 0.0500 time 0.4430 (0.4531) data time 0.0009 (0.0036) model time 0.4421 (0.4526) loss 3.4142 (2.9123) grad_norm 1.8097 (2.0056) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][150/625] eta 0:03:34 lr 0.000498 wd 0.0500 time 0.4434 (0.4524) data time 0.0009 (0.0035) model time 0.4426 (0.4516) loss 3.0092 (2.8992) grad_norm 1.3957 (1.9785) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][160/625] eta 0:03:30 lr 0.000498 wd 0.0500 time 0.4434 (0.4519) data time 0.0009 (0.0033) model time 0.4425 (0.4508) loss 2.7793 (2.8970) grad_norm 2.7417 (1.9878) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][170/625] 
eta 0:03:25 lr 0.000498 wd 0.0500 time 0.4480 (0.4524) data time 0.0006 (0.0032) model time 0.4474 (0.4515) loss 2.0834 (2.8804) grad_norm 2.0392 (1.9833) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][180/625] eta 0:03:21 lr 0.000498 wd 0.0500 time 0.4412 (0.4519) data time 0.0006 (0.0031) model time 0.4406 (0.4508) loss 2.2102 (2.8852) grad_norm 1.5158 (1.9716) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][190/625] eta 0:03:16 lr 0.000498 wd 0.0500 time 0.4434 (0.4514) data time 0.0007 (0.0029) model time 0.4427 (0.4502) loss 2.6525 (2.8861) grad_norm 2.5292 (2.0446) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][200/625] eta 0:03:11 lr 0.000498 wd 0.0500 time 0.4451 (0.4511) data time 0.0006 (0.0028) model time 0.4445 (0.4497) loss 2.9742 (2.8700) grad_norm 1.4770 (2.0347) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][210/625] eta 0:03:07 lr 0.000498 wd 0.0500 time 0.4471 (0.4507) data time 0.0008 (0.0028) model time 0.4463 (0.4493) loss 3.0366 (2.8757) grad_norm 2.3596 (2.0364) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][220/625] eta 0:03:02 lr 0.000498 wd 0.0500 time 0.4421 (0.4507) data time 0.0006 (0.0027) model time 0.4415 (0.4493) loss 2.2805 (2.8700) grad_norm 2.3061 (2.0464) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][230/625] eta 0:02:57 lr 0.000498 wd 0.0500 time 0.4445 (0.4505) data time 0.0006 (0.0026) model time 0.4438 (0.4490) loss 3.3118 (2.8751) grad_norm 1.6476 (2.0440) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:08:59 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][240/625] eta 0:02:53 lr 0.000497 wd 0.0500 time 0.4447 (0.4503) data time 0.0007 (0.0025) model time 0.4440 (0.4488) loss 3.4491 (2.8794) grad_norm 1.7894 (2.0334) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][250/625] eta 0:02:48 lr 0.000497 wd 0.0500 time 0.4439 (0.4500) data time 0.0006 (0.0025) model time 0.4433 (0.4486) loss 3.3522 (2.8793) grad_norm 1.5062 (2.0120) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][260/625] eta 0:02:44 lr 0.000497 wd 0.0500 time 0.4444 (0.4498) data time 0.0009 (0.0024) model time 0.4435 (0.4483) loss 3.1801 (2.8837) grad_norm 1.5807 (2.0006) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][270/625] eta 0:02:39 lr 0.000497 wd 0.0500 time 0.4423 (0.4496) data time 0.0007 (0.0023) model time 0.4416 (0.4480) loss 3.0797 (2.8798) grad_norm 1.5374 (1.9917) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][280/625] eta 0:02:35 lr 0.000497 wd 0.0500 time 0.4440 (0.4494) data time 0.0010 (0.0023) model time 0.4430 (0.4478) loss 3.0048 (2.8710) grad_norm 1.3732 (1.9790) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][290/625] eta 0:02:30 lr 0.000497 wd 0.0500 time 0.4425 (0.4492) data time 0.0009 (0.0022) model time 0.4416 (0.4476) loss 2.6788 (2.8620) grad_norm 2.1986 (2.2056) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][300/625] eta 0:02:25 lr 0.000497 wd 0.0500 time 0.4435 (0.4490) data time 0.0007 (0.0022) model time 0.4428 (0.4474) loss 3.1678 (2.8533) grad_norm 2.0390 
(2.1923) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][310/625] eta 0:02:21 lr 0.000497 wd 0.0500 time 0.4441 (0.4493) data time 0.0007 (0.0022) model time 0.4434 (0.4478) loss 2.1244 (2.8489) grad_norm 1.6246 (2.1771) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][320/625] eta 0:02:16 lr 0.000497 wd 0.0500 time 0.4408 (0.4491) data time 0.0010 (0.0021) model time 0.4398 (0.4476) loss 2.1132 (2.8498) grad_norm 1.4502 (2.1611) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][330/625] eta 0:02:12 lr 0.000496 wd 0.0500 time 0.4419 (0.4489) data time 0.0010 (0.0021) model time 0.4410 (0.4474) loss 3.1140 (2.8519) grad_norm 2.4325 (2.1784) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][340/625] eta 0:02:07 lr 0.000496 wd 0.0500 time 0.4520 (0.4488) data time 0.0009 (0.0021) model time 0.4512 (0.4472) loss 2.8425 (2.8544) grad_norm 3.1248 (2.1788) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][350/625] eta 0:02:03 lr 0.000496 wd 0.0500 time 0.4433 (0.4496) data time 0.0009 (0.0020) model time 0.4424 (0.4482) loss 3.0748 (2.8591) grad_norm 2.1417 (2.1717) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][360/625] eta 0:01:59 lr 0.000496 wd 0.0500 time 0.4428 (0.4494) data time 0.0008 (0.0020) model time 0.4420 (0.4480) loss 2.1523 (2.8586) grad_norm 1.6819 (2.1677) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:09:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][370/625] eta 0:01:54 lr 0.000496 wd 0.0500 time 0.4474 (0.4493) data time 0.0009 
(0.0020) model time 0.4466 (0.4478) loss 2.7676 (2.8604) grad_norm 1.5055 (2.1599) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][380/625] eta 0:01:50 lr 0.000496 wd 0.0500 time 0.4462 (0.4491) data time 0.0006 (0.0019) model time 0.4455 (0.4477) loss 2.7111 (2.8527) grad_norm 3.8007 (2.1834) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][390/625] eta 0:01:45 lr 0.000496 wd 0.0500 time 0.4480 (0.4492) data time 0.0008 (0.0019) model time 0.4471 (0.4478) loss 3.0715 (2.8593) grad_norm 1.5308 (2.1795) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][400/625] eta 0:01:41 lr 0.000496 wd 0.0500 time 0.4424 (0.4491) data time 0.0007 (0.0019) model time 0.4417 (0.4477) loss 3.1817 (2.8600) grad_norm 2.5846 (2.2052) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][410/625] eta 0:01:36 lr 0.000496 wd 0.0500 time 0.4428 (0.4489) data time 0.0009 (0.0019) model time 0.4420 (0.4475) loss 2.4018 (2.8661) grad_norm 3.8549 (2.2029) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][420/625] eta 0:01:31 lr 0.000496 wd 0.0500 time 0.4400 (0.4487) data time 0.0009 (0.0018) model time 0.4390 (0.4473) loss 3.3302 (2.8677) grad_norm 1.7203 (2.1939) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][430/625] eta 0:01:27 lr 0.000495 wd 0.0500 time 0.4388 (0.4486) data time 0.0009 (0.0018) model time 0.4379 (0.4472) loss 2.1605 (2.8627) grad_norm 1.4906 (2.1931) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][440/625] 
eta 0:01:23 lr 0.000495 wd 0.0500 time 0.4417 (0.4492) data time 0.0009 (0.0018) model time 0.4409 (0.4478) loss 2.4242 (2.8660) grad_norm 1.5039 (2.1867) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][450/625] eta 0:01:18 lr 0.000495 wd 0.0500 time 0.6368 (0.4496) data time 0.0009 (0.0018) model time 0.6359 (0.4483) loss 3.0499 (2.8668) grad_norm 4.0653 (2.2173) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][460/625] eta 0:01:14 lr 0.000495 wd 0.0500 time 0.4449 (0.4498) data time 0.0008 (0.0018) model time 0.4441 (0.4486) loss 3.0900 (2.8651) grad_norm 1.5204 (2.2123) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][470/625] eta 0:01:09 lr 0.000495 wd 0.0500 time 0.6275 (0.4501) data time 0.0009 (0.0018) model time 0.6266 (0.4489) loss 3.2159 (2.8613) grad_norm 1.6633 (2.2175) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][480/625] eta 0:01:05 lr 0.000495 wd 0.0500 time 0.4505 (0.4499) data time 0.0006 (0.0017) model time 0.4498 (0.4487) loss 1.9963 (2.8566) grad_norm 3.4105 (2.2259) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][490/625] eta 0:01:00 lr 0.000495 wd 0.0500 time 0.4395 (0.4498) data time 0.0008 (0.0017) model time 0.4387 (0.4486) loss 2.9084 (2.8531) grad_norm 1.7774 (2.2216) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:10:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][500/625] eta 0:00:56 lr 0.000495 wd 0.0500 time 0.4441 (0.4504) data time 0.0006 (0.0017) model time 0.4435 (0.4493) loss 3.2223 (2.8572) grad_norm 1.2222 (2.2107) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:01 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][510/625] eta 0:00:51 lr 0.000495 wd 0.0500 time 0.4427 (0.4503) data time 0.0008 (0.0017) model time 0.4419 (0.4491) loss 2.0880 (2.8525) grad_norm 1.4249 (2.2169) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][520/625] eta 0:00:47 lr 0.000494 wd 0.0500 time 0.4405 (0.4501) data time 0.0006 (0.0017) model time 0.4398 (0.4490) loss 2.8040 (2.8575) grad_norm 1.2122 (2.2083) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][530/625] eta 0:00:42 lr 0.000494 wd 0.0500 time 0.6140 (0.4503) data time 0.0008 (0.0017) model time 0.6132 (0.4492) loss 3.3197 (2.8559) grad_norm 1.6514 (2.2019) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][540/625] eta 0:00:38 lr 0.000494 wd 0.0500 time 0.4407 (0.4502) data time 0.0006 (0.0016) model time 0.4400 (0.4490) loss 2.7990 (2.8541) grad_norm 2.5246 (2.2035) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][550/625] eta 0:00:33 lr 0.000494 wd 0.0500 time 0.4405 (0.4500) data time 0.0009 (0.0016) model time 0.4396 (0.4489) loss 2.6344 (2.8552) grad_norm 1.6723 (2.1970) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][560/625] eta 0:00:29 lr 0.000494 wd 0.0500 time 0.4447 (0.4499) data time 0.0007 (0.0016) model time 0.4440 (0.4487) loss 1.8013 (2.8535) grad_norm 1.6995 (2.1938) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][570/625] eta 0:00:24 lr 0.000494 wd 0.0500 time 0.4436 (0.4498) data time 0.0008 (0.0016) model time 0.4428 (0.4486) loss 3.1157 (2.8537) grad_norm 1.8341 
(2.1864) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][580/625] eta 0:00:20 lr 0.000494 wd 0.0500 time 0.4416 (0.4497) data time 0.0006 (0.0016) model time 0.4410 (0.4485) loss 2.3592 (2.8524) grad_norm 1.3350 (2.1823) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][590/625] eta 0:00:15 lr 0.000494 wd 0.0500 time 0.4425 (0.4498) data time 0.0009 (0.0016) model time 0.4417 (0.4487) loss 2.4063 (2.8522) grad_norm 1.3474 (2.1734) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][600/625] eta 0:00:11 lr 0.000494 wd 0.0500 time 0.4412 (0.4497) data time 0.0008 (0.0016) model time 0.4404 (0.4485) loss 3.2148 (2.8523) grad_norm 1.3861 (2.1617) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][610/625] eta 0:00:06 lr 0.000494 wd 0.0500 time 0.4372 (0.4496) data time 0.0005 (0.0016) model time 0.4367 (0.4484) loss 2.6333 (2.8535) grad_norm 1.5208 (2.1567) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [176/300][620/625] eta 0:00:02 lr 0.000493 wd 0.0500 time 0.4360 (0.4494) data time 0.0006 (0.0016) model time 0.4353 (0.4482) loss 2.8025 (2.8573) grad_norm 2.0110 (2.1606) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 176 training takes 0:04:40 [2024-08-10 16:11:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:11:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 0.5249 (0.5249) Acc@1 88.574 (88.574) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-10 16:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.153) Loss 0.8374 (0.6413) Acc@1 80.713 (86.088) Acc@5 95.605 (97.692) Mem 16699MB [2024-08-10 16:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9434 (0.7580) Acc@1 77.783 (83.124) Acc@5 94.678 (96.440) Mem 16699MB [2024-08-10 16:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.762 Acc@5 96.449 [2024-08-10 16:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-10 16:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.4688 (0.4688) Acc@1 89.307 (89.307) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-10 16:11:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.190) Loss 0.7612 (0.5888) Acc@1 81.250 (87.154) Acc@5 96.240 (97.923) Mem 16699MB [2024-08-10 16:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.155) Loss 0.8452 (0.6905) Acc@1 78.955 (84.408) Acc@5 95.850 (96.938) Mem 16699MB [2024-08-10 16:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.099 Acc@5 96.939 [2024-08-10 16:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-10 16:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.10% [2024-08-10 16:12:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:12:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 16:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][0/625] eta 0:07:49 lr 0.000493 wd 0.0500 time 0.7505 (0.7505) data time 0.3606 (0.3606) model time 0.0000 (0.0000) loss 1.8812 (1.8812) grad_norm 1.9259 (1.9259) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][10/625] eta 0:04:57 lr 0.000493 wd 0.0500 time 0.4393 (0.4845) data time 0.0008 (0.0336) model time 0.0000 (0.0000) loss 2.0230 (2.5261) grad_norm 19.4252 (3.5153) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][20/625] eta 0:04:40 lr 0.000493 wd 0.0500 time 0.4425 (0.4644) data time 0.0006 (0.0180) model time 0.0000 (0.0000) loss 1.9166 (2.6754) grad_norm 1.5409 (2.7718) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][30/625] eta 0:04:35 lr 0.000493 wd 0.0500 time 0.4429 (0.4638) data time 0.0008 (0.0125) model time 0.0000 (0.0000) loss 3.2144 (2.6614) grad_norm 1.7728 (2.4346) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][40/625] eta 0:04:30 lr 0.000493 wd 0.0500 time 0.4449 (0.4624) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 3.2050 (2.6403) grad_norm 1.6664 (2.4176) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][50/625] eta 0:04:23 lr 0.000493 wd 0.0500 time 0.4383 (0.4583) data time 0.0008 (0.0080) model time 0.0000 (0.0000) loss 3.0563 (2.7097) grad_norm 1.7895 (2.5809) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][60/625] eta 0:04:17 lr 0.000493 wd 0.0500 time 0.4391 (0.4554) data time 0.0006 (0.0068) model time 0.4385 (0.4400) loss 2.9020 (2.6920) 
grad_norm 1.2429 (2.4702) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][70/625] eta 0:04:13 lr 0.000493 wd 0.0500 time 0.4422 (0.4560) data time 0.0008 (0.0060) model time 0.4414 (0.4494) loss 2.2562 (2.7102) grad_norm 2.5389 (2.3749) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][80/625] eta 0:04:07 lr 0.000493 wd 0.0500 time 0.4431 (0.4542) data time 0.0009 (0.0053) model time 0.4421 (0.4466) loss 2.7090 (2.7373) grad_norm 2.2608 (2.3047) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][90/625] eta 0:04:03 lr 0.000492 wd 0.0500 time 0.4427 (0.4552) data time 0.0008 (0.0048) model time 0.4419 (0.4506) loss 1.6239 (2.7368) grad_norm 1.4080 (2.2559) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][100/625] eta 0:03:58 lr 0.000492 wd 0.0500 time 0.4401 (0.4540) data time 0.0009 (0.0045) model time 0.4392 (0.4487) loss 2.7205 (2.7397) grad_norm 2.2306 (2.2102) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][110/625] eta 0:03:54 lr 0.000492 wd 0.0500 time 0.4398 (0.4545) data time 0.0011 (0.0042) model time 0.4387 (0.4503) loss 2.7691 (2.7447) grad_norm 1.3571 (2.1522) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][120/625] eta 0:03:49 lr 0.000492 wd 0.0500 time 0.4424 (0.4535) data time 0.0007 (0.0039) model time 0.4417 (0.4490) loss 3.0657 (2.7556) grad_norm 1.6422 (2.0923) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][130/625] eta 0:03:44 lr 0.000492 wd 0.0500 time 0.4422 (0.4538) data 
time 0.0008 (0.0037) model time 0.4414 (0.4500) loss 2.7685 (2.7577) grad_norm 1.1430 (2.0456) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][140/625] eta 0:03:39 lr 0.000492 wd 0.0500 time 0.4405 (0.4529) data time 0.0011 (0.0035) model time 0.4394 (0.4490) loss 2.7201 (2.7563) grad_norm 1.5955 (2.0571) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][150/625] eta 0:03:34 lr 0.000492 wd 0.0500 time 0.4422 (0.4522) data time 0.0007 (0.0033) model time 0.4415 (0.4481) loss 3.1079 (2.7545) grad_norm 1.8989 (2.0497) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][160/625] eta 0:03:29 lr 0.000492 wd 0.0500 time 0.4398 (0.4515) data time 0.0008 (0.0032) model time 0.4390 (0.4474) loss 2.6922 (2.7659) grad_norm 3.2965 (2.0668) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][170/625] eta 0:03:25 lr 0.000492 wd 0.0500 time 0.4443 (0.4509) data time 0.0009 (0.0031) model time 0.4434 (0.4469) loss 3.2893 (2.7704) grad_norm 1.8102 (2.0564) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][180/625] eta 0:03:20 lr 0.000492 wd 0.0500 time 0.4417 (0.4506) data time 0.0009 (0.0029) model time 0.4408 (0.4466) loss 2.6478 (2.7708) grad_norm 1.4985 (2.0416) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][190/625] eta 0:03:15 lr 0.000491 wd 0.0500 time 0.4429 (0.4501) data time 0.0009 (0.0028) model time 0.4420 (0.4462) loss 2.6326 (2.7752) grad_norm 2.1603 (2.0252) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[177/300][200/625] eta 0:03:11 lr 0.000491 wd 0.0500 time 0.4454 (0.4506) data time 0.0007 (0.0028) model time 0.4447 (0.4471) loss 3.6277 (2.7859) grad_norm 3.1647 (2.0107) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][210/625] eta 0:03:06 lr 0.000491 wd 0.0500 time 0.4400 (0.4502) data time 0.0010 (0.0027) model time 0.4391 (0.4467) loss 2.7560 (2.7924) grad_norm 3.2072 (2.0134) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][220/625] eta 0:03:02 lr 0.000491 wd 0.0500 time 0.4426 (0.4505) data time 0.0006 (0.0026) model time 0.4420 (0.4472) loss 3.4052 (2.8006) grad_norm 1.9407 (2.0044) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][230/625] eta 0:02:57 lr 0.000491 wd 0.0500 time 0.4419 (0.4502) data time 0.0009 (0.0026) model time 0.4409 (0.4469) loss 3.1335 (2.7926) grad_norm 1.6938 (2.0025) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][240/625] eta 0:02:53 lr 0.000491 wd 0.0500 time 0.4389 (0.4498) data time 0.0007 (0.0025) model time 0.4383 (0.4465) loss 3.0775 (2.7971) grad_norm 1.4092 (1.9989) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][250/625] eta 0:02:48 lr 0.000491 wd 0.0500 time 0.4437 (0.4496) data time 0.0006 (0.0024) model time 0.4431 (0.4463) loss 2.9071 (2.8072) grad_norm 1.8732 (2.0174) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:13:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][260/625] eta 0:02:44 lr 0.000491 wd 0.0500 time 0.4430 (0.4496) data time 0.0009 (0.0024) model time 0.4421 (0.4465) loss 3.3612 (2.8133) grad_norm 2.4919 (2.0085) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 16:14:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][270/625] eta 0:02:39 lr 0.000491 wd 0.0500 time 0.4421 (0.4494) data time 0.0006 (0.0023) model time 0.4414 (0.4463) loss 3.2689 (2.8063) grad_norm 1.2377 (1.9994) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][280/625] eta 0:02:34 lr 0.000490 wd 0.0500 time 0.4461 (0.4492) data time 0.0007 (0.0023) model time 0.4454 (0.4462) loss 2.1487 (2.8024) grad_norm 1.8964 (1.9976) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][290/625] eta 0:02:30 lr 0.000490 wd 0.0500 time 0.4395 (0.4490) data time 0.0010 (0.0022) model time 0.4385 (0.4461) loss 3.2559 (2.8113) grad_norm 1.4422 (1.9937) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][300/625] eta 0:02:25 lr 0.000490 wd 0.0500 time 0.4441 (0.4488) data time 0.0006 (0.0022) model time 0.4435 (0.4459) loss 2.1865 (2.8159) grad_norm 2.1542 (1.9902) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][310/625] eta 0:02:21 lr 0.000490 wd 0.0500 time 0.4402 (0.4487) data time 0.0010 (0.0021) model time 0.4392 (0.4458) loss 2.9958 (2.8226) grad_norm 1.6528 (1.9847) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][320/625] eta 0:02:16 lr 0.000490 wd 0.0500 time 0.4426 (0.4485) data time 0.0006 (0.0021) model time 0.4420 (0.4457) loss 3.5815 (2.8226) grad_norm 1.6875 (1.9706) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][330/625] eta 0:02:12 lr 0.000490 wd 0.0500 time 0.4434 (0.4484) data time 0.0006 (0.0021) model time 0.4428 (0.4456) loss 1.6136 
(2.8229) grad_norm 1.8875 (1.9749) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][340/625] eta 0:02:07 lr 0.000490 wd 0.0500 time 0.4453 (0.4483) data time 0.0008 (0.0020) model time 0.4445 (0.4455) loss 3.0945 (2.8266) grad_norm 4.7339 (1.9815) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][350/625] eta 0:02:03 lr 0.000490 wd 0.0500 time 0.4415 (0.4481) data time 0.0007 (0.0020) model time 0.4408 (0.4454) loss 3.3975 (2.8308) grad_norm 1.5679 (1.9859) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][360/625] eta 0:01:58 lr 0.000490 wd 0.0500 time 0.4424 (0.4480) data time 0.0006 (0.0020) model time 0.4418 (0.4453) loss 2.0411 (2.8250) grad_norm 1.7936 (1.9820) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][370/625] eta 0:01:54 lr 0.000490 wd 0.0500 time 0.4451 (0.4479) data time 0.0007 (0.0019) model time 0.4444 (0.4453) loss 3.4350 (2.8225) grad_norm 1.6526 (1.9798) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][380/625] eta 0:01:49 lr 0.000489 wd 0.0500 time 0.4403 (0.4482) data time 0.0017 (0.0019) model time 0.4386 (0.4456) loss 3.3303 (2.8180) grad_norm 2.3914 (1.9672) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:14:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][390/625] eta 0:01:45 lr 0.000489 wd 0.0500 time 0.4439 (0.4481) data time 0.0006 (0.0019) model time 0.4433 (0.4455) loss 2.4369 (2.8130) grad_norm 2.5068 (1.9653) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][400/625] eta 0:01:40 lr 0.000489 wd 0.0500 time 0.4415 
(0.4480) data time 0.0009 (0.0019) model time 0.4406 (0.4455) loss 3.4571 (2.8093) grad_norm 1.2813 (1.9593) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][410/625] eta 0:01:36 lr 0.000489 wd 0.0500 time 0.4423 (0.4479) data time 0.0009 (0.0018) model time 0.4415 (0.4454) loss 2.8872 (2.8065) grad_norm 1.4544 (1.9540) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][420/625] eta 0:01:31 lr 0.000489 wd 0.0500 time 0.4416 (0.4479) data time 0.0008 (0.0018) model time 0.4408 (0.4455) loss 2.4832 (2.8090) grad_norm 1.9350 (1.9511) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][430/625] eta 0:01:27 lr 0.000489 wd 0.0500 time 0.4438 (0.4483) data time 0.0006 (0.0018) model time 0.4431 (0.4460) loss 2.6642 (2.8128) grad_norm 1.3823 (1.9409) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][440/625] eta 0:01:23 lr 0.000489 wd 0.0500 time 0.4437 (0.4490) data time 0.0009 (0.0018) model time 0.4428 (0.4468) loss 3.1447 (2.8165) grad_norm 2.0880 (1.9397) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][450/625] eta 0:01:18 lr 0.000489 wd 0.0500 time 0.4434 (0.4488) data time 0.0008 (0.0018) model time 0.4425 (0.4467) loss 2.5344 (2.8179) grad_norm 1.3133 (1.9345) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][460/625] eta 0:01:14 lr 0.000489 wd 0.0500 time 0.4431 (0.4487) data time 0.0008 (0.0017) model time 0.4423 (0.4465) loss 2.6117 (2.8199) grad_norm 4.1308 (1.9370) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [177/300][470/625] eta 0:01:09 lr 0.000488 wd 0.0500 time 0.4439 (0.4489) data time 0.0009 (0.0017) model time 0.4430 (0.4468) loss 2.7594 (2.8184) grad_norm 2.1550 (1.9518) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][480/625] eta 0:01:05 lr 0.000488 wd 0.0500 time 0.4433 (0.4491) data time 0.0009 (0.0017) model time 0.4424 (0.4470) loss 2.2500 (2.8111) grad_norm 1.5566 (1.9512) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][490/625] eta 0:01:00 lr 0.000488 wd 0.0500 time 0.4409 (0.4490) data time 0.0008 (0.0017) model time 0.4401 (0.4469) loss 1.9602 (2.8102) grad_norm 1.7828 (1.9937) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][500/625] eta 0:00:56 lr 0.000488 wd 0.0500 time 0.4422 (0.4488) data time 0.0007 (0.0017) model time 0.4415 (0.4468) loss 2.2777 (2.8104) grad_norm 1.4072 (1.9910) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][510/625] eta 0:00:51 lr 0.000488 wd 0.0500 time 0.4441 (0.4487) data time 0.0009 (0.0017) model time 0.4432 (0.4467) loss 2.8435 (2.8108) grad_norm 1.3673 (1.9967) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:15:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][520/625] eta 0:00:47 lr 0.000488 wd 0.0500 time 0.4432 (0.4486) data time 0.0007 (0.0016) model time 0.4426 (0.4466) loss 3.2523 (2.8110) grad_norm 1.5231 (1.9969) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][530/625] eta 0:00:42 lr 0.000488 wd 0.0500 time 0.4401 (0.4485) data time 0.0007 (0.0016) model time 0.4395 (0.4465) loss 3.6251 (2.8140) grad_norm 3.6647 (1.9969) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 16:16:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][540/625] eta 0:00:38 lr 0.000488 wd 0.0500 time 0.4470 (0.4484) data time 0.0009 (0.0016) model time 0.4462 (0.4464) loss 2.9647 (2.8175) grad_norm 1.2484 (1.9878) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][550/625] eta 0:00:33 lr 0.000488 wd 0.0500 time 0.4359 (0.4482) data time 0.0006 (0.0016) model time 0.4352 (0.4463) loss 3.4502 (2.8191) grad_norm 9.3580 (2.0086) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][560/625] eta 0:00:29 lr 0.000488 wd 0.0500 time 0.5750 (0.4484) data time 0.0006 (0.0016) model time 0.5743 (0.4465) loss 3.5383 (2.8236) grad_norm 1.7583 (2.0077) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][570/625] eta 0:00:24 lr 0.000487 wd 0.0500 time 0.4404 (0.4486) data time 0.0007 (0.0016) model time 0.4397 (0.4467) loss 3.4732 (2.8236) grad_norm 1.2734 (2.0183) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][580/625] eta 0:00:20 lr 0.000487 wd 0.0500 time 0.4392 (0.4485) data time 0.0008 (0.0016) model time 0.4384 (0.4466) loss 3.2746 (2.8269) grad_norm 1.5596 (2.0211) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][590/625] eta 0:00:15 lr 0.000487 wd 0.0500 time 0.4424 (0.4486) data time 0.0009 (0.0016) model time 0.4414 (0.4468) loss 3.1481 (2.8290) grad_norm 1.4863 (2.0178) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][600/625] eta 0:00:11 lr 0.000487 wd 0.0500 time 0.4433 (0.4486) data time 0.0009 (0.0015) model time 0.4423 (0.4467) loss 2.7824 
(2.8308) grad_norm 2.1638 (2.0147) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][610/625] eta 0:00:06 lr 0.000487 wd 0.0500 time 0.4392 (0.4485) data time 0.0006 (0.0015) model time 0.4386 (0.4466) loss 2.9932 (2.8328) grad_norm 1.3586 (2.0066) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [177/300][620/625] eta 0:00:02 lr 0.000487 wd 0.0500 time 0.4391 (0.4488) data time 0.0007 (0.0015) model time 0.4384 (0.4470) loss 2.4081 (2.8327) grad_norm 1.9383 (2.0024) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:42 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 177 training takes 0:04:40 [2024-08-10 16:16:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:16:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.480 (0.480) Loss 0.5322 (0.5322) Acc@1 87.256 (87.256) Acc@5 98.535 (98.535) Mem 16699MB [2024-08-10 16:16:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8740 (0.6329) Acc@1 79.004 (85.977) Acc@5 94.678 (97.510) Mem 16699MB [2024-08-10 16:16:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9287 (0.7461) Acc@1 77.197 (82.915) Acc@5 94.775 (96.403) Mem 16699MB [2024-08-10 16:16:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.650 Acc@5 96.407 [2024-08-10 16:16:47 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-10 16:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.796 (0.796) Loss 0.4690 (0.4690) Acc@1 89.453 (89.453) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 16:16:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.185) Loss 0.7612 (0.5879) Acc@1 81.152 (87.136) Acc@5 96.289 (97.949) Mem 16699MB [2024-08-10 16:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.152) Loss 0.8438 (0.6896) Acc@1 79.053 (84.398) Acc@5 95.850 (96.966) Mem 16699MB [2024-08-10 16:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.079 Acc@5 96.967 [2024-08-10 16:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-10 16:16:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][0/625] eta 0:12:31 lr 0.000487 wd 0.0500 time 1.2019 (1.2019) data time 0.7293 (0.7293) model time 0.0000 (0.0000) loss 3.1794 (3.1794) grad_norm 2.1345 (2.1345) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:16:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][10/625] eta 0:05:22 lr 0.000487 wd 0.0500 time 0.4375 (0.5248) data 
time 0.0007 (0.0672) model time 0.0000 (0.0000) loss 1.5990 (2.5989) grad_norm 1.3215 (1.7922) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][20/625] eta 0:04:53 lr 0.000487 wd 0.0500 time 0.4419 (0.4854) data time 0.0008 (0.0356) model time 0.0000 (0.0000) loss 3.0507 (2.7332) grad_norm 1.5851 (1.7319) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][30/625] eta 0:04:40 lr 0.000487 wd 0.0500 time 0.4396 (0.4714) data time 0.0007 (0.0244) model time 0.0000 (0.0000) loss 3.3166 (2.8236) grad_norm 1.5434 (1.7087) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][40/625] eta 0:04:31 lr 0.000486 wd 0.0500 time 0.4412 (0.4645) data time 0.0006 (0.0187) model time 0.0000 (0.0000) loss 3.1588 (2.7983) grad_norm 4.0563 (1.8207) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][50/625] eta 0:04:26 lr 0.000486 wd 0.0500 time 0.4453 (0.4636) data time 0.0010 (0.0152) model time 0.0000 (0.0000) loss 2.7811 (2.8140) grad_norm 1.7517 (1.8511) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][60/625] eta 0:04:20 lr 0.000486 wd 0.0500 time 0.4413 (0.4605) data time 0.0009 (0.0129) model time 0.4404 (0.4436) loss 2.8980 (2.8169) grad_norm 1.4858 (1.8272) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][70/625] eta 0:04:14 lr 0.000486 wd 0.0500 time 0.4431 (0.4581) data time 0.0006 (0.0112) model time 0.4425 (0.4434) loss 3.0106 (2.8386) grad_norm 2.1946 (1.8815) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[178/300][80/625] eta 0:04:08 lr 0.000486 wd 0.0500 time 0.4425 (0.4563) data time 0.0006 (0.0099) model time 0.4418 (0.4431) loss 3.2954 (2.8541) grad_norm 2.7190 (1.8678) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][90/625] eta 0:04:04 lr 0.000486 wd 0.0500 time 0.4443 (0.4565) data time 0.0009 (0.0089) model time 0.4434 (0.4467) loss 3.3085 (2.8374) grad_norm 1.6939 (1.8589) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][100/625] eta 0:03:59 lr 0.000486 wd 0.0500 time 0.4440 (0.4553) data time 0.0008 (0.0081) model time 0.4432 (0.4459) loss 3.0666 (2.8312) grad_norm 1.6724 (1.8596) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][110/625] eta 0:03:53 lr 0.000486 wd 0.0500 time 0.4432 (0.4541) data time 0.0008 (0.0075) model time 0.4424 (0.4452) loss 2.0137 (2.8245) grad_norm 1.6611 (1.8611) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][120/625] eta 0:03:49 lr 0.000486 wd 0.0500 time 0.4431 (0.4547) data time 0.0008 (0.0069) model time 0.4423 (0.4473) loss 2.4220 (2.8425) grad_norm 1.3551 (1.8839) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][130/625] eta 0:03:45 lr 0.000485 wd 0.0500 time 0.4405 (0.4554) data time 0.0006 (0.0065) model time 0.4399 (0.4493) loss 3.3373 (2.8336) grad_norm 2.3988 (1.8946) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][140/625] eta 0:03:41 lr 0.000485 wd 0.0500 time 0.4499 (0.4560) data time 0.0009 (0.0061) model time 0.4490 (0.4508) loss 3.0751 (2.8276) grad_norm 1.5928 (1.8865) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 16:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][150/625] eta 0:03:36 lr 0.000485 wd 0.0500 time 0.4435 (0.4551) data time 0.0006 (0.0057) model time 0.4429 (0.4498) loss 3.3754 (2.8222) grad_norm 1.4737 (1.8643) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][160/625] eta 0:03:31 lr 0.000485 wd 0.0500 time 0.4392 (0.4543) data time 0.0009 (0.0054) model time 0.4384 (0.4491) loss 2.6024 (2.8176) grad_norm 1.2235 (1.8496) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][170/625] eta 0:03:26 lr 0.000485 wd 0.0500 time 0.4414 (0.4536) data time 0.0006 (0.0052) model time 0.4408 (0.4485) loss 2.5402 (2.8181) grad_norm 1.8078 (1.8357) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][180/625] eta 0:03:21 lr 0.000485 wd 0.0500 time 0.4424 (0.4530) data time 0.0009 (0.0049) model time 0.4415 (0.4479) loss 1.6086 (2.8059) grad_norm 2.8127 (1.8619) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][190/625] eta 0:03:16 lr 0.000485 wd 0.0500 time 0.4427 (0.4529) data time 0.0008 (0.0047) model time 0.4419 (0.4481) loss 3.1036 (2.8215) grad_norm 1.8235 (1.8855) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][200/625] eta 0:03:12 lr 0.000485 wd 0.0500 time 0.4423 (0.4532) data time 0.0006 (0.0045) model time 0.4417 (0.4488) loss 1.4757 (2.8097) grad_norm 2.3557 (1.8765) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][210/625] eta 0:03:07 lr 0.000485 wd 0.0500 time 0.4420 (0.4528) data time 0.0006 (0.0044) model time 0.4413 (0.4484) loss 3.0243 
(2.8093) grad_norm 1.7084 (1.8721) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][220/625] eta 0:03:03 lr 0.000485 wd 0.0500 time 0.4472 (0.4524) data time 0.0006 (0.0042) model time 0.4466 (0.4481) loss 3.2995 (2.8206) grad_norm 2.3828 (1.8734) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:18:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][230/625] eta 0:02:58 lr 0.000484 wd 0.0500 time 0.4429 (0.4519) data time 0.0006 (0.0041) model time 0.4423 (0.4478) loss 1.8521 (2.8208) grad_norm 2.5445 (1.8918) loss_scale 512.0000 (260.4329) mem 16699MB [2024-08-10 16:18:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][240/625] eta 0:02:53 lr 0.000484 wd 0.0500 time 0.4402 (0.4517) data time 0.0009 (0.0039) model time 0.4393 (0.4476) loss 3.1707 (2.8194) grad_norm 1.5103 (1.8778) loss_scale 512.0000 (270.8714) mem 16699MB [2024-08-10 16:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][250/625] eta 0:02:49 lr 0.000484 wd 0.0500 time 0.4396 (0.4513) data time 0.0007 (0.0038) model time 0.4389 (0.4473) loss 2.2796 (2.8189) grad_norm 1.7039 (1.8703) loss_scale 512.0000 (280.4781) mem 16699MB [2024-08-10 16:18:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][260/625] eta 0:02:44 lr 0.000484 wd 0.0500 time 0.4428 (0.4510) data time 0.0006 (0.0037) model time 0.4422 (0.4471) loss 3.0468 (2.8216) grad_norm 1.6510 (1.8695) loss_scale 512.0000 (289.3487) mem 16699MB [2024-08-10 16:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][270/625] eta 0:02:40 lr 0.000484 wd 0.0500 time 0.4424 (0.4508) data time 0.0008 (0.0036) model time 0.4416 (0.4469) loss 3.3750 (2.8210) grad_norm 2.1187 (1.8719) loss_scale 512.0000 (297.5646) mem 16699MB [2024-08-10 16:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][280/625] eta 0:02:35 lr 0.000484 wd 0.0500 time 0.4418 
(0.4506) data time 0.0008 (0.0035) model time 0.4410 (0.4468) loss 2.4705 (2.8169) grad_norm 1.8345 (1.8793) loss_scale 512.0000 (305.1957) mem 16699MB [2024-08-10 16:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][290/625] eta 0:02:30 lr 0.000484 wd 0.0500 time 0.4453 (0.4504) data time 0.0008 (0.0034) model time 0.4445 (0.4467) loss 2.9155 (2.8297) grad_norm 2.3922 (1.9008) loss_scale 512.0000 (312.3024) mem 16699MB [2024-08-10 16:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][300/625] eta 0:02:26 lr 0.000484 wd 0.0500 time 0.4417 (0.4502) data time 0.0008 (0.0033) model time 0.4409 (0.4466) loss 3.2560 (2.8308) grad_norm 1.7792 (1.8962) loss_scale 512.0000 (318.9369) mem 16699MB [2024-08-10 16:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][310/625] eta 0:02:21 lr 0.000484 wd 0.0500 time 0.4440 (0.4500) data time 0.0009 (0.0032) model time 0.4431 (0.4464) loss 3.2003 (2.8315) grad_norm 1.4454 (1.8901) loss_scale 512.0000 (325.1447) mem 16699MB [2024-08-10 16:19:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][320/625] eta 0:02:17 lr 0.000484 wd 0.0500 time 0.4407 (0.4498) data time 0.0006 (0.0032) model time 0.4401 (0.4462) loss 3.4527 (2.8356) grad_norm 4.7031 (1.9261) loss_scale 512.0000 (330.9657) mem 16699MB [2024-08-10 16:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][330/625] eta 0:02:12 lr 0.000483 wd 0.0500 time 0.4426 (0.4495) data time 0.0007 (0.0031) model time 0.4420 (0.4461) loss 3.4597 (2.8372) grad_norm 2.8514 (1.9510) loss_scale 512.0000 (336.4350) mem 16699MB [2024-08-10 16:19:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][340/625] eta 0:02:08 lr 0.000483 wd 0.0500 time 0.4434 (0.4494) data time 0.0008 (0.0030) model time 0.4426 (0.4460) loss 3.2012 (2.8364) grad_norm 1.5901 (1.9473) loss_scale 512.0000 (341.5836) mem 16699MB [2024-08-10 16:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [178/300][350/625] eta 0:02:03 lr 0.000483 wd 0.0500 time 0.4443 (0.4492) data time 0.0008 (0.0030) model time 0.4435 (0.4459) loss 2.5622 (2.8321) grad_norm 2.3999 (1.9402) loss_scale 512.0000 (346.4387) mem 16699MB [2024-08-10 16:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][360/625] eta 0:01:59 lr 0.000483 wd 0.0500 time 0.4402 (0.4491) data time 0.0009 (0.0029) model time 0.4393 (0.4458) loss 3.0187 (2.8352) grad_norm 1.2504 (1.9389) loss_scale 512.0000 (351.0249) mem 16699MB [2024-08-10 16:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][370/625] eta 0:01:54 lr 0.000483 wd 0.0500 time 0.4457 (0.4489) data time 0.0006 (0.0029) model time 0.4450 (0.4457) loss 2.2333 (2.8378) grad_norm 2.0570 (1.9362) loss_scale 512.0000 (355.3639) mem 16699MB [2024-08-10 16:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][380/625] eta 0:01:49 lr 0.000483 wd 0.0500 time 0.4428 (0.4488) data time 0.0008 (0.0028) model time 0.4420 (0.4456) loss 3.4190 (2.8398) grad_norm 1.7634 (1.9275) loss_scale 512.0000 (359.4751) mem 16699MB [2024-08-10 16:19:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][390/625] eta 0:01:45 lr 0.000483 wd 0.0500 time 0.4453 (0.4487) data time 0.0008 (0.0028) model time 0.4445 (0.4455) loss 2.5488 (2.8278) grad_norm 1.4756 (1.9217) loss_scale 512.0000 (363.3760) mem 16699MB [2024-08-10 16:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][400/625] eta 0:01:40 lr 0.000483 wd 0.0500 time 0.4445 (0.4486) data time 0.0009 (0.0027) model time 0.4436 (0.4455) loss 2.9535 (2.8275) grad_norm 1.6132 (1.9147) loss_scale 512.0000 (367.0823) mem 16699MB [2024-08-10 16:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][410/625] eta 0:01:36 lr 0.000483 wd 0.0500 time 0.4456 (0.4488) data time 0.0008 (0.0027) model time 0.4448 (0.4458) loss 2.0274 (2.8246) grad_norm 2.5242 (1.9129) loss_scale 512.0000 (370.6083) mem 16699MB 
[2024-08-10 16:20:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][420/625] eta 0:01:32 lr 0.000482 wd 0.0500 time 0.4457 (0.4495) data time 0.0009 (0.0026) model time 0.4448 (0.4466) loss 2.4000 (2.8238) grad_norm 1.7767 (1.9258) loss_scale 512.0000 (373.9667) mem 16699MB [2024-08-10 16:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][430/625] eta 0:01:27 lr 0.000482 wd 0.0500 time 0.4421 (0.4493) data time 0.0006 (0.0026) model time 0.4415 (0.4465) loss 2.9864 (2.8253) grad_norm 1.8819 (1.9304) loss_scale 512.0000 (377.1694) mem 16699MB [2024-08-10 16:20:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][440/625] eta 0:01:23 lr 0.000482 wd 0.0500 time 0.4413 (0.4492) data time 0.0009 (0.0026) model time 0.4404 (0.4464) loss 2.9876 (2.8232) grad_norm 1.8452 (1.9278) loss_scale 512.0000 (380.2268) mem 16699MB [2024-08-10 16:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][450/625] eta 0:01:18 lr 0.000482 wd 0.0500 time 0.4398 (0.4491) data time 0.0006 (0.0025) model time 0.4391 (0.4463) loss 2.4918 (2.8291) grad_norm 1.9879 (1.9476) loss_scale 512.0000 (383.1486) mem 16699MB [2024-08-10 16:20:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][460/625] eta 0:01:14 lr 0.000482 wd 0.0500 time 0.4421 (0.4496) data time 0.0008 (0.0025) model time 0.4413 (0.4469) loss 3.0856 (2.8290) grad_norm 1.4243 (1.9454) loss_scale 512.0000 (385.9436) mem 16699MB [2024-08-10 16:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][470/625] eta 0:01:09 lr 0.000482 wd 0.0500 time 0.6657 (0.4503) data time 0.0008 (0.0025) model time 0.6648 (0.4478) loss 3.0330 (2.8306) grad_norm 1.8010 (1.9578) loss_scale 512.0000 (388.6200) mem 16699MB [2024-08-10 16:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][480/625] eta 0:01:05 lr 0.000482 wd 0.0500 time 0.4439 (0.4502) data time 0.0007 (0.0024) model time 0.4433 (0.4477) loss 1.8301 
(2.8236) grad_norm 2.3686 (1.9838) loss_scale 512.0000 (391.1850) mem 16699MB [2024-08-10 16:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][490/625] eta 0:01:00 lr 0.000482 wd 0.0500 time 0.4440 (0.4501) data time 0.0006 (0.0024) model time 0.4434 (0.4476) loss 2.9112 (2.8264) grad_norm 2.4477 (1.9872) loss_scale 512.0000 (393.6456) mem 16699MB [2024-08-10 16:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][500/625] eta 0:00:56 lr 0.000482 wd 0.0500 time 0.4455 (0.4500) data time 0.0009 (0.0024) model time 0.4446 (0.4475) loss 3.3614 (2.8310) grad_norm 1.2231 (1.9855) loss_scale 512.0000 (396.0080) mem 16699MB [2024-08-10 16:20:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][510/625] eta 0:00:51 lr 0.000482 wd 0.0500 time 0.4464 (0.4499) data time 0.0009 (0.0023) model time 0.4455 (0.4474) loss 3.4394 (2.8319) grad_norm 1.1015 (inf) loss_scale 256.0000 (394.7710) mem 16699MB [2024-08-10 16:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][520/625] eta 0:00:47 lr 0.000481 wd 0.0500 time 0.4411 (0.4498) data time 0.0008 (0.0023) model time 0.4403 (0.4473) loss 3.1160 (2.8374) grad_norm 1.4766 (inf) loss_scale 256.0000 (392.1075) mem 16699MB [2024-08-10 16:20:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][530/625] eta 0:00:42 lr 0.000481 wd 0.0500 time 0.4462 (0.4496) data time 0.0009 (0.0023) model time 0.4453 (0.4472) loss 2.6001 (2.8342) grad_norm 1.3710 (inf) loss_scale 256.0000 (389.5443) mem 16699MB [2024-08-10 16:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][540/625] eta 0:00:38 lr 0.000481 wd 0.0500 time 0.4461 (0.4496) data time 0.0009 (0.0023) model time 0.4452 (0.4472) loss 1.9946 (2.8357) grad_norm 3.6864 (inf) loss_scale 256.0000 (387.0758) mem 16699MB [2024-08-10 16:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][550/625] eta 0:00:33 lr 0.000481 wd 0.0500 time 0.6160 (0.4498) data 
time 0.0009 (0.0022) model time 0.6151 (0.4474) loss 3.1198 (2.8383) grad_norm 1.3963 (inf) loss_scale 256.0000 (384.6969) mem 16699MB [2024-08-10 16:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][560/625] eta 0:00:29 lr 0.000481 wd 0.0500 time 0.4422 (0.4497) data time 0.0009 (0.0022) model time 0.4412 (0.4474) loss 2.9008 (2.8369) grad_norm 1.9259 (inf) loss_scale 256.0000 (382.4029) mem 16699MB [2024-08-10 16:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][570/625] eta 0:00:24 lr 0.000481 wd 0.0500 time 0.4398 (0.4498) data time 0.0007 (0.0022) model time 0.4391 (0.4475) loss 3.3706 (2.8380) grad_norm 3.6789 (inf) loss_scale 256.0000 (380.1891) mem 16699MB [2024-08-10 16:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][580/625] eta 0:00:20 lr 0.000481 wd 0.0500 time 0.4455 (0.4497) data time 0.0008 (0.0022) model time 0.4447 (0.4475) loss 1.7783 (2.8336) grad_norm 1.9201 (inf) loss_scale 256.0000 (378.0516) mem 16699MB [2024-08-10 16:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][590/625] eta 0:00:15 lr 0.000481 wd 0.0500 time 0.4420 (0.4496) data time 0.0009 (0.0022) model time 0.4411 (0.4474) loss 3.0225 (2.8312) grad_norm 2.5916 (inf) loss_scale 256.0000 (375.9865) mem 16699MB [2024-08-10 16:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][600/625] eta 0:00:11 lr 0.000481 wd 0.0500 time 0.4393 (0.4495) data time 0.0009 (0.0021) model time 0.4384 (0.4473) loss 3.1126 (2.8323) grad_norm 1.2450 (inf) loss_scale 256.0000 (373.9900) mem 16699MB [2024-08-10 16:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][610/625] eta 0:00:06 lr 0.000480 wd 0.0500 time 0.4385 (0.4497) data time 0.0006 (0.0021) model time 0.4378 (0.4476) loss 2.9967 (2.8324) grad_norm 1.8896 (inf) loss_scale 256.0000 (372.0589) mem 16699MB [2024-08-10 16:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [178/300][620/625] eta 
0:00:02 lr 0.000480 wd 0.0500 time 0.4351 (0.4496) data time 0.0004 (0.0021) model time 0.4347 (0.4474) loss 2.5329 (2.8325) grad_norm 6.1526 (inf) loss_scale 256.0000 (370.1900) mem 16699MB [2024-08-10 16:21:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 178 training takes 0:04:40 [2024-08-10 16:21:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:21:33 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 16:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5303 (0.5303) Acc@1 88.672 (88.672) Acc@5 98.389 (98.389) Mem 16699MB [2024-08-10 16:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.153) Loss 0.8535 (0.6436) Acc@1 79.590 (85.924) Acc@5 95.703 (97.519) Mem 16699MB [2024-08-10 16:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.136) Loss 0.9331 (0.7595) Acc@1 77.539 (82.889) Acc@5 94.482 (96.331) Mem 16699MB [2024-08-10 16:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.620 Acc@5 96.339 [2024-08-10 16:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.6% [2024-08-10 16:21:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.775 (0.775) Loss 0.4705 (0.4705) Acc@1 89.355 (89.355) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 16:21:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.182) Loss 0.7607 (0.5880) Acc@1 81.250 (87.140) Acc@5 96.338 (97.927) Mem 16699MB [2024-08-10 16:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.151) Loss 0.8442 (0.6894) Acc@1 78.955 (84.408) Acc@5 95.850 (96.956) Mem 16699MB [2024-08-10 16:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.095 
Acc@5 96.955 [2024-08-10 16:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-10 16:21:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][0/625] eta 0:13:01 lr 0.000480 wd 0.0500 time 1.2506 (1.2506) data time 0.7476 (0.7476) model time 0.0000 (0.0000) loss 2.8349 (2.8349) grad_norm 1.7469 (1.7469) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:21:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][10/625] eta 0:05:19 lr 0.000480 wd 0.0500 time 0.4471 (0.5193) data time 0.0008 (0.0688) model time 0.0000 (0.0000) loss 2.3542 (2.9302) grad_norm 2.2403 (2.4937) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:21:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][20/625] eta 0:04:52 lr 0.000480 wd 0.0500 time 0.4421 (0.4835) data time 0.0006 (0.0365) model time 0.0000 (0.0000) loss 2.7335 (2.8194) grad_norm 1.5009 (2.2472) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:21:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][30/625] eta 0:04:40 lr 0.000480 wd 0.0500 time 0.4418 (0.4706) data time 0.0006 (0.0250) model time 0.0000 (0.0000) loss 3.2507 (2.8333) grad_norm 1.8997 (2.1238) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][40/625] eta 0:04:34 lr 0.000480 wd 0.0500 time 0.4481 (0.4686) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 3.3869 (2.8948) grad_norm 1.8763 (2.2083) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][50/625] eta 0:04:28 lr 0.000480 wd 0.0500 time 0.4415 (0.4669) data time 0.0009 (0.0156) model time 0.0000 (0.0000) loss 3.1724 (2.8849) grad_norm 1.2252 (2.0905) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[179/300][60/625] eta 0:04:21 lr 0.000480 wd 0.0500 time 0.4407 (0.4633) data time 0.0006 (0.0132) model time 0.4400 (0.4443) loss 3.3986 (2.8741) grad_norm 2.2373 (2.0703) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][70/625] eta 0:04:17 lr 0.000480 wd 0.0500 time 0.4443 (0.4637) data time 0.0006 (0.0114) model time 0.4436 (0.4547) loss 2.8227 (2.8656) grad_norm 1.9100 (2.0137) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][80/625] eta 0:04:11 lr 0.000479 wd 0.0500 time 0.4418 (0.4612) data time 0.0008 (0.0101) model time 0.4409 (0.4507) loss 2.8985 (2.8588) grad_norm 3.8966 (2.0871) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][90/625] eta 0:04:05 lr 0.000479 wd 0.0500 time 0.4458 (0.4594) data time 0.0007 (0.0091) model time 0.4452 (0.4488) loss 3.1222 (2.8404) grad_norm 1.9939 (2.0918) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][100/625] eta 0:04:00 lr 0.000479 wd 0.0500 time 0.4421 (0.4578) data time 0.0006 (0.0083) model time 0.4415 (0.4477) loss 3.0008 (2.8361) grad_norm 1.4869 (2.0601) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][110/625] eta 0:03:55 lr 0.000479 wd 0.0500 time 0.4424 (0.4565) data time 0.0008 (0.0077) model time 0.4416 (0.4468) loss 3.3321 (2.8329) grad_norm 1.9643 (2.0767) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][120/625] eta 0:03:51 lr 0.000479 wd 0.0500 time 0.4439 (0.4584) data time 0.0008 (0.0071) model time 0.4431 (0.4512) loss 2.8779 (2.8230) grad_norm 2.1701 (2.0967) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 16:22:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][130/625] eta 0:03:46 lr 0.000479 wd 0.0500 time 0.4439 (0.4572) data time 0.0006 (0.0067) model time 0.4433 (0.4500) loss 2.4693 (2.8280) grad_norm 1.8425 (2.0733) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][140/625] eta 0:03:41 lr 0.000479 wd 0.0500 time 0.4393 (0.4562) data time 0.0010 (0.0062) model time 0.4384 (0.4491) loss 3.0605 (2.8490) grad_norm 1.8058 (2.0606) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][150/625] eta 0:03:36 lr 0.000479 wd 0.0500 time 0.4440 (0.4560) data time 0.0006 (0.0059) model time 0.4434 (0.4495) loss 1.8684 (2.8422) grad_norm 1.6469 (2.0562) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][160/625] eta 0:03:31 lr 0.000479 wd 0.0500 time 0.4441 (0.4553) data time 0.0009 (0.0056) model time 0.4432 (0.4490) loss 3.0727 (2.8355) grad_norm 1.5943 (2.0563) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][170/625] eta 0:03:26 lr 0.000479 wd 0.0500 time 0.4388 (0.4546) data time 0.0007 (0.0053) model time 0.4381 (0.4484) loss 1.9512 (2.8271) grad_norm 2.1499 (2.0734) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][180/625] eta 0:03:21 lr 0.000478 wd 0.0500 time 0.4415 (0.4539) data time 0.0006 (0.0051) model time 0.4408 (0.4478) loss 2.2306 (2.8042) grad_norm 1.8560 (2.2422) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][190/625] eta 0:03:17 lr 0.000478 wd 0.0500 time 0.4443 (0.4533) data time 0.0007 (0.0048) model time 0.4437 (0.4474) loss 3.1662 
(2.8161) grad_norm 1.4263 (2.2478) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][200/625] eta 0:03:12 lr 0.000478 wd 0.0500 time 0.4429 (0.4527) data time 0.0006 (0.0046) model time 0.4424 (0.4470) loss 3.5219 (2.8200) grad_norm 3.0875 (2.2482) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][210/625] eta 0:03:08 lr 0.000478 wd 0.0500 time 0.4475 (0.4531) data time 0.0006 (0.0045) model time 0.4468 (0.4477) loss 3.7013 (2.8299) grad_norm 4.8245 (2.2828) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][220/625] eta 0:03:03 lr 0.000478 wd 0.0500 time 0.4430 (0.4527) data time 0.0006 (0.0043) model time 0.4423 (0.4475) loss 2.6494 (2.8250) grad_norm 2.7469 (2.2658) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][230/625] eta 0:02:58 lr 0.000478 wd 0.0500 time 0.4444 (0.4526) data time 0.0008 (0.0042) model time 0.4436 (0.4476) loss 3.1801 (2.8283) grad_norm 1.7013 (2.2421) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][240/625] eta 0:02:54 lr 0.000478 wd 0.0500 time 0.4447 (0.4528) data time 0.0006 (0.0040) model time 0.4441 (0.4481) loss 2.5353 (2.8302) grad_norm 1.9695 (2.2231) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][250/625] eta 0:02:49 lr 0.000478 wd 0.0500 time 0.4460 (0.4524) data time 0.0009 (0.0039) model time 0.4451 (0.4478) loss 2.9381 (2.8332) grad_norm 1.2405 (2.2272) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][260/625] eta 0:02:45 lr 0.000478 wd 0.0500 time 0.4426 
(0.4528) data time 0.0009 (0.0038) model time 0.4418 (0.4484) loss 3.2295 (2.8319) grad_norm 2.0549 (2.2124) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][270/625] eta 0:02:40 lr 0.000478 wd 0.0500 time 0.4436 (0.4525) data time 0.0008 (0.0037) model time 0.4428 (0.4482) loss 3.1945 (2.8416) grad_norm 1.7707 (2.1992) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][280/625] eta 0:02:35 lr 0.000477 wd 0.0500 time 0.4427 (0.4521) data time 0.0006 (0.0036) model time 0.4421 (0.4480) loss 3.5073 (2.8465) grad_norm 2.4574 (2.1909) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][290/625] eta 0:02:31 lr 0.000477 wd 0.0500 time 0.4435 (0.4519) data time 0.0008 (0.0035) model time 0.4427 (0.4478) loss 3.6652 (2.8426) grad_norm 2.0324 (2.1761) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][300/625] eta 0:02:26 lr 0.000477 wd 0.0500 time 0.4440 (0.4516) data time 0.0007 (0.0034) model time 0.4433 (0.4476) loss 3.3728 (2.8438) grad_norm 1.6216 (2.1811) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][310/625] eta 0:02:22 lr 0.000477 wd 0.0500 time 0.4455 (0.4514) data time 0.0008 (0.0033) model time 0.4447 (0.4475) loss 2.1024 (2.8478) grad_norm 2.0093 (2.1873) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 16:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][320/625] eta 0:02:17 lr 0.000477 wd 0.0500 time 0.4436 (0.4512) data time 0.0006 (0.0032) model time 0.4430 (0.4473) loss 2.1882 (2.8519) grad_norm 1.9520 (inf) loss_scale 128.0000 (252.0125) mem 16699MB [2024-08-10 16:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [179/300][330/625] eta 0:02:13 lr 0.000477 wd 0.0500 time 0.4392 (0.4510) data time 0.0006 (0.0032) model time 0.4386 (0.4472) loss 2.5132 (2.8496) grad_norm 2.0354 (inf) loss_scale 128.0000 (248.2659) mem 16699MB [2024-08-10 16:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][340/625] eta 0:02:08 lr 0.000477 wd 0.0500 time 0.4450 (0.4508) data time 0.0006 (0.0031) model time 0.4444 (0.4470) loss 2.9534 (2.8488) grad_norm 1.5713 (inf) loss_scale 128.0000 (244.7390) mem 16699MB [2024-08-10 16:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][350/625] eta 0:02:03 lr 0.000477 wd 0.0500 time 0.4415 (0.4506) data time 0.0008 (0.0030) model time 0.4407 (0.4469) loss 2.9329 (2.8495) grad_norm 1.7275 (inf) loss_scale 128.0000 (241.4131) mem 16699MB [2024-08-10 16:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][360/625] eta 0:01:59 lr 0.000477 wd 0.0500 time 0.4444 (0.4508) data time 0.0006 (0.0030) model time 0.4437 (0.4472) loss 3.1422 (2.8572) grad_norm 2.5883 (inf) loss_scale 128.0000 (238.2715) mem 16699MB [2024-08-10 16:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][370/625] eta 0:01:55 lr 0.000476 wd 0.0500 time 0.4590 (0.4511) data time 0.0009 (0.0029) model time 0.4581 (0.4476) loss 2.6829 (2.8555) grad_norm 2.2723 (inf) loss_scale 128.0000 (235.2992) mem 16699MB [2024-08-10 16:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][380/625] eta 0:01:50 lr 0.000476 wd 0.0500 time 0.4434 (0.4509) data time 0.0006 (0.0029) model time 0.4428 (0.4475) loss 3.1477 (2.8620) grad_norm 1.2449 (inf) loss_scale 128.0000 (232.4829) mem 16699MB [2024-08-10 16:24:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][390/625] eta 0:01:45 lr 0.000476 wd 0.0500 time 0.4488 (0.4507) data time 0.0006 (0.0028) model time 0.4482 (0.4474) loss 3.4244 (2.8629) grad_norm 1.4685 (inf) loss_scale 128.0000 (229.8107) mem 16699MB [2024-08-10 16:24:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][400/625] eta 0:01:41 lr 0.000476 wd 0.0500 time 0.4395 (0.4505) data time 0.0009 (0.0028) model time 0.4386 (0.4472) loss 2.3584 (2.8633) grad_norm 1.9286 (inf) loss_scale 128.0000 (227.2718) mem 16699MB [2024-08-10 16:24:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][410/625] eta 0:01:36 lr 0.000476 wd 0.0500 time 0.4416 (0.4503) data time 0.0008 (0.0027) model time 0.4408 (0.4470) loss 2.7397 (2.8629) grad_norm 1.7304 (inf) loss_scale 128.0000 (224.8564) mem 16699MB [2024-08-10 16:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][420/625] eta 0:01:32 lr 0.000476 wd 0.0500 time 0.4411 (0.4501) data time 0.0009 (0.0027) model time 0.4402 (0.4469) loss 3.2050 (2.8609) grad_norm 2.0773 (inf) loss_scale 128.0000 (222.5558) mem 16699MB [2024-08-10 16:24:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][430/625] eta 0:01:27 lr 0.000476 wd 0.0500 time 0.4440 (0.4499) data time 0.0006 (0.0026) model time 0.4434 (0.4467) loss 3.4018 (2.8626) grad_norm 1.6520 (inf) loss_scale 128.0000 (220.3619) mem 16699MB [2024-08-10 16:24:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][440/625] eta 0:01:23 lr 0.000476 wd 0.0500 time 0.4410 (0.4497) data time 0.0009 (0.0026) model time 0.4401 (0.4466) loss 3.0641 (2.8613) grad_norm 1.2705 (inf) loss_scale 128.0000 (218.2676) mem 16699MB [2024-08-10 16:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][450/625] eta 0:01:18 lr 0.000476 wd 0.0500 time 0.4440 (0.4504) data time 0.0007 (0.0026) model time 0.4433 (0.4474) loss 3.0259 (2.8621) grad_norm 1.2009 (inf) loss_scale 128.0000 (216.2661) mem 16699MB [2024-08-10 16:25:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][460/625] eta 0:01:14 lr 0.000476 wd 0.0500 time 0.4455 (0.4510) data time 0.0008 (0.0025) model time 0.4447 (0.4481) loss 3.4462 (2.8619) grad_norm 1.4883 (inf) loss_scale 
128.0000 (214.3514) mem 16699MB [2024-08-10 16:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][470/625] eta 0:01:09 lr 0.000475 wd 0.0500 time 0.4427 (0.4509) data time 0.0006 (0.0025) model time 0.4421 (0.4480) loss 2.9861 (2.8688) grad_norm 1.6092 (inf) loss_scale 128.0000 (212.5180) mem 16699MB [2024-08-10 16:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][480/625] eta 0:01:05 lr 0.000475 wd 0.0500 time 0.4421 (0.4512) data time 0.0007 (0.0025) model time 0.4414 (0.4484) loss 3.1212 (2.8718) grad_norm 1.5692 (inf) loss_scale 128.0000 (210.7609) mem 16699MB [2024-08-10 16:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][490/625] eta 0:01:00 lr 0.000475 wd 0.0500 time 0.4440 (0.4510) data time 0.0008 (0.0024) model time 0.4432 (0.4483) loss 2.8815 (2.8732) grad_norm 1.7310 (inf) loss_scale 128.0000 (209.0754) mem 16699MB [2024-08-10 16:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][500/625] eta 0:00:56 lr 0.000475 wd 0.0500 time 0.4415 (0.4508) data time 0.0007 (0.0024) model time 0.4408 (0.4481) loss 2.9570 (2.8723) grad_norm 1.7703 (inf) loss_scale 128.0000 (207.4571) mem 16699MB [2024-08-10 16:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][510/625] eta 0:00:51 lr 0.000475 wd 0.0500 time 0.4429 (0.4507) data time 0.0008 (0.0024) model time 0.4421 (0.4480) loss 2.9096 (2.8734) grad_norm 1.8711 (inf) loss_scale 128.0000 (205.9022) mem 16699MB [2024-08-10 16:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][520/625] eta 0:00:47 lr 0.000475 wd 0.0500 time 0.4424 (0.4508) data time 0.0008 (0.0023) model time 0.4416 (0.4482) loss 2.7646 (2.8668) grad_norm 1.6175 (inf) loss_scale 128.0000 (204.4069) mem 16699MB [2024-08-10 16:25:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][530/625] eta 0:00:42 lr 0.000475 wd 0.0500 time 0.4482 (0.4507) data time 0.0008 (0.0023) model time 0.4474 (0.4481) 
loss 3.0845 (2.8658) grad_norm 2.7483 (inf) loss_scale 128.0000 (202.9680) mem 16699MB [2024-08-10 16:25:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][540/625] eta 0:00:38 lr 0.000475 wd 0.0500 time 0.4414 (0.4506) data time 0.0006 (0.0023) model time 0.4408 (0.4480) loss 3.5920 (2.8658) grad_norm 2.2557 (inf) loss_scale 128.0000 (201.5823) mem 16699MB [2024-08-10 16:25:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][550/625] eta 0:00:33 lr 0.000475 wd 0.0500 time 0.4406 (0.4504) data time 0.0006 (0.0023) model time 0.4400 (0.4479) loss 3.3698 (2.8677) grad_norm 1.4699 (inf) loss_scale 128.0000 (200.2468) mem 16699MB [2024-08-10 16:25:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][560/625] eta 0:00:29 lr 0.000474 wd 0.0500 time 0.4400 (0.4503) data time 0.0007 (0.0022) model time 0.4393 (0.4478) loss 2.5937 (2.8633) grad_norm 1.9678 (inf) loss_scale 128.0000 (198.9590) mem 16699MB [2024-08-10 16:25:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][570/625] eta 0:00:24 lr 0.000474 wd 0.0500 time 0.4417 (0.4502) data time 0.0006 (0.0022) model time 0.4411 (0.4477) loss 2.2710 (2.8615) grad_norm 1.4863 (inf) loss_scale 128.0000 (197.7163) mem 16699MB [2024-08-10 16:26:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][580/625] eta 0:00:20 lr 0.000474 wd 0.0500 time 0.4429 (0.4501) data time 0.0008 (0.0022) model time 0.4420 (0.4476) loss 3.2348 (2.8646) grad_norm 1.2820 (inf) loss_scale 128.0000 (196.5164) mem 16699MB [2024-08-10 16:26:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][590/625] eta 0:00:15 lr 0.000474 wd 0.0500 time 0.4400 (0.4503) data time 0.0009 (0.0022) model time 0.4391 (0.4478) loss 2.6828 (2.8666) grad_norm 2.2499 (inf) loss_scale 128.0000 (195.3570) mem 16699MB [2024-08-10 16:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][600/625] eta 0:00:11 lr 0.000474 wd 0.0500 time 0.4406 (0.4501) 
data time 0.0009 (0.0022) model time 0.4397 (0.4477) loss 3.2863 (2.8703) grad_norm 1.7905 (inf) loss_scale 128.0000 (194.2363) mem 16699MB [2024-08-10 16:26:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][610/625] eta 0:00:06 lr 0.000474 wd 0.0500 time 0.4427 (0.4503) data time 0.0006 (0.0021) model time 0.4421 (0.4479) loss 2.7112 (2.8678) grad_norm 1.7926 (inf) loss_scale 128.0000 (193.1522) mem 16699MB [2024-08-10 16:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [179/300][620/625] eta 0:00:02 lr 0.000474 wd 0.0500 time 0.4407 (0.4505) data time 0.0006 (0.0021) model time 0.4400 (0.4481) loss 2.2562 (2.8679) grad_norm 1.1745 (inf) loss_scale 128.0000 (192.1031) mem 16699MB [2024-08-10 16:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 179 training takes 0:04:41 [2024-08-10 16:26:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:26:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:26:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.479 (0.479) Loss 0.5273 (0.5273) Acc@1 88.232 (88.232) Acc@5 98.682 (98.682) Mem 16699MB [2024-08-10 16:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.153) Loss 0.8540 (0.6440) Acc@1 79.883 (85.835) Acc@5 95.752 (97.585) Mem 16699MB [2024-08-10 16:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9487 (0.7560) Acc@1 76.611 (82.868) Acc@5 95.166 (96.501) Mem 16699MB [2024-08-10 16:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.654 Acc@5 96.479 [2024-08-10 16:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.7% [2024-08-10 16:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.810 (0.810) Loss 0.4707 (0.4707) Acc@1 89.453 (89.453) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 16:26:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.185) Loss 0.7603 (0.5878) Acc@1 81.543 (87.207) Acc@5 96.338 (97.914) Mem 16699MB [2024-08-10 16:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.152) Loss 0.8433 (0.6892) Acc@1 79.248 (84.428) Acc@5 95.898 (96.954) Mem 16699MB [2024-08-10 16:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.119 Acc@5 96.949 [2024-08-10 16:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-10 16:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.12% [2024-08-10 16:26:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:26:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 16:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][0/625] eta 0:07:36 lr 0.000474 wd 0.0500 time 0.7307 (0.7307) data time 0.3359 (0.3359) model time 0.0000 (0.0000) loss 1.5762 (1.5762) grad_norm 1.6354 (1.6354) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:26:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][10/625] eta 0:04:48 lr 0.000474 wd 0.0500 time 0.4468 (0.4689) data time 0.0007 (0.0318) model time 0.0000 (0.0000) loss 2.7859 (2.6473) grad_norm 1.3503 (1.8016) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][20/625] eta 0:04:36 lr 0.000474 wd 0.0500 time 0.4422 (0.4570) data time 0.0008 (0.0171) model time 0.0000 (0.0000) loss 2.9400 (2.7100) grad_norm 1.8402 (1.7359) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:26:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][30/625] eta 0:04:35 lr 0.000474 wd 0.0500 time 0.4472 (0.4627) data time 0.0009 (0.0119) model time 0.0000 (0.0000) loss 3.4188 (2.7760) grad_norm 1.7266 (1.6939) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:26:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][40/625] eta 0:04:30 lr 0.000473 wd 0.0500 time 0.4443 (0.4619) data time 0.0006 (0.0092) model time 0.0000 (0.0000) loss 3.1426 (2.7736) grad_norm 1.7275 (1.7142) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:26:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][50/625] eta 0:04:23 lr 0.000473 wd 0.0500 time 0.4435 (0.4582) data time 0.0006 (0.0076) model time 0.0000 (0.0000) loss 1.9886 (2.7961) grad_norm 1.5871 (1.7312) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][60/625] eta 0:04:19 lr 0.000473 wd 0.0500 time 0.6660 (0.4594) data time 0.0007 (0.0065) model time 0.6653 (0.4644) loss 3.6127 (2.8371) 
grad_norm 2.0003 (1.7715) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][70/625] eta 0:04:13 lr 0.000473 wd 0.0500 time 0.4464 (0.4564) data time 0.0008 (0.0057) model time 0.4455 (0.4509) loss 2.3045 (2.8332) grad_norm 2.4397 (1.7742) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][80/625] eta 0:04:07 lr 0.000473 wd 0.0500 time 0.4414 (0.4548) data time 0.0007 (0.0051) model time 0.4407 (0.4482) loss 3.3365 (2.8636) grad_norm 1.2327 (1.7464) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][90/625] eta 0:04:02 lr 0.000473 wd 0.0500 time 0.4441 (0.4536) data time 0.0010 (0.0047) model time 0.4431 (0.4467) loss 3.3463 (2.8605) grad_norm 1.8343 (1.7109) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][100/625] eta 0:03:57 lr 0.000473 wd 0.0500 time 0.4446 (0.4526) data time 0.0009 (0.0043) model time 0.4438 (0.4459) loss 2.7618 (2.8451) grad_norm 1.5557 (1.7189) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][110/625] eta 0:03:52 lr 0.000473 wd 0.0500 time 0.4419 (0.4518) data time 0.0006 (0.0040) model time 0.4413 (0.4454) loss 3.4216 (2.8600) grad_norm 2.1558 (1.7443) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][120/625] eta 0:03:47 lr 0.000473 wd 0.0500 time 0.4422 (0.4512) data time 0.0007 (0.0037) model time 0.4415 (0.4451) loss 3.2427 (2.8529) grad_norm 2.1629 (1.7573) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][130/625] eta 0:03:43 lr 0.000472 wd 0.0500 time 0.4399 (0.4519) data 
time 0.0007 (0.0035) model time 0.4392 (0.4470) loss 3.0744 (2.8430) grad_norm 1.3813 (1.7534) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][140/625] eta 0:03:38 lr 0.000472 wd 0.0500 time 0.4438 (0.4513) data time 0.0007 (0.0033) model time 0.4432 (0.4464) loss 2.4154 (2.8381) grad_norm 4.1917 (1.7638) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][150/625] eta 0:03:34 lr 0.000472 wd 0.0500 time 0.4402 (0.4508) data time 0.0007 (0.0032) model time 0.4395 (0.4461) loss 1.8810 (2.8218) grad_norm 1.9018 (1.7650) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][160/625] eta 0:03:29 lr 0.000472 wd 0.0500 time 0.4443 (0.4503) data time 0.0006 (0.0030) model time 0.4437 (0.4457) loss 1.8640 (2.8285) grad_norm 1.7578 (1.7585) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][170/625] eta 0:03:24 lr 0.000472 wd 0.0500 time 0.4428 (0.4499) data time 0.0007 (0.0029) model time 0.4422 (0.4454) loss 2.5872 (2.8224) grad_norm 1.6025 (1.7549) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][180/625] eta 0:03:20 lr 0.000472 wd 0.0500 time 0.4457 (0.4496) data time 0.0008 (0.0028) model time 0.4449 (0.4453) loss 3.4391 (2.8191) grad_norm 2.4363 (1.7618) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:27:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][190/625] eta 0:03:15 lr 0.000472 wd 0.0500 time 0.4420 (0.4501) data time 0.0008 (0.0027) model time 0.4412 (0.4463) loss 3.5186 (2.8178) grad_norm 1.7110 (1.7746) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[180/300][200/625] eta 0:03:11 lr 0.000472 wd 0.0500 time 0.4434 (0.4499) data time 0.0008 (0.0026) model time 0.4426 (0.4461) loss 3.7645 (2.8274) grad_norm 2.4370 (1.8915) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][210/625] eta 0:03:06 lr 0.000472 wd 0.0500 time 0.4451 (0.4504) data time 0.0009 (0.0025) model time 0.4442 (0.4470) loss 3.0166 (2.8265) grad_norm 1.6903 (1.9020) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][220/625] eta 0:03:02 lr 0.000472 wd 0.0500 time 0.4428 (0.4505) data time 0.0008 (0.0024) model time 0.4420 (0.4473) loss 2.9458 (2.8123) grad_norm 1.7843 (1.9277) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][230/625] eta 0:02:57 lr 0.000471 wd 0.0500 time 0.4421 (0.4505) data time 0.0009 (0.0024) model time 0.4411 (0.4474) loss 2.6894 (2.7999) grad_norm 1.8152 (1.9282) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][240/625] eta 0:02:53 lr 0.000471 wd 0.0500 time 0.4429 (0.4502) data time 0.0009 (0.0023) model time 0.4420 (0.4472) loss 3.1625 (2.8026) grad_norm 2.0240 (1.9624) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][250/625] eta 0:02:48 lr 0.000471 wd 0.0500 time 0.4398 (0.4499) data time 0.0009 (0.0023) model time 0.4390 (0.4469) loss 2.6922 (2.7986) grad_norm 1.7091 (1.9502) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][260/625] eta 0:02:44 lr 0.000471 wd 0.0500 time 0.4469 (0.4496) data time 0.0008 (0.0022) model time 0.4461 (0.4467) loss 3.2393 (2.7970) grad_norm 1.1968 (1.9430) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-10 16:28:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][270/625] eta 0:02:39 lr 0.000471 wd 0.0500 time 0.4424 (0.4494) data time 0.0008 (0.0022) model time 0.4416 (0.4464) loss 2.8785 (2.8023) grad_norm 1.5358 (1.9418) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][280/625] eta 0:02:34 lr 0.000471 wd 0.0500 time 0.4452 (0.4491) data time 0.0009 (0.0021) model time 0.4443 (0.4462) loss 2.7767 (2.8021) grad_norm 1.9375 (1.9360) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][290/625] eta 0:02:30 lr 0.000471 wd 0.0500 time 0.4434 (0.4489) data time 0.0008 (0.0021) model time 0.4426 (0.4460) loss 1.9089 (2.7981) grad_norm 1.2432 (1.9200) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][300/625] eta 0:02:25 lr 0.000471 wd 0.0500 time 0.4409 (0.4487) data time 0.0006 (0.0020) model time 0.4403 (0.4458) loss 3.2847 (2.8012) grad_norm 1.5946 (1.9103) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][310/625] eta 0:02:21 lr 0.000471 wd 0.0500 time 0.4415 (0.4484) data time 0.0006 (0.0020) model time 0.4409 (0.4456) loss 3.2605 (2.8077) grad_norm 2.2045 (1.9252) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][320/625] eta 0:02:16 lr 0.000470 wd 0.0500 time 0.4421 (0.4483) data time 0.0008 (0.0020) model time 0.4413 (0.4455) loss 2.7186 (2.8158) grad_norm 2.5322 (1.9261) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][330/625] eta 0:02:12 lr 0.000470 wd 0.0500 time 0.4471 (0.4482) data time 0.0006 (0.0019) model time 0.4464 (0.4455) loss 3.3887 
(2.8216) grad_norm 3.1719 (1.9625) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][340/625] eta 0:02:07 lr 0.000470 wd 0.0500 time 0.4419 (0.4480) data time 0.0006 (0.0019) model time 0.4413 (0.4454) loss 3.5393 (2.8254) grad_norm 1.5419 (1.9626) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][350/625] eta 0:02:03 lr 0.000470 wd 0.0500 time 0.4466 (0.4479) data time 0.0008 (0.0019) model time 0.4459 (0.4453) loss 2.5028 (2.8244) grad_norm 1.1072 (1.9588) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][360/625] eta 0:01:58 lr 0.000470 wd 0.0500 time 0.4434 (0.4482) data time 0.0007 (0.0018) model time 0.4428 (0.4457) loss 3.1423 (2.8283) grad_norm 1.9160 (1.9721) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][370/625] eta 0:01:54 lr 0.000470 wd 0.0500 time 0.4353 (0.4485) data time 0.0009 (0.0018) model time 0.4344 (0.4460) loss 1.9039 (2.8242) grad_norm 2.2941 (1.9738) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][380/625] eta 0:01:49 lr 0.000470 wd 0.0500 time 0.4431 (0.4483) data time 0.0007 (0.0018) model time 0.4424 (0.4459) loss 2.4603 (2.8226) grad_norm 2.1451 (1.9688) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][390/625] eta 0:01:45 lr 0.000470 wd 0.0500 time 0.4446 (0.4481) data time 0.0008 (0.0018) model time 0.4438 (0.4457) loss 2.8276 (2.8226) grad_norm 1.6559 (1.9632) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][400/625] eta 0:01:40 lr 0.000470 wd 0.0500 time 0.4407 
(0.4484) data time 0.0006 (0.0017) model time 0.4401 (0.4461) loss 2.9405 (2.8259) grad_norm 1.6901 (1.9724) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][410/625] eta 0:01:36 lr 0.000470 wd 0.0500 time 0.4428 (0.4488) data time 0.0008 (0.0017) model time 0.4420 (0.4466) loss 2.5825 (2.8249) grad_norm 1.6873 (1.9766) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][420/625] eta 0:01:31 lr 0.000469 wd 0.0500 time 0.4424 (0.4487) data time 0.0008 (0.0017) model time 0.4416 (0.4465) loss 2.6593 (2.8220) grad_norm 1.4073 (1.9698) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][430/625] eta 0:01:27 lr 0.000469 wd 0.0500 time 0.4435 (0.4486) data time 0.0006 (0.0017) model time 0.4429 (0.4464) loss 3.1350 (2.8219) grad_norm 10.2458 (1.9821) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][440/625] eta 0:01:23 lr 0.000469 wd 0.0500 time 0.4456 (0.4489) data time 0.0008 (0.0017) model time 0.4448 (0.4468) loss 3.1127 (2.8229) grad_norm 1.3459 (1.9744) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][450/625] eta 0:01:18 lr 0.000469 wd 0.0500 time 0.4470 (0.4491) data time 0.0008 (0.0016) model time 0.4461 (0.4471) loss 2.9348 (2.8201) grad_norm 2.8278 (2.0054) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][460/625] eta 0:01:14 lr 0.000469 wd 0.0500 time 0.4457 (0.4490) data time 0.0006 (0.0016) model time 0.4451 (0.4470) loss 3.5467 (2.8268) grad_norm 1.9857 (2.0052) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [180/300][470/625] eta 0:01:09 lr 0.000469 wd 0.0500 time 0.4426 (0.4493) data time 0.0006 (0.0016) model time 0.4420 (0.4474) loss 2.0062 (2.8270) grad_norm 1.3730 (2.0008) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][480/625] eta 0:01:05 lr 0.000469 wd 0.0500 time 0.4455 (0.4492) data time 0.0009 (0.0016) model time 0.4446 (0.4473) loss 2.9854 (2.8227) grad_norm 1.4376 (1.9936) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][490/625] eta 0:01:00 lr 0.000469 wd 0.0500 time 0.4478 (0.4491) data time 0.0006 (0.0016) model time 0.4472 (0.4472) loss 3.4249 (2.8216) grad_norm 2.1369 (1.9922) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][500/625] eta 0:00:56 lr 0.000469 wd 0.0500 time 0.4402 (0.4490) data time 0.0009 (0.0016) model time 0.4393 (0.4471) loss 2.3604 (2.8176) grad_norm 1.8399 (1.9944) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][510/625] eta 0:00:51 lr 0.000469 wd 0.0500 time 0.4428 (0.4489) data time 0.0006 (0.0016) model time 0.4421 (0.4470) loss 2.2970 (2.8175) grad_norm 8.8133 (2.0105) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][520/625] eta 0:00:47 lr 0.000468 wd 0.0500 time 0.4443 (0.4488) data time 0.0008 (0.0015) model time 0.4434 (0.4469) loss 2.9528 (2.8160) grad_norm 2.4353 (2.0159) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][530/625] eta 0:00:42 lr 0.000468 wd 0.0500 time 0.4421 (0.4487) data time 0.0008 (0.0015) model time 0.4413 (0.4468) loss 3.3832 (2.8149) grad_norm 2.8044 (2.0174) loss_scale 128.0000 (128.0000) mem 
16699MB [2024-08-10 16:30:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][540/625] eta 0:00:38 lr 0.000468 wd 0.0500 time 0.4402 (0.4486) data time 0.0006 (0.0015) model time 0.4396 (0.4467) loss 3.6122 (2.8203) grad_norm 1.9053 (2.0336) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][550/625] eta 0:00:33 lr 0.000468 wd 0.0500 time 0.6674 (0.4489) data time 0.0006 (0.0015) model time 0.6668 (0.4471) loss 2.9950 (2.8263) grad_norm 1.8965 (2.0289) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][560/625] eta 0:00:29 lr 0.000468 wd 0.0500 time 0.4403 (0.4489) data time 0.0007 (0.0015) model time 0.4396 (0.4471) loss 3.3908 (2.8266) grad_norm 1.5364 (2.0220) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][570/625] eta 0:00:24 lr 0.000468 wd 0.0500 time 0.4426 (0.4488) data time 0.0006 (0.0015) model time 0.4420 (0.4470) loss 2.9398 (2.8222) grad_norm 1.6656 (2.0178) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][580/625] eta 0:00:20 lr 0.000468 wd 0.0500 time 0.6181 (0.4490) data time 0.0006 (0.0015) model time 0.6175 (0.4473) loss 1.6472 (2.8197) grad_norm 1.5773 (2.0129) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:30:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][590/625] eta 0:00:15 lr 0.000468 wd 0.0500 time 0.4397 (0.4493) data time 0.0006 (0.0015) model time 0.4391 (0.4476) loss 2.1565 (2.8214) grad_norm 1.3703 (2.0056) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][600/625] eta 0:00:11 lr 0.000468 wd 0.0500 time 0.4389 (0.4492) data time 0.0009 (0.0015) model time 0.4380 (0.4474) loss 
2.5056 (2.8204) grad_norm 1.5322 (2.0033) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][610/625] eta 0:00:06 lr 0.000467 wd 0.0500 time 0.4396 (0.4491) data time 0.0006 (0.0014) model time 0.4390 (0.4474) loss 2.9725 (2.8191) grad_norm 2.9122 (2.0037) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [180/300][620/625] eta 0:00:02 lr 0.000467 wd 0.0500 time 0.4376 (0.4489) data time 0.0004 (0.0014) model time 0.4372 (0.4472) loss 2.8319 (2.8158) grad_norm 3.0083 (2.0235) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 180 training takes 0:04:40 [2024-08-10 16:31:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:31:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:31:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5190 (0.5190) Acc@1 88.525 (88.525) Acc@5 98.438 (98.438) Mem 16699MB [2024-08-10 16:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8545 (0.6339) Acc@1 78.955 (85.871) Acc@5 95.752 (97.625) Mem 16699MB [2024-08-10 16:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9351 (0.7530) Acc@1 77.197 (82.980) Acc@5 95.020 (96.422) Mem 16699MB [2024-08-10 16:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.756 Acc@5 96.411 [2024-08-10 16:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-10 16:31:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.849 (0.849) Loss 0.4717 (0.4717) Acc@1 89.453 (89.453) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-10 16:31:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.188) Loss 0.7612 (0.5878) Acc@1 81.592 (87.185) Acc@5 96.387 (97.940) Mem 16699MB [2024-08-10 16:31:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.8413 (0.6889) Acc@1 79.248 (84.415) Acc@5 95.947 (96.970) Mem 16699MB [2024-08-10 16:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.091 Acc@5 96.965 [2024-08-10 16:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-10 16:31:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][0/625] eta 0:13:05 lr 0.000467 wd 0.0500 time 1.2566 (1.2566) data time 0.6627 (0.6627) model time 0.0000 (0.0000) loss 3.2371 (3.2371) grad_norm 1.8943 (1.8943) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][10/625] eta 0:05:18 lr 0.000467 wd 0.0500 time 0.4429 (0.5173) data 
time 0.0008 (0.0610) model time 0.0000 (0.0000) loss 2.5481 (2.5751) grad_norm 1.5745 (1.6807) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][20/625] eta 0:04:52 lr 0.000467 wd 0.0500 time 0.4426 (0.4827) data time 0.0006 (0.0324) model time 0.0000 (0.0000) loss 3.3172 (2.5966) grad_norm 1.2298 (1.6019) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][30/625] eta 0:04:43 lr 0.000467 wd 0.0500 time 0.4433 (0.4768) data time 0.0010 (0.0222) model time 0.0000 (0.0000) loss 3.0208 (2.6903) grad_norm 1.3392 (1.6461) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][40/625] eta 0:04:34 lr 0.000467 wd 0.0500 time 0.4423 (0.4690) data time 0.0006 (0.0170) model time 0.0000 (0.0000) loss 3.5600 (2.6608) grad_norm 2.3207 (1.6268) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][50/625] eta 0:04:28 lr 0.000467 wd 0.0500 time 0.6117 (0.4674) data time 0.0009 (0.0139) model time 0.0000 (0.0000) loss 2.8018 (2.7043) grad_norm 1.5523 (1.7748) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][60/625] eta 0:04:22 lr 0.000467 wd 0.0500 time 0.4424 (0.4639) data time 0.0007 (0.0118) model time 0.4417 (0.4452) loss 3.5157 (2.7294) grad_norm 1.9130 (1.8015) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][70/625] eta 0:04:16 lr 0.000467 wd 0.0500 time 0.4528 (0.4614) data time 0.0009 (0.0102) model time 0.4520 (0.4452) loss 2.9377 (2.7831) grad_norm 2.0195 (1.8127) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:31:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[181/300][80/625] eta 0:04:10 lr 0.000467 wd 0.0500 time 0.4422 (0.4593) data time 0.0007 (0.0091) model time 0.4414 (0.4445) loss 3.3334 (2.7840) grad_norm 1.6686 (1.8395) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][90/625] eta 0:04:04 lr 0.000466 wd 0.0500 time 0.4454 (0.4577) data time 0.0009 (0.0082) model time 0.4445 (0.4444) loss 2.9861 (2.7898) grad_norm 2.6952 (1.8468) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][100/625] eta 0:04:00 lr 0.000466 wd 0.0500 time 0.4429 (0.4580) data time 0.0008 (0.0075) model time 0.4421 (0.4474) loss 3.2528 (2.8188) grad_norm 1.6680 (1.8413) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][110/625] eta 0:03:55 lr 0.000466 wd 0.0500 time 0.4452 (0.4569) data time 0.0007 (0.0069) model time 0.4445 (0.4470) loss 3.1641 (2.8269) grad_norm 2.6062 (1.8518) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][120/625] eta 0:03:51 lr 0.000466 wd 0.0500 time 0.4507 (0.4578) data time 0.0006 (0.0064) model time 0.4501 (0.4498) loss 2.8714 (2.8179) grad_norm 2.4001 (1.8615) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][130/625] eta 0:03:47 lr 0.000466 wd 0.0500 time 0.4451 (0.4595) data time 0.0006 (0.0060) model time 0.4445 (0.4535) loss 3.2268 (2.8124) grad_norm 2.1681 (1.8703) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][140/625] eta 0:03:42 lr 0.000466 wd 0.0500 time 0.4412 (0.4583) data time 0.0009 (0.0056) model time 0.4404 (0.4523) loss 3.1404 (2.8119) grad_norm 3.4844 (1.9121) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-10 16:32:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][150/625] eta 0:03:37 lr 0.000466 wd 0.0500 time 0.4519 (0.4574) data time 0.0007 (0.0053) model time 0.4512 (0.4514) loss 2.8835 (2.7955) grad_norm 2.3598 (1.9064) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][160/625] eta 0:03:32 lr 0.000466 wd 0.0500 time 0.4420 (0.4564) data time 0.0008 (0.0050) model time 0.4412 (0.4504) loss 2.8667 (2.7971) grad_norm 1.5960 (1.8890) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][170/625] eta 0:03:27 lr 0.000466 wd 0.0500 time 0.4435 (0.4555) data time 0.0009 (0.0048) model time 0.4426 (0.4496) loss 2.9050 (2.7871) grad_norm 2.2211 (1.8938) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][180/625] eta 0:03:22 lr 0.000465 wd 0.0500 time 0.4389 (0.4548) data time 0.0010 (0.0046) model time 0.4379 (0.4489) loss 2.7257 (2.7800) grad_norm 4.6015 (1.9106) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][190/625] eta 0:03:17 lr 0.000465 wd 0.0500 time 0.4450 (0.4542) data time 0.0006 (0.0044) model time 0.4444 (0.4485) loss 3.0662 (2.7975) grad_norm 1.3138 (1.9099) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][200/625] eta 0:03:12 lr 0.000465 wd 0.0500 time 0.4454 (0.4537) data time 0.0006 (0.0042) model time 0.4448 (0.4482) loss 3.0125 (2.8048) grad_norm 2.2788 (1.9028) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][210/625] eta 0:03:08 lr 0.000465 wd 0.0500 time 0.4410 (0.4533) data time 0.0006 (0.0040) model time 0.4404 (0.4479) loss 3.8905 
(2.8046) grad_norm 1.1358 (1.8893) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][220/625] eta 0:03:03 lr 0.000465 wd 0.0500 time 0.4407 (0.4528) data time 0.0006 (0.0039) model time 0.4401 (0.4475) loss 3.0511 (2.8105) grad_norm 3.9841 (1.8926) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][230/625] eta 0:02:58 lr 0.000465 wd 0.0500 time 0.4399 (0.4524) data time 0.0008 (0.0037) model time 0.4391 (0.4472) loss 3.1878 (2.8029) grad_norm 1.4350 (1.8895) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][240/625] eta 0:02:54 lr 0.000465 wd 0.0500 time 0.4450 (0.4527) data time 0.0009 (0.0036) model time 0.4442 (0.4479) loss 3.2404 (2.8093) grad_norm 2.0046 (1.8849) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][250/625] eta 0:02:49 lr 0.000465 wd 0.0500 time 0.4410 (0.4523) data time 0.0006 (0.0035) model time 0.4404 (0.4475) loss 3.1888 (2.7930) grad_norm 2.0196 (1.8852) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][260/625] eta 0:02:44 lr 0.000465 wd 0.0500 time 0.4456 (0.4519) data time 0.0006 (0.0034) model time 0.4449 (0.4473) loss 2.4510 (2.7896) grad_norm 1.6474 (1.8974) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][270/625] eta 0:02:40 lr 0.000465 wd 0.0500 time 0.4444 (0.4516) data time 0.0007 (0.0033) model time 0.4437 (0.4471) loss 3.2298 (2.7904) grad_norm 1.9694 (1.9011) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][280/625] eta 0:02:35 lr 0.000464 wd 0.0500 time 0.4448 
(0.4514) data time 0.0006 (0.0032) model time 0.4442 (0.4470) loss 3.3498 (2.7933) grad_norm 1.6134 (1.8960) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][290/625] eta 0:02:31 lr 0.000464 wd 0.0500 time 0.4423 (0.4511) data time 0.0008 (0.0032) model time 0.4415 (0.4468) loss 3.0467 (2.7966) grad_norm 2.0523 (1.9071) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][300/625] eta 0:02:26 lr 0.000464 wd 0.0500 time 0.4489 (0.4509) data time 0.0006 (0.0031) model time 0.4483 (0.4466) loss 2.5821 (2.7956) grad_norm 2.4742 (1.9073) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][310/625] eta 0:02:21 lr 0.000464 wd 0.0500 time 0.4403 (0.4506) data time 0.0006 (0.0030) model time 0.4397 (0.4464) loss 2.2931 (2.7947) grad_norm 2.9321 (1.9128) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][320/625] eta 0:02:17 lr 0.000464 wd 0.0500 time 0.4440 (0.4504) data time 0.0006 (0.0029) model time 0.4433 (0.4463) loss 3.5996 (2.7975) grad_norm 2.2947 (1.9347) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][330/625] eta 0:02:12 lr 0.000464 wd 0.0500 time 0.4436 (0.4502) data time 0.0008 (0.0029) model time 0.4428 (0.4461) loss 2.9799 (2.8072) grad_norm 1.9753 (1.9336) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][340/625] eta 0:02:08 lr 0.000464 wd 0.0500 time 0.4402 (0.4500) data time 0.0008 (0.0028) model time 0.4394 (0.4460) loss 3.1362 (2.8123) grad_norm 3.1243 (1.9517) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [181/300][350/625] eta 0:02:03 lr 0.000464 wd 0.0500 time 0.4418 (0.4502) data time 0.0006 (0.0028) model time 0.4412 (0.4464) loss 3.2388 (2.8181) grad_norm 1.5934 (1.9613) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:34:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][360/625] eta 0:01:59 lr 0.000464 wd 0.0500 time 0.4382 (0.4500) data time 0.0006 (0.0027) model time 0.4376 (0.4463) loss 2.1243 (2.8121) grad_norm 1.9896 (1.9615) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][370/625] eta 0:01:54 lr 0.000464 wd 0.0500 time 0.4433 (0.4503) data time 0.0006 (0.0027) model time 0.4427 (0.4468) loss 2.4223 (2.8042) grad_norm 1.1818 (1.9486) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-10 16:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 16:34:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:34:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 16:36:05 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 16:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 16:36:07 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 16:36:31 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 16:36:31 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 16:36:34 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 16:36:36 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 16:36:36 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 181) [2024-08-10 16:36:36 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 16:37:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][380/625] eta 0:13:23 lr 0.000463 wd 0.0500 time 0.4434 (3.2778) data time 0.0007 (0.1187) model time 0.4427 (3.1592) loss 3.5341 (3.3061) grad_norm 1.6226 (1.8543) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][390/625] eta 0:05:54 lr 0.000463 wd 0.0500 time 0.4439 (1.5076) data time 0.0009 (0.0451) model time 0.4431 (1.4625) loss 2.8576 (3.1190) grad_norm 1.9337 (2.4648) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][400/625] eta 0:04:07 lr 0.000463 wd 0.0500 time 0.4439 (1.0990) data time 0.0007 (0.0281) model time 0.4433 (1.0709) loss 3.0509 (3.1293) grad_norm 1.4131 (2.2718) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][410/625] eta 0:03:18 lr 0.000463 wd 0.0500 time 0.4440 (0.9248) data time 0.0009 (0.0205) model time 0.4431 (0.9043) loss 2.9196 (3.0961) grad_norm 1.7000 (2.1101) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][420/625] eta 0:02:49 lr 0.000463 wd 0.0500 time 0.4419 (0.8246) data time 0.0009 (0.0163) model time 0.4410 (0.8083) loss 2.8498 (3.0302) grad_norm 1.5015 (2.0622) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][430/625] eta 
0:02:27 lr 0.000463 wd 0.0500 time 0.4441 (0.7566) data time 0.0007 (0.0135) model time 0.4434 (0.7431) loss 3.3796 (3.0205) grad_norm 1.8206 (2.0369) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][440/625] eta 0:02:11 lr 0.000463 wd 0.0500 time 0.4427 (0.7094) data time 0.0007 (0.0116) model time 0.4420 (0.6978) loss 2.5301 (2.9908) grad_norm 2.0733 (2.0840) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][450/625] eta 0:01:58 lr 0.000463 wd 0.0500 time 0.4457 (0.6747) data time 0.0008 (0.0102) model time 0.4449 (0.6645) loss 3.0964 (2.9614) grad_norm 1.6827 (2.0732) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][460/625] eta 0:01:46 lr 0.000463 wd 0.0500 time 0.4480 (0.6482) data time 0.0006 (0.0091) model time 0.4474 (0.6391) loss 2.0689 (2.9278) grad_norm 1.7318 (2.0398) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][470/625] eta 0:01:37 lr 0.000462 wd 0.0500 time 0.4443 (0.6272) data time 0.0007 (0.0082) model time 0.4436 (0.6190) loss 3.0861 (2.9371) grad_norm 1.9382 (2.0591) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][480/625] eta 0:01:28 lr 0.000462 wd 0.0500 time 0.4445 (0.6100) data time 0.0009 (0.0075) model time 0.4436 (0.6025) loss 3.1828 (2.9609) grad_norm 1.4820 (2.0566) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][490/625] eta 0:01:20 lr 0.000462 wd 0.0500 time 0.4468 (0.5959) data time 0.0007 (0.0070) model time 0.4461 (0.5889) loss 3.2296 (2.9526) grad_norm 1.8700 (2.0562) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:54 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][500/625] eta 0:01:12 lr 0.000462 wd 0.0500 time 0.4410 (0.5840) data time 0.0006 (0.0065) model time 0.4404 (0.5775) loss 1.9443 (2.9363) grad_norm 1.7803 (2.0436) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][510/625] eta 0:01:05 lr 0.000462 wd 0.0500 time 0.4445 (0.5739) data time 0.0009 (0.0061) model time 0.4436 (0.5678) loss 2.4879 (2.9356) grad_norm 1.5502 (2.0496) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:38:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][520/625] eta 0:00:59 lr 0.000462 wd 0.0500 time 0.4432 (0.5651) data time 0.0007 (0.0057) model time 0.4425 (0.5594) loss 2.0605 (2.9167) grad_norm 1.3833 (2.0324) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-10 16:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 16:38:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:38:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 16:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 16:40:49 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 16:41:03 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-10 16:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 16:41:15 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 16:41:18 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 16:41:20 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 16:41:20 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 181) [2024-08-10 16:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 16:41:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][530/625] eta 0:05:14 lr 0.000462 wd 0.0500 time 0.4811 (3.3089) data time 0.0014 (0.0893) model time 0.4797 (3.2196) loss 3.0578 (3.1692) grad_norm 2.0128 (1.8573) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:41:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][540/625] eta 0:02:19 lr 0.000462 wd 0.0500 time 0.4841 (1.6463) data time 0.0011 (0.0374) model time 0.4830 (1.6089) loss 2.7621 (3.0709) grad_norm 1.6763 (1.7100) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][550/625] eta 0:01:31 lr 0.000462 wd 0.0500 time 0.4752 (1.2141) data time 0.0009 (0.0240) model time 0.4743 (1.1901) loss 3.3798 (3.0418) grad_norm 1.5816 (1.9970) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][560/625] eta 0:01:07 lr 0.000462 wd 0.0500 time 0.7437 (1.0308) data time 0.0011 (0.0178) model time 0.7426 (1.0130) loss 2.6917 (3.0143) grad_norm 2.1707 (1.9768) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:07 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [181/300][570/625] eta 0:00:50 lr 0.000461 wd 0.0500 time 0.4945 (0.9128) data time 0.0009 (0.0143) model time 0.4936 (0.8986) loss 3.0813 (2.9792) grad_norm 1.2498 (2.0035) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][580/625] eta 0:00:37 lr 0.000461 wd 0.0500 time 0.4868 (0.8374) data time 0.0011 (0.0119) model time 0.4857 (0.8255) loss 3.0933 (2.9758) grad_norm 1.5634 (2.0650) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][590/625] eta 0:00:27 lr 0.000461 wd 0.0500 time 0.4893 (0.7850) data time 0.0010 (0.0103) model time 0.4883 (0.7747) loss 2.9169 (2.9356) grad_norm 1.8449 (2.0501) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][600/625] eta 0:00:18 lr 0.000461 wd 0.0500 time 0.4840 (0.7461) data time 0.0013 (0.0091) model time 0.4826 (0.7370) loss 3.5056 (2.9201) grad_norm 1.5643 (2.1140) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][610/625] eta 0:00:10 lr 0.000461 wd 0.0500 time 0.4777 (0.7159) data time 0.0008 (0.0083) model time 0.4769 (0.7077) loss 3.1977 (2.9071) grad_norm 1.9320 (2.0974) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [181/300][620/625] eta 0:00:03 lr 0.000461 wd 0.0500 time 0.4787 (0.6914) data time 0.0008 (0.0075) model time 0.4779 (0.6839) loss 3.1834 (2.9163) grad_norm 1.4172 (2.0971) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:42:33 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 181 training takes 0:01:08 [2024-08-10 16:42:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-10 16:42:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 16:42:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5122 (0.5122) Acc@1 88.721 (88.721) Acc@5 98.291 (98.291) Mem 16721MB [2024-08-10 16:42:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.8281 (0.6301) Acc@1 79.541 (85.906) Acc@5 95.996 (97.559) Mem 16721MB [2024-08-10 16:42:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9375 (0.7431) Acc@1 77.344 (83.126) Acc@5 95.020 (96.410) Mem 16721MB [2024-08-10 16:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.835 Acc@5 96.397 [2024-08-10 16:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-10 16:42:43 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 82.84% [2024-08-10 16:42:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 16:42:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 16:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.4724 (0.4724) Acc@1 89.404 (89.404) Acc@5 98.828 (98.828) Mem 16721MB [2024-08-10 16:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.7598 (0.5875) Acc@1 81.396 (87.207) Acc@5 96.484 (97.931) Mem 16721MB [2024-08-10 16:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.8423 (0.6889) Acc@1 79.150 (84.431) Acc@5 95.947 (96.968) Mem 16721MB [2024-08-10 16:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.121 Acc@5 96.953 [2024-08-10 16:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-10 16:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.12% [2024-08-10 16:42:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:42:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 16:42:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][0/625] eta 0:10:07 lr 0.000461 wd 0.0500 time 0.9717 (0.9717) data time 0.4448 (0.4448) model time 0.0000 (0.0000) loss 2.8090 (2.8090) grad_norm 1.8950 (1.8950) loss_scale 128.0000 (128.0000) mem 16725MB [2024-08-10 16:42:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][10/625] eta 0:05:22 lr 0.000461 wd 0.0500 time 0.4810 (0.5244) data time 0.0011 (0.0414) model time 0.0000 (0.0000) loss 3.5926 (3.0229) grad_norm 1.3986 (1.7864) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][20/625] eta 0:05:04 lr 0.000461 wd 0.0500 time 0.4869 (0.5041) data time 0.0009 (0.0222) model time 0.0000 (0.0000) loss 3.1263 (3.0027) grad_norm 1.6681 (1.7052) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][30/625] eta 0:04:56 lr 0.000461 wd 0.0500 time 0.4821 (0.4976) data time 0.0012 (0.0154) model time 0.0000 (0.0000) loss 3.0235 (2.9131) grad_norm 1.9412 (2.0018) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][40/625] eta 0:04:51 lr 0.000460 wd 0.0500 time 0.4195 (0.4979) data time 0.0012 (0.0119) model time 0.0000 (0.0000) loss 2.9484 (2.8821) grad_norm 1.4309 (2.3147) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][50/625] eta 0:04:44 lr 0.000460 wd 0.0500 time 0.4816 (0.4952) data time 0.0010 (0.0098) model time 0.0000 (0.0000) loss 3.5612 (2.8870) grad_norm 2.1599 (2.4897) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][60/625] eta 0:04:38 lr 0.000460 wd 0.0500 time 0.4862 (0.4933) data time 0.0012 (0.0084) model time 0.4851 (0.4821) loss 2.7412 (2.8998) 
grad_norm 1.6352 (2.4068) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][70/625] eta 0:04:32 lr 0.000460 wd 0.0500 time 0.4851 (0.4918) data time 0.0012 (0.0074) model time 0.4839 (0.4819) loss 2.4567 (2.8880) grad_norm 1.8177 (2.3229) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][80/625] eta 0:04:27 lr 0.000460 wd 0.0500 time 0.4836 (0.4906) data time 0.0012 (0.0066) model time 0.4824 (0.4815) loss 3.1404 (2.8639) grad_norm 2.0130 (2.2697) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][90/625] eta 0:04:22 lr 0.000460 wd 0.0500 time 0.4832 (0.4900) data time 0.0012 (0.0060) model time 0.4820 (0.4822) loss 3.0497 (2.8618) grad_norm 1.6684 (2.2419) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][100/625] eta 0:04:17 lr 0.000460 wd 0.0500 time 0.4881 (0.4897) data time 0.0009 (0.0055) model time 0.4872 (0.4830) loss 3.2712 (2.8528) grad_norm 1.3555 (2.1776) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][110/625] eta 0:04:11 lr 0.000460 wd 0.0500 time 0.4873 (0.4893) data time 0.0013 (0.0051) model time 0.4860 (0.4831) loss 2.7545 (2.8484) grad_norm 1.5354 (2.1617) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][120/625] eta 0:04:07 lr 0.000460 wd 0.0500 time 0.4919 (0.4909) data time 0.0008 (0.0048) model time 0.4911 (0.4866) loss 3.3113 (2.8380) grad_norm 1.9568 (2.1771) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:43:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][130/625] eta 0:04:02 lr 0.000460 wd 0.0500 time 0.4808 (0.4904) data 
time 0.0007 (0.0045) model time 0.4800 (0.4862) loss 2.9325 (2.8290) grad_norm 2.2727 (2.1599) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][140/625] eta 0:03:57 lr 0.000459 wd 0.0500 time 0.4821 (0.4899) data time 0.0010 (0.0043) model time 0.4811 (0.4858) loss 3.1172 (2.8315) grad_norm 1.6428 (2.1380) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][150/625] eta 0:03:52 lr 0.000459 wd 0.0500 time 0.4804 (0.4894) data time 0.0008 (0.0040) model time 0.4796 (0.4853) loss 2.9631 (2.8170) grad_norm 1.5837 (2.1767) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][160/625] eta 0:03:47 lr 0.000459 wd 0.0500 time 0.4808 (0.4890) data time 0.0011 (0.0039) model time 0.4797 (0.4850) loss 3.0970 (2.8057) grad_norm 2.4360 (2.1811) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][170/625] eta 0:03:42 lr 0.000459 wd 0.0500 time 0.4833 (0.4888) data time 0.0011 (0.0037) model time 0.4821 (0.4850) loss 2.8331 (2.7960) grad_norm 1.5835 (2.1836) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][180/625] eta 0:03:37 lr 0.000459 wd 0.0500 time 0.4797 (0.4885) data time 0.0011 (0.0036) model time 0.4786 (0.4848) loss 2.2320 (2.8041) grad_norm 1.9958 (2.2680) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][190/625] eta 0:03:32 lr 0.000459 wd 0.0500 time 0.4830 (0.4891) data time 0.0011 (0.0034) model time 0.4818 (0.4857) loss 2.5588 (2.8003) grad_norm 1.9105 (2.2592) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[182/300][200/625] eta 0:03:27 lr 0.000459 wd 0.0500 time 0.4868 (0.4890) data time 0.0011 (0.0033) model time 0.4857 (0.4857) loss 2.8545 (2.7846) grad_norm 2.1309 (2.2447) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][210/625] eta 0:03:22 lr 0.000459 wd 0.0500 time 0.4862 (0.4887) data time 0.0011 (0.0032) model time 0.4851 (0.4855) loss 3.1066 (2.7867) grad_norm 1.7604 (2.2375) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][220/625] eta 0:03:17 lr 0.000459 wd 0.0500 time 0.4796 (0.4884) data time 0.0011 (0.0031) model time 0.4785 (0.4853) loss 2.7536 (2.7988) grad_norm 2.8537 (2.2232) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][230/625] eta 0:03:12 lr 0.000458 wd 0.0500 time 0.4854 (0.4882) data time 0.0012 (0.0030) model time 0.4842 (0.4851) loss 3.0140 (2.7974) grad_norm 1.9113 (2.2173) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][240/625] eta 0:03:07 lr 0.000458 wd 0.0500 time 0.4845 (0.4881) data time 0.0009 (0.0030) model time 0.4836 (0.4851) loss 3.2635 (2.8008) grad_norm 3.4564 (2.2039) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][250/625] eta 0:03:02 lr 0.000458 wd 0.0500 time 0.4879 (0.4880) data time 0.0008 (0.0029) model time 0.4871 (0.4850) loss 3.3694 (2.8054) grad_norm 1.6906 (2.1919) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][260/625] eta 0:02:58 lr 0.000458 wd 0.0500 time 0.4782 (0.4879) data time 0.0008 (0.0028) model time 0.4774 (0.4850) loss 3.4752 (2.8082) grad_norm 2.0494 (2.1724) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 16:45:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][270/625] eta 0:02:53 lr 0.000458 wd 0.0500 time 0.4788 (0.4877) data time 0.0010 (0.0027) model time 0.4777 (0.4849) loss 3.0400 (2.8048) grad_norm 1.9480 (2.1684) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][280/625] eta 0:02:48 lr 0.000458 wd 0.0500 time 0.4794 (0.4876) data time 0.0008 (0.0027) model time 0.4786 (0.4848) loss 3.2698 (2.8048) grad_norm 1.2751 (2.1507) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][290/625] eta 0:02:43 lr 0.000458 wd 0.0500 time 0.4871 (0.4874) data time 0.0009 (0.0026) model time 0.4862 (0.4847) loss 2.9134 (2.7998) grad_norm 6.1467 (2.1681) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][300/625] eta 0:02:38 lr 0.000458 wd 0.0500 time 0.4807 (0.4873) data time 0.0008 (0.0026) model time 0.4799 (0.4846) loss 3.5511 (2.8084) grad_norm 1.6245 (2.1638) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][310/625] eta 0:02:33 lr 0.000458 wd 0.0500 time 0.4840 (0.4872) data time 0.0011 (0.0025) model time 0.4829 (0.4846) loss 2.7863 (2.8132) grad_norm 1.7938 (2.1596) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][320/625] eta 0:02:28 lr 0.000458 wd 0.0500 time 0.4865 (0.4873) data time 0.0010 (0.0025) model time 0.4855 (0.4847) loss 2.9889 (2.8146) grad_norm 1.4898 (2.1453) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][330/625] eta 0:02:23 lr 0.000457 wd 0.0500 time 0.4835 (0.4873) data time 0.0009 (0.0024) model time 0.4826 (0.4847) loss 2.5441 
(2.8232) grad_norm 2.0100 (2.1425) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][340/625] eta 0:02:18 lr 0.000457 wd 0.0500 time 0.4817 (0.4873) data time 0.0009 (0.0024) model time 0.4808 (0.4848) loss 2.9791 (2.8306) grad_norm 3.0472 (2.1575) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][350/625] eta 0:02:13 lr 0.000457 wd 0.0500 time 0.4824 (0.4872) data time 0.0011 (0.0024) model time 0.4813 (0.4848) loss 2.3154 (2.8263) grad_norm 1.4635 (2.1446) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][360/625] eta 0:02:09 lr 0.000457 wd 0.0500 time 0.4870 (0.4872) data time 0.0009 (0.0023) model time 0.4861 (0.4848) loss 2.5583 (2.8214) grad_norm 1.5137 (2.1358) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][370/625] eta 0:02:04 lr 0.000457 wd 0.0500 time 0.4798 (0.4872) data time 0.0011 (0.0023) model time 0.4787 (0.4848) loss 2.7678 (2.8138) grad_norm 2.3653 (2.1296) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:45:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][380/625] eta 0:01:59 lr 0.000457 wd 0.0500 time 0.4941 (0.4871) data time 0.0008 (0.0023) model time 0.4933 (0.4847) loss 3.2047 (2.8132) grad_norm 2.1922 (2.1565) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:46:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][390/625] eta 0:01:54 lr 0.000457 wd 0.0500 time 0.4860 (0.4871) data time 0.0011 (0.0022) model time 0.4850 (0.4847) loss 3.0342 (2.8192) grad_norm 2.8298 (2.1659) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:46:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][400/625] eta 0:01:49 lr 0.000457 wd 0.0500 time 0.4823 
(0.4870) data time 0.0011 (0.0022) model time 0.4812 (0.4847) loss 2.9505 (2.8186) grad_norm 2.6583 (2.2355) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:46:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][410/625] eta 0:01:44 lr 0.000457 wd 0.0500 time 0.4886 (0.4871) data time 0.0009 (0.0022) model time 0.4877 (0.4848) loss 3.3709 (2.8221) grad_norm 2.7470 (2.2369) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][420/625] eta 0:01:39 lr 0.000457 wd 0.0500 time 0.4833 (0.4870) data time 0.0009 (0.0022) model time 0.4824 (0.4848) loss 1.5121 (2.8175) grad_norm 1.2580 (2.2368) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:46:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][430/625] eta 0:01:34 lr 0.000456 wd 0.0500 time 0.4847 (0.4870) data time 0.0011 (0.0021) model time 0.4836 (0.4848) loss 2.9929 (2.8134) grad_norm 1.3732 (2.2303) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][440/625] eta 0:01:30 lr 0.000456 wd 0.0500 time 0.4851 (0.4870) data time 0.0009 (0.0021) model time 0.4842 (0.4848) loss 1.8303 (2.8128) grad_norm 1.6124 (2.2208) loss_scale 256.0000 (129.4512) mem 16721MB [2024-08-10 16:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][450/625] eta 0:01:25 lr 0.000456 wd 0.0500 time 0.4831 (0.4874) data time 0.0008 (0.0021) model time 0.4823 (0.4853) loss 3.5703 (2.8174) grad_norm 1.7671 (2.2100) loss_scale 256.0000 (132.2572) mem 16721MB [2024-08-10 16:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][460/625] eta 0:01:20 lr 0.000456 wd 0.0500 time 0.4826 (0.4874) data time 0.0008 (0.0021) model time 0.4818 (0.4853) loss 3.3942 (2.8201) grad_norm 1.8295 (2.1983) loss_scale 256.0000 (134.9414) mem 16721MB [2024-08-10 16:46:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [182/300][470/625] eta 0:01:15 lr 0.000456 wd 0.0500 time 0.4831 (0.4873) data time 0.0011 (0.0020) model time 0.4820 (0.4852) loss 2.5447 (2.8203) grad_norm 1.6509 (2.1986) loss_scale 256.0000 (137.5117) mem 16721MB [2024-08-10 16:46:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][480/625] eta 0:01:10 lr 0.000456 wd 0.0500 time 0.4906 (0.4873) data time 0.0009 (0.0020) model time 0.4896 (0.4852) loss 3.1000 (2.8256) grad_norm 2.1080 (2.1930) loss_scale 256.0000 (139.9751) mem 16721MB [2024-08-10 16:46:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][490/625] eta 0:01:05 lr 0.000456 wd 0.0500 time 0.4805 (0.4872) data time 0.0011 (0.0020) model time 0.4794 (0.4852) loss 3.3875 (2.8287) grad_norm 1.2602 (2.1917) loss_scale 256.0000 (142.3381) mem 16721MB [2024-08-10 16:46:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][500/625] eta 0:01:00 lr 0.000456 wd 0.0500 time 0.4875 (0.4871) data time 0.0008 (0.0020) model time 0.4867 (0.4851) loss 3.0466 (2.8260) grad_norm 1.4303 (2.1818) loss_scale 256.0000 (144.6068) mem 16721MB [2024-08-10 16:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][510/625] eta 0:00:56 lr 0.000456 wd 0.0500 time 0.4734 (0.4870) data time 0.0011 (0.0020) model time 0.4723 (0.4850) loss 1.8559 (2.8248) grad_norm 1.3460 (2.1721) loss_scale 256.0000 (146.7867) mem 16721MB [2024-08-10 16:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][520/625] eta 0:00:51 lr 0.000455 wd 0.0500 time 0.6426 (0.4872) data time 0.0011 (0.0020) model time 0.6416 (0.4853) loss 2.7670 (2.8270) grad_norm 1.3286 (2.1628) loss_scale 256.0000 (148.8829) mem 16721MB [2024-08-10 16:47:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][530/625] eta 0:00:46 lr 0.000455 wd 0.0500 time 0.4819 (0.4872) data time 0.0012 (0.0019) model time 0.4808 (0.4852) loss 2.5697 (2.8271) grad_norm 1.9718 (2.1590) loss_scale 256.0000 (150.9002) mem 16721MB 
[2024-08-10 16:47:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][540/625] eta 0:00:41 lr 0.000455 wd 0.0500 time 0.4897 (0.4872) data time 0.0011 (0.0019) model time 0.4887 (0.4853) loss 2.1864 (2.8257) grad_norm 1.8617 (2.1568) loss_scale 256.0000 (152.8429) mem 16721MB [2024-08-10 16:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][550/625] eta 0:00:36 lr 0.000455 wd 0.0500 time 0.4806 (0.4872) data time 0.0012 (0.0019) model time 0.4794 (0.4853) loss 1.8406 (2.8260) grad_norm 1.7019 (2.1532) loss_scale 256.0000 (154.7151) mem 16721MB [2024-08-10 16:47:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][560/625] eta 0:00:31 lr 0.000455 wd 0.0500 time 0.4855 (0.4872) data time 0.0010 (0.0019) model time 0.4845 (0.4853) loss 3.1951 (2.8237) grad_norm 1.7322 (2.1476) loss_scale 256.0000 (156.5205) mem 16721MB [2024-08-10 16:47:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][570/625] eta 0:00:26 lr 0.000455 wd 0.0500 time 0.4787 (0.4871) data time 0.0010 (0.0019) model time 0.4776 (0.4852) loss 2.5481 (2.8285) grad_norm 1.6483 (2.1375) loss_scale 256.0000 (158.2627) mem 16721MB [2024-08-10 16:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][580/625] eta 0:00:21 lr 0.000455 wd 0.0500 time 0.4858 (0.4870) data time 0.0011 (0.0019) model time 0.4846 (0.4851) loss 2.6454 (2.8303) grad_norm 2.0483 (2.1318) loss_scale 256.0000 (159.9449) mem 16721MB [2024-08-10 16:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][590/625] eta 0:00:17 lr 0.000455 wd 0.0500 time 0.4857 (0.4869) data time 0.0011 (0.0018) model time 0.4846 (0.4851) loss 3.1607 (2.8291) grad_norm 2.2391 (inf) loss_scale 128.0000 (160.0541) mem 16721MB [2024-08-10 16:47:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][600/625] eta 0:00:12 lr 0.000455 wd 0.0500 time 0.4880 (0.4869) data time 0.0009 (0.0018) model time 0.4871 (0.4850) loss 2.3426 (2.8262) 
grad_norm 1.7242 (inf) loss_scale 128.0000 (159.5208) mem 16721MB [2024-08-10 16:47:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][610/625] eta 0:00:07 lr 0.000455 wd 0.0500 time 0.4847 (0.4868) data time 0.0006 (0.0018) model time 0.4841 (0.4850) loss 2.0725 (2.8247) grad_norm 1.4300 (inf) loss_scale 128.0000 (159.0049) mem 16721MB [2024-08-10 16:47:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [182/300][620/625] eta 0:00:02 lr 0.000454 wd 0.0500 time 0.4819 (0.4868) data time 0.0005 (0.0018) model time 0.4814 (0.4849) loss 2.5268 (2.8238) grad_norm 2.8280 (inf) loss_scale 128.0000 (158.5056) mem 16721MB [2024-08-10 16:47:57 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 182 training takes 0:05:04 [2024-08-10 16:47:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:47:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.5215 (0.5215) Acc@1 88.721 (88.721) Acc@5 98.535 (98.535) Mem 16721MB [2024-08-10 16:48:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8237 (0.6310) Acc@1 80.127 (85.991) Acc@5 95.410 (97.576) Mem 16721MB [2024-08-10 16:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.142) Loss 0.9116 (0.7462) Acc@1 77.930 (83.110) Acc@5 95.410 (96.489) Mem 16721MB [2024-08-10 16:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.806 Acc@5 96.481 [2024-08-10 16:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.8% [2024-08-10 16:48:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.863 (0.863) Loss 0.4734 (0.4734) Acc@1 89.355 (89.355) Acc@5 98.828 (98.828) Mem 16721MB [2024-08-10 16:48:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.196) Loss 0.7573 (0.5870) Acc@1 81.689 (87.287) Acc@5 96.484 (97.945) Mem 16721MB [2024-08-10 16:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.159) Loss 0.8438 (0.6885) Acc@1 79.248 (84.505) Acc@5 95.947 (96.991) Mem 16721MB [2024-08-10 16:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.169 Acc@5 96.983 [2024-08-10 16:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 16:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.17% [2024-08-10 16:48:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:48:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 16:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][0/625] eta 0:08:22 lr 0.000454 wd 0.0500 time 0.8038 (0.8038) data time 0.3869 (0.3869) model time 0.0000 (0.0000) loss 2.3456 (2.3456) grad_norm 2.4358 (2.4358) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][10/625] eta 0:05:16 lr 0.000454 wd 0.0500 time 0.4847 (0.5139) data time 0.0008 (0.0362) model time 0.0000 (0.0000) loss 3.2233 (3.0084) grad_norm 1.5442 (1.7636) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][20/625] eta 0:05:02 lr 0.000454 wd 0.0500 time 0.4860 (0.5004) data time 0.0009 (0.0195) model time 0.0000 (0.0000) loss 3.2396 (2.8969) grad_norm 1.7357 (1.9023) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][30/625] eta 0:05:03 lr 0.000454 wd 0.0500 time 0.4829 (0.5101) data time 0.0010 (0.0136) model time 0.0000 (0.0000) loss 2.9734 (2.8058) grad_norm 1.6242 (2.2215) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][40/625] eta 0:04:55 lr 0.000454 wd 0.0500 time 0.4849 (0.5045) data time 0.0011 (0.0105) model time 0.0000 (0.0000) loss 2.4120 (2.8533) grad_norm 2.4551 (2.1745) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][50/625] eta 0:04:48 lr 0.000454 wd 0.0500 time 0.4861 (0.5012) data time 0.0011 (0.0087) model time 0.0000 (0.0000) loss 2.3485 (2.8514) grad_norm 1.4128 (2.0811) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][60/625] eta 0:04:41 lr 0.000454 wd 0.0500 time 0.4816 (0.4983) data time 0.0010 (0.0074) model time 0.4806 (0.4828) loss 2.7460 (2.8387) 
grad_norm 1.8511 (2.0472) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][70/625] eta 0:04:36 lr 0.000454 wd 0.0500 time 0.4882 (0.4987) data time 0.0009 (0.0065) model time 0.4873 (0.4913) loss 3.2765 (2.8171) grad_norm 1.4966 (1.9899) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][80/625] eta 0:04:30 lr 0.000454 wd 0.0500 time 0.4789 (0.4971) data time 0.0011 (0.0059) model time 0.4778 (0.4891) loss 1.8116 (2.7862) grad_norm 1.3649 (1.9732) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][90/625] eta 0:04:25 lr 0.000453 wd 0.0500 time 0.4862 (0.4957) data time 0.0011 (0.0053) model time 0.4851 (0.4876) loss 2.9204 (2.7819) grad_norm 6.7678 (1.9848) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:48:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][100/625] eta 0:04:19 lr 0.000453 wd 0.0500 time 0.4805 (0.4944) data time 0.0008 (0.0049) model time 0.4796 (0.4863) loss 2.2167 (2.7626) grad_norm 1.6315 (1.9573) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][110/625] eta 0:04:14 lr 0.000453 wd 0.0500 time 0.4835 (0.4933) data time 0.0011 (0.0046) model time 0.4824 (0.4855) loss 1.9110 (2.7563) grad_norm 1.9740 (1.9563) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][120/625] eta 0:04:08 lr 0.000453 wd 0.0500 time 0.4840 (0.4926) data time 0.0011 (0.0043) model time 0.4828 (0.4853) loss 2.8165 (2.7565) grad_norm 1.8002 (1.9888) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][130/625] eta 0:04:03 lr 0.000453 wd 0.0500 time 0.4907 (0.4921) data 
time 0.0011 (0.0040) model time 0.4896 (0.4853) loss 3.0505 (2.7638) grad_norm 1.7945 (2.0071) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][140/625] eta 0:03:58 lr 0.000453 wd 0.0500 time 0.4855 (0.4917) data time 0.0010 (0.0038) model time 0.4845 (0.4853) loss 2.6857 (2.7635) grad_norm 2.6207 (2.0052) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][150/625] eta 0:03:53 lr 0.000453 wd 0.0500 time 0.4893 (0.4912) data time 0.0008 (0.0036) model time 0.4885 (0.4851) loss 1.6912 (2.7653) grad_norm 1.5670 (2.0108) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][160/625] eta 0:03:48 lr 0.000453 wd 0.0500 time 0.4807 (0.4907) data time 0.0009 (0.0035) model time 0.4798 (0.4847) loss 2.7595 (2.7595) grad_norm 3.1017 (2.0129) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][170/625] eta 0:03:43 lr 0.000453 wd 0.0500 time 0.4830 (0.4903) data time 0.0009 (0.0033) model time 0.4821 (0.4846) loss 2.7592 (2.7598) grad_norm 1.7799 (2.0028) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][180/625] eta 0:03:38 lr 0.000453 wd 0.0500 time 0.4825 (0.4899) data time 0.0009 (0.0032) model time 0.4816 (0.4844) loss 1.8168 (2.7565) grad_norm 2.0719 (1.9920) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][190/625] eta 0:03:33 lr 0.000452 wd 0.0500 time 0.4909 (0.4897) data time 0.0010 (0.0031) model time 0.4898 (0.4844) loss 2.6640 (2.7567) grad_norm 2.4946 (2.0152) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[183/300][200/625] eta 0:03:28 lr 0.000452 wd 0.0500 time 0.4848 (0.4895) data time 0.0010 (0.0030) model time 0.4838 (0.4844) loss 3.0259 (2.7681) grad_norm 1.5540 (2.0132) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][210/625] eta 0:03:23 lr 0.000452 wd 0.0500 time 0.4886 (0.4893) data time 0.0010 (0.0029) model time 0.4876 (0.4845) loss 3.0391 (2.7688) grad_norm 1.9642 (2.0074) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][220/625] eta 0:03:18 lr 0.000452 wd 0.0500 time 0.4849 (0.4910) data time 0.0009 (0.0028) model time 0.4841 (0.4869) loss 2.2108 (2.7629) grad_norm 1.2303 (1.9914) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][230/625] eta 0:03:13 lr 0.000452 wd 0.0500 time 0.4827 (0.4906) data time 0.0010 (0.0027) model time 0.4817 (0.4866) loss 2.7515 (2.7525) grad_norm 3.2654 (2.0147) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][240/625] eta 0:03:08 lr 0.000452 wd 0.0500 time 0.4843 (0.4904) data time 0.0013 (0.0027) model time 0.4829 (0.4864) loss 2.9278 (2.7547) grad_norm 1.6535 (2.0082) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][250/625] eta 0:03:03 lr 0.000452 wd 0.0500 time 0.4873 (0.4901) data time 0.0012 (0.0026) model time 0.4861 (0.4862) loss 1.8166 (2.7593) grad_norm 1.8134 (1.9972) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][260/625] eta 0:02:59 lr 0.000452 wd 0.0500 time 0.4837 (0.4904) data time 0.0008 (0.0026) model time 0.4829 (0.4868) loss 3.9274 (2.7622) grad_norm 1.4171 (1.9804) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 16:50:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][270/625] eta 0:02:54 lr 0.000452 wd 0.0500 time 0.4863 (0.4903) data time 0.0009 (0.0025) model time 0.4854 (0.4867) loss 1.8955 (2.7576) grad_norm 2.1145 (1.9698) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][280/625] eta 0:02:49 lr 0.000452 wd 0.0500 time 0.4868 (0.4902) data time 0.0011 (0.0025) model time 0.4857 (0.4867) loss 3.0063 (2.7617) grad_norm 1.7011 (1.9635) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][290/625] eta 0:02:44 lr 0.000451 wd 0.0500 time 0.4907 (0.4900) data time 0.0011 (0.0024) model time 0.4896 (0.4866) loss 1.9260 (2.7635) grad_norm 2.1218 (1.9602) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][300/625] eta 0:02:39 lr 0.000451 wd 0.0500 time 0.4862 (0.4899) data time 0.0011 (0.0024) model time 0.4851 (0.4865) loss 2.8586 (2.7580) grad_norm 2.0456 (1.9595) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][310/625] eta 0:02:34 lr 0.000451 wd 0.0500 time 0.4825 (0.4897) data time 0.0011 (0.0023) model time 0.4814 (0.4864) loss 3.3168 (2.7590) grad_norm 10.9678 (1.9765) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][320/625] eta 0:02:29 lr 0.000451 wd 0.0500 time 0.4823 (0.4896) data time 0.0008 (0.0023) model time 0.4814 (0.4863) loss 2.4657 (2.7641) grad_norm 1.4073 (1.9703) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][330/625] eta 0:02:24 lr 0.000451 wd 0.0500 time 0.4901 (0.4895) data time 0.0010 (0.0022) model time 0.4891 (0.4863) loss 3.2174 
(2.7721) grad_norm 2.8579 (1.9778) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][340/625] eta 0:02:19 lr 0.000451 wd 0.0500 time 0.4950 (0.4895) data time 0.0012 (0.0022) model time 0.4938 (0.4864) loss 2.5307 (2.7749) grad_norm 1.3678 (1.9722) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][350/625] eta 0:02:14 lr 0.000451 wd 0.0500 time 0.4864 (0.4895) data time 0.0010 (0.0022) model time 0.4854 (0.4864) loss 2.9763 (2.7696) grad_norm 1.6901 (1.9663) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][360/625] eta 0:02:09 lr 0.000451 wd 0.0500 time 0.4879 (0.4894) data time 0.0009 (0.0021) model time 0.4870 (0.4864) loss 2.5589 (2.7692) grad_norm 1.6675 (1.9620) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][370/625] eta 0:02:04 lr 0.000451 wd 0.0500 time 0.4915 (0.4893) data time 0.0008 (0.0021) model time 0.4907 (0.4863) loss 3.4192 (2.7734) grad_norm 1.5680 (1.9529) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][380/625] eta 0:01:59 lr 0.000450 wd 0.0500 time 0.4888 (0.4892) data time 0.0009 (0.0021) model time 0.4879 (0.4862) loss 2.8091 (2.7726) grad_norm 1.3435 (1.9467) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][390/625] eta 0:01:54 lr 0.000450 wd 0.0500 time 0.4855 (0.4891) data time 0.0011 (0.0021) model time 0.4844 (0.4862) loss 3.2993 (2.7770) grad_norm 2.0732 (1.9429) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][400/625] eta 0:01:50 lr 0.000450 wd 0.0500 time 0.4864 
(0.4890) data time 0.0008 (0.0020) model time 0.4856 (0.4862) loss 3.4192 (2.7812) grad_norm 1.4611 (1.9337) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][410/625] eta 0:01:45 lr 0.000450 wd 0.0500 time 0.4855 (0.4889) data time 0.0011 (0.0020) model time 0.4844 (0.4861) loss 2.7958 (2.7799) grad_norm 2.0053 (1.9526) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][420/625] eta 0:01:40 lr 0.000450 wd 0.0500 time 0.4866 (0.4889) data time 0.0008 (0.0020) model time 0.4857 (0.4862) loss 3.1674 (2.7824) grad_norm 2.0951 (1.9516) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][430/625] eta 0:01:35 lr 0.000450 wd 0.0500 time 0.4896 (0.4889) data time 0.0009 (0.0020) model time 0.4887 (0.4862) loss 3.0152 (2.7820) grad_norm 1.5647 (1.9441) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][440/625] eta 0:01:30 lr 0.000450 wd 0.0500 time 0.4854 (0.4898) data time 0.0012 (0.0020) model time 0.4843 (0.4872) loss 2.9422 (2.7773) grad_norm 1.5439 (1.9420) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][450/625] eta 0:01:25 lr 0.000450 wd 0.0500 time 0.4856 (0.4897) data time 0.0008 (0.0019) model time 0.4849 (0.4872) loss 3.2565 (2.7838) grad_norm 1.6426 (1.9358) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][460/625] eta 0:01:20 lr 0.000450 wd 0.0500 time 0.4836 (0.4896) data time 0.0010 (0.0019) model time 0.4826 (0.4871) loss 3.0080 (2.7892) grad_norm 2.9282 (1.9393) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:51:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [183/300][470/625] eta 0:01:15 lr 0.000450 wd 0.0500 time 0.4861 (0.4895) data time 0.0008 (0.0019) model time 0.4853 (0.4870) loss 3.1979 (2.7886) grad_norm 1.6516 (1.9575) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][480/625] eta 0:01:11 lr 0.000449 wd 0.0500 time 0.4876 (0.4898) data time 0.0009 (0.0019) model time 0.4866 (0.4873) loss 3.1081 (2.7941) grad_norm 1.6836 (1.9567) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][490/625] eta 0:01:06 lr 0.000449 wd 0.0500 time 0.4855 (0.4897) data time 0.0008 (0.0019) model time 0.4847 (0.4874) loss 3.2891 (2.7938) grad_norm 1.4842 (1.9515) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][500/625] eta 0:01:01 lr 0.000449 wd 0.0500 time 0.4892 (0.4897) data time 0.0011 (0.0018) model time 0.4881 (0.4873) loss 2.5756 (2.7910) grad_norm 2.3965 (1.9487) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][510/625] eta 0:00:56 lr 0.000449 wd 0.0500 time 0.4847 (0.4896) data time 0.0013 (0.0018) model time 0.4835 (0.4873) loss 2.8175 (2.7894) grad_norm 2.3380 (1.9523) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][520/625] eta 0:00:51 lr 0.000449 wd 0.0500 time 0.4832 (0.4896) data time 0.0011 (0.0018) model time 0.4822 (0.4872) loss 3.4273 (2.7908) grad_norm 1.6029 (1.9498) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][530/625] eta 0:00:46 lr 0.000449 wd 0.0500 time 0.4816 (0.4895) data time 0.0011 (0.0018) model time 0.4804 (0.4872) loss 2.5206 (2.7893) grad_norm 1.4393 (1.9448) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 16:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][540/625] eta 0:00:41 lr 0.000449 wd 0.0500 time 0.4855 (0.4894) data time 0.0008 (0.0018) model time 0.4847 (0.4871) loss 2.9527 (2.7937) grad_norm 1.5031 (1.9404) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][550/625] eta 0:00:36 lr 0.000449 wd 0.0500 time 0.4827 (0.4893) data time 0.0010 (0.0018) model time 0.4817 (0.4870) loss 3.1310 (2.7955) grad_norm 1.5109 (1.9418) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][560/625] eta 0:00:31 lr 0.000449 wd 0.0500 time 0.4878 (0.4893) data time 0.0010 (0.0018) model time 0.4867 (0.4870) loss 2.6246 (2.7997) grad_norm 2.0569 (1.9426) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][570/625] eta 0:00:26 lr 0.000449 wd 0.0500 time 0.4848 (0.4892) data time 0.0008 (0.0018) model time 0.4840 (0.4869) loss 2.6437 (2.8028) grad_norm 1.5978 (1.9552) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][580/625] eta 0:00:22 lr 0.000448 wd 0.0500 time 0.4881 (0.4896) data time 0.0008 (0.0018) model time 0.4873 (0.4873) loss 2.9907 (2.8043) grad_norm 2.3313 (1.9541) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:52:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][590/625] eta 0:00:17 lr 0.000448 wd 0.0500 time 0.4832 (0.4899) data time 0.0008 (0.0018) model time 0.4824 (0.4877) loss 2.8834 (2.8004) grad_norm 2.8832 (1.9586) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][600/625] eta 0:00:12 lr 0.000448 wd 0.0500 time 0.4832 (0.4897) data time 0.0008 (0.0017) model time 0.4824 (0.4876) loss 2.0817 
(2.7962) grad_norm 1.4360 (1.9595) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][610/625] eta 0:00:07 lr 0.000448 wd 0.0500 time 0.4822 (0.4897) data time 0.0008 (0.0017) model time 0.4814 (0.4875) loss 2.9506 (2.7968) grad_norm 1.6285 (1.9554) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [183/300][620/625] eta 0:00:02 lr 0.000448 wd 0.0500 time 0.4848 (0.4895) data time 0.0008 (0.0017) model time 0.4840 (0.4874) loss 2.4354 (2.7993) grad_norm 2.1338 (1.9586) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 183 training takes 0:05:06 [2024-08-10 16:53:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:53:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:53:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.525 (0.525) Loss 0.5229 (0.5229) Acc@1 88.428 (88.428) Acc@5 98.486 (98.486) Mem 16721MB [2024-08-10 16:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8120 (0.6344) Acc@1 80.615 (86.253) Acc@5 95.850 (97.661) Mem 16721MB [2024-08-10 16:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9102 (0.7465) Acc@1 78.320 (83.366) Acc@5 95.117 (96.519) Mem 16721MB [2024-08-10 16:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.021 Acc@5 96.491 [2024-08-10 16:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-10 16:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.02% [2024-08-10 16:53:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 16:53:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 16:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.526 (0.526) Loss 0.4746 (0.4746) Acc@1 89.404 (89.404) Acc@5 98.730 (98.730) Mem 16721MB [2024-08-10 16:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.163) Loss 0.7578 (0.5868) Acc@1 81.738 (87.269) Acc@5 96.484 (97.954) Mem 16721MB [2024-08-10 16:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 0.8447 (0.6885) Acc@1 79.297 (84.496) Acc@5 95.996 (96.998) Mem 16721MB [2024-08-10 16:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.171 Acc@5 96.989 [2024-08-10 16:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 16:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.17% [2024-08-10 16:53:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:53:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 16:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][0/625] eta 0:09:16 lr 0.000448 wd 0.0500 time 0.8908 (0.8908) data time 0.4723 (0.4723) model time 0.0000 (0.0000) loss 3.2977 (3.2977) grad_norm 1.7397 (1.7397) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][10/625] eta 0:05:21 lr 0.000448 wd 0.0500 time 0.4850 (0.5232) data time 0.0011 (0.0440) model time 0.0000 (0.0000) loss 3.0270 (2.9740) grad_norm 1.5848 (1.9701) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][20/625] eta 0:05:06 lr 0.000448 wd 0.0500 time 0.4870 (0.5062) data time 0.0011 (0.0236) model time 0.0000 (0.0000) loss 2.8070 (2.9315) grad_norm 1.7881 (1.8876) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][30/625] eta 0:04:57 lr 0.000448 wd 0.0500 time 0.4846 (0.4994) data time 0.0012 (0.0164) model time 0.0000 (0.0000) loss 2.5011 (2.9007) grad_norm 1.9600 (1.8720) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][40/625] eta 0:04:50 lr 0.000448 wd 0.0500 time 0.4954 (0.4961) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 1.7146 (2.8734) grad_norm 2.5946 (1.9012) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][50/625] eta 0:04:44 lr 0.000447 wd 0.0500 time 0.4815 (0.4940) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 3.3574 (2.8893) grad_norm 2.4650 (1.9726) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:53:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][60/625] eta 0:04:38 lr 0.000447 wd 0.0500 time 0.4893 (0.4927) data time 0.0008 (0.0089) model time 0.4885 (0.4850) loss 2.4737 (2.9011) 
grad_norm 2.7506 (1.9679) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][70/625] eta 0:04:32 lr 0.000447 wd 0.0500 time 0.4888 (0.4919) data time 0.0010 (0.0078) model time 0.4878 (0.4853) loss 2.9849 (2.8937) grad_norm 1.3486 (1.9421) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][80/625] eta 0:04:27 lr 0.000447 wd 0.0500 time 0.4879 (0.4911) data time 0.0009 (0.0070) model time 0.4871 (0.4851) loss 3.1893 (2.8862) grad_norm 1.9759 (1.9916) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][90/625] eta 0:04:22 lr 0.000447 wd 0.0500 time 0.4830 (0.4905) data time 0.0008 (0.0063) model time 0.4821 (0.4851) loss 3.2479 (2.8652) grad_norm 1.8201 (2.0200) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][100/625] eta 0:04:17 lr 0.000447 wd 0.0500 time 0.4811 (0.4899) data time 0.0011 (0.0058) model time 0.4801 (0.4846) loss 3.2818 (2.8715) grad_norm 1.8370 (2.0231) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][110/625] eta 0:04:11 lr 0.000447 wd 0.0500 time 0.4790 (0.4893) data time 0.0008 (0.0053) model time 0.4782 (0.4842) loss 3.1840 (2.8915) grad_norm 1.7700 (2.0038) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][120/625] eta 0:04:07 lr 0.000447 wd 0.0500 time 0.5972 (0.4898) data time 0.0011 (0.0050) model time 0.5961 (0.4856) loss 3.2067 (2.8701) grad_norm 2.5909 (2.0769) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][130/625] eta 0:04:02 lr 0.000447 wd 0.0500 time 0.4859 (0.4905) data 
time 0.0008 (0.0047) model time 0.4851 (0.4872) loss 1.9545 (2.8394) grad_norm 2.5827 (2.0889) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][140/625] eta 0:03:57 lr 0.000447 wd 0.0500 time 0.4909 (0.4901) data time 0.0009 (0.0044) model time 0.4900 (0.4868) loss 3.1688 (2.8168) grad_norm 1.7423 (2.0704) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][150/625] eta 0:03:52 lr 0.000446 wd 0.0500 time 0.4839 (0.4899) data time 0.0009 (0.0042) model time 0.4830 (0.4867) loss 3.1396 (2.8141) grad_norm 1.6930 (2.0470) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][160/625] eta 0:03:47 lr 0.000446 wd 0.0500 time 0.4853 (0.4897) data time 0.0008 (0.0040) model time 0.4845 (0.4866) loss 2.9754 (2.8204) grad_norm 1.7245 (2.0322) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][170/625] eta 0:03:42 lr 0.000446 wd 0.0500 time 0.4852 (0.4895) data time 0.0008 (0.0038) model time 0.4844 (0.4865) loss 2.7031 (2.8252) grad_norm 1.5628 (2.0060) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][180/625] eta 0:03:37 lr 0.000446 wd 0.0500 time 0.4834 (0.4892) data time 0.0012 (0.0037) model time 0.4823 (0.4863) loss 3.0131 (2.8161) grad_norm 2.2445 (2.0125) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][190/625] eta 0:03:32 lr 0.000446 wd 0.0500 time 0.4829 (0.4890) data time 0.0012 (0.0036) model time 0.4817 (0.4862) loss 3.1768 (2.8182) grad_norm 3.2276 (2.0386) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[184/300][200/625] eta 0:03:27 lr 0.000446 wd 0.0500 time 0.4862 (0.4887) data time 0.0008 (0.0034) model time 0.4855 (0.4859) loss 2.2962 (2.8071) grad_norm 1.9971 (2.0400) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][210/625] eta 0:03:22 lr 0.000446 wd 0.0500 time 0.4859 (0.4887) data time 0.0011 (0.0033) model time 0.4848 (0.4859) loss 1.8691 (2.8013) grad_norm 1.6450 (2.0389) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][220/625] eta 0:03:17 lr 0.000446 wd 0.0500 time 0.4836 (0.4886) data time 0.0009 (0.0032) model time 0.4827 (0.4859) loss 3.0255 (2.7942) grad_norm 1.4349 (2.0289) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][230/625] eta 0:03:12 lr 0.000446 wd 0.0500 time 0.4946 (0.4885) data time 0.0009 (0.0031) model time 0.4937 (0.4859) loss 3.0614 (2.7918) grad_norm 1.4688 (2.0227) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][240/625] eta 0:03:08 lr 0.000446 wd 0.0500 time 0.4842 (0.4884) data time 0.0011 (0.0030) model time 0.4832 (0.4859) loss 3.1393 (2.8034) grad_norm 2.8479 (2.0209) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][250/625] eta 0:03:03 lr 0.000445 wd 0.0500 time 0.4844 (0.4883) data time 0.0010 (0.0030) model time 0.4834 (0.4858) loss 2.7509 (2.8116) grad_norm 1.6820 (2.0200) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][260/625] eta 0:02:58 lr 0.000445 wd 0.0500 time 0.4874 (0.4881) data time 0.0009 (0.0029) model time 0.4865 (0.4857) loss 2.8603 (2.8155) grad_norm 2.8820 (2.0189) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 16:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][270/625] eta 0:02:53 lr 0.000445 wd 0.0500 time 0.4835 (0.4880) data time 0.0010 (0.0028) model time 0.4825 (0.4856) loss 3.0872 (2.8117) grad_norm 2.1145 (2.0631) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][280/625] eta 0:02:48 lr 0.000445 wd 0.0500 time 0.4824 (0.4879) data time 0.0010 (0.0027) model time 0.4814 (0.4855) loss 2.9207 (2.8171) grad_norm 2.1163 (2.0746) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][290/625] eta 0:02:43 lr 0.000445 wd 0.0500 time 0.4908 (0.4879) data time 0.0011 (0.0027) model time 0.4897 (0.4855) loss 2.7531 (2.8159) grad_norm 1.3286 (2.0653) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][300/625] eta 0:02:38 lr 0.000445 wd 0.0500 time 0.4839 (0.4878) data time 0.0008 (0.0026) model time 0.4831 (0.4855) loss 1.9681 (2.7973) grad_norm 1.6534 (2.0563) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:55:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][310/625] eta 0:02:33 lr 0.000445 wd 0.0500 time 0.4849 (0.4877) data time 0.0011 (0.0026) model time 0.4837 (0.4855) loss 2.5645 (2.7924) grad_norm 2.3041 (2.0477) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][320/625] eta 0:02:28 lr 0.000445 wd 0.0500 time 0.4838 (0.4876) data time 0.0008 (0.0025) model time 0.4830 (0.4854) loss 1.8889 (2.7882) grad_norm 1.7473 (2.0634) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][330/625] eta 0:02:23 lr 0.000445 wd 0.0500 time 0.4818 (0.4875) data time 0.0011 (0.0025) model time 0.4807 (0.4853) loss 2.4864 
(2.7893) grad_norm 1.2330 (2.0468) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][340/625] eta 0:02:19 lr 0.000444 wd 0.0500 time 0.6968 (0.4880) data time 0.0008 (0.0024) model time 0.6960 (0.4859) loss 3.2020 (2.7919) grad_norm 1.6186 (2.0418) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][350/625] eta 0:02:14 lr 0.000444 wd 0.0500 time 0.4820 (0.4877) data time 0.0011 (0.0024) model time 0.4809 (0.4856) loss 3.3381 (2.7999) grad_norm 1.9315 (2.0412) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][360/625] eta 0:02:09 lr 0.000444 wd 0.0500 time 0.4843 (0.4877) data time 0.0011 (0.0024) model time 0.4832 (0.4857) loss 3.3943 (2.8006) grad_norm 3.7752 (2.0355) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][370/625] eta 0:02:04 lr 0.000444 wd 0.0500 time 0.4853 (0.4877) data time 0.0013 (0.0023) model time 0.4840 (0.4856) loss 3.1782 (2.7960) grad_norm 1.4809 (2.0480) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][380/625] eta 0:01:59 lr 0.000444 wd 0.0500 time 0.4956 (0.4877) data time 0.0008 (0.0023) model time 0.4948 (0.4857) loss 3.3443 (2.8019) grad_norm 1.8314 (2.0450) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][390/625] eta 0:01:54 lr 0.000444 wd 0.0500 time 0.4834 (0.4876) data time 0.0011 (0.0023) model time 0.4824 (0.4856) loss 2.9831 (2.7989) grad_norm 2.3779 (2.0577) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][400/625] eta 0:01:49 lr 0.000444 wd 0.0500 time 0.4775 
(0.4874) data time 0.0011 (0.0022) model time 0.4764 (0.4854) loss 2.9316 (2.7930) grad_norm 1.5391 (2.0520) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][410/625] eta 0:01:44 lr 0.000444 wd 0.0500 time 0.4844 (0.4874) data time 0.0011 (0.0022) model time 0.4833 (0.4854) loss 3.3690 (2.7954) grad_norm 1.5122 (2.0466) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][420/625] eta 0:01:39 lr 0.000444 wd 0.0500 time 0.4774 (0.4872) data time 0.0011 (0.0022) model time 0.4763 (0.4853) loss 1.8702 (2.7947) grad_norm 1.9505 (2.0379) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][430/625] eta 0:01:34 lr 0.000444 wd 0.0500 time 0.4859 (0.4871) data time 0.0011 (0.0022) model time 0.4849 (0.4852) loss 3.2203 (2.7980) grad_norm 2.4605 (2.0639) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][440/625] eta 0:01:30 lr 0.000443 wd 0.0500 time 0.4818 (0.4871) data time 0.0012 (0.0021) model time 0.4806 (0.4852) loss 2.6730 (2.7968) grad_norm 1.7639 (2.0593) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][450/625] eta 0:01:25 lr 0.000443 wd 0.0500 time 0.4822 (0.4870) data time 0.0011 (0.0021) model time 0.4811 (0.4851) loss 2.5514 (2.8003) grad_norm 1.4416 (2.0503) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][460/625] eta 0:01:20 lr 0.000443 wd 0.0500 time 0.4822 (0.4870) data time 0.0009 (0.0021) model time 0.4813 (0.4851) loss 3.4184 (2.7970) grad_norm 2.8293 (2.0524) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [184/300][470/625] eta 0:01:15 lr 0.000443 wd 0.0500 time 0.4869 (0.4875) data time 0.0011 (0.0021) model time 0.4858 (0.4856) loss 2.2722 (2.7979) grad_norm 1.7375 (2.0596) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][480/625] eta 0:01:10 lr 0.000443 wd 0.0500 time 0.4833 (0.4874) data time 0.0011 (0.0020) model time 0.4823 (0.4855) loss 3.1597 (2.8044) grad_norm 1.8679 (2.0710) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][490/625] eta 0:01:05 lr 0.000443 wd 0.0500 time 0.4849 (0.4877) data time 0.0008 (0.0020) model time 0.4841 (0.4859) loss 2.3468 (2.8024) grad_norm 1.7393 (2.0640) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][500/625] eta 0:01:00 lr 0.000443 wd 0.0500 time 0.4845 (0.4876) data time 0.0009 (0.0020) model time 0.4836 (0.4859) loss 2.4773 (2.8004) grad_norm 5.6279 (2.1703) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][510/625] eta 0:00:56 lr 0.000443 wd 0.0500 time 0.4890 (0.4877) data time 0.0011 (0.0020) model time 0.4879 (0.4859) loss 3.3045 (2.8001) grad_norm 4.0531 (2.1949) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][520/625] eta 0:00:51 lr 0.000443 wd 0.0500 time 0.4876 (0.4876) data time 0.0008 (0.0020) model time 0.4868 (0.4859) loss 2.7305 (2.7962) grad_norm 1.9699 (2.1983) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][530/625] eta 0:00:46 lr 0.000443 wd 0.0500 time 0.4910 (0.4876) data time 0.0011 (0.0019) model time 0.4899 (0.4859) loss 2.9950 (2.8012) grad_norm 1.8123 (2.1966) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 16:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][540/625] eta 0:00:41 lr 0.000442 wd 0.0500 time 0.4830 (0.4876) data time 0.0008 (0.0019) model time 0.4822 (0.4858) loss 2.4228 (2.7987) grad_norm 2.0229 (2.2197) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][550/625] eta 0:00:36 lr 0.000442 wd 0.0500 time 0.4868 (0.4875) data time 0.0011 (0.0019) model time 0.4857 (0.4858) loss 3.2364 (2.7990) grad_norm 1.6627 (2.2212) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][560/625] eta 0:00:31 lr 0.000442 wd 0.0500 time 0.4810 (0.4874) data time 0.0011 (0.0019) model time 0.4799 (0.4857) loss 3.3177 (2.8022) grad_norm 1.4286 (2.2153) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][570/625] eta 0:00:26 lr 0.000442 wd 0.0500 time 0.4855 (0.4874) data time 0.0011 (0.0019) model time 0.4844 (0.4857) loss 3.0998 (2.7976) grad_norm 2.1655 (2.2159) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][580/625] eta 0:00:21 lr 0.000442 wd 0.0500 time 0.4904 (0.4874) data time 0.0008 (0.0019) model time 0.4896 (0.4857) loss 1.9986 (2.7963) grad_norm 1.7119 (2.2158) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][590/625] eta 0:00:17 lr 0.000442 wd 0.0500 time 0.4879 (0.4874) data time 0.0014 (0.0019) model time 0.4865 (0.4858) loss 2.7347 (2.7976) grad_norm 3.6251 (2.2434) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][600/625] eta 0:00:12 lr 0.000442 wd 0.0500 time 0.4829 (0.4874) data time 0.0011 (0.0018) model time 0.4818 (0.4858) loss 2.9413 
(2.7964) grad_norm 1.5211 (2.2420) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][610/625] eta 0:00:07 lr 0.000442 wd 0.0500 time 0.4816 (0.4874) data time 0.0006 (0.0018) model time 0.4810 (0.4858) loss 3.2948 (2.7969) grad_norm 1.5089 (2.2337) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [184/300][620/625] eta 0:00:02 lr 0.000442 wd 0.0500 time 0.4771 (0.4873) data time 0.0008 (0.0018) model time 0.4763 (0.4857) loss 2.6945 (2.7901) grad_norm 2.6877 (2.2939) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 184 training takes 0:05:04 [2024-08-10 16:58:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 16:58:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 16:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5093 (0.5093) Acc@1 88.867 (88.867) Acc@5 98.633 (98.633) Mem 16721MB [2024-08-10 16:58:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8179 (0.6390) Acc@1 80.566 (86.026) Acc@5 95.850 (97.603) Mem 16721MB [2024-08-10 16:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.142) Loss 0.9126 (0.7496) Acc@1 78.320 (83.133) Acc@5 94.775 (96.482) Mem 16721MB [2024-08-10 16:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.855 Acc@5 96.505 [2024-08-10 16:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9% [2024-08-10 16:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.825 (0.825) Loss 0.4744 (0.4744) Acc@1 89.453 (89.453) Acc@5 98.730 (98.730) Mem 16721MB [2024-08-10 16:58:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.193) Loss 0.7568 (0.5865) Acc@1 81.787 (87.274) Acc@5 96.436 (97.949) Mem 16721MB [2024-08-10 16:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 0.8447 (0.6882) Acc@1 79.443 (84.531) Acc@5 95.996 (96.994) Mem 16721MB [2024-08-10 16:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.201 Acc@5 96.985 [2024-08-10 16:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 16:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.20% [2024-08-10 16:58:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 16:58:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 16:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][0/625] eta 0:08:39 lr 0.000442 wd 0.0500 time 0.8306 (0.8306) data time 0.4100 (0.4100) model time 0.0000 (0.0000) loss 3.0447 (3.0447) grad_norm 1.5784 (1.5784) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][10/625] eta 0:05:18 lr 0.000441 wd 0.0500 time 0.4903 (0.5178) data time 0.0012 (0.0383) model time 0.0000 (0.0000) loss 2.0823 (2.4961) grad_norm 1.5973 (1.6640) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][20/625] eta 0:05:04 lr 0.000441 wd 0.0500 time 0.4848 (0.5039) data time 0.0009 (0.0206) model time 0.0000 (0.0000) loss 3.5129 (2.6867) grad_norm 1.7267 (2.1566) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][30/625] eta 0:04:56 lr 0.000441 wd 0.0500 time 0.4840 (0.4991) data time 0.0011 (0.0143) model time 0.0000 (0.0000) loss 3.3280 (2.7929) grad_norm 1.4184 (2.0574) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][40/625] eta 0:04:50 lr 0.000441 wd 0.0500 time 0.4836 (0.4958) data time 0.0011 (0.0111) model time 0.0000 (0.0000) loss 3.0817 (2.7765) grad_norm 1.8577 (2.0720) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][50/625] eta 0:04:48 lr 0.000441 wd 0.0500 time 0.4840 (0.5010) data time 0.0010 (0.0091) model time 0.0000 (0.0000) loss 3.0335 (2.7906) grad_norm 12.7389 (2.2281) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][60/625] eta 0:04:41 lr 0.000441 wd 0.0500 time 0.4801 (0.4981) data time 0.0009 (0.0079) model time 0.4792 (0.4814) loss 3.6417 (2.8130) 
grad_norm 1.7315 (2.2900) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][70/625] eta 0:04:35 lr 0.000441 wd 0.0500 time 0.4781 (0.4961) data time 0.0010 (0.0069) model time 0.4770 (0.4823) loss 2.7259 (2.8033) grad_norm 1.3352 (2.2281) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][80/625] eta 0:04:29 lr 0.000441 wd 0.0500 time 0.4851 (0.4945) data time 0.0008 (0.0062) model time 0.4842 (0.4820) loss 2.0668 (2.7937) grad_norm 1.7811 (2.1987) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][90/625] eta 0:04:23 lr 0.000441 wd 0.0500 time 0.4845 (0.4934) data time 0.0008 (0.0056) model time 0.4837 (0.4825) loss 2.2799 (2.7892) grad_norm 2.3180 (2.1488) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][100/625] eta 0:04:18 lr 0.000441 wd 0.0500 time 0.4821 (0.4926) data time 0.0008 (0.0052) model time 0.4814 (0.4829) loss 3.0779 (2.8073) grad_norm 1.9014 (2.1069) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][110/625] eta 0:04:13 lr 0.000440 wd 0.0500 time 0.4877 (0.4920) data time 0.0011 (0.0048) model time 0.4866 (0.4832) loss 2.9591 (2.7998) grad_norm 2.4630 (2.1212) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][120/625] eta 0:04:08 lr 0.000440 wd 0.0500 time 0.4866 (0.4914) data time 0.0010 (0.0045) model time 0.4856 (0.4833) loss 2.0404 (2.8003) grad_norm 1.8288 (2.1935) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][130/625] eta 0:04:02 lr 0.000440 wd 0.0500 time 0.4793 (0.4908) data 
time 0.0008 (0.0042) model time 0.4785 (0.4831) loss 3.2145 (2.8075) grad_norm 2.6754 (2.2093) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][140/625] eta 0:03:57 lr 0.000440 wd 0.0500 time 0.4851 (0.4903) data time 0.0010 (0.0040) model time 0.4841 (0.4831) loss 3.3325 (2.8099) grad_norm 1.5834 (2.2131) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 16:59:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][150/625] eta 0:03:52 lr 0.000440 wd 0.0500 time 0.4855 (0.4898) data time 0.0012 (0.0038) model time 0.4843 (0.4829) loss 2.7407 (2.8096) grad_norm 1.9457 (2.2650) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][160/625] eta 0:03:47 lr 0.000440 wd 0.0500 time 0.4862 (0.4894) data time 0.0008 (0.0036) model time 0.4854 (0.4828) loss 3.1576 (2.8185) grad_norm 2.4988 (2.2412) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][170/625] eta 0:03:42 lr 0.000440 wd 0.0500 time 0.4924 (0.4892) data time 0.0011 (0.0035) model time 0.4913 (0.4831) loss 3.2889 (2.8126) grad_norm 2.2826 (2.2649) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][180/625] eta 0:03:37 lr 0.000440 wd 0.0500 time 0.4840 (0.4890) data time 0.0008 (0.0034) model time 0.4832 (0.4832) loss 3.7181 (2.8216) grad_norm 1.7529 (2.2697) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][190/625] eta 0:03:32 lr 0.000440 wd 0.0500 time 0.4843 (0.4889) data time 0.0009 (0.0032) model time 0.4834 (0.4833) loss 3.5509 (2.8286) grad_norm 2.0481 (2.2712) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[185/300][200/625] eta 0:03:27 lr 0.000440 wd 0.0500 time 0.4884 (0.4887) data time 0.0012 (0.0031) model time 0.4872 (0.4834) loss 2.9998 (2.8235) grad_norm 1.9195 (2.2492) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][210/625] eta 0:03:22 lr 0.000439 wd 0.0500 time 0.4803 (0.4885) data time 0.0011 (0.0030) model time 0.4793 (0.4835) loss 3.4290 (2.8288) grad_norm 3.1808 (2.3384) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][220/625] eta 0:03:17 lr 0.000439 wd 0.0500 time 0.4780 (0.4883) data time 0.0012 (0.0029) model time 0.4768 (0.4834) loss 2.5992 (2.8288) grad_norm 1.3863 (2.3244) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][230/625] eta 0:03:12 lr 0.000439 wd 0.0500 time 0.4852 (0.4880) data time 0.0010 (0.0029) model time 0.4842 (0.4832) loss 2.8955 (2.8252) grad_norm 1.7525 (2.3302) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][240/625] eta 0:03:08 lr 0.000439 wd 0.0500 time 0.4826 (0.4885) data time 0.0011 (0.0028) model time 0.4815 (0.4841) loss 3.2318 (2.8289) grad_norm 2.0667 (2.3203) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][250/625] eta 0:03:03 lr 0.000439 wd 0.0500 time 0.4840 (0.4883) data time 0.0011 (0.0027) model time 0.4830 (0.4840) loss 2.5003 (2.8173) grad_norm 3.1068 (2.3168) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][260/625] eta 0:02:58 lr 0.000439 wd 0.0500 time 0.4803 (0.4882) data time 0.0008 (0.0027) model time 0.4795 (0.4840) loss 2.6742 (2.8153) grad_norm 3.3151 (2.3316) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 17:00:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][270/625] eta 0:02:53 lr 0.000439 wd 0.0500 time 0.4853 (0.4880) data time 0.0008 (0.0026) model time 0.4845 (0.4839) loss 3.0552 (2.8140) grad_norm 1.6517 (2.3385) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:00:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][280/625] eta 0:02:48 lr 0.000439 wd 0.0500 time 0.4762 (0.4878) data time 0.0009 (0.0025) model time 0.4754 (0.4838) loss 3.2245 (2.8128) grad_norm 1.4158 (2.3272) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][290/625] eta 0:02:43 lr 0.000439 wd 0.0500 time 0.4774 (0.4876) data time 0.0011 (0.0025) model time 0.4763 (0.4836) loss 2.7609 (2.8136) grad_norm 1.6204 (2.3186) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][300/625] eta 0:02:38 lr 0.000438 wd 0.0500 time 0.4867 (0.4874) data time 0.0010 (0.0024) model time 0.4857 (0.4835) loss 2.7329 (2.8163) grad_norm 1.5312 (2.2925) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][310/625] eta 0:02:33 lr 0.000438 wd 0.0500 time 0.4863 (0.4873) data time 0.0008 (0.0024) model time 0.4855 (0.4835) loss 3.1985 (2.8142) grad_norm 1.9235 (2.2883) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][320/625] eta 0:02:28 lr 0.000438 wd 0.0500 time 0.4830 (0.4873) data time 0.0009 (0.0024) model time 0.4821 (0.4836) loss 3.4270 (2.8146) grad_norm 2.6631 (2.2775) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][330/625] eta 0:02:23 lr 0.000438 wd 0.0500 time 0.4952 (0.4873) data time 0.0010 (0.0023) model time 0.4942 (0.4837) loss 3.2018 
(2.8114) grad_norm 2.2013 (2.2740) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][340/625] eta 0:02:18 lr 0.000438 wd 0.0500 time 0.4829 (0.4873) data time 0.0007 (0.0023) model time 0.4822 (0.4838) loss 3.2000 (2.8093) grad_norm 2.0020 (2.2610) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][350/625] eta 0:02:13 lr 0.000438 wd 0.0500 time 0.4830 (0.4872) data time 0.0008 (0.0022) model time 0.4822 (0.4838) loss 3.1002 (2.8162) grad_norm 1.4062 (2.2449) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][360/625] eta 0:02:09 lr 0.000438 wd 0.0500 time 0.4891 (0.4872) data time 0.0008 (0.0022) model time 0.4883 (0.4838) loss 2.6557 (2.8213) grad_norm 2.1545 (2.2456) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][370/625] eta 0:02:04 lr 0.000438 wd 0.0500 time 0.4807 (0.4871) data time 0.0008 (0.0022) model time 0.4799 (0.4838) loss 3.3524 (2.8210) grad_norm 1.7638 (2.2397) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][380/625] eta 0:01:59 lr 0.000438 wd 0.0500 time 0.4889 (0.4875) data time 0.0007 (0.0022) model time 0.4881 (0.4843) loss 2.9266 (2.8192) grad_norm 2.2086 (2.2304) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][390/625] eta 0:01:54 lr 0.000438 wd 0.0500 time 0.5006 (0.4875) data time 0.0008 (0.0021) model time 0.4999 (0.4844) loss 2.6568 (2.8169) grad_norm 1.6269 (2.2203) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:01:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][400/625] eta 0:01:49 lr 0.000437 wd 0.0500 time 0.4830 
(0.4874) data time 0.0009 (0.0021) model time 0.4821 (0.4844) loss 3.4430 (2.8144) grad_norm 3.1579 (2.2224) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][410/625] eta 0:01:44 lr 0.000437 wd 0.0500 time 0.4866 (0.4874) data time 0.0008 (0.0021) model time 0.4858 (0.4843) loss 2.5537 (2.8154) grad_norm 2.2687 (2.2289) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][420/625] eta 0:01:39 lr 0.000437 wd 0.0500 time 0.4816 (0.4872) data time 0.0009 (0.0021) model time 0.4807 (0.4842) loss 2.0094 (2.8112) grad_norm 1.6064 (2.2243) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][430/625] eta 0:01:34 lr 0.000437 wd 0.0500 time 0.4821 (0.4871) data time 0.0008 (0.0020) model time 0.4814 (0.4841) loss 3.4248 (2.8168) grad_norm 1.9345 (2.2187) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][440/625] eta 0:01:30 lr 0.000437 wd 0.0500 time 0.4807 (0.4869) data time 0.0008 (0.0020) model time 0.4799 (0.4840) loss 1.7697 (2.8086) grad_norm 2.3902 (2.2227) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][450/625] eta 0:01:25 lr 0.000437 wd 0.0500 time 0.4836 (0.4868) data time 0.0010 (0.0020) model time 0.4826 (0.4839) loss 3.3342 (2.8091) grad_norm 2.4784 (2.2307) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][460/625] eta 0:01:20 lr 0.000437 wd 0.0500 time 0.4841 (0.4873) data time 0.0007 (0.0020) model time 0.4833 (0.4845) loss 2.2174 (2.8121) grad_norm 2.2087 (2.2302) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [185/300][470/625] eta 0:01:15 lr 0.000437 wd 0.0500 time 0.4868 (0.4872) data time 0.0008 (0.0019) model time 0.4859 (0.4844) loss 2.1952 (2.8142) grad_norm 1.8827 (2.2212) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][480/625] eta 0:01:10 lr 0.000437 wd 0.0500 time 0.4861 (0.4872) data time 0.0011 (0.0019) model time 0.4849 (0.4845) loss 2.2837 (2.8221) grad_norm 5.8923 (2.2253) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][490/625] eta 0:01:05 lr 0.000437 wd 0.0500 time 0.4783 (0.4871) data time 0.0010 (0.0019) model time 0.4773 (0.4844) loss 3.3776 (2.8178) grad_norm 4.2576 (2.2249) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][500/625] eta 0:01:00 lr 0.000436 wd 0.0500 time 0.4835 (0.4870) data time 0.0007 (0.0019) model time 0.4828 (0.4843) loss 3.0205 (2.8192) grad_norm 1.2965 (2.2194) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][510/625] eta 0:00:55 lr 0.000436 wd 0.0500 time 0.4851 (0.4869) data time 0.0011 (0.0019) model time 0.4840 (0.4843) loss 2.7727 (2.8217) grad_norm 1.4896 (2.2165) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][520/625] eta 0:00:51 lr 0.000436 wd 0.0500 time 0.4813 (0.4868) data time 0.0012 (0.0019) model time 0.4801 (0.4842) loss 2.2537 (2.8208) grad_norm 2.1429 (2.2076) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][530/625] eta 0:00:46 lr 0.000436 wd 0.0500 time 0.4849 (0.4868) data time 0.0010 (0.0018) model time 0.4838 (0.4842) loss 2.5427 (2.8183) grad_norm 1.2920 (2.1953) loss_scale 128.0000 (128.0000) mem 16721MB 
[2024-08-10 17:03:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][540/625] eta 0:00:41 lr 0.000436 wd 0.0500 time 0.4871 (0.4867) data time 0.0007 (0.0018) model time 0.4864 (0.4842) loss 3.3014 (2.8194) grad_norm 2.5993 (2.1995) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][550/625] eta 0:00:36 lr 0.000436 wd 0.0500 time 0.4854 (0.4867) data time 0.0010 (0.0018) model time 0.4844 (0.4842) loss 2.6482 (2.8228) grad_norm 2.0319 (2.1945) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][560/625] eta 0:00:31 lr 0.000436 wd 0.0500 time 0.4830 (0.4867) data time 0.0011 (0.0018) model time 0.4819 (0.4842) loss 3.2275 (2.8255) grad_norm 1.9383 (2.1881) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][570/625] eta 0:00:26 lr 0.000436 wd 0.0500 time 0.4847 (0.4868) data time 0.0009 (0.0018) model time 0.4838 (0.4843) loss 3.1500 (2.8236) grad_norm 1.4133 (2.1784) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][580/625] eta 0:00:21 lr 0.000436 wd 0.0500 time 0.4865 (0.4867) data time 0.0011 (0.0018) model time 0.4853 (0.4843) loss 3.2061 (2.8270) grad_norm 1.8423 (2.1744) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][590/625] eta 0:00:17 lr 0.000436 wd 0.0500 time 0.4875 (0.4867) data time 0.0011 (0.0018) model time 0.4864 (0.4843) loss 2.7931 (2.8293) grad_norm 2.6269 (2.1726) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][600/625] eta 0:00:12 lr 0.000435 wd 0.0500 time 0.4854 (0.4867) data time 0.0012 (0.0018) model time 0.4843 (0.4843) loss 2.2352 
(2.8293) grad_norm 1.4125 (2.1693) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][610/625] eta 0:00:07 lr 0.000435 wd 0.0500 time 0.4815 (0.4871) data time 0.0008 (0.0018) model time 0.4807 (0.4847) loss 3.5882 (2.8295) grad_norm 1.2287 (2.1599) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [185/300][620/625] eta 0:00:02 lr 0.000435 wd 0.0500 time 0.4854 (0.4870) data time 0.0008 (0.0017) model time 0.4846 (0.4846) loss 2.8226 (2.8277) grad_norm 1.7835 (2.1715) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:03:45 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 185 training takes 0:05:04 [2024-08-10 17:03:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:03:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 17:03:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.507 (0.507) Loss 0.5288 (0.5288) Acc@1 87.842 (87.842) Acc@5 98.584 (98.584) Mem 16721MB [2024-08-10 17:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.8208 (0.6321) Acc@1 80.566 (86.315) Acc@5 95.996 (97.661) Mem 16721MB [2024-08-10 17:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.9082 (0.7495) Acc@1 78.906 (83.378) Acc@5 95.215 (96.501) Mem 16721MB [2024-08-10 17:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.993 Acc@5 96.449 [2024-08-10 17:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-10 17:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.885 (0.885) Loss 0.4744 (0.4744) Acc@1 89.355 (89.355) Acc@5 98.730 (98.730) Mem 16721MB [2024-08-10 17:03:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.196) Loss 0.7549 (0.5858) Acc@1 81.885 (87.291) Acc@5 96.533 (97.949) Mem 16721MB [2024-08-10 17:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.159) Loss 0.8433 (0.6879) Acc@1 79.248 (84.521) Acc@5 95.996 (96.996) Mem 16721MB [2024-08-10 17:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.213 Acc@5 96.987 [2024-08-10 17:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 17:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.21% [2024-08-10 17:03:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 17:03:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 17:03:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][0/625] eta 0:08:40 lr 0.000435 wd 0.0500 time 0.8324 (0.8324) data time 0.4127 (0.4127) model time 0.0000 (0.0000) loss 3.0624 (3.0624) grad_norm 3.2353 (3.2353) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][10/625] eta 0:05:15 lr 0.000435 wd 0.0500 time 0.4866 (0.5136) data time 0.0011 (0.0385) model time 0.0000 (0.0000) loss 2.8807 (2.5882) grad_norm 1.4365 (2.0375) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][20/625] eta 0:05:01 lr 0.000435 wd 0.0500 time 0.4782 (0.4988) data time 0.0011 (0.0207) model time 0.0000 (0.0000) loss 3.2217 (2.7112) grad_norm 1.9807 (1.9277) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][30/625] eta 0:04:53 lr 0.000435 wd 0.0500 time 0.4796 (0.4940) data time 0.0011 (0.0144) model time 0.0000 (0.0000) loss 3.3114 (2.7487) grad_norm 1.7908 (2.0361) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][40/625] eta 0:04:47 lr 0.000435 wd 0.0500 time 0.4903 (0.4919) data time 0.0008 (0.0111) model time 0.0000 (0.0000) loss 3.3237 (2.8240) grad_norm 2.1547 (1.9880) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][50/625] eta 0:04:41 lr 0.000435 wd 0.0500 time 0.4838 (0.4903) data time 0.0011 (0.0092) model time 0.0000 (0.0000) loss 2.0643 (2.7958) grad_norm 1.8841 (2.2198) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][60/625] eta 0:04:36 lr 0.000435 wd 0.0500 time 0.4851 (0.4891) data time 0.0008 (0.0078) model time 0.4843 (0.4818) loss 2.4824 (2.7995) 
grad_norm 2.7259 (2.2152) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][70/625] eta 0:04:30 lr 0.000434 wd 0.0500 time 0.4835 (0.4883) data time 0.0011 (0.0069) model time 0.4823 (0.4819) loss 2.0915 (2.8128) grad_norm 2.1562 (2.1587) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][80/625] eta 0:04:25 lr 0.000434 wd 0.0500 time 0.4843 (0.4874) data time 0.0009 (0.0062) model time 0.4834 (0.4813) loss 1.7815 (2.8152) grad_norm 2.1991 (2.1156) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-10 17:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][90/625] eta 0:04:20 lr 0.000434 wd 0.0500 time 0.4795 (0.4873) data time 0.0010 (0.0056) model time 0.4786 (0.4823) loss 2.5239 (2.8237) grad_norm 1.9794 (2.0906) loss_scale 256.0000 (137.8462) mem 16721MB [2024-08-10 17:04:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][100/625] eta 0:04:15 lr 0.000434 wd 0.0500 time 0.4806 (0.4866) data time 0.0011 (0.0052) model time 0.4795 (0.4817) loss 2.9918 (2.8370) grad_norm 2.1736 (2.0631) loss_scale 256.0000 (149.5446) mem 16721MB [2024-08-10 17:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][110/625] eta 0:04:10 lr 0.000434 wd 0.0500 time 0.4857 (0.4864) data time 0.0008 (0.0048) model time 0.4848 (0.4820) loss 3.4364 (2.8598) grad_norm 2.3458 (2.0880) loss_scale 256.0000 (159.1351) mem 16721MB [2024-08-10 17:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][120/625] eta 0:04:05 lr 0.000434 wd 0.0500 time 0.4839 (0.4864) data time 0.0012 (0.0045) model time 0.4827 (0.4824) loss 2.3206 (2.8449) grad_norm 2.0625 (2.0694) loss_scale 256.0000 (167.1405) mem 16721MB [2024-08-10 17:05:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][130/625] eta 0:04:00 lr 0.000434 wd 0.0500 time 0.4814 (0.4863) data 
time 0.0012 (0.0042) model time 0.4802 (0.4826) loss 2.4196 (2.8581) grad_norm 1.6869 (2.0784) loss_scale 256.0000 (173.9237) mem 16721MB [2024-08-10 17:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][140/625] eta 0:03:55 lr 0.000434 wd 0.0500 time 0.4831 (0.4862) data time 0.0011 (0.0040) model time 0.4820 (0.4827) loss 3.2191 (2.8372) grad_norm 1.4885 (2.0551) loss_scale 256.0000 (179.7447) mem 16721MB [2024-08-10 17:05:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][150/625] eta 0:03:50 lr 0.000434 wd 0.0500 time 0.4822 (0.4860) data time 0.0008 (0.0038) model time 0.4814 (0.4826) loss 3.0423 (2.8371) grad_norm 2.2474 (2.0841) loss_scale 256.0000 (184.7947) mem 16721MB [2024-08-10 17:05:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][160/625] eta 0:03:45 lr 0.000434 wd 0.0500 time 0.4823 (0.4858) data time 0.0007 (0.0036) model time 0.4816 (0.4826) loss 3.3548 (2.8271) grad_norm 1.5680 (2.0616) loss_scale 256.0000 (189.2174) mem 16721MB [2024-08-10 17:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][170/625] eta 0:03:40 lr 0.000433 wd 0.0500 time 0.4814 (0.4856) data time 0.0008 (0.0035) model time 0.4807 (0.4824) loss 2.2410 (2.8346) grad_norm 2.3594 (2.0494) loss_scale 256.0000 (193.1228) mem 16721MB [2024-08-10 17:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][180/625] eta 0:03:36 lr 0.000433 wd 0.0500 time 0.4921 (0.4855) data time 0.0014 (0.0034) model time 0.4908 (0.4825) loss 2.7243 (2.8328) grad_norm 1.7063 (2.0425) loss_scale 256.0000 (196.5967) mem 16721MB [2024-08-10 17:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][190/625] eta 0:03:31 lr 0.000433 wd 0.0500 time 0.4853 (0.4854) data time 0.0008 (0.0032) model time 0.4845 (0.4826) loss 2.1362 (2.8241) grad_norm 2.2135 (2.0490) loss_scale 256.0000 (199.7068) mem 16721MB [2024-08-10 17:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[186/300][200/625] eta 0:03:26 lr 0.000433 wd 0.0500 time 0.4834 (0.4853) data time 0.0008 (0.0031) model time 0.4826 (0.4825) loss 1.9040 (2.8189) grad_norm 2.4728 (2.0408) loss_scale 256.0000 (202.5075) mem 16721MB [2024-08-10 17:05:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][210/625] eta 0:03:22 lr 0.000433 wd 0.0500 time 0.4791 (0.4869) data time 0.0010 (0.0030) model time 0.4781 (0.4847) loss 1.7336 (2.8181) grad_norm 1.6692 (2.0394) loss_scale 256.0000 (205.0427) mem 16721MB [2024-08-10 17:05:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][220/625] eta 0:03:17 lr 0.000433 wd 0.0500 time 0.4896 (0.4867) data time 0.0008 (0.0029) model time 0.4888 (0.4845) loss 2.7027 (2.8218) grad_norm 3.1647 (2.0358) loss_scale 256.0000 (207.3484) mem 16721MB [2024-08-10 17:05:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][230/625] eta 0:03:12 lr 0.000433 wd 0.0500 time 0.4800 (0.4865) data time 0.0008 (0.0029) model time 0.4792 (0.4843) loss 2.2884 (2.8148) grad_norm 1.8096 (2.0206) loss_scale 256.0000 (209.4545) mem 16721MB [2024-08-10 17:05:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][240/625] eta 0:03:07 lr 0.000433 wd 0.0500 time 0.4784 (0.4863) data time 0.0008 (0.0028) model time 0.4776 (0.4842) loss 3.0226 (2.8203) grad_norm 6.7490 (2.0535) loss_scale 256.0000 (211.3859) mem 16721MB [2024-08-10 17:05:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][250/625] eta 0:03:02 lr 0.000433 wd 0.0500 time 0.4820 (0.4862) data time 0.0011 (0.0027) model time 0.4809 (0.4841) loss 3.1330 (2.8147) grad_norm 1.6778 (2.0577) loss_scale 256.0000 (213.1633) mem 16721MB [2024-08-10 17:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][260/625] eta 0:02:57 lr 0.000433 wd 0.0500 time 0.4877 (0.4862) data time 0.0010 (0.0027) model time 0.4867 (0.4841) loss 2.2867 (2.7983) grad_norm 3.1530 (2.0561) loss_scale 256.0000 (214.8046) mem 16721MB 
[2024-08-10 17:06:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][270/625] eta 0:02:52 lr 0.000432 wd 0.0500 time 0.4842 (0.4863) data time 0.0010 (0.0026) model time 0.4831 (0.4843) loss 2.0835 (2.7955) grad_norm 1.3345 (2.1251) loss_scale 256.0000 (216.3247) mem 16721MB [2024-08-10 17:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][280/625] eta 0:02:47 lr 0.000432 wd 0.0500 time 0.4842 (0.4863) data time 0.0008 (0.0025) model time 0.4834 (0.4843) loss 2.9717 (2.8005) grad_norm 2.1948 (2.1966) loss_scale 256.0000 (217.7367) mem 16721MB [2024-08-10 17:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][290/625] eta 0:02:42 lr 0.000432 wd 0.0500 time 0.4845 (0.4863) data time 0.0008 (0.0025) model time 0.4836 (0.4844) loss 3.1522 (2.7937) grad_norm 1.6917 (2.2011) loss_scale 256.0000 (219.0515) mem 16721MB [2024-08-10 17:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][300/625] eta 0:02:38 lr 0.000432 wd 0.0500 time 0.4891 (0.4863) data time 0.0010 (0.0024) model time 0.4881 (0.4844) loss 3.0337 (2.7996) grad_norm 1.6275 (2.1841) loss_scale 256.0000 (220.2791) mem 16721MB [2024-08-10 17:06:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][310/625] eta 0:02:33 lr 0.000432 wd 0.0500 time 0.4845 (0.4867) data time 0.0010 (0.0024) model time 0.4835 (0.4849) loss 3.3453 (2.7933) grad_norm 2.2827 (2.1984) loss_scale 256.0000 (221.4277) mem 16721MB [2024-08-10 17:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][320/625] eta 0:02:28 lr 0.000432 wd 0.0500 time 0.4864 (0.4867) data time 0.0011 (0.0024) model time 0.4853 (0.4850) loss 3.0978 (2.7968) grad_norm 2.4981 (2.1990) loss_scale 256.0000 (222.5047) mem 16721MB [2024-08-10 17:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][330/625] eta 0:02:23 lr 0.000432 wd 0.0500 time 0.4856 (0.4868) data time 0.0011 (0.0023) model time 0.4845 (0.4850) loss 2.8837 
(2.8023) grad_norm 1.5046 (2.1936) loss_scale 256.0000 (223.5166) mem 16721MB [2024-08-10 17:06:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][340/625] eta 0:02:18 lr 0.000432 wd 0.0500 time 0.4867 (0.4868) data time 0.0008 (0.0023) model time 0.4859 (0.4851) loss 3.5233 (2.8084) grad_norm 2.3024 (2.1998) loss_scale 256.0000 (224.4692) mem 16721MB [2024-08-10 17:06:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][350/625] eta 0:02:14 lr 0.000432 wd 0.0500 time 0.7193 (0.4874) data time 0.0011 (0.0023) model time 0.7182 (0.4859) loss 1.8458 (2.8004) grad_norm 2.0415 (2.2020) loss_scale 256.0000 (225.3675) mem 16721MB [2024-08-10 17:06:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][360/625] eta 0:02:09 lr 0.000431 wd 0.0500 time 0.4836 (0.4878) data time 0.0008 (0.0022) model time 0.4828 (0.4864) loss 2.6796 (2.8034) grad_norm 2.2461 (2.1981) loss_scale 256.0000 (226.2161) mem 16721MB [2024-08-10 17:06:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][370/625] eta 0:02:04 lr 0.000431 wd 0.0500 time 0.4814 (0.4878) data time 0.0011 (0.0022) model time 0.4803 (0.4863) loss 3.3213 (2.8079) grad_norm 1.5472 (2.1854) loss_scale 256.0000 (227.0189) mem 16721MB [2024-08-10 17:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][380/625] eta 0:01:59 lr 0.000431 wd 0.0500 time 0.4828 (0.4877) data time 0.0008 (0.0022) model time 0.4820 (0.4863) loss 1.8367 (2.8038) grad_norm 1.8350 (2.1938) loss_scale 256.0000 (227.7795) mem 16721MB [2024-08-10 17:07:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][390/625] eta 0:01:54 lr 0.000431 wd 0.0500 time 0.4818 (0.4876) data time 0.0010 (0.0021) model time 0.4808 (0.4862) loss 2.9172 (2.8059) grad_norm 1.8143 (2.1852) loss_scale 256.0000 (228.5013) mem 16721MB [2024-08-10 17:07:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][400/625] eta 0:01:49 lr 0.000431 wd 0.0500 time 0.4810 
(0.4876) data time 0.0008 (0.0021) model time 0.4802 (0.4861) loss 3.1658 (2.8068) grad_norm 2.8266 (2.1779) loss_scale 256.0000 (229.1870) mem 16721MB [2024-08-10 17:07:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][410/625] eta 0:01:44 lr 0.000431 wd 0.0500 time 0.4831 (0.4876) data time 0.0010 (0.0021) model time 0.4821 (0.4861) loss 3.0104 (2.8082) grad_norm 6.4265 (2.1876) loss_scale 256.0000 (229.8394) mem 16721MB [2024-08-10 17:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][420/625] eta 0:01:39 lr 0.000431 wd 0.0500 time 0.4867 (0.4876) data time 0.0008 (0.0021) model time 0.4859 (0.4861) loss 2.2613 (2.8105) grad_norm 2.4695 (2.1911) loss_scale 256.0000 (230.4608) mem 16721MB [2024-08-10 17:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][430/625] eta 0:01:35 lr 0.000431 wd 0.0500 time 0.4870 (0.4876) data time 0.0008 (0.0020) model time 0.4862 (0.4861) loss 3.2715 (2.8085) grad_norm 1.3652 (2.1820) loss_scale 256.0000 (231.0534) mem 16721MB [2024-08-10 17:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][440/625] eta 0:01:30 lr 0.000431 wd 0.0500 time 0.4772 (0.4875) data time 0.0008 (0.0020) model time 0.4764 (0.4861) loss 2.2972 (2.8089) grad_norm 1.9354 (2.1809) loss_scale 256.0000 (231.6190) mem 16721MB [2024-08-10 17:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][450/625] eta 0:01:25 lr 0.000431 wd 0.0500 time 0.4792 (0.4875) data time 0.0010 (0.0020) model time 0.4782 (0.4861) loss 2.8447 (2.8121) grad_norm 2.0473 (2.1808) loss_scale 256.0000 (232.1596) mem 16721MB [2024-08-10 17:07:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][460/625] eta 0:01:20 lr 0.000430 wd 0.0500 time 0.4872 (0.4878) data time 0.0010 (0.0020) model time 0.4862 (0.4864) loss 3.2207 (2.8110) grad_norm 1.9511 (2.1729) loss_scale 256.0000 (232.6768) mem 16721MB [2024-08-10 17:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [186/300][470/625] eta 0:01:15 lr 0.000430 wd 0.0500 time 0.4882 (0.4877) data time 0.0011 (0.0019) model time 0.4871 (0.4863) loss 2.8757 (2.8086) grad_norm 1.3334 (2.1647) loss_scale 256.0000 (233.1720) mem 16721MB [2024-08-10 17:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][480/625] eta 0:01:10 lr 0.000430 wd 0.0500 time 0.4859 (0.4877) data time 0.0011 (0.0019) model time 0.4848 (0.4863) loss 3.4290 (2.8062) grad_norm 2.5350 (2.1703) loss_scale 256.0000 (233.6466) mem 16721MB [2024-08-10 17:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][490/625] eta 0:01:05 lr 0.000430 wd 0.0500 time 0.4862 (0.4876) data time 0.0011 (0.0019) model time 0.4851 (0.4862) loss 3.3581 (2.8045) grad_norm 2.2832 (2.1708) loss_scale 256.0000 (234.1018) mem 16721MB [2024-08-10 17:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][500/625] eta 0:01:00 lr 0.000430 wd 0.0500 time 0.4871 (0.4876) data time 0.0011 (0.0019) model time 0.4860 (0.4862) loss 1.7875 (2.8043) grad_norm 1.8139 (2.1609) loss_scale 256.0000 (234.5389) mem 16721MB [2024-08-10 17:08:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][510/625] eta 0:00:56 lr 0.000430 wd 0.0500 time 0.4855 (0.4875) data time 0.0012 (0.0019) model time 0.4843 (0.4861) loss 3.3504 (2.8047) grad_norm 1.3659 (2.1505) loss_scale 256.0000 (234.9589) mem 16721MB [2024-08-10 17:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][520/625] eta 0:00:51 lr 0.000430 wd 0.0500 time 0.4820 (0.4874) data time 0.0008 (0.0019) model time 0.4812 (0.4860) loss 3.0757 (2.8070) grad_norm 1.6333 (2.1376) loss_scale 256.0000 (235.3628) mem 16721MB [2024-08-10 17:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][530/625] eta 0:00:46 lr 0.000430 wd 0.0500 time 0.4806 (0.4874) data time 0.0009 (0.0019) model time 0.4797 (0.4860) loss 3.2068 (2.8088) grad_norm 2.7041 (2.1501) loss_scale 256.0000 (235.7514) mem 16721MB 
[2024-08-10 17:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][540/625] eta 0:00:41 lr 0.000430 wd 0.0500 time 0.4902 (0.4873) data time 0.0011 (0.0018) model time 0.4892 (0.4859) loss 2.6319 (2.8015) grad_norm 1.7379 (2.1481) loss_scale 256.0000 (236.1257) mem 16721MB [2024-08-10 17:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][550/625] eta 0:00:36 lr 0.000430 wd 0.0500 time 0.4878 (0.4873) data time 0.0008 (0.0018) model time 0.4870 (0.4859) loss 3.1032 (2.8033) grad_norm 1.4585 (2.1466) loss_scale 256.0000 (236.4864) mem 16721MB [2024-08-10 17:08:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][560/625] eta 0:00:31 lr 0.000429 wd 0.0500 time 0.4856 (0.4873) data time 0.0011 (0.0018) model time 0.4844 (0.4859) loss 2.8599 (2.8027) grad_norm 1.8510 (2.1579) loss_scale 256.0000 (236.8342) mem 16721MB [2024-08-10 17:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][570/625] eta 0:00:26 lr 0.000429 wd 0.0500 time 0.4893 (0.4873) data time 0.0011 (0.0018) model time 0.4882 (0.4859) loss 2.5148 (2.8044) grad_norm 1.6839 (2.1532) loss_scale 256.0000 (237.1699) mem 16721MB [2024-08-10 17:08:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][580/625] eta 0:00:21 lr 0.000429 wd 0.0500 time 0.4881 (0.4872) data time 0.0010 (0.0018) model time 0.4871 (0.4859) loss 2.8188 (2.8046) grad_norm 2.4536 (2.1509) loss_scale 256.0000 (237.4940) mem 16721MB [2024-08-10 17:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][590/625] eta 0:00:17 lr 0.000429 wd 0.0500 time 0.4789 (0.4872) data time 0.0011 (0.0018) model time 0.4779 (0.4858) loss 2.2299 (2.8009) grad_norm 3.8325 (2.1609) loss_scale 256.0000 (237.8071) mem 16721MB [2024-08-10 17:08:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][600/625] eta 0:00:12 lr 0.000429 wd 0.0500 time 0.4887 (0.4871) data time 0.0008 (0.0018) model time 0.4879 (0.4857) loss 2.2995 
(2.8032) grad_norm 2.1320 (2.1634) loss_scale 256.0000 (238.1098) mem 16721MB [2024-08-10 17:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][610/625] eta 0:00:07 lr 0.000429 wd 0.0500 time 0.4795 (0.4870) data time 0.0005 (0.0018) model time 0.4789 (0.4857) loss 3.5495 (2.8069) grad_norm 1.7440 (2.1598) loss_scale 256.0000 (238.4026) mem 16721MB [2024-08-10 17:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [186/300][620/625] eta 0:00:02 lr 0.000429 wd 0.0500 time 0.4804 (0.4870) data time 0.0008 (0.0017) model time 0.4796 (0.4856) loss 2.0965 (2.8063) grad_norm 3.2840 (2.1615) loss_scale 256.0000 (238.6860) mem 16721MB [2024-08-10 17:09:00 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 186 training takes 0:05:04 [2024-08-10 17:09:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:09:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 17:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.498 (0.498) Loss 0.5361 (0.5361) Acc@1 88.037 (88.037) Acc@5 98.584 (98.584) Mem 16721MB
[2024-08-10 17:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.121 (0.160) Loss 0.8145 (0.6343) Acc@1 80.615 (86.191) Acc@5 95.898 (97.687) Mem 16721MB
[2024-08-10 17:09:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.140) Loss 0.9282 (0.7491) Acc@1 77.979 (83.238) Acc@5 95.312 (96.505) Mem 16721MB
[2024-08-10 17:09:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.903 Acc@5 96.509
[2024-08-10 17:09:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-10 17:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.908 (0.908) Loss 0.4741 (0.4741) Acc@1 89.355 (89.355) Acc@5 98.682 (98.682) Mem 16721MB
[2024-08-10 17:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.198) Loss 0.7559 (0.5858) Acc@1 81.982 (87.318) Acc@5 96.484 (97.963) Mem 16721MB
[2024-08-10 17:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.160) Loss 0.8423 (0.6878) Acc@1 79.395 (84.556) Acc@5 95.996 (97.015) Mem 16721MB
[2024-08-10 17:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.241 Acc@5 97.007
[2024-08-10 17:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2%
[2024-08-10 17:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.24%
[2024-08-10 17:09:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 17:09:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 17:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][0/625] eta 0:08:49 lr 0.000429 wd 0.0500 time 0.8470 (0.8470) data time 0.4307 (0.4307) model time 0.0000 (0.0000) loss 2.7738 (2.7738) grad_norm 1.9070 (1.9070) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][10/625] eta 0:05:19 lr 0.000429 wd 0.0500 time 0.4883 (0.5201) data time 0.0010 (0.0401) model time 0.0000 (0.0000) loss 3.3030 (3.0292) grad_norm 1.4450 (1.7641) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][20/625] eta 0:05:04 lr 0.000429 wd 0.0500 time 0.4863 (0.5039) data time 0.0008 (0.0215) model time 0.0000 (0.0000) loss 2.3216 (2.9160) grad_norm 1.9163 (2.8272) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][30/625] eta 0:05:00 lr 0.000428 wd 0.0500 time 0.4904 (0.5045) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 2.6040 (2.8858) grad_norm 1.5865 (2.6850) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][40/625] eta 0:04:54 lr 0.000428 wd 0.0500 time 0.4861 (0.5039) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 2.5103 (2.8846) grad_norm 1.6483 (2.5421) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][50/625] eta 0:04:47 lr 0.000428 wd 0.0500 time 0.4881 (0.5000) data time 0.0011 (0.0094) model time 0.0000 (0.0000) loss 2.9281 (2.8496) grad_norm 5.3716 (2.9897) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][60/625] eta 0:04:41 lr 0.000428 wd 0.0500 time 0.4858 (0.4975) data time 0.0008 (0.0081) model time 0.4850 (0.4840) loss 2.7946 (2.8195) 
grad_norm 1.7247 (2.9972) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][70/625] eta 0:04:35 lr 0.000428 wd 0.0500 time 0.4856 (0.4959) data time 0.0008 (0.0071) model time 0.4847 (0.4847) loss 3.0645 (2.8541) grad_norm 1.8762 (2.8650) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][80/625] eta 0:04:29 lr 0.000428 wd 0.0500 time 0.4835 (0.4947) data time 0.0008 (0.0063) model time 0.4827 (0.4846) loss 3.1486 (2.8472) grad_norm 2.8771 (2.8762) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][90/625] eta 0:04:24 lr 0.000428 wd 0.0500 time 0.4874 (0.4936) data time 0.0009 (0.0057) model time 0.4865 (0.4844) loss 3.3730 (2.8565) grad_norm 1.5539 (2.8637) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][100/625] eta 0:04:18 lr 0.000428 wd 0.0500 time 0.4798 (0.4925) data time 0.0009 (0.0053) model time 0.4790 (0.4839) loss 2.0366 (2.8405) grad_norm 2.4875 (2.9333) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][110/625] eta 0:04:13 lr 0.000428 wd 0.0500 time 0.4871 (0.4918) data time 0.0009 (0.0049) model time 0.4862 (0.4838) loss 2.0492 (2.8239) grad_norm 1.8448 (2.8269) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][120/625] eta 0:04:08 lr 0.000428 wd 0.0500 time 0.4802 (0.4911) data time 0.0008 (0.0046) model time 0.4794 (0.4836) loss 3.0360 (2.8210) grad_norm 1.3689 (2.7511) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][130/625] eta 0:04:02 lr 0.000427 wd 0.0500 time 0.4821 (0.4905) data 
time 0.0009 (0.0043) model time 0.4813 (0.4835) loss 3.3410 (2.8208) grad_norm 1.3797 (2.6673) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][140/625] eta 0:03:57 lr 0.000427 wd 0.0500 time 0.4886 (0.4902) data time 0.0008 (0.0041) model time 0.4877 (0.4836) loss 3.5040 (2.8304) grad_norm 1.7690 (2.6872) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][150/625] eta 0:03:52 lr 0.000427 wd 0.0500 time 0.4841 (0.4898) data time 0.0008 (0.0039) model time 0.4833 (0.4835) loss 3.0517 (2.8312) grad_norm 3.2540 (2.6492) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][160/625] eta 0:03:47 lr 0.000427 wd 0.0500 time 0.4803 (0.4894) data time 0.0011 (0.0037) model time 0.4792 (0.4835) loss 3.2271 (2.8373) grad_norm 1.5249 (2.6069) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][170/625] eta 0:03:42 lr 0.000427 wd 0.0500 time 0.4854 (0.4892) data time 0.0008 (0.0035) model time 0.4846 (0.4836) loss 2.1950 (2.8321) grad_norm 1.5883 (2.5699) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][180/625] eta 0:03:37 lr 0.000427 wd 0.0500 time 0.4828 (0.4889) data time 0.0009 (0.0034) model time 0.4819 (0.4835) loss 3.5188 (2.8346) grad_norm 1.5893 (2.5260) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][190/625] eta 0:03:32 lr 0.000427 wd 0.0500 time 0.4861 (0.4888) data time 0.0010 (0.0033) model time 0.4852 (0.4836) loss 3.2574 (2.8288) grad_norm 1.3431 (2.5062) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[187/300][200/625] eta 0:03:27 lr 0.000427 wd 0.0500 time 0.4842 (0.4885) data time 0.0009 (0.0032) model time 0.4833 (0.4836) loss 2.5921 (2.8241) grad_norm 1.8467 (2.4789) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][210/625] eta 0:03:22 lr 0.000427 wd 0.0500 time 0.4862 (0.4884) data time 0.0011 (0.0031) model time 0.4851 (0.4837) loss 2.5684 (2.8160) grad_norm 2.2358 (2.4488) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][220/625] eta 0:03:17 lr 0.000427 wd 0.0500 time 0.4749 (0.4883) data time 0.0014 (0.0030) model time 0.4735 (0.4838) loss 2.8415 (2.8153) grad_norm 1.6703 (2.4214) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][230/625] eta 0:03:12 lr 0.000426 wd 0.0500 time 0.4884 (0.4883) data time 0.0009 (0.0029) model time 0.4875 (0.4839) loss 3.2877 (2.8208) grad_norm 1.2229 (2.4063) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][240/625] eta 0:03:07 lr 0.000426 wd 0.0500 time 0.4816 (0.4882) data time 0.0008 (0.0028) model time 0.4808 (0.4839) loss 3.1702 (2.8157) grad_norm 1.9244 (2.3732) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][250/625] eta 0:03:03 lr 0.000426 wd 0.0500 time 0.4830 (0.4880) data time 0.0009 (0.0028) model time 0.4821 (0.4839) loss 1.9698 (2.8074) grad_norm 1.3012 (2.3597) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][260/625] eta 0:02:58 lr 0.000426 wd 0.0500 time 0.4859 (0.4879) data time 0.0010 (0.0027) model time 0.4849 (0.4839) loss 3.1554 (2.8063) grad_norm 1.7857 (2.3341) loss_scale 256.0000 (256.0000) mem 16721MB 
[2024-08-10 17:11:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][270/625] eta 0:02:53 lr 0.000426 wd 0.0500 time 0.4862 (0.4878) data time 0.0011 (0.0026) model time 0.4851 (0.4839) loss 2.0068 (2.7999) grad_norm 1.7959 (2.3202) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][280/625] eta 0:02:48 lr 0.000426 wd 0.0500 time 0.4865 (0.4877) data time 0.0012 (0.0026) model time 0.4853 (0.4839) loss 3.0939 (2.8061) grad_norm 1.7615 (2.3171) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][290/625] eta 0:02:43 lr 0.000426 wd 0.0500 time 0.4896 (0.4877) data time 0.0009 (0.0025) model time 0.4887 (0.4840) loss 1.7323 (2.7997) grad_norm 2.0583 (2.2983) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][300/625] eta 0:02:38 lr 0.000426 wd 0.0500 time 0.4837 (0.4876) data time 0.0011 (0.0025) model time 0.4826 (0.4840) loss 2.9093 (2.7987) grad_norm 1.6887 (2.2738) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][310/625] eta 0:02:33 lr 0.000426 wd 0.0500 time 0.4813 (0.4875) data time 0.0009 (0.0024) model time 0.4804 (0.4840) loss 3.2153 (2.7948) grad_norm 1.6901 (2.2691) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][320/625] eta 0:02:28 lr 0.000426 wd 0.0500 time 0.4805 (0.4874) data time 0.0009 (0.0024) model time 0.4796 (0.4839) loss 1.7111 (2.7851) grad_norm 1.8239 (2.2521) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][330/625] eta 0:02:23 lr 0.000425 wd 0.0500 time 0.4851 (0.4873) data time 0.0011 (0.0023) model time 0.4840 (0.4839) loss 2.9125 
(2.7874) grad_norm 1.9831 (2.2436) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:11:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][340/625] eta 0:02:18 lr 0.000425 wd 0.0500 time 0.4822 (0.4872) data time 0.0010 (0.0023) model time 0.4811 (0.4839) loss 3.1620 (2.7925) grad_norm 1.8774 (2.2475) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][350/625] eta 0:02:13 lr 0.000425 wd 0.0500 time 0.4839 (0.4871) data time 0.0008 (0.0023) model time 0.4831 (0.4839) loss 2.9640 (2.7917) grad_norm 2.4023 (2.2385) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][360/625] eta 0:02:09 lr 0.000425 wd 0.0500 time 0.4884 (0.4871) data time 0.0009 (0.0022) model time 0.4875 (0.4839) loss 3.3524 (2.7971) grad_norm 2.0384 (2.2299) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][370/625] eta 0:02:04 lr 0.000425 wd 0.0500 time 0.4844 (0.4881) data time 0.0012 (0.0022) model time 0.4832 (0.4851) loss 3.3811 (2.7937) grad_norm 2.8612 (2.2209) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][380/625] eta 0:01:59 lr 0.000425 wd 0.0500 time 0.4864 (0.4881) data time 0.0009 (0.0022) model time 0.4855 (0.4852) loss 3.0384 (2.7935) grad_norm 1.6025 (2.2198) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][390/625] eta 0:01:54 lr 0.000425 wd 0.0500 time 0.4854 (0.4880) data time 0.0011 (0.0022) model time 0.4843 (0.4852) loss 2.8623 (2.7947) grad_norm 2.0143 (2.2220) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][400/625] eta 0:01:49 lr 0.000425 wd 0.0500 time 0.4832 
(0.4880) data time 0.0008 (0.0021) model time 0.4825 (0.4852) loss 2.8834 (2.7974) grad_norm 1.5235 (2.2133) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][410/625] eta 0:01:44 lr 0.000425 wd 0.0500 time 0.4863 (0.4879) data time 0.0011 (0.0021) model time 0.4852 (0.4851) loss 2.8238 (2.8029) grad_norm 1.5333 (2.1985) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][420/625] eta 0:01:40 lr 0.000425 wd 0.0500 time 0.4914 (0.4879) data time 0.0011 (0.0021) model time 0.4904 (0.4851) loss 2.5739 (2.8014) grad_norm 1.4112 (2.1844) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][430/625] eta 0:01:35 lr 0.000424 wd 0.0500 time 0.4967 (0.4879) data time 0.0008 (0.0021) model time 0.4959 (0.4852) loss 2.8684 (2.8030) grad_norm 2.0517 (2.1713) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][440/625] eta 0:01:30 lr 0.000424 wd 0.0500 time 0.4860 (0.4879) data time 0.0011 (0.0020) model time 0.4849 (0.4852) loss 2.9124 (2.8025) grad_norm 1.9303 (2.1643) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][450/625] eta 0:01:25 lr 0.000424 wd 0.0500 time 0.4836 (0.4878) data time 0.0008 (0.0020) model time 0.4828 (0.4852) loss 1.7194 (2.8011) grad_norm 2.1943 (2.1626) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][460/625] eta 0:01:20 lr 0.000424 wd 0.0500 time 0.4865 (0.4878) data time 0.0011 (0.0020) model time 0.4854 (0.4852) loss 2.3746 (2.8031) grad_norm 1.3772 (2.1489) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [187/300][470/625] eta 0:01:15 lr 0.000424 wd 0.0500 time 0.4797 (0.4877) data time 0.0012 (0.0020) model time 0.4786 (0.4852) loss 1.9169 (2.8032) grad_norm 1.8028 (2.1410) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][480/625] eta 0:01:10 lr 0.000424 wd 0.0500 time 0.4819 (0.4877) data time 0.0011 (0.0020) model time 0.4808 (0.4852) loss 2.6259 (2.8023) grad_norm 2.1950 (2.1332) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][490/625] eta 0:01:05 lr 0.000424 wd 0.0500 time 0.4910 (0.4877) data time 0.0012 (0.0019) model time 0.4898 (0.4852) loss 2.9988 (2.8047) grad_norm 1.7353 (2.1272) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][500/625] eta 0:01:00 lr 0.000424 wd 0.0500 time 0.4852 (0.4876) data time 0.0011 (0.0019) model time 0.4841 (0.4852) loss 2.7459 (2.8043) grad_norm 1.8227 (2.1241) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][510/625] eta 0:00:56 lr 0.000424 wd 0.0500 time 0.4813 (0.4876) data time 0.0011 (0.0019) model time 0.4802 (0.4851) loss 2.7633 (2.8038) grad_norm 1.3746 (2.1234) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][520/625] eta 0:00:51 lr 0.000424 wd 0.0500 time 0.4835 (0.4875) data time 0.0011 (0.0019) model time 0.4824 (0.4851) loss 3.0953 (2.8045) grad_norm 1.7874 (2.1294) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][530/625] eta 0:00:46 lr 0.000423 wd 0.0500 time 0.4842 (0.4875) data time 0.0008 (0.0019) model time 0.4834 (0.4851) loss 2.9783 (2.8035) grad_norm 2.3251 (2.1401) loss_scale 256.0000 (256.0000) mem 16721MB 
[2024-08-10 17:13:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][540/625] eta 0:00:41 lr 0.000423 wd 0.0500 time 0.4786 (0.4874) data time 0.0009 (0.0019) model time 0.4777 (0.4850) loss 2.7396 (2.8046) grad_norm 2.2170 (2.1327) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][550/625] eta 0:00:36 lr 0.000423 wd 0.0500 time 0.4820 (0.4873) data time 0.0011 (0.0018) model time 0.4809 (0.4849) loss 3.0927 (2.8053) grad_norm 1.8074 (2.1320) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][560/625] eta 0:00:31 lr 0.000423 wd 0.0500 time 0.4865 (0.4876) data time 0.0007 (0.0018) model time 0.4858 (0.4853) loss 2.7159 (2.8052) grad_norm 2.1862 (2.1270) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][570/625] eta 0:00:26 lr 0.000423 wd 0.0500 time 0.4854 (0.4876) data time 0.0012 (0.0018) model time 0.4843 (0.4853) loss 3.4547 (2.8079) grad_norm 6.1771 (2.1271) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][580/625] eta 0:00:21 lr 0.000423 wd 0.0500 time 0.4840 (0.4876) data time 0.0009 (0.0018) model time 0.4831 (0.4853) loss 2.6764 (2.8080) grad_norm 2.2188 (2.1349) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:13:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][590/625] eta 0:00:17 lr 0.000423 wd 0.0500 time 0.4868 (0.4876) data time 0.0009 (0.0018) model time 0.4859 (0.4853) loss 3.0002 (2.8044) grad_norm 1.9276 (2.1364) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][600/625] eta 0:00:12 lr 0.000423 wd 0.0500 time 0.4879 (0.4876) data time 0.0010 (0.0018) model time 0.4869 (0.4853) loss 3.3086 
(2.8067) grad_norm 2.1269 (2.1364) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][610/625] eta 0:00:07 lr 0.000423 wd 0.0500 time 0.4815 (0.4875) data time 0.0008 (0.0018) model time 0.4808 (0.4853) loss 3.0471 (2.8046) grad_norm 1.8805 (2.1281) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [187/300][620/625] eta 0:00:02 lr 0.000422 wd 0.0500 time 0.4813 (0.4874) data time 0.0008 (0.0018) model time 0.4805 (0.4853) loss 3.3186 (2.8031) grad_norm 2.5246 (2.1345) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 187 training takes 0:05:04 [2024-08-10 17:14:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:14:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 17:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.5400 (0.5400) Acc@1 87.451 (87.451) Acc@5 98.779 (98.779) Mem 16721MB
[2024-08-10 17:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.8271 (0.6294) Acc@1 80.420 (85.986) Acc@5 95.752 (97.647) Mem 16721MB
[2024-08-10 17:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9160 (0.7458) Acc@1 77.930 (83.131) Acc@5 95.215 (96.487) Mem 16721MB
[2024-08-10 17:14:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.853 Acc@5 96.451
[2024-08-10 17:14:21 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 82.9%
[2024-08-10 17:14:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.844 (0.844) Loss 0.4734 (0.4734) Acc@1 89.453 (89.453) Acc@5 98.682 (98.682) Mem 16721MB
[2024-08-10 17:14:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.194) Loss 0.7559 (0.5854) Acc@1 81.787 (87.296) Acc@5 96.533 (97.954) Mem 16721MB
[2024-08-10 17:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.8418 (0.6875) Acc@1 79.834 (84.566) Acc@5 96.045 (97.005) Mem 16721MB
[2024-08-10 17:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.251 Acc@5 96.995
[2024-08-10 17:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-08-10 17:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.25%
[2024-08-10 17:14:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 17:14:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 17:14:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][0/625] eta 0:08:32 lr 0.000422 wd 0.0500 time 0.8198 (0.8198) data time 0.4007 (0.4007) model time 0.0000 (0.0000) loss 3.3153 (3.3153) grad_norm 1.8723 (1.8723) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][10/625] eta 0:05:18 lr 0.000422 wd 0.0500 time 0.4872 (0.5181) data time 0.0009 (0.0374) model time 0.0000 (0.0000) loss 2.9871 (2.6215) grad_norm 2.2950 (4.2618) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][20/625] eta 0:05:04 lr 0.000422 wd 0.0500 time 0.4863 (0.5031) data time 0.0010 (0.0201) model time 0.0000 (0.0000) loss 2.8416 (2.6849) grad_norm 1.4923 (3.0673) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][30/625] eta 0:04:55 lr 0.000422 wd 0.0500 time 0.4836 (0.4973) data time 0.0009 (0.0139) model time 0.0000 (0.0000) loss 2.7878 (2.6527) grad_norm 1.5550 (2.7639) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][40/625] eta 0:04:49 lr 0.000422 wd 0.0500 time 0.4955 (0.4944) data time 0.0010 (0.0108) model time 0.0000 (0.0000) loss 2.1885 (2.6567) grad_norm 1.7992 (2.5004) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][50/625] eta 0:04:42 lr 0.000422 wd 0.0500 time 0.4801 (0.4920) data time 0.0008 (0.0089) model time 0.0000 (0.0000) loss 2.4894 (2.6503) grad_norm 2.1780 (2.4662) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][60/625] eta 0:04:37 lr 0.000422 wd 0.0500 time 0.4874 (0.4911) data time 0.0012 (0.0076) model time 0.4862 (0.4852) loss 2.5894 (2.6819) 
grad_norm 1.7186 (2.5082) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][70/625] eta 0:04:31 lr 0.000422 wd 0.0500 time 0.4838 (0.4899) data time 0.0011 (0.0067) model time 0.4827 (0.4837) loss 2.0669 (2.6974) grad_norm 1.7127 (2.5322) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][80/625] eta 0:04:26 lr 0.000422 wd 0.0500 time 0.4831 (0.4892) data time 0.0012 (0.0060) model time 0.4819 (0.4834) loss 2.9259 (2.6961) grad_norm 2.3337 (2.5296) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][90/625] eta 0:04:22 lr 0.000422 wd 0.0500 time 0.4877 (0.4904) data time 0.0009 (0.0054) model time 0.4868 (0.4874) loss 2.5926 (2.6916) grad_norm 1.7238 (2.4543) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][100/625] eta 0:04:17 lr 0.000421 wd 0.0500 time 0.4944 (0.4900) data time 0.0009 (0.0050) model time 0.4935 (0.4869) loss 2.1085 (2.6820) grad_norm 2.2938 (2.3725) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][110/625] eta 0:04:12 lr 0.000421 wd 0.0500 time 0.4839 (0.4897) data time 0.0012 (0.0047) model time 0.4827 (0.4867) loss 2.8263 (2.6944) grad_norm 1.5418 (2.2982) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][120/625] eta 0:04:07 lr 0.000421 wd 0.0500 time 0.4820 (0.4891) data time 0.0009 (0.0044) model time 0.4811 (0.4860) loss 2.8358 (2.7242) grad_norm 2.6726 (2.3078) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][130/625] eta 0:04:02 lr 0.000421 wd 0.0500 time 0.4799 (0.4902) data 
time 0.0011 (0.0041) model time 0.4788 (0.4880) loss 2.7075 (2.7262) grad_norm 2.1055 (2.2966) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][140/625] eta 0:03:57 lr 0.000421 wd 0.0500 time 0.4789 (0.4896) data time 0.0012 (0.0039) model time 0.4777 (0.4872) loss 2.1548 (2.7360) grad_norm 3.5931 (2.2783) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][150/625] eta 0:03:52 lr 0.000421 wd 0.0500 time 0.4846 (0.4892) data time 0.0008 (0.0037) model time 0.4838 (0.4867) loss 3.8856 (2.7412) grad_norm 2.4651 (2.2721) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][160/625] eta 0:03:47 lr 0.000421 wd 0.0500 time 0.4869 (0.4891) data time 0.0010 (0.0035) model time 0.4859 (0.4866) loss 2.1039 (2.7309) grad_norm 2.7517 (2.2957) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][170/625] eta 0:03:42 lr 0.000421 wd 0.0500 time 0.4839 (0.4889) data time 0.0011 (0.0034) model time 0.4828 (0.4865) loss 3.1015 (2.7472) grad_norm 1.7313 (2.3085) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][180/625] eta 0:03:37 lr 0.000421 wd 0.0500 time 0.4895 (0.4888) data time 0.0011 (0.0033) model time 0.4884 (0.4865) loss 2.5106 (2.7518) grad_norm 1.5279 (2.3180) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:15:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][190/625] eta 0:03:32 lr 0.000421 wd 0.0500 time 0.4809 (0.4886) data time 0.0010 (0.0032) model time 0.4799 (0.4862) loss 2.5955 (2.7644) grad_norm 1.3003 (2.2920) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[188/300][200/625] eta 0:03:27 lr 0.000420 wd 0.0500 time 0.4835 (0.4884) data time 0.0010 (0.0031) model time 0.4825 (0.4861) loss 2.6835 (2.7738) grad_norm 1.4507 (2.2528) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][210/625] eta 0:03:22 lr 0.000420 wd 0.0500 time 0.4886 (0.4883) data time 0.0008 (0.0030) model time 0.4878 (0.4860) loss 3.2578 (2.7794) grad_norm 1.9671 (2.2742) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][220/625] eta 0:03:17 lr 0.000420 wd 0.0500 time 0.4816 (0.4880) data time 0.0012 (0.0029) model time 0.4804 (0.4858) loss 3.0979 (2.7628) grad_norm 1.4643 (2.2590) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][230/625] eta 0:03:12 lr 0.000420 wd 0.0500 time 0.4838 (0.4879) data time 0.0008 (0.0028) model time 0.4830 (0.4857) loss 2.4363 (2.7559) grad_norm 2.1056 (2.2356) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][240/625] eta 0:03:07 lr 0.000420 wd 0.0500 time 0.4843 (0.4879) data time 0.0011 (0.0027) model time 0.4832 (0.4857) loss 3.1289 (2.7492) grad_norm 1.8486 (2.2117) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][250/625] eta 0:03:02 lr 0.000420 wd 0.0500 time 0.4915 (0.4879) data time 0.0008 (0.0027) model time 0.4907 (0.4857) loss 1.6862 (2.7442) grad_norm 2.0510 (2.1970) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][260/625] eta 0:02:58 lr 0.000420 wd 0.0500 time 0.4884 (0.4879) data time 0.0012 (0.0026) model time 0.4872 (0.4858) loss 2.7736 (2.7461) grad_norm 1.8272 (2.1884) loss_scale 256.0000 (256.0000) mem 16721MB 
[2024-08-10 17:16:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][270/625] eta 0:02:53 lr 0.000420 wd 0.0500 time 0.4805 (0.4877) data time 0.0008 (0.0025) model time 0.4797 (0.4856) loss 3.2278 (2.7470) grad_norm 4.3494 (2.2017) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][280/625] eta 0:02:48 lr 0.000420 wd 0.0500 time 0.4867 (0.4876) data time 0.0011 (0.0025) model time 0.4856 (0.4855) loss 2.9806 (2.7510) grad_norm 1.7783 (2.2080) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][290/625] eta 0:02:43 lr 0.000420 wd 0.0500 time 0.4856 (0.4874) data time 0.0010 (0.0024) model time 0.4846 (0.4854) loss 2.6674 (2.7511) grad_norm 1.7789 (2.1995) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][300/625] eta 0:02:38 lr 0.000419 wd 0.0500 time 0.4810 (0.4873) data time 0.0008 (0.0024) model time 0.4802 (0.4853) loss 2.6858 (2.7479) grad_norm 1.3893 (2.1772) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][310/625] eta 0:02:33 lr 0.000419 wd 0.0500 time 0.4860 (0.4878) data time 0.0013 (0.0023) model time 0.4848 (0.4860) loss 2.5271 (2.7428) grad_norm 2.0105 (2.1770) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][320/625] eta 0:02:28 lr 0.000419 wd 0.0500 time 0.4828 (0.4878) data time 0.0008 (0.0023) model time 0.4819 (0.4859) loss 3.2267 (2.7530) grad_norm 1.5764 (2.1709) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][330/625] eta 0:02:23 lr 0.000419 wd 0.0500 time 0.4839 (0.4877) data time 0.0012 (0.0023) model time 0.4827 (0.4859) loss 3.3274 
(2.7600) grad_norm 1.9789 (2.1581) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][340/625] eta 0:02:18 lr 0.000419 wd 0.0500 time 0.4799 (0.4876) data time 0.0010 (0.0022) model time 0.4789 (0.4857) loss 2.0800 (2.7568) grad_norm 1.4102 (2.1822) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][350/625] eta 0:02:14 lr 0.000419 wd 0.0500 time 0.4788 (0.4884) data time 0.0010 (0.0022) model time 0.4778 (0.4867) loss 3.0987 (2.7552) grad_norm 3.6742 (2.1890) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][360/625] eta 0:02:09 lr 0.000419 wd 0.0500 time 0.4841 (0.4883) data time 0.0011 (0.0022) model time 0.4830 (0.4866) loss 2.7287 (2.7587) grad_norm 2.4930 (2.1820) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][370/625] eta 0:02:04 lr 0.000419 wd 0.0500 time 0.4815 (0.4881) data time 0.0010 (0.0021) model time 0.4805 (0.4864) loss 2.1463 (2.7621) grad_norm 2.2634 (2.1880) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][380/625] eta 0:01:59 lr 0.000419 wd 0.0500 time 0.4891 (0.4880) data time 0.0008 (0.0021) model time 0.4883 (0.4864) loss 2.0243 (2.7545) grad_norm 1.8311 (2.1819) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][390/625] eta 0:01:54 lr 0.000418 wd 0.0500 time 0.4886 (0.4880) data time 0.0008 (0.0021) model time 0.4878 (0.4863) loss 3.0479 (2.7599) grad_norm 2.3651 (2.1898) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][400/625] eta 0:01:49 lr 0.000418 wd 0.0500 time 0.4877 
(0.4879) data time 0.0011 (0.0021) model time 0.4866 (0.4863) loss 3.0903 (2.7616) grad_norm 1.5427 (2.1830) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][410/625] eta 0:01:44 lr 0.000418 wd 0.0500 time 0.4804 (0.4878) data time 0.0009 (0.0020) model time 0.4795 (0.4862) loss 2.3249 (2.7627) grad_norm 3.2223 (2.2455) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][420/625] eta 0:01:39 lr 0.000418 wd 0.0500 time 0.4818 (0.4877) data time 0.0011 (0.0020) model time 0.4807 (0.4861) loss 3.1087 (2.7603) grad_norm 5.2227 (2.2530) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][430/625] eta 0:01:35 lr 0.000418 wd 0.0500 time 0.4781 (0.4876) data time 0.0009 (0.0020) model time 0.4772 (0.4860) loss 3.1637 (2.7634) grad_norm 1.2082 (2.2469) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][440/625] eta 0:01:30 lr 0.000418 wd 0.0500 time 0.4924 (0.4875) data time 0.0011 (0.0020) model time 0.4914 (0.4859) loss 2.9924 (2.7629) grad_norm 1.8520 (2.2417) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][450/625] eta 0:01:25 lr 0.000418 wd 0.0500 time 0.4131 (0.4878) data time 0.0009 (0.0019) model time 0.4122 (0.4862) loss 2.5035 (2.7599) grad_norm 2.3421 (2.2451) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][460/625] eta 0:01:20 lr 0.000418 wd 0.0500 time 0.4815 (0.4878) data time 0.0008 (0.0019) model time 0.4807 (0.4862) loss 2.7012 (2.7635) grad_norm 1.7611 (2.2408) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [188/300][470/625] eta 0:01:15 lr 0.000418 wd 0.0500 time 0.4878 (0.4877) data time 0.0011 (0.0019) model time 0.4868 (0.4861) loss 2.7767 (2.7689) grad_norm 2.6490 (2.2474) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][480/625] eta 0:01:10 lr 0.000418 wd 0.0500 time 0.4928 (0.4877) data time 0.0008 (0.0019) model time 0.4921 (0.4862) loss 2.9737 (2.7659) grad_norm 3.3326 (2.2518) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][490/625] eta 0:01:05 lr 0.000417 wd 0.0500 time 0.4822 (0.4877) data time 0.0012 (0.0019) model time 0.4810 (0.4861) loss 3.1468 (2.7676) grad_norm 1.2797 (2.2422) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][500/625] eta 0:01:01 lr 0.000417 wd 0.0500 time 0.4856 (0.4884) data time 0.0011 (0.0019) model time 0.4845 (0.4869) loss 3.1103 (2.7643) grad_norm 1.9251 (2.2312) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][510/625] eta 0:00:56 lr 0.000417 wd 0.0500 time 0.4846 (0.4883) data time 0.0008 (0.0018) model time 0.4838 (0.4869) loss 3.1020 (2.7684) grad_norm 4.1528 (2.2298) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][520/625] eta 0:00:51 lr 0.000417 wd 0.0500 time 0.4833 (0.4883) data time 0.0010 (0.0018) model time 0.4823 (0.4868) loss 3.0828 (2.7693) grad_norm 1.7097 (2.2225) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][530/625] eta 0:00:46 lr 0.000417 wd 0.0500 time 0.4856 (0.4882) data time 0.0008 (0.0018) model time 0.4848 (0.4868) loss 1.7235 (2.7664) grad_norm 2.8756 (2.2301) loss_scale 256.0000 (256.0000) mem 16721MB 
[2024-08-10 17:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][540/625] eta 0:00:41 lr 0.000417 wd 0.0500 time 0.4894 (0.4882) data time 0.0011 (0.0018) model time 0.4883 (0.4867) loss 2.9648 (2.7701) grad_norm 1.9659 (2.2313) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][550/625] eta 0:00:36 lr 0.000417 wd 0.0500 time 0.4805 (0.4881) data time 0.0012 (0.0018) model time 0.4793 (0.4867) loss 2.4343 (2.7746) grad_norm 2.4407 (2.2290) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][560/625] eta 0:00:31 lr 0.000417 wd 0.0500 time 0.4840 (0.4881) data time 0.0008 (0.0018) model time 0.4832 (0.4866) loss 2.3259 (2.7758) grad_norm 2.9256 (2.2256) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][570/625] eta 0:00:26 lr 0.000417 wd 0.0500 time 0.4803 (0.4880) data time 0.0009 (0.0018) model time 0.4794 (0.4865) loss 2.9154 (2.7759) grad_norm 1.6174 (2.2277) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][580/625] eta 0:00:21 lr 0.000417 wd 0.0500 time 0.4848 (0.4879) data time 0.0011 (0.0018) model time 0.4837 (0.4865) loss 3.0753 (2.7742) grad_norm 2.6681 (2.2499) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][590/625] eta 0:00:17 lr 0.000416 wd 0.0500 time 0.4825 (0.4879) data time 0.0011 (0.0017) model time 0.4814 (0.4864) loss 1.8444 (2.7756) grad_norm 2.1680 (2.2538) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][600/625] eta 0:00:12 lr 0.000416 wd 0.0500 time 0.4879 (0.4879) data time 0.0011 (0.0017) model time 0.4868 (0.4864) loss 2.3163 
(2.7769) grad_norm 1.5354 (2.2586) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][610/625] eta 0:00:07 lr 0.000416 wd 0.0500 time 0.4874 (0.4879) data time 0.0008 (0.0017) model time 0.4866 (0.4864) loss 3.0842 (2.7749) grad_norm 1.8312 (2.2504) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [188/300][620/625] eta 0:00:02 lr 0.000416 wd 0.0500 time 0.4826 (0.4878) data time 0.0008 (0.0017) model time 0.4818 (0.4864) loss 2.8412 (2.7805) grad_norm 1.4887 (2.2429) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:31 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 188 training takes 0:05:04 [2024-08-10 17:19:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:19:33 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 17:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.5352 (0.5352) Acc@1 88.037 (88.037) Acc@5 98.730 (98.730) Mem 16721MB
[2024-08-10 17:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.7993 (0.6380) Acc@1 81.006 (86.093) Acc@5 96.338 (97.647) Mem 16721MB
[2024-08-10 17:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9370 (0.7533) Acc@1 77.930 (83.238) Acc@5 94.824 (96.501) Mem 16721MB
[2024-08-10 17:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.059 Acc@5 96.533
[2024-08-10 17:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-08-10 17:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.06%
[2024-08-10 17:19:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-10 17:19:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-10 17:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.4731 (0.4731) Acc@1 89.355 (89.355) Acc@5 98.730 (98.730) Mem 16721MB [2024-08-10 17:19:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.7539 (0.5851) Acc@1 81.787 (87.287) Acc@5 96.631 (97.971) Mem 16721MB [2024-08-10 17:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.141) Loss 0.8413 (0.6874) Acc@1 79.785 (84.559) Acc@5 95.996 (97.008) Mem 16721MB [2024-08-10 17:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.241 Acc@5 97.001 [2024-08-10 17:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 17:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][0/625] eta 0:13:33 lr 0.000416 wd 0.0500 time 1.3014 (1.3014) data time 0.4514 (0.4514) model time 0.0000 (0.0000) loss 2.6190 (2.6190) grad_norm 1.1595 (1.1595) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][10/625] eta 0:05:44 lr 0.000416 wd 0.0500 time 0.4850 (0.5606) data time 0.0008 (0.0419) model time 0.0000 (0.0000) loss 1.8283 (2.7841) grad_norm 3.9895 (3.1915) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][20/625] eta 0:05:17 lr 0.000416 wd 0.0500 time 0.4800 (0.5241) data time 0.0011 (0.0225) model time 0.0000 (0.0000) loss 2.5252 (2.7589) grad_norm 2.2727 (2.8760) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:19:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][30/625] eta 0:05:04 lr 0.000416 wd 0.0500 time 0.4889 (0.5119) data time 0.0011 (0.0156) model time 0.0000 (0.0000) loss 2.7294 (2.7517) grad_norm 2.6462 (2.6632) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:02 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [189/300][40/625] eta 0:04:57 lr 0.000416 wd 0.0500 time 0.4825 (0.5091) data time 0.0011 (0.0120) model time 0.0000 (0.0000) loss 2.8994 (2.7482) grad_norm 1.6128 (2.6159) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][50/625] eta 0:04:49 lr 0.000416 wd 0.0500 time 0.4800 (0.5043) data time 0.0013 (0.0099) model time 0.0000 (0.0000) loss 2.9384 (2.7679) grad_norm 1.9216 (2.4602) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][60/625] eta 0:04:43 lr 0.000416 wd 0.0500 time 0.4816 (0.5011) data time 0.0009 (0.0084) model time 0.4806 (0.4842) loss 2.4617 (2.7776) grad_norm 1.8206 (2.3804) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][70/625] eta 0:04:36 lr 0.000415 wd 0.0500 time 0.4888 (0.4988) data time 0.0009 (0.0074) model time 0.4879 (0.4839) loss 3.1948 (2.7700) grad_norm 1.8831 (2.3446) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][80/625] eta 0:04:30 lr 0.000415 wd 0.0500 time 0.4867 (0.4969) data time 0.0008 (0.0066) model time 0.4859 (0.4834) loss 3.5486 (2.7929) grad_norm 1.8026 (2.3174) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][90/625] eta 0:04:27 lr 0.000415 wd 0.0500 time 0.4802 (0.4997) data time 0.0010 (0.0060) model time 0.4792 (0.4928) loss 1.5368 (2.7853) grad_norm 1.9883 (2.4153) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][100/625] eta 0:04:21 lr 0.000415 wd 0.0500 time 0.4863 (0.4983) data time 0.0008 (0.0055) model time 0.4855 (0.4910) loss 3.1866 (2.7941) grad_norm 1.2787 (2.3584) loss_scale 
256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][110/625] eta 0:04:16 lr 0.000415 wd 0.0500 time 0.4893 (0.4972) data time 0.0008 (0.0051) model time 0.4885 (0.4901) loss 3.3178 (2.8190) grad_norm 1.8109 (2.3388) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][120/625] eta 0:04:10 lr 0.000415 wd 0.0500 time 0.4879 (0.4964) data time 0.0008 (0.0048) model time 0.4871 (0.4896) loss 3.2192 (2.8118) grad_norm 1.5015 (2.3223) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][130/625] eta 0:04:05 lr 0.000415 wd 0.0500 time 0.4835 (0.4956) data time 0.0011 (0.0045) model time 0.4824 (0.4890) loss 2.2775 (2.8142) grad_norm 1.8125 (2.2891) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][140/625] eta 0:03:59 lr 0.000415 wd 0.0500 time 0.4802 (0.4947) data time 0.0008 (0.0043) model time 0.4794 (0.4882) loss 3.3638 (2.8141) grad_norm 2.4005 (2.2457) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:20:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][150/625] eta 0:03:54 lr 0.000415 wd 0.0500 time 0.4854 (0.4940) data time 0.0011 (0.0041) model time 0.4844 (0.4877) loss 2.9822 (2.8152) grad_norm 1.5555 (2.2073) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:21:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][160/625] eta 0:03:49 lr 0.000415 wd 0.0500 time 0.4822 (0.4934) data time 0.0008 (0.0039) model time 0.4814 (0.4872) loss 2.9585 (2.8241) grad_norm 1.6435 (2.2116) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:21:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][170/625] eta 0:03:44 lr 0.000414 wd 0.0500 time 0.4864 (0.4928) data time 0.0011 (0.0037) model time 
0.4854 (0.4869) loss 2.1816 (2.8251) grad_norm 1.8763 (2.2112) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:21:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][180/625] eta 0:03:39 lr 0.000414 wd 0.0500 time 0.4856 (0.4925) data time 0.0010 (0.0036) model time 0.4847 (0.4868) loss 3.3490 (2.8353) grad_norm 2.3546 (2.1997) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][190/625] eta 0:03:34 lr 0.000414 wd 0.0500 time 0.4964 (0.4922) data time 0.0008 (0.0034) model time 0.4956 (0.4868) loss 3.6530 (2.8510) grad_norm 3.0988 (2.1857) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][200/625] eta 0:03:29 lr 0.000414 wd 0.0500 time 0.4830 (0.4919) data time 0.0010 (0.0033) model time 0.4820 (0.4867) loss 3.3421 (2.8487) grad_norm 1.8754 (2.1925) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-10 17:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][210/625] eta 0:03:24 lr 0.000414 wd 0.0500 time 0.4839 (0.4917) data time 0.0010 (0.0032) model time 0.4829 (0.4866) loss 3.0445 (2.8548) grad_norm 1.5406 (2.1671) loss_scale 512.0000 (258.4265) mem 16721MB [2024-08-10 17:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][220/625] eta 0:03:18 lr 0.000414 wd 0.0500 time 0.4813 (0.4913) data time 0.0008 (0.0031) model time 0.4805 (0.4863) loss 3.4181 (2.8607) grad_norm 3.0582 (2.1650) loss_scale 512.0000 (269.9005) mem 16721MB [2024-08-10 17:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][230/625] eta 0:03:13 lr 0.000414 wd 0.0500 time 0.4854 (0.4910) data time 0.0008 (0.0030) model time 0.4846 (0.4861) loss 3.2354 (2.8701) grad_norm 1.9992 (2.1652) loss_scale 512.0000 (280.3810) mem 16721MB [2024-08-10 17:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][240/625] eta 0:03:08 lr 
0.000414 wd 0.0500 time 0.4777 (0.4906) data time 0.0011 (0.0029) model time 0.4766 (0.4859) loss 2.9447 (2.8655) grad_norm 1.4651 (2.1466) loss_scale 512.0000 (289.9917) mem 16721MB [2024-08-10 17:21:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][250/625] eta 0:03:03 lr 0.000414 wd 0.0500 time 0.4881 (0.4904) data time 0.0008 (0.0029) model time 0.4873 (0.4858) loss 3.1662 (2.8665) grad_norm 1.5183 (2.1374) loss_scale 512.0000 (298.8367) mem 16721MB [2024-08-10 17:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][260/625] eta 0:02:58 lr 0.000413 wd 0.0500 time 0.4832 (0.4902) data time 0.0009 (0.0028) model time 0.4823 (0.4857) loss 3.3446 (2.8647) grad_norm 2.1321 (2.1268) loss_scale 512.0000 (307.0038) mem 16721MB [2024-08-10 17:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][270/625] eta 0:02:53 lr 0.000413 wd 0.0500 time 0.4872 (0.4900) data time 0.0008 (0.0027) model time 0.4863 (0.4856) loss 3.1756 (2.8607) grad_norm 1.5062 (2.1176) loss_scale 512.0000 (314.5683) mem 16721MB [2024-08-10 17:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][280/625] eta 0:02:49 lr 0.000413 wd 0.0500 time 0.4829 (0.4899) data time 0.0011 (0.0027) model time 0.4818 (0.4856) loss 3.0927 (2.8566) grad_norm 1.6946 (2.1165) loss_scale 512.0000 (321.5943) mem 16721MB [2024-08-10 17:22:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][290/625] eta 0:02:44 lr 0.000413 wd 0.0500 time 0.4834 (0.4897) data time 0.0008 (0.0026) model time 0.4826 (0.4855) loss 1.7639 (2.8505) grad_norm 1.7234 (2.1058) loss_scale 512.0000 (328.1375) mem 16721MB [2024-08-10 17:22:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][300/625] eta 0:02:39 lr 0.000413 wd 0.0500 time 0.4865 (0.4894) data time 0.0008 (0.0026) model time 0.4857 (0.4854) loss 3.2969 (2.8507) grad_norm 2.2259 (2.0916) loss_scale 512.0000 (334.2458) mem 16721MB [2024-08-10 17:22:13 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [189/300][310/625] eta 0:02:34 lr 0.000413 wd 0.0500 time 0.4896 (0.4893) data time 0.0010 (0.0025) model time 0.4886 (0.4852) loss 2.9305 (2.8443) grad_norm 1.6560 (2.0798) loss_scale 512.0000 (339.9614) mem 16721MB [2024-08-10 17:22:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][320/625] eta 0:02:29 lr 0.000413 wd 0.0500 time 0.4845 (0.4891) data time 0.0009 (0.0025) model time 0.4837 (0.4851) loss 2.8042 (2.8437) grad_norm 1.6569 (2.0693) loss_scale 512.0000 (345.3209) mem 16721MB [2024-08-10 17:22:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][330/625] eta 0:02:24 lr 0.000413 wd 0.0500 time 0.4856 (0.4890) data time 0.0008 (0.0024) model time 0.4848 (0.4851) loss 2.4131 (2.8414) grad_norm 1.7544 (2.0600) loss_scale 512.0000 (350.3565) mem 16721MB [2024-08-10 17:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][340/625] eta 0:02:19 lr 0.000413 wd 0.0500 time 0.4876 (0.4888) data time 0.0011 (0.0024) model time 0.4865 (0.4850) loss 2.8762 (2.8448) grad_norm 2.6310 (2.1015) loss_scale 512.0000 (355.0968) mem 16721MB [2024-08-10 17:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][350/625] eta 0:02:14 lr 0.000413 wd 0.0500 time 0.4814 (0.4887) data time 0.0008 (0.0023) model time 0.4806 (0.4850) loss 2.6872 (2.8402) grad_norm 2.3049 (2.1118) loss_scale 512.0000 (359.5670) mem 16721MB [2024-08-10 17:22:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][360/625] eta 0:02:09 lr 0.000412 wd 0.0500 time 0.4793 (0.4885) data time 0.0008 (0.0023) model time 0.4785 (0.4848) loss 3.3189 (2.8420) grad_norm 2.8374 (2.1210) loss_scale 512.0000 (363.7895) mem 16721MB [2024-08-10 17:22:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][370/625] eta 0:02:04 lr 0.000412 wd 0.0500 time 0.4826 (0.4888) data time 0.0011 (0.0023) model time 0.4815 (0.4853) loss 2.8967 (2.8370) grad_norm 1.4025 (2.1148) loss_scale 
512.0000 (367.7844) mem 16721MB [2024-08-10 17:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][380/625] eta 0:01:59 lr 0.000412 wd 0.0500 time 0.4870 (0.4887) data time 0.0009 (0.0022) model time 0.4861 (0.4852) loss 2.4824 (2.8311) grad_norm 1.4967 (2.1016) loss_scale 512.0000 (371.5696) mem 16721MB [2024-08-10 17:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][390/625] eta 0:01:54 lr 0.000412 wd 0.0500 time 0.4876 (0.4885) data time 0.0010 (0.0022) model time 0.4865 (0.4851) loss 2.0366 (2.8244) grad_norm 1.7165 (2.0958) loss_scale 512.0000 (375.1611) mem 16721MB [2024-08-10 17:22:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][400/625] eta 0:01:49 lr 0.000412 wd 0.0500 time 0.4851 (0.4884) data time 0.0010 (0.0022) model time 0.4841 (0.4851) loss 2.5884 (2.8232) grad_norm 2.1095 (2.0866) loss_scale 512.0000 (378.5736) mem 16721MB [2024-08-10 17:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][410/625] eta 0:01:44 lr 0.000412 wd 0.0500 time 0.4800 (0.4883) data time 0.0012 (0.0022) model time 0.4788 (0.4850) loss 3.1349 (2.8214) grad_norm 1.9712 (2.0770) loss_scale 512.0000 (381.8200) mem 16721MB [2024-08-10 17:23:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][420/625] eta 0:01:40 lr 0.000412 wd 0.0500 time 0.4862 (0.4882) data time 0.0008 (0.0021) model time 0.4854 (0.4850) loss 2.3542 (2.8166) grad_norm 1.6201 (2.0674) loss_scale 512.0000 (384.9121) mem 16721MB [2024-08-10 17:23:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][430/625] eta 0:01:35 lr 0.000412 wd 0.0500 time 0.4829 (0.4891) data time 0.0009 (0.0021) model time 0.4820 (0.4860) loss 3.7172 (2.8186) grad_norm 1.8442 (2.0794) loss_scale 512.0000 (387.8608) mem 16721MB [2024-08-10 17:23:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][440/625] eta 0:01:30 lr 0.000412 wd 0.0500 time 0.4874 (0.4889) data time 0.0011 (0.0021) model time 
0.4864 (0.4859) loss 3.2920 (2.8180) grad_norm 1.6250 (2.0740) loss_scale 512.0000 (390.6757) mem 16721MB [2024-08-10 17:23:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][450/625] eta 0:01:25 lr 0.000412 wd 0.0500 time 0.4800 (0.4888) data time 0.0012 (0.0021) model time 0.4788 (0.4858) loss 2.9738 (2.8174) grad_norm 1.5449 (2.0661) loss_scale 512.0000 (393.3659) mem 16721MB [2024-08-10 17:23:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][460/625] eta 0:01:20 lr 0.000411 wd 0.0500 time 0.4870 (0.4887) data time 0.0011 (0.0020) model time 0.4859 (0.4857) loss 2.8657 (2.8210) grad_norm 1.6051 (2.0608) loss_scale 512.0000 (395.9393) mem 16721MB [2024-08-10 17:23:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][470/625] eta 0:01:15 lr 0.000411 wd 0.0500 time 0.4830 (0.4886) data time 0.0010 (0.0020) model time 0.4820 (0.4857) loss 1.7898 (2.8195) grad_norm 1.8925 (2.0586) loss_scale 512.0000 (398.4034) mem 16721MB [2024-08-10 17:23:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][480/625] eta 0:01:10 lr 0.000411 wd 0.0500 time 0.4847 (0.4886) data time 0.0010 (0.0020) model time 0.4837 (0.4856) loss 2.6643 (2.8187) grad_norm 1.6778 (2.0635) loss_scale 512.0000 (400.7651) mem 16721MB [2024-08-10 17:23:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][490/625] eta 0:01:05 lr 0.000411 wd 0.0500 time 0.4881 (0.4885) data time 0.0008 (0.0020) model time 0.4873 (0.4856) loss 2.1546 (2.8175) grad_norm 1.3897 (2.0703) loss_scale 512.0000 (403.0305) mem 16721MB [2024-08-10 17:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][500/625] eta 0:01:01 lr 0.000411 wd 0.0500 time 0.4796 (0.4884) data time 0.0008 (0.0020) model time 0.4788 (0.4855) loss 3.1358 (2.8187) grad_norm 1.6969 (2.0830) loss_scale 512.0000 (405.2056) mem 16721MB [2024-08-10 17:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][510/625] eta 0:00:56 lr 
0.000411 wd 0.0500 time 0.4803 (0.4883) data time 0.0008 (0.0019) model time 0.4795 (0.4855) loss 3.1621 (2.8231) grad_norm 1.3820 (2.0766) loss_scale 512.0000 (407.2955) mem 16721MB [2024-08-10 17:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][520/625] eta 0:00:51 lr 0.000411 wd 0.0500 time 0.4841 (0.4882) data time 0.0010 (0.0019) model time 0.4830 (0.4854) loss 2.8498 (2.8221) grad_norm 3.8029 (2.0718) loss_scale 512.0000 (409.3052) mem 16721MB [2024-08-10 17:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][530/625] eta 0:00:46 lr 0.000411 wd 0.0500 time 0.4826 (0.4881) data time 0.0009 (0.0019) model time 0.4817 (0.4853) loss 2.4814 (2.8229) grad_norm 2.2156 (2.0693) loss_scale 512.0000 (411.2392) mem 16721MB [2024-08-10 17:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][540/625] eta 0:00:41 lr 0.000411 wd 0.0500 time 0.4858 (0.4880) data time 0.0010 (0.0019) model time 0.4847 (0.4852) loss 2.8649 (2.8234) grad_norm 2.0563 (2.0673) loss_scale 512.0000 (413.1017) mem 16721MB [2024-08-10 17:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][550/625] eta 0:00:36 lr 0.000411 wd 0.0500 time 0.4843 (0.4880) data time 0.0071 (0.0019) model time 0.4772 (0.4852) loss 2.7413 (2.8203) grad_norm 1.1950 (2.0610) loss_scale 512.0000 (414.8966) mem 16721MB [2024-08-10 17:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][560/625] eta 0:00:31 lr 0.000410 wd 0.0500 time 0.4815 (0.4880) data time 0.0010 (0.0019) model time 0.4804 (0.4853) loss 2.8312 (2.8152) grad_norm 1.2401 (2.0525) loss_scale 512.0000 (416.6275) mem 16721MB [2024-08-10 17:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][570/625] eta 0:00:26 lr 0.000410 wd 0.0500 time 0.4823 (0.4880) data time 0.0008 (0.0019) model time 0.4815 (0.4853) loss 2.6044 (2.8155) grad_norm 2.0949 (2.0436) loss_scale 512.0000 (418.2977) mem 16721MB [2024-08-10 17:24:25 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [189/300][580/625] eta 0:00:21 lr 0.000410 wd 0.0500 time 0.4810 (0.4879) data time 0.0011 (0.0018) model time 0.4799 (0.4852) loss 3.1700 (2.8172) grad_norm 2.4130 (2.0531) loss_scale 512.0000 (419.9105) mem 16721MB [2024-08-10 17:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][590/625] eta 0:00:17 lr 0.000410 wd 0.0500 time 0.4868 (0.4878) data time 0.0008 (0.0018) model time 0.4860 (0.4852) loss 3.2660 (2.8182) grad_norm 2.1405 (2.0642) loss_scale 512.0000 (421.4687) mem 16721MB [2024-08-10 17:24:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][600/625] eta 0:00:12 lr 0.000410 wd 0.0500 time 0.4886 (0.4878) data time 0.0010 (0.0018) model time 0.4875 (0.4852) loss 2.4023 (2.8146) grad_norm 1.9217 (2.0643) loss_scale 512.0000 (422.9750) mem 16721MB [2024-08-10 17:24:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][610/625] eta 0:00:07 lr 0.000410 wd 0.0500 time 0.4836 (0.4877) data time 0.0006 (0.0018) model time 0.4830 (0.4851) loss 3.6300 (2.8147) grad_norm 1.7675 (2.0631) loss_scale 512.0000 (424.4321) mem 16721MB [2024-08-10 17:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [189/300][620/625] eta 0:00:02 lr 0.000410 wd 0.0500 time 0.4848 (0.4881) data time 0.0006 (0.0018) model time 0.4842 (0.4855) loss 3.5773 (2.8160) grad_norm 2.0961 (2.0595) loss_scale 512.0000 (425.8422) mem 16721MB [2024-08-10 17:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 189 training takes 0:05:05 [2024-08-10 17:24:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:24:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 17:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.520 (0.520) Loss 0.4883 (0.4883) Acc@1 88.916 (88.916) Acc@5 98.779 (98.779) Mem 16721MB [2024-08-10 17:24:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.7988 (0.6163) Acc@1 80.664 (86.404) Acc@5 95.752 (97.665) Mem 16721MB [2024-08-10 17:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.8911 (0.7335) Acc@1 79.492 (83.329) Acc@5 95.312 (96.547) Mem 16721MB [2024-08-10 17:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 82.987 Acc@5 96.519 [2024-08-10 17:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-10 17:24:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.813 (0.813) Loss 0.4731 (0.4731) Acc@1 89.453 (89.453) Acc@5 98.730 (98.730) Mem 16721MB [2024-08-10 17:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.191) Loss 0.7539 (0.5854) Acc@1 81.641 (87.256) Acc@5 96.680 (97.980) Mem 16721MB [2024-08-10 17:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.156) Loss 0.8423 (0.6875) Acc@1 79.834 (84.542) Acc@5 95.996 (97.008) Mem 16721MB [2024-08-10 17:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.237 Acc@5 97.005 [2024-08-10 17:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 17:24:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][0/625] eta 0:13:24 lr 0.000410 wd 0.0500 time 1.2876 (1.2876) data time 0.5498 (0.5498) model time 0.0000 (0.0000) loss 2.8532 (2.8532) grad_norm 2.4833 (2.4833) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][10/625] eta 0:05:43 lr 0.000410 wd 0.0500 time 0.4820 (0.5578) data 
time 0.0008 (0.0509) model time 0.0000 (0.0000) loss 3.1558 (2.8038) grad_norm 2.3655 (2.0562) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][20/625] eta 0:05:15 lr 0.000410 wd 0.0500 time 0.4819 (0.5220) data time 0.0013 (0.0273) model time 0.0000 (0.0000) loss 2.8977 (2.8124) grad_norm 1.6921 (2.0995) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][30/625] eta 0:05:03 lr 0.000410 wd 0.0500 time 0.4837 (0.5094) data time 0.0009 (0.0188) model time 0.0000 (0.0000) loss 2.5980 (2.8305) grad_norm 1.7656 (2.0414) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][40/625] eta 0:04:54 lr 0.000409 wd 0.0500 time 0.4787 (0.5028) data time 0.0011 (0.0145) model time 0.0000 (0.0000) loss 2.8161 (2.8542) grad_norm 1.2444 (2.1884) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][50/625] eta 0:04:47 lr 0.000409 wd 0.0500 time 0.4837 (0.4994) data time 0.0010 (0.0119) model time 0.0000 (0.0000) loss 3.2733 (2.8502) grad_norm 1.6260 (2.1989) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][60/625] eta 0:04:40 lr 0.000409 wd 0.0500 time 0.4883 (0.4973) data time 0.0008 (0.0101) model time 0.4875 (0.4852) loss 3.2470 (2.8536) grad_norm 1.6631 (2.1992) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][70/625] eta 0:04:35 lr 0.000409 wd 0.0500 time 0.4904 (0.4957) data time 0.0012 (0.0089) model time 0.4893 (0.4853) loss 3.2534 (2.8350) grad_norm 1.7941 (2.1812) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[190/300][80/625] eta 0:04:30 lr 0.000409 wd 0.0500 time 0.4900 (0.4964) data time 0.0011 (0.0079) model time 0.4889 (0.4903) loss 2.3641 (2.8411) grad_norm 1.2807 (2.1585) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][90/625] eta 0:04:24 lr 0.000409 wd 0.0500 time 0.4872 (0.4952) data time 0.0012 (0.0072) model time 0.4860 (0.4888) loss 3.1307 (2.8206) grad_norm 1.6828 (2.1284) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][100/625] eta 0:04:19 lr 0.000409 wd 0.0500 time 0.4911 (0.4942) data time 0.0011 (0.0066) model time 0.4901 (0.4879) loss 3.1154 (2.8217) grad_norm 1.8661 (2.1500) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][110/625] eta 0:04:13 lr 0.000409 wd 0.0500 time 0.4778 (0.4932) data time 0.0008 (0.0061) model time 0.4770 (0.4868) loss 1.8712 (2.8165) grad_norm 2.0467 (2.1584) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:25:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][120/625] eta 0:04:08 lr 0.000409 wd 0.0500 time 0.4830 (0.4923) data time 0.0014 (0.0056) model time 0.4815 (0.4861) loss 2.8778 (2.8337) grad_norm 2.7773 (2.2033) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][130/625] eta 0:04:03 lr 0.000409 wd 0.0500 time 0.4918 (0.4917) data time 0.0011 (0.0053) model time 0.4907 (0.4857) loss 3.2929 (2.8221) grad_norm 1.8913 (2.2050) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][140/625] eta 0:03:58 lr 0.000408 wd 0.0500 time 0.4822 (0.4924) data time 0.0011 (0.0050) model time 0.4811 (0.4874) loss 2.7610 (2.8321) grad_norm 1.8480 (2.1776) loss_scale 512.0000 (512.0000) mem 16721MB 
[2024-08-10 17:26:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][150/625] eta 0:03:53 lr 0.000408 wd 0.0500 time 0.4837 (0.4920) data time 0.0009 (0.0048) model time 0.4828 (0.4870) loss 3.9069 (2.8312) grad_norm 1.7763 (2.2068) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][160/625] eta 0:03:48 lr 0.000408 wd 0.0500 time 0.4817 (0.4915) data time 0.0008 (0.0045) model time 0.4809 (0.4868) loss 3.3990 (2.8371) grad_norm 2.3364 (2.2329) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][170/625] eta 0:03:43 lr 0.000408 wd 0.0500 time 0.4783 (0.4910) data time 0.0011 (0.0043) model time 0.4771 (0.4863) loss 3.1145 (2.8301) grad_norm 1.8107 (2.2189) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][180/625] eta 0:03:38 lr 0.000408 wd 0.0500 time 0.4919 (0.4907) data time 0.0008 (0.0041) model time 0.4910 (0.4861) loss 2.1993 (2.8232) grad_norm 1.6097 (2.2515) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][190/625] eta 0:03:33 lr 0.000408 wd 0.0500 time 0.4868 (0.4902) data time 0.0011 (0.0040) model time 0.4857 (0.4858) loss 2.3477 (2.8268) grad_norm 2.0066 (2.2569) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][200/625] eta 0:03:28 lr 0.000408 wd 0.0500 time 0.4865 (0.4900) data time 0.0011 (0.0038) model time 0.4854 (0.4858) loss 2.9074 (2.8373) grad_norm 1.5618 (2.2985) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][210/625] eta 0:03:23 lr 0.000408 wd 0.0500 time 0.4848 (0.4898) data time 0.0008 (0.0037) model time 0.4840 (0.4856) loss 1.7187 
(2.8262) grad_norm 1.3747 (2.2677) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][220/625] eta 0:03:18 lr 0.000408 wd 0.0500 time 0.4771 (0.4894) data time 0.0010 (0.0036) model time 0.4761 (0.4853) loss 3.5347 (2.8191) grad_norm 1.8548 (2.2535) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][230/625] eta 0:03:13 lr 0.000408 wd 0.0500 time 0.4811 (0.4892) data time 0.0008 (0.0035) model time 0.4803 (0.4852) loss 2.7622 (2.8203) grad_norm 1.3674 (2.2367) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][240/625] eta 0:03:08 lr 0.000407 wd 0.0500 time 0.4800 (0.4888) data time 0.0008 (0.0034) model time 0.4792 (0.4849) loss 2.2681 (2.8233) grad_norm 1.7243 (2.2220) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][250/625] eta 0:03:03 lr 0.000407 wd 0.0500 time 0.4840 (0.4886) data time 0.0008 (0.0033) model time 0.4833 (0.4847) loss 1.6150 (2.8273) grad_norm 1.9370 (2.2151) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][260/625] eta 0:02:58 lr 0.000407 wd 0.0500 time 0.4856 (0.4884) data time 0.0008 (0.0032) model time 0.4848 (0.4846) loss 3.1918 (2.8221) grad_norm 1.7156 (2.2398) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][270/625] eta 0:02:53 lr 0.000407 wd 0.0500 time 0.4809 (0.4881) data time 0.0009 (0.0031) model time 0.4800 (0.4844) loss 3.3936 (2.8181) grad_norm 2.2750 (2.2391) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][280/625] eta 0:02:48 lr 0.000407 wd 0.0500 time 0.4832 
(0.4880) data time 0.0008 (0.0031) model time 0.4824 (0.4844) loss 1.7450 (2.8132) grad_norm 1.7025 (2.2165) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][290/625] eta 0:02:43 lr 0.000407 wd 0.0500 time 0.4819 (0.4879) data time 0.0012 (0.0030) model time 0.4807 (0.4843) loss 3.0486 (2.8080) grad_norm 1.4554 (2.2235) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][300/625] eta 0:02:38 lr 0.000407 wd 0.0500 time 0.4824 (0.4878) data time 0.0011 (0.0029) model time 0.4813 (0.4843) loss 3.1277 (2.8171) grad_norm 2.1321 (2.2124) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][310/625] eta 0:02:33 lr 0.000407 wd 0.0500 time 0.4831 (0.4877) data time 0.0008 (0.0029) model time 0.4823 (0.4843) loss 3.2670 (2.8176) grad_norm 1.3757 (2.2035) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][320/625] eta 0:02:28 lr 0.000407 wd 0.0500 time 0.4800 (0.4882) data time 0.0010 (0.0028) model time 0.4791 (0.4850) loss 1.6849 (2.8155) grad_norm 1.8922 (2.1927) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][330/625] eta 0:02:23 lr 0.000406 wd 0.0500 time 0.4813 (0.4881) data time 0.0012 (0.0028) model time 0.4801 (0.4849) loss 2.9356 (2.8101) grad_norm 1.6811 (2.1898) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][340/625] eta 0:02:19 lr 0.000406 wd 0.0500 time 0.4844 (0.4879) data time 0.0009 (0.0027) model time 0.4835 (0.4847) loss 3.1138 (2.8117) grad_norm 3.5086 (2.1826) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [190/300][350/625] eta 0:02:14 lr 0.000406 wd 0.0500 time 0.4867 (0.4878) data time 0.0009 (0.0027) model time 0.4859 (0.4847) loss 3.0262 (2.8124) grad_norm 1.4864 (2.1744) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][360/625] eta 0:02:09 lr 0.000406 wd 0.0500 time 0.4829 (0.4877) data time 0.0009 (0.0026) model time 0.4820 (0.4847) loss 2.7247 (2.8107) grad_norm 1.2457 (2.1646) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][370/625] eta 0:02:04 lr 0.000406 wd 0.0500 time 0.4860 (0.4877) data time 0.0009 (0.0026) model time 0.4852 (0.4847) loss 1.7555 (2.8124) grad_norm 1.4432 (2.1641) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][380/625] eta 0:01:59 lr 0.000406 wd 0.0500 time 0.4797 (0.4876) data time 0.0012 (0.0026) model time 0.4785 (0.4846) loss 2.6818 (2.8136) grad_norm 1.7798 (2.1549) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][390/625] eta 0:01:54 lr 0.000406 wd 0.0500 time 0.4832 (0.4875) data time 0.0008 (0.0025) model time 0.4824 (0.4846) loss 1.9743 (2.8093) grad_norm 2.4742 (2.1509) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][400/625] eta 0:01:49 lr 0.000406 wd 0.0500 time 0.4828 (0.4874) data time 0.0011 (0.0025) model time 0.4817 (0.4845) loss 3.4805 (2.8139) grad_norm 2.6276 (2.1486) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][410/625] eta 0:01:44 lr 0.000406 wd 0.0500 time 0.4828 (0.4873) data time 0.0009 (0.0025) model time 0.4818 (0.4845) loss 2.9370 (2.8133) grad_norm 2.3221 (2.1451) loss_scale 512.0000 (512.0000) mem 16721MB 
[2024-08-10 17:28:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][420/625] eta 0:01:39 lr 0.000406 wd 0.0500 time 0.4857 (0.4876) data time 0.0010 (0.0024) model time 0.4847 (0.4849) loss 2.3158 (2.8110) grad_norm 5.0382 (2.1573) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][430/625] eta 0:01:35 lr 0.000405 wd 0.0500 time 0.4891 (0.4876) data time 0.0012 (0.0024) model time 0.4878 (0.4849) loss 1.8512 (2.8103) grad_norm 1.5633 (2.1540) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][440/625] eta 0:01:30 lr 0.000405 wd 0.0500 time 0.4806 (0.4875) data time 0.0013 (0.0024) model time 0.4793 (0.4849) loss 2.4656 (2.8079) grad_norm 1.4807 (2.1407) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][450/625] eta 0:01:25 lr 0.000405 wd 0.0500 time 0.4851 (0.4875) data time 0.0008 (0.0023) model time 0.4843 (0.4849) loss 3.2857 (2.8015) grad_norm 1.8222 (2.1379) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][460/625] eta 0:01:20 lr 0.000405 wd 0.0500 time 0.4903 (0.4875) data time 0.0009 (0.0023) model time 0.4895 (0.4848) loss 3.1893 (2.8002) grad_norm 2.1308 (2.1398) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][470/625] eta 0:01:15 lr 0.000405 wd 0.0500 time 0.4867 (0.4874) data time 0.0009 (0.0023) model time 0.4858 (0.4848) loss 1.9420 (2.7990) grad_norm 1.1016 (2.1389) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][480/625] eta 0:01:10 lr 0.000405 wd 0.0500 time 0.4866 (0.4877) data time 0.0008 (0.0023) model time 0.4857 (0.4852) loss 2.1281 
(2.7962) grad_norm 1.8634 (2.1511) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][490/625] eta 0:01:05 lr 0.000405 wd 0.0500 time 0.4839 (0.4876) data time 0.0009 (0.0022) model time 0.4830 (0.4852) loss 2.0882 (2.8014) grad_norm 1.9584 (2.1468) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:28:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][500/625] eta 0:01:00 lr 0.000405 wd 0.0500 time 0.4882 (0.4876) data time 0.0010 (0.0022) model time 0.4872 (0.4851) loss 3.1297 (2.8024) grad_norm 3.1338 (2.1571) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][510/625] eta 0:00:56 lr 0.000405 wd 0.0500 time 0.4844 (0.4880) data time 0.0011 (0.0022) model time 0.4832 (0.4856) loss 2.5125 (2.7998) grad_norm 1.4124 (2.1468) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][520/625] eta 0:00:51 lr 0.000405 wd 0.0500 time 0.4849 (0.4879) data time 0.0011 (0.0022) model time 0.4838 (0.4856) loss 3.4535 (2.8001) grad_norm 2.4068 (2.1485) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][530/625] eta 0:00:46 lr 0.000404 wd 0.0500 time 0.4822 (0.4879) data time 0.0008 (0.0021) model time 0.4814 (0.4855) loss 3.7774 (2.8068) grad_norm 1.5245 (2.1414) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][540/625] eta 0:00:41 lr 0.000404 wd 0.0500 time 0.4854 (0.4878) data time 0.0011 (0.0021) model time 0.4843 (0.4855) loss 2.5848 (2.8105) grad_norm 2.8149 (2.1417) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][550/625] eta 0:00:36 lr 0.000404 wd 0.0500 time 0.4854 
(0.4878) data time 0.0011 (0.0021) model time 0.4843 (0.4855) loss 3.1449 (2.8075) grad_norm 1.3482 (2.1331) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][560/625] eta 0:00:31 lr 0.000404 wd 0.0500 time 0.4832 (0.4877) data time 0.0009 (0.0021) model time 0.4823 (0.4855) loss 2.9053 (2.8072) grad_norm 2.0516 (2.1279) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][570/625] eta 0:00:26 lr 0.000404 wd 0.0500 time 0.4867 (0.4877) data time 0.0012 (0.0021) model time 0.4856 (0.4855) loss 2.3050 (2.8022) grad_norm 2.2504 (2.1273) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][580/625] eta 0:00:21 lr 0.000404 wd 0.0500 time 0.4857 (0.4877) data time 0.0012 (0.0021) model time 0.4844 (0.4854) loss 3.3735 (2.8042) grad_norm 2.0229 (2.1251) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][590/625] eta 0:00:17 lr 0.000404 wd 0.0500 time 0.4888 (0.4877) data time 0.0008 (0.0020) model time 0.4880 (0.4854) loss 2.9021 (2.8062) grad_norm 1.5668 (2.1215) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][600/625] eta 0:00:12 lr 0.000404 wd 0.0500 time 0.4812 (0.4876) data time 0.0009 (0.0020) model time 0.4803 (0.4854) loss 3.2547 (2.8063) grad_norm 1.8790 (2.1164) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [190/300][610/625] eta 0:00:07 lr 0.000404 wd 0.0500 time 0.4855 (0.4876) data time 0.0008 (0.0020) model time 0.4847 (0.4854) loss 2.8017 (2.8037) grad_norm 3.6713 (2.1214) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [190/300][620/625] eta 0:00:02 lr 0.000404 wd 0.0500 time 0.4768 (0.4875) data time 0.0006 (0.0020) model time 0.4762 (0.4853) loss 2.8626 (2.8035) grad_norm 1.4979 (2.1231) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 190 training takes 0:05:04 [2024-08-10 17:30:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:30:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 17:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5239 (0.5239) Acc@1 89.648 (89.648) Acc@5 98.682 (98.682) Mem 16721MB [2024-08-10 17:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.167) Loss 0.8350 (0.6454) Acc@1 80.420 (86.266) Acc@5 95.947 (97.683) Mem 16721MB [2024-08-10 17:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.144) Loss 0.9067 (0.7580) Acc@1 79.297 (83.445) Acc@5 95.410 (96.554) Mem 16721MB [2024-08-10 17:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.107 Acc@5 96.563 [2024-08-10 17:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 17:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.11% [2024-08-10 17:30:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 17:30:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 17:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.582 (0.582) Loss 0.4727 (0.4727) Acc@1 89.453 (89.453) Acc@5 98.779 (98.779) Mem 16721MB [2024-08-10 17:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.167) Loss 0.7524 (0.5847) Acc@1 81.689 (87.282) Acc@5 96.680 (97.980) Mem 16721MB [2024-08-10 17:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.144) Loss 0.8418 (0.6870) Acc@1 79.932 (84.566) Acc@5 95.947 (97.033) Mem 16721MB [2024-08-10 17:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.245 Acc@5 97.029 [2024-08-10 17:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 17:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][0/625] eta 0:14:08 lr 0.000404 wd 0.0500 time 1.3571 (1.3571) data time 0.6530 (0.6530) model time 0.0000 (0.0000) loss 3.1840 (3.1840) grad_norm 1.4012 (1.4012) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][10/625] eta 0:05:47 lr 0.000403 wd 0.0500 time 0.4857 (0.5654) data time 0.0010 (0.0604) model time 0.0000 (0.0000) loss 2.5200 (2.7309) grad_norm 2.3560 (1.7431) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][20/625] eta 0:05:19 lr 0.000403 wd 0.0500 time 0.4896 (0.5285) data time 0.0009 (0.0321) model time 0.0000 (0.0000) loss 2.8553 (2.7117) grad_norm 2.0061 (1.9364) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][30/625] eta 0:05:06 lr 0.000403 wd 0.0500 time 0.4861 (0.5154) data time 0.0008 (0.0221) model time 0.0000 (0.0000) loss 3.4385 (2.7616) grad_norm 2.1513 (1.9416) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [191/300][40/625] eta 0:04:57 lr 0.000403 wd 0.0500 time 0.4825 (0.5081) data time 0.0009 (0.0169) model time 0.0000 (0.0000) loss 2.3908 (2.7741) grad_norm 1.4410 (1.9106) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][50/625] eta 0:04:49 lr 0.000403 wd 0.0500 time 0.4816 (0.5034) data time 0.0010 (0.0138) model time 0.0000 (0.0000) loss 2.6277 (2.7711) grad_norm 1.3619 (1.8902) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][60/625] eta 0:04:42 lr 0.000403 wd 0.0500 time 0.4859 (0.5004) data time 0.0011 (0.0118) model time 0.4849 (0.4840) loss 1.8475 (2.7741) grad_norm 1.6162 (1.8672) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][70/625] eta 0:04:37 lr 0.000403 wd 0.0500 time 0.4870 (0.5006) data time 0.0008 (0.0103) model time 0.4863 (0.4926) loss 2.6684 (2.7536) grad_norm 1.3047 (1.8226) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][80/625] eta 0:04:31 lr 0.000403 wd 0.0500 time 0.4851 (0.4989) data time 0.0007 (0.0091) model time 0.4843 (0.4901) loss 3.0791 (2.7633) grad_norm 1.7605 (1.9312) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:30:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][90/625] eta 0:04:28 lr 0.000403 wd 0.0500 time 0.4197 (0.5016) data time 0.0011 (0.0082) model time 0.4187 (0.4983) loss 2.4752 (2.7534) grad_norm 1.4188 (1.9430) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][100/625] eta 0:04:22 lr 0.000403 wd 0.0500 time 0.4919 (0.5003) data time 0.0008 (0.0075) model time 0.4911 (0.4961) loss 2.9178 (2.7791) grad_norm 3.6162 (1.9485) loss_scale 
512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][110/625] eta 0:04:16 lr 0.000402 wd 0.0500 time 0.4906 (0.4990) data time 0.0009 (0.0069) model time 0.4897 (0.4942) loss 2.7750 (2.7993) grad_norm 2.0416 (1.9458) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][120/625] eta 0:04:11 lr 0.000402 wd 0.0500 time 0.4800 (0.4976) data time 0.0013 (0.0065) model time 0.4787 (0.4923) loss 2.9085 (2.8012) grad_norm 2.0596 (1.9222) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][130/625] eta 0:04:05 lr 0.000402 wd 0.0500 time 0.4819 (0.4966) data time 0.0012 (0.0061) model time 0.4807 (0.4911) loss 2.8301 (2.7915) grad_norm 2.7019 (1.9181) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][140/625] eta 0:04:00 lr 0.000402 wd 0.0500 time 0.4835 (0.4957) data time 0.0010 (0.0057) model time 0.4825 (0.4902) loss 2.9953 (2.8047) grad_norm 1.7361 (1.9054) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][150/625] eta 0:03:55 lr 0.000402 wd 0.0500 time 0.4870 (0.4950) data time 0.0008 (0.0054) model time 0.4862 (0.4896) loss 3.1825 (2.8236) grad_norm 1.4762 (1.8946) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][160/625] eta 0:03:49 lr 0.000402 wd 0.0500 time 0.4844 (0.4945) data time 0.0010 (0.0052) model time 0.4834 (0.4892) loss 3.0368 (2.8259) grad_norm 1.3365 (1.8747) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][170/625] eta 0:03:44 lr 0.000402 wd 0.0500 time 0.4836 (0.4940) data time 0.0011 (0.0049) model time 
0.4825 (0.4889) loss 2.3143 (2.8066) grad_norm 1.9070 (1.8748) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][180/625] eta 0:03:39 lr 0.000402 wd 0.0500 time 0.4882 (0.4936) data time 0.0009 (0.0047) model time 0.4873 (0.4886) loss 3.3030 (2.8068) grad_norm 1.5576 (1.8873) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][190/625] eta 0:03:34 lr 0.000402 wd 0.0500 time 0.4860 (0.4931) data time 0.0008 (0.0045) model time 0.4851 (0.4883) loss 1.9253 (2.8060) grad_norm 1.6041 (1.9365) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][200/625] eta 0:03:29 lr 0.000402 wd 0.0500 time 0.4813 (0.4927) data time 0.0011 (0.0043) model time 0.4802 (0.4879) loss 2.1090 (2.8038) grad_norm 1.5211 (1.9386) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][210/625] eta 0:03:24 lr 0.000401 wd 0.0500 time 0.4823 (0.4923) data time 0.0010 (0.0042) model time 0.4812 (0.4876) loss 2.7297 (2.8145) grad_norm 1.8320 (1.9421) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:31:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][220/625] eta 0:03:19 lr 0.000401 wd 0.0500 time 0.4883 (0.4919) data time 0.0010 (0.0040) model time 0.4873 (0.4874) loss 3.1762 (2.8152) grad_norm 2.1588 (1.9794) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:32:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][230/625] eta 0:03:14 lr 0.000401 wd 0.0500 time 0.4868 (0.4917) data time 0.0011 (0.0039) model time 0.4857 (0.4873) loss 2.7768 (2.8118) grad_norm 2.7078 (1.9860) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:32:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][240/625] eta 0:03:09 lr 
0.000401 wd 0.0500 time 0.4845 (0.4915) data time 0.0011 (0.0038) model time 0.4834 (0.4871) loss 2.8239 (2.8109) grad_norm 8.4854 (2.0282) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][250/625] eta 0:03:04 lr 0.000401 wd 0.0500 time 0.4804 (0.4912) data time 0.0008 (0.0037) model time 0.4796 (0.4870) loss 2.8319 (2.8143) grad_norm 2.4765 (2.0388) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][260/625] eta 0:02:59 lr 0.000401 wd 0.0500 time 0.4787 (0.4913) data time 0.0011 (0.0036) model time 0.4776 (0.4873) loss 2.9035 (2.8135) grad_norm 2.0414 (2.0401) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:32:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][270/625] eta 0:02:54 lr 0.000401 wd 0.0500 time 0.4850 (0.4910) data time 0.0009 (0.0035) model time 0.4841 (0.4871) loss 3.5550 (2.8121) grad_norm 1.9844 (2.0431) loss_scale 512.0000 (512.0000) mem 16721MB [2024-08-10 17:32:25 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 17:32:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:32:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 17:34:21 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 17:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 17:34:35 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-10 17:34:45 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 17:34:45 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 17:34:47 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 17:34:49 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 17:34:50 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 191) [2024-08-10 17:34:50 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 17:35:13 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 17:35:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:35:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 17:52:53 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 17:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 17:53:08 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 17:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 17:53:21 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 17:53:23 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 17:53:25 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 17:53:26 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 191) [2024-08-10 17:53:26 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 17:53:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][280/625] eta 0:26:03 lr 0.000401 wd 0.0500 time 0.4675 (4.5325) data time 0.0008 (0.1771) model time 0.4667 (4.3554) loss 3.1014 (3.0142) grad_norm 3.4959 (2.7209) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][290/625] eta 0:10:11 lr 0.000401 wd 0.0500 time 0.4661 (1.8247) data time 0.0011 (0.0597) model time 0.4650 (1.7650) loss 3.1908 (2.9998) grad_norm 2.3181 (2.3853) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][300/625] eta 0:06:56 lr 0.000401 wd 0.0500 time 0.4638 (1.2818) data time 0.0011 (0.0363) model time 0.4627 (1.2455) loss 3.3485 (3.0249) grad_norm 2.0341 (2.4078) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][310/625] eta 0:05:32 lr 0.000400 wd 0.0500 time 0.4642 (1.0565) data time 0.0011 (0.0262) model time 0.4632 (1.0303) loss 2.9476 (3.0070) grad_norm 2.3332 (2.5524) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][320/625] eta 0:04:43 lr 0.000400 wd 0.0500 time 0.4650 (0.9300) data time 0.0010 (0.0206) model time 0.4640 (0.9094) loss 3.1753 (2.9735) grad_norm 1.7681 (2.5478) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][330/625] eta 
0:04:09 lr 0.000400 wd 0.0500 time 0.4727 (0.8461) data time 0.0008 (0.0171) model time 0.4719 (0.8290) loss 2.1530 (2.9490) grad_norm 2.1067 (2.4687) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][340/625] eta 0:03:44 lr 0.000400 wd 0.0500 time 0.4708 (0.7884) data time 0.0010 (0.0146) model time 0.4698 (0.7738) loss 3.0788 (2.9315) grad_norm 1.8476 (2.3631) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][350/625] eta 0:03:25 lr 0.000400 wd 0.0500 time 0.4743 (0.7463) data time 0.0010 (0.0128) model time 0.4733 (0.7335) loss 2.3098 (2.8944) grad_norm 2.0171 (2.2765) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][360/625] eta 0:03:09 lr 0.000400 wd 0.0500 time 0.4693 (0.7140) data time 0.0009 (0.0114) model time 0.4684 (0.7026) loss 2.7694 (2.8818) grad_norm 2.4315 (2.2503) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][370/625] eta 0:02:55 lr 0.000400 wd 0.0500 time 0.4700 (0.6882) data time 0.0010 (0.0103) model time 0.4690 (0.6779) loss 3.2802 (2.8863) grad_norm 2.3804 (2.2405) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][380/625] eta 0:02:43 lr 0.000400 wd 0.0500 time 0.4699 (0.6673) data time 0.0011 (0.0094) model time 0.4689 (0.6579) loss 2.5952 (2.9021) grad_norm 1.9321 (2.2224) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][390/625] eta 0:02:32 lr 0.000400 wd 0.0500 time 0.4638 (0.6499) data time 0.0008 (0.0087) model time 0.4630 (0.6412) loss 2.4586 (2.8950) grad_norm 2.0654 (2.2615) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:49 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][400/625] eta 0:02:22 lr 0.000400 wd 0.0500 time 0.4654 (0.6352) data time 0.0008 (0.0081) model time 0.4645 (0.6271) loss 2.8553 (2.8945) grad_norm 1.9077 (2.2415) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][410/625] eta 0:02:13 lr 0.000399 wd 0.0500 time 0.4684 (0.6230) data time 0.0008 (0.0076) model time 0.4676 (0.6155) loss 2.6280 (2.9034) grad_norm 2.0250 (2.2005) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][420/625] eta 0:02:05 lr 0.000399 wd 0.0500 time 0.4724 (0.6126) data time 0.0010 (0.0071) model time 0.4714 (0.6055) loss 2.8740 (2.8879) grad_norm 1.7999 (2.1722) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][430/625] eta 0:01:57 lr 0.000399 wd 0.0500 time 0.4610 (0.6035) data time 0.0012 (0.0067) model time 0.4598 (0.5967) loss 3.2497 (2.8820) grad_norm 1.4854 (2.1481) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][440/625] eta 0:01:50 lr 0.000399 wd 0.0500 time 0.4631 (0.5953) data time 0.0010 (0.0064) model time 0.4621 (0.5890) loss 3.1005 (2.8782) grad_norm 1.6326 (2.1461) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][450/625] eta 0:01:42 lr 0.000399 wd 0.0500 time 0.4686 (0.5881) data time 0.0008 (0.0061) model time 0.4677 (0.5820) loss 2.9139 (2.8711) grad_norm 2.1776 (2.1287) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][460/625] eta 0:01:36 lr 0.000399 wd 0.0500 time 0.4651 (0.5824) data time 0.0011 (0.0058) model time 0.4640 (0.5766) loss 2.6607 (2.8624) grad_norm 2.0001 
(2.1098) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 17:55:20 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 17:55:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:55:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 17:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 17:57:01 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 17:57:13 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 17:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 17:57:24 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 17:57:27 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 17:57:29 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 17:57:29 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 191) [2024-08-10 17:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 17:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][470/625] eta 0:09:09 lr 0.000399 wd 0.0500 time 0.4405 (3.5433) data time 0.0008 (0.1190) model time 0.4397 (3.4243) loss 3.2759 (3.1436) grad_norm 2.3874 (2.0585) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:57:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][480/625] eta 0:03:52 lr 0.000399 wd 0.0500 time 0.4412 (1.6055) data time 0.0010 (0.0452) model time 0.4402 (1.5603) loss 2.8598 (2.9877) grad_norm 1.5878 (2.4750) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][490/625] eta 0:02:36 lr 0.000399 wd 0.0500 time 0.4432 (1.1584) data time 0.0007 (0.0282) model time 0.4425 (1.1302) loss 2.8058 (2.9827) grad_norm 1.6923 (2.3009) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][500/625] eta 0:02:00 lr 0.000399 wd 0.0500 time 0.4442 (0.9670) data time 0.0009 (0.0206) model time 0.4433 (0.9464) loss 2.8688 (2.9723) grad_norm 1.8685 (2.1733) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][510/625] eta 0:01:38 lr 0.000398 wd 0.0500 time 0.4448 (0.8567) data time 0.0009 (0.0163) model time 0.4439 (0.8404) loss 2.6407 (2.9163) grad_norm 1.4917 (2.1291) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][520/625] eta 
0:01:22 lr 0.000398 wd 0.0500 time 0.4454 (0.7828) data time 0.0006 (0.0136) model time 0.4448 (0.7692) loss 3.4469 (2.9189) grad_norm 2.5819 (2.1923) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][530/625] eta 0:01:09 lr 0.000398 wd 0.0500 time 0.4418 (0.7315) data time 0.0006 (0.0117) model time 0.4411 (0.7199) loss 2.3759 (2.8876) grad_norm 4.0663 (2.1682) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][540/625] eta 0:00:58 lr 0.000398 wd 0.0500 time 0.4457 (0.6936) data time 0.0008 (0.0102) model time 0.4449 (0.6834) loss 3.0268 (2.8539) grad_norm 1.4503 (2.1340) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][550/625] eta 0:00:49 lr 0.000398 wd 0.0500 time 0.4550 (0.6648) data time 0.0006 (0.0092) model time 0.4544 (0.6557) loss 2.0802 (2.8200) grad_norm 4.1853 (2.1351) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][560/625] eta 0:00:41 lr 0.000398 wd 0.0500 time 0.4452 (0.6419) data time 0.0006 (0.0083) model time 0.4446 (0.6337) loss 3.3007 (2.8298) grad_norm 1.8879 (2.2810) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][570/625] eta 0:00:34 lr 0.000398 wd 0.0500 time 0.4400 (0.6233) data time 0.0009 (0.0076) model time 0.4391 (0.6157) loss 3.0268 (2.8412) grad_norm 1.9984 (2.2360) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][580/625] eta 0:00:27 lr 0.000398 wd 0.0500 time 0.4403 (0.6077) data time 0.0008 (0.0070) model time 0.4395 (0.6007) loss 2.9924 (2.8393) grad_norm 1.8535 (2.2102) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:48 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][590/625] eta 0:00:20 lr 0.000398 wd 0.0500 time 0.4439 (0.5945) data time 0.0006 (0.0065) model time 0.4433 (0.5880) loss 1.7869 (2.8239) grad_norm 1.3775 (2.1675) loss_scale 512.0000 (512.0000) mem 16695MB [2024-08-10 17:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 17:58:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 17:58:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 18:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 18:11:29 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 18:11:42 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 18:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 18:11:54 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 18:11:57 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 18:11:59 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 18:11:59 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 191) [2024-08-10 18:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 18:12:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][600/625] eta 0:01:46 lr 0.000398 wd 0.0500 time 0.4721 (4.2485) data time 0.0010 (0.1235) model time 0.4712 (4.1250) loss 3.2406 (3.0017) grad_norm 5.3300 (2.1479) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 18:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][610/625] eta 0:00:28 lr 0.000397 wd 0.0500 time 0.4645 (1.8885) data time 0.0008 (0.0472) model time 0.4637 (1.8413) loss 2.8549 (2.9658) grad_norm 2.0155 (1.9091) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 18:12:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [191/300][620/625] eta 0:00:06 lr 0.000397 wd 0.0500 time 0.4654 (1.3412) data time 0.0005 (0.0294) model time 0.4649 (1.3119) loss 2.8310 (2.9958) grad_norm 1.4538 (1.8627) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 18:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 191 training takes 0:00:37 [2024-08-10 18:12:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 18:12:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 18:12:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.577 (0.577) Loss 0.5122 (0.5122) Acc@1 89.209 (89.209) Acc@5 98.730 (98.730) Mem 16712MB [2024-08-10 18:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.165) Loss 0.8271 (0.6303) Acc@1 80.176 (86.208) Acc@5 95.850 (97.714) Mem 16712MB [2024-08-10 18:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.142) Loss 0.9102 (0.7429) Acc@1 78.467 (83.303) Acc@5 95.020 (96.501) Mem 16712MB [2024-08-10 18:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.027 Acc@5 96.507 [2024-08-10 18:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.0% [2024-08-10 18:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.866 (0.866) Loss 0.4719 (0.4719) Acc@1 89.453 (89.453) Acc@5 98.779 (98.779) Mem 16712MB [2024-08-10 18:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.195) Loss 0.7505 (0.5847) Acc@1 81.787 (87.287) Acc@5 96.631 (97.985) Mem 16712MB [2024-08-10 18:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.158) Loss 0.8413 (0.6870) Acc@1 80.127 (84.568) Acc@5 95.996 (97.019) Mem 16712MB [2024-08-10 18:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.245 Acc@5 97.021 [2024-08-10 18:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-10 18:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.25% [2024-08-10 18:12:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 18:13:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 18:13:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][0/625] eta 0:09:51 lr 0.000397 wd 0.0500 time 0.9464 (0.9464) data time 0.4084 (0.4084) model time 0.0000 (0.0000) loss 1.8713 (1.8713) grad_norm 2.0584 (2.0584) loss_scale 512.0000 (512.0000) mem 16716MB [2024-08-10 18:13:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][10/625] eta 0:05:17 lr 0.000397 wd 0.0500 time 0.4816 (0.5165) data time 0.0011 (0.0381) model time 0.0000 (0.0000) loss 2.7418 (2.7141) grad_norm 4.4459 (2.7825) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][20/625] eta 0:05:01 lr 0.000397 wd 0.0500 time 0.4793 (0.4980) data time 0.0008 (0.0205) model time 0.0000 (0.0000) loss 3.2579 (2.8632) grad_norm 3.4990 (2.5858) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][30/625] eta 0:04:52 lr 0.000397 wd 0.0500 time 0.4770 (0.4923) data time 0.0010 (0.0142) model time 0.0000 (0.0000) loss 2.9656 (2.8383) grad_norm 2.0084 (2.3642) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][40/625] eta 0:04:46 lr 0.000397 wd 0.0500 time 0.4800 (0.4890) data time 0.0010 (0.0110) model time 0.0000 (0.0000) loss 2.5944 (2.8025) grad_norm 1.9321 (2.3040) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][50/625] eta 0:04:39 lr 0.000397 wd 0.0500 time 0.4708 (0.4868) data time 0.0010 (0.0091) model time 0.0000 (0.0000) loss 2.5399 (2.8142) grad_norm 2.1374 (2.2885) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][60/625] eta 0:04:33 lr 0.000397 wd 0.0500 time 0.4699 (0.4844) data time 0.0007 (0.0077) model time 0.4692 (0.4713) loss 3.7233 (2.8071) 
grad_norm 1.4282 (2.3101) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][70/625] eta 0:04:28 lr 0.000397 wd 0.0500 time 0.4762 (0.4831) data time 0.0008 (0.0068) model time 0.4754 (0.4727) loss 2.8728 (2.8348) grad_norm 2.1937 (2.3337) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][80/625] eta 0:04:22 lr 0.000396 wd 0.0500 time 0.4662 (0.4816) data time 0.0011 (0.0061) model time 0.4651 (0.4718) loss 2.3943 (2.8261) grad_norm 1.7928 (2.3240) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][90/625] eta 0:04:17 lr 0.000396 wd 0.0500 time 0.4724 (0.4814) data time 0.0008 (0.0055) model time 0.4715 (0.4735) loss 1.8942 (2.8342) grad_norm 2.4103 (2.2951) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][100/625] eta 0:04:12 lr 0.000396 wd 0.0500 time 0.4905 (0.4812) data time 0.0012 (0.0051) model time 0.4893 (0.4745) loss 3.2432 (2.8263) grad_norm 1.4532 (2.2550) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][110/625] eta 0:04:07 lr 0.000396 wd 0.0500 time 0.4808 (0.4812) data time 0.0008 (0.0047) model time 0.4800 (0.4753) loss 3.1063 (2.8191) grad_norm 1.3625 (2.2089) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][120/625] eta 0:04:02 lr 0.000396 wd 0.0500 time 0.4782 (0.4810) data time 0.0011 (0.0044) model time 0.4771 (0.4757) loss 2.2197 (2.8176) grad_norm 1.2838 (2.1682) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][130/625] eta 0:03:57 lr 0.000396 wd 0.0500 time 0.4726 (0.4805) data 
time 0.0011 (0.0042) model time 0.4715 (0.4754) loss 3.1761 (2.8177) grad_norm 1.5463 (2.1566) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][140/625] eta 0:03:52 lr 0.000396 wd 0.0500 time 0.4679 (0.4799) data time 0.0011 (0.0039) model time 0.4668 (0.4750) loss 2.5529 (2.8129) grad_norm 1.5626 (2.1448) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][150/625] eta 0:03:47 lr 0.000396 wd 0.0500 time 0.4734 (0.4795) data time 0.0011 (0.0038) model time 0.4722 (0.4747) loss 3.3564 (2.8020) grad_norm 1.7648 (2.1558) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][160/625] eta 0:03:42 lr 0.000396 wd 0.0500 time 0.4768 (0.4793) data time 0.0011 (0.0036) model time 0.4757 (0.4747) loss 2.5058 (2.7984) grad_norm 2.4072 (2.1423) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][170/625] eta 0:03:38 lr 0.000396 wd 0.0500 time 0.4795 (0.4795) data time 0.0012 (0.0034) model time 0.4784 (0.4754) loss 2.5677 (2.7864) grad_norm 1.5837 (2.1287) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][180/625] eta 0:03:33 lr 0.000395 wd 0.0500 time 0.4797 (0.4795) data time 0.0010 (0.0033) model time 0.4787 (0.4756) loss 3.1966 (2.7823) grad_norm 1.7751 (2.1287) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][190/625] eta 0:03:28 lr 0.000395 wd 0.0500 time 0.4730 (0.4803) data time 0.0009 (0.0032) model time 0.4721 (0.4769) loss 2.8256 (2.7768) grad_norm 1.8334 (2.1195) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[192/300][200/625] eta 0:03:24 lr 0.000395 wd 0.0500 time 0.4728 (0.4800) data time 0.0008 (0.0031) model time 0.4719 (0.4767) loss 1.9636 (2.7760) grad_norm 3.4394 (2.1166) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][210/625] eta 0:03:19 lr 0.000395 wd 0.0500 time 0.4718 (0.4796) data time 0.0008 (0.0030) model time 0.4710 (0.4763) loss 2.8539 (2.7737) grad_norm 1.9517 (2.1163) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][220/625] eta 0:03:14 lr 0.000395 wd 0.0500 time 0.4729 (0.4793) data time 0.0009 (0.0029) model time 0.4720 (0.4760) loss 2.7255 (2.7662) grad_norm 1.8170 (2.1151) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][230/625] eta 0:03:09 lr 0.000395 wd 0.0500 time 0.4799 (0.4790) data time 0.0008 (0.0028) model time 0.4791 (0.4757) loss 2.6018 (2.7573) grad_norm 2.0599 (2.1078) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:14:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][240/625] eta 0:03:04 lr 0.000395 wd 0.0500 time 0.4770 (0.4789) data time 0.0008 (0.0027) model time 0.4762 (0.4758) loss 2.9648 (2.7519) grad_norm 2.0278 (2.0932) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][250/625] eta 0:02:59 lr 0.000395 wd 0.0500 time 0.4806 (0.4790) data time 0.0011 (0.0027) model time 0.4795 (0.4759) loss 2.8916 (2.7554) grad_norm 1.9658 (2.0844) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][260/625] eta 0:02:54 lr 0.000395 wd 0.0500 time 0.4784 (0.4789) data time 0.0009 (0.0026) model time 0.4776 (0.4760) loss 1.8388 (2.7516) grad_norm 1.6988 (2.0725) loss_scale 512.0000 (512.0000) mem 16717MB 
[2024-08-10 18:15:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][270/625] eta 0:02:49 lr 0.000395 wd 0.0500 time 0.4704 (0.4788) data time 0.0011 (0.0026) model time 0.4693 (0.4759) loss 2.5265 (2.7466) grad_norm 3.6193 (2.0724) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][280/625] eta 0:02:45 lr 0.000394 wd 0.0500 time 0.4756 (0.4786) data time 0.0011 (0.0025) model time 0.4744 (0.4758) loss 3.0294 (2.7438) grad_norm 1.4999 (2.0610) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][290/625] eta 0:02:40 lr 0.000394 wd 0.0500 time 0.4731 (0.4785) data time 0.0008 (0.0025) model time 0.4723 (0.4757) loss 3.4384 (2.7550) grad_norm 2.0925 (2.0521) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][300/625] eta 0:02:35 lr 0.000394 wd 0.0500 time 0.4719 (0.4784) data time 0.0009 (0.0024) model time 0.4709 (0.4756) loss 2.0001 (2.7543) grad_norm 1.9430 (2.0460) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][310/625] eta 0:02:30 lr 0.000394 wd 0.0500 time 0.4807 (0.4784) data time 0.0009 (0.0024) model time 0.4797 (0.4756) loss 3.2845 (2.7573) grad_norm 1.8904 (2.0412) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][320/625] eta 0:02:25 lr 0.000394 wd 0.0500 time 0.4831 (0.4784) data time 0.0008 (0.0023) model time 0.4823 (0.4758) loss 3.2211 (2.7581) grad_norm 3.7312 (2.0444) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][330/625] eta 0:02:21 lr 0.000394 wd 0.0500 time 0.4784 (0.4785) data time 0.0008 (0.0023) model time 0.4776 (0.4759) loss 2.1977 
(2.7578) grad_norm 2.1120 (2.0469) loss_scale 512.0000 (512.0000) mem 16717MB [2024-08-10 18:15:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][340/625] eta 0:02:16 lr 0.000394 wd 0.0500 time 0.4714 (0.4784) data time 0.0010 (0.0023) model time 0.4704 (0.4759) loss 2.2485 (2.7570) grad_norm 5.8687 (2.0775) loss_scale 1024.0000 (522.5103) mem 16717MB [2024-08-10 18:15:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][350/625] eta 0:02:11 lr 0.000394 wd 0.0500 time 0.4733 (0.4782) data time 0.0008 (0.0022) model time 0.4725 (0.4757) loss 1.9432 (2.7564) grad_norm 1.8117 (2.0774) loss_scale 1024.0000 (536.7977) mem 16717MB [2024-08-10 18:15:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][360/625] eta 0:02:06 lr 0.000394 wd 0.0500 time 0.4710 (0.4781) data time 0.0008 (0.0022) model time 0.4702 (0.4756) loss 3.5071 (2.7538) grad_norm 1.9025 (2.0818) loss_scale 1024.0000 (550.2936) mem 16717MB [2024-08-10 18:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][370/625] eta 0:02:01 lr 0.000394 wd 0.0500 time 0.4781 (0.4780) data time 0.0010 (0.0022) model time 0.4771 (0.4755) loss 2.6509 (2.7560) grad_norm 1.4225 (2.0729) loss_scale 1024.0000 (563.0620) mem 16717MB [2024-08-10 18:16:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][380/625] eta 0:01:57 lr 0.000393 wd 0.0500 time 0.4804 (0.4779) data time 0.0010 (0.0021) model time 0.4794 (0.4755) loss 2.8990 (2.7602) grad_norm 1.9987 (2.0797) loss_scale 1024.0000 (575.1601) mem 16717MB [2024-08-10 18:16:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][390/625] eta 0:01:52 lr 0.000393 wd 0.0500 time 0.4785 (0.4784) data time 0.0009 (0.0021) model time 0.4776 (0.4761) loss 3.2411 (2.7582) grad_norm 2.6355 (2.0778) loss_scale 1024.0000 (586.6394) mem 16717MB [2024-08-10 18:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][400/625] eta 0:01:47 lr 0.000393 wd 0.0500 time 
0.4784 (0.4784) data time 0.0008 (0.0021) model time 0.4776 (0.4761) loss 3.2520 (2.7640) grad_norm 2.0631 (2.0730) loss_scale 1024.0000 (597.5461) mem 16717MB [2024-08-10 18:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][410/625] eta 0:01:42 lr 0.000393 wd 0.0500 time 0.4776 (0.4790) data time 0.0008 (0.0021) model time 0.4767 (0.4768) loss 3.4754 (2.7678) grad_norm 1.2670 (2.0664) loss_scale 1024.0000 (607.9221) mem 16717MB [2024-08-10 18:16:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][420/625] eta 0:01:38 lr 0.000393 wd 0.0500 time 0.4755 (0.4788) data time 0.0008 (0.0020) model time 0.4747 (0.4767) loss 2.6656 (2.7686) grad_norm 1.9596 (2.0639) loss_scale 1024.0000 (617.8052) mem 16717MB [2024-08-10 18:16:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][430/625] eta 0:01:33 lr 0.000393 wd 0.0500 time 0.4725 (0.4787) data time 0.0012 (0.0020) model time 0.4714 (0.4766) loss 2.8924 (2.7635) grad_norm 2.0385 (2.0703) loss_scale 1024.0000 (627.2297) mem 16717MB [2024-08-10 18:16:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][440/625] eta 0:01:28 lr 0.000393 wd 0.0500 time 0.4744 (0.4786) data time 0.0011 (0.0020) model time 0.4734 (0.4765) loss 2.7092 (2.7583) grad_norm 1.9731 (2.0692) loss_scale 1024.0000 (636.2268) mem 16717MB [2024-08-10 18:16:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][450/625] eta 0:01:23 lr 0.000393 wd 0.0500 time 0.4829 (0.4786) data time 0.0007 (0.0020) model time 0.4821 (0.4765) loss 3.1410 (2.7592) grad_norm 2.0031 (2.0679) loss_scale 1024.0000 (644.8248) mem 16717MB [2024-08-10 18:16:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][460/625] eta 0:01:18 lr 0.000393 wd 0.0500 time 0.4783 (0.4786) data time 0.0010 (0.0019) model time 0.4774 (0.4765) loss 2.8944 (2.7620) grad_norm 1.6399 (2.0698) loss_scale 1024.0000 (653.0499) mem 16717MB [2024-08-10 18:16:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [192/300][470/625] eta 0:01:14 lr 0.000393 wd 0.0500 time 0.4817 (0.4786) data time 0.0010 (0.0019) model time 0.4807 (0.4766) loss 3.0408 (2.7624) grad_norm 1.8903 (2.0702) loss_scale 1024.0000 (660.9257) mem 16717MB [2024-08-10 18:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][480/625] eta 0:01:09 lr 0.000392 wd 0.0500 time 0.4803 (0.4787) data time 0.0008 (0.0019) model time 0.4795 (0.4766) loss 3.0143 (2.7657) grad_norm 1.7909 (2.0703) loss_scale 1024.0000 (668.4740) mem 16717MB [2024-08-10 18:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][490/625] eta 0:01:04 lr 0.000392 wd 0.0500 time 0.4718 (0.4786) data time 0.0008 (0.0019) model time 0.4710 (0.4766) loss 2.5740 (2.7654) grad_norm 2.3633 (2.0757) loss_scale 1024.0000 (675.7149) mem 16717MB [2024-08-10 18:17:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][500/625] eta 0:00:59 lr 0.000392 wd 0.0500 time 0.4757 (0.4785) data time 0.0010 (0.0019) model time 0.4746 (0.4765) loss 3.2450 (2.7615) grad_norm 1.8753 (2.0976) loss_scale 1024.0000 (682.6667) mem 16717MB [2024-08-10 18:17:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][510/625] eta 0:00:55 lr 0.000392 wd 0.0500 time 0.4738 (0.4785) data time 0.0010 (0.0019) model time 0.4728 (0.4765) loss 3.0552 (2.7608) grad_norm 1.7155 (2.0942) loss_scale 1024.0000 (689.3464) mem 16717MB [2024-08-10 18:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][520/625] eta 0:00:50 lr 0.000392 wd 0.0500 time 0.4682 (0.4784) data time 0.0008 (0.0018) model time 0.4673 (0.4764) loss 3.0168 (2.7618) grad_norm 2.2271 (2.0982) loss_scale 1024.0000 (695.7697) mem 16717MB [2024-08-10 18:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 18:17:12 vssm_base_ms_e300] (utils.py 118): INFO 
./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 18:17:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 18:23:07 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 18:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 18:23:22 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 18:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 18:23:34 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 18:23:36 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 18:23:38 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 18:23:38 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 192) [2024-08-10 18:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 18:24:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][530/625] eta 0:04:03 lr 0.000392 wd 0.0500 time 0.4841 (2.5664) data time 0.0013 (0.0984) model time 0.4827 (2.4680) loss 2.8362 (3.1372) grad_norm 1.9540 (2.0899) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][540/625] eta 0:02:09 lr 0.000392 wd 0.0500 time 0.4859 (1.5278) data time 0.0008 (0.0498) model time 0.4850 (1.4780) loss 3.1644 (2.9806) grad_norm 2.1079 (1.9942) loss_scale 1024.0000 (1024.0000) mem 16699MB 
[2024-08-10 18:24:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][550/625] eta 0:01:29 lr 0.000392 wd 0.0500 time 0.4839 (1.1894) data time 0.0011 (0.0335) model time 0.4828 (1.1559) loss 3.0755 (3.0048) grad_norm 1.5930 (1.9216) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 18:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][560/625] eta 0:01:06 lr 0.000392 wd 0.0500 time 0.4909 (1.0192) data time 0.0008 (0.0255) model time 0.4900 (0.9936) loss 2.2540 (2.9506) grad_norm 2.3373 (1.9937) loss_scale 1024.0000 (1024.0000) mem 16699MB
[2024-08-10 18:24:25 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-08-10 18:24:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-10 18:24:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 18:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 18:27:42 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 18:27:55 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 18:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-10 18:28:04 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-10 18:28:06 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 18:28:08 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 18:28:08 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 192) [2024-08-10 18:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 18:28:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][570/625] eta 0:02:29 lr 0.000392 wd 0.0500 time 0.4484 (2.7238) data time 0.0006 (0.0901) model time 0.4478 (2.6336) loss 3.3705 (3.1683) grad_norm 1.8922 (1.9319) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 18:28:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][580/625] eta 0:01:05 lr 0.000392 wd 0.0500 time 0.4455 (1.4593) data time 0.0007 (0.0406) model time 0.4448 (1.4187) loss 3.5252 (3.0197) grad_norm 4.2292 (1.9687) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 18:28:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][590/625] eta 0:00:38 lr 0.000391 wd 0.0500 time 0.4538 (1.0991) data time 0.0009 (0.0264) model time 0.4530 (1.0727) loss 3.1873 (3.0115) grad_norm 1.4966 (1.8863) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 18:28:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][600/625] eta 0:00:23 lr 0.000391 wd 0.0500 time 0.3869 (0.9406) data time 0.0009 (0.0197) model time 0.3859 (0.9209) loss 2.8227 (3.0046) grad_norm 1.4932 (1.8117) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 18:28:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [192/300][610/625] eta 0:00:12 lr 0.000391 wd 0.0500 time 0.4507 (0.8384) data time 0.0005 (0.0158) model time 0.4502 (0.8226) loss 3.0184 (2.9856) grad_norm 1.7688 (1.8554) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 18:28:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[192/300][620/625] eta 0:00:03 lr 0.000391 wd 0.0500 time 0.4436 (0.7708) data time 0.0005 (0.0132) model time 0.4432 (0.7576) loss 2.3295 (2.9553) grad_norm 2.4042 (1.9636) loss_scale 1024.0000 (1024.0000) mem 16695MB [2024-08-10 18:28:59 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 192 training takes 0:00:46 [2024-08-10 18:28:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 18:29:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 18:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.458 (0.458) Loss 0.5234 (0.5234) Acc@1 89.111 (89.111) Acc@5 98.486 (98.486) Mem 16695MB [2024-08-10 18:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.155) Loss 0.8560 (0.6332) Acc@1 79.932 (86.341) Acc@5 95.459 (97.621) Mem 16695MB [2024-08-10 18:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.8774 (0.7502) Acc@1 79.492 (83.394) Acc@5 95.166 (96.470) Mem 16695MB [2024-08-10 18:29:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.141 Acc@5 96.483 [2024-08-10 18:29:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 18:29:08 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.14% [2024-08-10 18:29:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 18:29:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 18:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.4717 (0.4717) Acc@1 89.648 (89.648) Acc@5 98.779 (98.779) Mem 16695MB
[2024-08-10 18:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 0.7500 (0.5846) Acc@1 81.787 (87.331) Acc@5 96.680 (97.976) Mem 16695MB
[2024-08-10 18:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.132) Loss 0.8413 (0.6870) Acc@1 80.273 (84.608) Acc@5 95.996 (97.017) Mem 16695MB
[2024-08-10 18:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.291 Acc@5 97.019
[2024-08-10 18:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-08-10 18:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.29%
[2024-08-10 18:29:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 18:29:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 18:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][0/625] eta 0:08:46 lr 0.000391 wd 0.0500 time 0.8419 (0.8419) data time 0.3540 (0.3540) model time 0.0000 (0.0000) loss 2.4041 (2.4041) grad_norm 1.5081 (1.5081) loss_scale 1024.0000 (1024.0000) mem 16704MB [2024-08-10 18:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][10/625] eta 0:04:56 lr 0.000391 wd 0.0500 time 0.4477 (0.4826) data time 0.0008 (0.0329) model time 0.0000 (0.0000) loss 3.1311 (2.5959) grad_norm 1.8526 (2.0485) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][20/625] eta 0:04:41 lr 0.000391 wd 0.0500 time 0.4461 (0.4657) data time 0.0006 (0.0176) model time 0.0000 (0.0000) loss 1.8870 (2.6217) grad_norm 2.8961 (1.9346) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][30/625] eta 0:04:33 lr 0.000391 wd 0.0500 time 0.4508 (0.4603) data time 0.0007 (0.0122) model time 0.0000 (0.0000) loss 3.1015 (2.6953) grad_norm 1.7072 (1.9006) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][40/625] eta 0:04:27 lr 0.000391 wd 0.0500 time 0.4459 (0.4574) data time 0.0007 (0.0094) model time 0.0000 (0.0000) loss 3.0184 (2.7606) grad_norm 1.6057 (1.8625) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][50/625] eta 0:04:21 lr 0.000391 wd 0.0500 time 0.4419 (0.4555) data time 0.0006 (0.0077) model time 0.0000 (0.0000) loss 2.9754 (2.7497) grad_norm 1.5143 (1.8117) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][60/625] eta 0:04:16 lr 0.000390 wd 0.0500 time 0.4469 (0.4542) data time 0.0008 (0.0066) model time 0.4461 (0.4467) loss 3.0121 
(2.7690) grad_norm 1.6588 (2.0067) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][70/625] eta 0:04:11 lr 0.000390 wd 0.0500 time 0.4517 (0.4534) data time 0.0009 (0.0058) model time 0.4509 (0.4472) loss 2.8790 (2.7701) grad_norm 2.1384 (2.0462) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][80/625] eta 0:04:06 lr 0.000390 wd 0.0500 time 0.4419 (0.4527) data time 0.0008 (0.0052) model time 0.4411 (0.4471) loss 3.1872 (2.7724) grad_norm 1.9659 (2.0745) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][90/625] eta 0:04:02 lr 0.000390 wd 0.0500 time 0.4449 (0.4532) data time 0.0006 (0.0047) model time 0.4443 (0.4494) loss 2.8325 (2.7652) grad_norm 2.0631 (2.0891) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][100/625] eta 0:03:57 lr 0.000390 wd 0.0500 time 0.4481 (0.4529) data time 0.0007 (0.0043) model time 0.4474 (0.4494) loss 2.5048 (2.7760) grad_norm 2.1799 (2.0854) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][110/625] eta 0:03:53 lr 0.000390 wd 0.0500 time 0.4526 (0.4527) data time 0.0009 (0.0040) model time 0.4517 (0.4495) loss 2.8843 (2.7807) grad_norm 2.4020 (2.0714) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][120/625] eta 0:03:48 lr 0.000390 wd 0.0500 time 0.4495 (0.4525) data time 0.0008 (0.0038) model time 0.4486 (0.4494) loss 2.8147 (2.7598) grad_norm 1.5962 (2.0539) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][130/625] eta 0:03:44 lr 0.000390 wd 0.0500 
time 0.6590 (0.4539) data time 0.0007 (0.0035) model time 0.6584 (0.4521) loss 2.8250 (2.7626) grad_norm 1.5014 (2.0330) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][140/625] eta 0:03:39 lr 0.000390 wd 0.0500 time 0.4501 (0.4535) data time 0.0007 (0.0033) model time 0.4494 (0.4516) loss 2.2994 (2.7429) grad_norm 1.9117 (2.0133) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][150/625] eta 0:03:35 lr 0.000390 wd 0.0500 time 0.4503 (0.4533) data time 0.0009 (0.0032) model time 0.4494 (0.4513) loss 1.5392 (2.7292) grad_norm 2.8361 (2.0004) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][160/625] eta 0:03:30 lr 0.000389 wd 0.0500 time 0.4464 (0.4530) data time 0.0007 (0.0030) model time 0.4457 (0.4511) loss 2.3987 (2.7325) grad_norm 1.7026 (1.9851) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][170/625] eta 0:03:26 lr 0.000389 wd 0.0500 time 0.4443 (0.4529) data time 0.0009 (0.0029) model time 0.4434 (0.4510) loss 2.4382 (2.7357) grad_norm 1.6825 (1.9668) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][180/625] eta 0:03:21 lr 0.000389 wd 0.0500 time 0.4512 (0.4527) data time 0.0008 (0.0028) model time 0.4504 (0.4508) loss 2.9686 (2.7416) grad_norm 2.8452 (1.9760) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][190/625] eta 0:03:16 lr 0.000389 wd 0.0500 time 0.4506 (0.4526) data time 0.0008 (0.0027) model time 0.4498 (0.4507) loss 2.9451 (2.7321) grad_norm 1.7566 (1.9836) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [193/300][200/625] eta 0:03:12 lr 0.000389 wd 0.0500 time 0.4454 (0.4525) data time 0.0007 (0.0026) model time 0.4448 (0.4506) loss 2.6287 (2.7213) grad_norm 1.6892 (1.9915) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][210/625] eta 0:03:07 lr 0.000389 wd 0.0500 time 0.4561 (0.4525) data time 0.0006 (0.0025) model time 0.4555 (0.4507) loss 3.3024 (2.7261) grad_norm 5.1001 (2.0002) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:30:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][220/625] eta 0:03:03 lr 0.000389 wd 0.0500 time 0.4495 (0.4524) data time 0.0008 (0.0024) model time 0.4487 (0.4507) loss 3.1096 (2.7299) grad_norm 2.0348 (2.0077) loss_scale 1024.0000 (1024.0000) mem 16699MB [2024-08-10 18:31:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][230/625] eta 0:02:58 lr 0.000389 wd 0.0500 time 0.4533 (0.4523) data time 0.0009 (0.0024) model time 0.4524 (0.4505) loss 2.4448 (2.7284) grad_norm 1.5621 (inf) loss_scale 512.0000 (1001.8355) mem 16699MB [2024-08-10 18:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][240/625] eta 0:02:54 lr 0.000389 wd 0.0500 time 0.4484 (0.4522) data time 0.0006 (0.0023) model time 0.4478 (0.4505) loss 2.5103 (2.7241) grad_norm 1.9092 (inf) loss_scale 512.0000 (981.5104) mem 16699MB [2024-08-10 18:31:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][250/625] eta 0:02:49 lr 0.000389 wd 0.0500 time 0.4485 (0.4522) data time 0.0006 (0.0022) model time 0.4479 (0.4505) loss 3.0474 (2.7263) grad_norm 4.0571 (inf) loss_scale 512.0000 (962.8048) mem 16699MB [2024-08-10 18:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][260/625] eta 0:02:45 lr 0.000388 wd 0.0500 time 0.4540 (0.4522) data time 0.0008 (0.0022) model time 0.4531 (0.4505) loss 2.6278 (2.7352) grad_norm 1.9727 (inf) loss_scale 
512.0000 (945.5326) mem 16699MB [2024-08-10 18:31:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][270/625] eta 0:02:40 lr 0.000388 wd 0.0500 time 0.4507 (0.4521) data time 0.0008 (0.0022) model time 0.4499 (0.4505) loss 2.9788 (2.7355) grad_norm 5.3691 (inf) loss_scale 512.0000 (929.5351) mem 16699MB [2024-08-10 18:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][280/625] eta 0:02:35 lr 0.000388 wd 0.0500 time 0.4435 (0.4520) data time 0.0009 (0.0021) model time 0.4426 (0.4504) loss 2.9015 (2.7404) grad_norm 2.0363 (inf) loss_scale 512.0000 (914.6762) mem 16699MB [2024-08-10 18:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][290/625] eta 0:02:31 lr 0.000388 wd 0.0500 time 0.4491 (0.4519) data time 0.0007 (0.0021) model time 0.4484 (0.4503) loss 3.0552 (2.7442) grad_norm 1.5336 (inf) loss_scale 512.0000 (900.8385) mem 16699MB [2024-08-10 18:31:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][300/625] eta 0:02:26 lr 0.000388 wd 0.0500 time 0.4484 (0.4518) data time 0.0006 (0.0020) model time 0.4478 (0.4502) loss 3.2133 (2.7481) grad_norm 1.7447 (inf) loss_scale 512.0000 (887.9203) mem 16699MB [2024-08-10 18:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][310/625] eta 0:02:22 lr 0.000388 wd 0.0500 time 0.4496 (0.4522) data time 0.0006 (0.0020) model time 0.4490 (0.4507) loss 3.3277 (2.7476) grad_norm 3.6085 (inf) loss_scale 512.0000 (875.8328) mem 16699MB [2024-08-10 18:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][320/625] eta 0:02:17 lr 0.000388 wd 0.0500 time 0.4452 (0.4522) data time 0.0008 (0.0019) model time 0.4444 (0.4507) loss 1.6519 (2.7406) grad_norm 1.2686 (inf) loss_scale 512.0000 (864.4984) mem 16699MB [2024-08-10 18:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][330/625] eta 0:02:13 lr 0.000388 wd 0.0500 time 0.4479 (0.4521) data time 0.0008 (0.0019) model time 0.4470 (0.4507) 
loss 3.2511 (2.7396) grad_norm 1.2221 (inf) loss_scale 512.0000 (853.8489) mem 16699MB [2024-08-10 18:31:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][340/625] eta 0:02:08 lr 0.000388 wd 0.0500 time 0.4484 (0.4521) data time 0.0008 (0.0019) model time 0.4476 (0.4506) loss 2.4672 (2.7455) grad_norm 1.4658 (inf) loss_scale 512.0000 (843.8240) mem 16699MB [2024-08-10 18:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][350/625] eta 0:02:04 lr 0.000388 wd 0.0500 time 0.4480 (0.4520) data time 0.0008 (0.0018) model time 0.4472 (0.4505) loss 2.8733 (2.7523) grad_norm 1.4437 (inf) loss_scale 512.0000 (834.3704) mem 16699MB [2024-08-10 18:32:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][360/625] eta 0:01:59 lr 0.000387 wd 0.0500 time 0.4480 (0.4518) data time 0.0006 (0.0018) model time 0.4474 (0.4504) loss 2.0690 (2.7498) grad_norm 1.8416 (inf) loss_scale 512.0000 (825.4404) mem 16699MB [2024-08-10 18:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][370/625] eta 0:01:55 lr 0.000387 wd 0.0500 time 0.4510 (0.4518) data time 0.0006 (0.0018) model time 0.4503 (0.4503) loss 3.0164 (2.7574) grad_norm 1.3706 (inf) loss_scale 512.0000 (816.9919) mem 16699MB [2024-08-10 18:32:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][380/625] eta 0:01:50 lr 0.000387 wd 0.0500 time 0.4489 (0.4517) data time 0.0007 (0.0018) model time 0.4482 (0.4502) loss 2.9361 (2.7587) grad_norm 9.2582 (inf) loss_scale 512.0000 (808.9869) mem 16699MB [2024-08-10 18:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][390/625] eta 0:01:46 lr 0.000387 wd 0.0500 time 0.4499 (0.4516) data time 0.0006 (0.0017) model time 0.4493 (0.4502) loss 2.0004 (2.7569) grad_norm 1.5613 (inf) loss_scale 512.0000 (801.3913) mem 16699MB [2024-08-10 18:32:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][400/625] eta 0:01:41 lr 0.000387 wd 0.0500 time 0.4508 (0.4515) 
data time 0.0006 (0.0017) model time 0.4502 (0.4501) loss 2.4580 (2.7527) grad_norm 2.0942 (inf) loss_scale 512.0000 (794.1746) mem 16699MB [2024-08-10 18:32:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][410/625] eta 0:01:37 lr 0.000387 wd 0.0500 time 0.4494 (0.4515) data time 0.0007 (0.0017) model time 0.4488 (0.4501) loss 2.7981 (2.7485) grad_norm 3.0466 (inf) loss_scale 512.0000 (787.3090) mem 16699MB [2024-08-10 18:32:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][420/625] eta 0:01:32 lr 0.000387 wd 0.0500 time 0.4475 (0.4514) data time 0.0006 (0.0017) model time 0.4469 (0.4501) loss 3.4645 (2.7499) grad_norm 2.7854 (inf) loss_scale 512.0000 (780.7696) mem 16699MB [2024-08-10 18:32:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][430/625] eta 0:01:28 lr 0.000387 wd 0.0500 time 0.4521 (0.4514) data time 0.0006 (0.0017) model time 0.4515 (0.4500) loss 2.1960 (2.7530) grad_norm 1.3373 (inf) loss_scale 512.0000 (774.5336) mem 16699MB [2024-08-10 18:32:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][440/625] eta 0:01:23 lr 0.000387 wd 0.0500 time 0.4522 (0.4514) data time 0.0006 (0.0016) model time 0.4516 (0.4500) loss 2.3705 (2.7525) grad_norm 2.1563 (inf) loss_scale 512.0000 (768.5805) mem 16699MB [2024-08-10 18:32:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][450/625] eta 0:01:18 lr 0.000387 wd 0.0500 time 0.4534 (0.4514) data time 0.0006 (0.0016) model time 0.4528 (0.4500) loss 3.4067 (2.7571) grad_norm 2.2806 (inf) loss_scale 512.0000 (762.8914) mem 16699MB [2024-08-10 18:32:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][460/625] eta 0:01:14 lr 0.000386 wd 0.0500 time 0.4542 (0.4517) data time 0.0009 (0.0016) model time 0.4533 (0.4504) loss 2.6778 (2.7542) grad_norm 5.2968 (inf) loss_scale 512.0000 (757.4490) mem 16699MB [2024-08-10 18:32:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][470/625] eta 
0:01:10 lr 0.000386 wd 0.0500 time 0.4509 (0.4521) data time 0.0007 (0.0016) model time 0.4502 (0.4509) loss 3.1849 (2.7528) grad_norm 2.0387 (inf) loss_scale 512.0000 (752.2378) mem 16699MB [2024-08-10 18:32:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][480/625] eta 0:01:05 lr 0.000386 wd 0.0500 time 0.4475 (0.4521) data time 0.0010 (0.0016) model time 0.4466 (0.4509) loss 2.9422 (2.7514) grad_norm 1.8646 (inf) loss_scale 512.0000 (747.2432) mem 16699MB [2024-08-10 18:32:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][490/625] eta 0:01:01 lr 0.000386 wd 0.0500 time 0.4519 (0.4521) data time 0.0007 (0.0016) model time 0.4511 (0.4508) loss 3.2662 (2.7561) grad_norm 1.7291 (inf) loss_scale 512.0000 (742.4521) mem 16699MB [2024-08-10 18:33:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][500/625] eta 0:00:56 lr 0.000386 wd 0.0500 time 0.4538 (0.4520) data time 0.0009 (0.0015) model time 0.4529 (0.4508) loss 2.6824 (2.7585) grad_norm 2.1634 (inf) loss_scale 512.0000 (737.8523) mem 16699MB [2024-08-10 18:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][510/625] eta 0:00:51 lr 0.000386 wd 0.0500 time 0.4494 (0.4520) data time 0.0007 (0.0015) model time 0.4487 (0.4507) loss 2.6644 (2.7599) grad_norm 2.4456 (inf) loss_scale 512.0000 (733.4325) mem 16699MB [2024-08-10 18:33:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][520/625] eta 0:00:47 lr 0.000386 wd 0.0500 time 0.4461 (0.4519) data time 0.0008 (0.0015) model time 0.4453 (0.4507) loss 3.3042 (2.7639) grad_norm 1.5058 (inf) loss_scale 512.0000 (729.1823) mem 16699MB [2024-08-10 18:33:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][530/625] eta 0:00:42 lr 0.000386 wd 0.0500 time 0.4435 (0.4519) data time 0.0009 (0.0015) model time 0.4426 (0.4506) loss 2.2550 (2.7675) grad_norm 2.2906 (inf) loss_scale 512.0000 (725.0923) mem 16699MB [2024-08-10 18:33:22 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [193/300][540/625] eta 0:00:38 lr 0.000386 wd 0.0500 time 0.4456 (0.4518) data time 0.0008 (0.0015) model time 0.4447 (0.4506) loss 1.9245 (2.7637) grad_norm 1.8074 (inf) loss_scale 512.0000 (721.1534) mem 16699MB [2024-08-10 18:33:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][550/625] eta 0:00:33 lr 0.000386 wd 0.0500 time 0.4462 (0.4518) data time 0.0009 (0.0015) model time 0.4453 (0.4506) loss 2.8332 (2.7660) grad_norm 1.7431 (inf) loss_scale 512.0000 (717.3575) mem 16699MB [2024-08-10 18:33:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][560/625] eta 0:00:29 lr 0.000386 wd 0.0500 time 0.4490 (0.4518) data time 0.0007 (0.0015) model time 0.4484 (0.4505) loss 3.3666 (2.7700) grad_norm 2.5333 (inf) loss_scale 512.0000 (713.6970) mem 16699MB [2024-08-10 18:33:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][570/625] eta 0:00:24 lr 0.000385 wd 0.0500 time 0.4481 (0.4517) data time 0.0007 (0.0015) model time 0.4475 (0.4505) loss 2.9469 (2.7719) grad_norm 1.7068 (inf) loss_scale 512.0000 (710.1646) mem 16699MB [2024-08-10 18:33:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][580/625] eta 0:00:20 lr 0.000385 wd 0.0500 time 0.4526 (0.4517) data time 0.0007 (0.0014) model time 0.4520 (0.4505) loss 2.8843 (2.7699) grad_norm 1.9466 (inf) loss_scale 512.0000 (706.7539) mem 16699MB [2024-08-10 18:33:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][590/625] eta 0:00:15 lr 0.000385 wd 0.0500 time 0.4509 (0.4516) data time 0.0006 (0.0014) model time 0.4503 (0.4504) loss 3.1780 (2.7699) grad_norm 1.4814 (inf) loss_scale 512.0000 (703.4585) mem 16699MB [2024-08-10 18:33:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][600/625] eta 0:00:11 lr 0.000385 wd 0.0500 time 0.4556 (0.4516) data time 0.0008 (0.0014) model time 0.4548 (0.4504) loss 2.4441 (2.7659) grad_norm 1.8136 (inf) loss_scale 512.0000 (700.2729) 
mem 16699MB [2024-08-10 18:33:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][610/625] eta 0:00:06 lr 0.000385 wd 0.0500 time 0.4460 (0.4516) data time 0.0004 (0.0014) model time 0.4455 (0.4504) loss 3.0274 (2.7694) grad_norm 1.3593 (inf) loss_scale 512.0000 (697.1915) mem 16699MB [2024-08-10 18:33:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [193/300][620/625] eta 0:00:02 lr 0.000385 wd 0.0500 time 0.4470 (0.4515) data time 0.0004 (0.0014) model time 0.4466 (0.4504) loss 3.1358 (2.7701) grad_norm 6.2081 (inf) loss_scale 512.0000 (694.2093) mem 16699MB [2024-08-10 18:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 18:33:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 18:34:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 18:47:27 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 18:47:28 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 18:47:41 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 18:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 18:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 18:56:39 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-10 18:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 18:56:50 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 18:56:53 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 18:56:55 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 18:56:55 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 193) [2024-08-10 18:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 18:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][0/625] eta 3:54:46 lr 0.000385 wd 0.0500 time 22.5379 (22.5379) data time 0.7149 (0.7149) model time 0.0000 (0.0000) loss 3.0626 (3.0626) grad_norm 1.9490 (1.9490) loss_scale 512.0000 (512.0000) mem 25855MB [2024-08-10 18:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][10/625] eta 0:27:00 lr 0.000385 wd 0.0500 time 0.4723 (2.6357) data time 0.0011 (0.0660) model time 0.0000 (0.0000) loss 2.3557 (2.9876) grad_norm 1.4722 (1.7997) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][20/625] eta 0:16:12 lr 0.000385 wd 0.0500 time 0.4797 (1.6077) data time 0.0011 (0.0351) model time 0.0000 (0.0000) loss 3.0169 (2.9763) grad_norm 1.4793 (2.0066) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][30/625] eta 0:12:24 lr 0.000385 wd 0.0500 time 0.4745 (1.2507) data time 0.0008 (0.0241) model time 0.0000 (0.0000) loss 2.1790 (2.9815) grad_norm 1.5926 (2.0346) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:57:44 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [194/300][40/625] eta 0:10:23 lr 0.000384 wd 0.0500 time 0.4784 (1.0662) data time 0.0011 (0.0185) model time 0.0000 (0.0000) loss 3.1195 (2.9192) grad_norm 3.4824 (2.0608) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:57:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][50/625] eta 0:09:06 lr 0.000384 wd 0.0500 time 0.4640 (0.9499) data time 0.0009 (0.0151) model time 0.0000 (0.0000) loss 2.7219 (2.9085) grad_norm 1.8176 (1.9917) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][60/625] eta 0:08:12 lr 0.000384 wd 0.0500 time 0.4742 (0.8723) data time 0.0011 (0.0128) model time 0.4731 (0.4754) loss 2.7705 (2.8695) grad_norm 1.9543 (1.9487) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:57:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][70/625] eta 0:07:33 lr 0.000384 wd 0.0500 time 0.4749 (0.8163) data time 0.0011 (0.0111) model time 0.4738 (0.4746) loss 2.7200 (2.8397) grad_norm 1.6473 (1.9538) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][80/625] eta 0:07:02 lr 0.000384 wd 0.0500 time 0.4762 (0.7744) data time 0.0012 (0.0099) model time 0.4750 (0.4750) loss 2.3916 (2.8220) grad_norm 3.6797 (2.0029) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][90/625] eta 0:06:36 lr 0.000384 wd 0.0500 time 0.4733 (0.7417) data time 0.0010 (0.0089) model time 0.4723 (0.4752) loss 3.4083 (2.8154) grad_norm 1.7253 (2.0179) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][100/625] eta 0:06:15 lr 0.000384 wd 0.0500 time 0.4745 (0.7151) data time 0.0009 (0.0082) model time 0.4736 (0.4746) loss 3.1791 (2.8147) grad_norm 5.8698 (2.0303) loss_scale 
512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][110/625] eta 0:05:57 lr 0.000384 wd 0.0500 time 0.4772 (0.6935) data time 0.0012 (0.0075) model time 0.4760 (0.4745) loss 2.4449 (2.8202) grad_norm 2.0297 (2.0093) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][120/625] eta 0:05:40 lr 0.000384 wd 0.0500 time 0.4696 (0.6752) data time 0.0008 (0.0070) model time 0.4688 (0.4740) loss 1.6976 (2.8196) grad_norm 1.4772 (2.0514) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][130/625] eta 0:05:26 lr 0.000384 wd 0.0500 time 0.4795 (0.6600) data time 0.0011 (0.0065) model time 0.4784 (0.4742) loss 3.0741 (2.8122) grad_norm 1.7513 (2.0932) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][140/625] eta 0:05:13 lr 0.000383 wd 0.0500 time 0.4838 (0.6473) data time 0.0009 (0.0062) model time 0.4829 (0.4748) loss 2.9527 (2.8019) grad_norm 2.4694 (2.0777) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][150/625] eta 0:05:02 lr 0.000383 wd 0.0500 time 0.4776 (0.6362) data time 0.0011 (0.0058) model time 0.4766 (0.4751) loss 2.3106 (2.7994) grad_norm 1.5514 (2.0684) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][160/625] eta 0:04:51 lr 0.000383 wd 0.0500 time 0.4763 (0.6263) data time 0.0011 (0.0055) model time 0.4752 (0.4753) loss 3.1650 (2.8077) grad_norm 1.6403 (2.0673) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][170/625] eta 0:04:40 lr 0.000383 wd 0.0500 time 0.4757 (0.6174) data time 0.0011 (0.0053) model time 
0.4746 (0.4750) loss 2.9558 (2.8032) grad_norm 2.1484 (2.1174) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][180/625] eta 0:04:31 lr 0.000383 wd 0.0500 time 0.4767 (0.6096) data time 0.0012 (0.0050) model time 0.4756 (0.4750) loss 3.0062 (2.7890) grad_norm 1.5841 (2.1012) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][190/625] eta 0:04:22 lr 0.000383 wd 0.0500 time 0.4736 (0.6034) data time 0.0011 (0.0048) model time 0.4725 (0.4760) loss 2.3164 (2.7896) grad_norm 1.7536 (2.0971) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][200/625] eta 0:04:13 lr 0.000383 wd 0.0500 time 0.4794 (0.5970) data time 0.0011 (0.0046) model time 0.4783 (0.4759) loss 2.4552 (2.7778) grad_norm 2.2223 (2.0887) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][210/625] eta 0:04:05 lr 0.000383 wd 0.0500 time 0.4755 (0.5913) data time 0.0011 (0.0045) model time 0.4744 (0.4759) loss 2.9488 (2.7738) grad_norm 1.8217 (2.0739) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][220/625] eta 0:03:57 lr 0.000383 wd 0.0500 time 0.4789 (0.5862) data time 0.0008 (0.0043) model time 0.4781 (0.4760) loss 2.8522 (2.7723) grad_norm 2.6326 (2.0694) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 18:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 18:59:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-10 18:59:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 19:00:58 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 19:00:59 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 19:01:12 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 19:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-10 19:01:22 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-10 19:01:24 vssm_base_ms_e300] (utils.py 30): INFO resuming model:
[2024-08-10 19:01:26 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema:
[2024-08-10 19:01:26 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 194)
[2024-08-10 19:01:26 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-08-10 19:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 19:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 19:03:39 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 19:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 19:03:51 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 19:03:54 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 19:03:56 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 19:03:56 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 194) [2024-08-10 19:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 19:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][230/625] eta 2:14:05 lr 0.000383 wd 0.0500 time 20.3692 (20.3692) data time 0.8257 (0.8257) model time 19.5435 (19.5435) loss 3.5740 (3.5740) grad_norm 6.7778 (6.7778) loss_scale 512.0000 (512.0000) mem 25845MB [2024-08-10 19:04:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][240/625] eta 0:14:48 lr 0.000382 wd 0.0500 time 0.4437 (2.3082) data time 0.0008 (0.0758) model time 0.4429 (2.2324) loss 2.6253 (3.1374) grad_norm 1.5277 (2.5887) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][250/625] eta 0:08:52 lr 0.000382 wd 0.0500 time 0.4412 (1.4189) data time 0.0008 (0.0401) model time 0.4404 (1.3788) loss 2.8705 (3.0607) grad_norm 2.0249 (2.3393) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][260/625] eta 0:06:45 lr 0.000382 wd 0.0500 time 0.4412 (1.1119) data time 0.0006 (0.0274) model time 0.4406 (1.0845) loss 2.1410 (3.0113) grad_norm 1.9359 (2.1737) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [194/300][270/625] eta 0:05:38 lr 0.000382 wd 0.0500 time 0.4457 (0.9537) data time 0.0008 (0.0209) model time 0.4450 (0.9328) loss 2.4127 (2.9603) grad_norm 1.6940 (2.1755) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][280/625] eta 0:04:54 lr 0.000382 wd 0.0500 time 0.4389 (0.8533) data time 0.0006 (0.0170) model time 0.4383 (0.8364) loss 2.9048 (2.9412) grad_norm 2.1814 (2.2304) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][290/625] eta 0:04:23 lr 0.000382 wd 0.0500 time 0.4432 (0.7860) data time 0.0008 (0.0143) model time 0.4423 (0.7716) loss 2.9610 (2.9122) grad_norm 3.4104 (2.2808) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][300/625] eta 0:03:59 lr 0.000382 wd 0.0500 time 0.4408 (0.7375) data time 0.0008 (0.0124) model time 0.4400 (0.7251) loss 2.4824 (2.8795) grad_norm 1.8805 (2.2295) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][310/625] eta 0:03:40 lr 0.000382 wd 0.0500 time 0.4400 (0.7010) data time 0.0007 (0.0110) model time 0.4393 (0.6900) loss 2.3139 (2.8679) grad_norm 1.6838 (2.1796) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][320/625] eta 0:03:25 lr 0.000382 wd 0.0500 time 0.4436 (0.6726) data time 0.0006 (0.0098) model time 0.4430 (0.6627) loss 3.6646 (2.8601) grad_norm 1.4782 (2.1982) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][330/625] eta 0:03:11 lr 0.000382 wd 0.0500 time 0.4424 (0.6498) data time 0.0006 (0.0089) model time 0.4418 (0.6409) loss 3.1926 (2.8574) grad_norm 1.9435 (2.1964) loss_scale 
512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][340/625] eta 0:02:59 lr 0.000381 wd 0.0500 time 0.4405 (0.6311) data time 0.0008 (0.0082) model time 0.4397 (0.6229) loss 2.4025 (2.8562) grad_norm 1.8481 (2.1591) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][350/625] eta 0:02:49 lr 0.000381 wd 0.0500 time 0.4469 (0.6156) data time 0.0006 (0.0076) model time 0.4464 (0.6080) loss 1.9224 (2.8612) grad_norm 1.6222 (2.1598) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][360/625] eta 0:02:39 lr 0.000381 wd 0.0500 time 0.4474 (0.6025) data time 0.0007 (0.0071) model time 0.4467 (0.5954) loss 3.0272 (2.8518) grad_norm 1.5824 (2.1563) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][370/625] eta 0:02:30 lr 0.000381 wd 0.0500 time 0.4435 (0.5912) data time 0.0007 (0.0066) model time 0.4429 (0.5846) loss 2.7501 (2.8392) grad_norm 1.9199 (2.1477) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][380/625] eta 0:02:22 lr 0.000381 wd 0.0500 time 0.4452 (0.5814) data time 0.0008 (0.0062) model time 0.4444 (0.5752) loss 2.0986 (2.8364) grad_norm 2.2962 (2.1200) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][390/625] eta 0:02:14 lr 0.000381 wd 0.0500 time 0.4491 (0.5730) data time 0.0008 (0.0059) model time 0.4483 (0.5671) loss 3.0215 (2.8370) grad_norm 3.0391 (2.1249) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][400/625] eta 0:02:07 lr 0.000381 wd 0.0500 time 0.4436 (0.5655) data time 0.0007 (0.0056) model time 
0.4428 (0.5599) loss 2.8603 (2.8309) grad_norm 2.3781 (2.1303) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][410/625] eta 0:02:00 lr 0.000381 wd 0.0500 time 0.4449 (0.5588) data time 0.0008 (0.0053) model time 0.4441 (0.5535) loss 2.9513 (2.8173) grad_norm 1.7091 (2.2547) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][420/625] eta 0:01:53 lr 0.000381 wd 0.0500 time 0.4414 (0.5536) data time 0.0008 (0.0051) model time 0.4406 (0.5485) loss 2.3343 (2.8099) grad_norm 1.8419 (2.2373) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][430/625] eta 0:01:46 lr 0.000381 wd 0.0500 time 0.4439 (0.5481) data time 0.0008 (0.0049) model time 0.4431 (0.5433) loss 2.7150 (2.8002) grad_norm 1.5542 (2.2155) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][440/625] eta 0:01:40 lr 0.000381 wd 0.0500 time 0.4498 (0.5433) data time 0.0007 (0.0047) model time 0.4491 (0.5386) loss 3.1440 (2.7957) grad_norm 2.2432 (2.2509) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:05:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][450/625] eta 0:01:34 lr 0.000380 wd 0.0500 time 0.4468 (0.5388) data time 0.0006 (0.0045) model time 0.4462 (0.5343) loss 3.0215 (2.7937) grad_norm 1.6308 (2.2325) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][460/625] eta 0:01:28 lr 0.000380 wd 0.0500 time 0.4391 (0.5347) data time 0.0007 (0.0043) model time 0.4384 (0.5304) loss 1.8655 (2.7894) grad_norm 1.4992 (2.2357) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][470/625] eta 0:01:22 lr 
0.000380 wd 0.0500 time 0.4439 (0.5309) data time 0.0006 (0.0042) model time 0.4433 (0.5267) loss 2.7242 (2.7882) grad_norm 1.8509 (2.2429) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][480/625] eta 0:01:16 lr 0.000380 wd 0.0500 time 0.4430 (0.5275) data time 0.0006 (0.0041) model time 0.4424 (0.5234) loss 2.9903 (2.7844) grad_norm 1.9437 (2.2555) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][490/625] eta 0:01:10 lr 0.000380 wd 0.0500 time 0.4436 (0.5243) data time 0.0006 (0.0039) model time 0.4431 (0.5204) loss 2.8298 (2.7750) grad_norm 1.7373 (2.2485) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][500/625] eta 0:01:05 lr 0.000380 wd 0.0500 time 0.4475 (0.5214) data time 0.0006 (0.0038) model time 0.4469 (0.5176) loss 3.3978 (2.7737) grad_norm 11.7399 (2.2929) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][510/625] eta 0:00:59 lr 0.000380 wd 0.0500 time 0.4454 (0.5188) data time 0.0007 (0.0037) model time 0.4446 (0.5151) loss 2.8478 (2.7778) grad_norm 1.7269 (2.2942) loss_scale 512.0000 (512.0000) mem 16702MB [2024-08-10 19:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 19:06:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:06:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 19:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 19:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 19:09:44 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 19:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 19:09:58 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 19:10:01 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 19:10:03 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 19:10:03 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 194) [2024-08-10 19:10:03 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 19:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][520/625] eta 0:07:57 lr 0.000380 wd 0.0500 time 0.4168 (4.5438) data time 0.0008 (0.1466) model time 0.4160 (4.3972) loss 2.8868 (3.0356) grad_norm 1.3741 (1.7517) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][530/625] eta 0:03:07 lr 0.000380 wd 0.0500 time 0.4672 (1.9696) data time 0.0010 (0.0556) model time 0.4662 (1.9140) loss 2.6168 (2.8692) grad_norm 2.3578 (3.0662) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][540/625] eta 0:01:56 lr 0.000380 wd 0.0500 time 0.4117 (1.3728) data time 0.0007 (0.0346) model time 0.4110 (1.3381) loss 2.9974 (2.9155) grad_norm 
2.1626 (2.9675) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][550/625] eta 0:01:23 lr 0.000379 wd 0.0500 time 0.4150 (1.1160) data time 0.0010 (0.0253) model time 0.4140 (1.0907) loss 2.9214 (2.9173) grad_norm 1.4797 (3.0108) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][560/625] eta 0:01:03 lr 0.000379 wd 0.0500 time 0.4120 (0.9724) data time 0.0011 (0.0200) model time 0.4110 (0.9524) loss 2.8106 (2.8994) grad_norm 1.4975 (2.7912) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:10:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][570/625] eta 0:00:48 lr 0.000379 wd 0.0500 time 0.4120 (0.8734) data time 0.0007 (0.0166) model time 0.4112 (0.8568) loss 3.0162 (2.8908) grad_norm 1.8270 (3.2525) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:11:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][580/625] eta 0:00:36 lr 0.000379 wd 0.0500 time 0.4231 (0.8074) data time 0.0008 (0.0143) model time 0.4223 (0.7932) loss 2.3485 (2.8770) grad_norm 1.6514 (3.0364) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:11:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][590/625] eta 0:00:26 lr 0.000379 wd 0.0500 time 0.4109 (0.7571) data time 0.0013 (0.0125) model time 0.4096 (0.7446) loss 3.1249 (2.8444) grad_norm 1.8554 (2.8951) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:11:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][600/625] eta 0:00:17 lr 0.000379 wd 0.0500 time 0.4190 (0.7187) data time 0.0007 (0.0112) model time 0.4182 (0.7075) loss 2.5240 (2.8180) grad_norm 1.6887 (2.7733) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:11:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][610/625] eta 0:00:10 lr 0.000379 wd 0.0500 time 0.4080 (0.6879) data time 
0.0005 (0.0101) model time 0.4075 (0.6778) loss 2.7117 (2.8233) grad_norm 1.9419 (2.6898) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:11:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [194/300][620/625] eta 0:00:03 lr 0.000379 wd 0.0500 time 0.4047 (0.6614) data time 0.0007 (0.0092) model time 0.4040 (0.6522) loss 3.1019 (2.8454) grad_norm 2.3797 (2.7791) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 194 training takes 0:01:11 [2024-08-10 19:11:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:11:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 19:11:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.578 (0.578) Loss 0.5088 (0.5088) Acc@1 89.258 (89.258) Acc@5 98.633 (98.633) Mem 16700MB [2024-08-10 19:11:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.169) Loss 0.8184 (0.6307) Acc@1 80.566 (86.510) Acc@5 95.801 (97.652) Mem 16700MB [2024-08-10 19:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.143) Loss 0.8882 (0.7450) Acc@1 77.979 (83.457) Acc@5 95.801 (96.617) Mem 16700MB [2024-08-10 19:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.185 Acc@5 96.609 [2024-08-10 19:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 19:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.19% [2024-08-10 19:11:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 19:11:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 19:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.594 (0.594) Loss 0.4707 (0.4707) Acc@1 89.648 (89.648) Acc@5 98.779 (98.779) Mem 16700MB
[2024-08-10 19:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.167) Loss 0.7480 (0.5840) Acc@1 81.445 (87.287) Acc@5 96.680 (97.985) Mem 16700MB
[2024-08-10 19:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.131 (0.142) Loss 0.8398 (0.6863) Acc@1 80.225 (84.601) Acc@5 95.996 (97.019) Mem 16700MB
[2024-08-10 19:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.297 Acc@5 97.015
[2024-08-10 19:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-08-10 19:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.30%
[2024-08-10 19:11:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 19:11:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 19:11:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][0/625] eta 0:12:55 lr 0.000379 wd 0.0500 time 1.2410 (1.2410) data time 0.5429 (0.5429) model time 0.0000 (0.0000) loss 2.5020 (2.5020) grad_norm 2.7806 (2.7806) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][10/625] eta 0:05:07 lr 0.000379 wd 0.0500 time 0.4109 (0.5004) data time 0.0007 (0.0502) model time 0.0000 (0.0000) loss 1.6928 (2.8172) grad_norm 1.5567 (1.9873) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][20/625] eta 0:04:44 lr 0.000378 wd 0.0500 time 0.4112 (0.4701) data time 0.0010 (0.0268) model time 0.0000 (0.0000) loss 3.0256 (2.7050) grad_norm 1.5498 (2.0221) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][30/625] eta 0:04:32 lr 0.000378 wd 0.0500 time 0.4164 (0.4583) data time 0.0007 (0.0184) model time 0.0000 (0.0000) loss 2.7913 (2.6581) grad_norm 1.9068 (1.9703) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][40/625] eta 0:04:24 lr 0.000378 wd 0.0500 time 0.4206 (0.4522) data time 0.0011 (0.0142) model time 0.0000 (0.0000) loss 2.3676 (2.6902) grad_norm 2.7042 (1.9772) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][50/625] eta 0:04:16 lr 0.000378 wd 0.0500 time 0.5078 (0.4468) data time 0.0009 (0.0116) model time 0.0000 (0.0000) loss 2.9975 (2.7185) grad_norm 1.3849 (1.9082) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][60/625] eta 0:04:09 lr 0.000378 wd 0.0500 time 0.4104 (0.4419) data time 0.0011 (0.0099) model time 0.4092 (0.4156) loss 2.8439 (2.7314) 
grad_norm 1.4310 (1.8608) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][70/625] eta 0:04:04 lr 0.000378 wd 0.0500 time 0.4129 (0.4412) data time 0.0010 (0.0086) model time 0.4119 (0.4258) loss 3.3702 (2.7249) grad_norm 1.2214 (1.8310) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][80/625] eta 0:03:58 lr 0.000378 wd 0.0500 time 0.4282 (0.4380) data time 0.0010 (0.0077) model time 0.4273 (0.4221) loss 2.4329 (2.7310) grad_norm 2.9271 (1.8743) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][90/625] eta 0:03:54 lr 0.000378 wd 0.0500 time 0.4158 (0.4376) data time 0.0009 (0.0070) model time 0.4149 (0.4249) loss 2.8176 (2.7067) grad_norm 1.6091 (1.8955) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][100/625] eta 0:03:49 lr 0.000378 wd 0.0500 time 0.4157 (0.4367) data time 0.0009 (0.0064) model time 0.4147 (0.4254) loss 3.1780 (2.6983) grad_norm 2.7168 (1.9130) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][110/625] eta 0:03:45 lr 0.000378 wd 0.0500 time 0.4830 (0.4384) data time 0.0007 (0.0059) model time 0.4823 (0.4302) loss 3.1480 (2.6898) grad_norm 1.8433 (1.9244) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][120/625] eta 0:03:40 lr 0.000378 wd 0.0500 time 0.4203 (0.4367) data time 0.0009 (0.0055) model time 0.4193 (0.4283) loss 1.8706 (2.6850) grad_norm 1.5733 (1.9137) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][130/625] eta 0:03:35 lr 0.000377 wd 0.0500 time 0.4124 (0.4354) data 
time 0.0007 (0.0051) model time 0.4117 (0.4272) loss 2.8878 (2.6843) grad_norm 1.6825 (1.9103) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][140/625] eta 0:03:30 lr 0.000377 wd 0.0500 time 0.4111 (0.4345) data time 0.0008 (0.0048) model time 0.4103 (0.4266) loss 2.8136 (2.6775) grad_norm 1.6579 (1.8956) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][150/625] eta 0:03:26 lr 0.000377 wd 0.0500 time 0.4118 (0.4339) data time 0.0008 (0.0046) model time 0.4110 (0.4264) loss 2.7488 (2.6738) grad_norm 13.0055 (1.9788) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][160/625] eta 0:03:21 lr 0.000377 wd 0.0500 time 0.4161 (0.4343) data time 0.0009 (0.0043) model time 0.4152 (0.4275) loss 2.9473 (2.6707) grad_norm 3.9510 (2.0138) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][170/625] eta 0:03:17 lr 0.000377 wd 0.0500 time 0.6102 (0.4342) data time 0.0010 (0.0042) model time 0.6092 (0.4278) loss 2.7660 (2.6790) grad_norm 2.9722 (2.0164) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:12:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][180/625] eta 0:03:13 lr 0.000377 wd 0.0500 time 0.4136 (0.4342) data time 0.0007 (0.0040) model time 0.4129 (0.4283) loss 1.7422 (2.6779) grad_norm 2.0662 (2.0125) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][190/625] eta 0:03:08 lr 0.000377 wd 0.0500 time 0.4130 (0.4336) data time 0.0010 (0.0038) model time 0.4120 (0.4278) loss 2.6051 (2.6731) grad_norm 2.7799 (2.1118) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[195/300][200/625] eta 0:03:03 lr 0.000377 wd 0.0500 time 0.4106 (0.4326) data time 0.0010 (0.0037) model time 0.4096 (0.4268) loss 3.3424 (2.6799) grad_norm 1.8570 (2.1077) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][210/625] eta 0:02:59 lr 0.000377 wd 0.0500 time 0.4138 (0.4330) data time 0.0008 (0.0036) model time 0.4130 (0.4277) loss 3.4877 (2.6999) grad_norm 1.3596 (2.0821) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][220/625] eta 0:02:55 lr 0.000377 wd 0.0500 time 0.4362 (0.4329) data time 0.0008 (0.0034) model time 0.4355 (0.4277) loss 1.8112 (2.7056) grad_norm 1.9961 (2.0769) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][230/625] eta 0:02:50 lr 0.000376 wd 0.0500 time 0.4117 (0.4325) data time 0.0010 (0.0033) model time 0.4107 (0.4275) loss 3.1819 (2.7156) grad_norm 1.4207 (2.0980) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][240/625] eta 0:02:46 lr 0.000376 wd 0.0500 time 0.4160 (0.4322) data time 0.0007 (0.0032) model time 0.4152 (0.4272) loss 3.2248 (2.7215) grad_norm 1.6473 (2.0864) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][250/625] eta 0:02:42 lr 0.000376 wd 0.0500 time 0.5036 (0.4320) data time 0.0008 (0.0032) model time 0.5028 (0.4273) loss 2.8269 (2.7231) grad_norm 1.4391 (2.0796) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][260/625] eta 0:02:37 lr 0.000376 wd 0.0500 time 0.4201 (0.4314) data time 0.0010 (0.0031) model time 0.4191 (0.4266) loss 2.3310 (2.7218) grad_norm 2.5606 (2.1208) loss_scale 512.0000 (512.0000) mem 16707MB 
[2024-08-10 19:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][270/625] eta 0:02:33 lr 0.000376 wd 0.0500 time 0.4235 (0.4310) data time 0.0007 (0.0030) model time 0.4227 (0.4264) loss 1.7638 (2.7164) grad_norm 2.3465 (2.1408) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][280/625] eta 0:02:28 lr 0.000376 wd 0.0500 time 0.4146 (0.4311) data time 0.0008 (0.0029) model time 0.4138 (0.4266) loss 3.6158 (2.7136) grad_norm 1.4131 (2.1354) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][290/625] eta 0:02:24 lr 0.000376 wd 0.0500 time 0.4107 (0.4304) data time 0.0009 (0.0029) model time 0.4097 (0.4260) loss 3.0490 (2.7199) grad_norm 2.8557 (2.1285) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][300/625] eta 0:02:19 lr 0.000376 wd 0.0500 time 0.4132 (0.4304) data time 0.0010 (0.0028) model time 0.4122 (0.4261) loss 2.5034 (2.7253) grad_norm 1.5935 (2.1180) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][310/625] eta 0:02:15 lr 0.000376 wd 0.0500 time 0.4102 (0.4301) data time 0.0008 (0.0027) model time 0.4094 (0.4258) loss 3.1693 (2.7268) grad_norm 1.6384 (2.1097) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:13:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][320/625] eta 0:02:11 lr 0.000376 wd 0.0500 time 0.4626 (0.4299) data time 0.0007 (0.0027) model time 0.4619 (0.4257) loss 2.9534 (2.7351) grad_norm 1.8710 (2.1024) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][330/625] eta 0:02:06 lr 0.000375 wd 0.0500 time 0.4112 (0.4297) data time 0.0008 (0.0026) model time 0.4104 (0.4256) loss 3.4208 
(2.7440) grad_norm 1.4539 (2.1002) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][340/625] eta 0:02:02 lr 0.000375 wd 0.0500 time 0.4697 (0.4296) data time 0.0007 (0.0026) model time 0.4690 (0.4256) loss 2.5633 (2.7459) grad_norm 3.0966 (2.1190) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][350/625] eta 0:01:58 lr 0.000375 wd 0.0500 time 0.4114 (0.4294) data time 0.0009 (0.0025) model time 0.4105 (0.4254) loss 3.0854 (2.7378) grad_norm 1.6455 (2.1198) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][360/625] eta 0:01:53 lr 0.000375 wd 0.0500 time 0.4140 (0.4293) data time 0.0010 (0.0025) model time 0.4130 (0.4254) loss 2.6169 (2.7306) grad_norm 1.5177 (2.1167) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][370/625] eta 0:01:49 lr 0.000375 wd 0.0500 time 0.4116 (0.4293) data time 0.0007 (0.0024) model time 0.4108 (0.4256) loss 3.3204 (2.7300) grad_norm 9.8611 (2.1334) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][380/625] eta 0:01:45 lr 0.000375 wd 0.0500 time 0.4130 (0.4290) data time 0.0011 (0.0024) model time 0.4119 (0.4252) loss 3.3692 (2.7387) grad_norm 1.7416 (2.1335) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][390/625] eta 0:01:40 lr 0.000375 wd 0.0500 time 0.5041 (0.4290) data time 0.0009 (0.0024) model time 0.5032 (0.4254) loss 3.0654 (2.7383) grad_norm 1.2851 (2.1275) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][400/625] eta 0:01:36 lr 0.000375 wd 0.0500 time 0.4127 
(0.4290) data time 0.0008 (0.0023) model time 0.4119 (0.4254) loss 3.2064 (2.7423) grad_norm 1.7038 (2.1187) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][410/625] eta 0:01:32 lr 0.000375 wd 0.0500 time 0.4116 (0.4286) data time 0.0008 (0.0023) model time 0.4108 (0.4251) loss 2.7785 (2.7421) grad_norm 1.6380 (2.1252) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][420/625] eta 0:01:27 lr 0.000375 wd 0.0500 time 0.4172 (0.4289) data time 0.0010 (0.0023) model time 0.4162 (0.4254) loss 3.2057 (2.7398) grad_norm 1.1780 (2.1166) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][430/625] eta 0:01:23 lr 0.000374 wd 0.0500 time 0.4096 (0.4289) data time 0.0009 (0.0022) model time 0.4087 (0.4255) loss 2.9635 (2.7386) grad_norm 2.0938 (2.1305) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][440/625] eta 0:01:19 lr 0.000374 wd 0.0500 time 0.4100 (0.4288) data time 0.0007 (0.0022) model time 0.4093 (0.4255) loss 3.3058 (2.7387) grad_norm 2.8161 (2.1293) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][450/625] eta 0:01:15 lr 0.000374 wd 0.0500 time 0.4126 (0.4292) data time 0.0009 (0.0022) model time 0.4117 (0.4260) loss 2.9026 (2.7430) grad_norm 1.6092 (2.1246) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][460/625] eta 0:01:10 lr 0.000374 wd 0.0500 time 0.4524 (0.4291) data time 0.0010 (0.0022) model time 0.4514 (0.4259) loss 3.0144 (2.7441) grad_norm 1.9788 (2.1144) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [195/300][470/625] eta 0:01:06 lr 0.000374 wd 0.0500 time 0.4192 (0.4291) data time 0.0009 (0.0021) model time 0.4183 (0.4259) loss 3.3469 (2.7445) grad_norm 1.5114 (2.1035) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][480/625] eta 0:01:02 lr 0.000374 wd 0.0500 time 0.4114 (0.4290) data time 0.0010 (0.0021) model time 0.4104 (0.4259) loss 3.2524 (2.7477) grad_norm 1.5093 (2.0910) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][490/625] eta 0:00:57 lr 0.000374 wd 0.0500 time 0.4196 (0.4291) data time 0.0010 (0.0021) model time 0.4186 (0.4260) loss 1.5458 (2.7443) grad_norm 1.7059 (2.0818) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][500/625] eta 0:00:53 lr 0.000374 wd 0.0500 time 0.4155 (0.4288) data time 0.0009 (0.0021) model time 0.4146 (0.4258) loss 2.9595 (2.7444) grad_norm 2.4737 (2.0872) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][510/625] eta 0:00:49 lr 0.000374 wd 0.0500 time 0.4306 (0.4292) data time 0.0008 (0.0021) model time 0.4299 (0.4263) loss 3.4017 (2.7459) grad_norm 1.7449 (2.0827) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][520/625] eta 0:00:45 lr 0.000374 wd 0.0500 time 0.4171 (0.4291) data time 0.0010 (0.0020) model time 0.4161 (0.4262) loss 3.2626 (2.7473) grad_norm 1.8758 (2.0895) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][530/625] eta 0:00:40 lr 0.000373 wd 0.0500 time 0.4793 (0.4290) data time 0.0008 (0.0020) model time 0.4785 (0.4261) loss 1.7735 (2.7465) grad_norm 1.5438 (2.0843) loss_scale 512.0000 (512.0000) mem 16707MB 
[2024-08-10 19:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-08-10 19:15:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-10 19:15:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 19:17:15 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 19:17:16 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 19:17:29 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 19:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-10 19:17:39 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-10 19:17:42 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 19:17:44 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 19:17:44 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 195) [2024-08-10 19:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 19:18:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][540/625] eta 0:03:15 lr 0.000373 wd 0.0500 time 0.4456 (2.3042) data time 0.0008 (0.0847) model time 0.4448 (2.2195) loss 3.0598 (3.1623) grad_norm 3.9359 (2.1032) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][550/625] eta 0:01:43 lr 0.000373 wd 0.0500 time 0.4495 (1.3783) data time 0.0006 (0.0428) model time 0.4490 (1.3355) loss 3.2022 (3.0057) grad_norm 1.6184 (1.9564) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][560/625] eta 0:01:09 lr 0.000373 wd 0.0500 time 0.4688 (1.0769) data time 0.0008 (0.0288) model time 0.4680 (1.0480) loss 3.2193 (3.0318) grad_norm 2.3378 (2.0999) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][570/625] eta 0:00:50 lr 0.000373 wd 0.0500 time 0.4503 (0.9256) data time 0.0007 (0.0221) model time 0.4496 (0.9035) loss 3.0681 (2.9611) grad_norm 1.7163 (2.9928) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][580/625] eta 0:00:37 lr 0.000373 wd 0.0500 time 0.4484 (0.8306) data time 0.0008 (0.0179) model time 0.4476 (0.8127) loss 2.4444 (2.9315) grad_norm 1.8415 (2.8909) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][590/625] eta 
0:00:26 lr 0.000373 wd 0.0500 time 0.4502 (0.7681) data time 0.0007 (0.0150) model time 0.4496 (0.7531) loss 2.8075 (2.8987) grad_norm 2.2401 (2.7573) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][600/625] eta 0:00:18 lr 0.000373 wd 0.0500 time 0.4500 (0.7224) data time 0.0007 (0.0130) model time 0.4494 (0.7094) loss 1.9239 (2.8596) grad_norm 2.1167 (2.6483) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][610/625] eta 0:00:10 lr 0.000373 wd 0.0500 time 0.4463 (0.6886) data time 0.0006 (0.0115) model time 0.4457 (0.6770) loss 2.9225 (2.8462) grad_norm 2.0960 (2.5521) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [195/300][620/625] eta 0:00:03 lr 0.000373 wd 0.0500 time 0.4439 (0.6617) data time 0.0005 (0.0103) model time 0.4434 (0.6514) loss 2.9503 (2.8219) grad_norm 1.4984 (2.6442) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 195 training takes 0:01:01 [2024-08-10 19:18:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:18:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 19:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.5244 (0.5244) Acc@1 88.916 (88.916) Acc@5 98.584 (98.584) Mem 16711MB
[2024-08-10 19:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8164 (0.6325) Acc@1 80.176 (86.444) Acc@5 96.143 (97.714) Mem 16711MB
[2024-08-10 19:18:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9219 (0.7481) Acc@1 77.686 (83.389) Acc@5 94.775 (96.598) Mem 16711MB
[2024-08-10 19:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.111 Acc@5 96.565
[2024-08-10 19:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1%
[2024-08-10 19:19:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.798 (1.798) Loss 0.4700 (0.4700) Acc@1 89.697 (89.697) Acc@5 98.779 (98.779) Mem 16711MB
[2024-08-10 19:19:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.271) Loss 0.7466 (0.5837) Acc@1 81.641 (87.336) Acc@5 96.729 (97.967) Mem 16711MB
[2024-08-10 19:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.197) Loss 0.8389 (0.6860) Acc@1 80.225 (84.633) Acc@5 95.947 (97.024) Mem 16711MB
[2024-08-10 19:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.339 Acc@5 97.021
[2024-08-10 19:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-08-10 19:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.34%
[2024-08-10 19:19:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-10 19:19:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-10 19:19:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][0/625] eta 0:09:36 lr 0.000373 wd 0.0500 time 0.9228 (0.9228) data time 0.3983 (0.3983) model time 0.0000 (0.0000) loss 3.1073 (3.1073) grad_norm 1.9546 (1.9546) loss_scale 512.0000 (512.0000) mem 16710MB [2024-08-10 19:19:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][10/625] eta 0:05:04 lr 0.000372 wd 0.0500 time 0.4491 (0.4946) data time 0.0008 (0.0372) model time 0.0000 (0.0000) loss 2.6877 (3.0894) grad_norm 2.5145 (1.9952) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][20/625] eta 0:04:47 lr 0.000372 wd 0.0500 time 0.4465 (0.4747) data time 0.0007 (0.0199) model time 0.0000 (0.0000) loss 2.3398 (2.9571) grad_norm 1.5805 (1.8466) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][30/625] eta 0:04:38 lr 0.000372 wd 0.0500 time 0.4558 (0.4683) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 2.8155 (2.9531) grad_norm 1.5944 (1.7932) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][40/625] eta 0:04:31 lr 0.000372 wd 0.0500 time 0.4471 (0.4635) data time 0.0007 (0.0107) model time 0.0000 (0.0000) loss 3.0265 (2.9343) grad_norm 1.4844 (1.7686) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][50/625] eta 0:04:25 lr 0.000372 wd 0.0500 time 0.4454 (0.4617) data time 0.0008 (0.0088) model time 0.0000 (0.0000) loss 3.3570 (2.8854) grad_norm 1.4091 (1.7574) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][60/625] eta 0:04:19 lr 0.000372 wd 0.0500 time 0.4509 (0.4602) data time 0.0008 (0.0075) model time 0.4502 (0.4514) loss 3.0343 (2.8601) 
grad_norm 2.1546 (1.8591) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][70/625] eta 0:04:15 lr 0.000372 wd 0.0500 time 0.4523 (0.4599) data time 0.0008 (0.0065) model time 0.4514 (0.4544) loss 3.0682 (2.8565) grad_norm 4.3494 (1.9594) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][80/625] eta 0:04:09 lr 0.000372 wd 0.0500 time 0.4466 (0.4585) data time 0.0006 (0.0059) model time 0.4459 (0.4522) loss 2.7643 (2.8436) grad_norm 2.1844 (1.9982) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][90/625] eta 0:04:05 lr 0.000372 wd 0.0500 time 0.4672 (0.4579) data time 0.0009 (0.0053) model time 0.4663 (0.4522) loss 2.4866 (2.8171) grad_norm 2.2749 (1.9992) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:19:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][100/625] eta 0:04:00 lr 0.000372 wd 0.0500 time 0.4531 (0.4573) data time 0.0007 (0.0049) model time 0.4523 (0.4519) loss 1.6611 (2.7966) grad_norm 1.8472 (2.1049) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][110/625] eta 0:03:55 lr 0.000371 wd 0.0500 time 0.4488 (0.4567) data time 0.0008 (0.0045) model time 0.4480 (0.4516) loss 2.3933 (2.7734) grad_norm 1.8031 (2.1253) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][120/625] eta 0:03:52 lr 0.000371 wd 0.0500 time 0.4470 (0.4598) data time 0.0008 (0.0042) model time 0.4463 (0.4575) loss 2.2846 (2.7563) grad_norm 3.3329 (2.2787) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][130/625] eta 0:03:47 lr 0.000371 wd 0.0500 time 0.4464 (0.4591) data 
time 0.0007 (0.0040) model time 0.4457 (0.4565) loss 2.4973 (2.7612) grad_norm 1.9215 (2.2625) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][140/625] eta 0:03:42 lr 0.000371 wd 0.0500 time 0.4522 (0.4586) data time 0.0008 (0.0037) model time 0.4514 (0.4559) loss 3.0377 (2.7578) grad_norm 2.5576 (2.2806) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][150/625] eta 0:03:37 lr 0.000371 wd 0.0500 time 0.4520 (0.4581) data time 0.0007 (0.0035) model time 0.4513 (0.4554) loss 2.5893 (2.7585) grad_norm 1.8933 (2.2595) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][160/625] eta 0:03:32 lr 0.000371 wd 0.0500 time 0.4486 (0.4580) data time 0.0006 (0.0034) model time 0.4480 (0.4554) loss 2.5207 (2.7470) grad_norm 2.0258 (2.2358) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][170/625] eta 0:03:28 lr 0.000371 wd 0.0500 time 0.4606 (0.4577) data time 0.0006 (0.0032) model time 0.4600 (0.4551) loss 2.3925 (2.7388) grad_norm 1.6054 (2.1959) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][180/625] eta 0:03:23 lr 0.000371 wd 0.0500 time 0.4480 (0.4576) data time 0.0008 (0.0031) model time 0.4473 (0.4551) loss 3.5921 (2.7484) grad_norm 2.5510 (2.1669) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][190/625] eta 0:03:18 lr 0.000371 wd 0.0500 time 0.4485 (0.4574) data time 0.0008 (0.0030) model time 0.4477 (0.4549) loss 2.5812 (2.7397) grad_norm 11.9036 (2.2178) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[196/300][200/625] eta 0:03:14 lr 0.000371 wd 0.0500 time 0.4664 (0.4572) data time 0.0009 (0.0029) model time 0.4655 (0.4548) loss 2.5504 (2.7382) grad_norm 1.5150 (2.2252) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][210/625] eta 0:03:09 lr 0.000370 wd 0.0500 time 0.4503 (0.4567) data time 0.0009 (0.0028) model time 0.4495 (0.4542) loss 2.1367 (2.7254) grad_norm 3.7715 (2.2529) loss_scale 512.0000 (512.0000) mem 16715MB [2024-08-10 19:20:49 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 19:20:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:20:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 19:29:01 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 19:29:03 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 19:29:17 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 19:29:31 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 19:29:31 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 19:29:33 vssm_base_ms_e300] (utils.py 30): INFO resuming model:
[2024-08-10 19:29:35 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema:
[2024-08-10 19:29:35 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 196)
[2024-08-10 19:29:35 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-08-10 19:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][220/625] eta 0:21:01 lr 0.000370 wd 0.0500 time 0.4647 (3.1137) data time 0.0007 (0.0867) model time 0.4639 (3.0271) loss 3.1017 (3.1086) grad_norm 2.0477 (2.1606) loss_scale 512.0000 (512.0000) mem 16721MB
[2024-08-10 19:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting
[2024-08-10 19:30:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-10 19:30:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 19:41:49 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 19:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 19:42:08 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 19:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-10 19:42:26 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-10 19:42:29 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 19:42:31 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 19:42:31 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 196) [2024-08-10 19:42:31 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 19:43:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][230/625] eta 0:42:23 lr 0.000370 wd 0.0500 time 0.4249 (6.4392) data time 0.0008 (0.2038) model time 0.4241 (6.2354) loss 3.0661 (3.0947) grad_norm 5.5601 (2.7551) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][240/625] eta 0:13:43 lr 0.000370 wd 0.0500 time 0.4234 (2.1384) data time 0.0007 (0.0590) model time 0.4227 (2.0795) loss 3.1990 (3.0850) grad_norm 2.3166 (3.0273) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][250/625] eta 0:08:56 lr 0.000370 wd 0.0500 time 0.4148 (1.4296) data time 0.0010 (0.0348) model time 0.4137 (1.3948) loss 2.7467 (3.0387) grad_norm 1.3652 (2.9441) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][260/625] eta 0:06:56 lr 0.000370 wd 0.0500 time 0.4202 (1.1422) data time 0.0008 (0.0249) model time 0.4194 (1.1173) loss 2.2026 (3.0174) grad_norm 1.8056 (2.7072) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][270/625] eta 0:05:49 lr 0.000370 wd 0.0500 time 0.4121 (0.9840) data time 0.0009 (0.0195) model time 0.4112 (0.9646) loss 2.7572 (2.9952) grad_norm 1.7828 (2.8547) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][280/625] eta 
0:05:04 lr 0.000370 wd 0.0500 time 0.4301 (0.8819) data time 0.0008 (0.0160) model time 0.4294 (0.8659) loss 3.5121 (2.9991) grad_norm 1.8949 (2.7357) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][290/625] eta 0:04:31 lr 0.000370 wd 0.0500 time 0.4170 (0.8102) data time 0.0007 (0.0137) model time 0.4163 (0.7965) loss 3.2345 (2.9667) grad_norm 2.0443 (2.6235) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][300/625] eta 0:04:06 lr 0.000370 wd 0.0500 time 0.4317 (0.7589) data time 0.0008 (0.0120) model time 0.4309 (0.7469) loss 3.0188 (2.9331) grad_norm 1.8936 (2.7395) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][310/625] eta 0:03:46 lr 0.000370 wd 0.0500 time 0.4155 (0.7192) data time 0.0010 (0.0107) model time 0.4145 (0.7085) loss 2.9981 (2.9069) grad_norm 2.0371 (2.7105) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][320/625] eta 0:03:30 lr 0.000369 wd 0.0500 time 0.4262 (0.6888) data time 0.0010 (0.0096) model time 0.4252 (0.6791) loss 2.5185 (2.8878) grad_norm 2.3188 (2.6698) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][330/625] eta 0:03:15 lr 0.000369 wd 0.0500 time 0.4904 (0.6633) data time 0.0010 (0.0088) model time 0.4894 (0.6545) loss 2.7710 (2.9079) grad_norm 1.9075 (2.6059) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][340/625] eta 0:03:02 lr 0.000369 wd 0.0500 time 0.4152 (0.6420) data time 0.0010 (0.0081) model time 0.4142 (0.6338) loss 2.8919 (2.8946) grad_norm 1.6185 (2.5461) loss_scale 512.0000 (512.0000) mem 16700MB [2024-08-10 19:43:53 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][350/625] eta 0:02:51 lr 0.000369 wd 0.0500 time 0.4266 (0.6244) data time 0.0010 (0.0076) model time 0.4256 (0.6169) loss 2.2897 (2.8859) grad_norm 2.5825 (2.5362) loss_scale 1024.0000 (532.6452) mem 16700MB [2024-08-10 19:43:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][360/625] eta 0:02:41 lr 0.000369 wd 0.0500 time 0.4140 (0.6087) data time 0.0010 (0.0071) model time 0.4129 (0.6017) loss 2.9218 (2.8750) grad_norm 2.0835 (2.5397) loss_scale 1024.0000 (569.3134) mem 16700MB [2024-08-10 19:44:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][370/625] eta 0:02:32 lr 0.000369 wd 0.0500 time 0.4215 (0.5966) data time 0.0009 (0.0067) model time 0.4207 (0.5900) loss 2.3905 (2.8536) grad_norm 2.1586 (2.4994) loss_scale 1024.0000 (600.8889) mem 16700MB [2024-08-10 19:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][380/625] eta 0:02:23 lr 0.000369 wd 0.0500 time 0.4156 (0.5856) data time 0.0008 (0.0063) model time 0.4149 (0.5793) loss 2.8790 (2.8454) grad_norm 2.6249 (2.4587) loss_scale 1024.0000 (628.3636) mem 16700MB [2024-08-10 19:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][390/625] eta 0:02:15 lr 0.000369 wd 0.0500 time 0.4762 (0.5760) data time 0.0008 (0.0060) model time 0.4754 (0.5700) loss 3.0513 (2.8431) grad_norm 3.4119 (2.4264) loss_scale 1024.0000 (652.4878) mem 16700MB [2024-08-10 19:44:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][400/625] eta 0:02:07 lr 0.000369 wd 0.0500 time 0.4150 (0.5668) data time 0.0010 (0.0057) model time 0.4140 (0.5611) loss 1.7366 (2.8412) grad_norm 1.6606 (2.3956) loss_scale 1024.0000 (673.8391) mem 16700MB [2024-08-10 19:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][410/625] eta 0:02:00 lr 0.000369 wd 0.0500 time 0.4054 (0.5602) data time 0.0009 (0.0054) model time 0.4045 (0.5547) loss 2.5844 (2.8369) grad_norm 
2.8262 (2.3844) loss_scale 1024.0000 (692.8696) mem 16700MB [2024-08-10 19:44:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][420/625] eta 0:01:53 lr 0.000368 wd 0.0500 time 0.4181 (0.5531) data time 0.0008 (0.0052) model time 0.4173 (0.5479) loss 2.9970 (2.8339) grad_norm 2.0837 (2.3562) loss_scale 1024.0000 (709.9381) mem 16700MB [2024-08-10 19:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][430/625] eta 0:01:46 lr 0.000368 wd 0.0500 time 0.4115 (0.5466) data time 0.0010 (0.0050) model time 0.4105 (0.5415) loss 3.0276 (2.8161) grad_norm 1.8146 (2.3407) loss_scale 1024.0000 (725.3333) mem 16700MB [2024-08-10 19:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][440/625] eta 0:01:40 lr 0.000368 wd 0.0500 time 0.4157 (0.5413) data time 0.0008 (0.0048) model time 0.4149 (0.5365) loss 2.5737 (2.8066) grad_norm 2.5959 (2.3283) loss_scale 1024.0000 (739.2897) mem 16700MB [2024-08-10 19:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][450/625] eta 0:01:33 lr 0.000368 wd 0.0500 time 0.4122 (0.5356) data time 0.0010 (0.0046) model time 0.4112 (0.5310) loss 3.1667 (2.8060) grad_norm 2.7019 (2.3822) loss_scale 1024.0000 (752.0000) mem 16700MB [2024-08-10 19:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][460/625] eta 0:01:27 lr 0.000368 wd 0.0500 time 0.4116 (0.5313) data time 0.0008 (0.0045) model time 0.4108 (0.5268) loss 1.9033 (2.7957) grad_norm 1.8875 (2.3683) loss_scale 1024.0000 (763.6239) mem 16700MB [2024-08-10 19:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][470/625] eta 0:01:21 lr 0.000368 wd 0.0500 time 0.4168 (0.5271) data time 0.0007 (0.0044) model time 0.4161 (0.5227) loss 1.9230 (2.7950) grad_norm 1.9041 (2.3523) loss_scale 1024.0000 (774.2951) mem 16700MB [2024-08-10 19:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][480/625] eta 0:01:15 lr 0.000368 wd 0.0500 time 0.5059 (0.5231) data 
time 0.0009 (0.0042) model time 0.5049 (0.5189) loss 2.3010 (2.7873) grad_norm 1.9330 (2.3308) loss_scale 1024.0000 (784.1260) mem 16700MB [2024-08-10 19:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][490/625] eta 0:01:10 lr 0.000368 wd 0.0500 time 0.4131 (0.5195) data time 0.0008 (0.0041) model time 0.4123 (0.5154) loss 2.6326 (2.7796) grad_norm 1.8132 (2.3330) loss_scale 1024.0000 (793.2121) mem 16700MB [2024-08-10 19:44:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][500/625] eta 0:01:04 lr 0.000368 wd 0.0500 time 0.4211 (0.5159) data time 0.0010 (0.0040) model time 0.4201 (0.5119) loss 3.0445 (2.7749) grad_norm 2.2648 (inf) loss_scale 512.0000 (786.6861) mem 16700MB [2024-08-10 19:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][510/625] eta 0:00:58 lr 0.000368 wd 0.0500 time 0.4154 (0.5127) data time 0.0007 (0.0039) model time 0.4146 (0.5089) loss 1.8836 (2.7692) grad_norm 2.0228 (inf) loss_scale 512.0000 (777.0141) mem 16700MB [2024-08-10 19:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][520/625] eta 0:00:53 lr 0.000367 wd 0.0500 time 0.4141 (0.5099) data time 0.0007 (0.0038) model time 0.4134 (0.5061) loss 2.3310 (2.7646) grad_norm 2.3801 (inf) loss_scale 512.0000 (768.0000) mem 16700MB [2024-08-10 19:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][530/625] eta 0:00:48 lr 0.000367 wd 0.0500 time 0.4744 (0.5076) data time 0.0010 (0.0037) model time 0.4734 (0.5039) loss 2.8836 (2.7602) grad_norm 2.3157 (inf) loss_scale 512.0000 (759.5789) mem 16700MB [2024-08-10 19:45:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][540/625] eta 0:00:42 lr 0.000367 wd 0.0500 time 0.4281 (0.5046) data time 0.0010 (0.0036) model time 0.4271 (0.5010) loss 2.9811 (2.7619) grad_norm 2.2384 (inf) loss_scale 512.0000 (751.6943) mem 16700MB [2024-08-10 19:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][550/625] 
eta 0:00:37 lr 0.000367 wd 0.0500 time 0.4624 (0.5023) data time 0.0009 (0.0035) model time 0.4615 (0.4987) loss 3.5789 (2.7729) grad_norm 2.1734 (inf) loss_scale 512.0000 (744.2963) mem 16700MB [2024-08-10 19:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][560/625] eta 0:00:32 lr 0.000367 wd 0.0500 time 0.5105 (0.5002) data time 0.0009 (0.0035) model time 0.5096 (0.4968) loss 2.7086 (2.7751) grad_norm 1.8400 (inf) loss_scale 512.0000 (737.3413) mem 16700MB [2024-08-10 19:45:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][570/625] eta 0:00:27 lr 0.000367 wd 0.0500 time 0.4150 (0.4979) data time 0.0010 (0.0034) model time 0.4140 (0.4945) loss 3.2361 (2.7756) grad_norm 2.6755 (inf) loss_scale 512.0000 (730.7907) mem 16700MB [2024-08-10 19:45:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][580/625] eta 0:00:22 lr 0.000367 wd 0.0500 time 0.4138 (0.4961) data time 0.0011 (0.0033) model time 0.4128 (0.4928) loss 2.8972 (2.7768) grad_norm 2.2671 (inf) loss_scale 512.0000 (724.6102) mem 16700MB [2024-08-10 19:45:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][590/625] eta 0:00:17 lr 0.000367 wd 0.0500 time 0.4552 (0.4939) data time 0.0010 (0.0033) model time 0.4542 (0.4907) loss 1.9058 (2.7747) grad_norm 1.4446 (inf) loss_scale 512.0000 (718.7692) mem 16700MB [2024-08-10 19:45:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][600/625] eta 0:00:12 lr 0.000367 wd 0.0500 time 0.4120 (0.4929) data time 0.0012 (0.0032) model time 0.4108 (0.4897) loss 3.1812 (2.7774) grad_norm 1.9313 (inf) loss_scale 512.0000 (713.2406) mem 16700MB [2024-08-10 19:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [196/300][610/625] eta 0:00:07 lr 0.000367 wd 0.0500 time 0.4103 (0.4910) data time 0.0006 (0.0031) model time 0.4097 (0.4879) loss 2.1680 (2.7698) grad_norm 2.0835 (inf) loss_scale 512.0000 (708.0000) mem 16700MB [2024-08-10 19:45:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [196/300][620/625] eta 0:00:02 lr 0.000366 wd 0.0500 time 0.4923 (0.4891) data time 0.0007 (0.0031) model time 0.4916 (0.4860) loss 3.0380 (2.7683) grad_norm 1.6236 (inf) loss_scale 512.0000 (703.0254) mem 16700MB [2024-08-10 19:45:50 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 196 training takes 0:03:14 [2024-08-10 19:45:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:45:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 19:45:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.583 (0.583) Loss 0.5220 (0.5220) Acc@1 88.574 (88.574) Acc@5 98.682 (98.682) Mem 16700MB [2024-08-10 19:45:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.167) Loss 0.8271 (0.6380) Acc@1 80.615 (86.279) Acc@5 95.947 (97.741) Mem 16700MB [2024-08-10 19:45:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.149) Loss 0.9219 (0.7517) Acc@1 77.588 (83.436) Acc@5 95.117 (96.598) Mem 16700MB [2024-08-10 19:46:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.159 Acc@5 96.565 [2024-08-10 19:46:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 19:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.004 (1.004) Loss 0.4702 (0.4702) Acc@1 89.648 (89.648) Acc@5 98.828 (98.828) Mem 16700MB [2024-08-10 19:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.211) Loss 0.7466 (0.5838) Acc@1 81.885 (87.402) Acc@5 96.777 (97.967) Mem 16700MB [2024-08-10 19:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.165) Loss 0.8394 (0.6861) Acc@1 80.273 (84.675) Acc@5 96.045 (97.040) Mem 16700MB [2024-08-10 19:46:07 
vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.373 Acc@5 97.037 [2024-08-10 19:46:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 19:46:07 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.37% [2024-08-10 19:46:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 19:46:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 19:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][0/625] eta 0:11:15 lr 0.000366 wd 0.0500 time 1.0807 (1.0807) data time 0.5253 (0.5253) model time 0.0000 (0.0000) loss 2.8559 (2.8559) grad_norm 2.1269 (2.1269) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:46:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][10/625] eta 0:04:58 lr 0.000366 wd 0.0500 time 0.4128 (0.4861) data time 0.0010 (0.0486) model time 0.0000 (0.0000) loss 2.8980 (2.9440) grad_norm 2.1084 (2.1945) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][20/625] eta 0:04:40 lr 0.000366 wd 0.0500 time 0.4139 (0.4642) data time 0.0008 (0.0260) model time 0.0000 (0.0000) loss 2.0504 (2.8605) grad_norm 1.7549 (2.0182) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][30/625] eta 0:04:29 lr 0.000366 wd 0.0500 time 0.4177 (0.4525) data time 0.0009 (0.0179) model time 0.0000 (0.0000) loss 2.7607 (2.8989) grad_norm 1.3408 (1.9982) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][40/625] eta 0:04:20 lr 0.000366 wd 0.0500 time 0.4192 (0.4454) data time 0.0010 (0.0138) model time 0.0000 (0.0000) loss 
2.5871 (2.9051) grad_norm 1.4970 (1.9413) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][50/625] eta 0:04:13 lr 0.000366 wd 0.0500 time 0.4135 (0.4408) data time 0.0010 (0.0113) model time 0.0000 (0.0000) loss 2.6575 (2.8603) grad_norm 2.0218 (1.9968) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][60/625] eta 0:04:07 lr 0.000366 wd 0.0500 time 0.4158 (0.4381) data time 0.0012 (0.0096) model time 0.4145 (0.4232) loss 2.6541 (2.8084) grad_norm 3.1342 (2.6105) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][70/625] eta 0:04:02 lr 0.000366 wd 0.0500 time 0.4168 (0.4377) data time 0.0007 (0.0084) model time 0.4161 (0.4286) loss 1.5832 (2.7636) grad_norm 1.2652 (2.5091) loss_scale 512.0000 (512.0000) mem 16707MB [2024-08-10 19:46:45 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 19:46:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:46:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 19:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 19:48:47 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 19:49:00 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-10 19:49:13 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 19:49:13 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 19:49:16 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 19:49:18 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 19:49:18 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 197) [2024-08-10 19:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 19:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][80/625] eta 1:36:18 lr 0.000366 wd 0.0500 time 1.2317 (10.6022) data time 0.0011 (0.3434) model time 1.2306 (10.2588) loss 2.8483 (3.0907) grad_norm 1.6261 (1.6322) loss_scale 512.0000 (512.0000) mem 16711MB [2024-08-10 19:49:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][90/625] eta 0:19:18 lr 0.000366 wd 0.0500 time 0.4766 (2.1651) data time 0.0009 (0.0581) model time 0.4758 (2.1070) loss 2.4562 (2.8975) grad_norm 1.8988 (3.3192) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:49:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][100/625] eta 0:12:13 lr 0.000365 wd 0.0500 time 0.4712 (1.3967) data time 0.0012 (0.0323) model time 0.4700 (1.3645) loss 2.8277 (2.9160) grad_norm 1.7533 (3.0018) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:49:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][110/625] eta 0:09:35 lr 0.000365 wd 0.0500 time 0.4797 (1.1172) data time 0.0009 (0.0225) model time 0.4789 (1.0946) loss 3.0013 (2.9432) grad_norm 1.7397 (2.6611) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [197/300][120/625] eta 0:08:09 lr 0.000365 wd 0.0500 time 0.4701 (0.9687) data time 0.0011 (0.0175) model time 0.4690 (0.9513) loss 2.6990 (2.9032) grad_norm 1.6434 (2.4908) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][130/625] eta 0:07:12 lr 0.000365 wd 0.0500 time 0.4745 (0.8735) data time 0.0008 (0.0143) model time 0.4736 (0.8591) loss 2.6851 (2.8877) grad_norm 2.3040 (2.4111) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][140/625] eta 0:06:32 lr 0.000365 wd 0.0500 time 0.4812 (0.8100) data time 0.0009 (0.0122) model time 0.4804 (0.7978) loss 3.4155 (2.8507) grad_norm 2.0483 (2.2866) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][150/625] eta 0:06:02 lr 0.000365 wd 0.0500 time 0.4796 (0.7641) data time 0.0011 (0.0107) model time 0.4785 (0.7534) loss 2.9636 (2.8110) grad_norm 2.2734 (2.2476) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][160/625] eta 0:05:39 lr 0.000365 wd 0.0500 time 0.4758 (0.7292) data time 0.0010 (0.0095) model time 0.4747 (0.7197) loss 2.5430 (2.8018) grad_norm 1.5919 (2.2226) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][170/625] eta 0:05:19 lr 0.000365 wd 0.0500 time 0.4738 (0.7019) data time 0.0008 (0.0086) model time 0.4730 (0.6933) loss 1.8868 (2.7888) grad_norm 2.1373 (2.1804) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][180/625] eta 0:05:02 lr 0.000365 wd 0.0500 time 0.4733 (0.6795) data time 0.0009 (0.0079) model time 0.4724 (0.6717) loss 3.2679 (2.8099) grad_norm 1.8984 (2.2384) loss_scale 
512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][190/625] eta 0:04:47 lr 0.000365 wd 0.0500 time 0.4715 (0.6609) data time 0.0010 (0.0073) model time 0.4705 (0.6537) loss 2.8674 (2.8042) grad_norm 1.8225 (2.2382) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][200/625] eta 0:04:34 lr 0.000364 wd 0.0500 time 0.4685 (0.6455) data time 0.0009 (0.0067) model time 0.4677 (0.6388) loss 3.1746 (2.8039) grad_norm 1.4284 (2.2018) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][210/625] eta 0:04:22 lr 0.000364 wd 0.0500 time 0.4757 (0.6325) data time 0.0012 (0.0063) model time 0.4745 (0.6261) loss 3.0868 (2.7969) grad_norm 1.7742 (2.1839) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][220/625] eta 0:04:11 lr 0.000364 wd 0.0500 time 0.4823 (0.6216) data time 0.0011 (0.0059) model time 0.4812 (0.6156) loss 2.6010 (2.7914) grad_norm 2.3720 (2.1572) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][230/625] eta 0:04:01 lr 0.000364 wd 0.0500 time 0.4749 (0.6121) data time 0.0012 (0.0056) model time 0.4737 (0.6064) loss 3.0601 (2.7943) grad_norm 2.6181 (2.1476) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][240/625] eta 0:03:52 lr 0.000364 wd 0.0500 time 0.4780 (0.6038) data time 0.0011 (0.0053) model time 0.4769 (0.5984) loss 3.0723 (2.7986) grad_norm 2.6185 (2.1375) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][250/625] eta 0:03:43 lr 0.000364 wd 0.0500 time 0.4659 (0.5960) data time 0.0010 (0.0051) model time 
0.4648 (0.5909) loss 2.7061 (2.7996) grad_norm 1.3816 (2.1485) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][260/625] eta 0:03:35 lr 0.000364 wd 0.0500 time 0.4744 (0.5892) data time 0.0011 (0.0049) model time 0.4733 (0.5843) loss 3.1106 (2.7849) grad_norm 1.8280 (2.1708) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][270/625] eta 0:03:27 lr 0.000364 wd 0.0500 time 0.4694 (0.5840) data time 0.0012 (0.0047) model time 0.4682 (0.5793) loss 3.2519 (2.7844) grad_norm 1.9513 (2.1571) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][280/625] eta 0:03:19 lr 0.000364 wd 0.0500 time 0.4825 (0.5786) data time 0.0009 (0.0045) model time 0.4816 (0.5741) loss 3.0934 (2.7723) grad_norm 1.7871 (2.1416) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][290/625] eta 0:03:12 lr 0.000364 wd 0.0500 time 0.4777 (0.5738) data time 0.0013 (0.0043) model time 0.4764 (0.5695) loss 2.9320 (2.7699) grad_norm 1.4887 (2.1294) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][300/625] eta 0:03:05 lr 0.000364 wd 0.0500 time 0.4759 (0.5695) data time 0.0012 (0.0042) model time 0.4747 (0.5653) loss 3.2096 (2.7668) grad_norm 2.0337 (2.1700) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][310/625] eta 0:02:58 lr 0.000363 wd 0.0500 time 0.4750 (0.5655) data time 0.0009 (0.0041) model time 0.4741 (0.5614) loss 2.5658 (2.7627) grad_norm 2.4415 (2.1721) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][320/625] eta 0:02:51 lr 
0.000363 wd 0.0500 time 0.4681 (0.5617) data time 0.0009 (0.0039) model time 0.4672 (0.5578) loss 3.0714 (2.7583) grad_norm 1.9433 (2.1773) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][330/625] eta 0:02:44 lr 0.000363 wd 0.0500 time 0.4684 (0.5581) data time 0.0009 (0.0038) model time 0.4675 (0.5543) loss 3.0856 (2.7501) grad_norm 2.5325 (2.1779) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][340/625] eta 0:02:38 lr 0.000363 wd 0.0500 time 0.4700 (0.5548) data time 0.0011 (0.0037) model time 0.4689 (0.5511) loss 2.9777 (2.7449) grad_norm 2.5402 (2.1816) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][350/625] eta 0:02:31 lr 0.000363 wd 0.0500 time 0.4736 (0.5518) data time 0.0011 (0.0036) model time 0.4725 (0.5482) loss 2.6101 (2.7418) grad_norm 2.1331 (2.1711) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:51:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][360/625] eta 0:02:25 lr 0.000363 wd 0.0500 time 0.4761 (0.5491) data time 0.0011 (0.0035) model time 0.4750 (0.5456) loss 2.0390 (2.7427) grad_norm 1.9881 (2.1545) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][370/625] eta 0:02:19 lr 0.000363 wd 0.0500 time 0.4743 (0.5466) data time 0.0011 (0.0035) model time 0.4732 (0.5431) loss 2.7868 (2.7417) grad_norm 1.9599 (2.1436) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][380/625] eta 0:02:13 lr 0.000363 wd 0.0500 time 0.4773 (0.5443) data time 0.0011 (0.0034) model time 0.4763 (0.5409) loss 2.7389 (2.7383) grad_norm 1.5572 (2.1294) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:12 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [197/300][390/625] eta 0:02:07 lr 0.000363 wd 0.0500 time 0.4736 (0.5421) data time 0.0012 (0.0033) model time 0.4724 (0.5388) loss 2.9984 (2.7394) grad_norm 2.0204 (2.1590) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][400/625] eta 0:02:01 lr 0.000363 wd 0.0500 time 0.4708 (0.5399) data time 0.0011 (0.0032) model time 0.4697 (0.5367) loss 2.8299 (2.7491) grad_norm 1.6279 (2.1573) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][410/625] eta 0:01:55 lr 0.000362 wd 0.0500 time 0.4725 (0.5379) data time 0.0010 (0.0032) model time 0.4715 (0.5347) loss 3.0216 (2.7493) grad_norm 2.9516 (2.1573) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][420/625] eta 0:01:49 lr 0.000362 wd 0.0500 time 0.4693 (0.5360) data time 0.0009 (0.0031) model time 0.4684 (0.5328) loss 3.2455 (2.7543) grad_norm 2.0457 (2.1682) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][430/625] eta 0:01:44 lr 0.000362 wd 0.0500 time 0.4764 (0.5342) data time 0.0008 (0.0030) model time 0.4756 (0.5311) loss 3.3646 (2.7585) grad_norm 2.3007 (2.1825) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][440/625] eta 0:01:38 lr 0.000362 wd 0.0500 time 0.4773 (0.5326) data time 0.0008 (0.0030) model time 0.4764 (0.5296) loss 3.4788 (2.7570) grad_norm 1.4671 (2.1697) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][450/625] eta 0:01:33 lr 0.000362 wd 0.0500 time 0.4736 (0.5317) data time 0.0011 (0.0029) model time 0.4725 (0.5288) loss 2.8549 (2.7552) grad_norm 1.4873 (2.1607) loss_scale 
512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][460/625] eta 0:01:27 lr 0.000362 wd 0.0500 time 0.4806 (0.5303) data time 0.0008 (0.0029) model time 0.4798 (0.5274) loss 3.6590 (2.7552) grad_norm 1.7262 (2.1496) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][470/625] eta 0:01:21 lr 0.000362 wd 0.0500 time 0.4724 (0.5289) data time 0.0009 (0.0028) model time 0.4716 (0.5260) loss 2.5850 (2.7515) grad_norm 1.9658 (2.1562) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][480/625] eta 0:01:16 lr 0.000362 wd 0.0500 time 0.4711 (0.5275) data time 0.0008 (0.0028) model time 0.4703 (0.5247) loss 3.5997 (2.7579) grad_norm 2.6779 (2.1533) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][490/625] eta 0:01:11 lr 0.000362 wd 0.0500 time 0.4750 (0.5262) data time 0.0011 (0.0028) model time 0.4739 (0.5234) loss 2.8578 (2.7612) grad_norm 1.9803 (2.1591) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][500/625] eta 0:01:05 lr 0.000362 wd 0.0500 time 0.4809 (0.5249) data time 0.0009 (0.0027) model time 0.4800 (0.5222) loss 2.8480 (2.7609) grad_norm 1.7555 (2.1856) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][510/625] eta 0:01:00 lr 0.000361 wd 0.0500 time 0.4808 (0.5238) data time 0.0008 (0.0027) model time 0.4799 (0.5211) loss 2.8541 (2.7666) grad_norm 2.1500 (2.1865) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][520/625] eta 0:00:54 lr 0.000361 wd 0.0500 time 0.4792 (0.5228) data time 0.0009 (0.0026) model time 
0.4783 (0.5201) loss 2.6707 (2.7701) grad_norm 1.9963 (2.1797) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][530/625] eta 0:00:49 lr 0.000361 wd 0.0500 time 0.4747 (0.5218) data time 0.0011 (0.0026) model time 0.4736 (0.5191) loss 2.2305 (2.7697) grad_norm 1.5927 (2.1722) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][540/625] eta 0:00:44 lr 0.000361 wd 0.0500 time 0.4734 (0.5208) data time 0.0008 (0.0026) model time 0.4726 (0.5182) loss 2.3126 (2.7629) grad_norm 1.9500 (2.1688) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][550/625] eta 0:00:38 lr 0.000361 wd 0.0500 time 0.4689 (0.5197) data time 0.0011 (0.0026) model time 0.4678 (0.5172) loss 2.9530 (2.7582) grad_norm 1.8276 (2.1692) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][560/625] eta 0:00:33 lr 0.000361 wd 0.0500 time 0.4752 (0.5187) data time 0.0011 (0.0025) model time 0.4742 (0.5162) loss 3.0817 (2.7578) grad_norm 1.9402 (2.1684) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][570/625] eta 0:00:28 lr 0.000361 wd 0.0500 time 0.4694 (0.5178) data time 0.0012 (0.0025) model time 0.4682 (0.5153) loss 3.1597 (2.7615) grad_norm 2.1976 (2.1625) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][580/625] eta 0:00:23 lr 0.000361 wd 0.0500 time 0.4818 (0.5170) data time 0.0010 (0.0025) model time 0.4808 (0.5145) loss 2.8673 (2.7595) grad_norm 1.4379 (2.1583) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][590/625] eta 0:00:18 lr 
0.000361 wd 0.0500 time 0.4795 (0.5162) data time 0.0009 (0.0024) model time 0.4786 (0.5138) loss 3.4533 (2.7646) grad_norm 3.7743 (2.1898) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][600/625] eta 0:00:12 lr 0.000361 wd 0.0500 time 0.4759 (0.5158) data time 0.0008 (0.0024) model time 0.4751 (0.5134) loss 1.7135 (2.7616) grad_norm 1.8621 (2.1930) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][610/625] eta 0:00:07 lr 0.000360 wd 0.0500 time 0.4722 (0.5150) data time 0.0008 (0.0024) model time 0.4714 (0.5126) loss 2.9977 (2.7602) grad_norm 1.7386 (2.1865) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [197/300][620/625] eta 0:00:02 lr 0.000360 wd 0.0500 time 0.4701 (0.5142) data time 0.0006 (0.0024) model time 0.4695 (0.5119) loss 2.0329 (2.7582) grad_norm 1.3816 (2.1797) loss_scale 512.0000 (512.0000) mem 16712MB [2024-08-10 19:54:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 197 training takes 0:04:40 [2024-08-10 19:54:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:54:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 19:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.508 (0.508) Loss 0.5200 (0.5200) Acc@1 88.428 (88.428) Acc@5 98.828 (98.828) Mem 16712MB [2024-08-10 19:54:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.161) Loss 0.7915 (0.6230) Acc@1 80.811 (86.346) Acc@5 96.240 (97.767) Mem 16712MB [2024-08-10 19:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9136 (0.7414) Acc@1 78.418 (83.426) Acc@5 95.068 (96.524) Mem 16712MB [2024-08-10 19:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.165 Acc@5 96.507 [2024-08-10 19:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 19:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.839 (0.839) Loss 0.4707 (0.4707) Acc@1 89.697 (89.697) Acc@5 98.877 (98.877) Mem 16712MB [2024-08-10 19:54:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.192) Loss 0.7446 (0.5837) Acc@1 81.885 (87.376) Acc@5 96.826 (97.971) Mem 16712MB [2024-08-10 19:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 0.8384 (0.6860) Acc@1 80.273 (84.645) Acc@5 95.898 (97.033) Mem 16712MB [2024-08-10 19:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.351 Acc@5 97.029 [2024-08-10 19:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 19:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.35% [2024-08-10 19:54:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 19:54:23 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 19:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][0/625] eta 0:10:33 lr 0.000360 wd 0.0500 time 1.0128 (1.0128) data time 0.4568 (0.4568) model time 0.0000 (0.0000) loss 2.9039 (2.9039) grad_norm 1.8928 (1.8928) loss_scale 256.0000 (256.0000) mem 16716MB [2024-08-10 19:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][10/625] eta 0:05:21 lr 0.000360 wd 0.0500 time 0.4753 (0.5231) data time 0.0010 (0.0425) model time 0.0000 (0.0000) loss 2.0894 (2.8547) grad_norm 1.8206 (1.7633) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][20/625] eta 0:05:08 lr 0.000360 wd 0.0500 time 0.4707 (0.5098) data time 0.0011 (0.0228) model time 0.0000 (0.0000) loss 3.0283 (2.9139) grad_norm 1.7434 (1.8320) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][30/625] eta 0:04:56 lr 0.000360 wd 0.0500 time 0.4736 (0.4986) data time 0.0009 (0.0158) model time 0.0000 (0.0000) loss 3.3392 (2.8812) grad_norm 2.3567 (2.0258) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:54:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][40/625] eta 0:04:48 lr 0.000360 wd 0.0500 time 0.4748 (0.4930) data time 0.0008 (0.0122) model time 0.0000 (0.0000) loss 3.0407 (2.8780) grad_norm 1.8559 (1.9680) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][50/625] eta 0:04:41 lr 0.000360 wd 0.0500 time 0.4747 (0.4894) data time 0.0009 (0.0100) model time 0.0000 (0.0000) loss 3.3485 (2.8521) grad_norm 1.3109 (1.9176) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:54:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][60/625] eta 0:04:35 lr 0.000360 wd 0.0500 time 0.4741 (0.4873) data time 0.0011 (0.0086) model time 0.4730 (0.4752) loss 3.3835 (2.8184) 
grad_norm 2.3821 (1.9499) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:54:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][70/625] eta 0:04:29 lr 0.000360 wd 0.0500 time 0.4760 (0.4857) data time 0.0011 (0.0075) model time 0.4749 (0.4750) loss 3.1676 (2.7979) grad_norm 2.0240 (1.9800) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][80/625] eta 0:04:23 lr 0.000360 wd 0.0500 time 0.4701 (0.4844) data time 0.0008 (0.0067) model time 0.4693 (0.4748) loss 2.2099 (2.7917) grad_norm 1.5185 (1.9768) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][90/625] eta 0:04:18 lr 0.000359 wd 0.0500 time 0.4724 (0.4835) data time 0.0009 (0.0061) model time 0.4715 (0.4747) loss 2.5833 (2.7828) grad_norm 1.6843 (1.9578) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][100/625] eta 0:04:13 lr 0.000359 wd 0.0500 time 0.4777 (0.4828) data time 0.0008 (0.0056) model time 0.4769 (0.4749) loss 2.9074 (2.7759) grad_norm 1.8707 (1.9403) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][110/625] eta 0:04:08 lr 0.000359 wd 0.0500 time 0.4749 (0.4828) data time 0.0011 (0.0052) model time 0.4738 (0.4761) loss 2.1217 (2.7602) grad_norm 2.1994 (1.9470) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][120/625] eta 0:04:03 lr 0.000359 wd 0.0500 time 0.4708 (0.4821) data time 0.0013 (0.0049) model time 0.4695 (0.4757) loss 2.6617 (2.7492) grad_norm 2.5665 (1.9692) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][130/625] eta 0:03:58 lr 0.000359 wd 0.0500 time 0.4737 (0.4817) data 
time 0.0009 (0.0046) model time 0.4728 (0.4756) loss 2.8180 (2.7545) grad_norm 1.8650 (1.9802) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][140/625] eta 0:03:53 lr 0.000359 wd 0.0500 time 0.4769 (0.4813) data time 0.0010 (0.0043) model time 0.4759 (0.4756) loss 3.0721 (2.7613) grad_norm 2.3414 (2.0026) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][150/625] eta 0:03:48 lr 0.000359 wd 0.0500 time 0.4752 (0.4808) data time 0.0010 (0.0041) model time 0.4742 (0.4754) loss 1.8844 (2.7421) grad_norm 1.5177 (1.9922) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][160/625] eta 0:03:43 lr 0.000359 wd 0.0500 time 0.4740 (0.4805) data time 0.0008 (0.0039) model time 0.4732 (0.4752) loss 1.8753 (2.7376) grad_norm 1.6424 (1.9803) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][170/625] eta 0:03:38 lr 0.000359 wd 0.0500 time 0.4751 (0.4802) data time 0.0012 (0.0038) model time 0.4740 (0.4751) loss 2.4019 (2.7233) grad_norm 2.2378 (2.0004) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][180/625] eta 0:03:33 lr 0.000359 wd 0.0500 time 0.4820 (0.4800) data time 0.0008 (0.0036) model time 0.4813 (0.4752) loss 1.8320 (2.7136) grad_norm 1.8338 (2.0141) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][190/625] eta 0:03:28 lr 0.000359 wd 0.0500 time 0.4715 (0.4796) data time 0.0008 (0.0035) model time 0.4707 (0.4750) loss 2.7485 (2.7298) grad_norm 2.7011 (2.0162) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[198/300][200/625] eta 0:03:23 lr 0.000358 wd 0.0500 time 0.4745 (0.4795) data time 0.0008 (0.0034) model time 0.4737 (0.4750) loss 2.8173 (2.7309) grad_norm 1.7819 (2.0504) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][210/625] eta 0:03:19 lr 0.000358 wd 0.0500 time 0.4773 (0.4800) data time 0.0011 (0.0032) model time 0.4762 (0.4759) loss 2.8479 (2.7188) grad_norm 1.9338 (2.0717) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][220/625] eta 0:03:14 lr 0.000358 wd 0.0500 time 0.4766 (0.4798) data time 0.0010 (0.0032) model time 0.4756 (0.4758) loss 2.1348 (2.7265) grad_norm 1.7966 (2.0702) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][230/625] eta 0:03:09 lr 0.000358 wd 0.0500 time 0.4741 (0.4796) data time 0.0011 (0.0031) model time 0.4730 (0.4757) loss 2.4821 (2.7317) grad_norm 2.1123 (2.0775) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][240/625] eta 0:03:04 lr 0.000358 wd 0.0500 time 0.4754 (0.4793) data time 0.0009 (0.0030) model time 0.4745 (0.4755) loss 3.0124 (2.7344) grad_norm 2.1345 (2.0945) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][250/625] eta 0:02:59 lr 0.000358 wd 0.0500 time 0.4764 (0.4792) data time 0.0008 (0.0029) model time 0.4756 (0.4755) loss 3.5069 (2.7392) grad_norm 2.7463 (2.1059) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][260/625] eta 0:02:54 lr 0.000358 wd 0.0500 time 0.4711 (0.4790) data time 0.0008 (0.0028) model time 0.4703 (0.4754) loss 2.0213 (2.7341) grad_norm 2.3924 (2.1198) loss_scale 256.0000 (256.0000) mem 16717MB 
[2024-08-10 19:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][270/625] eta 0:02:49 lr 0.000358 wd 0.0500 time 0.4782 (0.4789) data time 0.0011 (0.0028) model time 0.4771 (0.4753) loss 2.8148 (2.7344) grad_norm 6.6236 (2.1298) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][280/625] eta 0:02:45 lr 0.000358 wd 0.0500 time 0.4752 (0.4787) data time 0.0008 (0.0027) model time 0.4744 (0.4752) loss 2.4339 (2.7227) grad_norm 3.6116 (2.1486) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][290/625] eta 0:02:40 lr 0.000358 wd 0.0500 time 0.4750 (0.4785) data time 0.0010 (0.0026) model time 0.4740 (0.4751) loss 1.5093 (2.7208) grad_norm 1.5905 (2.2571) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][300/625] eta 0:02:35 lr 0.000357 wd 0.0500 time 0.4733 (0.4784) data time 0.0011 (0.0026) model time 0.4723 (0.4750) loss 3.2790 (2.7227) grad_norm 1.3260 (2.2392) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][310/625] eta 0:02:30 lr 0.000357 wd 0.0500 time 0.4697 (0.4782) data time 0.0011 (0.0025) model time 0.4686 (0.4749) loss 2.3638 (2.7242) grad_norm 1.8233 (2.2282) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:56:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][320/625] eta 0:02:25 lr 0.000357 wd 0.0500 time 0.4723 (0.4780) data time 0.0009 (0.0025) model time 0.4714 (0.4748) loss 2.4805 (2.7207) grad_norm 1.9281 (2.2262) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][330/625] eta 0:02:21 lr 0.000357 wd 0.0500 time 0.4718 (0.4784) data time 0.0008 (0.0024) model time 0.4709 (0.4754) loss 1.6200 
(2.7174) grad_norm 3.6074 (2.2397) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][340/625] eta 0:02:16 lr 0.000357 wd 0.0500 time 0.4772 (0.4783) data time 0.0008 (0.0024) model time 0.4764 (0.4752) loss 2.4728 (2.7181) grad_norm 3.2979 (2.2475) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][350/625] eta 0:02:11 lr 0.000357 wd 0.0500 time 0.4721 (0.4781) data time 0.0008 (0.0024) model time 0.4714 (0.4751) loss 2.4725 (2.7171) grad_norm 1.4501 (2.2443) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][360/625] eta 0:02:06 lr 0.000357 wd 0.0500 time 0.4761 (0.4781) data time 0.0009 (0.0023) model time 0.4752 (0.4751) loss 1.9696 (2.7155) grad_norm 1.7219 (2.2291) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][370/625] eta 0:02:01 lr 0.000357 wd 0.0500 time 0.4756 (0.4780) data time 0.0011 (0.0023) model time 0.4745 (0.4752) loss 2.6266 (2.7175) grad_norm 1.3616 (2.2202) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][380/625] eta 0:01:57 lr 0.000357 wd 0.0500 time 0.4757 (0.4780) data time 0.0011 (0.0023) model time 0.4746 (0.4752) loss 3.1036 (2.7246) grad_norm 3.5204 (2.2124) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][390/625] eta 0:01:52 lr 0.000357 wd 0.0500 time 0.4771 (0.4780) data time 0.0011 (0.0022) model time 0.4760 (0.4752) loss 3.4332 (2.7262) grad_norm 1.8193 (2.2139) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][400/625] eta 0:01:47 lr 0.000356 wd 0.0500 time 0.4765 
(0.4779) data time 0.0008 (0.0022) model time 0.4757 (0.4752) loss 2.0579 (2.7253) grad_norm 1.7253 (2.2139) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][410/625] eta 0:01:42 lr 0.000356 wd 0.0500 time 0.4716 (0.4779) data time 0.0011 (0.0022) model time 0.4705 (0.4752) loss 2.3423 (2.7207) grad_norm 1.9296 (2.2060) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][420/625] eta 0:01:37 lr 0.000356 wd 0.0500 time 0.4793 (0.4779) data time 0.0011 (0.0021) model time 0.4782 (0.4753) loss 2.8481 (2.7225) grad_norm 2.1955 (2.1997) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][430/625] eta 0:01:33 lr 0.000356 wd 0.0500 time 0.4717 (0.4783) data time 0.0012 (0.0021) model time 0.4705 (0.4758) loss 1.8388 (2.7268) grad_norm 1.5598 (2.1941) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][440/625] eta 0:01:28 lr 0.000356 wd 0.0500 time 0.4748 (0.4783) data time 0.0009 (0.0021) model time 0.4739 (0.4758) loss 3.4881 (2.7278) grad_norm 5.5021 (2.2011) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][450/625] eta 0:01:23 lr 0.000356 wd 0.0500 time 0.4741 (0.4782) data time 0.0009 (0.0021) model time 0.4732 (0.4757) loss 2.2261 (2.7297) grad_norm 2.8128 (2.2234) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][460/625] eta 0:01:18 lr 0.000356 wd 0.0500 time 0.4742 (0.4781) data time 0.0011 (0.0021) model time 0.4731 (0.4757) loss 3.2528 (2.7313) grad_norm 1.2274 (2.2160) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [198/300][470/625] eta 0:01:14 lr 0.000356 wd 0.0500 time 0.4721 (0.4781) data time 0.0011 (0.0020) model time 0.4711 (0.4756) loss 2.0164 (2.7335) grad_norm 2.4453 (2.2115) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][480/625] eta 0:01:09 lr 0.000356 wd 0.0500 time 0.4729 (0.4784) data time 0.0010 (0.0020) model time 0.4719 (0.4759) loss 2.7517 (2.7308) grad_norm 2.5072 (2.2433) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][490/625] eta 0:01:04 lr 0.000356 wd 0.0500 time 0.4748 (0.4783) data time 0.0011 (0.0020) model time 0.4737 (0.4759) loss 2.8955 (2.7317) grad_norm 2.1373 (2.2491) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][500/625] eta 0:00:59 lr 0.000356 wd 0.0500 time 0.4747 (0.4782) data time 0.0009 (0.0020) model time 0.4738 (0.4759) loss 2.7320 (2.7334) grad_norm 2.0681 (2.2439) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][510/625] eta 0:00:54 lr 0.000355 wd 0.0500 time 0.4727 (0.4782) data time 0.0010 (0.0020) model time 0.4717 (0.4759) loss 3.2592 (2.7349) grad_norm 2.2574 (2.2473) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][520/625] eta 0:00:50 lr 0.000355 wd 0.0500 time 0.4777 (0.4782) data time 0.0011 (0.0020) model time 0.4765 (0.4759) loss 2.5233 (2.7346) grad_norm 1.7233 (2.2777) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][530/625] eta 0:00:45 lr 0.000355 wd 0.0500 time 0.4738 (0.4781) data time 0.0011 (0.0019) model time 0.4728 (0.4758) loss 3.2077 (2.7327) grad_norm 6.3071 (2.2942) loss_scale 256.0000 (256.0000) mem 16717MB 
[2024-08-10 19:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][540/625] eta 0:00:40 lr 0.000355 wd 0.0500 time 0.4799 (0.4781) data time 0.0010 (0.0019) model time 0.4789 (0.4758) loss 2.5156 (2.7314) grad_norm 2.2221 (2.2954) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][550/625] eta 0:00:35 lr 0.000355 wd 0.0500 time 0.4738 (0.4780) data time 0.0009 (0.0019) model time 0.4729 (0.4758) loss 3.0559 (2.7321) grad_norm 2.0359 (2.2932) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][560/625] eta 0:00:31 lr 0.000355 wd 0.0500 time 0.4764 (0.4780) data time 0.0008 (0.0019) model time 0.4756 (0.4758) loss 3.1302 (2.7322) grad_norm 1.8302 (2.2866) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][570/625] eta 0:00:26 lr 0.000355 wd 0.0500 time 0.4746 (0.4783) data time 0.0010 (0.0019) model time 0.4735 (0.4762) loss 2.9566 (2.7336) grad_norm 2.1968 (2.2792) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][580/625] eta 0:00:21 lr 0.000355 wd 0.0500 time 0.4712 (0.4783) data time 0.0009 (0.0019) model time 0.4703 (0.4761) loss 2.9616 (2.7356) grad_norm 1.5111 (2.2819) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:59:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][590/625] eta 0:00:16 lr 0.000355 wd 0.0500 time 0.4763 (0.4782) data time 0.0011 (0.0019) model time 0.4752 (0.4761) loss 2.4144 (2.7354) grad_norm 1.8975 (2.2765) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][600/625] eta 0:00:11 lr 0.000355 wd 0.0500 time 0.4734 (0.4782) data time 0.0009 (0.0019) model time 0.4726 (0.4760) loss 2.8901 
(2.7377) grad_norm 1.3902 (2.2663) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][610/625] eta 0:00:07 lr 0.000354 wd 0.0500 time 0.4750 (0.4781) data time 0.0005 (0.0018) model time 0.4745 (0.4760) loss 2.7334 (2.7371) grad_norm 3.0117 (2.2918) loss_scale 256.0000 (256.0000) mem 16717MB [2024-08-10 19:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 19:59:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 19:59:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 20:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 20:00:53 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 20:01:06 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 20:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 20:01:14 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 20:01:17 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 20:01:19 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 20:01:19 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 198) [2024-08-10 20:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 20:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [198/300][620/625] eta 0:00:13 lr 0.000354 wd 0.0500 time 0.4420 (2.6531) data time 0.0004 (0.0768) model time 0.4417 (2.5763) loss 2.7390 (3.1256) grad_norm 1.3771 (2.2052) loss_scale 256.0000 (256.0000) mem 16678MB [2024-08-10 20:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 198 training takes 0:00:22 [2024-08-10 20:01:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:01:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5024 (0.5024) Acc@1 88.867 (88.867) Acc@5 98.682 (98.682) Mem 16678MB [2024-08-10 20:01:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8252 (0.6224) Acc@1 78.906 (86.266) Acc@5 96.094 (97.727) Mem 16678MB [2024-08-10 20:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 0.9141 (0.7340) Acc@1 78.271 (83.626) Acc@5 95.020 (96.540) Mem 16678MB [2024-08-10 20:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.315 Acc@5 96.505 [2024-08-10 20:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 20:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.32% [2024-08-10 20:01:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 20:02:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 20:02:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.460 (0.460) Loss 0.4702 (0.4702) Acc@1 89.697 (89.697) Acc@5 98.877 (98.877) Mem 16678MB [2024-08-10 20:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 0.7427 (0.5835) Acc@1 81.885 (87.318) Acc@5 96.924 (98.002) Mem 16678MB [2024-08-10 20:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.132) Loss 0.8374 (0.6860) Acc@1 80.176 (84.608) Acc@5 95.898 (97.063) Mem 16678MB [2024-08-10 20:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.325 Acc@5 97.065 [2024-08-10 20:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 20:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.33% [2024-08-10 20:02:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 20:02:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 20:02:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][0/625] eta 0:09:22 lr 0.000354 wd 0.0500 time 0.8992 (0.8992) data time 0.4288 (0.4288) model time 0.0000 (0.0000) loss 3.1354 (3.1354) grad_norm 1.4391 (1.4391) loss_scale 256.0000 (256.0000) mem 16691MB [2024-08-10 20:02:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][10/625] eta 0:04:57 lr 0.000354 wd 0.0500 time 0.4442 (0.4840) data time 0.0007 (0.0398) model time 0.0000 (0.0000) loss 3.3248 (3.1049) grad_norm 1.7365 (1.6453) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][20/625] eta 0:04:41 lr 0.000354 wd 0.0500 time 0.4405 (0.4654) data time 0.0006 (0.0212) model time 0.0000 (0.0000) loss 3.3116 (3.0085) grad_norm 1.6088 (1.6583) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][30/625] eta 0:04:38 lr 0.000354 wd 0.0500 time 0.4436 (0.4678) data time 0.0010 (0.0147) model time 0.0000 (0.0000) loss 2.8492 (2.9225) grad_norm 2.0659 (2.1481) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][40/625] eta 0:04:30 lr 0.000354 wd 0.0500 time 0.4453 (0.4624) data time 0.0008 (0.0113) model time 0.0000 (0.0000) loss 2.8683 (2.9279) grad_norm 2.0579 (2.1644) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][50/625] eta 0:04:23 lr 0.000354 wd 0.0500 time 0.4427 (0.4588) data time 0.0009 (0.0093) model time 0.0000 (0.0000) loss 2.2742 (2.8853) grad_norm 1.4152 (2.1177) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][60/625] eta 0:04:17 lr 0.000354 wd 0.0500 time 0.4452 (0.4566) data time 0.0008 (0.0079) model time 0.4445 (0.4442) loss 2.8336 (2.8415) 
grad_norm 2.1225 (2.0809) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][70/625] eta 0:04:12 lr 0.000354 wd 0.0500 time 0.4460 (0.4550) data time 0.0006 (0.0069) model time 0.4454 (0.4444) loss 2.0274 (2.8084) grad_norm 2.7884 (2.1008) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][80/625] eta 0:04:07 lr 0.000354 wd 0.0500 time 0.4448 (0.4539) data time 0.0007 (0.0062) model time 0.4441 (0.4446) loss 2.8546 (2.7992) grad_norm 1.1662 (2.0604) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][90/625] eta 0:04:02 lr 0.000353 wd 0.0500 time 0.4443 (0.4531) data time 0.0007 (0.0056) model time 0.4436 (0.4447) loss 3.2992 (2.8180) grad_norm 1.4069 (2.0177) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][100/625] eta 0:03:57 lr 0.000353 wd 0.0500 time 0.4475 (0.4525) data time 0.0006 (0.0051) model time 0.4469 (0.4452) loss 2.7259 (2.8049) grad_norm 1.5272 (1.9855) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][110/625] eta 0:03:52 lr 0.000353 wd 0.0500 time 0.4409 (0.4518) data time 0.0008 (0.0048) model time 0.4401 (0.4449) loss 2.3776 (2.8124) grad_norm 3.4514 (2.0845) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][120/625] eta 0:03:47 lr 0.000353 wd 0.0500 time 0.4484 (0.4515) data time 0.0008 (0.0044) model time 0.4476 (0.4452) loss 3.0383 (2.8120) grad_norm 2.3775 (2.0998) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][130/625] eta 0:03:43 lr 0.000353 wd 0.0500 time 0.4480 (0.4510) data 
time 0.0008 (0.0042) model time 0.4472 (0.4452) loss 3.1832 (2.8074) grad_norm 2.7913 (2.0877) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][140/625] eta 0:03:38 lr 0.000353 wd 0.0500 time 0.4482 (0.4508) data time 0.0006 (0.0039) model time 0.4475 (0.4453) loss 2.9498 (2.7996) grad_norm 2.0863 (2.0694) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][150/625] eta 0:03:34 lr 0.000353 wd 0.0500 time 0.4453 (0.4515) data time 0.0007 (0.0037) model time 0.4447 (0.4468) loss 2.4670 (2.7978) grad_norm 1.4523 (2.0677) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][160/625] eta 0:03:29 lr 0.000353 wd 0.0500 time 0.4461 (0.4512) data time 0.0008 (0.0036) model time 0.4453 (0.4468) loss 3.2002 (2.7940) grad_norm 1.9754 (2.0835) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][170/625] eta 0:03:25 lr 0.000353 wd 0.0500 time 0.4458 (0.4510) data time 0.0009 (0.0034) model time 0.4449 (0.4467) loss 3.3179 (2.7883) grad_norm 2.4889 (2.0648) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][180/625] eta 0:03:20 lr 0.000353 wd 0.0500 time 0.4467 (0.4507) data time 0.0006 (0.0033) model time 0.4461 (0.4466) loss 2.9227 (2.7852) grad_norm 6.4820 (2.0805) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][190/625] eta 0:03:15 lr 0.000352 wd 0.0500 time 0.4454 (0.4505) data time 0.0007 (0.0031) model time 0.4447 (0.4465) loss 1.9098 (2.7612) grad_norm 1.7545 (2.1115) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[199/300][200/625] eta 0:03:11 lr 0.000352 wd 0.0500 time 0.4447 (0.4503) data time 0.0008 (0.0030) model time 0.4438 (0.4465) loss 1.7475 (2.7528) grad_norm 3.6392 (2.2450) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][210/625] eta 0:03:06 lr 0.000352 wd 0.0500 time 0.4458 (0.4501) data time 0.0007 (0.0029) model time 0.4451 (0.4464) loss 1.9643 (2.7530) grad_norm 1.9914 (2.2738) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][220/625] eta 0:03:02 lr 0.000352 wd 0.0500 time 0.4476 (0.4500) data time 0.0009 (0.0028) model time 0.4467 (0.4464) loss 2.1663 (2.7519) grad_norm 1.8107 (2.2689) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][230/625] eta 0:02:57 lr 0.000352 wd 0.0500 time 0.4451 (0.4498) data time 0.0008 (0.0028) model time 0.4443 (0.4464) loss 3.3122 (2.7599) grad_norm 2.8678 (2.2561) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:03:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][240/625] eta 0:02:53 lr 0.000352 wd 0.0500 time 0.4485 (0.4497) data time 0.0009 (0.0027) model time 0.4476 (0.4464) loss 3.2802 (2.7535) grad_norm 2.8353 (2.2494) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][250/625] eta 0:02:49 lr 0.000352 wd 0.0500 time 0.4481 (0.4509) data time 0.0007 (0.0026) model time 0.4474 (0.4479) loss 2.9442 (2.7416) grad_norm 1.4689 (2.2466) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][260/625] eta 0:02:44 lr 0.000352 wd 0.0500 time 0.4460 (0.4507) data time 0.0007 (0.0025) model time 0.4454 (0.4478) loss 3.0419 (2.7352) grad_norm 2.0141 (2.2400) loss_scale 256.0000 (256.0000) mem 16689MB 
[2024-08-10 20:04:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][270/625] eta 0:02:39 lr 0.000352 wd 0.0500 time 0.4543 (0.4506) data time 0.0008 (0.0025) model time 0.4535 (0.4478) loss 3.1851 (2.7349) grad_norm 2.7013 (2.2905) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][280/625] eta 0:02:35 lr 0.000352 wd 0.0500 time 0.4423 (0.4504) data time 0.0009 (0.0024) model time 0.4414 (0.4477) loss 2.4476 (2.7291) grad_norm 1.4255 (2.2885) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][290/625] eta 0:02:30 lr 0.000351 wd 0.0500 time 0.4485 (0.4503) data time 0.0006 (0.0024) model time 0.4479 (0.4475) loss 2.5044 (2.7232) grad_norm 1.7783 (2.2770) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][300/625] eta 0:02:26 lr 0.000351 wd 0.0500 time 0.4501 (0.4502) data time 0.0006 (0.0023) model time 0.4495 (0.4475) loss 3.7191 (2.7255) grad_norm 1.6125 (2.2564) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][310/625] eta 0:02:21 lr 0.000351 wd 0.0500 time 0.4442 (0.4501) data time 0.0008 (0.0023) model time 0.4435 (0.4475) loss 3.0385 (2.7325) grad_norm 1.4969 (2.2402) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][320/625] eta 0:02:17 lr 0.000351 wd 0.0500 time 0.4491 (0.4501) data time 0.0007 (0.0022) model time 0.4484 (0.4475) loss 3.0050 (2.7284) grad_norm 1.7835 (2.2366) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][330/625] eta 0:02:12 lr 0.000351 wd 0.0500 time 0.4481 (0.4500) data time 0.0009 (0.0022) model time 0.4472 (0.4475) loss 2.9134 
(2.7307) grad_norm 2.2494 (2.2338) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][340/625] eta 0:02:08 lr 0.000351 wd 0.0500 time 0.4462 (0.4500) data time 0.0006 (0.0022) model time 0.4456 (0.4475) loss 2.9721 (2.7350) grad_norm 1.7359 (2.2184) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][350/625] eta 0:02:03 lr 0.000351 wd 0.0500 time 0.4439 (0.4499) data time 0.0007 (0.0021) model time 0.4432 (0.4475) loss 2.7138 (2.7344) grad_norm 1.8220 (2.2180) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][360/625] eta 0:01:59 lr 0.000351 wd 0.0500 time 0.4435 (0.4498) data time 0.0008 (0.0021) model time 0.4427 (0.4474) loss 3.2605 (2.7314) grad_norm 1.9298 (2.2228) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:04:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][370/625] eta 0:01:54 lr 0.000351 wd 0.0500 time 0.4470 (0.4501) data time 0.0011 (0.0021) model time 0.4459 (0.4478) loss 2.0123 (2.7286) grad_norm 2.0465 (2.2185) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][380/625] eta 0:01:50 lr 0.000351 wd 0.0500 time 0.4484 (0.4500) data time 0.0008 (0.0020) model time 0.4476 (0.4478) loss 2.8218 (2.7272) grad_norm 2.0882 (2.2100) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][390/625] eta 0:01:45 lr 0.000351 wd 0.0500 time 0.5902 (0.4509) data time 0.0009 (0.0020) model time 0.5893 (0.4488) loss 2.5210 (2.7340) grad_norm 4.4349 (2.2287) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][400/625] eta 0:01:41 lr 0.000350 wd 0.0500 time 0.4390 
(0.4508) data time 0.0010 (0.0020) model time 0.4380 (0.4487) loss 3.1646 (2.7385) grad_norm 1.5666 (2.2273) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][410/625] eta 0:01:36 lr 0.000350 wd 0.0500 time 0.4487 (0.4507) data time 0.0007 (0.0020) model time 0.4480 (0.4486) loss 1.8693 (2.7358) grad_norm 1.5812 (2.2319) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][420/625] eta 0:01:32 lr 0.000350 wd 0.0500 time 0.4485 (0.4506) data time 0.0006 (0.0019) model time 0.4479 (0.4486) loss 3.3869 (2.7447) grad_norm 1.5744 (2.2223) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][430/625] eta 0:01:27 lr 0.000350 wd 0.0500 time 0.4481 (0.4506) data time 0.0007 (0.0019) model time 0.4474 (0.4485) loss 3.1358 (2.7474) grad_norm 2.0820 (2.2177) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][440/625] eta 0:01:23 lr 0.000350 wd 0.0500 time 0.4463 (0.4505) data time 0.0007 (0.0019) model time 0.4456 (0.4485) loss 2.1853 (2.7457) grad_norm 2.4011 (2.2145) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][450/625] eta 0:01:18 lr 0.000350 wd 0.0500 time 0.4480 (0.4504) data time 0.0007 (0.0019) model time 0.4474 (0.4484) loss 2.7969 (2.7445) grad_norm 2.5541 (2.2459) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][460/625] eta 0:01:14 lr 0.000350 wd 0.0500 time 0.4438 (0.4503) data time 0.0007 (0.0018) model time 0.4431 (0.4483) loss 2.4532 (2.7411) grad_norm 1.7273 (2.2521) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [199/300][470/625] eta 0:01:09 lr 0.000350 wd 0.0500 time 0.4446 (0.4502) data time 0.0008 (0.0018) model time 0.4438 (0.4482) loss 3.2473 (2.7444) grad_norm 1.6257 (2.2490) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][480/625] eta 0:01:05 lr 0.000350 wd 0.0500 time 0.4453 (0.4501) data time 0.0006 (0.0018) model time 0.4447 (0.4482) loss 2.1384 (2.7466) grad_norm 3.1097 (2.2479) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][490/625] eta 0:01:00 lr 0.000350 wd 0.0500 time 0.4498 (0.4501) data time 0.0007 (0.0018) model time 0.4491 (0.4481) loss 2.0779 (2.7448) grad_norm 1.3976 (2.2372) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][500/625] eta 0:00:56 lr 0.000349 wd 0.0500 time 0.4440 (0.4500) data time 0.0007 (0.0018) model time 0.4433 (0.4481) loss 3.3389 (2.7514) grad_norm 1.3088 (2.2360) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:05:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][510/625] eta 0:00:51 lr 0.000349 wd 0.0500 time 0.4482 (0.4500) data time 0.0008 (0.0018) model time 0.4474 (0.4480) loss 2.9829 (2.7479) grad_norm 2.8291 (2.2299) loss_scale 256.0000 (256.0000) mem 16689MB [2024-08-10 20:06:01 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 20:06:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:06:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 20:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 20:08:07 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 20:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 20:08:19 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 20:08:22 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 20:08:24 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 20:08:24 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 199) [2024-08-10 20:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 20:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][520/625] eta 0:05:06 lr 0.000349 wd 0.0500 time 0.4704 (2.9188) data time 0.0008 (0.1066) model time 0.4696 (2.8122) loss 2.9380 (3.0556) grad_norm 1.1865 (1.8363) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][530/625] eta 0:02:28 lr 0.000349 wd 0.0500 time 0.4733 (1.5594) data time 0.0009 (0.0480) model time 0.4724 (1.5114) loss 2.9231 (2.9196) grad_norm 2.9040 (2.3508) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][540/625] eta 0:01:39 lr 0.000349 wd 0.0500 time 0.4644 (1.1690) data time 0.0011 (0.0313) model time 0.4633 (1.1377) loss 2.9485 (2.9488) grad_norm 
2.4591 (4.0677) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][550/625] eta 0:01:14 lr 0.000349 wd 0.0500 time 0.4065 (0.9967) data time 0.0012 (0.0233) model time 0.4053 (0.9734) loss 3.0391 (2.9284) grad_norm 3.6219 (3.7192) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][560/625] eta 0:00:57 lr 0.000349 wd 0.0500 time 0.4717 (0.8860) data time 0.0008 (0.0187) model time 0.4709 (0.8672) loss 2.7658 (2.8869) grad_norm 1.5179 (3.3612) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][570/625] eta 0:00:44 lr 0.000349 wd 0.0500 time 0.4717 (0.8140) data time 0.0007 (0.0157) model time 0.4710 (0.7983) loss 2.4223 (2.8816) grad_norm 1.7175 (3.1118) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][580/625] eta 0:00:34 lr 0.000349 wd 0.0500 time 0.4717 (0.7636) data time 0.0007 (0.0135) model time 0.4710 (0.7501) loss 1.9104 (2.8643) grad_norm 1.7048 (2.9700) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][590/625] eta 0:00:25 lr 0.000349 wd 0.0500 time 0.4803 (0.7265) data time 0.0008 (0.0119) model time 0.4795 (0.7146) loss 2.1661 (2.8341) grad_norm 1.8279 (2.8672) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][600/625] eta 0:00:17 lr 0.000349 wd 0.0500 time 0.4715 (0.6976) data time 0.0010 (0.0107) model time 0.4704 (0.6869) loss 3.0123 (2.8185) grad_norm 2.1372 (2.8370) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][610/625] eta 0:00:10 lr 0.000348 wd 0.0500 time 0.4651 (0.6741) data time 
0.0005 (0.0097) model time 0.4646 (0.6644) loss 3.6008 (2.8232) grad_norm 2.2891 (2.7609) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [199/300][620/625] eta 0:00:03 lr 0.000348 wd 0.0500 time 0.4614 (0.6546) data time 0.0005 (0.0089) model time 0.4608 (0.6457) loss 2.2254 (2.8338) grad_norm 1.9263 (2.7202) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 20:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 199 training takes 0:01:12 [2024-08-10 20:09:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:09:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 20:09:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.5200 (0.5200) Acc@1 88.379 (88.379) Acc@5 98.682 (98.682) Mem 16695MB [2024-08-10 20:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.160) Loss 0.8184 (0.6264) Acc@1 79.346 (86.128) Acc@5 95.898 (97.749) Mem 16695MB [2024-08-10 20:09:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.140) Loss 0.9092 (0.7394) Acc@1 78.955 (83.403) Acc@5 94.971 (96.556) Mem 16695MB [2024-08-10 20:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.145 Acc@5 96.537 [2024-08-10 20:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 20:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.986 (0.986) Loss 0.4702 (0.4702) Acc@1 89.795 (89.795) Acc@5 98.877 (98.877) Mem 16695MB [2024-08-10 20:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.203) Loss 0.7427 (0.5838) Acc@1 82.031 (87.340) Acc@5 96.973 (97.994) Mem 16695MB [2024-08-10 20:09:54 
vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.162) Loss 0.8369 (0.6863) Acc@1 80.273 (84.645) Acc@5 95.947 (97.070) Mem 16695MB [2024-08-10 20:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.357 Acc@5 97.055 [2024-08-10 20:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.36% [2024-08-10 20:09:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 20:10:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 20:10:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][0/625] eta 0:10:09 lr 0.000348 wd 0.0500 time 0.9753 (0.9753) data time 0.4249 (0.4249) model time 0.0000 (0.0000) loss 2.6622 (2.6622) grad_norm 1.7407 (1.7407) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 20:10:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][10/625] eta 0:05:16 lr 0.000348 wd 0.0500 time 0.4790 (0.5151) data time 0.0010 (0.0396) model time 0.0000 (0.0000) loss 2.9688 (2.8252) grad_norm 2.9745 (1.9601) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][20/625] eta 0:04:58 lr 0.000348 wd 0.0500 time 0.4689 (0.4940) data time 0.0010 (0.0212) model time 0.0000 (0.0000) loss 2.9585 (2.8006) grad_norm 1.4566 (2.0431) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][30/625] eta 0:04:49 lr 0.000348 wd 0.0500 time 0.4648 (0.4870) data time 0.0010 (0.0147) model time 0.0000 (0.0000) loss 3.1294 (2.7302) grad_norm 1.9476 (2.0110) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:20 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][40/625] eta 0:04:42 lr 0.000348 wd 0.0500 time 0.4687 (0.4831) data time 0.0008 (0.0114) model time 0.0000 (0.0000) loss 3.1879 (2.7327) grad_norm 2.2785 (1.9990) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][50/625] eta 0:04:36 lr 0.000348 wd 0.0500 time 0.4655 (0.4809) data time 0.0008 (0.0094) model time 0.0000 (0.0000) loss 2.3132 (2.7483) grad_norm 1.9524 (2.0059) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][60/625] eta 0:04:30 lr 0.000348 wd 0.0500 time 0.4719 (0.4791) data time 0.0011 (0.0080) model time 0.4709 (0.4692) loss 2.7075 (2.7441) grad_norm 1.7843 (2.5588) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][70/625] eta 0:04:25 lr 0.000348 wd 0.0500 time 0.4673 (0.4777) data time 0.0010 (0.0070) model time 0.4663 (0.4684) loss 3.1005 (2.7253) grad_norm 1.6651 (2.4827) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][80/625] eta 0:04:20 lr 0.000348 wd 0.0500 time 0.4679 (0.4788) data time 0.0008 (0.0063) model time 0.4672 (0.4743) loss 2.9108 (2.7222) grad_norm 2.0489 (2.4602) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][90/625] eta 0:04:15 lr 0.000347 wd 0.0500 time 0.4683 (0.4781) data time 0.0008 (0.0057) model time 0.4675 (0.4734) loss 1.9678 (2.6856) grad_norm 1.9829 (2.4436) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][100/625] eta 0:04:10 lr 0.000347 wd 0.0500 time 0.4738 (0.4778) data time 0.0010 (0.0052) model time 0.4728 (0.4736) loss 1.9621 (2.6807) grad_norm 1.7514 
(2.3768) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][110/625] eta 0:04:06 lr 0.000347 wd 0.0500 time 0.6912 (0.4793) data time 0.0007 (0.0049) model time 0.6904 (0.4769) loss 2.2377 (2.6870) grad_norm 1.6357 (2.3783) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:10:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][120/625] eta 0:04:01 lr 0.000347 wd 0.0500 time 0.4687 (0.4786) data time 0.0009 (0.0045) model time 0.4678 (0.4759) loss 2.5003 (2.6814) grad_norm 2.1397 (2.3533) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][130/625] eta 0:03:56 lr 0.000347 wd 0.0500 time 0.4656 (0.4777) data time 0.0010 (0.0043) model time 0.4646 (0.4746) loss 2.8625 (2.6904) grad_norm 2.0345 (2.3280) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][140/625] eta 0:03:51 lr 0.000347 wd 0.0500 time 0.4739 (0.4772) data time 0.0011 (0.0040) model time 0.4728 (0.4741) loss 2.8502 (2.6713) grad_norm 1.8271 (2.2871) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][150/625] eta 0:03:46 lr 0.000347 wd 0.0500 time 0.4741 (0.4768) data time 0.0008 (0.0038) model time 0.4734 (0.4736) loss 2.5298 (2.6572) grad_norm 1.2812 (2.2689) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][160/625] eta 0:03:41 lr 0.000347 wd 0.0500 time 0.4678 (0.4764) data time 0.0008 (0.0037) model time 0.4671 (0.4733) loss 3.4072 (2.6575) grad_norm 3.0742 (2.2446) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][170/625] eta 0:03:36 lr 0.000347 wd 0.0500 time 0.4826 (0.4762) data time 0.0010 
(0.0035) model time 0.4816 (0.4732) loss 3.2277 (2.6667) grad_norm 1.8793 (2.2245) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][180/625] eta 0:03:31 lr 0.000347 wd 0.0500 time 0.4817 (0.4761) data time 0.0011 (0.0034) model time 0.4806 (0.4732) loss 2.5552 (2.6628) grad_norm 2.8175 (2.2513) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][190/625] eta 0:03:27 lr 0.000346 wd 0.0500 time 0.4711 (0.4759) data time 0.0007 (0.0033) model time 0.4704 (0.4730) loss 2.9099 (2.6592) grad_norm 1.9081 (2.2490) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][200/625] eta 0:03:22 lr 0.000346 wd 0.0500 time 0.4676 (0.4755) data time 0.0007 (0.0031) model time 0.4668 (0.4727) loss 3.5533 (2.6652) grad_norm 1.6589 (2.2539) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][210/625] eta 0:03:17 lr 0.000346 wd 0.0500 time 0.4680 (0.4754) data time 0.0011 (0.0030) model time 0.4669 (0.4726) loss 2.9308 (2.6856) grad_norm 1.9284 (2.2459) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][220/625] eta 0:03:12 lr 0.000346 wd 0.0500 time 0.4632 (0.4751) data time 0.0011 (0.0030) model time 0.4621 (0.4723) loss 3.2381 (2.6894) grad_norm 1.9009 (2.2667) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][230/625] eta 0:03:07 lr 0.000346 wd 0.0500 time 0.4748 (0.4748) data time 0.0010 (0.0029) model time 0.4739 (0.4720) loss 2.6035 (2.6915) grad_norm 1.8636 (2.2556) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][240/625] 
eta 0:03:02 lr 0.000346 wd 0.0500 time 0.4654 (0.4745) data time 0.0007 (0.0028) model time 0.4647 (0.4718) loss 2.9197 (2.6942) grad_norm 2.0609 (2.2518) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][250/625] eta 0:02:57 lr 0.000346 wd 0.0500 time 0.4689 (0.4743) data time 0.0008 (0.0027) model time 0.4681 (0.4716) loss 3.0220 (2.6960) grad_norm 1.3266 (2.2435) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][260/625] eta 0:02:53 lr 0.000346 wd 0.0500 time 0.4760 (0.4740) data time 0.0008 (0.0027) model time 0.4752 (0.4714) loss 3.0816 (2.6929) grad_norm 2.5039 (2.2412) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][270/625] eta 0:02:48 lr 0.000346 wd 0.0500 time 0.4681 (0.4744) data time 0.0012 (0.0026) model time 0.4669 (0.4718) loss 1.7034 (2.6876) grad_norm 1.5213 (2.2474) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][280/625] eta 0:02:43 lr 0.000346 wd 0.0500 time 0.4675 (0.4742) data time 0.0011 (0.0026) model time 0.4665 (0.4716) loss 3.3636 (2.6856) grad_norm 1.7207 (2.2367) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][290/625] eta 0:02:38 lr 0.000345 wd 0.0500 time 0.4697 (0.4740) data time 0.0010 (0.0025) model time 0.4687 (0.4715) loss 2.4430 (2.6886) grad_norm 1.8324 (2.2373) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][300/625] eta 0:02:34 lr 0.000345 wd 0.0500 time 0.4719 (0.4739) data time 0.0009 (0.0025) model time 0.4709 (0.4714) loss 2.8782 (2.6948) grad_norm 1.8267 (2.2274) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:28 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][310/625] eta 0:02:29 lr 0.000345 wd 0.0500 time 0.4731 (0.4738) data time 0.0008 (0.0024) model time 0.4723 (0.4714) loss 2.2875 (2.6922) grad_norm 2.0123 (2.2238) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][320/625] eta 0:02:24 lr 0.000345 wd 0.0500 time 0.4843 (0.4738) data time 0.0008 (0.0024) model time 0.4835 (0.4715) loss 3.4865 (2.7034) grad_norm 2.2856 (2.2345) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][330/625] eta 0:02:19 lr 0.000345 wd 0.0500 time 0.4718 (0.4738) data time 0.0007 (0.0023) model time 0.4711 (0.4715) loss 2.7811 (2.7064) grad_norm 1.8384 (2.2273) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][340/625] eta 0:02:15 lr 0.000345 wd 0.0500 time 0.4653 (0.4738) data time 0.0007 (0.0023) model time 0.4646 (0.4715) loss 1.7962 (2.7030) grad_norm 1.9090 (2.2149) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][350/625] eta 0:02:10 lr 0.000345 wd 0.0500 time 0.4746 (0.4737) data time 0.0008 (0.0022) model time 0.4738 (0.4714) loss 2.5263 (2.6979) grad_norm 2.7753 (2.2153) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][360/625] eta 0:02:05 lr 0.000345 wd 0.0500 time 0.4686 (0.4735) data time 0.0008 (0.0022) model time 0.4678 (0.4713) loss 2.6574 (2.6975) grad_norm 1.6329 (2.2176) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][370/625] eta 0:02:00 lr 0.000345 wd 0.0500 time 0.4676 (0.4734) data time 0.0008 (0.0022) model time 0.4669 (0.4712) loss 3.1463 (2.6987) grad_norm 2.4491 
(2.2194) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][380/625] eta 0:01:55 lr 0.000345 wd 0.0500 time 0.4703 (0.4734) data time 0.0008 (0.0022) model time 0.4695 (0.4712) loss 2.4922 (2.7033) grad_norm 2.0739 (2.2270) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][390/625] eta 0:01:51 lr 0.000345 wd 0.0500 time 0.4780 (0.4734) data time 0.0008 (0.0021) model time 0.4772 (0.4712) loss 2.1861 (2.7009) grad_norm 1.5085 (2.2193) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][400/625] eta 0:01:46 lr 0.000344 wd 0.0500 time 0.4703 (0.4734) data time 0.0008 (0.0021) model time 0.4695 (0.4712) loss 3.0847 (2.7058) grad_norm 2.3340 (2.2688) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][410/625] eta 0:01:41 lr 0.000344 wd 0.0500 time 0.4738 (0.4734) data time 0.0011 (0.0021) model time 0.4727 (0.4713) loss 2.8278 (2.7011) grad_norm 2.0802 (2.2628) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][420/625] eta 0:01:37 lr 0.000344 wd 0.0500 time 0.4705 (0.4734) data time 0.0008 (0.0020) model time 0.4697 (0.4713) loss 3.0134 (2.7018) grad_norm 1.7146 (2.2561) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][430/625] eta 0:01:32 lr 0.000344 wd 0.0500 time 0.4713 (0.4733) data time 0.0010 (0.0020) model time 0.4702 (0.4712) loss 3.1026 (2.7008) grad_norm 1.8969 (2.2530) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][440/625] eta 0:01:27 lr 0.000344 wd 0.0500 time 0.4803 (0.4732) data time 0.0007 
(0.0020) model time 0.4796 (0.4711) loss 3.0390 (2.7020) grad_norm 4.5844 (2.2809) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][450/625] eta 0:01:22 lr 0.000344 wd 0.0500 time 0.4743 (0.4735) data time 0.0009 (0.0020) model time 0.4734 (0.4716) loss 2.7247 (2.7063) grad_norm 4.9948 (2.3018) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][460/625] eta 0:01:18 lr 0.000344 wd 0.0500 time 0.4875 (0.4735) data time 0.0007 (0.0020) model time 0.4868 (0.4716) loss 2.6074 (2.7077) grad_norm 1.9638 (2.3262) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][470/625] eta 0:01:13 lr 0.000344 wd 0.0500 time 0.4693 (0.4735) data time 0.0011 (0.0019) model time 0.4682 (0.4716) loss 2.7554 (2.7117) grad_norm 1.9942 (2.3323) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][480/625] eta 0:01:08 lr 0.000344 wd 0.0500 time 0.4687 (0.4735) data time 0.0012 (0.0019) model time 0.4675 (0.4716) loss 2.2389 (2.7123) grad_norm 3.4521 (2.3306) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][490/625] eta 0:01:03 lr 0.000344 wd 0.0500 time 0.4703 (0.4738) data time 0.0010 (0.0019) model time 0.4694 (0.4719) loss 2.0384 (2.7103) grad_norm 1.8014 (2.3223) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][500/625] eta 0:00:59 lr 0.000343 wd 0.0500 time 0.4774 (0.4737) data time 0.0010 (0.0019) model time 0.4764 (0.4718) loss 3.3252 (2.7117) grad_norm 2.0539 (2.3090) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][510/625] 
eta 0:00:54 lr 0.000343 wd 0.0500 time 0.4702 (0.4736) data time 0.0009 (0.0019) model time 0.4693 (0.4718) loss 3.2125 (2.7163) grad_norm 1.3362 (2.3025) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][520/625] eta 0:00:49 lr 0.000343 wd 0.0500 time 0.4667 (0.4736) data time 0.0007 (0.0019) model time 0.4660 (0.4718) loss 3.0640 (2.7191) grad_norm 2.4328 (2.3183) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][530/625] eta 0:00:44 lr 0.000343 wd 0.0500 time 0.4675 (0.4736) data time 0.0008 (0.0018) model time 0.4667 (0.4717) loss 2.9392 (2.7137) grad_norm 1.8631 (2.3254) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][540/625] eta 0:00:40 lr 0.000343 wd 0.0500 time 0.4714 (0.4735) data time 0.0008 (0.0018) model time 0.4706 (0.4717) loss 3.1195 (2.7140) grad_norm 2.4492 (2.3289) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][550/625] eta 0:00:35 lr 0.000343 wd 0.0500 time 0.4655 (0.4734) data time 0.0011 (0.0018) model time 0.4644 (0.4716) loss 2.3840 (2.7108) grad_norm 2.8543 (2.3431) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][560/625] eta 0:00:30 lr 0.000343 wd 0.0500 time 0.4635 (0.4734) data time 0.0008 (0.0018) model time 0.4627 (0.4716) loss 3.0877 (2.7153) grad_norm 2.1108 (2.3449) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][570/625] eta 0:00:26 lr 0.000343 wd 0.0500 time 0.4704 (0.4733) data time 0.0008 (0.0018) model time 0.4696 (0.4715) loss 2.8600 (2.7159) grad_norm 4.2189 (2.3950) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:36 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][580/625] eta 0:00:21 lr 0.000343 wd 0.0500 time 0.4657 (0.4732) data time 0.0010 (0.0018) model time 0.4648 (0.4714) loss 2.7637 (2.7167) grad_norm 1.5550 (2.3951) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][590/625] eta 0:00:16 lr 0.000343 wd 0.0500 time 0.4715 (0.4732) data time 0.0007 (0.0018) model time 0.4708 (0.4714) loss 3.1392 (2.7152) grad_norm 6.1344 (2.4048) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][600/625] eta 0:00:11 lr 0.000343 wd 0.0500 time 0.4726 (0.4731) data time 0.0008 (0.0017) model time 0.4719 (0.4713) loss 2.5551 (2.7135) grad_norm 2.5636 (2.4053) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:14:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][610/625] eta 0:00:07 lr 0.000342 wd 0.0500 time 0.4676 (0.4731) data time 0.0005 (0.0017) model time 0.4671 (0.4713) loss 2.5706 (2.7128) grad_norm 1.6006 (inf) loss_scale 128.0000 (255.5810) mem 16707MB [2024-08-10 20:14:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [200/300][620/625] eta 0:00:02 lr 0.000342 wd 0.0500 time 0.4678 (0.4730) data time 0.0007 (0.0017) model time 0.4671 (0.4713) loss 2.7868 (2.7150) grad_norm 1.6037 (inf) loss_scale 128.0000 (253.5266) mem 16707MB [2024-08-10 20:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 200 training takes 0:04:55 [2024-08-10 20:14:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:14:58 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.5220 (0.5220) Acc@1 88.818 (88.818) Acc@5 98.535 (98.535) Mem 16707MB [2024-08-10 20:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.8262 (0.6430) Acc@1 81.006 (86.257) Acc@5 96.484 (97.732) Mem 16707MB [2024-08-10 20:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.8823 (0.7539) Acc@1 78.906 (83.457) Acc@5 95.410 (96.573) Mem 16707MB [2024-08-10 20:15:02 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.229 Acc@5 96.521 [2024-08-10 20:15:02 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 20:15:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.887 (0.887) Loss 0.4707 (0.4707) Acc@1 89.697 (89.697) Acc@5 98.877 (98.877) Mem 16707MB [2024-08-10 20:15:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.198) Loss 0.7432 (0.5839) Acc@1 81.982 (87.309) Acc@5 96.973 (97.994) Mem 16707MB [2024-08-10 20:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 0.8379 (0.6864) Acc@1 80.225 (84.621) Acc@5 95.947 (97.068) Mem 16707MB [2024-08-10 20:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.353 Acc@5 97.051 [2024-08-10 20:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:15:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][0/625] eta 0:13:20 lr 0.000342 wd 0.0500 time 1.2802 (1.2802) data time 0.7567 (0.7567) model time 0.0000 (0.0000) loss 3.1318 (3.1318) grad_norm 2.2167 (2.2167) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][10/625] eta 0:05:32 lr 0.000342 wd 0.0500 time 0.4694 (0.5412) data 
time 0.0008 (0.0698) model time 0.0000 (0.0000) loss 1.4824 (2.6632) grad_norm 1.9533 (2.5413) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][20/625] eta 0:05:12 lr 0.000342 wd 0.0500 time 0.4711 (0.5170) data time 0.0008 (0.0371) model time 0.0000 (0.0000) loss 3.4871 (2.6405) grad_norm 1.9108 (2.3789) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][30/625] eta 0:04:58 lr 0.000342 wd 0.0500 time 0.4762 (0.5021) data time 0.0010 (0.0255) model time 0.0000 (0.0000) loss 3.0208 (2.7103) grad_norm 1.4807 (2.2540) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][40/625] eta 0:04:49 lr 0.000342 wd 0.0500 time 0.4689 (0.4947) data time 0.0008 (0.0196) model time 0.0000 (0.0000) loss 3.3453 (2.7722) grad_norm 5.1759 (2.3216) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][50/625] eta 0:04:41 lr 0.000342 wd 0.0500 time 0.4740 (0.4903) data time 0.0008 (0.0159) model time 0.0000 (0.0000) loss 2.5492 (2.7556) grad_norm 1.7637 (2.3168) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][60/625] eta 0:04:35 lr 0.000342 wd 0.0500 time 0.4750 (0.4878) data time 0.0008 (0.0135) model time 0.4742 (0.4735) loss 2.8743 (2.7495) grad_norm 1.8746 (2.3579) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][70/625] eta 0:04:29 lr 0.000342 wd 0.0500 time 0.4700 (0.4854) data time 0.0010 (0.0118) model time 0.4690 (0.4718) loss 3.1468 (2.7239) grad_norm 1.6852 (2.3125) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[201/300][80/625] eta 0:04:23 lr 0.000342 wd 0.0500 time 0.4696 (0.4835) data time 0.0010 (0.0104) model time 0.4686 (0.4707) loss 2.8402 (2.7259) grad_norm 1.6606 (2.2497) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][90/625] eta 0:04:17 lr 0.000341 wd 0.0500 time 0.4749 (0.4818) data time 0.0010 (0.0094) model time 0.4739 (0.4699) loss 3.3439 (2.7112) grad_norm 1.6649 (2.2369) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][100/625] eta 0:04:12 lr 0.000341 wd 0.0500 time 0.4749 (0.4807) data time 0.0008 (0.0086) model time 0.4741 (0.4699) loss 3.4261 (2.7105) grad_norm 2.7865 (2.2309) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:15:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][110/625] eta 0:04:07 lr 0.000341 wd 0.0500 time 0.4665 (0.4798) data time 0.0008 (0.0079) model time 0.4657 (0.4698) loss 1.7643 (2.6968) grad_norm 2.4413 (2.3402) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][120/625] eta 0:04:02 lr 0.000341 wd 0.0500 time 0.4718 (0.4793) data time 0.0008 (0.0073) model time 0.4710 (0.4703) loss 2.2578 (2.7024) grad_norm 1.6482 (2.3306) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][130/625] eta 0:03:57 lr 0.000341 wd 0.0500 time 0.4693 (0.4789) data time 0.0009 (0.0069) model time 0.4684 (0.4706) loss 3.3630 (2.7053) grad_norm 1.7609 (2.3152) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][140/625] eta 0:03:52 lr 0.000341 wd 0.0500 time 0.4675 (0.4784) data time 0.0011 (0.0064) model time 0.4664 (0.4706) loss 2.2984 (2.6994) grad_norm 2.4512 (2.2841) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:16:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][150/625] eta 0:03:47 lr 0.000341 wd 0.0500 time 0.4696 (0.4780) data time 0.0010 (0.0061) model time 0.4687 (0.4707) loss 3.2388 (2.7050) grad_norm 1.5140 (2.2766) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][160/625] eta 0:03:42 lr 0.000341 wd 0.0500 time 0.4627 (0.4775) data time 0.0008 (0.0058) model time 0.4619 (0.4705) loss 2.0392 (2.6986) grad_norm 1.3283 (2.3272) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][170/625] eta 0:03:37 lr 0.000341 wd 0.0500 time 0.4679 (0.4770) data time 0.0011 (0.0055) model time 0.4668 (0.4703) loss 2.4816 (2.6977) grad_norm 2.1238 (2.2982) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][180/625] eta 0:03:32 lr 0.000341 wd 0.0500 time 0.4713 (0.4767) data time 0.0011 (0.0053) model time 0.4702 (0.4703) loss 2.6090 (2.7024) grad_norm 1.7562 (2.2587) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][190/625] eta 0:03:27 lr 0.000340 wd 0.0500 time 0.4732 (0.4765) data time 0.0008 (0.0050) model time 0.4724 (0.4704) loss 2.9535 (2.7180) grad_norm 2.8547 (2.2929) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][200/625] eta 0:03:22 lr 0.000340 wd 0.0500 time 0.4738 (0.4762) data time 0.0008 (0.0048) model time 0.4730 (0.4703) loss 2.7185 (2.7195) grad_norm 1.8797 (2.3246) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][210/625] eta 0:03:17 lr 0.000340 wd 0.0500 time 0.4692 (0.4770) data time 0.0008 (0.0047) model time 0.4685 (0.4716) loss 2.8332 
(2.7187) grad_norm 4.6792 (2.3918) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][220/625] eta 0:03:13 lr 0.000340 wd 0.0500 time 0.4797 (0.4766) data time 0.0008 (0.0045) model time 0.4789 (0.4714) loss 2.7989 (2.7156) grad_norm 2.8594 (2.4604) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:16:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][230/625] eta 0:03:08 lr 0.000340 wd 0.0500 time 0.4692 (0.4763) data time 0.0010 (0.0044) model time 0.4683 (0.4713) loss 2.7983 (2.7194) grad_norm 2.1721 (2.4615) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][240/625] eta 0:03:03 lr 0.000340 wd 0.0500 time 0.4656 (0.4760) data time 0.0011 (0.0042) model time 0.4645 (0.4711) loss 2.1025 (2.7200) grad_norm 2.2781 (2.5011) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][250/625] eta 0:02:58 lr 0.000340 wd 0.0500 time 0.4674 (0.4757) data time 0.0011 (0.0041) model time 0.4663 (0.4709) loss 1.9320 (2.7192) grad_norm 1.4379 (2.4704) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][260/625] eta 0:02:53 lr 0.000340 wd 0.0500 time 0.4709 (0.4755) data time 0.0011 (0.0040) model time 0.4698 (0.4709) loss 3.1082 (2.7206) grad_norm 1.4292 (2.4464) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][270/625] eta 0:02:48 lr 0.000340 wd 0.0500 time 0.4693 (0.4754) data time 0.0008 (0.0039) model time 0.4686 (0.4709) loss 3.6098 (2.7244) grad_norm 2.4952 (2.4308) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][280/625] eta 0:02:43 lr 0.000340 wd 0.0500 time 0.4668 
(0.4752) data time 0.0009 (0.0038) model time 0.4658 (0.4708) loss 2.4725 (2.7241) grad_norm 2.8842 (2.4230) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][290/625] eta 0:02:39 lr 0.000340 wd 0.0500 time 0.4712 (0.4751) data time 0.0012 (0.0037) model time 0.4700 (0.4708) loss 2.5411 (2.7174) grad_norm 1.6153 (2.4053) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][300/625] eta 0:02:34 lr 0.000339 wd 0.0500 time 0.4756 (0.4748) data time 0.0008 (0.0036) model time 0.4749 (0.4706) loss 3.4195 (2.7229) grad_norm 2.2244 (2.3989) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][310/625] eta 0:02:29 lr 0.000339 wd 0.0500 time 0.4678 (0.4746) data time 0.0010 (0.0035) model time 0.4668 (0.4705) loss 2.2212 (2.7201) grad_norm 1.9914 (2.3965) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][320/625] eta 0:02:24 lr 0.000339 wd 0.0500 time 0.4688 (0.4744) data time 0.0008 (0.0034) model time 0.4680 (0.4704) loss 3.0645 (2.7264) grad_norm 3.1488 (2.3975) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][330/625] eta 0:02:19 lr 0.000339 wd 0.0500 time 0.4738 (0.4743) data time 0.0010 (0.0034) model time 0.4727 (0.4703) loss 2.9925 (2.7298) grad_norm 2.4197 (2.4136) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][340/625] eta 0:02:15 lr 0.000339 wd 0.0500 time 0.4690 (0.4741) data time 0.0010 (0.0033) model time 0.4681 (0.4702) loss 2.8200 (2.7250) grad_norm 4.3514 (2.4128) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [201/300][350/625] eta 0:02:10 lr 0.000339 wd 0.0500 time 0.6953 (0.4747) data time 0.0008 (0.0032) model time 0.6945 (0.4710) loss 1.7943 (2.7202) grad_norm 3.6875 (2.4271) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:17:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][360/625] eta 0:02:05 lr 0.000339 wd 0.0500 time 0.4709 (0.4745) data time 0.0008 (0.0032) model time 0.4701 (0.4708) loss 2.5665 (2.7205) grad_norm 2.7643 (2.4221) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][370/625] eta 0:02:00 lr 0.000339 wd 0.0500 time 0.4665 (0.4744) data time 0.0008 (0.0031) model time 0.4657 (0.4708) loss 2.7960 (2.7207) grad_norm 1.4990 (2.4147) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][380/625] eta 0:01:56 lr 0.000339 wd 0.0500 time 0.4688 (0.4742) data time 0.0008 (0.0031) model time 0.4679 (0.4707) loss 3.5661 (2.7284) grad_norm 1.3831 (2.3966) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][390/625] eta 0:01:51 lr 0.000339 wd 0.0500 time 0.4728 (0.4741) data time 0.0010 (0.0030) model time 0.4718 (0.4706) loss 2.5766 (2.7264) grad_norm 2.5654 (2.3881) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][400/625] eta 0:01:46 lr 0.000338 wd 0.0500 time 0.4743 (0.4739) data time 0.0011 (0.0030) model time 0.4732 (0.4705) loss 2.8458 (2.7252) grad_norm 2.0130 (2.3908) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][410/625] eta 0:01:41 lr 0.000338 wd 0.0500 time 0.4694 (0.4738) data time 0.0010 (0.0029) model time 0.4683 (0.4704) loss 2.5089 (2.7262) grad_norm 8.3453 (2.4125) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:18:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][420/625] eta 0:01:37 lr 0.000338 wd 0.0500 time 0.4663 (0.4738) data time 0.0010 (0.0029) model time 0.4653 (0.4704) loss 2.1535 (2.7280) grad_norm 1.3197 (2.4061) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][430/625] eta 0:01:32 lr 0.000338 wd 0.0500 time 0.4742 (0.4746) data time 0.0008 (0.0028) model time 0.4734 (0.4714) loss 3.2707 (2.7239) grad_norm 1.7627 (2.3982) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][440/625] eta 0:01:27 lr 0.000338 wd 0.0500 time 0.4721 (0.4745) data time 0.0013 (0.0028) model time 0.4708 (0.4714) loss 2.8804 (2.7303) grad_norm 2.0122 (2.3914) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][450/625] eta 0:01:23 lr 0.000338 wd 0.0500 time 0.4684 (0.4744) data time 0.0008 (0.0028) model time 0.4676 (0.4713) loss 2.5748 (2.7346) grad_norm 1.7254 (2.3893) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][460/625] eta 0:01:18 lr 0.000338 wd 0.0500 time 0.4635 (0.4742) data time 0.0007 (0.0027) model time 0.4628 (0.4711) loss 3.7613 (2.7368) grad_norm 2.9254 (2.3858) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][470/625] eta 0:01:13 lr 0.000338 wd 0.0500 time 0.4641 (0.4741) data time 0.0010 (0.0027) model time 0.4631 (0.4711) loss 2.8931 (2.7408) grad_norm 1.8102 (2.3866) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][480/625] eta 0:01:08 lr 0.000338 wd 0.0500 time 0.4690 (0.4740) data time 0.0008 (0.0026) model time 0.4682 (0.4710) loss 2.3940 
(2.7406) grad_norm 2.5080 (2.3815) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:18:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][490/625] eta 0:01:03 lr 0.000338 wd 0.0500 time 0.4712 (0.4739) data time 0.0009 (0.0026) model time 0.4703 (0.4710) loss 1.7062 (2.7361) grad_norm 2.1760 (2.4746) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][500/625] eta 0:00:59 lr 0.000338 wd 0.0500 time 0.4727 (0.4739) data time 0.0010 (0.0026) model time 0.4717 (0.4709) loss 2.6780 (2.7361) grad_norm 1.9345 (2.4705) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][510/625] eta 0:00:54 lr 0.000337 wd 0.0500 time 0.4685 (0.4738) data time 0.0010 (0.0026) model time 0.4675 (0.4709) loss 2.5418 (2.7372) grad_norm 1.6085 (2.4570) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][520/625] eta 0:00:49 lr 0.000337 wd 0.0500 time 0.4747 (0.4737) data time 0.0009 (0.0025) model time 0.4739 (0.4708) loss 2.9595 (2.7346) grad_norm 2.0820 (2.4505) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][530/625] eta 0:00:44 lr 0.000337 wd 0.0500 time 0.4724 (0.4736) data time 0.0008 (0.0025) model time 0.4716 (0.4708) loss 2.5750 (2.7370) grad_norm 2.2688 (2.4437) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][540/625] eta 0:00:40 lr 0.000337 wd 0.0500 time 0.4601 (0.4735) data time 0.0007 (0.0025) model time 0.4594 (0.4707) loss 3.0788 (2.7381) grad_norm 1.8755 (2.4368) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][550/625] eta 0:00:35 lr 0.000337 wd 0.0500 time 0.4711 
(0.4735) data time 0.0011 (0.0024) model time 0.4700 (0.4707) loss 3.1931 (2.7426) grad_norm 1.4717 (2.4348) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][560/625] eta 0:00:30 lr 0.000337 wd 0.0500 time 0.4745 (0.4735) data time 0.0008 (0.0024) model time 0.4737 (0.4707) loss 2.4275 (2.7418) grad_norm 2.1392 (2.4261) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][570/625] eta 0:00:26 lr 0.000337 wd 0.0500 time 0.4707 (0.4735) data time 0.0011 (0.0024) model time 0.4696 (0.4707) loss 2.5534 (2.7426) grad_norm 1.7432 (2.4159) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][580/625] eta 0:00:21 lr 0.000337 wd 0.0500 time 0.4723 (0.4742) data time 0.0010 (0.0024) model time 0.4713 (0.4715) loss 2.4246 (2.7375) grad_norm 2.1120 (2.4147) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][590/625] eta 0:00:16 lr 0.000337 wd 0.0500 time 0.4663 (0.4741) data time 0.0008 (0.0024) model time 0.4655 (0.4715) loss 3.4817 (2.7373) grad_norm 3.1719 (2.4220) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][600/625] eta 0:00:11 lr 0.000337 wd 0.0500 time 0.4672 (0.4740) data time 0.0010 (0.0023) model time 0.4662 (0.4714) loss 3.3934 (2.7371) grad_norm 1.8291 (2.4128) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [201/300][610/625] eta 0:00:07 lr 0.000336 wd 0.0500 time 0.4675 (0.4739) data time 0.0008 (0.0023) model time 0.4667 (0.4713) loss 2.8356 (2.7373) grad_norm 2.6023 (2.4048) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [201/300][620/625] eta 0:00:02 lr 0.000336 wd 0.0500 time 0.4682 (0.4738) data time 0.0007 (0.0023) model time 0.4675 (0.4712) loss 3.2346 (2.7413) grad_norm 2.1011 (2.3966) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 201 training takes 0:04:56 [2024-08-10 20:20:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:20:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 20:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.5088 (0.5088) Acc@1 88.721 (88.721) Acc@5 98.682 (98.682) Mem 16707MB [2024-08-10 20:20:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.8354 (0.6374) Acc@1 80.127 (86.088) Acc@5 95.850 (97.687) Mem 16707MB [2024-08-10 20:20:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.142) Loss 0.8965 (0.7440) Acc@1 78.662 (83.361) Acc@5 95.312 (96.645) Mem 16707MB [2024-08-10 20:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.159 Acc@5 96.631 [2024-08-10 20:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 20:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.851 (0.851) Loss 0.4712 (0.4712) Acc@1 89.844 (89.844) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:20:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.195) Loss 0.7441 (0.5841) Acc@1 82.129 (87.362) Acc@5 96.973 (97.976) Mem 16707MB [2024-08-10 20:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.158) Loss 0.8359 (0.6863) Acc@1 80.225 (84.638) Acc@5 95.898 (97.042) Mem 16707MB [2024-08-10 20:20:10 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.361 Acc@5 97.033 [2024-08-10 20:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.36% [2024-08-10 20:20:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 20:20:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 20:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][0/625] eta 0:08:38 lr 0.000336 wd 0.0500 time 0.8293 (0.8293) data time 0.4127 (0.4127) model time 0.0000 (0.0000) loss 3.0168 (3.0168) grad_norm 2.1393 (2.1393) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][10/625] eta 0:05:09 lr 0.000336 wd 0.0500 time 0.4729 (0.5031) data time 0.0008 (0.0386) model time 0.0000 (0.0000) loss 3.4212 (2.8799) grad_norm 1.9873 (1.8515) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][20/625] eta 0:04:55 lr 0.000336 wd 0.0500 time 0.4730 (0.4882) data time 0.0008 (0.0207) model time 0.0000 (0.0000) loss 3.0157 (2.8406) grad_norm 2.4007 (1.9857) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][30/625] eta 0:04:47 lr 0.000336 wd 0.0500 time 0.4702 (0.4824) data time 0.0009 (0.0144) model time 0.0000 (0.0000) loss 2.8808 (2.7772) grad_norm 1.9047 (2.4292) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][40/625] eta 0:04:40 lr 0.000336 wd 0.0500 time 0.4720 (0.4791) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 3.1075 (2.8066) 
grad_norm 1.3907 (2.2804) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][50/625] eta 0:04:34 lr 0.000336 wd 0.0500 time 0.4689 (0.4774) data time 0.0011 (0.0092) model time 0.0000 (0.0000) loss 2.8975 (2.8078) grad_norm 2.1192 (2.2170) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][60/625] eta 0:04:29 lr 0.000336 wd 0.0500 time 0.4731 (0.4766) data time 0.0008 (0.0079) model time 0.4723 (0.4715) loss 3.0090 (2.7864) grad_norm 5.5456 (2.2282) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][70/625] eta 0:04:24 lr 0.000336 wd 0.0500 time 0.4765 (0.4763) data time 0.0011 (0.0069) model time 0.4754 (0.4724) loss 2.9888 (2.7807) grad_norm 3.0443 (2.2989) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][80/625] eta 0:04:19 lr 0.000336 wd 0.0500 time 0.4709 (0.4759) data time 0.0010 (0.0062) model time 0.4699 (0.4723) loss 3.3430 (2.7737) grad_norm 2.6170 (2.7019) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:20:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][90/625] eta 0:04:14 lr 0.000335 wd 0.0500 time 0.4768 (0.4759) data time 0.0008 (0.0056) model time 0.4760 (0.4729) loss 2.4303 (2.7662) grad_norm 1.9756 (2.6309) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][100/625] eta 0:04:09 lr 0.000335 wd 0.0500 time 0.4715 (0.4755) data time 0.0007 (0.0052) model time 0.4707 (0.4725) loss 2.5205 (2.7808) grad_norm 2.6520 (2.5885) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][110/625] eta 0:04:04 lr 0.000335 wd 0.0500 time 0.4653 (0.4749) data 
time 0.0007 (0.0048) model time 0.4646 (0.4717) loss 2.6020 (2.7697) grad_norm 1.9085 (2.6289) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][120/625] eta 0:03:59 lr 0.000335 wd 0.0500 time 0.4717 (0.4744) data time 0.0008 (0.0045) model time 0.4709 (0.4712) loss 1.9304 (2.7431) grad_norm 1.9371 (2.6732) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][130/625] eta 0:03:56 lr 0.000335 wd 0.0500 time 0.4731 (0.4776) data time 0.0011 (0.0042) model time 0.4721 (0.4767) loss 2.3605 (2.7235) grad_norm 2.0438 (2.6115) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][140/625] eta 0:03:51 lr 0.000335 wd 0.0500 time 0.4830 (0.4773) data time 0.0010 (0.0040) model time 0.4819 (0.4762) loss 2.9026 (2.7260) grad_norm 1.6687 (2.5549) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][150/625] eta 0:03:46 lr 0.000335 wd 0.0500 time 0.4754 (0.4771) data time 0.0008 (0.0038) model time 0.4747 (0.4758) loss 3.0937 (2.7323) grad_norm 2.5116 (2.5301) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][160/625] eta 0:03:41 lr 0.000335 wd 0.0500 time 0.4754 (0.4767) data time 0.0010 (0.0036) model time 0.4744 (0.4753) loss 2.5211 (2.7349) grad_norm 2.1711 (2.5433) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][170/625] eta 0:03:36 lr 0.000335 wd 0.0500 time 0.4707 (0.4763) data time 0.0010 (0.0035) model time 0.4698 (0.4748) loss 2.4915 (2.7366) grad_norm 1.6409 (2.5228) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[202/300][180/625] eta 0:03:31 lr 0.000335 wd 0.0500 time 0.4721 (0.4760) data time 0.0008 (0.0034) model time 0.4713 (0.4744) loss 2.5796 (2.7348) grad_norm 2.5865 (2.5090) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][190/625] eta 0:03:26 lr 0.000335 wd 0.0500 time 0.4661 (0.4756) data time 0.0010 (0.0032) model time 0.4651 (0.4739) loss 2.9265 (2.7223) grad_norm 1.9486 (2.4738) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][200/625] eta 0:03:22 lr 0.000334 wd 0.0500 time 0.4728 (0.4754) data time 0.0010 (0.0031) model time 0.4718 (0.4736) loss 2.2411 (2.7140) grad_norm 2.1556 (2.5029) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][210/625] eta 0:03:17 lr 0.000334 wd 0.0500 time 0.4668 (0.4759) data time 0.0009 (0.0030) model time 0.4660 (0.4744) loss 3.2117 (2.7132) grad_norm 2.3906 (2.5233) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:21:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][220/625] eta 0:03:12 lr 0.000334 wd 0.0500 time 0.4705 (0.4758) data time 0.0010 (0.0029) model time 0.4695 (0.4742) loss 2.5769 (2.7088) grad_norm 1.9281 (2.5105) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][230/625] eta 0:03:07 lr 0.000334 wd 0.0500 time 0.4711 (0.4756) data time 0.0010 (0.0029) model time 0.4701 (0.4740) loss 2.9542 (2.7226) grad_norm 2.2284 (2.5170) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][240/625] eta 0:03:03 lr 0.000334 wd 0.0500 time 0.4666 (0.4753) data time 0.0010 (0.0028) model time 0.4656 (0.4737) loss 2.8261 (2.7276) grad_norm 2.0554 (2.4979) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][250/625] eta 0:02:58 lr 0.000334 wd 0.0500 time 0.4687 (0.4751) data time 0.0007 (0.0027) model time 0.4680 (0.4735) loss 2.3325 (2.7355) grad_norm 1.9684 (2.5921) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][260/625] eta 0:02:53 lr 0.000334 wd 0.0500 time 0.4786 (0.4749) data time 0.0008 (0.0026) model time 0.4778 (0.4732) loss 3.0810 (2.7371) grad_norm 1.6169 (2.6017) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][270/625] eta 0:02:48 lr 0.000334 wd 0.0500 time 0.4675 (0.4746) data time 0.0011 (0.0026) model time 0.4664 (0.4729) loss 2.8917 (2.7396) grad_norm 1.9331 (2.5882) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][280/625] eta 0:02:43 lr 0.000334 wd 0.0500 time 0.4718 (0.4746) data time 0.0008 (0.0025) model time 0.4711 (0.4729) loss 1.8254 (2.7345) grad_norm 1.9521 (2.5891) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][290/625] eta 0:02:38 lr 0.000334 wd 0.0500 time 0.4726 (0.4746) data time 0.0011 (0.0025) model time 0.4715 (0.4730) loss 2.8996 (2.7230) grad_norm 1.2994 (2.5568) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][300/625] eta 0:02:34 lr 0.000333 wd 0.0500 time 0.4723 (0.4746) data time 0.0008 (0.0024) model time 0.4715 (0.4730) loss 2.8340 (2.7155) grad_norm 1.3475 (2.5270) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][310/625] eta 0:02:29 lr 0.000333 wd 0.0500 time 0.4693 (0.4746) data time 0.0008 (0.0024) model time 0.4684 (0.4730) loss 3.3399 
(2.7160) grad_norm 1.4742 (2.5022) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][320/625] eta 0:02:24 lr 0.000333 wd 0.0500 time 0.4724 (0.4745) data time 0.0007 (0.0023) model time 0.4717 (0.4729) loss 2.4067 (2.7121) grad_norm 1.7607 (2.4797) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][330/625] eta 0:02:19 lr 0.000333 wd 0.0500 time 0.4678 (0.4743) data time 0.0010 (0.0023) model time 0.4669 (0.4727) loss 2.2879 (2.7170) grad_norm 2.1039 (2.4732) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][340/625] eta 0:02:15 lr 0.000333 wd 0.0500 time 0.4698 (0.4742) data time 0.0010 (0.0023) model time 0.4688 (0.4726) loss 2.8512 (2.7256) grad_norm 2.2204 (2.4640) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][350/625] eta 0:02:10 lr 0.000333 wd 0.0500 time 0.4696 (0.4741) data time 0.0010 (0.0022) model time 0.4686 (0.4725) loss 2.7471 (2.7241) grad_norm 1.6621 (2.5218) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][360/625] eta 0:02:05 lr 0.000333 wd 0.0500 time 0.4691 (0.4745) data time 0.0009 (0.0022) model time 0.4682 (0.4730) loss 2.9083 (2.7213) grad_norm 1.4410 (2.5432) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][370/625] eta 0:02:00 lr 0.000333 wd 0.0500 time 0.4673 (0.4745) data time 0.0008 (0.0022) model time 0.4666 (0.4730) loss 2.8630 (2.7258) grad_norm 3.2923 (2.5325) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][380/625] eta 0:01:56 lr 0.000333 wd 0.0500 time 0.4710 
(0.4744) data time 0.0008 (0.0021) model time 0.4703 (0.4729) loss 2.8167 (2.7228) grad_norm 1.2865 (2.5375) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][390/625] eta 0:01:51 lr 0.000333 wd 0.0500 time 0.4695 (0.4743) data time 0.0008 (0.0021) model time 0.4687 (0.4728) loss 1.7917 (2.7161) grad_norm 4.2025 (2.5581) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][400/625] eta 0:01:46 lr 0.000333 wd 0.0500 time 0.4663 (0.4742) data time 0.0008 (0.0021) model time 0.4654 (0.4727) loss 3.3047 (2.7207) grad_norm 1.8120 (2.5401) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][410/625] eta 0:01:41 lr 0.000332 wd 0.0500 time 0.4720 (0.4741) data time 0.0010 (0.0021) model time 0.4710 (0.4726) loss 2.9457 (2.7205) grad_norm 2.1683 (2.5206) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][420/625] eta 0:01:37 lr 0.000332 wd 0.0500 time 0.4716 (0.4740) data time 0.0008 (0.0020) model time 0.4708 (0.4725) loss 2.1329 (2.7214) grad_norm 1.9587 (2.5180) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][430/625] eta 0:01:32 lr 0.000332 wd 0.0500 time 0.4663 (0.4740) data time 0.0011 (0.0020) model time 0.4652 (0.4724) loss 3.2389 (2.7229) grad_norm 1.8970 (2.5119) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][440/625] eta 0:01:27 lr 0.000332 wd 0.0500 time 0.4705 (0.4740) data time 0.0008 (0.0020) model time 0.4697 (0.4725) loss 2.7458 (2.7265) grad_norm 1.8382 (2.4962) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [202/300][450/625] eta 0:01:22 lr 0.000332 wd 0.0500 time 0.4715 (0.4739) data time 0.0008 (0.0020) model time 0.4708 (0.4724) loss 2.4907 (2.7249) grad_norm 2.5417 (2.4856) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][460/625] eta 0:01:18 lr 0.000332 wd 0.0500 time 0.4661 (0.4749) data time 0.0010 (0.0020) model time 0.4652 (0.4735) loss 3.2784 (2.7249) grad_norm 2.1192 (2.4763) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][470/625] eta 0:01:13 lr 0.000332 wd 0.0500 time 0.4650 (0.4748) data time 0.0010 (0.0019) model time 0.4640 (0.4734) loss 2.7417 (2.7286) grad_norm 3.7646 (2.4884) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][480/625] eta 0:01:08 lr 0.000332 wd 0.0500 time 0.4653 (0.4747) data time 0.0010 (0.0019) model time 0.4643 (0.4733) loss 3.3425 (2.7279) grad_norm 1.8122 (2.4888) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][490/625] eta 0:01:04 lr 0.000332 wd 0.0500 time 0.4678 (0.4746) data time 0.0008 (0.0019) model time 0.4671 (0.4732) loss 3.2750 (2.7263) grad_norm 3.4759 (2.4989) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][500/625] eta 0:00:59 lr 0.000332 wd 0.0500 time 0.4679 (0.4744) data time 0.0008 (0.0019) model time 0.4671 (0.4731) loss 2.4894 (2.7246) grad_norm 2.1194 (2.5060) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][510/625] eta 0:00:54 lr 0.000331 wd 0.0500 time 0.4792 (0.4744) data time 0.0008 (0.0019) model time 0.4784 (0.4730) loss 2.8086 (2.7214) grad_norm 2.0258 (2.4980) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][520/625] eta 0:00:49 lr 0.000331 wd 0.0500 time 0.4686 (0.4743) data time 0.0008 (0.0018) model time 0.4677 (0.4729) loss 1.6928 (2.7237) grad_norm 1.5287 (2.4887) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][530/625] eta 0:00:45 lr 0.000331 wd 0.0500 time 0.4689 (0.4743) data time 0.0008 (0.0018) model time 0.4681 (0.4729) loss 3.1159 (2.7226) grad_norm 3.8983 (2.4835) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][540/625] eta 0:00:40 lr 0.000331 wd 0.0500 time 0.4719 (0.4742) data time 0.0008 (0.0018) model time 0.4711 (0.4728) loss 2.8880 (2.7221) grad_norm 4.8981 (2.4908) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][550/625] eta 0:00:35 lr 0.000331 wd 0.0500 time 0.4665 (0.4741) data time 0.0008 (0.0018) model time 0.4657 (0.4727) loss 2.9170 (2.7249) grad_norm 1.9067 (2.4872) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][560/625] eta 0:00:30 lr 0.000331 wd 0.0500 time 0.4707 (0.4740) data time 0.0010 (0.0018) model time 0.4697 (0.4726) loss 2.4211 (2.7197) grad_norm 1.8003 (2.4773) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][570/625] eta 0:00:26 lr 0.000331 wd 0.0500 time 0.4759 (0.4739) data time 0.0008 (0.0018) model time 0.4751 (0.4725) loss 3.0364 (2.7174) grad_norm 1.4909 (2.4646) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][580/625] eta 0:00:21 lr 0.000331 wd 0.0500 time 0.4721 (0.4739) data time 0.0010 (0.0018) model time 0.4711 (0.4725) loss 2.7738 
(2.7187) grad_norm 1.6037 (2.4642) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][590/625] eta 0:00:16 lr 0.000331 wd 0.0500 time 0.4747 (0.4739) data time 0.0009 (0.0017) model time 0.4738 (0.4725) loss 1.7358 (2.7154) grad_norm 2.3557 (2.4639) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][600/625] eta 0:00:11 lr 0.000331 wd 0.0500 time 0.4725 (0.4738) data time 0.0011 (0.0017) model time 0.4714 (0.4724) loss 2.0283 (2.7154) grad_norm 1.4885 (2.4566) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][610/625] eta 0:00:07 lr 0.000331 wd 0.0500 time 0.4637 (0.4737) data time 0.0007 (0.0017) model time 0.4630 (0.4724) loss 2.7936 (2.7124) grad_norm 5.2570 (2.4528) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [202/300][620/625] eta 0:00:02 lr 0.000330 wd 0.0500 time 0.4663 (0.4736) data time 0.0005 (0.0017) model time 0.4658 (0.4722) loss 3.4374 (2.7126) grad_norm 2.0007 (2.4458) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 202 training takes 0:04:55 [2024-08-10 20:25:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:25:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:25:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.5234 (0.5234) Acc@1 88.721 (88.721) Acc@5 98.389 (98.389) Mem 16707MB [2024-08-10 20:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.8193 (0.6129) Acc@1 80.518 (86.523) Acc@5 96.045 (97.745) Mem 16707MB [2024-08-10 20:25:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.9326 (0.7323) Acc@1 77.344 (83.629) Acc@5 95.117 (96.622) Mem 16707MB [2024-08-10 20:25:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.413 Acc@5 96.593 [2024-08-10 20:25:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 20:25:14 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.41% [2024-08-10 20:25:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 20:25:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 20:25:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.4724 (0.4724) Acc@1 89.746 (89.746) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.160) Loss 0.7432 (0.5837) Acc@1 81.982 (87.358) Acc@5 96.973 (98.025) Mem 16707MB [2024-08-10 20:25:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.8345 (0.6860) Acc@1 80.469 (84.668) Acc@5 95.947 (97.070) Mem 16707MB [2024-08-10 20:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.389 Acc@5 97.063 [2024-08-10 20:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.39% [2024-08-10 20:25:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 20:25:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 20:25:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][0/625] eta 0:08:31 lr 0.000330 wd 0.0500 time 0.8185 (0.8185) data time 0.3998 (0.3998) model time 0.0000 (0.0000) loss 3.3248 (3.3248) grad_norm 11.6367 (11.6367) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][10/625] eta 0:05:08 lr 0.000330 wd 0.0500 time 0.4714 (0.5011) data time 0.0008 (0.0373) model time 0.0000 (0.0000) loss 1.6429 (2.6676) grad_norm 2.3023 (2.8656) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][20/625] eta 0:04:54 lr 0.000330 wd 0.0500 time 0.4657 (0.4873) data time 0.0010 (0.0200) model time 0.0000 (0.0000) loss 2.7078 (2.8215) grad_norm 1.6373 (2.5043) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][30/625] eta 0:04:47 lr 0.000330 wd 0.0500 time 0.4758 (0.4829) data time 0.0010 (0.0139) model time 0.0000 (0.0000) loss 2.7347 (2.7832) grad_norm 1.6078 (2.2986) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][40/625] eta 0:04:44 lr 0.000330 wd 0.0500 time 0.4641 (0.4857) data time 0.0010 (0.0108) model time 0.0000 (0.0000) loss 2.8445 (2.7746) grad_norm 2.2457 (2.2198) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][50/625] eta 0:04:40 lr 0.000330 wd 0.0500 time 0.4678 (0.4871) data time 0.0010 (0.0089) model time 0.0000 (0.0000) loss 2.3633 (2.7592) grad_norm 1.9143 (2.2952) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][60/625] eta 0:04:33 lr 0.000330 wd 0.0500 time 0.4733 (0.4842) data time 0.0008 (0.0076) model time 0.4725 (0.4683) loss 3.1426 (2.7649) 
grad_norm 1.8287 (2.5103) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:25:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][70/625] eta 0:04:27 lr 0.000330 wd 0.0500 time 0.4669 (0.4819) data time 0.0010 (0.0067) model time 0.4659 (0.4677) loss 2.7936 (2.7683) grad_norm 2.1763 (2.4890) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][80/625] eta 0:04:21 lr 0.000330 wd 0.0500 time 0.4715 (0.4805) data time 0.0010 (0.0060) model time 0.4705 (0.4682) loss 2.8754 (2.7551) grad_norm 6.5961 (2.4821) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][90/625] eta 0:04:16 lr 0.000330 wd 0.0500 time 0.4731 (0.4797) data time 0.0010 (0.0054) model time 0.4721 (0.4692) loss 3.0607 (2.7676) grad_norm 1.7980 (2.4270) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][100/625] eta 0:04:11 lr 0.000329 wd 0.0500 time 0.4680 (0.4788) data time 0.0010 (0.0050) model time 0.4670 (0.4694) loss 3.6657 (2.7855) grad_norm 1.9471 (2.3795) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][110/625] eta 0:04:06 lr 0.000329 wd 0.0500 time 0.4655 (0.4779) data time 0.0008 (0.0046) model time 0.4647 (0.4691) loss 2.2957 (2.7689) grad_norm 2.2002 (2.3278) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][120/625] eta 0:04:01 lr 0.000329 wd 0.0500 time 0.4687 (0.4772) data time 0.0008 (0.0043) model time 0.4679 (0.4691) loss 1.9025 (2.7630) grad_norm 1.6448 (2.3107) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][130/625] eta 0:03:55 lr 0.000329 wd 0.0500 time 0.4614 (0.4765) data 
time 0.0008 (0.0041) model time 0.4606 (0.4687) loss 2.2774 (2.7621) grad_norm 2.2921 (2.2903) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][140/625] eta 0:03:50 lr 0.000329 wd 0.0500 time 0.4723 (0.4759) data time 0.0010 (0.0039) model time 0.4713 (0.4686) loss 2.5072 (2.7641) grad_norm 1.4362 (2.2430) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][150/625] eta 0:03:45 lr 0.000329 wd 0.0500 time 0.4714 (0.4754) data time 0.0010 (0.0037) model time 0.4704 (0.4685) loss 2.7113 (2.7757) grad_norm 1.8086 (2.2097) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][160/625] eta 0:03:40 lr 0.000329 wd 0.0500 time 0.4693 (0.4752) data time 0.0010 (0.0035) model time 0.4683 (0.4686) loss 2.9539 (2.7748) grad_norm 1.4232 (2.2121) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][170/625] eta 0:03:36 lr 0.000329 wd 0.0500 time 0.4770 (0.4750) data time 0.0008 (0.0034) model time 0.4762 (0.4689) loss 2.9687 (2.7779) grad_norm 2.4880 (2.2321) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][180/625] eta 0:03:31 lr 0.000329 wd 0.0500 time 0.4682 (0.4748) data time 0.0008 (0.0033) model time 0.4675 (0.4689) loss 2.6626 (2.7804) grad_norm 2.1136 (2.2076) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][190/625] eta 0:03:26 lr 0.000329 wd 0.0500 time 0.4768 (0.4745) data time 0.0011 (0.0031) model time 0.4757 (0.4689) loss 2.8642 (2.7737) grad_norm 1.4714 (2.1770) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[203/300][200/625] eta 0:03:21 lr 0.000329 wd 0.0500 time 0.4672 (0.4742) data time 0.0011 (0.0030) model time 0.4660 (0.4688) loss 3.2950 (2.7792) grad_norm 2.3929 (2.1892) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][210/625] eta 0:03:16 lr 0.000328 wd 0.0500 time 0.4708 (0.4740) data time 0.0010 (0.0029) model time 0.4698 (0.4688) loss 2.8353 (2.7773) grad_norm 1.5532 (2.1987) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][220/625] eta 0:03:11 lr 0.000328 wd 0.0500 time 0.4647 (0.4738) data time 0.0011 (0.0029) model time 0.4637 (0.4687) loss 2.0421 (2.7628) grad_norm 3.2614 (2.1872) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][230/625] eta 0:03:07 lr 0.000328 wd 0.0500 time 0.4702 (0.4747) data time 0.0010 (0.0028) model time 0.4692 (0.4701) loss 2.8991 (2.7650) grad_norm 3.6691 (2.1875) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][240/625] eta 0:03:02 lr 0.000328 wd 0.0500 time 0.4676 (0.4753) data time 0.0008 (0.0027) model time 0.4668 (0.4711) loss 2.6759 (2.7514) grad_norm 1.2721 (2.1736) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][250/625] eta 0:02:58 lr 0.000328 wd 0.0500 time 0.4787 (0.4752) data time 0.0008 (0.0026) model time 0.4779 (0.4711) loss 2.1174 (2.7421) grad_norm 1.7761 (2.2027) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][260/625] eta 0:02:53 lr 0.000328 wd 0.0500 time 0.4658 (0.4751) data time 0.0009 (0.0026) model time 0.4649 (0.4711) loss 1.6976 (2.7374) grad_norm 1.8499 (2.2627) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:27:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][270/625] eta 0:02:48 lr 0.000328 wd 0.0500 time 0.4651 (0.4748) data time 0.0007 (0.0025) model time 0.4644 (0.4710) loss 2.2126 (2.7368) grad_norm 4.2155 (2.2930) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][280/625] eta 0:02:43 lr 0.000328 wd 0.0500 time 0.4685 (0.4747) data time 0.0007 (0.0025) model time 0.4678 (0.4709) loss 3.2154 (2.7407) grad_norm 1.8940 (2.3267) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][290/625] eta 0:02:39 lr 0.000328 wd 0.0500 time 0.4676 (0.4751) data time 0.0010 (0.0024) model time 0.4667 (0.4715) loss 2.9088 (2.7426) grad_norm 1.7088 (2.3139) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][300/625] eta 0:02:34 lr 0.000328 wd 0.0500 time 0.4744 (0.4750) data time 0.0011 (0.0024) model time 0.4733 (0.4715) loss 2.5578 (2.7380) grad_norm 2.2540 (2.3045) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][310/625] eta 0:02:29 lr 0.000327 wd 0.0500 time 0.4787 (0.4750) data time 0.0011 (0.0023) model time 0.4775 (0.4715) loss 2.6917 (2.7365) grad_norm 2.0696 (2.3082) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][320/625] eta 0:02:24 lr 0.000327 wd 0.0500 time 0.4662 (0.4748) data time 0.0010 (0.0023) model time 0.4652 (0.4715) loss 2.6914 (2.7322) grad_norm 2.1020 (2.2967) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][330/625] eta 0:02:20 lr 0.000327 wd 0.0500 time 0.4772 (0.4748) data time 0.0009 (0.0023) model time 0.4764 (0.4715) loss 3.1783 
(2.7352) grad_norm 1.1101 (2.2767) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][340/625] eta 0:02:15 lr 0.000327 wd 0.0500 time 0.4660 (0.4746) data time 0.0008 (0.0022) model time 0.4652 (0.4714) loss 2.9129 (2.7442) grad_norm 2.2980 (2.2645) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][350/625] eta 0:02:10 lr 0.000327 wd 0.0500 time 0.4677 (0.4745) data time 0.0010 (0.0022) model time 0.4667 (0.4713) loss 1.9681 (2.7447) grad_norm 1.7413 (2.2511) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][360/625] eta 0:02:05 lr 0.000327 wd 0.0500 time 0.4631 (0.4744) data time 0.0011 (0.0022) model time 0.4620 (0.4712) loss 3.0047 (2.7454) grad_norm 2.5198 (2.2445) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][370/625] eta 0:02:00 lr 0.000327 wd 0.0500 time 0.4719 (0.4743) data time 0.0007 (0.0021) model time 0.4712 (0.4712) loss 3.2806 (2.7445) grad_norm 2.1051 (2.2403) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][380/625] eta 0:01:56 lr 0.000327 wd 0.0500 time 0.4720 (0.4742) data time 0.0009 (0.0021) model time 0.4711 (0.4712) loss 2.1763 (2.7372) grad_norm 2.7185 (2.2783) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][390/625] eta 0:01:51 lr 0.000327 wd 0.0500 time 0.4723 (0.4742) data time 0.0010 (0.0021) model time 0.4713 (0.4712) loss 2.0922 (2.7377) grad_norm 2.3523 (2.2723) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][400/625] eta 0:01:46 lr 0.000327 wd 0.0500 time 0.4695 
(0.4742) data time 0.0008 (0.0021) model time 0.4686 (0.4712) loss 1.8645 (2.7350) grad_norm 1.8632 (2.3133) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][410/625] eta 0:01:41 lr 0.000327 wd 0.0500 time 0.4647 (0.4741) data time 0.0011 (0.0020) model time 0.4637 (0.4712) loss 2.4220 (2.7308) grad_norm 1.7647 (2.3126) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][420/625] eta 0:01:37 lr 0.000326 wd 0.0500 time 0.4725 (0.4740) data time 0.0010 (0.0020) model time 0.4715 (0.4712) loss 2.8662 (2.7385) grad_norm 2.2051 (2.3134) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][430/625] eta 0:01:32 lr 0.000326 wd 0.0500 time 0.4649 (0.4739) data time 0.0010 (0.0020) model time 0.4639 (0.4711) loss 2.4557 (2.7313) grad_norm 2.2273 (2.3074) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][440/625] eta 0:01:27 lr 0.000326 wd 0.0500 time 0.4767 (0.4739) data time 0.0008 (0.0020) model time 0.4758 (0.4711) loss 3.2487 (2.7315) grad_norm 1.5673 (2.2985) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][450/625] eta 0:01:22 lr 0.000326 wd 0.0500 time 0.4647 (0.4742) data time 0.0010 (0.0019) model time 0.4637 (0.4715) loss 3.1804 (2.7328) grad_norm 2.2710 (2.2957) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][460/625] eta 0:01:18 lr 0.000326 wd 0.0500 time 0.4701 (0.4747) data time 0.0010 (0.0019) model time 0.4690 (0.4721) loss 3.2445 (2.7338) grad_norm 1.4981 (2.2961) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [203/300][470/625] eta 0:01:13 lr 0.000326 wd 0.0500 time 0.4691 (0.4747) data time 0.0010 (0.0019) model time 0.4681 (0.4721) loss 2.6330 (2.7405) grad_norm 1.8061 (2.2887) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][480/625] eta 0:01:08 lr 0.000326 wd 0.0500 time 0.4754 (0.4748) data time 0.0011 (0.0019) model time 0.4744 (0.4722) loss 2.9894 (2.7357) grad_norm 1.6945 (2.2930) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][490/625] eta 0:01:04 lr 0.000326 wd 0.0500 time 0.4682 (0.4747) data time 0.0011 (0.0019) model time 0.4671 (0.4722) loss 2.5715 (2.7344) grad_norm 2.2506 (2.3014) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][500/625] eta 0:00:59 lr 0.000326 wd 0.0500 time 0.4629 (0.4745) data time 0.0011 (0.0019) model time 0.4618 (0.4721) loss 2.7667 (2.7355) grad_norm 1.7972 (2.3019) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][510/625] eta 0:00:54 lr 0.000326 wd 0.0500 time 0.4648 (0.4744) data time 0.0009 (0.0018) model time 0.4639 (0.4720) loss 2.3381 (2.7344) grad_norm 1.6976 (2.3019) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][520/625] eta 0:00:49 lr 0.000326 wd 0.0500 time 0.4715 (0.4744) data time 0.0008 (0.0018) model time 0.4707 (0.4719) loss 2.6076 (2.7310) grad_norm 2.3722 (2.2918) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][530/625] eta 0:00:45 lr 0.000325 wd 0.0500 time 0.4666 (0.4743) data time 0.0010 (0.0018) model time 0.4655 (0.4719) loss 2.8302 (2.7331) grad_norm 1.5349 (2.2841) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:29:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][540/625] eta 0:00:40 lr 0.000325 wd 0.0500 time 0.4786 (0.4743) data time 0.0010 (0.0018) model time 0.4776 (0.4719) loss 2.8893 (2.7378) grad_norm 1.8079 (2.2832) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][550/625] eta 0:00:35 lr 0.000325 wd 0.0500 time 0.4660 (0.4742) data time 0.0008 (0.0018) model time 0.4652 (0.4719) loss 2.7462 (2.7386) grad_norm 1.6748 (2.2725) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][560/625] eta 0:00:30 lr 0.000325 wd 0.0500 time 0.4654 (0.4741) data time 0.0011 (0.0018) model time 0.4643 (0.4718) loss 2.8311 (2.7378) grad_norm 1.9354 (2.2774) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][570/625] eta 0:00:26 lr 0.000325 wd 0.0500 time 0.4647 (0.4741) data time 0.0008 (0.0018) model time 0.4638 (0.4717) loss 3.0987 (2.7422) grad_norm 1.5330 (2.2690) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][580/625] eta 0:00:21 lr 0.000325 wd 0.0500 time 0.4663 (0.4740) data time 0.0008 (0.0018) model time 0.4655 (0.4716) loss 3.4325 (2.7454) grad_norm 1.5209 (2.2707) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][590/625] eta 0:00:16 lr 0.000325 wd 0.0500 time 0.4674 (0.4739) data time 0.0011 (0.0017) model time 0.4663 (0.4715) loss 2.8289 (2.7436) grad_norm 1.9332 (2.2718) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][600/625] eta 0:00:11 lr 0.000325 wd 0.0500 time 0.4747 (0.4746) data time 0.0010 (0.0017) model time 0.4737 (0.4723) loss 2.6982 
(2.7437) grad_norm 1.3625 (2.2670) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][610/625] eta 0:00:07 lr 0.000325 wd 0.0500 time 0.4682 (0.4745) data time 0.0007 (0.0017) model time 0.4675 (0.4723) loss 2.8651 (2.7397) grad_norm 1.8226 (2.2632) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [203/300][620/625] eta 0:00:02 lr 0.000325 wd 0.0500 time 0.4686 (0.4743) data time 0.0005 (0.0017) model time 0.4680 (0.4721) loss 2.1832 (2.7406) grad_norm 1.8923 (2.2572) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:19 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 203 training takes 0:04:56 [2024-08-10 20:30:19 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:30:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5098 (0.5098) Acc@1 88.623 (88.623) Acc@5 98.730 (98.730) Mem 16707MB [2024-08-10 20:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.161) Loss 0.8164 (0.6264) Acc@1 80.664 (86.230) Acc@5 96.484 (97.714) Mem 16707MB [2024-08-10 20:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9136 (0.7368) Acc@1 77.588 (83.412) Acc@5 95.068 (96.652) Mem 16707MB [2024-08-10 20:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.093 Acc@5 96.609 [2024-08-10 20:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.1% [2024-08-10 20:30:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.847 (0.847) Loss 0.4731 (0.4731) Acc@1 89.844 (89.844) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.194) Loss 0.7441 (0.5835) Acc@1 81.787 (87.367) Acc@5 96.973 (98.011) Mem 16707MB [2024-08-10 20:30:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.8345 (0.6859) Acc@1 80.371 (84.649) Acc@5 95.947 (97.063) Mem 16707MB [2024-08-10 20:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.395 Acc@5 97.047 [2024-08-10 20:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:30:28 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.39% [2024-08-10 20:30:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 20:30:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 20:30:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][0/625] eta 0:08:51 lr 0.000325 wd 0.0500 time 0.8512 (0.8512) data time 0.4386 (0.4386) model time 0.0000 (0.0000) loss 2.1896 (2.1896) grad_norm 6.5938 (6.5938) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][10/625] eta 0:05:19 lr 0.000324 wd 0.0500 time 0.4707 (0.5190) data time 0.0010 (0.0410) model time 0.0000 (0.0000) loss 3.2967 (2.8630) grad_norm 1.9308 (2.4455) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][20/625] eta 0:04:59 lr 0.000324 wd 0.0500 time 0.4672 (0.4946) data time 0.0008 (0.0220) model time 0.0000 (0.0000) loss 3.3439 (2.7768) grad_norm 1.5536 (2.3850) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][30/625] eta 0:04:49 lr 0.000324 wd 0.0500 time 0.4772 (0.4863) data time 0.0010 (0.0152) model time 0.0000 (0.0000) loss 2.5011 (2.8000) grad_norm 2.1325 (2.8630) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][40/625] eta 0:04:42 lr 0.000324 wd 0.0500 time 0.4723 (0.4826) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 1.9254 (2.7321) grad_norm 2.9000 (2.7821) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][50/625] eta 0:04:36 lr 0.000324 wd 0.0500 time 0.4712 (0.4806) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 2.7879 (2.7420) grad_norm 1.5854 (2.6908) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][60/625] eta 0:04:30 lr 0.000324 wd 0.0500 time 0.4693 (0.4789) data time 0.0010 (0.0083) model time 0.4682 (0.4687) loss 2.9314 (2.7571) 
grad_norm 2.4467 (2.6955) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:31:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][70/625] eta 0:04:25 lr 0.000324 wd 0.0500 time 0.4684 (0.4776) data time 0.0010 (0.0073) model time 0.4674 (0.4687) loss 3.0643 (2.7718) grad_norm 1.8344 (2.6745) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][80/625] eta 0:04:19 lr 0.000324 wd 0.0500 time 0.4686 (0.4766) data time 0.0008 (0.0065) model time 0.4678 (0.4687) loss 2.3634 (2.7770) grad_norm 1.8485 (2.6755) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:31:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][90/625] eta 0:04:14 lr 0.000324 wd 0.0500 time 0.4691 (0.4757) data time 0.0010 (0.0059) model time 0.4681 (0.4683) loss 3.1894 (2.7960) grad_norm 1.2566 (2.6186) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:31:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][100/625] eta 0:04:09 lr 0.000324 wd 0.0500 time 0.4710 (0.4752) data time 0.0009 (0.0054) model time 0.4700 (0.4686) loss 2.5707 (2.8058) grad_norm 1.9034 (2.7105) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:31:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][110/625] eta 0:04:04 lr 0.000323 wd 0.0500 time 0.4751 (0.4748) data time 0.0010 (0.0050) model time 0.4741 (0.4688) loss 3.3720 (2.8012) grad_norm 2.1771 (2.6936) loss_scale 256.0000 (130.3063) mem 16707MB [2024-08-10 20:31:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][120/625] eta 0:03:59 lr 0.000323 wd 0.0500 time 0.4750 (0.4746) data time 0.0008 (0.0047) model time 0.4741 (0.4691) loss 3.1947 (2.8069) grad_norm 1.6928 (2.6695) loss_scale 256.0000 (140.6942) mem 16707MB [2024-08-10 20:31:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][130/625] eta 0:03:54 lr 0.000323 wd 0.0500 time 0.4756 (0.4744) data 
time 0.0011 (0.0044) model time 0.4745 (0.4693) loss 3.0765 (2.7758) grad_norm 1.6880 (2.6022) loss_scale 256.0000 (149.4962) mem 16707MB [2024-08-10 20:31:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][140/625] eta 0:03:51 lr 0.000323 wd 0.0500 time 0.4671 (0.4774) data time 0.0011 (0.0042) model time 0.4660 (0.4745) loss 2.1321 (2.7711) grad_norm 1.6050 (2.5340) loss_scale 256.0000 (157.0496) mem 16707MB [2024-08-10 20:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][150/625] eta 0:03:46 lr 0.000323 wd 0.0500 time 0.4696 (0.4769) data time 0.0008 (0.0040) model time 0.4689 (0.4739) loss 2.7724 (2.7543) grad_norm 1.6607 (2.4965) loss_scale 256.0000 (163.6026) mem 16707MB [2024-08-10 20:31:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][160/625] eta 0:03:41 lr 0.000323 wd 0.0500 time 0.4673 (0.4764) data time 0.0007 (0.0038) model time 0.4665 (0.4734) loss 3.2637 (2.7707) grad_norm 1.6380 (2.4588) loss_scale 256.0000 (169.3416) mem 16707MB [2024-08-10 20:31:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][170/625] eta 0:03:36 lr 0.000323 wd 0.0500 time 0.4702 (0.4760) data time 0.0008 (0.0037) model time 0.4694 (0.4730) loss 2.9138 (2.7671) grad_norm 2.3853 (2.4271) loss_scale 256.0000 (174.4094) mem 16707MB [2024-08-10 20:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][180/625] eta 0:03:31 lr 0.000323 wd 0.0500 time 0.4695 (0.4758) data time 0.0011 (0.0035) model time 0.4684 (0.4728) loss 2.0671 (2.7516) grad_norm 1.5515 (2.3875) loss_scale 256.0000 (178.9171) mem 16707MB [2024-08-10 20:32:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][190/625] eta 0:03:26 lr 0.000323 wd 0.0500 time 0.4749 (0.4756) data time 0.0010 (0.0034) model time 0.4739 (0.4726) loss 2.9942 (2.7546) grad_norm 1.5982 (2.4415) loss_scale 256.0000 (182.9529) mem 16707MB [2024-08-10 20:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[204/300][200/625] eta 0:03:22 lr 0.000323 wd 0.0500 time 0.4702 (0.4753) data time 0.0011 (0.0033) model time 0.4691 (0.4724) loss 2.8866 (2.7565) grad_norm 2.2105 (2.4354) loss_scale 256.0000 (186.5871) mem 16707MB [2024-08-10 20:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][210/625] eta 0:03:17 lr 0.000323 wd 0.0500 time 0.4701 (0.4765) data time 0.0010 (0.0032) model time 0.4691 (0.4741) loss 2.8973 (2.7609) grad_norm 1.5718 (2.4243) loss_scale 256.0000 (189.8768) mem 16707MB [2024-08-10 20:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][220/625] eta 0:03:12 lr 0.000322 wd 0.0500 time 0.4718 (0.4763) data time 0.0008 (0.0031) model time 0.4710 (0.4738) loss 3.2526 (2.7623) grad_norm 3.5739 (2.4261) loss_scale 256.0000 (192.8688) mem 16707MB [2024-08-10 20:32:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][230/625] eta 0:03:08 lr 0.000322 wd 0.0500 time 0.4081 (0.4767) data time 0.0011 (0.0030) model time 0.4070 (0.4744) loss 1.8542 (2.7530) grad_norm 2.3077 (2.4096) loss_scale 256.0000 (195.6017) mem 16707MB [2024-08-10 20:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][240/625] eta 0:03:03 lr 0.000322 wd 0.0500 time 0.4704 (0.4763) data time 0.0010 (0.0029) model time 0.4694 (0.4741) loss 2.8108 (2.7510) grad_norm 1.2859 (2.3768) loss_scale 256.0000 (198.1079) mem 16707MB [2024-08-10 20:32:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][250/625] eta 0:02:58 lr 0.000322 wd 0.0500 time 0.4701 (0.4761) data time 0.0008 (0.0028) model time 0.4692 (0.4738) loss 2.4106 (2.7334) grad_norm 2.0624 (2.3613) loss_scale 256.0000 (200.4143) mem 16707MB [2024-08-10 20:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][260/625] eta 0:02:53 lr 0.000322 wd 0.0500 time 0.4685 (0.4759) data time 0.0011 (0.0028) model time 0.4674 (0.4737) loss 3.2345 (2.7326) grad_norm 4.3552 (2.3501) loss_scale 256.0000 (202.5441) mem 16707MB 
[2024-08-10 20:32:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][270/625] eta 0:02:48 lr 0.000322 wd 0.0500 time 0.4741 (0.4758) data time 0.0009 (0.0027) model time 0.4731 (0.4735) loss 2.7833 (2.7338) grad_norm 2.7031 (2.3503) loss_scale 256.0000 (204.5166) mem 16707MB [2024-08-10 20:32:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][280/625] eta 0:02:44 lr 0.000322 wd 0.0500 time 0.4710 (0.4756) data time 0.0010 (0.0027) model time 0.4700 (0.4734) loss 2.9683 (2.7272) grad_norm 1.6813 (2.3497) loss_scale 256.0000 (206.3488) mem 16707MB [2024-08-10 20:32:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][290/625] eta 0:02:39 lr 0.000322 wd 0.0500 time 0.4697 (0.4754) data time 0.0011 (0.0026) model time 0.4686 (0.4732) loss 1.7072 (2.7279) grad_norm 1.7479 (2.3804) loss_scale 256.0000 (208.0550) mem 16707MB [2024-08-10 20:32:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][300/625] eta 0:02:34 lr 0.000322 wd 0.0500 time 0.4654 (0.4752) data time 0.0008 (0.0026) model time 0.4645 (0.4730) loss 3.0827 (2.7232) grad_norm 2.6281 (2.3615) loss_scale 256.0000 (209.6478) mem 16707MB [2024-08-10 20:32:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][310/625] eta 0:02:29 lr 0.000322 wd 0.0500 time 0.4622 (0.4749) data time 0.0008 (0.0025) model time 0.4614 (0.4726) loss 3.0182 (2.7284) grad_norm 1.7355 (2.3617) loss_scale 256.0000 (211.1383) mem 16707MB [2024-08-10 20:33:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][320/625] eta 0:02:24 lr 0.000322 wd 0.0500 time 0.4658 (0.4746) data time 0.0009 (0.0025) model time 0.4648 (0.4724) loss 2.6988 (2.7310) grad_norm 2.5964 (2.3480) loss_scale 256.0000 (212.5358) mem 16707MB [2024-08-10 20:33:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][330/625] eta 0:02:19 lr 0.000321 wd 0.0500 time 0.4703 (0.4744) data time 0.0008 (0.0024) model time 0.4695 (0.4722) loss 2.2009 
(2.7404) grad_norm 1.5973 (2.3354) loss_scale 256.0000 (213.8489) mem 16707MB [2024-08-10 20:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][340/625] eta 0:02:15 lr 0.000321 wd 0.0500 time 0.4727 (0.4744) data time 0.0008 (0.0024) model time 0.4719 (0.4722) loss 2.9672 (2.7321) grad_norm 2.7310 (2.3468) loss_scale 256.0000 (215.0850) mem 16707MB [2024-08-10 20:33:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][350/625] eta 0:02:10 lr 0.000321 wd 0.0500 time 0.6456 (0.4754) data time 0.0010 (0.0024) model time 0.6445 (0.4734) loss 2.4701 (2.7293) grad_norm 2.8667 (2.3458) loss_scale 256.0000 (216.2507) mem 16707MB [2024-08-10 20:33:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][360/625] eta 0:02:05 lr 0.000321 wd 0.0500 time 0.4671 (0.4752) data time 0.0008 (0.0023) model time 0.4663 (0.4732) loss 2.0377 (2.7327) grad_norm 1.4490 (2.3326) loss_scale 256.0000 (217.3518) mem 16707MB [2024-08-10 20:33:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][370/625] eta 0:02:01 lr 0.000321 wd 0.0500 time 0.4759 (0.4751) data time 0.0008 (0.0023) model time 0.4750 (0.4731) loss 2.3037 (2.7291) grad_norm 2.9484 (2.3212) loss_scale 256.0000 (218.3935) mem 16707MB [2024-08-10 20:33:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][380/625] eta 0:01:56 lr 0.000321 wd 0.0500 time 0.4658 (0.4753) data time 0.0007 (0.0023) model time 0.4651 (0.4734) loss 3.0548 (2.7333) grad_norm 2.3061 (2.3148) loss_scale 256.0000 (219.3806) mem 16707MB [2024-08-10 20:33:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][390/625] eta 0:01:51 lr 0.000321 wd 0.0500 time 0.4719 (0.4752) data time 0.0010 (0.0022) model time 0.4709 (0.4733) loss 2.8686 (2.7329) grad_norm 1.8049 (2.3037) loss_scale 256.0000 (220.3171) mem 16707MB [2024-08-10 20:33:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][400/625] eta 0:01:46 lr 0.000321 wd 0.0500 time 0.4764 
(0.4751) data time 0.0011 (0.0022) model time 0.4753 (0.4732) loss 2.9574 (2.7301) grad_norm 1.5299 (2.3081) loss_scale 256.0000 (221.2070) mem 16707MB [2024-08-10 20:33:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][410/625] eta 0:01:42 lr 0.000321 wd 0.0500 time 0.4771 (0.4751) data time 0.0011 (0.0022) model time 0.4760 (0.4732) loss 2.6181 (2.7303) grad_norm 6.4673 (2.3144) loss_scale 256.0000 (222.0535) mem 16707MB [2024-08-10 20:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][420/625] eta 0:01:37 lr 0.000321 wd 0.0500 time 0.4676 (0.4750) data time 0.0008 (0.0021) model time 0.4668 (0.4731) loss 2.9376 (2.7291) grad_norm 1.7995 (2.4294) loss_scale 256.0000 (222.8599) mem 16707MB [2024-08-10 20:33:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][430/625] eta 0:01:32 lr 0.000320 wd 0.0500 time 0.4666 (0.4749) data time 0.0011 (0.0021) model time 0.4655 (0.4730) loss 2.4679 (2.7289) grad_norm 1.8206 (2.4195) loss_scale 256.0000 (223.6288) mem 16707MB [2024-08-10 20:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][440/625] eta 0:01:27 lr 0.000320 wd 0.0500 time 0.4729 (0.4747) data time 0.0010 (0.0021) model time 0.4719 (0.4728) loss 2.2847 (2.7305) grad_norm 2.9377 (2.4137) loss_scale 256.0000 (224.3628) mem 16707MB [2024-08-10 20:34:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][450/625] eta 0:01:23 lr 0.000320 wd 0.0500 time 0.4629 (0.4745) data time 0.0008 (0.0021) model time 0.4621 (0.4726) loss 2.0840 (2.7280) grad_norm 1.8573 (2.4117) loss_scale 256.0000 (225.0643) mem 16707MB [2024-08-10 20:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][460/625] eta 0:01:18 lr 0.000320 wd 0.0500 time 0.4652 (0.4744) data time 0.0013 (0.0021) model time 0.4639 (0.4725) loss 3.0351 (2.7290) grad_norm 1.5516 (2.4066) loss_scale 256.0000 (225.7354) mem 16707MB [2024-08-10 20:34:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [204/300][470/625] eta 0:01:13 lr 0.000320 wd 0.0500 time 0.4783 (0.4748) data time 0.0009 (0.0020) model time 0.4774 (0.4730) loss 1.8544 (2.7251) grad_norm 1.9568 (2.3956) loss_scale 256.0000 (226.3779) mem 16707MB [2024-08-10 20:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][480/625] eta 0:01:08 lr 0.000320 wd 0.0500 time 0.4696 (0.4749) data time 0.0010 (0.0020) model time 0.4686 (0.4731) loss 2.9553 (2.7259) grad_norm 1.5173 (2.3853) loss_scale 256.0000 (226.9938) mem 16707MB [2024-08-10 20:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][490/625] eta 0:01:04 lr 0.000320 wd 0.0500 time 0.4849 (0.4750) data time 0.0011 (0.0020) model time 0.4838 (0.4732) loss 2.5016 (2.7278) grad_norm 3.4431 (2.3950) loss_scale 256.0000 (227.5845) mem 16707MB [2024-08-10 20:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][500/625] eta 0:00:59 lr 0.000320 wd 0.0500 time 0.4692 (0.4750) data time 0.0012 (0.0020) model time 0.4680 (0.4732) loss 1.9579 (2.7289) grad_norm 1.9621 (2.4000) loss_scale 256.0000 (228.1517) mem 16707MB [2024-08-10 20:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][510/625] eta 0:00:54 lr 0.000320 wd 0.0500 time 0.4670 (0.4749) data time 0.0008 (0.0020) model time 0.4662 (0.4731) loss 2.7533 (2.7318) grad_norm 1.5099 (2.3972) loss_scale 256.0000 (228.6967) mem 16707MB [2024-08-10 20:34:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][520/625] eta 0:00:49 lr 0.000320 wd 0.0500 time 0.4699 (0.4747) data time 0.0008 (0.0019) model time 0.4691 (0.4730) loss 3.1826 (2.7329) grad_norm 1.6637 (2.3912) loss_scale 256.0000 (229.2207) mem 16707MB [2024-08-10 20:34:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][530/625] eta 0:00:45 lr 0.000320 wd 0.0500 time 0.4689 (0.4746) data time 0.0008 (0.0019) model time 0.4680 (0.4729) loss 2.8805 (2.7269) grad_norm 1.5688 (2.3818) loss_scale 256.0000 (229.7250) mem 16707MB 
[2024-08-10 20:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][540/625] eta 0:00:40 lr 0.000319 wd 0.0500 time 0.4749 (0.4745) data time 0.0011 (0.0019) model time 0.4738 (0.4728) loss 3.2544 (2.7287) grad_norm 2.3537 (2.4048) loss_scale 256.0000 (230.2107) mem 16707MB [2024-08-10 20:34:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][550/625] eta 0:00:35 lr 0.000319 wd 0.0500 time 0.4745 (0.4745) data time 0.0008 (0.0019) model time 0.4737 (0.4727) loss 3.3783 (2.7258) grad_norm 8.8598 (2.4098) loss_scale 256.0000 (230.6788) mem 16707MB [2024-08-10 20:34:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][560/625] eta 0:00:30 lr 0.000319 wd 0.0500 time 0.4704 (0.4745) data time 0.0008 (0.0019) model time 0.4696 (0.4727) loss 2.6022 (2.7261) grad_norm 4.2606 (2.4200) loss_scale 256.0000 (231.1301) mem 16707MB [2024-08-10 20:35:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][570/625] eta 0:00:26 lr 0.000319 wd 0.0500 time 0.4692 (0.4744) data time 0.0010 (0.0019) model time 0.4682 (0.4727) loss 2.1509 (2.7246) grad_norm 2.1472 (2.4145) loss_scale 256.0000 (231.5657) mem 16707MB [2024-08-10 20:35:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][580/625] eta 0:00:21 lr 0.000319 wd 0.0500 time 0.4748 (0.4744) data time 0.0009 (0.0019) model time 0.4740 (0.4726) loss 3.1943 (2.7240) grad_norm 2.2289 (2.4398) loss_scale 256.0000 (231.9862) mem 16707MB [2024-08-10 20:35:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][590/625] eta 0:00:16 lr 0.000319 wd 0.0500 time 0.4686 (0.4743) data time 0.0009 (0.0019) model time 0.4678 (0.4726) loss 2.9247 (2.7271) grad_norm 1.7029 (2.4349) loss_scale 256.0000 (232.3926) mem 16707MB [2024-08-10 20:35:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][600/625] eta 0:00:11 lr 0.000319 wd 0.0500 time 0.4707 (0.4742) data time 0.0008 (0.0018) model time 0.4699 (0.4725) loss 2.9932 
(2.7306) grad_norm 2.3958 (2.4290) loss_scale 256.0000 (232.7854) mem 16707MB [2024-08-10 20:35:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][610/625] eta 0:00:07 lr 0.000319 wd 0.0500 time 0.4645 (0.4741) data time 0.0005 (0.0018) model time 0.4640 (0.4724) loss 2.6546 (2.7313) grad_norm 1.8389 (2.4196) loss_scale 256.0000 (233.1653) mem 16707MB [2024-08-10 20:35:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [204/300][620/625] eta 0:00:02 lr 0.000319 wd 0.0500 time 0.4643 (0.4740) data time 0.0007 (0.0018) model time 0.4635 (0.4723) loss 2.8074 (2.7351) grad_norm 1.0968 (2.4121) loss_scale 256.0000 (233.5330) mem 16707MB [2024-08-10 20:35:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 204 training takes 0:04:56 [2024-08-10 20:35:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:35:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:35:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5269 (0.5269) Acc@1 88.623 (88.623) Acc@5 98.535 (98.535) Mem 16707MB [2024-08-10 20:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.8228 (0.6282) Acc@1 79.932 (86.479) Acc@5 95.703 (97.563) Mem 16707MB [2024-08-10 20:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9287 (0.7429) Acc@1 78.564 (83.575) Acc@5 95.117 (96.456) Mem 16707MB [2024-08-10 20:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.277 Acc@5 96.439 [2024-08-10 20:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 20:35:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.059 (1.059) Loss 0.4746 (0.4746) Acc@1 89.746 (89.746) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.210) Loss 0.7471 (0.5839) Acc@1 81.934 (87.331) Acc@5 96.826 (97.998) Mem 16707MB [2024-08-10 20:35:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.166) Loss 0.8340 (0.6862) Acc@1 80.273 (84.617) Acc@5 96.045 (97.059) Mem 16707MB [2024-08-10 20:35:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.379 Acc@5 97.035 [2024-08-10 20:35:35 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:35:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][0/625] eta 0:14:01 lr 0.000319 wd 0.0500 time 1.3469 (1.3469) data time 0.7130 (0.7130) model time 0.0000 (0.0000) loss 2.8723 (2.8723) grad_norm 1.6539 (1.6539) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:35:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][10/625] eta 0:05:38 lr 0.000319 wd 0.0500 time 0.4742 (0.5508) data 
time 0.0008 (0.0657) model time 0.0000 (0.0000) loss 3.1388 (2.7839) grad_norm 1.4289 (1.6078) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:35:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][20/625] eta 0:05:16 lr 0.000318 wd 0.0500 time 0.6907 (0.5234) data time 0.0007 (0.0349) model time 0.0000 (0.0000) loss 3.2804 (2.7534) grad_norm 2.3733 (1.9326) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:35:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][30/625] eta 0:05:07 lr 0.000318 wd 0.0500 time 0.4656 (0.5165) data time 0.0008 (0.0240) model time 0.0000 (0.0000) loss 2.4738 (2.7353) grad_norm 1.8318 (2.1972) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:35:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][40/625] eta 0:04:55 lr 0.000318 wd 0.0500 time 0.4680 (0.5050) data time 0.0009 (0.0184) model time 0.0000 (0.0000) loss 2.2833 (2.7290) grad_norm 1.8020 (2.5528) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][50/625] eta 0:04:46 lr 0.000318 wd 0.0500 time 0.4749 (0.4986) data time 0.0008 (0.0150) model time 0.0000 (0.0000) loss 3.5749 (2.7209) grad_norm 2.7132 (2.4597) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][60/625] eta 0:04:41 lr 0.000318 wd 0.0500 time 0.4726 (0.4977) data time 0.0010 (0.0127) model time 0.4716 (0.4922) loss 2.4533 (2.7439) grad_norm 1.7462 (2.3415) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][70/625] eta 0:04:34 lr 0.000318 wd 0.0500 time 0.4757 (0.4939) data time 0.0008 (0.0111) model time 0.4749 (0.4808) loss 2.8041 (2.7472) grad_norm 1.3725 (2.2626) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[205/300][80/625] eta 0:04:27 lr 0.000318 wd 0.0500 time 0.4743 (0.4912) data time 0.0009 (0.0098) model time 0.4734 (0.4775) loss 3.2665 (2.7559) grad_norm 1.9477 (2.2074) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][90/625] eta 0:04:21 lr 0.000318 wd 0.0500 time 0.4791 (0.4890) data time 0.0009 (0.0089) model time 0.4782 (0.4756) loss 3.2667 (2.7233) grad_norm 1.9598 (2.1719) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][100/625] eta 0:04:15 lr 0.000318 wd 0.0500 time 0.4703 (0.4871) data time 0.0009 (0.0081) model time 0.4693 (0.4742) loss 2.9960 (2.7006) grad_norm 1.6974 (2.2092) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][110/625] eta 0:04:09 lr 0.000318 wd 0.0500 time 0.4675 (0.4853) data time 0.0011 (0.0075) model time 0.4664 (0.4730) loss 2.2006 (2.7019) grad_norm 1.2695 (2.1943) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][120/625] eta 0:04:04 lr 0.000318 wd 0.0500 time 0.4676 (0.4839) data time 0.0008 (0.0069) model time 0.4668 (0.4722) loss 3.1966 (2.7075) grad_norm 2.6279 (2.4549) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][130/625] eta 0:03:59 lr 0.000317 wd 0.0500 time 0.4771 (0.4829) data time 0.0011 (0.0065) model time 0.4761 (0.4718) loss 2.8091 (2.7059) grad_norm 1.6814 (2.4236) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][140/625] eta 0:03:53 lr 0.000317 wd 0.0500 time 0.4736 (0.4821) data time 0.0008 (0.0061) model time 0.4728 (0.4717) loss 3.3177 (2.7120) grad_norm 1.7133 (2.4411) loss_scale 256.0000 (256.0000) mem 16707MB 
[2024-08-10 20:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][150/625] eta 0:03:48 lr 0.000317 wd 0.0500 time 0.4725 (0.4814) data time 0.0010 (0.0058) model time 0.4715 (0.4715) loss 2.3777 (2.7158) grad_norm 2.3360 (2.4304) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][160/625] eta 0:03:43 lr 0.000317 wd 0.0500 time 0.4738 (0.4807) data time 0.0010 (0.0055) model time 0.4727 (0.4713) loss 2.2762 (2.7123) grad_norm 2.0335 (2.3922) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][170/625] eta 0:03:38 lr 0.000317 wd 0.0500 time 0.4705 (0.4800) data time 0.0011 (0.0052) model time 0.4694 (0.4710) loss 2.6562 (2.7115) grad_norm 3.3133 (2.4152) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][180/625] eta 0:03:33 lr 0.000317 wd 0.0500 time 0.4671 (0.4793) data time 0.0011 (0.0050) model time 0.4660 (0.4707) loss 1.8332 (2.7124) grad_norm 2.3504 (2.3998) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][190/625] eta 0:03:28 lr 0.000317 wd 0.0500 time 0.4697 (0.4787) data time 0.0008 (0.0048) model time 0.4689 (0.4704) loss 2.0429 (2.7142) grad_norm 2.2664 (2.3878) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][200/625] eta 0:03:23 lr 0.000317 wd 0.0500 time 0.4745 (0.4784) data time 0.0008 (0.0046) model time 0.4738 (0.4704) loss 2.2744 (2.7021) grad_norm 2.2169 (2.3793) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][210/625] eta 0:03:18 lr 0.000317 wd 0.0500 time 0.4763 (0.4781) data time 0.0010 (0.0044) model time 0.4752 (0.4705) loss 2.9901 
(2.7047) grad_norm 1.5201 (2.3403) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][220/625] eta 0:03:13 lr 0.000317 wd 0.0500 time 0.4725 (0.4779) data time 0.0008 (0.0043) model time 0.4718 (0.4706) loss 3.0092 (2.7042) grad_norm 1.7072 (2.3300) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][230/625] eta 0:03:08 lr 0.000317 wd 0.0500 time 0.4705 (0.4777) data time 0.0008 (0.0041) model time 0.4697 (0.4707) loss 1.8683 (2.6914) grad_norm 1.3822 (2.3283) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][240/625] eta 0:03:03 lr 0.000316 wd 0.0500 time 0.4660 (0.4773) data time 0.0009 (0.0040) model time 0.4652 (0.4706) loss 2.8769 (2.6922) grad_norm 1.8382 (2.3096) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][250/625] eta 0:02:59 lr 0.000316 wd 0.0500 time 0.4841 (0.4784) data time 0.0008 (0.0039) model time 0.4832 (0.4722) loss 3.5323 (2.6937) grad_norm 1.4799 (2.2880) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][260/625] eta 0:02:54 lr 0.000316 wd 0.0500 time 0.4650 (0.4780) data time 0.0010 (0.0038) model time 0.4640 (0.4720) loss 3.1602 (2.6924) grad_norm 2.4936 (2.2963) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][270/625] eta 0:02:49 lr 0.000316 wd 0.0500 time 0.4730 (0.4777) data time 0.0010 (0.0037) model time 0.4720 (0.4718) loss 2.4454 (2.6981) grad_norm 1.1643 (2.2809) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][280/625] eta 0:02:44 lr 0.000316 wd 0.0500 time 0.4816 
(0.4775) data time 0.0008 (0.0036) model time 0.4808 (0.4718) loss 2.7447 (2.6902) grad_norm 2.4522 (2.2772) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][290/625] eta 0:02:39 lr 0.000316 wd 0.0500 time 0.4708 (0.4773) data time 0.0010 (0.0035) model time 0.4698 (0.4718) loss 2.7831 (2.6884) grad_norm 1.3135 (2.2645) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][300/625] eta 0:02:35 lr 0.000316 wd 0.0500 time 0.4696 (0.4772) data time 0.0010 (0.0034) model time 0.4686 (0.4717) loss 2.4224 (2.6868) grad_norm 1.6187 (2.2481) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][310/625] eta 0:02:30 lr 0.000316 wd 0.0500 time 0.4649 (0.4769) data time 0.0008 (0.0033) model time 0.4641 (0.4716) loss 3.0349 (2.6827) grad_norm 2.4034 (2.2532) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][320/625] eta 0:02:25 lr 0.000316 wd 0.0500 time 0.4770 (0.4767) data time 0.0009 (0.0033) model time 0.4761 (0.4715) loss 3.1649 (2.6875) grad_norm 2.1306 (2.2487) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][330/625] eta 0:02:20 lr 0.000316 wd 0.0500 time 0.4752 (0.4766) data time 0.0010 (0.0032) model time 0.4741 (0.4715) loss 3.0078 (2.6942) grad_norm 1.3689 (2.2416) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][340/625] eta 0:02:15 lr 0.000316 wd 0.0500 time 0.4679 (0.4763) data time 0.0008 (0.0032) model time 0.4671 (0.4714) loss 3.5961 (2.6978) grad_norm 1.8233 (2.2353) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [205/300][350/625] eta 0:02:10 lr 0.000315 wd 0.0500 time 0.4695 (0.4762) data time 0.0011 (0.0031) model time 0.4684 (0.4713) loss 2.6272 (2.7032) grad_norm 1.3090 (2.2228) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][360/625] eta 0:02:06 lr 0.000315 wd 0.0500 time 0.4716 (0.4766) data time 0.0011 (0.0030) model time 0.4705 (0.4720) loss 2.3947 (2.7003) grad_norm 2.1196 (2.2154) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][370/625] eta 0:02:01 lr 0.000315 wd 0.0500 time 0.4747 (0.4770) data time 0.0010 (0.0030) model time 0.4737 (0.4725) loss 2.9277 (2.7006) grad_norm 1.4622 (2.2058) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][380/625] eta 0:01:56 lr 0.000315 wd 0.0500 time 0.4667 (0.4769) data time 0.0010 (0.0029) model time 0.4657 (0.4724) loss 2.7570 (2.7019) grad_norm 1.5563 (2.2031) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][390/625] eta 0:01:52 lr 0.000315 wd 0.0500 time 0.4709 (0.4771) data time 0.0011 (0.0029) model time 0.4698 (0.4728) loss 2.9291 (2.7056) grad_norm 2.9431 (2.2060) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][400/625] eta 0:01:47 lr 0.000315 wd 0.0500 time 0.4682 (0.4769) data time 0.0008 (0.0028) model time 0.4674 (0.4726) loss 2.9596 (2.7117) grad_norm 2.1866 (2.2037) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:38:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][410/625] eta 0:01:42 lr 0.000315 wd 0.0500 time 0.4733 (0.4767) data time 0.0009 (0.0028) model time 0.4723 (0.4725) loss 3.2299 (2.7157) grad_norm 1.8126 (2.2079) loss_scale 256.0000 (256.0000) mem 16707MB 
[2024-08-10 20:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][420/625] eta 0:01:37 lr 0.000315 wd 0.0500 time 0.4682 (0.4765) data time 0.0008 (0.0028) model time 0.4674 (0.4724) loss 2.9196 (2.7150) grad_norm 1.8272 (2.2000) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][430/625] eta 0:01:32 lr 0.000315 wd 0.0500 time 0.4742 (0.4764) data time 0.0011 (0.0027) model time 0.4731 (0.4724) loss 1.5444 (2.7154) grad_norm 3.1665 (2.1961) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][440/625] eta 0:01:28 lr 0.000315 wd 0.0500 time 0.4655 (0.4763) data time 0.0012 (0.0027) model time 0.4643 (0.4724) loss 2.6653 (2.7161) grad_norm 1.7656 (2.1884) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][450/625] eta 0:01:23 lr 0.000314 wd 0.0500 time 0.4742 (0.4762) data time 0.0008 (0.0026) model time 0.4734 (0.4723) loss 3.1669 (2.7183) grad_norm 1.8978 (2.1876) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][460/625] eta 0:01:18 lr 0.000314 wd 0.0500 time 0.4695 (0.4761) data time 0.0010 (0.0026) model time 0.4684 (0.4722) loss 2.6838 (2.7174) grad_norm 2.3401 (2.1909) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][470/625] eta 0:01:13 lr 0.000314 wd 0.0500 time 0.4654 (0.4764) data time 0.0009 (0.0026) model time 0.4645 (0.4727) loss 3.4809 (2.7177) grad_norm 2.2452 (2.1870) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][480/625] eta 0:01:09 lr 0.000314 wd 0.0500 time 0.4647 (0.4763) data time 0.0008 (0.0025) model time 0.4639 (0.4726) loss 2.8879 
(2.7168) grad_norm 3.5721 (2.2092) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][490/625] eta 0:01:04 lr 0.000314 wd 0.0500 time 0.4729 (0.4762) data time 0.0008 (0.0025) model time 0.4721 (0.4725) loss 3.1965 (2.7188) grad_norm 4.1240 (2.2142) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][500/625] eta 0:00:59 lr 0.000314 wd 0.0500 time 0.4731 (0.4761) data time 0.0008 (0.0025) model time 0.4723 (0.4725) loss 3.0486 (2.7217) grad_norm 6.9528 (2.2220) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][510/625] eta 0:00:54 lr 0.000314 wd 0.0500 time 0.4699 (0.4760) data time 0.0010 (0.0025) model time 0.4689 (0.4725) loss 3.1302 (2.7253) grad_norm 2.0231 (2.2382) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][520/625] eta 0:00:49 lr 0.000314 wd 0.0500 time 0.4726 (0.4760) data time 0.0011 (0.0024) model time 0.4715 (0.4724) loss 1.9042 (2.7214) grad_norm 2.5172 (2.3342) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][530/625] eta 0:00:45 lr 0.000314 wd 0.0500 time 0.4680 (0.4758) data time 0.0010 (0.0024) model time 0.4670 (0.4723) loss 2.7867 (2.7249) grad_norm 1.4616 (2.3414) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][540/625] eta 0:00:40 lr 0.000314 wd 0.0500 time 0.4659 (0.4757) data time 0.0010 (0.0024) model time 0.4648 (0.4722) loss 2.8162 (2.7238) grad_norm 1.8123 (2.3331) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:39:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][550/625] eta 0:00:35 lr 0.000314 wd 0.0500 time 0.4678 
(0.4758) data time 0.0009 (0.0024) model time 0.4669 (0.4724) loss 3.4044 (2.7264) grad_norm 1.8495 (2.3257) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][560/625] eta 0:00:30 lr 0.000313 wd 0.0500 time 0.4660 (0.4759) data time 0.0010 (0.0023) model time 0.4650 (0.4725) loss 3.0268 (2.7288) grad_norm 3.2645 (2.3234) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][570/625] eta 0:00:26 lr 0.000313 wd 0.0500 time 0.4732 (0.4757) data time 0.0009 (0.0023) model time 0.4723 (0.4724) loss 2.9226 (2.7300) grad_norm 1.4717 (2.3152) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][580/625] eta 0:00:21 lr 0.000313 wd 0.0500 time 0.4676 (0.4757) data time 0.0010 (0.0023) model time 0.4666 (0.4724) loss 3.0776 (2.7275) grad_norm 2.1831 (2.3075) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][590/625] eta 0:00:16 lr 0.000313 wd 0.0500 time 0.4669 (0.4756) data time 0.0011 (0.0023) model time 0.4658 (0.4724) loss 2.9070 (2.7305) grad_norm 2.0922 (2.3047) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][600/625] eta 0:00:11 lr 0.000313 wd 0.0500 time 0.4662 (0.4755) data time 0.0010 (0.0022) model time 0.4652 (0.4723) loss 1.7105 (2.7267) grad_norm 1.9939 (2.3016) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [205/300][610/625] eta 0:00:07 lr 0.000313 wd 0.0500 time 0.4679 (0.4754) data time 0.0005 (0.0022) model time 0.4674 (0.4722) loss 3.1727 (2.7273) grad_norm 1.8330 (2.2964) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [205/300][620/625] eta 0:00:02 lr 0.000313 wd 0.0500 time 0.4658 (0.4756) data time 0.0008 (0.0022) model time 0.4650 (0.4725) loss 2.5572 (2.7274) grad_norm 1.2725 (2.2911) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 205 training takes 0:04:57 [2024-08-10 20:40:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:40:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 20:40:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.509 (0.509) Loss 0.4951 (0.4951) Acc@1 89.209 (89.209) Acc@5 98.926 (98.926) Mem 16707MB [2024-08-10 20:40:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.8228 (0.6158) Acc@1 80.078 (86.541) Acc@5 95.801 (97.736) Mem 16707MB [2024-08-10 20:40:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.8892 (0.7349) Acc@1 78.223 (83.580) Acc@5 95.410 (96.608) Mem 16707MB [2024-08-10 20:40:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.319 Acc@5 96.589 [2024-08-10 20:40:37 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 20:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.960 (0.960) Loss 0.4751 (0.4751) Acc@1 89.795 (89.795) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.202) Loss 0.7476 (0.5840) Acc@1 81.885 (87.327) Acc@5 96.680 (97.985) Mem 16707MB [2024-08-10 20:40:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.162) Loss 0.8345 (0.6863) Acc@1 80.127 (84.617) Acc@5 96.045 (97.042) Mem 16707MB [2024-08-10 20:40:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.357 Acc@5 97.023 [2024-08-10 20:40:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][0/625] eta 0:13:39 lr 0.000313 wd 0.0500 time 1.3110 (1.3110) data time 0.8014 (0.8014) model time 0.0000 (0.0000) loss 3.0590 (3.0590) grad_norm 1.6249 (1.6249) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][10/625] eta 0:05:37 lr 0.000313 wd 0.0500 time 0.4696 (0.5490) data time 0.0008 (0.0739) model time 0.0000 (0.0000) loss 2.0656 (2.6521) grad_norm 2.3830 (2.2494) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][20/625] eta 0:05:09 lr 0.000313 wd 0.0500 time 0.4781 (0.5123) data time 0.0008 (0.0392) model time 0.0000 (0.0000) loss 2.0487 (2.6024) grad_norm 2.6022 (2.3053) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][30/625] eta 0:04:59 lr 0.000313 wd 0.0500 time 0.4112 (0.5025) data time 0.0009 (0.0269) model time 0.0000 (0.0000) loss 2.9465 (2.6115) grad_norm 1.7073 (2.3140) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][40/625] eta 0:04:49 lr 0.000312 wd 0.0500 time 0.4679 (0.4944) data time 0.0008 (0.0206) model time 0.0000 (0.0000) loss 3.4038 (2.6116) grad_norm 1.7574 (2.2719) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][50/625] eta 0:04:41 lr 0.000312 wd 0.0500 time 0.4682 (0.4896) data time 0.0008 (0.0168) model time 0.0000 (0.0000) loss 3.2078 (2.6156) grad_norm 1.9932 (2.8095) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:11 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [206/300][60/625] eta 0:04:34 lr 0.000312 wd 0.0500 time 0.4642 (0.4865) data time 0.0009 (0.0142) model time 0.4633 (0.4694) loss 2.7090 (2.6495) grad_norm 1.6003 (2.6483) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][70/625] eta 0:04:28 lr 0.000312 wd 0.0500 time 0.4741 (0.4842) data time 0.0008 (0.0123) model time 0.4732 (0.4693) loss 3.1317 (2.6459) grad_norm 2.3777 (2.5739) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][80/625] eta 0:04:22 lr 0.000312 wd 0.0500 time 0.4772 (0.4825) data time 0.0010 (0.0109) model time 0.4762 (0.4694) loss 3.2161 (2.6348) grad_norm 2.1625 (2.5332) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][90/625] eta 0:04:17 lr 0.000312 wd 0.0500 time 0.4768 (0.4811) data time 0.0008 (0.0099) model time 0.4761 (0.4693) loss 3.0565 (2.6162) grad_norm 2.2528 (2.4601) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][100/625] eta 0:04:12 lr 0.000312 wd 0.0500 time 0.4660 (0.4803) data time 0.0011 (0.0090) model time 0.4650 (0.4697) loss 2.8711 (2.6379) grad_norm 1.7970 (2.3914) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][110/625] eta 0:04:06 lr 0.000312 wd 0.0500 time 0.4685 (0.4794) data time 0.0008 (0.0083) model time 0.4677 (0.4698) loss 2.8825 (2.6682) grad_norm 2.1931 (2.3670) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][120/625] eta 0:04:01 lr 0.000312 wd 0.0500 time 0.4766 (0.4788) data time 0.0011 (0.0077) model time 0.4755 (0.4699) loss 2.7519 (2.6685) grad_norm 2.8199 (2.3815) loss_scale 
256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][130/625] eta 0:03:58 lr 0.000312 wd 0.0500 time 0.4696 (0.4812) data time 0.0011 (0.0072) model time 0.4685 (0.4748) loss 3.1366 (2.6725) grad_norm 3.3231 (2.3981) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][140/625] eta 0:03:53 lr 0.000312 wd 0.0500 time 0.4699 (0.4805) data time 0.0011 (0.0067) model time 0.4688 (0.4743) loss 1.7535 (2.6665) grad_norm 2.8204 (2.3962) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][150/625] eta 0:03:47 lr 0.000311 wd 0.0500 time 0.4741 (0.4798) data time 0.0008 (0.0064) model time 0.4733 (0.4737) loss 3.1747 (2.6548) grad_norm 1.7078 (2.3786) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:41:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][160/625] eta 0:03:42 lr 0.000311 wd 0.0500 time 0.4688 (0.4793) data time 0.0010 (0.0060) model time 0.4678 (0.4735) loss 2.9371 (2.6649) grad_norm 1.9743 (2.3525) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][170/625] eta 0:03:37 lr 0.000311 wd 0.0500 time 0.4688 (0.4790) data time 0.0011 (0.0058) model time 0.4677 (0.4734) loss 3.3594 (2.6664) grad_norm 1.8319 (2.3379) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][180/625] eta 0:03:32 lr 0.000311 wd 0.0500 time 0.4726 (0.4786) data time 0.0009 (0.0055) model time 0.4717 (0.4732) loss 3.4434 (2.6762) grad_norm 1.6564 (2.3150) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][190/625] eta 0:03:28 lr 0.000311 wd 0.0500 time 0.4670 (0.4793) data time 0.0010 (0.0053) model time 
0.4660 (0.4745) loss 2.6747 (2.6897) grad_norm 1.9360 (2.3041) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][200/625] eta 0:03:23 lr 0.000311 wd 0.0500 time 0.4645 (0.4788) data time 0.0008 (0.0051) model time 0.4637 (0.4740) loss 3.3409 (2.6947) grad_norm 3.3785 (2.3301) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][210/625] eta 0:03:18 lr 0.000311 wd 0.0500 time 0.4682 (0.4784) data time 0.0008 (0.0049) model time 0.4674 (0.4737) loss 3.2966 (2.6796) grad_norm 2.3130 (2.3251) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][220/625] eta 0:03:13 lr 0.000311 wd 0.0500 time 0.4667 (0.4780) data time 0.0008 (0.0047) model time 0.4659 (0.4735) loss 1.7736 (2.6679) grad_norm 1.3823 (2.3060) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][230/625] eta 0:03:08 lr 0.000311 wd 0.0500 time 0.4681 (0.4778) data time 0.0008 (0.0045) model time 0.4673 (0.4733) loss 2.6509 (2.6678) grad_norm 2.1500 (2.2842) loss_scale 256.0000 (256.0000) mem 16707MB [2024-08-10 20:42:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][240/625] eta 0:03:03 lr 0.000311 wd 0.0500 time 0.4740 (0.4775) data time 0.0010 (0.0044) model time 0.4730 (0.4732) loss 2.0551 (2.6620) grad_norm 1.7032 (nan) loss_scale 128.0000 (254.9378) mem 16707MB [2024-08-10 20:42:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][250/625] eta 0:02:59 lr 0.000311 wd 0.0500 time 0.4065 (0.4779) data time 0.0008 (0.0043) model time 0.4057 (0.4739) loss 3.1036 (2.6694) grad_norm 1.7792 (nan) loss_scale 128.0000 (249.8805) mem 16707MB [2024-08-10 20:42:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][260/625] eta 0:02:54 lr 0.000310 
wd 0.0500 time 0.4737 (0.4776) data time 0.0008 (0.0041) model time 0.4729 (0.4737) loss 2.6150 (2.6663) grad_norm 2.4782 (nan) loss_scale 128.0000 (245.2107) mem 16707MB [2024-08-10 20:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][270/625] eta 0:02:49 lr 0.000310 wd 0.0500 time 0.4695 (0.4773) data time 0.0009 (0.0040) model time 0.4687 (0.4734) loss 2.2723 (2.6693) grad_norm 1.4718 (nan) loss_scale 128.0000 (240.8856) mem 16707MB [2024-08-10 20:42:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][280/625] eta 0:02:44 lr 0.000310 wd 0.0500 time 0.4695 (0.4771) data time 0.0010 (0.0039) model time 0.4685 (0.4732) loss 2.9642 (2.6735) grad_norm 1.8061 (nan) loss_scale 128.0000 (236.8683) mem 16707MB [2024-08-10 20:43:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][290/625] eta 0:02:39 lr 0.000310 wd 0.0500 time 0.4673 (0.4768) data time 0.0008 (0.0038) model time 0.4666 (0.4730) loss 2.9786 (2.6714) grad_norm 1.7846 (nan) loss_scale 128.0000 (233.1271) mem 16707MB [2024-08-10 20:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][300/625] eta 0:02:34 lr 0.000310 wd 0.0500 time 0.4692 (0.4765) data time 0.0010 (0.0037) model time 0.4683 (0.4728) loss 2.6761 (2.6686) grad_norm 1.6328 (nan) loss_scale 128.0000 (229.6346) mem 16707MB [2024-08-10 20:43:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][310/625] eta 0:02:30 lr 0.000310 wd 0.0500 time 0.4709 (0.4764) data time 0.0008 (0.0036) model time 0.4700 (0.4728) loss 2.3830 (2.6757) grad_norm 2.5197 (nan) loss_scale 128.0000 (226.3666) mem 16707MB [2024-08-10 20:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][320/625] eta 0:02:25 lr 0.000310 wd 0.0500 time 0.4716 (0.4764) data time 0.0010 (0.0036) model time 0.4706 (0.4728) loss 2.3621 (2.6765) grad_norm 1.3264 (nan) loss_scale 128.0000 (223.3022) mem 16707MB [2024-08-10 20:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [206/300][330/625] eta 0:02:20 lr 0.000310 wd 0.0500 time 0.4785 (0.4762) data time 0.0008 (0.0035) model time 0.4777 (0.4727) loss 1.8979 (2.6711) grad_norm 2.4832 (nan) loss_scale 128.0000 (220.4230) mem 16707MB [2024-08-10 20:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][340/625] eta 0:02:15 lr 0.000310 wd 0.0500 time 0.4685 (0.4760) data time 0.0011 (0.0034) model time 0.4674 (0.4726) loss 3.0168 (2.6716) grad_norm 1.9451 (nan) loss_scale 128.0000 (217.7126) mem 16707MB [2024-08-10 20:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][350/625] eta 0:02:11 lr 0.000310 wd 0.0500 time 0.4615 (0.4769) data time 0.0010 (0.0033) model time 0.4605 (0.4737) loss 2.3792 (2.6727) grad_norm 1.8489 (nan) loss_scale 128.0000 (215.1567) mem 16707MB [2024-08-10 20:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][360/625] eta 0:02:06 lr 0.000310 wd 0.0500 time 0.4664 (0.4767) data time 0.0008 (0.0033) model time 0.4657 (0.4735) loss 3.1490 (2.6770) grad_norm 1.7065 (nan) loss_scale 128.0000 (212.7424) mem 16707MB [2024-08-10 20:43:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][370/625] eta 0:02:01 lr 0.000309 wd 0.0500 time 0.4661 (0.4764) data time 0.0010 (0.0032) model time 0.4651 (0.4733) loss 2.9155 (2.6699) grad_norm 1.8063 (nan) loss_scale 128.0000 (210.4582) mem 16707MB [2024-08-10 20:43:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][380/625] eta 0:01:56 lr 0.000309 wd 0.0500 time 0.4717 (0.4763) data time 0.0010 (0.0032) model time 0.4707 (0.4732) loss 1.9619 (2.6700) grad_norm 2.1165 (nan) loss_scale 128.0000 (208.2940) mem 16707MB [2024-08-10 20:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][390/625] eta 0:01:51 lr 0.000309 wd 0.0500 time 0.4730 (0.4762) data time 0.0008 (0.0031) model time 0.4722 (0.4732) loss 2.6173 (2.6734) grad_norm 2.2377 (nan) loss_scale 128.0000 (206.2404) mem 16707MB [2024-08-10 
20:43:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][400/625] eta 0:01:47 lr 0.000309 wd 0.0500 time 0.4699 (0.4766) data time 0.0009 (0.0031) model time 0.4691 (0.4737) loss 2.7215 (2.6749) grad_norm 1.3106 (nan) loss_scale 128.0000 (204.2893) mem 16707MB [2024-08-10 20:43:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][410/625] eta 0:01:42 lr 0.000309 wd 0.0500 time 0.4666 (0.4774) data time 0.0011 (0.0030) model time 0.4655 (0.4746) loss 3.2740 (2.6731) grad_norm 1.9974 (nan) loss_scale 128.0000 (202.4331) mem 16707MB [2024-08-10 20:44:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][420/625] eta 0:01:37 lr 0.000309 wd 0.0500 time 0.4650 (0.4772) data time 0.0008 (0.0030) model time 0.4643 (0.4744) loss 1.8000 (2.6750) grad_norm 1.6966 (nan) loss_scale 128.0000 (200.6651) mem 16707MB [2024-08-10 20:44:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][430/625] eta 0:01:33 lr 0.000309 wd 0.0500 time 0.4866 (0.4770) data time 0.0009 (0.0029) model time 0.4857 (0.4743) loss 1.9806 (2.6758) grad_norm 1.7333 (nan) loss_scale 128.0000 (198.9791) mem 16707MB [2024-08-10 20:44:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][440/625] eta 0:01:28 lr 0.000309 wd 0.0500 time 0.4669 (0.4769) data time 0.0008 (0.0029) model time 0.4661 (0.4742) loss 2.3736 (2.6723) grad_norm 1.9005 (nan) loss_scale 128.0000 (197.3696) mem 16707MB [2024-08-10 20:44:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][450/625] eta 0:01:23 lr 0.000309 wd 0.0500 time 0.4709 (0.4768) data time 0.0011 (0.0028) model time 0.4698 (0.4741) loss 2.9091 (2.6779) grad_norm 1.7556 (nan) loss_scale 128.0000 (195.8315) mem 16707MB [2024-08-10 20:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][460/625] eta 0:01:18 lr 0.000309 wd 0.0500 time 0.4724 (0.4767) data time 0.0008 (0.0028) model time 0.4716 (0.4740) loss 2.4413 (2.6801) grad_norm 3.1234 (nan) 
loss_scale 128.0000 (194.3601) mem 16707MB [2024-08-10 20:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][470/625] eta 0:01:13 lr 0.000309 wd 0.0500 time 0.4677 (0.4766) data time 0.0008 (0.0028) model time 0.4669 (0.4740) loss 3.0974 (2.6821) grad_norm 2.5244 (nan) loss_scale 128.0000 (192.9512) mem 16707MB [2024-08-10 20:44:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][480/625] eta 0:01:09 lr 0.000308 wd 0.0500 time 0.4725 (0.4765) data time 0.0011 (0.0027) model time 0.4715 (0.4738) loss 2.1360 (2.6796) grad_norm 1.6194 (nan) loss_scale 128.0000 (191.6008) mem 16707MB [2024-08-10 20:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][490/625] eta 0:01:04 lr 0.000308 wd 0.0500 time 0.4661 (0.4763) data time 0.0008 (0.0027) model time 0.4653 (0.4737) loss 2.9653 (2.6788) grad_norm 2.2990 (nan) loss_scale 128.0000 (190.3055) mem 16707MB [2024-08-10 20:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][500/625] eta 0:00:59 lr 0.000308 wd 0.0500 time 0.4703 (0.4770) data time 0.0008 (0.0027) model time 0.4695 (0.4744) loss 2.6340 (2.6803) grad_norm 1.5503 (nan) loss_scale 128.0000 (189.0619) mem 16707MB [2024-08-10 20:44:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][510/625] eta 0:00:54 lr 0.000308 wd 0.0500 time 0.4710 (0.4768) data time 0.0011 (0.0026) model time 0.4699 (0.4743) loss 2.9254 (2.6842) grad_norm 1.5697 (nan) loss_scale 128.0000 (187.8669) mem 16707MB [2024-08-10 20:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][520/625] eta 0:00:50 lr 0.000308 wd 0.0500 time 0.4682 (0.4767) data time 0.0011 (0.0026) model time 0.4671 (0.4742) loss 3.3596 (2.6829) grad_norm 2.2375 (nan) loss_scale 128.0000 (186.7179) mem 16707MB [2024-08-10 20:44:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][530/625] eta 0:00:45 lr 0.000308 wd 0.0500 time 0.4692 (0.4765) data time 0.0010 (0.0026) model time 0.4682 
(0.4741) loss 2.5043 (2.6846) grad_norm 1.6492 (nan) loss_scale 128.0000 (185.6121) mem 16707MB [2024-08-10 20:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][540/625] eta 0:00:40 lr 0.000308 wd 0.0500 time 0.4697 (0.4764) data time 0.0010 (0.0026) model time 0.4687 (0.4740) loss 2.9905 (2.6895) grad_norm 1.8731 (nan) loss_scale 128.0000 (184.5471) mem 16707MB [2024-08-10 20:45:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][550/625] eta 0:00:35 lr 0.000308 wd 0.0500 time 0.4680 (0.4767) data time 0.0010 (0.0025) model time 0.4670 (0.4743) loss 2.9975 (2.6898) grad_norm 2.3797 (nan) loss_scale 128.0000 (183.5209) mem 16707MB [2024-08-10 20:45:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][560/625] eta 0:00:30 lr 0.000308 wd 0.0500 time 0.4700 (0.4768) data time 0.0010 (0.0025) model time 0.4690 (0.4745) loss 3.5265 (2.6897) grad_norm 2.0219 (nan) loss_scale 128.0000 (182.5312) mem 16707MB [2024-08-10 20:45:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][570/625] eta 0:00:26 lr 0.000308 wd 0.0500 time 0.4614 (0.4767) data time 0.0010 (0.0025) model time 0.4603 (0.4743) loss 3.1326 (2.6860) grad_norm 1.8987 (nan) loss_scale 128.0000 (181.5762) mem 16707MB [2024-08-10 20:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][580/625] eta 0:00:21 lr 0.000307 wd 0.0500 time 0.4646 (0.4765) data time 0.0010 (0.0025) model time 0.4636 (0.4742) loss 3.0440 (2.6878) grad_norm 1.9241 (nan) loss_scale 128.0000 (180.6540) mem 16707MB [2024-08-10 20:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][590/625] eta 0:00:16 lr 0.000307 wd 0.0500 time 0.4734 (0.4765) data time 0.0008 (0.0024) model time 0.4726 (0.4741) loss 3.1927 (2.6872) grad_norm 1.8955 (nan) loss_scale 128.0000 (179.7631) mem 16707MB [2024-08-10 20:45:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][600/625] eta 0:00:11 lr 0.000307 wd 0.0500 time 0.4704 
(0.4764) data time 0.0011 (0.0024) model time 0.4693 (0.4741) loss 2.7887 (2.6833) grad_norm 2.1323 (nan) loss_scale 128.0000 (178.9018) mem 16707MB [2024-08-10 20:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][610/625] eta 0:00:07 lr 0.000307 wd 0.0500 time 0.4666 (0.4763) data time 0.0005 (0.0024) model time 0.4660 (0.4740) loss 3.0961 (2.6886) grad_norm 1.7801 (nan) loss_scale 128.0000 (178.0687) mem 16707MB [2024-08-10 20:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [206/300][620/625] eta 0:00:02 lr 0.000307 wd 0.0500 time 0.4660 (0.4761) data time 0.0008 (0.0024) model time 0.4652 (0.4738) loss 2.6107 (2.6887) grad_norm 1.8727 (nan) loss_scale 128.0000 (177.2625) mem 16707MB [2024-08-10 20:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 206 training takes 0:04:57 [2024-08-10 20:45:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:45:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.512 (0.512) Loss 0.5254 (0.5254) Acc@1 89.355 (89.355) Acc@5 98.779 (98.779) Mem 16707MB [2024-08-10 20:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8076 (0.6287) Acc@1 80.420 (86.466) Acc@5 96.240 (97.674) Mem 16707MB [2024-08-10 20:45:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.8940 (0.7359) Acc@1 79.102 (83.717) Acc@5 95.459 (96.603) Mem 16707MB [2024-08-10 20:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.419 Acc@5 96.569 [2024-08-10 20:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 20:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.42% [2024-08-10 20:45:44 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 20:45:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 20:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.504 (0.504) Loss 0.4744 (0.4744) Acc@1 89.453 (89.453) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.159) Loss 0.7476 (0.5841) Acc@1 81.982 (87.314) Acc@5 96.777 (97.989) Mem 16707MB [2024-08-10 20:45:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.139) Loss 0.8335 (0.6863) Acc@1 80.078 (84.614) Acc@5 95.947 (97.042) Mem 16707MB [2024-08-10 20:45:49 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.357 Acc@5 97.017 [2024-08-10 20:45:49 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:45:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][0/625] eta 0:14:07 lr 0.000307 wd 0.0500 time 1.3560 (1.3560) data time 0.8161 (0.8161) model time 0.0000 (0.0000) loss 2.5649 (2.5649) grad_norm 2.0628 (2.0628) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:45:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][10/625] eta 0:05:46 lr 0.000307 wd 0.0500 time 0.4086 (0.5642) data time 0.0009 (0.0752) model time 0.0000 (0.0000) loss 3.4327 (2.6664) grad_norm 1.8761 (2.9675) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][20/625] eta 0:05:13 lr 0.000307 wd 0.0500 time 0.4697 (0.5186) data time 0.0008 (0.0399) model time 0.0000 (0.0000) loss 3.2172 (2.6709) grad_norm 1.9855 (2.7485) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][30/625] eta 0:04:59 lr 0.000307 wd 0.0500 time 0.4696 (0.5028) data time 0.0008 (0.0273) model time 0.0000 (0.0000) loss 3.4359 (2.6844) grad_norm 1.9877 (2.7216) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:09 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [207/300][40/625] eta 0:04:49 lr 0.000307 wd 0.0500 time 0.4721 (0.4951) data time 0.0014 (0.0209) model time 0.0000 (0.0000) loss 2.6249 (2.6757) grad_norm 2.2353 (2.6363) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][50/625] eta 0:04:41 lr 0.000307 wd 0.0500 time 0.4749 (0.4904) data time 0.0010 (0.0170) model time 0.0000 (0.0000) loss 2.6886 (2.6716) grad_norm 1.2974 (2.4777) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][60/625] eta 0:04:35 lr 0.000307 wd 0.0500 time 0.4693 (0.4871) data time 0.0010 (0.0144) model time 0.4683 (0.4690) loss 2.2862 (2.6577) grad_norm 1.6236 (2.3979) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][70/625] eta 0:04:28 lr 0.000306 wd 0.0500 time 0.4671 (0.4843) data time 0.0010 (0.0126) model time 0.4661 (0.4677) loss 3.0787 (2.6889) grad_norm 3.3062 (2.3502) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][80/625] eta 0:04:22 lr 0.000306 wd 0.0500 time 0.4716 (0.4825) data time 0.0010 (0.0111) model time 0.4706 (0.4681) loss 2.7778 (2.6994) grad_norm 1.6606 (2.3674) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][90/625] eta 0:04:19 lr 0.000306 wd 0.0500 time 0.4691 (0.4851) data time 0.0011 (0.0100) model time 0.4680 (0.4772) loss 2.8208 (2.7114) grad_norm 2.1798 (2.3492) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][100/625] eta 0:04:13 lr 0.000306 wd 0.0500 time 0.4684 (0.4835) data time 0.0008 (0.0091) model time 0.4676 (0.4754) loss 1.8467 (2.7311) grad_norm 1.7820 (2.3430) loss_scale 
128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][110/625] eta 0:04:08 lr 0.000306 wd 0.0500 time 0.4685 (0.4826) data time 0.0009 (0.0084) model time 0.4676 (0.4749) loss 2.6747 (2.7270) grad_norm 2.1978 (2.3451) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][120/625] eta 0:04:04 lr 0.000306 wd 0.0500 time 0.4712 (0.4835) data time 0.0010 (0.0079) model time 0.4702 (0.4773) loss 2.3588 (2.7196) grad_norm 2.3023 (2.3542) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][130/625] eta 0:03:58 lr 0.000306 wd 0.0500 time 0.4765 (0.4825) data time 0.0008 (0.0073) model time 0.4758 (0.4764) loss 3.3387 (2.7246) grad_norm 2.2090 (2.3299) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:46:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][140/625] eta 0:03:53 lr 0.000306 wd 0.0500 time 0.4664 (0.4816) data time 0.0008 (0.0069) model time 0.4655 (0.4756) loss 2.7022 (2.7216) grad_norm 2.2238 (2.3232) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][150/625] eta 0:03:48 lr 0.000306 wd 0.0500 time 0.4752 (0.4810) data time 0.0010 (0.0065) model time 0.4741 (0.4752) loss 2.6664 (2.7304) grad_norm 2.0562 (2.3429) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][160/625] eta 0:03:43 lr 0.000306 wd 0.0500 time 0.4732 (0.4803) data time 0.0009 (0.0061) model time 0.4723 (0.4745) loss 3.0291 (2.7300) grad_norm 1.6993 (2.3553) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][170/625] eta 0:03:38 lr 0.000306 wd 0.0500 time 0.4708 (0.4797) data time 0.0010 (0.0059) model time 
0.4698 (0.4740) loss 3.1866 (2.7388) grad_norm 4.4744 (2.3621) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][180/625] eta 0:03:33 lr 0.000305 wd 0.0500 time 0.4737 (0.4791) data time 0.0008 (0.0056) model time 0.4729 (0.4736) loss 2.2607 (2.7387) grad_norm 1.7415 (2.3370) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][190/625] eta 0:03:28 lr 0.000305 wd 0.0500 time 0.4680 (0.4787) data time 0.0011 (0.0054) model time 0.4669 (0.4733) loss 1.7059 (2.7363) grad_norm 1.8903 (2.3385) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][200/625] eta 0:03:23 lr 0.000305 wd 0.0500 time 0.4680 (0.4783) data time 0.0011 (0.0051) model time 0.4668 (0.4731) loss 2.6953 (2.7468) grad_norm 2.5142 (2.3224) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][210/625] eta 0:03:18 lr 0.000305 wd 0.0500 time 0.4672 (0.4778) data time 0.0009 (0.0049) model time 0.4663 (0.4728) loss 3.2790 (2.7546) grad_norm 2.6553 (2.3156) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][220/625] eta 0:03:13 lr 0.000305 wd 0.0500 time 0.4640 (0.4775) data time 0.0008 (0.0048) model time 0.4631 (0.4725) loss 2.8386 (2.7604) grad_norm 1.7577 (2.3121) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][230/625] eta 0:03:08 lr 0.000305 wd 0.0500 time 0.4644 (0.4771) data time 0.0011 (0.0046) model time 0.4633 (0.4722) loss 3.1938 (2.7567) grad_norm 2.4065 (2.3334) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][240/625] eta 0:03:03 lr 
0.000305 wd 0.0500 time 0.4654 (0.4767) data time 0.0008 (0.0045) model time 0.4646 (0.4719) loss 2.1907 (2.7577) grad_norm 1.9420 (2.3621) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][250/625] eta 0:02:58 lr 0.000305 wd 0.0500 time 0.4738 (0.4766) data time 0.0010 (0.0043) model time 0.4728 (0.4720) loss 2.8089 (2.7574) grad_norm 1.4979 (2.3464) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][260/625] eta 0:02:53 lr 0.000305 wd 0.0500 time 0.4797 (0.4765) data time 0.0010 (0.0042) model time 0.4787 (0.4720) loss 2.6620 (2.7524) grad_norm 1.9583 (2.3602) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:47:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][270/625] eta 0:02:49 lr 0.000305 wd 0.0500 time 0.4713 (0.4764) data time 0.0009 (0.0041) model time 0.4703 (0.4720) loss 3.0271 (2.7523) grad_norm 1.8756 (2.3951) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][280/625] eta 0:02:44 lr 0.000305 wd 0.0500 time 0.4702 (0.4762) data time 0.0008 (0.0040) model time 0.4694 (0.4720) loss 2.5390 (2.7450) grad_norm 2.1071 (2.3793) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][290/625] eta 0:02:39 lr 0.000304 wd 0.0500 time 0.4770 (0.4760) data time 0.0009 (0.0039) model time 0.4761 (0.4719) loss 2.9237 (2.7454) grad_norm 1.4920 (2.3623) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][300/625] eta 0:02:34 lr 0.000304 wd 0.0500 time 0.4696 (0.4758) data time 0.0011 (0.0038) model time 0.4685 (0.4717) loss 2.4881 (2.7422) grad_norm 1.6629 (2.3475) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:17 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [207/300][310/625] eta 0:02:29 lr 0.000304 wd 0.0500 time 0.4666 (0.4756) data time 0.0010 (0.0037) model time 0.4656 (0.4716) loss 3.0134 (2.7406) grad_norm 2.8578 (2.3465) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][320/625] eta 0:02:25 lr 0.000304 wd 0.0500 time 0.4747 (0.4755) data time 0.0008 (0.0036) model time 0.4739 (0.4716) loss 3.2046 (2.7437) grad_norm 1.3893 (2.3315) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][330/625] eta 0:02:20 lr 0.000304 wd 0.0500 time 0.4732 (0.4754) data time 0.0011 (0.0035) model time 0.4720 (0.4716) loss 2.7173 (2.7469) grad_norm 3.5327 (2.3254) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][340/625] eta 0:02:15 lr 0.000304 wd 0.0500 time 0.4750 (0.4753) data time 0.0008 (0.0035) model time 0.4742 (0.4716) loss 2.3600 (2.7370) grad_norm 2.7933 (2.3231) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][350/625] eta 0:02:10 lr 0.000304 wd 0.0500 time 0.4702 (0.4757) data time 0.0010 (0.0034) model time 0.4692 (0.4722) loss 2.6975 (2.7409) grad_norm 1.6752 (2.3221) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][360/625] eta 0:02:06 lr 0.000304 wd 0.0500 time 0.4704 (0.4756) data time 0.0008 (0.0033) model time 0.4696 (0.4721) loss 1.7695 (2.7357) grad_norm 1.5517 (2.3028) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][370/625] eta 0:02:01 lr 0.000304 wd 0.0500 time 0.4638 (0.4754) data time 0.0007 (0.0033) model time 0.4631 (0.4719) loss 2.5371 (2.7309) grad_norm 2.6765 (2.3052) loss_scale 
128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][380/625] eta 0:01:56 lr 0.000304 wd 0.0500 time 0.4690 (0.4752) data time 0.0008 (0.0032) model time 0.4681 (0.4718) loss 2.4455 (2.7201) grad_norm 1.9520 (2.3101) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][390/625] eta 0:01:51 lr 0.000303 wd 0.0500 time 0.4740 (0.4751) data time 0.0010 (0.0032) model time 0.4730 (0.4717) loss 1.9950 (2.7155) grad_norm 1.2399 (2.3057) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][400/625] eta 0:01:46 lr 0.000303 wd 0.0500 time 0.4769 (0.4750) data time 0.0010 (0.0031) model time 0.4759 (0.4717) loss 2.5393 (2.7153) grad_norm 1.5510 (2.2916) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][410/625] eta 0:01:42 lr 0.000303 wd 0.0500 time 0.4684 (0.4749) data time 0.0011 (0.0031) model time 0.4673 (0.4716) loss 2.3717 (2.7123) grad_norm 1.7608 (2.2767) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][420/625] eta 0:01:37 lr 0.000303 wd 0.0500 time 0.4741 (0.4748) data time 0.0010 (0.0030) model time 0.4732 (0.4716) loss 2.5498 (2.7182) grad_norm 1.6327 (2.2660) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][430/625] eta 0:01:32 lr 0.000303 wd 0.0500 time 0.4681 (0.4751) data time 0.0008 (0.0030) model time 0.4672 (0.4720) loss 2.6113 (2.7172) grad_norm 2.1490 (2.2592) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][440/625] eta 0:01:27 lr 0.000303 wd 0.0500 time 0.4690 (0.4749) data time 0.0011 (0.0029) model time 
0.4680 (0.4718) loss 1.6827 (2.7127) grad_norm 1.4438 (2.2507) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][450/625] eta 0:01:23 lr 0.000303 wd 0.0500 time 0.4642 (0.4753) data time 0.0007 (0.0029) model time 0.4634 (0.4723) loss 2.7361 (2.7177) grad_norm 1.3776 (2.2464) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][460/625] eta 0:01:18 lr 0.000303 wd 0.0500 time 0.4693 (0.4752) data time 0.0008 (0.0028) model time 0.4685 (0.4722) loss 3.0613 (2.7185) grad_norm 2.9157 (2.2377) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][470/625] eta 0:01:13 lr 0.000303 wd 0.0500 time 0.4765 (0.4751) data time 0.0008 (0.0028) model time 0.4757 (0.4722) loss 2.8888 (2.7179) grad_norm 1.8851 (2.2294) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][480/625] eta 0:01:08 lr 0.000303 wd 0.0500 time 0.4763 (0.4752) data time 0.0007 (0.0028) model time 0.4755 (0.4723) loss 2.5268 (2.7138) grad_norm 2.7966 (2.2267) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][490/625] eta 0:01:04 lr 0.000303 wd 0.0500 time 0.4716 (0.4751) data time 0.0010 (0.0027) model time 0.4706 (0.4723) loss 3.0606 (2.7210) grad_norm 2.3076 (2.2220) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][500/625] eta 0:00:59 lr 0.000302 wd 0.0500 time 0.4702 (0.4750) data time 0.0010 (0.0027) model time 0.4693 (0.4722) loss 2.8365 (2.7248) grad_norm 2.3298 (2.2386) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][510/625] eta 0:00:54 lr 
0.000302 wd 0.0500 time 0.4687 (0.4749) data time 0.0010 (0.0027) model time 0.4677 (0.4721) loss 3.2507 (2.7235) grad_norm 1.9730 (2.2362) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][520/625] eta 0:00:49 lr 0.000302 wd 0.0500 time 0.4677 (0.4748) data time 0.0007 (0.0026) model time 0.4670 (0.4721) loss 2.2907 (2.7234) grad_norm 2.2977 (2.2372) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][530/625] eta 0:00:45 lr 0.000302 wd 0.0500 time 0.4725 (0.4747) data time 0.0010 (0.0026) model time 0.4716 (0.4720) loss 2.8993 (2.7286) grad_norm 1.6442 (2.2419) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][540/625] eta 0:00:40 lr 0.000302 wd 0.0500 time 0.4717 (0.4748) data time 0.0008 (0.0026) model time 0.4709 (0.4721) loss 1.9658 (2.7257) grad_norm 1.6117 (2.2365) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][550/625] eta 0:00:35 lr 0.000302 wd 0.0500 time 0.4703 (0.4748) data time 0.0007 (0.0025) model time 0.4696 (0.4721) loss 2.8428 (2.7235) grad_norm 1.9686 (2.2432) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][560/625] eta 0:00:30 lr 0.000302 wd 0.0500 time 0.4780 (0.4748) data time 0.0007 (0.0025) model time 0.4773 (0.4721) loss 3.1755 (2.7254) grad_norm 1.9641 (2.2430) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][570/625] eta 0:00:26 lr 0.000302 wd 0.0500 time 0.4753 (0.4748) data time 0.0007 (0.0025) model time 0.4746 (0.4722) loss 3.2366 (2.7282) grad_norm 2.1426 (2.2510) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:25 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [207/300][580/625] eta 0:00:21 lr 0.000302 wd 0.0500 time 0.4781 (0.4747) data time 0.0010 (0.0025) model time 0.4771 (0.4721) loss 2.3684 (2.7292) grad_norm 1.7955 (2.2462) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][590/625] eta 0:00:16 lr 0.000302 wd 0.0500 time 0.4694 (0.4746) data time 0.0010 (0.0024) model time 0.4684 (0.4721) loss 3.0559 (2.7278) grad_norm 1.5732 (2.2423) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][600/625] eta 0:00:11 lr 0.000302 wd 0.0500 time 0.4664 (0.4745) data time 0.0009 (0.0024) model time 0.4655 (0.4720) loss 2.8049 (2.7253) grad_norm 2.2161 (2.2347) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][610/625] eta 0:00:07 lr 0.000301 wd 0.0500 time 0.4692 (0.4745) data time 0.0005 (0.0024) model time 0.4687 (0.4720) loss 2.0668 (2.7246) grad_norm 2.1884 (2.2362) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [207/300][620/625] eta 0:00:02 lr 0.000301 wd 0.0500 time 0.4719 (0.4746) data time 0.0007 (0.0024) model time 0.4712 (0.4722) loss 3.0006 (2.7261) grad_norm 1.5838 (2.2374) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:50:46 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 207 training takes 0:04:56 [2024-08-10 20:50:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:50:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 20:50:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.5137 (0.5137) Acc@1 88.379 (88.379) Acc@5 98.730 (98.730) Mem 16707MB [2024-08-10 20:50:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.160) Loss 0.8228 (0.6268) Acc@1 80.420 (86.537) Acc@5 96.387 (97.714) Mem 16707MB [2024-08-10 20:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9536 (0.7402) Acc@1 77.295 (83.691) Acc@5 94.775 (96.568) Mem 16707MB [2024-08-10 20:50:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.377 Acc@5 96.565 [2024-08-10 20:50:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 20:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.877 (0.877) Loss 0.4751 (0.4751) Acc@1 89.502 (89.502) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:50:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.7476 (0.5844) Acc@1 82.080 (87.336) Acc@5 96.826 (97.949) Mem 16707MB [2024-08-10 20:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.159) Loss 0.8350 (0.6866) Acc@1 80.078 (84.642) Acc@5 95.996 (97.028) Mem 16707MB [2024-08-10 20:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.367 Acc@5 97.009 [2024-08-10 20:50:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 20:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][0/625] eta 0:13:55 lr 0.000301 wd 0.0500 time 1.3373 (1.3373) data time 0.7542 (0.7542) model time 0.0000 (0.0000) loss 2.9982 (2.9982) grad_norm 1.5548 (1.5548) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][10/625] eta 0:05:36 lr 0.000301 wd 0.0500 time 0.4646 (0.5479) data 
time 0.0010 (0.0695) model time 0.0000 (0.0000) loss 2.4472 (2.6966) grad_norm 2.2788 (2.2840) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][20/625] eta 0:05:15 lr 0.000301 wd 0.0500 time 0.4692 (0.5210) data time 0.0011 (0.0369) model time 0.0000 (0.0000) loss 2.7638 (2.7179) grad_norm 1.9521 (2.1432) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][30/625] eta 0:04:59 lr 0.000301 wd 0.0500 time 0.4674 (0.5038) data time 0.0011 (0.0253) model time 0.0000 (0.0000) loss 3.1232 (2.7602) grad_norm 1.6236 (2.2586) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][40/625] eta 0:04:49 lr 0.000301 wd 0.0500 time 0.4675 (0.4952) data time 0.0011 (0.0194) model time 0.0000 (0.0000) loss 2.4444 (2.7435) grad_norm 3.2481 (2.3318) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][50/625] eta 0:04:41 lr 0.000301 wd 0.0500 time 0.4686 (0.4900) data time 0.0008 (0.0158) model time 0.0000 (0.0000) loss 3.3981 (2.7668) grad_norm 1.7453 (2.4029) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][60/625] eta 0:04:35 lr 0.000301 wd 0.0500 time 0.4682 (0.4869) data time 0.0008 (0.0134) model time 0.4674 (0.4702) loss 3.1702 (2.7332) grad_norm 2.0390 (2.4304) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][70/625] eta 0:04:30 lr 0.000301 wd 0.0500 time 0.4669 (0.4869) data time 0.0010 (0.0117) model time 0.4659 (0.4779) loss 3.1404 (2.7414) grad_norm 2.0090 (2.4094) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[208/300][80/625] eta 0:04:24 lr 0.000301 wd 0.0500 time 0.4642 (0.4847) data time 0.0008 (0.0104) model time 0.4634 (0.4746) loss 2.4518 (2.7223) grad_norm 1.8821 (2.3814) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][90/625] eta 0:04:18 lr 0.000301 wd 0.0500 time 0.4651 (0.4827) data time 0.0008 (0.0093) model time 0.4642 (0.4723) loss 3.1620 (2.7409) grad_norm 2.0119 (2.3745) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][100/625] eta 0:04:12 lr 0.000300 wd 0.0500 time 0.4663 (0.4811) data time 0.0010 (0.0085) model time 0.4653 (0.4710) loss 2.9813 (2.7356) grad_norm 2.1558 (2.4601) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][110/625] eta 0:04:07 lr 0.000300 wd 0.0500 time 0.4659 (0.4798) data time 0.0010 (0.0078) model time 0.4649 (0.4701) loss 2.9522 (2.7662) grad_norm 1.5695 (2.4136) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][120/625] eta 0:04:01 lr 0.000300 wd 0.0500 time 0.4676 (0.4790) data time 0.0010 (0.0073) model time 0.4666 (0.4700) loss 3.0028 (2.7498) grad_norm 2.1560 (2.3627) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:51:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][130/625] eta 0:03:56 lr 0.000300 wd 0.0500 time 0.4750 (0.4786) data time 0.0008 (0.0068) model time 0.4742 (0.4703) loss 2.5411 (2.7594) grad_norm 6.7122 (2.3842) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][140/625] eta 0:03:52 lr 0.000300 wd 0.0500 time 0.4702 (0.4795) data time 0.0011 (0.0064) model time 0.4692 (0.4725) loss 2.5839 (2.7612) grad_norm 1.9229 (2.4232) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][150/625] eta 0:03:47 lr 0.000300 wd 0.0500 time 0.4695 (0.4789) data time 0.0011 (0.0061) model time 0.4684 (0.4721) loss 2.0896 (2.7619) grad_norm 1.5914 (2.4412) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][160/625] eta 0:03:42 lr 0.000300 wd 0.0500 time 0.4646 (0.4782) data time 0.0010 (0.0057) model time 0.4636 (0.4717) loss 2.9215 (2.7650) grad_norm 1.9023 (2.4058) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][170/625] eta 0:03:37 lr 0.000300 wd 0.0500 time 0.4692 (0.4777) data time 0.0010 (0.0055) model time 0.4682 (0.4715) loss 2.6863 (2.7587) grad_norm 1.4837 (2.3734) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][180/625] eta 0:03:32 lr 0.000300 wd 0.0500 time 0.4716 (0.4772) data time 0.0008 (0.0052) model time 0.4709 (0.4711) loss 2.8451 (2.7594) grad_norm 3.7074 (2.3736) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][190/625] eta 0:03:27 lr 0.000300 wd 0.0500 time 0.4676 (0.4768) data time 0.0011 (0.0050) model time 0.4665 (0.4709) loss 2.6642 (2.7617) grad_norm 2.1638 (2.3778) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][200/625] eta 0:03:22 lr 0.000300 wd 0.0500 time 0.4677 (0.4765) data time 0.0010 (0.0048) model time 0.4666 (0.4709) loss 2.4055 (2.7539) grad_norm 1.7902 (2.3716) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][210/625] eta 0:03:18 lr 0.000299 wd 0.0500 time 0.4695 (0.4772) data time 0.0011 (0.0046) model time 0.4684 (0.4721) loss 2.7519 
(2.7474) grad_norm 1.6982 (2.3609) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][220/625] eta 0:03:13 lr 0.000299 wd 0.0500 time 0.4691 (0.4769) data time 0.0010 (0.0045) model time 0.4681 (0.4719) loss 2.8330 (2.7485) grad_norm 2.5289 (2.3525) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][230/625] eta 0:03:08 lr 0.000299 wd 0.0500 time 0.4612 (0.4765) data time 0.0008 (0.0043) model time 0.4604 (0.4716) loss 1.7929 (2.7396) grad_norm 1.4591 (2.3389) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][240/625] eta 0:03:03 lr 0.000299 wd 0.0500 time 0.4693 (0.4762) data time 0.0010 (0.0042) model time 0.4683 (0.4714) loss 2.7636 (2.7445) grad_norm 1.4492 (2.3293) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][250/625] eta 0:02:58 lr 0.000299 wd 0.0500 time 0.4712 (0.4758) data time 0.0010 (0.0041) model time 0.4702 (0.4712) loss 2.7484 (2.7395) grad_norm 1.5812 (2.3193) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][260/625] eta 0:02:53 lr 0.000299 wd 0.0500 time 0.4664 (0.4755) data time 0.0010 (0.0039) model time 0.4654 (0.4710) loss 2.5050 (2.7311) grad_norm 1.8056 (2.3203) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][270/625] eta 0:02:48 lr 0.000299 wd 0.0500 time 0.4742 (0.4754) data time 0.0009 (0.0038) model time 0.4732 (0.4710) loss 2.7477 (2.7230) grad_norm 1.6081 (2.3066) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][280/625] eta 0:02:43 lr 0.000299 wd 0.0500 time 0.4689 
(0.4752) data time 0.0010 (0.0037) model time 0.4679 (0.4709) loss 2.1150 (2.7218) grad_norm 2.5915 (2.3001) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][290/625] eta 0:02:39 lr 0.000299 wd 0.0500 time 0.4834 (0.4752) data time 0.0007 (0.0036) model time 0.4827 (0.4710) loss 1.6040 (2.7269) grad_norm 1.6578 (2.2862) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][300/625] eta 0:02:34 lr 0.000299 wd 0.0500 time 0.4763 (0.4752) data time 0.0010 (0.0036) model time 0.4754 (0.4711) loss 2.9367 (2.7282) grad_norm 1.6674 (2.2725) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][310/625] eta 0:02:29 lr 0.000299 wd 0.0500 time 0.4744 (0.4751) data time 0.0010 (0.0035) model time 0.4735 (0.4711) loss 2.0863 (2.7199) grad_norm 1.6849 (2.2650) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][320/625] eta 0:02:24 lr 0.000298 wd 0.0500 time 0.4681 (0.4749) data time 0.0010 (0.0034) model time 0.4670 (0.4710) loss 2.8157 (2.7237) grad_norm 3.5819 (2.3393) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][330/625] eta 0:02:20 lr 0.000298 wd 0.0500 time 0.4812 (0.4749) data time 0.0007 (0.0033) model time 0.4805 (0.4711) loss 2.2110 (2.7180) grad_norm 2.2784 (2.3317) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][340/625] eta 0:02:15 lr 0.000298 wd 0.0500 time 0.4692 (0.4748) data time 0.0009 (0.0033) model time 0.4683 (0.4710) loss 3.0325 (2.7221) grad_norm 1.9308 (2.3277) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [208/300][350/625] eta 0:02:10 lr 0.000298 wd 0.0500 time 0.4716 (0.4747) data time 0.0010 (0.0032) model time 0.4706 (0.4711) loss 2.9781 (2.7199) grad_norm 1.7033 (2.3149) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][360/625] eta 0:02:05 lr 0.000298 wd 0.0500 time 0.4734 (0.4747) data time 0.0008 (0.0031) model time 0.4726 (0.4711) loss 2.7467 (2.7235) grad_norm 2.3207 (2.3275) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][370/625] eta 0:02:01 lr 0.000298 wd 0.0500 time 0.4780 (0.4747) data time 0.0010 (0.0031) model time 0.4770 (0.4712) loss 2.6001 (2.7202) grad_norm 1.9405 (2.3313) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:53:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][380/625] eta 0:01:56 lr 0.000298 wd 0.0500 time 0.4733 (0.4746) data time 0.0007 (0.0030) model time 0.4726 (0.4712) loss 2.8261 (2.7202) grad_norm 2.2137 (2.3216) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][390/625] eta 0:01:51 lr 0.000298 wd 0.0500 time 0.4694 (0.4746) data time 0.0007 (0.0030) model time 0.4687 (0.4713) loss 3.1650 (2.7223) grad_norm 1.6794 (2.3329) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][400/625] eta 0:01:46 lr 0.000298 wd 0.0500 time 0.4589 (0.4750) data time 0.0010 (0.0029) model time 0.4579 (0.4718) loss 1.8611 (2.7228) grad_norm 1.4814 (2.3249) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][410/625] eta 0:01:42 lr 0.000298 wd 0.0500 time 0.4684 (0.4748) data time 0.0008 (0.0029) model time 0.4676 (0.4716) loss 3.0967 (2.7220) grad_norm 3.5731 (2.3357) loss_scale 128.0000 (128.0000) mem 16707MB 
[2024-08-10 20:54:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][420/625] eta 0:01:37 lr 0.000298 wd 0.0500 time 0.4802 (0.4747) data time 0.0008 (0.0028) model time 0.4794 (0.4716) loss 3.1476 (2.7178) grad_norm 1.8552 (2.3284) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][430/625] eta 0:01:32 lr 0.000297 wd 0.0500 time 0.4716 (0.4752) data time 0.0007 (0.0028) model time 0.4709 (0.4721) loss 1.7489 (2.7140) grad_norm 1.6122 (2.3236) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][440/625] eta 0:01:27 lr 0.000297 wd 0.0500 time 0.4670 (0.4751) data time 0.0007 (0.0028) model time 0.4663 (0.4721) loss 3.3418 (2.7122) grad_norm 3.1663 (2.3163) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][450/625] eta 0:01:23 lr 0.000297 wd 0.0500 time 0.4657 (0.4750) data time 0.0008 (0.0027) model time 0.4648 (0.4720) loss 3.3744 (2.7125) grad_norm 1.7862 (2.3120) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][460/625] eta 0:01:18 lr 0.000297 wd 0.0500 time 0.4671 (0.4749) data time 0.0010 (0.0027) model time 0.4661 (0.4720) loss 2.4914 (2.7101) grad_norm 1.6230 (2.3086) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][470/625] eta 0:01:13 lr 0.000297 wd 0.0500 time 0.4707 (0.4748) data time 0.0008 (0.0026) model time 0.4699 (0.4719) loss 2.6516 (2.7085) grad_norm 2.0940 (2.3115) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][480/625] eta 0:01:08 lr 0.000297 wd 0.0500 time 0.4667 (0.4751) data time 0.0008 (0.0026) model time 0.4660 (0.4723) loss 2.2316 
(2.7087) grad_norm 2.2584 (2.3207) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][490/625] eta 0:01:04 lr 0.000297 wd 0.0500 time 0.4669 (0.4750) data time 0.0010 (0.0026) model time 0.4659 (0.4722) loss 1.6165 (2.7082) grad_norm 2.7150 (2.3223) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][500/625] eta 0:00:59 lr 0.000297 wd 0.0500 time 0.4749 (0.4749) data time 0.0011 (0.0025) model time 0.4738 (0.4722) loss 2.9127 (2.7082) grad_norm 2.6369 (2.3249) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][510/625] eta 0:00:54 lr 0.000297 wd 0.0500 time 0.4730 (0.4749) data time 0.0010 (0.0025) model time 0.4719 (0.4721) loss 2.5578 (2.7089) grad_norm 3.5657 (2.3330) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][520/625] eta 0:00:49 lr 0.000297 wd 0.0500 time 0.4679 (0.4748) data time 0.0008 (0.0025) model time 0.4671 (0.4721) loss 3.0789 (2.7122) grad_norm 1.6575 (2.3255) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][530/625] eta 0:00:45 lr 0.000297 wd 0.0500 time 0.4756 (0.4747) data time 0.0009 (0.0025) model time 0.4747 (0.4721) loss 3.2093 (2.7163) grad_norm 2.0818 (2.3388) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][540/625] eta 0:00:40 lr 0.000296 wd 0.0500 time 0.4707 (0.4747) data time 0.0010 (0.0024) model time 0.4697 (0.4720) loss 2.3915 (2.7125) grad_norm 1.9662 (2.3369) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][550/625] eta 0:00:35 lr 0.000296 wd 0.0500 time 0.4634 
(0.4746) data time 0.0010 (0.0024) model time 0.4624 (0.4720) loss 2.9515 (2.7138) grad_norm 1.9179 (2.3778) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][560/625] eta 0:00:30 lr 0.000296 wd 0.0500 time 0.4725 (0.4746) data time 0.0009 (0.0024) model time 0.4715 (0.4720) loss 2.7296 (2.7117) grad_norm 4.7034 (2.3939) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][570/625] eta 0:00:26 lr 0.000296 wd 0.0500 time 0.4760 (0.4746) data time 0.0009 (0.0024) model time 0.4751 (0.4720) loss 3.1198 (2.7117) grad_norm 2.2634 (2.4100) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][580/625] eta 0:00:21 lr 0.000296 wd 0.0500 time 0.4742 (0.4749) data time 0.0007 (0.0023) model time 0.4734 (0.4724) loss 2.8449 (2.7139) grad_norm 2.7368 (2.4110) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][590/625] eta 0:00:16 lr 0.000296 wd 0.0500 time 0.4677 (0.4750) data time 0.0010 (0.0023) model time 0.4667 (0.4726) loss 3.1179 (2.7147) grad_norm 2.6180 (2.4095) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][600/625] eta 0:00:11 lr 0.000296 wd 0.0500 time 0.4700 (0.4750) data time 0.0008 (0.0023) model time 0.4692 (0.4725) loss 2.9419 (2.7127) grad_norm 1.6661 (2.4007) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [208/300][610/625] eta 0:00:07 lr 0.000296 wd 0.0500 time 0.4638 (0.4749) data time 0.0005 (0.0023) model time 0.4633 (0.4725) loss 2.5702 (2.7114) grad_norm 2.2055 (2.3938) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [208/300][620/625] eta 0:00:02 lr 0.000296 wd 0.0500 time 0.4620 (0.4747) data time 0.0005 (0.0023) model time 0.4615 (0.4723) loss 2.2467 (2.7092) grad_norm 1.1835 (2.3827) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 208 training takes 0:04:56 [2024-08-10 20:55:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:55:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 20:55:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5210 (0.5210) Acc@1 88.721 (88.721) Acc@5 98.682 (98.682) Mem 16707MB [2024-08-10 20:55:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.163) Loss 0.8164 (0.6301) Acc@1 81.006 (86.563) Acc@5 95.898 (97.652) Mem 16707MB [2024-08-10 20:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.120 (0.142) Loss 0.9443 (0.7479) Acc@1 77.930 (83.564) Acc@5 94.824 (96.591) Mem 16707MB [2024-08-10 20:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.339 Acc@5 96.589 [2024-08-10 20:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 20:55:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.857 (0.857) Loss 0.4749 (0.4749) Acc@1 89.355 (89.355) Acc@5 98.828 (98.828) Mem 16707MB [2024-08-10 20:55:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.197) Loss 0.7495 (0.5846) Acc@1 82.275 (87.300) Acc@5 96.875 (97.976) Mem 16707MB [2024-08-10 20:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.160) Loss 0.8364 (0.6867) Acc@1 80.078 (84.617) Acc@5 96.045 (97.061) Mem 16707MB [2024-08-10 20:56:00 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.335 Acc@5 97.045 [2024-08-10 20:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 20:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][0/625] eta 0:14:27 lr 0.000296 wd 0.0500 time 1.3884 (1.3884) data time 0.6643 (0.6643) model time 0.0000 (0.0000) loss 3.1191 (3.1191) grad_norm 1.8864 (1.8864) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][10/625] eta 0:05:41 lr 0.000296 wd 0.0500 time 0.4727 (0.5553) data time 0.0010 (0.0613) model time 0.0000 (0.0000) loss 2.7108 (2.8373) grad_norm 1.8706 (2.4487) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][20/625] eta 0:05:11 lr 0.000295 wd 0.0500 time 0.4742 (0.5155) data time 0.0008 (0.0326) model time 0.0000 (0.0000) loss 2.4538 (2.8168) grad_norm 1.5355 (2.6960) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][30/625] eta 0:04:58 lr 0.000295 wd 0.0500 time 0.4704 (0.5014) data time 0.0010 (0.0225) model time 0.0000 (0.0000) loss 2.2339 (2.7831) grad_norm 1.4700 (2.5033) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][40/625] eta 0:04:48 lr 0.000295 wd 0.0500 time 0.4688 (0.4936) data time 0.0010 (0.0172) model time 0.0000 (0.0000) loss 2.0402 (2.7402) grad_norm 1.5029 (2.3737) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][50/625] eta 0:04:41 lr 0.000295 wd 0.0500 time 0.4695 (0.4888) data time 0.0009 (0.0141) model time 0.0000 (0.0000) loss 1.7204 (2.7190) grad_norm 1.7344 (2.2436) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:30 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [209/300][60/625] eta 0:04:34 lr 0.000295 wd 0.0500 time 0.4698 (0.4856) data time 0.0011 (0.0119) model time 0.4688 (0.4682) loss 2.7624 (2.7339) grad_norm 2.6466 (2.5114) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][70/625] eta 0:04:29 lr 0.000295 wd 0.0500 time 0.4748 (0.4862) data time 0.0008 (0.0104) model time 0.4741 (0.4786) loss 3.3835 (2.7525) grad_norm 5.5922 (2.5702) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][80/625] eta 0:04:24 lr 0.000295 wd 0.0500 time 0.4701 (0.4844) data time 0.0009 (0.0094) model time 0.4692 (0.4757) loss 3.2557 (2.7304) grad_norm 2.5002 (2.5968) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][90/625] eta 0:04:18 lr 0.000295 wd 0.0500 time 0.4738 (0.4834) data time 0.0007 (0.0084) model time 0.4731 (0.4752) loss 3.1047 (2.7600) grad_norm 2.8333 (2.6234) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][100/625] eta 0:04:14 lr 0.000295 wd 0.0500 time 0.6476 (0.4840) data time 0.0007 (0.0077) model time 0.6469 (0.4778) loss 1.7551 (2.7504) grad_norm 1.6456 (2.6028) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][110/625] eta 0:04:08 lr 0.000295 wd 0.0500 time 0.4686 (0.4827) data time 0.0011 (0.0071) model time 0.4675 (0.4764) loss 2.8068 (2.7509) grad_norm 1.9288 (2.5489) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][120/625] eta 0:04:03 lr 0.000295 wd 0.0500 time 0.4744 (0.4818) data time 0.0007 (0.0066) model time 0.4737 (0.4756) loss 3.1760 (2.7432) grad_norm 1.9364 (2.5060) loss_scale 
128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][130/625] eta 0:03:58 lr 0.000294 wd 0.0500 time 0.4676 (0.4825) data time 0.0010 (0.0062) model time 0.4666 (0.4774) loss 2.8924 (2.7578) grad_norm 1.5330 (2.4614) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][140/625] eta 0:03:53 lr 0.000294 wd 0.0500 time 0.4691 (0.4814) data time 0.0008 (0.0058) model time 0.4682 (0.4761) loss 2.6971 (2.7694) grad_norm 2.0940 (2.4684) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][150/625] eta 0:03:48 lr 0.000294 wd 0.0500 time 0.4672 (0.4806) data time 0.0010 (0.0055) model time 0.4662 (0.4753) loss 2.9687 (2.7765) grad_norm 2.1557 (2.4434) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][160/625] eta 0:03:43 lr 0.000294 wd 0.0500 time 0.4792 (0.4802) data time 0.0007 (0.0052) model time 0.4785 (0.4751) loss 1.7470 (2.7466) grad_norm 1.3170 (2.4118) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][170/625] eta 0:03:38 lr 0.000294 wd 0.0500 time 0.4705 (0.4797) data time 0.0008 (0.0050) model time 0.4698 (0.4747) loss 2.6935 (2.7494) grad_norm 1.9035 (2.4224) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][180/625] eta 0:03:33 lr 0.000294 wd 0.0500 time 0.4657 (0.4792) data time 0.0007 (0.0048) model time 0.4649 (0.4743) loss 2.8826 (2.7466) grad_norm 3.1201 (2.4307) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][190/625] eta 0:03:28 lr 0.000294 wd 0.0500 time 0.4725 (0.4787) data time 0.0010 (0.0046) model time 
0.4714 (0.4739) loss 2.9831 (2.7462) grad_norm 3.1778 (2.4294) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][200/625] eta 0:03:23 lr 0.000294 wd 0.0500 time 0.4722 (0.4782) data time 0.0010 (0.0044) model time 0.4712 (0.4735) loss 3.0238 (2.7525) grad_norm 2.0715 (2.4222) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][210/625] eta 0:03:18 lr 0.000294 wd 0.0500 time 0.4681 (0.4778) data time 0.0008 (0.0043) model time 0.4673 (0.4731) loss 2.8252 (2.7519) grad_norm 2.1140 (2.4735) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][220/625] eta 0:03:13 lr 0.000294 wd 0.0500 time 0.4670 (0.4775) data time 0.0007 (0.0041) model time 0.4663 (0.4730) loss 1.9258 (2.7488) grad_norm 1.8735 (2.4654) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][230/625] eta 0:03:08 lr 0.000294 wd 0.0500 time 0.4695 (0.4773) data time 0.0007 (0.0040) model time 0.4688 (0.4729) loss 2.8655 (2.7545) grad_norm 1.8158 (2.4761) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:57:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][240/625] eta 0:03:03 lr 0.000293 wd 0.0500 time 0.4754 (0.4771) data time 0.0009 (0.0039) model time 0.4745 (0.4728) loss 2.5405 (2.7569) grad_norm 1.7686 (2.4506) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][250/625] eta 0:02:58 lr 0.000293 wd 0.0500 time 0.4683 (0.4769) data time 0.0007 (0.0037) model time 0.4676 (0.4728) loss 2.7012 (2.7568) grad_norm 2.3899 (2.4872) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][260/625] eta 0:02:54 lr 
0.000293 wd 0.0500 time 0.4644 (0.4771) data time 0.0009 (0.0036) model time 0.4635 (0.4732) loss 2.9129 (2.7597) grad_norm 2.3278 (2.4737) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][270/625] eta 0:02:49 lr 0.000293 wd 0.0500 time 0.4676 (0.4769) data time 0.0007 (0.0035) model time 0.4668 (0.4730) loss 3.0021 (2.7646) grad_norm 2.0625 (2.4847) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][280/625] eta 0:02:44 lr 0.000293 wd 0.0500 time 0.4766 (0.4766) data time 0.0007 (0.0035) model time 0.4758 (0.4728) loss 3.0274 (2.7699) grad_norm 2.5201 (2.5414) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][290/625] eta 0:02:39 lr 0.000293 wd 0.0500 time 0.4695 (0.4764) data time 0.0009 (0.0034) model time 0.4686 (0.4726) loss 2.6095 (2.7654) grad_norm 2.4112 (2.5243) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][300/625] eta 0:02:34 lr 0.000293 wd 0.0500 time 0.4718 (0.4762) data time 0.0008 (0.0033) model time 0.4710 (0.4726) loss 3.2834 (2.7555) grad_norm 2.1675 (2.5067) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][310/625] eta 0:02:29 lr 0.000293 wd 0.0500 time 0.4814 (0.4761) data time 0.0010 (0.0032) model time 0.4804 (0.4726) loss 2.6688 (2.7540) grad_norm 3.0776 (2.4979) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][320/625] eta 0:02:25 lr 0.000293 wd 0.0500 time 0.4723 (0.4761) data time 0.0010 (0.0032) model time 0.4713 (0.4726) loss 1.7137 (2.7497) grad_norm 2.6675 (2.4879) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:38 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [209/300][330/625] eta 0:02:20 lr 0.000293 wd 0.0500 time 0.4702 (0.4759) data time 0.0008 (0.0031) model time 0.4694 (0.4724) loss 3.0843 (2.7520) grad_norm 1.9498 (2.4741) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][340/625] eta 0:02:15 lr 0.000293 wd 0.0500 time 0.4658 (0.4756) data time 0.0010 (0.0030) model time 0.4648 (0.4722) loss 2.7627 (2.7536) grad_norm 5.6224 (2.5172) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][350/625] eta 0:02:10 lr 0.000292 wd 0.0500 time 0.4701 (0.4754) data time 0.0008 (0.0030) model time 0.4693 (0.4720) loss 3.1917 (2.7568) grad_norm 2.0318 (2.5257) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][360/625] eta 0:02:05 lr 0.000292 wd 0.0500 time 0.4707 (0.4752) data time 0.0009 (0.0029) model time 0.4698 (0.4718) loss 3.3204 (2.7588) grad_norm 1.8011 (2.5092) loss_scale 128.0000 (128.0000) mem 16707MB [2024-08-10 20:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][370/625] eta 0:02:01 lr 0.000292 wd 0.0500 time 0.4724 (0.4750) data time 0.0008 (0.0029) model time 0.4715 (0.4718) loss 3.3684 (2.7512) grad_norm 1.9098 (2.4959) loss_scale 256.0000 (130.4151) mem 16707MB [2024-08-10 20:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][380/625] eta 0:01:56 lr 0.000292 wd 0.0500 time 0.4752 (0.4750) data time 0.0008 (0.0028) model time 0.4744 (0.4718) loss 2.7983 (2.7495) grad_norm 1.8381 (2.4802) loss_scale 256.0000 (133.7113) mem 16707MB [2024-08-10 20:59:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][390/625] eta 0:01:51 lr 0.000292 wd 0.0500 time 0.4736 (0.4749) data time 0.0010 (0.0028) model time 0.4726 (0.4718) loss 2.8136 (2.7460) grad_norm 1.8685 (2.4749) loss_scale 
256.0000 (136.8389) mem 16707MB [2024-08-10 20:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][400/625] eta 0:01:46 lr 0.000292 wd 0.0500 time 0.4662 (0.4748) data time 0.0008 (0.0027) model time 0.4654 (0.4717) loss 2.6767 (2.7438) grad_norm 1.5392 (2.4598) loss_scale 256.0000 (139.8105) mem 16707MB [2024-08-10 20:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 20:59:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 20:59:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 21:03:20 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 21:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 21:03:33 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 21:03:44 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 21:03:44 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 21:03:47 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 21:03:49 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 21:03:49 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 209) [2024-08-10 21:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 21:04:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][410/625] eta 0:13:56 lr 0.000292 wd 0.0500 time 0.4391 (3.8907) data time 0.0007 (0.2053) model time 0.4384 (3.6854) loss 3.2171 (3.0271) grad_norm 1.3874 (1.9320) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][420/625] eta 0:05:55 lr 0.000292 wd 0.0500 time 0.4395 (1.7345) data time 0.0007 (0.0775) model time 0.4388 (1.6570) loss 2.8958 (2.9702) grad_norm 1.8802 (2.0143) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][430/625] eta 0:04:01 lr 0.000292 wd 0.0500 time 0.4413 (1.2367) data time 0.0007 (0.0480) model time 0.4407 (1.1887) loss 2.8331 (2.9523) grad_norm 1.5211 (2.2778) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][440/625] eta 0:03:09 lr 0.000292 wd 0.0500 time 0.4426 (1.0231) data time 0.0008 (0.0349) model time 0.4418 (0.9882) loss 2.9100 (2.9517) grad_norm 2.4565 (2.3438) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][450/625] eta 0:02:37 lr 0.000292 wd 0.0500 time 0.4415 (0.9009) data time 0.0008 (0.0275) model time 0.4407 (0.8734) loss 2.5038 (2.8903) grad_norm 1.8383 (2.3014) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][460/625] eta 
0:02:15 lr 0.000291 wd 0.0500 time 0.4424 (0.8190) data time 0.0006 (0.0228) model time 0.4418 (0.7962) loss 3.1790 (2.8739) grad_norm 2.2816 (2.3835) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][470/625] eta 0:01:58 lr 0.000291 wd 0.0500 time 0.4389 (0.7617) data time 0.0006 (0.0194) model time 0.4383 (0.7423) loss 2.6203 (2.8418) grad_norm 1.7835 (2.7061) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][480/625] eta 0:01:44 lr 0.000291 wd 0.0500 time 0.4423 (0.7197) data time 0.0009 (0.0170) model time 0.4415 (0.7027) loss 2.9621 (2.7996) grad_norm 1.8348 (2.6370) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][490/625] eta 0:01:32 lr 0.000291 wd 0.0500 time 0.4391 (0.6872) data time 0.0006 (0.0151) model time 0.4385 (0.6721) loss 2.2607 (2.7766) grad_norm 2.2069 (2.6138) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][500/625] eta 0:01:22 lr 0.000291 wd 0.0500 time 0.4422 (0.6615) data time 0.0006 (0.0136) model time 0.4416 (0.6479) loss 2.9238 (2.7749) grad_norm 1.9093 (2.6167) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][510/625] eta 0:01:13 lr 0.000291 wd 0.0500 time 0.4364 (0.6406) data time 0.0007 (0.0124) model time 0.4356 (0.6282) loss 2.9861 (2.7970) grad_norm 2.7968 (2.5806) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][520/625] eta 0:01:05 lr 0.000291 wd 0.0500 time 0.4443 (0.6235) data time 0.0006 (0.0114) model time 0.4437 (0.6121) loss 3.1795 (2.7900) grad_norm 1.2563 (2.5432) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:09 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][530/625] eta 0:00:57 lr 0.000291 wd 0.0500 time 0.4406 (0.6090) data time 0.0006 (0.0106) model time 0.4400 (0.5985) loss 1.7375 (2.7811) grad_norm 2.1149 (2.4955) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][540/625] eta 0:00:50 lr 0.000291 wd 0.0500 time 0.4440 (0.5968) data time 0.0008 (0.0098) model time 0.4432 (0.5869) loss 2.3068 (2.7915) grad_norm 2.4164 (2.4571) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][550/625] eta 0:00:43 lr 0.000291 wd 0.0500 time 0.4411 (0.5861) data time 0.0006 (0.0092) model time 0.4404 (0.5769) loss 2.1152 (2.7789) grad_norm 1.9223 (2.4407) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][560/625] eta 0:00:37 lr 0.000291 wd 0.0500 time 0.4438 (0.5769) data time 0.0009 (0.0087) model time 0.4429 (0.5682) loss 3.1374 (2.7813) grad_norm 2.0306 (2.4873) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][570/625] eta 0:00:31 lr 0.000290 wd 0.0500 time 0.4399 (0.5688) data time 0.0010 (0.0082) model time 0.4389 (0.5606) loss 2.9740 (2.7850) grad_norm 2.8659 (2.4635) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][580/625] eta 0:00:25 lr 0.000290 wd 0.0500 time 0.4469 (0.5616) data time 0.0007 (0.0078) model time 0.4462 (0.5538) loss 2.4688 (2.7693) grad_norm 1.6505 (2.4301) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][590/625] eta 0:00:19 lr 0.000290 wd 0.0500 time 0.4465 (0.5561) data time 0.0009 (0.0074) model time 0.4456 (0.5486) loss 2.3127 (2.7574) grad_norm 2.1029 
(2.4289) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][600/625] eta 0:00:13 lr 0.000290 wd 0.0500 time 0.4506 (0.5504) data time 0.0006 (0.0071) model time 0.4500 (0.5433) loss 2.3924 (2.7486) grad_norm 2.1649 (2.4045) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][610/625] eta 0:00:08 lr 0.000290 wd 0.0500 time 0.4385 (0.5453) data time 0.0004 (0.0068) model time 0.4381 (0.5385) loss 1.6907 (2.7350) grad_norm 1.5063 (2.4039) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [209/300][620/625] eta 0:00:02 lr 0.000290 wd 0.0500 time 0.4429 (0.5405) data time 0.0004 (0.0065) model time 0.4425 (0.5340) loss 2.0197 (2.7294) grad_norm 2.3346 (2.3859) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-10 21:05:50 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 209 training takes 0:01:58 [2024-08-10 21:05:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:05:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5127 (0.5127) Acc@1 89.160 (89.160) Acc@5 98.730 (98.730) Mem 16695MB [2024-08-10 21:05:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8408 (0.6275) Acc@1 79.590 (86.253) Acc@5 96.191 (97.745) Mem 16695MB [2024-08-10 21:05:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 0.9292 (0.7372) Acc@1 77.979 (83.477) Acc@5 94.629 (96.652) Mem 16695MB [2024-08-10 21:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.237 Acc@5 96.627 [2024-08-10 21:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.2% [2024-08-10 21:06:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.851 (0.851) Loss 0.4746 (0.4746) Acc@1 89.600 (89.600) Acc@5 98.828 (98.828) Mem 16695MB [2024-08-10 21:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.187) Loss 0.7500 (0.5846) Acc@1 82.178 (87.314) Acc@5 96.875 (97.989) Mem 16695MB [2024-08-10 21:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.153) Loss 0.8379 (0.6866) Acc@1 79.932 (84.621) Acc@5 96.045 (97.049) Mem 16695MB [2024-08-10 21:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.347 Acc@5 97.033 [2024-08-10 21:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.35% [2024-08-10 21:06:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 21:06:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 21:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][0/625] eta 0:11:22 lr 0.000290 wd 0.0500 time 1.0919 (1.0919) data time 0.3507 (0.3507) model time 0.0000 (0.0000) loss 2.9709 (2.9709) grad_norm 2.0136 (2.0136) loss_scale 256.0000 (256.0000) mem 16704MB [2024-08-10 21:06:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][10/625] eta 0:05:08 lr 0.000290 wd 0.0500 time 0.4404 (0.5008) data time 0.0006 (0.0326) model time 0.0000 (0.0000) loss 1.9226 (2.7063) grad_norm 1.8434 (1.9146) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][20/625] eta 0:04:46 lr 0.000290 wd 0.0500 time 0.4434 (0.4734) data time 0.0006 (0.0175) model time 0.0000 (0.0000) loss 2.9523 (2.6657) grad_norm 2.5738 (2.0514) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][30/625] eta 0:04:36 lr 0.000290 wd 0.0500 time 0.4442 (0.4640) data time 0.0006 (0.0121) model time 0.0000 (0.0000) loss 3.1562 (2.6116) grad_norm 5.7433 (2.1230) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][40/625] eta 0:04:28 lr 0.000290 wd 0.0500 time 0.4468 (0.4591) data time 0.0006 (0.0093) model time 0.0000 (0.0000) loss 2.7541 (2.5925) grad_norm 1.6050 (2.4045) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][50/625] eta 0:04:22 lr 0.000290 wd 0.0500 time 0.4428 (0.4561) data time 0.0006 (0.0076) model time 0.0000 (0.0000) loss 2.7783 (2.5811) grad_norm 2.3806 (2.4235) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][60/625] eta 0:04:16 lr 0.000289 wd 0.0500 time 0.4416 (0.4540) data time 0.0008 (0.0065) model time 0.4408 (0.4424) loss 2.8264 (2.6260) 
grad_norm 1.6546 (2.3692) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][70/625] eta 0:04:12 lr 0.000289 wd 0.0500 time 0.4451 (0.4557) data time 0.0006 (0.0057) model time 0.4445 (0.4538) loss 1.6216 (2.5940) grad_norm 3.2749 (2.3863) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][80/625] eta 0:04:07 lr 0.000289 wd 0.0500 time 0.4431 (0.4541) data time 0.0008 (0.0051) model time 0.4424 (0.4499) loss 2.6141 (2.5758) grad_norm 1.8018 (2.3301) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][90/625] eta 0:04:02 lr 0.000289 wd 0.0500 time 0.4415 (0.4529) data time 0.0008 (0.0047) model time 0.4407 (0.4480) loss 2.9047 (2.5748) grad_norm 1.6744 (2.3184) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][100/625] eta 0:03:57 lr 0.000289 wd 0.0500 time 0.4446 (0.4519) data time 0.0006 (0.0043) model time 0.4440 (0.4467) loss 3.2903 (2.6214) grad_norm 1.4052 (2.2742) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][110/625] eta 0:03:52 lr 0.000289 wd 0.0500 time 0.4473 (0.4512) data time 0.0006 (0.0040) model time 0.4467 (0.4462) loss 1.5456 (2.6255) grad_norm 1.9293 (2.2449) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][120/625] eta 0:03:47 lr 0.000289 wd 0.0500 time 0.4418 (0.4510) data time 0.0008 (0.0037) model time 0.4409 (0.4465) loss 3.3227 (2.6391) grad_norm 2.0451 (2.2282) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][130/625] eta 0:03:42 lr 0.000289 wd 0.0500 time 0.4422 (0.4503) data 
time 0.0006 (0.0035) model time 0.4416 (0.4458) loss 2.6795 (2.6450) grad_norm 1.6517 (2.2030) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][140/625] eta 0:03:38 lr 0.000289 wd 0.0500 time 0.4537 (0.4498) data time 0.0007 (0.0033) model time 0.4531 (0.4454) loss 2.5729 (2.6534) grad_norm 3.6390 (2.1976) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][150/625] eta 0:03:33 lr 0.000289 wd 0.0500 time 0.4474 (0.4494) data time 0.0009 (0.0031) model time 0.4465 (0.4452) loss 2.2015 (2.6514) grad_norm 2.0408 (2.2231) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][160/625] eta 0:03:28 lr 0.000289 wd 0.0500 time 0.4375 (0.4491) data time 0.0006 (0.0030) model time 0.4369 (0.4450) loss 1.6540 (2.6496) grad_norm 1.2700 (2.2018) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][170/625] eta 0:03:24 lr 0.000288 wd 0.0500 time 0.4450 (0.4487) data time 0.0006 (0.0028) model time 0.4444 (0.4447) loss 3.2908 (2.6483) grad_norm 2.1018 (2.1802) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][180/625] eta 0:03:19 lr 0.000288 wd 0.0500 time 0.4474 (0.4485) data time 0.0008 (0.0027) model time 0.4465 (0.4447) loss 2.7417 (2.6520) grad_norm 1.5750 (2.1772) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][190/625] eta 0:03:15 lr 0.000288 wd 0.0500 time 0.4494 (0.4483) data time 0.0009 (0.0026) model time 0.4485 (0.4446) loss 2.8561 (2.6627) grad_norm 2.3055 (2.2118) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[210/300][200/625] eta 0:03:10 lr 0.000288 wd 0.0500 time 0.4402 (0.4481) data time 0.0006 (0.0026) model time 0.4395 (0.4446) loss 2.8559 (2.6682) grad_norm 2.6175 (2.2198) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][210/625] eta 0:03:05 lr 0.000288 wd 0.0500 time 0.4459 (0.4479) data time 0.0006 (0.0025) model time 0.4453 (0.4445) loss 2.8675 (2.6834) grad_norm 1.5254 (2.2219) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][220/625] eta 0:03:01 lr 0.000288 wd 0.0500 time 0.4422 (0.4478) data time 0.0006 (0.0024) model time 0.4416 (0.4444) loss 3.2130 (2.6945) grad_norm 1.7370 (2.2344) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][230/625] eta 0:02:56 lr 0.000288 wd 0.0500 time 0.4381 (0.4476) data time 0.0007 (0.0023) model time 0.4375 (0.4444) loss 2.4664 (2.6932) grad_norm 2.3937 (2.2346) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][240/625] eta 0:02:52 lr 0.000288 wd 0.0500 time 0.4439 (0.4475) data time 0.0008 (0.0023) model time 0.4431 (0.4443) loss 2.6391 (2.6853) grad_norm 4.2153 (2.2452) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][250/625] eta 0:02:47 lr 0.000288 wd 0.0500 time 0.4417 (0.4473) data time 0.0009 (0.0022) model time 0.4408 (0.4443) loss 2.7947 (2.6731) grad_norm 2.5442 (2.2616) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][260/625] eta 0:02:43 lr 0.000288 wd 0.0500 time 0.4494 (0.4473) data time 0.0006 (0.0021) model time 0.4487 (0.4443) loss 2.8625 (2.6736) grad_norm 1.7648 (2.2751) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-10 21:08:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][270/625] eta 0:02:38 lr 0.000288 wd 0.0500 time 0.4431 (0.4472) data time 0.0008 (0.0021) model time 0.4423 (0.4443) loss 2.6637 (2.6847) grad_norm 1.8745 (2.2644) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][280/625] eta 0:02:34 lr 0.000287 wd 0.0500 time 0.4442 (0.4471) data time 0.0008 (0.0020) model time 0.4434 (0.4442) loss 3.0651 (2.6845) grad_norm 2.1650 (2.2552) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][290/625] eta 0:02:29 lr 0.000287 wd 0.0500 time 0.4475 (0.4470) data time 0.0007 (0.0020) model time 0.4468 (0.4442) loss 3.0385 (2.6908) grad_norm 2.1944 (2.2398) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][300/625] eta 0:02:25 lr 0.000287 wd 0.0500 time 0.4403 (0.4469) data time 0.0006 (0.0020) model time 0.4397 (0.4442) loss 2.2064 (2.6895) grad_norm 1.9575 (2.2412) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][310/625] eta 0:02:20 lr 0.000287 wd 0.0500 time 0.4448 (0.4468) data time 0.0007 (0.0019) model time 0.4440 (0.4442) loss 2.6077 (2.6848) grad_norm 2.3632 (2.2344) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][320/625] eta 0:02:16 lr 0.000287 wd 0.0500 time 0.4459 (0.4468) data time 0.0008 (0.0019) model time 0.4450 (0.4442) loss 3.1137 (2.6859) grad_norm 1.9092 (2.2246) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][330/625] eta 0:02:11 lr 0.000287 wd 0.0500 time 0.4448 (0.4467) data time 0.0006 (0.0019) model time 0.4442 (0.4442) loss 2.9238 
(2.6835) grad_norm 1.6091 (2.2117) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][340/625] eta 0:02:07 lr 0.000287 wd 0.0500 time 0.4392 (0.4470) data time 0.0008 (0.0018) model time 0.4384 (0.4446) loss 2.7427 (2.6865) grad_norm 2.0840 (2.2396) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][350/625] eta 0:02:02 lr 0.000287 wd 0.0500 time 0.4448 (0.4469) data time 0.0008 (0.0018) model time 0.4440 (0.4446) loss 2.9178 (2.6886) grad_norm 3.7080 (2.2400) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][360/625] eta 0:01:58 lr 0.000287 wd 0.0500 time 0.4426 (0.4468) data time 0.0006 (0.0018) model time 0.4420 (0.4445) loss 3.2603 (2.6906) grad_norm 2.0526 (2.2592) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][370/625] eta 0:01:53 lr 0.000287 wd 0.0500 time 0.4462 (0.4468) data time 0.0008 (0.0017) model time 0.4454 (0.4445) loss 2.9316 (2.6928) grad_norm 1.9535 (2.2529) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][380/625] eta 0:01:49 lr 0.000287 wd 0.0500 time 0.4416 (0.4467) data time 0.0008 (0.0017) model time 0.4408 (0.4444) loss 1.8776 (2.6923) grad_norm 2.1229 (2.2472) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][390/625] eta 0:01:44 lr 0.000286 wd 0.0500 time 0.4452 (0.4466) data time 0.0008 (0.0017) model time 0.4444 (0.4444) loss 2.6608 (2.6934) grad_norm 2.4437 (2.2377) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][400/625] eta 0:01:40 lr 0.000286 wd 0.0500 time 0.4384 
(0.4471) data time 0.0007 (0.0017) model time 0.4378 (0.4450) loss 2.8550 (2.6945) grad_norm 1.8418 (2.2275) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][410/625] eta 0:01:36 lr 0.000286 wd 0.0500 time 0.4470 (0.4470) data time 0.0008 (0.0016) model time 0.4462 (0.4449) loss 3.2711 (2.6963) grad_norm 2.4696 (2.2252) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][420/625] eta 0:01:31 lr 0.000286 wd 0.0500 time 0.4395 (0.4469) data time 0.0006 (0.0016) model time 0.4388 (0.4448) loss 2.0766 (2.6927) grad_norm 1.4031 (2.2161) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][430/625] eta 0:01:27 lr 0.000286 wd 0.0500 time 0.4456 (0.4468) data time 0.0006 (0.0016) model time 0.4450 (0.4447) loss 3.0453 (2.6906) grad_norm 6.4826 (2.2192) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][440/625] eta 0:01:22 lr 0.000286 wd 0.0500 time 0.4385 (0.4467) data time 0.0006 (0.0016) model time 0.4379 (0.4446) loss 1.5960 (2.6818) grad_norm 2.4829 (2.2259) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][450/625] eta 0:01:18 lr 0.000286 wd 0.0500 time 0.4447 (0.4466) data time 0.0006 (0.0016) model time 0.4441 (0.4445) loss 3.4982 (2.6898) grad_norm 2.2625 (2.2336) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][460/625] eta 0:01:13 lr 0.000286 wd 0.0500 time 0.4421 (0.4465) data time 0.0007 (0.0016) model time 0.4414 (0.4445) loss 3.0886 (2.6917) grad_norm 1.8373 (2.2300) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [210/300][470/625] eta 0:01:09 lr 0.000286 wd 0.0500 time 0.4459 (0.4465) data time 0.0008 (0.0015) model time 0.4452 (0.4445) loss 1.9719 (2.6874) grad_norm 2.6170 (2.2338) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][480/625] eta 0:01:04 lr 0.000286 wd 0.0500 time 0.4423 (0.4465) data time 0.0008 (0.0015) model time 0.4415 (0.4445) loss 3.4619 (2.6863) grad_norm 2.0591 (2.2367) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][490/625] eta 0:01:00 lr 0.000286 wd 0.0500 time 0.4523 (0.4468) data time 0.0006 (0.0015) model time 0.4517 (0.4449) loss 1.9464 (2.6857) grad_norm 2.0581 (2.2331) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][500/625] eta 0:00:55 lr 0.000285 wd 0.0500 time 0.4372 (0.4467) data time 0.0006 (0.0015) model time 0.4366 (0.4448) loss 2.7546 (2.6828) grad_norm 2.0131 (2.2277) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][510/625] eta 0:00:51 lr 0.000285 wd 0.0500 time 0.4391 (0.4466) data time 0.0007 (0.0015) model time 0.4383 (0.4448) loss 2.9211 (2.6857) grad_norm 1.6619 (2.2296) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 21:10:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:10:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:11:52 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 21:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 21:12:07 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 21:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 21:12:23 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-10 21:12:25 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 21:12:27 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 21:12:27 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 210) [2024-08-10 21:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 21:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][520/625] eta 0:18:03 lr 0.000285 wd 0.0500 time 1.2862 (10.3163) data time 0.0011 (0.3929) model time 1.2852 (9.9234) loss 3.2613 (3.4056) grad_norm 2.3926 (2.3460) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-10 21:12:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][530/625] eta 0:03:20 lr 0.000285 wd 0.0500 time 0.4651 (2.1103) data time 0.0008 (0.0664) model time 0.4643 (2.0439) loss 2.4578 (2.9491) grad_norm 2.9700 (2.6353) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][540/625] eta 0:01:55 lr 0.000285 wd 0.0500 time 0.4612 (1.3622) data time 0.0011 (0.0367) model time 0.4601 (1.3255) loss 2.8208 (2.9820) grad_norm 
2.7789 (2.6263) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][550/625] eta 0:01:21 lr 0.000285 wd 0.0500 time 0.4778 (1.0914) data time 0.0008 (0.0255) model time 0.4770 (1.0658) loss 2.8600 (2.9940) grad_norm 2.1092 (2.6058) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][560/625] eta 0:01:01 lr 0.000285 wd 0.0500 time 0.4701 (0.9473) data time 0.0010 (0.0197) model time 0.4691 (0.9276) loss 2.8776 (2.9528) grad_norm 2.3220 (2.6268) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][570/625] eta 0:00:47 lr 0.000285 wd 0.0500 time 0.4613 (0.8547) data time 0.0009 (0.0161) model time 0.4604 (0.8386) loss 2.9152 (2.9132) grad_norm 3.0103 (2.6665) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][580/625] eta 0:00:35 lr 0.000285 wd 0.0500 time 0.4713 (0.7925) data time 0.0007 (0.0137) model time 0.4705 (0.7788) loss 2.9315 (2.8637) grad_norm 1.9689 (2.5400) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][590/625] eta 0:00:26 lr 0.000285 wd 0.0500 time 0.4657 (0.7477) data time 0.0011 (0.0119) model time 0.4647 (0.7358) loss 2.5327 (2.8215) grad_norm 1.7228 (2.4612) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][600/625] eta 0:00:17 lr 0.000285 wd 0.0500 time 0.4668 (0.7137) data time 0.0010 (0.0106) model time 0.4658 (0.7030) loss 2.7604 (2.8153) grad_norm 1.7437 (2.3828) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][610/625] eta 0:00:10 lr 0.000284 wd 0.0500 time 0.4642 (0.6870) data time 
0.0006 (0.0096) model time 0.4636 (0.6773) loss 1.5230 (2.7853) grad_norm 2.3865 (2.3372) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [210/300][620/625] eta 0:00:03 lr 0.000284 wd 0.0500 time 0.4603 (0.6649) data time 0.0006 (0.0088) model time 0.4598 (0.6562) loss 3.1197 (2.8032) grad_norm 2.4220 (2.3604) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 21:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 210 training takes 0:01:09 [2024-08-10 21:13:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:13:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 21:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5479 (0.5479) Acc@1 87.744 (87.744) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-10 21:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.8403 (0.6280) Acc@1 80.615 (86.430) Acc@5 95.996 (97.692) Mem 16699MB [2024-08-10 21:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9238 (0.7486) Acc@1 79.053 (83.501) Acc@5 95.117 (96.519) Mem 16699MB [2024-08-10 21:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.269 Acc@5 96.507 [2024-08-10 21:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 21:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.862 (0.862) Loss 0.4749 (0.4749) Acc@1 89.648 (89.648) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-10 21:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.195) Loss 0.7505 (0.5851) Acc@1 82.324 (87.336) Acc@5 96.875 (97.989) Mem 16699MB [2024-08-10 21:13:57 
vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.8389 (0.6870) Acc@1 79.834 (84.624) Acc@5 96.045 (97.045) Mem 16699MB [2024-08-10 21:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.335 Acc@5 97.027 [2024-08-10 21:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.34% [2024-08-10 21:13:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 21:14:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 21:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][0/625] eta 0:10:40 lr 0.000284 wd 0.0500 time 1.0251 (1.0251) data time 0.4375 (0.4375) model time 0.0000 (0.0000) loss 2.5889 (2.5889) grad_norm 2.3045 (2.3045) loss_scale 256.0000 (256.0000) mem 16710MB [2024-08-10 21:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][10/625] eta 0:05:19 lr 0.000284 wd 0.0500 time 0.4726 (0.5195) data time 0.0010 (0.0409) model time 0.0000 (0.0000) loss 2.9410 (2.6923) grad_norm 1.7781 (2.1324) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][20/625] eta 0:05:00 lr 0.000284 wd 0.0500 time 0.4709 (0.4965) data time 0.0011 (0.0219) model time 0.0000 (0.0000) loss 3.0764 (2.6277) grad_norm 2.8427 (2.2547) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][30/625] eta 0:04:50 lr 0.000284 wd 0.0500 time 0.4666 (0.4883) data time 0.0008 (0.0152) model time 0.0000 (0.0000) loss 2.2276 (2.6610) grad_norm 2.6347 (2.2204) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:23 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][40/625] eta 0:04:43 lr 0.000284 wd 0.0500 time 0.4764 (0.4838) data time 0.0009 (0.0117) model time 0.0000 (0.0000) loss 2.7263 (2.6601) grad_norm 3.3912 (2.3149) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][50/625] eta 0:04:36 lr 0.000284 wd 0.0500 time 0.4603 (0.4802) data time 0.0011 (0.0096) model time 0.0000 (0.0000) loss 2.5128 (2.6875) grad_norm 6.0771 (2.4156) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][60/625] eta 0:04:30 lr 0.000284 wd 0.0500 time 0.4659 (0.4780) data time 0.0010 (0.0083) model time 0.4649 (0.4653) loss 2.8882 (2.6849) grad_norm 2.7815 (2.4028) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][70/625] eta 0:04:24 lr 0.000284 wd 0.0500 time 0.4678 (0.4763) data time 0.0011 (0.0072) model time 0.4667 (0.4653) loss 2.9296 (2.6739) grad_norm 2.0010 (2.4982) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][80/625] eta 0:04:19 lr 0.000284 wd 0.0500 time 0.4719 (0.4754) data time 0.0008 (0.0065) model time 0.4711 (0.4660) loss 2.5243 (2.6594) grad_norm 1.3944 (2.4292) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][90/625] eta 0:04:14 lr 0.000284 wd 0.0500 time 0.4780 (0.4765) data time 0.0008 (0.0059) model time 0.4772 (0.4707) loss 3.0137 (2.6598) grad_norm 2.3371 (2.4290) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][100/625] eta 0:04:09 lr 0.000283 wd 0.0500 time 0.4649 (0.4758) data time 0.0008 (0.0054) model time 0.4641 (0.4701) loss 2.7977 (2.6413) grad_norm 4.5911 
(2.3974) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][110/625] eta 0:04:04 lr 0.000283 wd 0.0500 time 0.4680 (0.4751) data time 0.0008 (0.0050) model time 0.4672 (0.4696) loss 3.2694 (2.6428) grad_norm 2.3133 (2.3813) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][120/625] eta 0:04:00 lr 0.000283 wd 0.0500 time 0.4639 (0.4763) data time 0.0012 (0.0047) model time 0.4627 (0.4723) loss 2.5830 (2.6536) grad_norm 1.9690 (2.3809) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][130/625] eta 0:03:55 lr 0.000283 wd 0.0500 time 0.4700 (0.4757) data time 0.0008 (0.0044) model time 0.4692 (0.4717) loss 2.9949 (2.6471) grad_norm 3.5482 (2.3875) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][140/625] eta 0:03:50 lr 0.000283 wd 0.0500 time 0.4638 (0.4750) data time 0.0012 (0.0042) model time 0.4626 (0.4710) loss 2.8315 (2.6430) grad_norm 4.9847 (2.4153) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][150/625] eta 0:03:45 lr 0.000283 wd 0.0500 time 0.4727 (0.4746) data time 0.0009 (0.0040) model time 0.4719 (0.4707) loss 2.0913 (2.6287) grad_norm 2.1086 (2.3948) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][160/625] eta 0:03:40 lr 0.000283 wd 0.0500 time 0.4646 (0.4742) data time 0.0007 (0.0038) model time 0.4639 (0.4703) loss 1.8286 (2.6301) grad_norm 1.8135 (2.3900) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][170/625] eta 0:03:35 lr 0.000283 wd 0.0500 time 0.4673 (0.4738) data time 0.0009 
(0.0036) model time 0.4663 (0.4701) loss 2.8534 (2.6475) grad_norm 2.0606 (2.3716) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][180/625] eta 0:03:30 lr 0.000283 wd 0.0500 time 0.4656 (0.4735) data time 0.0007 (0.0035) model time 0.4649 (0.4697) loss 3.3369 (2.6481) grad_norm 2.0738 (2.3465) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][190/625] eta 0:03:25 lr 0.000283 wd 0.0500 time 0.4689 (0.4732) data time 0.0010 (0.0034) model time 0.4679 (0.4696) loss 1.5177 (2.6345) grad_norm 2.7164 (2.3604) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][200/625] eta 0:03:20 lr 0.000283 wd 0.0500 time 0.4625 (0.4729) data time 0.0008 (0.0032) model time 0.4617 (0.4693) loss 3.0749 (2.6295) grad_norm 2.0000 (2.3530) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][210/625] eta 0:03:16 lr 0.000282 wd 0.0500 time 0.4682 (0.4726) data time 0.0011 (0.0031) model time 0.4672 (0.4691) loss 2.9144 (2.6433) grad_norm 2.3035 (2.3577) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][220/625] eta 0:03:11 lr 0.000282 wd 0.0500 time 0.4670 (0.4724) data time 0.0008 (0.0030) model time 0.4662 (0.4690) loss 2.2692 (2.6557) grad_norm 1.7351 (2.3504) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][230/625] eta 0:03:06 lr 0.000282 wd 0.0500 time 0.4808 (0.4723) data time 0.0011 (0.0030) model time 0.4797 (0.4690) loss 2.1193 (2.6521) grad_norm 1.4796 (2.3253) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:15:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][240/625] 
eta 0:03:01 lr 0.000282 wd 0.0500 time 0.4753 (0.4723) data time 0.0008 (0.0029) model time 0.4745 (0.4691) loss 2.3886 (2.6585) grad_norm 2.1481 (2.3188) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][250/625] eta 0:02:57 lr 0.000282 wd 0.0500 time 0.4682 (0.4722) data time 0.0010 (0.0028) model time 0.4672 (0.4691) loss 2.3071 (2.6591) grad_norm 11.5493 (2.3792) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][260/625] eta 0:02:52 lr 0.000282 wd 0.0500 time 0.4681 (0.4720) data time 0.0008 (0.0027) model time 0.4673 (0.4689) loss 2.0078 (2.6606) grad_norm 2.9801 (2.3824) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][270/625] eta 0:02:47 lr 0.000282 wd 0.0500 time 0.4677 (0.4718) data time 0.0010 (0.0027) model time 0.4666 (0.4688) loss 2.6729 (2.6592) grad_norm 1.6996 (2.3772) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][280/625] eta 0:02:42 lr 0.000282 wd 0.0500 time 0.4598 (0.4718) data time 0.0009 (0.0026) model time 0.4589 (0.4689) loss 3.1233 (2.6540) grad_norm 2.1838 (2.3751) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][290/625] eta 0:02:38 lr 0.000282 wd 0.0500 time 0.4670 (0.4717) data time 0.0008 (0.0026) model time 0.4662 (0.4688) loss 2.6984 (2.6584) grad_norm 2.0425 (2.3565) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][300/625] eta 0:02:33 lr 0.000282 wd 0.0500 time 0.4678 (0.4715) data time 0.0010 (0.0025) model time 0.4668 (0.4687) loss 2.7809 (2.6650) grad_norm 1.6312 (2.3461) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:30 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][310/625] eta 0:02:28 lr 0.000282 wd 0.0500 time 0.4685 (0.4714) data time 0.0008 (0.0025) model time 0.4677 (0.4686) loss 2.0387 (2.6689) grad_norm 1.8112 (2.3274) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][320/625] eta 0:02:23 lr 0.000281 wd 0.0500 time 0.4714 (0.4713) data time 0.0011 (0.0024) model time 0.4703 (0.4685) loss 2.8018 (2.6718) grad_norm 2.3871 (2.3342) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][330/625] eta 0:02:18 lr 0.000281 wd 0.0500 time 0.4648 (0.4712) data time 0.0010 (0.0024) model time 0.4639 (0.4685) loss 2.9215 (2.6790) grad_norm 1.8229 (2.3315) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][340/625] eta 0:02:14 lr 0.000281 wd 0.0500 time 0.4689 (0.4711) data time 0.0011 (0.0023) model time 0.4678 (0.4684) loss 2.1681 (2.6792) grad_norm 2.8302 (2.3420) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][350/625] eta 0:02:09 lr 0.000281 wd 0.0500 time 0.4669 (0.4709) data time 0.0011 (0.0023) model time 0.4658 (0.4683) loss 1.9187 (2.6760) grad_norm 2.0168 (2.4459) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][360/625] eta 0:02:04 lr 0.000281 wd 0.0500 time 0.4667 (0.4708) data time 0.0011 (0.0023) model time 0.4656 (0.4682) loss 2.6958 (2.6758) grad_norm 2.6603 (2.4402) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:16:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][370/625] eta 0:02:00 lr 0.000281 wd 0.0500 time 0.4644 (0.4707) data time 0.0008 (0.0022) model time 0.4637 (0.4682) loss 3.4060 (2.6762) grad_norm 20.8883 
(2.4787) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][380/625] eta 0:01:55 lr 0.000281 wd 0.0500 time 0.4640 (0.4706) data time 0.0010 (0.0022) model time 0.4630 (0.4681) loss 2.8830 (2.6799) grad_norm 1.8649 (2.4601) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][390/625] eta 0:01:50 lr 0.000281 wd 0.0500 time 0.4711 (0.4706) data time 0.0008 (0.0022) model time 0.4703 (0.4681) loss 2.7354 (2.6804) grad_norm 3.7390 (2.4612) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][400/625] eta 0:01:45 lr 0.000281 wd 0.0500 time 0.4671 (0.4705) data time 0.0010 (0.0021) model time 0.4660 (0.4680) loss 2.5539 (2.6812) grad_norm 1.7458 (2.4470) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][410/625] eta 0:01:41 lr 0.000281 wd 0.0500 time 0.4656 (0.4704) data time 0.0011 (0.0021) model time 0.4645 (0.4680) loss 2.6438 (2.6893) grad_norm 3.3969 (2.4431) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][420/625] eta 0:01:36 lr 0.000281 wd 0.0500 time 0.4629 (0.4703) data time 0.0009 (0.0021) model time 0.4620 (0.4679) loss 3.2030 (2.6850) grad_norm 2.0520 (2.4469) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][430/625] eta 0:01:31 lr 0.000281 wd 0.0500 time 0.4648 (0.4702) data time 0.0008 (0.0021) model time 0.4640 (0.4678) loss 1.6558 (2.6791) grad_norm 2.2231 (2.4321) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][440/625] eta 0:01:26 lr 0.000280 wd 0.0500 time 0.4699 (0.4702) data time 0.0007 
(0.0020) model time 0.4692 (0.4678) loss 3.0231 (2.6812) grad_norm 1.7727 (2.4300) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][450/625] eta 0:01:22 lr 0.000280 wd 0.0500 time 0.4628 (0.4706) data time 0.0008 (0.0020) model time 0.4621 (0.4683) loss 1.7005 (2.6852) grad_norm 1.6771 (2.4266) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][460/625] eta 0:01:17 lr 0.000280 wd 0.0500 time 0.4639 (0.4705) data time 0.0011 (0.0020) model time 0.4628 (0.4683) loss 3.0421 (2.6870) grad_norm 2.7785 (2.4214) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][470/625] eta 0:01:12 lr 0.000280 wd 0.0500 time 0.4657 (0.4705) data time 0.0009 (0.0020) model time 0.4649 (0.4682) loss 3.1669 (2.6889) grad_norm 2.6923 (2.4146) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][480/625] eta 0:01:08 lr 0.000280 wd 0.0500 time 0.4648 (0.4704) data time 0.0008 (0.0020) model time 0.4640 (0.4682) loss 2.7474 (2.6908) grad_norm 1.8623 (2.4033) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][490/625] eta 0:01:03 lr 0.000280 wd 0.0500 time 0.4641 (0.4703) data time 0.0010 (0.0019) model time 0.4631 (0.4681) loss 3.2713 (2.6926) grad_norm 1.5831 (2.4107) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][500/625] eta 0:00:58 lr 0.000280 wd 0.0500 time 0.4641 (0.4705) data time 0.0010 (0.0019) model time 0.4631 (0.4684) loss 3.1643 (2.6899) grad_norm 1.4172 (2.4112) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][510/625] 
eta 0:00:54 lr 0.000280 wd 0.0500 time 0.4648 (0.4704) data time 0.0011 (0.0019) model time 0.4637 (0.4683) loss 2.7564 (2.6917) grad_norm 6.9667 (2.4320) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][520/625] eta 0:00:49 lr 0.000280 wd 0.0500 time 0.4657 (0.4703) data time 0.0008 (0.0019) model time 0.4650 (0.4682) loss 2.6896 (2.6959) grad_norm 3.6565 (2.4445) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][530/625] eta 0:00:44 lr 0.000280 wd 0.0500 time 0.4650 (0.4702) data time 0.0007 (0.0019) model time 0.4643 (0.4681) loss 2.5404 (2.6965) grad_norm 2.2568 (2.4392) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][540/625] eta 0:00:39 lr 0.000280 wd 0.0500 time 0.4678 (0.4702) data time 0.0008 (0.0018) model time 0.4671 (0.4682) loss 2.9661 (2.6944) grad_norm 2.1760 (2.4283) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][550/625] eta 0:00:35 lr 0.000279 wd 0.0500 time 0.4640 (0.4702) data time 0.0012 (0.0018) model time 0.4628 (0.4681) loss 2.1535 (2.6935) grad_norm 1.4069 (2.4174) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][560/625] eta 0:00:30 lr 0.000279 wd 0.0500 time 0.4598 (0.4701) data time 0.0010 (0.0018) model time 0.4587 (0.4681) loss 3.0802 (2.6926) grad_norm 2.8429 (2.4240) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][570/625] eta 0:00:25 lr 0.000279 wd 0.0500 time 0.4676 (0.4700) data time 0.0008 (0.0018) model time 0.4669 (0.4680) loss 2.6800 (2.6957) grad_norm 1.6026 (2.4189) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:36 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][580/625] eta 0:00:21 lr 0.000279 wd 0.0500 time 0.4620 (0.4700) data time 0.0008 (0.0018) model time 0.4613 (0.4679) loss 3.0390 (2.6980) grad_norm 1.8556 (2.4274) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][590/625] eta 0:00:16 lr 0.000279 wd 0.0500 time 0.4662 (0.4699) data time 0.0010 (0.0018) model time 0.4652 (0.4679) loss 1.6901 (2.6946) grad_norm 2.0520 (2.4426) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][600/625] eta 0:00:11 lr 0.000279 wd 0.0500 time 0.4682 (0.4698) data time 0.0007 (0.0018) model time 0.4674 (0.4678) loss 2.1357 (2.6949) grad_norm 2.1217 (2.4384) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][610/625] eta 0:00:07 lr 0.000279 wd 0.0500 time 0.4645 (0.4698) data time 0.0008 (0.0018) model time 0.4637 (0.4678) loss 1.7871 (2.6902) grad_norm 2.1937 (2.4371) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [211/300][620/625] eta 0:00:02 lr 0.000279 wd 0.0500 time 0.4659 (0.4698) data time 0.0006 (0.0017) model time 0.4653 (0.4678) loss 2.0816 (2.6893) grad_norm 3.4102 (2.4519) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:18:57 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 211 training takes 0:04:53 [2024-08-10 21:18:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:18:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:18:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.514 (0.514) Loss 0.5156 (0.5156) Acc@1 88.525 (88.525) Acc@5 98.877 (98.877) Mem 16706MB [2024-08-10 21:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.161) Loss 0.8408 (0.6272) Acc@1 80.469 (86.470) Acc@5 95.850 (97.701) Mem 16706MB [2024-08-10 21:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.9058 (0.7423) Acc@1 78.857 (83.601) Acc@5 95.020 (96.556) Mem 16706MB [2024-08-10 21:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.315 Acc@5 96.555 [2024-08-10 21:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 21:19:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.830 (0.830) Loss 0.4771 (0.4771) Acc@1 89.502 (89.502) Acc@5 98.877 (98.877) Mem 16706MB [2024-08-10 21:19:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.192) Loss 0.7534 (0.5857) Acc@1 82.031 (87.278) Acc@5 96.924 (97.989) Mem 16706MB [2024-08-10 21:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.157) Loss 0.8403 (0.6878) Acc@1 80.029 (84.608) Acc@5 95.947 (97.021) Mem 16706MB [2024-08-10 21:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.317 Acc@5 97.005 [2024-08-10 21:19:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:19:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][0/625] eta 0:13:09 lr 0.000279 wd 0.0500 time 1.2637 (1.2637) data time 0.7567 (0.7567) model time 0.0000 (0.0000) loss 2.8603 (2.8603) grad_norm 1.8864 (1.8864) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][10/625] eta 0:05:31 lr 0.000279 wd 0.0500 time 0.4662 (0.5396) data 
time 0.0011 (0.0700) model time 0.0000 (0.0000) loss 2.0396 (2.9297) grad_norm 2.7041 (2.5806) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][20/625] eta 0:05:12 lr 0.000279 wd 0.0500 time 0.6854 (0.5157) data time 0.0008 (0.0371) model time 0.0000 (0.0000) loss 2.9398 (2.7790) grad_norm 2.2956 (2.7404) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][30/625] eta 0:04:57 lr 0.000279 wd 0.0500 time 0.4606 (0.5001) data time 0.0008 (0.0255) model time 0.0000 (0.0000) loss 2.9492 (2.7399) grad_norm 7.3788 (2.9135) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][40/625] eta 0:04:48 lr 0.000278 wd 0.0500 time 0.4659 (0.4924) data time 0.0007 (0.0195) model time 0.0000 (0.0000) loss 2.8459 (2.7587) grad_norm 15.4362 (3.0944) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][50/625] eta 0:04:40 lr 0.000278 wd 0.0500 time 0.4687 (0.4877) data time 0.0008 (0.0159) model time 0.0000 (0.0000) loss 2.4064 (2.7446) grad_norm 1.8366 (3.6965) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][60/625] eta 0:04:33 lr 0.000278 wd 0.0500 time 0.4660 (0.4845) data time 0.0011 (0.0135) model time 0.4649 (0.4672) loss 3.1985 (2.7356) grad_norm 1.5019 (3.3738) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][70/625] eta 0:04:27 lr 0.000278 wd 0.0500 time 0.4694 (0.4821) data time 0.0008 (0.0118) model time 0.4686 (0.4668) loss 2.1080 (2.7255) grad_norm 3.9836 (3.3654) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[212/300][80/625] eta 0:04:21 lr 0.000278 wd 0.0500 time 0.4683 (0.4801) data time 0.0010 (0.0104) model time 0.4674 (0.4661) loss 2.8125 (2.6684) grad_norm 2.2308 (3.2982) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][90/625] eta 0:04:16 lr 0.000278 wd 0.0500 time 0.4682 (0.4787) data time 0.0008 (0.0094) model time 0.4674 (0.4661) loss 2.1365 (2.6679) grad_norm 1.2612 (3.1562) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][100/625] eta 0:04:10 lr 0.000278 wd 0.0500 time 0.4634 (0.4775) data time 0.0008 (0.0086) model time 0.4627 (0.4661) loss 2.6099 (2.6536) grad_norm 2.0955 (3.0809) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:19:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][110/625] eta 0:04:05 lr 0.000278 wd 0.0500 time 0.4658 (0.4767) data time 0.0011 (0.0079) model time 0.4647 (0.4663) loss 2.2337 (2.6485) grad_norm 1.8876 (2.9807) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][120/625] eta 0:04:00 lr 0.000278 wd 0.0500 time 0.4704 (0.4760) data time 0.0008 (0.0073) model time 0.4696 (0.4664) loss 2.7062 (2.6442) grad_norm 2.0163 (2.9126) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][130/625] eta 0:03:55 lr 0.000278 wd 0.0500 time 0.4723 (0.4756) data time 0.0011 (0.0068) model time 0.4712 (0.4667) loss 2.6044 (2.6505) grad_norm 1.6589 (2.8464) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][140/625] eta 0:03:50 lr 0.000278 wd 0.0500 time 0.4676 (0.4751) data time 0.0009 (0.0064) model time 0.4667 (0.4670) loss 1.5682 (2.6470) grad_norm 2.0951 (2.7933) loss_scale 256.0000 (256.0000) mem 16706MB 
[2024-08-10 21:20:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][150/625] eta 0:03:45 lr 0.000277 wd 0.0500 time 0.4663 (0.4748) data time 0.0008 (0.0061) model time 0.4655 (0.4671) loss 1.5725 (2.6399) grad_norm 2.3465 (2.7400) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][160/625] eta 0:03:40 lr 0.000277 wd 0.0500 time 0.4710 (0.4744) data time 0.0009 (0.0058) model time 0.4701 (0.4672) loss 2.8169 (2.6420) grad_norm 2.0955 (2.7241) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][170/625] eta 0:03:35 lr 0.000277 wd 0.0500 time 0.4707 (0.4741) data time 0.0008 (0.0055) model time 0.4700 (0.4673) loss 3.1361 (2.6437) grad_norm 2.4492 (2.7067) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][180/625] eta 0:03:30 lr 0.000277 wd 0.0500 time 0.4639 (0.4738) data time 0.0009 (0.0052) model time 0.4630 (0.4673) loss 1.9005 (2.6425) grad_norm 1.9666 (2.6716) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][190/625] eta 0:03:25 lr 0.000277 wd 0.0500 time 0.4675 (0.4736) data time 0.0011 (0.0050) model time 0.4664 (0.4673) loss 2.8238 (2.6476) grad_norm 1.5543 (2.6598) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][200/625] eta 0:03:21 lr 0.000277 wd 0.0500 time 0.4707 (0.4732) data time 0.0008 (0.0048) model time 0.4699 (0.4672) loss 3.2813 (2.6652) grad_norm 2.0592 (2.6498) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][210/625] eta 0:03:16 lr 0.000277 wd 0.0500 time 0.6371 (0.4737) data time 0.0011 (0.0046) model time 0.6360 (0.4682) loss 1.8299 
(2.6583) grad_norm 1.9035 (2.6720) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][220/625] eta 0:03:11 lr 0.000277 wd 0.0500 time 0.4729 (0.4734) data time 0.0010 (0.0045) model time 0.4718 (0.4680) loss 3.0337 (2.6528) grad_norm 3.3946 (2.6435) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:20:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][230/625] eta 0:03:06 lr 0.000277 wd 0.0500 time 0.4666 (0.4733) data time 0.0010 (0.0043) model time 0.4655 (0.4682) loss 2.4212 (2.6545) grad_norm 1.8975 (2.6119) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][240/625] eta 0:03:02 lr 0.000277 wd 0.0500 time 0.4660 (0.4730) data time 0.0008 (0.0042) model time 0.4652 (0.4680) loss 3.0092 (2.6581) grad_norm 1.5784 (2.5806) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][250/625] eta 0:02:57 lr 0.000277 wd 0.0500 time 0.4643 (0.4727) data time 0.0008 (0.0041) model time 0.4634 (0.4678) loss 3.3647 (2.6572) grad_norm 5.2960 (2.5742) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][260/625] eta 0:02:52 lr 0.000276 wd 0.0500 time 0.4629 (0.4724) data time 0.0010 (0.0040) model time 0.4619 (0.4676) loss 2.6204 (2.6552) grad_norm 1.4359 (2.5658) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][270/625] eta 0:02:47 lr 0.000276 wd 0.0500 time 0.4705 (0.4722) data time 0.0010 (0.0038) model time 0.4695 (0.4675) loss 2.6454 (2.6547) grad_norm 1.5808 (2.5411) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][280/625] eta 0:02:42 lr 0.000276 wd 0.0500 time 0.4720 
(0.4720) data time 0.0009 (0.0038) model time 0.4711 (0.4675) loss 2.9965 (2.6638) grad_norm 2.1032 (2.5995) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][290/625] eta 0:02:38 lr 0.000276 wd 0.0500 time 0.4636 (0.4719) data time 0.0010 (0.0037) model time 0.4626 (0.4675) loss 2.8328 (2.6632) grad_norm 1.8191 (2.5861) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][300/625] eta 0:02:33 lr 0.000276 wd 0.0500 time 0.4667 (0.4717) data time 0.0008 (0.0036) model time 0.4658 (0.4674) loss 3.0927 (2.6574) grad_norm 1.9497 (2.5886) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][310/625] eta 0:02:28 lr 0.000276 wd 0.0500 time 0.4651 (0.4715) data time 0.0008 (0.0035) model time 0.4643 (0.4673) loss 2.8160 (2.6655) grad_norm 1.6291 (2.5716) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][320/625] eta 0:02:23 lr 0.000276 wd 0.0500 time 0.4686 (0.4713) data time 0.0010 (0.0034) model time 0.4676 (0.4672) loss 3.1509 (2.6671) grad_norm 2.5140 (2.5621) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][330/625] eta 0:02:19 lr 0.000276 wd 0.0500 time 0.4694 (0.4712) data time 0.0009 (0.0033) model time 0.4685 (0.4672) loss 3.0225 (2.6691) grad_norm 1.5124 (2.5479) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][340/625] eta 0:02:14 lr 0.000276 wd 0.0500 time 0.4696 (0.4711) data time 0.0012 (0.0033) model time 0.4683 (0.4672) loss 1.7568 (2.6622) grad_norm 2.0741 (2.5267) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [212/300][350/625] eta 0:02:09 lr 0.000276 wd 0.0500 time 0.4623 (0.4710) data time 0.0011 (0.0032) model time 0.4611 (0.4671) loss 2.8766 (2.6640) grad_norm 2.1826 (2.5213) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:21:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][360/625] eta 0:02:04 lr 0.000276 wd 0.0500 time 0.4692 (0.4709) data time 0.0008 (0.0032) model time 0.4683 (0.4671) loss 2.2068 (2.6641) grad_norm 1.4654 (2.5015) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][370/625] eta 0:02:00 lr 0.000275 wd 0.0500 time 0.4641 (0.4708) data time 0.0010 (0.0031) model time 0.4630 (0.4670) loss 2.3460 (2.6627) grad_norm 32.7128 (2.5716) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][380/625] eta 0:01:55 lr 0.000275 wd 0.0500 time 0.4642 (0.4711) data time 0.0010 (0.0031) model time 0.4632 (0.4674) loss 2.8973 (2.6663) grad_norm 1.7706 (2.5591) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][390/625] eta 0:01:50 lr 0.000275 wd 0.0500 time 0.4681 (0.4710) data time 0.0008 (0.0030) model time 0.4673 (0.4674) loss 2.6934 (2.6689) grad_norm 1.6095 (2.5401) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][400/625] eta 0:01:45 lr 0.000275 wd 0.0500 time 0.4738 (0.4709) data time 0.0008 (0.0030) model time 0.4730 (0.4673) loss 1.5767 (2.6718) grad_norm 1.9826 (2.5484) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][410/625] eta 0:01:41 lr 0.000275 wd 0.0500 time 0.4642 (0.4708) data time 0.0011 (0.0029) model time 0.4631 (0.4673) loss 3.2534 (2.6723) grad_norm 1.9191 (2.5431) loss_scale 256.0000 (256.0000) mem 
16706MB [2024-08-10 21:22:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][420/625] eta 0:01:36 lr 0.000275 wd 0.0500 time 0.4651 (0.4707) data time 0.0011 (0.0029) model time 0.4641 (0.4673) loss 3.2922 (2.6738) grad_norm 2.9968 (2.5270) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][430/625] eta 0:01:31 lr 0.000275 wd 0.0500 time 0.7060 (0.4712) data time 0.0008 (0.0028) model time 0.7052 (0.4679) loss 2.4529 (2.6741) grad_norm 1.6768 (2.5163) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][440/625] eta 0:01:27 lr 0.000275 wd 0.0500 time 0.4658 (0.4711) data time 0.0008 (0.0028) model time 0.4650 (0.4678) loss 2.4507 (2.6731) grad_norm 2.7393 (2.5062) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][450/625] eta 0:01:22 lr 0.000275 wd 0.0500 time 0.4663 (0.4710) data time 0.0007 (0.0027) model time 0.4655 (0.4678) loss 2.7933 (2.6786) grad_norm 2.2266 (2.4958) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][460/625] eta 0:01:17 lr 0.000275 wd 0.0500 time 0.4685 (0.4709) data time 0.0010 (0.0027) model time 0.4675 (0.4677) loss 2.4419 (2.6833) grad_norm 2.4527 (2.4932) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][470/625] eta 0:01:12 lr 0.000275 wd 0.0500 time 0.4623 (0.4708) data time 0.0010 (0.0027) model time 0.4613 (0.4677) loss 1.8964 (2.6837) grad_norm 1.6028 (2.4832) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][480/625] eta 0:01:08 lr 0.000275 wd 0.0500 time 0.4650 (0.4707) data time 0.0008 (0.0026) model time 0.4642 (0.4676) loss 
3.4296 (2.6870) grad_norm 1.9420 (2.4682) loss_scale 256.0000 (256.0000) mem 16706MB [2024-08-10 21:22:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][490/625] eta 0:01:03 lr 0.000274 wd 0.0500 time 0.4713 (0.4706) data time 0.0010 (0.0026) model time 0.4704 (0.4676) loss 3.1933 (2.6858) grad_norm 1.8578 (2.4536) loss_scale 512.0000 (257.0428) mem 16706MB [2024-08-10 21:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][500/625] eta 0:00:58 lr 0.000274 wd 0.0500 time 0.4652 (0.4706) data time 0.0008 (0.0026) model time 0.4645 (0.4676) loss 1.9730 (2.6826) grad_norm 1.9781 (2.4494) loss_scale 512.0000 (262.1317) mem 16706MB [2024-08-10 21:23:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][510/625] eta 0:00:54 lr 0.000274 wd 0.0500 time 0.4595 (0.4705) data time 0.0008 (0.0025) model time 0.4587 (0.4675) loss 2.7267 (2.6849) grad_norm 4.5486 (2.4426) loss_scale 512.0000 (267.0215) mem 16706MB [2024-08-10 21:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][520/625] eta 0:00:49 lr 0.000274 wd 0.0500 time 0.4665 (0.4705) data time 0.0010 (0.0025) model time 0.4655 (0.4675) loss 2.9221 (2.6797) grad_norm 3.5560 (2.4409) loss_scale 512.0000 (271.7236) mem 16706MB [2024-08-10 21:23:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][530/625] eta 0:00:44 lr 0.000274 wd 0.0500 time 0.4697 (0.4704) data time 0.0008 (0.0025) model time 0.4688 (0.4675) loss 2.4896 (2.6804) grad_norm 1.8626 (2.4347) loss_scale 512.0000 (276.2486) mem 16706MB [2024-08-10 21:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][540/625] eta 0:00:39 lr 0.000274 wd 0.0500 time 0.4636 (0.4703) data time 0.0011 (0.0025) model time 0.4625 (0.4675) loss 2.3971 (2.6818) grad_norm 2.3992 (2.4295) loss_scale 512.0000 (280.6063) mem 16706MB [2024-08-10 21:23:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][550/625] eta 0:00:35 lr 0.000274 wd 0.0500 time 
0.4693 (0.4703) data time 0.0010 (0.0024) model time 0.4683 (0.4675) loss 2.9122 (2.6875) grad_norm 1.7768 (2.4263) loss_scale 512.0000 (284.8058) mem 16706MB [2024-08-10 21:23:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][560/625] eta 0:00:30 lr 0.000274 wd 0.0500 time 0.4708 (0.4703) data time 0.0011 (0.0024) model time 0.4696 (0.4675) loss 2.1265 (2.6885) grad_norm 2.0743 (2.4557) loss_scale 512.0000 (288.8556) mem 16706MB [2024-08-10 21:23:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][570/625] eta 0:00:25 lr 0.000274 wd 0.0500 time 0.4651 (0.4702) data time 0.0008 (0.0024) model time 0.4643 (0.4675) loss 2.5179 (2.6893) grad_norm 1.6654 (2.4558) loss_scale 512.0000 (292.7636) mem 16706MB [2024-08-10 21:23:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][580/625] eta 0:00:21 lr 0.000274 wd 0.0500 time 0.4653 (0.4705) data time 0.0011 (0.0024) model time 0.4642 (0.4678) loss 2.4647 (2.6875) grad_norm 2.2600 (2.4508) loss_scale 512.0000 (296.5370) mem 16706MB [2024-08-10 21:23:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][590/625] eta 0:00:16 lr 0.000274 wd 0.0500 time 0.4677 (0.4705) data time 0.0008 (0.0023) model time 0.4668 (0.4678) loss 2.3660 (2.6825) grad_norm 1.7900 (2.4428) loss_scale 512.0000 (300.1827) mem 16706MB [2024-08-10 21:23:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][600/625] eta 0:00:11 lr 0.000273 wd 0.0500 time 0.4703 (0.4704) data time 0.0008 (0.0023) model time 0.4695 (0.4677) loss 2.6829 (2.6844) grad_norm 2.7772 (2.4353) loss_scale 512.0000 (303.7072) mem 16706MB [2024-08-10 21:23:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [212/300][610/625] eta 0:00:07 lr 0.000273 wd 0.0500 time 0.4612 (0.4703) data time 0.0008 (0.0023) model time 0.4604 (0.4677) loss 1.8589 (2.6850) grad_norm 2.3563 (2.4347) loss_scale 512.0000 (307.1162) mem 16706MB [2024-08-10 21:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [212/300][620/625] eta 0:00:02 lr 0.000273 wd 0.0500 time 0.4623 (0.4702) data time 0.0005 (0.0023) model time 0.4617 (0.4676) loss 2.9602 (2.6876) grad_norm 1.5886 (2.4301) loss_scale 512.0000 (310.4155) mem 16706MB [2024-08-10 21:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 212 training takes 0:04:53 [2024-08-10 21:24:00 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:24:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 21:24:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.522 (0.522) Loss 0.5122 (0.5122) Acc@1 88.574 (88.574) Acc@5 98.877 (98.877) Mem 16706MB [2024-08-10 21:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.162) Loss 0.8145 (0.6156) Acc@1 81.055 (86.670) Acc@5 96.045 (97.776) Mem 16706MB [2024-08-10 21:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.142) Loss 0.8784 (0.7282) Acc@1 79.932 (83.840) Acc@5 95.020 (96.596) Mem 16706MB [2024-08-10 21:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.509 Acc@5 96.581 [2024-08-10 21:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 21:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.51% [2024-08-10 21:24:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 21:24:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 21:24:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.507 (0.507) Loss 0.4773 (0.4773) Acc@1 89.502 (89.502) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.7559 (0.5862) Acc@1 82.178 (87.278) Acc@5 97.070 (98.025) Mem 16706MB [2024-08-10 21:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.8433 (0.6885) Acc@1 79.932 (84.610) Acc@5 95.947 (97.045) Mem 16706MB [2024-08-10 21:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.321 Acc@5 97.033 [2024-08-10 21:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:24:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][0/625] eta 0:16:13 lr 0.000273 wd 0.0500 time 1.5573 (1.5573) data time 0.7440 (0.7440) model time 0.0000 (0.0000) loss 3.3452 (3.3452) grad_norm 1.9587 (1.9587) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][10/625] eta 0:05:44 lr 0.000273 wd 0.0500 time 0.4625 (0.5606) data time 0.0011 (0.0688) model time 0.0000 (0.0000) loss 2.8193 (2.9630) grad_norm 2.5580 (2.2617) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][20/625] eta 0:05:12 lr 0.000273 wd 0.0500 time 0.4631 (0.5160) data time 0.0011 (0.0366) model time 0.0000 (0.0000) loss 1.9484 (2.8081) grad_norm 6.8065 (2.3978) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][30/625] eta 0:04:57 lr 0.000273 wd 0.0500 time 0.4675 (0.5002) data time 0.0010 (0.0251) model time 0.0000 (0.0000) loss 2.6812 (2.7891) grad_norm 1.4993 (2.2516) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:34 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [213/300][40/625] eta 0:04:47 lr 0.000273 wd 0.0500 time 0.4641 (0.4922) data time 0.0010 (0.0192) model time 0.0000 (0.0000) loss 3.0584 (2.7664) grad_norm 1.6807 (2.2549) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][50/625] eta 0:04:40 lr 0.000273 wd 0.0500 time 0.4677 (0.4873) data time 0.0008 (0.0157) model time 0.0000 (0.0000) loss 2.9602 (2.7883) grad_norm 2.3610 (2.1642) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][60/625] eta 0:04:33 lr 0.000273 wd 0.0500 time 0.4651 (0.4838) data time 0.0010 (0.0133) model time 0.4641 (0.4652) loss 1.9843 (2.7534) grad_norm 1.5953 (2.1783) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][70/625] eta 0:04:27 lr 0.000273 wd 0.0500 time 0.4829 (0.4823) data time 0.0008 (0.0115) model time 0.4821 (0.4687) loss 2.1501 (2.7540) grad_norm 2.6612 (2.2128) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][80/625] eta 0:04:22 lr 0.000273 wd 0.0500 time 0.4696 (0.4808) data time 0.0008 (0.0102) model time 0.4688 (0.4687) loss 2.6759 (2.7481) grad_norm 2.8234 (2.2115) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:24:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][90/625] eta 0:04:16 lr 0.000272 wd 0.0500 time 0.4671 (0.4797) data time 0.0007 (0.0092) model time 0.4664 (0.4689) loss 2.9974 (2.7507) grad_norm 2.6139 (2.3406) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][100/625] eta 0:04:11 lr 0.000272 wd 0.0500 time 0.4663 (0.4787) data time 0.0010 (0.0084) model time 0.4653 (0.4690) loss 2.7116 (2.7467) grad_norm 2.9943 (2.3528) loss_scale 
512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][110/625] eta 0:04:06 lr 0.000272 wd 0.0500 time 0.4741 (0.4777) data time 0.0010 (0.0078) model time 0.4731 (0.4684) loss 2.5919 (2.7357) grad_norm 1.3756 (2.3038) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][120/625] eta 0:04:00 lr 0.000272 wd 0.0500 time 0.4702 (0.4769) data time 0.0010 (0.0072) model time 0.4692 (0.4683) loss 1.9613 (2.7200) grad_norm 1.9746 (2.2679) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][130/625] eta 0:03:56 lr 0.000272 wd 0.0500 time 0.4680 (0.4777) data time 0.0007 (0.0068) model time 0.4673 (0.4706) loss 2.1921 (2.6910) grad_norm 1.5926 (2.2300) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][140/625] eta 0:03:51 lr 0.000272 wd 0.0500 time 0.4707 (0.4771) data time 0.0009 (0.0064) model time 0.4698 (0.4703) loss 2.5427 (2.6726) grad_norm 1.5626 (2.2010) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][150/625] eta 0:03:46 lr 0.000272 wd 0.0500 time 0.4668 (0.4765) data time 0.0008 (0.0060) model time 0.4660 (0.4700) loss 2.8937 (2.6896) grad_norm 1.8878 (2.2134) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][160/625] eta 0:03:41 lr 0.000272 wd 0.0500 time 0.4731 (0.4762) data time 0.0011 (0.0057) model time 0.4720 (0.4700) loss 2.8185 (2.6888) grad_norm 1.9584 (2.2139) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][170/625] eta 0:03:36 lr 0.000272 wd 0.0500 time 0.4647 (0.4757) data time 0.0010 (0.0054) model time 
0.4637 (0.4698) loss 2.1911 (2.6970) grad_norm 2.1244 (2.2426) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][180/625] eta 0:03:31 lr 0.000272 wd 0.0500 time 0.4671 (0.4753) data time 0.0010 (0.0052) model time 0.4661 (0.4695) loss 3.2197 (2.6912) grad_norm 1.8781 (2.2299) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][190/625] eta 0:03:26 lr 0.000272 wd 0.0500 time 0.4737 (0.4749) data time 0.0011 (0.0050) model time 0.4726 (0.4693) loss 2.9799 (2.6891) grad_norm 1.6921 (2.2193) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][200/625] eta 0:03:21 lr 0.000271 wd 0.0500 time 0.4783 (0.4747) data time 0.0011 (0.0048) model time 0.4772 (0.4693) loss 2.6424 (2.6809) grad_norm 1.6768 (2.2150) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][210/625] eta 0:03:16 lr 0.000271 wd 0.0500 time 0.4696 (0.4744) data time 0.0008 (0.0046) model time 0.4689 (0.4693) loss 2.4059 (2.6686) grad_norm 1.8484 (2.2025) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:25:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][220/625] eta 0:03:12 lr 0.000271 wd 0.0500 time 0.4687 (0.4742) data time 0.0008 (0.0044) model time 0.4679 (0.4692) loss 2.1281 (2.6593) grad_norm 1.6643 (2.1843) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][230/625] eta 0:03:07 lr 0.000271 wd 0.0500 time 0.4669 (0.4739) data time 0.0008 (0.0043) model time 0.4661 (0.4690) loss 2.2656 (2.6631) grad_norm 1.8511 (2.1982) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][240/625] eta 0:03:02 lr 
0.000271 wd 0.0500 time 0.4651 (0.4736) data time 0.0008 (0.0041) model time 0.4642 (0.4688) loss 2.7513 (2.6688) grad_norm 1.7002 (2.2157) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][250/625] eta 0:02:57 lr 0.000271 wd 0.0500 time 0.4668 (0.4734) data time 0.0007 (0.0040) model time 0.4661 (0.4688) loss 3.5386 (2.6808) grad_norm 5.3683 (2.2145) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][260/625] eta 0:02:52 lr 0.000271 wd 0.0500 time 0.4667 (0.4732) data time 0.0008 (0.0039) model time 0.4659 (0.4687) loss 2.8592 (2.6813) grad_norm 3.2039 (2.2169) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][270/625] eta 0:02:47 lr 0.000271 wd 0.0500 time 0.4624 (0.4729) data time 0.0011 (0.0038) model time 0.4613 (0.4686) loss 2.6253 (2.6794) grad_norm 1.2398 (2.2121) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][280/625] eta 0:02:43 lr 0.000271 wd 0.0500 time 0.4633 (0.4728) data time 0.0011 (0.0037) model time 0.4623 (0.4685) loss 2.6678 (2.6868) grad_norm 1.9053 (2.3118) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][290/625] eta 0:02:38 lr 0.000271 wd 0.0500 time 0.4710 (0.4727) data time 0.0008 (0.0036) model time 0.4702 (0.4685) loss 1.8012 (2.6691) grad_norm 1.5532 (2.2938) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][300/625] eta 0:02:33 lr 0.000271 wd 0.0500 time 0.4707 (0.4725) data time 0.0010 (0.0035) model time 0.4697 (0.4685) loss 2.8833 (2.6659) grad_norm 2.3032 (2.2905) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [213/300][310/625] eta 0:02:28 lr 0.000270 wd 0.0500 time 0.4632 (0.4725) data time 0.0008 (0.0034) model time 0.4624 (0.4686) loss 3.1694 (2.6653) grad_norm 2.0388 (2.2925) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][320/625] eta 0:02:24 lr 0.000270 wd 0.0500 time 0.4699 (0.4724) data time 0.0010 (0.0034) model time 0.4688 (0.4686) loss 1.9010 (2.6638) grad_norm 1.6715 (2.2983) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][330/625] eta 0:02:19 lr 0.000270 wd 0.0500 time 0.4695 (0.4723) data time 0.0012 (0.0033) model time 0.4683 (0.4686) loss 2.8133 (2.6650) grad_norm 2.0492 (2.3172) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:26:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][340/625] eta 0:02:14 lr 0.000270 wd 0.0500 time 0.4668 (0.4727) data time 0.0010 (0.0032) model time 0.4658 (0.4691) loss 2.2785 (2.6659) grad_norm 1.9833 (2.3199) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][350/625] eta 0:02:09 lr 0.000270 wd 0.0500 time 0.4677 (0.4727) data time 0.0008 (0.0032) model time 0.4670 (0.4692) loss 2.4061 (2.6718) grad_norm 2.3196 (2.3080) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][360/625] eta 0:02:05 lr 0.000270 wd 0.0500 time 0.4815 (0.4727) data time 0.0010 (0.0031) model time 0.4805 (0.4692) loss 3.0185 (2.6703) grad_norm 1.9786 (2.3052) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][370/625] eta 0:02:00 lr 0.000270 wd 0.0500 time 0.4657 (0.4726) data time 0.0010 (0.0031) model time 0.4647 (0.4692) loss 2.2964 (2.6687) grad_norm 2.8320 (2.2937) loss_scale 
512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][380/625] eta 0:01:55 lr 0.000270 wd 0.0500 time 0.4703 (0.4725) data time 0.0007 (0.0030) model time 0.4695 (0.4692) loss 2.5861 (2.6708) grad_norm 2.5099 (2.2873) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][390/625] eta 0:01:51 lr 0.000270 wd 0.0500 time 0.4711 (0.4725) data time 0.0010 (0.0030) model time 0.4701 (0.4692) loss 2.1595 (2.6669) grad_norm 1.6036 (2.2804) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][400/625] eta 0:01:46 lr 0.000270 wd 0.0500 time 0.4728 (0.4724) data time 0.0010 (0.0029) model time 0.4717 (0.4692) loss 2.7977 (2.6611) grad_norm 1.8787 (2.2714) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][410/625] eta 0:01:41 lr 0.000270 wd 0.0500 time 0.4665 (0.4723) data time 0.0009 (0.0029) model time 0.4657 (0.4691) loss 2.3867 (2.6650) grad_norm 2.1378 (2.2657) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][420/625] eta 0:01:36 lr 0.000270 wd 0.0500 time 0.4659 (0.4722) data time 0.0011 (0.0028) model time 0.4648 (0.4691) loss 2.9198 (2.6690) grad_norm 1.6993 (2.2561) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][430/625] eta 0:01:32 lr 0.000269 wd 0.0500 time 0.4734 (0.4721) data time 0.0008 (0.0028) model time 0.4726 (0.4690) loss 2.5242 (2.6670) grad_norm 2.1338 (2.2436) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][440/625] eta 0:01:27 lr 0.000269 wd 0.0500 time 0.4725 (0.4721) data time 0.0008 (0.0027) model time 
0.4718 (0.4691) loss 3.1840 (2.6737) grad_norm 3.8011 (2.2851) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][450/625] eta 0:01:22 lr 0.000269 wd 0.0500 time 0.4696 (0.4720) data time 0.0011 (0.0027) model time 0.4685 (0.4691) loss 3.1426 (2.6758) grad_norm 2.2410 (2.2807) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][460/625] eta 0:01:17 lr 0.000269 wd 0.0500 time 0.4738 (0.4725) data time 0.0008 (0.0027) model time 0.4730 (0.4696) loss 3.0471 (2.6748) grad_norm 2.4039 (2.2765) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:27:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][470/625] eta 0:01:13 lr 0.000269 wd 0.0500 time 0.4677 (0.4724) data time 0.0008 (0.0026) model time 0.4669 (0.4695) loss 2.0421 (2.6757) grad_norm 1.6307 (2.2746) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][480/625] eta 0:01:08 lr 0.000269 wd 0.0500 time 0.4737 (0.4723) data time 0.0011 (0.0026) model time 0.4727 (0.4695) loss 2.7289 (2.6770) grad_norm 1.8785 (2.2752) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][490/625] eta 0:01:03 lr 0.000269 wd 0.0500 time 0.4666 (0.4722) data time 0.0008 (0.0026) model time 0.4658 (0.4694) loss 1.8080 (2.6725) grad_norm 2.1084 (2.2654) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][500/625] eta 0:00:59 lr 0.000269 wd 0.0500 time 0.4667 (0.4721) data time 0.0008 (0.0025) model time 0.4659 (0.4694) loss 2.6378 (2.6712) grad_norm 2.7736 (2.2979) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][510/625] eta 0:00:54 lr 
0.000269 wd 0.0500 time 0.4675 (0.4721) data time 0.0010 (0.0025) model time 0.4665 (0.4694) loss 2.1374 (2.6700) grad_norm 1.6690 (2.3007) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][520/625] eta 0:00:49 lr 0.000269 wd 0.0500 time 0.4648 (0.4721) data time 0.0008 (0.0025) model time 0.4640 (0.4694) loss 2.9539 (2.6731) grad_norm 6.0380 (2.3074) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][530/625] eta 0:00:44 lr 0.000269 wd 0.0500 time 0.4730 (0.4721) data time 0.0011 (0.0024) model time 0.4718 (0.4695) loss 2.7719 (2.6744) grad_norm 12.5040 (2.3278) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][540/625] eta 0:00:40 lr 0.000268 wd 0.0500 time 0.4658 (0.4720) data time 0.0009 (0.0024) model time 0.4649 (0.4694) loss 1.7452 (2.6697) grad_norm 1.7161 (2.3256) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][550/625] eta 0:00:35 lr 0.000268 wd 0.0500 time 0.4626 (0.4719) data time 0.0010 (0.0024) model time 0.4615 (0.4693) loss 2.3139 (2.6706) grad_norm 1.6632 (2.3159) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][560/625] eta 0:00:30 lr 0.000268 wd 0.0500 time 0.4621 (0.4718) data time 0.0011 (0.0024) model time 0.4610 (0.4692) loss 2.3296 (2.6663) grad_norm 1.8929 (2.3071) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][570/625] eta 0:00:25 lr 0.000268 wd 0.0500 time 0.4773 (0.4718) data time 0.0007 (0.0024) model time 0.4766 (0.4692) loss 2.4270 (2.6626) grad_norm 1.7157 (2.3033) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [213/300][580/625] eta 0:00:21 lr 0.000268 wd 0.0500 time 0.4722 (0.4718) data time 0.0008 (0.0023) model time 0.4714 (0.4693) loss 2.6934 (2.6640) grad_norm 1.9586 (2.2978) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][590/625] eta 0:00:16 lr 0.000268 wd 0.0500 time 0.4657 (0.4718) data time 0.0011 (0.0023) model time 0.4646 (0.4692) loss 2.0463 (2.6622) grad_norm 1.5786 (2.2894) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:28:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][600/625] eta 0:00:11 lr 0.000268 wd 0.0500 time 0.4686 (0.4717) data time 0.0010 (0.0023) model time 0.4676 (0.4692) loss 2.8781 (2.6601) grad_norm 1.9788 (2.3003) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][610/625] eta 0:00:07 lr 0.000268 wd 0.0500 time 0.4629 (0.4716) data time 0.0008 (0.0023) model time 0.4621 (0.4691) loss 2.7871 (2.6584) grad_norm 2.6279 (2.2984) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [213/300][620/625] eta 0:00:02 lr 0.000268 wd 0.0500 time 0.4631 (0.4715) data time 0.0009 (0.0022) model time 0.4623 (0.4690) loss 2.8576 (2.6597) grad_norm 1.6682 (2.2954) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 213 training takes 0:04:54 [2024-08-10 21:29:09 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:29:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.521 (0.521) Loss 0.5083 (0.5083) Acc@1 88.721 (88.721) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:29:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.162) Loss 0.8330 (0.6251) Acc@1 80.322 (86.426) Acc@5 96.045 (97.718) Mem 16706MB [2024-08-10 21:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.142) Loss 0.9136 (0.7375) Acc@1 78.564 (83.582) Acc@5 95.801 (96.642) Mem 16706MB [2024-08-10 21:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.295 Acc@5 96.599 [2024-08-10 21:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 21:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.877 (0.877) Loss 0.4766 (0.4766) Acc@1 89.551 (89.551) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:29:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.196) Loss 0.7588 (0.5868) Acc@1 82.031 (87.287) Acc@5 97.070 (98.007) Mem 16706MB [2024-08-10 21:29:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.159) Loss 0.8423 (0.6887) Acc@1 80.029 (84.610) Acc@5 96.045 (97.040) Mem 16706MB [2024-08-10 21:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.327 Acc@5 97.021 [2024-08-10 21:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][0/625] eta 0:13:34 lr 0.000268 wd 0.0500 time 1.3037 (1.3037) data time 0.7173 (0.7173) model time 0.0000 (0.0000) loss 1.7268 (1.7268) grad_norm 1.5702 (1.5702) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][10/625] eta 0:05:34 lr 0.000268 wd 0.0500 time 0.4663 (0.5445) data 
time 0.0010 (0.0661) model time 0.0000 (0.0000) loss 2.5843 (2.5071) grad_norm 2.2468 (2.0061) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][20/625] eta 0:05:07 lr 0.000268 wd 0.0500 time 0.4682 (0.5085) data time 0.0008 (0.0351) model time 0.0000 (0.0000) loss 2.5645 (2.5810) grad_norm 1.8185 (1.9614) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][30/625] eta 0:04:54 lr 0.000267 wd 0.0500 time 0.4657 (0.4956) data time 0.0011 (0.0241) model time 0.0000 (0.0000) loss 2.6959 (2.6932) grad_norm 2.2395 (2.1215) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][40/625] eta 0:04:45 lr 0.000267 wd 0.0500 time 0.4661 (0.4885) data time 0.0011 (0.0185) model time 0.0000 (0.0000) loss 2.8051 (2.6604) grad_norm 2.3772 (2.2042) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][50/625] eta 0:04:41 lr 0.000267 wd 0.0500 time 0.4652 (0.4889) data time 0.0011 (0.0151) model time 0.0000 (0.0000) loss 3.1608 (2.6793) grad_norm 2.0552 (2.1644) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][60/625] eta 0:04:35 lr 0.000267 wd 0.0500 time 0.4625 (0.4877) data time 0.0011 (0.0128) model time 0.4614 (0.4808) loss 2.5679 (2.6891) grad_norm 2.5433 (2.1725) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][70/625] eta 0:04:28 lr 0.000267 wd 0.0500 time 0.4648 (0.4847) data time 0.0012 (0.0111) model time 0.4636 (0.4730) loss 2.8905 (2.6739) grad_norm 4.3044 (2.2201) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[214/300][80/625] eta 0:04:22 lr 0.000267 wd 0.0500 time 0.4668 (0.4823) data time 0.0008 (0.0099) model time 0.4660 (0.4702) loss 1.8354 (2.6470) grad_norm 1.9043 (2.1980) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][90/625] eta 0:04:17 lr 0.000267 wd 0.0500 time 0.4633 (0.4805) data time 0.0010 (0.0089) model time 0.4623 (0.4688) loss 2.6021 (2.6627) grad_norm 2.2180 (2.2578) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][100/625] eta 0:04:11 lr 0.000267 wd 0.0500 time 0.4638 (0.4790) data time 0.0008 (0.0081) model time 0.4631 (0.4678) loss 1.9071 (2.6489) grad_norm 2.0799 (2.2233) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][110/625] eta 0:04:06 lr 0.000267 wd 0.0500 time 0.4640 (0.4777) data time 0.0008 (0.0075) model time 0.4633 (0.4672) loss 1.9713 (2.6649) grad_norm 3.5686 (2.2463) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][120/625] eta 0:04:00 lr 0.000267 wd 0.0500 time 0.4646 (0.4767) data time 0.0008 (0.0070) model time 0.4638 (0.4668) loss 2.6560 (2.6550) grad_norm 3.4409 (2.2780) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][130/625] eta 0:03:55 lr 0.000267 wd 0.0500 time 0.4666 (0.4759) data time 0.0014 (0.0065) model time 0.4653 (0.4666) loss 2.0084 (2.6546) grad_norm 1.8757 (2.2715) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][140/625] eta 0:03:50 lr 0.000267 wd 0.0500 time 0.4656 (0.4751) data time 0.0008 (0.0061) model time 0.4648 (0.4663) loss 2.4176 (2.6561) grad_norm 1.5412 (2.2509) loss_scale 512.0000 (512.0000) mem 16706MB 
[2024-08-10 21:30:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][150/625] eta 0:03:45 lr 0.000266 wd 0.0500 time 0.4642 (0.4746) data time 0.0008 (0.0058) model time 0.4634 (0.4663) loss 2.8980 (2.6641) grad_norm 1.7551 (2.2497) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][160/625] eta 0:03:40 lr 0.000266 wd 0.0500 time 0.4726 (0.4741) data time 0.0010 (0.0055) model time 0.4716 (0.4663) loss 2.8800 (2.6737) grad_norm 5.1275 (2.2860) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][170/625] eta 0:03:35 lr 0.000266 wd 0.0500 time 0.4671 (0.4738) data time 0.0011 (0.0052) model time 0.4660 (0.4663) loss 3.0648 (2.6713) grad_norm 5.5167 (2.5047) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][180/625] eta 0:03:30 lr 0.000266 wd 0.0500 time 0.4656 (0.4734) data time 0.0010 (0.0050) model time 0.4645 (0.4663) loss 2.7466 (2.6727) grad_norm 3.9679 (2.5372) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][190/625] eta 0:03:25 lr 0.000266 wd 0.0500 time 0.4638 (0.4731) data time 0.0008 (0.0048) model time 0.4630 (0.4663) loss 2.2377 (2.6848) grad_norm 1.5940 (2.5139) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][200/625] eta 0:03:20 lr 0.000266 wd 0.0500 time 0.4686 (0.4727) data time 0.0011 (0.0046) model time 0.4675 (0.4662) loss 2.8498 (2.6847) grad_norm 2.5398 (2.5246) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:30:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][210/625] eta 0:03:16 lr 0.000266 wd 0.0500 time 0.4692 (0.4724) data time 0.0011 (0.0044) model time 0.4680 (0.4662) loss 2.3203 
(2.6953) grad_norm 1.7644 (2.5860) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][220/625] eta 0:03:11 lr 0.000266 wd 0.0500 time 0.4706 (0.4722) data time 0.0010 (0.0043) model time 0.4695 (0.4662) loss 2.4458 (2.6815) grad_norm 1.5731 (2.5651) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][230/625] eta 0:03:06 lr 0.000266 wd 0.0500 time 0.4961 (0.4722) data time 0.0008 (0.0041) model time 0.4953 (0.4665) loss 3.3071 (2.6826) grad_norm 2.1075 (2.5607) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][240/625] eta 0:03:02 lr 0.000266 wd 0.0500 time 0.4765 (0.4730) data time 0.0010 (0.0040) model time 0.4756 (0.4677) loss 2.4154 (2.6741) grad_norm 1.7263 (2.5334) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][250/625] eta 0:02:57 lr 0.000266 wd 0.0500 time 0.4839 (0.4730) data time 0.0008 (0.0039) model time 0.4832 (0.4679) loss 1.7304 (2.6640) grad_norm 4.8035 (2.6178) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][260/625] eta 0:02:52 lr 0.000265 wd 0.0500 time 0.4651 (0.4728) data time 0.0011 (0.0038) model time 0.4640 (0.4679) loss 2.9429 (2.6652) grad_norm 1.8202 (2.6096) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][270/625] eta 0:02:47 lr 0.000265 wd 0.0500 time 0.4709 (0.4726) data time 0.0009 (0.0037) model time 0.4700 (0.4678) loss 3.1519 (2.6623) grad_norm 1.8636 (2.5843) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][280/625] eta 0:02:43 lr 0.000265 wd 0.0500 time 0.4663 
(0.4725) data time 0.0010 (0.0036) model time 0.4653 (0.4678) loss 2.8340 (2.6625) grad_norm 1.8345 (2.5605) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][290/625] eta 0:02:38 lr 0.000265 wd 0.0500 time 0.4614 (0.4724) data time 0.0009 (0.0035) model time 0.4605 (0.4678) loss 3.1573 (2.6685) grad_norm 2.8378 (2.5412) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][300/625] eta 0:02:33 lr 0.000265 wd 0.0500 time 0.4707 (0.4723) data time 0.0010 (0.0034) model time 0.4697 (0.4679) loss 2.5706 (2.6663) grad_norm 1.6404 (2.5215) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][310/625] eta 0:02:28 lr 0.000265 wd 0.0500 time 0.4664 (0.4722) data time 0.0008 (0.0033) model time 0.4656 (0.4679) loss 2.9636 (2.6657) grad_norm 2.2512 (2.6222) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][320/625] eta 0:02:24 lr 0.000265 wd 0.0500 time 0.4807 (0.4722) data time 0.0008 (0.0033) model time 0.4799 (0.4680) loss 1.9793 (2.6563) grad_norm 2.5622 (2.6054) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][330/625] eta 0:02:19 lr 0.000265 wd 0.0500 time 0.4687 (0.4721) data time 0.0011 (0.0032) model time 0.4676 (0.4680) loss 3.1251 (2.6565) grad_norm 2.1465 (2.6017) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:31:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][340/625] eta 0:02:14 lr 0.000265 wd 0.0500 time 0.4693 (0.4719) data time 0.0010 (0.0031) model time 0.4683 (0.4679) loss 3.4025 (2.6634) grad_norm 3.0015 (2.6204) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [214/300][350/625] eta 0:02:09 lr 0.000265 wd 0.0500 time 0.4632 (0.4718) data time 0.0008 (0.0031) model time 0.4623 (0.4678) loss 3.5497 (2.6703) grad_norm 6.6153 (2.6475) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][360/625] eta 0:02:04 lr 0.000265 wd 0.0500 time 0.4751 (0.4717) data time 0.0010 (0.0030) model time 0.4741 (0.4678) loss 2.8775 (2.6681) grad_norm 2.5812 (2.6475) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][370/625] eta 0:02:00 lr 0.000264 wd 0.0500 time 0.4688 (0.4716) data time 0.0008 (0.0030) model time 0.4680 (0.4678) loss 2.4156 (2.6693) grad_norm 1.8824 (2.6368) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][380/625] eta 0:01:55 lr 0.000264 wd 0.0500 time 0.4651 (0.4715) data time 0.0008 (0.0029) model time 0.4644 (0.4677) loss 2.6247 (2.6646) grad_norm 1.5640 (2.6208) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][390/625] eta 0:01:50 lr 0.000264 wd 0.0500 time 0.4668 (0.4714) data time 0.0009 (0.0029) model time 0.4660 (0.4677) loss 2.5908 (2.6640) grad_norm 1.5555 (2.5985) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][400/625] eta 0:01:46 lr 0.000264 wd 0.0500 time 0.4661 (0.4717) data time 0.0011 (0.0028) model time 0.4650 (0.4682) loss 3.0308 (2.6658) grad_norm 2.0005 (2.5870) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][410/625] eta 0:01:41 lr 0.000264 wd 0.0500 time 0.4647 (0.4715) data time 0.0011 (0.0028) model time 0.4636 (0.4680) loss 2.5598 (2.6638) grad_norm 3.6460 (2.5854) loss_scale 512.0000 (512.0000) mem 16706MB 
[2024-08-10 21:32:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][420/625] eta 0:01:36 lr 0.000264 wd 0.0500 time 0.4746 (0.4714) data time 0.0011 (0.0027) model time 0.4735 (0.4679) loss 3.2466 (2.6649) grad_norm 1.7204 (2.5843) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][430/625] eta 0:01:31 lr 0.000264 wd 0.0500 time 0.4644 (0.4712) data time 0.0008 (0.0027) model time 0.4636 (0.4678) loss 2.3928 (2.6656) grad_norm 2.5111 (2.5735) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][440/625] eta 0:01:27 lr 0.000264 wd 0.0500 time 0.4783 (0.4711) data time 0.0009 (0.0027) model time 0.4774 (0.4678) loss 2.2384 (2.6614) grad_norm 2.6977 (2.5758) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][450/625] eta 0:01:22 lr 0.000264 wd 0.0500 time 0.4656 (0.4711) data time 0.0010 (0.0026) model time 0.4646 (0.4677) loss 2.5149 (2.6611) grad_norm 2.1200 (2.5679) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:32:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][460/625] eta 0:01:17 lr 0.000264 wd 0.0500 time 0.4675 (0.4715) data time 0.0011 (0.0026) model time 0.4664 (0.4683) loss 2.3935 (2.6606) grad_norm 2.9944 (2.5600) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][470/625] eta 0:01:13 lr 0.000264 wd 0.0500 time 0.4690 (0.4714) data time 0.0008 (0.0026) model time 0.4682 (0.4682) loss 3.1763 (2.6657) grad_norm 2.9717 (2.5525) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][480/625] eta 0:01:08 lr 0.000264 wd 0.0500 time 0.4635 (0.4713) data time 0.0008 (0.0025) model time 0.4628 (0.4681) loss 1.6629 
(2.6657) grad_norm 1.5248 (2.5429) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][490/625] eta 0:01:03 lr 0.000263 wd 0.0500 time 0.4632 (0.4711) data time 0.0011 (0.0025) model time 0.4622 (0.4680) loss 2.0805 (2.6650) grad_norm 2.0907 (2.5541) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][500/625] eta 0:00:58 lr 0.000263 wd 0.0500 time 0.4644 (0.4710) data time 0.0009 (0.0025) model time 0.4635 (0.4679) loss 2.4612 (2.6637) grad_norm 3.0395 (2.5552) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][510/625] eta 0:00:54 lr 0.000263 wd 0.0500 time 0.4632 (0.4709) data time 0.0011 (0.0025) model time 0.4621 (0.4678) loss 2.8621 (2.6663) grad_norm 2.1358 (2.5746) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][520/625] eta 0:00:49 lr 0.000263 wd 0.0500 time 0.4682 (0.4708) data time 0.0008 (0.0024) model time 0.4674 (0.4677) loss 1.6520 (2.6634) grad_norm 1.7786 (2.5713) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][530/625] eta 0:00:44 lr 0.000263 wd 0.0500 time 0.4697 (0.4707) data time 0.0008 (0.0024) model time 0.4690 (0.4677) loss 2.6979 (2.6647) grad_norm 1.7499 (2.5635) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][540/625] eta 0:00:40 lr 0.000263 wd 0.0500 time 0.4761 (0.4706) data time 0.0011 (0.0024) model time 0.4751 (0.4677) loss 2.9556 (2.6660) grad_norm 1.3187 (2.5471) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][550/625] eta 0:00:35 lr 0.000263 wd 0.0500 time 0.4638 
(0.4706) data time 0.0011 (0.0024) model time 0.4627 (0.4676) loss 3.0948 (2.6673) grad_norm 1.5763 (2.5382) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][560/625] eta 0:00:30 lr 0.000263 wd 0.0500 time 0.4664 (0.4705) data time 0.0008 (0.0023) model time 0.4656 (0.4676) loss 2.5259 (2.6662) grad_norm 4.2500 (2.5394) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][570/625] eta 0:00:25 lr 0.000263 wd 0.0500 time 0.4642 (0.4704) data time 0.0009 (0.0023) model time 0.4633 (0.4676) loss 2.7559 (2.6672) grad_norm 1.8711 (2.5398) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][580/625] eta 0:00:21 lr 0.000263 wd 0.0500 time 0.4658 (0.4703) data time 0.0010 (0.0023) model time 0.4647 (0.4675) loss 2.6514 (2.6731) grad_norm 1.9600 (2.5388) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][590/625] eta 0:00:16 lr 0.000263 wd 0.0500 time 0.4658 (0.4703) data time 0.0010 (0.0023) model time 0.4647 (0.4675) loss 2.1064 (2.6745) grad_norm 2.1068 (2.5421) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][600/625] eta 0:00:11 lr 0.000262 wd 0.0500 time 0.4729 (0.4707) data time 0.0010 (0.0022) model time 0.4719 (0.4679) loss 2.3622 (2.6745) grad_norm 3.3365 (2.5413) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [214/300][610/625] eta 0:00:07 lr 0.000262 wd 0.0500 time 0.4663 (0.4706) data time 0.0008 (0.0022) model time 0.4655 (0.4679) loss 2.4386 (2.6740) grad_norm 2.1710 (2.5331) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [214/300][620/625] eta 0:00:02 lr 0.000262 wd 0.0500 time 0.4638 (0.4705) data time 0.0005 (0.0022) model time 0.4633 (0.4679) loss 2.5829 (2.6762) grad_norm 2.2845 (2.5229) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:12 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 214 training takes 0:04:54 [2024-08-10 21:34:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:34:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 21:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.500 (0.500) Loss 0.5063 (0.5063) Acc@1 88.672 (88.672) Acc@5 98.828 (98.828) Mem 16706MB [2024-08-10 21:34:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.159) Loss 0.8462 (0.6292) Acc@1 78.760 (86.386) Acc@5 96.045 (97.754) Mem 16706MB [2024-08-10 21:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.140) Loss 0.9272 (0.7411) Acc@1 78.564 (83.587) Acc@5 95.361 (96.617) Mem 16706MB [2024-08-10 21:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.305 Acc@5 96.595 [2024-08-10 21:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 21:34:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.825 (0.825) Loss 0.4771 (0.4771) Acc@1 89.502 (89.502) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:34:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.191) Loss 0.7603 (0.5871) Acc@1 81.787 (87.256) Acc@5 96.973 (98.007) Mem 16706MB [2024-08-10 21:34:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.123 (0.157) Loss 0.8442 (0.6894) Acc@1 80.078 (84.589) Acc@5 96.143 (97.052) Mem 16706MB [2024-08-10 21:34:21 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.307 Acc@5 97.031 [2024-08-10 21:34:21 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][0/625] eta 0:13:47 lr 0.000262 wd 0.0500 time 1.3234 (1.3234) data time 0.6866 (0.6866) model time 0.0000 (0.0000) loss 2.6695 (2.6695) grad_norm 2.3374 (2.3374) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][10/625] eta 0:05:36 lr 0.000262 wd 0.0500 time 0.4776 (0.5470) data time 0.0011 (0.0634) model time 0.0000 (0.0000) loss 2.9657 (2.6133) grad_norm 4.2437 (2.1651) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][20/625] eta 0:05:07 lr 0.000262 wd 0.0500 time 0.4646 (0.5088) data time 0.0008 (0.0337) model time 0.0000 (0.0000) loss 3.3850 (2.7042) grad_norm 2.0139 (2.1566) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][30/625] eta 0:04:55 lr 0.000262 wd 0.0500 time 0.4928 (0.4971) data time 0.0008 (0.0232) model time 0.0000 (0.0000) loss 2.2340 (2.7013) grad_norm 3.8620 (2.3924) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][40/625] eta 0:04:46 lr 0.000262 wd 0.0500 time 0.4670 (0.4899) data time 0.0009 (0.0178) model time 0.0000 (0.0000) loss 3.0769 (2.7605) grad_norm 2.1638 (2.6006) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][50/625] eta 0:04:39 lr 0.000262 wd 0.0500 time 0.4719 (0.4863) data time 0.0010 (0.0145) model time 0.0000 (0.0000) loss 3.0088 (2.7271) grad_norm 2.1492 (2.7608) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:50 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [215/300][60/625] eta 0:04:32 lr 0.000262 wd 0.0500 time 0.4650 (0.4832) data time 0.0010 (0.0123) model time 0.4640 (0.4662) loss 2.4519 (2.7266) grad_norm 2.0167 (2.6660) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][70/625] eta 0:04:27 lr 0.000262 wd 0.0500 time 0.4872 (0.4812) data time 0.0012 (0.0107) model time 0.4860 (0.4671) loss 3.3044 (2.7431) grad_norm 1.7205 (2.5469) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][80/625] eta 0:04:21 lr 0.000262 wd 0.0500 time 0.4686 (0.4797) data time 0.0009 (0.0096) model time 0.4678 (0.4673) loss 3.4395 (2.7467) grad_norm 1.7793 (2.4788) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][90/625] eta 0:04:15 lr 0.000261 wd 0.0500 time 0.4643 (0.4783) data time 0.0008 (0.0086) model time 0.4634 (0.4671) loss 3.1226 (2.7480) grad_norm 1.4780 (2.4258) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][100/625] eta 0:04:11 lr 0.000261 wd 0.0500 time 0.4690 (0.4790) data time 0.0007 (0.0079) model time 0.4683 (0.4704) loss 2.8683 (2.7743) grad_norm 2.5352 (2.3650) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][110/625] eta 0:04:06 lr 0.000261 wd 0.0500 time 0.4704 (0.4779) data time 0.0008 (0.0073) model time 0.4696 (0.4698) loss 1.5438 (2.7628) grad_norm 1.7499 (2.3372) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][120/625] eta 0:04:00 lr 0.000261 wd 0.0500 time 0.4659 (0.4770) data time 0.0008 (0.0067) model time 0.4651 (0.4692) loss 3.0366 (2.7774) grad_norm 1.7230 (2.3014) loss_scale 
512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][130/625] eta 0:03:55 lr 0.000261 wd 0.0500 time 0.4649 (0.4762) data time 0.0010 (0.0063) model time 0.4639 (0.4687) loss 2.9969 (2.7687) grad_norm 1.9512 (2.2850) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][140/625] eta 0:03:51 lr 0.000261 wd 0.0500 time 0.4668 (0.4771) data time 0.0008 (0.0059) model time 0.4660 (0.4708) loss 1.6237 (2.7549) grad_norm 1.6908 (2.2709) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][150/625] eta 0:03:46 lr 0.000261 wd 0.0500 time 0.4663 (0.4764) data time 0.0008 (0.0056) model time 0.4655 (0.4703) loss 3.1063 (2.7442) grad_norm 2.3298 (2.2456) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][160/625] eta 0:03:41 lr 0.000261 wd 0.0500 time 0.4653 (0.4759) data time 0.0012 (0.0053) model time 0.4641 (0.4700) loss 3.1903 (2.7504) grad_norm 1.4918 (2.2384) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][170/625] eta 0:03:36 lr 0.000261 wd 0.0500 time 0.4671 (0.4754) data time 0.0009 (0.0051) model time 0.4663 (0.4697) loss 2.8302 (2.7483) grad_norm 1.6157 (2.2270) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][180/625] eta 0:03:31 lr 0.000261 wd 0.0500 time 0.4673 (0.4750) data time 0.0010 (0.0048) model time 0.4662 (0.4695) loss 1.9830 (2.7300) grad_norm 2.0454 (2.2154) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][190/625] eta 0:03:26 lr 0.000261 wd 0.0500 time 0.4721 (0.4746) data time 0.0010 (0.0046) model time 
0.4711 (0.4693) loss 2.9261 (2.7196) grad_norm 3.3309 (2.2294) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:35:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][200/625] eta 0:03:21 lr 0.000261 wd 0.0500 time 0.4664 (0.4742) data time 0.0009 (0.0045) model time 0.4656 (0.4691) loss 2.8806 (2.7172) grad_norm 2.0797 (2.2350) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][210/625] eta 0:03:17 lr 0.000260 wd 0.0500 time 0.4643 (0.4756) data time 0.0008 (0.0043) model time 0.4634 (0.4712) loss 2.9285 (2.7222) grad_norm 2.2766 (2.2286) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][220/625] eta 0:03:12 lr 0.000260 wd 0.0500 time 0.4663 (0.4751) data time 0.0007 (0.0042) model time 0.4656 (0.4708) loss 2.1220 (2.7145) grad_norm 2.7754 (2.2734) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][230/625] eta 0:03:07 lr 0.000260 wd 0.0500 time 0.4671 (0.4748) data time 0.0007 (0.0040) model time 0.4663 (0.4706) loss 2.5568 (2.7141) grad_norm 1.8523 (2.2756) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][240/625] eta 0:03:02 lr 0.000260 wd 0.0500 time 0.4655 (0.4744) data time 0.0008 (0.0039) model time 0.4647 (0.4702) loss 2.0385 (2.7067) grad_norm 3.9325 (2.2961) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][250/625] eta 0:02:57 lr 0.000260 wd 0.0500 time 0.4637 (0.4741) data time 0.0010 (0.0038) model time 0.4627 (0.4700) loss 2.1971 (2.6986) grad_norm 3.9461 (2.3174) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][260/625] eta 0:02:52 lr 
0.000260 wd 0.0500 time 0.4710 (0.4739) data time 0.0008 (0.0037) model time 0.4702 (0.4698) loss 1.9265 (2.6838) grad_norm 1.3363 (2.3145) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][270/625] eta 0:02:48 lr 0.000260 wd 0.0500 time 0.4658 (0.4737) data time 0.0011 (0.0036) model time 0.4647 (0.4698) loss 2.6945 (2.6846) grad_norm 1.6297 (2.3024) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][280/625] eta 0:02:43 lr 0.000260 wd 0.0500 time 0.4643 (0.4735) data time 0.0011 (0.0035) model time 0.4633 (0.4696) loss 1.7847 (2.6760) grad_norm 1.6361 (2.2859) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][290/625] eta 0:02:38 lr 0.000260 wd 0.0500 time 0.4672 (0.4733) data time 0.0008 (0.0034) model time 0.4664 (0.4695) loss 3.4860 (2.6782) grad_norm 2.5301 (2.2716) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][300/625] eta 0:02:33 lr 0.000260 wd 0.0500 time 0.4719 (0.4731) data time 0.0009 (0.0033) model time 0.4709 (0.4693) loss 2.7831 (2.6789) grad_norm 2.3387 (2.2640) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][310/625] eta 0:02:28 lr 0.000260 wd 0.0500 time 0.4649 (0.4729) data time 0.0010 (0.0033) model time 0.4638 (0.4692) loss 2.5822 (2.6779) grad_norm 1.8249 (2.2575) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][320/625] eta 0:02:24 lr 0.000259 wd 0.0500 time 0.4688 (0.4727) data time 0.0010 (0.0032) model time 0.4677 (0.4690) loss 3.1636 (2.6745) grad_norm 2.1434 (2.2749) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:36:57 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [215/300][330/625] eta 0:02:19 lr 0.000259 wd 0.0500 time 0.4696 (0.4725) data time 0.0010 (0.0031) model time 0.4686 (0.4689) loss 2.8821 (2.6827) grad_norm 7.3079 (2.2848) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][340/625] eta 0:02:14 lr 0.000259 wd 0.0500 time 0.4769 (0.4724) data time 0.0008 (0.0031) model time 0.4761 (0.4689) loss 1.9833 (2.6805) grad_norm 2.1116 (2.2790) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][350/625] eta 0:02:10 lr 0.000259 wd 0.0500 time 0.6355 (0.4734) data time 0.0009 (0.0030) model time 0.6347 (0.4701) loss 2.0124 (2.6755) grad_norm 1.4439 (2.2757) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][360/625] eta 0:02:05 lr 0.000259 wd 0.0500 time 0.4643 (0.4732) data time 0.0010 (0.0029) model time 0.4632 (0.4700) loss 2.7643 (2.6766) grad_norm 1.9369 (2.2646) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][370/625] eta 0:02:00 lr 0.000259 wd 0.0500 time 0.4704 (0.4730) data time 0.0010 (0.0029) model time 0.4694 (0.4699) loss 2.8126 (2.6810) grad_norm 2.1329 (2.2545) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][380/625] eta 0:01:55 lr 0.000259 wd 0.0500 time 0.4639 (0.4729) data time 0.0011 (0.0028) model time 0.4628 (0.4698) loss 2.8046 (2.6755) grad_norm 1.6385 (2.2484) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][390/625] eta 0:01:51 lr 0.000259 wd 0.0500 time 0.4645 (0.4727) data time 0.0008 (0.0028) model time 0.4637 (0.4697) loss 2.9610 (2.6782) grad_norm 1.7162 (2.2556) loss_scale 
512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][400/625] eta 0:01:46 lr 0.000259 wd 0.0500 time 0.4693 (0.4726) data time 0.0008 (0.0028) model time 0.4686 (0.4695) loss 2.7320 (2.6774) grad_norm 2.4619 (2.2504) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][410/625] eta 0:01:41 lr 0.000259 wd 0.0500 time 0.4701 (0.4725) data time 0.0012 (0.0027) model time 0.4690 (0.4695) loss 2.3040 (2.6795) grad_norm 4.2412 (2.2549) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][420/625] eta 0:01:36 lr 0.000259 wd 0.0500 time 0.4664 (0.4724) data time 0.0008 (0.0027) model time 0.4656 (0.4695) loss 2.5489 (2.6761) grad_norm 1.4582 (2.2590) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][430/625] eta 0:01:32 lr 0.000259 wd 0.0500 time 0.4704 (0.4724) data time 0.0012 (0.0026) model time 0.4692 (0.4695) loss 2.9827 (2.6802) grad_norm 1.6508 (2.2485) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][440/625] eta 0:01:27 lr 0.000258 wd 0.0500 time 0.4651 (0.4726) data time 0.0011 (0.0026) model time 0.4640 (0.4698) loss 1.9928 (2.6766) grad_norm 1.6695 (2.2441) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][450/625] eta 0:01:22 lr 0.000258 wd 0.0500 time 0.4715 (0.4726) data time 0.0012 (0.0026) model time 0.4703 (0.4698) loss 3.1229 (2.6789) grad_norm 2.1961 (2.2394) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][460/625] eta 0:01:17 lr 0.000258 wd 0.0500 time 0.4675 (0.4725) data time 0.0008 (0.0025) model time 
0.4667 (0.4697) loss 2.4621 (2.6763) grad_norm 1.6808 (2.2367) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][470/625] eta 0:01:13 lr 0.000258 wd 0.0500 time 0.4679 (0.4729) data time 0.0008 (0.0025) model time 0.4671 (0.4702) loss 1.8042 (2.6755) grad_norm 1.7252 (2.2340) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][480/625] eta 0:01:08 lr 0.000258 wd 0.0500 time 0.4676 (0.4728) data time 0.0010 (0.0025) model time 0.4666 (0.4702) loss 2.3946 (2.6717) grad_norm 2.2050 (2.2286) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][490/625] eta 0:01:03 lr 0.000258 wd 0.0500 time 0.4693 (0.4728) data time 0.0010 (0.0024) model time 0.4682 (0.4702) loss 2.8855 (2.6750) grad_norm 1.9790 (2.2203) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][500/625] eta 0:00:59 lr 0.000258 wd 0.0500 time 0.4688 (0.4727) data time 0.0010 (0.0024) model time 0.4678 (0.4702) loss 2.9661 (2.6744) grad_norm 1.4512 (2.2597) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][510/625] eta 0:00:54 lr 0.000258 wd 0.0500 time 0.4658 (0.4727) data time 0.0008 (0.0024) model time 0.4650 (0.4701) loss 2.7934 (2.6761) grad_norm 2.6845 (2.2606) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][520/625] eta 0:00:49 lr 0.000258 wd 0.0500 time 0.4658 (0.4726) data time 0.0011 (0.0024) model time 0.4648 (0.4700) loss 2.6016 (2.6741) grad_norm 2.0287 (2.2589) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][530/625] eta 0:00:44 lr 
0.000258 wd 0.0500 time 0.4823 (0.4725) data time 0.0008 (0.0023) model time 0.4815 (0.4700) loss 2.0594 (2.6734) grad_norm 1.7509 (2.2589) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][540/625] eta 0:00:40 lr 0.000258 wd 0.0500 time 0.4663 (0.4724) data time 0.0008 (0.0023) model time 0.4655 (0.4699) loss 3.3107 (2.6698) grad_norm 2.0087 (2.2609) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][550/625] eta 0:00:35 lr 0.000258 wd 0.0500 time 0.4671 (0.4724) data time 0.0008 (0.0023) model time 0.4664 (0.4699) loss 2.6166 (2.6703) grad_norm 1.8815 (2.2551) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][560/625] eta 0:00:30 lr 0.000257 wd 0.0500 time 0.4692 (0.4723) data time 0.0008 (0.0023) model time 0.4684 (0.4699) loss 2.3044 (2.6694) grad_norm 2.5661 (2.2669) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][570/625] eta 0:00:25 lr 0.000257 wd 0.0500 time 0.4683 (0.4723) data time 0.0010 (0.0023) model time 0.4673 (0.4699) loss 2.8223 (2.6701) grad_norm 1.7608 (2.2747) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:38:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][580/625] eta 0:00:21 lr 0.000257 wd 0.0500 time 0.4650 (0.4722) data time 0.0011 (0.0022) model time 0.4639 (0.4698) loss 1.5978 (2.6687) grad_norm 3.8554 (2.3101) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][590/625] eta 0:00:16 lr 0.000257 wd 0.0500 time 0.4705 (0.4721) data time 0.0008 (0.0022) model time 0.4697 (0.4697) loss 2.8739 (2.6697) grad_norm 2.1691 (2.3095) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:39:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [215/300][600/625] eta 0:00:11 lr 0.000257 wd 0.0500 time 0.4679 (0.4720) data time 0.0011 (0.0022) model time 0.4668 (0.4696) loss 3.0166 (2.6737) grad_norm 1.5972 (2.3041) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][610/625] eta 0:00:07 lr 0.000257 wd 0.0500 time 0.4651 (0.4719) data time 0.0008 (0.0022) model time 0.4643 (0.4695) loss 2.4244 (2.6720) grad_norm 2.1152 (2.3054) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [215/300][620/625] eta 0:00:02 lr 0.000257 wd 0.0500 time 0.4636 (0.4718) data time 0.0005 (0.0022) model time 0.4631 (0.4694) loss 3.0462 (2.6739) grad_norm 2.0465 (2.3022) loss_scale 1024.0000 (517.7713) mem 16706MB [2024-08-10 21:39:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 215 training takes 0:04:54 [2024-08-10 21:39:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:39:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.531 (0.531) Loss 0.5396 (0.5396) Acc@1 88.379 (88.379) Acc@5 98.584 (98.584) Mem 16706MB [2024-08-10 21:39:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.162) Loss 0.8550 (0.6408) Acc@1 78.906 (86.448) Acc@5 95.801 (97.647) Mem 16706MB [2024-08-10 21:39:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.9321 (0.7505) Acc@1 77.393 (83.570) Acc@5 95.508 (96.584) Mem 16706MB [2024-08-10 21:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.301 Acc@5 96.589 [2024-08-10 21:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.3% [2024-08-10 21:39:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.830 (0.830) Loss 0.4775 (0.4775) Acc@1 89.551 (89.551) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.194) Loss 0.7622 (0.5880) Acc@1 81.641 (87.247) Acc@5 96.973 (97.998) Mem 16706MB [2024-08-10 21:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.159) Loss 0.8447 (0.6898) Acc@1 80.078 (84.594) Acc@5 96.143 (97.042) Mem 16706MB [2024-08-10 21:39:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.323 Acc@5 97.027 [2024-08-10 21:39:25 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:39:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][0/625] eta 0:13:09 lr 0.000257 wd 0.0500 time 1.2631 (1.2631) data time 0.5719 (0.5719) model time 0.0000 (0.0000) loss 3.3480 (3.3480) grad_norm 1.8475 (1.8475) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:39:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][10/625] eta 0:05:31 lr 0.000257 wd 0.0500 time 0.4661 (0.5384) data 
time 0.0010 (0.0530) model time 0.0000 (0.0000) loss 3.2774 (3.0268) grad_norm 2.7203 (2.7805) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:39:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][20/625] eta 0:05:04 lr 0.000257 wd 0.0500 time 0.4732 (0.5040) data time 0.0011 (0.0282) model time 0.0000 (0.0000) loss 2.8630 (2.8100) grad_norm 2.2671 (2.8362) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:39:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][30/625] eta 0:05:01 lr 0.000257 wd 0.0500 time 0.4695 (0.5067) data time 0.0008 (0.0194) model time 0.0000 (0.0000) loss 2.2562 (2.7444) grad_norm 2.4514 (2.6110) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:39:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][40/625] eta 0:04:51 lr 0.000257 wd 0.0500 time 0.4646 (0.4976) data time 0.0010 (0.0149) model time 0.0000 (0.0000) loss 2.2268 (2.7149) grad_norm 2.2305 (2.5434) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][50/625] eta 0:04:43 lr 0.000256 wd 0.0500 time 0.4689 (0.4922) data time 0.0010 (0.0122) model time 0.0000 (0.0000) loss 2.3064 (2.7096) grad_norm 1.4084 (2.5618) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][60/625] eta 0:04:40 lr 0.000256 wd 0.0500 time 0.4764 (0.4956) data time 0.0008 (0.0103) model time 0.4756 (0.5121) loss 2.6368 (2.7008) grad_norm 1.5883 (2.4608) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][70/625] eta 0:04:33 lr 0.000256 wd 0.0500 time 0.4657 (0.4923) data time 0.0007 (0.0090) model time 0.4649 (0.4915) loss 3.0168 (2.6864) grad_norm 2.2413 (2.4228) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [216/300][80/625] eta 0:04:26 lr 0.000256 wd 0.0500 time 0.4607 (0.4892) data time 0.0010 (0.0080) model time 0.4596 (0.4831) loss 2.1451 (2.6999) grad_norm 1.5713 (2.3777) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][90/625] eta 0:04:20 lr 0.000256 wd 0.0500 time 0.4676 (0.4871) data time 0.0010 (0.0073) model time 0.4666 (0.4797) loss 2.3166 (2.7144) grad_norm 1.6075 (2.4913) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][100/625] eta 0:04:14 lr 0.000256 wd 0.0500 time 0.4692 (0.4851) data time 0.0011 (0.0066) model time 0.4681 (0.4768) loss 2.8648 (2.6832) grad_norm 1.9650 (2.4879) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][110/625] eta 0:04:09 lr 0.000256 wd 0.0500 time 0.4644 (0.4839) data time 0.0011 (0.0061) model time 0.4633 (0.4758) loss 3.4392 (2.6710) grad_norm 1.8548 (2.4816) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][120/625] eta 0:04:03 lr 0.000256 wd 0.0500 time 0.4680 (0.4827) data time 0.0011 (0.0057) model time 0.4669 (0.4747) loss 3.0066 (2.6760) grad_norm 4.1572 (2.4660) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][130/625] eta 0:03:58 lr 0.000256 wd 0.0500 time 0.4682 (0.4818) data time 0.0007 (0.0053) model time 0.4674 (0.4742) loss 2.5557 (2.6762) grad_norm 1.6136 (2.4837) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][140/625] eta 0:03:53 lr 0.000256 wd 0.0500 time 0.4700 (0.4810) data time 0.0008 (0.0050) model time 0.4692 (0.4736) loss 2.3747 (2.6778) grad_norm 2.9103 (2.5129) loss_scale 1024.0000 (1024.0000) 
mem 16706MB [2024-08-10 21:40:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][150/625] eta 0:03:48 lr 0.000256 wd 0.0500 time 0.4668 (0.4802) data time 0.0008 (0.0048) model time 0.4660 (0.4730) loss 1.5750 (2.6777) grad_norm 1.4795 (2.4709) loss_scale 1024.0000 (1024.0000) mem 16706MB [2024-08-10 21:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][160/625] eta 0:03:42 lr 0.000255 wd 0.0500 time 0.4757 (0.4795) data time 0.0010 (0.0045) model time 0.4747 (0.4726) loss 2.6022 (2.6881) grad_norm 2.9404 (nan) loss_scale 512.0000 (998.5590) mem 16706MB [2024-08-10 21:40:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][170/625] eta 0:03:37 lr 0.000255 wd 0.0500 time 0.4643 (0.4789) data time 0.0011 (0.0043) model time 0.4632 (0.4722) loss 2.8435 (2.6774) grad_norm 2.3637 (nan) loss_scale 512.0000 (970.1053) mem 16706MB [2024-08-10 21:40:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][180/625] eta 0:03:32 lr 0.000255 wd 0.0500 time 0.4685 (0.4783) data time 0.0010 (0.0041) model time 0.4675 (0.4718) loss 1.9475 (2.6857) grad_norm 2.5496 (nan) loss_scale 512.0000 (944.7956) mem 16706MB [2024-08-10 21:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][190/625] eta 0:03:27 lr 0.000255 wd 0.0500 time 0.4683 (0.4778) data time 0.0010 (0.0040) model time 0.4673 (0.4715) loss 2.6105 (2.6852) grad_norm 2.3582 (nan) loss_scale 512.0000 (922.1361) mem 16706MB [2024-08-10 21:41:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][200/625] eta 0:03:22 lr 0.000255 wd 0.0500 time 0.4680 (0.4773) data time 0.0011 (0.0039) model time 0.4669 (0.4712) loss 2.6354 (2.6815) grad_norm 1.7771 (nan) loss_scale 512.0000 (901.7313) mem 16706MB [2024-08-10 21:41:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][210/625] eta 0:03:17 lr 0.000255 wd 0.0500 time 0.4660 (0.4768) data time 0.0009 (0.0037) model time 0.4651 (0.4709) loss 2.8972 
(2.6759) grad_norm 1.5989 (nan) loss_scale 512.0000 (883.2607) mem 16706MB [2024-08-10 21:41:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][220/625] eta 0:03:12 lr 0.000255 wd 0.0500 time 0.4672 (0.4764) data time 0.0009 (0.0036) model time 0.4663 (0.4707) loss 2.6304 (2.6719) grad_norm 1.6521 (nan) loss_scale 512.0000 (866.4615) mem 16706MB [2024-08-10 21:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][230/625] eta 0:03:08 lr 0.000255 wd 0.0500 time 0.4682 (0.4760) data time 0.0012 (0.0035) model time 0.4670 (0.4704) loss 2.6451 (2.6793) grad_norm 1.9806 (nan) loss_scale 512.0000 (851.1169) mem 16706MB [2024-08-10 21:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][240/625] eta 0:03:03 lr 0.000255 wd 0.0500 time 0.4641 (0.4757) data time 0.0010 (0.0034) model time 0.4631 (0.4702) loss 2.4352 (2.6702) grad_norm 1.7376 (nan) loss_scale 512.0000 (837.0456) mem 16706MB [2024-08-10 21:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][250/625] eta 0:02:58 lr 0.000255 wd 0.0500 time 0.4655 (0.4770) data time 0.0009 (0.0034) model time 0.4646 (0.4720) loss 2.7509 (2.6696) grad_norm 4.1981 (nan) loss_scale 512.0000 (824.0956) mem 16706MB [2024-08-10 21:41:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][260/625] eta 0:02:53 lr 0.000255 wd 0.0500 time 0.4667 (0.4766) data time 0.0007 (0.0033) model time 0.4659 (0.4717) loss 2.8691 (2.6660) grad_norm 2.5445 (nan) loss_scale 512.0000 (812.1379) mem 16706MB [2024-08-10 21:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][270/625] eta 0:02:49 lr 0.000255 wd 0.0500 time 0.4681 (0.4764) data time 0.0011 (0.0032) model time 0.4670 (0.4716) loss 2.7675 (2.6624) grad_norm 1.6516 (nan) loss_scale 512.0000 (801.0627) mem 16706MB [2024-08-10 21:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][280/625] eta 0:02:44 lr 0.000254 wd 0.0500 time 0.4637 (0.4760) data time 
0.0010 (0.0031) model time 0.4626 (0.4713) loss 2.5440 (2.6661) grad_norm 2.4988 (nan) loss_scale 512.0000 (790.7758) mem 16706MB [2024-08-10 21:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][290/625] eta 0:02:39 lr 0.000254 wd 0.0500 time 0.4679 (0.4764) data time 0.0010 (0.0031) model time 0.4669 (0.4719) loss 2.7008 (2.6616) grad_norm 18.1633 (nan) loss_scale 512.0000 (781.1959) mem 16706MB [2024-08-10 21:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][300/625] eta 0:02:34 lr 0.000254 wd 0.0500 time 0.4639 (0.4760) data time 0.0011 (0.0030) model time 0.4629 (0.4716) loss 3.0791 (2.6658) grad_norm 2.0888 (nan) loss_scale 512.0000 (772.2525) mem 16706MB [2024-08-10 21:41:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][310/625] eta 0:02:29 lr 0.000254 wd 0.0500 time 0.4747 (0.4758) data time 0.0011 (0.0029) model time 0.4737 (0.4715) loss 2.6402 (2.6634) grad_norm 2.3965 (nan) loss_scale 512.0000 (763.8842) mem 16706MB [2024-08-10 21:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][320/625] eta 0:02:25 lr 0.000254 wd 0.0500 time 0.4636 (0.4755) data time 0.0010 (0.0029) model time 0.4626 (0.4712) loss 2.5757 (2.6592) grad_norm 1.8518 (nan) loss_scale 512.0000 (756.0374) mem 16706MB [2024-08-10 21:42:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][330/625] eta 0:02:20 lr 0.000254 wd 0.0500 time 0.4721 (0.4752) data time 0.0010 (0.0028) model time 0.4711 (0.4710) loss 2.6017 (2.6622) grad_norm 2.1615 (nan) loss_scale 512.0000 (748.6647) mem 16706MB [2024-08-10 21:42:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][340/625] eta 0:02:15 lr 0.000254 wd 0.0500 time 0.4624 (0.4749) data time 0.0010 (0.0028) model time 0.4613 (0.4708) loss 2.6636 (2.6626) grad_norm 2.4515 (nan) loss_scale 512.0000 (741.7243) mem 16706MB [2024-08-10 21:42:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][350/625] eta 0:02:10 
lr 0.000254 wd 0.0500 time 0.4674 (0.4747) data time 0.0011 (0.0027) model time 0.4662 (0.4706) loss 2.6178 (2.6669) grad_norm 1.9396 (nan) loss_scale 512.0000 (735.1795) mem 16706MB [2024-08-10 21:42:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][360/625] eta 0:02:05 lr 0.000254 wd 0.0500 time 0.4643 (0.4751) data time 0.0008 (0.0027) model time 0.4635 (0.4712) loss 2.6859 (2.6706) grad_norm 1.8466 (nan) loss_scale 512.0000 (728.9972) mem 16706MB [2024-08-10 21:42:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][370/625] eta 0:02:01 lr 0.000254 wd 0.0500 time 0.4726 (0.4754) data time 0.0008 (0.0026) model time 0.4718 (0.4716) loss 2.5343 (2.6673) grad_norm 1.9852 (nan) loss_scale 512.0000 (723.1482) mem 16706MB [2024-08-10 21:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][380/625] eta 0:01:56 lr 0.000254 wd 0.0500 time 0.4635 (0.4751) data time 0.0008 (0.0026) model time 0.4627 (0.4714) loss 3.2521 (2.6692) grad_norm 2.3411 (nan) loss_scale 512.0000 (717.6063) mem 16706MB [2024-08-10 21:42:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][390/625] eta 0:01:51 lr 0.000253 wd 0.0500 time 0.4723 (0.4749) data time 0.0008 (0.0026) model time 0.4715 (0.4712) loss 2.8343 (2.6711) grad_norm 2.3269 (nan) loss_scale 512.0000 (712.3478) mem 16706MB [2024-08-10 21:42:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][400/625] eta 0:01:46 lr 0.000253 wd 0.0500 time 0.4710 (0.4747) data time 0.0010 (0.0025) model time 0.4699 (0.4710) loss 2.9643 (2.6736) grad_norm 2.0700 (nan) loss_scale 512.0000 (707.3516) mem 16706MB [2024-08-10 21:42:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][410/625] eta 0:01:42 lr 0.000253 wd 0.0500 time 0.4650 (0.4744) data time 0.0011 (0.0025) model time 0.4639 (0.4708) loss 2.5722 (2.6732) grad_norm 2.1733 (nan) loss_scale 512.0000 (702.5985) mem 16706MB [2024-08-10 21:42:44 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [216/300][420/625] eta 0:01:37 lr 0.000253 wd 0.0500 time 0.4667 (0.4742) data time 0.0011 (0.0024) model time 0.4656 (0.4706) loss 3.2388 (2.6779) grad_norm 1.9286 (nan) loss_scale 512.0000 (698.0713) mem 16706MB [2024-08-10 21:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][430/625] eta 0:01:32 lr 0.000253 wd 0.0500 time 0.4651 (0.4740) data time 0.0010 (0.0024) model time 0.4641 (0.4705) loss 2.7109 (2.6756) grad_norm 1.7608 (nan) loss_scale 512.0000 (693.7541) mem 16706MB [2024-08-10 21:42:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][440/625] eta 0:01:27 lr 0.000253 wd 0.0500 time 0.4651 (0.4739) data time 0.0010 (0.0024) model time 0.4641 (0.4704) loss 2.8566 (2.6780) grad_norm 3.5115 (nan) loss_scale 512.0000 (689.6327) mem 16706MB [2024-08-10 21:42:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][450/625] eta 0:01:22 lr 0.000253 wd 0.0500 time 0.4662 (0.4737) data time 0.0008 (0.0024) model time 0.4654 (0.4702) loss 3.0380 (2.6785) grad_norm 2.1312 (nan) loss_scale 512.0000 (685.6940) mem 16706MB [2024-08-10 21:43:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][460/625] eta 0:01:18 lr 0.000253 wd 0.0500 time 0.4613 (0.4735) data time 0.0011 (0.0023) model time 0.4602 (0.4701) loss 3.3598 (2.6805) grad_norm 8.4394 (nan) loss_scale 512.0000 (681.9262) mem 16706MB [2024-08-10 21:43:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][470/625] eta 0:01:13 lr 0.000253 wd 0.0500 time 0.4669 (0.4742) data time 0.0010 (0.0023) model time 0.4659 (0.4709) loss 2.7191 (2.6797) grad_norm 2.3622 (nan) loss_scale 512.0000 (678.3185) mem 16706MB [2024-08-10 21:43:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][480/625] eta 0:01:08 lr 0.000253 wd 0.0500 time 0.4668 (0.4741) data time 0.0010 (0.0023) model time 0.4658 (0.4709) loss 1.7560 (2.6770) grad_norm 1.8700 (nan) loss_scale 512.0000 (674.8607) 
mem 16706MB [2024-08-10 21:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][490/625] eta 0:01:03 lr 0.000253 wd 0.0500 time 0.4629 (0.4740) data time 0.0010 (0.0022) model time 0.4619 (0.4708) loss 2.9602 (2.6787) grad_norm 2.0175 (nan) loss_scale 512.0000 (671.5438) mem 16706MB [2024-08-10 21:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][500/625] eta 0:00:59 lr 0.000253 wd 0.0500 time 0.4619 (0.4738) data time 0.0008 (0.0022) model time 0.4610 (0.4706) loss 3.4885 (2.6813) grad_norm 19.8659 (nan) loss_scale 512.0000 (668.3593) mem 16706MB [2024-08-10 21:43:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][510/625] eta 0:00:54 lr 0.000252 wd 0.0500 time 0.4636 (0.4737) data time 0.0013 (0.0022) model time 0.4624 (0.4705) loss 2.6133 (2.6824) grad_norm 2.5443 (nan) loss_scale 512.0000 (665.2994) mem 16706MB [2024-08-10 21:43:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][520/625] eta 0:00:49 lr 0.000252 wd 0.0500 time 0.4647 (0.4735) data time 0.0011 (0.0022) model time 0.4636 (0.4704) loss 2.5980 (2.6809) grad_norm 1.9147 (nan) loss_scale 512.0000 (662.3570) mem 16706MB [2024-08-10 21:43:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][530/625] eta 0:00:44 lr 0.000252 wd 0.0500 time 0.4657 (0.4734) data time 0.0009 (0.0022) model time 0.4649 (0.4703) loss 3.0761 (2.6837) grad_norm 2.1745 (nan) loss_scale 512.0000 (659.5254) mem 16706MB [2024-08-10 21:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][540/625] eta 0:00:40 lr 0.000252 wd 0.0500 time 0.4674 (0.4733) data time 0.0008 (0.0021) model time 0.4666 (0.4702) loss 2.7065 (2.6846) grad_norm 2.1064 (nan) loss_scale 512.0000 (656.7985) mem 16706MB [2024-08-10 21:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][550/625] eta 0:00:35 lr 0.000252 wd 0.0500 time 0.4657 (0.4735) data time 0.0011 (0.0021) model time 0.4647 (0.4705) loss 2.6897 (2.6819) 
grad_norm 1.6031 (nan) loss_scale 512.0000 (654.1706) mem 16706MB [2024-08-10 21:43:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][560/625] eta 0:00:30 lr 0.000252 wd 0.0500 time 0.4655 (0.4735) data time 0.0008 (0.0021) model time 0.4647 (0.4706) loss 3.0968 (2.6861) grad_norm 2.4569 (nan) loss_scale 512.0000 (651.6364) mem 16706MB [2024-08-10 21:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][570/625] eta 0:00:26 lr 0.000252 wd 0.0500 time 0.4760 (0.4734) data time 0.0008 (0.0021) model time 0.4753 (0.4705) loss 2.7308 (2.6866) grad_norm 2.4547 (nan) loss_scale 512.0000 (649.1909) mem 16706MB [2024-08-10 21:44:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][580/625] eta 0:00:21 lr 0.000252 wd 0.0500 time 0.4809 (0.4734) data time 0.0011 (0.0021) model time 0.4798 (0.4705) loss 2.2824 (2.6840) grad_norm 3.3097 (nan) loss_scale 512.0000 (646.8296) mem 16706MB [2024-08-10 21:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][590/625] eta 0:00:16 lr 0.000252 wd 0.0500 time 0.4665 (0.4733) data time 0.0010 (0.0020) model time 0.4654 (0.4705) loss 2.3738 (2.6806) grad_norm 1.7771 (nan) loss_scale 512.0000 (644.5482) mem 16706MB [2024-08-10 21:44:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][600/625] eta 0:00:11 lr 0.000252 wd 0.0500 time 0.4660 (0.4732) data time 0.0011 (0.0020) model time 0.4649 (0.4704) loss 1.7318 (2.6815) grad_norm 1.5380 (nan) loss_scale 512.0000 (642.3428) mem 16706MB [2024-08-10 21:44:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][610/625] eta 0:00:07 lr 0.000252 wd 0.0500 time 0.4646 (0.4731) data time 0.0005 (0.0020) model time 0.4641 (0.4703) loss 1.7746 (2.6779) grad_norm 1.5364 (nan) loss_scale 512.0000 (640.2095) mem 16706MB [2024-08-10 21:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [216/300][620/625] eta 0:00:02 lr 0.000252 wd 0.0500 time 0.4597 (0.4736) data time 0.0008 
(0.0020) model time 0.4589 (0.4709) loss 2.9677 (2.6816) grad_norm 2.7529 (nan) loss_scale 512.0000 (638.1449) mem 16706MB [2024-08-10 21:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 216 training takes 0:04:55 [2024-08-10 21:44:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:44:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 21:44:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.530 (0.530) Loss 0.5029 (0.5029) Acc@1 89.209 (89.209) Acc@5 98.828 (98.828) Mem 16706MB [2024-08-10 21:44:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.8403 (0.6231) Acc@1 80.029 (86.590) Acc@5 95.801 (97.696) Mem 16706MB [2024-08-10 21:44:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.9038 (0.7340) Acc@1 79.102 (83.805) Acc@5 95.312 (96.696) Mem 16706MB [2024-08-10 21:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.501 Acc@5 96.665 [2024-08-10 21:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 21:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.844 (0.844) Loss 0.4775 (0.4775) Acc@1 89.404 (89.404) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:44:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.193) Loss 0.7642 (0.5884) Acc@1 81.592 (87.274) Acc@5 96.924 (97.989) Mem 16706MB [2024-08-10 21:44:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.158) Loss 0.8457 (0.6901) Acc@1 80.078 (84.631) Acc@5 96.094 (97.021) Mem 16706MB [2024-08-10 21:44:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.353 Acc@5 97.005 [2024-08-10 21:44:29 vssm_base_ms_e300] 
(main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4% [2024-08-10 21:44:29 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.35% [2024-08-10 21:44:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 21:44:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 21:44:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][0/625] eta 0:08:33 lr 0.000251 wd 0.0500 time 0.8218 (0.8218) data time 0.4169 (0.4169) model time 0.0000 (0.0000) loss 1.6987 (1.6987) grad_norm 2.9711 (2.9711) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:44:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][10/625] eta 0:05:16 lr 0.000251 wd 0.0500 time 0.4636 (0.5139) data time 0.0011 (0.0389) model time 0.0000 (0.0000) loss 2.5770 (2.5491) grad_norm 1.6462 (2.5400) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:44:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][20/625] eta 0:04:57 lr 0.000251 wd 0.0500 time 0.4637 (0.4912) data time 0.0009 (0.0209) model time 0.0000 (0.0000) loss 1.5965 (2.5736) grad_norm 2.0312 (2.2930) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:44:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][30/625] eta 0:04:47 lr 0.000251 wd 0.0500 time 0.4725 (0.4837) data time 0.0008 (0.0144) model time 0.0000 (0.0000) loss 2.5780 (2.5818) grad_norm 2.7328 (2.2747) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:44:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][40/625] eta 0:04:40 lr 0.000251 wd 0.0500 time 0.4680 (0.4798) data time 0.0008 (0.0112) model time 0.0000 (0.0000) loss 3.0090 (2.6027) grad_norm 1.5961 (2.1859) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:44:56 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][50/625] eta 0:04:34 lr 0.000251 wd 0.0500 time 0.4686 (0.4771) data time 0.0010 (0.0092) model time 0.0000 (0.0000) loss 2.8212 (2.6034) grad_norm 9.6235 (2.3051) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][60/625] eta 0:04:28 lr 0.000251 wd 0.0500 time 0.4635 (0.4751) data time 0.0010 (0.0078) model time 0.4625 (0.4640) loss 2.7281 (2.6264) grad_norm 2.0933 (2.2673) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][70/625] eta 0:04:22 lr 0.000251 wd 0.0500 time 0.4659 (0.4736) data time 0.0009 (0.0069) model time 0.4650 (0.4639) loss 3.2336 (2.6126) grad_norm 10.7580 (2.4156) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][80/625] eta 0:04:17 lr 0.000251 wd 0.0500 time 0.4654 (0.4725) data time 0.0008 (0.0061) model time 0.4646 (0.4639) loss 1.8461 (2.5968) grad_norm 1.7966 (2.3805) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][90/625] eta 0:04:12 lr 0.000251 wd 0.0500 time 0.4631 (0.4719) data time 0.0009 (0.0056) model time 0.4623 (0.4643) loss 1.7607 (2.5860) grad_norm 1.6440 (2.3429) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][100/625] eta 0:04:07 lr 0.000251 wd 0.0500 time 0.4682 (0.4715) data time 0.0009 (0.0051) model time 0.4673 (0.4648) loss 3.2277 (2.5953) grad_norm 2.2492 (2.2913) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][110/625] eta 0:04:02 lr 0.000251 wd 0.0500 time 0.4684 (0.4710) data time 0.0011 (0.0047) model time 0.4673 (0.4650) loss 2.6621 (2.6112) grad_norm 1.8041 
(2.2981) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][120/625] eta 0:03:57 lr 0.000250 wd 0.0500 time 0.4654 (0.4706) data time 0.0008 (0.0044) model time 0.4646 (0.4650) loss 3.1694 (2.6282) grad_norm 2.6783 (2.3057) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][130/625] eta 0:03:54 lr 0.000250 wd 0.0500 time 0.4671 (0.4733) data time 0.0008 (0.0042) model time 0.4664 (0.4699) loss 1.9806 (2.6274) grad_norm 2.5163 (2.2779) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][140/625] eta 0:03:49 lr 0.000250 wd 0.0500 time 0.4634 (0.4728) data time 0.0015 (0.0039) model time 0.4619 (0.4694) loss 2.9833 (2.6355) grad_norm 2.1611 (2.2576) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][150/625] eta 0:03:44 lr 0.000250 wd 0.0500 time 0.4681 (0.4725) data time 0.0011 (0.0038) model time 0.4670 (0.4692) loss 1.8343 (2.6212) grad_norm 3.0401 (2.2431) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][160/625] eta 0:03:39 lr 0.000250 wd 0.0500 time 0.4652 (0.4723) data time 0.0007 (0.0036) model time 0.4645 (0.4692) loss 2.9077 (2.6245) grad_norm 2.7401 (2.2862) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][170/625] eta 0:03:34 lr 0.000250 wd 0.0500 time 0.4650 (0.4720) data time 0.0007 (0.0034) model time 0.4642 (0.4689) loss 1.9504 (2.6269) grad_norm 2.1008 (2.3069) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:45:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][180/625] eta 0:03:29 lr 0.000250 wd 0.0500 time 0.4644 (0.4718) data time 0.0008 
(0.0033) model time 0.4636 (0.4688) loss 2.6795 (2.6279) grad_norm 1.9671 (2.2875) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][190/625] eta 0:03:25 lr 0.000250 wd 0.0500 time 0.4653 (0.4726) data time 0.0011 (0.0032) model time 0.4642 (0.4700) loss 3.2182 (2.6373) grad_norm 1.2950 (2.2735) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][200/625] eta 0:03:20 lr 0.000250 wd 0.0500 time 0.4637 (0.4723) data time 0.0010 (0.0031) model time 0.4627 (0.4697) loss 2.7452 (2.6497) grad_norm 2.3648 (2.2525) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][210/625] eta 0:03:15 lr 0.000250 wd 0.0500 time 0.4755 (0.4721) data time 0.0008 (0.0030) model time 0.4747 (0.4694) loss 1.7805 (2.6327) grad_norm 3.4265 (2.2653) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][220/625] eta 0:03:11 lr 0.000250 wd 0.0500 time 0.4732 (0.4719) data time 0.0008 (0.0029) model time 0.4725 (0.4693) loss 3.0848 (2.6289) grad_norm 1.3599 (2.2484) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][230/625] eta 0:03:06 lr 0.000250 wd 0.0500 time 0.6757 (0.4726) data time 0.0008 (0.0028) model time 0.6749 (0.4703) loss 2.9181 (2.6171) grad_norm 1.6885 (2.2319) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][240/625] eta 0:03:01 lr 0.000249 wd 0.0500 time 0.4684 (0.4722) data time 0.0008 (0.0028) model time 0.4676 (0.4699) loss 2.2538 (2.6237) grad_norm 1.7618 (2.2200) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][250/625] 
eta 0:02:57 lr 0.000249 wd 0.0500 time 0.4732 (0.4722) data time 0.0011 (0.0027) model time 0.4721 (0.4699) loss 3.0039 (2.6142) grad_norm 1.8273 (2.2035) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][260/625] eta 0:02:52 lr 0.000249 wd 0.0500 time 0.4625 (0.4720) data time 0.0010 (0.0026) model time 0.4615 (0.4698) loss 2.5732 (2.6157) grad_norm 2.0801 (2.1915) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][270/625] eta 0:02:47 lr 0.000249 wd 0.0500 time 0.4657 (0.4718) data time 0.0008 (0.0026) model time 0.4649 (0.4696) loss 3.1905 (2.6196) grad_norm 2.0825 (2.1890) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][280/625] eta 0:02:42 lr 0.000249 wd 0.0500 time 0.4652 (0.4717) data time 0.0012 (0.0025) model time 0.4640 (0.4695) loss 2.6362 (2.6145) grad_norm 2.4412 (2.1838) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][290/625] eta 0:02:37 lr 0.000249 wd 0.0500 time 0.4663 (0.4715) data time 0.0008 (0.0025) model time 0.4655 (0.4693) loss 2.6340 (2.6126) grad_norm 2.4144 (2.2005) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][300/625] eta 0:02:33 lr 0.000249 wd 0.0500 time 0.4683 (0.4714) data time 0.0010 (0.0024) model time 0.4673 (0.4692) loss 3.2239 (2.6172) grad_norm 1.9290 (2.2013) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:46:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][310/625] eta 0:02:28 lr 0.000249 wd 0.0500 time 0.4675 (0.4713) data time 0.0010 (0.0024) model time 0.4665 (0.4691) loss 2.6864 (2.6166) grad_norm 1.9330 (2.2098) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:02 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][320/625] eta 0:02:23 lr 0.000249 wd 0.0500 time 0.4642 (0.4712) data time 0.0008 (0.0023) model time 0.4635 (0.4691) loss 2.6936 (2.6194) grad_norm 1.8089 (2.2163) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][330/625] eta 0:02:18 lr 0.000249 wd 0.0500 time 0.4805 (0.4712) data time 0.0008 (0.0023) model time 0.4797 (0.4691) loss 2.0592 (2.6217) grad_norm 2.9020 (2.2230) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][340/625] eta 0:02:14 lr 0.000249 wd 0.0500 time 0.4663 (0.4710) data time 0.0008 (0.0022) model time 0.4655 (0.4689) loss 2.9807 (2.6233) grad_norm 1.5071 (2.2338) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][350/625] eta 0:02:09 lr 0.000248 wd 0.0500 time 0.4651 (0.4719) data time 0.0011 (0.0022) model time 0.4640 (0.4699) loss 2.7589 (2.6230) grad_norm 2.1297 (2.2249) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][360/625] eta 0:02:05 lr 0.000248 wd 0.0500 time 0.4679 (0.4717) data time 0.0011 (0.0022) model time 0.4668 (0.4698) loss 3.0837 (2.6291) grad_norm 3.1650 (2.2435) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][370/625] eta 0:02:00 lr 0.000248 wd 0.0500 time 0.4733 (0.4716) data time 0.0011 (0.0022) model time 0.4722 (0.4697) loss 2.8578 (2.6289) grad_norm 2.3917 (2.2435) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][380/625] eta 0:01:55 lr 0.000248 wd 0.0500 time 0.4662 (0.4720) data time 0.0010 (0.0021) model time 0.4652 (0.4702) loss 2.7722 (2.6240) grad_norm 1.7428 
(2.2581) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][390/625] eta 0:01:50 lr 0.000248 wd 0.0500 time 0.4710 (0.4720) data time 0.0008 (0.0021) model time 0.4702 (0.4702) loss 2.8786 (2.6269) grad_norm 4.0108 (2.2570) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][400/625] eta 0:01:46 lr 0.000248 wd 0.0500 time 0.4687 (0.4720) data time 0.0010 (0.0021) model time 0.4676 (0.4702) loss 3.0651 (2.6303) grad_norm 1.7678 (2.2600) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][410/625] eta 0:01:41 lr 0.000248 wd 0.0500 time 0.4677 (0.4727) data time 0.0011 (0.0020) model time 0.4666 (0.4711) loss 2.3272 (2.6271) grad_norm 2.7730 (2.2962) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][420/625] eta 0:01:36 lr 0.000248 wd 0.0500 time 0.4740 (0.4726) data time 0.0010 (0.0020) model time 0.4730 (0.4710) loss 1.8223 (2.6263) grad_norm 2.3445 (2.3822) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:47:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][430/625] eta 0:01:32 lr 0.000248 wd 0.0500 time 0.4715 (0.4725) data time 0.0008 (0.0020) model time 0.4707 (0.4709) loss 3.1569 (2.6273) grad_norm 2.8971 (2.3985) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][440/625] eta 0:01:27 lr 0.000248 wd 0.0500 time 0.4733 (0.4724) data time 0.0008 (0.0020) model time 0.4726 (0.4708) loss 2.9638 (2.6298) grad_norm 3.3693 (2.4070) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][450/625] eta 0:01:22 lr 0.000248 wd 0.0500 time 0.4698 (0.4723) data time 0.0010 
(0.0020) model time 0.4688 (0.4707) loss 2.7985 (2.6257) grad_norm 1.9021 (2.4034) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][460/625] eta 0:01:17 lr 0.000248 wd 0.0500 time 0.4671 (0.4722) data time 0.0010 (0.0019) model time 0.4660 (0.4705) loss 2.9560 (2.6293) grad_norm 1.4195 (2.4050) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][470/625] eta 0:01:13 lr 0.000247 wd 0.0500 time 0.4691 (0.4721) data time 0.0008 (0.0019) model time 0.4683 (0.4704) loss 1.7026 (2.6312) grad_norm 3.0734 (2.4025) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][480/625] eta 0:01:08 lr 0.000247 wd 0.0500 time 0.4670 (0.4720) data time 0.0008 (0.0019) model time 0.4662 (0.4703) loss 1.4780 (2.6300) grad_norm 2.8701 (2.3996) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][490/625] eta 0:01:03 lr 0.000247 wd 0.0500 time 0.4642 (0.4723) data time 0.0010 (0.0019) model time 0.4632 (0.4708) loss 2.8391 (2.6273) grad_norm 1.8360 (2.4163) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][500/625] eta 0:00:59 lr 0.000247 wd 0.0500 time 0.4666 (0.4726) data time 0.0008 (0.0019) model time 0.4658 (0.4710) loss 2.9902 (2.6300) grad_norm 2.1813 (2.4066) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][510/625] eta 0:00:54 lr 0.000247 wd 0.0500 time 0.4643 (0.4724) data time 0.0011 (0.0018) model time 0.4632 (0.4709) loss 2.7932 (2.6301) grad_norm 1.8767 (2.4083) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][520/625] 
eta 0:00:49 lr 0.000247 wd 0.0500 time 0.4645 (0.4723) data time 0.0007 (0.0018) model time 0.4638 (0.4708) loss 2.2545 (2.6321) grad_norm 2.0148 (2.4137) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][530/625] eta 0:00:44 lr 0.000247 wd 0.0500 time 0.4737 (0.4723) data time 0.0010 (0.0018) model time 0.4726 (0.4707) loss 2.4347 (2.6280) grad_norm 2.1522 (2.4078) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][540/625] eta 0:00:40 lr 0.000247 wd 0.0500 time 0.4668 (0.4722) data time 0.0008 (0.0018) model time 0.4660 (0.4706) loss 2.6418 (2.6322) grad_norm 2.2447 (2.4077) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][550/625] eta 0:00:35 lr 0.000247 wd 0.0500 time 0.4630 (0.4725) data time 0.0010 (0.0018) model time 0.4620 (0.4710) loss 3.0923 (2.6364) grad_norm 2.0770 (2.4011) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][560/625] eta 0:00:30 lr 0.000247 wd 0.0500 time 0.4619 (0.4727) data time 0.0009 (0.0018) model time 0.4610 (0.4712) loss 3.0117 (2.6367) grad_norm 2.4758 (2.4204) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][570/625] eta 0:00:25 lr 0.000247 wd 0.0500 time 0.4678 (0.4726) data time 0.0008 (0.0018) model time 0.4670 (0.4711) loss 2.3520 (2.6365) grad_norm 1.4185 (2.4246) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][580/625] eta 0:00:21 lr 0.000247 wd 0.0500 time 0.4653 (0.4725) data time 0.0008 (0.0018) model time 0.4645 (0.4710) loss 3.3668 (2.6378) grad_norm 2.3897 (2.4207) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:10 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][590/625] eta 0:00:16 lr 0.000246 wd 0.0500 time 0.4646 (0.4724) data time 0.0010 (0.0017) model time 0.4636 (0.4709) loss 2.4925 (2.6416) grad_norm 2.0842 (2.4215) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][600/625] eta 0:00:11 lr 0.000246 wd 0.0500 time 0.4634 (0.4723) data time 0.0012 (0.0017) model time 0.4622 (0.4709) loss 2.1976 (2.6401) grad_norm 4.2820 (2.4231) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][610/625] eta 0:00:07 lr 0.000246 wd 0.0500 time 0.4643 (0.4723) data time 0.0005 (0.0017) model time 0.4637 (0.4708) loss 3.2547 (2.6424) grad_norm 2.4693 (2.4178) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [217/300][620/625] eta 0:00:02 lr 0.000246 wd 0.0500 time 0.4658 (0.4722) data time 0.0005 (0.0017) model time 0.4653 (0.4707) loss 2.4492 (2.6432) grad_norm 1.7840 (2.4117) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 217 training takes 0:04:55 [2024-08-10 21:49:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:49:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:49:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.575 (0.575) Loss 0.5093 (0.5093) Acc@1 89.307 (89.307) Acc@5 98.828 (98.828) Mem 16706MB [2024-08-10 21:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.166) Loss 0.8560 (0.6321) Acc@1 79.639 (86.572) Acc@5 96.143 (97.754) Mem 16706MB [2024-08-10 21:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.143) Loss 0.8887 (0.7445) Acc@1 79.395 (83.745) Acc@5 95.654 (96.687) Mem 16706MB [2024-08-10 21:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.389 Acc@5 96.663 [2024-08-10 21:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.4% [2024-08-10 21:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.845 (0.845) Loss 0.4768 (0.4768) Acc@1 89.502 (89.502) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.192) Loss 0.7656 (0.5884) Acc@1 81.494 (87.274) Acc@5 96.924 (97.980) Mem 16706MB [2024-08-10 21:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.157) Loss 0.8452 (0.6903) Acc@1 79.980 (84.624) Acc@5 95.996 (97.028) Mem 16706MB [2024-08-10 21:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.343 Acc@5 97.007 [2024-08-10 21:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:49:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][0/625] eta 0:13:44 lr 0.000246 wd 0.0500 time 1.3185 (1.3185) data time 0.4883 (0.4883) model time 0.0000 (0.0000) loss 2.6234 (2.6234) grad_norm 1.3775 (1.3775) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][10/625] eta 0:05:34 lr 0.000246 wd 0.0500 time 0.4700 (0.5443) data 
time 0.0010 (0.0453) model time 0.0000 (0.0000) loss 2.8758 (2.3785) grad_norm 2.5112 (2.0579) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][20/625] eta 0:05:06 lr 0.000246 wd 0.0500 time 0.4653 (0.5070) data time 0.0008 (0.0242) model time 0.0000 (0.0000) loss 2.3781 (2.5639) grad_norm 2.8101 (2.3055) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][30/625] eta 0:04:55 lr 0.000246 wd 0.0500 time 0.4662 (0.4963) data time 0.0008 (0.0167) model time 0.0000 (0.0000) loss 1.4719 (2.6238) grad_norm 1.9469 (2.2942) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:49:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][40/625] eta 0:04:46 lr 0.000246 wd 0.0500 time 0.4737 (0.4891) data time 0.0011 (0.0129) model time 0.0000 (0.0000) loss 2.6395 (2.6200) grad_norm 2.1659 (2.4274) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][50/625] eta 0:04:38 lr 0.000246 wd 0.0500 time 0.4633 (0.4847) data time 0.0010 (0.0106) model time 0.0000 (0.0000) loss 3.0076 (2.6478) grad_norm 2.0456 (2.3236) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][60/625] eta 0:04:32 lr 0.000246 wd 0.0500 time 0.4667 (0.4818) data time 0.0011 (0.0090) model time 0.4655 (0.4655) loss 2.6077 (2.6345) grad_norm 1.7648 (2.2569) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][70/625] eta 0:04:26 lr 0.000246 wd 0.0500 time 0.4657 (0.4798) data time 0.0010 (0.0079) model time 0.4647 (0.4662) loss 2.1545 (2.6537) grad_norm 1.9297 (2.2241) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[218/300][80/625] eta 0:04:20 lr 0.000245 wd 0.0500 time 0.4671 (0.4782) data time 0.0008 (0.0070) model time 0.4663 (0.4660) loss 3.0184 (2.6726) grad_norm 4.3846 (2.2587) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][90/625] eta 0:04:17 lr 0.000245 wd 0.0500 time 0.4674 (0.4815) data time 0.0008 (0.0064) model time 0.4667 (0.4763) loss 3.0532 (2.6714) grad_norm 1.8683 (2.3028) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][100/625] eta 0:04:12 lr 0.000245 wd 0.0500 time 0.4651 (0.4801) data time 0.0010 (0.0058) model time 0.4641 (0.4744) loss 2.9016 (2.6871) grad_norm 1.8661 (2.2840) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][110/625] eta 0:04:06 lr 0.000245 wd 0.0500 time 0.4661 (0.4791) data time 0.0007 (0.0054) model time 0.4653 (0.4733) loss 3.1734 (2.7016) grad_norm 1.7862 (2.3049) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][120/625] eta 0:04:02 lr 0.000245 wd 0.0500 time 0.4721 (0.4800) data time 0.0010 (0.0050) model time 0.4711 (0.4756) loss 2.6231 (2.6930) grad_norm 1.7692 (2.2848) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][130/625] eta 0:03:57 lr 0.000245 wd 0.0500 time 0.4660 (0.4791) data time 0.0008 (0.0047) model time 0.4652 (0.4745) loss 2.4468 (2.6799) grad_norm 2.1093 (2.2545) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][140/625] eta 0:03:51 lr 0.000245 wd 0.0500 time 0.4633 (0.4782) data time 0.0012 (0.0045) model time 0.4621 (0.4735) loss 2.7798 (2.6866) grad_norm 2.0528 (2.2282) loss_scale 512.0000 (512.0000) mem 16706MB 
[2024-08-10 21:50:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][150/625] eta 0:03:46 lr 0.000245 wd 0.0500 time 0.4675 (0.4775) data time 0.0010 (0.0042) model time 0.4665 (0.4727) loss 2.6654 (2.6891) grad_norm 2.0035 (2.2184) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][160/625] eta 0:03:41 lr 0.000245 wd 0.0500 time 0.4663 (0.4769) data time 0.0010 (0.0040) model time 0.4653 (0.4722) loss 2.4781 (2.6891) grad_norm 1.9444 (2.1929) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:50:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][170/625] eta 0:03:36 lr 0.000245 wd 0.0500 time 0.4701 (0.4762) data time 0.0011 (0.0039) model time 0.4690 (0.4716) loss 2.8008 (2.6864) grad_norm 2.0987 (2.1720) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][180/625] eta 0:03:31 lr 0.000245 wd 0.0500 time 0.4657 (0.4758) data time 0.0009 (0.0037) model time 0.4648 (0.4713) loss 3.2551 (2.7005) grad_norm 1.9282 (2.1545) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][190/625] eta 0:03:26 lr 0.000245 wd 0.0500 time 0.4651 (0.4752) data time 0.0010 (0.0036) model time 0.4641 (0.4707) loss 2.8118 (2.7040) grad_norm 1.9672 (2.2775) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][200/625] eta 0:03:21 lr 0.000244 wd 0.0500 time 0.4697 (0.4747) data time 0.0008 (0.0035) model time 0.4690 (0.4702) loss 2.5447 (2.6988) grad_norm 2.1346 (2.2667) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][210/625] eta 0:03:16 lr 0.000244 wd 0.0500 time 0.4647 (0.4742) data time 0.0008 (0.0033) model time 0.4639 (0.4698) loss 3.2227 
(2.7033) grad_norm 2.0407 (2.2544) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][220/625] eta 0:03:11 lr 0.000244 wd 0.0500 time 0.4681 (0.4739) data time 0.0008 (0.0032) model time 0.4673 (0.4696) loss 2.9762 (2.7127) grad_norm 2.1457 (2.2531) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][230/625] eta 0:03:07 lr 0.000244 wd 0.0500 time 0.4657 (0.4736) data time 0.0010 (0.0031) model time 0.4646 (0.4694) loss 2.6507 (2.7076) grad_norm 3.4103 (2.2610) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][240/625] eta 0:03:02 lr 0.000244 wd 0.0500 time 0.4748 (0.4734) data time 0.0008 (0.0031) model time 0.4740 (0.4693) loss 2.8088 (2.7122) grad_norm 2.6990 (2.2613) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][250/625] eta 0:02:57 lr 0.000244 wd 0.0500 time 0.4749 (0.4738) data time 0.0007 (0.0030) model time 0.4741 (0.4700) loss 3.3489 (2.7067) grad_norm 2.0959 (2.2510) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][260/625] eta 0:02:52 lr 0.000244 wd 0.0500 time 0.4635 (0.4737) data time 0.0011 (0.0029) model time 0.4624 (0.4699) loss 3.1671 (2.7084) grad_norm 3.7696 (2.2710) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][270/625] eta 0:02:48 lr 0.000244 wd 0.0500 time 0.4662 (0.4734) data time 0.0008 (0.0028) model time 0.4655 (0.4698) loss 1.9396 (2.7041) grad_norm 3.7982 (2.2887) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][280/625] eta 0:02:43 lr 0.000244 wd 0.0500 time 0.4704 
(0.4732) data time 0.0008 (0.0028) model time 0.4696 (0.4696) loss 2.9123 (2.7062) grad_norm 1.7893 (2.2908) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][290/625] eta 0:02:38 lr 0.000244 wd 0.0500 time 0.4637 (0.4730) data time 0.0010 (0.0027) model time 0.4627 (0.4695) loss 2.7792 (2.7025) grad_norm 1.4857 (2.2905) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:51:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][300/625] eta 0:02:33 lr 0.000244 wd 0.0500 time 0.4661 (0.4728) data time 0.0010 (0.0027) model time 0.4650 (0.4693) loss 3.1155 (2.6993) grad_norm 2.0215 (2.2805) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:52:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][310/625] eta 0:02:28 lr 0.000244 wd 0.0500 time 0.4656 (0.4726) data time 0.0011 (0.0026) model time 0.4645 (0.4692) loss 1.3687 (2.6937) grad_norm 1.5307 (2.2848) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:52:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][320/625] eta 0:02:24 lr 0.000243 wd 0.0500 time 0.4881 (0.4726) data time 0.0010 (0.0025) model time 0.4871 (0.4693) loss 3.0214 (2.6966) grad_norm 2.1927 (2.2774) loss_scale 512.0000 (512.0000) mem 16706MB [2024-08-10 21:52:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][330/625] eta 0:02:19 lr 0.000243 wd 0.0500 time 0.4733 (0.4724) data time 0.0010 (0.0025) model time 0.4723 (0.4691) loss 3.1552 (2.6980) grad_norm 2.9901 (inf) loss_scale 256.0000 (505.8127) mem 16706MB [2024-08-10 21:52:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][340/625] eta 0:02:14 lr 0.000243 wd 0.0500 time 0.4686 (0.4724) data time 0.0010 (0.0025) model time 0.4676 (0.4691) loss 1.8915 (2.6963) grad_norm 2.6722 (inf) loss_scale 256.0000 (498.4868) mem 16706MB [2024-08-10 21:52:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [218/300][350/625] eta 0:02:09 lr 0.000243 wd 0.0500 time 0.4633 (0.4723) data time 0.0008 (0.0024) model time 0.4625 (0.4692) loss 3.0285 (2.6959) grad_norm 1.9192 (inf) loss_scale 256.0000 (491.5783) mem 16706MB [2024-08-10 21:52:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][360/625] eta 0:02:05 lr 0.000243 wd 0.0500 time 0.4849 (0.4723) data time 0.0007 (0.0024) model time 0.4841 (0.4692) loss 3.4696 (2.6973) grad_norm 1.6279 (inf) loss_scale 256.0000 (485.0526) mem 16706MB [2024-08-10 21:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][370/625] eta 0:02:00 lr 0.000243 wd 0.0500 time 0.4690 (0.4722) data time 0.0008 (0.0023) model time 0.4682 (0.4692) loss 2.2113 (2.6919) grad_norm 1.8144 (inf) loss_scale 128.0000 (475.4286) mem 16706MB [2024-08-10 21:52:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][380/625] eta 0:01:55 lr 0.000243 wd 0.0500 time 0.4671 (0.4722) data time 0.0007 (0.0023) model time 0.4664 (0.4692) loss 1.9113 (2.6839) grad_norm 3.4710 (inf) loss_scale 128.0000 (466.3097) mem 16706MB [2024-08-10 21:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][390/625] eta 0:01:51 lr 0.000243 wd 0.0500 time 0.4114 (0.4725) data time 0.0011 (0.0023) model time 0.4104 (0.4696) loss 2.1644 (2.6777) grad_norm 3.9969 (inf) loss_scale 128.0000 (457.6573) mem 16706MB [2024-08-10 21:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][400/625] eta 0:01:46 lr 0.000243 wd 0.0500 time 0.4877 (0.4726) data time 0.0011 (0.0022) model time 0.4866 (0.4697) loss 3.1591 (2.6805) grad_norm 9.2874 (inf) loss_scale 128.0000 (449.4364) mem 16706MB [2024-08-10 21:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][410/625] eta 0:01:41 lr 0.000243 wd 0.0500 time 0.4721 (0.4725) data time 0.0011 (0.0022) model time 0.4710 (0.4698) loss 3.0815 (2.6793) grad_norm 2.1431 (inf) loss_scale 128.0000 (441.6156) mem 16706MB [2024-08-10 21:52:54 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][420/625] eta 0:01:36 lr 0.000243 wd 0.0500 time 0.4673 (0.4725) data time 0.0008 (0.0022) model time 0.4666 (0.4698) loss 2.9504 (2.6762) grad_norm 2.4250 (inf) loss_scale 128.0000 (434.1663) mem 16706MB [2024-08-10 21:52:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][430/625] eta 0:01:32 lr 0.000243 wd 0.0500 time 0.4624 (0.4731) data time 0.0010 (0.0022) model time 0.4614 (0.4705) loss 2.7245 (2.6737) grad_norm 1.6378 (inf) loss_scale 128.0000 (427.0626) mem 16706MB [2024-08-10 21:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][440/625] eta 0:01:27 lr 0.000242 wd 0.0500 time 0.4898 (0.4730) data time 0.0008 (0.0021) model time 0.4890 (0.4705) loss 2.6855 (2.6751) grad_norm 2.3829 (inf) loss_scale 128.0000 (420.2812) mem 16706MB [2024-08-10 21:53:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][450/625] eta 0:01:22 lr 0.000242 wd 0.0500 time 0.4634 (0.4734) data time 0.0008 (0.0021) model time 0.4626 (0.4709) loss 2.9823 (2.6736) grad_norm 2.1265 (inf) loss_scale 128.0000 (413.8004) mem 16706MB [2024-08-10 21:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][460/625] eta 0:01:18 lr 0.000242 wd 0.0500 time 0.4661 (0.4733) data time 0.0010 (0.0021) model time 0.4651 (0.4709) loss 2.5696 (2.6773) grad_norm 1.5108 (inf) loss_scale 128.0000 (407.6009) mem 16706MB [2024-08-10 21:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][470/625] eta 0:01:13 lr 0.000242 wd 0.0500 time 0.4658 (0.4732) data time 0.0008 (0.0021) model time 0.4650 (0.4708) loss 2.9782 (2.6769) grad_norm 2.9530 (inf) loss_scale 128.0000 (401.6645) mem 16706MB [2024-08-10 21:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][480/625] eta 0:01:08 lr 0.000242 wd 0.0500 time 0.4868 (0.4732) data time 0.0008 (0.0020) model time 0.4860 (0.4708) loss 1.8130 (2.6752) grad_norm 1.7666 (inf) loss_scale 
128.0000 (395.9751) mem 16706MB [2024-08-10 21:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][490/625] eta 0:01:03 lr 0.000242 wd 0.0500 time 0.4665 (0.4731) data time 0.0008 (0.0020) model time 0.4658 (0.4707) loss 3.0399 (2.6758) grad_norm 3.3231 (inf) loss_scale 128.0000 (390.5173) mem 16706MB [2024-08-10 21:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][500/625] eta 0:00:59 lr 0.000242 wd 0.0500 time 0.4687 (0.4731) data time 0.0011 (0.0020) model time 0.4677 (0.4707) loss 2.8918 (2.6818) grad_norm 2.6911 (inf) loss_scale 128.0000 (385.2774) mem 16706MB [2024-08-10 21:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][510/625] eta 0:00:54 lr 0.000242 wd 0.0500 time 0.4662 (0.4730) data time 0.0010 (0.0020) model time 0.4652 (0.4707) loss 2.6949 (2.6817) grad_norm 2.1028 (inf) loss_scale 128.0000 (380.2427) mem 16706MB [2024-08-10 21:53:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][520/625] eta 0:00:49 lr 0.000242 wd 0.0500 time 0.4858 (0.4730) data time 0.0011 (0.0020) model time 0.4847 (0.4707) loss 2.6434 (2.6849) grad_norm 1.7806 (inf) loss_scale 128.0000 (375.4012) mem 16706MB [2024-08-10 21:53:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][530/625] eta 0:00:44 lr 0.000242 wd 0.0500 time 0.4640 (0.4729) data time 0.0008 (0.0019) model time 0.4632 (0.4706) loss 2.8340 (2.6847) grad_norm 1.8240 (inf) loss_scale 128.0000 (370.7420) mem 16706MB [2024-08-10 21:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][540/625] eta 0:00:40 lr 0.000242 wd 0.0500 time 0.4671 (0.4728) data time 0.0011 (0.0019) model time 0.4660 (0.4706) loss 1.9338 (2.6828) grad_norm 2.3163 (inf) loss_scale 128.0000 (366.2551) mem 16706MB [2024-08-10 21:53:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][550/625] eta 0:00:35 lr 0.000242 wd 0.0500 time 0.4655 (0.4728) data time 0.0010 (0.0019) model time 0.4645 (0.4705) 
loss 2.7462 (2.6819) grad_norm 1.9255 (inf) loss_scale 128.0000 (361.9310) mem 16706MB [2024-08-10 21:54:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][560/625] eta 0:00:30 lr 0.000241 wd 0.0500 time 0.4725 (0.4727) data time 0.0012 (0.0019) model time 0.4712 (0.4705) loss 2.4079 (2.6808) grad_norm 2.7963 (inf) loss_scale 128.0000 (357.7611) mem 16706MB [2024-08-10 21:54:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][570/625] eta 0:00:25 lr 0.000241 wd 0.0500 time 0.4600 (0.4726) data time 0.0008 (0.0019) model time 0.4592 (0.4704) loss 2.0112 (2.6798) grad_norm 1.6554 (inf) loss_scale 128.0000 (353.7373) mem 16706MB [2024-08-10 21:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][580/625] eta 0:00:21 lr 0.000241 wd 0.0500 time 0.4636 (0.4725) data time 0.0011 (0.0019) model time 0.4624 (0.4703) loss 2.0709 (2.6838) grad_norm 2.1990 (inf) loss_scale 128.0000 (349.8520) mem 16706MB [2024-08-10 21:54:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][590/625] eta 0:00:16 lr 0.000241 wd 0.0500 time 0.4704 (0.4724) data time 0.0008 (0.0019) model time 0.4695 (0.4702) loss 3.0975 (2.6813) grad_norm 2.1782 (inf) loss_scale 128.0000 (346.0981) mem 16706MB [2024-08-10 21:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][600/625] eta 0:00:11 lr 0.000241 wd 0.0500 time 0.4793 (0.4724) data time 0.0011 (0.0018) model time 0.4782 (0.4702) loss 2.5892 (2.6807) grad_norm 1.6563 (inf) loss_scale 128.0000 (342.4692) mem 16706MB [2024-08-10 21:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][610/625] eta 0:00:07 lr 0.000241 wd 0.0500 time 0.4638 (0.4723) data time 0.0005 (0.0018) model time 0.4633 (0.4701) loss 2.3177 (2.6788) grad_norm 2.4260 (inf) loss_scale 128.0000 (338.9591) mem 16706MB [2024-08-10 21:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [218/300][620/625] eta 0:00:02 lr 0.000241 wd 0.0500 time 0.4646 (0.4724) 
data time 0.0005 (0.0018) model time 0.4640 (0.4702) loss 3.1395 (2.6762) grad_norm 1.4626 (inf) loss_scale 128.0000 (335.5620) mem 16706MB [2024-08-10 21:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 218 training takes 0:04:55 [2024-08-10 21:54:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:54:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 21:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.516 (0.516) Loss 0.5166 (0.5166) Acc@1 89.014 (89.014) Acc@5 98.682 (98.682) Mem 16706MB [2024-08-10 21:54:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.161) Loss 0.8223 (0.6257) Acc@1 80.518 (86.688) Acc@5 96.289 (97.718) Mem 16706MB [2024-08-10 21:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.141) Loss 0.9316 (0.7406) Acc@1 78.174 (83.763) Acc@5 95.166 (96.636) Mem 16706MB [2024-08-10 21:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.477 Acc@5 96.631 [2024-08-10 21:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 21:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.814 (0.814) Loss 0.4778 (0.4778) Acc@1 89.600 (89.600) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.190) Loss 0.7671 (0.5889) Acc@1 81.396 (87.278) Acc@5 96.875 (97.963) Mem 16706MB [2024-08-10 21:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.156) Loss 0.8472 (0.6908) Acc@1 80.029 (84.614) Acc@5 96.143 (97.024) Mem 16706MB [2024-08-10 21:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.325 Acc@5 97.009 [2024-08-10 21:54:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:54:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][0/625] eta 0:13:22 lr 0.000241 wd 0.0500 time 1.2840 (1.2840) data time 0.7696 (0.7696) model time 0.0000 (0.0000) loss 2.4396 (2.4396) grad_norm 1.9483 (1.9483) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:54:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][10/625] eta 0:05:33 lr 0.000241 wd 0.0500 time 0.4658 (0.5418) data time 0.0008 (0.0709) model time 0.0000 (0.0000) loss 2.1275 (2.5327) grad_norm 2.8408 (2.0893) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][20/625] eta 0:05:12 lr 0.000241 wd 0.0500 time 0.4635 (0.5162) data time 0.0008 (0.0376) model time 0.0000 (0.0000) loss 2.4486 (2.6287) grad_norm 3.2953 (3.7706) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:54:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][30/625] eta 0:04:57 lr 0.000241 wd 0.0500 time 0.4636 (0.5001) data time 0.0008 (0.0258) model time 0.0000 (0.0000) loss 2.6943 (2.6632) grad_norm 1.9014 (3.7540) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][40/625] eta 0:04:49 lr 0.000241 wd 0.0500 time 0.4618 (0.4940) data time 0.0008 (0.0198) model time 0.0000 (0.0000) loss 2.2289 (2.6859) grad_norm 1.8188 (3.3749) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][50/625] eta 0:04:41 lr 0.000240 wd 0.0500 time 0.4668 (0.4887) data time 0.0011 (0.0161) model time 0.0000 (0.0000) loss 3.0443 (2.6982) grad_norm 1.9491 (3.1202) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][60/625] eta 0:04:34 lr 0.000240 wd 0.0500 time 0.4748 
(0.4854) data time 0.0011 (0.0136) model time 0.4737 (0.4674) loss 2.9783 (2.7210) grad_norm 1.2816 (2.9315) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][70/625] eta 0:04:28 lr 0.000240 wd 0.0500 time 0.4670 (0.4831) data time 0.0010 (0.0118) model time 0.4660 (0.4676) loss 2.1145 (2.7023) grad_norm 1.8754 (2.8445) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][80/625] eta 0:04:22 lr 0.000240 wd 0.0500 time 0.4692 (0.4811) data time 0.0010 (0.0105) model time 0.4682 (0.4671) loss 2.4281 (2.7091) grad_norm 2.7758 (3.0648) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][90/625] eta 0:04:16 lr 0.000240 wd 0.0500 time 0.4673 (0.4796) data time 0.0009 (0.0095) model time 0.4665 (0.4668) loss 3.0070 (2.7004) grad_norm 1.9393 (3.0198) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][100/625] eta 0:04:11 lr 0.000240 wd 0.0500 time 0.4746 (0.4783) data time 0.0008 (0.0086) model time 0.4738 (0.4666) loss 2.8336 (2.6956) grad_norm 1.8386 (2.9602) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][110/625] eta 0:04:05 lr 0.000240 wd 0.0500 time 0.4708 (0.4772) data time 0.0009 (0.0080) model time 0.4699 (0.4664) loss 3.1338 (2.7156) grad_norm 2.8167 (3.1955) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][120/625] eta 0:04:00 lr 0.000240 wd 0.0500 time 0.4656 (0.4764) data time 0.0010 (0.0074) model time 0.4646 (0.4664) loss 2.0856 (2.7007) grad_norm 2.2913 (3.1095) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [219/300][130/625] eta 0:03:55 lr 0.000240 wd 0.0500 time 0.4698 (0.4757) data time 0.0008 (0.0069) model time 0.4690 (0.4663) loss 3.4709 (2.7082) grad_norm 3.3933 (3.0598) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][140/625] eta 0:03:51 lr 0.000240 wd 0.0500 time 0.4780 (0.4769) data time 0.0008 (0.0065) model time 0.4771 (0.4691) loss 2.9203 (2.7243) grad_norm 1.9938 (3.0243) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][150/625] eta 0:03:46 lr 0.000240 wd 0.0500 time 0.4739 (0.4762) data time 0.0011 (0.0061) model time 0.4729 (0.4688) loss 2.3805 (2.7085) grad_norm 2.2587 (2.9813) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][160/625] eta 0:03:41 lr 0.000240 wd 0.0500 time 0.4660 (0.4756) data time 0.0011 (0.0058) model time 0.4650 (0.4685) loss 3.1657 (2.7134) grad_norm 21.6875 (3.0650) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][170/625] eta 0:03:36 lr 0.000239 wd 0.0500 time 0.4770 (0.4751) data time 0.0010 (0.0055) model time 0.4760 (0.4683) loss 2.1745 (2.7042) grad_norm 2.4730 (3.0181) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][180/625] eta 0:03:31 lr 0.000239 wd 0.0500 time 0.4657 (0.4747) data time 0.0008 (0.0053) model time 0.4649 (0.4682) loss 2.8560 (2.7120) grad_norm 2.9251 (2.9803) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][190/625] eta 0:03:26 lr 0.000239 wd 0.0500 time 0.4697 (0.4743) data time 0.0010 (0.0051) model time 0.4687 (0.4680) loss 2.8053 (2.7021) grad_norm 1.8163 (2.9354) loss_scale 128.0000 (128.0000) mem 
16706MB [2024-08-10 21:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][200/625] eta 0:03:21 lr 0.000239 wd 0.0500 time 0.4638 (0.4740) data time 0.0007 (0.0049) model time 0.4631 (0.4679) loss 2.9595 (2.6990) grad_norm 2.4995 (2.9239) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][210/625] eta 0:03:16 lr 0.000239 wd 0.0500 time 0.4663 (0.4745) data time 0.0007 (0.0047) model time 0.4656 (0.4689) loss 2.3818 (2.6929) grad_norm 2.1847 (2.8755) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][220/625] eta 0:03:12 lr 0.000239 wd 0.0500 time 0.4755 (0.4742) data time 0.0007 (0.0045) model time 0.4748 (0.4688) loss 3.0250 (2.6887) grad_norm 2.1557 (2.8328) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][230/625] eta 0:03:07 lr 0.000239 wd 0.0500 time 0.4698 (0.4739) data time 0.0010 (0.0044) model time 0.4687 (0.4687) loss 2.8957 (2.6827) grad_norm 2.7280 (2.8090) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][240/625] eta 0:03:02 lr 0.000239 wd 0.0500 time 0.4667 (0.4737) data time 0.0007 (0.0042) model time 0.4660 (0.4686) loss 3.1525 (2.6858) grad_norm 1.9635 (2.7999) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][250/625] eta 0:02:57 lr 0.000239 wd 0.0500 time 0.4639 (0.4734) data time 0.0010 (0.0041) model time 0.4629 (0.4684) loss 2.5350 (2.6803) grad_norm 3.0426 (2.9658) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][260/625] eta 0:02:52 lr 0.000239 wd 0.0500 time 0.4672 (0.4737) data time 0.0007 (0.0040) model time 0.4664 (0.4690) loss 
2.1019 (2.6749) grad_norm 1.9120 (2.9319) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][270/625] eta 0:02:48 lr 0.000239 wd 0.0500 time 0.4644 (0.4734) data time 0.0008 (0.0039) model time 0.4637 (0.4688) loss 2.7341 (2.6769) grad_norm 1.8361 (2.9048) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][280/625] eta 0:02:43 lr 0.000239 wd 0.0500 time 0.4659 (0.4731) data time 0.0007 (0.0038) model time 0.4652 (0.4686) loss 1.8637 (2.6726) grad_norm 2.5375 (2.8722) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:56:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][290/625] eta 0:02:38 lr 0.000238 wd 0.0500 time 0.4657 (0.4729) data time 0.0007 (0.0037) model time 0.4650 (0.4684) loss 2.6219 (2.6694) grad_norm 1.2554 (2.8395) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][300/625] eta 0:02:33 lr 0.000238 wd 0.0500 time 0.4601 (0.4727) data time 0.0008 (0.0036) model time 0.4593 (0.4683) loss 2.4163 (2.6688) grad_norm 2.0322 (2.8134) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][310/625] eta 0:02:28 lr 0.000238 wd 0.0500 time 0.4611 (0.4724) data time 0.0010 (0.0035) model time 0.4601 (0.4681) loss 2.8395 (2.6677) grad_norm 1.7920 (2.7972) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][320/625] eta 0:02:24 lr 0.000238 wd 0.0500 time 0.4695 (0.4722) data time 0.0008 (0.0034) model time 0.4687 (0.4681) loss 3.0274 (2.6637) grad_norm 2.1010 (2.7850) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][330/625] eta 0:02:19 lr 0.000238 wd 0.0500 time 
0.4678 (0.4721) data time 0.0010 (0.0033) model time 0.4668 (0.4681) loss 2.3892 (2.6653) grad_norm 2.0154 (2.7658) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][340/625] eta 0:02:14 lr 0.000238 wd 0.0500 time 0.4679 (0.4720) data time 0.0007 (0.0033) model time 0.4672 (0.4680) loss 1.8134 (2.6617) grad_norm 1.7374 (2.7412) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][350/625] eta 0:02:09 lr 0.000238 wd 0.0500 time 0.4678 (0.4719) data time 0.0008 (0.0032) model time 0.4670 (0.4681) loss 2.0434 (2.6653) grad_norm 2.0465 (2.7199) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][360/625] eta 0:02:05 lr 0.000238 wd 0.0500 time 0.4735 (0.4719) data time 0.0010 (0.0031) model time 0.4725 (0.4681) loss 2.9420 (2.6702) grad_norm 2.0936 (2.7284) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][370/625] eta 0:02:00 lr 0.000238 wd 0.0500 time 0.4682 (0.4718) data time 0.0009 (0.0031) model time 0.4673 (0.4681) loss 2.5239 (2.6674) grad_norm 1.7783 (2.7423) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][380/625] eta 0:01:55 lr 0.000238 wd 0.0500 time 0.4657 (0.4717) data time 0.0010 (0.0030) model time 0.4647 (0.4681) loss 2.9622 (2.6617) grad_norm 1.7082 (2.7230) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][390/625] eta 0:01:50 lr 0.000238 wd 0.0500 time 0.4677 (0.4717) data time 0.0010 (0.0030) model time 0.4667 (0.4681) loss 2.2383 (2.6630) grad_norm 1.8989 (2.7048) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [219/300][400/625] eta 0:01:46 lr 0.000238 wd 0.0500 time 0.4101 (0.4720) data time 0.0008 (0.0029) model time 0.4092 (0.4685) loss 1.7263 (2.6617) grad_norm 1.5213 (2.7183) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][410/625] eta 0:01:41 lr 0.000237 wd 0.0500 time 0.4709 (0.4720) data time 0.0011 (0.0029) model time 0.4698 (0.4686) loss 3.0929 (2.6609) grad_norm 1.6745 (2.6992) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:57:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][420/625] eta 0:01:36 lr 0.000237 wd 0.0500 time 0.4678 (0.4719) data time 0.0008 (0.0028) model time 0.4670 (0.4686) loss 2.7886 (2.6619) grad_norm 2.4733 (2.7160) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][430/625] eta 0:01:32 lr 0.000237 wd 0.0500 time 0.4655 (0.4724) data time 0.0011 (0.0028) model time 0.4644 (0.4692) loss 2.9259 (2.6588) grad_norm 1.8328 (2.7191) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][440/625] eta 0:01:27 lr 0.000237 wd 0.0500 time 0.4711 (0.4724) data time 0.0011 (0.0028) model time 0.4700 (0.4693) loss 2.8587 (2.6569) grad_norm 2.6282 (2.7368) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][450/625] eta 0:01:22 lr 0.000237 wd 0.0500 time 0.4708 (0.4724) data time 0.0010 (0.0027) model time 0.4698 (0.4693) loss 3.3009 (2.6563) grad_norm 1.7131 (2.7280) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][460/625] eta 0:01:17 lr 0.000237 wd 0.0500 time 0.4635 (0.4723) data time 0.0011 (0.0027) model time 0.4624 (0.4693) loss 2.8541 (2.6573) grad_norm 1.7216 (2.7534) loss_scale 128.0000 (128.0000) 
mem 16706MB [2024-08-10 21:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][470/625] eta 0:01:13 lr 0.000237 wd 0.0500 time 0.4644 (0.4723) data time 0.0010 (0.0026) model time 0.4633 (0.4692) loss 2.3907 (2.6537) grad_norm 3.0109 (2.7870) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][480/625] eta 0:01:08 lr 0.000237 wd 0.0500 time 0.4700 (0.4726) data time 0.0007 (0.0026) model time 0.4692 (0.4697) loss 3.4337 (2.6568) grad_norm 2.0039 (2.7895) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][490/625] eta 0:01:03 lr 0.000237 wd 0.0500 time 0.4688 (0.4726) data time 0.0010 (0.0026) model time 0.4678 (0.4697) loss 2.8562 (2.6577) grad_norm 1.2561 (2.7760) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][500/625] eta 0:00:59 lr 0.000237 wd 0.0500 time 0.4665 (0.4725) data time 0.0010 (0.0025) model time 0.4655 (0.4697) loss 3.0739 (2.6563) grad_norm 2.0686 (2.7645) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][510/625] eta 0:00:54 lr 0.000237 wd 0.0500 time 0.4674 (0.4725) data time 0.0011 (0.0025) model time 0.4663 (0.4696) loss 2.1309 (2.6548) grad_norm 8.4257 (2.7715) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][520/625] eta 0:00:49 lr 0.000237 wd 0.0500 time 0.4688 (0.4724) data time 0.0007 (0.0025) model time 0.4681 (0.4696) loss 3.1659 (2.6584) grad_norm 1.9509 (2.7575) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][530/625] eta 0:00:44 lr 0.000236 wd 0.0500 time 0.4708 (0.4724) data time 0.0007 (0.0025) model time 0.4701 (0.4697) loss 
2.4479 (2.6632) grad_norm 2.4684 (2.7476) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][540/625] eta 0:00:40 lr 0.000236 wd 0.0500 time 0.4679 (0.4724) data time 0.0009 (0.0024) model time 0.4670 (0.4697) loss 2.3672 (2.6609) grad_norm 1.6571 (2.7381) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][550/625] eta 0:00:35 lr 0.000236 wd 0.0500 time 0.4649 (0.4723) data time 0.0011 (0.0024) model time 0.4638 (0.4697) loss 1.7922 (2.6581) grad_norm 1.7439 (2.7241) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][560/625] eta 0:00:30 lr 0.000236 wd 0.0500 time 0.4727 (0.4723) data time 0.0008 (0.0024) model time 0.4720 (0.4696) loss 2.5554 (2.6586) grad_norm 15.4375 (2.7544) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][570/625] eta 0:00:25 lr 0.000236 wd 0.0500 time 0.4723 (0.4723) data time 0.0013 (0.0024) model time 0.4710 (0.4697) loss 2.7884 (2.6585) grad_norm 2.0946 (2.7488) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][580/625] eta 0:00:21 lr 0.000236 wd 0.0500 time 0.4673 (0.4727) data time 0.0011 (0.0023) model time 0.4662 (0.4701) loss 2.6958 (2.6606) grad_norm 2.5412 (2.7458) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][590/625] eta 0:00:16 lr 0.000236 wd 0.0500 time 0.4724 (0.4727) data time 0.0011 (0.0023) model time 0.4713 (0.4701) loss 2.7636 (2.6603) grad_norm 3.0921 (2.7489) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][600/625] eta 0:00:11 lr 0.000236 wd 0.0500 time 
0.4671 (0.4726) data time 0.0009 (0.0023) model time 0.4663 (0.4701) loss 1.9812 (2.6610) grad_norm 2.8470 (2.7544) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][610/625] eta 0:00:07 lr 0.000236 wd 0.0500 time 0.4686 (0.4726) data time 0.0005 (0.0023) model time 0.4681 (0.4701) loss 2.4355 (2.6601) grad_norm 3.4904 (2.7549) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [219/300][620/625] eta 0:00:02 lr 0.000236 wd 0.0500 time 0.4658 (0.4725) data time 0.0008 (0.0023) model time 0.4651 (0.4700) loss 3.0219 (2.6601) grad_norm 1.9353 (2.7442) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 219 training takes 0:04:55 [2024-08-10 21:59:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 21:59:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 21:59:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.566 (0.566) Loss 0.5093 (0.5093) Acc@1 88.818 (88.818) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:59:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.165) Loss 0.8306 (0.6200) Acc@1 80.273 (86.705) Acc@5 95.898 (97.781) Mem 16706MB [2024-08-10 21:59:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.143) Loss 0.9209 (0.7360) Acc@1 78.711 (83.801) Acc@5 95.557 (96.633) Mem 16706MB [2024-08-10 21:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.541 Acc@5 96.633 [2024-08-10 21:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 21:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.54% [2024-08-10 21:59:40 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-10 21:59:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-10 21:59:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.4780 (0.4780) Acc@1 89.404 (89.404) Acc@5 98.926 (98.926) Mem 16706MB [2024-08-10 21:59:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.161) Loss 0.7676 (0.5887) Acc@1 81.250 (87.274) Acc@5 96.729 (97.940) Mem 16706MB [2024-08-10 21:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.8486 (0.6910) Acc@1 80.078 (84.608) Acc@5 96.094 (97.008) Mem 16706MB [2024-08-10 21:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.325 Acc@5 96.997 [2024-08-10 21:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 21:59:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][0/625] eta 0:12:44 lr 0.000236 wd 0.0500 time 1.2236 (1.2236) data time 0.5387 (0.5387) model time 0.0000 (0.0000) loss 2.6398 (2.6398) grad_norm 1.8036 (1.8036) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][10/625] eta 0:05:40 lr 0.000236 wd 0.0500 time 0.4755 (0.5540) data time 0.0009 (0.0501) model time 0.0000 (0.0000) loss 2.5553 (2.7249) grad_norm 1.6798 (1.9597) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 21:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][20/625] eta 0:05:11 lr 0.000235 wd 0.0500 time 0.4670 (0.5147) data time 0.0011 (0.0267) model time 0.0000 (0.0000) loss 2.7167 (2.6839) grad_norm 1.7967 (1.9830) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][30/625] eta 0:04:57 lr 0.000235 wd 0.0500 time 0.4623 (0.5004) data time 0.0011 (0.0185) model time 0.0000 (0.0000) loss 2.8079 (2.7205) grad_norm 1.6342 (1.9244) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:05 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [220/300][40/625] eta 0:04:47 lr 0.000235 wd 0.0500 time 0.4600 (0.4914) data time 0.0008 (0.0142) model time 0.0000 (0.0000) loss 3.0419 (2.6845) grad_norm 1.9621 (2.0413) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][50/625] eta 0:04:39 lr 0.000235 wd 0.0500 time 0.4668 (0.4862) data time 0.0010 (0.0116) model time 0.0000 (0.0000) loss 2.8120 (2.6515) grad_norm 2.5490 (2.3342) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][60/625] eta 0:04:32 lr 0.000235 wd 0.0500 time 0.4637 (0.4826) data time 0.0008 (0.0099) model time 0.4629 (0.4632) loss 1.8307 (2.6198) grad_norm 1.9848 (2.3159) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][70/625] eta 0:04:28 lr 0.000235 wd 0.0500 time 0.4697 (0.4835) data time 0.0010 (0.0087) model time 0.4687 (0.4754) loss 3.0415 (2.6520) grad_norm 2.2811 (2.3361) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][80/625] eta 0:04:22 lr 0.000235 wd 0.0500 time 0.4662 (0.4813) data time 0.0007 (0.0077) model time 0.4655 (0.4720) loss 1.8047 (2.6526) grad_norm 1.6296 (3.0970) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][90/625] eta 0:04:16 lr 0.000235 wd 0.0500 time 0.4664 (0.4798) data time 0.0011 (0.0070) model time 0.4654 (0.4706) loss 2.6146 (2.6533) grad_norm 1.8731 (3.0285) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][100/625] eta 0:04:11 lr 0.000235 wd 0.0500 time 0.4667 (0.4785) data time 0.0011 (0.0064) model time 0.4656 (0.4695) loss 2.4194 (2.6652) grad_norm 1.4493 (2.9604) loss_scale 
128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][110/625] eta 0:04:05 lr 0.000235 wd 0.0500 time 0.4673 (0.4774) data time 0.0011 (0.0059) model time 0.4662 (0.4689) loss 2.6437 (2.6652) grad_norm 1.9261 (2.9145) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][120/625] eta 0:04:00 lr 0.000235 wd 0.0500 time 0.4703 (0.4767) data time 0.0007 (0.0055) model time 0.4695 (0.4687) loss 2.7751 (2.6724) grad_norm 2.7718 (2.8469) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][130/625] eta 0:03:56 lr 0.000235 wd 0.0500 time 0.4693 (0.4777) data time 0.0008 (0.0052) model time 0.4685 (0.4713) loss 2.6305 (2.6840) grad_norm 1.3544 (2.8269) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][140/625] eta 0:03:51 lr 0.000234 wd 0.0500 time 0.4830 (0.4772) data time 0.0008 (0.0049) model time 0.4822 (0.4710) loss 2.8815 (2.6980) grad_norm 1.9386 (2.7995) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:00:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][150/625] eta 0:03:46 lr 0.000234 wd 0.0500 time 0.4793 (0.4768) data time 0.0010 (0.0046) model time 0.4783 (0.4710) loss 2.5519 (2.7007) grad_norm 2.2422 (2.7615) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][160/625] eta 0:03:41 lr 0.000234 wd 0.0500 time 0.4627 (0.4763) data time 0.0007 (0.0044) model time 0.4620 (0.4707) loss 2.0680 (2.7035) grad_norm 2.5500 (2.7733) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][170/625] eta 0:03:36 lr 0.000234 wd 0.0500 time 0.4709 (0.4759) data time 0.0010 (0.0042) model time 
0.4699 (0.4704) loss 2.7636 (2.6810) grad_norm 2.2846 (2.7688) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][180/625] eta 0:03:31 lr 0.000234 wd 0.0500 time 0.4751 (0.4755) data time 0.0011 (0.0040) model time 0.4740 (0.4703) loss 2.4569 (2.6830) grad_norm 1.9557 (2.7434) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][190/625] eta 0:03:26 lr 0.000234 wd 0.0500 time 0.4633 (0.4750) data time 0.0010 (0.0039) model time 0.4623 (0.4699) loss 2.9550 (2.6836) grad_norm 2.4657 (2.7157) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][200/625] eta 0:03:21 lr 0.000234 wd 0.0500 time 0.4659 (0.4746) data time 0.0010 (0.0037) model time 0.4649 (0.4696) loss 2.8438 (2.6822) grad_norm 1.4855 (2.6827) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][210/625] eta 0:03:16 lr 0.000234 wd 0.0500 time 0.4686 (0.4742) data time 0.0008 (0.0036) model time 0.4678 (0.4693) loss 2.7172 (2.6918) grad_norm 1.6780 (2.6667) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][220/625] eta 0:03:11 lr 0.000234 wd 0.0500 time 0.4654 (0.4739) data time 0.0010 (0.0035) model time 0.4644 (0.4692) loss 1.8577 (2.6933) grad_norm 1.6385 (2.6859) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][230/625] eta 0:03:07 lr 0.000234 wd 0.0500 time 0.4716 (0.4738) data time 0.0012 (0.0034) model time 0.4704 (0.4692) loss 1.8154 (2.6811) grad_norm 1.7301 (2.6503) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][240/625] eta 0:03:02 lr 
0.000234 wd 0.0500 time 0.4698 (0.4736) data time 0.0008 (0.0033) model time 0.4689 (0.4691) loss 3.2066 (2.6889) grad_norm 2.0004 (2.7953) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][250/625] eta 0:02:57 lr 0.000234 wd 0.0500 time 0.4679 (0.4733) data time 0.0008 (0.0032) model time 0.4671 (0.4690) loss 2.9882 (2.6951) grad_norm 2.2272 (2.7686) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][260/625] eta 0:02:52 lr 0.000233 wd 0.0500 time 0.4654 (0.4738) data time 0.0011 (0.0031) model time 0.4643 (0.4697) loss 2.8737 (2.6941) grad_norm 3.0987 (2.7493) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][270/625] eta 0:02:48 lr 0.000233 wd 0.0500 time 0.4653 (0.4735) data time 0.0010 (0.0030) model time 0.4643 (0.4696) loss 2.8278 (2.6952) grad_norm 1.5939 (2.7336) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][280/625] eta 0:02:43 lr 0.000233 wd 0.0500 time 0.4672 (0.4734) data time 0.0008 (0.0030) model time 0.4664 (0.4695) loss 2.7479 (2.6946) grad_norm 1.7047 (2.7295) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][290/625] eta 0:02:38 lr 0.000233 wd 0.0500 time 0.4644 (0.4731) data time 0.0008 (0.0029) model time 0.4637 (0.4693) loss 2.3691 (2.6892) grad_norm 2.6956 (2.7087) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][300/625] eta 0:02:33 lr 0.000233 wd 0.0500 time 0.4646 (0.4729) data time 0.0008 (0.0028) model time 0.4638 (0.4692) loss 1.8900 (2.6839) grad_norm 2.0962 (2.7502) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:12 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [220/300][310/625] eta 0:02:28 lr 0.000233 wd 0.0500 time 0.4672 (0.4728) data time 0.0010 (0.0028) model time 0.4662 (0.4692) loss 2.2086 (2.6801) grad_norm 1.9717 (2.7504) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][320/625] eta 0:02:24 lr 0.000233 wd 0.0500 time 0.4687 (0.4727) data time 0.0010 (0.0027) model time 0.4677 (0.4691) loss 3.1253 (2.6791) grad_norm 2.1499 (2.7288) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][330/625] eta 0:02:19 lr 0.000233 wd 0.0500 time 0.4765 (0.4725) data time 0.0009 (0.0027) model time 0.4756 (0.4690) loss 2.7584 (2.6804) grad_norm 1.6945 (2.7065) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][340/625] eta 0:02:14 lr 0.000233 wd 0.0500 time 0.4640 (0.4728) data time 0.0008 (0.0026) model time 0.4632 (0.4694) loss 2.6941 (2.6849) grad_norm 2.3326 (2.6787) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][350/625] eta 0:02:09 lr 0.000233 wd 0.0500 time 0.4645 (0.4727) data time 0.0008 (0.0026) model time 0.4637 (0.4694) loss 2.6473 (2.6839) grad_norm 2.3127 (2.6663) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][360/625] eta 0:02:05 lr 0.000233 wd 0.0500 time 0.4656 (0.4725) data time 0.0011 (0.0025) model time 0.4645 (0.4692) loss 2.2626 (2.6853) grad_norm 2.1312 (2.6501) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][370/625] eta 0:02:00 lr 0.000233 wd 0.0500 time 0.4714 (0.4725) data time 0.0007 (0.0025) model time 0.4707 (0.4692) loss 2.3988 (2.6809) grad_norm 2.0106 (2.6444) loss_scale 
128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][380/625] eta 0:01:55 lr 0.000232 wd 0.0500 time 0.4681 (0.4723) data time 0.0010 (0.0025) model time 0.4671 (0.4692) loss 2.2248 (2.6743) grad_norm 4.2768 (2.6399) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][390/625] eta 0:01:50 lr 0.000232 wd 0.0500 time 0.4668 (0.4722) data time 0.0009 (0.0024) model time 0.4659 (0.4691) loss 2.3686 (2.6758) grad_norm 1.9644 (2.6229) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][400/625] eta 0:01:46 lr 0.000232 wd 0.0500 time 0.4672 (0.4722) data time 0.0008 (0.0024) model time 0.4664 (0.4691) loss 2.5875 (2.6742) grad_norm 3.7904 (2.6120) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:02:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][410/625] eta 0:01:41 lr 0.000232 wd 0.0500 time 0.4707 (0.4721) data time 0.0011 (0.0023) model time 0.4696 (0.4690) loss 1.7736 (2.6673) grad_norm 2.2252 (2.5944) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][420/625] eta 0:01:36 lr 0.000232 wd 0.0500 time 0.4681 (0.4720) data time 0.0010 (0.0023) model time 0.4671 (0.4690) loss 2.3577 (2.6632) grad_norm 1.5271 (2.5737) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][430/625] eta 0:01:32 lr 0.000232 wd 0.0500 time 0.4623 (0.4719) data time 0.0011 (0.0023) model time 0.4612 (0.4689) loss 2.8214 (2.6653) grad_norm 1.8032 (2.5657) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][440/625] eta 0:01:27 lr 0.000232 wd 0.0500 time 0.4632 (0.4718) data time 0.0009 (0.0023) model time 
0.4624 (0.4689) loss 3.1152 (2.6701) grad_norm 2.3542 (2.5511) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][450/625] eta 0:01:22 lr 0.000232 wd 0.0500 time 0.4713 (0.4717) data time 0.0011 (0.0022) model time 0.4703 (0.4688) loss 2.9415 (2.6717) grad_norm 2.4653 (2.5437) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][460/625] eta 0:01:17 lr 0.000232 wd 0.0500 time 0.4717 (0.4722) data time 0.0010 (0.0022) model time 0.4707 (0.4694) loss 2.5486 (2.6719) grad_norm 2.3542 (2.5348) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][470/625] eta 0:01:13 lr 0.000232 wd 0.0500 time 0.4681 (0.4721) data time 0.0008 (0.0022) model time 0.4673 (0.4693) loss 3.1761 (2.6716) grad_norm 2.0575 (2.5254) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][480/625] eta 0:01:08 lr 0.000232 wd 0.0500 time 0.4630 (0.4724) data time 0.0011 (0.0022) model time 0.4618 (0.4698) loss 2.2258 (2.6632) grad_norm 2.1152 (2.5140) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][490/625] eta 0:01:03 lr 0.000232 wd 0.0500 time 0.4696 (0.4723) data time 0.0008 (0.0021) model time 0.4687 (0.4697) loss 3.0065 (2.6629) grad_norm 1.6025 (2.4997) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][500/625] eta 0:00:59 lr 0.000231 wd 0.0500 time 0.4681 (0.4722) data time 0.0008 (0.0021) model time 0.4674 (0.4696) loss 2.2940 (2.6594) grad_norm 2.7661 (2.4936) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][510/625] eta 0:00:54 lr 
0.000231 wd 0.0500 time 0.4624 (0.4721) data time 0.0010 (0.0021) model time 0.4614 (0.4695) loss 2.8502 (2.6634) grad_norm 2.9340 (2.4943) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][520/625] eta 0:00:49 lr 0.000231 wd 0.0500 time 0.4640 (0.4721) data time 0.0010 (0.0021) model time 0.4629 (0.4695) loss 3.0091 (2.6668) grad_norm 2.9500 (2.5004) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][530/625] eta 0:00:44 lr 0.000231 wd 0.0500 time 0.4696 (0.4721) data time 0.0008 (0.0021) model time 0.4688 (0.4696) loss 2.8151 (2.6686) grad_norm 1.9878 (2.4945) loss_scale 128.0000 (128.0000) mem 16706MB [2024-08-10 22:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 22:03:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 22:03:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 22:05:36 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 22:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 22:05:49 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 22:06:01 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 22:06:01 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 22:06:04 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 22:06:06 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 22:06:06 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 220) [2024-08-10 22:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 22:06:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][540/625] eta 0:03:13 lr 0.000231 wd 0.0500 time 0.4392 (2.2788) data time 0.0009 (0.0790) model time 0.4383 (2.1999) loss 2.7279 (2.8653) grad_norm 1.8535 (2.6580) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:06:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][550/625] eta 0:01:42 lr 0.000231 wd 0.0500 time 0.4413 (1.3609) data time 0.0006 (0.0399) model time 0.4407 (1.3211) loss 3.1496 (2.7715) grad_norm 3.1629 (2.4155) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:06:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][560/625] eta 0:01:09 lr 0.000231 wd 0.0500 time 0.4412 (1.0637) data time 0.0009 (0.0269) model time 0.4403 (1.0368) loss 3.1299 (2.8560) grad_norm 2.0462 (2.4682) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:06:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][570/625] eta 0:00:50 lr 0.000231 wd 0.0500 time 0.4412 (0.9131) data time 0.0006 (0.0203) model time 0.4406 (0.8927) loss 2.2551 (2.7815) grad_norm 2.0871 (2.5854) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:06:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][580/625] eta 0:00:36 lr 0.000231 wd 0.0500 time 0.4446 (0.8192) data time 0.0008 (0.0164) model time 0.4438 (0.8028) loss 2.2985 (2.7826) grad_norm 2.1819 (2.5156) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][590/625] eta 
0:00:26 lr 0.000231 wd 0.0500 time 0.4467 (0.7569) data time 0.0007 (0.0138) model time 0.4461 (0.7431) loss 2.9585 (2.7750) grad_norm 5.6709 (2.8079) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][600/625] eta 0:00:17 lr 0.000231 wd 0.0500 time 0.4425 (0.7122) data time 0.0006 (0.0120) model time 0.4418 (0.7002) loss 1.9218 (2.7456) grad_norm 1.6184 (2.6786) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][610/625] eta 0:00:10 lr 0.000231 wd 0.0500 time 0.4400 (0.6787) data time 0.0006 (0.0106) model time 0.4394 (0.6680) loss 2.9465 (2.7395) grad_norm 1.8429 (2.6348) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [220/300][620/625] eta 0:00:03 lr 0.000231 wd 0.0500 time 0.4375 (0.6519) data time 0.0004 (0.0095) model time 0.4371 (0.6424) loss 3.0959 (2.7241) grad_norm 2.1259 (2.5764) loss_scale 128.0000 (128.0000) mem 16711MB [2024-08-10 22:07:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 220 training takes 0:01:00 [2024-08-10 22:07:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 22:07:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-10 22:07:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.4963 (0.4963) Acc@1 89.307 (89.307) Acc@5 99.121 (99.121) Mem 16711MB [2024-08-10 22:07:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8403 (0.6183) Acc@1 79.492 (86.581) Acc@5 96.143 (97.781) Mem 16711MB [2024-08-10 22:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9121 (0.7308) Acc@1 78.027 (83.738) Acc@5 95.361 (96.719) Mem 16711MB [2024-08-10 22:07:20 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.479 Acc@5 96.713 [2024-08-10 22:07:20 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 22:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.751 (0.751) Loss 0.4780 (0.4780) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16711MB [2024-08-10 22:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.179) Loss 0.7700 (0.5892) Acc@1 81.299 (87.269) Acc@5 96.631 (97.940) Mem 16711MB [2024-08-10 22:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.149) Loss 0.8506 (0.6913) Acc@1 79.932 (84.587) Acc@5 96.094 (97.019) Mem 16711MB [2024-08-10 22:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.291 Acc@5 97.005 [2024-08-10 22:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 22:07:24 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.29% [2024-08-10 22:07:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 22:07:29 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-10 22:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][0/625] eta 0:09:29 lr 0.000230 wd 0.0500 time 0.9114 (0.9114) data time 0.4241 (0.4241) model time 0.0000 (0.0000) loss 2.7993 (2.7993) grad_norm 2.8537 (2.8537) loss_scale 128.0000 (128.0000) mem 16714MB [2024-08-10 22:07:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][10/625] eta 0:04:58 lr 0.000230 wd 0.0500 time 0.4393 (0.4847) data time 0.0008 (0.0392) model time 0.0000 (0.0000) loss 2.4105 (2.8742) grad_norm 1.6624 (3.1638) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:07:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][20/625] eta 0:04:40 lr 0.000230 wd 0.0500 time 0.4366 (0.4643) data time 0.0007 (0.0209) model time 0.0000 (0.0000) loss 2.1963 (2.7143) grad_norm 3.3031 (3.3529) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][30/625] eta 0:04:32 lr 0.000230 wd 0.0500 time 0.4395 (0.4572) data time 0.0007 (0.0145) model time 0.0000 (0.0000) loss 2.5868 (2.7316) grad_norm 3.4890 (3.0181) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][40/625] eta 0:04:25 lr 0.000230 wd 0.0500 time 0.4476 (0.4538) data time 0.0006 (0.0111) model time 0.0000 (0.0000) loss 2.2803 (2.7290) grad_norm 1.8215 (2.8851) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][50/625] eta 0:04:19 lr 0.000230 wd 0.0500 time 0.4388 (0.4517) data time 0.0009 (0.0091) model time 0.0000 (0.0000) loss 2.8487 (2.7040) grad_norm 1.6230 (2.7225) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][60/625] eta 0:04:14 lr 0.000230 wd 0.0500 time 0.4426 (0.4503) data time 0.0008 (0.0077) model time 0.4417 (0.4427) loss 2.6773 (2.6839) 
grad_norm 2.2680 (2.6162) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][70/625] eta 0:04:09 lr 0.000230 wd 0.0500 time 0.4423 (0.4493) data time 0.0008 (0.0068) model time 0.4415 (0.4425) loss 2.7644 (2.6854) grad_norm 2.0139 (2.7330) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][80/625] eta 0:04:04 lr 0.000230 wd 0.0500 time 0.4397 (0.4485) data time 0.0006 (0.0060) model time 0.4391 (0.4423) loss 2.7539 (2.6771) grad_norm 2.0236 (2.8178) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][90/625] eta 0:03:59 lr 0.000230 wd 0.0500 time 0.4431 (0.4478) data time 0.0008 (0.0055) model time 0.4423 (0.4421) loss 2.5336 (2.6611) grad_norm 1.9158 (2.7672) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][100/625] eta 0:03:54 lr 0.000230 wd 0.0500 time 0.4460 (0.4474) data time 0.0008 (0.0050) model time 0.4452 (0.4423) loss 1.6498 (2.6492) grad_norm 3.8249 (2.7111) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][110/625] eta 0:03:50 lr 0.000230 wd 0.0500 time 0.4424 (0.4470) data time 0.0008 (0.0046) model time 0.4416 (0.4423) loss 2.6617 (2.6389) grad_norm 1.4817 (2.6348) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][120/625] eta 0:03:47 lr 0.000229 wd 0.0500 time 0.4444 (0.4497) data time 0.0008 (0.0043) model time 0.4437 (0.4474) loss 2.9759 (2.6332) grad_norm 1.4968 (2.5672) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][130/625] eta 0:03:42 lr 0.000229 wd 0.0500 time 0.4420 (0.4492) data 
time 0.0007 (0.0040) model time 0.4413 (0.4469) loss 2.8326 (2.6414) grad_norm 4.6195 (2.5533) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][140/625] eta 0:03:37 lr 0.000229 wd 0.0500 time 0.4447 (0.4489) data time 0.0009 (0.0038) model time 0.4438 (0.4466) loss 2.9661 (2.6364) grad_norm 2.8292 (2.5819) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][150/625] eta 0:03:33 lr 0.000229 wd 0.0500 time 0.4487 (0.4486) data time 0.0008 (0.0036) model time 0.4478 (0.4462) loss 2.6833 (2.6363) grad_norm 1.8317 (2.5453) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][160/625] eta 0:03:28 lr 0.000229 wd 0.0500 time 0.4459 (0.4483) data time 0.0006 (0.0035) model time 0.4453 (0.4459) loss 2.0994 (2.6219) grad_norm 2.0107 (2.5241) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][170/625] eta 0:03:23 lr 0.000229 wd 0.0500 time 0.4441 (0.4480) data time 0.0006 (0.0033) model time 0.4434 (0.4456) loss 1.7815 (2.6054) grad_norm 3.1926 (2.5727) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][180/625] eta 0:03:19 lr 0.000229 wd 0.0500 time 0.4431 (0.4478) data time 0.0008 (0.0032) model time 0.4423 (0.4454) loss 2.6708 (2.6059) grad_norm 1.8846 (2.5536) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][190/625] eta 0:03:14 lr 0.000229 wd 0.0500 time 0.4435 (0.4476) data time 0.0009 (0.0031) model time 0.4425 (0.4452) loss 2.8939 (2.6081) grad_norm 1.7242 (2.5622) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:08:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[221/300][200/625] eta 0:03:10 lr 0.000229 wd 0.0500 time 0.4483 (0.4474) data time 0.0008 (0.0029) model time 0.4475 (0.4451) loss 2.8617 (2.6040) grad_norm 1.9376 (2.5314) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][210/625] eta 0:03:05 lr 0.000229 wd 0.0500 time 0.4422 (0.4473) data time 0.0008 (0.0028) model time 0.4414 (0.4450) loss 1.9789 (2.5935) grad_norm 5.8625 (2.5416) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][220/625] eta 0:03:01 lr 0.000229 wd 0.0500 time 0.4420 (0.4471) data time 0.0008 (0.0028) model time 0.4412 (0.4448) loss 3.1298 (2.6003) grad_norm 3.5460 (2.7757) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][230/625] eta 0:02:56 lr 0.000229 wd 0.0500 time 0.4434 (0.4469) data time 0.0006 (0.0027) model time 0.4428 (0.4447) loss 2.9344 (2.6096) grad_norm 2.4151 (2.8004) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][240/625] eta 0:02:52 lr 0.000228 wd 0.0500 time 0.4444 (0.4468) data time 0.0006 (0.0026) model time 0.4438 (0.4446) loss 2.8800 (2.6076) grad_norm 1.9617 (2.7789) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][250/625] eta 0:02:47 lr 0.000228 wd 0.0500 time 0.4416 (0.4467) data time 0.0006 (0.0025) model time 0.4410 (0.4445) loss 2.3522 (2.6141) grad_norm 2.1422 (2.7576) loss_scale 128.0000 (128.0000) mem 16717MB [2024-08-10 22:09:24 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 22:09:24 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-10 22:09:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-10 22:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json
[2024-08-10 22:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300
[2024-08-10 22:59:19 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-08-10 22:59:30 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth
[2024-08-10 22:59:30 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth....................
[2024-08-10 22:59:32 vssm_base_ms_e300] (utils.py 30): INFO resuming model:
[2024-08-10 22:59:34 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema:
[2024-08-10 22:59:34 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 221)
[2024-08-10 22:59:34 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training
[2024-08-10 22:59:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][260/625] eta 0:23:33 lr 0.000228 wd 0.0500 time 0.4494 (3.8730) data time 0.0006 (0.1784) model time 0.4488 (3.6945) loss 3.1640 (3.1587) grad_norm 1.8673 (2.2386) loss_scale 128.0000 (128.0000) mem 16695MB
[2024-08-10 23:00:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][270/625] eta 0:09:23 lr 0.000228 wd 0.0500 time 0.4435 (1.5883) data time 0.0010 (0.0601) model time 0.4425 (1.5282) loss 2.7799 (2.9265) grad_norm 3.3863 (2.3378) loss_scale 128.0000 (128.0000) mem 16695MB
[2024-08-10 23:00:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[221/300][280/625] eta 0:06:30 lr 0.000228 wd 0.0500 time 0.4434 (1.1309) data time 0.0009 (0.0365) model time 0.4425 (1.0945) loss 2.9029 (2.9145) grad_norm 1.7479 (2.3111) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][290/625] eta 0:05:15 lr 0.000228 wd 0.0500 time 0.4474 (0.9422) data time 0.0009 (0.0263) model time 0.4464 (0.9159) loss 3.0371 (2.9183) grad_norm 1.5878 (2.2211) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][300/625] eta 0:04:31 lr 0.000228 wd 0.0500 time 0.4440 (0.8360) data time 0.0009 (0.0207) model time 0.4431 (0.8153) loss 2.9098 (2.8640) grad_norm 1.9168 (2.2880) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][310/625] eta 0:04:00 lr 0.000228 wd 0.0500 time 0.4477 (0.7649) data time 0.0006 (0.0171) model time 0.4471 (0.7478) loss 1.9789 (2.8290) grad_norm 2.1550 (2.2137) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][320/625] eta 0:03:38 lr 0.000228 wd 0.0500 time 0.4512 (0.7160) data time 0.0009 (0.0146) model time 0.4503 (0.7014) loss 3.0328 (2.8229) grad_norm 1.6566 (2.1774) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][330/625] eta 0:03:20 lr 0.000228 wd 0.0500 time 0.4494 (0.6802) data time 0.0009 (0.0128) model time 0.4485 (0.6674) loss 2.1413 (2.7814) grad_norm 2.6923 (2.1682) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][340/625] eta 0:03:06 lr 0.000228 wd 0.0500 time 0.4495 (0.6528) data time 0.0006 (0.0114) model time 0.4488 (0.6414) loss 2.7139 (2.7660) grad_norm 2.5797 (2.1567) loss_scale 128.0000 (128.0000) mem 16695MB 
[2024-08-10 23:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][350/625] eta 0:02:53 lr 0.000228 wd 0.0500 time 0.4465 (0.6312) data time 0.0008 (0.0103) model time 0.4456 (0.6209) loss 2.9315 (2.7580) grad_norm 2.3308 (2.4465) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][360/625] eta 0:02:42 lr 0.000227 wd 0.0500 time 0.4490 (0.6137) data time 0.0009 (0.0094) model time 0.4481 (0.6043) loss 2.6703 (2.7774) grad_norm 1.5626 (2.4566) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][370/625] eta 0:02:32 lr 0.000227 wd 0.0500 time 0.4457 (0.5991) data time 0.0007 (0.0087) model time 0.4450 (0.5904) loss 1.9951 (2.7707) grad_norm 2.2147 (2.4128) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][380/625] eta 0:02:23 lr 0.000227 wd 0.0500 time 0.4475 (0.5869) data time 0.0006 (0.0080) model time 0.4469 (0.5788) loss 2.4821 (2.7684) grad_norm 3.9069 (2.4544) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][390/625] eta 0:02:15 lr 0.000227 wd 0.0500 time 0.4473 (0.5765) data time 0.0006 (0.0075) model time 0.4466 (0.5690) loss 2.5658 (2.7663) grad_norm 2.6107 (2.4995) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][400/625] eta 0:02:07 lr 0.000227 wd 0.0500 time 0.4489 (0.5676) data time 0.0008 (0.0071) model time 0.4481 (0.5605) loss 3.1258 (2.7575) grad_norm 3.4821 (2.4936) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][410/625] eta 0:02:00 lr 0.000227 wd 0.0500 time 0.4488 (0.5599) data time 0.0009 (0.0067) model time 0.4479 (0.5532) loss 2.9729 
(2.7527) grad_norm 2.3661 (2.4695) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][420/625] eta 0:01:53 lr 0.000227 wd 0.0500 time 0.4434 (0.5531) data time 0.0008 (0.0063) model time 0.4425 (0.5468) loss 3.1045 (2.7478) grad_norm 1.9382 (2.4579) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][430/625] eta 0:01:46 lr 0.000227 wd 0.0500 time 0.4449 (0.5470) data time 0.0007 (0.0060) model time 0.4442 (0.5410) loss 2.6670 (2.7373) grad_norm 6.3214 (2.4867) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][440/625] eta 0:01:40 lr 0.000227 wd 0.0500 time 0.4460 (0.5424) data time 0.0009 (0.0057) model time 0.4451 (0.5366) loss 2.7170 (2.7247) grad_norm 3.0883 (2.8370) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][450/625] eta 0:01:34 lr 0.000227 wd 0.0500 time 0.4460 (0.5374) data time 0.0009 (0.0055) model time 0.4450 (0.5319) loss 1.7586 (2.7159) grad_norm 2.0575 (2.8017) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][460/625] eta 0:01:27 lr 0.000227 wd 0.0500 time 0.4471 (0.5329) data time 0.0008 (0.0053) model time 0.4463 (0.5277) loss 2.3247 (2.7076) grad_norm 1.5041 (2.8113) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][470/625] eta 0:01:21 lr 0.000227 wd 0.0500 time 0.4493 (0.5290) data time 0.0008 (0.0051) model time 0.4485 (0.5239) loss 2.4423 (2.7001) grad_norm 2.4333 (2.7718) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][480/625] eta 0:01:16 lr 0.000227 wd 0.0500 time 0.4487 
(0.5254) data time 0.0006 (0.0049) model time 0.4481 (0.5205) loss 2.8078 (2.6995) grad_norm 1.9178 (2.7620) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-10 23:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][490/625] eta 0:01:10 lr 0.000226 wd 0.0500 time 0.4445 (0.5221) data time 0.0009 (0.0047) model time 0.4436 (0.5174) loss 2.9338 (2.6929) grad_norm 1.9530 (2.7359) loss_scale 256.0000 (130.7234) mem 16695MB [2024-08-10 23:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][500/625] eta 0:01:04 lr 0.000226 wd 0.0500 time 0.4445 (0.5191) data time 0.0008 (0.0046) model time 0.4436 (0.5145) loss 2.6563 (2.6922) grad_norm 9.7149 (2.7444) loss_scale 256.0000 (135.8367) mem 16695MB [2024-08-10 23:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][510/625] eta 0:00:59 lr 0.000226 wd 0.0500 time 0.4471 (0.5162) data time 0.0007 (0.0044) model time 0.4464 (0.5118) loss 2.4109 (2.6863) grad_norm 2.0100 (2.7340) loss_scale 256.0000 (140.5490) mem 16695MB [2024-08-10 23:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][520/625] eta 0:00:53 lr 0.000226 wd 0.0500 time 0.4485 (0.5136) data time 0.0006 (0.0043) model time 0.4479 (0.5093) loss 2.0293 (2.6771) grad_norm 2.0936 (2.7122) loss_scale 256.0000 (144.9057) mem 16695MB [2024-08-10 23:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][530/625] eta 0:00:48 lr 0.000226 wd 0.0500 time 0.4475 (0.5112) data time 0.0009 (0.0042) model time 0.4467 (0.5071) loss 2.6225 (2.6736) grad_norm 1.8307 (2.6918) loss_scale 256.0000 (148.9455) mem 16695MB [2024-08-10 23:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][540/625] eta 0:00:43 lr 0.000226 wd 0.0500 time 0.4468 (0.5091) data time 0.0008 (0.0041) model time 0.4460 (0.5050) loss 2.9098 (2.6744) grad_norm 3.1223 (2.6711) loss_scale 256.0000 (152.7018) mem 16695MB [2024-08-10 23:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [221/300][550/625] eta 0:00:38 lr 0.000226 wd 0.0500 time 0.4459 (0.5070) data time 0.0008 (0.0040) model time 0.4451 (0.5030) loss 2.6203 (2.6700) grad_norm 2.4213 (2.6471) loss_scale 256.0000 (156.2034) mem 16695MB [2024-08-10 23:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][560/625] eta 0:00:32 lr 0.000226 wd 0.0500 time 0.4481 (0.5050) data time 0.0008 (0.0039) model time 0.4473 (0.5012) loss 2.1458 (2.6648) grad_norm 2.1172 (2.6494) loss_scale 256.0000 (159.4754) mem 16695MB [2024-08-10 23:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][570/625] eta 0:00:27 lr 0.000226 wd 0.0500 time 0.4488 (0.5032) data time 0.0008 (0.0038) model time 0.4480 (0.4994) loss 2.7273 (2.6658) grad_norm 1.8508 (2.6302) loss_scale 256.0000 (162.5397) mem 16695MB [2024-08-10 23:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][580/625] eta 0:00:22 lr 0.000226 wd 0.0500 time 0.4430 (0.5014) data time 0.0006 (0.0037) model time 0.4424 (0.4978) loss 3.3016 (2.6751) grad_norm 1.5244 (2.6082) loss_scale 256.0000 (165.4154) mem 16695MB [2024-08-10 23:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][590/625] eta 0:00:17 lr 0.000226 wd 0.0500 time 0.4453 (0.4998) data time 0.0006 (0.0036) model time 0.4447 (0.4962) loss 2.9974 (2.6715) grad_norm 1.7643 (2.5852) loss_scale 256.0000 (168.1194) mem 16695MB [2024-08-10 23:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][600/625] eta 0:00:12 lr 0.000226 wd 0.0500 time 0.4447 (0.4983) data time 0.0006 (0.0035) model time 0.4442 (0.4947) loss 2.0132 (2.6708) grad_norm 2.7778 (2.5986) loss_scale 256.0000 (170.6667) mem 16695MB [2024-08-10 23:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][610/625] eta 0:00:07 lr 0.000225 wd 0.0500 time 0.4474 (0.4968) data time 0.0006 (0.0034) model time 0.4468 (0.4934) loss 2.2038 (2.6743) grad_norm 3.2742 (2.5924) loss_scale 256.0000 (173.0704) mem 16695MB 
[2024-08-10 23:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [221/300][620/625] eta 0:00:02 lr 0.000225 wd 0.0500 time 0.6382 (0.4959) data time 0.0006 (0.0034) model time 0.6376 (0.4925) loss 2.5153 (2.6735) grad_norm 1.5743 (2.5797) loss_scale 256.0000 (175.3425) mem 16695MB [2024-08-10 23:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 221 training takes 0:03:02 [2024-08-10 23:02:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 23:02:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 23:02:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.5278 (0.5278) Acc@1 88.867 (88.867) Acc@5 98.828 (98.828) Mem 16695MB [2024-08-10 23:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 0.8442 (0.6373) Acc@1 80.811 (86.741) Acc@5 95.850 (97.630) Mem 16695MB [2024-08-10 23:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9165 (0.7483) Acc@1 78.320 (83.745) Acc@5 95.459 (96.617) Mem 16695MB [2024-08-10 23:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.465 Acc@5 96.597 [2024-08-10 23:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 23:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.791 (0.791) Loss 0.4792 (0.4792) Acc@1 89.453 (89.453) Acc@5 98.926 (98.926) Mem 16695MB [2024-08-10 23:02:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.182) Loss 0.7700 (0.5890) Acc@1 81.348 (87.260) Acc@5 96.680 (97.940) Mem 16695MB [2024-08-10 23:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.150) Loss 0.8516 (0.6916) Acc@1 79.932 (84.612) Acc@5 95.996 
(97.021) Mem 16695MB [2024-08-10 23:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.319 Acc@5 97.007 [2024-08-10 23:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 23:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.32% [2024-08-10 23:02:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-10 23:03:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 23:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][0/625] eta 0:21:16 lr 0.000225 wd 0.0500 time 2.0429 (2.0429) data time 0.3997 (0.3997) model time 0.0000 (0.0000) loss 3.2656 (3.2656) grad_norm 2.5316 (2.5316) loss_scale 256.0000 (256.0000) mem 16704MB [2024-08-10 23:03:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][10/625] eta 0:06:02 lr 0.000225 wd 0.0500 time 0.4449 (0.5899) data time 0.0006 (0.0371) model time 0.0000 (0.0000) loss 2.6973 (2.6670) grad_norm 2.0830 (2.5084) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][20/625] eta 0:05:15 lr 0.000225 wd 0.0500 time 0.4462 (0.5217) data time 0.0008 (0.0198) model time 0.0000 (0.0000) loss 1.8111 (2.5416) grad_norm 1.5992 (2.4340) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][30/625] eta 0:04:56 lr 0.000225 wd 0.0500 time 0.4459 (0.4980) data time 0.0009 (0.0137) model time 0.0000 (0.0000) loss 2.4195 (2.6238) grad_norm 2.0111 (2.2729) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][40/625] eta 0:04:45 lr 0.000225 wd 0.0500 time 0.4517 (0.4880) data time 0.0006 
(0.0106) model time 0.0000 (0.0000) loss 2.8369 (2.6768) grad_norm 4.5449 (2.3016) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][50/625] eta 0:04:35 lr 0.000225 wd 0.0500 time 0.4457 (0.4798) data time 0.0007 (0.0087) model time 0.0000 (0.0000) loss 3.1963 (2.6586) grad_norm 1.7232 (2.2888) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][60/625] eta 0:04:27 lr 0.000225 wd 0.0500 time 0.4484 (0.4742) data time 0.0008 (0.0074) model time 0.4475 (0.4446) loss 3.0236 (2.6858) grad_norm 2.5251 (2.3176) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][70/625] eta 0:04:20 lr 0.000225 wd 0.0500 time 0.4444 (0.4701) data time 0.0008 (0.0065) model time 0.4436 (0.4445) loss 2.3792 (2.7116) grad_norm 2.1233 (2.3035) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][80/625] eta 0:04:14 lr 0.000225 wd 0.0500 time 0.4488 (0.4671) data time 0.0006 (0.0058) model time 0.4482 (0.4445) loss 2.8908 (2.7133) grad_norm 2.4220 (2.2720) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][90/625] eta 0:04:08 lr 0.000225 wd 0.0500 time 0.4492 (0.4651) data time 0.0006 (0.0053) model time 0.4486 (0.4455) loss 2.8508 (2.6750) grad_norm 2.1846 (2.2227) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][100/625] eta 0:04:03 lr 0.000225 wd 0.0500 time 0.4487 (0.4634) data time 0.0009 (0.0048) model time 0.4478 (0.4457) loss 2.6901 (2.6457) grad_norm 2.3734 (2.2283) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][110/625] eta 
0:03:57 lr 0.000224 wd 0.0500 time 0.4478 (0.4618) data time 0.0008 (0.0045) model time 0.4469 (0.4456) loss 2.5102 (2.6296) grad_norm 2.5539 (2.2764) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:03:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][120/625] eta 0:03:52 lr 0.000224 wd 0.0500 time 0.4435 (0.4604) data time 0.0006 (0.0042) model time 0.4429 (0.4454) loss 2.5764 (2.6445) grad_norm 2.2977 (2.2822) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][130/625] eta 0:03:47 lr 0.000224 wd 0.0500 time 0.4461 (0.4592) data time 0.0006 (0.0040) model time 0.4455 (0.4452) loss 2.0015 (2.6444) grad_norm 1.9691 (2.2983) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][140/625] eta 0:03:42 lr 0.000224 wd 0.0500 time 0.4470 (0.4583) data time 0.0009 (0.0037) model time 0.4461 (0.4452) loss 2.6644 (2.6522) grad_norm 3.2088 (2.2974) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][150/625] eta 0:03:37 lr 0.000224 wd 0.0500 time 0.4440 (0.4574) data time 0.0007 (0.0036) model time 0.4433 (0.4450) loss 2.8250 (2.6660) grad_norm 2.1999 (2.3032) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][160/625] eta 0:03:32 lr 0.000224 wd 0.0500 time 0.4479 (0.4567) data time 0.0009 (0.0034) model time 0.4470 (0.4451) loss 2.6296 (2.6458) grad_norm 2.0672 (2.3088) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][170/625] eta 0:03:27 lr 0.000224 wd 0.0500 time 0.4488 (0.4562) data time 0.0006 (0.0032) model time 0.4482 (0.4452) loss 3.0572 (2.6447) grad_norm 2.8741 (2.3241) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:23 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][180/625] eta 0:03:22 lr 0.000224 wd 0.0500 time 0.4471 (0.4557) data time 0.0008 (0.0031) model time 0.4463 (0.4453) loss 2.0337 (2.6378) grad_norm 2.0266 (2.3285) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][190/625] eta 0:03:18 lr 0.000224 wd 0.0500 time 0.4460 (0.4553) data time 0.0009 (0.0030) model time 0.4451 (0.4454) loss 2.9946 (2.6492) grad_norm 1.6658 (2.3136) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][200/625] eta 0:03:13 lr 0.000224 wd 0.0500 time 0.4458 (0.4548) data time 0.0009 (0.0029) model time 0.4450 (0.4454) loss 2.7211 (2.6519) grad_norm 3.1529 (2.3195) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][210/625] eta 0:03:08 lr 0.000224 wd 0.0500 time 0.4538 (0.4544) data time 0.0006 (0.0028) model time 0.4532 (0.4454) loss 2.5260 (2.6566) grad_norm 1.6233 (2.3039) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][220/625] eta 0:03:04 lr 0.000224 wd 0.0500 time 0.4462 (0.4548) data time 0.0006 (0.0027) model time 0.4456 (0.4464) loss 3.0211 (2.6568) grad_norm 1.9632 (2.2908) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][230/625] eta 0:02:59 lr 0.000223 wd 0.0500 time 0.4521 (0.4545) data time 0.0008 (0.0026) model time 0.4512 (0.4464) loss 3.3004 (2.6590) grad_norm 1.8398 (2.2817) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][240/625] eta 0:02:54 lr 0.000223 wd 0.0500 time 0.4465 (0.4541) data time 0.0007 (0.0026) model time 0.4458 (0.4463) loss 2.5789 (2.6569) grad_norm 2.0133 
(2.2897) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][250/625] eta 0:02:50 lr 0.000223 wd 0.0500 time 0.4499 (0.4539) data time 0.0009 (0.0025) model time 0.4491 (0.4463) loss 2.8203 (2.6592) grad_norm 2.2782 (2.3334) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][260/625] eta 0:02:45 lr 0.000223 wd 0.0500 time 0.4443 (0.4543) data time 0.0006 (0.0024) model time 0.4437 (0.4472) loss 3.4082 (2.6619) grad_norm 1.6038 (2.3102) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][270/625] eta 0:02:41 lr 0.000223 wd 0.0500 time 0.4429 (0.4540) data time 0.0007 (0.0024) model time 0.4423 (0.4471) loss 1.7172 (2.6600) grad_norm 2.2340 (2.3081) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][280/625] eta 0:02:36 lr 0.000223 wd 0.0500 time 0.4450 (0.4537) data time 0.0008 (0.0023) model time 0.4442 (0.4470) loss 3.1177 (2.6545) grad_norm 1.3261 (2.3005) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][290/625] eta 0:02:31 lr 0.000223 wd 0.0500 time 0.4481 (0.4534) data time 0.0006 (0.0023) model time 0.4474 (0.4469) loss 2.3795 (2.6472) grad_norm 1.7809 (2.2871) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][300/625] eta 0:02:27 lr 0.000223 wd 0.0500 time 0.4417 (0.4531) data time 0.0006 (0.0022) model time 0.4412 (0.4468) loss 3.2449 (2.6484) grad_norm 1.5752 (2.2840) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][310/625] eta 0:02:22 lr 0.000223 wd 0.0500 time 0.4495 (0.4530) data time 0.0010 
(0.0022) model time 0.4485 (0.4468) loss 2.3499 (2.6506) grad_norm 1.5555 (2.2722) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-10 23:05:25 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 23:05:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 23:05:26 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 23:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-10 23:11:32 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-10 23:11:42 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-10 23:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-10 23:11:56 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-10 23:11:59 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-10 23:12:01 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-10 23:12:01 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 222) [2024-08-10 23:12:01 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-10 23:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][320/625] eta 0:27:04 lr 0.000223 wd 0.0500 time 0.4424 (5.3253) data time 0.0007 (0.1465) model time 0.4417 (5.1788) loss 2.8988 (2.7878) grad_norm 2.2791 (3.1145) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][330/625] eta 0:09:02 lr 0.000223 wd 0.0500 time 0.4466 (1.8389) data time 0.0006 (0.0425) model time 0.4460 (1.7964) loss 2.8451 (2.7737) grad_norm 2.8966 (3.2229) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][340/625] eta 0:05:58 lr 0.000223 wd 0.0500 time 0.4478 (1.2578) data time 0.0008 (0.0251) model time 0.4470 (1.2327) loss 2.7702 (2.8097) grad_norm 1.7922 (3.1248) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][350/625] eta 0:04:42 lr 0.000222 wd 0.0500 time 0.4403 (1.0261) data time 0.0007 (0.0180) model time 0.4397 (1.0082) loss 2.3945 (2.8007) grad_norm 1.6449 (2.8820) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][360/625] eta 0:03:58 lr 0.000222 wd 0.0500 time 0.4467 (0.8988) data time 0.0006 (0.0141) model time 0.4461 (0.8847) loss 2.6953 (2.7744) grad_norm 5.3864 (2.7713) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][370/625] eta 
0:03:27 lr 0.000222 wd 0.0500 time 0.4430 (0.8140) data time 0.0007 (0.0116) model time 0.4423 (0.8024) loss 2.8648 (2.7603) grad_norm 3.9030 (2.6725) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][380/625] eta 0:03:05 lr 0.000222 wd 0.0500 time 0.4426 (0.7560) data time 0.0006 (0.0099) model time 0.4420 (0.7460) loss 3.0917 (2.7438) grad_norm 1.8714 (2.6296) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][390/625] eta 0:02:47 lr 0.000222 wd 0.0500 time 0.4429 (0.7136) data time 0.0007 (0.0087) model time 0.4422 (0.7049) loss 2.9548 (2.7229) grad_norm 2.1710 (2.5435) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][400/625] eta 0:02:33 lr 0.000222 wd 0.0500 time 0.4479 (0.6819) data time 0.0009 (0.0078) model time 0.4470 (0.6741) loss 2.7671 (2.6990) grad_norm 1.6366 (2.4829) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][410/625] eta 0:02:21 lr 0.000222 wd 0.0500 time 0.4387 (0.6565) data time 0.0008 (0.0070) model time 0.4379 (0.6495) loss 2.4559 (2.6961) grad_norm 1.7210 (2.4411) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][420/625] eta 0:02:10 lr 0.000222 wd 0.0500 time 0.4410 (0.6361) data time 0.0009 (0.0064) model time 0.4401 (0.6296) loss 2.9315 (2.7174) grad_norm 2.8548 (2.4123) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][430/625] eta 0:02:00 lr 0.000222 wd 0.0500 time 0.4452 (0.6193) data time 0.0009 (0.0060) model time 0.4443 (0.6134) loss 2.8398 (2.7106) grad_norm 5.4544 (2.4178) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:21 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][440/625] eta 0:01:51 lr 0.000222 wd 0.0500 time 0.4454 (0.6052) data time 0.0008 (0.0056) model time 0.4445 (0.5997) loss 2.5537 (2.7051) grad_norm 2.6295 (2.5709) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][450/625] eta 0:01:43 lr 0.000222 wd 0.0500 time 0.4413 (0.5931) data time 0.0008 (0.0052) model time 0.4405 (0.5879) loss 2.8041 (2.7007) grad_norm 1.8132 (2.5386) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][460/625] eta 0:01:36 lr 0.000222 wd 0.0500 time 0.4390 (0.5826) data time 0.0006 (0.0049) model time 0.4384 (0.5777) loss 2.8000 (2.6846) grad_norm 3.0663 (2.5735) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][470/625] eta 0:01:28 lr 0.000221 wd 0.0500 time 0.4433 (0.5735) data time 0.0006 (0.0046) model time 0.4426 (0.5689) loss 2.8900 (2.6826) grad_norm 1.7173 (2.5463) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][480/625] eta 0:01:22 lr 0.000221 wd 0.0500 time 0.4393 (0.5655) data time 0.0007 (0.0044) model time 0.4386 (0.5611) loss 2.4245 (2.6855) grad_norm 1.3517 (2.5022) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][490/625] eta 0:01:15 lr 0.000221 wd 0.0500 time 0.4411 (0.5584) data time 0.0008 (0.0042) model time 0.4403 (0.5542) loss 2.0528 (2.6806) grad_norm 2.4683 (2.4722) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][500/625] eta 0:01:09 lr 0.000221 wd 0.0500 time 0.3850 (0.5530) data time 0.0008 (0.0040) model time 0.3842 (0.5490) loss 2.5547 (2.6753) grad_norm 2.2098 
(2.4913) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][510/625] eta 0:01:02 lr 0.000221 wd 0.0500 time 0.4429 (0.5474) data time 0.0007 (0.0039) model time 0.4422 (0.5435) loss 2.7044 (2.6696) grad_norm 1.7018 (2.4812) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][520/625] eta 0:00:56 lr 0.000221 wd 0.0500 time 0.4463 (0.5423) data time 0.0009 (0.0037) model time 0.4455 (0.5386) loss 2.6991 (2.6571) grad_norm 2.6878 (2.4928) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][530/625] eta 0:00:51 lr 0.000221 wd 0.0500 time 0.4454 (0.5377) data time 0.0006 (0.0036) model time 0.4448 (0.5342) loss 2.5891 (2.6516) grad_norm 3.8133 (2.5268) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][540/625] eta 0:00:45 lr 0.000221 wd 0.0500 time 0.4465 (0.5337) data time 0.0008 (0.0035) model time 0.4456 (0.5302) loss 2.7193 (2.6478) grad_norm 2.5176 (2.5268) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][550/625] eta 0:00:39 lr 0.000221 wd 0.0500 time 0.4461 (0.5299) data time 0.0007 (0.0033) model time 0.4454 (0.5266) loss 2.1554 (2.6390) grad_norm 1.7491 (2.5280) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][560/625] eta 0:00:34 lr 0.000221 wd 0.0500 time 0.4459 (0.5265) data time 0.0007 (0.0032) model time 0.4452 (0.5232) loss 1.7125 (2.6450) grad_norm 2.4257 (2.5228) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][570/625] eta 0:00:28 lr 0.000221 wd 0.0500 time 0.4491 (0.5233) data time 0.0007 
(0.0031) model time 0.4484 (0.5202) loss 1.9945 (2.6419) grad_norm 2.5523 (2.5102) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][580/625] eta 0:00:23 lr 0.000221 wd 0.0500 time 0.4510 (0.5204) data time 0.0007 (0.0031) model time 0.4503 (0.5174) loss 2.7629 (2.6373) grad_norm 1.9092 (2.4892) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][590/625] eta 0:00:18 lr 0.000221 wd 0.0500 time 0.4448 (0.5177) data time 0.0010 (0.0030) model time 0.4438 (0.5148) loss 2.9073 (2.6350) grad_norm 1.7322 (2.4882) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][600/625] eta 0:00:12 lr 0.000220 wd 0.0500 time 0.4472 (0.5152) data time 0.0007 (0.0029) model time 0.4465 (0.5123) loss 1.7091 (2.6313) grad_norm 1.8123 (2.5010) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][610/625] eta 0:00:07 lr 0.000220 wd 0.0500 time 0.4429 (0.5129) data time 0.0004 (0.0028) model time 0.4425 (0.5101) loss 2.2194 (2.6267) grad_norm 3.5552 (2.5016) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [222/300][620/625] eta 0:00:02 lr 0.000220 wd 0.0500 time 0.4426 (0.5106) data time 0.0006 (0.0028) model time 0.4420 (0.5078) loss 2.8028 (2.6194) grad_norm 4.4085 (2.5016) loss_scale 256.0000 (256.0000) mem 16711MB [2024-08-10 23:14:43 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 222 training takes 0:02:37 [2024-08-10 23:14:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-10 23:14:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-10 23:14:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.4946 (0.4946) Acc@1 89.404 (89.404) Acc@5 99.023 (99.023) Mem 16711MB [2024-08-10 23:14:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8320 (0.6203) Acc@1 80.078 (86.612) Acc@5 95.703 (97.754) Mem 16711MB [2024-08-10 23:14:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9443 (0.7342) Acc@1 77.930 (83.782) Acc@5 95.312 (96.622) Mem 16711MB [2024-08-10 23:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.465 Acc@5 96.553 [2024-08-10 23:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-10 23:14:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.792 (0.792) Loss 0.4797 (0.4797) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16711MB [2024-08-10 23:14:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.183) Loss 0.7720 (0.5894) Acc@1 81.396 (87.256) Acc@5 96.729 (97.954) Mem 16711MB [2024-08-10 23:14:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.151) Loss 0.8516 (0.6919) Acc@1 79.932 (84.601) Acc@5 95.996 (97.035) Mem 16711MB [2024-08-10 23:14:55 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.317 Acc@5 97.021 [2024-08-10 23:14:55 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-10 23:14:55 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.32% [2024-08-10 23:14:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... 
[2024-08-10 23:15:01 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-10 23:15:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][0/625] eta 0:09:43 lr 0.000220 wd 0.0500 time 0.9336 (0.9336) data time 0.3968 (0.3968) model time 0.0000 (0.0000) loss 2.9512 (2.9512) grad_norm 1.8634 (1.8634) loss_scale 256.0000 (256.0000) mem 16710MB [2024-08-10 23:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-10 23:15:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-10 23:15:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 05:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 05:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 05:43:02 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 05:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 05:43:14 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 05:43:17 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 05:43:19 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 05:43:19 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 223) [2024-08-11 05:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 05:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][10/625] eta 3:07:58 lr 0.000220 wd 0.0500 time 18.3391 (18.3391) data time 0.8155 (0.8155) model time 0.0000 (0.0000) loss 2.9590 (2.9590) grad_norm 2.1599 (2.1599) loss_scale 256.0000 (256.0000) mem 25862MB [2024-08-11 05:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][20/625] eta 0:21:25 lr 0.000220 wd 0.0500 time 0.4459 (2.1246) data time 0.0008 (0.0750) model time 0.0000 (0.0000) loss 2.2398 (2.8889) grad_norm 1.4536 (2.9149) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][30/625] eta 0:13:08 lr 0.000220 wd 0.0500 time 0.4456 (1.3251) data time 0.0008 (0.0397) model time 0.0000 (0.0000) loss 2.6724 (2.8264) grad_norm 1.5850 (2.6033) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][40/625] eta 0:10:14 lr 0.000220 wd 0.0500 time 0.4442 (1.0501) data time 0.0006 (0.0272) model time 0.0000 (0.0000) loss 2.4308 (2.8396) grad_norm 1.5569 (2.5385) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][50/625] eta 0:08:41 lr 0.000220 wd 0.0500 time 0.4475 (0.9071) data time 0.0009 (0.0208) model time 0.0000 (0.0000) loss 2.5585 (2.7971) grad_norm 1.8040 (2.5551) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][60/625] eta 
0:07:41 lr 0.000220 wd 0.0500 time 0.4423 (0.8161) data time 0.0007 (0.0169) model time 0.4417 (0.4419) loss 2.7933 (2.7712) grad_norm 9.3146 (2.6640) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][70/625] eta 0:06:58 lr 0.000220 wd 0.0500 time 0.4457 (0.7549) data time 0.0009 (0.0143) model time 0.4448 (0.4419) loss 2.5799 (2.7539) grad_norm 1.6994 (2.5948) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][80/625] eta 0:06:27 lr 0.000220 wd 0.0500 time 0.4436 (0.7111) data time 0.0008 (0.0124) model time 0.4427 (0.4422) loss 2.6747 (2.7363) grad_norm 1.7793 (2.5199) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][90/625] eta 0:06:02 lr 0.000219 wd 0.0500 time 0.4438 (0.6782) data time 0.0009 (0.0110) model time 0.4430 (0.4426) loss 2.4084 (2.7236) grad_norm 3.4045 (2.5211) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][100/625] eta 0:05:42 lr 0.000219 wd 0.0500 time 0.4452 (0.6524) data time 0.0007 (0.0099) model time 0.4445 (0.4427) loss 3.0767 (2.7037) grad_norm 2.5794 (2.5557) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][110/625] eta 0:05:25 lr 0.000219 wd 0.0500 time 0.4479 (0.6320) data time 0.0006 (0.0090) model time 0.4473 (0.4430) loss 2.9558 (2.7059) grad_norm 2.6165 (2.5356) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][120/625] eta 0:05:10 lr 0.000219 wd 0.0500 time 0.4412 (0.6150) data time 0.0008 (0.0083) model time 0.4404 (0.4431) loss 2.1879 (2.7084) grad_norm 1.6078 (2.6067) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:35 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][130/625] eta 0:04:57 lr 0.000219 wd 0.0500 time 0.4438 (0.6009) data time 0.0006 (0.0077) model time 0.4432 (0.4430) loss 1.5684 (2.7046) grad_norm 1.6267 (2.5440) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][140/625] eta 0:04:45 lr 0.000219 wd 0.0500 time 0.4470 (0.5889) data time 0.0008 (0.0071) model time 0.4461 (0.4431) loss 2.8846 (2.7075) grad_norm 1.9435 (2.5624) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][150/625] eta 0:04:34 lr 0.000219 wd 0.0500 time 0.4438 (0.5787) data time 0.0008 (0.0067) model time 0.4431 (0.4432) loss 2.8784 (2.7013) grad_norm 2.2381 (2.5519) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][160/625] eta 0:04:25 lr 0.000219 wd 0.0500 time 0.4480 (0.5700) data time 0.0008 (0.0063) model time 0.4472 (0.4434) loss 2.2399 (2.6928) grad_norm 1.8286 (2.5258) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][170/625] eta 0:04:15 lr 0.000219 wd 0.0500 time 0.4629 (0.5624) data time 0.0009 (0.0060) model time 0.4620 (0.4437) loss 3.0036 (2.6971) grad_norm 2.4549 (2.5419) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][180/625] eta 0:04:07 lr 0.000219 wd 0.0500 time 0.4453 (0.5557) data time 0.0007 (0.0057) model time 0.4446 (0.4439) loss 2.7057 (2.6946) grad_norm 1.9967 (2.5377) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][190/625] eta 0:03:59 lr 0.000219 wd 0.0500 time 0.4569 (0.5497) data time 0.0009 (0.0054) model time 0.4559 (0.4442) loss 2.9202 (2.6840) grad_norm 1.9764 
(2.5138) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][200/625] eta 0:03:51 lr 0.000219 wd 0.0500 time 0.4469 (0.5451) data time 0.0008 (0.0052) model time 0.4461 (0.4452) loss 2.4367 (2.6830) grad_norm 2.3437 (2.4728) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][210/625] eta 0:03:44 lr 0.000219 wd 0.0500 time 0.4438 (0.5401) data time 0.0009 (0.0050) model time 0.4429 (0.4452) loss 2.6124 (2.6754) grad_norm 3.4200 (2.4654) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][220/625] eta 0:03:36 lr 0.000218 wd 0.0500 time 0.4473 (0.5357) data time 0.0008 (0.0048) model time 0.4466 (0.4452) loss 2.9928 (2.6698) grad_norm 1.8455 (2.4544) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][230/625] eta 0:03:29 lr 0.000218 wd 0.0500 time 0.4462 (0.5316) data time 0.0005 (0.0046) model time 0.4457 (0.4452) loss 2.8887 (2.6645) grad_norm 1.6024 (2.4397) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][240/625] eta 0:03:23 lr 0.000218 wd 0.0500 time 0.4453 (0.5279) data time 0.0006 (0.0044) model time 0.4448 (0.4452) loss 1.5833 (2.6572) grad_norm 1.9259 (2.4600) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][250/625] eta 0:03:16 lr 0.000218 wd 0.0500 time 0.4468 (0.5246) data time 0.0007 (0.0043) model time 0.4462 (0.4453) loss 2.5004 (2.6563) grad_norm 4.3137 (2.4942) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][260/625] eta 0:03:10 lr 0.000218 wd 0.0500 time 0.4443 (0.5214) data time 0.0007 
(0.0041) model time 0.4436 (0.4452) loss 2.4008 (2.6440) grad_norm 3.6650 (2.5201) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][270/625] eta 0:03:04 lr 0.000218 wd 0.0500 time 0.4471 (0.5186) data time 0.0007 (0.0040) model time 0.4464 (0.4453) loss 2.5737 (2.6351) grad_norm 2.1435 (2.6084) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][280/625] eta 0:02:57 lr 0.000218 wd 0.0500 time 0.4468 (0.5159) data time 0.0007 (0.0039) model time 0.4461 (0.4453) loss 2.7603 (2.6314) grad_norm 1.6681 (2.6191) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][290/625] eta 0:02:52 lr 0.000218 wd 0.0500 time 0.4420 (0.5134) data time 0.0009 (0.0038) model time 0.4411 (0.4453) loss 2.4463 (2.6330) grad_norm 2.6683 (2.6042) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][300/625] eta 0:02:46 lr 0.000218 wd 0.0500 time 0.4443 (0.5111) data time 0.0006 (0.0037) model time 0.4437 (0.4453) loss 1.7720 (2.6288) grad_norm 4.5716 (2.6260) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:45:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][310/625] eta 0:02:40 lr 0.000218 wd 0.0500 time 0.4460 (0.5089) data time 0.0008 (0.0036) model time 0.4452 (0.4453) loss 2.5361 (2.6197) grad_norm 2.7143 (2.6182) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][320/625] eta 0:02:34 lr 0.000218 wd 0.0500 time 0.4440 (0.5069) data time 0.0009 (0.0035) model time 0.4431 (0.4453) loss 2.7415 (2.6182) grad_norm 2.6529 (2.6559) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][330/625] 
eta 0:02:28 lr 0.000218 wd 0.0500 time 0.4440 (0.5050) data time 0.0006 (0.0034) model time 0.4433 (0.4452) loss 3.3447 (2.6249) grad_norm 2.0531 (2.6589) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][340/625] eta 0:02:23 lr 0.000217 wd 0.0500 time 0.4456 (0.5032) data time 0.0006 (0.0034) model time 0.4449 (0.4452) loss 1.7554 (2.6254) grad_norm 2.0551 (2.6464) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][350/625] eta 0:02:17 lr 0.000217 wd 0.0500 time 0.4461 (0.5016) data time 0.0009 (0.0033) model time 0.4453 (0.4452) loss 2.7894 (2.6292) grad_norm 1.6825 (2.6376) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][360/625] eta 0:02:12 lr 0.000217 wd 0.0500 time 0.4441 (0.5000) data time 0.0007 (0.0032) model time 0.4434 (0.4452) loss 3.1166 (2.6312) grad_norm 1.9266 (2.6264) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][370/625] eta 0:02:07 lr 0.000217 wd 0.0500 time 0.4506 (0.4985) data time 0.0007 (0.0032) model time 0.4499 (0.4452) loss 2.3764 (2.6310) grad_norm 1.4887 (2.6146) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][380/625] eta 0:02:01 lr 0.000217 wd 0.0500 time 0.4469 (0.4977) data time 0.0009 (0.0031) model time 0.4460 (0.4460) loss 1.9172 (2.6297) grad_norm 2.4041 (2.6405) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][390/625] eta 0:01:56 lr 0.000217 wd 0.0500 time 0.4440 (0.4964) data time 0.0006 (0.0030) model time 0.4434 (0.4460) loss 1.5961 (2.6292) grad_norm 4.2986 (2.6337) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:36 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][400/625] eta 0:01:51 lr 0.000217 wd 0.0500 time 0.4488 (0.4951) data time 0.0007 (0.0030) model time 0.4482 (0.4460) loss 2.9640 (2.6255) grad_norm 2.6627 (2.6252) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][410/625] eta 0:01:46 lr 0.000217 wd 0.0500 time 0.4453 (0.4939) data time 0.0009 (0.0029) model time 0.4444 (0.4459) loss 3.1453 (2.6308) grad_norm 4.0980 (2.6305) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][420/625] eta 0:01:41 lr 0.000217 wd 0.0500 time 0.4477 (0.4927) data time 0.0009 (0.0029) model time 0.4468 (0.4459) loss 2.9062 (2.6369) grad_norm 1.4135 (2.6195) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][430/625] eta 0:01:35 lr 0.000217 wd 0.0500 time 0.4470 (0.4916) data time 0.0006 (0.0028) model time 0.4464 (0.4459) loss 2.9308 (2.6361) grad_norm 2.8437 (2.6060) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][440/625] eta 0:01:30 lr 0.000217 wd 0.0500 time 0.4436 (0.4906) data time 0.0006 (0.0028) model time 0.4430 (0.4459) loss 2.9668 (2.6407) grad_norm 2.5693 (2.5983) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:46:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][450/625] eta 0:01:25 lr 0.000217 wd 0.0500 time 0.4474 (0.4896) data time 0.0006 (0.0027) model time 0.4468 (0.4459) loss 2.7337 (2.6421) grad_norm 1.5790 (2.5915) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][460/625] eta 0:01:20 lr 0.000217 wd 0.0500 time 0.4447 (0.4886) data time 0.0006 (0.0027) model time 0.4440 (0.4459) loss 2.3410 (2.6411) grad_norm 2.1785 
(2.6008) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][470/625] eta 0:01:15 lr 0.000216 wd 0.0500 time 0.4456 (0.4877) data time 0.0008 (0.0027) model time 0.4448 (0.4459) loss 3.2238 (2.6388) grad_norm 1.9423 (2.5915) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][480/625] eta 0:01:10 lr 0.000216 wd 0.0500 time 0.4454 (0.4868) data time 0.0008 (0.0026) model time 0.4445 (0.4459) loss 2.5148 (2.6312) grad_norm 1.8614 (2.5831) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][490/625] eta 0:01:05 lr 0.000216 wd 0.0500 time 0.4435 (0.4860) data time 0.0007 (0.0026) model time 0.4428 (0.4459) loss 2.8244 (2.6328) grad_norm 4.5634 (2.5789) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][500/625] eta 0:01:00 lr 0.000216 wd 0.0500 time 0.4452 (0.4852) data time 0.0008 (0.0026) model time 0.4444 (0.4458) loss 2.7885 (2.6360) grad_norm 2.8269 (2.5737) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][510/625] eta 0:00:55 lr 0.000216 wd 0.0500 time 0.4524 (0.4844) data time 0.0009 (0.0025) model time 0.4515 (0.4459) loss 2.6654 (2.6361) grad_norm 1.6478 (2.5621) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][520/625] eta 0:00:50 lr 0.000216 wd 0.0500 time 0.4483 (0.4837) data time 0.0007 (0.0025) model time 0.4476 (0.4459) loss 2.9545 (2.6364) grad_norm 3.0262 (2.5511) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][530/625] eta 0:00:45 lr 0.000216 wd 0.0500 time 0.4483 (0.4833) data time 0.0006 
(0.0025) model time 0.4477 (0.4462) loss 2.3831 (2.6375) grad_norm 2.1626 (2.5401) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][540/625] eta 0:00:41 lr 0.000216 wd 0.0500 time 0.4429 (0.4826) data time 0.0009 (0.0024) model time 0.4420 (0.4462) loss 3.1318 (2.6340) grad_norm 1.9951 (2.5736) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][550/625] eta 0:00:36 lr 0.000216 wd 0.0500 time 0.4459 (0.4820) data time 0.0008 (0.0024) model time 0.4451 (0.4462) loss 2.6099 (2.6331) grad_norm 3.7475 (2.5749) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][560/625] eta 0:00:31 lr 0.000216 wd 0.0500 time 0.4447 (0.4813) data time 0.0007 (0.0024) model time 0.4440 (0.4462) loss 3.2110 (2.6330) grad_norm 3.9285 (2.5831) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][570/625] eta 0:00:26 lr 0.000216 wd 0.0500 time 0.4433 (0.4809) data time 0.0008 (0.0023) model time 0.4425 (0.4464) loss 2.6625 (2.6357) grad_norm 2.5521 (2.5783) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:47:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][580/625] eta 0:00:21 lr 0.000216 wd 0.0500 time 0.4451 (0.4803) data time 0.0009 (0.0023) model time 0.4442 (0.4464) loss 2.6467 (2.6358) grad_norm 2.3767 (2.5682) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][590/625] eta 0:00:16 lr 0.000215 wd 0.0500 time 0.4469 (0.4797) data time 0.0006 (0.0023) model time 0.4463 (0.4464) loss 2.6647 (2.6365) grad_norm 2.6636 (2.5642) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][600/625] 
eta 0:00:11 lr 0.000215 wd 0.0500 time 0.4487 (0.4792) data time 0.0009 (0.0023) model time 0.4478 (0.4464) loss 2.9974 (2.6387) grad_norm 1.7635 (2.5556) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][610/625] eta 0:00:07 lr 0.000215 wd 0.0500 time 0.4400 (0.4786) data time 0.0006 (0.0023) model time 0.4394 (0.4463) loss 1.7761 (2.6361) grad_norm 2.3038 (2.5547) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [223/300][620/625] eta 0:00:02 lr 0.000215 wd 0.0500 time 0.4426 (0.4780) data time 0.0007 (0.0022) model time 0.4420 (0.4462) loss 2.4825 (2.6357) grad_norm 2.3705 (2.5493) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:17 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 223 training takes 0:04:53 [2024-08-11 05:48:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 05:48:20 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 05:48:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.4951 (0.4951) Acc@1 89.258 (89.258) Acc@5 98.975 (98.975) Mem 16721MB [2024-08-11 05:48:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.151) Loss 0.8281 (0.6215) Acc@1 80.713 (86.710) Acc@5 96.240 (97.749) Mem 16721MB [2024-08-11 05:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.8960 (0.7309) Acc@1 79.346 (83.894) Acc@5 95.361 (96.724) Mem 16721MB [2024-08-11 05:48:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.559 Acc@5 96.669 [2024-08-11 05:48:25 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 05:48:25 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.56% [2024-08-11 05:48:25 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-11 05:48:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-11 05:48:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.4795 (0.4795) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16721MB [2024-08-11 05:48:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.7725 (0.5896) Acc@1 81.396 (87.291) Acc@5 96.582 (97.936) Mem 16721MB [2024-08-11 05:48:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.8540 (0.6922) Acc@1 79.688 (84.614) Acc@5 95.947 (97.015) Mem 16721MB [2024-08-11 05:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.327 Acc@5 96.999 [2024-08-11 05:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 05:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.33% [2024-08-11 05:48:30 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 05:48:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-11 05:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][0/625] eta 0:09:11 lr 0.000215 wd 0.0500 time 0.8824 (0.8824) data time 0.3905 (0.3905) model time 0.0000 (0.0000) loss 2.8821 (2.8821) grad_norm 3.3443 (3.3443) loss_scale 256.0000 (256.0000) mem 16725MB [2024-08-11 05:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][10/625] eta 0:04:58 lr 0.000215 wd 0.0500 time 0.4443 (0.4857) data time 0.0008 (0.0363) model time 0.0000 (0.0000) loss 1.9011 (2.7465) grad_norm 1.8275 (2.1914) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][20/625] eta 0:04:42 lr 0.000215 wd 0.0500 time 0.4490 (0.4672) data time 0.0007 (0.0195) model time 0.0000 (0.0000) loss 2.6239 (2.6448) grad_norm 1.9005 (2.1744) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][30/625] eta 0:04:36 lr 0.000215 wd 0.0500 time 0.4457 (0.4651) data time 0.0009 (0.0135) model time 0.0000 (0.0000) loss 3.0728 (2.5528) grad_norm 1.9404 (2.3971) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][40/625] eta 0:04:29 lr 0.000215 wd 0.0500 time 0.4370 (0.4606) data time 0.0008 (0.0104) model time 0.0000 (0.0000) loss 1.9996 (2.5795) grad_norm 3.1013 (2.5389) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][50/625] eta 0:04:23 lr 0.000215 wd 0.0500 time 0.4409 (0.4576) data time 0.0006 (0.0085) model time 0.0000 (0.0000) loss 2.7022 (2.5521) grad_norm 1.9473 (3.0907) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][60/625] eta 0:04:17 lr 0.000215 wd 0.0500 time 0.4458 (0.4556) data time 0.0006 (0.0073) model time 0.4451 (0.4444) loss 2.3979 (2.5883) 
grad_norm 2.8879 (3.5008) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][70/625] eta 0:04:12 lr 0.000215 wd 0.0500 time 0.4459 (0.4541) data time 0.0007 (0.0064) model time 0.4452 (0.4444) loss 2.4318 (2.5967) grad_norm 1.8483 (3.3008) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][80/625] eta 0:04:06 lr 0.000215 wd 0.0500 time 0.4488 (0.4531) data time 0.0006 (0.0057) model time 0.4482 (0.4445) loss 2.0540 (2.5837) grad_norm 1.9757 (3.1746) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][90/625] eta 0:04:01 lr 0.000214 wd 0.0500 time 0.4485 (0.4523) data time 0.0007 (0.0052) model time 0.4478 (0.4446) loss 2.8953 (2.5891) grad_norm 3.0557 (3.1228) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][100/625] eta 0:03:57 lr 0.000214 wd 0.0500 time 0.4497 (0.4517) data time 0.0008 (0.0047) model time 0.4489 (0.4449) loss 3.0835 (2.5736) grad_norm 2.0263 (3.1227) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][110/625] eta 0:03:52 lr 0.000214 wd 0.0500 time 0.4480 (0.4513) data time 0.0007 (0.0044) model time 0.4473 (0.4451) loss 2.6559 (2.5781) grad_norm 2.5502 (3.2160) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][120/625] eta 0:03:47 lr 0.000214 wd 0.0500 time 0.4448 (0.4509) data time 0.0007 (0.0041) model time 0.4442 (0.4452) loss 3.3850 (2.6006) grad_norm 1.7201 (3.1272) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][130/625] eta 0:03:43 lr 0.000214 wd 0.0500 time 0.4430 (0.4506) data 
time 0.0007 (0.0039) model time 0.4424 (0.4452) loss 3.0932 (2.6041) grad_norm 1.8433 (3.3628) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][140/625] eta 0:03:38 lr 0.000214 wd 0.0500 time 0.4496 (0.4514) data time 0.0007 (0.0036) model time 0.4489 (0.4471) loss 2.9425 (2.5962) grad_norm 1.9288 (3.2887) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][150/625] eta 0:03:34 lr 0.000214 wd 0.0500 time 0.4467 (0.4511) data time 0.0008 (0.0035) model time 0.4459 (0.4469) loss 2.9937 (2.6087) grad_norm 1.5941 (3.3503) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][160/625] eta 0:03:29 lr 0.000214 wd 0.0500 time 0.4437 (0.4508) data time 0.0009 (0.0033) model time 0.4429 (0.4468) loss 2.8750 (2.6142) grad_norm 2.5810 (3.3186) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][170/625] eta 0:03:25 lr 0.000214 wd 0.0500 time 0.4463 (0.4506) data time 0.0009 (0.0032) model time 0.4454 (0.4467) loss 2.8455 (2.6124) grad_norm 5.8716 (3.2941) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][180/625] eta 0:03:20 lr 0.000214 wd 0.0500 time 0.4475 (0.4504) data time 0.0006 (0.0030) model time 0.4469 (0.4467) loss 1.9280 (2.6119) grad_norm 2.6809 (3.2953) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:49:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][190/625] eta 0:03:15 lr 0.000214 wd 0.0500 time 0.4464 (0.4502) data time 0.0006 (0.0029) model time 0.4458 (0.4466) loss 1.4181 (2.6109) grad_norm 1.9081 (3.2500) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[224/300][200/625] eta 0:03:11 lr 0.000214 wd 0.0500 time 0.4463 (0.4500) data time 0.0007 (0.0028) model time 0.4456 (0.4465) loss 2.9996 (2.6036) grad_norm 1.7241 (3.2106) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][210/625] eta 0:03:06 lr 0.000214 wd 0.0500 time 0.4453 (0.4498) data time 0.0008 (0.0027) model time 0.4444 (0.4464) loss 1.4331 (2.6031) grad_norm 2.2228 (3.1811) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][220/625] eta 0:03:02 lr 0.000213 wd 0.0500 time 0.4433 (0.4498) data time 0.0006 (0.0026) model time 0.4426 (0.4466) loss 3.1960 (2.6029) grad_norm 3.3584 (3.1409) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][230/625] eta 0:02:57 lr 0.000213 wd 0.0500 time 0.4451 (0.4496) data time 0.0006 (0.0026) model time 0.4445 (0.4465) loss 3.4155 (2.5991) grad_norm 2.1410 (3.1010) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][240/625] eta 0:02:53 lr 0.000213 wd 0.0500 time 0.4473 (0.4494) data time 0.0008 (0.0025) model time 0.4465 (0.4463) loss 2.7049 (2.6036) grad_norm 5.4945 (3.0727) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][250/625] eta 0:02:48 lr 0.000213 wd 0.0500 time 0.4434 (0.4493) data time 0.0009 (0.0024) model time 0.4425 (0.4464) loss 2.6987 (2.6008) grad_norm 2.7429 (3.0395) loss_scale 256.0000 (256.0000) mem 16721MB [2024-08-11 05:50:26 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 05:50:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-11 05:50:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 06:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 06:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 06:09:27 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 06:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 06:09:35 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 06:09:38 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 06:09:40 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 06:09:40 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 224) [2024-08-11 06:09:40 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 06:10:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][260/625] eta 0:20:29 lr 0.000213 wd 0.0500 time 0.4404 (3.3672) data time 0.0011 (0.0953) model time 0.4392 (3.2719) loss 2.8559 (2.8881) grad_norm 2.0703 (1.8735) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][270/625] eta 0:09:44 lr 0.000213 wd 0.0500 time 0.4397 (1.6460) data time 0.0008 (0.0398) model time 0.4389 (1.6063) loss 2.7554 (2.8344) grad_norm 1.9767 (2.2183) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[224/300][280/625] eta 0:06:53 lr 0.000213 wd 0.0500 time 0.4398 (1.1992) data time 0.0006 (0.0254) model time 0.4392 (1.1738) loss 3.2551 (2.8632) grad_norm 1.6407 (2.4113) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][290/625] eta 0:05:37 lr 0.000213 wd 0.0500 time 0.6941 (1.0085) data time 0.0009 (0.0188) model time 0.6933 (0.9897) loss 2.1955 (2.8155) grad_norm 2.5130 (2.3094) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][300/625] eta 0:04:48 lr 0.000213 wd 0.0500 time 0.4423 (0.8868) data time 0.0006 (0.0150) model time 0.4417 (0.8718) loss 2.8629 (2.7757) grad_norm 2.2566 (2.4146) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][310/625] eta 0:04:14 lr 0.000213 wd 0.0500 time 0.4411 (0.8089) data time 0.0008 (0.0125) model time 0.4403 (0.7964) loss 2.9994 (2.7754) grad_norm 2.6432 (2.4622) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][320/625] eta 0:03:50 lr 0.000213 wd 0.0500 time 0.4436 (0.7545) data time 0.0008 (0.0107) model time 0.4429 (0.7438) loss 2.6031 (2.7413) grad_norm 1.3952 (2.3763) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][330/625] eta 0:03:30 lr 0.000213 wd 0.0500 time 0.4461 (0.7143) data time 0.0008 (0.0095) model time 0.4453 (0.7048) loss 2.8089 (2.7204) grad_norm 2.6692 (2.3912) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][340/625] eta 0:03:14 lr 0.000212 wd 0.0500 time 0.4431 (0.6834) data time 0.0009 (0.0085) model time 0.4423 (0.6749) loss 2.7664 (2.6971) grad_norm 3.7829 (2.3779) loss_scale 256.0000 (256.0000) mem 16695MB 
[2024-08-11 06:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][350/625] eta 0:03:01 lr 0.000212 wd 0.0500 time 0.4403 (0.6587) data time 0.0008 (0.0077) model time 0.4395 (0.6510) loss 2.8331 (2.7006) grad_norm 1.7439 (2.3673) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][360/625] eta 0:02:49 lr 0.000212 wd 0.0500 time 0.4402 (0.6385) data time 0.0006 (0.0071) model time 0.4396 (0.6314) loss 2.6346 (2.7279) grad_norm 2.1874 (2.3690) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:10:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][370/625] eta 0:02:38 lr 0.000212 wd 0.0500 time 0.4406 (0.6217) data time 0.0008 (0.0065) model time 0.4398 (0.6152) loss 3.1664 (2.7287) grad_norm 1.6507 (2.3673) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][380/625] eta 0:02:28 lr 0.000212 wd 0.0500 time 0.4371 (0.6075) data time 0.0008 (0.0061) model time 0.4363 (0.6014) loss 2.7869 (2.7156) grad_norm 1.6154 (2.3440) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][390/625] eta 0:02:19 lr 0.000212 wd 0.0500 time 0.4412 (0.5956) data time 0.0006 (0.0057) model time 0.4406 (0.5899) loss 2.4081 (2.7104) grad_norm 1.7251 (2.3590) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][400/625] eta 0:02:11 lr 0.000212 wd 0.0500 time 0.4413 (0.5853) data time 0.0008 (0.0054) model time 0.4405 (0.5799) loss 2.6163 (2.7003) grad_norm 3.1759 (2.3677) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][410/625] eta 0:02:03 lr 0.000212 wd 0.0500 time 0.4458 (0.5764) data time 0.0007 (0.0051) model time 0.4451 (0.5713) loss 2.0909 
(2.6995) grad_norm 3.9939 (2.4235) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][420/625] eta 0:01:56 lr 0.000212 wd 0.0500 time 0.4424 (0.5684) data time 0.0008 (0.0048) model time 0.4416 (0.5636) loss 2.6117 (2.7033) grad_norm 2.0794 (2.4451) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][430/625] eta 0:01:49 lr 0.000212 wd 0.0500 time 0.4419 (0.5613) data time 0.0008 (0.0046) model time 0.4410 (0.5567) loss 2.7389 (2.6931) grad_norm 3.3149 (2.5989) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][440/625] eta 0:01:42 lr 0.000212 wd 0.0500 time 0.4440 (0.5559) data time 0.0006 (0.0044) model time 0.4434 (0.5515) loss 2.2950 (2.6877) grad_norm 2.6205 (2.5835) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][450/625] eta 0:01:36 lr 0.000212 wd 0.0500 time 0.4445 (0.5502) data time 0.0006 (0.0042) model time 0.4439 (0.5459) loss 3.0380 (2.6822) grad_norm 2.5617 (2.6248) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][460/625] eta 0:01:29 lr 0.000212 wd 0.0500 time 0.4418 (0.5451) data time 0.0007 (0.0041) model time 0.4412 (0.5410) loss 2.6790 (2.6739) grad_norm 7.3671 (2.6190) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][470/625] eta 0:01:23 lr 0.000211 wd 0.0500 time 0.4463 (0.5404) data time 0.0006 (0.0039) model time 0.4457 (0.5365) loss 3.1043 (2.6725) grad_norm 1.9226 (2.5936) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][480/625] eta 0:01:17 lr 0.000211 wd 0.0500 time 0.4438 
(0.5361) data time 0.0007 (0.0038) model time 0.4431 (0.5323) loss 2.9112 (2.6771) grad_norm 5.2145 (2.6226) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][490/625] eta 0:01:11 lr 0.000211 wd 0.0500 time 0.4440 (0.5322) data time 0.0007 (0.0037) model time 0.4434 (0.5285) loss 2.5550 (2.6733) grad_norm 2.3753 (2.6170) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][500/625] eta 0:01:06 lr 0.000211 wd 0.0500 time 0.4426 (0.5286) data time 0.0008 (0.0036) model time 0.4418 (0.5250) loss 3.1044 (2.6747) grad_norm 2.4143 (2.5984) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][510/625] eta 0:01:00 lr 0.000211 wd 0.0500 time 0.4443 (0.5253) data time 0.0006 (0.0035) model time 0.4437 (0.5218) loss 1.8064 (2.6627) grad_norm 1.7349 (2.5906) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][520/625] eta 0:00:54 lr 0.000211 wd 0.0500 time 0.4547 (0.5223) data time 0.0006 (0.0034) model time 0.4541 (0.5189) loss 1.9410 (2.6587) grad_norm 1.7099 (2.5748) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][530/625] eta 0:00:49 lr 0.000211 wd 0.0500 time 0.4466 (0.5195) data time 0.0008 (0.0033) model time 0.4458 (0.5162) loss 2.7371 (2.6651) grad_norm 2.5096 (2.5614) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][540/625] eta 0:00:43 lr 0.000211 wd 0.0500 time 0.4408 (0.5168) data time 0.0006 (0.0032) model time 0.4402 (0.5137) loss 3.2708 (2.6616) grad_norm 2.3897 (2.5595) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [224/300][550/625] eta 0:00:38 lr 0.000211 wd 0.0500 time 0.4443 (0.5144) data time 0.0008 (0.0031) model time 0.4436 (0.5113) loss 1.9238 (2.6508) grad_norm 2.7846 (2.6375) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][560/625] eta 0:00:33 lr 0.000211 wd 0.0500 time 0.4505 (0.5122) data time 0.0006 (0.0030) model time 0.4499 (0.5091) loss 3.2881 (2.6492) grad_norm 2.6592 (2.6376) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][570/625] eta 0:00:28 lr 0.000211 wd 0.0500 time 0.4441 (0.5100) data time 0.0008 (0.0030) model time 0.4433 (0.5070) loss 2.8980 (2.6516) grad_norm 2.0367 (2.6255) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][580/625] eta 0:00:22 lr 0.000211 wd 0.0500 time 0.4427 (0.5079) data time 0.0006 (0.0029) model time 0.4420 (0.5050) loss 2.2190 (2.6576) grad_norm 2.8592 (2.6177) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][590/625] eta 0:00:17 lr 0.000210 wd 0.0500 time 0.4428 (0.5060) data time 0.0007 (0.0028) model time 0.4420 (0.5031) loss 2.4084 (2.6524) grad_norm 6.1950 (2.7197) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][600/625] eta 0:00:12 lr 0.000210 wd 0.0500 time 0.4465 (0.5042) data time 0.0006 (0.0028) model time 0.4460 (0.5014) loss 2.3940 (2.6548) grad_norm 1.9304 (2.7077) loss_scale 256.0000 (256.0000) mem 16695MB [2024-08-11 06:12:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][610/625] eta 0:00:07 lr 0.000210 wd 0.0500 time 0.4402 (0.5025) data time 0.0006 (0.0027) model time 0.4397 (0.4998) loss 2.3818 (2.6553) grad_norm 9.6496 (2.7387) loss_scale 256.0000 (256.0000) mem 16695MB 
[2024-08-11 06:12:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [224/300][620/625] eta 0:00:02 lr 0.000210 wd 0.0500 time 0.4372 (0.5013) data time 0.0004 (0.0027) model time 0.4368 (0.4986) loss 1.8222 (2.6545) grad_norm 3.3544 (2.7416) loss_scale 512.0000 (262.9755) mem 16695MB [2024-08-11 06:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 224 training takes 0:03:05 [2024-08-11 06:12:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:12:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 06:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5039 (0.5039) Acc@1 89.258 (89.258) Acc@5 98.877 (98.877) Mem 16695MB [2024-08-11 06:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8438 (0.6198) Acc@1 80.127 (86.728) Acc@5 96.143 (97.718) Mem 16695MB [2024-08-11 06:12:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.133) Loss 0.9272 (0.7334) Acc@1 78.906 (83.863) Acc@5 95.264 (96.668) Mem 16695MB [2024-08-11 06:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.589 Acc@5 96.611 [2024-08-11 06:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 06:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.59% [2024-08-11 06:12:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-11 06:13:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-11 06:13:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.4800 (0.4800) Acc@1 89.551 (89.551) Acc@5 98.975 (98.975) Mem 16695MB [2024-08-11 06:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.149) Loss 0.7739 (0.5898) Acc@1 81.299 (87.322) Acc@5 96.533 (97.923) Mem 16695MB [2024-08-11 06:13:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 0.8560 (0.6926) Acc@1 79.688 (84.633) Acc@5 95.947 (96.996) Mem 16695MB [2024-08-11 06:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.347 Acc@5 96.985 [2024-08-11 06:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 06:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.35% [2024-08-11 06:13:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 06:13:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-11 06:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][0/625] eta 0:09:05 lr 0.000210 wd 0.0500 time 0.8726 (0.8726) data time 0.3909 (0.3909) model time 0.0000 (0.0000) loss 2.6257 (2.6257) grad_norm 3.0016 (3.0016) loss_scale 512.0000 (512.0000) mem 16704MB [2024-08-11 06:13:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][10/625] eta 0:05:05 lr 0.000210 wd 0.0500 time 0.4459 (0.4960) data time 0.0006 (0.0363) model time 0.0000 (0.0000) loss 2.7934 (2.5679) grad_norm 1.9621 (2.5137) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][20/625] eta 0:04:45 lr 0.000210 wd 0.0500 time 0.4437 (0.4715) data time 0.0006 (0.0194) model time 0.0000 (0.0000) loss 2.4660 (2.4754) grad_norm 4.0720 (2.5817) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][30/625] eta 0:04:35 lr 0.000210 wd 0.0500 time 0.4436 (0.4636) data time 0.0006 (0.0134) model time 0.0000 (0.0000) loss 2.9756 (2.5928) grad_norm 3.0352 (2.5210) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][40/625] eta 0:04:28 lr 0.000210 wd 0.0500 time 0.4450 (0.4591) data time 0.0009 (0.0104) model time 0.0000 (0.0000) loss 2.9460 (2.6371) grad_norm 2.1478 (2.5399) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][50/625] eta 0:04:22 lr 0.000210 wd 0.0500 time 0.4440 (0.4561) data time 0.0007 (0.0085) model time 0.0000 (0.0000) loss 2.8964 (2.6210) grad_norm 5.9167 (2.5552) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][60/625] eta 0:04:16 lr 0.000210 wd 0.0500 time 0.4441 (0.4541) data time 0.0007 (0.0073) model time 0.4434 (0.4431) loss 2.7374 (2.6509) 
grad_norm 2.0479 (2.4917) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][70/625] eta 0:04:11 lr 0.000210 wd 0.0500 time 0.4406 (0.4527) data time 0.0006 (0.0064) model time 0.4400 (0.4431) loss 2.6390 (2.6678) grad_norm 1.7745 (2.4278) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 06:13:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][80/625] eta 0:04:06 lr 0.000210 wd 0.0500 time 0.4444 (0.4516) data time 0.0008 (0.0057) model time 0.4436 (0.4429) loss 2.2863 (2.6519) grad_norm 2.4074 (inf) loss_scale 256.0000 (499.3580) mem 16699MB [2024-08-11 06:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][90/625] eta 0:04:01 lr 0.000209 wd 0.0500 time 0.4439 (0.4508) data time 0.0006 (0.0052) model time 0.4432 (0.4431) loss 1.7558 (2.6311) grad_norm 3.6711 (inf) loss_scale 256.0000 (472.6154) mem 16699MB [2024-08-11 06:13:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][100/625] eta 0:03:56 lr 0.000209 wd 0.0500 time 0.4462 (0.4501) data time 0.0007 (0.0047) model time 0.4455 (0.4431) loss 2.3710 (2.6145) grad_norm 2.5080 (inf) loss_scale 256.0000 (451.1683) mem 16699MB [2024-08-11 06:14:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][110/625] eta 0:03:51 lr 0.000209 wd 0.0500 time 0.4436 (0.4496) data time 0.0006 (0.0044) model time 0.4430 (0.4431) loss 3.5114 (2.6224) grad_norm 1.8782 (inf) loss_scale 256.0000 (433.5856) mem 16699MB [2024-08-11 06:14:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][120/625] eta 0:03:46 lr 0.000209 wd 0.0500 time 0.4487 (0.4492) data time 0.0008 (0.0041) model time 0.4479 (0.4432) loss 3.0392 (2.6383) grad_norm 3.6057 (inf) loss_scale 256.0000 (418.9091) mem 16699MB [2024-08-11 06:14:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][130/625] eta 0:03:42 lr 0.000209 wd 0.0500 time 0.4453 (0.4487) data time 0.0008 
(0.0038) model time 0.4445 (0.4431) loss 2.7694 (2.6366) grad_norm 1.8718 (inf) loss_scale 256.0000 (406.4733) mem 16699MB [2024-08-11 06:14:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][140/625] eta 0:03:37 lr 0.000209 wd 0.0500 time 0.4429 (0.4483) data time 0.0006 (0.0036) model time 0.4423 (0.4430) loss 3.1878 (2.6539) grad_norm 2.8494 (inf) loss_scale 256.0000 (395.8014) mem 16699MB [2024-08-11 06:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][150/625] eta 0:03:32 lr 0.000209 wd 0.0500 time 0.4433 (0.4480) data time 0.0007 (0.0034) model time 0.4427 (0.4429) loss 1.6326 (2.6411) grad_norm 2.1579 (inf) loss_scale 256.0000 (386.5430) mem 16699MB [2024-08-11 06:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][160/625] eta 0:03:28 lr 0.000209 wd 0.0500 time 0.4491 (0.4477) data time 0.0008 (0.0033) model time 0.4484 (0.4430) loss 3.1302 (2.6382) grad_norm 1.6839 (inf) loss_scale 256.0000 (378.4348) mem 16699MB [2024-08-11 06:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][170/625] eta 0:03:23 lr 0.000209 wd 0.0500 time 0.4443 (0.4476) data time 0.0006 (0.0031) model time 0.4437 (0.4430) loss 1.9971 (2.6339) grad_norm 1.8587 (inf) loss_scale 256.0000 (371.2749) mem 16699MB [2024-08-11 06:14:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][180/625] eta 0:03:19 lr 0.000209 wd 0.0500 time 0.4451 (0.4474) data time 0.0006 (0.0030) model time 0.4445 (0.4431) loss 3.0235 (2.6379) grad_norm 2.7007 (inf) loss_scale 256.0000 (364.9061) mem 16699MB [2024-08-11 06:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][190/625] eta 0:03:14 lr 0.000209 wd 0.0500 time 0.4435 (0.4472) data time 0.0006 (0.0029) model time 0.4429 (0.4431) loss 3.3417 (2.6496) grad_norm 3.5121 (inf) loss_scale 256.0000 (359.2042) mem 16699MB [2024-08-11 06:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][200/625] eta 0:03:10 lr 
0.000209 wd 0.0500 time 0.4438 (0.4471) data time 0.0009 (0.0028) model time 0.4429 (0.4431) loss 2.4005 (2.6525) grad_norm 2.3106 (inf) loss_scale 256.0000 (354.0697) mem 16699MB [2024-08-11 06:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][210/625] eta 0:03:05 lr 0.000209 wd 0.0500 time 0.4430 (0.4470) data time 0.0006 (0.0027) model time 0.4424 (0.4432) loss 2.8148 (2.6570) grad_norm 2.4566 (inf) loss_scale 256.0000 (349.4218) mem 16699MB [2024-08-11 06:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][220/625] eta 0:03:01 lr 0.000208 wd 0.0500 time 0.4381 (0.4476) data time 0.0008 (0.0026) model time 0.4373 (0.4441) loss 2.6077 (2.6604) grad_norm 1.3719 (inf) loss_scale 256.0000 (345.1946) mem 16699MB [2024-08-11 06:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][230/625] eta 0:02:56 lr 0.000208 wd 0.0500 time 0.4483 (0.4475) data time 0.0007 (0.0026) model time 0.4476 (0.4441) loss 2.3379 (2.6500) grad_norm 1.5676 (inf) loss_scale 256.0000 (341.3333) mem 16699MB [2024-08-11 06:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][240/625] eta 0:02:52 lr 0.000208 wd 0.0500 time 0.4456 (0.4474) data time 0.0008 (0.0025) model time 0.4447 (0.4441) loss 2.0185 (2.6460) grad_norm 1.6356 (inf) loss_scale 256.0000 (337.7925) mem 16699MB [2024-08-11 06:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][250/625] eta 0:02:47 lr 0.000208 wd 0.0500 time 0.4410 (0.4473) data time 0.0008 (0.0024) model time 0.4402 (0.4442) loss 2.4511 (2.6518) grad_norm 1.5454 (inf) loss_scale 256.0000 (334.5339) mem 16699MB [2024-08-11 06:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][260/625] eta 0:02:43 lr 0.000208 wd 0.0500 time 0.4428 (0.4473) data time 0.0008 (0.0024) model time 0.4420 (0.4442) loss 2.7897 (2.6570) grad_norm 3.2294 (inf) loss_scale 256.0000 (331.5249) mem 16699MB [2024-08-11 06:15:14 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [225/300][270/625] eta 0:02:38 lr 0.000208 wd 0.0500 time 0.4430 (0.4472) data time 0.0008 (0.0023) model time 0.4422 (0.4443) loss 1.6681 (2.6453) grad_norm 1.9747 (inf) loss_scale 256.0000 (328.7380) mem 16699MB [2024-08-11 06:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][280/625] eta 0:02:34 lr 0.000208 wd 0.0500 time 0.4404 (0.4471) data time 0.0008 (0.0023) model time 0.4396 (0.4442) loss 1.7247 (2.6405) grad_norm 1.8077 (inf) loss_scale 256.0000 (326.1495) mem 16699MB [2024-08-11 06:15:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][290/625] eta 0:02:29 lr 0.000208 wd 0.0500 time 0.4476 (0.4470) data time 0.0009 (0.0022) model time 0.4467 (0.4442) loss 2.9376 (2.6312) grad_norm 3.0368 (inf) loss_scale 256.0000 (323.7388) mem 16699MB [2024-08-11 06:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][300/625] eta 0:02:25 lr 0.000208 wd 0.0500 time 0.4492 (0.4469) data time 0.0008 (0.0022) model time 0.4484 (0.4442) loss 2.5955 (2.6368) grad_norm 2.3737 (inf) loss_scale 256.0000 (321.4884) mem 16699MB [2024-08-11 06:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][310/625] eta 0:02:20 lr 0.000208 wd 0.0500 time 0.4463 (0.4469) data time 0.0008 (0.0021) model time 0.4455 (0.4442) loss 2.4475 (2.6409) grad_norm 2.7276 (inf) loss_scale 256.0000 (319.3826) mem 16699MB [2024-08-11 06:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][320/625] eta 0:02:16 lr 0.000208 wd 0.0500 time 0.4477 (0.4468) data time 0.0008 (0.0021) model time 0.4470 (0.4442) loss 2.8341 (2.6336) grad_norm 2.5662 (inf) loss_scale 256.0000 (317.4081) mem 16699MB [2024-08-11 06:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][330/625] eta 0:02:11 lr 0.000208 wd 0.0500 time 0.4429 (0.4468) data time 0.0007 (0.0021) model time 0.4422 (0.4442) loss 1.8845 (2.6281) grad_norm 1.9788 (inf) loss_scale 256.0000 (315.5529) mem 16699MB 
[2024-08-11 06:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][340/625] eta 0:02:07 lr 0.000207 wd 0.0500 time 0.4378 (0.4467) data time 0.0007 (0.0020) model time 0.4372 (0.4441) loss 1.8246 (2.6258) grad_norm 2.3790 (inf) loss_scale 128.0000 (312.3050) mem 16699MB [2024-08-11 06:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][350/625] eta 0:02:02 lr 0.000207 wd 0.0500 time 0.4472 (0.4470) data time 0.0007 (0.0020) model time 0.4465 (0.4446) loss 2.4822 (2.6240) grad_norm 4.2418 (inf) loss_scale 128.0000 (307.0541) mem 16699MB [2024-08-11 06:15:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][360/625] eta 0:01:58 lr 0.000207 wd 0.0500 time 0.4393 (0.4476) data time 0.0008 (0.0020) model time 0.4385 (0.4453) loss 3.0522 (2.6270) grad_norm 2.9389 (inf) loss_scale 128.0000 (302.0942) mem 16699MB [2024-08-11 06:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 06:15:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:15:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 06:35:54 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 06:35:54 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 06:36:06 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-11 06:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 06:36:20 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 06:36:22 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 06:36:24 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 06:36:24 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 225) [2024-08-11 06:36:24 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 06:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][370/625] eta 0:40:04 lr 0.000207 wd 0.0500 time 1.0421 (9.4294) data time 0.0007 (0.3763) model time 1.0413 (9.0531) loss 3.2018 (3.0911) grad_norm 2.3212 (2.0775) loss_scale 128.0000 (128.0000) mem 16700MB [2024-08-11 06:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][380/625] eta 0:07:56 lr 0.000207 wd 0.0500 time 0.4435 (1.9440) data time 0.0007 (0.0634) model time 0.4428 (1.8805) loss 1.9194 (2.7298) grad_norm 1.8835 (2.3817) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][390/625] eta 0:04:56 lr 0.000207 wd 0.0500 time 0.4438 (1.2628) data time 0.0009 (0.0350) model time 0.4429 (1.2278) loss 2.9184 (2.7566) grad_norm 2.0998 (4.6520) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][400/625] eta 0:03:48 lr 0.000207 wd 0.0500 time 0.4487 (1.0159) data time 0.0008 (0.0243) model time 0.4479 (0.9915) loss 2.6670 (2.8009) grad_norm 2.7014 (3.8359) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:05 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [225/300][410/625] eta 0:03:10 lr 0.000207 wd 0.0500 time 0.4477 (0.8843) data time 0.0009 (0.0188) model time 0.4468 (0.8655) loss 2.8042 (2.7810) grad_norm 1.5827 (3.8984) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][420/625] eta 0:02:44 lr 0.000207 wd 0.0500 time 0.4443 (0.8004) data time 0.0006 (0.0153) model time 0.4437 (0.7851) loss 2.5844 (2.7485) grad_norm 1.9762 (3.5622) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][430/625] eta 0:02:24 lr 0.000207 wd 0.0500 time 0.4490 (0.7436) data time 0.0007 (0.0130) model time 0.4483 (0.7306) loss 3.2111 (2.7282) grad_norm 1.9623 (3.4003) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][440/625] eta 0:02:09 lr 0.000207 wd 0.0500 time 0.4494 (0.7025) data time 0.0009 (0.0113) model time 0.4485 (0.6912) loss 2.1920 (2.6864) grad_norm 1.9436 (3.2214) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][450/625] eta 0:01:57 lr 0.000207 wd 0.0500 time 0.4436 (0.6715) data time 0.0008 (0.0100) model time 0.4428 (0.6615) loss 2.6039 (2.6876) grad_norm 2.5019 (3.1947) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][460/625] eta 0:01:46 lr 0.000207 wd 0.0500 time 0.4469 (0.6472) data time 0.0006 (0.0090) model time 0.4462 (0.6382) loss 2.0045 (2.6791) grad_norm 2.0061 (3.0957) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][470/625] eta 0:01:37 lr 0.000206 wd 0.0500 time 0.4473 (0.6275) data time 0.0006 (0.0082) model time 0.4467 (0.6193) loss 3.0009 (2.6978) grad_norm 2.4846 (3.0545) loss_scale 
128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][480/625] eta 0:01:28 lr 0.000206 wd 0.0500 time 0.4448 (0.6112) data time 0.0008 (0.0076) model time 0.4440 (0.6037) loss 2.7702 (2.6944) grad_norm 1.9162 (2.9593) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][490/625] eta 0:01:20 lr 0.000206 wd 0.0500 time 0.4441 (0.5976) data time 0.0007 (0.0070) model time 0.4435 (0.5906) loss 3.0571 (2.6922) grad_norm 1.9293 (2.8988) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][500/625] eta 0:01:13 lr 0.000206 wd 0.0500 time 0.4449 (0.5861) data time 0.0009 (0.0065) model time 0.4440 (0.5795) loss 2.8079 (2.6822) grad_norm 2.8915 (2.8993) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][510/625] eta 0:01:06 lr 0.000206 wd 0.0500 time 0.4425 (0.5762) data time 0.0008 (0.0061) model time 0.4417 (0.5700) loss 2.7603 (2.6677) grad_norm 2.4249 (3.0237) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][520/625] eta 0:00:59 lr 0.000206 wd 0.0500 time 0.4468 (0.5677) data time 0.0009 (0.0058) model time 0.4459 (0.5619) loss 3.3306 (2.6648) grad_norm 2.1812 (3.0518) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][530/625] eta 0:00:53 lr 0.000206 wd 0.0500 time 0.4412 (0.5601) data time 0.0009 (0.0055) model time 0.4403 (0.5546) loss 2.6168 (2.6643) grad_norm 1.7126 (3.0048) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][540/625] eta 0:00:47 lr 0.000206 wd 0.0500 time 0.4417 (0.5534) data time 0.0008 (0.0052) model time 
0.4410 (0.5481) loss 2.3148 (2.6604) grad_norm 2.1403 (3.1084) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][550/625] eta 0:00:41 lr 0.000206 wd 0.0500 time 0.4410 (0.5474) data time 0.0010 (0.0050) model time 0.4400 (0.5424) loss 2.8216 (2.6581) grad_norm 1.7227 (3.0654) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][560/625] eta 0:00:35 lr 0.000206 wd 0.0500 time 0.4468 (0.5428) data time 0.0008 (0.0048) model time 0.4460 (0.5381) loss 2.9555 (2.6568) grad_norm 1.6714 (3.0754) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][570/625] eta 0:00:29 lr 0.000206 wd 0.0500 time 0.4478 (0.5381) data time 0.0007 (0.0046) model time 0.4471 (0.5336) loss 3.0383 (2.6465) grad_norm 1.8825 (3.1908) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][580/625] eta 0:00:24 lr 0.000206 wd 0.0500 time 0.4493 (0.5339) data time 0.0008 (0.0044) model time 0.4485 (0.5295) loss 2.6607 (2.6394) grad_norm 4.2648 (3.1793) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][590/625] eta 0:00:18 lr 0.000206 wd 0.0500 time 0.4435 (0.5301) data time 0.0009 (0.0042) model time 0.4426 (0.5258) loss 3.0334 (2.6435) grad_norm 2.1212 (3.1464) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][600/625] eta 0:00:13 lr 0.000205 wd 0.0500 time 0.4486 (0.5266) data time 0.0007 (0.0041) model time 0.4480 (0.5225) loss 2.5409 (2.6378) grad_norm 2.4115 (3.1281) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][610/625] eta 0:00:07 lr 
0.000205 wd 0.0500 time 0.4421 (0.5233) data time 0.0004 (0.0040) model time 0.4417 (0.5193) loss 3.0524 (2.6389) grad_norm 2.0433 (3.1189) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [225/300][620/625] eta 0:00:02 lr 0.000205 wd 0.0500 time 0.4444 (0.5202) data time 0.0004 (0.0038) model time 0.4440 (0.5164) loss 3.2336 (2.6330) grad_norm 2.4804 (3.1006) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 06:38:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 225 training takes 0:02:12 [2024-08-11 06:38:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:38:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 06:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.454 (0.454) Loss 0.5122 (0.5122) Acc@1 88.916 (88.916) Acc@5 98.682 (98.682) Mem 16695MB [2024-08-11 06:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 0.8281 (0.6178) Acc@1 80.957 (86.879) Acc@5 96.387 (97.767) Mem 16695MB [2024-08-11 06:38:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.133) Loss 0.9282 (0.7324) Acc@1 78.955 (83.936) Acc@5 95.410 (96.701) Mem 16695MB [2024-08-11 06:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.657 Acc@5 96.647 [2024-08-11 06:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 06:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.66% [2024-08-11 06:38:50 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... 
[2024-08-11 06:38:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-11 06:38:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.4792 (0.4792) Acc@1 89.600 (89.600) Acc@5 98.975 (98.975) Mem 16695MB
[2024-08-11 06:38:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.7744 (0.5903) Acc@1 81.396 (87.314) Acc@5 96.631 (97.954) Mem 16695MB
[2024-08-11 06:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.8584 (0.6933) Acc@1 79.688 (84.621) Acc@5 95.947 (97.033) Mem 16695MB
[2024-08-11 06:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.343 Acc@5 97.017
[2024-08-11 06:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-08-11 06:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.34%
[2024-08-11 06:38:56 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-11 06:39:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-11 06:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][0/625] eta 0:09:03 lr 0.000205 wd 0.0500 time 0.8695 (0.8695) data time 0.3964 (0.3964) model time 0.0000 (0.0000) loss 1.9017 (1.9017) grad_norm 2.6321 (2.6321) loss_scale 128.0000 (128.0000) mem 16704MB [2024-08-11 06:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][10/625] eta 0:04:58 lr 0.000205 wd 0.0500 time 0.4489 (0.4853) data time 0.0006 (0.0368) model time 0.0000 (0.0000) loss 1.7096 (2.3628) grad_norm 3.4282 (3.3345) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][20/625] eta 0:04:43 lr 0.000205 wd 0.0500 time 0.4458 (0.4682) data time 0.0009 (0.0197) model time 0.0000 (0.0000) loss 3.1380 (2.5767) grad_norm 2.8765 (2.9144) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][30/625] eta 0:04:35 lr 0.000205 wd 0.0500 time 0.4497 (0.4626) data time 0.0006 (0.0136) model time 0.0000 (0.0000) loss 3.4272 (2.5849) grad_norm 2.1412 (2.7077) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][40/625] eta 0:04:28 lr 0.000205 wd 0.0500 time 0.4457 (0.4594) data time 0.0009 (0.0105) model time 0.0000 (0.0000) loss 1.7588 (2.5280) grad_norm 1.6316 (2.5574) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][50/625] eta 0:04:25 lr 0.000205 wd 0.0500 time 0.4453 (0.4613) data time 0.0007 (0.0086) model time 0.0000 (0.0000) loss 3.0886 (2.5369) grad_norm 3.1350 (2.4758) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][60/625] eta 0:04:20 lr 0.000205 wd 0.0500 time 0.4480 (0.4615) data time 0.0009 (0.0074) model time 0.4472 (0.4615) loss 2.7203 (2.5861) 
grad_norm 2.1555 (2.4992) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][70/625] eta 0:04:15 lr 0.000205 wd 0.0500 time 0.4511 (0.4595) data time 0.0006 (0.0064) model time 0.4504 (0.4541) loss 1.9265 (2.6094) grad_norm 3.3680 (2.7680) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][80/625] eta 0:04:09 lr 0.000205 wd 0.0500 time 0.4420 (0.4580) data time 0.0010 (0.0057) model time 0.4410 (0.4515) loss 1.9779 (2.5971) grad_norm 1.8503 (2.7619) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][90/625] eta 0:04:04 lr 0.000205 wd 0.0500 time 0.4500 (0.4571) data time 0.0007 (0.0052) model time 0.4493 (0.4508) loss 2.3981 (2.6171) grad_norm 9.3416 (2.8103) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][100/625] eta 0:03:59 lr 0.000204 wd 0.0500 time 0.4577 (0.4565) data time 0.0008 (0.0048) model time 0.4569 (0.4507) loss 2.4674 (2.6214) grad_norm 2.3762 (2.7738) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][110/625] eta 0:03:54 lr 0.000204 wd 0.0500 time 0.4553 (0.4559) data time 0.0006 (0.0044) model time 0.4547 (0.4503) loss 2.3658 (2.6317) grad_norm 3.9793 (2.8170) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][120/625] eta 0:03:49 lr 0.000204 wd 0.0500 time 0.4455 (0.4553) data time 0.0009 (0.0041) model time 0.4446 (0.4501) loss 2.7829 (2.6300) grad_norm 2.3282 (2.7702) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:39:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][130/625] eta 0:03:45 lr 0.000204 wd 0.0500 time 0.4451 (0.4547) data 
time 0.0006 (0.0039) model time 0.4445 (0.4496) loss 2.7862 (2.6142) grad_norm 6.8906 (2.7774) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][140/625] eta 0:03:40 lr 0.000204 wd 0.0500 time 0.4499 (0.4542) data time 0.0007 (0.0037) model time 0.4492 (0.4493) loss 2.7118 (2.6234) grad_norm 2.2710 (2.7822) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][150/625] eta 0:03:35 lr 0.000204 wd 0.0500 time 0.4463 (0.4537) data time 0.0009 (0.0035) model time 0.4454 (0.4490) loss 2.7639 (2.6346) grad_norm 1.7514 (2.7239) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][160/625] eta 0:03:30 lr 0.000204 wd 0.0500 time 0.4454 (0.4533) data time 0.0006 (0.0033) model time 0.4448 (0.4487) loss 1.6697 (2.6314) grad_norm 1.5476 (2.7336) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][170/625] eta 0:03:26 lr 0.000204 wd 0.0500 time 0.4455 (0.4529) data time 0.0009 (0.0032) model time 0.4447 (0.4485) loss 3.5022 (2.6399) grad_norm 1.9374 (2.7038) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][180/625] eta 0:03:21 lr 0.000204 wd 0.0500 time 0.4484 (0.4526) data time 0.0009 (0.0030) model time 0.4475 (0.4484) loss 2.8618 (2.6501) grad_norm 2.2897 (2.6999) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][190/625] eta 0:03:16 lr 0.000204 wd 0.0500 time 0.4474 (0.4524) data time 0.0008 (0.0029) model time 0.4466 (0.4483) loss 2.2277 (2.6488) grad_norm 2.1019 (2.6813) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[226/300][200/625] eta 0:03:12 lr 0.000204 wd 0.0500 time 0.4456 (0.4522) data time 0.0009 (0.0028) model time 0.4448 (0.4482) loss 1.6885 (2.6357) grad_norm 9.8830 (2.7330) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][210/625] eta 0:03:07 lr 0.000204 wd 0.0500 time 0.4469 (0.4520) data time 0.0009 (0.0027) model time 0.4461 (0.4482) loss 1.9939 (2.6261) grad_norm 2.0536 (2.7495) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][220/625] eta 0:03:02 lr 0.000204 wd 0.0500 time 0.4481 (0.4518) data time 0.0006 (0.0026) model time 0.4475 (0.4481) loss 2.8947 (2.6226) grad_norm 4.0608 (2.7638) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][230/625] eta 0:02:58 lr 0.000203 wd 0.0500 time 0.4471 (0.4516) data time 0.0010 (0.0026) model time 0.4461 (0.4481) loss 2.7196 (2.6241) grad_norm 2.4297 (2.7618) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][240/625] eta 0:02:53 lr 0.000203 wd 0.0500 time 0.4460 (0.4515) data time 0.0006 (0.0025) model time 0.4454 (0.4480) loss 2.8665 (2.6214) grad_norm 1.5364 (2.7274) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][250/625] eta 0:02:49 lr 0.000203 wd 0.0500 time 0.4518 (0.4515) data time 0.0007 (0.0024) model time 0.4511 (0.4481) loss 2.9574 (2.6236) grad_norm 1.5895 (2.6914) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][260/625] eta 0:02:44 lr 0.000203 wd 0.0500 time 0.4503 (0.4514) data time 0.0009 (0.0024) model time 0.4494 (0.4481) loss 2.9476 (2.6324) grad_norm 1.6382 (2.6653) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 06:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][270/625] eta 0:02:40 lr 0.000203 wd 0.0500 time 0.4476 (0.4513) data time 0.0008 (0.0023) model time 0.4468 (0.4481) loss 2.5659 (2.6206) grad_norm 3.1163 (2.6704) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][280/625] eta 0:02:35 lr 0.000203 wd 0.0500 time 0.4462 (0.4512) data time 0.0006 (0.0023) model time 0.4456 (0.4481) loss 1.8362 (2.6132) grad_norm 1.7404 (2.6587) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][290/625] eta 0:02:31 lr 0.000203 wd 0.0500 time 0.4450 (0.4510) data time 0.0007 (0.0022) model time 0.4443 (0.4480) loss 2.6857 (2.6174) grad_norm 1.7323 (2.6386) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][300/625] eta 0:02:26 lr 0.000203 wd 0.0500 time 0.4452 (0.4509) data time 0.0006 (0.0022) model time 0.4446 (0.4479) loss 1.6673 (2.6231) grad_norm 2.0698 (2.6243) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][310/625] eta 0:02:21 lr 0.000203 wd 0.0500 time 0.4494 (0.4508) data time 0.0009 (0.0021) model time 0.4486 (0.4478) loss 2.5085 (2.6271) grad_norm 2.8463 (2.6142) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][320/625] eta 0:02:17 lr 0.000203 wd 0.0500 time 0.4486 (0.4507) data time 0.0006 (0.0021) model time 0.4480 (0.4478) loss 3.0830 (2.6293) grad_norm 2.7086 (2.6036) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][330/625] eta 0:02:12 lr 0.000203 wd 0.0500 time 0.4487 (0.4507) data time 0.0006 (0.0020) model time 0.4480 (0.4479) loss 2.8881 
(2.6351) grad_norm 1.9410 (2.6129) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][340/625] eta 0:02:08 lr 0.000203 wd 0.0500 time 0.4468 (0.4506) data time 0.0007 (0.0020) model time 0.4462 (0.4478) loss 2.8461 (2.6367) grad_norm 2.5930 (2.6162) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][350/625] eta 0:02:03 lr 0.000202 wd 0.0500 time 0.4457 (0.4504) data time 0.0009 (0.0020) model time 0.4449 (0.4477) loss 2.9168 (2.6337) grad_norm 2.1591 (2.6021) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][360/625] eta 0:01:59 lr 0.000202 wd 0.0500 time 0.4462 (0.4503) data time 0.0009 (0.0020) model time 0.4453 (0.4476) loss 2.9253 (2.6329) grad_norm 1.7644 (2.6059) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][370/625] eta 0:01:54 lr 0.000202 wd 0.0500 time 0.4454 (0.4502) data time 0.0006 (0.0019) model time 0.4448 (0.4476) loss 2.7663 (2.6359) grad_norm 1.5357 (2.6063) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][380/625] eta 0:01:50 lr 0.000202 wd 0.0500 time 0.4493 (0.4502) data time 0.0006 (0.0019) model time 0.4487 (0.4476) loss 2.4125 (2.6360) grad_norm 2.3015 (2.5980) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:41:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][390/625] eta 0:01:45 lr 0.000202 wd 0.0500 time 0.4513 (0.4507) data time 0.0007 (0.0019) model time 0.4506 (0.4482) loss 2.5484 (2.6323) grad_norm 1.8790 (2.5878) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][400/625] eta 0:01:41 lr 0.000202 wd 0.0500 time 0.4477 
(0.4510) data time 0.0009 (0.0018) model time 0.4468 (0.4486) loss 1.8352 (2.6302) grad_norm 2.2390 (2.5808) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][410/625] eta 0:01:36 lr 0.000202 wd 0.0500 time 0.4496 (0.4509) data time 0.0008 (0.0018) model time 0.4488 (0.4486) loss 2.6525 (2.6294) grad_norm 1.6847 (2.5781) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][420/625] eta 0:01:32 lr 0.000202 wd 0.0500 time 0.4473 (0.4509) data time 0.0006 (0.0018) model time 0.4467 (0.4486) loss 2.5105 (2.6335) grad_norm 2.2830 (2.5741) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][430/625] eta 0:01:27 lr 0.000202 wd 0.0500 time 0.4482 (0.4508) data time 0.0007 (0.0018) model time 0.4474 (0.4485) loss 2.9365 (2.6352) grad_norm 1.4676 (2.5646) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][440/625] eta 0:01:23 lr 0.000202 wd 0.0500 time 0.4490 (0.4507) data time 0.0008 (0.0017) model time 0.4482 (0.4485) loss 1.5048 (2.6266) grad_norm 2.2264 (2.5654) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][450/625] eta 0:01:18 lr 0.000202 wd 0.0500 time 0.4498 (0.4506) data time 0.0007 (0.0017) model time 0.4492 (0.4484) loss 1.6252 (2.6254) grad_norm 1.7252 (2.5751) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][460/625] eta 0:01:14 lr 0.000202 wd 0.0500 time 0.4468 (0.4506) data time 0.0009 (0.0017) model time 0.4459 (0.4484) loss 1.7532 (2.6185) grad_norm 1.7891 (2.5689) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [226/300][470/625] eta 0:01:09 lr 0.000202 wd 0.0500 time 0.4489 (0.4505) data time 0.0006 (0.0017) model time 0.4482 (0.4484) loss 2.0742 (2.6176) grad_norm 3.4437 (2.5642) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][480/625] eta 0:01:05 lr 0.000201 wd 0.0500 time 0.4493 (0.4505) data time 0.0006 (0.0017) model time 0.4487 (0.4484) loss 3.2125 (2.6236) grad_norm 1.8588 (2.5597) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][490/625] eta 0:01:00 lr 0.000201 wd 0.0500 time 0.4468 (0.4505) data time 0.0007 (0.0017) model time 0.4461 (0.4483) loss 3.1212 (2.6239) grad_norm 2.1417 (2.5710) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][500/625] eta 0:00:56 lr 0.000201 wd 0.0500 time 0.4470 (0.4504) data time 0.0009 (0.0016) model time 0.4461 (0.4483) loss 2.9836 (2.6203) grad_norm 1.9717 (2.5632) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][510/625] eta 0:00:51 lr 0.000201 wd 0.0500 time 0.4552 (0.4504) data time 0.0008 (0.0016) model time 0.4544 (0.4483) loss 2.0723 (2.6216) grad_norm 3.2647 (2.5583) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][520/625] eta 0:00:47 lr 0.000201 wd 0.0500 time 0.4461 (0.4503) data time 0.0010 (0.0016) model time 0.4450 (0.4483) loss 2.3580 (2.6243) grad_norm 1.8410 (2.5499) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:42:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][530/625] eta 0:00:42 lr 0.000201 wd 0.0500 time 0.4519 (0.4503) data time 0.0007 (0.0016) model time 0.4512 (0.4483) loss 3.0712 (2.6242) grad_norm 1.7178 (2.5397) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 06:43:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][540/625] eta 0:00:38 lr 0.000201 wd 0.0500 time 0.4497 (0.4503) data time 0.0006 (0.0016) model time 0.4491 (0.4483) loss 3.0444 (2.6251) grad_norm 1.7435 (2.6054) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][550/625] eta 0:00:33 lr 0.000201 wd 0.0500 time 0.4487 (0.4502) data time 0.0006 (0.0016) model time 0.4480 (0.4483) loss 1.6578 (2.6198) grad_norm 1.5613 (2.6028) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][560/625] eta 0:00:29 lr 0.000201 wd 0.0500 time 0.4473 (0.4502) data time 0.0009 (0.0016) model time 0.4465 (0.4482) loss 2.5678 (2.6186) grad_norm 2.3390 (2.5958) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][570/625] eta 0:00:24 lr 0.000201 wd 0.0500 time 0.4502 (0.4502) data time 0.0008 (0.0015) model time 0.4495 (0.4482) loss 2.1424 (2.6131) grad_norm 1.9007 (2.6219) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][580/625] eta 0:00:20 lr 0.000201 wd 0.0500 time 0.4470 (0.4504) data time 0.0009 (0.0015) model time 0.4461 (0.4484) loss 1.5678 (2.6115) grad_norm 3.0266 (2.6213) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][590/625] eta 0:00:15 lr 0.000201 wd 0.0500 time 0.4455 (0.4504) data time 0.0009 (0.0015) model time 0.4446 (0.4485) loss 2.9072 (2.6115) grad_norm 1.8625 (2.6169) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][600/625] eta 0:00:11 lr 0.000201 wd 0.0500 time 0.4525 (0.4503) data time 0.0008 (0.0015) model time 0.4517 (0.4485) loss 2.8522 
(2.6138) grad_norm 2.0387 (2.6148) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][610/625] eta 0:00:06 lr 0.000200 wd 0.0500 time 0.4430 (0.4503) data time 0.0004 (0.0015) model time 0.4426 (0.4484) loss 2.4275 (2.6122) grad_norm 2.0717 (2.6140) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [226/300][620/625] eta 0:00:02 lr 0.000200 wd 0.0500 time 0.4459 (0.4502) data time 0.0004 (0.0015) model time 0.4455 (0.4484) loss 1.8537 (2.6104) grad_norm 1.5877 (2.6209) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 226 training takes 0:04:41 [2024-08-11 06:43:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:43:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 06:43:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.457 (0.457) Loss 0.5024 (0.5024) Acc@1 89.062 (89.062) Acc@5 98.779 (98.779) Mem 16699MB
[2024-08-11 06:43:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8296 (0.6204) Acc@1 80.615 (86.799) Acc@5 95.654 (97.741) Mem 16699MB
[2024-08-11 06:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9194 (0.7325) Acc@1 78.271 (83.973) Acc@5 95.605 (96.735) Mem 16699MB
[2024-08-11 06:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.635 Acc@5 96.669
[2024-08-11 06:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-11 06:43:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.802 (0.802) Loss 0.4792 (0.4792) Acc@1 89.551 (89.551) Acc@5 98.926 (98.926) Mem 16699MB
[2024-08-11 06:43:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.183) Loss 0.7754 (0.5904) Acc@1 81.299 (87.300) Acc@5 96.729 (97.940) Mem 16699MB
[2024-08-11 06:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.151) Loss 0.8608 (0.6938) Acc@1 79.736 (84.638) Acc@5 95.947 (97.021) Mem 16699MB
[2024-08-11 06:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.357 Acc@5 97.005
[2024-08-11 06:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.4%
[2024-08-11 06:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 84.36%
[2024-08-11 06:43:49 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving......
[2024-08-11 06:43:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!!
[2024-08-11 06:43:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][0/625] eta 0:08:17 lr 0.000200 wd 0.0500 time 0.7956 (0.7956) data time 0.4082 (0.4082) model time 0.0000 (0.0000) loss 1.7881 (1.7881) grad_norm 2.9387 (2.9387) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:43:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][10/625] eta 0:04:54 lr 0.000200 wd 0.0500 time 0.4462 (0.4789) data time 0.0006 (0.0379) model time 0.0000 (0.0000) loss 3.1871 (2.5411) grad_norm 1.9214 (2.8472) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][20/625] eta 0:04:40 lr 0.000200 wd 0.0500 time 0.4453 (0.4637) data time 0.0007 (0.0202) model time 0.0000 (0.0000) loss 3.3883 (2.5871) grad_norm 1.6873 (2.7193) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][30/625] eta 0:04:32 lr 0.000200 wd 0.0500 time 0.4443 (0.4580) data time 0.0007 (0.0140) model time 0.0000 (0.0000) loss 1.7929 (2.5279) grad_norm 2.1680 (2.6319) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][40/625] eta 0:04:26 lr 0.000200 wd 0.0500 time 0.4476 (0.4553) data time 0.0009 (0.0108) model time 0.0000 (0.0000) loss 2.9596 (2.6114) grad_norm 1.6109 (2.5570) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][50/625] eta 0:04:20 lr 0.000200 wd 0.0500 time 0.4468 (0.4536) data time 0.0007 (0.0090) model time 0.0000 (0.0000) loss 3.0925 (2.6517) grad_norm 2.5028 (2.4540) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][60/625] eta 0:04:15 lr 0.000200 wd 0.0500 time 0.4459 (0.4527) data time 0.0009 (0.0076) model time 0.4450 (0.4469) loss 1.6105 (2.6036) 
grad_norm 2.0374 (2.3572) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][70/625] eta 0:04:10 lr 0.000200 wd 0.0500 time 0.4488 (0.4521) data time 0.0009 (0.0067) model time 0.4480 (0.4473) loss 3.0941 (2.5706) grad_norm 2.0262 (2.4049) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][80/625] eta 0:04:06 lr 0.000200 wd 0.0500 time 0.4471 (0.4515) data time 0.0009 (0.0060) model time 0.4462 (0.4469) loss 2.4507 (2.5712) grad_norm 2.8986 (2.4596) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][90/625] eta 0:04:01 lr 0.000200 wd 0.0500 time 0.4468 (0.4511) data time 0.0007 (0.0054) model time 0.4461 (0.4469) loss 3.0738 (2.5946) grad_norm 2.2685 (2.4382) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][100/625] eta 0:03:56 lr 0.000200 wd 0.0500 time 0.3905 (0.4512) data time 0.0007 (0.0050) model time 0.3898 (0.4477) loss 3.1353 (2.6027) grad_norm 2.0342 (2.5576) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][110/625] eta 0:03:52 lr 0.000199 wd 0.0500 time 0.4482 (0.4508) data time 0.0008 (0.0046) model time 0.4474 (0.4474) loss 2.6343 (2.6049) grad_norm 2.1225 (2.5292) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][120/625] eta 0:03:47 lr 0.000199 wd 0.0500 time 0.4454 (0.4506) data time 0.0009 (0.0043) model time 0.4445 (0.4474) loss 2.3367 (2.5927) grad_norm 1.9098 (2.4913) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][130/625] eta 0:03:42 lr 0.000199 wd 0.0500 time 0.4463 (0.4504) data 
time 0.0006 (0.0041) model time 0.4457 (0.4474) loss 2.7697 (2.6013) grad_norm 2.3275 (2.5097) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][140/625] eta 0:03:38 lr 0.000199 wd 0.0500 time 0.4473 (0.4503) data time 0.0009 (0.0038) model time 0.4464 (0.4475) loss 2.7791 (2.5989) grad_norm 3.0630 (2.4851) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][150/625] eta 0:03:34 lr 0.000199 wd 0.0500 time 0.4502 (0.4513) data time 0.0007 (0.0036) model time 0.4496 (0.4492) loss 2.8024 (2.5827) grad_norm 2.0359 (2.4820) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][160/625] eta 0:03:29 lr 0.000199 wd 0.0500 time 0.4466 (0.4511) data time 0.0007 (0.0034) model time 0.4459 (0.4490) loss 2.6621 (2.5903) grad_norm 9.8590 (2.5238) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][170/625] eta 0:03:25 lr 0.000199 wd 0.0500 time 0.4464 (0.4509) data time 0.0009 (0.0033) model time 0.4456 (0.4488) loss 2.6871 (2.5886) grad_norm 10.5872 (2.5607) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][180/625] eta 0:03:20 lr 0.000199 wd 0.0500 time 0.4472 (0.4507) data time 0.0006 (0.0032) model time 0.4466 (0.4487) loss 2.8743 (2.5966) grad_norm 3.6466 (2.5575) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][190/625] eta 0:03:16 lr 0.000199 wd 0.0500 time 0.4538 (0.4507) data time 0.0008 (0.0030) model time 0.4530 (0.4488) loss 1.5419 (2.5885) grad_norm 2.2751 (2.5424) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[227/300][200/625] eta 0:03:11 lr 0.000199 wd 0.0500 time 0.4474 (0.4507) data time 0.0010 (0.0029) model time 0.4464 (0.4488) loss 2.9612 (2.5913) grad_norm 1.5949 (2.5368) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][210/625] eta 0:03:07 lr 0.000199 wd 0.0500 time 0.4496 (0.4506) data time 0.0006 (0.0028) model time 0.4490 (0.4488) loss 2.4550 (2.5876) grad_norm 3.2709 (2.5905) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][220/625] eta 0:03:02 lr 0.000199 wd 0.0500 time 0.4499 (0.4506) data time 0.0009 (0.0027) model time 0.4491 (0.4488) loss 2.4820 (2.5853) grad_norm 2.2424 (2.6407) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][230/625] eta 0:02:57 lr 0.000199 wd 0.0500 time 0.4480 (0.4505) data time 0.0010 (0.0027) model time 0.4470 (0.4488) loss 2.7704 (2.5886) grad_norm 2.3254 (2.6112) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][240/625] eta 0:02:53 lr 0.000198 wd 0.0500 time 0.4533 (0.4504) data time 0.0007 (0.0026) model time 0.4526 (0.4486) loss 2.6766 (2.5926) grad_norm 5.5599 (2.6229) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][250/625] eta 0:02:48 lr 0.000198 wd 0.0500 time 0.4453 (0.4502) data time 0.0007 (0.0025) model time 0.4446 (0.4484) loss 1.6989 (2.5952) grad_norm 5.1338 (2.6785) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][260/625] eta 0:02:44 lr 0.000198 wd 0.0500 time 0.4489 (0.4501) data time 0.0008 (0.0025) model time 0.4480 (0.4484) loss 2.3653 (2.5937) grad_norm 1.7469 (2.7019) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 06:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][270/625] eta 0:02:39 lr 0.000198 wd 0.0500 time 0.4528 (0.4501) data time 0.0009 (0.0024) model time 0.4519 (0.4483) loss 2.7393 (2.5943) grad_norm 1.9434 (2.6803) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:45:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][280/625] eta 0:02:35 lr 0.000198 wd 0.0500 time 0.4423 (0.4501) data time 0.0007 (0.0024) model time 0.4416 (0.4484) loss 2.1854 (2.5962) grad_norm 2.2144 (2.6548) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][290/625] eta 0:02:30 lr 0.000198 wd 0.0500 time 0.4481 (0.4500) data time 0.0006 (0.0023) model time 0.4475 (0.4484) loss 2.5177 (2.5963) grad_norm 2.1690 (2.6844) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][300/625] eta 0:02:26 lr 0.000198 wd 0.0500 time 0.4457 (0.4500) data time 0.0007 (0.0023) model time 0.4450 (0.4483) loss 3.3280 (2.6058) grad_norm 1.9250 (2.8197) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][310/625] eta 0:02:21 lr 0.000198 wd 0.0500 time 0.4472 (0.4499) data time 0.0008 (0.0022) model time 0.4464 (0.4483) loss 2.8345 (2.6125) grad_norm 1.5347 (2.8262) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][320/625] eta 0:02:17 lr 0.000198 wd 0.0500 time 0.3858 (0.4502) data time 0.0010 (0.0022) model time 0.3848 (0.4487) loss 1.3555 (2.6118) grad_norm 2.7832 (2.8061) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][330/625] eta 0:02:12 lr 0.000198 wd 0.0500 time 0.4497 (0.4501) data time 0.0007 (0.0021) model time 0.4490 (0.4486) loss 3.0281 
(2.6139) grad_norm 2.3392 (2.7935) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][340/625] eta 0:02:08 lr 0.000198 wd 0.0500 time 0.4481 (0.4501) data time 0.0009 (0.0021) model time 0.4472 (0.4486) loss 2.6978 (2.6106) grad_norm 3.6390 (2.7987) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][350/625] eta 0:02:03 lr 0.000198 wd 0.0500 time 0.4463 (0.4501) data time 0.0006 (0.0021) model time 0.4457 (0.4486) loss 1.7404 (2.6078) grad_norm 1.9529 (2.8522) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][360/625] eta 0:01:59 lr 0.000198 wd 0.0500 time 0.4455 (0.4500) data time 0.0007 (0.0020) model time 0.4448 (0.4485) loss 3.2160 (2.6094) grad_norm 3.1830 (2.8592) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][370/625] eta 0:01:54 lr 0.000197 wd 0.0500 time 0.4472 (0.4508) data time 0.0011 (0.0020) model time 0.4461 (0.4495) loss 2.7944 (2.6023) grad_norm 2.2052 (2.8528) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][380/625] eta 0:01:50 lr 0.000197 wd 0.0500 time 0.4469 (0.4507) data time 0.0006 (0.0020) model time 0.4463 (0.4493) loss 2.9508 (2.6045) grad_norm 3.1687 (2.8501) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][390/625] eta 0:01:45 lr 0.000197 wd 0.0500 time 0.4459 (0.4506) data time 0.0008 (0.0019) model time 0.4450 (0.4492) loss 2.3472 (2.6077) grad_norm 3.1646 (2.8570) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][400/625] eta 0:01:41 lr 0.000197 wd 0.0500 time 0.4497 
(0.4505) data time 0.0009 (0.0019) model time 0.4487 (0.4491) loss 2.9879 (2.6156) grad_norm 2.4069 (2.8455) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:46:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][410/625] eta 0:01:36 lr 0.000197 wd 0.0500 time 0.4512 (0.4504) data time 0.0009 (0.0019) model time 0.4503 (0.4491) loss 2.1733 (2.6179) grad_norm 2.0731 (2.8315) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][420/625] eta 0:01:32 lr 0.000197 wd 0.0500 time 0.4477 (0.4504) data time 0.0006 (0.0019) model time 0.4471 (0.4491) loss 1.9542 (2.6186) grad_norm 2.4216 (2.8730) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][430/625] eta 0:01:27 lr 0.000197 wd 0.0500 time 0.4466 (0.4503) data time 0.0008 (0.0018) model time 0.4458 (0.4490) loss 2.5987 (2.6138) grad_norm 2.6372 (2.8644) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][440/625] eta 0:01:23 lr 0.000197 wd 0.0500 time 0.4494 (0.4503) data time 0.0006 (0.0018) model time 0.4488 (0.4490) loss 1.9435 (2.6085) grad_norm 1.3984 (2.8591) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][450/625] eta 0:01:18 lr 0.000197 wd 0.0500 time 0.4483 (0.4502) data time 0.0006 (0.0018) model time 0.4477 (0.4489) loss 2.5572 (2.6100) grad_norm 2.1758 (2.8456) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][460/625] eta 0:01:14 lr 0.000197 wd 0.0500 time 0.4478 (0.4502) data time 0.0008 (0.0018) model time 0.4470 (0.4489) loss 1.8414 (2.6119) grad_norm 2.5508 (2.8280) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [227/300][470/625] eta 0:01:09 lr 0.000197 wd 0.0500 time 0.4418 (0.4504) data time 0.0007 (0.0018) model time 0.4411 (0.4491) loss 2.5247 (2.6144) grad_norm 2.3008 (2.8124) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][480/625] eta 0:01:05 lr 0.000197 wd 0.0500 time 0.4451 (0.4503) data time 0.0008 (0.0017) model time 0.4442 (0.4491) loss 3.0502 (2.6180) grad_norm 2.0115 (2.7976) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][490/625] eta 0:01:00 lr 0.000197 wd 0.0500 time 0.4487 (0.4503) data time 0.0008 (0.0017) model time 0.4478 (0.4490) loss 2.8981 (2.6218) grad_norm 2.2534 (2.8022) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][500/625] eta 0:00:56 lr 0.000196 wd 0.0500 time 0.4468 (0.4502) data time 0.0009 (0.0017) model time 0.4460 (0.4489) loss 2.2955 (2.6245) grad_norm 1.9116 (2.7973) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][510/625] eta 0:00:51 lr 0.000196 wd 0.0500 time 0.4470 (0.4501) data time 0.0006 (0.0017) model time 0.4464 (0.4489) loss 1.9591 (2.6231) grad_norm 2.4243 (2.8031) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][520/625] eta 0:00:47 lr 0.000196 wd 0.0500 time 0.4498 (0.4508) data time 0.0006 (0.0017) model time 0.4491 (0.4496) loss 3.0617 (2.6239) grad_norm 2.1741 (2.7992) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][530/625] eta 0:00:42 lr 0.000196 wd 0.0500 time 0.4489 (0.4508) data time 0.0007 (0.0017) model time 0.4482 (0.4496) loss 2.4139 (2.6265) grad_norm 2.9152 (2.7866) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 06:47:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][540/625] eta 0:00:38 lr 0.000196 wd 0.0500 time 0.4488 (0.4507) data time 0.0008 (0.0016) model time 0.4480 (0.4495) loss 2.8194 (2.6255) grad_norm 1.8827 (2.7848) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:47:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][550/625] eta 0:00:33 lr 0.000196 wd 0.0500 time 0.4445 (0.4506) data time 0.0007 (0.0016) model time 0.4439 (0.4495) loss 2.8040 (2.6246) grad_norm 3.8488 (2.7768) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][560/625] eta 0:00:29 lr 0.000196 wd 0.0500 time 0.4508 (0.4506) data time 0.0007 (0.0016) model time 0.4502 (0.4494) loss 3.1716 (2.6232) grad_norm 1.8260 (2.7600) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][570/625] eta 0:00:24 lr 0.000196 wd 0.0500 time 0.4513 (0.4506) data time 0.0008 (0.0016) model time 0.4505 (0.4494) loss 2.9539 (2.6265) grad_norm 1.3386 (2.7452) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][580/625] eta 0:00:20 lr 0.000196 wd 0.0500 time 0.4483 (0.4505) data time 0.0007 (0.0016) model time 0.4476 (0.4494) loss 2.9909 (2.6278) grad_norm 2.5469 (2.7448) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][590/625] eta 0:00:15 lr 0.000196 wd 0.0500 time 0.4486 (0.4505) data time 0.0008 (0.0016) model time 0.4477 (0.4493) loss 2.8922 (2.6272) grad_norm 2.1404 (2.7382) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][600/625] eta 0:00:11 lr 0.000196 wd 0.0500 time 0.4455 (0.4504) data time 0.0007 (0.0016) model time 0.4448 (0.4493) loss 1.9533 
(2.6239) grad_norm 1.7959 (2.7308) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][610/625] eta 0:00:06 lr 0.000196 wd 0.0500 time 0.4430 (0.4504) data time 0.0004 (0.0016) model time 0.4426 (0.4492) loss 2.8199 (2.6191) grad_norm 2.6296 (2.7310) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [227/300][620/625] eta 0:00:02 lr 0.000196 wd 0.0500 time 0.4430 (0.4502) data time 0.0004 (0.0015) model time 0.4426 (0.4491) loss 2.7883 (2.6172) grad_norm 2.0999 (2.7259) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 227 training takes 0:04:41 [2024-08-11 06:48:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:48:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 06:48:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5137 (0.5137) Acc@1 89.355 (89.355) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-11 06:48:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.152) Loss 0.8330 (0.6230) Acc@1 80.957 (86.843) Acc@5 96.191 (97.820) Mem 16699MB [2024-08-11 06:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9219 (0.7388) Acc@1 78.271 (83.840) Acc@5 95.410 (96.726) Mem 16699MB [2024-08-11 06:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.513 Acc@5 96.683 [2024-08-11 06:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-11 06:48:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.774 (0.774) Loss 0.4795 (0.4795) Acc@1 89.600 (89.600) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 06:48:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.180) Loss 0.7749 (0.5906) Acc@1 81.348 (87.327) Acc@5 96.777 (97.954) Mem 16699MB [2024-08-11 06:48:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.149) Loss 0.8618 (0.6941) Acc@1 79.785 (84.645) Acc@5 95.850 (97.019) Mem 16699MB [2024-08-11 06:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.349 Acc@5 97.005 [2024-08-11 06:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 06:48:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][0/625] eta 0:13:01 lr 0.000196 wd 0.0500 time 1.2512 (1.2512) data time 0.6530 (0.6530) model time 0.0000 (0.0000) loss 2.9676 (2.9676) grad_norm 2.0054 (2.0054) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][10/625] eta 0:05:21 lr 0.000195 wd 0.0500 time 0.4457 (0.5226) data 
time 0.0008 (0.0601) model time 0.0000 (0.0000) loss 2.5696 (2.6641) grad_norm 2.3288 (2.2526) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][20/625] eta 0:04:54 lr 0.000195 wd 0.0500 time 0.4511 (0.4875) data time 0.0009 (0.0319) model time 0.0000 (0.0000) loss 2.2296 (2.6941) grad_norm 2.5014 (4.2490) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][30/625] eta 0:04:42 lr 0.000195 wd 0.0500 time 0.4490 (0.4749) data time 0.0008 (0.0219) model time 0.0000 (0.0000) loss 2.9499 (2.6214) grad_norm 2.0723 (3.6585) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][40/625] eta 0:04:34 lr 0.000195 wd 0.0500 time 0.4451 (0.4685) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 3.1215 (2.5870) grad_norm 2.3912 (3.3914) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][50/625] eta 0:04:26 lr 0.000195 wd 0.0500 time 0.4480 (0.4643) data time 0.0008 (0.0136) model time 0.0000 (0.0000) loss 2.7805 (2.5983) grad_norm 2.0334 (3.1457) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][60/625] eta 0:04:20 lr 0.000195 wd 0.0500 time 0.4457 (0.4616) data time 0.0007 (0.0115) model time 0.4451 (0.4468) loss 2.6364 (2.5512) grad_norm 1.7086 (2.9746) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][70/625] eta 0:04:15 lr 0.000195 wd 0.0500 time 0.5416 (0.4613) data time 0.0006 (0.0100) model time 0.5410 (0.4527) loss 2.2926 (2.5456) grad_norm 2.1415 (2.8576) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[228/300][80/625] eta 0:04:10 lr 0.000195 wd 0.0500 time 0.4460 (0.4591) data time 0.0007 (0.0089) model time 0.4454 (0.4494) loss 2.3358 (2.5553) grad_norm 32.3546 (3.3193) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][90/625] eta 0:04:05 lr 0.000195 wd 0.0500 time 0.4413 (0.4580) data time 0.0007 (0.0080) model time 0.4406 (0.4492) loss 2.5925 (2.5928) grad_norm 1.6798 (3.2594) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][100/625] eta 0:04:01 lr 0.000195 wd 0.0500 time 0.4459 (0.4609) data time 0.0006 (0.0073) model time 0.4453 (0.4565) loss 3.1917 (2.6192) grad_norm 2.1619 (3.3394) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][110/625] eta 0:03:56 lr 0.000195 wd 0.0500 time 0.4487 (0.4596) data time 0.0007 (0.0067) model time 0.4480 (0.4548) loss 3.0202 (2.6063) grad_norm 1.7791 (3.2402) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][120/625] eta 0:03:51 lr 0.000195 wd 0.0500 time 0.4495 (0.4586) data time 0.0009 (0.0062) model time 0.4486 (0.4537) loss 2.8031 (2.6138) grad_norm 2.0217 (3.1525) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][130/625] eta 0:03:46 lr 0.000195 wd 0.0500 time 0.4476 (0.4578) data time 0.0008 (0.0058) model time 0.4468 (0.4529) loss 2.8095 (2.6263) grad_norm 1.5528 (3.0916) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][140/625] eta 0:03:41 lr 0.000194 wd 0.0500 time 0.4489 (0.4573) data time 0.0006 (0.0054) model time 0.4483 (0.4525) loss 1.9403 (2.5958) grad_norm 2.8334 (3.0584) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 06:49:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][150/625] eta 0:03:37 lr 0.000194 wd 0.0500 time 0.4524 (0.4569) data time 0.0008 (0.0051) model time 0.4516 (0.4523) loss 2.5312 (2.5785) grad_norm 2.0926 (2.9961) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][160/625] eta 0:03:32 lr 0.000194 wd 0.0500 time 0.4526 (0.4565) data time 0.0007 (0.0049) model time 0.4518 (0.4521) loss 3.1159 (2.5757) grad_norm 1.7893 (2.9545) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:49:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][170/625] eta 0:03:27 lr 0.000194 wd 0.0500 time 0.4526 (0.4562) data time 0.0009 (0.0046) model time 0.4516 (0.4519) loss 1.7298 (2.5723) grad_norm 1.9012 (2.9284) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][180/625] eta 0:03:22 lr 0.000194 wd 0.0500 time 0.4463 (0.4557) data time 0.0010 (0.0044) model time 0.4454 (0.4515) loss 2.8918 (2.5832) grad_norm 2.0333 (2.8867) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][190/625] eta 0:03:17 lr 0.000194 wd 0.0500 time 0.4445 (0.4551) data time 0.0008 (0.0042) model time 0.4437 (0.4510) loss 2.6767 (2.5894) grad_norm 2.0043 (2.8453) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][200/625] eta 0:03:13 lr 0.000194 wd 0.0500 time 0.4446 (0.4547) data time 0.0007 (0.0041) model time 0.4440 (0.4506) loss 2.0902 (2.6032) grad_norm 1.8309 (2.8034) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][210/625] eta 0:03:08 lr 0.000194 wd 0.0500 time 0.4486 (0.4544) data time 0.0009 (0.0039) model time 0.4477 (0.4504) loss 2.7428 
(2.5968) grad_norm 1.8380 (2.7893) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][220/625] eta 0:03:03 lr 0.000194 wd 0.0500 time 0.4493 (0.4541) data time 0.0009 (0.0038) model time 0.4483 (0.4502) loss 1.9976 (2.5927) grad_norm 3.2980 (2.7766) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][230/625] eta 0:02:59 lr 0.000194 wd 0.0500 time 0.4505 (0.4538) data time 0.0007 (0.0036) model time 0.4498 (0.4500) loss 2.1764 (2.5885) grad_norm 2.2112 (2.7500) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][240/625] eta 0:02:54 lr 0.000194 wd 0.0500 time 0.4461 (0.4536) data time 0.0009 (0.0035) model time 0.4452 (0.4499) loss 2.3572 (2.5834) grad_norm 1.5067 (2.7197) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][250/625] eta 0:02:49 lr 0.000194 wd 0.0500 time 0.4448 (0.4532) data time 0.0009 (0.0034) model time 0.4440 (0.4496) loss 3.1909 (2.5768) grad_norm 3.5093 (2.7013) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][260/625] eta 0:02:45 lr 0.000194 wd 0.0500 time 0.4467 (0.4530) data time 0.0007 (0.0033) model time 0.4460 (0.4494) loss 2.0041 (2.5826) grad_norm 1.5534 (2.6815) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][270/625] eta 0:02:40 lr 0.000193 wd 0.0500 time 0.4472 (0.4527) data time 0.0010 (0.0032) model time 0.4462 (0.4492) loss 2.7770 (2.5882) grad_norm 1.7563 (2.7082) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][280/625] eta 0:02:36 lr 0.000193 wd 0.0500 time 0.4431 
(0.4525) data time 0.0008 (0.0032) model time 0.4423 (0.4490) loss 2.4931 (2.5843) grad_norm 1.9736 (2.7032) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][290/625] eta 0:02:31 lr 0.000193 wd 0.0500 time 0.6585 (0.4531) data time 0.0006 (0.0031) model time 0.6580 (0.4499) loss 3.1453 (2.5953) grad_norm 2.5388 (2.7074) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:50:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][300/625] eta 0:02:27 lr 0.000193 wd 0.0500 time 0.4469 (0.4527) data time 0.0009 (0.0030) model time 0.4461 (0.4495) loss 3.0296 (2.5962) grad_norm 1.6585 (2.6989) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][310/625] eta 0:02:22 lr 0.000193 wd 0.0500 time 0.4482 (0.4526) data time 0.0007 (0.0029) model time 0.4476 (0.4494) loss 3.0223 (2.5980) grad_norm 2.4690 (2.6881) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][320/625] eta 0:02:17 lr 0.000193 wd 0.0500 time 0.4446 (0.4524) data time 0.0007 (0.0029) model time 0.4439 (0.4493) loss 2.3492 (2.6012) grad_norm 1.7047 (2.6762) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][330/625] eta 0:02:13 lr 0.000193 wd 0.0500 time 0.4471 (0.4522) data time 0.0008 (0.0028) model time 0.4463 (0.4492) loss 2.8921 (2.6080) grad_norm 2.6985 (2.6954) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][340/625] eta 0:02:08 lr 0.000193 wd 0.0500 time 0.4464 (0.4520) data time 0.0007 (0.0028) model time 0.4457 (0.4490) loss 1.9761 (2.6021) grad_norm 3.1752 (2.7031) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [228/300][350/625] eta 0:02:04 lr 0.000193 wd 0.0500 time 0.4497 (0.4519) data time 0.0007 (0.0027) model time 0.4491 (0.4490) loss 2.6248 (2.6015) grad_norm 2.5853 (2.6962) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][360/625] eta 0:01:59 lr 0.000193 wd 0.0500 time 0.4458 (0.4518) data time 0.0007 (0.0026) model time 0.4451 (0.4490) loss 1.9906 (2.6056) grad_norm 5.2299 (2.7207) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][370/625] eta 0:01:55 lr 0.000193 wd 0.0500 time 0.4515 (0.4518) data time 0.0006 (0.0026) model time 0.4509 (0.4489) loss 3.3222 (2.6099) grad_norm 1.6113 (2.7143) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][380/625] eta 0:01:50 lr 0.000193 wd 0.0500 time 0.4455 (0.4517) data time 0.0009 (0.0026) model time 0.4446 (0.4489) loss 3.0098 (2.6136) grad_norm 3.4277 (2.7074) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][390/625] eta 0:01:46 lr 0.000193 wd 0.0500 time 0.4518 (0.4516) data time 0.0007 (0.0025) model time 0.4511 (0.4488) loss 2.0156 (2.6081) grad_norm 1.8644 (2.6959) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][400/625] eta 0:01:41 lr 0.000192 wd 0.0500 time 0.4459 (0.4514) data time 0.0009 (0.0025) model time 0.4450 (0.4487) loss 2.5134 (2.6119) grad_norm 2.0251 (2.6894) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][410/625] eta 0:01:37 lr 0.000192 wd 0.0500 time 0.4428 (0.4512) data time 0.0009 (0.0024) model time 0.4419 (0.4485) loss 2.1362 (2.6048) grad_norm 1.6115 (2.6972) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 06:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][420/625] eta 0:01:32 lr 0.000192 wd 0.0500 time 0.4437 (0.4511) data time 0.0007 (0.0024) model time 0.4430 (0.4484) loss 2.2836 (2.6017) grad_norm 2.3599 (2.6843) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:51:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][430/625] eta 0:01:27 lr 0.000192 wd 0.0500 time 0.4489 (0.4510) data time 0.0006 (0.0024) model time 0.4483 (0.4484) loss 2.8417 (2.6032) grad_norm 2.0559 (2.6716) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:52:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][440/625] eta 0:01:23 lr 0.000192 wd 0.0500 time 0.4477 (0.4518) data time 0.0008 (0.0023) model time 0.4468 (0.4493) loss 2.2230 (2.6011) grad_norm 2.0262 (2.7638) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:52:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][450/625] eta 0:01:19 lr 0.000192 wd 0.0500 time 0.4472 (0.4517) data time 0.0010 (0.0023) model time 0.4461 (0.4493) loss 2.9242 (2.5982) grad_norm 2.9879 (2.7608) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:52:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][460/625] eta 0:01:14 lr 0.000192 wd 0.0500 time 0.4459 (0.4516) data time 0.0008 (0.0023) model time 0.4451 (0.4492) loss 2.8687 (2.5944) grad_norm 1.9884 (2.8923) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 06:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][470/625] eta 0:01:09 lr 0.000192 wd 0.0500 time 0.4417 (0.4515) data time 0.0008 (0.0022) model time 0.4409 (0.4491) loss 2.8192 (2.5933) grad_norm 2.5631 (2.8797) loss_scale 256.0000 (130.4459) mem 16699MB [2024-08-11 06:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][480/625] eta 0:01:05 lr 0.000192 wd 0.0500 time 0.4459 (0.4515) data time 0.0009 (0.0022) model time 0.4450 (0.4491) loss 2.1511 
(2.5869) grad_norm 1.4490 (2.8671) loss_scale 256.0000 (133.0561) mem 16699MB [2024-08-11 06:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][490/625] eta 0:01:00 lr 0.000192 wd 0.0500 time 0.4432 (0.4513) data time 0.0007 (0.0022) model time 0.4425 (0.4490) loss 3.0710 (2.5883) grad_norm 2.1584 (2.8734) loss_scale 256.0000 (135.5601) mem 16699MB [2024-08-11 06:52:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][500/625] eta 0:00:56 lr 0.000192 wd 0.0500 time 0.4518 (0.4513) data time 0.0008 (0.0022) model time 0.4509 (0.4490) loss 2.9973 (2.5914) grad_norm 1.4852 (2.8532) loss_scale 256.0000 (137.9641) mem 16699MB [2024-08-11 06:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][510/625] eta 0:00:51 lr 0.000192 wd 0.0500 time 0.4477 (0.4513) data time 0.0009 (0.0021) model time 0.4468 (0.4490) loss 2.9283 (2.5915) grad_norm 2.1622 (2.8426) loss_scale 256.0000 (140.2740) mem 16699MB [2024-08-11 06:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][520/625] eta 0:00:47 lr 0.000192 wd 0.0500 time 0.4472 (0.4512) data time 0.0009 (0.0021) model time 0.4463 (0.4489) loss 3.1636 (2.5955) grad_norm 1.9642 (2.8266) loss_scale 256.0000 (142.4952) mem 16699MB [2024-08-11 06:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][530/625] eta 0:00:42 lr 0.000191 wd 0.0500 time 0.4473 (0.4511) data time 0.0007 (0.0021) model time 0.4466 (0.4489) loss 2.9563 (2.5971) grad_norm 2.3313 (2.8155) loss_scale 256.0000 (144.6328) mem 16699MB [2024-08-11 06:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][540/625] eta 0:00:38 lr 0.000191 wd 0.0500 time 0.4466 (0.4510) data time 0.0008 (0.0021) model time 0.4457 (0.4488) loss 2.7878 (2.5980) grad_norm 1.7087 (2.8085) loss_scale 256.0000 (146.6913) mem 16699MB [2024-08-11 06:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][550/625] eta 0:00:33 lr 0.000191 wd 0.0500 time 0.4452 
(0.4510) data time 0.0006 (0.0020) model time 0.4446 (0.4488) loss 2.2208 (2.5985) grad_norm 2.1989 (2.8012) loss_scale 256.0000 (148.6751) mem 16699MB [2024-08-11 06:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][560/625] eta 0:00:29 lr 0.000191 wd 0.0500 time 0.4434 (0.4509) data time 0.0006 (0.0020) model time 0.4428 (0.4487) loss 1.7300 (2.5964) grad_norm 1.8144 (2.7858) loss_scale 256.0000 (150.5882) mem 16699MB [2024-08-11 06:52:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][570/625] eta 0:00:24 lr 0.000191 wd 0.0500 time 0.4530 (0.4509) data time 0.0007 (0.0020) model time 0.4523 (0.4487) loss 2.6152 (2.6002) grad_norm 4.3578 (2.8074) loss_scale 256.0000 (152.4343) mem 16699MB [2024-08-11 06:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][580/625] eta 0:00:20 lr 0.000191 wd 0.0500 time 0.4484 (0.4508) data time 0.0008 (0.0020) model time 0.4475 (0.4487) loss 2.6559 (2.6003) grad_norm 2.1060 (2.7978) loss_scale 256.0000 (154.2169) mem 16699MB [2024-08-11 06:53:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][590/625] eta 0:00:15 lr 0.000191 wd 0.0500 time 0.4485 (0.4508) data time 0.0006 (0.0020) model time 0.4479 (0.4487) loss 2.1039 (2.6022) grad_norm 1.8149 (2.7881) loss_scale 256.0000 (155.9391) mem 16699MB [2024-08-11 06:53:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][600/625] eta 0:00:11 lr 0.000191 wd 0.0500 time 0.4456 (0.4507) data time 0.0006 (0.0019) model time 0.4450 (0.4486) loss 3.2287 (2.6032) grad_norm 2.8917 (2.7894) loss_scale 256.0000 (157.6040) mem 16699MB [2024-08-11 06:53:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [228/300][610/625] eta 0:00:06 lr 0.000191 wd 0.0500 time 0.4411 (0.4507) data time 0.0006 (0.0019) model time 0.4405 (0.4486) loss 2.9032 (2.6053) grad_norm 2.1213 (2.7781) loss_scale 256.0000 (159.2144) mem 16699MB [2024-08-11 06:53:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [228/300][620/625] eta 0:00:02 lr 0.000191 wd 0.0500 time 0.4413 (0.4506) data time 0.0006 (0.0019) model time 0.4407 (0.4485) loss 2.4119 (2.6045) grad_norm 1.5623 (2.7655) loss_scale 256.0000 (160.7729) mem 16699MB [2024-08-11 06:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 228 training takes 0:04:41 [2024-08-11 06:53:22 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:53:24 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 06:53:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5103 (0.5103) Acc@1 88.916 (88.916) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 06:53:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8257 (0.6152) Acc@1 80.078 (86.728) Acc@5 96.094 (97.807) Mem 16699MB [2024-08-11 06:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.134) Loss 0.8921 (0.7278) Acc@1 79.395 (83.936) Acc@5 95.215 (96.756) Mem 16699MB [2024-08-11 06:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.637 Acc@5 96.673 [2024-08-11 06:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 06:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.810 (0.810) Loss 0.4795 (0.4795) Acc@1 89.600 (89.600) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 06:53:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.184) Loss 0.7744 (0.5907) Acc@1 81.445 (87.331) Acc@5 96.729 (97.954) Mem 16699MB [2024-08-11 06:53:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.8633 (0.6944) Acc@1 79.590 (84.626) Acc@5 95.947 (97.033) Mem 16699MB [2024-08-11 06:53:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.347 Acc@5 97.013 [2024-08-11 06:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 06:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][0/625] eta 0:13:15 lr 0.000191 wd 0.0500 time 1.2732 (1.2732) data time 0.4968 (0.4968) model time 0.0000 (0.0000) loss 3.1139 (3.1139) grad_norm 2.6130 (2.6130) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:53:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][10/625] eta 0:05:21 lr 0.000191 wd 0.0500 time 0.4445 (0.5221) data time 0.0009 (0.0459) model time 0.0000 (0.0000) loss 3.0041 (2.8126) grad_norm 1.9186 (2.5016) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][20/625] eta 0:04:54 lr 0.000191 wd 0.0500 time 0.4461 (0.4868) data time 0.0009 (0.0245) model time 0.0000 (0.0000) loss 2.6959 (2.6576) grad_norm 2.0471 (2.3498) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][30/625] eta 0:04:42 lr 0.000190 wd 0.0500 time 0.4523 (0.4743) data time 0.0008 (0.0168) model time 0.0000 (0.0000) loss 2.3025 (2.6233) grad_norm 2.0903 (2.4099) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][40/625] eta 0:04:33 lr 0.000190 wd 0.0500 time 0.4497 (0.4679) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 2.0355 (2.6409) grad_norm 2.7839 (2.3208) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:53:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][50/625] eta 0:04:26 lr 0.000190 wd 0.0500 time 0.4463 (0.4638) data time 0.0008 (0.0106) model time 0.0000 (0.0000) loss 2.9632 (2.6262) grad_norm 3.8111 (2.3534) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:53:59 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [229/300][60/625] eta 0:04:21 lr 0.000190 wd 0.0500 time 0.4450 (0.4633) data time 0.0008 (0.0090) model time 0.4441 (0.4595) loss 2.0724 (2.6321) grad_norm 2.5262 (2.3646) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][70/625] eta 0:04:15 lr 0.000190 wd 0.0500 time 0.4475 (0.4610) data time 0.0009 (0.0078) model time 0.4466 (0.4527) loss 2.6734 (2.6075) grad_norm 2.0941 (2.3568) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][80/625] eta 0:04:10 lr 0.000190 wd 0.0500 time 0.4560 (0.4594) data time 0.0006 (0.0070) model time 0.4554 (0.4510) loss 3.1175 (2.6066) grad_norm 1.6585 (2.3702) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][90/625] eta 0:04:05 lr 0.000190 wd 0.0500 time 0.4520 (0.4584) data time 0.0008 (0.0063) model time 0.4511 (0.4505) loss 2.7679 (2.5898) grad_norm 1.7023 (2.3511) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][100/625] eta 0:04:00 lr 0.000190 wd 0.0500 time 0.4558 (0.4575) data time 0.0007 (0.0058) model time 0.4550 (0.4501) loss 1.5402 (2.5720) grad_norm 2.7616 (2.3600) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][110/625] eta 0:03:55 lr 0.000190 wd 0.0500 time 0.4503 (0.4566) data time 0.0009 (0.0053) model time 0.4494 (0.4496) loss 2.6530 (2.5744) grad_norm 2.5240 (2.3674) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][120/625] eta 0:03:50 lr 0.000190 wd 0.0500 time 0.4446 (0.4559) data time 0.0006 (0.0049) model time 0.4440 (0.4492) loss 2.9110 (2.5695) grad_norm 2.9278 (2.4718) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][130/625] eta 0:03:45 lr 0.000190 wd 0.0500 time 0.4461 (0.4552) data time 0.0008 (0.0046) model time 0.4453 (0.4488) loss 3.2551 (2.5758) grad_norm 3.9487 (2.4678) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][140/625] eta 0:03:40 lr 0.000190 wd 0.0500 time 0.4507 (0.4547) data time 0.0008 (0.0044) model time 0.4499 (0.4486) loss 2.8839 (2.5806) grad_norm 1.9294 (2.5330) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][150/625] eta 0:03:36 lr 0.000190 wd 0.0500 time 0.4457 (0.4556) data time 0.0009 (0.0041) model time 0.4448 (0.4505) loss 3.0943 (2.5927) grad_norm 1.4147 (2.5214) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][160/625] eta 0:03:31 lr 0.000189 wd 0.0500 time 0.4477 (0.4551) data time 0.0006 (0.0039) model time 0.4470 (0.4502) loss 3.0099 (2.5898) grad_norm 2.1619 (2.5269) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][170/625] eta 0:03:26 lr 0.000189 wd 0.0500 time 0.4495 (0.4548) data time 0.0006 (0.0037) model time 0.4489 (0.4501) loss 1.6513 (2.5786) grad_norm 2.5368 (2.5642) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][180/625] eta 0:03:22 lr 0.000189 wd 0.0500 time 0.4438 (0.4545) data time 0.0009 (0.0036) model time 0.4429 (0.4499) loss 2.8193 (2.5745) grad_norm 2.3634 (2.5331) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][190/625] eta 0:03:17 lr 0.000189 wd 0.0500 time 0.4511 (0.4542) data time 0.0009 (0.0035) model time 
0.4502 (0.4498) loss 3.1149 (2.5805) grad_norm 2.6302 (2.5227) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][200/625] eta 0:03:12 lr 0.000189 wd 0.0500 time 0.4439 (0.4538) data time 0.0006 (0.0033) model time 0.4433 (0.4495) loss 3.1834 (2.5917) grad_norm 2.0803 (2.5039) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][210/625] eta 0:03:08 lr 0.000189 wd 0.0500 time 0.4462 (0.4535) data time 0.0009 (0.0032) model time 0.4453 (0.4494) loss 3.1445 (2.5934) grad_norm 1.9101 (2.4922) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][220/625] eta 0:03:03 lr 0.000189 wd 0.0500 time 0.4460 (0.4533) data time 0.0006 (0.0031) model time 0.4454 (0.4492) loss 2.4737 (2.5960) grad_norm 2.1096 (2.5428) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][230/625] eta 0:02:58 lr 0.000189 wd 0.0500 time 0.4513 (0.4530) data time 0.0007 (0.0030) model time 0.4507 (0.4491) loss 2.4113 (2.5913) grad_norm 9.0888 (2.5503) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][240/625] eta 0:02:54 lr 0.000189 wd 0.0500 time 0.4506 (0.4529) data time 0.0006 (0.0029) model time 0.4500 (0.4491) loss 2.8679 (2.5916) grad_norm 1.4988 (2.5545) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][250/625] eta 0:02:49 lr 0.000189 wd 0.0500 time 0.4496 (0.4527) data time 0.0009 (0.0028) model time 0.4487 (0.4491) loss 2.4580 (2.5937) grad_norm 2.0989 (2.5519) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][260/625] eta 0:02:45 lr 
0.000189 wd 0.0500 time 0.4456 (0.4526) data time 0.0008 (0.0028) model time 0.4448 (0.4490) loss 2.3546 (2.5878) grad_norm 1.9196 (2.5343) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][270/625] eta 0:02:40 lr 0.000189 wd 0.0500 time 0.4454 (0.4524) data time 0.0009 (0.0027) model time 0.4445 (0.4489) loss 2.9346 (2.5878) grad_norm 3.0365 (2.5440) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][280/625] eta 0:02:36 lr 0.000189 wd 0.0500 time 0.3859 (0.4527) data time 0.0008 (0.0026) model time 0.3851 (0.4494) loss 2.1262 (2.5917) grad_norm 2.4944 (2.6043) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][290/625] eta 0:02:31 lr 0.000189 wd 0.0500 time 0.4470 (0.4525) data time 0.0007 (0.0026) model time 0.4463 (0.4492) loss 2.0677 (2.5853) grad_norm 2.0203 (2.5981) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][300/625] eta 0:02:27 lr 0.000188 wd 0.0500 time 0.4499 (0.4523) data time 0.0008 (0.0025) model time 0.4491 (0.4491) loss 2.4333 (2.5853) grad_norm 1.6974 (2.5905) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][310/625] eta 0:02:22 lr 0.000188 wd 0.0500 time 0.4495 (0.4522) data time 0.0008 (0.0025) model time 0.4487 (0.4491) loss 2.8237 (2.5895) grad_norm 3.0552 (2.5888) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][320/625] eta 0:02:17 lr 0.000188 wd 0.0500 time 0.4461 (0.4521) data time 0.0006 (0.0024) model time 0.4455 (0.4490) loss 3.1284 (2.5948) grad_norm 1.7780 (2.5761) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:00 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [229/300][330/625] eta 0:02:13 lr 0.000188 wd 0.0500 time 0.4452 (0.4520) data time 0.0007 (0.0024) model time 0.4444 (0.4489) loss 1.6629 (2.5913) grad_norm 2.3826 (2.5654) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][340/625] eta 0:02:08 lr 0.000188 wd 0.0500 time 0.4443 (0.4518) data time 0.0009 (0.0023) model time 0.4434 (0.4489) loss 2.1601 (2.5882) grad_norm 2.0509 (2.5511) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][350/625] eta 0:02:04 lr 0.000188 wd 0.0500 time 0.4448 (0.4517) data time 0.0007 (0.0023) model time 0.4441 (0.4487) loss 2.5384 (2.5873) grad_norm 2.7225 (2.5524) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][360/625] eta 0:01:59 lr 0.000188 wd 0.0500 time 0.4435 (0.4515) data time 0.0008 (0.0022) model time 0.4427 (0.4486) loss 2.4404 (2.5902) grad_norm 3.1888 (2.6026) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][370/625] eta 0:01:55 lr 0.000188 wd 0.0500 time 0.4458 (0.4514) data time 0.0006 (0.0022) model time 0.4451 (0.4485) loss 1.5909 (2.5844) grad_norm 3.0486 (2.6084) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][380/625] eta 0:01:50 lr 0.000188 wd 0.0500 time 0.4481 (0.4513) data time 0.0006 (0.0022) model time 0.4475 (0.4485) loss 2.4121 (2.5854) grad_norm 1.8693 (2.6250) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][390/625] eta 0:01:46 lr 0.000188 wd 0.0500 time 0.4481 (0.4513) data time 0.0008 (0.0021) model time 0.4473 (0.4485) loss 3.1362 (2.5868) grad_norm 2.0383 (2.6683) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][400/625] eta 0:01:41 lr 0.000188 wd 0.0500 time 0.4478 (0.4512) data time 0.0010 (0.0021) model time 0.4468 (0.4485) loss 2.6287 (2.5888) grad_norm 2.3762 (2.6615) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][410/625] eta 0:01:36 lr 0.000188 wd 0.0500 time 0.4485 (0.4511) data time 0.0006 (0.0021) model time 0.4479 (0.4485) loss 2.3406 (2.5862) grad_norm 1.6881 (2.6530) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][420/625] eta 0:01:32 lr 0.000188 wd 0.0500 time 0.4457 (0.4510) data time 0.0006 (0.0020) model time 0.4451 (0.4484) loss 2.9997 (2.5902) grad_norm 2.5216 (2.6402) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][430/625] eta 0:01:28 lr 0.000187 wd 0.0500 time 0.4514 (0.4513) data time 0.0009 (0.0020) model time 0.4505 (0.4488) loss 2.7409 (2.5986) grad_norm 3.5431 (2.6288) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][440/625] eta 0:01:23 lr 0.000187 wd 0.0500 time 0.4477 (0.4513) data time 0.0008 (0.0020) model time 0.4468 (0.4488) loss 2.3650 (2.6009) grad_norm 2.3583 (2.6221) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][450/625] eta 0:01:18 lr 0.000187 wd 0.0500 time 0.4490 (0.4513) data time 0.0009 (0.0020) model time 0.4481 (0.4489) loss 2.5722 (2.6001) grad_norm 2.0848 (2.6121) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][460/625] eta 0:01:14 lr 0.000187 wd 0.0500 time 0.4524 (0.4513) data time 0.0008 (0.0019) model time 
0.4516 (0.4489) loss 2.3847 (2.5990) grad_norm 4.9918 (2.6068) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][470/625] eta 0:01:09 lr 0.000187 wd 0.0500 time 0.4489 (0.4513) data time 0.0007 (0.0019) model time 0.4482 (0.4489) loss 2.6344 (2.6013) grad_norm 2.2578 (2.6170) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][480/625] eta 0:01:05 lr 0.000187 wd 0.0500 time 0.4450 (0.4517) data time 0.0009 (0.0019) model time 0.4442 (0.4493) loss 2.9030 (2.5985) grad_norm 3.0595 (2.6181) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][490/625] eta 0:01:00 lr 0.000187 wd 0.0500 time 0.4468 (0.4516) data time 0.0009 (0.0019) model time 0.4459 (0.4493) loss 2.3833 (2.5980) grad_norm 4.1989 (2.6257) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][500/625] eta 0:00:56 lr 0.000187 wd 0.0500 time 0.4515 (0.4515) data time 0.0008 (0.0019) model time 0.4507 (0.4492) loss 3.0011 (2.6004) grad_norm 1.9125 (2.6209) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][510/625] eta 0:00:51 lr 0.000187 wd 0.0500 time 0.4421 (0.4514) data time 0.0007 (0.0018) model time 0.4414 (0.4492) loss 3.2009 (2.6056) grad_norm 2.3996 (2.6145) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][520/625] eta 0:00:47 lr 0.000187 wd 0.0500 time 0.4509 (0.4513) data time 0.0009 (0.0018) model time 0.4499 (0.4491) loss 1.5616 (2.6037) grad_norm 3.3094 (2.6103) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][530/625] eta 0:00:42 lr 
0.000187 wd 0.0500 time 0.4482 (0.4513) data time 0.0007 (0.0018) model time 0.4476 (0.4491) loss 2.2960 (2.6034) grad_norm 1.9631 (2.6018) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][540/625] eta 0:00:38 lr 0.000187 wd 0.0500 time 0.4463 (0.4512) data time 0.0008 (0.0018) model time 0.4455 (0.4490) loss 1.8713 (2.6041) grad_norm 3.2056 (2.6107) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][550/625] eta 0:00:33 lr 0.000187 wd 0.0500 time 0.4491 (0.4511) data time 0.0006 (0.0018) model time 0.4485 (0.4490) loss 1.6317 (2.6071) grad_norm 2.1483 (2.6077) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][560/625] eta 0:00:29 lr 0.000186 wd 0.0500 time 0.4465 (0.4511) data time 0.0006 (0.0017) model time 0.4459 (0.4489) loss 2.2877 (2.6116) grad_norm 3.4852 (2.6289) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][570/625] eta 0:00:24 lr 0.000186 wd 0.0500 time 0.4454 (0.4510) data time 0.0009 (0.0017) model time 0.4445 (0.4489) loss 3.2937 (2.6145) grad_norm 2.8652 (2.6254) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][580/625] eta 0:00:20 lr 0.000186 wd 0.0500 time 0.4491 (0.4509) data time 0.0007 (0.0017) model time 0.4483 (0.4488) loss 3.6131 (2.6194) grad_norm 3.0094 (2.6209) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][590/625] eta 0:00:15 lr 0.000186 wd 0.0500 time 0.4458 (0.4508) data time 0.0009 (0.0017) model time 0.4449 (0.4487) loss 2.6482 (2.6176) grad_norm 2.6042 (2.6236) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:58:02 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [229/300][600/625] eta 0:00:11 lr 0.000186 wd 0.0500 time 0.4463 (0.4508) data time 0.0008 (0.0017) model time 0.4455 (0.4487) loss 2.1927 (2.6213) grad_norm 2.3549 (2.6324) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:58:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][610/625] eta 0:00:06 lr 0.000186 wd 0.0500 time 0.4458 (0.4507) data time 0.0006 (0.0017) model time 0.4452 (0.4487) loss 2.7897 (2.6175) grad_norm 2.0288 (2.6271) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [229/300][620/625] eta 0:00:02 lr 0.000186 wd 0.0500 time 0.4456 (0.4506) data time 0.0004 (0.0017) model time 0.4452 (0.4486) loss 2.9382 (2.6193) grad_norm 1.7671 (2.6216) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 229 training takes 0:04:41 [2024-08-11 06:58:12 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 06:58:14 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 06:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5151 (0.5151) Acc@1 88.818 (88.818) Acc@5 98.779 (98.779) Mem 16699MB
[2024-08-11 06:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8096 (0.6222) Acc@1 80.908 (86.586) Acc@5 95.850 (97.763) Mem 16699MB
[2024-08-11 06:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9385 (0.7363) Acc@1 78.955 (83.773) Acc@5 95.117 (96.696) Mem 16699MB
[2024-08-11 06:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.517 Acc@5 96.633
[2024-08-11 06:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5%
[2024-08-11 06:58:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.839 (0.839) Loss 0.4802 (0.4802) Acc@1 89.502 (89.502) Acc@5 98.877 (98.877) Mem 16699MB
[2024-08-11 06:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.187) Loss 0.7734 (0.5909) Acc@1 81.396 (87.300) Acc@5 96.680 (97.949) Mem 16699MB
[2024-08-11 06:58:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.153) Loss 0.8647 (0.6951) Acc@1 79.639 (84.575) Acc@5 95.996 (97.021) Mem 16699MB
[2024-08-11 06:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.301 Acc@5 96.993
[2024-08-11 06:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3%
[2024-08-11 06:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][0/625] eta 0:13:06 lr 0.000186 wd 0.0500 time 1.2587 (1.2587) data time 0.4773 (0.4773) model time 0.0000 (0.0000) loss 3.2027 (3.2027) grad_norm 2.4655 (2.4655) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][10/625] eta 0:05:20 lr 0.000186 wd 0.0500 time 0.4442 (0.5211) data time 0.0009 (0.0442) model time 0.0000 (0.0000) loss 2.6705 (2.6510) grad_norm 1.7419 (2.2485) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][20/625] eta 0:04:53 lr 0.000186 wd 0.0500 time 0.4459 (0.4858) data time 0.0006 (0.0235) model time 0.0000 (0.0000) loss 3.1753 (2.6580) grad_norm 2.0909 (2.3411) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][30/625] eta 0:04:41 lr 0.000186 wd 0.0500 time 0.4473 (0.4733) data time 0.0008 (0.0162) model time 0.0000 (0.0000) loss 1.9899 (2.6004) grad_norm 2.4159 (2.4662) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][40/625] eta 0:04:33 lr 0.000186 wd 0.0500 time 0.4684 (0.4675) data time 0.0009 (0.0125) model time 0.0000 (0.0000) loss 2.9667 (2.5655) grad_norm 2.9227 (2.6121) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][50/625] eta 0:04:26 lr 0.000186 wd 0.0500 time 0.4484 (0.4637) data time 0.0006 (0.0102) model time 0.0000 (0.0000) loss 2.6546 (2.5711) grad_norm 1.9972 (2.5907) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][60/625] eta 0:04:21 lr 0.000186 wd 0.0500 time 0.4476 (0.4636) data time 0.0007 (0.0087) model time 0.4469 (0.4623) loss 2.9267 (2.5824) grad_norm 2.1435 (2.5299) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][70/625] eta 0:04:15 lr 0.000185 wd 0.0500 time 0.4448 (0.4612) data time 0.0006 (0.0076) model time 0.4442 (0.4541) loss 2.1090 (2.5683) grad_norm 2.1952 (2.5212) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:58:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][80/625] eta 0:04:12 lr 0.000185 wd 0.0500 time 0.4476 (0.4639) data time 0.0006 (0.0067) model time 0.4470 (0.4634) loss 2.5008 (2.5888) grad_norm 1.9684 (2.5306) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][90/625] eta 0:04:07 lr 0.000185 wd 0.0500 time 0.4484 (0.4621) data time 0.0006 (0.0061) model time 0.4478 (0.4591) loss 1.9119 (2.5680) grad_norm 1.6212 (2.4913) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][100/625] eta 0:04:01 lr 0.000185 wd 0.0500 time 0.4498 (0.4606) data time 0.0008 (0.0056) model time 0.4489 (0.4564) loss 2.2949 (2.5558) grad_norm 2.2015 (2.4533) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][110/625] eta 0:03:56 lr 0.000185 wd 0.0500 time 0.4473 (0.4593) data time 0.0007 (0.0052) model time 0.4467 (0.4547) loss 1.7807 (2.5288) grad_norm 2.4936 (2.4083) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][120/625] eta 0:03:51 lr 0.000185 wd 0.0500 time 0.4452 (0.4584) data time 0.0008 (0.0048) model time 0.4444 (0.4536) loss 2.7650 (2.5363) grad_norm 1.7460 (2.3617) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][130/625] eta 0:03:46 lr 0.000185 wd 0.0500 time 0.4491 (0.4575) data time 0.0008 (0.0045) model time 0.4482 (0.4527) loss 1.8694 (2.5189) grad_norm 2.9341 (2.6856) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][140/625] eta 0:03:41 lr 0.000185 wd 0.0500 time 0.4454 (0.4568) data time 0.0008 (0.0043) model time 0.4446 (0.4519) loss 3.2202 (2.5269) grad_norm 2.3032 (2.7202) loss_scale 256.0000 (256.0000) mem 16699MB
[2024-08-11 06:59:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][150/625] eta 0:03:36 lr 0.000185 wd 0.0500 time 0.4454 (0.4561) data time 0.0006 (0.0040) model time 0.4448 (0.4513) loss 2.9218 (2.5314) grad_norm 1.7715 (2.6955) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:59:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][160/625] eta 0:03:31 lr 0.000185 wd 0.0500 time 0.4447 (0.4554) data time 0.0010 (0.0038) model time 0.4438 (0.4507) loss 2.9577 (2.5337) grad_norm 3.1959 (2.6857) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:59:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][170/625] eta 0:03:26 lr 0.000185 wd 0.0500 time 0.4458 (0.4549) data time 0.0009 (0.0037) model time 0.4449 (0.4502) loss 2.5543 (2.5327) grad_norm 1.8009 (2.6555) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:59:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][180/625] eta 0:03:22 lr 0.000185 wd 0.0500 time 0.4465 (0.4544) data time 0.0008 (0.0035) model time 0.4457 (0.4499) loss 2.9986 (2.5523) grad_norm 1.7077 (2.6318) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:59:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][190/625] eta 0:03:17 lr 0.000185 wd 0.0500 time 0.4481 (0.4541) data time 0.0006 (0.0034) model time 0.4475 (0.4497) loss 1.9686 (2.5491) grad_norm 1.5431 (2.5959) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:59:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][200/625] eta 0:03:12 lr 0.000184 wd 0.0500 time 0.4479 (0.4538) data time 0.0006 (0.0032) model time 0.4473 (0.4495) loss 2.1621 (2.5420) grad_norm 2.1795 (2.5893) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 06:59:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][210/625] eta 0:03:08 lr 0.000184 wd 0.0500 time 0.4456 (0.4535) data time 0.0009 (0.0031) model time 0.4448 (0.4494) loss 2.7200 
(2.5445) grad_norm 2.9673 (2.5670) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][220/625] eta 0:03:03 lr 0.000184 wd 0.0500 time 0.4472 (0.4532) data time 0.0009 (0.0030) model time 0.4463 (0.4491) loss 2.7016 (2.5539) grad_norm 1.6387 (2.5402) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][230/625] eta 0:02:58 lr 0.000184 wd 0.0500 time 0.4515 (0.4529) data time 0.0010 (0.0029) model time 0.4505 (0.4490) loss 2.8384 (2.5471) grad_norm 3.4065 (2.5372) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][240/625] eta 0:02:54 lr 0.000184 wd 0.0500 time 0.4498 (0.4526) data time 0.0006 (0.0028) model time 0.4492 (0.4488) loss 3.0769 (2.5545) grad_norm 2.7991 (2.5450) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][250/625] eta 0:02:49 lr 0.000184 wd 0.0500 time 0.4492 (0.4524) data time 0.0007 (0.0028) model time 0.4485 (0.4486) loss 2.4499 (2.5549) grad_norm 3.3447 (2.6511) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][260/625] eta 0:02:45 lr 0.000184 wd 0.0500 time 0.4471 (0.4522) data time 0.0008 (0.0027) model time 0.4463 (0.4485) loss 2.2192 (2.5607) grad_norm 2.7251 (2.6626) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][270/625] eta 0:02:40 lr 0.000184 wd 0.0500 time 0.4495 (0.4533) data time 0.0006 (0.0026) model time 0.4489 (0.4501) loss 1.8382 (2.5565) grad_norm 1.5938 (2.6512) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][280/625] eta 0:02:36 lr 0.000184 wd 0.0500 time 0.4461 
(0.4536) data time 0.0008 (0.0026) model time 0.4453 (0.4505) loss 2.7662 (2.5619) grad_norm 1.8078 (2.6379) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][290/625] eta 0:02:31 lr 0.000184 wd 0.0500 time 0.4522 (0.4536) data time 0.0009 (0.0025) model time 0.4513 (0.4506) loss 1.9349 (2.5555) grad_norm 1.9181 (2.6256) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][300/625] eta 0:02:27 lr 0.000184 wd 0.0500 time 0.4475 (0.4537) data time 0.0009 (0.0024) model time 0.4466 (0.4508) loss 2.8858 (2.5589) grad_norm 1.8683 (2.6110) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][310/625] eta 0:02:22 lr 0.000184 wd 0.0500 time 0.4515 (0.4535) data time 0.0007 (0.0024) model time 0.4508 (0.4506) loss 2.1736 (2.5560) grad_norm 12.4835 (2.6290) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][320/625] eta 0:02:18 lr 0.000184 wd 0.0500 time 0.4676 (0.4533) data time 0.0006 (0.0023) model time 0.4670 (0.4505) loss 1.8059 (2.5550) grad_norm 2.5034 (2.6401) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][330/625] eta 0:02:13 lr 0.000183 wd 0.0500 time 0.4647 (0.4532) data time 0.0007 (0.0023) model time 0.4640 (0.4505) loss 2.3421 (2.5515) grad_norm 3.2339 (2.6477) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:00:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][340/625] eta 0:02:09 lr 0.000183 wd 0.0500 time 0.4486 (0.4531) data time 0.0008 (0.0023) model time 0.4478 (0.4504) loss 2.8292 (2.5560) grad_norm 2.1219 (2.6469) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [230/300][350/625] eta 0:02:04 lr 0.000183 wd 0.0500 time 0.4575 (0.4530) data time 0.0009 (0.0022) model time 0.4566 (0.4503) loss 3.1831 (2.5574) grad_norm 3.9817 (2.6616) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][360/625] eta 0:02:00 lr 0.000183 wd 0.0500 time 0.4548 (0.4530) data time 0.0007 (0.0022) model time 0.4541 (0.4504) loss 2.8155 (2.5636) grad_norm 2.3076 (2.6551) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][370/625] eta 0:01:55 lr 0.000183 wd 0.0500 time 0.4477 (0.4529) data time 0.0008 (0.0022) model time 0.4469 (0.4503) loss 2.6901 (2.5650) grad_norm 1.7459 (2.6654) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][380/625] eta 0:01:50 lr 0.000183 wd 0.0500 time 0.4477 (0.4528) data time 0.0006 (0.0021) model time 0.4471 (0.4502) loss 1.6333 (2.5624) grad_norm 1.2979 (2.6728) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][390/625] eta 0:01:46 lr 0.000183 wd 0.0500 time 0.4474 (0.4527) data time 0.0007 (0.0021) model time 0.4467 (0.4502) loss 2.8518 (2.5591) grad_norm 1.9671 (2.6633) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][400/625] eta 0:01:41 lr 0.000183 wd 0.0500 time 0.4480 (0.4526) data time 0.0006 (0.0021) model time 0.4474 (0.4502) loss 2.5283 (2.5611) grad_norm 1.6187 (2.6484) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][410/625] eta 0:01:37 lr 0.000183 wd 0.0500 time 0.4506 (0.4526) data time 0.0006 (0.0020) model time 0.4500 (0.4501) loss 2.0923 (2.5588) grad_norm 1.9008 (2.6425) loss_scale 256.0000 (256.0000) mem 
16699MB [2024-08-11 07:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][420/625] eta 0:01:32 lr 0.000183 wd 0.0500 time 0.6538 (0.4530) data time 0.0008 (0.0020) model time 0.6530 (0.4506) loss 2.7262 (2.5615) grad_norm 2.3209 (2.6999) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][430/625] eta 0:01:28 lr 0.000183 wd 0.0500 time 0.4489 (0.4527) data time 0.0008 (0.0020) model time 0.4480 (0.4504) loss 1.7072 (2.5600) grad_norm 3.8772 (2.6998) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][440/625] eta 0:01:23 lr 0.000183 wd 0.0500 time 0.4446 (0.4526) data time 0.0007 (0.0019) model time 0.4440 (0.4503) loss 3.1153 (2.5637) grad_norm 2.1655 (2.6923) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][450/625] eta 0:01:19 lr 0.000183 wd 0.0500 time 0.4488 (0.4526) data time 0.0008 (0.0019) model time 0.4480 (0.4503) loss 2.5237 (2.5685) grad_norm 2.1576 (2.6812) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][460/625] eta 0:01:14 lr 0.000183 wd 0.0500 time 0.4461 (0.4525) data time 0.0008 (0.0019) model time 0.4453 (0.4502) loss 2.3522 (2.5688) grad_norm 1.8502 (2.6692) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][470/625] eta 0:01:10 lr 0.000182 wd 0.0500 time 0.4446 (0.4524) data time 0.0006 (0.0019) model time 0.4440 (0.4502) loss 2.5368 (2.5717) grad_norm 1.8218 (2.6587) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][480/625] eta 0:01:05 lr 0.000182 wd 0.0500 time 0.4490 (0.4524) data time 0.0008 (0.0018) model time 0.4481 (0.4502) loss 
3.2924 (2.5792) grad_norm 3.0023 (2.6505) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][490/625] eta 0:01:01 lr 0.000182 wd 0.0500 time 0.4498 (0.4528) data time 0.0006 (0.0018) model time 0.4492 (0.4507) loss 2.3597 (2.5814) grad_norm 1.6863 (2.6446) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][500/625] eta 0:00:56 lr 0.000182 wd 0.0500 time 0.4503 (0.4528) data time 0.0006 (0.0018) model time 0.4497 (0.4507) loss 2.0526 (2.5820) grad_norm 2.9545 (2.6346) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][510/625] eta 0:00:52 lr 0.000182 wd 0.0500 time 0.4465 (0.4527) data time 0.0006 (0.0018) model time 0.4459 (0.4506) loss 2.6865 (2.5853) grad_norm 1.9038 (2.6420) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][520/625] eta 0:00:47 lr 0.000182 wd 0.0500 time 0.4500 (0.4526) data time 0.0009 (0.0018) model time 0.4491 (0.4506) loss 2.8817 (2.5835) grad_norm 1.5045 (2.6261) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][530/625] eta 0:00:42 lr 0.000182 wd 0.0500 time 0.4527 (0.4526) data time 0.0006 (0.0018) model time 0.4521 (0.4505) loss 2.6265 (2.5826) grad_norm 2.1394 (2.6317) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][540/625] eta 0:00:38 lr 0.000182 wd 0.0500 time 0.4463 (0.4525) data time 0.0006 (0.0017) model time 0.4456 (0.4505) loss 2.7710 (2.5874) grad_norm 4.7086 (2.6633) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][550/625] eta 0:00:33 lr 0.000182 wd 0.0500 time 
0.4505 (0.4524) data time 0.0006 (0.0017) model time 0.4498 (0.4504) loss 2.5997 (2.5879) grad_norm 1.7820 (2.6590) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][560/625] eta 0:00:29 lr 0.000182 wd 0.0500 time 0.4492 (0.4524) data time 0.0006 (0.0017) model time 0.4486 (0.4504) loss 2.9359 (2.5899) grad_norm 1.9584 (2.6497) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][570/625] eta 0:00:24 lr 0.000182 wd 0.0500 time 0.4456 (0.4523) data time 0.0006 (0.0017) model time 0.4450 (0.4504) loss 1.7346 (2.5871) grad_norm 2.4154 (2.6403) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][580/625] eta 0:00:20 lr 0.000182 wd 0.0500 time 0.4456 (0.4523) data time 0.0006 (0.0017) model time 0.4450 (0.4504) loss 1.4655 (2.5831) grad_norm 2.0815 (2.6327) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][590/625] eta 0:00:15 lr 0.000182 wd 0.0500 time 0.4446 (0.4523) data time 0.0008 (0.0017) model time 0.4438 (0.4503) loss 3.0898 (2.5841) grad_norm 2.7399 (2.6258) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][600/625] eta 0:00:11 lr 0.000181 wd 0.0500 time 0.4461 (0.4522) data time 0.0007 (0.0016) model time 0.4455 (0.4503) loss 2.5966 (2.5836) grad_norm 2.2654 (2.6157) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [230/300][610/625] eta 0:00:06 lr 0.000181 wd 0.0500 time 0.4415 (0.4521) data time 0.0004 (0.0016) model time 0.4410 (0.4502) loss 2.6550 (2.5832) grad_norm 2.3134 (2.6078) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [230/300][620/625] eta 0:00:02 lr 0.000181 wd 0.0500 time 0.4495 (0.4520) data time 0.0004 (0.0016) model time 0.4491 (0.4502) loss 3.2797 (2.5857) grad_norm 2.8185 (2.6584) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 230 training takes 0:04:42 [2024-08-11 07:03:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:03:05 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:03:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5117 (0.5117) Acc@1 89.209 (89.209) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 07:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 0.8062 (0.6157) Acc@1 81.201 (86.856) Acc@5 96.191 (97.825) Mem 16699MB [2024-08-11 07:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9067 (0.7330) Acc@1 78.906 (83.824) Acc@5 95.312 (96.735) Mem 16699MB [2024-08-11 07:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.565 Acc@5 96.675 [2024-08-11 07:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 07:03:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.912 (0.912) Loss 0.4810 (0.4810) Acc@1 89.404 (89.404) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.196) Loss 0.7749 (0.5914) Acc@1 81.396 (87.305) Acc@5 96.582 (97.945) Mem 16699MB [2024-08-11 07:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.158) Loss 0.8643 (0.6957) Acc@1 79.834 (84.619) Acc@5 95.898 (97.017) Mem 16699MB [2024-08-11 07:03:12 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.341 Acc@5 96.985 [2024-08-11 07:03:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:03:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][0/625] eta 0:14:32 lr 0.000181 wd 0.0500 time 1.3954 (1.3954) data time 0.6593 (0.6593) model time 0.0000 (0.0000) loss 1.6702 (1.6702) grad_norm 3.1278 (3.1278) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][10/625] eta 0:05:40 lr 0.000181 wd 0.0500 time 0.4475 (0.5540) data time 0.0009 (0.0607) model time 0.0000 (0.0000) loss 2.4580 (2.7187) grad_norm 1.6389 (2.4964) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][20/625] eta 0:05:04 lr 0.000181 wd 0.0500 time 0.4473 (0.5039) data time 0.0008 (0.0322) model time 0.0000 (0.0000) loss 2.6993 (2.5948) grad_norm 1.3527 (2.3666) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][30/625] eta 0:04:49 lr 0.000181 wd 0.0500 time 0.4483 (0.4860) data time 0.0009 (0.0221) model time 0.0000 (0.0000) loss 2.1303 (2.6263) grad_norm 2.4882 (2.3313) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][40/625] eta 0:04:39 lr 0.000181 wd 0.0500 time 0.4485 (0.4771) data time 0.0009 (0.0169) model time 0.0000 (0.0000) loss 2.4922 (2.6437) grad_norm 2.5379 (2.2795) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][50/625] eta 0:04:31 lr 0.000181 wd 0.0500 time 0.4530 (0.4723) data time 0.0009 (0.0138) model time 0.0000 (0.0000) loss 2.5337 (2.6064) grad_norm 2.7090 (2.2388) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:40 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [231/300][60/625] eta 0:04:24 lr 0.000181 wd 0.0500 time 0.4523 (0.4688) data time 0.0008 (0.0117) model time 0.4514 (0.4497) loss 2.9811 (2.5897) grad_norm 2.1256 (2.2155) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][70/625] eta 0:04:18 lr 0.000181 wd 0.0500 time 0.4511 (0.4665) data time 0.0008 (0.0101) model time 0.4504 (0.4506) loss 2.5441 (2.5910) grad_norm 2.9858 (2.2290) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][80/625] eta 0:04:13 lr 0.000181 wd 0.0500 time 0.4503 (0.4645) data time 0.0009 (0.0090) model time 0.4494 (0.4504) loss 2.8855 (2.6026) grad_norm 1.7992 (2.2623) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][90/625] eta 0:04:07 lr 0.000181 wd 0.0500 time 0.4510 (0.4629) data time 0.0008 (0.0081) model time 0.4503 (0.4501) loss 2.3825 (2.5797) grad_norm 2.1437 (2.3212) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:03:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][100/625] eta 0:04:02 lr 0.000181 wd 0.0500 time 0.4546 (0.4615) data time 0.0007 (0.0074) model time 0.4539 (0.4496) loss 2.5910 (2.5781) grad_norm 2.5182 (2.3972) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][110/625] eta 0:03:57 lr 0.000180 wd 0.0500 time 0.4437 (0.4602) data time 0.0007 (0.0068) model time 0.4430 (0.4491) loss 3.0351 (2.5847) grad_norm 2.1231 (2.3988) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][120/625] eta 0:03:51 lr 0.000180 wd 0.0500 time 0.4494 (0.4593) data time 0.0009 (0.0063) model time 0.4485 (0.4490) loss 2.7591 (2.5757) grad_norm 2.1815 (2.3984) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][130/625] eta 0:03:46 lr 0.000180 wd 0.0500 time 0.4485 (0.4585) data time 0.0009 (0.0060) model time 0.4476 (0.4487) loss 2.5314 (2.5788) grad_norm 2.6760 (2.3890) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][140/625] eta 0:03:42 lr 0.000180 wd 0.0500 time 0.4479 (0.4579) data time 0.0008 (0.0056) model time 0.4470 (0.4487) loss 2.7367 (2.5722) grad_norm 3.3469 (2.5386) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][150/625] eta 0:03:37 lr 0.000180 wd 0.0500 time 0.4501 (0.4574) data time 0.0009 (0.0053) model time 0.4493 (0.4488) loss 2.7281 (2.5713) grad_norm 2.3579 (2.6423) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][160/625] eta 0:03:33 lr 0.000180 wd 0.0500 time 0.4507 (0.4583) data time 0.0008 (0.0050) model time 0.4499 (0.4508) loss 2.5548 (2.5694) grad_norm 2.1357 (2.6838) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][170/625] eta 0:03:28 lr 0.000180 wd 0.0500 time 0.4616 (0.4580) data time 0.0009 (0.0048) model time 0.4607 (0.4510) loss 3.1918 (2.5653) grad_norm 2.0614 (2.6918) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][180/625] eta 0:03:23 lr 0.000180 wd 0.0500 time 0.4502 (0.4576) data time 0.0008 (0.0046) model time 0.4494 (0.4508) loss 2.4904 (2.5759) grad_norm 1.7816 (2.6566) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][190/625] eta 0:03:18 lr 0.000180 wd 0.0500 time 0.4487 (0.4573) data time 0.0009 (0.0044) model time 
0.4478 (0.4509) loss 2.8234 (2.5734) grad_norm 2.1745 (2.6323) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][200/625] eta 0:03:14 lr 0.000180 wd 0.0500 time 0.4501 (0.4569) data time 0.0008 (0.0042) model time 0.4493 (0.4507) loss 2.1920 (2.5819) grad_norm 2.7676 (2.6427) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][210/625] eta 0:03:09 lr 0.000180 wd 0.0500 time 0.4490 (0.4566) data time 0.0006 (0.0040) model time 0.4484 (0.4506) loss 2.7562 (2.5865) grad_norm 3.2842 (2.6708) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][220/625] eta 0:03:04 lr 0.000180 wd 0.0500 time 0.4508 (0.4563) data time 0.0006 (0.0039) model time 0.4502 (0.4506) loss 2.3208 (2.5838) grad_norm 2.2284 (2.6678) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][230/625] eta 0:03:00 lr 0.000180 wd 0.0500 time 0.4518 (0.4560) data time 0.0006 (0.0038) model time 0.4512 (0.4505) loss 3.1978 (2.5851) grad_norm 2.4215 (2.6789) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][240/625] eta 0:02:55 lr 0.000180 wd 0.0500 time 0.4503 (0.4557) data time 0.0006 (0.0036) model time 0.4497 (0.4503) loss 2.9473 (2.5928) grad_norm 2.2315 (2.6660) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][250/625] eta 0:02:50 lr 0.000179 wd 0.0500 time 0.4464 (0.4554) data time 0.0009 (0.0035) model time 0.4455 (0.4502) loss 3.1230 (2.6020) grad_norm 2.5620 (2.6607) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][260/625] eta 0:02:46 lr 
0.000179 wd 0.0500 time 0.4481 (0.4551) data time 0.0009 (0.0034) model time 0.4472 (0.4501) loss 2.5032 (2.5974) grad_norm 1.9265 (2.6620) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][270/625] eta 0:02:41 lr 0.000179 wd 0.0500 time 0.4572 (0.4555) data time 0.0008 (0.0033) model time 0.4564 (0.4507) loss 2.5947 (2.6071) grad_norm 2.1518 (2.6405) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][280/625] eta 0:02:37 lr 0.000179 wd 0.0500 time 0.4479 (0.4553) data time 0.0008 (0.0032) model time 0.4471 (0.4507) loss 2.5952 (2.6051) grad_norm 1.8476 (2.6219) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][290/625] eta 0:02:32 lr 0.000179 wd 0.0500 time 0.4553 (0.4552) data time 0.0009 (0.0032) model time 0.4544 (0.4507) loss 2.9736 (2.6087) grad_norm 1.9889 (2.6144) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][300/625] eta 0:02:27 lr 0.000179 wd 0.0500 time 0.4528 (0.4551) data time 0.0006 (0.0031) model time 0.4522 (0.4507) loss 2.7569 (2.6111) grad_norm 2.2391 (2.6135) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][310/625] eta 0:02:23 lr 0.000179 wd 0.0500 time 0.4473 (0.4550) data time 0.0009 (0.0030) model time 0.4464 (0.4506) loss 3.1027 (2.6116) grad_norm 1.8911 (2.6391) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][320/625] eta 0:02:18 lr 0.000179 wd 0.0500 time 0.4479 (0.4548) data time 0.0008 (0.0029) model time 0.4472 (0.4506) loss 2.9986 (2.6096) grad_norm 2.0913 (2.6301) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:42 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [231/300][330/625] eta 0:02:14 lr 0.000179 wd 0.0500 time 0.4483 (0.4547) data time 0.0009 (0.0029) model time 0.4475 (0.4505) loss 1.6449 (2.6046) grad_norm 1.7074 (2.6216) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][340/625] eta 0:02:09 lr 0.000179 wd 0.0500 time 0.4443 (0.4545) data time 0.0009 (0.0028) model time 0.4434 (0.4505) loss 3.0575 (2.6124) grad_norm 2.1398 (2.6163) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][350/625] eta 0:02:04 lr 0.000179 wd 0.0500 time 0.4489 (0.4543) data time 0.0006 (0.0028) model time 0.4483 (0.4504) loss 3.1398 (2.6157) grad_norm 2.0930 (2.6709) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][360/625] eta 0:02:00 lr 0.000179 wd 0.0500 time 0.4499 (0.4541) data time 0.0008 (0.0027) model time 0.4491 (0.4502) loss 2.6301 (2.6183) grad_norm 2.3943 (2.6557) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][370/625] eta 0:01:55 lr 0.000179 wd 0.0500 time 0.4530 (0.4540) data time 0.0008 (0.0027) model time 0.4522 (0.4502) loss 2.4052 (2.6130) grad_norm 2.3603 (2.6630) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][380/625] eta 0:01:51 lr 0.000178 wd 0.0500 time 0.4489 (0.4539) data time 0.0006 (0.0026) model time 0.4483 (0.4502) loss 2.9281 (2.6165) grad_norm 9.2501 (2.6660) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][390/625] eta 0:01:46 lr 0.000178 wd 0.0500 time 0.4472 (0.4538) data time 0.0006 (0.0026) model time 0.4466 (0.4501) loss 2.4217 (2.6179) grad_norm 1.9657 (2.6588) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][400/625] eta 0:01:42 lr 0.000178 wd 0.0500 time 0.4465 (0.4536) data time 0.0008 (0.0025) model time 0.4457 (0.4500) loss 2.8005 (2.6166) grad_norm 2.7110 (2.6666) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][410/625] eta 0:01:37 lr 0.000178 wd 0.0500 time 0.4463 (0.4535) data time 0.0007 (0.0025) model time 0.4456 (0.4499) loss 2.7028 (2.6191) grad_norm 1.6204 (2.6529) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][420/625] eta 0:01:33 lr 0.000178 wd 0.0500 time 0.4562 (0.4538) data time 0.0007 (0.0024) model time 0.4555 (0.4503) loss 2.3066 (2.6195) grad_norm 2.2266 (2.6373) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][430/625] eta 0:01:28 lr 0.000178 wd 0.0500 time 0.4492 (0.4537) data time 0.0008 (0.0024) model time 0.4484 (0.4503) loss 2.2629 (2.6208) grad_norm 2.5865 (2.6476) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][440/625] eta 0:01:23 lr 0.000178 wd 0.0500 time 0.4571 (0.4538) data time 0.0009 (0.0024) model time 0.4561 (0.4505) loss 2.1982 (2.6167) grad_norm 2.5497 (2.6436) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][450/625] eta 0:01:19 lr 0.000178 wd 0.0500 time 0.4632 (0.4539) data time 0.0008 (0.0023) model time 0.4624 (0.4506) loss 1.6464 (2.6177) grad_norm 1.8923 (2.6332) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][460/625] eta 0:01:14 lr 0.000178 wd 0.0500 time 0.4454 (0.4537) data time 0.0006 (0.0023) model time 
0.4448 (0.4505) loss 1.6842 (2.6112) grad_norm 3.3324 (2.6225) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][470/625] eta 0:01:10 lr 0.000178 wd 0.0500 time 0.4480 (0.4536) data time 0.0009 (0.0023) model time 0.4471 (0.4505) loss 3.1039 (2.6157) grad_norm 6.6722 (2.6683) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][480/625] eta 0:01:05 lr 0.000178 wd 0.0500 time 0.4469 (0.4535) data time 0.0008 (0.0022) model time 0.4461 (0.4504) loss 2.5813 (2.6102) grad_norm 1.9080 (2.6733) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][490/625] eta 0:01:01 lr 0.000178 wd 0.0500 time 0.4490 (0.4539) data time 0.0009 (0.0022) model time 0.4480 (0.4509) loss 2.9814 (2.6119) grad_norm 2.3595 (2.6623) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][500/625] eta 0:00:56 lr 0.000178 wd 0.0500 time 0.4478 (0.4538) data time 0.0007 (0.0022) model time 0.4471 (0.4508) loss 2.5843 (2.6088) grad_norm 1.8763 (2.6518) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][510/625] eta 0:00:52 lr 0.000178 wd 0.0500 time 0.4458 (0.4538) data time 0.0009 (0.0022) model time 0.4450 (0.4508) loss 1.6097 (2.6063) grad_norm 2.3183 (2.6449) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][520/625] eta 0:00:47 lr 0.000177 wd 0.0500 time 0.4535 (0.4537) data time 0.0007 (0.0021) model time 0.4528 (0.4508) loss 2.3116 (2.6069) grad_norm 2.5102 (2.6437) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][530/625] eta 0:00:43 lr 
0.000177 wd 0.0500 time 0.4465 (0.4536) data time 0.0009 (0.0021) model time 0.4456 (0.4507) loss 2.3106 (2.6077) grad_norm 3.5652 (2.6367) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][540/625] eta 0:00:38 lr 0.000177 wd 0.0500 time 0.4495 (0.4535) data time 0.0008 (0.0021) model time 0.4487 (0.4507) loss 2.1872 (2.6071) grad_norm 2.0100 (2.6301) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][550/625] eta 0:00:34 lr 0.000177 wd 0.0500 time 0.4452 (0.4534) data time 0.0009 (0.0021) model time 0.4443 (0.4506) loss 2.8865 (2.6062) grad_norm 5.0956 (2.7013) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][560/625] eta 0:00:29 lr 0.000177 wd 0.0500 time 0.4545 (0.4534) data time 0.0006 (0.0020) model time 0.4538 (0.4506) loss 2.2759 (2.6023) grad_norm 2.1236 (2.6914) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][570/625] eta 0:00:24 lr 0.000177 wd 0.0500 time 0.4477 (0.4534) data time 0.0007 (0.0020) model time 0.4470 (0.4506) loss 1.8585 (2.5990) grad_norm 1.6291 (2.6899) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][580/625] eta 0:00:20 lr 0.000177 wd 0.0500 time 0.4449 (0.4533) data time 0.0008 (0.0020) model time 0.4441 (0.4506) loss 2.7975 (2.6019) grad_norm 1.7822 (2.7411) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:07:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][590/625] eta 0:00:15 lr 0.000177 wd 0.0500 time 0.4483 (0.4532) data time 0.0007 (0.0020) model time 0.4476 (0.4506) loss 2.5119 (2.6055) grad_norm 2.0657 (2.7342) loss_scale 512.0000 (257.7327) mem 16699MB [2024-08-11 07:07:44 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [231/300][600/625] eta 0:00:11 lr 0.000177 wd 0.0500 time 0.4454 (0.4532) data time 0.0008 (0.0020) model time 0.4446 (0.4505) loss 2.3638 (2.6051) grad_norm 1.9264 (2.7264) loss_scale 512.0000 (261.9634) mem 16699MB [2024-08-11 07:07:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][610/625] eta 0:00:06 lr 0.000177 wd 0.0500 time 0.4453 (0.4531) data time 0.0006 (0.0019) model time 0.4447 (0.4505) loss 2.2655 (2.6054) grad_norm 2.2373 (2.7335) loss_scale 512.0000 (266.0556) mem 16699MB [2024-08-11 07:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [231/300][620/625] eta 0:00:02 lr 0.000177 wd 0.0500 time 0.4454 (0.4529) data time 0.0004 (0.0019) model time 0.4450 (0.4503) loss 2.9937 (2.6040) grad_norm 2.3139 (2.7275) loss_scale 512.0000 (270.0161) mem 16699MB [2024-08-11 07:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 231 training takes 0:04:43 [2024-08-11 07:07:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:07:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 07:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.499 (0.499) Loss 0.5220 (0.5220) Acc@1 88.379 (88.379) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:07:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.155) Loss 0.8281 (0.6317) Acc@1 80.371 (86.594) Acc@5 96.191 (97.754) Mem 16699MB [2024-08-11 07:07:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.9126 (0.7446) Acc@1 79.004 (83.708) Acc@5 95.312 (96.712) Mem 16699MB [2024-08-11 07:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.457 Acc@5 96.661 [2024-08-11 07:08:00 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-11 07:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.019 (1.019) Loss 0.4812 (0.4812) Acc@1 89.404 (89.404) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:08:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.204) Loss 0.7754 (0.5921) Acc@1 81.445 (87.331) Acc@5 96.582 (97.945) Mem 16699MB [2024-08-11 07:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.162) Loss 0.8657 (0.6965) Acc@1 79.883 (84.610) Acc@5 96.045 (97.015) Mem 16699MB [2024-08-11 07:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.329 Acc@5 96.967 [2024-08-11 07:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:08:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][0/625] eta 0:15:52 lr 0.000177 wd 0.0500 time 1.5244 (1.5244) data time 0.7011 (0.7011) model time 0.0000 (0.0000) loss 1.7758 (1.7758) grad_norm 2.4062 (2.4062) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][10/625] eta 0:05:36 lr 0.000177 wd 0.0500 time 0.4501 (0.5472) data 
time 0.0007 (0.0647) model time 0.0000 (0.0000) loss 3.2371 (2.5400) grad_norm 1.8157 (2.4000) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][20/625] eta 0:05:03 lr 0.000177 wd 0.0500 time 0.4464 (0.5011) data time 0.0007 (0.0343) model time 0.0000 (0.0000) loss 1.7894 (2.5831) grad_norm 2.2733 (2.2992) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][30/625] eta 0:04:48 lr 0.000176 wd 0.0500 time 0.4483 (0.4850) data time 0.0006 (0.0235) model time 0.0000 (0.0000) loss 3.0714 (2.5963) grad_norm 1.5984 (2.1548) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][40/625] eta 0:04:38 lr 0.000176 wd 0.0500 time 0.4431 (0.4760) data time 0.0009 (0.0180) model time 0.0000 (0.0000) loss 2.8125 (2.6350) grad_norm 2.8923 (2.2324) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][50/625] eta 0:04:31 lr 0.000176 wd 0.0500 time 0.4511 (0.4717) data time 0.0009 (0.0146) model time 0.0000 (0.0000) loss 2.5516 (2.6661) grad_norm 2.6170 (2.2437) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][60/625] eta 0:04:24 lr 0.000176 wd 0.0500 time 0.4507 (0.4681) data time 0.0006 (0.0124) model time 0.4501 (0.4486) loss 2.2349 (2.6206) grad_norm 1.6551 (2.2335) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][70/625] eta 0:04:18 lr 0.000176 wd 0.0500 time 0.4514 (0.4654) data time 0.0006 (0.0108) model time 0.4508 (0.4485) loss 3.2059 (2.5966) grad_norm 3.0068 (2.2351) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[232/300][80/625] eta 0:04:12 lr 0.000176 wd 0.0500 time 0.4510 (0.4635) data time 0.0006 (0.0095) model time 0.4503 (0.4487) loss 2.5844 (2.5617) grad_norm 2.0216 (2.2214) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][90/625] eta 0:04:08 lr 0.000176 wd 0.0500 time 0.4526 (0.4645) data time 0.0007 (0.0086) model time 0.4519 (0.4546) loss 2.2826 (2.5741) grad_norm 2.3174 (2.3679) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][100/625] eta 0:04:03 lr 0.000176 wd 0.0500 time 0.4499 (0.4634) data time 0.0009 (0.0078) model time 0.4490 (0.4541) loss 2.9356 (2.5547) grad_norm 2.4313 (2.3826) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][110/625] eta 0:03:57 lr 0.000176 wd 0.0500 time 0.4446 (0.4621) data time 0.0008 (0.0072) model time 0.4438 (0.4531) loss 2.4970 (2.5554) grad_norm 3.1259 (2.3593) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:08:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][120/625] eta 0:03:52 lr 0.000176 wd 0.0500 time 0.4478 (0.4609) data time 0.0006 (0.0066) model time 0.4472 (0.4522) loss 3.1407 (2.5620) grad_norm 1.6964 (2.3411) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][130/625] eta 0:03:47 lr 0.000176 wd 0.0500 time 0.4477 (0.4601) data time 0.0009 (0.0062) model time 0.4469 (0.4519) loss 2.5125 (2.5510) grad_norm 2.4178 (2.3453) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][140/625] eta 0:03:42 lr 0.000176 wd 0.0500 time 0.4473 (0.4593) data time 0.0007 (0.0058) model time 0.4465 (0.4515) loss 2.7197 (2.5548) grad_norm 2.1787 (2.3294) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-11 07:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][150/625] eta 0:03:37 lr 0.000176 wd 0.0500 time 0.4545 (0.4588) data time 0.0008 (0.0055) model time 0.4537 (0.4514) loss 2.6428 (2.5647) grad_norm 2.5352 (2.3352) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][160/625] eta 0:03:33 lr 0.000175 wd 0.0500 time 0.4531 (0.4583) data time 0.0008 (0.0052) model time 0.4523 (0.4513) loss 3.1246 (2.5660) grad_norm 2.6753 (2.4555) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][170/625] eta 0:03:28 lr 0.000175 wd 0.0500 time 0.4471 (0.4579) data time 0.0007 (0.0050) model time 0.4464 (0.4512) loss 3.1624 (2.5755) grad_norm 3.4923 (2.5502) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][180/625] eta 0:03:23 lr 0.000175 wd 0.0500 time 0.4540 (0.4576) data time 0.0006 (0.0047) model time 0.4534 (0.4512) loss 2.0915 (2.5800) grad_norm 2.4241 (2.5596) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][190/625] eta 0:03:18 lr 0.000175 wd 0.0500 time 0.4513 (0.4572) data time 0.0006 (0.0045) model time 0.4508 (0.4511) loss 2.8949 (2.5766) grad_norm 2.1620 (2.5572) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][200/625] eta 0:03:14 lr 0.000175 wd 0.0500 time 0.4551 (0.4568) data time 0.0009 (0.0043) model time 0.4542 (0.4509) loss 2.7175 (2.5742) grad_norm 2.1309 (2.5496) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][210/625] eta 0:03:09 lr 0.000175 wd 0.0500 time 0.4500 (0.4564) data time 0.0009 (0.0042) model time 0.4492 (0.4507) loss 3.0438 
(2.5814) grad_norm 3.5982 (2.5350) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][220/625] eta 0:03:04 lr 0.000175 wd 0.0500 time 0.4503 (0.4561) data time 0.0011 (0.0040) model time 0.4492 (0.4506) loss 2.8283 (2.5822) grad_norm 2.8183 (2.5394) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][230/625] eta 0:03:00 lr 0.000175 wd 0.0500 time 0.4498 (0.4559) data time 0.0008 (0.0039) model time 0.4490 (0.4506) loss 2.5112 (2.5697) grad_norm 3.1184 (2.5491) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][240/625] eta 0:02:55 lr 0.000175 wd 0.0500 time 0.4499 (0.4558) data time 0.0007 (0.0038) model time 0.4492 (0.4507) loss 3.0954 (2.5717) grad_norm 2.2479 (2.5531) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][250/625] eta 0:02:50 lr 0.000175 wd 0.0500 time 0.4511 (0.4556) data time 0.0008 (0.0036) model time 0.4503 (0.4507) loss 3.0224 (2.5749) grad_norm 1.8460 (2.5366) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][260/625] eta 0:02:46 lr 0.000175 wd 0.0500 time 0.4544 (0.4554) data time 0.0008 (0.0035) model time 0.4535 (0.4507) loss 2.1272 (2.5696) grad_norm 2.1826 (2.5193) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][270/625] eta 0:02:41 lr 0.000175 wd 0.0500 time 0.4506 (0.4557) data time 0.0009 (0.0034) model time 0.4498 (0.4512) loss 1.7776 (2.5659) grad_norm 1.4985 (2.4992) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][280/625] eta 0:02:37 lr 0.000175 wd 0.0500 time 0.4401 
(0.4561) data time 0.0006 (0.0034) model time 0.4395 (0.4519) loss 2.9753 (2.5664) grad_norm 2.0484 (2.4982) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][290/625] eta 0:02:32 lr 0.000175 wd 0.0500 time 0.4416 (0.4560) data time 0.0007 (0.0033) model time 0.4409 (0.4518) loss 3.3643 (2.5709) grad_norm 6.8111 (2.5114) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][300/625] eta 0:02:28 lr 0.000174 wd 0.0500 time 0.4508 (0.4559) data time 0.0008 (0.0032) model time 0.4500 (0.4518) loss 2.3395 (2.5622) grad_norm 1.7023 (2.5094) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][310/625] eta 0:02:23 lr 0.000174 wd 0.0500 time 0.4555 (0.4558) data time 0.0008 (0.0031) model time 0.4547 (0.4518) loss 2.9055 (2.5685) grad_norm 2.2691 (2.4899) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][320/625] eta 0:02:18 lr 0.000174 wd 0.0500 time 0.4381 (0.4556) data time 0.0007 (0.0031) model time 0.4374 (0.4516) loss 1.8764 (2.5700) grad_norm 1.7561 (2.4950) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][330/625] eta 0:02:14 lr 0.000174 wd 0.0500 time 0.4508 (0.4554) data time 0.0007 (0.0030) model time 0.4501 (0.4515) loss 1.5833 (2.5687) grad_norm 2.0862 (2.5003) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][340/625] eta 0:02:09 lr 0.000174 wd 0.0500 time 0.4471 (0.4552) data time 0.0008 (0.0029) model time 0.4462 (0.4514) loss 2.9810 (2.5687) grad_norm 2.5696 (2.5183) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [232/300][350/625] eta 0:02:05 lr 0.000174 wd 0.0500 time 0.4534 (0.4550) data time 0.0005 (0.0029) model time 0.4529 (0.4513) loss 2.7131 (2.5705) grad_norm 2.2168 (2.5119) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][360/625] eta 0:02:00 lr 0.000174 wd 0.0500 time 0.4508 (0.4549) data time 0.0009 (0.0028) model time 0.4499 (0.4513) loss 2.8585 (2.5698) grad_norm 2.6392 (2.5100) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][370/625] eta 0:01:55 lr 0.000174 wd 0.0500 time 0.4452 (0.4547) data time 0.0006 (0.0028) model time 0.4446 (0.4511) loss 2.3688 (2.5738) grad_norm 4.4751 (2.5220) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:10:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][380/625] eta 0:01:51 lr 0.000174 wd 0.0500 time 0.4525 (0.4545) data time 0.0008 (0.0027) model time 0.4517 (0.4510) loss 2.4503 (2.5687) grad_norm 2.3823 (2.5668) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][390/625] eta 0:01:46 lr 0.000174 wd 0.0500 time 0.4462 (0.4544) data time 0.0006 (0.0027) model time 0.4455 (0.4509) loss 2.2810 (2.5708) grad_norm 2.4804 (2.5606) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][400/625] eta 0:01:42 lr 0.000174 wd 0.0500 time 0.4461 (0.4542) data time 0.0009 (0.0026) model time 0.4452 (0.4508) loss 2.6437 (2.5741) grad_norm 2.4580 (2.5534) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][410/625] eta 0:01:37 lr 0.000174 wd 0.0500 time 0.6313 (0.4545) data time 0.0007 (0.0026) model time 0.6306 (0.4512) loss 2.9920 (2.5735) grad_norm 1.9979 (2.5362) loss_scale 512.0000 (512.0000) mem 16699MB 
[2024-08-11 07:11:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][420/625] eta 0:01:33 lr 0.000174 wd 0.0500 time 0.4452 (0.4541) data time 0.0006 (0.0025) model time 0.4446 (0.4508) loss 2.4758 (2.5728) grad_norm 2.5226 (2.5557) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][430/625] eta 0:01:28 lr 0.000174 wd 0.0500 time 0.4457 (0.4539) data time 0.0006 (0.0025) model time 0.4452 (0.4507) loss 3.1754 (2.5726) grad_norm 3.6091 (2.5465) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][440/625] eta 0:01:23 lr 0.000173 wd 0.0500 time 0.4461 (0.4538) data time 0.0010 (0.0025) model time 0.4451 (0.4505) loss 2.3066 (2.5761) grad_norm 3.2205 (2.5573) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][450/625] eta 0:01:19 lr 0.000173 wd 0.0500 time 0.4458 (0.4536) data time 0.0010 (0.0024) model time 0.4448 (0.4504) loss 1.9978 (2.5734) grad_norm 2.9024 (2.5569) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][460/625] eta 0:01:14 lr 0.000173 wd 0.0500 time 0.4468 (0.4535) data time 0.0007 (0.0024) model time 0.4461 (0.4503) loss 2.8712 (2.5775) grad_norm 2.6327 (2.5526) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][470/625] eta 0:01:10 lr 0.000173 wd 0.0500 time 0.4460 (0.4533) data time 0.0006 (0.0024) model time 0.4453 (0.4502) loss 2.4776 (2.5778) grad_norm 3.1342 (2.5581) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][480/625] eta 0:01:05 lr 0.000173 wd 0.0500 time 0.4465 (0.4532) data time 0.0006 (0.0023) model time 0.4459 (0.4501) loss 1.5592 
(2.5813) grad_norm 2.2030 (2.5503) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][490/625] eta 0:01:01 lr 0.000173 wd 0.0500 time 0.4436 (0.4531) data time 0.0008 (0.0023) model time 0.4428 (0.4500) loss 2.5833 (2.5782) grad_norm 1.6060 (2.5497) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][500/625] eta 0:00:56 lr 0.000173 wd 0.0500 time 0.4475 (0.4533) data time 0.0008 (0.0023) model time 0.4467 (0.4503) loss 2.6641 (2.5796) grad_norm 3.3668 (2.5522) loss_scale 512.0000 (512.0000) mem 16699MB [2024-08-11 07:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][510/625] eta 0:00:52 lr 0.000173 wd 0.0500 time 0.4451 (0.4531) data time 0.0009 (0.0023) model time 0.4443 (0.4502) loss 2.4394 (2.5770) grad_norm 1.8136 (inf) loss_scale 256.0000 (510.9980) mem 16699MB [2024-08-11 07:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][520/625] eta 0:00:47 lr 0.000173 wd 0.0500 time 0.4490 (0.4530) data time 0.0009 (0.0022) model time 0.4481 (0.4501) loss 2.7864 (2.5777) grad_norm 1.8580 (inf) loss_scale 256.0000 (506.1036) mem 16699MB [2024-08-11 07:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][530/625] eta 0:00:43 lr 0.000173 wd 0.0500 time 0.4478 (0.4530) data time 0.0006 (0.0022) model time 0.4472 (0.4501) loss 2.0792 (2.5783) grad_norm 3.0240 (inf) loss_scale 256.0000 (501.3936) mem 16699MB [2024-08-11 07:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][540/625] eta 0:00:38 lr 0.000173 wd 0.0500 time 0.4502 (0.4529) data time 0.0007 (0.0022) model time 0.4495 (0.4500) loss 2.9189 (2.5806) grad_norm 1.7395 (inf) loss_scale 256.0000 (496.8577) mem 16699MB [2024-08-11 07:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][550/625] eta 0:00:33 lr 0.000173 wd 0.0500 time 0.4463 (0.4528) data 
time 0.0006 (0.0021) model time 0.4457 (0.4500) loss 2.8160 (2.5819) grad_norm 2.4375 (inf) loss_scale 256.0000 (492.4864) mem 16699MB [2024-08-11 07:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][560/625] eta 0:00:29 lr 0.000173 wd 0.0500 time 0.4438 (0.4527) data time 0.0008 (0.0021) model time 0.4429 (0.4499) loss 1.6664 (2.5834) grad_norm 6.5979 (inf) loss_scale 256.0000 (488.2709) mem 16699MB [2024-08-11 07:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][570/625] eta 0:00:24 lr 0.000172 wd 0.0500 time 0.4501 (0.4526) data time 0.0006 (0.0021) model time 0.4495 (0.4499) loss 2.7244 (2.5830) grad_norm 3.7528 (inf) loss_scale 256.0000 (484.2032) mem 16699MB [2024-08-11 07:12:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][580/625] eta 0:00:20 lr 0.000172 wd 0.0500 time 0.4460 (0.4526) data time 0.0007 (0.0021) model time 0.4453 (0.4499) loss 2.9293 (2.5871) grad_norm 3.1479 (inf) loss_scale 256.0000 (480.2754) mem 16699MB [2024-08-11 07:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][590/625] eta 0:00:15 lr 0.000172 wd 0.0500 time 0.4482 (0.4525) data time 0.0007 (0.0021) model time 0.4475 (0.4498) loss 2.6375 (2.5856) grad_norm 2.2921 (inf) loss_scale 256.0000 (476.4805) mem 16699MB [2024-08-11 07:12:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][600/625] eta 0:00:11 lr 0.000172 wd 0.0500 time 0.4515 (0.4525) data time 0.0009 (0.0020) model time 0.4506 (0.4499) loss 1.7763 (2.5844) grad_norm 2.7552 (inf) loss_scale 256.0000 (472.8120) mem 16699MB [2024-08-11 07:12:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][610/625] eta 0:00:06 lr 0.000172 wd 0.0500 time 0.4424 (0.4525) data time 0.0004 (0.0020) model time 0.4420 (0.4498) loss 2.7227 (2.5833) grad_norm 1.9474 (inf) loss_scale 256.0000 (469.2635) mem 16699MB [2024-08-11 07:12:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [232/300][620/625] eta 
0:00:02 lr 0.000172 wd 0.0500 time 0.4447 (0.4524) data time 0.0006 (0.0020) model time 0.4441 (0.4497) loss 2.5841 (2.5840) grad_norm 3.5875 (inf) loss_scale 256.0000 (465.8293) mem 16699MB [2024-08-11 07:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 232 training takes 0:04:42 [2024-08-11 07:12:46 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:12:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:12:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5415 (0.5415) Acc@1 88.184 (88.184) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-11 07:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8198 (0.6279) Acc@1 80.908 (86.555) Acc@5 96.338 (97.772) Mem 16699MB [2024-08-11 07:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9243 (0.7414) Acc@1 78.662 (83.750) Acc@5 95.312 (96.696) Mem 16699MB [2024-08-11 07:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.495 Acc@5 96.671 [2024-08-11 07:12:51 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-11 07:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.776 (0.776) Loss 0.4822 (0.4822) Acc@1 89.355 (89.355) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:12:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.180) Loss 0.7773 (0.5925) Acc@1 81.201 (87.314) Acc@5 96.680 (97.949) Mem 16699MB [2024-08-11 07:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.149) Loss 0.8662 (0.6969) Acc@1 79.834 (84.598) Acc@5 96.045 (97.026) Mem 16699MB [2024-08-11 07:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.313 
Acc@5 96.975 [2024-08-11 07:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][0/625] eta 0:13:03 lr 0.000172 wd 0.0500 time 1.2531 (1.2531) data time 0.7947 (0.7947) model time 0.0000 (0.0000) loss 3.1192 (3.1192) grad_norm 1.7756 (1.7756) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][10/625] eta 0:05:19 lr 0.000172 wd 0.0500 time 0.4485 (0.5197) data time 0.0009 (0.0730) model time 0.0000 (0.0000) loss 2.4615 (2.7275) grad_norm 2.7389 (2.5886) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][20/625] eta 0:04:53 lr 0.000172 wd 0.0500 time 0.4475 (0.4848) data time 0.0008 (0.0386) model time 0.0000 (0.0000) loss 3.0696 (2.6883) grad_norm 2.3538 (2.5995) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][30/625] eta 0:04:44 lr 0.000172 wd 0.0500 time 0.4447 (0.4779) data time 0.0006 (0.0264) model time 0.0000 (0.0000) loss 2.8821 (2.7154) grad_norm 2.6255 (2.5586) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][40/625] eta 0:04:35 lr 0.000172 wd 0.0500 time 0.4518 (0.4710) data time 0.0008 (0.0201) model time 0.0000 (0.0000) loss 2.5614 (2.6812) grad_norm 2.3000 (2.5020) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][50/625] eta 0:04:28 lr 0.000172 wd 0.0500 time 0.4473 (0.4673) data time 0.0006 (0.0163) model time 0.0000 (0.0000) loss 2.0690 (2.6457) grad_norm 2.0578 (2.4283) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[233/300][60/625] eta 0:04:22 lr 0.000172 wd 0.0500 time 0.4437 (0.4641) data time 0.0006 (0.0138) model time 0.4430 (0.4465) loss 3.0675 (2.6915) grad_norm 2.4255 (2.4461) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][70/625] eta 0:04:16 lr 0.000172 wd 0.0500 time 0.4443 (0.4615) data time 0.0006 (0.0120) model time 0.4437 (0.4460) loss 3.2406 (2.7136) grad_norm 1.9355 (2.4267) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][80/625] eta 0:04:10 lr 0.000171 wd 0.0500 time 0.4458 (0.4596) data time 0.0009 (0.0106) model time 0.4450 (0.4457) loss 2.4371 (2.6986) grad_norm 1.5104 (2.4319) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][90/625] eta 0:04:05 lr 0.000171 wd 0.0500 time 0.4454 (0.4581) data time 0.0007 (0.0095) model time 0.4448 (0.4456) loss 3.1936 (2.7010) grad_norm 3.3143 (2.4641) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][100/625] eta 0:03:59 lr 0.000171 wd 0.0500 time 0.4468 (0.4571) data time 0.0006 (0.0087) model time 0.4463 (0.4457) loss 2.5494 (2.6684) grad_norm 2.5969 (2.4561) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][110/625] eta 0:03:55 lr 0.000171 wd 0.0500 time 0.4544 (0.4566) data time 0.0008 (0.0080) model time 0.4535 (0.4466) loss 2.9128 (2.6736) grad_norm 1.7233 (2.4601) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][120/625] eta 0:03:50 lr 0.000171 wd 0.0500 time 0.4423 (0.4560) data time 0.0006 (0.0074) model time 0.4417 (0.4469) loss 1.9125 (2.6590) grad_norm 9.5875 (2.5216) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-11 07:13:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][130/625] eta 0:03:45 lr 0.000171 wd 0.0500 time 0.4481 (0.4556) data time 0.0006 (0.0069) model time 0.4475 (0.4472) loss 3.1402 (2.6603) grad_norm 1.8615 (2.5343) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:13:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][140/625] eta 0:03:40 lr 0.000171 wd 0.0500 time 0.4450 (0.4549) data time 0.0009 (0.0064) model time 0.4441 (0.4471) loss 2.5759 (2.6451) grad_norm 3.3769 (2.5487) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][150/625] eta 0:03:35 lr 0.000171 wd 0.0500 time 0.4517 (0.4544) data time 0.0008 (0.0061) model time 0.4509 (0.4470) loss 2.7004 (2.6416) grad_norm 2.6988 (2.5318) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][160/625] eta 0:03:31 lr 0.000171 wd 0.0500 time 0.4446 (0.4539) data time 0.0008 (0.0058) model time 0.4439 (0.4468) loss 1.8032 (2.6297) grad_norm 2.2847 (2.5146) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][170/625] eta 0:03:26 lr 0.000171 wd 0.0500 time 0.4533 (0.4535) data time 0.0008 (0.0055) model time 0.4525 (0.4468) loss 2.5014 (2.6335) grad_norm 1.8693 (2.5088) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][180/625] eta 0:03:22 lr 0.000171 wd 0.0500 time 0.4504 (0.4544) data time 0.0008 (0.0052) model time 0.4495 (0.4485) loss 2.8606 (2.6362) grad_norm 3.6485 (2.4953) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][190/625] eta 0:03:17 lr 0.000171 wd 0.0500 time 0.4576 (0.4542) data time 0.0008 (0.0050) model time 0.4568 (0.4485) loss 1.9036 
(2.6315) grad_norm 2.7391 (2.5063) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][200/625] eta 0:03:12 lr 0.000171 wd 0.0500 time 0.4511 (0.4540) data time 0.0006 (0.0048) model time 0.4505 (0.4486) loss 2.8621 (2.6309) grad_norm 4.3030 (2.5149) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][210/625] eta 0:03:08 lr 0.000171 wd 0.0500 time 0.4497 (0.4537) data time 0.0006 (0.0046) model time 0.4491 (0.4486) loss 3.1765 (2.6336) grad_norm 2.8733 (2.5087) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][220/625] eta 0:03:03 lr 0.000170 wd 0.0500 time 0.4461 (0.4535) data time 0.0006 (0.0044) model time 0.4455 (0.4485) loss 2.1968 (2.6241) grad_norm 3.2629 (2.4974) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][230/625] eta 0:02:59 lr 0.000170 wd 0.0500 time 0.4521 (0.4533) data time 0.0006 (0.0042) model time 0.4515 (0.4486) loss 1.7503 (2.6122) grad_norm 1.9405 (2.4779) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][240/625] eta 0:02:54 lr 0.000170 wd 0.0500 time 0.4460 (0.4531) data time 0.0008 (0.0041) model time 0.4451 (0.4485) loss 1.8343 (2.6015) grad_norm 3.2559 (2.4654) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][250/625] eta 0:02:49 lr 0.000170 wd 0.0500 time 0.4523 (0.4529) data time 0.0008 (0.0040) model time 0.4515 (0.4485) loss 2.9859 (2.6068) grad_norm 4.0906 (2.4681) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][260/625] eta 0:02:45 lr 0.000170 wd 0.0500 time 0.4515 
(0.4528) data time 0.0008 (0.0039) model time 0.4507 (0.4484) loss 3.0690 (2.6031) grad_norm 1.8614 (2.4527) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:14:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][270/625] eta 0:02:40 lr 0.000170 wd 0.0500 time 0.4458 (0.4531) data time 0.0006 (0.0037) model time 0.4452 (0.4489) loss 2.8749 (2.5969) grad_norm 5.4823 (2.4633) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][280/625] eta 0:02:36 lr 0.000170 wd 0.0500 time 0.4478 (0.4529) data time 0.0009 (0.0036) model time 0.4469 (0.4489) loss 2.7158 (2.5919) grad_norm 2.4409 (2.4578) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][290/625] eta 0:02:31 lr 0.000170 wd 0.0500 time 0.4484 (0.4528) data time 0.0006 (0.0036) model time 0.4478 (0.4489) loss 2.6175 (2.5961) grad_norm 1.8946 (2.4948) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][300/625] eta 0:02:27 lr 0.000170 wd 0.0500 time 0.4477 (0.4526) data time 0.0006 (0.0035) model time 0.4471 (0.4488) loss 2.5558 (2.5947) grad_norm 1.8593 (2.4867) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][310/625] eta 0:02:22 lr 0.000170 wd 0.0500 time 0.4459 (0.4524) data time 0.0008 (0.0034) model time 0.4451 (0.4486) loss 2.6361 (2.6009) grad_norm 2.4457 (2.4812) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][320/625] eta 0:02:17 lr 0.000170 wd 0.0500 time 0.4491 (0.4523) data time 0.0007 (0.0033) model time 0.4485 (0.4486) loss 2.4865 (2.5986) grad_norm 2.7896 (2.4851) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [233/300][330/625] eta 0:02:13 lr 0.000170 wd 0.0500 time 0.4489 (0.4522) data time 0.0006 (0.0032) model time 0.4482 (0.4486) loss 1.4807 (2.5950) grad_norm 2.0449 (2.4772) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][340/625] eta 0:02:08 lr 0.000170 wd 0.0500 time 0.4498 (0.4521) data time 0.0007 (0.0032) model time 0.4492 (0.4486) loss 2.8063 (2.5947) grad_norm 2.4502 (2.5194) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][350/625] eta 0:02:04 lr 0.000170 wd 0.0500 time 0.4483 (0.4521) data time 0.0010 (0.0031) model time 0.4472 (0.4486) loss 2.8612 (2.6029) grad_norm 3.2812 (2.5212) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][360/625] eta 0:01:59 lr 0.000169 wd 0.0500 time 0.4489 (0.4519) data time 0.0009 (0.0030) model time 0.4481 (0.4486) loss 2.4978 (2.6010) grad_norm 1.8399 (2.5362) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][370/625] eta 0:01:55 lr 0.000169 wd 0.0500 time 0.4493 (0.4518) data time 0.0010 (0.0030) model time 0.4484 (0.4485) loss 2.5634 (2.6063) grad_norm 1.9730 (2.5240) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][380/625] eta 0:01:50 lr 0.000169 wd 0.0500 time 0.4529 (0.4517) data time 0.0007 (0.0029) model time 0.4522 (0.4484) loss 2.9166 (2.6083) grad_norm 2.7426 (2.5129) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:15:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][390/625] eta 0:01:46 lr 0.000169 wd 0.0500 time 0.4615 (0.4517) data time 0.0010 (0.0029) model time 0.4606 (0.4485) loss 1.8058 (2.6048) grad_norm 1.9721 (2.5035) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-11 07:15:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][400/625] eta 0:01:41 lr 0.000169 wd 0.0500 time 0.4476 (0.4516) data time 0.0010 (0.0028) model time 0.4466 (0.4485) loss 2.5930 (2.6021) grad_norm 2.0801 (2.4856) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][410/625] eta 0:01:37 lr 0.000169 wd 0.0500 time 0.4528 (0.4516) data time 0.0008 (0.0028) model time 0.4520 (0.4485) loss 2.7328 (2.6013) grad_norm 2.2394 (2.4857) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:16:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][420/625] eta 0:01:32 lr 0.000169 wd 0.0500 time 0.4500 (0.4519) data time 0.0007 (0.0027) model time 0.4493 (0.4489) loss 1.9374 (2.5976) grad_norm 2.7540 (2.4851) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 07:16:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][430/625] eta 0:01:28 lr 0.000169 wd 0.0500 time 0.4505 (0.4518) data time 0.0009 (0.0027) model time 0.4496 (0.4489) loss 2.2355 (2.6019) grad_norm 1.7041 (inf) loss_scale 128.0000 (253.6241) mem 16699MB [2024-08-11 07:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][440/625] eta 0:01:23 lr 0.000169 wd 0.0500 time 0.4490 (0.4517) data time 0.0006 (0.0026) model time 0.4484 (0.4488) loss 2.7208 (2.5989) grad_norm 1.9554 (inf) loss_scale 128.0000 (250.7755) mem 16699MB [2024-08-11 07:16:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][450/625] eta 0:01:19 lr 0.000169 wd 0.0500 time 0.4489 (0.4516) data time 0.0010 (0.0026) model time 0.4479 (0.4487) loss 2.5683 (2.5959) grad_norm 1.7824 (inf) loss_scale 128.0000 (248.0532) mem 16699MB [2024-08-11 07:16:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][460/625] eta 0:01:14 lr 0.000169 wd 0.0500 time 0.4470 (0.4515) data time 0.0006 (0.0026) model time 0.4463 (0.4487) loss 1.9430 (2.5923) 
grad_norm 2.5063 (inf) loss_scale 128.0000 (245.4490) mem 16699MB [2024-08-11 07:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][470/625] eta 0:01:09 lr 0.000169 wd 0.0500 time 0.4634 (0.4514) data time 0.0006 (0.0025) model time 0.4628 (0.4487) loss 2.8718 (2.5885) grad_norm 1.9091 (inf) loss_scale 128.0000 (242.9554) mem 16699MB [2024-08-11 07:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][480/625] eta 0:01:05 lr 0.000169 wd 0.0500 time 0.4461 (0.4514) data time 0.0007 (0.0025) model time 0.4455 (0.4487) loss 2.8856 (2.5870) grad_norm 3.3728 (inf) loss_scale 128.0000 (240.5655) mem 16699MB [2024-08-11 07:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][490/625] eta 0:01:00 lr 0.000169 wd 0.0500 time 0.4505 (0.4513) data time 0.0008 (0.0025) model time 0.4498 (0.4486) loss 2.9010 (2.5874) grad_norm 2.2491 (inf) loss_scale 128.0000 (238.2729) mem 16699MB [2024-08-11 07:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][500/625] eta 0:00:56 lr 0.000168 wd 0.0500 time 0.4445 (0.4513) data time 0.0006 (0.0024) model time 0.4439 (0.4486) loss 2.5568 (2.5894) grad_norm 2.1520 (inf) loss_scale 128.0000 (236.0719) mem 16699MB [2024-08-11 07:16:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][510/625] eta 0:00:51 lr 0.000168 wd 0.0500 time 0.4461 (0.4516) data time 0.0008 (0.0024) model time 0.4453 (0.4490) loss 2.5966 (2.5909) grad_norm 2.6310 (inf) loss_scale 128.0000 (233.9569) mem 16699MB [2024-08-11 07:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][520/625] eta 0:00:47 lr 0.000168 wd 0.0500 time 0.4490 (0.4515) data time 0.0006 (0.0024) model time 0.4484 (0.4490) loss 3.0673 (2.5904) grad_norm 1.5412 (inf) loss_scale 128.0000 (231.9232) mem 16699MB [2024-08-11 07:16:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][530/625] eta 0:00:42 lr 0.000168 wd 0.0500 time 0.4435 (0.4514) data time 0.0006 
(0.0023) model time 0.4429 (0.4489) loss 2.8703 (2.5944) grad_norm 2.1002 (inf) loss_scale 128.0000 (229.9661) mem 16699MB [2024-08-11 07:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][540/625] eta 0:00:38 lr 0.000168 wd 0.0500 time 0.4521 (0.4514) data time 0.0008 (0.0023) model time 0.4513 (0.4489) loss 3.0249 (2.5931) grad_norm 2.3091 (inf) loss_scale 128.0000 (228.0813) mem 16699MB [2024-08-11 07:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][550/625] eta 0:00:33 lr 0.000168 wd 0.0500 time 0.4492 (0.4513) data time 0.0010 (0.0023) model time 0.4483 (0.4489) loss 2.5387 (2.5958) grad_norm 2.4720 (inf) loss_scale 128.0000 (226.2650) mem 16699MB [2024-08-11 07:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][560/625] eta 0:00:29 lr 0.000168 wd 0.0500 time 0.4497 (0.4513) data time 0.0009 (0.0023) model time 0.4488 (0.4488) loss 2.5084 (2.5945) grad_norm 2.0698 (inf) loss_scale 128.0000 (224.5134) mem 16699MB [2024-08-11 07:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][570/625] eta 0:00:24 lr 0.000168 wd 0.0500 time 0.4468 (0.4513) data time 0.0008 (0.0022) model time 0.4460 (0.4488) loss 2.9313 (2.5960) grad_norm 2.2940 (inf) loss_scale 128.0000 (222.8231) mem 16699MB [2024-08-11 07:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][580/625] eta 0:00:20 lr 0.000168 wd 0.0500 time 0.4471 (0.4512) data time 0.0006 (0.0022) model time 0.4465 (0.4488) loss 1.9508 (2.5947) grad_norm 2.6170 (inf) loss_scale 128.0000 (221.1910) mem 16699MB [2024-08-11 07:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][590/625] eta 0:00:15 lr 0.000168 wd 0.0500 time 0.4554 (0.4512) data time 0.0010 (0.0022) model time 0.4544 (0.4488) loss 2.9826 (2.5971) grad_norm 1.7751 (inf) loss_scale 128.0000 (219.6142) mem 16699MB [2024-08-11 07:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][600/625] eta 0:00:11 lr 
0.000168 wd 0.0500 time 0.4497 (0.4511) data time 0.0009 (0.0022) model time 0.4488 (0.4488) loss 2.5957 (2.5942) grad_norm 2.1418 (inf) loss_scale 128.0000 (218.0899) mem 16699MB [2024-08-11 07:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][610/625] eta 0:00:06 lr 0.000168 wd 0.0500 time 0.4467 (0.4511) data time 0.0006 (0.0022) model time 0.4461 (0.4487) loss 2.4832 (2.5984) grad_norm 1.8431 (inf) loss_scale 128.0000 (216.6154) mem 16699MB [2024-08-11 07:17:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [233/300][620/625] eta 0:00:02 lr 0.000168 wd 0.0500 time 0.4429 (0.4509) data time 0.0004 (0.0021) model time 0.4425 (0.4486) loss 2.9970 (2.5990) grad_norm 2.0929 (inf) loss_scale 128.0000 (215.1884) mem 16699MB [2024-08-11 07:17:36 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 233 training takes 0:04:41 [2024-08-11 07:17:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:17:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 07:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5093 (0.5093) Acc@1 88.623 (88.623) Acc@5 99.072 (99.072) Mem 16699MB [2024-08-11 07:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8271 (0.6220) Acc@1 81.348 (86.630) Acc@5 96.191 (97.789) Mem 16699MB [2024-08-11 07:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9014 (0.7385) Acc@1 80.078 (83.766) Acc@5 95.215 (96.684) Mem 16699MB [2024-08-11 07:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.527 Acc@5 96.647 [2024-08-11 07:17:41 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-11 07:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.813 (0.813) Loss 0.4832 (0.4832) Acc@1 89.551 (89.551) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:17:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.183) Loss 0.7773 (0.5932) Acc@1 81.104 (87.287) Acc@5 96.631 (97.936) Mem 16699MB [2024-08-11 07:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.8657 (0.6978) Acc@1 80.078 (84.608) Acc@5 95.947 (97.015) Mem 16699MB [2024-08-11 07:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.325 Acc@5 96.965 [2024-08-11 07:17:45 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:17:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][0/625] eta 0:14:04 lr 0.000168 wd 0.0500 time 1.3505 (1.3505) data time 0.6653 (0.6653) model time 0.0000 (0.0000) loss 2.5584 (2.5584) grad_norm 1.7416 (1.7416) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:17:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][10/625] eta 0:05:26 lr 0.000167 wd 0.0500 time 0.4472 (0.5316) data 
time 0.0008 (0.0612) model time 0.0000 (0.0000) loss 3.2577 (2.6413) grad_norm 2.3376 (2.1106) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:17:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][20/625] eta 0:04:57 lr 0.000167 wd 0.0500 time 0.4491 (0.4919) data time 0.0008 (0.0325) model time 0.0000 (0.0000) loss 1.8955 (2.5749) grad_norm 1.7554 (2.0807) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][30/625] eta 0:04:44 lr 0.000167 wd 0.0500 time 0.4463 (0.4774) data time 0.0006 (0.0222) model time 0.0000 (0.0000) loss 2.7316 (2.6090) grad_norm 2.1580 (2.2522) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][40/625] eta 0:04:36 lr 0.000167 wd 0.0500 time 0.5534 (0.4726) data time 0.0008 (0.0170) model time 0.0000 (0.0000) loss 2.5820 (2.5947) grad_norm 2.2380 (2.3469) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:18:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][50/625] eta 0:04:28 lr 0.000167 wd 0.0500 time 0.4385 (0.4663) data time 0.0007 (0.0139) model time 0.0000 (0.0000) loss 2.9239 (2.6185) grad_norm inf (inf) loss_scale 64.0000 (126.7451) mem 16699MB [2024-08-11 07:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][60/625] eta 0:04:21 lr 0.000167 wd 0.0500 time 0.4463 (0.4634) data time 0.0006 (0.0117) model time 0.4456 (0.4474) loss 2.3248 (2.5892) grad_norm 2.4980 (inf) loss_scale 64.0000 (116.4590) mem 16699MB [2024-08-11 07:18:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][70/625] eta 0:04:16 lr 0.000167 wd 0.0500 time 0.4497 (0.4614) data time 0.0007 (0.0102) model time 0.4490 (0.4479) loss 2.7969 (2.5841) grad_norm 2.0929 (inf) loss_scale 64.0000 (109.0704) mem 16699MB [2024-08-11 07:18:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][80/625] eta 
0:04:10 lr 0.000167 wd 0.0500 time 0.4485 (0.4597) data time 0.0008 (0.0090) model time 0.4477 (0.4477) loss 3.1868 (2.5704) grad_norm 1.8072 (inf) loss_scale 64.0000 (103.5062) mem 16699MB [2024-08-11 07:18:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][90/625] eta 0:04:05 lr 0.000167 wd 0.0500 time 0.4491 (0.4585) data time 0.0007 (0.0081) model time 0.4484 (0.4476) loss 2.7841 (2.5886) grad_norm 2.0667 (inf) loss_scale 64.0000 (99.1648) mem 16699MB [2024-08-11 07:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][100/625] eta 0:04:00 lr 0.000167 wd 0.0500 time 0.4472 (0.4575) data time 0.0009 (0.0074) model time 0.4464 (0.4476) loss 2.8383 (2.5804) grad_norm 2.8906 (inf) loss_scale 64.0000 (95.6832) mem 16699MB [2024-08-11 07:18:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][110/625] eta 0:03:55 lr 0.000167 wd 0.0500 time 0.4475 (0.4566) data time 0.0006 (0.0068) model time 0.4469 (0.4476) loss 2.0078 (2.5739) grad_norm 2.4669 (inf) loss_scale 64.0000 (92.8288) mem 16699MB [2024-08-11 07:18:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][120/625] eta 0:03:50 lr 0.000167 wd 0.0500 time 0.4540 (0.4560) data time 0.0006 (0.0063) model time 0.4534 (0.4477) loss 3.0080 (2.5739) grad_norm 1.8708 (inf) loss_scale 64.0000 (90.4463) mem 16699MB [2024-08-11 07:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][130/625] eta 0:03:46 lr 0.000167 wd 0.0500 time 0.6666 (0.4571) data time 0.0007 (0.0059) model time 0.6659 (0.4504) loss 1.7733 (2.5613) grad_norm 2.3692 (inf) loss_scale 64.0000 (88.4275) mem 16699MB [2024-08-11 07:18:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][140/625] eta 0:03:41 lr 0.000167 wd 0.0500 time 0.4535 (0.4567) data time 0.0007 (0.0055) model time 0.4528 (0.4504) loss 2.8278 (2.5712) grad_norm 2.4952 (inf) loss_scale 64.0000 (86.6950) mem 16699MB [2024-08-11 07:18:54 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [234/300][150/625] eta 0:03:36 lr 0.000166 wd 0.0500 time 0.4495 (0.4562) data time 0.0006 (0.0052) model time 0.4489 (0.4502) loss 2.5353 (2.5734) grad_norm 1.9494 (inf) loss_scale 64.0000 (85.1921) mem 16699MB [2024-08-11 07:18:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][160/625] eta 0:03:31 lr 0.000166 wd 0.0500 time 0.4476 (0.4556) data time 0.0009 (0.0050) model time 0.4466 (0.4498) loss 3.0576 (2.5796) grad_norm 1.9545 (inf) loss_scale 64.0000 (83.8758) mem 16699MB [2024-08-11 07:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][170/625] eta 0:03:27 lr 0.000166 wd 0.0500 time 0.4469 (0.4552) data time 0.0007 (0.0047) model time 0.4462 (0.4496) loss 3.0417 (2.5682) grad_norm 2.0802 (inf) loss_scale 64.0000 (82.7135) mem 16699MB [2024-08-11 07:19:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][180/625] eta 0:03:22 lr 0.000166 wd 0.0500 time 0.4483 (0.4548) data time 0.0009 (0.0045) model time 0.4475 (0.4494) loss 2.6294 (2.5699) grad_norm 2.2444 (inf) loss_scale 64.0000 (81.6796) mem 16699MB [2024-08-11 07:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][190/625] eta 0:03:17 lr 0.000166 wd 0.0500 time 0.4502 (0.4544) data time 0.0008 (0.0043) model time 0.4494 (0.4492) loss 1.9276 (2.5689) grad_norm 1.6086 (inf) loss_scale 64.0000 (80.7539) mem 16699MB [2024-08-11 07:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][200/625] eta 0:03:12 lr 0.000166 wd 0.0500 time 0.4513 (0.4541) data time 0.0006 (0.0042) model time 0.4506 (0.4491) loss 2.0978 (2.5718) grad_norm 2.8296 (inf) loss_scale 64.0000 (79.9204) mem 16699MB [2024-08-11 07:19:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][210/625] eta 0:03:08 lr 0.000166 wd 0.0500 time 0.4500 (0.4539) data time 0.0009 (0.0040) model time 0.4492 (0.4491) loss 2.9652 (2.5846) grad_norm 6.8110 (inf) loss_scale 64.0000 (79.1659) mem 16699MB [2024-08-11 07:19:25 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][220/625] eta 0:03:03 lr 0.000166 wd 0.0500 time 0.4413 (0.4537) data time 0.0007 (0.0039) model time 0.4406 (0.4490) loss 2.3103 (2.5853) grad_norm 2.3072 (inf) loss_scale 64.0000 (78.4796) mem 16699MB [2024-08-11 07:19:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][230/625] eta 0:02:59 lr 0.000166 wd 0.0500 time 0.4578 (0.4535) data time 0.0008 (0.0037) model time 0.4570 (0.4490) loss 2.5106 (2.5759) grad_norm 3.5126 (inf) loss_scale 64.0000 (77.8528) mem 16699MB [2024-08-11 07:19:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][240/625] eta 0:02:54 lr 0.000166 wd 0.0500 time 0.4463 (0.4532) data time 0.0008 (0.0036) model time 0.4455 (0.4488) loss 2.5830 (2.5787) grad_norm 2.3125 (inf) loss_scale 64.0000 (77.2780) mem 16699MB [2024-08-11 07:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][250/625] eta 0:02:49 lr 0.000166 wd 0.0500 time 0.4475 (0.4529) data time 0.0006 (0.0035) model time 0.4469 (0.4486) loss 1.5551 (2.5761) grad_norm 3.3958 (inf) loss_scale 64.0000 (76.7490) mem 16699MB [2024-08-11 07:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][260/625] eta 0:02:45 lr 0.000166 wd 0.0500 time 0.6333 (0.4533) data time 0.0008 (0.0034) model time 0.6325 (0.4493) loss 2.7222 (2.5747) grad_norm 3.1381 (inf) loss_scale 64.0000 (76.2605) mem 16699MB [2024-08-11 07:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][270/625] eta 0:02:40 lr 0.000166 wd 0.0500 time 0.4503 (0.4530) data time 0.0007 (0.0033) model time 0.4496 (0.4490) loss 2.6022 (2.5763) grad_norm 3.9798 (inf) loss_scale 64.0000 (75.8081) mem 16699MB [2024-08-11 07:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][280/625] eta 0:02:36 lr 0.000166 wd 0.0500 time 0.4477 (0.4528) data time 0.0008 (0.0032) model time 0.4469 (0.4490) loss 2.8125 (2.5704) grad_norm 3.6437 (inf) loss_scale 64.0000 
(75.3879) mem 16699MB [2024-08-11 07:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][290/625] eta 0:02:31 lr 0.000165 wd 0.0500 time 0.4488 (0.4527) data time 0.0009 (0.0031) model time 0.4480 (0.4490) loss 3.0262 (2.5656) grad_norm 1.9290 (inf) loss_scale 64.0000 (74.9966) mem 16699MB [2024-08-11 07:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][300/625] eta 0:02:27 lr 0.000165 wd 0.0500 time 0.4501 (0.4526) data time 0.0008 (0.0030) model time 0.4493 (0.4489) loss 2.9239 (2.5652) grad_norm 2.3928 (inf) loss_scale 64.0000 (74.6312) mem 16699MB [2024-08-11 07:20:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][310/625] eta 0:02:22 lr 0.000165 wd 0.0500 time 0.4477 (0.4524) data time 0.0009 (0.0030) model time 0.4468 (0.4489) loss 2.6357 (2.5660) grad_norm 2.6860 (inf) loss_scale 64.0000 (74.2894) mem 16699MB [2024-08-11 07:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][320/625] eta 0:02:18 lr 0.000165 wd 0.0500 time 0.6096 (0.4528) data time 0.0008 (0.0029) model time 0.6088 (0.4494) loss 2.4231 (2.5630) grad_norm 2.0899 (inf) loss_scale 64.0000 (73.9688) mem 16699MB [2024-08-11 07:20:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][330/625] eta 0:02:13 lr 0.000165 wd 0.0500 time 0.4495 (0.4526) data time 0.0006 (0.0028) model time 0.4489 (0.4493) loss 3.0751 (2.5647) grad_norm 2.3370 (inf) loss_scale 64.0000 (73.6677) mem 16699MB [2024-08-11 07:20:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][340/625] eta 0:02:08 lr 0.000165 wd 0.0500 time 0.4509 (0.4525) data time 0.0007 (0.0028) model time 0.4502 (0.4492) loss 2.6755 (2.5675) grad_norm 2.9039 (inf) loss_scale 64.0000 (73.3842) mem 16699MB [2024-08-11 07:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][350/625] eta 0:02:04 lr 0.000165 wd 0.0500 time 0.4515 (0.4524) data time 0.0009 (0.0027) model time 0.4506 (0.4492) loss 2.5995 (2.5654) 
grad_norm 2.0809 (inf) loss_scale 64.0000 (73.1168) mem 16699MB [2024-08-11 07:20:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][360/625] eta 0:01:59 lr 0.000165 wd 0.0500 time 0.4477 (0.4524) data time 0.0008 (0.0027) model time 0.4469 (0.4492) loss 2.2001 (2.5658) grad_norm 3.1787 (inf) loss_scale 64.0000 (72.8643) mem 16699MB [2024-08-11 07:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][370/625] eta 0:01:55 lr 0.000165 wd 0.0500 time 0.4491 (0.4523) data time 0.0007 (0.0026) model time 0.4484 (0.4492) loss 3.0076 (2.5693) grad_norm 2.5537 (inf) loss_scale 64.0000 (72.6253) mem 16699MB [2024-08-11 07:20:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][380/625] eta 0:01:50 lr 0.000165 wd 0.0500 time 0.4514 (0.4522) data time 0.0006 (0.0026) model time 0.4507 (0.4492) loss 2.2724 (2.5716) grad_norm 1.7683 (inf) loss_scale 64.0000 (72.3990) mem 16699MB [2024-08-11 07:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][390/625] eta 0:01:46 lr 0.000165 wd 0.0500 time 0.4476 (0.4521) data time 0.0009 (0.0025) model time 0.4466 (0.4491) loss 2.1210 (2.5712) grad_norm 2.8042 (inf) loss_scale 64.0000 (72.1841) mem 16699MB [2024-08-11 07:20:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][400/625] eta 0:01:41 lr 0.000165 wd 0.0500 time 0.4457 (0.4520) data time 0.0009 (0.0025) model time 0.4448 (0.4490) loss 1.8820 (2.5698) grad_norm 17.2432 (inf) loss_scale 64.0000 (71.9800) mem 16699MB [2024-08-11 07:20:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][410/625] eta 0:01:37 lr 0.000165 wd 0.0500 time 0.4505 (0.4522) data time 0.0007 (0.0025) model time 0.4498 (0.4493) loss 2.2597 (2.5698) grad_norm 2.9882 (inf) loss_scale 64.0000 (71.7859) mem 16699MB [2024-08-11 07:20:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][420/625] eta 0:01:32 lr 0.000165 wd 0.0500 time 0.4497 (0.4521) data time 0.0009 (0.0024) model 
time 0.4489 (0.4493) loss 2.8378 (2.5710) grad_norm 2.2248 (inf) loss_scale 64.0000 (71.6010) mem 16699MB [2024-08-11 07:20:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][430/625] eta 0:01:28 lr 0.000164 wd 0.0500 time 0.4555 (0.4521) data time 0.0008 (0.0024) model time 0.4546 (0.4493) loss 2.5726 (2.5729) grad_norm 1.8206 (inf) loss_scale 64.0000 (71.4246) mem 16699MB [2024-08-11 07:21:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][440/625] eta 0:01:23 lr 0.000164 wd 0.0500 time 0.4463 (0.4520) data time 0.0008 (0.0023) model time 0.4456 (0.4493) loss 2.9234 (2.5773) grad_norm 1.7913 (inf) loss_scale 64.0000 (71.2562) mem 16699MB [2024-08-11 07:21:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][450/625] eta 0:01:19 lr 0.000164 wd 0.0500 time 0.4467 (0.4519) data time 0.0007 (0.0023) model time 0.4461 (0.4492) loss 2.1036 (2.5776) grad_norm 2.3710 (inf) loss_scale 64.0000 (71.0953) mem 16699MB [2024-08-11 07:21:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][460/625] eta 0:01:14 lr 0.000164 wd 0.0500 time 0.4500 (0.4518) data time 0.0007 (0.0023) model time 0.4493 (0.4492) loss 2.0513 (2.5769) grad_norm 1.5604 (inf) loss_scale 64.0000 (70.9414) mem 16699MB [2024-08-11 07:21:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][470/625] eta 0:01:10 lr 0.000164 wd 0.0500 time 0.4542 (0.4517) data time 0.0009 (0.0022) model time 0.4533 (0.4491) loss 2.7914 (2.5764) grad_norm 2.4577 (inf) loss_scale 64.0000 (70.7941) mem 16699MB [2024-08-11 07:21:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][480/625] eta 0:01:05 lr 0.000164 wd 0.0500 time 0.4425 (0.4516) data time 0.0009 (0.0022) model time 0.4417 (0.4490) loss 2.6833 (2.5748) grad_norm 2.4493 (inf) loss_scale 64.0000 (70.6528) mem 16699MB [2024-08-11 07:21:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][490/625] eta 0:01:00 lr 0.000164 wd 0.0500 time 0.4501 
(0.4515) data time 0.0006 (0.0022) model time 0.4494 (0.4490) loss 2.9651 (2.5752) grad_norm 1.9012 (inf) loss_scale 64.0000 (70.5173) mem 16699MB [2024-08-11 07:21:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][500/625] eta 0:00:56 lr 0.000164 wd 0.0500 time 0.4464 (0.4514) data time 0.0007 (0.0022) model time 0.4457 (0.4489) loss 3.0326 (2.5786) grad_norm 1.5924 (inf) loss_scale 64.0000 (70.3872) mem 16699MB [2024-08-11 07:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][510/625] eta 0:00:51 lr 0.000164 wd 0.0500 time 0.4495 (0.4514) data time 0.0006 (0.0021) model time 0.4488 (0.4489) loss 2.5564 (2.5830) grad_norm 2.7105 (inf) loss_scale 64.0000 (70.2622) mem 16699MB [2024-08-11 07:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][520/625] eta 0:00:47 lr 0.000164 wd 0.0500 time 0.4477 (0.4513) data time 0.0008 (0.0021) model time 0.4468 (0.4488) loss 2.5697 (2.5787) grad_norm 1.8387 (inf) loss_scale 64.0000 (70.1420) mem 16699MB [2024-08-11 07:21:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][530/625] eta 0:00:42 lr 0.000164 wd 0.0500 time 0.4484 (0.4512) data time 0.0010 (0.0021) model time 0.4474 (0.4488) loss 2.0186 (2.5789) grad_norm 2.8694 (inf) loss_scale 64.0000 (70.0264) mem 16699MB [2024-08-11 07:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][540/625] eta 0:00:38 lr 0.000164 wd 0.0500 time 0.6399 (0.4515) data time 0.0006 (0.0021) model time 0.6392 (0.4491) loss 2.6564 (2.5775) grad_norm 2.1649 (inf) loss_scale 64.0000 (69.9150) mem 16699MB [2024-08-11 07:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][550/625] eta 0:00:33 lr 0.000164 wd 0.0500 time 0.4448 (0.4514) data time 0.0007 (0.0021) model time 0.4441 (0.4490) loss 2.9041 (2.5793) grad_norm 2.4394 (inf) loss_scale 64.0000 (69.8076) mem 16699MB [2024-08-11 07:21:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][560/625] eta 
0:00:29 lr 0.000164 wd 0.0500 time 0.4460 (0.4513) data time 0.0008 (0.0020) model time 0.4452 (0.4490) loss 2.5764 (2.5786) grad_norm 2.2751 (inf) loss_scale 64.0000 (69.7041) mem 16699MB [2024-08-11 07:22:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][570/625] eta 0:00:24 lr 0.000163 wd 0.0500 time 0.4453 (0.4513) data time 0.0006 (0.0020) model time 0.4447 (0.4489) loss 2.9388 (2.5833) grad_norm 2.0416 (inf) loss_scale 64.0000 (69.6042) mem 16699MB [2024-08-11 07:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][580/625] eta 0:00:20 lr 0.000163 wd 0.0500 time 0.4464 (0.4512) data time 0.0007 (0.0020) model time 0.4457 (0.4489) loss 2.3626 (2.5852) grad_norm 1.9247 (inf) loss_scale 64.0000 (69.5077) mem 16699MB [2024-08-11 07:22:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][590/625] eta 0:00:15 lr 0.000163 wd 0.0500 time 0.4458 (0.4511) data time 0.0009 (0.0020) model time 0.4449 (0.4488) loss 2.7371 (2.5847) grad_norm 1.9758 (inf) loss_scale 64.0000 (69.4146) mem 16699MB [2024-08-11 07:22:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][600/625] eta 0:00:11 lr 0.000163 wd 0.0500 time 0.4458 (0.4510) data time 0.0008 (0.0020) model time 0.4450 (0.4487) loss 2.1198 (2.5826) grad_norm 2.6898 (inf) loss_scale 64.0000 (69.3245) mem 16699MB [2024-08-11 07:22:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][610/625] eta 0:00:06 lr 0.000163 wd 0.0500 time 0.4378 (0.4509) data time 0.0004 (0.0019) model time 0.4374 (0.4486) loss 3.3926 (2.5895) grad_norm 2.1469 (inf) loss_scale 64.0000 (69.2373) mem 16699MB [2024-08-11 07:22:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [234/300][620/625] eta 0:00:02 lr 0.000163 wd 0.0500 time 0.4430 (0.4507) data time 0.0005 (0.0019) model time 0.4425 (0.4485) loss 2.8035 (2.5931) grad_norm 2.4155 (inf) loss_scale 64.0000 (69.1530) mem 16699MB [2024-08-11 07:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 
394): INFO EPOCH 234 training takes 0:04:41 [2024-08-11 07:22:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:22:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.5376 (0.5376) Acc@1 87.939 (87.939) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 07:22:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 0.8149 (0.6295) Acc@1 81.104 (86.816) Acc@5 96.387 (97.785) Mem 16699MB [2024-08-11 07:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9287 (0.7461) Acc@1 78.809 (83.859) Acc@5 95.068 (96.698) Mem 16699MB [2024-08-11 07:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.597 Acc@5 96.655 [2024-08-11 07:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 07:22:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.864 (0.864) Loss 0.4836 (0.4836) Acc@1 89.600 (89.600) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.188) Loss 0.7783 (0.5937) Acc@1 81.201 (87.300) Acc@5 96.582 (97.914) Mem 16699MB [2024-08-11 07:22:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.153) Loss 0.8677 (0.6986) Acc@1 80.127 (84.598) Acc@5 95.996 (96.998) Mem 16699MB [2024-08-11 07:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.317 Acc@5 96.947 [2024-08-11 07:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:22:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][0/625] eta 0:13:55 lr 
0.000163 wd 0.0500 time 1.3365 (1.3365) data time 0.6050 (0.6050) model time 0.0000 (0.0000) loss 2.9666 (2.9666) grad_norm 2.8762 (2.8762) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][10/625] eta 0:05:25 lr 0.000163 wd 0.0500 time 0.4443 (0.5296) data time 0.0006 (0.0558) model time 0.0000 (0.0000) loss 1.5096 (2.5579) grad_norm 1.9879 (2.2164) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][20/625] eta 0:04:57 lr 0.000163 wd 0.0500 time 0.4516 (0.4914) data time 0.0009 (0.0296) model time 0.0000 (0.0000) loss 2.5965 (2.4473) grad_norm 1.8449 (2.5368) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][30/625] eta 0:04:44 lr 0.000163 wd 0.0500 time 0.4492 (0.4782) data time 0.0008 (0.0203) model time 0.0000 (0.0000) loss 2.6801 (2.5672) grad_norm 2.5952 (3.7585) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:22:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][40/625] eta 0:04:35 lr 0.000163 wd 0.0500 time 0.4464 (0.4706) data time 0.0008 (0.0155) model time 0.0000 (0.0000) loss 3.0726 (2.5983) grad_norm 2.3008 (3.4424) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][50/625] eta 0:04:28 lr 0.000163 wd 0.0500 time 0.4508 (0.4667) data time 0.0008 (0.0126) model time 0.0000 (0.0000) loss 2.8423 (2.5840) grad_norm 1.4974 (3.1878) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][60/625] eta 0:04:21 lr 0.000163 wd 0.0500 time 0.4460 (0.4636) data time 0.0007 (0.0107) model time 0.4453 (0.4468) loss 2.3242 (2.5970) grad_norm 1.9815 (3.2082) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:08 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [235/300][70/625] eta 0:04:16 lr 0.000163 wd 0.0500 time 0.4483 (0.4614) data time 0.0008 (0.0093) model time 0.4476 (0.4469) loss 1.6311 (2.5822) grad_norm 2.1782 (3.1554) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][80/625] eta 0:04:10 lr 0.000163 wd 0.0500 time 0.4505 (0.4599) data time 0.0008 (0.0083) model time 0.4497 (0.4475) loss 1.7532 (2.5554) grad_norm 2.6878 (3.3435) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][90/625] eta 0:04:05 lr 0.000162 wd 0.0500 time 0.4534 (0.4588) data time 0.0006 (0.0075) model time 0.4528 (0.4479) loss 2.7858 (2.5794) grad_norm 3.3055 (3.2597) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][100/625] eta 0:04:01 lr 0.000162 wd 0.0500 time 0.4490 (0.4598) data time 0.0006 (0.0068) model time 0.4484 (0.4519) loss 2.8252 (2.5938) grad_norm 2.5140 (3.6990) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][110/625] eta 0:03:56 lr 0.000162 wd 0.0500 time 0.4515 (0.4589) data time 0.0008 (0.0063) model time 0.4507 (0.4514) loss 3.2760 (2.5991) grad_norm 2.2570 (3.5662) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][120/625] eta 0:03:51 lr 0.000162 wd 0.0500 time 0.4463 (0.4580) data time 0.0008 (0.0058) model time 0.4455 (0.4507) loss 2.5809 (2.6048) grad_norm 3.1403 (3.4888) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][130/625] eta 0:03:46 lr 0.000162 wd 0.0500 time 0.4491 (0.4573) data time 0.0006 (0.0055) model time 0.4485 (0.4504) loss 2.7750 (2.6109) grad_norm 2.8415 (3.4000) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 07:23:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][140/625] eta 0:03:41 lr 0.000162 wd 0.0500 time 0.4518 (0.4567) data time 0.0007 (0.0051) model time 0.4511 (0.4501) loss 2.4316 (2.6067) grad_norm 2.3559 (3.3235) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][150/625] eta 0:03:36 lr 0.000162 wd 0.0500 time 0.4486 (0.4563) data time 0.0006 (0.0049) model time 0.4480 (0.4500) loss 1.7392 (2.5925) grad_norm 2.1892 (3.2506) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][160/625] eta 0:03:31 lr 0.000162 wd 0.0500 time 0.4495 (0.4559) data time 0.0008 (0.0046) model time 0.4487 (0.4500) loss 2.4030 (2.5776) grad_norm 2.3325 (3.1697) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][170/625] eta 0:03:27 lr 0.000162 wd 0.0500 time 0.4539 (0.4557) data time 0.0006 (0.0044) model time 0.4533 (0.4501) loss 2.7322 (2.5738) grad_norm 2.1011 (3.0972) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:23:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][180/625] eta 0:03:22 lr 0.000162 wd 0.0500 time 0.4434 (0.4554) data time 0.0007 (0.0042) model time 0.4427 (0.4501) loss 2.7650 (2.5767) grad_norm 1.9619 (3.1073) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][190/625] eta 0:03:17 lr 0.000162 wd 0.0500 time 0.4503 (0.4551) data time 0.0007 (0.0040) model time 0.4496 (0.4500) loss 2.5880 (2.5860) grad_norm 2.1372 (3.0714) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][200/625] eta 0:03:13 lr 0.000162 wd 0.0500 time 0.4507 (0.4548) data time 0.0007 (0.0039) model time 0.4501 (0.4498) loss 2.6192 (2.5906) 
grad_norm 2.0590 (3.0307) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][210/625] eta 0:03:08 lr 0.000162 wd 0.0500 time 0.4505 (0.4544) data time 0.0008 (0.0037) model time 0.4498 (0.4497) loss 2.6111 (2.5965) grad_norm 1.9368 (3.0240) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][220/625] eta 0:03:03 lr 0.000162 wd 0.0500 time 0.4454 (0.4542) data time 0.0007 (0.0036) model time 0.4447 (0.4496) loss 2.0910 (2.5848) grad_norm 1.8167 (2.9954) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][230/625] eta 0:02:59 lr 0.000161 wd 0.0500 time 0.4494 (0.4540) data time 0.0008 (0.0035) model time 0.4486 (0.4496) loss 2.3436 (2.5759) grad_norm 3.3490 (2.9835) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][240/625] eta 0:02:55 lr 0.000161 wd 0.0500 time 0.4516 (0.4546) data time 0.0007 (0.0034) model time 0.4509 (0.4505) loss 2.3691 (2.5749) grad_norm 1.7270 (2.9580) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][250/625] eta 0:02:50 lr 0.000161 wd 0.0500 time 0.4509 (0.4544) data time 0.0007 (0.0033) model time 0.4502 (0.4504) loss 2.9604 (2.5778) grad_norm 2.0616 (2.9308) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][260/625] eta 0:02:45 lr 0.000161 wd 0.0500 time 0.4420 (0.4541) data time 0.0009 (0.0032) model time 0.4411 (0.4501) loss 1.7927 (2.5689) grad_norm 1.9078 (2.9048) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][270/625] eta 0:02:41 lr 0.000161 wd 0.0500 time 0.4492 (0.4543) data time 
0.0008 (0.0031) model time 0.4484 (0.4506) loss 2.6356 (2.5700) grad_norm 2.0995 (3.0371) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][280/625] eta 0:02:36 lr 0.000161 wd 0.0500 time 0.4443 (0.4541) data time 0.0009 (0.0030) model time 0.4435 (0.4504) loss 2.7413 (2.5690) grad_norm 2.4307 (3.0077) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][290/625] eta 0:02:32 lr 0.000161 wd 0.0500 time 0.4471 (0.4538) data time 0.0007 (0.0029) model time 0.4463 (0.4502) loss 3.0748 (2.5723) grad_norm 2.2705 (2.9825) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][300/625] eta 0:02:27 lr 0.000161 wd 0.0500 time 0.4469 (0.4537) data time 0.0009 (0.0029) model time 0.4460 (0.4501) loss 2.8642 (2.5805) grad_norm 2.0283 (2.9645) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:24:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][310/625] eta 0:02:22 lr 0.000161 wd 0.0500 time 0.4504 (0.4536) data time 0.0009 (0.0028) model time 0.4495 (0.4501) loss 2.4857 (2.5801) grad_norm 1.6271 (2.9349) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][320/625] eta 0:02:18 lr 0.000161 wd 0.0500 time 0.4460 (0.4535) data time 0.0007 (0.0027) model time 0.4453 (0.4501) loss 3.1145 (2.5841) grad_norm 2.2634 (2.9410) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][330/625] eta 0:02:13 lr 0.000161 wd 0.0500 time 0.4490 (0.4533) data time 0.0008 (0.0027) model time 0.4482 (0.4500) loss 2.0618 (2.5734) grad_norm 2.6622 (2.9239) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][340/625] eta 
0:02:09 lr 0.000161 wd 0.0500 time 0.4494 (0.4531) data time 0.0007 (0.0026) model time 0.4487 (0.4499) loss 2.7866 (2.5733) grad_norm 2.4347 (2.9090) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][350/625] eta 0:02:04 lr 0.000161 wd 0.0500 time 0.4450 (0.4530) data time 0.0007 (0.0026) model time 0.4443 (0.4498) loss 1.8275 (2.5679) grad_norm 2.3408 (2.9176) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][360/625] eta 0:01:59 lr 0.000161 wd 0.0500 time 0.4462 (0.4528) data time 0.0008 (0.0025) model time 0.4454 (0.4497) loss 2.7366 (2.5744) grad_norm 3.5566 (2.9135) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][370/625] eta 0:01:55 lr 0.000160 wd 0.0500 time 0.4477 (0.4527) data time 0.0008 (0.0025) model time 0.4468 (0.4496) loss 2.8900 (2.5790) grad_norm 1.4132 (2.8887) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][380/625] eta 0:01:50 lr 0.000160 wd 0.0500 time 0.4494 (0.4525) data time 0.0007 (0.0024) model time 0.4487 (0.4495) loss 2.6348 (2.5790) grad_norm 1.6294 (2.9780) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][390/625] eta 0:01:46 lr 0.000160 wd 0.0500 time 0.4500 (0.4524) data time 0.0009 (0.0024) model time 0.4492 (0.4494) loss 2.2425 (2.5833) grad_norm 2.2457 (2.9798) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][400/625] eta 0:01:41 lr 0.000160 wd 0.0500 time 0.4401 (0.4522) data time 0.0008 (0.0024) model time 0.4393 (0.4493) loss 1.8416 (2.5807) grad_norm 2.0058 (2.9887) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [235/300][410/625] eta 0:01:37 lr 0.000160 wd 0.0500 time 0.4414 (0.4524) data time 0.0009 (0.0023) model time 0.4406 (0.4495) loss 2.9048 (2.5865) grad_norm 3.7837 (2.9777) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][420/625] eta 0:01:32 lr 0.000160 wd 0.0500 time 0.4464 (0.4522) data time 0.0007 (0.0023) model time 0.4458 (0.4494) loss 2.5216 (2.5869) grad_norm 36.1066 (3.0398) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][430/625] eta 0:01:28 lr 0.000160 wd 0.0500 time 0.4468 (0.4521) data time 0.0007 (0.0023) model time 0.4461 (0.4493) loss 2.2046 (2.5929) grad_norm 2.7114 (3.0224) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][440/625] eta 0:01:23 lr 0.000160 wd 0.0500 time 0.4489 (0.4520) data time 0.0006 (0.0022) model time 0.4482 (0.4493) loss 2.9563 (2.5932) grad_norm 2.3282 (3.0059) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:25:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][450/625] eta 0:01:19 lr 0.000160 wd 0.0500 time 0.4482 (0.4520) data time 0.0007 (0.0022) model time 0.4475 (0.4492) loss 2.5383 (2.5914) grad_norm 1.8958 (3.1307) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][460/625] eta 0:01:14 lr 0.000160 wd 0.0500 time 0.4439 (0.4519) data time 0.0009 (0.0022) model time 0.4430 (0.4492) loss 2.9073 (2.5936) grad_norm 2.2106 (3.1292) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][470/625] eta 0:01:10 lr 0.000160 wd 0.0500 time 0.4483 (0.4518) data time 0.0006 (0.0021) model time 0.4477 (0.4491) loss 1.4381 (2.5887) grad_norm 2.3778 (3.1134) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 07:26:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][480/625] eta 0:01:05 lr 0.000160 wd 0.0500 time 0.4488 (0.4517) data time 0.0006 (0.0021) model time 0.4482 (0.4490) loss 2.7648 (2.5906) grad_norm 1.9314 (3.0926) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][490/625] eta 0:01:00 lr 0.000160 wd 0.0500 time 0.4427 (0.4515) data time 0.0007 (0.0021) model time 0.4420 (0.4489) loss 3.1854 (2.5940) grad_norm 2.9878 (3.0998) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][500/625] eta 0:00:56 lr 0.000160 wd 0.0500 time 0.4439 (0.4514) data time 0.0007 (0.0021) model time 0.4433 (0.4488) loss 2.8592 (2.5963) grad_norm 4.9850 (3.1114) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][510/625] eta 0:00:51 lr 0.000159 wd 0.0500 time 0.4452 (0.4513) data time 0.0006 (0.0020) model time 0.4445 (0.4487) loss 3.0937 (2.5964) grad_norm 2.7330 (3.1168) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][520/625] eta 0:00:47 lr 0.000159 wd 0.0500 time 0.4465 (0.4512) data time 0.0006 (0.0020) model time 0.4459 (0.4487) loss 2.8159 (2.5950) grad_norm 4.3080 (3.1096) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][530/625] eta 0:00:42 lr 0.000159 wd 0.0500 time 0.4482 (0.4512) data time 0.0006 (0.0020) model time 0.4476 (0.4487) loss 2.9276 (2.5947) grad_norm 3.6689 (3.1074) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][540/625] eta 0:00:38 lr 0.000159 wd 0.0500 time 0.4461 (0.4511) data time 0.0007 (0.0020) model time 0.4454 (0.4487) loss 
2.0070 (2.5960) grad_norm 1.8155 (3.1140) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][550/625] eta 0:00:33 lr 0.000159 wd 0.0500 time 0.4443 (0.4510) data time 0.0009 (0.0019) model time 0.4434 (0.4486) loss 2.2580 (2.5963) grad_norm 3.1388 (3.0999) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][560/625] eta 0:00:29 lr 0.000159 wd 0.0500 time 0.4530 (0.4510) data time 0.0008 (0.0019) model time 0.4521 (0.4486) loss 2.6929 (2.5968) grad_norm 2.8904 (3.0999) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][570/625] eta 0:00:24 lr 0.000159 wd 0.0500 time 0.4443 (0.4509) data time 0.0007 (0.0019) model time 0.4436 (0.4485) loss 2.3451 (2.5937) grad_norm 2.5574 (3.0896) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:26:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][580/625] eta 0:00:20 lr 0.000159 wd 0.0500 time 0.4483 (0.4512) data time 0.0008 (0.0019) model time 0.4475 (0.4489) loss 3.0163 (2.5922) grad_norm 2.5792 (3.0704) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][590/625] eta 0:00:15 lr 0.000159 wd 0.0500 time 0.4487 (0.4511) data time 0.0008 (0.0019) model time 0.4478 (0.4488) loss 1.6763 (2.5890) grad_norm 2.6878 (3.0728) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][600/625] eta 0:00:11 lr 0.000159 wd 0.0500 time 0.4461 (0.4511) data time 0.0007 (0.0018) model time 0.4454 (0.4488) loss 2.1734 (2.5867) grad_norm 2.3342 (3.0561) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][610/625] eta 0:00:06 lr 0.000159 wd 0.0500 time 0.4511 (0.4511) 
data time 0.0006 (0.0018) model time 0.4505 (0.4488) loss 2.8756 (2.5880) grad_norm 2.5651 (3.0825) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [235/300][620/625] eta 0:00:02 lr 0.000159 wd 0.0500 time 0.4427 (0.4510) data time 0.0006 (0.0018) model time 0.4421 (0.4487) loss 3.0292 (2.5855) grad_norm 2.0900 (3.0772) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 235 training takes 0:04:41 [2024-08-11 07:27:17 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:27:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:27:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5210 (0.5210) Acc@1 88.379 (88.379) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 07:27:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8467 (0.6294) Acc@1 79.102 (86.559) Acc@5 96.436 (97.763) Mem 16699MB [2024-08-11 07:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9043 (0.7454) Acc@1 79.150 (83.719) Acc@5 95.068 (96.636) Mem 16699MB [2024-08-11 07:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.491 Acc@5 96.565 [2024-08-11 07:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-11 07:27:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.840 (0.840) Loss 0.4841 (0.4841) Acc@1 89.453 (89.453) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:27:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.186) Loss 0.7798 (0.5943) Acc@1 81.104 (87.287) Acc@5 96.582 (97.931) Mem 16699MB [2024-08-11 07:27:25 
vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.153) Loss 0.8667 (0.6994) Acc@1 80.176 (84.554) Acc@5 95.898 (97.003) Mem 16699MB [2024-08-11 07:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.275 Acc@5 96.951 [2024-08-11 07:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][0/625] eta 0:13:15 lr 0.000159 wd 0.0500 time 1.2721 (1.2721) data time 0.6664 (0.6664) model time 0.0000 (0.0000) loss 2.6982 (2.6982) grad_norm 23.7739 (23.7739) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][10/625] eta 0:05:21 lr 0.000159 wd 0.0500 time 0.4439 (0.5222) data time 0.0007 (0.0614) model time 0.0000 (0.0000) loss 2.8161 (2.6525) grad_norm 1.9454 (7.1075) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][20/625] eta 0:04:55 lr 0.000159 wd 0.0500 time 0.4582 (0.4891) data time 0.0008 (0.0326) model time 0.0000 (0.0000) loss 2.3278 (2.5494) grad_norm 2.6679 (5.0131) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][30/625] eta 0:04:44 lr 0.000158 wd 0.0500 time 0.4508 (0.4774) data time 0.0006 (0.0223) model time 0.0000 (0.0000) loss 2.7591 (2.5635) grad_norm 2.9849 (4.2491) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][40/625] eta 0:04:34 lr 0.000158 wd 0.0500 time 0.4444 (0.4696) data time 0.0008 (0.0171) model time 0.0000 (0.0000) loss 2.4759 (2.5945) grad_norm 2.4468 (3.7871) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][50/625] eta 0:04:28 lr 0.000158 wd 0.0500 time 0.4495 
(0.4661) data time 0.0008 (0.0139) model time 0.0000 (0.0000) loss 2.4417 (2.5971) grad_norm 2.0407 (3.4563) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][60/625] eta 0:04:21 lr 0.000158 wd 0.0500 time 0.4613 (0.4631) data time 0.0007 (0.0118) model time 0.4605 (0.4468) loss 2.5756 (2.5922) grad_norm 3.1344 (3.2631) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:27:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][70/625] eta 0:04:15 lr 0.000158 wd 0.0500 time 0.4466 (0.4607) data time 0.0010 (0.0102) model time 0.4456 (0.4460) loss 2.8497 (2.5885) grad_norm 2.6440 (3.1194) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][80/625] eta 0:04:10 lr 0.000158 wd 0.0500 time 0.4432 (0.4589) data time 0.0009 (0.0091) model time 0.4423 (0.4459) loss 2.6038 (2.5863) grad_norm 2.2387 (3.0498) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][90/625] eta 0:04:04 lr 0.000158 wd 0.0500 time 0.4514 (0.4577) data time 0.0006 (0.0082) model time 0.4508 (0.4461) loss 1.9962 (2.5939) grad_norm 1.8527 (2.9813) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][100/625] eta 0:03:59 lr 0.000158 wd 0.0500 time 0.4476 (0.4566) data time 0.0008 (0.0074) model time 0.4468 (0.4461) loss 2.5719 (2.5803) grad_norm 2.4835 (2.9287) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][110/625] eta 0:03:54 lr 0.000158 wd 0.0500 time 0.4491 (0.4559) data time 0.0008 (0.0068) model time 0.4482 (0.4464) loss 2.7228 (2.5834) grad_norm 2.5445 (2.9189) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[236/300][120/625] eta 0:03:49 lr 0.000158 wd 0.0500 time 0.4473 (0.4551) data time 0.0006 (0.0063) model time 0.4466 (0.4463) loss 2.7611 (2.5915) grad_norm 2.0355 (2.9226) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][130/625] eta 0:03:44 lr 0.000158 wd 0.0500 time 0.4476 (0.4544) data time 0.0006 (0.0059) model time 0.4470 (0.4462) loss 2.1728 (2.5778) grad_norm 1.7520 (2.8422) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][140/625] eta 0:03:40 lr 0.000158 wd 0.0500 time 0.4446 (0.4538) data time 0.0009 (0.0056) model time 0.4437 (0.4461) loss 2.5452 (2.5937) grad_norm 1.7959 (2.7829) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][150/625] eta 0:03:35 lr 0.000158 wd 0.0500 time 0.4474 (0.4534) data time 0.0008 (0.0052) model time 0.4466 (0.4461) loss 2.9359 (2.5890) grad_norm 1.9819 (2.7248) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][160/625] eta 0:03:30 lr 0.000158 wd 0.0500 time 0.4450 (0.4530) data time 0.0006 (0.0050) model time 0.4444 (0.4460) loss 2.3079 (2.5847) grad_norm 2.0052 (2.7182) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][170/625] eta 0:03:25 lr 0.000157 wd 0.0500 time 0.4499 (0.4527) data time 0.0007 (0.0047) model time 0.4492 (0.4462) loss 2.3338 (2.5720) grad_norm 1.5520 (2.6758) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][180/625] eta 0:03:21 lr 0.000157 wd 0.0500 time 0.4483 (0.4524) data time 0.0009 (0.0045) model time 0.4474 (0.4462) loss 2.6496 (2.5684) grad_norm 2.5132 (2.6636) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:51 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][190/625] eta 0:03:16 lr 0.000157 wd 0.0500 time 0.4456 (0.4522) data time 0.0007 (0.0043) model time 0.4449 (0.4463) loss 2.9083 (2.5653) grad_norm 1.7223 (2.6317) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:28:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][200/625] eta 0:03:12 lr 0.000157 wd 0.0500 time 0.4434 (0.4519) data time 0.0010 (0.0041) model time 0.4424 (0.4463) loss 2.9185 (2.5747) grad_norm 2.6329 (2.6019) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][210/625] eta 0:03:07 lr 0.000157 wd 0.0500 time 0.4503 (0.4517) data time 0.0006 (0.0040) model time 0.4497 (0.4462) loss 3.2507 (2.5833) grad_norm 2.4043 (2.5812) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][220/625] eta 0:03:02 lr 0.000157 wd 0.0500 time 0.4447 (0.4514) data time 0.0008 (0.0038) model time 0.4440 (0.4462) loss 2.4143 (2.5876) grad_norm 2.8027 (2.6986) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][230/625] eta 0:02:58 lr 0.000157 wd 0.0500 time 0.4448 (0.4512) data time 0.0007 (0.0037) model time 0.4441 (0.4461) loss 2.8855 (2.5929) grad_norm 1.6830 (2.6786) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][240/625] eta 0:02:53 lr 0.000157 wd 0.0500 time 0.4507 (0.4511) data time 0.0006 (0.0036) model time 0.4501 (0.4462) loss 2.2583 (2.5755) grad_norm 2.4343 (2.6771) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][250/625] eta 0:02:49 lr 0.000157 wd 0.0500 time 0.4569 (0.4519) data time 0.0009 (0.0035) model time 0.4560 (0.4474) loss 2.1235 (2.5694) grad_norm 2.1842 (2.6782) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][260/625] eta 0:02:44 lr 0.000157 wd 0.0500 time 0.4498 (0.4517) data time 0.0008 (0.0034) model time 0.4490 (0.4474) loss 3.3037 (2.5664) grad_norm 2.5980 (2.6572) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][270/625] eta 0:02:40 lr 0.000157 wd 0.0500 time 0.4467 (0.4521) data time 0.0006 (0.0033) model time 0.4461 (0.4480) loss 2.6912 (2.5697) grad_norm 3.1469 (2.6497) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][280/625] eta 0:02:35 lr 0.000157 wd 0.0500 time 0.4533 (0.4520) data time 0.0006 (0.0032) model time 0.4527 (0.4481) loss 2.6941 (2.5675) grad_norm 2.5789 (2.6367) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][290/625] eta 0:02:31 lr 0.000157 wd 0.0500 time 0.4442 (0.4518) data time 0.0008 (0.0031) model time 0.4434 (0.4480) loss 2.6915 (2.5752) grad_norm 2.2026 (2.6296) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][300/625] eta 0:02:26 lr 0.000157 wd 0.0500 time 0.4471 (0.4517) data time 0.0009 (0.0031) model time 0.4462 (0.4479) loss 2.8686 (2.5671) grad_norm 7.7761 (2.6505) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][310/625] eta 0:02:22 lr 0.000157 wd 0.0500 time 0.4508 (0.4516) data time 0.0007 (0.0030) model time 0.4502 (0.4479) loss 2.8761 (2.5696) grad_norm 2.4498 (2.6528) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][320/625] eta 0:02:17 lr 0.000156 wd 0.0500 time 0.4510 (0.4515) data time 0.0006 (0.0029) model time 
0.4503 (0.4479) loss 2.7012 (2.5707) grad_norm 1.8374 (2.6447) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][330/625] eta 0:02:13 lr 0.000156 wd 0.0500 time 0.4457 (0.4514) data time 0.0006 (0.0029) model time 0.4451 (0.4479) loss 1.8969 (2.5647) grad_norm 2.4332 (2.6332) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][340/625] eta 0:02:08 lr 0.000156 wd 0.0500 time 0.4501 (0.4513) data time 0.0009 (0.0028) model time 0.4492 (0.4479) loss 1.7278 (2.5657) grad_norm 2.2407 (2.6220) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][350/625] eta 0:02:04 lr 0.000156 wd 0.0500 time 0.4456 (0.4512) data time 0.0006 (0.0027) model time 0.4450 (0.4478) loss 1.5479 (2.5560) grad_norm 2.6699 (2.6146) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][360/625] eta 0:01:59 lr 0.000156 wd 0.0500 time 0.4494 (0.4510) data time 0.0007 (0.0027) model time 0.4487 (0.4477) loss 3.1499 (2.5591) grad_norm 3.9847 (2.6228) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][370/625] eta 0:01:54 lr 0.000156 wd 0.0500 time 0.4462 (0.4509) data time 0.0006 (0.0026) model time 0.4457 (0.4477) loss 1.7383 (2.5599) grad_norm 6.2328 (2.6308) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][380/625] eta 0:01:50 lr 0.000156 wd 0.0500 time 0.4466 (0.4507) data time 0.0006 (0.0026) model time 0.4459 (0.4475) loss 2.6571 (2.5613) grad_norm 2.8186 (2.6331) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][390/625] eta 0:01:45 lr 0.000156 wd 0.0500 
time 0.4491 (0.4507) data time 0.0006 (0.0025) model time 0.4485 (0.4476) loss 2.2596 (2.5604) grad_norm 3.3488 (2.6405) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][400/625] eta 0:01:41 lr 0.000156 wd 0.0500 time 0.4483 (0.4507) data time 0.0007 (0.0025) model time 0.4476 (0.4476) loss 2.7571 (2.5572) grad_norm 1.9816 (2.6356) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][410/625] eta 0:01:36 lr 0.000156 wd 0.0500 time 0.4462 (0.4510) data time 0.0009 (0.0025) model time 0.4453 (0.4480) loss 2.1774 (2.5578) grad_norm 3.9691 (2.6521) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][420/625] eta 0:01:32 lr 0.000156 wd 0.0500 time 0.4469 (0.4509) data time 0.0007 (0.0024) model time 0.4462 (0.4480) loss 2.7451 (2.5580) grad_norm 2.2984 (2.9173) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][430/625] eta 0:01:27 lr 0.000156 wd 0.0500 time 0.4491 (0.4509) data time 0.0007 (0.0024) model time 0.4484 (0.4480) loss 1.9110 (2.5590) grad_norm 2.3244 (2.9211) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][440/625] eta 0:01:23 lr 0.000156 wd 0.0500 time 0.4531 (0.4512) data time 0.0008 (0.0023) model time 0.4523 (0.4485) loss 3.2195 (2.5567) grad_norm 2.4019 (2.9089) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][450/625] eta 0:01:18 lr 0.000156 wd 0.0500 time 0.4515 (0.4513) data time 0.0008 (0.0023) model time 0.4507 (0.4486) loss 2.3518 (2.5508) grad_norm 25.6374 (2.9497) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [236/300][460/625] eta 0:01:14 lr 0.000155 wd 0.0500 time 0.4484 (0.4513) data time 0.0006 (0.0023) model time 0.4478 (0.4487) loss 1.8337 (2.5517) grad_norm 1.8249 (2.9416) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:30:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][470/625] eta 0:01:09 lr 0.000155 wd 0.0500 time 0.4469 (0.4513) data time 0.0007 (0.0023) model time 0.4462 (0.4487) loss 3.3338 (2.5544) grad_norm 2.4749 (2.9904) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][480/625] eta 0:01:05 lr 0.000155 wd 0.0500 time 0.4488 (0.4513) data time 0.0008 (0.0022) model time 0.4480 (0.4487) loss 2.3322 (2.5542) grad_norm 2.8629 (2.9920) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][490/625] eta 0:01:00 lr 0.000155 wd 0.0500 time 0.4493 (0.4513) data time 0.0008 (0.0022) model time 0.4485 (0.4487) loss 2.7939 (2.5563) grad_norm 1.5270 (2.9780) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][500/625] eta 0:00:56 lr 0.000155 wd 0.0500 time 0.4462 (0.4511) data time 0.0008 (0.0022) model time 0.4455 (0.4486) loss 2.5277 (2.5500) grad_norm 1.8703 (2.9647) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][510/625] eta 0:00:51 lr 0.000155 wd 0.0500 time 0.4469 (0.4511) data time 0.0009 (0.0021) model time 0.4460 (0.4486) loss 2.9346 (2.5539) grad_norm 2.3108 (2.9575) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][520/625] eta 0:00:47 lr 0.000155 wd 0.0500 time 0.4414 (0.4510) data time 0.0009 (0.0021) model time 0.4405 (0.4485) loss 2.2736 (2.5527) grad_norm 2.7132 (2.9481) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 
07:31:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][530/625] eta 0:00:42 lr 0.000155 wd 0.0500 time 0.4485 (0.4509) data time 0.0006 (0.0021) model time 0.4479 (0.4485) loss 2.1338 (2.5541) grad_norm 1.4045 (2.9334) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][540/625] eta 0:00:38 lr 0.000155 wd 0.0500 time 0.4509 (0.4509) data time 0.0008 (0.0021) model time 0.4501 (0.4485) loss 2.8420 (2.5508) grad_norm 2.6307 (2.9196) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][550/625] eta 0:00:33 lr 0.000155 wd 0.0500 time 0.4465 (0.4509) data time 0.0006 (0.0020) model time 0.4459 (0.4485) loss 2.6424 (2.5503) grad_norm 2.3831 (2.9042) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][560/625] eta 0:00:29 lr 0.000155 wd 0.0500 time 0.4477 (0.4509) data time 0.0010 (0.0020) model time 0.4467 (0.4485) loss 2.9593 (2.5517) grad_norm 2.2315 (2.8910) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][570/625] eta 0:00:24 lr 0.000155 wd 0.0500 time 0.4508 (0.4508) data time 0.0006 (0.0020) model time 0.4502 (0.4485) loss 1.7201 (2.5475) grad_norm 1.7713 (2.8794) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][580/625] eta 0:00:20 lr 0.000155 wd 0.0500 time 0.4490 (0.4508) data time 0.0008 (0.0020) model time 0.4482 (0.4484) loss 2.5935 (2.5445) grad_norm 1.6740 (2.8697) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][590/625] eta 0:00:15 lr 0.000155 wd 0.0500 time 0.4452 (0.4507) data time 0.0009 (0.0020) model time 0.4444 (0.4484) loss 2.8873 (2.5459) grad_norm 2.3914 
(2.8573) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][600/625] eta 0:00:11 lr 0.000154 wd 0.0500 time 0.4481 (0.4507) data time 0.0006 (0.0019) model time 0.4475 (0.4484) loss 2.0245 (2.5448) grad_norm 2.3260 (2.8459) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][610/625] eta 0:00:06 lr 0.000154 wd 0.0500 time 0.4445 (0.4507) data time 0.0004 (0.0019) model time 0.4441 (0.4484) loss 2.7538 (2.5426) grad_norm 6.0413 (2.8417) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [236/300][620/625] eta 0:00:02 lr 0.000154 wd 0.0500 time 0.4458 (0.4506) data time 0.0006 (0.0019) model time 0.4452 (0.4484) loss 2.9085 (2.5425) grad_norm 1.7716 (2.8318) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 236 training takes 0:04:41 [2024-08-11 07:32:07 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:32:08 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 07:32:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.510 (0.510) Loss 0.5239 (0.5239) Acc@1 89.014 (89.014) Acc@5 98.633 (98.633) Mem 16699MB [2024-08-11 07:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.155) Loss 0.8320 (0.6284) Acc@1 80.566 (86.719) Acc@5 96.289 (97.767) Mem 16699MB [2024-08-11 07:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.9297 (0.7433) Acc@1 78.174 (83.784) Acc@5 95.312 (96.677) Mem 16699MB [2024-08-11 07:32:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.547 Acc@5 96.639 [2024-08-11 07:32:12 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.5% [2024-08-11 07:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.041 (1.041) Loss 0.4858 (0.4858) Acc@1 89.404 (89.404) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 07:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.205) Loss 0.7798 (0.5946) Acc@1 81.152 (87.318) Acc@5 96.582 (97.918) Mem 16699MB [2024-08-11 07:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.162) Loss 0.8677 (0.7002) Acc@1 80.127 (84.570) Acc@5 95.801 (96.966) Mem 16699MB [2024-08-11 07:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.291 Acc@5 96.921 [2024-08-11 07:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:32:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][0/625] eta 0:14:41 lr 0.000154 wd 0.0500 time 1.4105 (1.4105) data time 0.5607 (0.5607) model time 0.0000 (0.0000) loss 2.4916 (2.4916) grad_norm 1.8876 (1.8876) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][10/625] eta 0:05:29 lr 0.000154 wd 0.0500 time 0.4450 (0.5351) data time 
0.0006 (0.0518) model time 0.0000 (0.0000) loss 2.5927 (2.4197) grad_norm 1.6613 (2.1722) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][20/625] eta 0:04:58 lr 0.000154 wd 0.0500 time 0.4520 (0.4936) data time 0.0006 (0.0275) model time 0.0000 (0.0000) loss 2.5935 (2.5386) grad_norm 1.7006 (2.2228) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][30/625] eta 0:04:47 lr 0.000154 wd 0.0500 time 0.4398 (0.4840) data time 0.0009 (0.0189) model time 0.0000 (0.0000) loss 2.9888 (2.6466) grad_norm 2.8314 (2.3455) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][40/625] eta 0:04:39 lr 0.000154 wd 0.0500 time 0.4487 (0.4781) data time 0.0007 (0.0145) model time 0.0000 (0.0000) loss 1.5068 (2.6342) grad_norm 2.4725 (2.3329) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][50/625] eta 0:04:31 lr 0.000154 wd 0.0500 time 0.4477 (0.4723) data time 0.0008 (0.0118) model time 0.0000 (0.0000) loss 2.6030 (2.5795) grad_norm 2.0655 (2.3979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][60/625] eta 0:04:24 lr 0.000154 wd 0.0500 time 0.4478 (0.4686) data time 0.0006 (0.0100) model time 0.4472 (0.4487) loss 3.0321 (2.5588) grad_norm 2.6897 (2.3965) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][70/625] eta 0:04:18 lr 0.000154 wd 0.0500 time 0.4494 (0.4659) data time 0.0008 (0.0087) model time 0.4486 (0.4485) loss 2.6794 (2.5590) grad_norm 2.8560 (2.3901) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][80/625] eta 0:04:12 
lr 0.000154 wd 0.0500 time 0.4473 (0.4637) data time 0.0008 (0.0077) model time 0.4465 (0.4481) loss 2.9741 (2.5675) grad_norm 2.5103 (2.4981) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:32:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][90/625] eta 0:04:07 lr 0.000154 wd 0.0500 time 0.4431 (0.4619) data time 0.0006 (0.0070) model time 0.4425 (0.4478) loss 2.6161 (2.5761) grad_norm 2.4529 (2.4384) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][100/625] eta 0:04:01 lr 0.000154 wd 0.0500 time 0.4471 (0.4606) data time 0.0009 (0.0064) model time 0.4462 (0.4477) loss 2.5868 (2.5696) grad_norm 1.8134 (2.4109) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][110/625] eta 0:03:56 lr 0.000154 wd 0.0500 time 0.4438 (0.4593) data time 0.0006 (0.0059) model time 0.4432 (0.4474) loss 3.1020 (2.5839) grad_norm 2.9549 (2.4912) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][120/625] eta 0:03:51 lr 0.000153 wd 0.0500 time 0.4476 (0.4584) data time 0.0008 (0.0054) model time 0.4468 (0.4475) loss 2.8426 (2.5817) grad_norm 2.0441 (2.4692) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][130/625] eta 0:03:46 lr 0.000153 wd 0.0500 time 0.4492 (0.4578) data time 0.0006 (0.0051) model time 0.4487 (0.4477) loss 2.9144 (2.5853) grad_norm 1.9035 (2.4679) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][140/625] eta 0:03:41 lr 0.000153 wd 0.0500 time 0.4532 (0.4572) data time 0.0007 (0.0048) model time 0.4525 (0.4478) loss 2.8279 (2.5859) grad_norm 1.9851 (2.4472) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:24 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [237/300][150/625] eta 0:03:36 lr 0.000153 wd 0.0500 time 0.4417 (0.4565) data time 0.0006 (0.0045) model time 0.4411 (0.4476) loss 2.0358 (2.5685) grad_norm 2.6713 (2.4123) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][160/625] eta 0:03:32 lr 0.000153 wd 0.0500 time 0.4483 (0.4559) data time 0.0006 (0.0043) model time 0.4477 (0.4475) loss 2.5986 (2.5755) grad_norm 2.9014 (2.4114) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][170/625] eta 0:03:27 lr 0.000153 wd 0.0500 time 0.4493 (0.4555) data time 0.0008 (0.0041) model time 0.4484 (0.4475) loss 2.9508 (2.5744) grad_norm 1.5569 (2.3972) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:33:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][180/625] eta 0:03:22 lr 0.000153 wd 0.0500 time 0.4520 (0.4551) data time 0.0008 (0.0039) model time 0.4512 (0.4475) loss 2.3935 (2.5759) grad_norm 2.3480 (2.3973) loss_scale 128.0000 (66.1215) mem 16699MB [2024-08-11 07:33:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][190/625] eta 0:03:17 lr 0.000153 wd 0.0500 time 0.4540 (0.4548) data time 0.0006 (0.0038) model time 0.4534 (0.4476) loss 2.5056 (2.5718) grad_norm 1.9704 (2.3764) loss_scale 128.0000 (69.3613) mem 16699MB [2024-08-11 07:33:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][200/625] eta 0:03:13 lr 0.000153 wd 0.0500 time 0.4502 (0.4546) data time 0.0008 (0.0036) model time 0.4494 (0.4477) loss 2.8198 (2.5718) grad_norm 2.6267 (2.4051) loss_scale 128.0000 (72.2786) mem 16699MB [2024-08-11 07:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][210/625] eta 0:03:08 lr 0.000153 wd 0.0500 time 0.4503 (0.4544) data time 0.0006 (0.0035) model time 0.4496 (0.4479) loss 2.6335 (2.5701) grad_norm 2.2403 (2.4366) loss_scale 128.0000 
(74.9194) mem 16699MB [2024-08-11 07:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][220/625] eta 0:03:03 lr 0.000153 wd 0.0500 time 0.4498 (0.4542) data time 0.0006 (0.0034) model time 0.4491 (0.4480) loss 2.3526 (2.5694) grad_norm 3.4383 (2.4552) loss_scale 128.0000 (77.3213) mem 16699MB [2024-08-11 07:34:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][230/625] eta 0:02:59 lr 0.000153 wd 0.0500 time 0.4522 (0.4540) data time 0.0006 (0.0032) model time 0.4516 (0.4480) loss 2.5108 (2.5722) grad_norm 2.1927 (2.4489) loss_scale 128.0000 (79.5152) mem 16699MB [2024-08-11 07:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][240/625] eta 0:02:54 lr 0.000153 wd 0.0500 time 0.4441 (0.4538) data time 0.0006 (0.0031) model time 0.4435 (0.4480) loss 2.2100 (2.5687) grad_norm 1.7004 (2.4328) loss_scale 128.0000 (81.5270) mem 16699MB [2024-08-11 07:34:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][250/625] eta 0:02:50 lr 0.000153 wd 0.0500 time 0.4507 (0.4545) data time 0.0006 (0.0030) model time 0.4501 (0.4492) loss 2.8294 (2.5678) grad_norm 2.2894 (2.4322) loss_scale 128.0000 (83.3785) mem 16699MB [2024-08-11 07:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][260/625] eta 0:02:46 lr 0.000153 wd 0.0500 time 0.6460 (0.4551) data time 0.0010 (0.0030) model time 0.6450 (0.4501) loss 3.0064 (2.5708) grad_norm 2.1438 (2.4277) loss_scale 128.0000 (85.0881) mem 16699MB [2024-08-11 07:34:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][270/625] eta 0:02:41 lr 0.000152 wd 0.0500 time 0.4554 (0.4547) data time 0.0008 (0.0029) model time 0.4546 (0.4498) loss 2.4847 (2.5655) grad_norm 1.8507 (2.4261) loss_scale 128.0000 (86.6716) mem 16699MB [2024-08-11 07:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][280/625] eta 0:02:36 lr 0.000152 wd 0.0500 time 0.4549 (0.4546) data time 0.0009 (0.0028) model time 0.4540 (0.4498) 
loss 2.7689 (2.5667) grad_norm 2.4241 (2.4516) loss_scale 128.0000 (88.1423) mem 16699MB [2024-08-11 07:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][290/625] eta 0:02:32 lr 0.000152 wd 0.0500 time 0.4626 (0.4545) data time 0.0008 (0.0027) model time 0.4618 (0.4499) loss 2.6127 (2.5691) grad_norm 3.0985 (2.4487) loss_scale 128.0000 (89.5120) mem 16699MB [2024-08-11 07:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][300/625] eta 0:02:27 lr 0.000152 wd 0.0500 time 0.4498 (0.4544) data time 0.0007 (0.0027) model time 0.4491 (0.4499) loss 3.2699 (2.5685) grad_norm 2.7418 (2.4477) loss_scale 128.0000 (90.7907) mem 16699MB [2024-08-11 07:34:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][310/625] eta 0:02:23 lr 0.000152 wd 0.0500 time 0.4487 (0.4543) data time 0.0006 (0.0026) model time 0.4481 (0.4499) loss 2.5093 (2.5660) grad_norm 1.6246 (2.4390) loss_scale 128.0000 (91.9871) mem 16699MB [2024-08-11 07:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][320/625] eta 0:02:18 lr 0.000152 wd 0.0500 time 0.4450 (0.4541) data time 0.0007 (0.0026) model time 0.4444 (0.4498) loss 2.5611 (2.5589) grad_norm 3.0104 (2.4426) loss_scale 128.0000 (93.1090) mem 16699MB [2024-08-11 07:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][330/625] eta 0:02:13 lr 0.000152 wd 0.0500 time 0.4473 (0.4539) data time 0.0007 (0.0025) model time 0.4466 (0.4497) loss 3.3120 (2.5627) grad_norm 1.5189 (2.4444) loss_scale 128.0000 (94.1631) mem 16699MB [2024-08-11 07:34:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][340/625] eta 0:02:09 lr 0.000152 wd 0.0500 time 0.4468 (0.4538) data time 0.0008 (0.0025) model time 0.4460 (0.4497) loss 2.8726 (2.5681) grad_norm 2.5025 (2.4332) loss_scale 128.0000 (95.1554) mem 16699MB [2024-08-11 07:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][350/625] eta 0:02:04 lr 0.000152 wd 0.0500 time 
0.4485 (0.4537) data time 0.0007 (0.0024) model time 0.4478 (0.4497) loss 2.6564 (2.5687) grad_norm 1.7401 (2.4141) loss_scale 128.0000 (96.0912) mem 16699MB [2024-08-11 07:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][360/625] eta 0:02:00 lr 0.000152 wd 0.0500 time 0.4465 (0.4536) data time 0.0009 (0.0024) model time 0.4456 (0.4497) loss 2.3951 (2.5644) grad_norm 3.5275 (2.4059) loss_scale 128.0000 (96.9751) mem 16699MB [2024-08-11 07:35:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][370/625] eta 0:01:55 lr 0.000152 wd 0.0500 time 0.4516 (0.4535) data time 0.0008 (0.0023) model time 0.4508 (0.4497) loss 2.7107 (2.5650) grad_norm 2.0794 (2.4021) loss_scale 128.0000 (97.8113) mem 16699MB [2024-08-11 07:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][380/625] eta 0:01:51 lr 0.000152 wd 0.0500 time 0.4420 (0.4533) data time 0.0009 (0.0023) model time 0.4410 (0.4495) loss 2.8010 (2.5640) grad_norm 2.3057 (2.3955) loss_scale 128.0000 (98.6037) mem 16699MB [2024-08-11 07:35:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][390/625] eta 0:01:46 lr 0.000152 wd 0.0500 time 0.4474 (0.4531) data time 0.0006 (0.0023) model time 0.4468 (0.4494) loss 1.5783 (2.5630) grad_norm 1.9761 (2.3917) loss_scale 128.0000 (99.3555) mem 16699MB [2024-08-11 07:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][400/625] eta 0:01:42 lr 0.000152 wd 0.0500 time 0.4510 (0.4535) data time 0.0006 (0.0022) model time 0.4504 (0.4499) loss 2.1534 (2.5638) grad_norm 2.3648 (2.3857) loss_scale 128.0000 (100.0698) mem 16699MB [2024-08-11 07:35:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][410/625] eta 0:01:37 lr 0.000151 wd 0.0500 time 0.4510 (0.4538) data time 0.0007 (0.0022) model time 0.4504 (0.4503) loss 3.3062 (2.5658) grad_norm 3.5020 (2.4020) loss_scale 128.0000 (100.7494) mem 16699MB [2024-08-11 07:35:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [237/300][420/625] eta 0:01:33 lr 0.000151 wd 0.0500 time 0.4517 (0.4537) data time 0.0008 (0.0022) model time 0.4508 (0.4503) loss 2.5663 (2.5666) grad_norm 1.9051 (2.4006) loss_scale 128.0000 (101.3967) mem 16699MB [2024-08-11 07:35:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][430/625] eta 0:01:28 lr 0.000151 wd 0.0500 time 0.4500 (0.4536) data time 0.0007 (0.0021) model time 0.4493 (0.4503) loss 1.8941 (2.5584) grad_norm 2.8128 (2.4033) loss_scale 128.0000 (102.0139) mem 16699MB [2024-08-11 07:35:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][440/625] eta 0:01:23 lr 0.000151 wd 0.0500 time 0.4540 (0.4536) data time 0.0006 (0.0021) model time 0.4534 (0.4503) loss 2.6546 (2.5557) grad_norm 3.4595 (2.4518) loss_scale 128.0000 (102.6032) mem 16699MB [2024-08-11 07:35:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][450/625] eta 0:01:19 lr 0.000151 wd 0.0500 time 0.4471 (0.4535) data time 0.0006 (0.0021) model time 0.4465 (0.4502) loss 2.8963 (2.5594) grad_norm 3.8164 (2.4690) loss_scale 128.0000 (103.1663) mem 16699MB [2024-08-11 07:35:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][460/625] eta 0:01:14 lr 0.000151 wd 0.0500 time 0.4457 (0.4534) data time 0.0006 (0.0020) model time 0.4451 (0.4502) loss 2.2374 (2.5603) grad_norm 2.0878 (2.4622) loss_scale 128.0000 (103.7050) mem 16699MB [2024-08-11 07:35:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][470/625] eta 0:01:10 lr 0.000151 wd 0.0500 time 0.4506 (0.4533) data time 0.0008 (0.0020) model time 0.4497 (0.4501) loss 2.2669 (2.5600) grad_norm 1.5989 (2.4522) loss_scale 128.0000 (104.2208) mem 16699MB [2024-08-11 07:35:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][480/625] eta 0:01:05 lr 0.000151 wd 0.0500 time 0.4517 (0.4532) data time 0.0009 (0.0020) model time 0.4508 (0.4501) loss 1.7087 (2.5580) grad_norm 2.1562 (2.4495) loss_scale 128.0000 (104.7152) mem 
16699MB [2024-08-11 07:35:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][490/625] eta 0:01:01 lr 0.000151 wd 0.0500 time 0.4547 (0.4531) data time 0.0008 (0.0020) model time 0.4539 (0.4501) loss 2.8023 (2.5582) grad_norm 3.8118 (2.4550) loss_scale 128.0000 (105.1894) mem 16699MB [2024-08-11 07:36:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][500/625] eta 0:00:56 lr 0.000151 wd 0.0500 time 0.4515 (0.4531) data time 0.0006 (0.0019) model time 0.4509 (0.4501) loss 3.1920 (2.5616) grad_norm 1.7427 (2.4468) loss_scale 128.0000 (105.6447) mem 16699MB [2024-08-11 07:36:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][510/625] eta 0:00:52 lr 0.000151 wd 0.0500 time 0.4518 (0.4531) data time 0.0009 (0.0019) model time 0.4510 (0.4501) loss 2.9932 (2.5644) grad_norm 3.9030 (2.5952) loss_scale 128.0000 (106.0822) mem 16699MB [2024-08-11 07:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][520/625] eta 0:00:47 lr 0.000151 wd 0.0500 time 0.4472 (0.4530) data time 0.0009 (0.0019) model time 0.4464 (0.4501) loss 2.8371 (2.5651) grad_norm 2.8363 (inf) loss_scale 64.0000 (105.2745) mem 16699MB [2024-08-11 07:36:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][530/625] eta 0:00:43 lr 0.000151 wd 0.0500 time 0.4468 (0.4529) data time 0.0006 (0.0019) model time 0.4462 (0.4500) loss 2.9132 (2.5658) grad_norm 8.3365 (inf) loss_scale 64.0000 (104.4972) mem 16699MB [2024-08-11 07:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][540/625] eta 0:00:38 lr 0.000151 wd 0.0500 time 0.4451 (0.4528) data time 0.0009 (0.0019) model time 0.4442 (0.4500) loss 2.0261 (2.5670) grad_norm 3.6082 (inf) loss_scale 64.0000 (103.7486) mem 16699MB [2024-08-11 07:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][550/625] eta 0:00:33 lr 0.000151 wd 0.0500 time 0.4524 (0.4528) data time 0.0006 (0.0018) model time 0.4518 (0.4499) loss 2.7876 (2.5681) 
grad_norm 2.1954 (inf) loss_scale 64.0000 (103.0272) mem 16699MB [2024-08-11 07:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][560/625] eta 0:00:29 lr 0.000150 wd 0.0500 time 0.4501 (0.4527) data time 0.0006 (0.0018) model time 0.4495 (0.4499) loss 2.6554 (2.5704) grad_norm 2.7052 (inf) loss_scale 64.0000 (102.3316) mem 16699MB [2024-08-11 07:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][570/625] eta 0:00:24 lr 0.000150 wd 0.0500 time 0.4505 (0.4527) data time 0.0006 (0.0018) model time 0.4498 (0.4500) loss 3.1126 (2.5728) grad_norm 3.8682 (inf) loss_scale 64.0000 (101.6602) mem 16699MB [2024-08-11 07:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][580/625] eta 0:00:20 lr 0.000150 wd 0.0500 time 0.4496 (0.4527) data time 0.0007 (0.0018) model time 0.4490 (0.4500) loss 1.7158 (2.5724) grad_norm 2.3324 (inf) loss_scale 64.0000 (101.0120) mem 16699MB [2024-08-11 07:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][590/625] eta 0:00:15 lr 0.000150 wd 0.0500 time 0.4468 (0.4526) data time 0.0008 (0.0018) model time 0.4460 (0.4499) loss 2.2591 (2.5654) grad_norm 2.6512 (inf) loss_scale 64.0000 (100.3858) mem 16699MB [2024-08-11 07:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][600/625] eta 0:00:11 lr 0.000150 wd 0.0500 time 0.4460 (0.4525) data time 0.0006 (0.0018) model time 0.4454 (0.4499) loss 2.5285 (2.5658) grad_norm 2.8823 (inf) loss_scale 64.0000 (99.7804) mem 16699MB [2024-08-11 07:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][610/625] eta 0:00:06 lr 0.000150 wd 0.0500 time 0.4436 (0.4525) data time 0.0007 (0.0018) model time 0.4429 (0.4498) loss 2.6651 (2.5640) grad_norm 2.3578 (inf) loss_scale 64.0000 (99.1948) mem 16699MB [2024-08-11 07:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [237/300][620/625] eta 0:00:02 lr 0.000150 wd 0.0500 time 0.4453 (0.4523) data time 0.0006 (0.0017) model 
time 0.4447 (0.4497) loss 2.8505 (2.5624) grad_norm 2.0950 (inf) loss_scale 64.0000 (98.6280) mem 16699MB [2024-08-11 07:36:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 237 training takes 0:04:42 [2024-08-11 07:36:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:37:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:37:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.496 (0.496) Loss 0.5234 (0.5234) Acc@1 89.111 (89.111) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-11 07:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.155) Loss 0.8477 (0.6318) Acc@1 79.980 (86.674) Acc@5 95.654 (97.754) Mem 16699MB [2024-08-11 07:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9219 (0.7461) Acc@1 79.102 (83.866) Acc@5 95.068 (96.642) Mem 16699MB [2024-08-11 07:37:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.591 Acc@5 96.613 [2024-08-11 07:37:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 07:37:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.075 (1.075) Loss 0.4868 (0.4868) Acc@1 89.355 (89.355) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 07:37:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.207) Loss 0.7812 (0.5950) Acc@1 81.250 (87.318) Acc@5 96.533 (97.909) Mem 16699MB [2024-08-11 07:37:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.163) Loss 0.8687 (0.7011) Acc@1 80.225 (84.570) Acc@5 95.898 (96.968) Mem 16699MB [2024-08-11 07:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.293 Acc@5 96.917 [2024-08-11 07:37:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:37:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][0/625] eta 0:15:15 lr 0.000150 wd 0.0500 time 1.4652 (1.4652) data time 0.7491 (0.7491) model time 0.0000 (0.0000) loss 2.8081 (2.8081) grad_norm 4.2925 (4.2925) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][10/625] eta 0:05:33 lr 0.000150 wd 0.0500 time 0.4484 (0.5421) data time 0.0009 (0.0689) model time 0.0000 (0.0000) loss 2.4680 (2.5585) grad_norm 3.2335 (3.1918) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][20/625] eta 0:05:01 lr 0.000150 wd 0.0500 time 0.4498 (0.4977) data time 0.0007 (0.0364) model time 0.0000 (0.0000) loss 2.9540 (2.5668) grad_norm 1.8004 (2.9826) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][30/625] eta 0:04:47 lr 0.000150 wd 0.0500 time 0.4549 (0.4825) data time 0.0007 (0.0250) model time 0.0000 (0.0000) loss 3.0741 (2.5620) grad_norm 3.5568 (2.9908) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][40/625] eta 0:04:37 lr 0.000150 wd 0.0500 time 0.4456 (0.4741) data time 0.0009 (0.0191) model time 0.0000 (0.0000) loss 2.7749 (2.5715) grad_norm 1.5748 (2.8865) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][50/625] eta 0:04:32 lr 0.000150 wd 0.0500 time 0.4493 (0.4742) data time 0.0006 (0.0155) model time 0.0000 (0.0000) loss 2.5279 (2.5657) grad_norm 1.9810 (2.9124) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][60/625] eta 0:04:25 lr 0.000150 wd 0.0500 time 0.4504 (0.4703) data time 0.0006 (0.0131) model 
time 0.4498 (0.4494) loss 2.7743 (2.5840) grad_norm 1.8618 (2.9357) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][70/625] eta 0:04:19 lr 0.000150 wd 0.0500 time 0.4494 (0.4677) data time 0.0006 (0.0114) model time 0.4488 (0.4501) loss 2.6345 (2.5710) grad_norm 3.0077 (2.8538) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][80/625] eta 0:04:13 lr 0.000149 wd 0.0500 time 0.4552 (0.4657) data time 0.0006 (0.0100) model time 0.4546 (0.4504) loss 2.7886 (2.5881) grad_norm 3.3202 (2.7696) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][90/625] eta 0:04:08 lr 0.000149 wd 0.0500 time 0.4514 (0.4642) data time 0.0006 (0.0090) model time 0.4508 (0.4505) loss 2.5999 (2.5824) grad_norm 1.9196 (2.6968) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][100/625] eta 0:04:03 lr 0.000149 wd 0.0500 time 0.4473 (0.4630) data time 0.0006 (0.0082) model time 0.4466 (0.4508) loss 2.7595 (2.5687) grad_norm 3.0485 (2.6575) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:37:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][110/625] eta 0:03:57 lr 0.000149 wd 0.0500 time 0.4517 (0.4617) data time 0.0008 (0.0075) model time 0.4510 (0.4502) loss 3.0843 (2.5735) grad_norm 2.3896 (2.6256) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][120/625] eta 0:03:52 lr 0.000149 wd 0.0500 time 0.4412 (0.4606) data time 0.0008 (0.0070) model time 0.4404 (0.4499) loss 2.8762 (2.5865) grad_norm 39.8717 (2.9115) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][130/625] eta 0:03:47 lr 0.000149 wd 
0.0500 time 0.4484 (0.4595) data time 0.0008 (0.0065) model time 0.4477 (0.4493) loss 2.6685 (2.5725) grad_norm 2.6671 (2.9957) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][140/625] eta 0:03:42 lr 0.000149 wd 0.0500 time 0.4503 (0.4587) data time 0.0009 (0.0061) model time 0.4494 (0.4491) loss 2.2626 (2.5836) grad_norm 1.7424 (2.9409) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][150/625] eta 0:03:37 lr 0.000149 wd 0.0500 time 0.4439 (0.4579) data time 0.0006 (0.0058) model time 0.4433 (0.4488) loss 3.0174 (2.5960) grad_norm 1.7481 (2.8851) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][160/625] eta 0:03:32 lr 0.000149 wd 0.0500 time 0.4413 (0.4572) data time 0.0007 (0.0055) model time 0.4407 (0.4486) loss 2.1705 (2.5846) grad_norm 1.5478 (2.8524) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][170/625] eta 0:03:27 lr 0.000149 wd 0.0500 time 0.4501 (0.4568) data time 0.0006 (0.0052) model time 0.4494 (0.4486) loss 3.1196 (2.5834) grad_norm 2.3356 (2.8304) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][180/625] eta 0:03:23 lr 0.000149 wd 0.0500 time 0.4446 (0.4562) data time 0.0009 (0.0049) model time 0.4437 (0.4484) loss 2.8642 (2.5758) grad_norm 2.8403 (2.8012) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][190/625] eta 0:03:18 lr 0.000149 wd 0.0500 time 0.4469 (0.4558) data time 0.0008 (0.0047) model time 0.4461 (0.4483) loss 2.5447 (2.5880) grad_norm 2.3594 (2.7734) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [238/300][200/625] eta 0:03:13 lr 0.000149 wd 0.0500 time 0.4480 (0.4554) data time 0.0007 (0.0045) model time 0.4473 (0.4482) loss 2.5506 (2.5742) grad_norm 2.8719 (2.7478) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][210/625] eta 0:03:08 lr 0.000149 wd 0.0500 time 0.4573 (0.4550) data time 0.0006 (0.0044) model time 0.4567 (0.4481) loss 1.7784 (2.5663) grad_norm 2.3965 (2.7422) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][220/625] eta 0:03:04 lr 0.000149 wd 0.0500 time 0.4481 (0.4548) data time 0.0008 (0.0042) model time 0.4473 (0.4481) loss 2.6717 (2.5649) grad_norm 1.9755 (2.7248) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][230/625] eta 0:02:59 lr 0.000148 wd 0.0500 time 0.4491 (0.4545) data time 0.0008 (0.0041) model time 0.4482 (0.4481) loss 2.9974 (2.5671) grad_norm 2.5238 (2.7241) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:38:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][240/625] eta 0:02:54 lr 0.000148 wd 0.0500 time 0.4468 (0.4543) data time 0.0006 (0.0039) model time 0.4462 (0.4481) loss 2.3027 (2.5654) grad_norm 2.3826 (2.7212) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][250/625] eta 0:02:50 lr 0.000148 wd 0.0500 time 0.4481 (0.4540) data time 0.0008 (0.0038) model time 0.4473 (0.4480) loss 2.6782 (2.5661) grad_norm 1.9715 (2.7089) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][260/625] eta 0:02:45 lr 0.000148 wd 0.0500 time 0.4434 (0.4536) data time 0.0008 (0.0037) model time 0.4426 (0.4478) loss 2.2974 (2.5654) grad_norm 2.7752 (2.6918) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 07:39:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][270/625] eta 0:02:41 lr 0.000148 wd 0.0500 time 0.4497 (0.4540) data time 0.0006 (0.0036) model time 0.4490 (0.4484) loss 2.2144 (2.5567) grad_norm 1.9797 (2.6740) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][280/625] eta 0:02:36 lr 0.000148 wd 0.0500 time 0.4520 (0.4536) data time 0.0009 (0.0035) model time 0.4511 (0.4483) loss 2.7244 (2.5585) grad_norm 1.7416 (2.6662) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][290/625] eta 0:02:31 lr 0.000148 wd 0.0500 time 0.4471 (0.4535) data time 0.0009 (0.0034) model time 0.4462 (0.4483) loss 2.9339 (2.5550) grad_norm 2.4192 (2.6598) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][300/625] eta 0:02:27 lr 0.000148 wd 0.0500 time 0.4548 (0.4535) data time 0.0006 (0.0033) model time 0.4542 (0.4484) loss 2.1305 (2.5461) grad_norm 3.7518 (2.6727) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][310/625] eta 0:02:22 lr 0.000148 wd 0.0500 time 0.4487 (0.4534) data time 0.0009 (0.0032) model time 0.4478 (0.4484) loss 2.9404 (2.5492) grad_norm 2.2891 (2.6750) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][320/625] eta 0:02:18 lr 0.000148 wd 0.0500 time 0.4624 (0.4533) data time 0.0007 (0.0032) model time 0.4617 (0.4486) loss 3.1029 (2.5534) grad_norm 1.8418 (2.6611) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][330/625] eta 0:02:13 lr 0.000148 wd 0.0500 time 0.4502 (0.4533) data time 0.0009 (0.0031) model time 0.4493 (0.4486) loss 2.6669 (2.5526) 
grad_norm 2.0659 (2.6423) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][340/625] eta 0:02:09 lr 0.000148 wd 0.0500 time 0.4510 (0.4531) data time 0.0006 (0.0030) model time 0.4504 (0.4486) loss 2.8932 (2.5604) grad_norm 2.0476 (2.6283) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][350/625] eta 0:02:04 lr 0.000148 wd 0.0500 time 0.4458 (0.4530) data time 0.0008 (0.0030) model time 0.4450 (0.4486) loss 3.0572 (2.5554) grad_norm 2.6882 (2.6213) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][360/625] eta 0:02:00 lr 0.000148 wd 0.0500 time 0.4479 (0.4530) data time 0.0009 (0.0029) model time 0.4470 (0.4487) loss 2.0788 (2.5533) grad_norm 2.0477 (2.6210) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][370/625] eta 0:01:55 lr 0.000148 wd 0.0500 time 0.4548 (0.4529) data time 0.0006 (0.0028) model time 0.4542 (0.4487) loss 2.9750 (2.5549) grad_norm 2.1106 (2.6120) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][380/625] eta 0:01:51 lr 0.000147 wd 0.0500 time 0.4525 (0.4534) data time 0.0006 (0.0028) model time 0.4518 (0.4494) loss 2.3372 (2.5505) grad_norm 1.9763 (2.6017) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][390/625] eta 0:01:46 lr 0.000147 wd 0.0500 time 0.4541 (0.4535) data time 0.0008 (0.0027) model time 0.4533 (0.4496) loss 1.9387 (2.5529) grad_norm 3.0244 (2.5920) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][400/625] eta 0:01:42 lr 0.000147 wd 0.0500 time 0.4445 (0.4534) data time 
0.0009 (0.0027) model time 0.4436 (0.4495) loss 2.6479 (2.5540) grad_norm 2.6461 (2.5906) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][410/625] eta 0:01:37 lr 0.000147 wd 0.0500 time 0.4437 (0.4535) data time 0.0007 (0.0026) model time 0.4431 (0.4497) loss 2.6325 (2.5507) grad_norm 1.8381 (2.5803) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][420/625] eta 0:01:32 lr 0.000147 wd 0.0500 time 0.4341 (0.4532) data time 0.0007 (0.0026) model time 0.4334 (0.4494) loss 2.6757 (2.5524) grad_norm 4.2958 (2.5981) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][430/625] eta 0:01:28 lr 0.000147 wd 0.0500 time 0.4421 (0.4529) data time 0.0007 (0.0026) model time 0.4414 (0.4492) loss 3.1264 (2.5547) grad_norm 2.6988 (2.5910) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][440/625] eta 0:01:23 lr 0.000147 wd 0.0500 time 0.4392 (0.4526) data time 0.0009 (0.0025) model time 0.4383 (0.4489) loss 2.5201 (2.5507) grad_norm 1.5934 (2.5861) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][450/625] eta 0:01:19 lr 0.000147 wd 0.0500 time 0.4344 (0.4523) data time 0.0007 (0.0025) model time 0.4337 (0.4487) loss 2.8918 (2.5530) grad_norm 2.3254 (2.5757) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][460/625] eta 0:01:14 lr 0.000147 wd 0.0500 time 0.4325 (0.4521) data time 0.0006 (0.0024) model time 0.4319 (0.4485) loss 2.6627 (2.5530) grad_norm 2.2122 (2.5803) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][470/625] eta 
0:01:10 lr 0.000147 wd 0.0500 time 0.4373 (0.4519) data time 0.0009 (0.0024) model time 0.4364 (0.4483) loss 1.9674 (2.5506) grad_norm 1.8016 (2.5944) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][480/625] eta 0:01:05 lr 0.000147 wd 0.0500 time 0.4445 (0.4516) data time 0.0008 (0.0024) model time 0.4437 (0.4481) loss 2.1887 (2.5478) grad_norm 2.4345 (2.5898) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][490/625] eta 0:01:00 lr 0.000147 wd 0.0500 time 0.4405 (0.4514) data time 0.0012 (0.0024) model time 0.4393 (0.4479) loss 2.8559 (2.5491) grad_norm 2.2870 (2.5779) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][500/625] eta 0:00:56 lr 0.000147 wd 0.0500 time 0.4398 (0.4512) data time 0.0006 (0.0023) model time 0.4391 (0.4477) loss 3.2365 (2.5500) grad_norm 2.0832 (2.5775) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][510/625] eta 0:00:51 lr 0.000147 wd 0.0500 time 0.4365 (0.4510) data time 0.0007 (0.0023) model time 0.4358 (0.4476) loss 2.9051 (2.5571) grad_norm 2.0649 (2.5786) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][520/625] eta 0:00:47 lr 0.000146 wd 0.0500 time 0.4404 (0.4509) data time 0.0007 (0.0023) model time 0.4397 (0.4475) loss 3.2221 (2.5604) grad_norm 3.3666 (2.5855) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][530/625] eta 0:00:42 lr 0.000146 wd 0.0500 time 0.4492 (0.4507) data time 0.0007 (0.0022) model time 0.4485 (0.4473) loss 2.3421 (2.5584) grad_norm 2.0910 (2.6193) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:10 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [238/300][540/625] eta 0:00:38 lr 0.000146 wd 0.0500 time 0.4354 (0.4505) data time 0.0008 (0.0022) model time 0.4346 (0.4472) loss 2.7020 (2.5593) grad_norm 1.9711 (2.6085) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][550/625] eta 0:00:33 lr 0.000146 wd 0.0500 time 0.4433 (0.4503) data time 0.0008 (0.0022) model time 0.4424 (0.4470) loss 1.9237 (2.5555) grad_norm 2.0427 (2.5964) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][560/625] eta 0:00:29 lr 0.000146 wd 0.0500 time 0.4357 (0.4501) data time 0.0009 (0.0022) model time 0.4348 (0.4469) loss 2.5403 (2.5543) grad_norm 2.2168 (2.5874) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][570/625] eta 0:00:24 lr 0.000146 wd 0.0500 time 0.4415 (0.4502) data time 0.0009 (0.0022) model time 0.4406 (0.4470) loss 2.2683 (2.5550) grad_norm 3.0170 (2.6106) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][580/625] eta 0:00:20 lr 0.000146 wd 0.0500 time 0.4447 (0.4501) data time 0.0007 (0.0021) model time 0.4440 (0.4469) loss 2.5821 (2.5556) grad_norm 2.2834 (2.6068) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][590/625] eta 0:00:15 lr 0.000146 wd 0.0500 time 0.4323 (0.4499) data time 0.0009 (0.0021) model time 0.4314 (0.4468) loss 2.8134 (2.5517) grad_norm 2.3195 (2.6031) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][600/625] eta 0:00:11 lr 0.000146 wd 0.0500 time 0.4446 (0.4498) data time 0.0006 (0.0021) model time 0.4440 (0.4467) loss 2.6728 (2.5515) grad_norm 2.5206 (2.6009) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 07:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][610/625] eta 0:00:06 lr 0.000146 wd 0.0500 time 0.4378 (0.4497) data time 0.0005 (0.0021) model time 0.4374 (0.4466) loss 1.9375 (2.5515) grad_norm 3.2117 (2.5989) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [238/300][620/625] eta 0:00:02 lr 0.000146 wd 0.0500 time 0.4425 (0.4495) data time 0.0006 (0.0020) model time 0.4419 (0.4464) loss 2.5571 (2.5532) grad_norm 1.8249 (2.5954) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 238 training takes 0:04:40 [2024-08-11 07:41:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:41:49 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 07:41:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.492 (0.492) Loss 0.5146 (0.5146) Acc@1 89.160 (89.160) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-11 07:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.154) Loss 0.8296 (0.6256) Acc@1 80.469 (86.723) Acc@5 96.143 (97.741) Mem 16699MB [2024-08-11 07:41:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.9253 (0.7422) Acc@1 78.906 (83.896) Acc@5 95.117 (96.696) Mem 16699MB [2024-08-11 07:41:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.641 Acc@5 96.667 [2024-08-11 07:41:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 07:41:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.822 (0.822) Loss 0.4883 (0.4883) Acc@1 89.355 (89.355) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-11 07:41:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.185) Loss 0.7817 (0.5955) Acc@1 81.201 (87.282) Acc@5 96.533 (97.905) Mem 16699MB [2024-08-11 07:41:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8701 (0.7018) Acc@1 80.176 (84.542) Acc@5 95.752 (96.952) Mem 16699MB [2024-08-11 07:41:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.273 Acc@5 96.907 [2024-08-11 07:41:56 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][0/625] eta 0:13:43 lr 0.000146 wd 0.0500 time 1.3178 (1.3178) data time 0.5531 (0.5531) model time 0.0000 (0.0000) loss 2.6911 (2.6911) grad_norm 1.7185 (1.7185) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][10/625] eta 0:05:19 lr 0.000146 wd 0.0500 time 0.4430 (0.5200) data time 
0.0007 (0.0511) model time 0.0000 (0.0000) loss 2.4652 (2.6645) grad_norm 2.2442 (2.2270) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][20/625] eta 0:04:51 lr 0.000146 wd 0.0500 time 0.4412 (0.4818) data time 0.0007 (0.0271) model time 0.0000 (0.0000) loss 1.8010 (2.5901) grad_norm 3.2299 (2.5626) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][30/625] eta 0:04:38 lr 0.000146 wd 0.0500 time 0.4414 (0.4687) data time 0.0009 (0.0187) model time 0.0000 (0.0000) loss 2.2162 (2.6501) grad_norm 1.7761 (2.5040) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][40/625] eta 0:04:30 lr 0.000146 wd 0.0500 time 0.4453 (0.4622) data time 0.0006 (0.0143) model time 0.0000 (0.0000) loss 2.8624 (2.5433) grad_norm 3.1217 (2.4295) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][50/625] eta 0:04:24 lr 0.000145 wd 0.0500 time 0.4451 (0.4599) data time 0.0007 (0.0117) model time 0.0000 (0.0000) loss 2.2870 (2.5483) grad_norm 3.5352 (2.4141) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][60/625] eta 0:04:18 lr 0.000145 wd 0.0500 time 0.4379 (0.4568) data time 0.0007 (0.0099) model time 0.4372 (0.4401) loss 2.6649 (2.5345) grad_norm 1.9126 (2.5929) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][70/625] eta 0:04:12 lr 0.000145 wd 0.0500 time 0.4411 (0.4546) data time 0.0007 (0.0086) model time 0.4404 (0.4402) loss 2.7837 (2.5638) grad_norm 4.1708 (2.6929) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][80/625] eta 0:04:06 
lr 0.000145 wd 0.0500 time 0.4419 (0.4528) data time 0.0008 (0.0077) model time 0.4410 (0.4399) loss 3.0125 (2.5665) grad_norm 4.3504 (2.8203) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][90/625] eta 0:04:01 lr 0.000145 wd 0.0500 time 0.4431 (0.4518) data time 0.0009 (0.0069) model time 0.4422 (0.4406) loss 2.1775 (2.5381) grad_norm 1.9367 (2.8075) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][100/625] eta 0:03:56 lr 0.000145 wd 0.0500 time 0.4386 (0.4508) data time 0.0006 (0.0063) model time 0.4379 (0.4405) loss 2.0955 (2.5341) grad_norm 2.0640 (2.9253) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][110/625] eta 0:03:51 lr 0.000145 wd 0.0500 time 0.4331 (0.4499) data time 0.0006 (0.0058) model time 0.4324 (0.4406) loss 3.0100 (2.5469) grad_norm 7.0163 (2.9365) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][120/625] eta 0:03:46 lr 0.000145 wd 0.0500 time 0.4468 (0.4493) data time 0.0009 (0.0054) model time 0.4459 (0.4407) loss 2.6510 (2.5573) grad_norm 3.7180 (2.9495) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:42:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][130/625] eta 0:03:42 lr 0.000145 wd 0.0500 time 0.4421 (0.4485) data time 0.0007 (0.0051) model time 0.4414 (0.4404) loss 2.6239 (2.5660) grad_norm 4.0766 (2.9494) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][140/625] eta 0:03:37 lr 0.000145 wd 0.0500 time 0.6458 (0.4495) data time 0.0009 (0.0048) model time 0.6449 (0.4427) loss 2.9470 (2.5521) grad_norm 2.3844 (2.9121) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:04 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [239/300][150/625] eta 0:03:33 lr 0.000145 wd 0.0500 time 0.4473 (0.4490) data time 0.0007 (0.0045) model time 0.4466 (0.4425) loss 2.6252 (2.5616) grad_norm 2.2781 (2.8732) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][160/625] eta 0:03:28 lr 0.000145 wd 0.0500 time 0.4454 (0.4485) data time 0.0007 (0.0043) model time 0.4447 (0.4423) loss 2.5889 (2.5689) grad_norm 2.4041 (2.8313) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][170/625] eta 0:03:23 lr 0.000145 wd 0.0500 time 0.4446 (0.4483) data time 0.0007 (0.0041) model time 0.4438 (0.4424) loss 2.8811 (2.5679) grad_norm 2.5139 (2.7940) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][180/625] eta 0:03:19 lr 0.000145 wd 0.0500 time 0.4347 (0.4479) data time 0.0007 (0.0039) model time 0.4340 (0.4424) loss 2.2353 (2.5656) grad_norm 1.5490 (2.8235) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][190/625] eta 0:03:14 lr 0.000144 wd 0.0500 time 0.4429 (0.4478) data time 0.0007 (0.0038) model time 0.4423 (0.4425) loss 2.5999 (2.5663) grad_norm 2.6639 (2.8330) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][200/625] eta 0:03:10 lr 0.000144 wd 0.0500 time 0.4414 (0.4475) data time 0.0009 (0.0036) model time 0.4405 (0.4424) loss 3.0690 (2.5665) grad_norm 1.9499 (2.8420) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][210/625] eta 0:03:05 lr 0.000144 wd 0.0500 time 0.4344 (0.4472) data time 0.0006 (0.0035) model time 0.4338 (0.4423) loss 2.7959 (2.5640) grad_norm 2.1310 (2.8302) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 07:43:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][220/625] eta 0:03:01 lr 0.000144 wd 0.0500 time 0.4445 (0.4471) data time 0.0007 (0.0034) model time 0.4438 (0.4424) loss 2.6085 (2.5663) grad_norm 2.0345 (2.7998) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][230/625] eta 0:02:56 lr 0.000144 wd 0.0500 time 0.4428 (0.4469) data time 0.0008 (0.0032) model time 0.4420 (0.4423) loss 2.5764 (2.5700) grad_norm 2.0090 (2.7925) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][240/625] eta 0:02:51 lr 0.000144 wd 0.0500 time 0.4424 (0.4466) data time 0.0007 (0.0032) model time 0.4417 (0.4422) loss 2.6746 (2.5780) grad_norm 2.1280 (2.8728) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][250/625] eta 0:02:47 lr 0.000144 wd 0.0500 time 0.4449 (0.4465) data time 0.0009 (0.0031) model time 0.4439 (0.4422) loss 3.0745 (2.5834) grad_norm 2.1201 (2.8541) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][260/625] eta 0:02:42 lr 0.000144 wd 0.0500 time 0.4371 (0.4463) data time 0.0007 (0.0030) model time 0.4364 (0.4421) loss 2.7434 (2.5781) grad_norm 12.0813 (2.8802) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:43:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][270/625] eta 0:02:38 lr 0.000144 wd 0.0500 time 0.4441 (0.4467) data time 0.0007 (0.0029) model time 0.4434 (0.4428) loss 2.1674 (2.5755) grad_norm 2.9526 (2.8636) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][280/625] eta 0:02:34 lr 0.000144 wd 0.0500 time 0.4321 (0.4465) data time 0.0007 (0.0028) model time 0.4314 (0.4426) loss 
2.9933 (2.5706) grad_norm 2.6314 (2.8517) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][290/625] eta 0:02:29 lr 0.000144 wd 0.0500 time 0.4397 (0.4463) data time 0.0008 (0.0028) model time 0.4389 (0.4425) loss 2.7649 (2.5720) grad_norm 2.4700 (2.8619) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][300/625] eta 0:02:24 lr 0.000144 wd 0.0500 time 0.4352 (0.4461) data time 0.0006 (0.0027) model time 0.4346 (0.4424) loss 2.3056 (2.5693) grad_norm 3.1659 (2.9165) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][310/625] eta 0:02:20 lr 0.000144 wd 0.0500 time 0.4435 (0.4459) data time 0.0008 (0.0026) model time 0.4427 (0.4423) loss 1.7929 (2.5619) grad_norm 2.4940 (2.9139) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][320/625] eta 0:02:15 lr 0.000144 wd 0.0500 time 0.4505 (0.4458) data time 0.0009 (0.0026) model time 0.4496 (0.4423) loss 2.8695 (2.5651) grad_norm 2.4252 (2.9263) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][330/625] eta 0:02:11 lr 0.000144 wd 0.0500 time 0.4382 (0.4457) data time 0.0009 (0.0025) model time 0.4373 (0.4423) loss 2.4262 (2.5682) grad_norm 3.3114 (2.9185) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][340/625] eta 0:02:07 lr 0.000143 wd 0.0500 time 0.4397 (0.4456) data time 0.0009 (0.0025) model time 0.4389 (0.4423) loss 2.9900 (2.5731) grad_norm 2.0600 (2.8972) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][350/625] eta 0:02:02 lr 0.000143 wd 0.0500 time 0.4397 (0.4456) 
data time 0.0008 (0.0024) model time 0.4389 (0.4423) loss 3.0057 (2.5692) grad_norm 2.6568 (2.8833) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][360/625] eta 0:01:58 lr 0.000143 wd 0.0500 time 0.4400 (0.4455) data time 0.0009 (0.0024) model time 0.4391 (0.4422) loss 2.9337 (2.5757) grad_norm 2.2217 (2.8917) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][370/625] eta 0:01:53 lr 0.000143 wd 0.0500 time 0.4420 (0.4463) data time 0.0009 (0.0023) model time 0.4411 (0.4433) loss 2.6659 (2.5775) grad_norm 2.2905 (2.8982) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][380/625] eta 0:01:49 lr 0.000143 wd 0.0500 time 0.4384 (0.4462) data time 0.0006 (0.0023) model time 0.4377 (0.4432) loss 3.2508 (2.5818) grad_norm 2.0347 (2.8995) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][390/625] eta 0:01:44 lr 0.000143 wd 0.0500 time 0.4438 (0.4461) data time 0.0009 (0.0023) model time 0.4429 (0.4432) loss 2.2846 (2.5808) grad_norm 1.9954 (2.8899) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][400/625] eta 0:01:40 lr 0.000143 wd 0.0500 time 0.4396 (0.4460) data time 0.0007 (0.0022) model time 0.4388 (0.4432) loss 2.5303 (2.5845) grad_norm 2.0478 (2.8723) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][410/625] eta 0:01:35 lr 0.000143 wd 0.0500 time 0.4381 (0.4463) data time 0.0009 (0.0022) model time 0.4372 (0.4435) loss 2.3445 (2.5853) grad_norm 1.7120 (2.8636) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[239/300][420/625] eta 0:01:31 lr 0.000143 wd 0.0500 time 0.4396 (0.4461) data time 0.0006 (0.0022) model time 0.4390 (0.4434) loss 2.1351 (2.5825) grad_norm 1.8812 (2.8502) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][430/625] eta 0:01:26 lr 0.000143 wd 0.0500 time 0.4300 (0.4459) data time 0.0007 (0.0021) model time 0.4293 (0.4432) loss 1.8112 (2.5791) grad_norm 3.6133 (2.8364) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][440/625] eta 0:01:22 lr 0.000143 wd 0.0500 time 0.4407 (0.4457) data time 0.0009 (0.0021) model time 0.4398 (0.4430) loss 2.5231 (2.5789) grad_norm 1.9871 (2.8404) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][450/625] eta 0:01:17 lr 0.000143 wd 0.0500 time 0.4451 (0.4456) data time 0.0008 (0.0021) model time 0.4443 (0.4430) loss 3.1326 (2.5820) grad_norm 2.3903 (2.8703) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][460/625] eta 0:01:13 lr 0.000143 wd 0.0500 time 0.4395 (0.4456) data time 0.0007 (0.0021) model time 0.4389 (0.4429) loss 2.4975 (2.5823) grad_norm 2.3538 (2.8825) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][470/625] eta 0:01:09 lr 0.000143 wd 0.0500 time 0.4525 (0.4456) data time 0.0006 (0.0020) model time 0.4519 (0.4430) loss 2.7608 (2.5809) grad_norm 2.0388 (2.8716) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][480/625] eta 0:01:04 lr 0.000143 wd 0.0500 time 0.4381 (0.4455) data time 0.0007 (0.0020) model time 0.4374 (0.4429) loss 2.1481 (2.5834) grad_norm 2.9297 (2.8575) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:35 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][490/625] eta 0:01:00 lr 0.000142 wd 0.0500 time 0.4411 (0.4454) data time 0.0008 (0.0020) model time 0.4403 (0.4429) loss 1.7759 (2.5802) grad_norm 2.5332 (2.8439) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][500/625] eta 0:00:55 lr 0.000142 wd 0.0500 time 0.4486 (0.4454) data time 0.0009 (0.0020) model time 0.4477 (0.4429) loss 2.8488 (2.5811) grad_norm 3.8076 (2.8378) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][510/625] eta 0:00:51 lr 0.000142 wd 0.0500 time 0.4400 (0.4460) data time 0.0007 (0.0019) model time 0.4393 (0.4436) loss 2.8705 (2.5791) grad_norm 2.2439 (2.8278) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][520/625] eta 0:00:46 lr 0.000142 wd 0.0500 time 0.4471 (0.4460) data time 0.0006 (0.0019) model time 0.4465 (0.4436) loss 2.7827 (2.5789) grad_norm 2.3277 (2.8772) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][530/625] eta 0:00:42 lr 0.000142 wd 0.0500 time 0.4369 (0.4459) data time 0.0009 (0.0019) model time 0.4361 (0.4436) loss 2.1054 (2.5798) grad_norm 2.7387 (2.8740) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:45:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][540/625] eta 0:00:37 lr 0.000142 wd 0.0500 time 0.4430 (0.4459) data time 0.0008 (0.0019) model time 0.4422 (0.4435) loss 2.6220 (2.5800) grad_norm 2.3456 (2.8704) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][550/625] eta 0:00:33 lr 0.000142 wd 0.0500 time 0.4425 (0.4458) data time 0.0006 (0.0019) model time 0.4419 (0.4435) loss 2.5049 (2.5838) grad_norm 3.8508 (2.8818) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][560/625] eta 0:00:28 lr 0.000142 wd 0.0500 time 0.4496 (0.4458) data time 0.0009 (0.0018) model time 0.4487 (0.4435) loss 2.6549 (2.5851) grad_norm 2.1499 (2.8747) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][570/625] eta 0:00:24 lr 0.000142 wd 0.0500 time 0.4427 (0.4457) data time 0.0008 (0.0018) model time 0.4419 (0.4435) loss 1.8637 (2.5829) grad_norm 3.0044 (2.8725) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][580/625] eta 0:00:20 lr 0.000142 wd 0.0500 time 0.4514 (0.4457) data time 0.0008 (0.0018) model time 0.4506 (0.4434) loss 3.0636 (2.5870) grad_norm 2.4404 (2.8638) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][590/625] eta 0:00:15 lr 0.000142 wd 0.0500 time 0.4391 (0.4457) data time 0.0010 (0.0018) model time 0.4381 (0.4435) loss 2.4047 (2.5905) grad_norm 2.4182 (2.8517) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][600/625] eta 0:00:11 lr 0.000142 wd 0.0500 time 0.4421 (0.4456) data time 0.0009 (0.0018) model time 0.4412 (0.4434) loss 2.7379 (2.5878) grad_norm 1.9997 (2.8449) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][610/625] eta 0:00:06 lr 0.000142 wd 0.0500 time 0.4436 (0.4456) data time 0.0004 (0.0018) model time 0.4432 (0.4434) loss 2.2361 (2.5902) grad_norm 2.1843 (2.8347) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [239/300][620/625] eta 0:00:02 lr 0.000142 wd 0.0500 time 0.4407 (0.4455) data time 0.0006 (0.0017) model time 
0.4400 (0.4433) loss 2.4513 (2.5841) grad_norm 2.0137 (2.8366) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 239 training takes 0:04:38 [2024-08-11 07:46:35 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:46:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:46:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.5142 (0.5142) Acc@1 89.014 (89.014) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 07:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8447 (0.6241) Acc@1 80.176 (86.790) Acc@5 95.947 (97.807) Mem 16699MB [2024-08-11 07:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.8936 (0.7368) Acc@1 80.029 (84.047) Acc@5 95.752 (96.773) Mem 16699MB [2024-08-11 07:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.773 Acc@5 96.737 [2024-08-11 07:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 07:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.77% [2024-08-11 07:46:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-11 07:46:41 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-11 07:46:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.476 (0.476) Loss 0.4885 (0.4885) Acc@1 89.307 (89.307) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-11 07:46:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.7837 (0.5958) Acc@1 81.250 (87.300) Acc@5 96.338 (97.900) Mem 16699MB [2024-08-11 07:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.8701 (0.7025) Acc@1 80.176 (84.545) Acc@5 95.752 (96.956) Mem 16699MB [2024-08-11 07:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.277 Acc@5 96.907 [2024-08-11 07:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.3% [2024-08-11 07:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][0/625] eta 0:13:43 lr 0.000142 wd 0.0500 time 1.3176 (1.3176) data time 0.7970 (0.7970) model time 0.0000 (0.0000) loss 2.7729 (2.7729) grad_norm 2.9435 (2.9435) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][10/625] eta 0:05:29 lr 0.000142 wd 0.0500 time 0.4427 (0.5350) data time 0.0007 (0.0733) model time 0.0000 (0.0000) loss 3.0176 (2.4863) grad_norm 2.8281 (2.5602) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][20/625] eta 0:04:56 lr 0.000141 wd 0.0500 time 0.4446 (0.4906) data time 0.0008 (0.0388) model time 0.0000 (0.0000) loss 2.8460 (2.5276) grad_norm 3.2148 (2.4039) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:46:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][30/625] eta 0:04:42 lr 0.000141 wd 0.0500 time 0.4360 (0.4751) data time 0.0008 (0.0265) model time 0.0000 (0.0000) loss 2.2816 (2.5006) grad_norm 2.0499 (2.6598) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:03 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [240/300][40/625] eta 0:04:32 lr 0.000141 wd 0.0500 time 0.4418 (0.4664) data time 0.0009 (0.0203) model time 0.0000 (0.0000) loss 2.6505 (2.4710) grad_norm 1.7020 (2.4976) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][50/625] eta 0:04:25 lr 0.000141 wd 0.0500 time 0.4458 (0.4620) data time 0.0009 (0.0165) model time 0.0000 (0.0000) loss 2.8111 (2.4902) grad_norm 2.1433 (4.5651) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][60/625] eta 0:04:19 lr 0.000141 wd 0.0500 time 0.4468 (0.4590) data time 0.0006 (0.0139) model time 0.4462 (0.4430) loss 2.7952 (2.4852) grad_norm 5.8740 (4.3391) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][70/625] eta 0:04:13 lr 0.000141 wd 0.0500 time 0.4391 (0.4567) data time 0.0010 (0.0121) model time 0.4381 (0.4423) loss 2.5653 (2.4907) grad_norm 2.9390 (4.1824) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][80/625] eta 0:04:07 lr 0.000141 wd 0.0500 time 0.4433 (0.4549) data time 0.0007 (0.0107) model time 0.4427 (0.4420) loss 3.0763 (2.5087) grad_norm 1.9101 (3.9917) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][90/625] eta 0:04:02 lr 0.000141 wd 0.0500 time 0.4465 (0.4532) data time 0.0006 (0.0096) model time 0.4459 (0.4412) loss 2.0771 (2.5171) grad_norm 1.8796 (3.8267) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][100/625] eta 0:03:58 lr 0.000141 wd 0.0500 time 0.4353 (0.4540) data time 0.0006 (0.0087) model time 0.4346 (0.4450) loss 2.8265 (2.5314) grad_norm 1.9696 (3.7124) loss_scale 64.0000 (64.0000) 
mem 16699MB [2024-08-11 07:47:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][110/625] eta 0:03:53 lr 0.000141 wd 0.0500 time 0.4570 (0.4528) data time 0.0008 (0.0080) model time 0.4561 (0.4442) loss 2.8597 (2.5341) grad_norm 1.6727 (3.5724) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][120/625] eta 0:03:48 lr 0.000141 wd 0.0500 time 0.4463 (0.4521) data time 0.0009 (0.0074) model time 0.4454 (0.4440) loss 2.6977 (2.5263) grad_norm 1.5596 (3.4494) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][130/625] eta 0:03:43 lr 0.000141 wd 0.0500 time 0.4455 (0.4515) data time 0.0006 (0.0069) model time 0.4448 (0.4439) loss 1.6798 (2.5251) grad_norm 1.6170 (3.3660) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][140/625] eta 0:03:38 lr 0.000141 wd 0.0500 time 0.4424 (0.4511) data time 0.0007 (0.0065) model time 0.4417 (0.4440) loss 3.2075 (2.5291) grad_norm 2.1210 (3.3161) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][150/625] eta 0:03:34 lr 0.000141 wd 0.0500 time 0.4458 (0.4506) data time 0.0009 (0.0061) model time 0.4450 (0.4439) loss 2.6985 (2.5182) grad_norm 2.5308 (3.2413) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:47:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][160/625] eta 0:03:29 lr 0.000141 wd 0.0500 time 0.4474 (0.4500) data time 0.0008 (0.0058) model time 0.4466 (0.4436) loss 2.8083 (2.5172) grad_norm 2.8975 (3.2470) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][170/625] eta 0:03:24 lr 0.000140 wd 0.0500 time 0.4439 (0.4497) data time 0.0008 (0.0055) model time 0.4432 (0.4436) loss 3.0154 
(2.5073) grad_norm 2.8998 (3.3611) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][180/625] eta 0:03:19 lr 0.000140 wd 0.0500 time 0.4455 (0.4491) data time 0.0008 (0.0053) model time 0.4447 (0.4432) loss 2.6263 (2.5026) grad_norm 2.1779 (3.3168) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][190/625] eta 0:03:15 lr 0.000140 wd 0.0500 time 0.4411 (0.4488) data time 0.0008 (0.0050) model time 0.4403 (0.4431) loss 2.5020 (2.5062) grad_norm 2.8976 (3.2671) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][200/625] eta 0:03:10 lr 0.000140 wd 0.0500 time 0.4372 (0.4483) data time 0.0006 (0.0048) model time 0.4365 (0.4428) loss 1.7912 (2.5059) grad_norm 2.0600 (3.2917) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][210/625] eta 0:03:05 lr 0.000140 wd 0.0500 time 0.4445 (0.4480) data time 0.0008 (0.0046) model time 0.4437 (0.4427) loss 3.1114 (2.5200) grad_norm 1.4965 (3.2665) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][220/625] eta 0:03:01 lr 0.000140 wd 0.0500 time 0.4429 (0.4478) data time 0.0008 (0.0045) model time 0.4421 (0.4426) loss 2.6223 (2.5311) grad_norm 3.1864 (3.2665) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][230/625] eta 0:02:56 lr 0.000140 wd 0.0500 time 0.4355 (0.4475) data time 0.0007 (0.0043) model time 0.4349 (0.4425) loss 2.1867 (2.5297) grad_norm 1.9058 (3.2404) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][240/625] eta 0:02:52 lr 0.000140 wd 0.0500 time 0.4404 (0.4472) data 
time 0.0008 (0.0042) model time 0.4397 (0.4423) loss 1.7518 (2.5244) grad_norm 2.8548 (3.2007) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][250/625] eta 0:02:47 lr 0.000140 wd 0.0500 time 0.4454 (0.4470) data time 0.0009 (0.0040) model time 0.4445 (0.4423) loss 2.5360 (2.5279) grad_norm 2.4921 (3.1667) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][260/625] eta 0:02:43 lr 0.000140 wd 0.0500 time 0.4372 (0.4468) data time 0.0007 (0.0039) model time 0.4365 (0.4422) loss 2.8480 (2.5307) grad_norm 1.9555 (3.1342) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][270/625] eta 0:02:38 lr 0.000140 wd 0.0500 time 0.4447 (0.4467) data time 0.0009 (0.0038) model time 0.4438 (0.4423) loss 1.8809 (2.5298) grad_norm 3.3398 (3.1891) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][280/625] eta 0:02:34 lr 0.000140 wd 0.0500 time 0.4496 (0.4465) data time 0.0007 (0.0037) model time 0.4490 (0.4422) loss 2.9828 (2.5351) grad_norm 2.7513 (3.3287) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][290/625] eta 0:02:29 lr 0.000140 wd 0.0500 time 0.4434 (0.4463) data time 0.0007 (0.0036) model time 0.4427 (0.4421) loss 1.5630 (2.5273) grad_norm 2.2276 (3.2996) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][300/625] eta 0:02:24 lr 0.000140 wd 0.0500 time 0.4394 (0.4461) data time 0.0007 (0.0035) model time 0.4387 (0.4421) loss 3.0083 (2.5318) grad_norm 2.2084 (3.2737) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][310/625] 
eta 0:02:20 lr 0.000140 wd 0.0500 time 0.4444 (0.4460) data time 0.0009 (0.0034) model time 0.4436 (0.4420) loss 2.5037 (2.5265) grad_norm 1.7762 (3.2426) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][320/625] eta 0:02:15 lr 0.000139 wd 0.0500 time 0.4410 (0.4459) data time 0.0007 (0.0033) model time 0.4403 (0.4420) loss 1.7750 (2.5251) grad_norm 1.8214 (3.2055) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][330/625] eta 0:02:11 lr 0.000139 wd 0.0500 time 0.4473 (0.4458) data time 0.0007 (0.0033) model time 0.4466 (0.4420) loss 2.1765 (2.5219) grad_norm 1.8369 (3.1772) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][340/625] eta 0:02:07 lr 0.000139 wd 0.0500 time 0.4404 (0.4458) data time 0.0009 (0.0032) model time 0.4395 (0.4421) loss 1.9569 (2.5218) grad_norm 1.9254 (3.1482) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][350/625] eta 0:02:02 lr 0.000139 wd 0.0500 time 0.4476 (0.4462) data time 0.0007 (0.0031) model time 0.4470 (0.4426) loss 2.6373 (2.5257) grad_norm 1.9592 (3.1370) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][360/625] eta 0:01:58 lr 0.000139 wd 0.0500 time 0.4378 (0.4460) data time 0.0009 (0.0031) model time 0.4369 (0.4425) loss 2.8544 (2.5342) grad_norm 2.2279 (3.1165) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][370/625] eta 0:01:53 lr 0.000139 wd 0.0500 time 0.4455 (0.4459) data time 0.0010 (0.0030) model time 0.4445 (0.4424) loss 1.8078 (2.5321) grad_norm 2.0878 (3.1068) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:34 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [240/300][380/625] eta 0:01:49 lr 0.000139 wd 0.0500 time 0.4441 (0.4458) data time 0.0010 (0.0029) model time 0.4431 (0.4424) loss 2.5456 (2.5348) grad_norm 4.0326 (3.1084) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][390/625] eta 0:01:44 lr 0.000139 wd 0.0500 time 0.4452 (0.4457) data time 0.0007 (0.0029) model time 0.4446 (0.4424) loss 2.1245 (2.5385) grad_norm 1.9085 (3.1049) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][400/625] eta 0:01:40 lr 0.000139 wd 0.0500 time 0.4432 (0.4456) data time 0.0009 (0.0028) model time 0.4423 (0.4424) loss 2.7921 (2.5362) grad_norm 1.8803 (3.1198) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][410/625] eta 0:01:35 lr 0.000139 wd 0.0500 time 0.4474 (0.4456) data time 0.0009 (0.0028) model time 0.4464 (0.4424) loss 2.2009 (2.5402) grad_norm 2.0747 (3.0979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][420/625] eta 0:01:31 lr 0.000139 wd 0.0500 time 0.4455 (0.4455) data time 0.0008 (0.0027) model time 0.4447 (0.4424) loss 2.8975 (2.5490) grad_norm 3.2532 (3.0787) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][430/625] eta 0:01:26 lr 0.000139 wd 0.0500 time 0.6561 (0.4459) data time 0.0006 (0.0027) model time 0.6555 (0.4428) loss 2.7416 (2.5501) grad_norm 2.2576 (3.0628) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][440/625] eta 0:01:22 lr 0.000139 wd 0.0500 time 0.4438 (0.4458) data time 0.0006 (0.0027) model time 0.4431 (0.4428) loss 2.9491 (2.5524) grad_norm 2.3370 (3.0501) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 07:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][450/625] eta 0:01:17 lr 0.000139 wd 0.0500 time 0.4364 (0.4457) data time 0.0006 (0.0026) model time 0.4357 (0.4427) loss 2.3513 (2.5502) grad_norm 2.3363 (3.0297) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][460/625] eta 0:01:13 lr 0.000139 wd 0.0500 time 0.4454 (0.4456) data time 0.0009 (0.0026) model time 0.4445 (0.4427) loss 3.1124 (2.5528) grad_norm 10.9483 (3.0278) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][470/625] eta 0:01:09 lr 0.000138 wd 0.0500 time 0.4599 (0.4456) data time 0.0009 (0.0025) model time 0.4591 (0.4428) loss 2.9628 (2.5536) grad_norm 2.4877 (3.0133) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][480/625] eta 0:01:04 lr 0.000138 wd 0.0500 time 0.4559 (0.4457) data time 0.0008 (0.0025) model time 0.4551 (0.4429) loss 2.6993 (2.5559) grad_norm 2.6239 (3.0029) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][490/625] eta 0:01:00 lr 0.000138 wd 0.0500 time 0.4492 (0.4457) data time 0.0006 (0.0025) model time 0.4486 (0.4430) loss 2.2855 (2.5571) grad_norm 1.8962 (2.9854) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][500/625] eta 0:00:55 lr 0.000138 wd 0.0500 time 0.4446 (0.4457) data time 0.0008 (0.0024) model time 0.4438 (0.4430) loss 2.1086 (2.5481) grad_norm 1.4937 (2.9686) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][510/625] eta 0:00:51 lr 0.000138 wd 0.0500 time 0.4464 (0.4458) data time 0.0006 (0.0024) model time 0.4458 (0.4431) loss 
2.8536 (2.5445) grad_norm 2.0020 (2.9751) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][520/625] eta 0:00:46 lr 0.000138 wd 0.0500 time 0.4498 (0.4458) data time 0.0008 (0.0024) model time 0.4490 (0.4431) loss 2.5492 (2.5449) grad_norm 2.5175 (2.9610) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][530/625] eta 0:00:42 lr 0.000138 wd 0.0500 time 0.4455 (0.4458) data time 0.0008 (0.0024) model time 0.4447 (0.4432) loss 2.5136 (2.5494) grad_norm 2.2427 (2.9545) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][540/625] eta 0:00:37 lr 0.000138 wd 0.0500 time 0.4512 (0.4459) data time 0.0006 (0.0023) model time 0.4506 (0.4433) loss 2.4531 (2.5499) grad_norm 2.3971 (2.9432) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][550/625] eta 0:00:33 lr 0.000138 wd 0.0500 time 0.4489 (0.4459) data time 0.0006 (0.0023) model time 0.4483 (0.4434) loss 1.8565 (2.5497) grad_norm 2.1441 (2.9258) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][560/625] eta 0:00:28 lr 0.000138 wd 0.0500 time 0.4509 (0.4459) data time 0.0008 (0.0023) model time 0.4501 (0.4435) loss 2.7875 (2.5510) grad_norm 2.4684 (2.9448) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:50:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][570/625] eta 0:00:24 lr 0.000138 wd 0.0500 time 0.4464 (0.4459) data time 0.0006 (0.0022) model time 0.4458 (0.4435) loss 2.3082 (2.5500) grad_norm 2.9018 (2.9598) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][580/625] eta 0:00:20 lr 0.000138 wd 0.0500 time 0.4470 (0.4460) 
data time 0.0008 (0.0022) model time 0.4461 (0.4436) loss 2.9224 (2.5486) grad_norm 4.0829 (2.9532) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][590/625] eta 0:00:15 lr 0.000138 wd 0.0500 time 0.4457 (0.4460) data time 0.0007 (0.0022) model time 0.4451 (0.4437) loss 1.9578 (2.5448) grad_norm 1.8154 (2.9371) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][600/625] eta 0:00:11 lr 0.000138 wd 0.0500 time 0.4482 (0.4460) data time 0.0006 (0.0022) model time 0.4475 (0.4437) loss 2.6003 (2.5461) grad_norm 2.0981 (2.9326) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][610/625] eta 0:00:06 lr 0.000138 wd 0.0500 time 0.4433 (0.4460) data time 0.0005 (0.0022) model time 0.4428 (0.4437) loss 1.6077 (2.5440) grad_norm 2.7458 (2.9315) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [240/300][620/625] eta 0:00:02 lr 0.000137 wd 0.0500 time 0.5966 (0.4462) data time 0.0004 (0.0021) model time 0.5962 (0.4439) loss 2.8885 (2.5468) grad_norm 1.5766 (2.9206) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:23 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 240 training takes 0:04:38 [2024-08-11 07:51:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:51:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 07:51:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5332 (0.5332) Acc@1 88.818 (88.818) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-11 07:51:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8242 (0.6321) Acc@1 81.104 (86.790) Acc@5 96.240 (97.736) Mem 16699MB [2024-08-11 07:51:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9028 (0.7464) Acc@1 79.980 (83.975) Acc@5 95.459 (96.622) Mem 16699MB [2024-08-11 07:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.697 Acc@5 96.587 [2024-08-11 07:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 07:51:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.899 (0.899) Loss 0.4890 (0.4890) Acc@1 89.258 (89.258) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 07:51:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.189) Loss 0.7852 (0.5965) Acc@1 81.201 (87.243) Acc@5 96.387 (97.923) Mem 16699MB [2024-08-11 07:51:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.154) Loss 0.8706 (0.7034) Acc@1 80.176 (84.501) Acc@5 95.703 (96.952) Mem 16699MB [2024-08-11 07:51:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.231 Acc@5 96.905 [2024-08-11 07:51:32 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 07:51:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][0/625] eta 0:13:05 lr 0.000137 wd 0.0500 time 1.2575 (1.2575) data time 0.7756 (0.7756) model time 0.0000 (0.0000) loss 2.6650 (2.6650) grad_norm 2.0503 (2.0503) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][10/625] eta 0:05:21 lr 0.000137 wd 0.0500 time 0.4499 (0.5220) data time 
0.0007 (0.0712) model time 0.0000 (0.0000) loss 1.9671 (2.1309) grad_norm 2.1164 (2.5220) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 07:51:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][20/625] eta 0:04:54 lr 0.000137 wd 0.0500 time 0.4467 (0.4866) data time 0.0006 (0.0377) model time 0.0000 (0.0000) loss 2.9988 (2.3751) grad_norm 6.0492 (2.8527) loss_scale 128.0000 (94.4762) mem 16699MB [2024-08-11 07:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][30/625] eta 0:04:41 lr 0.000137 wd 0.0500 time 0.4488 (0.4737) data time 0.0008 (0.0258) model time 0.0000 (0.0000) loss 2.8277 (2.4386) grad_norm 2.7278 (2.7201) loss_scale 128.0000 (105.2903) mem 16699MB [2024-08-11 07:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][40/625] eta 0:04:33 lr 0.000137 wd 0.0500 time 0.4460 (0.4668) data time 0.0008 (0.0197) model time 0.0000 (0.0000) loss 3.0253 (2.5172) grad_norm 2.0299 (2.7189) loss_scale 128.0000 (110.8293) mem 16699MB [2024-08-11 07:51:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][50/625] eta 0:04:26 lr 0.000137 wd 0.0500 time 0.4461 (0.4626) data time 0.0009 (0.0160) model time 0.0000 (0.0000) loss 2.7450 (2.5516) grad_norm 2.8336 (2.7306) loss_scale 128.0000 (114.1961) mem 16699MB [2024-08-11 07:52:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][60/625] eta 0:04:19 lr 0.000137 wd 0.0500 time 0.4494 (0.4600) data time 0.0006 (0.0135) model time 0.4488 (0.4461) loss 2.4425 (2.5668) grad_norm 3.6489 (2.6850) loss_scale 128.0000 (116.4590) mem 16699MB [2024-08-11 07:52:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][70/625] eta 0:04:15 lr 0.000137 wd 0.0500 time 0.4501 (0.4605) data time 0.0007 (0.0118) model time 0.4493 (0.4543) loss 2.8054 (2.5899) grad_norm 3.6702 (2.8816) loss_scale 128.0000 (118.0845) mem 16699MB [2024-08-11 07:52:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][80/625] 
eta 0:04:10 lr 0.000137 wd 0.0500 time 0.4479 (0.4591) data time 0.0008 (0.0104) model time 0.4471 (0.4522) loss 2.6880 (2.5911) grad_norm 2.8083 (2.8357) loss_scale 128.0000 (119.3086) mem 16699MB [2024-08-11 07:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][90/625] eta 0:04:05 lr 0.000137 wd 0.0500 time 0.4453 (0.4580) data time 0.0007 (0.0094) model time 0.4447 (0.4512) loss 1.6211 (2.5667) grad_norm 2.1061 (2.8116) loss_scale 128.0000 (120.2637) mem 16699MB [2024-08-11 07:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][100/625] eta 0:03:59 lr 0.000137 wd 0.0500 time 0.4425 (0.4569) data time 0.0009 (0.0085) model time 0.4416 (0.4503) loss 2.5126 (2.5735) grad_norm 2.0854 (2.7847) loss_scale 128.0000 (121.0297) mem 16699MB [2024-08-11 07:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][110/625] eta 0:03:54 lr 0.000137 wd 0.0500 time 0.4434 (0.4560) data time 0.0009 (0.0078) model time 0.4426 (0.4495) loss 2.7216 (2.5767) grad_norm 5.9277 (2.8246) loss_scale 128.0000 (121.6577) mem 16699MB [2024-08-11 07:52:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][120/625] eta 0:03:49 lr 0.000137 wd 0.0500 time 0.4442 (0.4553) data time 0.0008 (0.0072) model time 0.4434 (0.4491) loss 2.6653 (2.5724) grad_norm 1.8250 (2.7765) loss_scale 128.0000 (122.1818) mem 16699MB [2024-08-11 07:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][130/625] eta 0:03:44 lr 0.000137 wd 0.0500 time 0.4462 (0.4545) data time 0.0007 (0.0068) model time 0.4456 (0.4484) loss 2.7599 (2.5625) grad_norm 6.8467 (2.8126) loss_scale 128.0000 (122.6260) mem 16699MB [2024-08-11 07:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][140/625] eta 0:03:40 lr 0.000137 wd 0.0500 time 0.4461 (0.4541) data time 0.0009 (0.0064) model time 0.4452 (0.4483) loss 2.4318 (2.5545) grad_norm 2.8441 (3.0140) loss_scale 128.0000 (123.0071) mem 16699MB [2024-08-11 07:52:40 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][150/625] eta 0:03:36 lr 0.000136 wd 0.0500 time 0.4487 (0.4552) data time 0.0006 (0.0060) model time 0.4480 (0.4505) loss 3.0779 (2.5517) grad_norm 2.1749 (2.9781) loss_scale 128.0000 (123.3377) mem 16699MB [2024-08-11 07:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][160/625] eta 0:03:31 lr 0.000136 wd 0.0500 time 0.4465 (0.4547) data time 0.0006 (0.0057) model time 0.4459 (0.4502) loss 2.9904 (2.5595) grad_norm 7.0299 (2.9801) loss_scale 128.0000 (123.6273) mem 16699MB [2024-08-11 07:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][170/625] eta 0:03:26 lr 0.000136 wd 0.0500 time 0.4476 (0.4543) data time 0.0006 (0.0054) model time 0.4470 (0.4499) loss 2.9898 (2.5652) grad_norm 2.7222 (2.9774) loss_scale 128.0000 (123.8830) mem 16699MB [2024-08-11 07:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][180/625] eta 0:03:22 lr 0.000136 wd 0.0500 time 0.4510 (0.4540) data time 0.0008 (0.0052) model time 0.4501 (0.4497) loss 2.9170 (2.5592) grad_norm 3.2432 (2.9885) loss_scale 128.0000 (124.1105) mem 16699MB [2024-08-11 07:52:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][190/625] eta 0:03:17 lr 0.000136 wd 0.0500 time 0.4446 (0.4537) data time 0.0009 (0.0049) model time 0.4437 (0.4496) loss 2.6530 (2.5594) grad_norm 2.1016 (3.0621) loss_scale 128.0000 (124.3141) mem 16699MB [2024-08-11 07:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][200/625] eta 0:03:12 lr 0.000136 wd 0.0500 time 0.4451 (0.4533) data time 0.0006 (0.0047) model time 0.4445 (0.4493) loss 1.9619 (2.5567) grad_norm 2.7347 (3.0402) loss_scale 128.0000 (124.4975) mem 16699MB [2024-08-11 07:53:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][210/625] eta 0:03:08 lr 0.000136 wd 0.0500 time 0.4460 (0.4531) data time 0.0009 (0.0045) model time 0.4452 (0.4492) loss 3.0753 (2.5588) grad_norm 1.6289 
(3.0380) loss_scale 128.0000 (124.6635) mem 16699MB [2024-08-11 07:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][220/625] eta 0:03:03 lr 0.000136 wd 0.0500 time 0.4520 (0.4529) data time 0.0008 (0.0044) model time 0.4512 (0.4491) loss 2.3670 (2.5527) grad_norm 1.9151 (3.0359) loss_scale 128.0000 (124.8145) mem 16699MB [2024-08-11 07:53:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][230/625] eta 0:02:58 lr 0.000136 wd 0.0500 time 0.4448 (0.4527) data time 0.0007 (0.0042) model time 0.4441 (0.4490) loss 2.4582 (2.5546) grad_norm 1.6336 (3.0108) loss_scale 128.0000 (124.9524) mem 16699MB [2024-08-11 07:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][240/625] eta 0:02:54 lr 0.000136 wd 0.0500 time 0.4471 (0.4524) data time 0.0007 (0.0041) model time 0.4464 (0.4489) loss 2.5190 (2.5547) grad_norm 1.6722 (2.9884) loss_scale 128.0000 (125.0788) mem 16699MB [2024-08-11 07:53:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][250/625] eta 0:02:49 lr 0.000136 wd 0.0500 time 0.4459 (0.4522) data time 0.0009 (0.0039) model time 0.4450 (0.4486) loss 2.8766 (2.5578) grad_norm 1.7709 (2.9697) loss_scale 128.0000 (125.1952) mem 16699MB [2024-08-11 07:53:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][260/625] eta 0:02:44 lr 0.000136 wd 0.0500 time 0.4442 (0.4520) data time 0.0008 (0.0038) model time 0.4434 (0.4485) loss 3.0054 (2.5592) grad_norm 2.2134 (2.9530) loss_scale 128.0000 (125.3027) mem 16699MB [2024-08-11 07:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][270/625] eta 0:02:40 lr 0.000136 wd 0.0500 time 0.4472 (0.4518) data time 0.0007 (0.0037) model time 0.4465 (0.4484) loss 2.1319 (2.5528) grad_norm 1.8783 (2.9314) loss_scale 128.0000 (125.4022) mem 16699MB [2024-08-11 07:53:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][280/625] eta 0:02:35 lr 0.000136 wd 0.0500 time 0.4500 (0.4517) data time 0.0007 
(0.0036) model time 0.4494 (0.4484) loss 3.0569 (2.5555) grad_norm 2.8243 (2.9075) loss_scale 128.0000 (125.4947) mem 16699MB [2024-08-11 07:53:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][290/625] eta 0:02:31 lr 0.000136 wd 0.0500 time 0.4483 (0.4516) data time 0.0006 (0.0035) model time 0.4477 (0.4484) loss 2.9548 (2.5595) grad_norm 2.3845 (2.8809) loss_scale 128.0000 (125.5808) mem 16699MB [2024-08-11 07:53:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][300/625] eta 0:02:26 lr 0.000136 wd 0.0500 time 0.4496 (0.4515) data time 0.0009 (0.0034) model time 0.4487 (0.4484) loss 2.7265 (2.5640) grad_norm 3.5567 (2.9853) loss_scale 128.0000 (125.6611) mem 16699MB [2024-08-11 07:53:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][310/625] eta 0:02:22 lr 0.000135 wd 0.0500 time 0.4485 (0.4514) data time 0.0008 (0.0033) model time 0.4477 (0.4484) loss 3.0738 (2.5655) grad_norm 3.7759 (2.9618) loss_scale 128.0000 (125.7363) mem 16699MB [2024-08-11 07:53:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][320/625] eta 0:02:17 lr 0.000135 wd 0.0500 time 0.4493 (0.4513) data time 0.0006 (0.0033) model time 0.4487 (0.4483) loss 2.2806 (2.5646) grad_norm 3.7085 (2.9397) loss_scale 128.0000 (125.8069) mem 16699MB [2024-08-11 07:54:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][330/625] eta 0:02:13 lr 0.000135 wd 0.0500 time 0.4525 (0.4512) data time 0.0006 (0.0032) model time 0.4519 (0.4483) loss 3.4111 (2.5606) grad_norm 2.3933 (2.9300) loss_scale 128.0000 (125.8731) mem 16699MB [2024-08-11 07:54:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][340/625] eta 0:02:08 lr 0.000135 wd 0.0500 time 0.4483 (0.4511) data time 0.0008 (0.0031) model time 0.4475 (0.4482) loss 2.7477 (2.5645) grad_norm 2.3952 (2.9314) loss_scale 128.0000 (125.9355) mem 16699MB [2024-08-11 07:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][350/625] 
eta 0:02:04 lr 0.000135 wd 0.0500 time 0.4486 (0.4510) data time 0.0007 (0.0031) model time 0.4479 (0.4482) loss 2.6953 (2.5644) grad_norm 2.1665 (2.9416) loss_scale 128.0000 (125.9943) mem 16699MB [2024-08-11 07:54:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][360/625] eta 0:01:59 lr 0.000135 wd 0.0500 time 0.4500 (0.4510) data time 0.0010 (0.0030) model time 0.4490 (0.4482) loss 1.9255 (2.5600) grad_norm 1.9355 (2.9344) loss_scale 128.0000 (126.0499) mem 16699MB [2024-08-11 07:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][370/625] eta 0:01:54 lr 0.000135 wd 0.0500 time 0.4492 (0.4510) data time 0.0009 (0.0029) model time 0.4484 (0.4483) loss 2.6724 (2.5599) grad_norm 4.6532 (2.9265) loss_scale 128.0000 (126.1024) mem 16699MB [2024-08-11 07:54:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][380/625] eta 0:01:50 lr 0.000135 wd 0.0500 time 0.4504 (0.4509) data time 0.0008 (0.0029) model time 0.4496 (0.4482) loss 2.6007 (2.5610) grad_norm 3.6588 (2.9578) loss_scale 128.0000 (126.1522) mem 16699MB [2024-08-11 07:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][390/625] eta 0:01:45 lr 0.000135 wd 0.0500 time 0.4475 (0.4508) data time 0.0009 (0.0028) model time 0.4466 (0.4482) loss 2.5041 (2.5661) grad_norm 2.1964 (2.9447) loss_scale 128.0000 (126.1995) mem 16699MB [2024-08-11 07:54:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][400/625] eta 0:01:41 lr 0.000135 wd 0.0500 time 0.5935 (0.4511) data time 0.0006 (0.0028) model time 0.5929 (0.4485) loss 3.1847 (2.5687) grad_norm 2.4864 (2.9536) loss_scale 128.0000 (126.2444) mem 16699MB [2024-08-11 07:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][410/625] eta 0:01:36 lr 0.000135 wd 0.0500 time 0.4454 (0.4510) data time 0.0006 (0.0028) model time 0.4448 (0.4484) loss 2.7831 (2.5724) grad_norm 2.2382 (2.9570) loss_scale 128.0000 (126.2871) mem 16699MB [2024-08-11 07:54:41 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][420/625] eta 0:01:32 lr 0.000135 wd 0.0500 time 0.4450 (0.4509) data time 0.0008 (0.0027) model time 0.4441 (0.4484) loss 2.3717 (2.5661) grad_norm 1.8507 (2.9523) loss_scale 128.0000 (126.3278) mem 16699MB [2024-08-11 07:54:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][430/625] eta 0:01:27 lr 0.000135 wd 0.0500 time 0.4477 (0.4508) data time 0.0008 (0.0027) model time 0.4468 (0.4484) loss 2.2302 (2.5729) grad_norm 2.4815 (2.9510) loss_scale 128.0000 (126.3666) mem 16699MB [2024-08-11 07:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][440/625] eta 0:01:23 lr 0.000135 wd 0.0500 time 0.4464 (0.4508) data time 0.0007 (0.0026) model time 0.4458 (0.4483) loss 3.0358 (2.5745) grad_norm 2.2309 (2.9704) loss_scale 128.0000 (126.4036) mem 16699MB [2024-08-11 07:54:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][450/625] eta 0:01:18 lr 0.000135 wd 0.0500 time 0.4463 (0.4507) data time 0.0006 (0.0026) model time 0.4456 (0.4483) loss 2.8514 (2.5740) grad_norm 1.8645 (2.9717) loss_scale 128.0000 (126.4390) mem 16699MB [2024-08-11 07:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][460/625] eta 0:01:14 lr 0.000134 wd 0.0500 time 0.4456 (0.4506) data time 0.0009 (0.0026) model time 0.4447 (0.4482) loss 2.7086 (2.5717) grad_norm 2.5196 (2.9654) loss_scale 128.0000 (126.4729) mem 16699MB [2024-08-11 07:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][470/625] eta 0:01:09 lr 0.000134 wd 0.0500 time 0.4433 (0.4505) data time 0.0009 (0.0025) model time 0.4424 (0.4482) loss 1.9711 (2.5712) grad_norm 6.8974 (2.9765) loss_scale 128.0000 (126.5053) mem 16699MB [2024-08-11 07:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][480/625] eta 0:01:05 lr 0.000134 wd 0.0500 time 0.4463 (0.4509) data time 0.0007 (0.0025) model time 0.4456 (0.4486) loss 2.8185 (2.5753) grad_norm 1.7504 
(2.9733) loss_scale 128.0000 (126.5364) mem 16699MB [2024-08-11 07:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][490/625] eta 0:01:00 lr 0.000134 wd 0.0500 time 0.4466 (0.4508) data time 0.0008 (0.0025) model time 0.4459 (0.4486) loss 1.6390 (2.5750) grad_norm 1.8840 (2.9709) loss_scale 128.0000 (126.5662) mem 16699MB [2024-08-11 07:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][500/625] eta 0:00:56 lr 0.000134 wd 0.0500 time 0.4524 (0.4508) data time 0.0008 (0.0024) model time 0.4516 (0.4485) loss 1.7904 (2.5750) grad_norm 2.4953 (2.9614) loss_scale 128.0000 (126.5948) mem 16699MB [2024-08-11 07:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][510/625] eta 0:00:51 lr 0.000134 wd 0.0500 time 0.4504 (0.4508) data time 0.0006 (0.0024) model time 0.4497 (0.4486) loss 2.4247 (2.5754) grad_norm 1.9130 (2.9518) loss_scale 128.0000 (126.6223) mem 16699MB [2024-08-11 07:55:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][520/625] eta 0:00:47 lr 0.000134 wd 0.0500 time 0.4497 (0.4507) data time 0.0007 (0.0024) model time 0.4490 (0.4486) loss 3.2895 (2.5761) grad_norm 1.9082 (2.9374) loss_scale 128.0000 (126.6488) mem 16699MB [2024-08-11 07:55:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][530/625] eta 0:00:42 lr 0.000134 wd 0.0500 time 0.4502 (0.4507) data time 0.0006 (0.0023) model time 0.4496 (0.4486) loss 2.6498 (2.5724) grad_norm 1.8835 (2.9259) loss_scale 128.0000 (126.6742) mem 16699MB [2024-08-11 07:55:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][540/625] eta 0:00:38 lr 0.000134 wd 0.0500 time 0.4517 (0.4507) data time 0.0009 (0.0023) model time 0.4508 (0.4486) loss 2.7671 (2.5711) grad_norm 3.6075 (2.9254) loss_scale 128.0000 (126.6987) mem 16699MB [2024-08-11 07:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][550/625] eta 0:00:33 lr 0.000134 wd 0.0500 time 0.4495 (0.4506) data time 0.0006 
(0.0023) model time 0.4489 (0.4485) loss 1.7462 (2.5714) grad_norm 1.8931 (2.9677) loss_scale 128.0000 (126.7223) mem 16699MB [2024-08-11 07:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][560/625] eta 0:00:29 lr 0.000134 wd 0.0500 time 0.4447 (0.4506) data time 0.0008 (0.0023) model time 0.4439 (0.4485) loss 3.0137 (2.5717) grad_norm 2.1947 (2.9624) loss_scale 128.0000 (126.7451) mem 16699MB [2024-08-11 07:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][570/625] eta 0:00:24 lr 0.000134 wd 0.0500 time 0.4494 (0.4505) data time 0.0008 (0.0022) model time 0.4487 (0.4485) loss 2.7807 (2.5746) grad_norm 2.3055 (2.9554) loss_scale 128.0000 (126.7671) mem 16699MB [2024-08-11 07:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][580/625] eta 0:00:20 lr 0.000134 wd 0.0500 time 0.4503 (0.4505) data time 0.0009 (0.0022) model time 0.4494 (0.4485) loss 2.2376 (2.5720) grad_norm 1.9375 (2.9406) loss_scale 128.0000 (126.7883) mem 16699MB [2024-08-11 07:55:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][590/625] eta 0:00:15 lr 0.000134 wd 0.0500 time 0.3885 (0.4505) data time 0.0006 (0.0022) model time 0.3878 (0.4485) loss 2.5551 (2.5683) grad_norm 2.4457 (2.9253) loss_scale 128.0000 (126.8088) mem 16699MB [2024-08-11 07:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][600/625] eta 0:00:11 lr 0.000134 wd 0.0500 time 0.4450 (0.4505) data time 0.0006 (0.0022) model time 0.4444 (0.4485) loss 2.2018 (2.5674) grad_norm 1.8959 (2.9225) loss_scale 128.0000 (126.8286) mem 16699MB [2024-08-11 07:56:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][610/625] eta 0:00:06 lr 0.000133 wd 0.0500 time 0.4439 (0.4505) data time 0.0004 (0.0021) model time 0.4435 (0.4485) loss 2.7206 (2.5660) grad_norm 2.1517 (2.9092) loss_scale 128.0000 (126.8478) mem 16699MB [2024-08-11 07:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [241/300][620/625] 
eta 0:00:02 lr 0.000133 wd 0.0500 time 0.4414 (0.4503) data time 0.0004 (0.0021) model time 0.4410 (0.4484) loss 3.0132 (2.5712) grad_norm 1.9613 (2.9203) loss_scale 128.0000 (126.8663) mem 16699MB [2024-08-11 07:56:13 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 241 training takes 0:04:41 [2024-08-11 07:56:13 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 07:56:15 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 07:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.452 (0.452) Loss 0.5044 (0.5044) Acc@1 89.307 (89.307) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 07:56:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.149) Loss 0.8354 (0.6161) Acc@1 80.811 (86.923) Acc@5 96.289 (97.838) Mem 16699MB [2024-08-11 07:56:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9297 (0.7332) Acc@1 78.857 (83.980) Acc@5 94.971 (96.691) Mem 16699MB [2024-08-11 07:56:18 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.689 Acc@5 96.649 [2024-08-11 07:56:18 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 07:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.780 (0.780) Loss 0.4900 (0.4900) Acc@1 89.111 (89.111) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 07:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.181) Loss 0.7871 (0.5972) Acc@1 81.201 (87.251) Acc@5 96.338 (97.918) Mem 16699MB [2024-08-11 07:56:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.149) Loss 0.8706 (0.7044) Acc@1 80.078 (84.496) Acc@5 95.654 (96.942) Mem 16699MB [2024-08-11 07:56:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 
84.225 Acc@5 96.897 [2024-08-11 07:56:21 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 07:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][0/625] eta 0:12:39 lr 0.000133 wd 0.0500 time 1.2158 (1.2158) data time 0.7725 (0.7725) model time 0.0000 (0.0000) loss 2.6837 (2.6837) grad_norm 1.9761 (1.9761) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][10/625] eta 0:05:18 lr 0.000133 wd 0.0500 time 0.4480 (0.5172) data time 0.0006 (0.0710) model time 0.0000 (0.0000) loss 2.7816 (2.6336) grad_norm 2.2028 (2.4430) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][20/625] eta 0:04:52 lr 0.000133 wd 0.0500 time 0.4441 (0.4841) data time 0.0007 (0.0376) model time 0.0000 (0.0000) loss 2.4898 (2.4822) grad_norm 14.1175 (2.9953) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][30/625] eta 0:04:41 lr 0.000133 wd 0.0500 time 0.4494 (0.4724) data time 0.0006 (0.0257) model time 0.0000 (0.0000) loss 2.2547 (2.5220) grad_norm 2.6335 (2.7479) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][40/625] eta 0:04:32 lr 0.000133 wd 0.0500 time 0.4511 (0.4666) data time 0.0006 (0.0196) model time 0.0000 (0.0000) loss 2.4435 (2.5568) grad_norm 3.5442 (2.8094) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][50/625] eta 0:04:26 lr 0.000133 wd 0.0500 time 0.4486 (0.4628) data time 0.0007 (0.0159) model time 0.0000 (0.0000) loss 3.0146 (2.6063) grad_norm 1.7523 (2.6784) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[242/300][60/625] eta 0:04:19 lr 0.000133 wd 0.0500 time 0.4458 (0.4601) data time 0.0009 (0.0135) model time 0.4449 (0.4454) loss 2.7485 (2.5591) grad_norm 3.3592 (2.7268) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][70/625] eta 0:04:14 lr 0.000133 wd 0.0500 time 0.4467 (0.4583) data time 0.0008 (0.0117) model time 0.4459 (0.4462) loss 2.8255 (2.5492) grad_norm 2.3188 (2.8022) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][80/625] eta 0:04:10 lr 0.000133 wd 0.0500 time 0.4519 (0.4597) data time 0.0006 (0.0103) model time 0.4514 (0.4537) loss 2.9782 (2.5554) grad_norm 2.8417 (2.8725) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][90/625] eta 0:04:05 lr 0.000133 wd 0.0500 time 0.4473 (0.4585) data time 0.0008 (0.0093) model time 0.4465 (0.4523) loss 2.8100 (2.5860) grad_norm 2.6642 (2.8568) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][100/625] eta 0:04:00 lr 0.000133 wd 0.0500 time 0.4438 (0.4574) data time 0.0009 (0.0085) model time 0.4430 (0.4510) loss 2.6846 (2.5886) grad_norm 1.8312 (2.8412) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][110/625] eta 0:03:55 lr 0.000133 wd 0.0500 time 0.4490 (0.4577) data time 0.0006 (0.0078) model time 0.4484 (0.4525) loss 3.1690 (2.5870) grad_norm 2.4432 (2.8216) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][120/625] eta 0:03:50 lr 0.000133 wd 0.0500 time 0.4478 (0.4568) data time 0.0008 (0.0072) model time 0.4470 (0.4517) loss 2.2060 (2.5769) grad_norm 2.8009 (3.7050) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 07:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][130/625] eta 0:03:45 lr 0.000133 wd 0.0500 time 0.4485 (0.4561) data time 0.0008 (0.0067) model time 0.4476 (0.4511) loss 2.0181 (2.5562) grad_norm 2.1911 (3.6083) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][140/625] eta 0:03:40 lr 0.000132 wd 0.0500 time 0.4491 (0.4554) data time 0.0008 (0.0063) model time 0.4483 (0.4503) loss 2.0330 (2.5633) grad_norm 3.4538 (3.6215) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][150/625] eta 0:03:35 lr 0.000132 wd 0.0500 time 0.4467 (0.4547) data time 0.0007 (0.0059) model time 0.4459 (0.4498) loss 2.2054 (2.5451) grad_norm 2.1764 (3.5602) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][160/625] eta 0:03:31 lr 0.000132 wd 0.0500 time 0.4477 (0.4542) data time 0.0008 (0.0056) model time 0.4469 (0.4494) loss 2.2303 (2.5443) grad_norm 2.1128 (3.5112) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][170/625] eta 0:03:26 lr 0.000132 wd 0.0500 time 0.4429 (0.4538) data time 0.0008 (0.0053) model time 0.4421 (0.4492) loss 2.1964 (2.5409) grad_norm 2.2085 (3.4419) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][180/625] eta 0:03:21 lr 0.000132 wd 0.0500 time 0.4447 (0.4534) data time 0.0009 (0.0051) model time 0.4439 (0.4489) loss 2.3799 (2.5353) grad_norm 2.2929 (3.3702) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][190/625] eta 0:03:17 lr 0.000132 wd 0.0500 time 0.4477 (0.4533) data time 0.0009 (0.0048) model time 0.4469 (0.4491) loss 2.7566 
(2.5317) grad_norm 2.9082 (3.3119) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][200/625] eta 0:03:12 lr 0.000132 wd 0.0500 time 0.4444 (0.4530) data time 0.0007 (0.0046) model time 0.4437 (0.4489) loss 1.8361 (2.5319) grad_norm 1.9150 (3.2967) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][210/625] eta 0:03:07 lr 0.000132 wd 0.0500 time 0.4443 (0.4526) data time 0.0008 (0.0045) model time 0.4435 (0.4486) loss 2.9138 (2.5283) grad_norm 20.7415 (3.3316) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][220/625] eta 0:03:03 lr 0.000132 wd 0.0500 time 0.4599 (0.4525) data time 0.0006 (0.0043) model time 0.4593 (0.4485) loss 2.7641 (2.5365) grad_norm 2.1145 (3.6762) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][230/625] eta 0:02:58 lr 0.000132 wd 0.0500 time 0.4474 (0.4522) data time 0.0008 (0.0042) model time 0.4466 (0.4484) loss 2.9922 (2.5387) grad_norm 2.5852 (3.6325) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][240/625] eta 0:02:54 lr 0.000132 wd 0.0500 time 0.4479 (0.4520) data time 0.0007 (0.0040) model time 0.4472 (0.4483) loss 2.0526 (2.5338) grad_norm 1.9563 (3.5757) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][250/625] eta 0:02:49 lr 0.000132 wd 0.0500 time 0.4459 (0.4518) data time 0.0008 (0.0039) model time 0.4451 (0.4482) loss 2.4954 (2.5398) grad_norm 2.2556 (3.5315) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][260/625] eta 0:02:44 lr 0.000132 wd 0.0500 time 0.4498 
(0.4517) data time 0.0006 (0.0038) model time 0.4491 (0.4481) loss 2.2354 (2.5457) grad_norm 4.7039 (3.4850) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][270/625] eta 0:02:40 lr 0.000132 wd 0.0500 time 0.4500 (0.4519) data time 0.0006 (0.0037) model time 0.4494 (0.4486) loss 2.1217 (2.5449) grad_norm 2.1100 (3.4720) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][280/625] eta 0:02:35 lr 0.000132 wd 0.0500 time 0.4462 (0.4517) data time 0.0006 (0.0036) model time 0.4456 (0.4485) loss 2.8622 (2.5413) grad_norm 2.9816 (3.4626) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][290/625] eta 0:02:31 lr 0.000132 wd 0.0500 time 0.4456 (0.4515) data time 0.0006 (0.0035) model time 0.4450 (0.4483) loss 2.3654 (2.5372) grad_norm 2.0995 (3.4686) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][300/625] eta 0:02:26 lr 0.000131 wd 0.0500 time 0.4451 (0.4514) data time 0.0009 (0.0034) model time 0.4443 (0.4482) loss 2.6589 (2.5400) grad_norm 2.2433 (3.4250) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][310/625] eta 0:02:22 lr 0.000131 wd 0.0500 time 0.4515 (0.4513) data time 0.0006 (0.0033) model time 0.4509 (0.4482) loss 1.9423 (2.5311) grad_norm 2.8674 (3.4000) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][320/625] eta 0:02:17 lr 0.000131 wd 0.0500 time 0.4517 (0.4512) data time 0.0009 (0.0032) model time 0.4507 (0.4483) loss 3.0127 (2.5334) grad_norm 2.6938 (3.3675) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [242/300][330/625] eta 0:02:13 lr 0.000131 wd 0.0500 time 0.4456 (0.4512) data time 0.0009 (0.0031) model time 0.4447 (0.4483) loss 2.2225 (2.5323) grad_norm 1.8633 (3.3440) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][340/625] eta 0:02:08 lr 0.000131 wd 0.0500 time 0.4492 (0.4511) data time 0.0009 (0.0031) model time 0.4483 (0.4482) loss 2.0418 (2.5275) grad_norm 2.4519 (3.3270) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][350/625] eta 0:02:04 lr 0.000131 wd 0.0500 time 0.4476 (0.4509) data time 0.0008 (0.0030) model time 0.4467 (0.4481) loss 2.7607 (2.5274) grad_norm 2.5522 (3.2964) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][360/625] eta 0:01:59 lr 0.000131 wd 0.0500 time 0.4460 (0.4508) data time 0.0006 (0.0030) model time 0.4454 (0.4480) loss 2.6781 (2.5295) grad_norm 2.0852 (3.2690) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][370/625] eta 0:01:54 lr 0.000131 wd 0.0500 time 0.4484 (0.4507) data time 0.0009 (0.0029) model time 0.4475 (0.4480) loss 2.7596 (2.5268) grad_norm 8.6199 (3.2570) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][380/625] eta 0:01:50 lr 0.000131 wd 0.0500 time 0.4446 (0.4507) data time 0.0009 (0.0028) model time 0.4437 (0.4480) loss 2.3093 (2.5306) grad_norm 1.9206 (3.2541) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][390/625] eta 0:01:45 lr 0.000131 wd 0.0500 time 0.4477 (0.4506) data time 0.0009 (0.0028) model time 0.4468 (0.4480) loss 2.2745 (2.5276) grad_norm 2.2899 (3.2433) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 07:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][400/625] eta 0:01:41 lr 0.000131 wd 0.0500 time 0.4449 (0.4506) data time 0.0008 (0.0027) model time 0.4441 (0.4480) loss 1.6840 (2.5265) grad_norm 2.5490 (3.2828) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][410/625] eta 0:01:36 lr 0.000131 wd 0.0500 time 0.4497 (0.4505) data time 0.0010 (0.0027) model time 0.4487 (0.4480) loss 2.7690 (2.5257) grad_norm 1.7234 (3.2584) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][420/625] eta 0:01:32 lr 0.000131 wd 0.0500 time 0.4457 (0.4504) data time 0.0009 (0.0026) model time 0.4448 (0.4479) loss 3.0450 (2.5263) grad_norm 1.9695 (3.2346) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][430/625] eta 0:01:27 lr 0.000131 wd 0.0500 time 0.4483 (0.4503) data time 0.0008 (0.0026) model time 0.4475 (0.4478) loss 2.6020 (2.5317) grad_norm 2.3685 (3.2197) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][440/625] eta 0:01:23 lr 0.000131 wd 0.0500 time 0.4443 (0.4506) data time 0.0006 (0.0026) model time 0.4436 (0.4482) loss 2.4852 (2.5324) grad_norm 1.9693 (3.1977) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][450/625] eta 0:01:18 lr 0.000131 wd 0.0500 time 0.4495 (0.4505) data time 0.0006 (0.0025) model time 0.4489 (0.4482) loss 1.9971 (2.5349) grad_norm 1.7529 (3.3744) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][460/625] eta 0:01:14 lr 0.000130 wd 0.0500 time 0.4541 (0.4505) data time 0.0006 (0.0025) model time 0.4535 (0.4482) loss 2.0310 
(2.5364) grad_norm 2.5423 (3.3566) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][470/625] eta 0:01:09 lr 0.000130 wd 0.0500 time 0.4463 (0.4505) data time 0.0006 (0.0025) model time 0.4457 (0.4482) loss 2.0798 (2.5301) grad_norm 1.6651 (3.3462) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 07:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][480/625] eta 0:01:05 lr 0.000130 wd 0.0500 time 0.4520 (0.4505) data time 0.0006 (0.0024) model time 0.4514 (0.4482) loss 1.7079 (2.5348) grad_norm 2.0521 (3.3297) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][490/625] eta 0:01:00 lr 0.000130 wd 0.0500 time 0.4458 (0.4508) data time 0.0007 (0.0024) model time 0.4452 (0.4486) loss 2.4604 (2.5346) grad_norm 2.0100 (3.3101) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][500/625] eta 0:00:56 lr 0.000130 wd 0.0500 time 0.4442 (0.4507) data time 0.0006 (0.0024) model time 0.4436 (0.4485) loss 2.5272 (2.5370) grad_norm 2.3362 (3.2857) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][510/625] eta 0:00:51 lr 0.000130 wd 0.0500 time 0.4467 (0.4507) data time 0.0008 (0.0023) model time 0.4459 (0.4485) loss 3.0866 (2.5357) grad_norm 2.4189 (3.4180) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][520/625] eta 0:00:47 lr 0.000130 wd 0.0500 time 0.4543 (0.4506) data time 0.0007 (0.0023) model time 0.4536 (0.4485) loss 2.0308 (2.5337) grad_norm 1.6903 (3.3967) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][530/625] eta 0:00:42 lr 0.000130 wd 0.0500 time 0.4482 
(0.4506) data time 0.0007 (0.0023) model time 0.4475 (0.4485) loss 2.9780 (2.5328) grad_norm 2.4792 (3.3998) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][540/625] eta 0:00:38 lr 0.000130 wd 0.0500 time 0.4490 (0.4506) data time 0.0006 (0.0022) model time 0.4485 (0.4485) loss 2.9369 (2.5331) grad_norm 2.2067 (3.3863) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][550/625] eta 0:00:33 lr 0.000130 wd 0.0500 time 0.4536 (0.4506) data time 0.0009 (0.0022) model time 0.4527 (0.4485) loss 2.7674 (2.5317) grad_norm 1.8549 (3.3719) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][560/625] eta 0:00:29 lr 0.000130 wd 0.0500 time 0.4433 (0.4505) data time 0.0007 (0.0022) model time 0.4426 (0.4485) loss 3.0744 (2.5336) grad_norm 2.6547 (3.3622) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][570/625] eta 0:00:24 lr 0.000130 wd 0.0500 time 0.4483 (0.4505) data time 0.0008 (0.0022) model time 0.4475 (0.4484) loss 2.3442 (2.5303) grad_norm 3.2878 (3.3518) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][580/625] eta 0:00:20 lr 0.000130 wd 0.0500 time 0.4656 (0.4504) data time 0.0007 (0.0021) model time 0.4650 (0.4484) loss 2.5760 (2.5309) grad_norm 1.9769 (3.3395) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][590/625] eta 0:00:15 lr 0.000130 wd 0.0500 time 0.4486 (0.4504) data time 0.0006 (0.0021) model time 0.4479 (0.4484) loss 2.7090 (2.5301) grad_norm 1.8777 (3.3209) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [242/300][600/625] eta 0:00:11 lr 0.000130 wd 0.0500 time 0.4468 (0.4504) data time 0.0008 (0.0021) model time 0.4460 (0.4484) loss 2.3870 (2.5310) grad_norm 2.1764 (3.3084) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][610/625] eta 0:00:06 lr 0.000129 wd 0.0500 time 0.4425 (0.4503) data time 0.0006 (0.0021) model time 0.4420 (0.4484) loss 3.0370 (2.5343) grad_norm 3.7830 (3.2998) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [242/300][620/625] eta 0:00:02 lr 0.000129 wd 0.0500 time 0.4449 (0.4503) data time 0.0006 (0.0021) model time 0.4443 (0.4483) loss 2.6014 (2.5353) grad_norm 2.4600 (3.2886) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 242 training takes 0:04:41 [2024-08-11 08:01:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:01:04 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 08:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5146 (0.5146) Acc@1 89.014 (89.014) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 08:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8315 (0.6205) Acc@1 81.152 (86.874) Acc@5 96.143 (97.736) Mem 16699MB [2024-08-11 08:01:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9082 (0.7360) Acc@1 78.955 (84.036) Acc@5 95.410 (96.670) Mem 16699MB [2024-08-11 08:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.743 Acc@5 96.647 [2024-08-11 08:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:01:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.845 (0.845) Loss 0.4902 (0.4902) Acc@1 89.111 (89.111) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 08:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.186) Loss 0.7891 (0.5976) Acc@1 81.055 (87.216) Acc@5 96.484 (97.905) Mem 16699MB [2024-08-11 08:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8711 (0.7051) Acc@1 79.932 (84.494) Acc@5 95.508 (96.935) Mem 16699MB [2024-08-11 08:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.219 Acc@5 96.899 [2024-08-11 08:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][0/625] eta 0:12:32 lr 0.000129 wd 0.0500 time 1.2037 (1.2037) data time 0.4480 (0.4480) model time 0.0000 (0.0000) loss 2.7170 (2.7170) grad_norm 3.2064 (3.2064) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][10/625] eta 0:05:27 lr 0.000129 wd 0.0500 time 0.4468 (0.5332) data 
time 0.0008 (0.0416) model time 0.0000 (0.0000) loss 2.3430 (2.5991) grad_norm 3.0970 (2.5724) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][20/625] eta 0:04:57 lr 0.000129 wd 0.0500 time 0.4461 (0.4921) data time 0.0008 (0.0221) model time 0.0000 (0.0000) loss 2.7101 (2.4546) grad_norm 2.3704 (2.3373) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][30/625] eta 0:04:44 lr 0.000129 wd 0.0500 time 0.4510 (0.4776) data time 0.0007 (0.0153) model time 0.0000 (0.0000) loss 2.4715 (2.5297) grad_norm 2.5841 (2.3967) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][40/625] eta 0:04:35 lr 0.000129 wd 0.0500 time 0.4538 (0.4708) data time 0.0006 (0.0118) model time 0.0000 (0.0000) loss 1.7223 (2.4498) grad_norm 2.1702 (2.4367) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][50/625] eta 0:04:28 lr 0.000129 wd 0.0500 time 0.4465 (0.4663) data time 0.0006 (0.0096) model time 0.0000 (0.0000) loss 2.9161 (2.4455) grad_norm 1.9983 (2.3424) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][60/625] eta 0:04:21 lr 0.000129 wd 0.0500 time 0.4495 (0.4634) data time 0.0008 (0.0082) model time 0.4487 (0.4476) loss 2.8657 (2.4473) grad_norm 2.6160 (2.3785) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][70/625] eta 0:04:15 lr 0.000129 wd 0.0500 time 0.4476 (0.4612) data time 0.0009 (0.0071) model time 0.4467 (0.4473) loss 2.2135 (2.4782) grad_norm 3.4665 (2.4031) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[243/300][80/625] eta 0:04:10 lr 0.000129 wd 0.0500 time 0.4483 (0.4596) data time 0.0007 (0.0064) model time 0.4477 (0.4472) loss 2.2579 (2.4902) grad_norm 2.9986 (2.4188) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][90/625] eta 0:04:05 lr 0.000129 wd 0.0500 time 0.4489 (0.4582) data time 0.0009 (0.0058) model time 0.4480 (0.4469) loss 3.0181 (2.5152) grad_norm 1.9305 (2.3986) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][100/625] eta 0:04:00 lr 0.000129 wd 0.0500 time 0.4488 (0.4572) data time 0.0009 (0.0053) model time 0.4479 (0.4470) loss 2.5868 (2.5288) grad_norm 1.7064 (2.4102) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][110/625] eta 0:03:55 lr 0.000129 wd 0.0500 time 0.4496 (0.4565) data time 0.0006 (0.0049) model time 0.4490 (0.4473) loss 2.7914 (2.5146) grad_norm 1.7218 (2.3634) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][120/625] eta 0:03:50 lr 0.000129 wd 0.0500 time 0.4437 (0.4560) data time 0.0009 (0.0045) model time 0.4428 (0.4476) loss 3.1050 (2.5236) grad_norm 3.2023 (2.5373) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][130/625] eta 0:03:45 lr 0.000129 wd 0.0500 time 0.4484 (0.4554) data time 0.0009 (0.0043) model time 0.4475 (0.4475) loss 2.6797 (2.5387) grad_norm 2.4793 (2.5254) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][140/625] eta 0:03:40 lr 0.000129 wd 0.0500 time 0.4437 (0.4548) data time 0.0007 (0.0040) model time 0.4430 (0.4475) loss 2.9118 (2.5588) grad_norm 2.3006 (2.5061) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:02:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][150/625] eta 0:03:35 lr 0.000128 wd 0.0500 time 0.4449 (0.4543) data time 0.0007 (0.0038) model time 0.4443 (0.4473) loss 2.8552 (2.5638) grad_norm 2.9291 (2.5067) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][160/625] eta 0:03:31 lr 0.000128 wd 0.0500 time 0.4464 (0.4550) data time 0.0006 (0.0036) model time 0.4458 (0.4489) loss 2.3554 (2.5572) grad_norm 3.1867 (2.5023) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][170/625] eta 0:03:26 lr 0.000128 wd 0.0500 time 0.4479 (0.4547) data time 0.0006 (0.0035) model time 0.4473 (0.4489) loss 3.0793 (2.5624) grad_norm 3.3959 (2.4994) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][180/625] eta 0:03:22 lr 0.000128 wd 0.0500 time 0.4484 (0.4544) data time 0.0009 (0.0033) model time 0.4475 (0.4488) loss 2.9395 (2.5547) grad_norm 2.2502 (2.4933) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][190/625] eta 0:03:17 lr 0.000128 wd 0.0500 time 0.4481 (0.4541) data time 0.0008 (0.0032) model time 0.4473 (0.4488) loss 2.2361 (2.5600) grad_norm 1.6355 (2.4960) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][200/625] eta 0:03:12 lr 0.000128 wd 0.0500 time 0.4511 (0.4539) data time 0.0009 (0.0031) model time 0.4502 (0.4488) loss 2.7179 (2.5609) grad_norm 1.9896 (2.4881) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][210/625] eta 0:03:08 lr 0.000128 wd 0.0500 time 0.4514 (0.4537) data time 0.0008 (0.0030) model time 0.4506 (0.4489) loss 2.8352 
(2.5601) grad_norm 3.0010 (2.4836) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][220/625] eta 0:03:03 lr 0.000128 wd 0.0500 time 0.4508 (0.4536) data time 0.0009 (0.0029) model time 0.4499 (0.4489) loss 2.6635 (2.5640) grad_norm 2.1205 (2.4891) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:02:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][230/625] eta 0:02:59 lr 0.000128 wd 0.0500 time 0.4552 (0.4534) data time 0.0006 (0.0028) model time 0.4546 (0.4489) loss 2.6913 (2.5570) grad_norm 2.3847 (2.5036) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][240/625] eta 0:02:54 lr 0.000128 wd 0.0500 time 0.4495 (0.4532) data time 0.0009 (0.0027) model time 0.4486 (0.4488) loss 2.7985 (2.5531) grad_norm 2.7087 (2.5037) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][250/625] eta 0:02:49 lr 0.000128 wd 0.0500 time 0.4503 (0.4530) data time 0.0006 (0.0026) model time 0.4497 (0.4487) loss 2.6389 (2.5530) grad_norm 2.6052 (2.4926) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][260/625] eta 0:02:45 lr 0.000128 wd 0.0500 time 0.4523 (0.4529) data time 0.0007 (0.0026) model time 0.4516 (0.4487) loss 2.8820 (2.5496) grad_norm 1.7855 (2.4832) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][270/625] eta 0:02:40 lr 0.000128 wd 0.0500 time 0.4504 (0.4528) data time 0.0007 (0.0025) model time 0.4497 (0.4488) loss 2.7401 (2.5544) grad_norm 2.6882 (2.5118) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][280/625] eta 0:02:36 lr 0.000128 wd 0.0500 time 0.4519 
(0.4527) data time 0.0009 (0.0024) model time 0.4510 (0.4488) loss 2.8871 (2.5524) grad_norm 1.8787 (2.4985) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][290/625] eta 0:02:31 lr 0.000128 wd 0.0500 time 0.4430 (0.4525) data time 0.0007 (0.0024) model time 0.4423 (0.4487) loss 2.0428 (2.5490) grad_norm 2.4321 (2.5066) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][300/625] eta 0:02:27 lr 0.000127 wd 0.0500 time 0.4439 (0.4528) data time 0.0007 (0.0023) model time 0.4432 (0.4491) loss 2.0345 (2.5476) grad_norm 2.1302 (2.5062) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][310/625] eta 0:02:22 lr 0.000127 wd 0.0500 time 0.4449 (0.4525) data time 0.0008 (0.0023) model time 0.4441 (0.4489) loss 2.7960 (2.5469) grad_norm 2.5264 (2.5239) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][320/625] eta 0:02:17 lr 0.000127 wd 0.0500 time 0.4472 (0.4523) data time 0.0009 (0.0023) model time 0.4463 (0.4488) loss 2.7354 (2.5471) grad_norm 2.0309 (2.5140) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][330/625] eta 0:02:13 lr 0.000127 wd 0.0500 time 0.4434 (0.4521) data time 0.0006 (0.0022) model time 0.4428 (0.4487) loss 3.3669 (2.5428) grad_norm 1.7721 (2.5013) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][340/625] eta 0:02:08 lr 0.000127 wd 0.0500 time 0.4471 (0.4520) data time 0.0006 (0.0022) model time 0.4465 (0.4486) loss 2.3736 (2.5393) grad_norm 2.6102 (2.5129) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [243/300][350/625] eta 0:02:04 lr 0.000127 wd 0.0500 time 0.4472 (0.4518) data time 0.0007 (0.0021) model time 0.4465 (0.4485) loss 2.6669 (2.5449) grad_norm 2.5816 (2.5154) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][360/625] eta 0:01:59 lr 0.000127 wd 0.0500 time 0.4448 (0.4517) data time 0.0008 (0.0021) model time 0.4440 (0.4484) loss 2.8041 (2.5434) grad_norm 3.8779 (2.5103) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][370/625] eta 0:01:55 lr 0.000127 wd 0.0500 time 0.4431 (0.4515) data time 0.0007 (0.0021) model time 0.4424 (0.4482) loss 3.3430 (2.5428) grad_norm 1.9984 (2.5049) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][380/625] eta 0:01:50 lr 0.000127 wd 0.0500 time 0.4444 (0.4513) data time 0.0007 (0.0020) model time 0.4437 (0.4481) loss 2.8196 (2.5428) grad_norm 2.7245 (2.5134) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][390/625] eta 0:01:46 lr 0.000127 wd 0.0500 time 0.4456 (0.4512) data time 0.0006 (0.0020) model time 0.4450 (0.4480) loss 1.6732 (2.5401) grad_norm 1.9490 (2.5176) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][400/625] eta 0:01:41 lr 0.000127 wd 0.0500 time 0.4496 (0.4510) data time 0.0006 (0.0020) model time 0.4490 (0.4480) loss 2.6162 (2.5379) grad_norm 2.5155 (2.5422) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][410/625] eta 0:01:36 lr 0.000127 wd 0.0500 time 0.4470 (0.4510) data time 0.0009 (0.0019) model time 0.4461 (0.4479) loss 2.6113 (2.5425) grad_norm 2.0624 (2.5896) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:04:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][420/625] eta 0:01:32 lr 0.000127 wd 0.0500 time 0.4440 (0.4509) data time 0.0009 (0.0019) model time 0.4431 (0.4479) loss 3.0969 (2.5459) grad_norm 1.7421 (2.6223) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][430/625] eta 0:01:27 lr 0.000127 wd 0.0500 time 0.4487 (0.4508) data time 0.0008 (0.0019) model time 0.4479 (0.4479) loss 2.6821 (2.5464) grad_norm 1.9830 (2.6232) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][440/625] eta 0:01:23 lr 0.000127 wd 0.0500 time 0.4459 (0.4507) data time 0.0006 (0.0019) model time 0.4453 (0.4478) loss 2.5672 (2.5455) grad_norm 1.7060 (2.6100) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][450/625] eta 0:01:18 lr 0.000127 wd 0.0500 time 0.4415 (0.4506) data time 0.0008 (0.0019) model time 0.4407 (0.4478) loss 2.4930 (2.5464) grad_norm 2.6959 (2.6510) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][460/625] eta 0:01:14 lr 0.000126 wd 0.0500 time 0.4428 (0.4505) data time 0.0009 (0.0018) model time 0.4419 (0.4477) loss 2.4756 (2.5484) grad_norm 1.7166 (2.6507) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][470/625] eta 0:01:09 lr 0.000126 wd 0.0500 time 0.4469 (0.4504) data time 0.0006 (0.0018) model time 0.4463 (0.4476) loss 2.9781 (2.5524) grad_norm 1.9853 (2.6428) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][480/625] eta 0:01:05 lr 0.000126 wd 0.0500 time 0.4465 (0.4503) data time 0.0007 (0.0018) model time 0.4458 (0.4476) loss 1.7235 
(2.5512) grad_norm 2.2989 (2.6367) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][490/625] eta 0:01:00 lr 0.000126 wd 0.0500 time 0.4489 (0.4508) data time 0.0009 (0.0018) model time 0.4480 (0.4481) loss 2.7479 (2.5516) grad_norm 2.6876 (2.6401) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][500/625] eta 0:00:56 lr 0.000126 wd 0.0500 time 0.4463 (0.4507) data time 0.0008 (0.0017) model time 0.4455 (0.4481) loss 2.6452 (2.5549) grad_norm 1.9704 (2.6315) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][510/625] eta 0:00:51 lr 0.000126 wd 0.0500 time 0.4441 (0.4506) data time 0.0007 (0.0017) model time 0.4434 (0.4480) loss 1.9103 (2.5560) grad_norm 2.5107 (2.6233) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][520/625] eta 0:00:47 lr 0.000126 wd 0.0500 time 0.4444 (0.4505) data time 0.0009 (0.0017) model time 0.4436 (0.4479) loss 2.0050 (2.5555) grad_norm 2.0133 (2.6284) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][530/625] eta 0:00:42 lr 0.000126 wd 0.0500 time 0.4452 (0.4504) data time 0.0006 (0.0017) model time 0.4446 (0.4479) loss 3.2846 (2.5522) grad_norm 3.3504 (2.6249) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][540/625] eta 0:00:38 lr 0.000126 wd 0.0500 time 0.4453 (0.4503) data time 0.0009 (0.0017) model time 0.4444 (0.4478) loss 2.9059 (2.5578) grad_norm 1.3063 (2.6237) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][550/625] eta 0:00:33 lr 0.000126 wd 0.0500 time 0.4449 
(0.4503) data time 0.0007 (0.0017) model time 0.4442 (0.4478) loss 2.7778 (2.5572) grad_norm 2.0501 (2.6175) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][560/625] eta 0:00:29 lr 0.000126 wd 0.0500 time 0.4495 (0.4502) data time 0.0008 (0.0017) model time 0.4487 (0.4478) loss 2.8856 (2.5584) grad_norm 3.4485 (2.6137) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][570/625] eta 0:00:24 lr 0.000126 wd 0.0500 time 0.4476 (0.4502) data time 0.0007 (0.0016) model time 0.4469 (0.4477) loss 2.5497 (2.5614) grad_norm 2.7709 (2.6102) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][580/625] eta 0:00:20 lr 0.000126 wd 0.0500 time 0.4450 (0.4501) data time 0.0008 (0.0016) model time 0.4442 (0.4477) loss 2.7336 (2.5615) grad_norm 1.8129 (2.6041) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][590/625] eta 0:00:15 lr 0.000126 wd 0.0500 time 0.4450 (0.4501) data time 0.0006 (0.0016) model time 0.4443 (0.4477) loss 1.5068 (2.5606) grad_norm 1.6655 (2.6000) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][600/625] eta 0:00:11 lr 0.000126 wd 0.0500 time 0.4503 (0.4500) data time 0.0006 (0.0016) model time 0.4497 (0.4477) loss 2.4311 (2.5613) grad_norm 2.6103 (2.6022) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [243/300][610/625] eta 0:00:06 lr 0.000126 wd 0.0500 time 0.4425 (0.4500) data time 0.0004 (0.0016) model time 0.4421 (0.4476) loss 2.8368 (2.5641) grad_norm 2.4504 (2.6141) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [243/300][620/625] eta 0:00:02 lr 0.000125 wd 0.0500 time 0.4442 (0.4499) data time 0.0004 (0.0016) model time 0.4438 (0.4476) loss 2.8657 (2.5684) grad_norm 2.6419 (2.6214) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:05:53 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 243 training takes 0:04:41 [2024-08-11 08:05:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:05:54 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 08:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5220 (0.5220) Acc@1 88.672 (88.672) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 08:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8428 (0.6208) Acc@1 80.811 (86.958) Acc@5 95.850 (97.745) Mem 16699MB [2024-08-11 08:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9268 (0.7390) Acc@1 78.809 (84.059) Acc@5 95.068 (96.631) Mem 16699MB [2024-08-11 08:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.713 Acc@5 96.591 [2024-08-11 08:05:57 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:05:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.887 (0.887) Loss 0.4910 (0.4910) Acc@1 89.209 (89.209) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 08:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.189) Loss 0.7925 (0.5983) Acc@1 81.152 (87.234) Acc@5 96.387 (97.905) Mem 16699MB [2024-08-11 08:06:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.153) Loss 0.8716 (0.7062) Acc@1 79.834 (84.508) Acc@5 95.459 (96.926) Mem 16699MB [2024-08-11 08:06:01 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.239 Acc@5 96.889 [2024-08-11 08:06:01 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:06:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][0/625] eta 0:13:03 lr 0.000125 wd 0.0500 time 1.2528 (1.2528) data time 0.7989 (0.7989) model time 0.0000 (0.0000) loss 2.9097 (2.9097) grad_norm 2.1382 (2.1382) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][10/625] eta 0:05:20 lr 0.000125 wd 0.0500 time 0.4464 (0.5213) data time 0.0008 (0.0735) model time 0.0000 (0.0000) loss 2.2446 (2.6195) grad_norm 1.8503 (2.4097) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][20/625] eta 0:04:53 lr 0.000125 wd 0.0500 time 0.4425 (0.4856) data time 0.0009 (0.0389) model time 0.0000 (0.0000) loss 2.5980 (2.5852) grad_norm 1.9668 (2.2722) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][30/625] eta 0:04:41 lr 0.000125 wd 0.0500 time 0.4479 (0.4732) data time 0.0006 (0.0266) model time 0.0000 (0.0000) loss 2.3926 (2.5674) grad_norm 2.3571 (2.2949) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][40/625] eta 0:04:34 lr 0.000125 wd 0.0500 time 0.4449 (0.4696) data time 0.0007 (0.0203) model time 0.0000 (0.0000) loss 2.8637 (2.5808) grad_norm 1.6034 (2.5944) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][50/625] eta 0:04:27 lr 0.000125 wd 0.0500 time 0.4513 (0.4649) data time 0.0008 (0.0165) model time 0.0000 (0.0000) loss 1.8935 (2.5735) grad_norm 3.1779 (2.5747) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:29 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [244/300][60/625] eta 0:04:21 lr 0.000125 wd 0.0500 time 0.4467 (0.4623) data time 0.0008 (0.0139) model time 0.4459 (0.4479) loss 2.3008 (2.5605) grad_norm 2.3856 (2.5901) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][70/625] eta 0:04:15 lr 0.000125 wd 0.0500 time 0.4502 (0.4604) data time 0.0007 (0.0121) model time 0.4496 (0.4482) loss 2.2615 (2.5467) grad_norm 2.0364 (2.6067) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][80/625] eta 0:04:10 lr 0.000125 wd 0.0500 time 0.4524 (0.4591) data time 0.0006 (0.0107) model time 0.4517 (0.4484) loss 2.9315 (2.5128) grad_norm 2.2595 (2.6754) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][90/625] eta 0:04:06 lr 0.000125 wd 0.0500 time 0.4415 (0.4603) data time 0.0009 (0.0096) model time 0.4406 (0.4535) loss 2.7074 (2.5064) grad_norm 1.9664 (2.6349) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][100/625] eta 0:04:01 lr 0.000125 wd 0.0500 time 0.4471 (0.4592) data time 0.0008 (0.0087) model time 0.4462 (0.4524) loss 2.5989 (2.5100) grad_norm 2.0901 (2.6689) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][110/625] eta 0:03:55 lr 0.000125 wd 0.0500 time 0.4456 (0.4582) data time 0.0006 (0.0080) model time 0.4450 (0.4516) loss 2.7125 (2.5129) grad_norm 2.8864 (2.6358) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:06:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][120/625] eta 0:03:50 lr 0.000125 wd 0.0500 time 0.4465 (0.4573) data time 0.0006 (0.0074) model time 0.4459 (0.4508) loss 2.7997 (2.4964) grad_norm 2.7553 (2.6396) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 08:07:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][130/625] eta 0:03:45 lr 0.000125 wd 0.0500 time 0.4459 (0.4565) data time 0.0007 (0.0069) model time 0.4452 (0.4503) loss 1.8293 (2.4971) grad_norm 2.2582 (2.6251) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:07:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][140/625] eta 0:03:41 lr 0.000125 wd 0.0500 time 0.4497 (0.4559) data time 0.0009 (0.0065) model time 0.4488 (0.4500) loss 2.6036 (2.5056) grad_norm 1.8500 (2.6429) loss_scale 256.0000 (132.5390) mem 16699MB [2024-08-11 08:07:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][150/625] eta 0:03:36 lr 0.000125 wd 0.0500 time 0.4501 (0.4554) data time 0.0008 (0.0061) model time 0.4493 (0.4498) loss 2.9314 (2.5042) grad_norm 2.0387 (2.6077) loss_scale 256.0000 (140.7152) mem 16699MB [2024-08-11 08:07:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][160/625] eta 0:03:31 lr 0.000124 wd 0.0500 time 0.4471 (0.4550) data time 0.0009 (0.0058) model time 0.4462 (0.4496) loss 2.2925 (2.5103) grad_norm 2.2283 (2.5837) loss_scale 256.0000 (147.8758) mem 16699MB [2024-08-11 08:07:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][170/625] eta 0:03:26 lr 0.000124 wd 0.0500 time 0.4460 (0.4545) data time 0.0009 (0.0055) model time 0.4451 (0.4493) loss 2.7100 (2.5057) grad_norm 2.0682 (2.6291) loss_scale 256.0000 (154.1988) mem 16699MB [2024-08-11 08:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][180/625] eta 0:03:22 lr 0.000124 wd 0.0500 time 0.4457 (0.4541) data time 0.0008 (0.0052) model time 0.4449 (0.4490) loss 3.0543 (2.5027) grad_norm 2.4881 (2.6422) loss_scale 256.0000 (159.8232) mem 16699MB [2024-08-11 08:07:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][190/625] eta 0:03:17 lr 0.000124 wd 0.0500 time 0.4493 (0.4540) data time 0.0009 (0.0050) model time 
0.4485 (0.4491) loss 3.0160 (2.5150) grad_norm 2.9552 (2.6836) loss_scale 256.0000 (164.8586) mem 16699MB [2024-08-11 08:07:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][200/625] eta 0:03:12 lr 0.000124 wd 0.0500 time 0.4488 (0.4537) data time 0.0009 (0.0048) model time 0.4480 (0.4491) loss 2.8373 (2.5140) grad_norm 1.7616 (2.6674) loss_scale 256.0000 (169.3930) mem 16699MB [2024-08-11 08:07:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][210/625] eta 0:03:08 lr 0.000124 wd 0.0500 time 0.4518 (0.4536) data time 0.0006 (0.0046) model time 0.4512 (0.4491) loss 2.7762 (2.5126) grad_norm 3.7359 (2.6534) loss_scale 256.0000 (173.4976) mem 16699MB [2024-08-11 08:07:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][220/625] eta 0:03:03 lr 0.000124 wd 0.0500 time 0.4465 (0.4534) data time 0.0007 (0.0044) model time 0.4458 (0.4491) loss 2.5430 (2.5049) grad_norm 2.5148 (2.6692) loss_scale 256.0000 (177.2308) mem 16699MB [2024-08-11 08:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][230/625] eta 0:02:59 lr 0.000124 wd 0.0500 time 0.4469 (0.4532) data time 0.0009 (0.0043) model time 0.4460 (0.4490) loss 2.4547 (2.5102) grad_norm 3.4969 (2.6638) loss_scale 256.0000 (180.6407) mem 16699MB [2024-08-11 08:07:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][240/625] eta 0:02:54 lr 0.000124 wd 0.0500 time 0.4466 (0.4530) data time 0.0009 (0.0041) model time 0.4457 (0.4489) loss 1.5733 (2.5001) grad_norm 1.9455 (2.6595) loss_scale 256.0000 (183.7676) mem 16699MB [2024-08-11 08:07:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][250/625] eta 0:02:49 lr 0.000124 wd 0.0500 time 0.4470 (0.4528) data time 0.0009 (0.0040) model time 0.4462 (0.4488) loss 2.1459 (2.5002) grad_norm 2.2924 (2.6552) loss_scale 256.0000 (186.6454) mem 16699MB [2024-08-11 08:07:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][260/625] eta 0:02:45 lr 
0.000124 wd 0.0500 time 0.4550 (0.4527) data time 0.0009 (0.0039) model time 0.4541 (0.4489) loss 3.1512 (2.5055) grad_norm 2.1168 (2.6544) loss_scale 256.0000 (189.3027) mem 16699MB [2024-08-11 08:08:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][270/625] eta 0:02:40 lr 0.000124 wd 0.0500 time 0.4436 (0.4525) data time 0.0009 (0.0038) model time 0.4428 (0.4488) loss 2.8248 (2.5078) grad_norm 2.2018 (2.6398) loss_scale 256.0000 (191.7638) mem 16699MB [2024-08-11 08:08:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][280/625] eta 0:02:36 lr 0.000124 wd 0.0500 time 0.4485 (0.4529) data time 0.0009 (0.0037) model time 0.4476 (0.4494) loss 3.0692 (2.5177) grad_norm 2.1529 (2.6326) loss_scale 256.0000 (194.0498) mem 16699MB [2024-08-11 08:08:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][290/625] eta 0:02:31 lr 0.000124 wd 0.0500 time 0.4550 (0.4529) data time 0.0008 (0.0036) model time 0.4542 (0.4495) loss 2.8357 (2.5143) grad_norm 2.7904 (2.6184) loss_scale 256.0000 (196.1787) mem 16699MB [2024-08-11 08:08:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][300/625] eta 0:02:27 lr 0.000124 wd 0.0500 time 0.4480 (0.4528) data time 0.0009 (0.0035) model time 0.4471 (0.4495) loss 2.8082 (2.5209) grad_norm 1.8435 (2.6127) loss_scale 256.0000 (198.1661) mem 16699MB [2024-08-11 08:08:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][310/625] eta 0:02:22 lr 0.000124 wd 0.0500 time 0.4497 (0.4527) data time 0.0007 (0.0034) model time 0.4490 (0.4495) loss 2.8019 (2.5242) grad_norm 1.6766 (2.6328) loss_scale 256.0000 (200.0257) mem 16699MB [2024-08-11 08:08:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][320/625] eta 0:02:18 lr 0.000123 wd 0.0500 time 0.4474 (0.4526) data time 0.0007 (0.0033) model time 0.4467 (0.4494) loss 2.4909 (2.5217) grad_norm 2.0126 (2.6264) loss_scale 256.0000 (201.7695) mem 16699MB [2024-08-11 08:08:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [244/300][330/625] eta 0:02:13 lr 0.000123 wd 0.0500 time 0.4482 (0.4525) data time 0.0006 (0.0033) model time 0.4476 (0.4494) loss 1.8709 (2.5194) grad_norm 2.9873 (2.6855) loss_scale 256.0000 (203.4079) mem 16699MB [2024-08-11 08:08:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][340/625] eta 0:02:08 lr 0.000123 wd 0.0500 time 0.4516 (0.4524) data time 0.0009 (0.0032) model time 0.4507 (0.4493) loss 2.1697 (2.5233) grad_norm 2.5162 (2.7021) loss_scale 256.0000 (204.9501) mem 16699MB [2024-08-11 08:08:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][350/625] eta 0:02:04 lr 0.000123 wd 0.0500 time 0.4478 (0.4523) data time 0.0006 (0.0031) model time 0.4472 (0.4493) loss 1.7855 (2.5218) grad_norm 2.2555 (2.6990) loss_scale 256.0000 (206.4046) mem 16699MB [2024-08-11 08:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][360/625] eta 0:01:59 lr 0.000123 wd 0.0500 time 0.4483 (0.4521) data time 0.0006 (0.0030) model time 0.4477 (0.4492) loss 2.1647 (2.5203) grad_norm 2.0820 (2.7373) loss_scale 256.0000 (207.7784) mem 16699MB [2024-08-11 08:08:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][370/625] eta 0:01:55 lr 0.000123 wd 0.0500 time 0.4532 (0.4524) data time 0.0007 (0.0030) model time 0.4525 (0.4496) loss 2.0263 (2.5208) grad_norm 2.1794 (2.7384) loss_scale 256.0000 (209.0782) mem 16699MB [2024-08-11 08:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][380/625] eta 0:01:50 lr 0.000123 wd 0.0500 time 0.4473 (0.4524) data time 0.0006 (0.0029) model time 0.4467 (0.4496) loss 2.0472 (2.5160) grad_norm 5.5890 (2.7353) loss_scale 256.0000 (210.3097) mem 16699MB [2024-08-11 08:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][390/625] eta 0:01:46 lr 0.000123 wd 0.0500 time 0.4472 (0.4523) data time 0.0006 (0.0029) model time 0.4466 (0.4496) loss 2.8553 (2.5182) grad_norm 2.1316 (2.7229) loss_scale 
256.0000 (211.4783) mem 16699MB [2024-08-11 08:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][400/625] eta 0:01:41 lr 0.000123 wd 0.0500 time 0.4479 (0.4522) data time 0.0007 (0.0028) model time 0.4472 (0.4496) loss 1.8566 (2.5210) grad_norm 2.8452 (2.7536) loss_scale 256.0000 (212.5885) mem 16699MB [2024-08-11 08:09:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][410/625] eta 0:01:37 lr 0.000123 wd 0.0500 time 0.4480 (0.4522) data time 0.0008 (0.0028) model time 0.4472 (0.4495) loss 2.8799 (2.5223) grad_norm 3.1847 (2.7563) loss_scale 256.0000 (213.6448) mem 16699MB [2024-08-11 08:09:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][420/625] eta 0:01:32 lr 0.000123 wd 0.0500 time 0.4488 (0.4521) data time 0.0008 (0.0027) model time 0.4480 (0.4495) loss 2.9418 (2.5271) grad_norm 2.2131 (2.7535) loss_scale 256.0000 (214.6508) mem 16699MB [2024-08-11 08:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][430/625] eta 0:01:28 lr 0.000123 wd 0.0500 time 0.4447 (0.4520) data time 0.0008 (0.0027) model time 0.4439 (0.4495) loss 1.7686 (2.5239) grad_norm 2.6141 (2.7461) loss_scale 256.0000 (215.6102) mem 16699MB [2024-08-11 08:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][440/625] eta 0:01:23 lr 0.000123 wd 0.0500 time 0.4487 (0.4520) data time 0.0010 (0.0026) model time 0.4478 (0.4495) loss 2.7918 (2.5262) grad_norm 2.3455 (2.7566) loss_scale 256.0000 (216.5261) mem 16699MB [2024-08-11 08:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][450/625] eta 0:01:19 lr 0.000123 wd 0.0500 time 0.4520 (0.4520) data time 0.0006 (0.0026) model time 0.4514 (0.4495) loss 1.6743 (2.5234) grad_norm 1.7160 (2.7541) loss_scale 256.0000 (217.4013) mem 16699MB [2024-08-11 08:09:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][460/625] eta 0:01:14 lr 0.000123 wd 0.0500 time 0.4448 (0.4519) data time 0.0009 (0.0026) model time 
0.4439 (0.4494) loss 2.6807 (2.5252) grad_norm 3.0169 (2.7528) loss_scale 256.0000 (218.2386) mem 16699MB [2024-08-11 08:09:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][470/625] eta 0:01:10 lr 0.000123 wd 0.0500 time 0.4440 (0.4517) data time 0.0008 (0.0025) model time 0.4431 (0.4493) loss 1.9116 (2.5277) grad_norm 2.0319 (2.7446) loss_scale 256.0000 (219.0403) mem 16699MB [2024-08-11 08:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][480/625] eta 0:01:05 lr 0.000122 wd 0.0500 time 0.4426 (0.4516) data time 0.0009 (0.0025) model time 0.4417 (0.4492) loss 2.4226 (2.5233) grad_norm 1.8355 (2.7357) loss_scale 256.0000 (219.8087) mem 16699MB [2024-08-11 08:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][490/625] eta 0:01:00 lr 0.000122 wd 0.0500 time 0.4456 (0.4515) data time 0.0007 (0.0025) model time 0.4450 (0.4491) loss 3.0933 (2.5238) grad_norm 1.7380 (2.7389) loss_scale 256.0000 (220.5458) mem 16699MB [2024-08-11 08:09:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][500/625] eta 0:00:56 lr 0.000122 wd 0.0500 time 0.4482 (0.4518) data time 0.0007 (0.0024) model time 0.4474 (0.4495) loss 1.9110 (2.5289) grad_norm 3.1302 (2.7319) loss_scale 256.0000 (221.2535) mem 16699MB [2024-08-11 08:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][510/625] eta 0:00:51 lr 0.000122 wd 0.0500 time 0.4433 (0.4518) data time 0.0006 (0.0024) model time 0.4426 (0.4495) loss 2.8020 (2.5294) grad_norm 2.0417 (2.7401) loss_scale 256.0000 (221.9335) mem 16699MB [2024-08-11 08:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][520/625] eta 0:00:47 lr 0.000122 wd 0.0500 time 0.4462 (0.4517) data time 0.0009 (0.0024) model time 0.4453 (0.4495) loss 2.7080 (2.5276) grad_norm 2.3974 (2.7337) loss_scale 256.0000 (222.5873) mem 16699MB [2024-08-11 08:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][530/625] eta 0:00:42 lr 
0.000122 wd 0.0500 time 0.4473 (0.4517) data time 0.0007 (0.0023) model time 0.4466 (0.4494) loss 2.7276 (2.5284) grad_norm 1.5930 (2.7278) loss_scale 256.0000 (223.2166) mem 16699MB [2024-08-11 08:10:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][540/625] eta 0:00:38 lr 0.000122 wd 0.0500 time 0.4446 (0.4516) data time 0.0007 (0.0023) model time 0.4440 (0.4493) loss 1.9931 (2.5282) grad_norm 6.9196 (2.7360) loss_scale 256.0000 (223.8226) mem 16699MB [2024-08-11 08:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][550/625] eta 0:00:33 lr 0.000122 wd 0.0500 time 0.4480 (0.4515) data time 0.0010 (0.0023) model time 0.4471 (0.4493) loss 2.8007 (2.5306) grad_norm 3.4819 (2.7336) loss_scale 256.0000 (224.4065) mem 16699MB [2024-08-11 08:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][560/625] eta 0:00:29 lr 0.000122 wd 0.0500 time 0.4506 (0.4515) data time 0.0006 (0.0023) model time 0.4499 (0.4493) loss 2.1020 (2.5293) grad_norm 2.4272 (2.7373) loss_scale 256.0000 (224.9697) mem 16699MB [2024-08-11 08:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][570/625] eta 0:00:24 lr 0.000122 wd 0.0500 time 0.4408 (0.4514) data time 0.0006 (0.0022) model time 0.4402 (0.4492) loss 2.0678 (2.5327) grad_norm 1.9819 (2.7323) loss_scale 256.0000 (225.5131) mem 16699MB [2024-08-11 08:10:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][580/625] eta 0:00:20 lr 0.000122 wd 0.0500 time 0.4523 (0.4513) data time 0.0006 (0.0022) model time 0.4516 (0.4492) loss 2.5907 (2.5334) grad_norm 2.7109 (2.7343) loss_scale 256.0000 (226.0379) mem 16699MB [2024-08-11 08:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][590/625] eta 0:00:15 lr 0.000122 wd 0.0500 time 0.4479 (0.4513) data time 0.0008 (0.0022) model time 0.4470 (0.4491) loss 1.7485 (2.5300) grad_norm 3.0883 (2.7263) loss_scale 256.0000 (226.5448) mem 16699MB [2024-08-11 08:10:32 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [244/300][600/625] eta 0:00:11 lr 0.000122 wd 0.0500 time 0.4428 (0.4512) data time 0.0009 (0.0022) model time 0.4419 (0.4490) loss 2.4561 (2.5261) grad_norm 3.4647 (inf) loss_scale 128.0000 (226.1830) mem 16699MB [2024-08-11 08:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][610/625] eta 0:00:06 lr 0.000122 wd 0.0500 time 0.4397 (0.4511) data time 0.0004 (0.0022) model time 0.4393 (0.4490) loss 1.8385 (2.5269) grad_norm 1.9054 (inf) loss_scale 128.0000 (224.5761) mem 16699MB [2024-08-11 08:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [244/300][620/625] eta 0:00:02 lr 0.000122 wd 0.0500 time 0.4419 (0.4509) data time 0.0006 (0.0021) model time 0.4413 (0.4488) loss 2.0138 (2.5240) grad_norm 2.0084 (inf) loss_scale 128.0000 (223.0209) mem 16699MB [2024-08-11 08:10:43 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 244 training takes 0:04:41 [2024-08-11 08:10:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:10:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 08:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5146 (0.5146) Acc@1 88.672 (88.672) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-11 08:10:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8354 (0.6170) Acc@1 80.176 (86.754) Acc@5 96.289 (97.820) Mem 16699MB [2024-08-11 08:10:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9014 (0.7307) Acc@1 79.199 (84.043) Acc@5 95.068 (96.735) Mem 16699MB [2024-08-11 08:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.741 Acc@5 96.679 [2024-08-11 08:10:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.888 (0.888) Loss 0.4912 (0.4912) Acc@1 89.062 (89.062) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:10:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.189) Loss 0.7930 (0.5987) Acc@1 81.055 (87.220) Acc@5 96.338 (97.900) Mem 16699MB [2024-08-11 08:10:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.154) Loss 0.8716 (0.7069) Acc@1 79.883 (84.487) Acc@5 95.508 (96.917) Mem 16699MB [2024-08-11 08:10:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.211 Acc@5 96.883 [2024-08-11 08:10:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:10:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][0/625] eta 0:13:15 lr 0.000122 wd 0.0500 time 1.2722 (1.2722) data time 0.4904 (0.4904) model time 0.0000 (0.0000) loss 2.7561 (2.7561) grad_norm 1.5940 (1.5940) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:10:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][10/625] eta 0:05:21 lr 0.000121 wd 0.0500 time 0.4496 (0.5232) data 
time 0.0006 (0.0453) model time 0.0000 (0.0000) loss 2.9412 (2.4981) grad_norm 2.0490 (2.2608) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][20/625] eta 0:04:55 lr 0.000121 wd 0.0500 time 0.4476 (0.4877) data time 0.0009 (0.0242) model time 0.0000 (0.0000) loss 2.4713 (2.5085) grad_norm 3.2715 (2.3319) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][30/625] eta 0:04:45 lr 0.000121 wd 0.0500 time 0.4490 (0.4805) data time 0.0009 (0.0167) model time 0.0000 (0.0000) loss 2.6523 (2.5045) grad_norm 3.4116 (2.3456) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][40/625] eta 0:04:36 lr 0.000121 wd 0.0500 time 0.4457 (0.4731) data time 0.0009 (0.0128) model time 0.0000 (0.0000) loss 2.3193 (2.5289) grad_norm 1.6765 (2.3161) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][50/625] eta 0:04:29 lr 0.000121 wd 0.0500 time 0.4459 (0.4679) data time 0.0008 (0.0105) model time 0.0000 (0.0000) loss 2.3661 (2.4936) grad_norm 2.9040 (2.2944) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][60/625] eta 0:04:22 lr 0.000121 wd 0.0500 time 0.4461 (0.4643) data time 0.0006 (0.0089) model time 0.4455 (0.4454) loss 2.8258 (2.4661) grad_norm 1.8360 (2.2948) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][70/625] eta 0:04:16 lr 0.000121 wd 0.0500 time 0.4431 (0.4619) data time 0.0007 (0.0078) model time 0.4425 (0.4458) loss 2.2081 (2.4307) grad_norm 1.9952 (2.3647) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[245/300][80/625] eta 0:04:11 lr 0.000121 wd 0.0500 time 0.4475 (0.4620) data time 0.0007 (0.0069) model time 0.4469 (0.4513) loss 3.1198 (2.4421) grad_norm 1.4379 (2.3396) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][90/625] eta 0:04:06 lr 0.000121 wd 0.0500 time 0.4465 (0.4606) data time 0.0008 (0.0062) model time 0.4457 (0.4505) loss 3.0377 (2.4449) grad_norm 3.8740 (2.3681) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][100/625] eta 0:04:01 lr 0.000121 wd 0.0500 time 0.4477 (0.4594) data time 0.0008 (0.0057) model time 0.4468 (0.4498) loss 2.8329 (2.4462) grad_norm 3.0474 (2.3729) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][110/625] eta 0:03:56 lr 0.000121 wd 0.0500 time 0.4448 (0.4584) data time 0.0006 (0.0053) model time 0.4441 (0.4495) loss 3.1021 (2.4611) grad_norm 1.8584 (2.3577) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][120/625] eta 0:03:50 lr 0.000121 wd 0.0500 time 0.4447 (0.4574) data time 0.0007 (0.0049) model time 0.4441 (0.4489) loss 2.5125 (2.4661) grad_norm 2.2467 (2.3359) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][130/625] eta 0:03:46 lr 0.000121 wd 0.0500 time 0.4458 (0.4566) data time 0.0006 (0.0046) model time 0.4452 (0.4486) loss 1.8244 (2.4717) grad_norm 3.0163 (2.6487) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][140/625] eta 0:03:41 lr 0.000121 wd 0.0500 time 0.4475 (0.4559) data time 0.0008 (0.0043) model time 0.4467 (0.4483) loss 2.4044 (2.4777) grad_norm 2.3811 (2.7025) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:12:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][150/625] eta 0:03:36 lr 0.000121 wd 0.0500 time 0.4478 (0.4554) data time 0.0008 (0.0041) model time 0.4471 (0.4481) loss 2.7478 (2.4816) grad_norm 1.6347 (2.6956) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][160/625] eta 0:03:31 lr 0.000121 wd 0.0500 time 0.4516 (0.4550) data time 0.0008 (0.0039) model time 0.4508 (0.4481) loss 2.6213 (2.4887) grad_norm 19.8670 (2.8059) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][170/625] eta 0:03:26 lr 0.000121 wd 0.0500 time 0.4502 (0.4547) data time 0.0007 (0.0037) model time 0.4496 (0.4483) loss 3.3076 (2.4994) grad_norm 2.3267 (2.8323) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][180/625] eta 0:03:22 lr 0.000120 wd 0.0500 time 0.4476 (0.4556) data time 0.0009 (0.0036) model time 0.4467 (0.4499) loss 3.3596 (2.4995) grad_norm 8.0339 (2.8933) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][190/625] eta 0:03:17 lr 0.000120 wd 0.0500 time 0.4438 (0.4551) data time 0.0009 (0.0034) model time 0.4429 (0.4496) loss 2.5894 (2.5030) grad_norm 11.8119 (2.9510) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][200/625] eta 0:03:13 lr 0.000120 wd 0.0500 time 0.4470 (0.4547) data time 0.0006 (0.0033) model time 0.4464 (0.4493) loss 1.6418 (2.4994) grad_norm 2.2571 (2.9261) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][210/625] eta 0:03:08 lr 0.000120 wd 0.0500 time 0.4481 (0.4543) data time 0.0009 (0.0032) model time 0.4472 (0.4491) loss 3.1198 
(2.5083) grad_norm 1.8996 (2.9152) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][220/625] eta 0:03:03 lr 0.000120 wd 0.0500 time 0.4463 (0.4539) data time 0.0008 (0.0031) model time 0.4455 (0.4489) loss 2.0441 (2.5091) grad_norm 4.2124 (3.1743) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][230/625] eta 0:02:59 lr 0.000120 wd 0.0500 time 0.4451 (0.4537) data time 0.0009 (0.0030) model time 0.4442 (0.4488) loss 2.6778 (2.5104) grad_norm 2.8098 (3.2874) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][240/625] eta 0:02:54 lr 0.000120 wd 0.0500 time 0.4487 (0.4534) data time 0.0008 (0.0029) model time 0.4479 (0.4487) loss 2.6173 (2.4955) grad_norm 1.8392 (3.2743) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][250/625] eta 0:02:49 lr 0.000120 wd 0.0500 time 0.4540 (0.4532) data time 0.0008 (0.0028) model time 0.4532 (0.4487) loss 1.7298 (2.4966) grad_norm 2.3455 (3.2643) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][260/625] eta 0:02:45 lr 0.000120 wd 0.0500 time 0.4456 (0.4531) data time 0.0010 (0.0027) model time 0.4446 (0.4487) loss 2.5296 (2.4958) grad_norm 2.1085 (3.2438) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][270/625] eta 0:02:40 lr 0.000120 wd 0.0500 time 0.4466 (0.4530) data time 0.0007 (0.0027) model time 0.4459 (0.4487) loss 2.0101 (2.4952) grad_norm 8.6671 (3.2392) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:12:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][280/625] eta 0:02:36 lr 0.000120 wd 0.0500 time 0.4471 
(0.4527) data time 0.0007 (0.0026) model time 0.4464 (0.4485) loss 1.7441 (2.4981) grad_norm 2.3343 (3.2261) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][290/625] eta 0:02:31 lr 0.000120 wd 0.0500 time 0.4449 (0.4525) data time 0.0008 (0.0025) model time 0.4441 (0.4484) loss 2.6801 (2.4981) grad_norm 3.1717 (3.1950) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][300/625] eta 0:02:27 lr 0.000120 wd 0.0500 time 0.4461 (0.4524) data time 0.0009 (0.0025) model time 0.4452 (0.4484) loss 2.1201 (2.4942) grad_norm 2.8223 (3.1715) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][310/625] eta 0:02:22 lr 0.000120 wd 0.0500 time 0.4449 (0.4522) data time 0.0009 (0.0024) model time 0.4441 (0.4483) loss 2.8422 (2.4914) grad_norm 1.8311 (3.2179) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][320/625] eta 0:02:17 lr 0.000120 wd 0.0500 time 0.4469 (0.4521) data time 0.0008 (0.0024) model time 0.4460 (0.4482) loss 2.7505 (2.4897) grad_norm 2.0980 (3.1979) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][330/625] eta 0:02:13 lr 0.000120 wd 0.0500 time 0.4513 (0.4519) data time 0.0006 (0.0023) model time 0.4507 (0.4482) loss 2.8408 (2.4889) grad_norm 5.7009 (3.1960) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][340/625] eta 0:02:08 lr 0.000119 wd 0.0500 time 0.4459 (0.4518) data time 0.0008 (0.0023) model time 0.4451 (0.4481) loss 2.6665 (2.4937) grad_norm 2.4746 (3.1709) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [245/300][350/625] eta 0:02:04 lr 0.000119 wd 0.0500 time 0.4440 (0.4516) data time 0.0009 (0.0022) model time 0.4431 (0.4480) loss 2.8499 (2.4943) grad_norm 3.0756 (3.1604) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][360/625] eta 0:01:59 lr 0.000119 wd 0.0500 time 0.4460 (0.4515) data time 0.0006 (0.0022) model time 0.4453 (0.4480) loss 2.7358 (2.4930) grad_norm 6.8812 (3.1607) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][370/625] eta 0:01:55 lr 0.000119 wd 0.0500 time 0.4498 (0.4514) data time 0.0008 (0.0022) model time 0.4490 (0.4480) loss 2.4555 (2.4953) grad_norm 2.2218 (3.1448) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][380/625] eta 0:01:50 lr 0.000119 wd 0.0500 time 0.4510 (0.4514) data time 0.0006 (0.0021) model time 0.4504 (0.4480) loss 2.6448 (2.4978) grad_norm 1.6411 (3.1332) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][390/625] eta 0:01:46 lr 0.000119 wd 0.0500 time 0.4534 (0.4514) data time 0.0007 (0.0021) model time 0.4528 (0.4481) loss 3.2025 (2.5056) grad_norm 2.5441 (3.1129) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][400/625] eta 0:01:41 lr 0.000119 wd 0.0500 time 0.4492 (0.4514) data time 0.0006 (0.0021) model time 0.4486 (0.4481) loss 2.5219 (2.5065) grad_norm 1.8263 (3.0940) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:13:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][410/625] eta 0:01:37 lr 0.000119 wd 0.0500 time 0.4475 (0.4513) data time 0.0009 (0.0020) model time 0.4466 (0.4481) loss 2.3906 (2.5040) grad_norm 2.6822 (3.0781) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][420/625] eta 0:01:32 lr 0.000119 wd 0.0500 time 0.4485 (0.4516) data time 0.0008 (0.0020) model time 0.4477 (0.4485) loss 2.8198 (2.5064) grad_norm 1.6533 (3.0721) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][430/625] eta 0:01:28 lr 0.000119 wd 0.0500 time 0.4447 (0.4515) data time 0.0008 (0.0020) model time 0.4438 (0.4485) loss 1.5980 (2.5012) grad_norm 2.7736 (3.0538) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][440/625] eta 0:01:23 lr 0.000119 wd 0.0500 time 0.4469 (0.4515) data time 0.0009 (0.0020) model time 0.4460 (0.4485) loss 2.8267 (2.5021) grad_norm 3.7057 (3.0590) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][450/625] eta 0:01:19 lr 0.000119 wd 0.0500 time 0.4502 (0.4515) data time 0.0008 (0.0019) model time 0.4494 (0.4485) loss 2.8825 (2.5078) grad_norm 1.9929 (3.0589) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][460/625] eta 0:01:14 lr 0.000119 wd 0.0500 time 0.4491 (0.4515) data time 0.0009 (0.0019) model time 0.4482 (0.4486) loss 2.3105 (2.5119) grad_norm 3.4114 (3.0477) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][470/625] eta 0:01:09 lr 0.000119 wd 0.0500 time 0.4464 (0.4514) data time 0.0007 (0.0019) model time 0.4457 (0.4486) loss 1.7727 (2.5115) grad_norm 2.2816 (3.1035) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][480/625] eta 0:01:05 lr 0.000119 wd 0.0500 time 0.4483 (0.4514) data time 0.0009 (0.0019) model time 0.4474 (0.4486) loss 2.8322 
(2.5125) grad_norm 2.2122 (3.0958) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][490/625] eta 0:01:00 lr 0.000119 wd 0.0500 time 0.4468 (0.4514) data time 0.0007 (0.0018) model time 0.4461 (0.4486) loss 2.0222 (2.5130) grad_norm 13.4386 (3.1072) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][500/625] eta 0:00:56 lr 0.000118 wd 0.0500 time 0.4492 (0.4513) data time 0.0008 (0.0018) model time 0.4484 (0.4486) loss 2.4009 (2.5149) grad_norm 2.1508 (3.1001) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][510/625] eta 0:00:51 lr 0.000118 wd 0.0500 time 0.6658 (0.4517) data time 0.0008 (0.0018) model time 0.6650 (0.4491) loss 2.7291 (2.5153) grad_norm 1.9593 (3.0986) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][520/625] eta 0:00:47 lr 0.000118 wd 0.0500 time 0.4466 (0.4517) data time 0.0008 (0.0018) model time 0.4458 (0.4491) loss 2.6745 (2.5172) grad_norm 2.3861 (3.0887) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][530/625] eta 0:00:42 lr 0.000118 wd 0.0500 time 0.4509 (0.4517) data time 0.0008 (0.0018) model time 0.4501 (0.4491) loss 2.5624 (2.5176) grad_norm 2.4727 (3.0752) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][540/625] eta 0:00:38 lr 0.000118 wd 0.0500 time 0.4523 (0.4516) data time 0.0008 (0.0017) model time 0.4516 (0.4491) loss 2.5448 (2.5201) grad_norm 2.0081 (3.0643) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][550/625] eta 0:00:33 lr 0.000118 wd 0.0500 time 0.4453 
(0.4516) data time 0.0006 (0.0017) model time 0.4446 (0.4491) loss 2.5508 (2.5241) grad_norm 1.7849 (3.0543) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][560/625] eta 0:00:29 lr 0.000118 wd 0.0500 time 0.4437 (0.4515) data time 0.0008 (0.0017) model time 0.4429 (0.4491) loss 3.2370 (2.5264) grad_norm 2.6423 (3.0409) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][570/625] eta 0:00:24 lr 0.000118 wd 0.0500 time 0.4490 (0.4514) data time 0.0009 (0.0017) model time 0.4481 (0.4490) loss 2.6460 (2.5278) grad_norm 2.1664 (3.0828) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][580/625] eta 0:00:20 lr 0.000118 wd 0.0500 time 0.4471 (0.4514) data time 0.0006 (0.0017) model time 0.4464 (0.4490) loss 3.0609 (2.5288) grad_norm 2.4571 (3.0737) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][590/625] eta 0:00:15 lr 0.000118 wd 0.0500 time 0.4502 (0.4513) data time 0.0006 (0.0017) model time 0.4496 (0.4490) loss 2.6004 (2.5314) grad_norm 10.9801 (3.0894) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][600/625] eta 0:00:11 lr 0.000118 wd 0.0500 time 0.4494 (0.4513) data time 0.0008 (0.0017) model time 0.4486 (0.4490) loss 2.6110 (2.5322) grad_norm 4.8974 (3.0835) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [245/300][610/625] eta 0:00:06 lr 0.000118 wd 0.0500 time 0.4396 (0.4513) data time 0.0006 (0.0016) model time 0.4389 (0.4489) loss 2.3208 (2.5313) grad_norm 2.5424 (3.0813) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [245/300][620/625] eta 0:00:02 lr 0.000118 wd 0.0500 time 0.4432 (0.4512) data time 0.0006 (0.0016) model time 0.4426 (0.4488) loss 2.6863 (2.5294) grad_norm 2.1237 (3.0630) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:33 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 245 training takes 0:04:41 [2024-08-11 08:15:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:15:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 08:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5181 (0.5181) Acc@1 89.111 (89.111) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:15:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 0.8384 (0.6228) Acc@1 80.225 (86.754) Acc@5 96.143 (97.723) Mem 16699MB [2024-08-11 08:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.8965 (0.7367) Acc@1 78.760 (83.887) Acc@5 95.312 (96.677) Mem 16699MB [2024-08-11 08:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.569 Acc@5 96.635 [2024-08-11 08:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 08:15:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.830 (0.830) Loss 0.4927 (0.4927) Acc@1 89.111 (89.111) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.7949 (0.5996) Acc@1 81.152 (87.243) Acc@5 96.240 (97.892) Mem 16699MB [2024-08-11 08:15:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8726 (0.7078) Acc@1 79.785 (84.473) Acc@5 95.410 (96.910) Mem 16699MB [2024-08-11 08:15:42 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.199 Acc@5 96.875 [2024-08-11 08:15:42 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][0/625] eta 0:13:42 lr 0.000118 wd 0.0500 time 1.3163 (1.3163) data time 0.6768 (0.6768) model time 0.0000 (0.0000) loss 2.3683 (2.3683) grad_norm 2.4510 (2.4510) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][10/625] eta 0:05:22 lr 0.000118 wd 0.0500 time 0.4465 (0.5250) data time 0.0006 (0.0622) model time 0.0000 (0.0000) loss 2.7844 (2.5644) grad_norm 3.1101 (2.1553) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][20/625] eta 0:04:55 lr 0.000118 wd 0.0500 time 0.4460 (0.4879) data time 0.0009 (0.0330) model time 0.0000 (0.0000) loss 2.7586 (2.4937) grad_norm 2.2271 (2.2114) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:15:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][30/625] eta 0:04:42 lr 0.000118 wd 0.0500 time 0.4474 (0.4754) data time 0.0006 (0.0226) model time 0.0000 (0.0000) loss 2.3000 (2.4518) grad_norm 2.3331 (2.6380) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][40/625] eta 0:04:34 lr 0.000117 wd 0.0500 time 0.4451 (0.4687) data time 0.0007 (0.0173) model time 0.0000 (0.0000) loss 1.7639 (2.4429) grad_norm 2.2994 (2.6161) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][50/625] eta 0:04:27 lr 0.000117 wd 0.0500 time 0.4490 (0.4645) data time 0.0008 (0.0141) model time 0.0000 (0.0000) loss 2.5688 (2.4451) grad_norm 2.3234 (2.5247) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:10 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [246/300][60/625] eta 0:04:20 lr 0.000117 wd 0.0500 time 0.4509 (0.4618) data time 0.0008 (0.0119) model time 0.4501 (0.4469) loss 2.8708 (2.4856) grad_norm 2.1489 (2.5508) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][70/625] eta 0:04:15 lr 0.000117 wd 0.0500 time 0.4442 (0.4595) data time 0.0008 (0.0103) model time 0.4434 (0.4458) loss 2.7290 (2.4872) grad_norm 1.8603 (2.5344) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][80/625] eta 0:04:09 lr 0.000117 wd 0.0500 time 0.4446 (0.4578) data time 0.0007 (0.0092) model time 0.4439 (0.4456) loss 2.4950 (2.5032) grad_norm 1.8646 (2.6867) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][90/625] eta 0:04:04 lr 0.000117 wd 0.0500 time 0.4503 (0.4568) data time 0.0008 (0.0083) model time 0.4495 (0.4460) loss 2.8650 (2.5086) grad_norm 3.0013 (2.7297) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][100/625] eta 0:03:59 lr 0.000117 wd 0.0500 time 0.4488 (0.4560) data time 0.0006 (0.0075) model time 0.4482 (0.4465) loss 2.8341 (2.5147) grad_norm 1.7078 (2.7198) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][110/625] eta 0:03:54 lr 0.000117 wd 0.0500 time 0.4489 (0.4555) data time 0.0007 (0.0069) model time 0.4482 (0.4471) loss 3.0155 (2.5154) grad_norm 2.9017 (2.6748) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][120/625] eta 0:03:49 lr 0.000117 wd 0.0500 time 0.4478 (0.4550) data time 0.0008 (0.0064) model time 0.4470 (0.4472) loss 2.7370 (2.5160) grad_norm 1.8328 (2.6395) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][130/625] eta 0:03:44 lr 0.000117 wd 0.0500 time 0.4365 (0.4545) data time 0.0009 (0.0060) model time 0.4355 (0.4472) loss 2.7261 (2.5272) grad_norm 2.5504 (2.6168) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][140/625] eta 0:03:40 lr 0.000117 wd 0.0500 time 0.4474 (0.4555) data time 0.0007 (0.0056) model time 0.4467 (0.4495) loss 3.2428 (2.5311) grad_norm 2.4622 (2.6290) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][150/625] eta 0:03:36 lr 0.000117 wd 0.0500 time 0.4475 (0.4549) data time 0.0006 (0.0053) model time 0.4468 (0.4492) loss 2.6782 (2.5371) grad_norm 2.2172 (2.6270) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][160/625] eta 0:03:31 lr 0.000117 wd 0.0500 time 0.4496 (0.4545) data time 0.0008 (0.0050) model time 0.4488 (0.4491) loss 2.5801 (2.5451) grad_norm 2.3397 (2.6827) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][170/625] eta 0:03:26 lr 0.000117 wd 0.0500 time 0.4511 (0.4544) data time 0.0006 (0.0048) model time 0.4505 (0.4492) loss 2.5021 (2.5551) grad_norm 3.7327 (2.6870) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:17:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][180/625] eta 0:03:22 lr 0.000117 wd 0.0500 time 0.4457 (0.4541) data time 0.0007 (0.0046) model time 0.4451 (0.4492) loss 2.8253 (2.5463) grad_norm 2.3614 (3.0905) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][190/625] eta 0:03:17 lr 0.000117 wd 0.0500 time 0.4466 (0.4538) data time 0.0007 (0.0044) model time 
0.4459 (0.4491) loss 2.4604 (2.5435) grad_norm 2.6088 (3.0818) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:17:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][200/625] eta 0:03:12 lr 0.000117 wd 0.0500 time 0.4450 (0.4535) data time 0.0009 (0.0042) model time 0.4441 (0.4490) loss 2.3301 (2.5444) grad_norm 2.0694 (inf) loss_scale 64.0000 (127.3632) mem 16699MB [2024-08-11 08:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][210/625] eta 0:03:08 lr 0.000116 wd 0.0500 time 0.4500 (0.4532) data time 0.0009 (0.0040) model time 0.4491 (0.4488) loss 2.8811 (2.5527) grad_norm 2.3504 (inf) loss_scale 64.0000 (124.3602) mem 16699MB [2024-08-11 08:17:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][220/625] eta 0:03:03 lr 0.000116 wd 0.0500 time 0.4417 (0.4529) data time 0.0007 (0.0039) model time 0.4410 (0.4486) loss 2.7316 (2.5453) grad_norm 2.5064 (inf) loss_scale 64.0000 (121.6290) mem 16699MB [2024-08-11 08:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][230/625] eta 0:02:58 lr 0.000116 wd 0.0500 time 0.4437 (0.4526) data time 0.0009 (0.0038) model time 0.4428 (0.4484) loss 2.9829 (2.5404) grad_norm 1.9147 (inf) loss_scale 64.0000 (119.1342) mem 16699MB [2024-08-11 08:17:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][240/625] eta 0:02:54 lr 0.000116 wd 0.0500 time 0.4471 (0.4524) data time 0.0007 (0.0036) model time 0.4465 (0.4482) loss 1.6991 (2.5254) grad_norm 2.6266 (inf) loss_scale 64.0000 (116.8465) mem 16699MB [2024-08-11 08:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][250/625] eta 0:02:49 lr 0.000116 wd 0.0500 time 0.4456 (0.4522) data time 0.0006 (0.0035) model time 0.4450 (0.4482) loss 2.9384 (2.5270) grad_norm 3.1332 (inf) loss_scale 64.0000 (114.7410) mem 16699MB [2024-08-11 08:17:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][260/625] eta 0:02:44 lr 0.000116 wd 0.0500 time 
0.4468 (0.4520) data time 0.0007 (0.0034) model time 0.4462 (0.4481) loss 2.5726 (2.5311) grad_norm 2.3054 (inf) loss_scale 64.0000 (112.7969) mem 16699MB [2024-08-11 08:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][270/625] eta 0:02:40 lr 0.000116 wd 0.0500 time 0.4501 (0.4519) data time 0.0006 (0.0033) model time 0.4495 (0.4481) loss 2.5607 (2.5361) grad_norm 2.2206 (inf) loss_scale 64.0000 (110.9963) mem 16699MB [2024-08-11 08:17:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][280/625] eta 0:02:35 lr 0.000116 wd 0.0500 time 0.4451 (0.4516) data time 0.0006 (0.0032) model time 0.4444 (0.4479) loss 2.7179 (2.5364) grad_norm 41.5570 (inf) loss_scale 64.0000 (109.3238) mem 16699MB [2024-08-11 08:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][290/625] eta 0:02:31 lr 0.000116 wd 0.0500 time 0.4427 (0.4514) data time 0.0006 (0.0032) model time 0.4422 (0.4478) loss 3.1334 (2.5398) grad_norm 3.1304 (inf) loss_scale 64.0000 (107.7663) mem 16699MB [2024-08-11 08:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][300/625] eta 0:02:26 lr 0.000116 wd 0.0500 time 0.4571 (0.4513) data time 0.0008 (0.0031) model time 0.4563 (0.4477) loss 2.6682 (2.5478) grad_norm 2.3308 (inf) loss_scale 64.0000 (106.3123) mem 16699MB [2024-08-11 08:18:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][310/625] eta 0:02:22 lr 0.000116 wd 0.0500 time 0.4454 (0.4511) data time 0.0006 (0.0030) model time 0.4448 (0.4477) loss 2.8883 (2.5361) grad_norm 2.4956 (inf) loss_scale 64.0000 (104.9518) mem 16699MB [2024-08-11 08:18:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][320/625] eta 0:02:17 lr 0.000116 wd 0.0500 time 0.4457 (0.4515) data time 0.0008 (0.0029) model time 0.4449 (0.4482) loss 2.6647 (2.5358) grad_norm 2.2558 (inf) loss_scale 64.0000 (103.6760) mem 16699MB [2024-08-11 08:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[246/300][330/625] eta 0:02:13 lr 0.000116 wd 0.0500 time 0.4425 (0.4520) data time 0.0007 (0.0029) model time 0.4418 (0.4489) loss 2.7433 (2.5267) grad_norm 1.9178 (inf) loss_scale 64.0000 (102.4773) mem 16699MB [2024-08-11 08:18:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][340/625] eta 0:02:08 lr 0.000116 wd 0.0500 time 0.4469 (0.4519) data time 0.0006 (0.0028) model time 0.4463 (0.4488) loss 1.8940 (2.5257) grad_norm 2.8281 (inf) loss_scale 64.0000 (101.3490) mem 16699MB [2024-08-11 08:18:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][350/625] eta 0:02:04 lr 0.000116 wd 0.0500 time 0.4479 (0.4518) data time 0.0007 (0.0027) model time 0.4472 (0.4488) loss 2.4270 (2.5356) grad_norm 2.2052 (inf) loss_scale 64.0000 (100.2849) mem 16699MB [2024-08-11 08:18:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][360/625] eta 0:01:59 lr 0.000116 wd 0.0500 time 0.4470 (0.4517) data time 0.0006 (0.0027) model time 0.4463 (0.4488) loss 1.6822 (2.5252) grad_norm 1.7885 (inf) loss_scale 64.0000 (99.2798) mem 16699MB [2024-08-11 08:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][370/625] eta 0:01:55 lr 0.000115 wd 0.0500 time 0.4499 (0.4516) data time 0.0008 (0.0026) model time 0.4491 (0.4487) loss 2.3406 (2.5306) grad_norm 2.1594 (inf) loss_scale 64.0000 (98.3288) mem 16699MB [2024-08-11 08:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][380/625] eta 0:01:50 lr 0.000115 wd 0.0500 time 0.4436 (0.4516) data time 0.0008 (0.0026) model time 0.4427 (0.4487) loss 2.9311 (2.5283) grad_norm 2.7844 (inf) loss_scale 64.0000 (97.4278) mem 16699MB [2024-08-11 08:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][390/625] eta 0:01:46 lr 0.000115 wd 0.0500 time 0.4492 (0.4515) data time 0.0006 (0.0026) model time 0.4486 (0.4487) loss 2.7859 (2.5274) grad_norm 2.9699 (inf) loss_scale 64.0000 (96.5729) mem 16699MB [2024-08-11 08:18:43 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][400/625] eta 0:01:41 lr 0.000115 wd 0.0500 time 0.4495 (0.4514) data time 0.0007 (0.0025) model time 0.4488 (0.4487) loss 1.7842 (2.5315) grad_norm 2.3018 (inf) loss_scale 64.0000 (95.7606) mem 16699MB [2024-08-11 08:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][410/625] eta 0:01:37 lr 0.000115 wd 0.0500 time 0.4523 (0.4514) data time 0.0008 (0.0025) model time 0.4515 (0.4487) loss 2.6869 (2.5258) grad_norm 2.6604 (inf) loss_scale 64.0000 (94.9878) mem 16699MB [2024-08-11 08:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][420/625] eta 0:01:32 lr 0.000115 wd 0.0500 time 0.4473 (0.4514) data time 0.0008 (0.0024) model time 0.4465 (0.4487) loss 2.3668 (2.5258) grad_norm 2.3032 (inf) loss_scale 64.0000 (94.2518) mem 16699MB [2024-08-11 08:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][430/625] eta 0:01:28 lr 0.000115 wd 0.0500 time 0.4469 (0.4513) data time 0.0007 (0.0024) model time 0.4462 (0.4488) loss 2.4973 (2.5289) grad_norm 3.5474 (inf) loss_scale 64.0000 (93.5499) mem 16699MB [2024-08-11 08:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][440/625] eta 0:01:23 lr 0.000115 wd 0.0500 time 0.4468 (0.4513) data time 0.0008 (0.0024) model time 0.4459 (0.4487) loss 3.3228 (2.5294) grad_norm 1.7226 (inf) loss_scale 64.0000 (92.8798) mem 16699MB [2024-08-11 08:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][450/625] eta 0:01:18 lr 0.000115 wd 0.0500 time 0.4517 (0.4512) data time 0.0008 (0.0023) model time 0.4509 (0.4487) loss 2.5289 (2.5300) grad_norm 1.7700 (inf) loss_scale 64.0000 (92.2395) mem 16699MB [2024-08-11 08:19:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][460/625] eta 0:01:14 lr 0.000115 wd 0.0500 time 0.4481 (0.4511) data time 0.0006 (0.0023) model time 0.4475 (0.4487) loss 3.2869 (2.5341) grad_norm 2.2031 (inf) loss_scale 64.0000 
(91.6269) mem 16699MB [2024-08-11 08:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][470/625] eta 0:01:09 lr 0.000115 wd 0.0500 time 0.4495 (0.4511) data time 0.0009 (0.0023) model time 0.4486 (0.4487) loss 2.6033 (2.5349) grad_norm 2.3503 (inf) loss_scale 64.0000 (91.0403) mem 16699MB [2024-08-11 08:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][480/625] eta 0:01:05 lr 0.000115 wd 0.0500 time 0.4505 (0.4511) data time 0.0006 (0.0022) model time 0.4499 (0.4487) loss 1.9819 (2.5327) grad_norm 2.9293 (inf) loss_scale 64.0000 (90.4782) mem 16699MB [2024-08-11 08:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][490/625] eta 0:01:00 lr 0.000115 wd 0.0500 time 0.4526 (0.4510) data time 0.0008 (0.0022) model time 0.4518 (0.4486) loss 2.0588 (2.5258) grad_norm 2.3985 (inf) loss_scale 64.0000 (89.9389) mem 16699MB [2024-08-11 08:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][500/625] eta 0:00:56 lr 0.000115 wd 0.0500 time 0.4402 (0.4510) data time 0.0006 (0.0022) model time 0.4396 (0.4486) loss 2.4567 (2.5270) grad_norm 2.0834 (inf) loss_scale 64.0000 (89.4212) mem 16699MB [2024-08-11 08:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][510/625] eta 0:00:51 lr 0.000115 wd 0.0500 time 0.4472 (0.4509) data time 0.0008 (0.0022) model time 0.4465 (0.4485) loss 2.3970 (2.5240) grad_norm 2.1624 (inf) loss_scale 64.0000 (88.9237) mem 16699MB [2024-08-11 08:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][520/625] eta 0:00:47 lr 0.000115 wd 0.0500 time 0.4490 (0.4508) data time 0.0009 (0.0021) model time 0.4481 (0.4485) loss 2.4158 (2.5212) grad_norm 2.0029 (inf) loss_scale 64.0000 (88.4453) mem 16699MB [2024-08-11 08:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][530/625] eta 0:00:42 lr 0.000115 wd 0.0500 time 0.4453 (0.4508) data time 0.0006 (0.0021) model time 0.4446 (0.4485) loss 3.1075 (2.5209) 
grad_norm 2.0947 (inf) loss_scale 64.0000 (87.9849) mem 16699MB [2024-08-11 08:19:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][540/625] eta 0:00:38 lr 0.000114 wd 0.0500 time 0.4466 (0.4507) data time 0.0006 (0.0021) model time 0.4460 (0.4485) loss 2.3789 (2.5233) grad_norm 2.5492 (inf) loss_scale 64.0000 (87.5416) mem 16699MB [2024-08-11 08:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][550/625] eta 0:00:33 lr 0.000114 wd 0.0500 time 0.4522 (0.4513) data time 0.0007 (0.0021) model time 0.4515 (0.4491) loss 2.6463 (2.5194) grad_norm 2.3642 (inf) loss_scale 64.0000 (87.1143) mem 16699MB [2024-08-11 08:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][560/625] eta 0:00:29 lr 0.000114 wd 0.0500 time 0.4504 (0.4513) data time 0.0006 (0.0020) model time 0.4497 (0.4491) loss 1.8865 (2.5139) grad_norm 2.3842 (inf) loss_scale 64.0000 (86.7023) mem 16699MB [2024-08-11 08:19:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][570/625] eta 0:00:24 lr 0.000114 wd 0.0500 time 0.4512 (0.4513) data time 0.0007 (0.0020) model time 0.4505 (0.4492) loss 1.4913 (2.5139) grad_norm 8.0087 (inf) loss_scale 64.0000 (86.3047) mem 16699MB [2024-08-11 08:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][580/625] eta 0:00:20 lr 0.000114 wd 0.0500 time 0.4463 (0.4512) data time 0.0008 (0.0020) model time 0.4455 (0.4491) loss 1.7357 (2.5113) grad_norm 1.7052 (inf) loss_scale 64.0000 (85.9208) mem 16699MB [2024-08-11 08:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][590/625] eta 0:00:15 lr 0.000114 wd 0.0500 time 0.4455 (0.4511) data time 0.0008 (0.0020) model time 0.4447 (0.4490) loss 2.6022 (2.5145) grad_norm 1.9511 (inf) loss_scale 64.0000 (85.5499) mem 16699MB [2024-08-11 08:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][600/625] eta 0:00:11 lr 0.000114 wd 0.0500 time 0.4450 (0.4510) data time 0.0006 (0.0020) model time 
0.4444 (0.4489) loss 2.2509 (2.5146) grad_norm 2.8803 (inf) loss_scale 64.0000 (85.1913) mem 16699MB
[2024-08-11 08:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][610/625] eta 0:00:06 lr 0.000114 wd 0.0500 time 0.4425 (0.4509) data time 0.0004 (0.0019) model time 0.4421 (0.4488) loss 2.8056 (2.5135) grad_norm 1.7905 (inf) loss_scale 64.0000 (84.8445) mem 16699MB
[2024-08-11 08:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [246/300][620/625] eta 0:00:02 lr 0.000114 wd 0.0500 time 0.4451 (0.4508) data time 0.0006 (0.0019) model time 0.4445 (0.4487) loss 2.2762 (2.5171) grad_norm 2.8697 (inf) loss_scale 64.0000 (84.5089) mem 16699MB
[2024-08-11 08:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 246 training takes 0:04:41
[2024-08-11 08:20:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-11 08:20:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-11 08:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.5273 (0.5273) Acc@1 88.672 (88.672) Acc@5 98.779 (98.779) Mem 16699MB
[2024-08-11 08:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8398 (0.6262) Acc@1 80.420 (86.799) Acc@5 96.289 (97.798) Mem 16699MB
[2024-08-11 08:20:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.9150 (0.7435) Acc@1 78.906 (83.898) Acc@5 95.020 (96.677) Mem 16699MB
[2024-08-11 08:20:29 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.611 Acc@5 96.631
[2024-08-11 08:20:29 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6%
[2024-08-11 08:20:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.940 (0.940) Loss 0.4939 (0.4939) Acc@1 89.160 (89.160) Acc@5 98.877 (98.877) Mem 16699MB
[2024-08-11 08:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.194) Loss 0.7969 (0.6003) Acc@1 81.006 (87.234) Acc@5 96.289 (97.892) Mem 16699MB
[2024-08-11 08:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.156) Loss 0.8730 (0.7087) Acc@1 79.785 (84.454) Acc@5 95.410 (96.912) Mem 16699MB
[2024-08-11 08:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.175 Acc@5 96.879
[2024-08-11 08:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2%
[2024-08-11 08:20:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][0/625] eta 0:13:05 lr 0.000114 wd 0.0500 time 1.2570 (1.2570) data time 0.5445 (0.5445) model time 0.0000 (0.0000) loss 2.5581 (2.5581) grad_norm 2.4454 (2.4454) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 08:20:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][10/625] eta 0:05:20 lr 0.000114 wd 0.0500 time 0.4435 (0.5215) data time 
0.0006 (0.0502) model time 0.0000 (0.0000) loss 1.6769 (2.4728) grad_norm 2.5093 (2.2908) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:20:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][20/625] eta 0:04:53 lr 0.000114 wd 0.0500 time 0.4465 (0.4856) data time 0.0007 (0.0267) model time 0.0000 (0.0000) loss 2.1932 (2.3410) grad_norm 1.7467 (2.2527) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:20:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][30/625] eta 0:04:41 lr 0.000114 wd 0.0500 time 0.4544 (0.4729) data time 0.0006 (0.0183) model time 0.0000 (0.0000) loss 2.3935 (2.2920) grad_norm 5.0109 (2.3975) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:20:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][40/625] eta 0:04:32 lr 0.000114 wd 0.0500 time 0.4444 (0.4663) data time 0.0006 (0.0141) model time 0.0000 (0.0000) loss 2.1801 (2.3452) grad_norm 2.3793 (2.5133) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:20:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][50/625] eta 0:04:25 lr 0.000114 wd 0.0500 time 0.4475 (0.4623) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 2.7704 (2.4143) grad_norm 2.4740 (2.6934) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][60/625] eta 0:04:19 lr 0.000114 wd 0.0500 time 0.4508 (0.4601) data time 0.0006 (0.0097) model time 0.4502 (0.4480) loss 3.0974 (2.4743) grad_norm 1.9328 (2.6781) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][70/625] eta 0:04:14 lr 0.000114 wd 0.0500 time 0.4499 (0.4585) data time 0.0006 (0.0085) model time 0.4493 (0.4481) loss 2.9544 (2.5045) grad_norm 3.2141 (2.7070) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][80/625] eta 0:04:09 
lr 0.000113 wd 0.0500 time 0.4494 (0.4573) data time 0.0007 (0.0075) model time 0.4486 (0.4480) loss 2.9784 (2.5049) grad_norm 2.2570 (2.6557) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][90/625] eta 0:04:04 lr 0.000113 wd 0.0500 time 0.4488 (0.4561) data time 0.0008 (0.0068) model time 0.4480 (0.4474) loss 2.4895 (2.5116) grad_norm 2.4248 (2.9151) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][100/625] eta 0:04:00 lr 0.000113 wd 0.0500 time 0.4449 (0.4580) data time 0.0009 (0.0062) model time 0.4440 (0.4528) loss 2.7153 (2.5070) grad_norm 2.0424 (2.9449) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][110/625] eta 0:03:55 lr 0.000113 wd 0.0500 time 0.4496 (0.4570) data time 0.0008 (0.0057) model time 0.4488 (0.4517) loss 2.7947 (2.5040) grad_norm 2.2700 (2.8790) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][120/625] eta 0:03:50 lr 0.000113 wd 0.0500 time 0.4467 (0.4562) data time 0.0006 (0.0053) model time 0.4461 (0.4509) loss 1.5677 (2.4934) grad_norm 2.3413 (2.8546) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][130/625] eta 0:03:45 lr 0.000113 wd 0.0500 time 0.4482 (0.4555) data time 0.0006 (0.0050) model time 0.4476 (0.4504) loss 1.5842 (2.4942) grad_norm 2.5910 (2.8535) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][140/625] eta 0:03:40 lr 0.000113 wd 0.0500 time 0.4485 (0.4549) data time 0.0008 (0.0047) model time 0.4477 (0.4498) loss 3.1107 (2.4813) grad_norm 3.2797 (2.8689) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [247/300][150/625] eta 0:03:36 lr 0.000113 wd 0.0500 time 0.4522 (0.4553) data time 0.0008 (0.0044) model time 0.4514 (0.4509) loss 2.7496 (2.4696) grad_norm 2.2349 (2.8740) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][160/625] eta 0:03:31 lr 0.000113 wd 0.0500 time 0.4449 (0.4548) data time 0.0008 (0.0042) model time 0.4441 (0.4505) loss 2.6820 (2.4745) grad_norm 1.7035 (2.8499) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][170/625] eta 0:03:26 lr 0.000113 wd 0.0500 time 0.4440 (0.4543) data time 0.0009 (0.0040) model time 0.4432 (0.4500) loss 2.5147 (2.4776) grad_norm 3.6298 (2.9338) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][180/625] eta 0:03:21 lr 0.000113 wd 0.0500 time 0.4470 (0.4537) data time 0.0006 (0.0038) model time 0.4464 (0.4496) loss 2.0558 (2.4860) grad_norm 2.0824 (3.0955) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:21:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][190/625] eta 0:03:17 lr 0.000113 wd 0.0500 time 0.4470 (0.4533) data time 0.0006 (0.0037) model time 0.4464 (0.4493) loss 1.7348 (2.4771) grad_norm 2.1573 (3.0506) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][200/625] eta 0:03:12 lr 0.000113 wd 0.0500 time 0.4492 (0.4531) data time 0.0006 (0.0035) model time 0.4487 (0.4491) loss 2.9830 (2.4834) grad_norm 2.5939 (3.0386) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][210/625] eta 0:03:07 lr 0.000113 wd 0.0500 time 0.4449 (0.4529) data time 0.0006 (0.0034) model time 0.4442 (0.4490) loss 2.1368 (2.4839) grad_norm 2.5613 (3.0282) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 08:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][220/625] eta 0:03:03 lr 0.000113 wd 0.0500 time 0.4442 (0.4526) data time 0.0006 (0.0033) model time 0.4436 (0.4489) loss 2.7060 (2.4776) grad_norm 4.6321 (3.0586) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][230/625] eta 0:02:58 lr 0.000113 wd 0.0500 time 0.4467 (0.4524) data time 0.0009 (0.0032) model time 0.4459 (0.4488) loss 2.8410 (2.4789) grad_norm 2.9178 (3.0566) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][240/625] eta 0:02:54 lr 0.000113 wd 0.0500 time 0.4460 (0.4521) data time 0.0006 (0.0031) model time 0.4454 (0.4485) loss 2.5225 (2.4759) grad_norm 3.2535 (3.0372) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][250/625] eta 0:02:50 lr 0.000112 wd 0.0500 time 0.4458 (0.4534) data time 0.0008 (0.0030) model time 0.4449 (0.4503) loss 1.4537 (2.4816) grad_norm 2.5772 (3.0162) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][260/625] eta 0:02:45 lr 0.000112 wd 0.0500 time 0.4456 (0.4532) data time 0.0009 (0.0029) model time 0.4447 (0.4501) loss 2.0103 (2.4731) grad_norm 2.8192 (2.9912) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][270/625] eta 0:02:40 lr 0.000112 wd 0.0500 time 0.4514 (0.4530) data time 0.0008 (0.0028) model time 0.4506 (0.4500) loss 3.0603 (2.4683) grad_norm 2.5337 (2.9612) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][280/625] eta 0:02:36 lr 0.000112 wd 0.0500 time 0.4469 (0.4528) data time 0.0009 (0.0028) model time 0.4461 (0.4498) loss 
2.4198 (2.4622) grad_norm 2.3509 (2.9442) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][290/625] eta 0:02:31 lr 0.000112 wd 0.0500 time 0.4459 (0.4526) data time 0.0009 (0.0027) model time 0.4449 (0.4497) loss 2.8984 (2.4680) grad_norm 1.8697 (2.9284) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][300/625] eta 0:02:27 lr 0.000112 wd 0.0500 time 0.4484 (0.4525) data time 0.0006 (0.0026) model time 0.4478 (0.4496) loss 3.2099 (2.4744) grad_norm 7.6068 (2.9619) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][310/625] eta 0:02:22 lr 0.000112 wd 0.0500 time 0.4472 (0.4522) data time 0.0007 (0.0026) model time 0.4465 (0.4494) loss 3.3073 (2.4714) grad_norm 3.3232 (2.9781) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:22:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][320/625] eta 0:02:17 lr 0.000112 wd 0.0500 time 0.4473 (0.4521) data time 0.0006 (0.0025) model time 0.4467 (0.4493) loss 2.8193 (2.4737) grad_norm 2.9705 (3.0577) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][330/625] eta 0:02:13 lr 0.000112 wd 0.0500 time 0.4446 (0.4519) data time 0.0008 (0.0025) model time 0.4437 (0.4492) loss 2.4547 (2.4766) grad_norm 2.6542 (3.0460) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][340/625] eta 0:02:08 lr 0.000112 wd 0.0500 time 0.4508 (0.4518) data time 0.0008 (0.0024) model time 0.4500 (0.4491) loss 2.9222 (2.4812) grad_norm 3.3290 (3.0405) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][350/625] eta 0:02:04 lr 0.000112 wd 0.0500 time 0.4463 (0.4517) 
data time 0.0006 (0.0024) model time 0.4457 (0.4491) loss 2.8229 (2.4816) grad_norm 2.3355 (3.0287) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][360/625] eta 0:01:59 lr 0.000112 wd 0.0500 time 0.4470 (0.4517) data time 0.0009 (0.0023) model time 0.4461 (0.4491) loss 2.8608 (2.4831) grad_norm 2.9747 (3.0132) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][370/625] eta 0:01:55 lr 0.000112 wd 0.0500 time 0.4537 (0.4516) data time 0.0008 (0.0023) model time 0.4529 (0.4491) loss 2.8503 (2.4884) grad_norm 3.5297 (3.0080) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][380/625] eta 0:01:50 lr 0.000112 wd 0.0500 time 0.4459 (0.4515) data time 0.0006 (0.0022) model time 0.4453 (0.4490) loss 2.9015 (2.4904) grad_norm 2.6276 (2.9881) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][390/625] eta 0:01:46 lr 0.000112 wd 0.0500 time 0.4436 (0.4515) data time 0.0006 (0.0022) model time 0.4430 (0.4490) loss 3.0756 (2.4931) grad_norm 1.6982 (2.9680) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][400/625] eta 0:01:41 lr 0.000112 wd 0.0500 time 0.4476 (0.4514) data time 0.0008 (0.0022) model time 0.4468 (0.4489) loss 2.5627 (2.4989) grad_norm 2.4387 (2.9555) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][410/625] eta 0:01:37 lr 0.000112 wd 0.0500 time 0.4505 (0.4513) data time 0.0007 (0.0021) model time 0.4498 (0.4489) loss 2.6366 (2.4957) grad_norm 1.6934 (2.9629) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[247/300][420/625] eta 0:01:32 lr 0.000111 wd 0.0500 time 0.4488 (0.4512) data time 0.0008 (0.0021) model time 0.4480 (0.4488) loss 2.6408 (2.5002) grad_norm 1.5477 (2.9552) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][430/625] eta 0:01:27 lr 0.000111 wd 0.0500 time 0.4481 (0.4512) data time 0.0006 (0.0021) model time 0.4475 (0.4488) loss 2.4414 (2.4955) grad_norm 1.7923 (2.9401) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][440/625] eta 0:01:23 lr 0.000111 wd 0.0500 time 0.4482 (0.4511) data time 0.0006 (0.0021) model time 0.4476 (0.4488) loss 3.2202 (2.4957) grad_norm 1.5965 (2.9245) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][450/625] eta 0:01:18 lr 0.000111 wd 0.0500 time 0.4481 (0.4511) data time 0.0006 (0.0020) model time 0.4475 (0.4488) loss 1.5996 (2.4939) grad_norm 2.6881 (2.9154) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][460/625] eta 0:01:14 lr 0.000111 wd 0.0500 time 0.4489 (0.4510) data time 0.0006 (0.0020) model time 0.4483 (0.4488) loss 3.0779 (2.4993) grad_norm 1.7692 (2.8954) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][470/625] eta 0:01:09 lr 0.000111 wd 0.0500 time 0.4468 (0.4509) data time 0.0008 (0.0020) model time 0.4460 (0.4487) loss 2.5738 (2.4997) grad_norm 2.2223 (2.8979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][480/625] eta 0:01:05 lr 0.000111 wd 0.0500 time 0.4504 (0.4512) data time 0.0008 (0.0019) model time 0.4496 (0.4491) loss 2.3541 (2.5005) grad_norm 2.4187 (2.8835) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:14 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][490/625] eta 0:01:00 lr 0.000111 wd 0.0500 time 0.4473 (0.4511) data time 0.0008 (0.0019) model time 0.4465 (0.4490) loss 2.8355 (2.5028) grad_norm 2.5702 (2.9057) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][500/625] eta 0:00:56 lr 0.000111 wd 0.0500 time 0.4494 (0.4511) data time 0.0005 (0.0019) model time 0.4489 (0.4490) loss 2.8265 (2.5005) grad_norm 2.2516 (2.8979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][510/625] eta 0:00:51 lr 0.000111 wd 0.0500 time 0.4471 (0.4510) data time 0.0006 (0.0019) model time 0.4465 (0.4490) loss 2.8708 (2.5028) grad_norm 1.6332 (2.8888) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][520/625] eta 0:00:47 lr 0.000111 wd 0.0500 time 0.4475 (0.4510) data time 0.0006 (0.0019) model time 0.4469 (0.4489) loss 2.5466 (2.5026) grad_norm 2.3446 (2.8939) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][530/625] eta 0:00:42 lr 0.000111 wd 0.0500 time 0.4501 (0.4509) data time 0.0009 (0.0018) model time 0.4492 (0.4489) loss 2.5616 (2.5048) grad_norm 2.3466 (2.8811) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][540/625] eta 0:00:38 lr 0.000111 wd 0.0500 time 0.4443 (0.4508) data time 0.0009 (0.0018) model time 0.4434 (0.4488) loss 2.6500 (2.5024) grad_norm 2.1145 (2.8762) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][550/625] eta 0:00:33 lr 0.000111 wd 0.0500 time 0.4413 (0.4507) data time 0.0007 (0.0018) model time 0.4406 (0.4487) loss 2.6289 (2.5024) grad_norm 3.1188 (2.8693) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][560/625] eta 0:00:29 lr 0.000111 wd 0.0500 time 0.4454 (0.4507) data time 0.0009 (0.0018) model time 0.4445 (0.4487) loss 2.3958 (2.5031) grad_norm 3.3984 (2.8629) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][570/625] eta 0:00:24 lr 0.000111 wd 0.0500 time 0.4491 (0.4507) data time 0.0009 (0.0018) model time 0.4482 (0.4487) loss 2.6840 (2.5011) grad_norm 2.3165 (2.8930) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][580/625] eta 0:00:20 lr 0.000111 wd 0.0500 time 0.4478 (0.4513) data time 0.0006 (0.0018) model time 0.4472 (0.4494) loss 2.1329 (2.4992) grad_norm 1.9810 (2.9536) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:24:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][590/625] eta 0:00:15 lr 0.000110 wd 0.0500 time 0.4471 (0.4512) data time 0.0008 (0.0017) model time 0.4462 (0.4494) loss 2.3322 (2.5002) grad_norm 2.2197 (2.9470) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][600/625] eta 0:00:11 lr 0.000110 wd 0.0500 time 0.4500 (0.4512) data time 0.0008 (0.0017) model time 0.4492 (0.4493) loss 3.0529 (2.4999) grad_norm 1.8853 (2.9417) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][610/625] eta 0:00:06 lr 0.000110 wd 0.0500 time 0.4388 (0.4511) data time 0.0004 (0.0017) model time 0.4384 (0.4492) loss 3.1382 (2.5045) grad_norm 2.2452 (2.9341) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [247/300][620/625] eta 0:00:02 lr 0.000110 wd 0.0500 time 0.4408 (0.4509) data time 0.0004 (0.0017) model time 
0.4403 (0.4491) loss 2.9226 (2.5089) grad_norm 2.3813 (2.9304) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 08:25:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 247 training takes 0:04:41
[2024-08-11 08:25:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-11 08:25:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-11 08:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5210 (0.5210) Acc@1 89.111 (89.111) Acc@5 98.730 (98.730) Mem 16699MB
[2024-08-11 08:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8193 (0.6227) Acc@1 81.348 (86.839) Acc@5 95.947 (97.723) Mem 16699MB
[2024-08-11 08:25:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9102 (0.7426) Acc@1 79.102 (83.998) Acc@5 95.166 (96.670) Mem 16699MB
[2024-08-11 08:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.681 Acc@5 96.653
[2024-08-11 08:25:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-11 08:25:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.822 (0.822) Loss 0.4941 (0.4941) Acc@1 89.258 (89.258) Acc@5 98.877 (98.877) Mem 16699MB
[2024-08-11 08:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.7979 (0.6008) Acc@1 80.908 (87.260) Acc@5 96.191 (97.865) Mem 16699MB
[2024-08-11 08:25:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.151) Loss 0.8730 (0.7096) Acc@1 79.785 (84.468) Acc@5 95.361 (96.896) Mem 16699MB
[2024-08-11 08:25:22 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.183 Acc@5 96.857
[2024-08-11 08:25:22 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:25:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][0/625] eta 0:13:53 lr 0.000110 wd 0.0500 time 1.3333 (1.3333) data time 0.6432 (0.6432) model time 0.0000 (0.0000) loss 2.4538 (2.4538) grad_norm 1.9686 (1.9686) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][10/625] eta 0:05:25 lr 0.000110 wd 0.0500 time 0.4498 (0.5297) data time 0.0011 (0.0593) model time 0.0000 (0.0000) loss 2.7238 (2.5517) grad_norm 2.0587 (2.7603) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][20/625] eta 0:05:01 lr 0.000110 wd 0.0500 time 0.4499 (0.4983) data time 0.0007 (0.0314) model time 0.0000 (0.0000) loss 2.5147 (2.6074) grad_norm 8.8942 (3.0332) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][30/625] eta 0:04:47 lr 0.000110 wd 0.0500 time 0.4472 (0.4830) data time 0.0007 (0.0216) model time 0.0000 (0.0000) loss 3.1658 (2.6192) grad_norm 2.0424 (3.3198) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][40/625] eta 0:04:37 lr 0.000110 wd 0.0500 time 0.4508 (0.4751) data time 0.0008 (0.0165) model time 0.0000 (0.0000) loss 2.5334 (2.6069) grad_norm 2.2162 (3.4963) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][50/625] eta 0:04:29 lr 0.000110 wd 0.0500 time 0.4386 (0.4690) data time 0.0007 (0.0134) model time 0.0000 (0.0000) loss 2.1246 (2.5678) grad_norm 1.6335 (3.2325) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][60/625] eta 0:04:22 lr 0.000110 wd 0.0500 time 0.4407 (0.4650) data time 0.0006 (0.0114) model 
time 0.4401 (0.4434) loss 2.8677 (2.5891) grad_norm 2.2508 (3.1705) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:25:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][70/625] eta 0:04:16 lr 0.000110 wd 0.0500 time 0.4468 (0.4620) data time 0.0008 (0.0099) model time 0.4460 (0.4434) loss 2.4781 (2.5958) grad_norm 2.1900 (3.1294) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][80/625] eta 0:04:10 lr 0.000110 wd 0.0500 time 0.4472 (0.4601) data time 0.0008 (0.0088) model time 0.4464 (0.4441) loss 2.5152 (2.6125) grad_norm 3.8963 (3.0440) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][90/625] eta 0:04:05 lr 0.000110 wd 0.0500 time 0.4494 (0.4587) data time 0.0007 (0.0079) model time 0.4487 (0.4446) loss 2.4812 (2.6120) grad_norm 2.2207 (2.9853) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][100/625] eta 0:04:00 lr 0.000110 wd 0.0500 time 0.4485 (0.4575) data time 0.0008 (0.0072) model time 0.4477 (0.4450) loss 1.9474 (2.5939) grad_norm 2.1893 (2.9219) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][110/625] eta 0:03:55 lr 0.000110 wd 0.0500 time 0.4445 (0.4566) data time 0.0008 (0.0066) model time 0.4437 (0.4453) loss 2.1575 (2.5712) grad_norm 2.1663 (2.8591) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][120/625] eta 0:03:50 lr 0.000110 wd 0.0500 time 0.4432 (0.4557) data time 0.0008 (0.0062) model time 0.4424 (0.4451) loss 2.5390 (2.5713) grad_norm 1.5560 (2.8777) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][130/625] eta 0:03:45 lr 0.000110 wd 
0.0500 time 0.4461 (0.4549) data time 0.0009 (0.0057) model time 0.4452 (0.4451) loss 3.0315 (2.5649) grad_norm 2.3480 (2.8672) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][140/625] eta 0:03:40 lr 0.000109 wd 0.0500 time 0.4494 (0.4543) data time 0.0007 (0.0054) model time 0.4488 (0.4451) loss 2.4832 (2.5751) grad_norm 2.7301 (2.8759) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][150/625] eta 0:03:35 lr 0.000109 wd 0.0500 time 0.4444 (0.4538) data time 0.0009 (0.0051) model time 0.4435 (0.4452) loss 1.8070 (2.5748) grad_norm 2.1308 (2.8589) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][160/625] eta 0:03:30 lr 0.000109 wd 0.0500 time 0.4463 (0.4535) data time 0.0007 (0.0048) model time 0.4456 (0.4454) loss 2.9293 (2.5730) grad_norm 2.6927 (2.8822) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][170/625] eta 0:03:26 lr 0.000109 wd 0.0500 time 0.4468 (0.4540) data time 0.0008 (0.0046) model time 0.4460 (0.4467) loss 2.7550 (2.5610) grad_norm 3.5806 (2.8862) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][180/625] eta 0:03:21 lr 0.000109 wd 0.0500 time 0.4419 (0.4536) data time 0.0006 (0.0044) model time 0.4413 (0.4467) loss 2.4735 (2.5545) grad_norm 2.0718 (2.8744) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][190/625] eta 0:03:17 lr 0.000109 wd 0.0500 time 0.4454 (0.4532) data time 0.0008 (0.0042) model time 0.4446 (0.4465) loss 2.8710 (2.5659) grad_norm 25.2796 (2.9836) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:53 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [248/300][200/625] eta 0:03:12 lr 0.000109 wd 0.0500 time 0.4423 (0.4528) data time 0.0006 (0.0040) model time 0.4417 (0.4464) loss 2.2449 (2.5570) grad_norm 2.3765 (2.9771) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:26:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][210/625] eta 0:03:07 lr 0.000109 wd 0.0500 time 0.4470 (0.4525) data time 0.0006 (0.0039) model time 0.4464 (0.4463) loss 2.7298 (2.5511) grad_norm 1.9878 (2.9642) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][220/625] eta 0:03:03 lr 0.000109 wd 0.0500 time 0.4463 (0.4522) data time 0.0006 (0.0037) model time 0.4457 (0.4463) loss 1.8721 (2.5411) grad_norm 2.5476 (2.9705) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][230/625] eta 0:02:58 lr 0.000109 wd 0.0500 time 0.4490 (0.4520) data time 0.0008 (0.0036) model time 0.4482 (0.4463) loss 2.6840 (2.5409) grad_norm 3.1323 (2.9423) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][240/625] eta 0:02:53 lr 0.000109 wd 0.0500 time 0.4463 (0.4518) data time 0.0008 (0.0035) model time 0.4456 (0.4464) loss 2.3355 (2.5494) grad_norm 2.2184 (2.9359) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][250/625] eta 0:02:49 lr 0.000109 wd 0.0500 time 0.6558 (0.4525) data time 0.0005 (0.0034) model time 0.6552 (0.4474) loss 2.1761 (2.5508) grad_norm 2.0799 (2.9049) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][260/625] eta 0:02:45 lr 0.000109 wd 0.0500 time 0.4425 (0.4529) data time 0.0008 (0.0033) model time 0.4417 (0.4481) loss 2.5837 (2.5450) grad_norm 4.4614 (2.8814) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 08:27:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][270/625] eta 0:02:40 lr 0.000109 wd 0.0500 time 0.4440 (0.4526) data time 0.0006 (0.0032) model time 0.4434 (0.4480) loss 2.3079 (2.5456) grad_norm 1.8094 (2.8509) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][280/625] eta 0:02:36 lr 0.000109 wd 0.0500 time 0.4511 (0.4524) data time 0.0006 (0.0031) model time 0.4505 (0.4479) loss 2.8836 (2.5447) grad_norm 2.0775 (2.9217) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][290/625] eta 0:02:31 lr 0.000109 wd 0.0500 time 0.4474 (0.4522) data time 0.0007 (0.0030) model time 0.4467 (0.4478) loss 2.9720 (2.5494) grad_norm 2.4214 (2.9192) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][300/625] eta 0:02:26 lr 0.000109 wd 0.0500 time 0.4479 (0.4521) data time 0.0009 (0.0030) model time 0.4471 (0.4478) loss 2.5771 (2.5486) grad_norm 2.7460 (2.9029) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][310/625] eta 0:02:22 lr 0.000108 wd 0.0500 time 0.4464 (0.4519) data time 0.0008 (0.0029) model time 0.4456 (0.4477) loss 2.5603 (2.5428) grad_norm 2.4160 (2.8944) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][320/625] eta 0:02:17 lr 0.000108 wd 0.0500 time 0.4448 (0.4518) data time 0.0006 (0.0028) model time 0.4442 (0.4477) loss 3.1843 (2.5495) grad_norm 2.2443 (2.9014) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][330/625] eta 0:02:13 lr 0.000108 wd 0.0500 time 0.4473 (0.4517) data time 0.0006 (0.0028) model time 0.4467 (0.4477) loss 1.7784 (2.5467) 
grad_norm 2.8391 (2.8999) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:27:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][340/625] eta 0:02:08 lr 0.000108 wd 0.0500 time 0.4507 (0.4516) data time 0.0006 (0.0027) model time 0.4501 (0.4477) loss 2.3731 (2.5491) grad_norm 2.2451 (2.9028) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][350/625] eta 0:02:04 lr 0.000108 wd 0.0500 time 0.4462 (0.4514) data time 0.0006 (0.0027) model time 0.4456 (0.4476) loss 1.8546 (2.5474) grad_norm 2.2022 (2.8854) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][360/625] eta 0:01:59 lr 0.000108 wd 0.0500 time 0.4471 (0.4514) data time 0.0008 (0.0026) model time 0.4462 (0.4476) loss 2.6413 (2.5460) grad_norm 2.2428 (2.8675) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][370/625] eta 0:01:55 lr 0.000108 wd 0.0500 time 0.4498 (0.4513) data time 0.0008 (0.0026) model time 0.4490 (0.4477) loss 2.5893 (2.5477) grad_norm 4.0371 (2.8777) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][380/625] eta 0:01:50 lr 0.000108 wd 0.0500 time 0.4511 (0.4513) data time 0.0009 (0.0025) model time 0.4502 (0.4477) loss 2.6229 (2.5467) grad_norm 4.9305 (2.9043) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][390/625] eta 0:01:46 lr 0.000108 wd 0.0500 time 0.4482 (0.4512) data time 0.0009 (0.0025) model time 0.4474 (0.4477) loss 2.5155 (2.5465) grad_norm 3.5108 (2.9074) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][400/625] eta 0:01:41 lr 0.000108 wd 0.0500 time 0.4450 (0.4512) data time 
0.0006 (0.0024) model time 0.4444 (0.4477) loss 2.7194 (2.5517) grad_norm 2.5398 (2.8979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][410/625] eta 0:01:36 lr 0.000108 wd 0.0500 time 0.4462 (0.4511) data time 0.0007 (0.0024) model time 0.4455 (0.4477) loss 2.0401 (2.5482) grad_norm 1.8716 (2.9024) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][420/625] eta 0:01:32 lr 0.000108 wd 0.0500 time 0.4561 (0.4510) data time 0.0007 (0.0024) model time 0.4555 (0.4477) loss 2.9619 (2.5537) grad_norm 3.6191 (2.8996) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][430/625] eta 0:01:27 lr 0.000108 wd 0.0500 time 0.4461 (0.4509) data time 0.0008 (0.0023) model time 0.4452 (0.4477) loss 2.4775 (2.5578) grad_norm 2.4921 (2.9642) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][440/625] eta 0:01:23 lr 0.000108 wd 0.0500 time 0.4491 (0.4513) data time 0.0009 (0.0023) model time 0.4483 (0.4482) loss 2.1720 (2.5548) grad_norm 2.2251 (2.9534) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][450/625] eta 0:01:19 lr 0.000108 wd 0.0500 time 0.4470 (0.4517) data time 0.0006 (0.0022) model time 0.4464 (0.4486) loss 2.8114 (2.5566) grad_norm 2.7078 (2.9395) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][460/625] eta 0:01:14 lr 0.000108 wd 0.0500 time 0.4521 (0.4517) data time 0.0007 (0.0022) model time 0.4514 (0.4487) loss 3.0774 (2.5594) grad_norm 1.8067 (2.9304) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:28:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][470/625] eta 
0:01:09 lr 0.000108 wd 0.0500 time 0.4478 (0.4516) data time 0.0006 (0.0022) model time 0.4472 (0.4487) loss 1.9458 (2.5615) grad_norm 2.7065 (2.9262) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][480/625] eta 0:01:05 lr 0.000107 wd 0.0500 time 0.4476 (0.4515) data time 0.0007 (0.0022) model time 0.4469 (0.4486) loss 3.3663 (2.5612) grad_norm 2.3260 (2.9325) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][490/625] eta 0:01:00 lr 0.000107 wd 0.0500 time 0.4482 (0.4514) data time 0.0009 (0.0021) model time 0.4473 (0.4486) loss 2.8644 (2.5631) grad_norm 1.7515 (2.9322) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][500/625] eta 0:00:56 lr 0.000107 wd 0.0500 time 0.4422 (0.4517) data time 0.0009 (0.0021) model time 0.4413 (0.4489) loss 2.8167 (2.5626) grad_norm 2.1285 (2.9300) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][510/625] eta 0:00:51 lr 0.000107 wd 0.0500 time 0.4452 (0.4516) data time 0.0008 (0.0021) model time 0.4443 (0.4488) loss 2.8419 (2.5616) grad_norm 2.6856 (2.9173) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][520/625] eta 0:00:47 lr 0.000107 wd 0.0500 time 0.4469 (0.4515) data time 0.0006 (0.0021) model time 0.4463 (0.4487) loss 2.8768 (2.5628) grad_norm 4.7451 (2.9801) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][530/625] eta 0:00:42 lr 0.000107 wd 0.0500 time 0.4474 (0.4514) data time 0.0006 (0.0020) model time 0.4469 (0.4487) loss 2.7275 (2.5623) grad_norm 1.7058 (2.9690) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:27 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [248/300][540/625] eta 0:00:38 lr 0.000107 wd 0.0500 time 0.4429 (0.4513) data time 0.0008 (0.0020) model time 0.4421 (0.4486) loss 1.9610 (2.5594) grad_norm 2.0758 (2.9631) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][550/625] eta 0:00:33 lr 0.000107 wd 0.0500 time 0.4462 (0.4512) data time 0.0008 (0.0020) model time 0.4454 (0.4485) loss 2.4587 (2.5612) grad_norm 2.3430 (2.9573) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][560/625] eta 0:00:29 lr 0.000107 wd 0.0500 time 0.4444 (0.4510) data time 0.0008 (0.0020) model time 0.4436 (0.4484) loss 2.3216 (2.5587) grad_norm 2.0768 (2.9436) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][570/625] eta 0:00:24 lr 0.000107 wd 0.0500 time 0.4472 (0.4510) data time 0.0006 (0.0019) model time 0.4466 (0.4484) loss 1.5222 (2.5578) grad_norm 2.3814 (2.9292) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][580/625] eta 0:00:20 lr 0.000107 wd 0.0500 time 0.4462 (0.4509) data time 0.0008 (0.0019) model time 0.4454 (0.4483) loss 3.1604 (2.5580) grad_norm 2.9387 (2.9244) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][590/625] eta 0:00:15 lr 0.000107 wd 0.0500 time 0.4468 (0.4508) data time 0.0008 (0.0019) model time 0.4459 (0.4483) loss 1.8772 (2.5587) grad_norm 2.7353 (2.9547) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][600/625] eta 0:00:11 lr 0.000107 wd 0.0500 time 0.4487 (0.4508) data time 0.0007 (0.0019) model time 0.4480 (0.4483) loss 2.5044 (2.5627) grad_norm 2.7666 (2.9681) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 08:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][610/625] eta 0:00:06 lr 0.000107 wd 0.0500 time 0.4445 (0.4507) data time 0.0004 (0.0019) model time 0.4441 (0.4482) loss 3.0076 (2.5658) grad_norm 2.0917 (2.9605) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [248/300][620/625] eta 0:00:02 lr 0.000107 wd 0.0500 time 0.4429 (0.4506) data time 0.0004 (0.0019) model time 0.4425 (0.4481) loss 1.4224 (2.5629) grad_norm 3.3158 (2.9830) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 248 training takes 0:04:41 [2024-08-11 08:30:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:30:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 08:30:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5264 (0.5264) Acc@1 88.818 (88.818) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-11 08:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8428 (0.6218) Acc@1 80.371 (86.865) Acc@5 95.654 (97.701) Mem 16699MB [2024-08-11 08:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.134) Loss 0.9082 (0.7404) Acc@1 79.053 (83.908) Acc@5 95.215 (96.645) Mem 16699MB [2024-08-11 08:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.615 Acc@5 96.603 [2024-08-11 08:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.6% [2024-08-11 08:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.831 (0.831) Loss 0.4954 (0.4954) Acc@1 89.209 (89.209) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:30:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.184) Loss 0.8003 (0.6015) Acc@1 80.859 (87.229) Acc@5 96.289 (97.847) Mem 16699MB [2024-08-11 08:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.151) Loss 0.8760 (0.7108) Acc@1 79.736 (84.442) Acc@5 95.361 (96.894) Mem 16699MB [2024-08-11 08:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.159 Acc@5 96.853 [2024-08-11 08:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:30:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][0/625] eta 0:13:08 lr 0.000107 wd 0.0500 time 1.2617 (1.2617) data time 0.5927 (0.5927) model time 0.0000 (0.0000) loss 2.3805 (2.3805) grad_norm 2.1940 (2.1940) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][10/625] eta 0:05:20 lr 0.000107 wd 0.0500 time 0.4467 (0.5203) data time 
0.0008 (0.0546) model time 0.0000 (0.0000) loss 2.7746 (2.2974) grad_norm 2.0524 (2.5042) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][20/625] eta 0:04:53 lr 0.000107 wd 0.0500 time 0.4493 (0.4857) data time 0.0008 (0.0290) model time 0.0000 (0.0000) loss 2.1487 (2.3697) grad_norm 5.4480 (2.6631) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][30/625] eta 0:04:45 lr 0.000106 wd 0.0500 time 0.6533 (0.4799) data time 0.0006 (0.0199) model time 0.0000 (0.0000) loss 1.8771 (2.4363) grad_norm 2.2702 (3.0125) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][40/625] eta 0:04:37 lr 0.000106 wd 0.0500 time 0.4462 (0.4737) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 2.4757 (2.4333) grad_norm 2.1705 (3.0191) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][50/625] eta 0:04:31 lr 0.000106 wd 0.0500 time 0.4519 (0.4719) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 2.7951 (2.4084) grad_norm 2.6204 (2.9567) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][60/625] eta 0:04:24 lr 0.000106 wd 0.0500 time 0.4459 (0.4678) data time 0.0008 (0.0105) model time 0.4451 (0.4457) loss 1.9240 (2.4063) grad_norm 36.3411 (3.4465) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][70/625] eta 0:04:18 lr 0.000106 wd 0.0500 time 0.4430 (0.4649) data time 0.0007 (0.0091) model time 0.4422 (0.4463) loss 2.9397 (2.3676) grad_norm 3.9511 (3.7774) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][80/625] eta 0:04:12 
lr 0.000106 wd 0.0500 time 0.4436 (0.4627) data time 0.0006 (0.0081) model time 0.4430 (0.4461) loss 2.4790 (2.3770) grad_norm 1.9969 (3.9465) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][90/625] eta 0:04:06 lr 0.000106 wd 0.0500 time 0.4495 (0.4609) data time 0.0007 (0.0073) model time 0.4488 (0.4460) loss 1.6916 (2.3619) grad_norm 2.7149 (3.8087) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:30:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][100/625] eta 0:04:01 lr 0.000106 wd 0.0500 time 0.4479 (0.4596) data time 0.0009 (0.0067) model time 0.4470 (0.4463) loss 2.5472 (2.3831) grad_norm 3.0526 (3.6771) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][110/625] eta 0:03:56 lr 0.000106 wd 0.0500 time 0.4455 (0.4586) data time 0.0006 (0.0061) model time 0.4448 (0.4464) loss 1.9672 (2.3984) grad_norm 3.4664 (3.5791) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][120/625] eta 0:03:51 lr 0.000106 wd 0.0500 time 0.4566 (0.4577) data time 0.0008 (0.0057) model time 0.4558 (0.4465) loss 2.9003 (2.4115) grad_norm 1.7475 (3.4680) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][130/625] eta 0:03:46 lr 0.000106 wd 0.0500 time 0.4444 (0.4569) data time 0.0008 (0.0053) model time 0.4436 (0.4465) loss 2.7819 (2.4066) grad_norm 2.2247 (3.3669) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][140/625] eta 0:03:41 lr 0.000106 wd 0.0500 time 0.4471 (0.4561) data time 0.0008 (0.0050) model time 0.4463 (0.4464) loss 1.7919 (2.4041) grad_norm 1.9594 (3.3338) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:21 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [249/300][150/625] eta 0:03:36 lr 0.000106 wd 0.0500 time 0.4444 (0.4554) data time 0.0009 (0.0047) model time 0.4435 (0.4463) loss 3.0695 (2.4152) grad_norm 2.9611 (3.3623) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][160/625] eta 0:03:31 lr 0.000106 wd 0.0500 time 0.4480 (0.4549) data time 0.0008 (0.0045) model time 0.4472 (0.4462) loss 2.8877 (2.4344) grad_norm 2.1230 (3.3158) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][170/625] eta 0:03:26 lr 0.000106 wd 0.0500 time 0.4495 (0.4545) data time 0.0008 (0.0043) model time 0.4487 (0.4463) loss 2.6385 (2.4429) grad_norm 2.7754 (3.2749) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][180/625] eta 0:03:22 lr 0.000106 wd 0.0500 time 0.4491 (0.4541) data time 0.0008 (0.0041) model time 0.4483 (0.4464) loss 2.1846 (2.4415) grad_norm 2.8431 (3.2867) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][190/625] eta 0:03:17 lr 0.000106 wd 0.0500 time 0.4470 (0.4538) data time 0.0006 (0.0039) model time 0.4465 (0.4464) loss 1.6189 (2.4384) grad_norm 2.0608 (3.2849) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][200/625] eta 0:03:13 lr 0.000105 wd 0.0500 time 0.4469 (0.4541) data time 0.0007 (0.0037) model time 0.4462 (0.4473) loss 1.6717 (2.4335) grad_norm 3.0655 (3.2425) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][210/625] eta 0:03:08 lr 0.000105 wd 0.0500 time 0.4472 (0.4538) data time 0.0008 (0.0036) model time 0.4464 (0.4473) loss 2.4058 (2.4439) grad_norm 1.6104 (3.1887) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 08:31:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][220/625] eta 0:03:03 lr 0.000105 wd 0.0500 time 0.4460 (0.4535) data time 0.0005 (0.0035) model time 0.4455 (0.4472) loss 2.1247 (2.4516) grad_norm 5.2503 (3.1992) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:31:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][230/625] eta 0:02:59 lr 0.000105 wd 0.0500 time 0.4453 (0.4532) data time 0.0006 (0.0033) model time 0.4447 (0.4472) loss 2.9473 (2.4557) grad_norm 1.8986 (3.1750) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][240/625] eta 0:02:54 lr 0.000105 wd 0.0500 time 0.4482 (0.4531) data time 0.0008 (0.0032) model time 0.4474 (0.4472) loss 2.4492 (2.4636) grad_norm 3.7708 (3.1469) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][250/625] eta 0:02:49 lr 0.000105 wd 0.0500 time 0.4499 (0.4529) data time 0.0006 (0.0031) model time 0.4493 (0.4473) loss 2.7057 (2.4692) grad_norm 3.6055 (3.1233) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][260/625] eta 0:02:45 lr 0.000105 wd 0.0500 time 0.4469 (0.4545) data time 0.0007 (0.0031) model time 0.4461 (0.4495) loss 2.7537 (2.4673) grad_norm 3.6019 (3.1053) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][270/625] eta 0:02:41 lr 0.000105 wd 0.0500 time 0.4459 (0.4543) data time 0.0006 (0.0030) model time 0.4453 (0.4494) loss 1.9360 (2.4633) grad_norm 2.4262 (3.1431) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][280/625] eta 0:02:36 lr 0.000105 wd 0.0500 time 0.4481 (0.4541) data time 0.0008 (0.0029) model time 0.4473 (0.4493) loss 
2.7972 (2.4694) grad_norm 3.6275 (3.1307) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][290/625] eta 0:02:32 lr 0.000105 wd 0.0500 time 0.4443 (0.4538) data time 0.0006 (0.0028) model time 0.4437 (0.4492) loss 2.7269 (2.4720) grad_norm 6.5161 (3.1376) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][300/625] eta 0:02:27 lr 0.000105 wd 0.0500 time 0.4482 (0.4536) data time 0.0008 (0.0028) model time 0.4475 (0.4491) loss 2.5243 (2.4693) grad_norm 2.9887 (3.1518) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][310/625] eta 0:02:22 lr 0.000105 wd 0.0500 time 0.4469 (0.4534) data time 0.0008 (0.0027) model time 0.4461 (0.4490) loss 2.1483 (2.4746) grad_norm 49.3618 (3.2837) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][320/625] eta 0:02:18 lr 0.000105 wd 0.0500 time 0.4503 (0.4533) data time 0.0006 (0.0026) model time 0.4496 (0.4490) loss 2.7049 (2.4766) grad_norm 1.6030 (3.2583) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 08:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][330/625] eta 0:02:13 lr 0.000105 wd 0.0500 time 0.4521 (0.4531) data time 0.0008 (0.0026) model time 0.4513 (0.4489) loss 2.2484 (2.4777) grad_norm 1.8498 (3.2823) loss_scale 128.0000 (65.3535) mem 16699MB [2024-08-11 08:32:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][340/625] eta 0:02:09 lr 0.000105 wd 0.0500 time 0.4479 (0.4530) data time 0.0008 (0.0025) model time 0.4470 (0.4489) loss 1.9996 (2.4771) grad_norm 2.3255 (3.2492) loss_scale 128.0000 (67.1906) mem 16699MB [2024-08-11 08:32:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][350/625] eta 0:02:04 lr 0.000105 wd 0.0500 time 0.4467 
(0.4529) data time 0.0008 (0.0025) model time 0.4458 (0.4488) loss 2.7157 (2.4787) grad_norm 2.0174 (3.2922) loss_scale 128.0000 (68.9231) mem 16699MB [2024-08-11 08:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][360/625] eta 0:01:59 lr 0.000105 wd 0.0500 time 0.4485 (0.4527) data time 0.0008 (0.0024) model time 0.4477 (0.4488) loss 2.9784 (2.4809) grad_norm 1.8429 (3.2579) loss_scale 128.0000 (70.5596) mem 16699MB [2024-08-11 08:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][370/625] eta 0:01:55 lr 0.000104 wd 0.0500 time 0.4495 (0.4525) data time 0.0008 (0.0024) model time 0.4487 (0.4487) loss 2.8676 (2.4885) grad_norm 1.8601 (3.2261) loss_scale 128.0000 (72.1078) mem 16699MB [2024-08-11 08:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][380/625] eta 0:01:50 lr 0.000104 wd 0.0500 time 0.4464 (0.4524) data time 0.0006 (0.0023) model time 0.4458 (0.4486) loss 2.4973 (2.4897) grad_norm 5.9505 (3.2117) loss_scale 128.0000 (73.5748) mem 16699MB [2024-08-11 08:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][390/625] eta 0:01:46 lr 0.000104 wd 0.0500 time 0.4473 (0.4522) data time 0.0006 (0.0023) model time 0.4467 (0.4485) loss 2.7250 (2.4872) grad_norm 1.8817 (3.1943) loss_scale 128.0000 (74.9668) mem 16699MB [2024-08-11 08:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][400/625] eta 0:01:41 lr 0.000104 wd 0.0500 time 0.4452 (0.4531) data time 0.0006 (0.0023) model time 0.4447 (0.4496) loss 2.5714 (2.4873) grad_norm 3.1534 (3.1860) loss_scale 128.0000 (76.2893) mem 16699MB [2024-08-11 08:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][410/625] eta 0:01:37 lr 0.000104 wd 0.0500 time 0.4486 (0.4534) data time 0.0006 (0.0022) model time 0.4480 (0.4500) loss 2.9578 (2.4897) grad_norm 1.9446 (3.1856) loss_scale 128.0000 (77.5474) mem 16699MB [2024-08-11 08:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[249/300][420/625] eta 0:01:32 lr 0.000104 wd 0.0500 time 0.4453 (0.4533) data time 0.0006 (0.0022) model time 0.4447 (0.4500) loss 2.1879 (2.4892) grad_norm 3.0168 (3.2155) loss_scale 128.0000 (78.7458) mem 16699MB [2024-08-11 08:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][430/625] eta 0:01:28 lr 0.000104 wd 0.0500 time 0.4452 (0.4531) data time 0.0006 (0.0022) model time 0.4447 (0.4499) loss 1.8235 (2.4876) grad_norm 1.8155 (3.2030) loss_scale 128.0000 (79.8886) mem 16699MB [2024-08-11 08:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][440/625] eta 0:01:23 lr 0.000104 wd 0.0500 time 0.4460 (0.4530) data time 0.0008 (0.0021) model time 0.4452 (0.4498) loss 1.7316 (2.4835) grad_norm 3.4162 (3.1905) loss_scale 128.0000 (80.9796) mem 16699MB [2024-08-11 08:33:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][450/625] eta 0:01:19 lr 0.000104 wd 0.0500 time 0.4449 (0.4529) data time 0.0007 (0.0021) model time 0.4443 (0.4497) loss 2.4945 (2.4876) grad_norm 5.2038 (3.1764) loss_scale 128.0000 (82.0222) mem 16699MB [2024-08-11 08:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][460/625] eta 0:01:14 lr 0.000104 wd 0.0500 time 0.4527 (0.4528) data time 0.0008 (0.0021) model time 0.4519 (0.4496) loss 2.5780 (2.4921) grad_norm 2.3892 (3.1634) loss_scale 128.0000 (83.0195) mem 16699MB [2024-08-11 08:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][470/625] eta 0:01:10 lr 0.000104 wd 0.0500 time 0.4547 (0.4527) data time 0.0007 (0.0020) model time 0.4539 (0.4496) loss 2.1801 (2.4942) grad_norm 2.0062 (3.1412) loss_scale 128.0000 (83.9745) mem 16699MB [2024-08-11 08:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][480/625] eta 0:01:05 lr 0.000104 wd 0.0500 time 0.4540 (0.4526) data time 0.0009 (0.0020) model time 0.4530 (0.4495) loss 2.5138 (2.4948) grad_norm 3.0507 (3.1346) loss_scale 128.0000 (84.8898) mem 16699MB [2024-08-11 
08:33:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][490/625] eta 0:01:01 lr 0.000104 wd 0.0500 time 0.4427 (0.4525) data time 0.0008 (0.0020) model time 0.4419 (0.4495) loss 2.7816 (2.4940) grad_norm 1.3311 (3.1172) loss_scale 128.0000 (85.7678) mem 16699MB [2024-08-11 08:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][500/625] eta 0:00:56 lr 0.000104 wd 0.0500 time 0.4450 (0.4524) data time 0.0006 (0.0020) model time 0.4445 (0.4494) loss 1.8436 (2.4936) grad_norm 2.8020 (3.3275) loss_scale 128.0000 (86.6108) mem 16699MB [2024-08-11 08:34:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][510/625] eta 0:00:52 lr 0.000104 wd 0.0500 time 0.4516 (0.4523) data time 0.0008 (0.0019) model time 0.4508 (0.4493) loss 2.7242 (2.4942) grad_norm 1.8458 (3.3104) loss_scale 128.0000 (87.4207) mem 16699MB [2024-08-11 08:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][520/625] eta 0:00:47 lr 0.000104 wd 0.0500 time 0.4455 (0.4522) data time 0.0009 (0.0019) model time 0.4447 (0.4493) loss 2.9047 (2.4991) grad_norm 1.9967 (3.2895) loss_scale 128.0000 (88.1996) mem 16699MB [2024-08-11 08:34:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][530/625] eta 0:00:42 lr 0.000104 wd 0.0500 time 0.4472 (0.4524) data time 0.0008 (0.0019) model time 0.4464 (0.4495) loss 2.8189 (2.4992) grad_norm 2.4089 (3.2960) loss_scale 128.0000 (88.9492) mem 16699MB [2024-08-11 08:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][540/625] eta 0:00:38 lr 0.000104 wd 0.0500 time 0.4513 (0.4523) data time 0.0006 (0.0019) model time 0.4507 (0.4495) loss 2.7119 (2.4986) grad_norm 3.8183 (3.2835) loss_scale 128.0000 (89.6710) mem 16699MB [2024-08-11 08:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][550/625] eta 0:00:33 lr 0.000103 wd 0.0500 time 0.4480 (0.4523) data time 0.0007 (0.0019) model time 0.4473 (0.4495) loss 2.7608 (2.5029) grad_norm 
3.4796 (3.2766) loss_scale 128.0000 (90.3666) mem 16699MB [2024-08-11 08:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][560/625] eta 0:00:29 lr 0.000103 wd 0.0500 time 0.4492 (0.4522) data time 0.0006 (0.0018) model time 0.4485 (0.4495) loss 1.8838 (2.5021) grad_norm 2.4421 (3.2620) loss_scale 128.0000 (91.0374) mem 16699MB [2024-08-11 08:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][570/625] eta 0:00:24 lr 0.000103 wd 0.0500 time 0.4456 (0.4521) data time 0.0008 (0.0018) model time 0.4447 (0.4494) loss 2.2916 (2.5002) grad_norm 1.9225 (3.2482) loss_scale 128.0000 (91.6848) mem 16699MB [2024-08-11 08:34:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][580/625] eta 0:00:20 lr 0.000103 wd 0.0500 time 0.4447 (0.4520) data time 0.0008 (0.0018) model time 0.4439 (0.4493) loss 2.9923 (2.5024) grad_norm 3.3976 (3.2391) loss_scale 128.0000 (92.3098) mem 16699MB [2024-08-11 08:34:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][590/625] eta 0:00:15 lr 0.000103 wd 0.0500 time 0.4442 (0.4519) data time 0.0008 (0.0018) model time 0.4433 (0.4493) loss 3.0350 (2.5038) grad_norm 8.8945 (3.2383) loss_scale 128.0000 (92.9137) mem 16699MB [2024-08-11 08:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][600/625] eta 0:00:11 lr 0.000103 wd 0.0500 time 0.4431 (0.4518) data time 0.0009 (0.0018) model time 0.4421 (0.4492) loss 2.8599 (2.5051) grad_norm 2.2706 (3.2225) loss_scale 128.0000 (93.4975) mem 16699MB [2024-08-11 08:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][610/625] eta 0:00:06 lr 0.000103 wd 0.0500 time 0.4424 (0.4518) data time 0.0006 (0.0018) model time 0.4418 (0.4491) loss 2.8194 (2.5040) grad_norm 1.9033 (3.2150) loss_scale 128.0000 (94.0622) mem 16699MB [2024-08-11 08:34:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [249/300][620/625] eta 0:00:02 lr 0.000103 wd 0.0500 time 0.4424 (0.4516) data time 0.0004 
(0.0017) model time 0.4420 (0.4490) loss 2.6301 (2.5069) grad_norm 5.1816 (3.2058) loss_scale 128.0000 (94.6087) mem 16699MB [2024-08-11 08:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 249 training takes 0:04:42 [2024-08-11 08:34:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:34:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 08:34:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.469 (0.469) Loss 0.5264 (0.5264) Acc@1 89.014 (89.014) Acc@5 98.584 (98.584) Mem 16699MB [2024-08-11 08:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8374 (0.6246) Acc@1 80.664 (86.763) Acc@5 96.338 (97.758) Mem 16699MB [2024-08-11 08:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9229 (0.7439) Acc@1 78.809 (83.966) Acc@5 95.508 (96.642) Mem 16699MB [2024-08-11 08:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.677 Acc@5 96.623 [2024-08-11 08:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.883 (0.883) Loss 0.4961 (0.4961) Acc@1 89.258 (89.258) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.189) Loss 0.8013 (0.6021) Acc@1 81.006 (87.251) Acc@5 96.484 (97.865) Mem 16699MB [2024-08-11 08:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.154) Loss 0.8750 (0.7118) Acc@1 79.883 (84.459) Acc@5 95.557 (96.901) Mem 16699MB [2024-08-11 08:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.163 Acc@5 96.859 [2024-08-11 08:35:03 vssm_base_ms_e300] 
(main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:35:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][0/625] eta 0:12:49 lr 0.000103 wd 0.0500 time 1.2313 (1.2313) data time 0.4448 (0.4448) model time 0.0000 (0.0000) loss 2.2039 (2.2039) grad_norm 2.9451 (2.9451) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][10/625] eta 0:05:18 lr 0.000103 wd 0.0500 time 0.4431 (0.5184) data time 0.0006 (0.0412) model time 0.0000 (0.0000) loss 2.1127 (2.4703) grad_norm 2.2088 (2.2492) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][20/625] eta 0:04:53 lr 0.000103 wd 0.0500 time 0.4477 (0.4845) data time 0.0007 (0.0220) model time 0.0000 (0.0000) loss 2.7027 (2.5288) grad_norm 2.6796 (2.3622) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][30/625] eta 0:04:41 lr 0.000103 wd 0.0500 time 0.4446 (0.4726) data time 0.0008 (0.0151) model time 0.0000 (0.0000) loss 2.8003 (2.5216) grad_norm 3.0815 (2.7217) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][40/625] eta 0:04:33 lr 0.000103 wd 0.0500 time 0.4460 (0.4667) data time 0.0008 (0.0116) model time 0.0000 (0.0000) loss 2.8367 (2.5908) grad_norm 2.3351 (2.6138) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][50/625] eta 0:04:30 lr 0.000103 wd 0.0500 time 0.4475 (0.4706) data time 0.0008 (0.0095) model time 0.0000 (0.0000) loss 2.5354 (2.5179) grad_norm 2.3715 (2.5460) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][60/625] eta 0:04:24 lr 0.000103 wd 0.0500 time 0.4475 
(0.4673) data time 0.0006 (0.0081) model time 0.4469 (0.4496) loss 2.5782 (2.5204) grad_norm 1.7900 (2.4858) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][70/625] eta 0:04:18 lr 0.000103 wd 0.0500 time 0.4504 (0.4650) data time 0.0007 (0.0071) model time 0.4498 (0.4498) loss 1.7992 (2.4846) grad_norm 3.4608 (2.4815) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][80/625] eta 0:04:13 lr 0.000103 wd 0.0500 time 0.4447 (0.4646) data time 0.0006 (0.0063) model time 0.4440 (0.4534) loss 2.9536 (2.4876) grad_norm 1.8043 (2.4752) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][90/625] eta 0:04:07 lr 0.000103 wd 0.0500 time 0.4425 (0.4627) data time 0.0007 (0.0057) model time 0.4418 (0.4518) loss 1.9260 (2.4835) grad_norm 2.4565 (2.5209) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][100/625] eta 0:04:02 lr 0.000102 wd 0.0500 time 0.4470 (0.4610) data time 0.0006 (0.0052) model time 0.4464 (0.4504) loss 2.6990 (2.5088) grad_norm 1.8112 (2.4947) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][110/625] eta 0:03:56 lr 0.000102 wd 0.0500 time 0.4441 (0.4597) data time 0.0008 (0.0048) model time 0.4433 (0.4496) loss 2.2562 (2.5116) grad_norm 2.4201 (2.6167) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:35:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][120/625] eta 0:03:51 lr 0.000102 wd 0.0500 time 0.4461 (0.4587) data time 0.0007 (0.0045) model time 0.4454 (0.4491) loss 3.1399 (2.5301) grad_norm 3.0509 (2.6267) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [250/300][130/625] eta 0:03:46 lr 0.000102 wd 0.0500 time 0.4498 (0.4579) data time 0.0008 (0.0042) model time 0.4489 (0.4489) loss 2.2544 (2.5344) grad_norm 2.3593 (2.5874) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][140/625] eta 0:03:41 lr 0.000102 wd 0.0500 time 0.4502 (0.4572) data time 0.0008 (0.0040) model time 0.4494 (0.4488) loss 2.6922 (2.5027) grad_norm 2.8049 (2.5737) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][150/625] eta 0:03:36 lr 0.000102 wd 0.0500 time 0.4435 (0.4567) data time 0.0006 (0.0038) model time 0.4428 (0.4487) loss 2.6324 (2.5088) grad_norm 3.1387 (2.5727) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][160/625] eta 0:03:32 lr 0.000102 wd 0.0500 time 0.4459 (0.4560) data time 0.0006 (0.0036) model time 0.4453 (0.4484) loss 2.6209 (2.5064) grad_norm 2.2149 (2.6058) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][170/625] eta 0:03:27 lr 0.000102 wd 0.0500 time 0.4452 (0.4557) data time 0.0006 (0.0034) model time 0.4446 (0.4486) loss 2.1653 (2.5002) grad_norm 2.1027 (2.6414) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][180/625] eta 0:03:22 lr 0.000102 wd 0.0500 time 0.4488 (0.4554) data time 0.0008 (0.0033) model time 0.4479 (0.4486) loss 2.8771 (2.4992) grad_norm 1.9888 (2.6309) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][190/625] eta 0:03:17 lr 0.000102 wd 0.0500 time 0.4510 (0.4550) data time 0.0006 (0.0031) model time 0.4504 (0.4486) loss 2.8844 (2.5111) grad_norm 1.7632 (2.6224) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:36:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][200/625] eta 0:03:13 lr 0.000102 wd 0.0500 time 0.4466 (0.4548) data time 0.0008 (0.0030) model time 0.4458 (0.4486) loss 2.8235 (2.5095) grad_norm 2.7170 (2.6611) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][210/625] eta 0:03:08 lr 0.000102 wd 0.0500 time 0.4475 (0.4546) data time 0.0008 (0.0029) model time 0.4468 (0.4487) loss 3.3015 (2.5031) grad_norm 1.9588 (2.6648) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][220/625] eta 0:03:04 lr 0.000102 wd 0.0500 time 0.6579 (0.4553) data time 0.0006 (0.0028) model time 0.6573 (0.4499) loss 2.9189 (2.4965) grad_norm 2.5205 (2.6691) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][230/625] eta 0:02:59 lr 0.000102 wd 0.0500 time 0.4461 (0.4547) data time 0.0006 (0.0027) model time 0.4454 (0.4494) loss 2.5922 (2.4936) grad_norm 1.8043 (2.6506) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][240/625] eta 0:02:54 lr 0.000102 wd 0.0500 time 0.4444 (0.4544) data time 0.0009 (0.0026) model time 0.4435 (0.4493) loss 3.0330 (2.5036) grad_norm 13.9979 (2.7147) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][250/625] eta 0:02:50 lr 0.000102 wd 0.0500 time 0.4451 (0.4542) data time 0.0006 (0.0026) model time 0.4445 (0.4491) loss 2.8028 (2.5077) grad_norm 1.7106 (2.7046) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][260/625] eta 0:02:45 lr 0.000102 wd 0.0500 time 0.4568 (0.4539) data time 0.0008 (0.0025) model time 0.4560 (0.4490) loss 2.5892 
(2.5044) grad_norm 2.0964 (2.6864) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][270/625] eta 0:02:41 lr 0.000102 wd 0.0500 time 0.4467 (0.4537) data time 0.0008 (0.0025) model time 0.4459 (0.4489) loss 2.4771 (2.5128) grad_norm 2.4703 (2.9668) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][280/625] eta 0:02:36 lr 0.000101 wd 0.0500 time 0.4459 (0.4535) data time 0.0006 (0.0024) model time 0.4452 (0.4488) loss 1.9184 (2.5082) grad_norm 2.4814 (2.9522) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][290/625] eta 0:02:31 lr 0.000101 wd 0.0500 time 0.4477 (0.4533) data time 0.0009 (0.0024) model time 0.4469 (0.4488) loss 2.6592 (2.5141) grad_norm 1.9084 (2.9866) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][300/625] eta 0:02:27 lr 0.000101 wd 0.0500 time 0.4476 (0.4532) data time 0.0009 (0.0023) model time 0.4467 (0.4488) loss 1.6954 (2.5142) grad_norm 2.0403 (3.0274) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][310/625] eta 0:02:22 lr 0.000101 wd 0.0500 time 0.4465 (0.4529) data time 0.0006 (0.0023) model time 0.4460 (0.4486) loss 2.5424 (2.5155) grad_norm 4.3624 (3.0125) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][320/625] eta 0:02:18 lr 0.000101 wd 0.0500 time 0.4446 (0.4528) data time 0.0006 (0.0022) model time 0.4439 (0.4485) loss 2.7456 (2.5213) grad_norm 2.4766 (3.0133) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][330/625] eta 0:02:13 lr 0.000101 wd 0.0500 time 0.4459 
(0.4526) data time 0.0008 (0.0022) model time 0.4451 (0.4484) loss 2.7652 (2.5215) grad_norm 2.5762 (2.9981) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][340/625] eta 0:02:08 lr 0.000101 wd 0.0500 time 0.4469 (0.4524) data time 0.0009 (0.0021) model time 0.4460 (0.4484) loss 1.7814 (2.5124) grad_norm 1.9891 (2.9845) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][350/625] eta 0:02:04 lr 0.000101 wd 0.0500 time 0.4450 (0.4522) data time 0.0006 (0.0021) model time 0.4444 (0.4483) loss 3.1789 (2.5154) grad_norm 1.9692 (2.9645) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][360/625] eta 0:01:59 lr 0.000101 wd 0.0500 time 0.4474 (0.4521) data time 0.0006 (0.0021) model time 0.4467 (0.4483) loss 2.2434 (2.5174) grad_norm 1.8597 (3.0051) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][370/625] eta 0:01:55 lr 0.000101 wd 0.0500 time 0.4511 (0.4521) data time 0.0009 (0.0020) model time 0.4502 (0.4482) loss 2.3349 (2.5227) grad_norm 3.4580 (2.9925) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:37:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][380/625] eta 0:01:50 lr 0.000101 wd 0.0500 time 0.6618 (0.4525) data time 0.0009 (0.0020) model time 0.6609 (0.4488) loss 2.6390 (2.5240) grad_norm 2.2955 (2.9848) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][390/625] eta 0:01:46 lr 0.000101 wd 0.0500 time 0.4469 (0.4528) data time 0.0006 (0.0020) model time 0.4463 (0.4493) loss 2.8620 (2.5183) grad_norm 2.4811 (2.9707) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [250/300][400/625] eta 0:01:41 lr 0.000101 wd 0.0500 time 0.4492 (0.4526) data time 0.0007 (0.0019) model time 0.4484 (0.4492) loss 2.7415 (2.5173) grad_norm 3.7186 (2.9545) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][410/625] eta 0:01:37 lr 0.000101 wd 0.0500 time 0.4521 (0.4526) data time 0.0009 (0.0019) model time 0.4511 (0.4492) loss 2.0478 (2.5151) grad_norm 2.2571 (2.9483) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][420/625] eta 0:01:32 lr 0.000101 wd 0.0500 time 0.4510 (0.4525) data time 0.0008 (0.0019) model time 0.4502 (0.4491) loss 2.5346 (2.5216) grad_norm 1.8639 (2.9383) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][430/625] eta 0:01:28 lr 0.000101 wd 0.0500 time 0.4487 (0.4524) data time 0.0008 (0.0019) model time 0.4478 (0.4491) loss 2.7622 (2.5167) grad_norm 3.2083 (2.9256) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][440/625] eta 0:01:23 lr 0.000101 wd 0.0500 time 0.4473 (0.4523) data time 0.0006 (0.0018) model time 0.4467 (0.4491) loss 2.9151 (2.5178) grad_norm 12.9713 (2.9709) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][450/625] eta 0:01:19 lr 0.000101 wd 0.0500 time 0.4476 (0.4522) data time 0.0008 (0.0018) model time 0.4468 (0.4491) loss 3.0527 (2.5199) grad_norm 2.7778 (2.9631) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][460/625] eta 0:01:14 lr 0.000100 wd 0.0500 time 0.4517 (0.4522) data time 0.0006 (0.0018) model time 0.4511 (0.4490) loss 2.3898 (2.5191) grad_norm 1.7410 (2.9644) loss_scale 128.0000 (128.0000) mem 
16699MB [2024-08-11 08:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][470/625] eta 0:01:10 lr 0.000100 wd 0.0500 time 0.4482 (0.4521) data time 0.0006 (0.0018) model time 0.4476 (0.4490) loss 3.1002 (2.5152) grad_norm 1.9546 (2.9492) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][480/625] eta 0:01:05 lr 0.000100 wd 0.0500 time 0.4456 (0.4520) data time 0.0007 (0.0018) model time 0.4449 (0.4490) loss 1.5758 (2.5166) grad_norm 3.4098 (2.9383) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][490/625] eta 0:01:01 lr 0.000100 wd 0.0500 time 0.4458 (0.4520) data time 0.0006 (0.0017) model time 0.4452 (0.4490) loss 3.1366 (2.5159) grad_norm 5.4377 (2.9693) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][500/625] eta 0:00:56 lr 0.000100 wd 0.0500 time 0.4460 (0.4519) data time 0.0009 (0.0017) model time 0.4452 (0.4489) loss 2.4845 (2.5165) grad_norm 1.6692 (2.9722) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][510/625] eta 0:00:51 lr 0.000100 wd 0.0500 time 0.4481 (0.4518) data time 0.0007 (0.0017) model time 0.4474 (0.4489) loss 2.7529 (2.5133) grad_norm 3.1878 (2.9659) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:38:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][520/625] eta 0:00:47 lr 0.000100 wd 0.0500 time 0.4482 (0.4517) data time 0.0009 (0.0017) model time 0.4474 (0.4488) loss 2.5562 (2.5157) grad_norm 1.7381 (2.9520) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][530/625] eta 0:00:42 lr 0.000100 wd 0.0500 time 0.4461 (0.4516) data time 0.0007 (0.0017) model time 0.4454 (0.4488) loss 
3.1348 (2.5144) grad_norm 2.4440 (2.9460) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][540/625] eta 0:00:38 lr 0.000100 wd 0.0500 time 0.4451 (0.4515) data time 0.0008 (0.0017) model time 0.4442 (0.4487) loss 2.9706 (2.5140) grad_norm 3.2843 (2.9422) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][550/625] eta 0:00:33 lr 0.000100 wd 0.0500 time 0.4449 (0.4514) data time 0.0006 (0.0016) model time 0.4443 (0.4486) loss 2.4051 (2.5117) grad_norm 2.5434 (2.9297) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][560/625] eta 0:00:29 lr 0.000100 wd 0.0500 time 0.4449 (0.4516) data time 0.0009 (0.0016) model time 0.4440 (0.4489) loss 2.9218 (2.5133) grad_norm 2.9763 (2.9236) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][570/625] eta 0:00:24 lr 0.000100 wd 0.0500 time 0.6108 (0.4519) data time 0.0008 (0.0016) model time 0.6099 (0.4492) loss 1.6524 (2.5127) grad_norm 21.6733 (2.9516) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][580/625] eta 0:00:20 lr 0.000100 wd 0.0500 time 0.4401 (0.4519) data time 0.0008 (0.0016) model time 0.4393 (0.4493) loss 2.6292 (2.5149) grad_norm 2.1972 (2.9414) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][590/625] eta 0:00:15 lr 0.000100 wd 0.0500 time 0.4435 (0.4519) data time 0.0007 (0.0016) model time 0.4428 (0.4492) loss 2.7799 (2.5177) grad_norm 1.5610 (2.9315) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][600/625] eta 0:00:11 lr 0.000100 wd 0.0500 time 
0.4529 (0.4518) data time 0.0007 (0.0016) model time 0.4522 (0.4492) loss 2.4282 (2.5169) grad_norm 2.0895 (2.9155) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][610/625] eta 0:00:06 lr 0.000100 wd 0.0500 time 0.4395 (0.4517) data time 0.0004 (0.0016) model time 0.4391 (0.4491) loss 2.5411 (2.5183) grad_norm 19.3415 (2.9335) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [250/300][620/625] eta 0:00:02 lr 0.000100 wd 0.0500 time 0.4402 (0.4515) data time 0.0004 (0.0015) model time 0.4398 (0.4489) loss 2.8967 (2.5208) grad_norm 2.4662 (2.9233) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:39:45 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 250 training takes 0:04:42 [2024-08-11 08:39:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:39:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 08:39:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5225 (0.5225) Acc@1 89.209 (89.209) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 08:39:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8560 (0.6342) Acc@1 80.176 (86.754) Acc@5 95.996 (97.705) Mem 16699MB [2024-08-11 08:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.133) Loss 0.9180 (0.7502) Acc@1 79.199 (83.912) Acc@5 95.312 (96.619) Mem 16699MB [2024-08-11 08:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.651 Acc@5 96.607 [2024-08-11 08:39:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:39:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.900 (0.900) Loss 0.4976 (0.4976) Acc@1 89.258 (89.258) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:39:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.190) Loss 0.8047 (0.6029) Acc@1 81.104 (87.229) Acc@5 96.436 (97.852) Mem 16699MB [2024-08-11 08:39:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.155) Loss 0.8760 (0.7129) Acc@1 80.078 (84.442) Acc@5 95.654 (96.882) Mem 16699MB [2024-08-11 08:39:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.151 Acc@5 96.843 [2024-08-11 08:39:54 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 08:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][0/625] eta 0:13:19 lr 0.000100 wd 0.0500 time 1.2799 (1.2799) data time 0.7946 (0.7946) model time 0.0000 (0.0000) loss 2.7608 (2.7608) grad_norm 2.2343 (2.2343) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][10/625] eta 0:05:22 lr 0.000099 wd 0.0500 time 0.4468 (0.5239) data 
time 0.0007 (0.0730) model time 0.0000 (0.0000) loss 2.9339 (2.6503) grad_norm 2.1464 (2.2360) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][20/625] eta 0:04:55 lr 0.000099 wd 0.0500 time 0.4483 (0.4883) data time 0.0009 (0.0386) model time 0.0000 (0.0000) loss 2.1187 (2.4924) grad_norm 1.6196 (2.2339) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][30/625] eta 0:04:43 lr 0.000099 wd 0.0500 time 0.4483 (0.4759) data time 0.0007 (0.0264) model time 0.0000 (0.0000) loss 1.5206 (2.4482) grad_norm 2.0361 (2.3265) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][40/625] eta 0:04:34 lr 0.000099 wd 0.0500 time 0.4433 (0.4685) data time 0.0008 (0.0202) model time 0.0000 (0.0000) loss 2.3333 (2.4348) grad_norm 3.1755 (4.1013) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][50/625] eta 0:04:27 lr 0.000099 wd 0.0500 time 0.4463 (0.4645) data time 0.0009 (0.0164) model time 0.0000 (0.0000) loss 2.6182 (2.3915) grad_norm 2.9764 (3.8642) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][60/625] eta 0:04:20 lr 0.000099 wd 0.0500 time 0.4483 (0.4616) data time 0.0006 (0.0138) model time 0.4477 (0.4460) loss 2.7631 (2.3961) grad_norm 2.9499 (4.0983) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][70/625] eta 0:04:15 lr 0.000099 wd 0.0500 time 0.4477 (0.4597) data time 0.0006 (0.0120) model time 0.4471 (0.4465) loss 2.6205 (2.3949) grad_norm 1.6777 (3.8836) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[251/300][80/625] eta 0:04:09 lr 0.000099 wd 0.0500 time 0.4463 (0.4583) data time 0.0008 (0.0106) model time 0.4455 (0.4468) loss 2.8804 (2.4052) grad_norm 3.2065 (3.7050) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][90/625] eta 0:04:04 lr 0.000099 wd 0.0500 time 0.4480 (0.4572) data time 0.0006 (0.0095) model time 0.4474 (0.4470) loss 1.5348 (2.4085) grad_norm 2.3597 (4.1756) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][100/625] eta 0:03:59 lr 0.000099 wd 0.0500 time 0.4486 (0.4563) data time 0.0007 (0.0087) model time 0.4479 (0.4471) loss 2.0750 (2.4220) grad_norm 2.1754 (3.9815) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][110/625] eta 0:03:55 lr 0.000099 wd 0.0500 time 0.4461 (0.4567) data time 0.0008 (0.0080) model time 0.4452 (0.4492) loss 2.5446 (2.4066) grad_norm 2.1888 (4.0298) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][120/625] eta 0:03:50 lr 0.000099 wd 0.0500 time 0.4452 (0.4560) data time 0.0006 (0.0074) model time 0.4445 (0.4490) loss 2.6223 (2.4270) grad_norm 2.1950 (3.9259) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][130/625] eta 0:03:45 lr 0.000099 wd 0.0500 time 0.4426 (0.4553) data time 0.0007 (0.0069) model time 0.4420 (0.4487) loss 1.6729 (2.4260) grad_norm 2.0858 (3.9472) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:40:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][140/625] eta 0:03:40 lr 0.000099 wd 0.0500 time 0.4509 (0.4550) data time 0.0008 (0.0064) model time 0.4502 (0.4488) loss 2.8440 (2.4318) grad_norm 2.4069 (3.9301) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:41:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][150/625] eta 0:03:37 lr 0.000099 wd 0.0500 time 0.4499 (0.4572) data time 0.0008 (0.0061) model time 0.4491 (0.4526) loss 2.2676 (2.4278) grad_norm 2.1413 (3.8383) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][160/625] eta 0:03:32 lr 0.000099 wd 0.0500 time 0.4465 (0.4567) data time 0.0008 (0.0058) model time 0.4457 (0.4522) loss 2.9026 (2.4340) grad_norm 2.0762 (3.8258) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][170/625] eta 0:03:27 lr 0.000099 wd 0.0500 time 0.4454 (0.4562) data time 0.0007 (0.0055) model time 0.4447 (0.4519) loss 1.8325 (2.4402) grad_norm 2.7149 (3.7369) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][180/625] eta 0:03:22 lr 0.000099 wd 0.0500 time 0.4563 (0.4559) data time 0.0007 (0.0052) model time 0.4556 (0.4516) loss 1.9615 (2.4367) grad_norm 1.4877 (3.6737) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][190/625] eta 0:03:18 lr 0.000098 wd 0.0500 time 0.4474 (0.4554) data time 0.0009 (0.0050) model time 0.4466 (0.4513) loss 2.4548 (2.4394) grad_norm 3.5462 (3.6761) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][200/625] eta 0:03:13 lr 0.000098 wd 0.0500 time 0.4484 (0.4550) data time 0.0006 (0.0048) model time 0.4478 (0.4510) loss 2.0113 (2.4453) grad_norm 2.4885 (3.6617) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][210/625] eta 0:03:08 lr 0.000098 wd 0.0500 time 0.4527 (0.4547) data time 0.0009 (0.0046) model time 0.4518 (0.4507) loss 2.6184 
(2.4481) grad_norm 3.0126 (3.6158) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][220/625] eta 0:03:04 lr 0.000098 wd 0.0500 time 0.4497 (0.4544) data time 0.0009 (0.0044) model time 0.4488 (0.4506) loss 2.9955 (2.4409) grad_norm 2.0601 (3.5650) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][230/625] eta 0:02:59 lr 0.000098 wd 0.0500 time 0.4469 (0.4542) data time 0.0008 (0.0043) model time 0.4460 (0.4505) loss 2.2659 (2.4503) grad_norm 1.8936 (3.5086) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][240/625] eta 0:02:54 lr 0.000098 wd 0.0500 time 0.4511 (0.4542) data time 0.0008 (0.0041) model time 0.4503 (0.4506) loss 2.9204 (2.4549) grad_norm 4.9316 (3.4906) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][250/625] eta 0:02:50 lr 0.000098 wd 0.0500 time 0.4471 (0.4540) data time 0.0007 (0.0040) model time 0.4464 (0.4505) loss 2.1264 (2.4557) grad_norm 2.3901 (3.4429) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][260/625] eta 0:02:45 lr 0.000098 wd 0.0500 time 0.4448 (0.4543) data time 0.0006 (0.0039) model time 0.4442 (0.4509) loss 1.7734 (2.4606) grad_norm 2.1828 (3.4114) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][270/625] eta 0:02:41 lr 0.000098 wd 0.0500 time 0.4467 (0.4541) data time 0.0008 (0.0038) model time 0.4459 (0.4509) loss 2.3327 (2.4631) grad_norm 2.5279 (3.4465) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][280/625] eta 0:02:36 lr 0.000098 wd 0.0500 time 0.4461 
(0.4538) data time 0.0009 (0.0036) model time 0.4452 (0.4507) loss 2.8735 (2.4702) grad_norm 3.5590 (3.5949) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][290/625] eta 0:02:31 lr 0.000098 wd 0.0500 time 0.4510 (0.4537) data time 0.0009 (0.0035) model time 0.4501 (0.4506) loss 2.5799 (2.4642) grad_norm 2.5754 (3.6496) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][300/625] eta 0:02:27 lr 0.000098 wd 0.0500 time 0.4470 (0.4536) data time 0.0007 (0.0035) model time 0.4463 (0.4506) loss 2.1508 (2.4587) grad_norm 2.7499 (3.6075) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][310/625] eta 0:02:22 lr 0.000098 wd 0.0500 time 0.4484 (0.4535) data time 0.0009 (0.0034) model time 0.4475 (0.4505) loss 2.5020 (2.4620) grad_norm 2.4557 (3.5980) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][320/625] eta 0:02:18 lr 0.000098 wd 0.0500 time 0.4459 (0.4534) data time 0.0006 (0.0033) model time 0.4453 (0.4505) loss 2.8027 (2.4669) grad_norm 3.3104 (3.5768) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][330/625] eta 0:02:13 lr 0.000098 wd 0.0500 time 0.4445 (0.4532) data time 0.0008 (0.0032) model time 0.4437 (0.4504) loss 1.9487 (2.4722) grad_norm 2.7842 (3.6138) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][340/625] eta 0:02:09 lr 0.000098 wd 0.0500 time 0.4482 (0.4531) data time 0.0006 (0.0032) model time 0.4475 (0.4502) loss 2.6922 (2.4675) grad_norm 2.2823 (3.5961) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [251/300][350/625] eta 0:02:04 lr 0.000098 wd 0.0500 time 0.4448 (0.4529) data time 0.0008 (0.0031) model time 0.4440 (0.4501) loss 2.4786 (2.4701) grad_norm 2.0618 (3.5651) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][360/625] eta 0:01:59 lr 0.000098 wd 0.0500 time 0.4487 (0.4528) data time 0.0008 (0.0030) model time 0.4478 (0.4500) loss 2.6635 (2.4637) grad_norm 1.8083 (3.5274) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][370/625] eta 0:01:55 lr 0.000097 wd 0.0500 time 0.4429 (0.4542) data time 0.0009 (0.0030) model time 0.4420 (0.4517) loss 3.0364 (2.4638) grad_norm 2.3846 (3.4956) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][380/625] eta 0:01:51 lr 0.000097 wd 0.0500 time 0.4489 (0.4540) data time 0.0008 (0.0029) model time 0.4481 (0.4516) loss 2.3519 (2.4653) grad_norm 2.1548 (3.4597) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][390/625] eta 0:01:46 lr 0.000097 wd 0.0500 time 0.4512 (0.4539) data time 0.0008 (0.0029) model time 0.4504 (0.4515) loss 2.4783 (2.4649) grad_norm 2.6874 (3.4330) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:42:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][400/625] eta 0:01:42 lr 0.000097 wd 0.0500 time 0.4451 (0.4537) data time 0.0009 (0.0028) model time 0.4443 (0.4513) loss 2.4777 (2.4601) grad_norm 3.0333 (3.4061) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][410/625] eta 0:01:37 lr 0.000097 wd 0.0500 time 0.4502 (0.4536) data time 0.0007 (0.0028) model time 0.4495 (0.4512) loss 1.8649 (2.4578) grad_norm 2.6317 (3.4400) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 08:43:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][420/625] eta 0:01:32 lr 0.000097 wd 0.0500 time 0.4488 (0.4534) data time 0.0006 (0.0027) model time 0.4482 (0.4511) loss 1.6753 (2.4555) grad_norm 1.8452 (3.4122) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][430/625] eta 0:01:28 lr 0.000097 wd 0.0500 time 0.4522 (0.4533) data time 0.0009 (0.0027) model time 0.4514 (0.4510) loss 2.7306 (2.4548) grad_norm 2.0918 (3.3923) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][440/625] eta 0:01:23 lr 0.000097 wd 0.0500 time 0.4477 (0.4532) data time 0.0006 (0.0026) model time 0.4471 (0.4509) loss 2.3026 (2.4569) grad_norm 1.6497 (3.3994) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][450/625] eta 0:01:19 lr 0.000097 wd 0.0500 time 0.4478 (0.4531) data time 0.0007 (0.0026) model time 0.4471 (0.4508) loss 1.7088 (2.4558) grad_norm 1.9003 (3.3807) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][460/625] eta 0:01:14 lr 0.000097 wd 0.0500 time 0.4488 (0.4530) data time 0.0006 (0.0025) model time 0.4481 (0.4508) loss 3.0921 (2.4586) grad_norm 2.1682 (3.3728) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][470/625] eta 0:01:10 lr 0.000097 wd 0.0500 time 0.4469 (0.4529) data time 0.0006 (0.0025) model time 0.4464 (0.4507) loss 2.4342 (2.4543) grad_norm 3.7647 (3.3559) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][480/625] eta 0:01:05 lr 0.000097 wd 0.0500 time 0.4465 (0.4527) data time 0.0007 (0.0025) model time 0.4458 (0.4505) loss 3.0659 
(2.4552) grad_norm 3.2195 (3.3390) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][490/625] eta 0:01:01 lr 0.000097 wd 0.0500 time 0.4497 (0.4526) data time 0.0009 (0.0024) model time 0.4488 (0.4504) loss 3.0141 (2.4541) grad_norm 2.9059 (3.3207) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][500/625] eta 0:00:56 lr 0.000097 wd 0.0500 time 0.4471 (0.4525) data time 0.0008 (0.0024) model time 0.4463 (0.4503) loss 2.0946 (2.4525) grad_norm 2.3271 (3.3100) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][510/625] eta 0:00:52 lr 0.000097 wd 0.0500 time 0.6076 (0.4535) data time 0.0006 (0.0024) model time 0.6070 (0.4515) loss 1.7761 (2.4539) grad_norm 3.0878 (3.3065) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][520/625] eta 0:00:47 lr 0.000097 wd 0.0500 time 0.4466 (0.4538) data time 0.0007 (0.0023) model time 0.4459 (0.4518) loss 2.9589 (2.4564) grad_norm 2.1944 (3.3272) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][530/625] eta 0:00:43 lr 0.000097 wd 0.0500 time 0.4486 (0.4537) data time 0.0007 (0.0023) model time 0.4478 (0.4517) loss 1.8810 (2.4585) grad_norm 2.4303 (3.3066) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:43:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][540/625] eta 0:00:38 lr 0.000097 wd 0.0500 time 0.4486 (0.4536) data time 0.0006 (0.0023) model time 0.4480 (0.4516) loss 3.3030 (2.4600) grad_norm 2.6004 (3.2934) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][550/625] eta 0:00:34 lr 0.000096 wd 0.0500 time 0.4494 
(0.4535) data time 0.0008 (0.0023) model time 0.4486 (0.4515) loss 2.5218 (2.4624) grad_norm 1.9526 (3.2785) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][560/625] eta 0:00:29 lr 0.000096 wd 0.0500 time 0.4458 (0.4533) data time 0.0009 (0.0022) model time 0.4448 (0.4514) loss 2.4784 (2.4638) grad_norm 2.1973 (3.2836) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][570/625] eta 0:00:24 lr 0.000096 wd 0.0500 time 0.4477 (0.4533) data time 0.0006 (0.0022) model time 0.4471 (0.4513) loss 1.9509 (2.4632) grad_norm 1.9337 (3.3078) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][580/625] eta 0:00:20 lr 0.000096 wd 0.0500 time 0.4521 (0.4532) data time 0.0007 (0.0022) model time 0.4515 (0.4513) loss 1.8446 (2.4605) grad_norm 2.7104 (3.3029) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][590/625] eta 0:00:15 lr 0.000096 wd 0.0500 time 0.4505 (0.4534) data time 0.0008 (0.0022) model time 0.4497 (0.4515) loss 2.8647 (2.4642) grad_norm 1.9872 (3.2920) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][600/625] eta 0:00:11 lr 0.000096 wd 0.0500 time 0.4462 (0.4533) data time 0.0008 (0.0022) model time 0.4454 (0.4514) loss 2.5514 (2.4652) grad_norm 2.1133 (3.2947) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [251/300][610/625] eta 0:00:06 lr 0.000096 wd 0.0500 time 0.4602 (0.4533) data time 0.0006 (0.0021) model time 0.4595 (0.4514) loss 2.3108 (2.4643) grad_norm 2.1883 (3.2823) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [251/300][620/625] eta 0:00:02 lr 0.000096 wd 0.0500 time 0.4427 (0.4531) data time 0.0004 (0.0021) model time 0.4423 (0.4512) loss 2.9240 (2.4685) grad_norm 2.7053 (3.2746) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 251 training takes 0:04:43 [2024-08-11 08:44:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:44:39 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 08:44:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5151 (0.5151) Acc@1 89.062 (89.062) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 08:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8398 (0.6264) Acc@1 80.664 (86.830) Acc@5 96.191 (97.785) Mem 16699MB [2024-08-11 08:44:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9243 (0.7473) Acc@1 79.102 (83.970) Acc@5 95.361 (96.603) Mem 16699MB [2024-08-11 08:44:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.663 Acc@5 96.567 [2024-08-11 08:44:42 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:44:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.821 (0.821) Loss 0.4983 (0.4983) Acc@1 89.307 (89.307) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 08:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8047 (0.6037) Acc@1 81.006 (87.198) Acc@5 96.387 (97.869) Mem 16699MB [2024-08-11 08:44:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.151) Loss 0.8760 (0.7138) Acc@1 79.834 (84.424) Acc@5 95.654 (96.877) Mem 16699MB [2024-08-11 08:44:46 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.123 Acc@5 96.837 [2024-08-11 08:44:46 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 08:44:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][0/625] eta 0:12:22 lr 0.000096 wd 0.0500 time 1.1882 (1.1882) data time 0.7345 (0.7345) model time 0.0000 (0.0000) loss 2.8940 (2.8940) grad_norm 2.9962 (2.9962) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][10/625] eta 0:05:15 lr 0.000096 wd 0.0500 time 0.4481 (0.5138) data time 0.0006 (0.0676) model time 0.0000 (0.0000) loss 1.7521 (2.5684) grad_norm 2.8722 (4.0510) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:44:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][20/625] eta 0:04:51 lr 0.000096 wd 0.0500 time 0.4467 (0.4821) data time 0.0008 (0.0358) model time 0.0000 (0.0000) loss 2.1488 (2.5123) grad_norm 1.9877 (3.1449) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][30/625] eta 0:04:40 lr 0.000096 wd 0.0500 time 0.4488 (0.4710) data time 0.0008 (0.0245) model time 0.0000 (0.0000) loss 2.5320 (2.4354) grad_norm 2.3828 (2.9522) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][40/625] eta 0:04:32 lr 0.000096 wd 0.0500 time 0.4506 (0.4656) data time 0.0006 (0.0187) model time 0.0000 (0.0000) loss 1.9392 (2.4522) grad_norm 3.4200 (2.9542) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][50/625] eta 0:04:25 lr 0.000096 wd 0.0500 time 0.4457 (0.4621) data time 0.0006 (0.0152) model time 0.0000 (0.0000) loss 2.5414 (2.4154) grad_norm 3.7327 (3.0683) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:14 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [252/300][60/625] eta 0:04:19 lr 0.000096 wd 0.0500 time 0.4488 (0.4598) data time 0.0008 (0.0128) model time 0.4480 (0.4470) loss 2.6185 (2.4260) grad_norm 2.3619 (2.9937) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][70/625] eta 0:04:14 lr 0.000096 wd 0.0500 time 0.4492 (0.4581) data time 0.0007 (0.0111) model time 0.4486 (0.4470) loss 3.0545 (2.4356) grad_norm 2.0924 (2.9233) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][80/625] eta 0:04:08 lr 0.000096 wd 0.0500 time 0.4459 (0.4569) data time 0.0008 (0.0099) model time 0.4450 (0.4471) loss 2.7715 (2.4609) grad_norm 2.8365 (2.8770) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][90/625] eta 0:04:03 lr 0.000096 wd 0.0500 time 0.4480 (0.4559) data time 0.0006 (0.0089) model time 0.4474 (0.4470) loss 1.8795 (2.4420) grad_norm 2.1388 (2.8289) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][100/625] eta 0:04:00 lr 0.000096 wd 0.0500 time 0.4498 (0.4589) data time 0.0008 (0.0081) model time 0.4490 (0.4547) loss 2.6247 (2.4504) grad_norm 2.1438 (2.8086) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][110/625] eta 0:03:55 lr 0.000095 wd 0.0500 time 0.4455 (0.4581) data time 0.0008 (0.0074) model time 0.4447 (0.4539) loss 2.5990 (2.4570) grad_norm 2.2660 (2.7648) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][120/625] eta 0:03:51 lr 0.000095 wd 0.0500 time 0.4506 (0.4575) data time 0.0008 (0.0069) model time 0.4498 (0.4533) loss 2.7359 (2.4603) grad_norm 1.7865 (2.7180) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][130/625] eta 0:03:46 lr 0.000095 wd 0.0500 time 0.4489 (0.4569) data time 0.0009 (0.0064) model time 0.4480 (0.4527) loss 2.3810 (2.4662) grad_norm 2.6059 (2.6971) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][140/625] eta 0:03:41 lr 0.000095 wd 0.0500 time 0.4535 (0.4562) data time 0.0009 (0.0060) model time 0.4526 (0.4521) loss 2.8297 (2.4749) grad_norm 2.2526 (2.8564) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][150/625] eta 0:03:36 lr 0.000095 wd 0.0500 time 0.4463 (0.4558) data time 0.0006 (0.0057) model time 0.4457 (0.4518) loss 2.2311 (2.4734) grad_norm 2.4000 (2.8231) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:45:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][160/625] eta 0:03:32 lr 0.000095 wd 0.0500 time 0.4504 (0.4561) data time 0.0009 (0.0054) model time 0.4495 (0.4525) loss 2.8170 (2.4721) grad_norm 2.3538 (2.8881) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][170/625] eta 0:03:27 lr 0.000095 wd 0.0500 time 0.4556 (0.4560) data time 0.0008 (0.0051) model time 0.4547 (0.4526) loss 2.3777 (2.4688) grad_norm 2.4656 (2.8636) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][180/625] eta 0:03:22 lr 0.000095 wd 0.0500 time 0.4451 (0.4557) data time 0.0008 (0.0049) model time 0.4443 (0.4524) loss 2.6888 (2.4764) grad_norm 7.7046 (2.9157) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][190/625] eta 0:03:18 lr 0.000095 wd 0.0500 time 0.4519 (0.4553) data time 0.0006 (0.0047) model time 
0.4513 (0.4521) loss 2.5188 (2.4820) grad_norm 4.2039 (2.9179) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][200/625] eta 0:03:13 lr 0.000095 wd 0.0500 time 0.4455 (0.4549) data time 0.0007 (0.0045) model time 0.4448 (0.4516) loss 2.8654 (2.4925) grad_norm 2.0866 (2.9045) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][210/625] eta 0:03:08 lr 0.000095 wd 0.0500 time 0.4467 (0.4545) data time 0.0008 (0.0043) model time 0.4459 (0.4513) loss 2.9141 (2.4956) grad_norm 2.3307 (2.8806) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][220/625] eta 0:03:04 lr 0.000095 wd 0.0500 time 0.4463 (0.4543) data time 0.0007 (0.0041) model time 0.4456 (0.4512) loss 2.5925 (2.4851) grad_norm 1.5401 (2.8448) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][230/625] eta 0:02:59 lr 0.000095 wd 0.0500 time 0.4471 (0.4541) data time 0.0008 (0.0040) model time 0.4463 (0.4510) loss 2.7331 (2.4839) grad_norm 2.1405 (2.8912) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][240/625] eta 0:02:54 lr 0.000095 wd 0.0500 time 0.4527 (0.4540) data time 0.0007 (0.0039) model time 0.4520 (0.4510) loss 2.8057 (2.4933) grad_norm 1.8526 (2.8602) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][250/625] eta 0:02:50 lr 0.000095 wd 0.0500 time 0.4508 (0.4539) data time 0.0008 (0.0037) model time 0.4501 (0.4509) loss 2.4896 (2.4944) grad_norm 2.5490 (2.8480) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][260/625] eta 0:02:45 lr 
0.000095 wd 0.0500 time 0.4441 (0.4537) data time 0.0007 (0.0036) model time 0.4434 (0.4508) loss 2.3067 (2.4996) grad_norm 3.1997 (2.8613) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][270/625] eta 0:02:40 lr 0.000095 wd 0.0500 time 0.4482 (0.4535) data time 0.0008 (0.0035) model time 0.4473 (0.4507) loss 2.7213 (2.5086) grad_norm 1.9193 (2.8465) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][280/625] eta 0:02:36 lr 0.000095 wd 0.0500 time 0.4491 (0.4533) data time 0.0009 (0.0034) model time 0.4482 (0.4505) loss 2.7242 (2.5071) grad_norm 1.4936 (2.8422) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:46:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][290/625] eta 0:02:31 lr 0.000095 wd 0.0500 time 0.4482 (0.4531) data time 0.0008 (0.0033) model time 0.4473 (0.4504) loss 2.5983 (2.4978) grad_norm 3.5477 (2.8330) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][300/625] eta 0:02:27 lr 0.000094 wd 0.0500 time 0.4518 (0.4535) data time 0.0006 (0.0033) model time 0.4512 (0.4509) loss 3.0448 (2.5025) grad_norm 3.0805 (2.8411) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][310/625] eta 0:02:22 lr 0.000094 wd 0.0500 time 0.4485 (0.4533) data time 0.0008 (0.0032) model time 0.4477 (0.4507) loss 1.6252 (2.4943) grad_norm 1.9396 (2.9485) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][320/625] eta 0:02:18 lr 0.000094 wd 0.0500 time 0.4510 (0.4532) data time 0.0009 (0.0031) model time 0.4502 (0.4507) loss 2.4636 (2.4980) grad_norm 2.1636 (2.9498) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:15 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [252/300][330/625] eta 0:02:13 lr 0.000094 wd 0.0500 time 0.4527 (0.4531) data time 0.0008 (0.0030) model time 0.4519 (0.4506) loss 2.7855 (2.5029) grad_norm 1.7618 (2.9960) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][340/625] eta 0:02:09 lr 0.000094 wd 0.0500 time 0.4507 (0.4529) data time 0.0006 (0.0030) model time 0.4501 (0.4505) loss 2.2453 (2.5080) grad_norm 7.4908 (2.9962) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][350/625] eta 0:02:04 lr 0.000094 wd 0.0500 time 0.4484 (0.4528) data time 0.0008 (0.0029) model time 0.4477 (0.4503) loss 2.7514 (2.5118) grad_norm 3.3151 (3.0700) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][360/625] eta 0:01:59 lr 0.000094 wd 0.0500 time 0.4464 (0.4526) data time 0.0006 (0.0029) model time 0.4458 (0.4502) loss 2.4355 (2.5108) grad_norm 3.3487 (3.0576) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][370/625] eta 0:01:55 lr 0.000094 wd 0.0500 time 0.4512 (0.4525) data time 0.0009 (0.0028) model time 0.4503 (0.4501) loss 2.5336 (2.5121) grad_norm 3.0556 (3.0505) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][380/625] eta 0:01:50 lr 0.000094 wd 0.0500 time 0.4479 (0.4523) data time 0.0006 (0.0028) model time 0.4472 (0.4500) loss 2.6617 (2.5145) grad_norm 2.0201 (3.0454) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][390/625] eta 0:01:46 lr 0.000094 wd 0.0500 time 0.4474 (0.4523) data time 0.0009 (0.0027) model time 0.4465 (0.4500) loss 1.7183 (2.5108) grad_norm 1.6614 (3.0369) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][400/625] eta 0:01:41 lr 0.000094 wd 0.0500 time 0.4472 (0.4522) data time 0.0007 (0.0027) model time 0.4466 (0.4499) loss 1.5797 (2.5058) grad_norm 2.1482 (3.0262) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][410/625] eta 0:01:37 lr 0.000094 wd 0.0500 time 0.4510 (0.4521) data time 0.0009 (0.0026) model time 0.4501 (0.4498) loss 2.7570 (2.5061) grad_norm 1.6357 (3.0079) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][420/625] eta 0:01:32 lr 0.000094 wd 0.0500 time 0.4408 (0.4520) data time 0.0008 (0.0026) model time 0.4399 (0.4497) loss 2.7756 (2.5052) grad_norm 2.7404 (3.0195) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:48:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][430/625] eta 0:01:28 lr 0.000094 wd 0.0500 time 0.4460 (0.4524) data time 0.0009 (0.0025) model time 0.4451 (0.4502) loss 2.7746 (2.5005) grad_norm 3.2705 (3.0265) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:48:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][440/625] eta 0:01:23 lr 0.000094 wd 0.0500 time 0.4457 (0.4527) data time 0.0007 (0.0025) model time 0.4450 (0.4506) loss 2.7955 (2.5034) grad_norm 1.9220 (3.0134) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:48:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][450/625] eta 0:01:19 lr 0.000094 wd 0.0500 time 0.4451 (0.4525) data time 0.0006 (0.0025) model time 0.4444 (0.4504) loss 2.4584 (2.5076) grad_norm 4.7359 (3.0139) loss_scale 256.0000 (128.5676) mem 16699MB [2024-08-11 08:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][460/625] eta 0:01:14 lr 0.000094 wd 0.0500 time 0.4516 (0.4524) data time 0.0006 (0.0024) model time 
0.4509 (0.4504) loss 1.6786 (2.5043) grad_norm 2.0982 (3.0249) loss_scale 256.0000 (131.3319) mem 16699MB [2024-08-11 08:48:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][470/625] eta 0:01:10 lr 0.000094 wd 0.0500 time 0.4497 (0.4523) data time 0.0008 (0.0024) model time 0.4488 (0.4503) loss 2.7262 (2.5066) grad_norm 2.6259 (3.0129) loss_scale 256.0000 (133.9788) mem 16699MB [2024-08-11 08:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][480/625] eta 0:01:05 lr 0.000093 wd 0.0500 time 0.4457 (0.4522) data time 0.0008 (0.0024) model time 0.4448 (0.4502) loss 2.5225 (2.5015) grad_norm 2.2771 (3.0115) loss_scale 256.0000 (136.5156) mem 16699MB [2024-08-11 08:48:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][490/625] eta 0:01:01 lr 0.000093 wd 0.0500 time 0.4431 (0.4521) data time 0.0006 (0.0023) model time 0.4425 (0.4501) loss 2.6973 (2.5009) grad_norm 1.9167 (3.0031) loss_scale 256.0000 (138.9491) mem 16699MB [2024-08-11 08:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][500/625] eta 0:00:56 lr 0.000093 wd 0.0500 time 0.4486 (0.4520) data time 0.0009 (0.0023) model time 0.4478 (0.4501) loss 2.7902 (2.5008) grad_norm 2.6542 (3.0358) loss_scale 256.0000 (141.2854) mem 16699MB [2024-08-11 08:48:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][510/625] eta 0:00:51 lr 0.000093 wd 0.0500 time 0.4494 (0.4520) data time 0.0007 (0.0023) model time 0.4487 (0.4500) loss 1.5716 (2.4987) grad_norm 1.9890 (3.0168) loss_scale 256.0000 (143.5303) mem 16699MB [2024-08-11 08:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][520/625] eta 0:00:47 lr 0.000093 wd 0.0500 time 0.4461 (0.4518) data time 0.0007 (0.0022) model time 0.4454 (0.4499) loss 2.9268 (2.4962) grad_norm 2.4313 (3.0155) loss_scale 256.0000 (145.6891) mem 16699MB [2024-08-11 08:48:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][530/625] eta 0:00:42 lr 
0.000093 wd 0.0500 time 0.4518 (0.4518) data time 0.0006 (0.0022) model time 0.4512 (0.4498) loss 2.7956 (2.4925) grad_norm 2.4540 (3.0204) loss_scale 256.0000 (147.7665) mem 16699MB [2024-08-11 08:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][540/625] eta 0:00:38 lr 0.000093 wd 0.0500 time 0.4481 (0.4517) data time 0.0006 (0.0022) model time 0.4475 (0.4498) loss 2.3002 (2.4919) grad_norm 2.4115 (3.0492) loss_scale 256.0000 (149.7671) mem 16699MB [2024-08-11 08:48:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][550/625] eta 0:00:33 lr 0.000093 wd 0.0500 time 0.4449 (0.4517) data time 0.0009 (0.0022) model time 0.4440 (0.4497) loss 1.8282 (2.4894) grad_norm 2.3226 (3.0416) loss_scale 256.0000 (151.6951) mem 16699MB [2024-08-11 08:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][560/625] eta 0:00:29 lr 0.000093 wd 0.0500 time 0.4461 (0.4516) data time 0.0008 (0.0021) model time 0.4453 (0.4497) loss 2.4812 (2.4883) grad_norm 2.1649 (3.0318) loss_scale 256.0000 (153.5544) mem 16699MB [2024-08-11 08:49:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][570/625] eta 0:00:24 lr 0.000093 wd 0.0500 time 0.4468 (0.4515) data time 0.0008 (0.0021) model time 0.4460 (0.4496) loss 2.0934 (2.4903) grad_norm 2.1725 (3.0209) loss_scale 256.0000 (155.3485) mem 16699MB [2024-08-11 08:49:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][580/625] eta 0:00:20 lr 0.000093 wd 0.0500 time 0.4435 (0.4514) data time 0.0006 (0.0021) model time 0.4429 (0.4496) loss 2.9658 (2.4949) grad_norm 3.1979 (3.0128) loss_scale 256.0000 (157.0809) mem 16699MB [2024-08-11 08:49:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][590/625] eta 0:00:15 lr 0.000093 wd 0.0500 time 0.4452 (0.4514) data time 0.0006 (0.0021) model time 0.4445 (0.4496) loss 2.8195 (2.4946) grad_norm 2.3690 (3.0952) loss_scale 256.0000 (158.7547) mem 16699MB [2024-08-11 08:49:17 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [252/300][600/625] eta 0:00:11 lr 0.000093 wd 0.0500 time 0.4487 (0.4514) data time 0.0007 (0.0020) model time 0.4481 (0.4495) loss 3.0264 (2.4972) grad_norm 2.0648 (3.0807) loss_scale 256.0000 (160.3727) mem 16699MB [2024-08-11 08:49:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][610/625] eta 0:00:06 lr 0.000093 wd 0.0500 time 0.4442 (0.4513) data time 0.0004 (0.0020) model time 0.4439 (0.4495) loss 2.6441 (2.5000) grad_norm 3.2095 (3.0693) loss_scale 256.0000 (161.9378) mem 16699MB [2024-08-11 08:49:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [252/300][620/625] eta 0:00:02 lr 0.000093 wd 0.0500 time 0.4488 (0.4514) data time 0.0006 (0.0020) model time 0.4482 (0.4496) loss 2.6064 (2.4991) grad_norm 2.3749 (3.0628) loss_scale 256.0000 (163.4525) mem 16699MB [2024-08-11 08:49:28 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 252 training takes 0:04:42 [2024-08-11 08:49:28 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:49:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 08:49:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.462 (0.462) Loss 0.5322 (0.5322) Acc@1 89.209 (89.209) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-11 08:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8452 (0.6267) Acc@1 81.006 (86.825) Acc@5 96.143 (97.692) Mem 16699MB [2024-08-11 08:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9121 (0.7459) Acc@1 78.906 (83.947) Acc@5 95.312 (96.649) Mem 16699MB [2024-08-11 08:49:33 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.689 Acc@5 96.619 [2024-08-11 08:49:33 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:49:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.853 (0.853) Loss 0.5000 (0.5000) Acc@1 89.258 (89.258) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 08:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.186) Loss 0.8047 (0.6045) Acc@1 81.104 (87.220) Acc@5 96.436 (97.860) Mem 16699MB [2024-08-11 08:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.152) Loss 0.8770 (0.7150) Acc@1 79.883 (84.428) Acc@5 95.654 (96.877) Mem 16699MB [2024-08-11 08:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.149 Acc@5 96.837 [2024-08-11 08:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 08:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][0/625] eta 0:13:06 lr 0.000093 wd 0.0500 time 1.2584 (1.2584) data time 0.6086 (0.6086) model time 0.0000 (0.0000) loss 3.1687 (3.1687) grad_norm 1.6906 (1.6906) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][10/625] eta 0:05:20 lr 0.000093 wd 0.0500 time 0.4462 (0.5207) data 
time 0.0008 (0.0560) model time 0.0000 (0.0000) loss 2.0213 (2.6383) grad_norm 2.7444 (2.9811) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][20/625] eta 0:04:54 lr 0.000093 wd 0.0500 time 0.4499 (0.4863) data time 0.0006 (0.0298) model time 0.0000 (0.0000) loss 2.5481 (2.5977) grad_norm 4.2101 (2.7149) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][30/625] eta 0:04:41 lr 0.000093 wd 0.0500 time 0.4502 (0.4735) data time 0.0006 (0.0204) model time 0.0000 (0.0000) loss 2.6275 (2.6442) grad_norm 2.6551 (2.7332) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][40/625] eta 0:04:33 lr 0.000092 wd 0.0500 time 0.4460 (0.4673) data time 0.0010 (0.0156) model time 0.0000 (0.0000) loss 2.4617 (2.5866) grad_norm 1.9808 (2.6577) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][50/625] eta 0:04:26 lr 0.000092 wd 0.0500 time 0.4476 (0.4636) data time 0.0006 (0.0127) model time 0.0000 (0.0000) loss 1.3888 (2.5037) grad_norm 3.1931 (2.7664) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][60/625] eta 0:04:20 lr 0.000092 wd 0.0500 time 0.4512 (0.4611) data time 0.0008 (0.0108) model time 0.4504 (0.4479) loss 2.3792 (2.4660) grad_norm 2.1153 (2.9448) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][70/625] eta 0:04:14 lr 0.000092 wd 0.0500 time 0.4466 (0.4593) data time 0.0009 (0.0094) model time 0.4457 (0.4475) loss 2.7634 (2.4766) grad_norm 2.2186 (2.8673) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[253/300][80/625] eta 0:04:09 lr 0.000092 wd 0.0500 time 0.4459 (0.4578) data time 0.0006 (0.0083) model time 0.4453 (0.4472) loss 2.4169 (2.4805) grad_norm 2.0631 (2.7983) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][90/625] eta 0:04:04 lr 0.000092 wd 0.0500 time 0.4510 (0.4568) data time 0.0007 (0.0075) model time 0.4503 (0.4472) loss 1.6133 (2.4707) grad_norm 2.7464 (2.7714) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][100/625] eta 0:03:59 lr 0.000092 wd 0.0500 time 0.4490 (0.4559) data time 0.0008 (0.0068) model time 0.4482 (0.4472) loss 3.0801 (2.4728) grad_norm 4.0343 (2.9110) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][110/625] eta 0:03:54 lr 0.000092 wd 0.0500 time 0.4467 (0.4553) data time 0.0006 (0.0063) model time 0.4461 (0.4474) loss 3.0709 (2.4738) grad_norm 2.6270 (2.8457) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 08:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][120/625] eta 0:03:49 lr 0.000092 wd 0.0500 time 0.4496 (0.4547) data time 0.0008 (0.0059) model time 0.4488 (0.4474) loss 2.2084 (2.4819) grad_norm 2.8239 (inf) loss_scale 128.0000 (253.8843) mem 16699MB [2024-08-11 08:50:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][130/625] eta 0:03:45 lr 0.000092 wd 0.0500 time 0.4521 (0.4554) data time 0.0007 (0.0055) model time 0.4515 (0.4494) loss 2.7328 (2.4969) grad_norm 2.0650 (inf) loss_scale 128.0000 (244.2748) mem 16699MB [2024-08-11 08:50:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][140/625] eta 0:03:41 lr 0.000092 wd 0.0500 time 0.4495 (0.4565) data time 0.0007 (0.0051) model time 0.4488 (0.4517) loss 1.5085 (2.4868) grad_norm 2.0779 (inf) loss_scale 128.0000 (236.0284) mem 16699MB [2024-08-11 
08:50:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][150/625] eta 0:03:37 lr 0.000092 wd 0.0500 time 0.4468 (0.4569) data time 0.0008 (0.0049) model time 0.4460 (0.4527) loss 2.5765 (2.4861) grad_norm 2.0487 (inf) loss_scale 128.0000 (228.8742) mem 16699MB [2024-08-11 08:50:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][160/625] eta 0:03:32 lr 0.000092 wd 0.0500 time 0.4446 (0.4563) data time 0.0006 (0.0046) model time 0.4440 (0.4522) loss 2.3722 (2.4972) grad_norm 3.0823 (inf) loss_scale 128.0000 (222.6087) mem 16699MB [2024-08-11 08:50:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][170/625] eta 0:03:27 lr 0.000092 wd 0.0500 time 0.4473 (0.4558) data time 0.0009 (0.0044) model time 0.4464 (0.4517) loss 2.7986 (2.4962) grad_norm 3.2188 (inf) loss_scale 128.0000 (217.0760) mem 16699MB [2024-08-11 08:50:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][180/625] eta 0:03:22 lr 0.000092 wd 0.0500 time 0.4480 (0.4554) data time 0.0007 (0.0042) model time 0.4473 (0.4513) loss 2.4855 (2.4997) grad_norm 3.1008 (inf) loss_scale 128.0000 (212.1547) mem 16699MB [2024-08-11 08:51:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][190/625] eta 0:03:17 lr 0.000092 wd 0.0500 time 0.4491 (0.4551) data time 0.0006 (0.0040) model time 0.4485 (0.4512) loss 2.3099 (2.5009) grad_norm 2.6263 (inf) loss_scale 128.0000 (207.7487) mem 16699MB [2024-08-11 08:51:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][200/625] eta 0:03:13 lr 0.000092 wd 0.0500 time 0.4473 (0.4548) data time 0.0008 (0.0039) model time 0.4465 (0.4509) loss 2.8412 (2.5107) grad_norm 2.3050 (inf) loss_scale 128.0000 (203.7811) mem 16699MB [2024-08-11 08:51:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][210/625] eta 0:03:08 lr 0.000092 wd 0.0500 time 0.4449 (0.4545) data time 0.0009 (0.0037) model time 0.4440 (0.4507) loss 2.6048 (2.5051) grad_norm 2.2908 (inf) 
loss_scale 128.0000 (200.1896) mem 16699MB [2024-08-11 08:51:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][220/625] eta 0:03:03 lr 0.000092 wd 0.0500 time 0.4469 (0.4543) data time 0.0006 (0.0036) model time 0.4463 (0.4507) loss 2.8524 (2.4988) grad_norm 2.5253 (inf) loss_scale 128.0000 (196.9231) mem 16699MB [2024-08-11 08:51:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][230/625] eta 0:02:59 lr 0.000091 wd 0.0500 time 0.4464 (0.4540) data time 0.0008 (0.0035) model time 0.4455 (0.4505) loss 2.7439 (2.5005) grad_norm 2.1908 (inf) loss_scale 128.0000 (193.9394) mem 16699MB [2024-08-11 08:51:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][240/625] eta 0:02:54 lr 0.000091 wd 0.0500 time 0.4450 (0.4537) data time 0.0006 (0.0034) model time 0.4445 (0.4501) loss 2.0467 (2.5013) grad_norm 3.2196 (inf) loss_scale 128.0000 (191.2033) mem 16699MB [2024-08-11 08:51:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][250/625] eta 0:02:50 lr 0.000091 wd 0.0500 time 0.4473 (0.4534) data time 0.0008 (0.0033) model time 0.4465 (0.4499) loss 2.3173 (2.4914) grad_norm 1.7933 (inf) loss_scale 128.0000 (188.6853) mem 16699MB [2024-08-11 08:51:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][260/625] eta 0:02:45 lr 0.000091 wd 0.0500 time 0.4473 (0.4532) data time 0.0008 (0.0032) model time 0.4465 (0.4498) loss 2.7588 (2.4919) grad_norm 2.3995 (inf) loss_scale 128.0000 (186.3602) mem 16699MB [2024-08-11 08:51:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][270/625] eta 0:02:40 lr 0.000091 wd 0.0500 time 0.4452 (0.4529) data time 0.0008 (0.0031) model time 0.4444 (0.4496) loss 2.0291 (2.4931) grad_norm 2.0102 (inf) loss_scale 128.0000 (184.2066) mem 16699MB [2024-08-11 08:51:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][280/625] eta 0:02:36 lr 0.000091 wd 0.0500 time 0.4500 (0.4527) data time 0.0007 (0.0030) model time 0.4493 
(0.4494) loss 2.4498 (2.4973) grad_norm 3.5614 (inf) loss_scale 128.0000 (182.2064) mem 16699MB [2024-08-11 08:51:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][290/625] eta 0:02:31 lr 0.000091 wd 0.0500 time 0.4478 (0.4525) data time 0.0006 (0.0029) model time 0.4472 (0.4493) loss 2.5884 (2.5018) grad_norm 2.3296 (inf) loss_scale 128.0000 (180.3436) mem 16699MB [2024-08-11 08:51:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][300/625] eta 0:02:26 lr 0.000091 wd 0.0500 time 0.4468 (0.4522) data time 0.0008 (0.0029) model time 0.4460 (0.4491) loss 2.4463 (2.4996) grad_norm 2.3091 (inf) loss_scale 128.0000 (178.6047) mem 16699MB [2024-08-11 08:51:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][310/625] eta 0:02:22 lr 0.000091 wd 0.0500 time 0.4538 (0.4521) data time 0.0008 (0.0028) model time 0.4531 (0.4490) loss 2.6879 (2.5052) grad_norm 2.4415 (inf) loss_scale 128.0000 (176.9775) mem 16699MB [2024-08-11 08:52:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][320/625] eta 0:02:17 lr 0.000091 wd 0.0500 time 0.4418 (0.4520) data time 0.0008 (0.0027) model time 0.4410 (0.4489) loss 1.5093 (2.4956) grad_norm 2.8093 (inf) loss_scale 128.0000 (175.4517) mem 16699MB [2024-08-11 08:52:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][330/625] eta 0:02:13 lr 0.000091 wd 0.0500 time 0.4587 (0.4518) data time 0.0008 (0.0027) model time 0.4579 (0.4488) loss 2.8224 (2.4948) grad_norm 3.1095 (inf) loss_scale 128.0000 (174.0181) mem 16699MB [2024-08-11 08:52:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][340/625] eta 0:02:08 lr 0.000091 wd 0.0500 time 0.4468 (0.4516) data time 0.0008 (0.0026) model time 0.4460 (0.4487) loss 1.5519 (2.4983) grad_norm 2.3961 (inf) loss_scale 128.0000 (172.6686) mem 16699MB [2024-08-11 08:52:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][350/625] eta 0:02:04 lr 0.000091 wd 0.0500 time 0.4459 
(0.4515) data time 0.0006 (0.0026) model time 0.4453 (0.4486) loss 2.8050 (2.4953) grad_norm 2.6400 (inf) loss_scale 128.0000 (171.3960) mem 16699MB [2024-08-11 08:52:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][360/625] eta 0:01:59 lr 0.000091 wd 0.0500 time 0.4447 (0.4514) data time 0.0010 (0.0025) model time 0.4437 (0.4485) loss 2.5194 (2.4998) grad_norm 2.1312 (inf) loss_scale 128.0000 (170.1939) mem 16699MB [2024-08-11 08:52:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][370/625] eta 0:01:55 lr 0.000091 wd 0.0500 time 0.4488 (0.4513) data time 0.0007 (0.0025) model time 0.4481 (0.4485) loss 2.0576 (2.4968) grad_norm 1.9831 (inf) loss_scale 128.0000 (169.0566) mem 16699MB [2024-08-11 08:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][380/625] eta 0:01:50 lr 0.000091 wd 0.0500 time 0.4487 (0.4512) data time 0.0006 (0.0024) model time 0.4481 (0.4484) loss 2.7340 (2.4979) grad_norm 1.9566 (inf) loss_scale 128.0000 (167.9790) mem 16699MB [2024-08-11 08:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][390/625] eta 0:01:45 lr 0.000091 wd 0.0500 time 0.4454 (0.4511) data time 0.0008 (0.0024) model time 0.4446 (0.4483) loss 2.5362 (2.5001) grad_norm 2.9167 (inf) loss_scale 128.0000 (166.9565) mem 16699MB [2024-08-11 08:52:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][400/625] eta 0:01:41 lr 0.000091 wd 0.0500 time 0.4487 (0.4510) data time 0.0008 (0.0024) model time 0.4479 (0.4483) loss 2.4716 (2.4963) grad_norm 2.1226 (inf) loss_scale 128.0000 (165.9850) mem 16699MB [2024-08-11 08:52:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][410/625] eta 0:01:36 lr 0.000091 wd 0.0500 time 0.4503 (0.4510) data time 0.0007 (0.0023) model time 0.4496 (0.4483) loss 2.3592 (2.4973) grad_norm 2.5198 (inf) loss_scale 128.0000 (165.0608) mem 16699MB [2024-08-11 08:52:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[253/300][420/625] eta 0:01:32 lr 0.000090 wd 0.0500 time 0.4503 (0.4509) data time 0.0007 (0.0023) model time 0.4496 (0.4483) loss 2.0309 (2.4951) grad_norm 2.3485 (inf) loss_scale 128.0000 (164.1805) mem 16699MB [2024-08-11 08:52:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][430/625] eta 0:01:27 lr 0.000090 wd 0.0500 time 0.4504 (0.4510) data time 0.0006 (0.0023) model time 0.4497 (0.4484) loss 2.6684 (2.4947) grad_norm 1.6024 (inf) loss_scale 128.0000 (163.3411) mem 16699MB [2024-08-11 08:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][440/625] eta 0:01:23 lr 0.000090 wd 0.0500 time 0.4467 (0.4509) data time 0.0007 (0.0022) model time 0.4460 (0.4484) loss 2.7576 (2.4993) grad_norm 2.8522 (inf) loss_scale 128.0000 (162.5397) mem 16699MB [2024-08-11 08:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][450/625] eta 0:01:18 lr 0.000090 wd 0.0500 time 0.4578 (0.4509) data time 0.0007 (0.0022) model time 0.4571 (0.4484) loss 2.1290 (2.5006) grad_norm 2.0489 (inf) loss_scale 128.0000 (161.7738) mem 16699MB [2024-08-11 08:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][460/625] eta 0:01:14 lr 0.000090 wd 0.0500 time 0.4473 (0.4508) data time 0.0009 (0.0022) model time 0.4464 (0.4483) loss 2.7770 (2.5048) grad_norm 1.8739 (inf) loss_scale 128.0000 (161.0412) mem 16699MB [2024-08-11 08:53:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][470/625] eta 0:01:09 lr 0.000090 wd 0.0500 time 0.4459 (0.4510) data time 0.0006 (0.0022) model time 0.4453 (0.4486) loss 2.3239 (2.5060) grad_norm 2.7263 (inf) loss_scale 128.0000 (160.3397) mem 16699MB [2024-08-11 08:53:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][480/625] eta 0:01:05 lr 0.000090 wd 0.0500 time 0.4480 (0.4513) data time 0.0007 (0.0021) model time 0.4473 (0.4490) loss 2.7196 (2.5072) grad_norm 2.1473 (inf) loss_scale 128.0000 (159.6674) mem 16699MB [2024-08-11 08:53:18 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][490/625] eta 0:01:00 lr 0.000090 wd 0.0500 time 0.4458 (0.4512) data time 0.0008 (0.0021) model time 0.4450 (0.4489) loss 2.7387 (2.5064) grad_norm 1.4004 (inf) loss_scale 128.0000 (159.0224) mem 16699MB [2024-08-11 08:53:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][500/625] eta 0:00:56 lr 0.000090 wd 0.0500 time 0.4494 (0.4512) data time 0.0006 (0.0021) model time 0.4488 (0.4489) loss 3.0769 (2.5044) grad_norm 2.4026 (inf) loss_scale 128.0000 (158.4032) mem 16699MB [2024-08-11 08:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][510/625] eta 0:00:51 lr 0.000090 wd 0.0500 time 0.4477 (0.4511) data time 0.0008 (0.0020) model time 0.4469 (0.4489) loss 2.1225 (2.5054) grad_norm 2.1286 (inf) loss_scale 128.0000 (157.8082) mem 16699MB [2024-08-11 08:53:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][520/625] eta 0:00:47 lr 0.000090 wd 0.0500 time 0.4430 (0.4510) data time 0.0006 (0.0020) model time 0.4424 (0.4488) loss 2.2512 (2.5036) grad_norm 2.4613 (inf) loss_scale 128.0000 (157.2361) mem 16699MB [2024-08-11 08:53:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][530/625] eta 0:00:42 lr 0.000090 wd 0.0500 time 0.4433 (0.4510) data time 0.0007 (0.0020) model time 0.4427 (0.4488) loss 2.3285 (2.5064) grad_norm 7.3080 (inf) loss_scale 128.0000 (156.6855) mem 16699MB [2024-08-11 08:53:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][540/625] eta 0:00:38 lr 0.000090 wd 0.0500 time 0.4436 (0.4509) data time 0.0006 (0.0020) model time 0.4430 (0.4487) loss 2.7090 (2.5068) grad_norm 1.8531 (inf) loss_scale 128.0000 (156.1553) mem 16699MB [2024-08-11 08:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][550/625] eta 0:00:33 lr 0.000090 wd 0.0500 time 0.4494 (0.4508) data time 0.0006 (0.0020) model time 0.4488 (0.4486) loss 1.6003 (2.5045) grad_norm 2.6973 (inf) loss_scale 
128.0000 (155.6443) mem 16699MB [2024-08-11 08:53:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][560/625] eta 0:00:29 lr 0.000090 wd 0.0500 time 0.4476 (0.4507) data time 0.0009 (0.0019) model time 0.4467 (0.4486) loss 2.7185 (2.5011) grad_norm 1.8278 (inf) loss_scale 128.0000 (155.1515) mem 16699MB [2024-08-11 08:53:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][570/625] eta 0:00:24 lr 0.000090 wd 0.0500 time 0.4445 (0.4506) data time 0.0006 (0.0019) model time 0.4439 (0.4485) loss 2.9177 (2.4998) grad_norm 1.5983 (inf) loss_scale 128.0000 (154.6760) mem 16699MB [2024-08-11 08:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][580/625] eta 0:00:20 lr 0.000090 wd 0.0500 time 0.4423 (0.4506) data time 0.0011 (0.0019) model time 0.4412 (0.4485) loss 1.9610 (2.4987) grad_norm 2.5855 (inf) loss_scale 128.0000 (154.2169) mem 16699MB [2024-08-11 08:54:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][590/625] eta 0:00:15 lr 0.000090 wd 0.0500 time 0.4438 (0.4505) data time 0.0007 (0.0019) model time 0.4432 (0.4485) loss 2.1130 (2.4998) grad_norm 3.1436 (inf) loss_scale 128.0000 (153.7733) mem 16699MB [2024-08-11 08:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][600/625] eta 0:00:11 lr 0.000090 wd 0.0500 time 0.4500 (0.4505) data time 0.0008 (0.0019) model time 0.4492 (0.4484) loss 2.7612 (2.4979) grad_norm 2.4515 (inf) loss_scale 128.0000 (153.3444) mem 16699MB [2024-08-11 08:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][610/625] eta 0:00:06 lr 0.000089 wd 0.0500 time 0.4416 (0.4504) data time 0.0006 (0.0019) model time 0.4410 (0.4484) loss 2.6270 (2.4979) grad_norm 2.8756 (inf) loss_scale 128.0000 (152.9296) mem 16699MB [2024-08-11 08:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [253/300][620/625] eta 0:00:02 lr 0.000089 wd 0.0500 time 0.4528 (0.4503) data time 0.0006 (0.0018) model time 0.4522 (0.4483) 
loss 2.5662 (2.4995) grad_norm 6.5037 (inf) loss_scale 128.0000 (152.5282) mem 16699MB [2024-08-11 08:54:18 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 253 training takes 0:04:41 [2024-08-11 08:54:18 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 08:54:19 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 08:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.489 (0.489) Loss 0.5366 (0.5366) Acc@1 88.721 (88.721) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 08:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 0.8379 (0.6264) Acc@1 80.908 (86.821) Acc@5 96.191 (97.776) Mem 16699MB [2024-08-11 08:54:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.9395 (0.7460) Acc@1 78.613 (83.970) Acc@5 95.166 (96.661) Mem 16699MB [2024-08-11 08:54:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.717 Acc@5 96.639 [2024-08-11 08:54:23 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 08:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.896 (0.896) Loss 0.5010 (0.5010) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 08:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.189) Loss 0.8081 (0.6053) Acc@1 81.201 (87.234) Acc@5 96.484 (97.869) Mem 16699MB [2024-08-11 08:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.154) Loss 0.8784 (0.7163) Acc@1 79.736 (84.438) Acc@5 95.654 (96.877) Mem 16699MB [2024-08-11 08:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.163 Acc@5 96.835 [2024-08-11 08:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the 
network on the 50000 test images: 84.2% [2024-08-11 08:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][0/625] eta 0:12:37 lr 0.000089 wd 0.0500 time 1.2128 (1.2128) data time 0.7825 (0.7825) model time 0.0000 (0.0000) loss 2.5789 (2.5789) grad_norm 2.8139 (2.8139) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][10/625] eta 0:05:26 lr 0.000089 wd 0.0500 time 0.4438 (0.5307) data time 0.0011 (0.0719) model time 0.0000 (0.0000) loss 2.6744 (2.4562) grad_norm 3.2678 (3.2545) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][20/625] eta 0:04:57 lr 0.000089 wd 0.0500 time 0.4472 (0.4909) data time 0.0006 (0.0380) model time 0.0000 (0.0000) loss 1.4954 (2.4006) grad_norm 6.8189 (5.3143) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][30/625] eta 0:04:43 lr 0.000089 wd 0.0500 time 0.4471 (0.4763) data time 0.0006 (0.0260) model time 0.0000 (0.0000) loss 2.0838 (2.3980) grad_norm 1.6852 (4.9816) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][40/625] eta 0:04:34 lr 0.000089 wd 0.0500 time 0.4490 (0.4690) data time 0.0008 (0.0199) model time 0.0000 (0.0000) loss 1.5425 (2.3859) grad_norm 15.0788 (4.9838) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][50/625] eta 0:04:27 lr 0.000089 wd 0.0500 time 0.4465 (0.4646) data time 0.0007 (0.0162) model time 0.0000 (0.0000) loss 1.5183 (2.4089) grad_norm 2.9674 (4.6088) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][60/625] eta 0:04:20 lr 0.000089 wd 0.0500 time 0.4474 (0.4615) data time 0.0008 (0.0136) model time 
0.4466 (0.4452) loss 2.5872 (2.4187) grad_norm 2.4877 (4.4228) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:54:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][70/625] eta 0:04:15 lr 0.000089 wd 0.0500 time 0.4486 (0.4598) data time 0.0007 (0.0118) model time 0.4479 (0.4466) loss 3.1005 (2.4284) grad_norm 2.8047 (4.1703) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][80/625] eta 0:04:11 lr 0.000089 wd 0.0500 time 0.4457 (0.4608) data time 0.0008 (0.0105) model time 0.4448 (0.4534) loss 2.4911 (2.4342) grad_norm 3.0173 (3.9269) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][90/625] eta 0:04:05 lr 0.000089 wd 0.0500 time 0.4493 (0.4595) data time 0.0009 (0.0094) model time 0.4484 (0.4522) loss 2.0216 (2.4295) grad_norm 2.0235 (3.8288) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][100/625] eta 0:04:00 lr 0.000089 wd 0.0500 time 0.4482 (0.4583) data time 0.0008 (0.0086) model time 0.4473 (0.4510) loss 2.0657 (2.4250) grad_norm 2.9690 (3.6872) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][110/625] eta 0:03:55 lr 0.000089 wd 0.0500 time 0.4439 (0.4572) data time 0.0008 (0.0079) model time 0.4431 (0.4500) loss 2.5932 (2.4417) grad_norm 4.3262 (3.5846) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][120/625] eta 0:03:50 lr 0.000089 wd 0.0500 time 0.4438 (0.4562) data time 0.0008 (0.0073) model time 0.4430 (0.4492) loss 1.9328 (2.4411) grad_norm 3.0570 (3.4889) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][130/625] eta 0:03:45 lr 
0.000089 wd 0.0500 time 0.4524 (0.4554) data time 0.0006 (0.0068) model time 0.4518 (0.4487) loss 3.2459 (2.4331) grad_norm 1.9994 (3.4112) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][140/625] eta 0:03:40 lr 0.000089 wd 0.0500 time 0.4512 (0.4549) data time 0.0009 (0.0064) model time 0.4503 (0.4485) loss 2.5648 (2.4329) grad_norm 2.1081 (3.3463) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][150/625] eta 0:03:36 lr 0.000089 wd 0.0500 time 0.6533 (0.4558) data time 0.0007 (0.0060) model time 0.6526 (0.4505) loss 2.5903 (2.4323) grad_norm 2.0387 (3.2942) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][160/625] eta 0:03:31 lr 0.000089 wd 0.0500 time 0.4459 (0.4550) data time 0.0009 (0.0057) model time 0.4451 (0.4497) loss 2.3349 (2.4441) grad_norm 5.9830 (3.2761) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][170/625] eta 0:03:26 lr 0.000088 wd 0.0500 time 0.4468 (0.4546) data time 0.0008 (0.0054) model time 0.4460 (0.4495) loss 2.5600 (2.4471) grad_norm 3.3995 (3.2535) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][180/625] eta 0:03:22 lr 0.000088 wd 0.0500 time 0.4495 (0.4542) data time 0.0008 (0.0051) model time 0.4487 (0.4493) loss 2.9515 (2.4397) grad_norm 2.5050 (3.2303) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][190/625] eta 0:03:17 lr 0.000088 wd 0.0500 time 0.4475 (0.4539) data time 0.0008 (0.0049) model time 0.4467 (0.4492) loss 2.3754 (2.4441) grad_norm 2.4084 (3.2101) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:55:57 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [254/300][200/625] eta 0:03:12 lr 0.000088 wd 0.0500 time 0.4456 (0.4535) data time 0.0007 (0.0047) model time 0.4450 (0.4489) loss 1.7825 (2.4370) grad_norm 2.1834 (3.1849) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][210/625] eta 0:03:08 lr 0.000088 wd 0.0500 time 0.4494 (0.4532) data time 0.0007 (0.0045) model time 0.4487 (0.4488) loss 2.7013 (2.4449) grad_norm 2.2297 (3.3106) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][220/625] eta 0:03:03 lr 0.000088 wd 0.0500 time 0.4506 (0.4530) data time 0.0007 (0.0044) model time 0.4499 (0.4487) loss 2.3274 (2.4538) grad_norm 2.7986 (3.3126) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][230/625] eta 0:02:58 lr 0.000088 wd 0.0500 time 0.4483 (0.4528) data time 0.0008 (0.0042) model time 0.4474 (0.4486) loss 2.6918 (2.4457) grad_norm 1.9644 (3.2798) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][240/625] eta 0:02:54 lr 0.000088 wd 0.0500 time 0.4503 (0.4526) data time 0.0006 (0.0041) model time 0.4497 (0.4486) loss 2.8570 (2.4419) grad_norm 4.6916 (3.2727) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][250/625] eta 0:02:49 lr 0.000088 wd 0.0500 time 0.4488 (0.4524) data time 0.0009 (0.0039) model time 0.4479 (0.4485) loss 2.0972 (2.4454) grad_norm 2.1070 (3.2563) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][260/625] eta 0:02:45 lr 0.000088 wd 0.0500 time 0.4437 (0.4522) data time 0.0008 (0.0038) model time 0.4429 (0.4484) loss 2.5025 (2.4495) grad_norm 2.3866 (3.2255) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][270/625] eta 0:02:40 lr 0.000088 wd 0.0500 time 0.4432 (0.4526) data time 0.0009 (0.0037) model time 0.4424 (0.4490) loss 2.0743 (2.4528) grad_norm 3.7048 (3.2274) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][280/625] eta 0:02:36 lr 0.000088 wd 0.0500 time 0.4476 (0.4525) data time 0.0008 (0.0036) model time 0.4468 (0.4490) loss 2.4783 (2.4509) grad_norm 3.6355 (3.2209) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][290/625] eta 0:02:31 lr 0.000088 wd 0.0500 time 0.4485 (0.4524) data time 0.0007 (0.0035) model time 0.4478 (0.4489) loss 1.6821 (2.4513) grad_norm 1.6008 (3.2458) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][300/625] eta 0:02:26 lr 0.000088 wd 0.0500 time 0.4473 (0.4522) data time 0.0006 (0.0034) model time 0.4467 (0.4489) loss 2.4788 (2.4517) grad_norm 2.1501 (3.2454) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][310/625] eta 0:02:22 lr 0.000088 wd 0.0500 time 0.4463 (0.4521) data time 0.0006 (0.0033) model time 0.4457 (0.4488) loss 2.4783 (2.4535) grad_norm 3.9180 (3.2453) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][320/625] eta 0:02:17 lr 0.000088 wd 0.0500 time 0.4482 (0.4520) data time 0.0008 (0.0033) model time 0.4473 (0.4488) loss 2.7627 (2.4589) grad_norm 3.0439 (3.2832) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][330/625] eta 0:02:13 lr 0.000088 wd 0.0500 time 0.4530 (0.4519) data time 0.0007 (0.0032) model time 
0.4523 (0.4487) loss 2.9417 (2.4634) grad_norm 2.6938 (3.2754) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][340/625] eta 0:02:08 lr 0.000088 wd 0.0500 time 0.4488 (0.4518) data time 0.0009 (0.0031) model time 0.4479 (0.4487) loss 2.8065 (2.4683) grad_norm 3.0636 (3.2592) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][350/625] eta 0:02:04 lr 0.000088 wd 0.0500 time 0.4476 (0.4516) data time 0.0006 (0.0031) model time 0.4469 (0.4486) loss 2.2334 (2.4707) grad_norm 1.9746 (3.2423) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][360/625] eta 0:01:59 lr 0.000087 wd 0.0500 time 0.4508 (0.4516) data time 0.0006 (0.0030) model time 0.4502 (0.4486) loss 1.4240 (2.4710) grad_norm 2.5167 (3.2337) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][370/625] eta 0:01:55 lr 0.000087 wd 0.0500 time 0.4464 (0.4515) data time 0.0006 (0.0029) model time 0.4458 (0.4486) loss 2.4888 (2.4717) grad_norm 2.6784 (3.2443) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][380/625] eta 0:01:50 lr 0.000087 wd 0.0500 time 0.4480 (0.4514) data time 0.0008 (0.0029) model time 0.4472 (0.4485) loss 2.8126 (2.4741) grad_norm 6.3128 (3.2400) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][390/625] eta 0:01:46 lr 0.000087 wd 0.0500 time 0.4474 (0.4513) data time 0.0009 (0.0028) model time 0.4466 (0.4485) loss 2.3772 (2.4759) grad_norm 2.2193 (3.2221) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][400/625] eta 0:01:41 lr 
0.000087 wd 0.0500 time 0.4452 (0.4512) data time 0.0008 (0.0028) model time 0.4443 (0.4484) loss 2.7343 (2.4765) grad_norm 2.0019 (3.1993) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][410/625] eta 0:01:36 lr 0.000087 wd 0.0500 time 0.4487 (0.4511) data time 0.0009 (0.0027) model time 0.4478 (0.4484) loss 2.8568 (2.4764) grad_norm 3.0288 (3.1850) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][420/625] eta 0:01:32 lr 0.000087 wd 0.0500 time 0.4518 (0.4510) data time 0.0006 (0.0027) model time 0.4512 (0.4483) loss 2.3347 (2.4806) grad_norm 3.4336 (3.1714) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][430/625] eta 0:01:27 lr 0.000087 wd 0.0500 time 0.4480 (0.4510) data time 0.0006 (0.0027) model time 0.4474 (0.4483) loss 2.4490 (2.4821) grad_norm 4.7248 (3.1618) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][440/625] eta 0:01:23 lr 0.000087 wd 0.0500 time 0.4480 (0.4510) data time 0.0008 (0.0026) model time 0.4473 (0.4484) loss 2.4346 (2.4849) grad_norm 1.8369 (3.1705) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][450/625] eta 0:01:18 lr 0.000087 wd 0.0500 time 0.4488 (0.4509) data time 0.0009 (0.0026) model time 0.4479 (0.4483) loss 2.7284 (2.4870) grad_norm 4.7227 (3.1786) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][460/625] eta 0:01:14 lr 0.000087 wd 0.0500 time 0.4489 (0.4508) data time 0.0006 (0.0025) model time 0.4483 (0.4483) loss 2.9873 (2.4895) grad_norm 2.6652 (3.1710) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:57:59 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [254/300][470/625] eta 0:01:09 lr 0.000087 wd 0.0500 time 0.4423 (0.4507) data time 0.0008 (0.0025) model time 0.4415 (0.4482) loss 2.8620 (2.4901) grad_norm 2.6944 (3.1902) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][480/625] eta 0:01:05 lr 0.000087 wd 0.0500 time 0.4469 (0.4506) data time 0.0009 (0.0025) model time 0.4460 (0.4481) loss 2.3027 (2.4901) grad_norm 2.2442 (3.1899) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][490/625] eta 0:01:00 lr 0.000087 wd 0.0500 time 0.4462 (0.4513) data time 0.0008 (0.0024) model time 0.4455 (0.4489) loss 2.4868 (2.4916) grad_norm 2.3730 (3.2041) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][500/625] eta 0:00:56 lr 0.000087 wd 0.0500 time 0.4470 (0.4512) data time 0.0006 (0.0024) model time 0.4464 (0.4489) loss 2.4702 (2.4956) grad_norm 2.9711 (3.1969) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][510/625] eta 0:00:51 lr 0.000087 wd 0.0500 time 0.4450 (0.4511) data time 0.0008 (0.0024) model time 0.4442 (0.4488) loss 2.8116 (2.5008) grad_norm 2.7602 (3.1966) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][520/625] eta 0:00:47 lr 0.000087 wd 0.0500 time 0.4488 (0.4511) data time 0.0008 (0.0023) model time 0.4481 (0.4488) loss 2.5313 (2.5047) grad_norm 2.2551 (3.1949) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][530/625] eta 0:00:42 lr 0.000087 wd 0.0500 time 0.4484 (0.4510) data time 0.0007 (0.0023) model time 0.4478 (0.4488) loss 3.0930 (2.5071) grad_norm 2.0606 (3.2001) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][540/625] eta 0:00:38 lr 0.000087 wd 0.0500 time 0.4445 (0.4510) data time 0.0009 (0.0023) model time 0.4437 (0.4487) loss 2.6947 (2.5043) grad_norm 2.2837 (3.2288) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][550/625] eta 0:00:33 lr 0.000087 wd 0.0500 time 0.4427 (0.4509) data time 0.0007 (0.0023) model time 0.4421 (0.4487) loss 1.9514 (2.5052) grad_norm 1.9358 (3.2152) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][560/625] eta 0:00:29 lr 0.000086 wd 0.0500 time 0.4470 (0.4508) data time 0.0008 (0.0022) model time 0.4462 (0.4486) loss 2.7012 (2.5039) grad_norm 3.1223 (3.2486) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][570/625] eta 0:00:24 lr 0.000086 wd 0.0500 time 0.4456 (0.4508) data time 0.0007 (0.0022) model time 0.4449 (0.4486) loss 1.5893 (2.5033) grad_norm 1.7762 (3.2461) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][580/625] eta 0:00:20 lr 0.000086 wd 0.0500 time 0.4538 (0.4507) data time 0.0006 (0.0022) model time 0.4531 (0.4485) loss 2.7613 (2.5039) grad_norm 4.5779 (3.2413) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][590/625] eta 0:00:15 lr 0.000086 wd 0.0500 time 0.4529 (0.4507) data time 0.0006 (0.0022) model time 0.4523 (0.4486) loss 2.5554 (2.4998) grad_norm 2.6433 (3.2409) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][600/625] eta 0:00:11 lr 0.000086 wd 0.0500 time 0.4449 (0.4507) data time 0.0008 (0.0021) model time 
0.4441 (0.4486) loss 2.2638 (2.4987) grad_norm 1.4935 (3.2263) loss_scale 128.0000 (128.0000) mem 16699MB
[2024-08-11 08:59:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][610/625] eta 0:00:06 lr 0.000086 wd 0.0500 time 0.4429 (0.4506) data time 0.0006 (0.0021) model time 0.4423 (0.4485) loss 2.1467 (2.4974) grad_norm 3.0943 (3.2269) loss_scale 128.0000 (128.0000) mem 16699MB
[2024-08-11 08:59:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [254/300][620/625] eta 0:00:02 lr 0.000086 wd 0.0500 time 0.4447 (0.4505) data time 0.0006 (0.0021) model time 0.4441 (0.4484) loss 2.8886 (2.4971) grad_norm 2.6242 (3.2186) loss_scale 128.0000 (128.0000) mem 16699MB
[2024-08-11 08:59:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 254 training takes 0:04:41
[2024-08-11 08:59:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-11 08:59:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-11 08:59:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5259 (0.5259) Acc@1 89.209 (89.209) Acc@5 98.926 (98.926) Mem 16699MB
[2024-08-11 08:59:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8350 (0.6298) Acc@1 80.713 (86.794) Acc@5 96.094 (97.754) Mem 16699MB
[2024-08-11 08:59:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9268 (0.7459) Acc@1 79.053 (84.012) Acc@5 95.459 (96.656) Mem 16699MB
[2024-08-11 08:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.727 Acc@5 96.637
[2024-08-11 08:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-11 08:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.796 (0.796) Loss 0.5020 (0.5020) Acc@1 89.404 (89.404) Acc@5 98.975 (98.975) Mem 16699MB
[2024-08-11 08:59:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.183) Loss 0.8086 (0.6060) Acc@1 81.201 (87.225) Acc@5 96.436 (97.887) Mem 16699MB
[2024-08-11 08:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.150) Loss 0.8813 (0.7174) Acc@1 79.736 (84.440) Acc@5 95.605 (96.898) Mem 16699MB
[2024-08-11 08:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.163 Acc@5 96.855
[2024-08-11 08:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2%
[2024-08-11 08:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][0/625] eta 0:13:17 lr 0.000086 wd 0.0500 time 1.2757 (1.2757) data time 0.4289 (0.4289) model time 0.0000 (0.0000) loss 2.8733 (2.8733) grad_norm 2.8063 (2.8063) loss_scale 128.0000 (128.0000) mem 16699MB
[2024-08-11 08:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][10/625] eta 0:05:30 lr 0.000086 wd 0.0500 time 0.4486 (0.5369) data 
time 0.0006 (0.0397) model time 0.0000 (0.0000) loss 1.4338 (2.5201) grad_norm 2.5923 (2.4282) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][20/625] eta 0:04:59 lr 0.000086 wd 0.0500 time 0.4520 (0.4952) data time 0.0008 (0.0212) model time 0.0000 (0.0000) loss 2.6417 (2.4350) grad_norm 2.2056 (2.3935) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][30/625] eta 0:04:46 lr 0.000086 wd 0.0500 time 0.4475 (0.4817) data time 0.0008 (0.0146) model time 0.0000 (0.0000) loss 2.6221 (2.5032) grad_norm 2.2216 (2.6301) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][40/625] eta 0:04:37 lr 0.000086 wd 0.0500 time 0.4475 (0.4738) data time 0.0006 (0.0112) model time 0.0000 (0.0000) loss 2.2917 (2.4876) grad_norm 3.3722 (2.6306) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][50/625] eta 0:04:29 lr 0.000086 wd 0.0500 time 0.4429 (0.4688) data time 0.0008 (0.0092) model time 0.0000 (0.0000) loss 3.0110 (2.4719) grad_norm 4.2456 (2.7138) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][60/625] eta 0:04:22 lr 0.000086 wd 0.0500 time 0.4477 (0.4653) data time 0.0008 (0.0078) model time 0.4470 (0.4464) loss 2.8251 (2.4738) grad_norm 3.7688 (2.6929) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][70/625] eta 0:04:16 lr 0.000086 wd 0.0500 time 0.4464 (0.4628) data time 0.0006 (0.0068) model time 0.4458 (0.4468) loss 3.1503 (2.4840) grad_norm 2.4330 (2.6543) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[255/300][80/625] eta 0:04:11 lr 0.000086 wd 0.0500 time 0.4429 (0.4607) data time 0.0008 (0.0061) model time 0.4421 (0.4461) loss 2.2134 (2.4793) grad_norm 1.9349 (2.6321) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 08:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][90/625] eta 0:04:05 lr 0.000086 wd 0.0500 time 0.4461 (0.4593) data time 0.0008 (0.0055) model time 0.4453 (0.4463) loss 2.4095 (2.4726) grad_norm 2.8747 (3.5295) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][100/625] eta 0:04:00 lr 0.000086 wd 0.0500 time 0.4468 (0.4583) data time 0.0010 (0.0050) model time 0.4458 (0.4467) loss 1.9655 (2.4912) grad_norm 2.5978 (3.4213) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][110/625] eta 0:03:55 lr 0.000086 wd 0.0500 time 0.4477 (0.4573) data time 0.0006 (0.0047) model time 0.4471 (0.4467) loss 1.9721 (2.4924) grad_norm 24.7325 (3.5862) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][120/625] eta 0:03:50 lr 0.000085 wd 0.0500 time 0.4449 (0.4565) data time 0.0006 (0.0043) model time 0.4443 (0.4468) loss 2.9053 (2.5048) grad_norm 2.6252 (3.5126) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][130/625] eta 0:03:45 lr 0.000085 wd 0.0500 time 0.4472 (0.4558) data time 0.0006 (0.0041) model time 0.4467 (0.4466) loss 2.4474 (2.5018) grad_norm 3.4130 (3.4159) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][140/625] eta 0:03:40 lr 0.000085 wd 0.0500 time 0.4453 (0.4551) data time 0.0007 (0.0038) model time 0.4446 (0.4466) loss 3.0175 (2.5021) grad_norm 2.0484 (3.3449) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][150/625] eta 0:03:36 lr 0.000085 wd 0.0500 time 0.4467 (0.4560) data time 0.0006 (0.0036) model time 0.4461 (0.4486) loss 2.9571 (2.4959) grad_norm 2.3103 (3.2891) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][160/625] eta 0:03:31 lr 0.000085 wd 0.0500 time 0.4478 (0.4555) data time 0.0006 (0.0035) model time 0.4472 (0.4486) loss 2.0660 (2.4742) grad_norm 2.4943 (3.3065) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][170/625] eta 0:03:27 lr 0.000085 wd 0.0500 time 0.4493 (0.4551) data time 0.0007 (0.0033) model time 0.4486 (0.4485) loss 2.8913 (2.4759) grad_norm 4.2758 (3.2661) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][180/625] eta 0:03:22 lr 0.000085 wd 0.0500 time 0.4498 (0.4555) data time 0.0006 (0.0032) model time 0.4492 (0.4494) loss 2.4155 (2.4752) grad_norm 2.1287 (3.5271) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][190/625] eta 0:03:17 lr 0.000085 wd 0.0500 time 0.4459 (0.4550) data time 0.0006 (0.0030) model time 0.4453 (0.4492) loss 2.5324 (2.4711) grad_norm 2.5771 (3.4785) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][200/625] eta 0:03:13 lr 0.000085 wd 0.0500 time 0.4459 (0.4546) data time 0.0008 (0.0029) model time 0.4451 (0.4490) loss 2.6388 (2.4701) grad_norm 27.4375 (3.6032) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][210/625] eta 0:03:08 lr 0.000085 wd 0.0500 time 0.4440 (0.4543) data time 0.0006 (0.0028) model time 0.4434 (0.4489) loss 2.7413 
(2.4680) grad_norm 3.1462 (3.6027) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:00:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][220/625] eta 0:03:03 lr 0.000085 wd 0.0500 time 0.4482 (0.4540) data time 0.0008 (0.0027) model time 0.4475 (0.4488) loss 2.8507 (2.4710) grad_norm 2.9092 (3.5669) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][230/625] eta 0:02:59 lr 0.000085 wd 0.0500 time 0.4518 (0.4538) data time 0.0008 (0.0027) model time 0.4511 (0.4487) loss 2.3268 (2.4732) grad_norm 2.7814 (3.5369) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][240/625] eta 0:02:54 lr 0.000085 wd 0.0500 time 0.4536 (0.4536) data time 0.0006 (0.0026) model time 0.4530 (0.4487) loss 1.5045 (2.4647) grad_norm 4.9279 (3.5024) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][250/625] eta 0:02:50 lr 0.000085 wd 0.0500 time 0.4508 (0.4534) data time 0.0008 (0.0025) model time 0.4500 (0.4486) loss 2.4669 (2.4562) grad_norm 1.8657 (3.7107) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][260/625] eta 0:02:45 lr 0.000085 wd 0.0500 time 0.4491 (0.4532) data time 0.0006 (0.0025) model time 0.4485 (0.4486) loss 3.1912 (2.4650) grad_norm 2.1786 (3.6624) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][270/625] eta 0:02:40 lr 0.000085 wd 0.0500 time 0.4474 (0.4530) data time 0.0006 (0.0024) model time 0.4468 (0.4485) loss 1.8508 (2.4608) grad_norm 3.3583 (3.6257) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][280/625] eta 0:02:36 lr 0.000085 wd 0.0500 time 0.4465 
(0.4528) data time 0.0006 (0.0023) model time 0.4459 (0.4484) loss 1.7830 (2.4656) grad_norm 1.7539 (3.5707) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][290/625] eta 0:02:31 lr 0.000085 wd 0.0500 time 0.4461 (0.4526) data time 0.0006 (0.0023) model time 0.4455 (0.4484) loss 2.3453 (2.4686) grad_norm 1.9637 (3.5270) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][300/625] eta 0:02:27 lr 0.000085 wd 0.0500 time 0.4472 (0.4525) data time 0.0008 (0.0022) model time 0.4464 (0.4483) loss 2.9743 (2.4685) grad_norm 3.2717 (3.5055) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][310/625] eta 0:02:22 lr 0.000085 wd 0.0500 time 0.4470 (0.4523) data time 0.0008 (0.0022) model time 0.4462 (0.4483) loss 2.2481 (2.4716) grad_norm 2.9137 (3.4674) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][320/625] eta 0:02:17 lr 0.000084 wd 0.0500 time 0.4469 (0.4522) data time 0.0006 (0.0022) model time 0.4463 (0.4483) loss 2.9512 (2.4803) grad_norm 2.2834 (3.4414) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][330/625] eta 0:02:13 lr 0.000084 wd 0.0500 time 0.4530 (0.4521) data time 0.0008 (0.0021) model time 0.4523 (0.4483) loss 2.4321 (2.4755) grad_norm 2.4920 (3.4112) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][340/625] eta 0:02:08 lr 0.000084 wd 0.0500 time 0.4491 (0.4520) data time 0.0007 (0.0021) model time 0.4484 (0.4482) loss 2.1212 (2.4712) grad_norm 3.0432 (3.3890) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [255/300][350/625] eta 0:02:04 lr 0.000084 wd 0.0500 time 0.4462 (0.4518) data time 0.0005 (0.0020) model time 0.4457 (0.4481) loss 1.5882 (2.4693) grad_norm 2.4649 (3.3798) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][360/625] eta 0:01:59 lr 0.000084 wd 0.0500 time 0.4471 (0.4517) data time 0.0009 (0.0020) model time 0.4462 (0.4481) loss 2.4453 (2.4681) grad_norm 3.6419 (3.3611) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][370/625] eta 0:01:55 lr 0.000084 wd 0.0500 time 0.4465 (0.4516) data time 0.0008 (0.0020) model time 0.4457 (0.4481) loss 1.8702 (2.4670) grad_norm 1.5726 (3.3378) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][380/625] eta 0:01:50 lr 0.000084 wd 0.0500 time 0.4495 (0.4515) data time 0.0006 (0.0019) model time 0.4489 (0.4480) loss 1.7343 (2.4620) grad_norm 1.7611 (3.3376) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][390/625] eta 0:01:46 lr 0.000084 wd 0.0500 time 0.4464 (0.4514) data time 0.0006 (0.0019) model time 0.4458 (0.4480) loss 2.3924 (2.4673) grad_norm 2.9297 (3.3413) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][400/625] eta 0:01:41 lr 0.000084 wd 0.0500 time 0.4479 (0.4514) data time 0.0008 (0.0019) model time 0.4471 (0.4480) loss 2.6459 (2.4711) grad_norm 2.9142 (3.3393) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][410/625] eta 0:01:37 lr 0.000084 wd 0.0500 time 0.4421 (0.4513) data time 0.0008 (0.0019) model time 0.4413 (0.4480) loss 2.8449 (2.4717) grad_norm 8.2289 (3.3694) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][420/625] eta 0:01:32 lr 0.000084 wd 0.0500 time 0.4488 (0.4512) data time 0.0008 (0.0018) model time 0.4480 (0.4479) loss 2.7334 (2.4771) grad_norm 4.2505 (3.4531) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][430/625] eta 0:01:27 lr 0.000084 wd 0.0500 time 0.4447 (0.4511) data time 0.0008 (0.0018) model time 0.4440 (0.4479) loss 2.9646 (2.4788) grad_norm 3.7868 (3.5049) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][440/625] eta 0:01:23 lr 0.000084 wd 0.0500 time 0.4466 (0.4510) data time 0.0008 (0.0018) model time 0.4458 (0.4478) loss 2.9066 (2.4780) grad_norm 2.5775 (3.4892) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][450/625] eta 0:01:18 lr 0.000084 wd 0.0500 time 0.4473 (0.4509) data time 0.0006 (0.0018) model time 0.4467 (0.4478) loss 2.6562 (2.4766) grad_norm 1.9975 (3.4684) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][460/625] eta 0:01:14 lr 0.000084 wd 0.0500 time 0.4505 (0.4508) data time 0.0006 (0.0018) model time 0.4500 (0.4478) loss 1.8117 (2.4764) grad_norm 2.2438 (3.4631) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][470/625] eta 0:01:09 lr 0.000084 wd 0.0500 time 0.4486 (0.4508) data time 0.0008 (0.0017) model time 0.4478 (0.4478) loss 2.2444 (2.4731) grad_norm 2.1553 (3.4455) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][480/625] eta 0:01:05 lr 0.000084 wd 0.0500 time 0.4488 (0.4508) data time 0.0006 (0.0017) model time 0.4482 (0.4478) loss 2.6235 
(2.4761) grad_norm 1.9706 (3.4561) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:02:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][490/625] eta 0:01:00 lr 0.000084 wd 0.0500 time 0.4453 (0.4511) data time 0.0008 (0.0017) model time 0.4445 (0.4482) loss 2.3237 (2.4761) grad_norm 5.4742 (3.4536) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][500/625] eta 0:00:56 lr 0.000084 wd 0.0500 time 0.4467 (0.4511) data time 0.0008 (0.0017) model time 0.4458 (0.4482) loss 2.5636 (2.4764) grad_norm 2.7512 (3.4449) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][510/625] eta 0:00:51 lr 0.000084 wd 0.0500 time 0.4460 (0.4513) data time 0.0008 (0.0017) model time 0.4452 (0.4485) loss 2.7347 (2.4801) grad_norm 2.2369 (3.4339) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][520/625] eta 0:00:47 lr 0.000083 wd 0.0500 time 0.4510 (0.4512) data time 0.0006 (0.0016) model time 0.4504 (0.4485) loss 2.5963 (2.4825) grad_norm 2.0759 (3.4154) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][530/625] eta 0:00:42 lr 0.000083 wd 0.0500 time 0.4466 (0.4512) data time 0.0008 (0.0016) model time 0.4457 (0.4485) loss 2.5298 (2.4808) grad_norm 2.0323 (3.3990) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][540/625] eta 0:00:38 lr 0.000083 wd 0.0500 time 0.4520 (0.4512) data time 0.0006 (0.0016) model time 0.4513 (0.4485) loss 2.2791 (2.4790) grad_norm 3.7328 (3.3846) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][550/625] eta 0:00:33 lr 0.000083 wd 0.0500 time 0.4518 
(0.4511) data time 0.0008 (0.0016) model time 0.4510 (0.4485) loss 2.8243 (2.4820) grad_norm 2.4513 (3.3781) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][560/625] eta 0:00:29 lr 0.000083 wd 0.0500 time 0.4485 (0.4511) data time 0.0006 (0.0016) model time 0.4480 (0.4485) loss 2.3409 (2.4779) grad_norm 3.6078 (3.3658) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][570/625] eta 0:00:24 lr 0.000083 wd 0.0500 time 0.4460 (0.4510) data time 0.0008 (0.0016) model time 0.4453 (0.4484) loss 2.4108 (2.4797) grad_norm 1.7193 (3.3556) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][580/625] eta 0:00:20 lr 0.000083 wd 0.0500 time 0.4480 (0.4510) data time 0.0008 (0.0016) model time 0.4472 (0.4484) loss 1.7567 (2.4741) grad_norm 3.6957 (3.3487) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][590/625] eta 0:00:15 lr 0.000083 wd 0.0500 time 0.4469 (0.4509) data time 0.0006 (0.0015) model time 0.4464 (0.4484) loss 3.1147 (2.4776) grad_norm 2.3317 (3.3369) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][600/625] eta 0:00:11 lr 0.000083 wd 0.0500 time 0.4496 (0.4509) data time 0.0008 (0.0015) model time 0.4488 (0.4484) loss 2.7389 (2.4777) grad_norm 3.9809 (3.3290) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [255/300][610/625] eta 0:00:06 lr 0.000083 wd 0.0500 time 0.4425 (0.4508) data time 0.0006 (0.0015) model time 0.4419 (0.4484) loss 1.8120 (2.4744) grad_norm 1.7095 (3.3380) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [255/300][620/625] eta 0:00:02 lr 0.000083 wd 0.0500 time 0.4447 (0.4507) data time 0.0004 (0.0015) model time 0.4444 (0.4483) loss 2.4334 (2.4737) grad_norm 2.5819 (3.3314) loss_scale 128.0000 (128.0000) mem 16699MB
[2024-08-11 09:03:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 255 training takes 0:04:41
[2024-08-11 09:03:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-11 09:04:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-11 09:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.487 (0.487) Loss 0.5376 (0.5376) Acc@1 88.770 (88.770) Acc@5 98.828 (98.828) Mem 16699MB
[2024-08-11 09:04:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 0.8359 (0.6290) Acc@1 80.811 (86.830) Acc@5 96.631 (97.816) Mem 16699MB
[2024-08-11 09:04:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9263 (0.7505) Acc@1 78.809 (84.019) Acc@5 95.166 (96.659) Mem 16699MB
[2024-08-11 09:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.719 Acc@5 96.617
[2024-08-11 09:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7%
[2024-08-11 09:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.888 (0.888) Loss 0.5029 (0.5029) Acc@1 89.307 (89.307) Acc@5 98.975 (98.975) Mem 16699MB
[2024-08-11 09:04:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.190) Loss 0.8086 (0.6065) Acc@1 81.104 (87.198) Acc@5 96.484 (97.860) Mem 16699MB
[2024-08-11 09:04:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.154) Loss 0.8828 (0.7181) Acc@1 79.639 (84.415) Acc@5 95.557 (96.870) Mem 16699MB
[2024-08-11 09:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.141 Acc@5 96.829
[2024-08-11 09:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1%
[2024-08-11 09:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][0/625] eta 0:13:06 lr 0.000083 wd 0.0500 time 1.2587 (1.2587) data time 0.5973 (0.5973) model time 0.0000 (0.0000) loss 2.6227 (2.6227) grad_norm 1.6989 (1.6989) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 09:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][10/625] eta 0:05:19 lr 0.000083 wd 0.0500 time 0.4470 (0.5203) data time 0.0008 (0.0550) model time 0.0000 (0.0000) loss 2.9946 (2.4979) grad_norm 1.9340 (2.4282) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 09:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][20/625] eta 0:04:53 lr 0.000083 wd 0.0500 time 0.4479 (0.4851) data time 0.0008 (0.0292) model time 0.0000 (0.0000) loss 2.7696 (2.5267) grad_norm 5.4057 (2.5647) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 09:04:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][30/625] eta 0:04:41 lr 0.000083 wd 0.0500 time 0.4480 (0.4733) data time 0.0006 (0.0200) model time 0.0000 (0.0000) loss 1.8367 (2.4523) grad_norm 2.1889 (2.4544) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 09:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][40/625] eta 0:04:33 lr 0.000083 wd 0.0500 time 0.4529 (0.4676) data time 0.0006 (0.0153) model time 0.0000 (0.0000) loss 2.3468 (2.4494) grad_norm 6.4883 (2.5695) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 09:04:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][50/625] eta 0:04:26 lr 0.000083 wd 0.0500 time 0.4467 (0.4638) data time 0.0008 (0.0125) model time 0.0000 (0.0000) loss 2.8462 (2.4845) grad_norm 2.1693 (2.6148) loss_scale 64.0000 (64.0000) mem 16699MB
[2024-08-11 09:04:35 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [256/300][60/625] eta 0:04:21 lr 0.000083 wd 0.0500 time 0.4520 (0.4620) data time 0.0008 (0.0106) model time 0.4513 (0.4518) loss 1.6487 (2.5028) grad_norm 1.8680 (2.6758) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][70/625] eta 0:04:15 lr 0.000083 wd 0.0500 time 0.4448 (0.4599) data time 0.0008 (0.0092) model time 0.4440 (0.4491) loss 2.4510 (2.5163) grad_norm 3.2010 (2.7098) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][80/625] eta 0:04:09 lr 0.000083 wd 0.0500 time 0.4398 (0.4581) data time 0.0008 (0.0082) model time 0.4390 (0.4477) loss 1.6844 (2.5272) grad_norm 4.3891 (2.7372) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][90/625] eta 0:04:05 lr 0.000082 wd 0.0500 time 0.4373 (0.4591) data time 0.0008 (0.0074) model time 0.4364 (0.4523) loss 2.4973 (2.5209) grad_norm 3.2755 (2.7551) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][100/625] eta 0:04:00 lr 0.000082 wd 0.0500 time 0.4448 (0.4579) data time 0.0006 (0.0067) model time 0.4442 (0.4510) loss 3.0354 (2.5493) grad_norm 3.1351 (3.1127) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][110/625] eta 0:03:55 lr 0.000082 wd 0.0500 time 0.4471 (0.4570) data time 0.0008 (0.0062) model time 0.4463 (0.4504) loss 2.1783 (2.5338) grad_norm 2.3062 (3.2136) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][120/625] eta 0:03:50 lr 0.000082 wd 0.0500 time 0.4548 (0.4565) data time 0.0008 (0.0057) model time 0.4540 (0.4504) loss 2.4992 (2.5345) grad_norm 2.0516 (3.1958) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 09:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][130/625] eta 0:03:45 lr 0.000082 wd 0.0500 time 0.4460 (0.4560) data time 0.0008 (0.0054) model time 0.4452 (0.4502) loss 3.3098 (2.5218) grad_norm 2.5818 (3.2365) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][140/625] eta 0:03:40 lr 0.000082 wd 0.0500 time 0.4454 (0.4555) data time 0.0008 (0.0050) model time 0.4446 (0.4500) loss 2.4374 (2.5191) grad_norm 1.9308 (3.2654) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][150/625] eta 0:03:36 lr 0.000082 wd 0.0500 time 0.4467 (0.4550) data time 0.0008 (0.0048) model time 0.4458 (0.4496) loss 2.6118 (2.5130) grad_norm 2.0241 (3.2100) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][160/625] eta 0:03:31 lr 0.000082 wd 0.0500 time 0.4443 (0.4545) data time 0.0008 (0.0045) model time 0.4434 (0.4493) loss 2.7522 (2.5081) grad_norm 1.8604 (3.1899) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][170/625] eta 0:03:26 lr 0.000082 wd 0.0500 time 0.4448 (0.4540) data time 0.0008 (0.0043) model time 0.4440 (0.4490) loss 2.2710 (2.5176) grad_norm 2.1686 (3.1592) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][180/625] eta 0:03:21 lr 0.000082 wd 0.0500 time 0.4536 (0.4536) data time 0.0008 (0.0041) model time 0.4528 (0.4488) loss 2.8824 (2.5256) grad_norm 1.9637 (3.1556) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][190/625] eta 0:03:17 lr 0.000082 wd 0.0500 time 0.4435 (0.4534) data time 0.0009 (0.0039) model time 0.4426 (0.4488) loss 
2.3631 (2.5204) grad_norm 2.0937 (3.1299) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][200/625] eta 0:03:12 lr 0.000082 wd 0.0500 time 0.4493 (0.4539) data time 0.0009 (0.0038) model time 0.4484 (0.4497) loss 2.8665 (2.5245) grad_norm 6.7570 (3.1359) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][210/625] eta 0:03:08 lr 0.000082 wd 0.0500 time 0.4483 (0.4537) data time 0.0009 (0.0036) model time 0.4474 (0.4496) loss 2.5990 (2.5295) grad_norm 2.7694 (3.1642) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][220/625] eta 0:03:03 lr 0.000082 wd 0.0500 time 0.4448 (0.4534) data time 0.0008 (0.0035) model time 0.4440 (0.4494) loss 2.4647 (2.5356) grad_norm 1.7625 (3.1519) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][230/625] eta 0:02:58 lr 0.000082 wd 0.0500 time 0.4456 (0.4531) data time 0.0006 (0.0034) model time 0.4450 (0.4492) loss 1.5537 (2.5398) grad_norm 2.5135 (3.1639) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][240/625] eta 0:02:54 lr 0.000082 wd 0.0500 time 0.4500 (0.4528) data time 0.0006 (0.0033) model time 0.4494 (0.4491) loss 2.8782 (2.5369) grad_norm 2.8618 (3.4208) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][250/625] eta 0:02:49 lr 0.000082 wd 0.0500 time 0.4487 (0.4526) data time 0.0005 (0.0032) model time 0.4481 (0.4489) loss 1.9457 (2.5267) grad_norm 3.7881 (3.3818) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][260/625] eta 0:02:45 lr 0.000082 wd 0.0500 time 0.4495 (0.4524) 
data time 0.0008 (0.0031) model time 0.4487 (0.4488) loss 1.7225 (2.5225) grad_norm 6.4457 (3.3681) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][270/625] eta 0:02:40 lr 0.000082 wd 0.0500 time 0.4417 (0.4523) data time 0.0009 (0.0030) model time 0.4408 (0.4488) loss 2.7712 (2.5190) grad_norm 2.1701 (3.3410) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][280/625] eta 0:02:36 lr 0.000082 wd 0.0500 time 0.4552 (0.4528) data time 0.0006 (0.0029) model time 0.4545 (0.4495) loss 2.7585 (2.5189) grad_norm 2.8678 (3.3113) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][290/625] eta 0:02:31 lr 0.000081 wd 0.0500 time 0.4506 (0.4526) data time 0.0008 (0.0029) model time 0.4498 (0.4494) loss 2.0832 (2.5180) grad_norm 2.4110 (3.2722) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][300/625] eta 0:02:27 lr 0.000081 wd 0.0500 time 0.4509 (0.4524) data time 0.0009 (0.0028) model time 0.4500 (0.4493) loss 2.6033 (2.5166) grad_norm 3.7491 (3.2456) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][310/625] eta 0:02:22 lr 0.000081 wd 0.0500 time 0.4458 (0.4523) data time 0.0009 (0.0027) model time 0.4450 (0.4492) loss 2.5756 (2.5147) grad_norm 4.9088 (3.2437) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][320/625] eta 0:02:17 lr 0.000081 wd 0.0500 time 0.4496 (0.4522) data time 0.0006 (0.0027) model time 0.4490 (0.4491) loss 2.8942 (2.5115) grad_norm 2.4235 (3.2313) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[256/300][330/625] eta 0:02:13 lr 0.000081 wd 0.0500 time 0.4487 (0.4520) data time 0.0006 (0.0026) model time 0.4481 (0.4491) loss 2.6975 (2.5066) grad_norm 1.8032 (3.2165) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][340/625] eta 0:02:08 lr 0.000081 wd 0.0500 time 0.4473 (0.4520) data time 0.0010 (0.0026) model time 0.4463 (0.4491) loss 2.6916 (2.5099) grad_norm 2.3851 (3.2104) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][350/625] eta 0:02:04 lr 0.000081 wd 0.0500 time 0.4490 (0.4518) data time 0.0009 (0.0025) model time 0.4481 (0.4490) loss 2.8433 (2.5074) grad_norm 1.9061 (3.1802) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][360/625] eta 0:01:59 lr 0.000081 wd 0.0500 time 0.4505 (0.4518) data time 0.0009 (0.0025) model time 0.4495 (0.4489) loss 1.6902 (2.5112) grad_norm 3.8132 (3.1889) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][370/625] eta 0:01:55 lr 0.000081 wd 0.0500 time 0.4454 (0.4516) data time 0.0009 (0.0024) model time 0.4445 (0.4488) loss 2.6336 (2.5088) grad_norm 2.3542 (3.1739) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][380/625] eta 0:01:50 lr 0.000081 wd 0.0500 time 0.4468 (0.4515) data time 0.0006 (0.0024) model time 0.4462 (0.4488) loss 2.1503 (2.5034) grad_norm 2.3543 (3.1577) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][390/625] eta 0:01:46 lr 0.000081 wd 0.0500 time 0.4479 (0.4514) data time 0.0009 (0.0024) model time 0.4471 (0.4487) loss 2.6578 (2.4969) grad_norm 2.7787 (3.1479) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:08 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][400/625] eta 0:01:41 lr 0.000081 wd 0.0500 time 0.4460 (0.4513) data time 0.0008 (0.0023) model time 0.4452 (0.4486) loss 2.5457 (2.4995) grad_norm 2.5486 (3.1523) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][410/625] eta 0:01:37 lr 0.000081 wd 0.0500 time 0.4527 (0.4512) data time 0.0006 (0.0023) model time 0.4520 (0.4486) loss 2.8779 (2.5021) grad_norm 3.4391 (3.1690) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][420/625] eta 0:01:32 lr 0.000081 wd 0.0500 time 0.4505 (0.4512) data time 0.0007 (0.0022) model time 0.4497 (0.4486) loss 2.8558 (2.5099) grad_norm 2.6639 (3.1678) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][430/625] eta 0:01:27 lr 0.000081 wd 0.0500 time 0.4488 (0.4511) data time 0.0008 (0.0022) model time 0.4479 (0.4486) loss 1.8248 (2.5078) grad_norm 2.2678 (3.1966) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][440/625] eta 0:01:23 lr 0.000081 wd 0.0500 time 0.4465 (0.4510) data time 0.0006 (0.0022) model time 0.4459 (0.4485) loss 2.4465 (2.5089) grad_norm 2.4261 (3.1881) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][450/625] eta 0:01:18 lr 0.000081 wd 0.0500 time 0.4465 (0.4509) data time 0.0006 (0.0021) model time 0.4459 (0.4484) loss 2.6675 (2.5128) grad_norm 2.7814 (3.1788) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][460/625] eta 0:01:14 lr 0.000081 wd 0.0500 time 0.4466 (0.4508) data time 0.0009 (0.0021) model time 0.4458 (0.4483) loss 2.9885 (2.5129) grad_norm 4.9628 (3.1705) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][470/625] eta 0:01:09 lr 0.000081 wd 0.0500 time 0.4510 (0.4507) data time 0.0009 (0.0021) model time 0.4501 (0.4483) loss 2.8109 (2.5123) grad_norm 1.9824 (3.1540) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][480/625] eta 0:01:05 lr 0.000081 wd 0.0500 time 0.4521 (0.4507) data time 0.0006 (0.0021) model time 0.4515 (0.4483) loss 1.9372 (2.5135) grad_norm 2.2746 (3.1549) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][490/625] eta 0:01:00 lr 0.000080 wd 0.0500 time 0.4466 (0.4506) data time 0.0009 (0.0020) model time 0.4458 (0.4483) loss 2.4514 (2.5135) grad_norm 2.8829 (3.1538) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][500/625] eta 0:00:56 lr 0.000080 wd 0.0500 time 0.4498 (0.4510) data time 0.0009 (0.0020) model time 0.4489 (0.4487) loss 2.4452 (2.5096) grad_norm 1.9669 (3.1735) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][510/625] eta 0:00:51 lr 0.000080 wd 0.0500 time 0.4488 (0.4510) data time 0.0008 (0.0020) model time 0.4480 (0.4487) loss 2.0665 (2.5088) grad_norm 2.1136 (3.1607) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][520/625] eta 0:00:47 lr 0.000080 wd 0.0500 time 0.4496 (0.4509) data time 0.0008 (0.0020) model time 0.4488 (0.4487) loss 2.3994 (2.5109) grad_norm 2.5630 (3.1834) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][530/625] eta 0:00:42 lr 0.000080 wd 0.0500 time 0.4458 (0.4508) data time 0.0006 (0.0019) model time 
0.4452 (0.4486) loss 2.7163 (2.5126) grad_norm 3.4691 (3.1711) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][540/625] eta 0:00:38 lr 0.000080 wd 0.0500 time 0.4431 (0.4510) data time 0.0008 (0.0019) model time 0.4422 (0.4489) loss 2.1742 (2.5120) grad_norm 3.1720 (3.1665) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][550/625] eta 0:00:33 lr 0.000080 wd 0.0500 time 0.4537 (0.4510) data time 0.0006 (0.0019) model time 0.4531 (0.4489) loss 1.8315 (2.5099) grad_norm 1.7088 (3.1474) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][560/625] eta 0:00:29 lr 0.000080 wd 0.0500 time 0.4553 (0.4509) data time 0.0006 (0.0019) model time 0.4546 (0.4488) loss 3.2397 (2.5123) grad_norm 3.0302 (3.1400) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][570/625] eta 0:00:24 lr 0.000080 wd 0.0500 time 0.4489 (0.4509) data time 0.0008 (0.0019) model time 0.4481 (0.4488) loss 2.4577 (2.5121) grad_norm 1.7535 (3.1323) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][580/625] eta 0:00:20 lr 0.000080 wd 0.0500 time 0.4486 (0.4509) data time 0.0006 (0.0018) model time 0.4479 (0.4488) loss 2.2633 (2.5098) grad_norm 2.3064 (3.1171) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][590/625] eta 0:00:15 lr 0.000080 wd 0.0500 time 0.4533 (0.4509) data time 0.0006 (0.0018) model time 0.4527 (0.4488) loss 1.7692 (2.5105) grad_norm 2.0734 (3.1168) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][600/625] eta 0:00:11 lr 0.000080 wd 0.0500 
time 0.4490 (0.4508) data time 0.0007 (0.0018) model time 0.4484 (0.4488) loss 2.5040 (2.5126) grad_norm 2.5374 (3.1164) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][610/625] eta 0:00:06 lr 0.000080 wd 0.0500 time 0.4421 (0.4508) data time 0.0006 (0.0018) model time 0.4415 (0.4487) loss 1.9531 (2.5118) grad_norm 2.4984 (3.1109) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [256/300][620/625] eta 0:00:02 lr 0.000080 wd 0.0500 time 0.4426 (0.4506) data time 0.0004 (0.0018) model time 0.4422 (0.4486) loss 2.5896 (2.5158) grad_norm 1.8523 (3.1035) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 256 training takes 0:04:41 [2024-08-11 09:08:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:08:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 09:08:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5210 (0.5210) Acc@1 89.258 (89.258) Acc@5 98.633 (98.633) Mem 16699MB [2024-08-11 09:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8530 (0.6294) Acc@1 80.029 (86.710) Acc@5 96.045 (97.714) Mem 16699MB [2024-08-11 09:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9351 (0.7468) Acc@1 78.711 (84.017) Acc@5 95.166 (96.605) Mem 16699MB [2024-08-11 09:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.763 Acc@5 96.583 [2024-08-11 09:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 09:08:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.829 (0.829) Loss 0.5044 (0.5044) Acc@1 89.404 (89.404) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 09:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.185) Loss 0.8096 (0.6071) Acc@1 81.152 (87.194) Acc@5 96.484 (97.852) Mem 16699MB [2024-08-11 09:08:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8857 (0.7191) Acc@1 79.590 (84.419) Acc@5 95.557 (96.849) Mem 16699MB [2024-08-11 09:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.145 Acc@5 96.813 [2024-08-11 09:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 09:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][0/625] eta 0:13:30 lr 0.000080 wd 0.0500 time 1.2963 (1.2963) data time 0.6065 (0.6065) model time 0.0000 (0.0000) loss 2.9091 (2.9091) grad_norm 6.8993 (6.8993) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][10/625] eta 0:05:23 lr 0.000080 wd 0.0500 time 0.4476 (0.5261) data time 
0.0007 (0.0559) model time 0.0000 (0.0000) loss 2.8259 (2.4648) grad_norm 2.7460 (3.6315) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][20/625] eta 0:04:55 lr 0.000080 wd 0.0500 time 0.4468 (0.4891) data time 0.0008 (0.0297) model time 0.0000 (0.0000) loss 2.9392 (2.3264) grad_norm 2.5027 (3.1911) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][30/625] eta 0:04:46 lr 0.000080 wd 0.0500 time 0.4433 (0.4817) data time 0.0008 (0.0204) model time 0.0000 (0.0000) loss 2.5304 (2.3449) grad_norm 5.0010 (3.0247) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][40/625] eta 0:04:36 lr 0.000080 wd 0.0500 time 0.4465 (0.4730) data time 0.0006 (0.0157) model time 0.0000 (0.0000) loss 2.5905 (2.3040) grad_norm 2.4556 (2.9435) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][50/625] eta 0:04:29 lr 0.000080 wd 0.0500 time 0.4462 (0.4680) data time 0.0008 (0.0128) model time 0.0000 (0.0000) loss 2.6316 (2.3401) grad_norm 3.4010 (2.9934) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][60/625] eta 0:04:22 lr 0.000080 wd 0.0500 time 0.4488 (0.4645) data time 0.0007 (0.0108) model time 0.4480 (0.4459) loss 2.3723 (2.3808) grad_norm 5.5189 (3.0350) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][70/625] eta 0:04:16 lr 0.000079 wd 0.0500 time 0.4452 (0.4620) data time 0.0008 (0.0094) model time 0.4443 (0.4460) loss 3.0989 (2.4144) grad_norm 3.1533 (3.0759) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][80/625] eta 0:04:10 
lr 0.000079 wd 0.0500 time 0.4454 (0.4602) data time 0.0009 (0.0084) model time 0.4445 (0.4461) loss 2.3651 (2.4181) grad_norm 2.0489 (3.0908) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][90/625] eta 0:04:06 lr 0.000079 wd 0.0500 time 0.4484 (0.4603) data time 0.0008 (0.0075) model time 0.4476 (0.4497) loss 2.9067 (2.4274) grad_norm 1.7665 (3.0285) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][100/625] eta 0:04:00 lr 0.000079 wd 0.0500 time 0.4450 (0.4588) data time 0.0009 (0.0069) model time 0.4440 (0.4487) loss 2.5375 (2.4489) grad_norm 2.1128 (2.9802) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][110/625] eta 0:03:55 lr 0.000079 wd 0.0500 time 0.4456 (0.4576) data time 0.0010 (0.0063) model time 0.4447 (0.4480) loss 2.9377 (2.4606) grad_norm 1.9995 (3.0065) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][120/625] eta 0:03:50 lr 0.000079 wd 0.0500 time 0.4471 (0.4568) data time 0.0007 (0.0059) model time 0.4464 (0.4478) loss 1.8975 (2.4558) grad_norm 27.2219 (3.1966) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][130/625] eta 0:03:45 lr 0.000079 wd 0.0500 time 0.4475 (0.4560) data time 0.0006 (0.0055) model time 0.4468 (0.4476) loss 2.6594 (2.4697) grad_norm 2.1971 (3.2610) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][140/625] eta 0:03:40 lr 0.000079 wd 0.0500 time 0.4476 (0.4554) data time 0.0009 (0.0052) model time 0.4467 (0.4475) loss 2.1265 (2.4772) grad_norm 3.0309 (3.2248) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:05 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [257/300][150/625] eta 0:03:36 lr 0.000079 wd 0.0500 time 0.4447 (0.4549) data time 0.0009 (0.0049) model time 0.4438 (0.4474) loss 2.5942 (2.4683) grad_norm 1.7649 (3.1611) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][160/625] eta 0:03:31 lr 0.000079 wd 0.0500 time 0.4445 (0.4544) data time 0.0009 (0.0046) model time 0.4436 (0.4473) loss 2.6474 (2.4677) grad_norm 2.7297 (3.1358) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][170/625] eta 0:03:26 lr 0.000079 wd 0.0500 time 0.4467 (0.4540) data time 0.0008 (0.0044) model time 0.4459 (0.4472) loss 2.7771 (2.4658) grad_norm 2.7881 (3.1310) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][180/625] eta 0:03:22 lr 0.000079 wd 0.0500 time 0.4462 (0.4548) data time 0.0009 (0.0042) model time 0.4453 (0.4488) loss 2.3280 (2.4532) grad_norm 2.1921 (3.1172) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][190/625] eta 0:03:17 lr 0.000079 wd 0.0500 time 0.4473 (0.4544) data time 0.0006 (0.0040) model time 0.4467 (0.4486) loss 2.7695 (2.4544) grad_norm 2.0558 (3.0930) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][200/625] eta 0:03:12 lr 0.000079 wd 0.0500 time 0.4482 (0.4540) data time 0.0008 (0.0039) model time 0.4474 (0.4484) loss 2.4944 (2.4488) grad_norm 3.2326 (3.0864) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][210/625] eta 0:03:08 lr 0.000079 wd 0.0500 time 0.4490 (0.4538) data time 0.0007 (0.0037) model time 0.4483 (0.4485) loss 2.8286 (2.4562) grad_norm 2.9850 (3.0856) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 09:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][220/625] eta 0:03:03 lr 0.000079 wd 0.0500 time 0.4480 (0.4536) data time 0.0007 (0.0036) model time 0.4474 (0.4484) loss 2.4962 (2.4478) grad_norm 2.5552 (3.0582) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][230/625] eta 0:02:59 lr 0.000079 wd 0.0500 time 0.4479 (0.4533) data time 0.0006 (0.0035) model time 0.4473 (0.4483) loss 2.9074 (2.4536) grad_norm 2.3848 (3.2674) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][240/625] eta 0:02:54 lr 0.000079 wd 0.0500 time 0.4466 (0.4537) data time 0.0008 (0.0034) model time 0.4458 (0.4490) loss 2.6683 (2.4535) grad_norm 2.1292 (3.2657) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][250/625] eta 0:02:50 lr 0.000079 wd 0.0500 time 0.4523 (0.4534) data time 0.0008 (0.0033) model time 0.4514 (0.4488) loss 2.7544 (2.4587) grad_norm 2.2343 (3.2429) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][260/625] eta 0:02:45 lr 0.000079 wd 0.0500 time 0.4585 (0.4532) data time 0.0007 (0.0032) model time 0.4578 (0.4487) loss 2.0625 (2.4625) grad_norm 4.1285 (3.2426) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:10:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][270/625] eta 0:02:40 lr 0.000078 wd 0.0500 time 0.4463 (0.4529) data time 0.0008 (0.0031) model time 0.4455 (0.4486) loss 1.7007 (2.4643) grad_norm 1.9383 (3.2247) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][280/625] eta 0:02:36 lr 0.000078 wd 0.0500 time 0.4503 (0.4528) data time 0.0006 (0.0030) model time 0.4497 (0.4486) loss 
2.4565 (2.4592) grad_norm 4.1376 (3.2064) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][290/625] eta 0:02:31 lr 0.000078 wd 0.0500 time 0.4486 (0.4528) data time 0.0006 (0.0029) model time 0.4479 (0.4487) loss 2.8567 (2.4661) grad_norm 2.8037 (3.1902) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][300/625] eta 0:02:27 lr 0.000078 wd 0.0500 time 0.4493 (0.4526) data time 0.0006 (0.0029) model time 0.4486 (0.4487) loss 2.9872 (2.4688) grad_norm 4.0905 (3.1706) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][310/625] eta 0:02:22 lr 0.000078 wd 0.0500 time 0.4433 (0.4525) data time 0.0009 (0.0028) model time 0.4424 (0.4486) loss 1.9893 (2.4741) grad_norm 3.0313 (3.1687) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][320/625] eta 0:02:17 lr 0.000078 wd 0.0500 time 0.4488 (0.4524) data time 0.0008 (0.0027) model time 0.4479 (0.4486) loss 2.3028 (2.4748) grad_norm 1.9020 (3.1410) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][330/625] eta 0:02:13 lr 0.000078 wd 0.0500 time 0.4460 (0.4522) data time 0.0008 (0.0027) model time 0.4452 (0.4485) loss 1.8265 (2.4720) grad_norm 2.2574 (3.1237) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][340/625] eta 0:02:08 lr 0.000078 wd 0.0500 time 0.4509 (0.4521) data time 0.0006 (0.0026) model time 0.4503 (0.4484) loss 1.7158 (2.4697) grad_norm 2.9452 (3.1125) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][350/625] eta 0:02:04 lr 0.000078 wd 0.0500 time 0.4518 (0.4520) 
data time 0.0008 (0.0026) model time 0.4509 (0.4484) loss 2.5379 (2.4705) grad_norm 2.6859 (3.1572) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][360/625] eta 0:01:59 lr 0.000078 wd 0.0500 time 0.4458 (0.4519) data time 0.0006 (0.0025) model time 0.4452 (0.4484) loss 2.7180 (2.4699) grad_norm 2.4638 (3.1412) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][370/625] eta 0:01:55 lr 0.000078 wd 0.0500 time 0.4491 (0.4518) data time 0.0008 (0.0025) model time 0.4483 (0.4484) loss 2.4918 (2.4713) grad_norm 2.3272 (3.1184) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][380/625] eta 0:01:50 lr 0.000078 wd 0.0500 time 0.4494 (0.4517) data time 0.0008 (0.0024) model time 0.4486 (0.4483) loss 2.6701 (2.4706) grad_norm 1.8672 (3.3990) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][390/625] eta 0:01:46 lr 0.000078 wd 0.0500 time 0.4498 (0.4516) data time 0.0009 (0.0024) model time 0.4490 (0.4483) loss 1.9888 (2.4752) grad_norm 4.1006 (3.3970) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:11:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][400/625] eta 0:01:41 lr 0.000078 wd 0.0500 time 0.4449 (0.4515) data time 0.0006 (0.0023) model time 0.4443 (0.4482) loss 2.6623 (2.4773) grad_norm 2.8183 (3.3694) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][410/625] eta 0:01:37 lr 0.000078 wd 0.0500 time 0.4439 (0.4513) data time 0.0008 (0.0023) model time 0.4431 (0.4481) loss 1.9307 (2.4800) grad_norm 2.2263 (3.3509) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[257/300][420/625] eta 0:01:32 lr 0.000078 wd 0.0500 time 0.4495 (0.4513) data time 0.0007 (0.0023) model time 0.4488 (0.4482) loss 2.0520 (2.4838) grad_norm 1.8770 (3.3243) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][430/625] eta 0:01:27 lr 0.000078 wd 0.0500 time 0.4508 (0.4512) data time 0.0008 (0.0022) model time 0.4500 (0.4482) loss 2.8384 (2.4829) grad_norm 2.4409 (3.3025) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][440/625] eta 0:01:23 lr 0.000078 wd 0.0500 time 0.4446 (0.4512) data time 0.0006 (0.0022) model time 0.4441 (0.4482) loss 2.8025 (2.4844) grad_norm 1.9365 (3.3230) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][450/625] eta 0:01:18 lr 0.000078 wd 0.0500 time 0.4463 (0.4511) data time 0.0006 (0.0022) model time 0.4457 (0.4482) loss 2.1742 (2.4846) grad_norm 1.6591 (3.3067) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][460/625] eta 0:01:14 lr 0.000078 wd 0.0500 time 0.4525 (0.4511) data time 0.0008 (0.0021) model time 0.4517 (0.4482) loss 2.8800 (2.4864) grad_norm 2.2177 (3.3432) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][470/625] eta 0:01:09 lr 0.000077 wd 0.0500 time 0.4459 (0.4510) data time 0.0006 (0.0021) model time 0.4454 (0.4481) loss 2.0941 (2.4906) grad_norm 2.1683 (3.3486) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][480/625] eta 0:01:05 lr 0.000077 wd 0.0500 time 0.4450 (0.4509) data time 0.0006 (0.0021) model time 0.4444 (0.4481) loss 2.7452 (2.4899) grad_norm 2.2130 (3.3513) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:38 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][490/625] eta 0:01:00 lr 0.000077 wd 0.0500 time 0.4733 (0.4509) data time 0.0010 (0.0021) model time 0.4724 (0.4481) loss 2.5963 (2.4848) grad_norm 2.1850 (3.3415) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][500/625] eta 0:00:56 lr 0.000077 wd 0.0500 time 0.4502 (0.4509) data time 0.0008 (0.0020) model time 0.4493 (0.4481) loss 2.8676 (2.4846) grad_norm 2.2240 (3.3260) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][510/625] eta 0:00:51 lr 0.000077 wd 0.0500 time 0.4494 (0.4513) data time 0.0006 (0.0020) model time 0.4488 (0.4486) loss 1.6183 (2.4823) grad_norm 2.0730 (3.3077) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][520/625] eta 0:00:47 lr 0.000077 wd 0.0500 time 0.4490 (0.4513) data time 0.0008 (0.0020) model time 0.4483 (0.4487) loss 2.9689 (2.4837) grad_norm 1.4883 (3.2954) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:12:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][530/625] eta 0:00:42 lr 0.000077 wd 0.0500 time 0.4473 (0.4513) data time 0.0006 (0.0020) model time 0.4467 (0.4487) loss 1.8519 (2.4855) grad_norm 2.4226 (3.4573) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][540/625] eta 0:00:38 lr 0.000077 wd 0.0500 time 0.4468 (0.4512) data time 0.0009 (0.0019) model time 0.4460 (0.4486) loss 2.8755 (2.4858) grad_norm 2.0564 (3.4819) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][550/625] eta 0:00:33 lr 0.000077 wd 0.0500 time 0.4457 (0.4511) data time 0.0006 (0.0019) model time 0.4450 (0.4486) loss 1.8159 (2.4798) grad_norm 2.6150 (3.4638) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][560/625] eta 0:00:29 lr 0.000077 wd 0.0500 time 0.4458 (0.4510) data time 0.0007 (0.0019) model time 0.4451 (0.4485) loss 2.6541 (2.4790) grad_norm 3.1744 (3.4463) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][570/625] eta 0:00:24 lr 0.000077 wd 0.0500 time 0.4476 (0.4513) data time 0.0007 (0.0019) model time 0.4468 (0.4488) loss 1.7539 (2.4784) grad_norm 2.6756 (3.4430) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][580/625] eta 0:00:20 lr 0.000077 wd 0.0500 time 0.4515 (0.4512) data time 0.0007 (0.0019) model time 0.4508 (0.4488) loss 2.7296 (2.4792) grad_norm 1.8203 (3.4323) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][590/625] eta 0:00:15 lr 0.000077 wd 0.0500 time 0.4443 (0.4511) data time 0.0006 (0.0018) model time 0.4438 (0.4487) loss 2.3498 (2.4804) grad_norm 3.1228 (3.4199) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][600/625] eta 0:00:11 lr 0.000077 wd 0.0500 time 0.4463 (0.4511) data time 0.0008 (0.0018) model time 0.4454 (0.4487) loss 1.8599 (2.4800) grad_norm 2.6569 (3.4062) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][610/625] eta 0:00:06 lr 0.000077 wd 0.0500 time 0.4395 (0.4510) data time 0.0007 (0.0018) model time 0.4389 (0.4486) loss 2.7060 (2.4814) grad_norm 3.3898 (3.3954) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [257/300][620/625] eta 0:00:02 lr 0.000077 wd 0.0500 time 0.4422 (0.4508) data time 0.0006 (0.0018) model time 
0.4416 (0.4485) loss 2.5381 (2.4807) grad_norm 2.0351 (3.3860) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 257 training takes 0:04:41 [2024-08-11 09:13:38 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:13:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 09:13:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.473 (0.473) Loss 0.5142 (0.5142) Acc@1 88.916 (88.916) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 09:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8442 (0.6203) Acc@1 80.273 (86.754) Acc@5 96.094 (97.754) Mem 16699MB [2024-08-11 09:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9067 (0.7413) Acc@1 79.785 (83.989) Acc@5 95.557 (96.689) Mem 16699MB [2024-08-11 09:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.701 Acc@5 96.673 [2024-08-11 09:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:13:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.801 (0.801) Loss 0.5049 (0.5049) Acc@1 89.600 (89.600) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 09:13:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.183) Loss 0.8110 (0.6080) Acc@1 81.152 (87.234) Acc@5 96.533 (97.834) Mem 16699MB [2024-08-11 09:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.150) Loss 0.8877 (0.7204) Acc@1 79.541 (84.461) Acc@5 95.459 (96.842) Mem 16699MB [2024-08-11 09:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.185 Acc@5 96.809 [2024-08-11 09:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 09:13:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][0/625] eta 0:13:04 lr 0.000077 wd 0.0500 time 1.2548 (1.2548) data time 0.4547 (0.4547) model time 0.0000 (0.0000) loss 2.7987 (2.7987) grad_norm 2.1979 (2.1979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][10/625] eta 0:05:21 lr 0.000077 wd 0.0500 time 0.4475 (0.5224) data time 0.0008 (0.0421) model time 0.0000 (0.0000) loss 2.5690 (2.6076) grad_norm 1.7804 (2.3832) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:13:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][20/625] eta 0:04:55 lr 0.000077 wd 0.0500 time 0.4474 (0.4880) data time 0.0009 (0.0224) model time 0.0000 (0.0000) loss 2.9460 (2.6112) grad_norm 4.7095 (2.4653) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][30/625] eta 0:04:43 lr 0.000077 wd 0.0500 time 0.4504 (0.4757) data time 0.0008 (0.0154) model time 0.0000 (0.0000) loss 2.4027 (2.6097) grad_norm 2.1353 (2.6573) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][40/625] eta 0:04:34 lr 0.000077 wd 0.0500 time 0.4468 (0.4693) data time 0.0008 (0.0119) model time 0.0000 (0.0000) loss 2.7587 (2.6527) grad_norm 3.1191 (2.5680) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][50/625] eta 0:04:27 lr 0.000077 wd 0.0500 time 0.4465 (0.4652) data time 0.0008 (0.0097) model time 0.0000 (0.0000) loss 1.9378 (2.5553) grad_norm 3.9211 (2.6558) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][60/625] eta 0:04:21 lr 0.000076 wd 0.0500 time 0.4490 (0.4624) data time 0.0007 (0.0083) model 
time 0.4483 (0.4473) loss 2.4011 (2.5473) grad_norm 7.0707 (3.1774) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][70/625] eta 0:04:15 lr 0.000076 wd 0.0500 time 0.4451 (0.4604) data time 0.0006 (0.0072) model time 0.4445 (0.4474) loss 2.6615 (2.5437) grad_norm 9.3140 (3.1838) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][80/625] eta 0:04:10 lr 0.000076 wd 0.0500 time 0.4558 (0.4591) data time 0.0007 (0.0064) model time 0.4550 (0.4478) loss 2.7928 (2.5598) grad_norm 2.1859 (3.2176) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][90/625] eta 0:04:05 lr 0.000076 wd 0.0500 time 0.4504 (0.4582) data time 0.0008 (0.0058) model time 0.4496 (0.4485) loss 1.7674 (2.5318) grad_norm 2.5607 (3.2269) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][100/625] eta 0:04:00 lr 0.000076 wd 0.0500 time 0.4456 (0.4574) data time 0.0007 (0.0053) model time 0.4450 (0.4487) loss 2.0987 (2.5291) grad_norm 1.6988 (3.1843) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][110/625] eta 0:03:55 lr 0.000076 wd 0.0500 time 0.4468 (0.4568) data time 0.0009 (0.0049) model time 0.4459 (0.4487) loss 2.5442 (2.5239) grad_norm 2.5795 (3.1338) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][120/625] eta 0:03:50 lr 0.000076 wd 0.0500 time 0.4453 (0.4560) data time 0.0006 (0.0046) model time 0.4447 (0.4484) loss 2.7193 (2.5284) grad_norm 2.7199 (3.1874) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][130/625] eta 0:03:46 lr 0.000076 wd 
0.0500 time 0.4473 (0.4577) data time 0.0006 (0.0043) model time 0.4466 (0.4521) loss 2.3218 (2.5173) grad_norm 2.6545 (3.1887) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][140/625] eta 0:03:41 lr 0.000076 wd 0.0500 time 0.4559 (0.4571) data time 0.0006 (0.0040) model time 0.4553 (0.4517) loss 2.9092 (2.5240) grad_norm 2.6842 (3.1392) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:14:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][150/625] eta 0:03:36 lr 0.000076 wd 0.0500 time 0.4521 (0.4565) data time 0.0008 (0.0038) model time 0.4513 (0.4513) loss 2.6070 (2.5131) grad_norm 2.0037 (3.1179) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][160/625] eta 0:03:32 lr 0.000076 wd 0.0500 time 0.4432 (0.4559) data time 0.0009 (0.0036) model time 0.4423 (0.4508) loss 2.5207 (2.5019) grad_norm 3.0019 (3.1156) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][170/625] eta 0:03:27 lr 0.000076 wd 0.0500 time 0.4482 (0.4555) data time 0.0006 (0.0035) model time 0.4475 (0.4505) loss 2.1750 (2.5147) grad_norm 7.0744 (3.1547) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][180/625] eta 0:03:22 lr 0.000076 wd 0.0500 time 0.4499 (0.4551) data time 0.0009 (0.0033) model time 0.4490 (0.4503) loss 2.7801 (2.5193) grad_norm 2.8803 (3.2394) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][190/625] eta 0:03:17 lr 0.000076 wd 0.0500 time 0.4495 (0.4548) data time 0.0009 (0.0032) model time 0.4486 (0.4502) loss 2.6536 (2.5141) grad_norm 2.8892 (3.3582) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [258/300][200/625] eta 0:03:13 lr 0.000076 wd 0.0500 time 0.4486 (0.4544) data time 0.0006 (0.0031) model time 0.4480 (0.4499) loss 3.2998 (2.5202) grad_norm 1.9516 (3.3237) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][210/625] eta 0:03:08 lr 0.000076 wd 0.0500 time 0.4471 (0.4541) data time 0.0007 (0.0030) model time 0.4464 (0.4497) loss 2.3629 (2.5187) grad_norm 3.0318 (3.2978) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][220/625] eta 0:03:03 lr 0.000076 wd 0.0500 time 0.4516 (0.4537) data time 0.0007 (0.0029) model time 0.4509 (0.4495) loss 2.5527 (2.5171) grad_norm 2.6773 (3.2512) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][230/625] eta 0:02:59 lr 0.000076 wd 0.0500 time 0.4567 (0.4536) data time 0.0009 (0.0028) model time 0.4558 (0.4495) loss 2.4806 (2.5181) grad_norm 4.1204 (3.2605) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][240/625] eta 0:02:54 lr 0.000076 wd 0.0500 time 0.4501 (0.4535) data time 0.0006 (0.0027) model time 0.4495 (0.4495) loss 2.9419 (2.5137) grad_norm 6.6841 (3.2594) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][250/625] eta 0:02:49 lr 0.000076 wd 0.0500 time 0.4464 (0.4532) data time 0.0009 (0.0026) model time 0.4455 (0.4493) loss 2.8004 (2.5129) grad_norm 2.2774 (3.3192) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][260/625] eta 0:02:45 lr 0.000075 wd 0.0500 time 0.4461 (0.4530) data time 0.0008 (0.0026) model time 0.4452 (0.4492) loss 3.0801 (2.5159) grad_norm 2.2003 (3.2909) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 09:15:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][270/625] eta 0:02:40 lr 0.000075 wd 0.0500 time 0.4417 (0.4532) data time 0.0007 (0.0025) model time 0.4410 (0.4496) loss 3.0199 (2.5148) grad_norm 2.8193 (3.2611) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][280/625] eta 0:02:36 lr 0.000075 wd 0.0500 time 0.4466 (0.4530) data time 0.0008 (0.0024) model time 0.4457 (0.4494) loss 2.7374 (2.5140) grad_norm 2.5090 (3.2848) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:15:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][290/625] eta 0:02:31 lr 0.000075 wd 0.0500 time 0.4462 (0.4527) data time 0.0006 (0.0024) model time 0.4456 (0.4492) loss 2.6750 (2.5155) grad_norm 2.6729 (3.2459) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][300/625] eta 0:02:27 lr 0.000075 wd 0.0500 time 0.4460 (0.4525) data time 0.0006 (0.0023) model time 0.4454 (0.4491) loss 1.7895 (2.5074) grad_norm 2.1381 (3.2369) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][310/625] eta 0:02:22 lr 0.000075 wd 0.0500 time 0.4433 (0.4524) data time 0.0009 (0.0023) model time 0.4425 (0.4490) loss 2.7792 (2.5112) grad_norm 3.1060 (3.2343) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][320/625] eta 0:02:18 lr 0.000075 wd 0.0500 time 0.4516 (0.4528) data time 0.0006 (0.0022) model time 0.4509 (0.4497) loss 3.2067 (2.5162) grad_norm 2.3283 (3.2818) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][330/625] eta 0:02:13 lr 0.000075 wd 0.0500 time 0.4465 (0.4527) data time 0.0006 (0.0022) model time 0.4459 (0.4495) loss 2.5124 (2.5207) 
grad_norm 2.0618 (3.3116) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][340/625] eta 0:02:08 lr 0.000075 wd 0.0500 time 0.4453 (0.4524) data time 0.0007 (0.0022) model time 0.4446 (0.4494) loss 2.9325 (2.5207) grad_norm 2.3296 (3.2903) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][350/625] eta 0:02:04 lr 0.000075 wd 0.0500 time 0.4474 (0.4522) data time 0.0008 (0.0021) model time 0.4466 (0.4492) loss 2.2622 (2.5214) grad_norm 2.7647 (3.2807) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][360/625] eta 0:01:59 lr 0.000075 wd 0.0500 time 0.4434 (0.4520) data time 0.0008 (0.0021) model time 0.4426 (0.4490) loss 2.6850 (2.5284) grad_norm 2.6302 (3.2501) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][370/625] eta 0:01:55 lr 0.000075 wd 0.0500 time 0.4470 (0.4519) data time 0.0006 (0.0021) model time 0.4464 (0.4489) loss 1.7530 (2.5280) grad_norm 2.5911 (3.2621) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][380/625] eta 0:01:50 lr 0.000075 wd 0.0500 time 0.4467 (0.4517) data time 0.0008 (0.0020) model time 0.4460 (0.4488) loss 2.0798 (2.5257) grad_norm 3.6531 (3.2452) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][390/625] eta 0:01:46 lr 0.000075 wd 0.0500 time 0.4518 (0.4517) data time 0.0008 (0.0020) model time 0.4510 (0.4488) loss 2.7270 (2.5232) grad_norm 2.2161 (3.2671) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][400/625] eta 0:01:41 lr 0.000075 wd 0.0500 time 0.4504 (0.4516) data time 
0.0007 (0.0020) model time 0.4498 (0.4488) loss 2.8796 (2.5262) grad_norm 2.1597 (3.2519) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][410/625] eta 0:01:37 lr 0.000075 wd 0.0500 time 0.4464 (0.4514) data time 0.0007 (0.0019) model time 0.4457 (0.4487) loss 1.5875 (2.5237) grad_norm 2.4649 (3.2358) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:16:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][420/625] eta 0:01:32 lr 0.000075 wd 0.0500 time 0.4432 (0.4513) data time 0.0006 (0.0019) model time 0.4425 (0.4485) loss 2.6682 (2.5225) grad_norm 1.9817 (3.3105) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][430/625] eta 0:01:27 lr 0.000075 wd 0.0500 time 0.4427 (0.4512) data time 0.0007 (0.0019) model time 0.4421 (0.4484) loss 2.0797 (2.5227) grad_norm 2.3318 (3.3239) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][440/625] eta 0:01:23 lr 0.000075 wd 0.0500 time 0.4491 (0.4510) data time 0.0009 (0.0019) model time 0.4483 (0.4484) loss 2.9179 (2.5259) grad_norm 2.6868 (3.3588) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][450/625] eta 0:01:18 lr 0.000075 wd 0.0500 time 0.4474 (0.4510) data time 0.0008 (0.0018) model time 0.4466 (0.4484) loss 2.2672 (2.5264) grad_norm 5.6115 (3.3584) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][460/625] eta 0:01:14 lr 0.000075 wd 0.0500 time 0.4497 (0.4510) data time 0.0008 (0.0018) model time 0.4489 (0.4483) loss 2.7269 (2.5254) grad_norm 2.3334 (3.3414) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][470/625] eta 
0:01:09 lr 0.000074 wd 0.0500 time 0.4480 (0.4509) data time 0.0007 (0.0018) model time 0.4474 (0.4483) loss 2.8459 (2.5247) grad_norm 2.7864 (3.3945) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][480/625] eta 0:01:05 lr 0.000074 wd 0.0500 time 0.4453 (0.4509) data time 0.0008 (0.0018) model time 0.4445 (0.4483) loss 2.4462 (2.5202) grad_norm 2.3170 (3.3708) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][490/625] eta 0:01:00 lr 0.000074 wd 0.0500 time 0.4440 (0.4508) data time 0.0007 (0.0018) model time 0.4433 (0.4483) loss 1.8873 (2.5199) grad_norm 3.3770 (3.3543) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][500/625] eta 0:00:56 lr 0.000074 wd 0.0500 time 0.4462 (0.4507) data time 0.0008 (0.0017) model time 0.4454 (0.4483) loss 2.8705 (2.5187) grad_norm 1.9346 (3.3434) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][510/625] eta 0:00:51 lr 0.000074 wd 0.0500 time 0.4458 (0.4506) data time 0.0009 (0.0017) model time 0.4449 (0.4482) loss 2.0976 (2.5196) grad_norm 2.3898 (3.4362) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][520/625] eta 0:00:47 lr 0.000074 wd 0.0500 time 0.4644 (0.4506) data time 0.0010 (0.0017) model time 0.4634 (0.4482) loss 2.9393 (2.5202) grad_norm 5.4626 (3.4268) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][530/625] eta 0:00:42 lr 0.000074 wd 0.0500 time 0.4469 (0.4506) data time 0.0007 (0.0017) model time 0.4462 (0.4482) loss 2.8514 (2.5234) grad_norm 2.5646 (3.4122) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:51 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [258/300][540/625] eta 0:00:38 lr 0.000074 wd 0.0500 time 0.4483 (0.4512) data time 0.0006 (0.0017) model time 0.4476 (0.4489) loss 1.6189 (2.5180) grad_norm 1.8931 (3.3887) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][550/625] eta 0:00:33 lr 0.000074 wd 0.0500 time 0.4487 (0.4512) data time 0.0007 (0.0017) model time 0.4481 (0.4489) loss 2.6734 (2.5156) grad_norm 2.2361 (3.3684) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][560/625] eta 0:00:29 lr 0.000074 wd 0.0500 time 0.4454 (0.4511) data time 0.0006 (0.0016) model time 0.4448 (0.4488) loss 2.9018 (2.5171) grad_norm 2.7789 (3.3498) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][570/625] eta 0:00:24 lr 0.000074 wd 0.0500 time 0.4462 (0.4510) data time 0.0008 (0.0016) model time 0.4453 (0.4488) loss 2.4812 (2.5146) grad_norm 2.1092 (3.3432) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][580/625] eta 0:00:20 lr 0.000074 wd 0.0500 time 0.4470 (0.4510) data time 0.0007 (0.0016) model time 0.4464 (0.4487) loss 3.1126 (2.5178) grad_norm 3.1883 (3.3435) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][590/625] eta 0:00:15 lr 0.000074 wd 0.0500 time 0.4499 (0.4509) data time 0.0007 (0.0016) model time 0.4493 (0.4487) loss 2.1297 (2.5197) grad_norm 3.0050 (3.3721) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][600/625] eta 0:00:11 lr 0.000074 wd 0.0500 time 0.4590 (0.4509) data time 0.0006 (0.0016) model time 0.4584 (0.4487) loss 2.9829 (2.5187) grad_norm 2.4634 (3.3588) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 09:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][610/625] eta 0:00:06 lr 0.000074 wd 0.0500 time 0.4451 (0.4511) data time 0.0004 (0.0016) model time 0.4447 (0.4490) loss 1.8779 (2.5154) grad_norm 2.3937 (3.3508) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [258/300][620/625] eta 0:00:02 lr 0.000074 wd 0.0500 time 0.4429 (0.4510) data time 0.0004 (0.0016) model time 0.4425 (0.4489) loss 2.5438 (2.5165) grad_norm 2.4260 (3.3408) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 258 training takes 0:04:41 [2024-08-11 09:18:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:18:30 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 09:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.5181 (0.5181) Acc@1 89.014 (89.014) Acc@5 98.828 (98.828) Mem 16699MB [2024-08-11 09:18:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8428 (0.6207) Acc@1 80.469 (86.865) Acc@5 96.094 (97.749) Mem 16699MB [2024-08-11 09:18:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9189 (0.7445) Acc@1 79.492 (84.001) Acc@5 95.264 (96.656) Mem 16699MB [2024-08-11 09:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.737 Acc@5 96.635 [2024-08-11 09:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:18:34 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.818 (0.818) Loss 0.5054 (0.5054) Acc@1 89.551 (89.551) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 09:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.185) Loss 0.8130 (0.6086) Acc@1 81.152 (87.220) Acc@5 96.436 (97.829) Mem 16699MB [2024-08-11 09:18:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.8882 (0.7215) Acc@1 79.443 (84.435) Acc@5 95.508 (96.845) Mem 16699MB [2024-08-11 09:18:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.151 Acc@5 96.809 [2024-08-11 09:18:37 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 09:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][0/625] eta 0:12:44 lr 0.000074 wd 0.0500 time 1.2238 (1.2238) data time 0.5654 (0.5654) model time 0.0000 (0.0000) loss 2.4193 (2.4193) grad_norm 2.6927 (2.6927) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][10/625] eta 0:05:18 lr 0.000074 wd 0.0500 time 0.4486 (0.5185) data time 
0.0009 (0.0522) model time 0.0000 (0.0000) loss 2.6322 (2.4082) grad_norm 2.6348 (3.7646) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][20/625] eta 0:04:52 lr 0.000074 wd 0.0500 time 0.4441 (0.4838) data time 0.0009 (0.0278) model time 0.0000 (0.0000) loss 1.7906 (2.4249) grad_norm 3.3699 (3.3190) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][30/625] eta 0:04:41 lr 0.000074 wd 0.0500 time 0.4447 (0.4723) data time 0.0007 (0.0191) model time 0.0000 (0.0000) loss 3.2097 (2.4456) grad_norm 4.3775 (3.0938) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][40/625] eta 0:04:32 lr 0.000074 wd 0.0500 time 0.4468 (0.4665) data time 0.0007 (0.0146) model time 0.0000 (0.0000) loss 2.7160 (2.4607) grad_norm 2.4639 (3.4292) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][50/625] eta 0:04:26 lr 0.000074 wd 0.0500 time 0.4514 (0.4631) data time 0.0009 (0.0119) model time 0.0000 (0.0000) loss 1.9193 (2.4482) grad_norm 2.1446 (3.3611) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][60/625] eta 0:04:20 lr 0.000073 wd 0.0500 time 0.4477 (0.4606) data time 0.0006 (0.0101) model time 0.4471 (0.4472) loss 1.7269 (2.4303) grad_norm 2.3903 (3.2336) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][70/625] eta 0:04:14 lr 0.000073 wd 0.0500 time 0.4432 (0.4586) data time 0.0007 (0.0088) model time 0.4425 (0.4463) loss 2.9075 (2.4384) grad_norm 2.2277 (3.1172) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][80/625] eta 0:04:09 
lr 0.000073 wd 0.0500 time 0.4490 (0.4573) data time 0.0006 (0.0078) model time 0.4483 (0.4465) loss 2.4628 (2.4365) grad_norm 2.6718 (3.0403) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][90/625] eta 0:04:06 lr 0.000073 wd 0.0500 time 0.4489 (0.4599) data time 0.0006 (0.0070) model time 0.4483 (0.4551) loss 2.4638 (2.4403) grad_norm 2.3337 (3.0294) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][100/625] eta 0:04:00 lr 0.000073 wd 0.0500 time 0.4520 (0.4589) data time 0.0008 (0.0064) model time 0.4512 (0.4538) loss 2.1220 (2.4215) grad_norm 3.2378 (2.9921) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][110/625] eta 0:03:55 lr 0.000073 wd 0.0500 time 0.4465 (0.4581) data time 0.0009 (0.0059) model time 0.4456 (0.4529) loss 3.0168 (2.4360) grad_norm 3.4815 (2.9911) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][120/625] eta 0:03:51 lr 0.000073 wd 0.0500 time 0.4471 (0.4575) data time 0.0008 (0.0055) model time 0.4463 (0.4525) loss 2.7097 (2.4513) grad_norm 3.3556 (3.0702) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:19:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][130/625] eta 0:03:46 lr 0.000073 wd 0.0500 time 0.4501 (0.4570) data time 0.0006 (0.0051) model time 0.4495 (0.4522) loss 1.6405 (2.4607) grad_norm 2.6149 (3.0505) loss_scale 128.0000 (68.3969) mem 16699MB [2024-08-11 09:19:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][140/625] eta 0:03:41 lr 0.000073 wd 0.0500 time 0.4546 (0.4564) data time 0.0009 (0.0048) model time 0.4536 (0.4517) loss 1.7593 (2.4394) grad_norm 2.5360 (3.1899) loss_scale 128.0000 (72.6241) mem 16699MB [2024-08-11 09:19:46 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [259/300][150/625] eta 0:03:36 lr 0.000073 wd 0.0500 time 0.4443 (0.4566) data time 0.0007 (0.0046) model time 0.4436 (0.4524) loss 3.0737 (2.4488) grad_norm 2.5807 (3.2607) loss_scale 128.0000 (76.2914) mem 16699MB [2024-08-11 09:19:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][160/625] eta 0:03:32 lr 0.000073 wd 0.0500 time 0.4442 (0.4561) data time 0.0007 (0.0043) model time 0.4435 (0.4520) loss 2.5672 (2.4449) grad_norm 2.5890 (3.2889) loss_scale 128.0000 (79.5031) mem 16699MB [2024-08-11 09:19:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][170/625] eta 0:03:27 lr 0.000073 wd 0.0500 time 0.4519 (0.4556) data time 0.0008 (0.0041) model time 0.4511 (0.4516) loss 2.5058 (2.4485) grad_norm 1.5671 (3.2631) loss_scale 128.0000 (82.3392) mem 16699MB [2024-08-11 09:20:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][180/625] eta 0:03:22 lr 0.000073 wd 0.0500 time 0.4529 (0.4554) data time 0.0008 (0.0040) model time 0.4521 (0.4515) loss 2.4402 (2.4484) grad_norm 4.4378 (3.2603) loss_scale 128.0000 (84.8619) mem 16699MB [2024-08-11 09:20:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][190/625] eta 0:03:17 lr 0.000073 wd 0.0500 time 0.4437 (0.4551) data time 0.0007 (0.0038) model time 0.4430 (0.4514) loss 2.6860 (2.4482) grad_norm 3.8847 (3.2613) loss_scale 128.0000 (87.1204) mem 16699MB [2024-08-11 09:20:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][200/625] eta 0:03:13 lr 0.000073 wd 0.0500 time 0.4479 (0.4549) data time 0.0008 (0.0036) model time 0.4471 (0.4513) loss 1.8955 (2.4391) grad_norm 2.1608 (3.2242) loss_scale 128.0000 (89.1542) mem 16699MB [2024-08-11 09:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][210/625] eta 0:03:08 lr 0.000073 wd 0.0500 time 0.4484 (0.4546) data time 0.0006 (0.0036) model time 0.4478 (0.4510) loss 2.6266 (2.4476) grad_norm 2.2455 (3.1973) loss_scale 
128.0000 (90.9953) mem 16699MB [2024-08-11 09:20:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][220/625] eta 0:03:04 lr 0.000073 wd 0.0500 time 0.4479 (0.4544) data time 0.0009 (0.0034) model time 0.4470 (0.4509) loss 2.8095 (2.4457) grad_norm 1.7836 (3.1906) loss_scale 128.0000 (92.6697) mem 16699MB [2024-08-11 09:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][230/625] eta 0:02:59 lr 0.000073 wd 0.0500 time 0.4458 (0.4551) data time 0.0009 (0.0033) model time 0.4449 (0.4519) loss 2.5179 (2.4381) grad_norm 2.9802 (3.1911) loss_scale 128.0000 (94.1991) mem 16699MB [2024-08-11 09:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][240/625] eta 0:02:55 lr 0.000073 wd 0.0500 time 0.4488 (0.4555) data time 0.0006 (0.0032) model time 0.4482 (0.4526) loss 1.8234 (2.4360) grad_norm 2.1168 (3.1664) loss_scale 128.0000 (95.6017) mem 16699MB [2024-08-11 09:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][250/625] eta 0:02:50 lr 0.000073 wd 0.0500 time 0.4479 (0.4553) data time 0.0006 (0.0031) model time 0.4473 (0.4524) loss 1.7633 (2.4347) grad_norm 2.8163 (3.1331) loss_scale 128.0000 (96.8924) mem 16699MB [2024-08-11 09:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][260/625] eta 0:02:46 lr 0.000073 wd 0.0500 time 0.4500 (0.4550) data time 0.0008 (0.0030) model time 0.4493 (0.4522) loss 2.9319 (2.4328) grad_norm 3.1099 (3.1217) loss_scale 128.0000 (98.0843) mem 16699MB [2024-08-11 09:20:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][270/625] eta 0:02:41 lr 0.000072 wd 0.0500 time 0.4514 (0.4548) data time 0.0008 (0.0030) model time 0.4506 (0.4520) loss 2.5740 (2.4406) grad_norm 2.3655 (3.1051) loss_scale 128.0000 (99.1882) mem 16699MB [2024-08-11 09:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][280/625] eta 0:02:36 lr 0.000072 wd 0.0500 time 0.4476 (0.4547) data time 0.0006 (0.0029) model time 0.4470 
(0.4519) loss 2.0805 (2.4404) grad_norm 2.6458 (3.1044) loss_scale 128.0000 (100.2135) mem 16699MB [2024-08-11 09:20:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][290/625] eta 0:02:32 lr 0.000072 wd 0.0500 time 0.4473 (0.4545) data time 0.0009 (0.0028) model time 0.4464 (0.4517) loss 2.5029 (2.4426) grad_norm 2.3108 (3.0977) loss_scale 128.0000 (101.1684) mem 16699MB [2024-08-11 09:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][300/625] eta 0:02:27 lr 0.000072 wd 0.0500 time 0.4487 (0.4542) data time 0.0009 (0.0028) model time 0.4478 (0.4515) loss 2.7711 (2.4421) grad_norm 2.7926 (3.0856) loss_scale 128.0000 (102.0598) mem 16699MB [2024-08-11 09:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][310/625] eta 0:02:23 lr 0.000072 wd 0.0500 time 0.4494 (0.4540) data time 0.0006 (0.0027) model time 0.4488 (0.4513) loss 2.0843 (2.4438) grad_norm 2.8066 (3.1048) loss_scale 128.0000 (102.8939) mem 16699MB [2024-08-11 09:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][320/625] eta 0:02:18 lr 0.000072 wd 0.0500 time 0.4471 (0.4538) data time 0.0009 (0.0026) model time 0.4462 (0.4512) loss 2.7775 (2.4504) grad_norm 2.2395 (3.0790) loss_scale 128.0000 (103.6760) mem 16699MB [2024-08-11 09:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][330/625] eta 0:02:13 lr 0.000072 wd 0.0500 time 0.4498 (0.4537) data time 0.0008 (0.0026) model time 0.4489 (0.4511) loss 2.6072 (2.4556) grad_norm 2.2064 (3.0528) loss_scale 128.0000 (104.4109) mem 16699MB [2024-08-11 09:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][340/625] eta 0:02:09 lr 0.000072 wd 0.0500 time 0.4455 (0.4535) data time 0.0006 (0.0025) model time 0.4449 (0.4510) loss 1.6564 (2.4505) grad_norm 2.8392 (3.0605) loss_scale 128.0000 (105.1026) mem 16699MB [2024-08-11 09:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][350/625] eta 0:02:04 lr 0.000072 wd 
0.0500 time 0.4492 (0.4534) data time 0.0006 (0.0025) model time 0.4486 (0.4509) loss 2.1713 (2.4553) grad_norm 3.8165 (3.0643) loss_scale 128.0000 (105.7550) mem 16699MB [2024-08-11 09:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][360/625] eta 0:02:00 lr 0.000072 wd 0.0500 time 0.4513 (0.4532) data time 0.0006 (0.0024) model time 0.4506 (0.4508) loss 2.8136 (2.4559) grad_norm 2.3948 (3.0523) loss_scale 128.0000 (106.3712) mem 16699MB [2024-08-11 09:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][370/625] eta 0:01:55 lr 0.000072 wd 0.0500 time 0.4456 (0.4534) data time 0.0008 (0.0024) model time 0.4448 (0.4510) loss 3.2470 (2.4619) grad_norm 1.9998 (3.0398) loss_scale 128.0000 (106.9542) mem 16699MB [2024-08-11 09:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][380/625] eta 0:01:51 lr 0.000072 wd 0.0500 time 0.4558 (0.4533) data time 0.0008 (0.0024) model time 0.4550 (0.4509) loss 1.8321 (2.4627) grad_norm 2.7610 (3.0347) loss_scale 128.0000 (107.5066) mem 16699MB [2024-08-11 09:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][390/625] eta 0:01:46 lr 0.000072 wd 0.0500 time 0.4486 (0.4531) data time 0.0006 (0.0023) model time 0.4479 (0.4508) loss 2.3187 (2.4611) grad_norm 2.2335 (3.0249) loss_scale 128.0000 (108.0307) mem 16699MB [2024-08-11 09:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][400/625] eta 0:01:41 lr 0.000072 wd 0.0500 time 0.4503 (0.4530) data time 0.0008 (0.0023) model time 0.4495 (0.4507) loss 2.2416 (2.4662) grad_norm 2.3568 (3.0177) loss_scale 128.0000 (108.5287) mem 16699MB [2024-08-11 09:21:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][410/625] eta 0:01:37 lr 0.000072 wd 0.0500 time 0.4486 (0.4529) data time 0.0009 (0.0022) model time 0.4478 (0.4506) loss 2.7872 (2.4638) grad_norm 2.4058 (3.0267) loss_scale 128.0000 (109.0024) mem 16699MB [2024-08-11 09:21:48 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [259/300][420/625] eta 0:01:32 lr 0.000072 wd 0.0500 time 0.4446 (0.4528) data time 0.0008 (0.0022) model time 0.4437 (0.4505) loss 2.3507 (2.4632) grad_norm 2.6783 (3.0167) loss_scale 128.0000 (109.4537) mem 16699MB [2024-08-11 09:21:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][430/625] eta 0:01:28 lr 0.000072 wd 0.0500 time 0.4477 (0.4527) data time 0.0006 (0.0022) model time 0.4471 (0.4504) loss 1.8172 (2.4556) grad_norm 3.8529 (3.0110) loss_scale 128.0000 (109.8840) mem 16699MB [2024-08-11 09:21:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][440/625] eta 0:01:23 lr 0.000072 wd 0.0500 time 0.4474 (0.4526) data time 0.0009 (0.0021) model time 0.4465 (0.4503) loss 2.3144 (2.4512) grad_norm 2.5062 (3.0003) loss_scale 128.0000 (110.2948) mem 16699MB [2024-08-11 09:22:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][450/625] eta 0:01:19 lr 0.000072 wd 0.0500 time 0.4471 (0.4525) data time 0.0007 (0.0021) model time 0.4465 (0.4502) loss 2.8148 (2.4537) grad_norm 1.8830 (2.9852) loss_scale 128.0000 (110.6874) mem 16699MB [2024-08-11 09:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][460/625] eta 0:01:14 lr 0.000072 wd 0.0500 time 0.4449 (0.4523) data time 0.0007 (0.0021) model time 0.4442 (0.4501) loss 2.7477 (2.4585) grad_norm 2.3621 (2.9731) loss_scale 128.0000 (111.0629) mem 16699MB [2024-08-11 09:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][470/625] eta 0:01:10 lr 0.000072 wd 0.0500 time 0.4466 (0.4523) data time 0.0009 (0.0021) model time 0.4457 (0.4501) loss 2.7416 (2.4627) grad_norm 1.9427 (2.9642) loss_scale 128.0000 (111.4225) mem 16699MB [2024-08-11 09:22:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][480/625] eta 0:01:05 lr 0.000071 wd 0.0500 time 0.4477 (0.4522) data time 0.0010 (0.0020) model time 0.4467 (0.4500) loss 2.6189 (2.4655) grad_norm 1.9012 (2.9547) loss_scale 
128.0000 (111.7672) mem 16699MB [2024-08-11 09:22:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][490/625] eta 0:01:01 lr 0.000071 wd 0.0500 time 0.4489 (0.4521) data time 0.0008 (0.0020) model time 0.4481 (0.4499) loss 2.6594 (2.4677) grad_norm 23.8473 (3.0102) loss_scale 128.0000 (112.0978) mem 16699MB [2024-08-11 09:22:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][500/625] eta 0:00:56 lr 0.000071 wd 0.0500 time 0.4499 (0.4520) data time 0.0008 (0.0020) model time 0.4491 (0.4499) loss 2.8520 (2.4656) grad_norm 2.7420 (3.0061) loss_scale 128.0000 (112.4152) mem 16699MB [2024-08-11 09:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][510/625] eta 0:00:52 lr 0.000071 wd 0.0500 time 0.3880 (0.4522) data time 0.0009 (0.0020) model time 0.3871 (0.4501) loss 2.8514 (2.4642) grad_norm 2.3804 (3.0444) loss_scale 128.0000 (112.7202) mem 16699MB [2024-08-11 09:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][520/625] eta 0:00:47 lr 0.000071 wd 0.0500 time 0.4453 (0.4521) data time 0.0007 (0.0020) model time 0.4446 (0.4500) loss 2.4231 (2.4670) grad_norm 1.5082 (3.0256) loss_scale 128.0000 (113.0134) mem 16699MB [2024-08-11 09:22:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][530/625] eta 0:00:42 lr 0.000071 wd 0.0500 time 0.4471 (0.4520) data time 0.0009 (0.0019) model time 0.4462 (0.4500) loss 1.6042 (2.4631) grad_norm 2.2604 (3.0243) loss_scale 128.0000 (113.2957) mem 16699MB [2024-08-11 09:22:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][540/625] eta 0:00:38 lr 0.000071 wd 0.0500 time 0.4450 (0.4519) data time 0.0007 (0.0019) model time 0.4444 (0.4499) loss 2.5268 (2.4645) grad_norm 2.1709 (3.0091) loss_scale 128.0000 (113.5675) mem 16699MB [2024-08-11 09:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][550/625] eta 0:00:33 lr 0.000071 wd 0.0500 time 0.4478 (0.4519) data time 0.0008 (0.0019) model time 
0.4470 (0.4499) loss 2.4373 (2.4634) grad_norm 2.0809 (3.0289) loss_scale 128.0000 (113.8294) mem 16699MB [2024-08-11 09:22:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][560/625] eta 0:00:29 lr 0.000071 wd 0.0500 time 0.4489 (0.4518) data time 0.0009 (0.0019) model time 0.4481 (0.4498) loss 2.1154 (2.4633) grad_norm 2.2917 (3.0225) loss_scale 128.0000 (114.0820) mem 16699MB [2024-08-11 09:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][570/625] eta 0:00:24 lr 0.000071 wd 0.0500 time 0.4487 (0.4521) data time 0.0007 (0.0019) model time 0.4481 (0.4502) loss 2.0933 (2.4650) grad_norm 1.8220 (3.0100) loss_scale 128.0000 (114.3257) mem 16699MB [2024-08-11 09:23:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][580/625] eta 0:00:20 lr 0.000071 wd 0.0500 time 0.4474 (0.4521) data time 0.0007 (0.0018) model time 0.4467 (0.4501) loss 1.5468 (2.4599) grad_norm 2.8237 (3.0012) loss_scale 128.0000 (114.5611) mem 16699MB [2024-08-11 09:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][590/625] eta 0:00:15 lr 0.000071 wd 0.0500 time 0.4588 (0.4520) data time 0.0006 (0.0018) model time 0.4582 (0.4501) loss 2.4840 (2.4598) grad_norm 2.5027 (2.9987) loss_scale 128.0000 (114.7885) mem 16699MB [2024-08-11 09:23:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][600/625] eta 0:00:11 lr 0.000071 wd 0.0500 time 0.4479 (0.4520) data time 0.0008 (0.0018) model time 0.4471 (0.4501) loss 2.5532 (2.4605) grad_norm 1.8963 (3.0027) loss_scale 128.0000 (115.0083) mem 16699MB [2024-08-11 09:23:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][610/625] eta 0:00:06 lr 0.000071 wd 0.0500 time 0.4468 (0.4519) data time 0.0006 (0.0018) model time 0.4462 (0.4500) loss 2.6672 (2.4601) grad_norm 2.2174 (3.0122) loss_scale 128.0000 (115.2209) mem 16699MB [2024-08-11 09:23:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [259/300][620/625] eta 0:00:02 lr 
0.000071 wd 0.0500 time 0.4440 (0.4518) data time 0.0004 (0.0018) model time 0.4435 (0.4499) loss 1.6558 (2.4554) grad_norm 2.5294 (3.0103) loss_scale 128.0000 (115.4267) mem 16699MB [2024-08-11 09:23:20 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 259 training takes 0:04:42 [2024-08-11 09:23:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:23:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 09:23:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.481 (0.481) Loss 0.5288 (0.5288) Acc@1 89.160 (89.160) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 09:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8462 (0.6319) Acc@1 80.908 (86.741) Acc@5 96.338 (97.718) Mem 16699MB [2024-08-11 09:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9180 (0.7499) Acc@1 79.004 (84.031) Acc@5 95.215 (96.598) Mem 16699MB [2024-08-11 09:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.729 Acc@5 96.551 [2024-08-11 09:23:24 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:23:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.860 (0.860) Loss 0.5059 (0.5059) Acc@1 89.600 (89.600) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:23:26 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.188) Loss 0.8135 (0.6091) Acc@1 81.055 (87.220) Acc@5 96.436 (97.843) Mem 16699MB [2024-08-11 09:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.153) Loss 0.8892 (0.7225) Acc@1 79.395 (84.405) Acc@5 95.557 (96.835) Mem 16699MB [2024-08-11 09:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.129 Acc@5 
96.807 [2024-08-11 09:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 09:23:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][0/625] eta 0:12:48 lr 0.000071 wd 0.0500 time 1.2300 (1.2300) data time 0.4956 (0.4956) model time 0.0000 (0.0000) loss 2.2395 (2.2395) grad_norm 3.0231 (3.0231) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:23:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][10/625] eta 0:05:20 lr 0.000071 wd 0.0500 time 0.4460 (0.5207) data time 0.0006 (0.0458) model time 0.0000 (0.0000) loss 1.6158 (2.3029) grad_norm 2.0136 (2.5515) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:23:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][20/625] eta 0:04:53 lr 0.000071 wd 0.0500 time 0.4443 (0.4854) data time 0.0010 (0.0244) model time 0.0000 (0.0000) loss 2.5764 (2.4699) grad_norm 3.2731 (2.6459) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:23:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][30/625] eta 0:04:41 lr 0.000071 wd 0.0500 time 0.4425 (0.4725) data time 0.0006 (0.0168) model time 0.0000 (0.0000) loss 1.9283 (2.4052) grad_norm 2.4515 (3.1897) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:23:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][40/625] eta 0:04:32 lr 0.000071 wd 0.0500 time 0.4460 (0.4659) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 3.1229 (2.3969) grad_norm 2.2667 (3.2531) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:23:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][50/625] eta 0:04:25 lr 0.000071 wd 0.0500 time 0.4458 (0.4623) data time 0.0010 (0.0105) model time 0.0000 (0.0000) loss 1.9240 (2.3803) grad_norm 2.4363 (3.1845) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:23:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[260/300][60/625] eta 0:04:19 lr 0.000071 wd 0.0500 time 0.4499 (0.4598) data time 0.0008 (0.0090) model time 0.4491 (0.4463) loss 2.2195 (2.4081) grad_norm 4.3533 (3.1320) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][70/625] eta 0:04:14 lr 0.000071 wd 0.0500 time 0.4499 (0.4580) data time 0.0009 (0.0078) model time 0.4490 (0.4464) loss 2.8693 (2.4127) grad_norm 2.9815 (3.0192) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][80/625] eta 0:04:09 lr 0.000070 wd 0.0500 time 0.4468 (0.4569) data time 0.0009 (0.0069) model time 0.4459 (0.4469) loss 2.7726 (2.4199) grad_norm 3.7882 (3.0308) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][90/625] eta 0:04:03 lr 0.000070 wd 0.0500 time 0.4479 (0.4561) data time 0.0008 (0.0063) model time 0.4471 (0.4473) loss 2.6512 (2.4045) grad_norm 1.9255 (2.9511) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][100/625] eta 0:04:00 lr 0.000070 wd 0.0500 time 0.6532 (0.4573) data time 0.0008 (0.0057) model time 0.6523 (0.4514) loss 2.7772 (2.3973) grad_norm 2.0348 (2.9032) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][110/625] eta 0:03:54 lr 0.000070 wd 0.0500 time 0.4486 (0.4559) data time 0.0006 (0.0053) model time 0.4479 (0.4497) loss 1.9422 (2.3765) grad_norm 1.9993 (3.2514) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][120/625] eta 0:03:49 lr 0.000070 wd 0.0500 time 0.4505 (0.4554) data time 0.0006 (0.0050) model time 0.4499 (0.4494) loss 2.6407 (2.3935) grad_norm 1.9723 (3.1837) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:24:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][130/625] eta 0:03:45 lr 0.000070 wd 0.0500 time 0.4531 (0.4549) data time 0.0006 (0.0047) model time 0.4525 (0.4493) loss 2.0517 (2.4002) grad_norm 2.1291 (3.1776) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][140/625] eta 0:03:40 lr 0.000070 wd 0.0500 time 0.4509 (0.4548) data time 0.0007 (0.0044) model time 0.4501 (0.4497) loss 2.8974 (2.4084) grad_norm 2.4437 (3.1928) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][150/625] eta 0:03:35 lr 0.000070 wd 0.0500 time 0.4470 (0.4544) data time 0.0006 (0.0042) model time 0.4464 (0.4495) loss 2.4738 (2.4172) grad_norm 2.2361 (3.1431) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][160/625] eta 0:03:31 lr 0.000070 wd 0.0500 time 0.4484 (0.4541) data time 0.0006 (0.0040) model time 0.4477 (0.4494) loss 3.1874 (2.4118) grad_norm 2.3502 (3.1173) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][170/625] eta 0:03:26 lr 0.000070 wd 0.0500 time 0.4506 (0.4538) data time 0.0008 (0.0038) model time 0.4498 (0.4493) loss 1.5463 (2.4194) grad_norm 2.0252 (3.1125) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][180/625] eta 0:03:21 lr 0.000070 wd 0.0500 time 0.4485 (0.4535) data time 0.0008 (0.0036) model time 0.4477 (0.4491) loss 2.5505 (2.4202) grad_norm 2.1557 (3.0926) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][190/625] eta 0:03:17 lr 0.000070 wd 0.0500 time 0.4472 (0.4532) data time 0.0009 (0.0035) model time 0.4463 (0.4489) loss 2.4220 
(2.4249) grad_norm 1.9944 (3.1131) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:24:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][200/625] eta 0:03:12 lr 0.000070 wd 0.0500 time 0.4507 (0.4529) data time 0.0006 (0.0033) model time 0.4501 (0.4488) loss 1.8016 (2.4261) grad_norm 1.5790 (3.0974) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][210/625] eta 0:03:07 lr 0.000070 wd 0.0500 time 0.4475 (0.4527) data time 0.0006 (0.0032) model time 0.4468 (0.4488) loss 2.2887 (2.4345) grad_norm 2.7332 (3.0751) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][220/625] eta 0:03:03 lr 0.000070 wd 0.0500 time 0.4474 (0.4525) data time 0.0006 (0.0031) model time 0.4467 (0.4487) loss 2.4869 (2.4396) grad_norm 2.5471 (3.1037) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][230/625] eta 0:02:58 lr 0.000070 wd 0.0500 time 0.4490 (0.4525) data time 0.0007 (0.0030) model time 0.4483 (0.4488) loss 2.6559 (2.4433) grad_norm 2.1828 (3.1574) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][240/625] eta 0:02:54 lr 0.000070 wd 0.0500 time 0.4467 (0.4532) data time 0.0006 (0.0029) model time 0.4461 (0.4498) loss 3.0357 (2.4439) grad_norm 2.3629 (3.1499) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][250/625] eta 0:02:49 lr 0.000070 wd 0.0500 time 0.4484 (0.4530) data time 0.0009 (0.0028) model time 0.4475 (0.4497) loss 2.5005 (2.4449) grad_norm 1.7492 (3.1286) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][260/625] eta 0:02:45 lr 0.000070 wd 0.0500 time 0.4451 
(0.4528) data time 0.0008 (0.0028) model time 0.4443 (0.4496) loss 2.8037 (2.4563) grad_norm 2.5327 (3.2028) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][270/625] eta 0:02:40 lr 0.000070 wd 0.0500 time 0.4515 (0.4527) data time 0.0009 (0.0027) model time 0.4507 (0.4496) loss 1.8092 (2.4563) grad_norm 2.1285 (3.1841) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][280/625] eta 0:02:36 lr 0.000070 wd 0.0500 time 0.4507 (0.4525) data time 0.0007 (0.0026) model time 0.4500 (0.4495) loss 2.6452 (2.4592) grad_norm 2.5177 (3.2654) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][290/625] eta 0:02:31 lr 0.000069 wd 0.0500 time 0.4503 (0.4525) data time 0.0007 (0.0026) model time 0.4496 (0.4495) loss 2.3571 (2.4528) grad_norm 2.7962 (3.2499) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][300/625] eta 0:02:27 lr 0.000069 wd 0.0500 time 0.4459 (0.4524) data time 0.0009 (0.0025) model time 0.4451 (0.4495) loss 2.7023 (2.4569) grad_norm 2.2204 (3.2588) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][310/625] eta 0:02:22 lr 0.000069 wd 0.0500 time 0.4480 (0.4522) data time 0.0006 (0.0025) model time 0.4474 (0.4494) loss 3.1957 (2.4668) grad_norm 2.6340 (3.2776) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][320/625] eta 0:02:17 lr 0.000069 wd 0.0500 time 0.4492 (0.4521) data time 0.0006 (0.0024) model time 0.4485 (0.4493) loss 2.9010 (2.4695) grad_norm 1.9341 (3.2509) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:25:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [260/300][330/625] eta 0:02:13 lr 0.000069 wd 0.0500 time 0.4467 (0.4523) data time 0.0006 (0.0024) model time 0.4461 (0.4496) loss 1.9360 (2.4631) grad_norm 2.2295 (3.2332) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][340/625] eta 0:02:08 lr 0.000069 wd 0.0500 time 0.4493 (0.4521) data time 0.0008 (0.0023) model time 0.4484 (0.4494) loss 2.7304 (2.4629) grad_norm 2.0369 (3.2065) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][350/625] eta 0:02:04 lr 0.000069 wd 0.0500 time 0.4492 (0.4520) data time 0.0007 (0.0023) model time 0.4485 (0.4494) loss 2.9198 (2.4578) grad_norm 3.3757 (3.2206) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][360/625] eta 0:01:59 lr 0.000069 wd 0.0500 time 0.4465 (0.4519) data time 0.0007 (0.0022) model time 0.4459 (0.4493) loss 2.9652 (2.4618) grad_norm 2.2126 (3.2072) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][370/625] eta 0:01:55 lr 0.000069 wd 0.0500 time 0.4470 (0.4518) data time 0.0009 (0.0022) model time 0.4461 (0.4492) loss 1.8247 (2.4659) grad_norm 2.2386 (3.1844) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][380/625] eta 0:01:50 lr 0.000069 wd 0.0500 time 0.4446 (0.4516) data time 0.0006 (0.0022) model time 0.4440 (0.4491) loss 1.8240 (2.4602) grad_norm 2.7536 (3.1696) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][390/625] eta 0:01:46 lr 0.000069 wd 0.0500 time 0.4501 (0.4515) data time 0.0008 (0.0021) model time 0.4493 (0.4490) loss 2.3801 (2.4643) grad_norm 3.6685 (3.1514) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:26:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][400/625] eta 0:01:41 lr 0.000069 wd 0.0500 time 0.4448 (0.4514) data time 0.0006 (0.0021) model time 0.4441 (0.4490) loss 1.8651 (2.4685) grad_norm 2.5253 (3.1388) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][410/625] eta 0:01:37 lr 0.000069 wd 0.0500 time 0.4466 (0.4513) data time 0.0007 (0.0021) model time 0.4460 (0.4489) loss 2.5613 (2.4653) grad_norm 2.5353 (3.1319) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][420/625] eta 0:01:32 lr 0.000069 wd 0.0500 time 0.4510 (0.4513) data time 0.0006 (0.0020) model time 0.4504 (0.4489) loss 2.8974 (2.4699) grad_norm 7.8177 (3.1322) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][430/625] eta 0:01:28 lr 0.000069 wd 0.0500 time 0.4587 (0.4516) data time 0.0008 (0.0020) model time 0.4579 (0.4493) loss 1.7613 (2.4704) grad_norm 1.7202 (3.1225) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][440/625] eta 0:01:23 lr 0.000069 wd 0.0500 time 0.4513 (0.4516) data time 0.0007 (0.0020) model time 0.4506 (0.4493) loss 2.5171 (2.4714) grad_norm 2.9354 (3.1054) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][450/625] eta 0:01:19 lr 0.000069 wd 0.0500 time 0.4461 (0.4515) data time 0.0009 (0.0020) model time 0.4453 (0.4493) loss 1.5318 (2.4717) grad_norm 1.7558 (3.1043) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:26:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][460/625] eta 0:01:14 lr 0.000069 wd 0.0500 time 0.4462 (0.4514) data time 0.0009 (0.0019) model time 0.4453 (0.4492) loss 1.8021 
(2.4695) grad_norm 12.9011 (3.1202) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][470/625] eta 0:01:09 lr 0.000069 wd 0.0500 time 0.4445 (0.4516) data time 0.0008 (0.0019) model time 0.4437 (0.4494) loss 3.0357 (2.4737) grad_norm 2.0502 (3.1195) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][480/625] eta 0:01:05 lr 0.000069 wd 0.0500 time 0.4463 (0.4515) data time 0.0010 (0.0019) model time 0.4454 (0.4494) loss 2.5926 (2.4773) grad_norm 11.4632 (3.1194) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][490/625] eta 0:01:00 lr 0.000069 wd 0.0500 time 0.4510 (0.4515) data time 0.0010 (0.0019) model time 0.4500 (0.4494) loss 2.7584 (2.4795) grad_norm 3.6610 (3.1125) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][500/625] eta 0:00:56 lr 0.000069 wd 0.0500 time 0.4455 (0.4514) data time 0.0009 (0.0019) model time 0.4446 (0.4493) loss 2.2606 (2.4802) grad_norm 74.0407 (3.2877) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][510/625] eta 0:00:51 lr 0.000068 wd 0.0500 time 0.4538 (0.4514) data time 0.0009 (0.0018) model time 0.4529 (0.4493) loss 2.7372 (2.4793) grad_norm 2.3591 (3.2812) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][520/625] eta 0:00:47 lr 0.000068 wd 0.0500 time 0.4517 (0.4514) data time 0.0009 (0.0018) model time 0.4508 (0.4493) loss 2.3585 (2.4752) grad_norm 40.9557 (3.3677) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][530/625] eta 0:00:42 lr 0.000068 wd 0.0500 time 
0.4466 (0.4513) data time 0.0008 (0.0018) model time 0.4458 (0.4493) loss 1.9512 (2.4710) grad_norm 5.3138 (3.3666) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][540/625] eta 0:00:38 lr 0.000068 wd 0.0500 time 0.4506 (0.4512) data time 0.0006 (0.0018) model time 0.4500 (0.4492) loss 1.9986 (2.4715) grad_norm 2.8913 (3.3642) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][550/625] eta 0:00:33 lr 0.000068 wd 0.0500 time 0.4512 (0.4512) data time 0.0008 (0.0018) model time 0.4504 (0.4492) loss 2.7186 (2.4729) grad_norm 5.6663 (3.4214) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][560/625] eta 0:00:29 lr 0.000068 wd 0.0500 time 0.4472 (0.4511) data time 0.0008 (0.0017) model time 0.4464 (0.4491) loss 3.0044 (2.4736) grad_norm 1.9186 (3.4042) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][570/625] eta 0:00:24 lr 0.000068 wd 0.0500 time 0.4503 (0.4511) data time 0.0008 (0.0017) model time 0.4495 (0.4491) loss 1.8047 (2.4725) grad_norm 2.0298 (3.3825) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][580/625] eta 0:00:20 lr 0.000068 wd 0.0500 time 0.4521 (0.4511) data time 0.0008 (0.0017) model time 0.4513 (0.4491) loss 1.5493 (2.4738) grad_norm 1.6647 (3.3613) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][590/625] eta 0:00:15 lr 0.000068 wd 0.0500 time 0.4450 (0.4510) data time 0.0006 (0.0017) model time 0.4443 (0.4491) loss 3.0944 (2.4743) grad_norm 2.5350 (3.3502) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:27:59 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [260/300][600/625] eta 0:00:11 lr 0.000068 wd 0.0500 time 0.4480 (0.4509) data time 0.0006 (0.0017) model time 0.4474 (0.4490) loss 2.8713 (2.4774) grad_norm 2.7449 (3.3352) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][610/625] eta 0:00:06 lr 0.000068 wd 0.0500 time 0.4455 (0.4509) data time 0.0006 (0.0017) model time 0.4449 (0.4490) loss 2.3448 (2.4725) grad_norm 2.5130 (3.3225) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [260/300][620/625] eta 0:00:02 lr 0.000068 wd 0.0500 time 0.4452 (0.4508) data time 0.0004 (0.0017) model time 0.4448 (0.4489) loss 2.1953 (2.4721) grad_norm 2.1347 (3.3280) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 260 training takes 0:04:41 [2024-08-11 09:28:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:28:11 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 09:28:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5234 (0.5234) Acc@1 89.404 (89.404) Acc@5 98.877 (98.877) Mem 16699MB
[2024-08-11 09:28:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8379 (0.6280) Acc@1 81.104 (86.941) Acc@5 96.436 (97.767) Mem 16699MB
[2024-08-11 09:28:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9229 (0.7446) Acc@1 79.248 (84.198) Acc@5 95.410 (96.673) Mem 16699MB
[2024-08-11 09:28:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.903 Acc@5 96.635
[2024-08-11 09:28:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9%
[2024-08-11 09:28:15 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.90%
[2024-08-11 09:28:15 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving......
[2024-08-11 09:28:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!!
[2024-08-11 09:28:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5063 (0.5063) Acc@1 89.600 (89.600) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:28:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8145 (0.6096) Acc@1 81.104 (87.220) Acc@5 96.436 (97.829) Mem 16699MB [2024-08-11 09:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.8911 (0.7232) Acc@1 79.346 (84.398) Acc@5 95.654 (96.822) Mem 16699MB [2024-08-11 09:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.115 Acc@5 96.789 [2024-08-11 09:28:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 09:28:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][0/625] eta 0:12:44 lr 0.000068 wd 0.0500 time 1.2230 (1.2230) data time 0.6782 (0.6782) model time 0.0000 (0.0000) loss 2.8760 (2.8760) grad_norm 1.9102 (1.9102) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][10/625] eta 0:05:20 lr 0.000068 wd 0.0500 time 0.4717 (0.5213) data time 0.0006 (0.0624) model time 0.0000 (0.0000) loss 2.6292 (2.4349) grad_norm 2.3154 (2.4969) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][20/625] eta 0:04:58 lr 0.000068 wd 0.0500 time 0.4491 (0.4941) data time 0.0007 (0.0331) model time 0.0000 (0.0000) loss 2.7204 (2.4439) grad_norm 4.4085 (2.8186) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][30/625] eta 0:04:44 lr 0.000068 wd 0.0500 time 0.4476 (0.4788) data time 0.0006 (0.0227) model time 0.0000 (0.0000) loss 2.3475 (2.3938) grad_norm 3.5799 (2.9343) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:39 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [261/300][40/625] eta 0:04:37 lr 0.000068 wd 0.0500 time 0.4481 (0.4742) data time 0.0006 (0.0173) model time 0.0000 (0.0000) loss 1.7892 (2.4097) grad_norm 2.2932 (2.8160) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][50/625] eta 0:04:29 lr 0.000068 wd 0.0500 time 0.4471 (0.4689) data time 0.0009 (0.0141) model time 0.0000 (0.0000) loss 2.2597 (2.4340) grad_norm 2.3660 (2.7092) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][60/625] eta 0:04:22 lr 0.000068 wd 0.0500 time 0.4472 (0.4651) data time 0.0009 (0.0119) model time 0.4464 (0.4453) loss 2.9280 (2.4286) grad_norm 2.5318 (2.6792) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][70/625] eta 0:04:16 lr 0.000068 wd 0.0500 time 0.4449 (0.4626) data time 0.0009 (0.0103) model time 0.4440 (0.4458) loss 2.7427 (2.4255) grad_norm 1.6843 (2.7158) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:28:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][80/625] eta 0:04:11 lr 0.000068 wd 0.0500 time 0.4531 (0.4609) data time 0.0009 (0.0092) model time 0.4522 (0.4465) loss 1.6963 (2.4299) grad_norm 10.6644 (2.7865) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][90/625] eta 0:04:05 lr 0.000068 wd 0.0500 time 0.4485 (0.4597) data time 0.0009 (0.0083) model time 0.4476 (0.4472) loss 2.5665 (2.4464) grad_norm 2.9114 (2.8951) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][100/625] eta 0:04:00 lr 0.000068 wd 0.0500 time 0.4454 (0.4585) data time 0.0007 (0.0075) model time 0.4447 (0.4471) loss 2.4743 (2.4420) grad_norm 2.3202 (2.8620) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][110/625] eta 0:03:55 lr 0.000067 wd 0.0500 time 0.4462 (0.4576) data time 0.0007 (0.0069) model time 0.4456 (0.4473) loss 2.5751 (2.4386) grad_norm 2.7161 (2.8675) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][120/625] eta 0:03:50 lr 0.000067 wd 0.0500 time 0.4475 (0.4569) data time 0.0009 (0.0064) model time 0.4466 (0.4474) loss 1.9939 (2.4476) grad_norm 2.7385 (2.8454) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][130/625] eta 0:03:45 lr 0.000067 wd 0.0500 time 0.4477 (0.4561) data time 0.0007 (0.0060) model time 0.4471 (0.4472) loss 2.6193 (2.4602) grad_norm 2.7526 (2.8743) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][140/625] eta 0:03:40 lr 0.000067 wd 0.0500 time 0.4471 (0.4554) data time 0.0007 (0.0056) model time 0.4464 (0.4470) loss 2.6604 (2.4788) grad_norm 2.3236 (2.9252) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][150/625] eta 0:03:36 lr 0.000067 wd 0.0500 time 0.4436 (0.4549) data time 0.0008 (0.0053) model time 0.4428 (0.4469) loss 2.7413 (2.4928) grad_norm 2.3132 (2.9899) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][160/625] eta 0:03:31 lr 0.000067 wd 0.0500 time 0.4579 (0.4547) data time 0.0009 (0.0050) model time 0.4569 (0.4473) loss 2.4102 (2.4760) grad_norm 1.9006 (3.2324) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][170/625] eta 0:03:26 lr 0.000067 wd 0.0500 time 0.4528 (0.4545) data time 0.0009 (0.0048) model time 
0.4520 (0.4476) loss 1.6838 (2.4701) grad_norm 2.8696 (3.1866) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][180/625] eta 0:03:22 lr 0.000067 wd 0.0500 time 0.4479 (0.4542) data time 0.0008 (0.0046) model time 0.4471 (0.4477) loss 2.9365 (2.4779) grad_norm 2.5289 (3.1347) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][190/625] eta 0:03:17 lr 0.000067 wd 0.0500 time 0.4460 (0.4539) data time 0.0008 (0.0044) model time 0.4451 (0.4476) loss 2.8846 (2.4777) grad_norm 2.4227 (3.1056) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][200/625] eta 0:03:12 lr 0.000067 wd 0.0500 time 0.4532 (0.4536) data time 0.0006 (0.0042) model time 0.4526 (0.4476) loss 2.9982 (2.4827) grad_norm 3.1566 (3.1908) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][210/625] eta 0:03:08 lr 0.000067 wd 0.0500 time 0.4469 (0.4533) data time 0.0008 (0.0040) model time 0.4461 (0.4476) loss 2.8481 (2.4900) grad_norm 2.5306 (3.1546) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][220/625] eta 0:03:03 lr 0.000067 wd 0.0500 time 0.4505 (0.4532) data time 0.0006 (0.0039) model time 0.4498 (0.4477) loss 2.4936 (2.4881) grad_norm 1.8891 (3.1560) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][230/625] eta 0:02:58 lr 0.000067 wd 0.0500 time 0.4537 (0.4531) data time 0.0006 (0.0037) model time 0.4532 (0.4478) loss 2.6105 (2.4905) grad_norm 3.3592 (3.1920) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][240/625] eta 0:02:54 lr 
0.000067 wd 0.0500 time 0.4521 (0.4538) data time 0.0008 (0.0036) model time 0.4513 (0.4490) loss 2.1360 (2.4829) grad_norm 3.0339 (3.1726) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][250/625] eta 0:02:50 lr 0.000067 wd 0.0500 time 0.4503 (0.4537) data time 0.0008 (0.0035) model time 0.4495 (0.4490) loss 2.3639 (2.4895) grad_norm 1.9523 (3.1459) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][260/625] eta 0:02:45 lr 0.000067 wd 0.0500 time 0.4523 (0.4535) data time 0.0006 (0.0034) model time 0.4517 (0.4490) loss 2.1782 (2.4948) grad_norm 2.3898 (3.2310) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][270/625] eta 0:02:40 lr 0.000067 wd 0.0500 time 0.4463 (0.4533) data time 0.0008 (0.0033) model time 0.4455 (0.4488) loss 2.2137 (2.4949) grad_norm 3.2116 (3.2188) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][280/625] eta 0:02:36 lr 0.000067 wd 0.0500 time 0.4549 (0.4531) data time 0.0007 (0.0032) model time 0.4542 (0.4488) loss 2.5299 (2.4971) grad_norm 2.6271 (3.1951) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][290/625] eta 0:02:31 lr 0.000067 wd 0.0500 time 0.4499 (0.4530) data time 0.0008 (0.0031) model time 0.4491 (0.4488) loss 2.1426 (2.4996) grad_norm 3.6725 (3.1768) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][300/625] eta 0:02:27 lr 0.000067 wd 0.0500 time 0.4503 (0.4529) data time 0.0009 (0.0031) model time 0.4494 (0.4489) loss 2.6115 (2.5020) grad_norm 1.9635 (3.1576) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:40 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [261/300][310/625] eta 0:02:22 lr 0.000067 wd 0.0500 time 0.4466 (0.4528) data time 0.0007 (0.0030) model time 0.4459 (0.4489) loss 3.4674 (2.5108) grad_norm 2.5289 (3.3177) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][320/625] eta 0:02:18 lr 0.000067 wd 0.0500 time 0.4591 (0.4528) data time 0.0009 (0.0029) model time 0.4582 (0.4489) loss 2.7761 (2.5139) grad_norm 2.6260 (3.5034) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][330/625] eta 0:02:13 lr 0.000066 wd 0.0500 time 0.4487 (0.4527) data time 0.0006 (0.0029) model time 0.4481 (0.4489) loss 2.8864 (2.5178) grad_norm 2.0749 (3.5185) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][340/625] eta 0:02:08 lr 0.000066 wd 0.0500 time 0.4475 (0.4525) data time 0.0009 (0.0028) model time 0.4467 (0.4488) loss 2.8568 (2.5174) grad_norm 2.4005 (3.5109) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:30:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][350/625] eta 0:02:04 lr 0.000066 wd 0.0500 time 0.4514 (0.4525) data time 0.0006 (0.0028) model time 0.4508 (0.4489) loss 2.1204 (2.5095) grad_norm 1.8063 (3.4852) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][360/625] eta 0:01:59 lr 0.000066 wd 0.0500 time 0.4480 (0.4523) data time 0.0008 (0.0027) model time 0.4471 (0.4488) loss 2.3807 (2.5063) grad_norm 2.2144 (3.4571) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][370/625] eta 0:01:55 lr 0.000066 wd 0.0500 time 0.4504 (0.4527) data time 0.0006 (0.0027) model time 0.4498 (0.4492) loss 2.3182 (2.5056) grad_norm 2.4211 (3.4238) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][380/625] eta 0:01:50 lr 0.000066 wd 0.0500 time 0.4504 (0.4526) data time 0.0006 (0.0026) model time 0.4498 (0.4492) loss 2.5805 (2.5128) grad_norm 3.1764 (3.4021) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][390/625] eta 0:01:46 lr 0.000066 wd 0.0500 time 0.4520 (0.4531) data time 0.0006 (0.0026) model time 0.4514 (0.4498) loss 1.9288 (2.5119) grad_norm 2.5269 (3.3814) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][400/625] eta 0:01:41 lr 0.000066 wd 0.0500 time 0.4477 (0.4530) data time 0.0006 (0.0026) model time 0.4471 (0.4498) loss 3.1983 (2.5156) grad_norm 2.0211 (3.3606) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][410/625] eta 0:01:37 lr 0.000066 wd 0.0500 time 0.4435 (0.4528) data time 0.0007 (0.0025) model time 0.4428 (0.4497) loss 1.3883 (2.5081) grad_norm 1.7561 (3.4131) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][420/625] eta 0:01:32 lr 0.000066 wd 0.0500 time 0.4443 (0.4527) data time 0.0009 (0.0025) model time 0.4434 (0.4496) loss 2.8798 (2.5079) grad_norm 1.8907 (3.4040) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][430/625] eta 0:01:28 lr 0.000066 wd 0.0500 time 0.4492 (0.4526) data time 0.0007 (0.0024) model time 0.4484 (0.4495) loss 2.3543 (2.5066) grad_norm 3.0377 (3.4205) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][440/625] eta 0:01:23 lr 0.000066 wd 0.0500 time 0.4530 (0.4525) data time 0.0006 (0.0024) model time 
0.4524 (0.4496) loss 2.9141 (2.5048) grad_norm 11.1426 (3.4224) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][450/625] eta 0:01:19 lr 0.000066 wd 0.0500 time 0.4487 (0.4524) data time 0.0006 (0.0024) model time 0.4482 (0.4495) loss 2.4981 (2.5059) grad_norm 2.2880 (3.4079) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][460/625] eta 0:01:14 lr 0.000066 wd 0.0500 time 0.4479 (0.4523) data time 0.0008 (0.0023) model time 0.4470 (0.4494) loss 1.9733 (2.5030) grad_norm 1.9515 (3.3876) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][470/625] eta 0:01:10 lr 0.000066 wd 0.0500 time 0.4509 (0.4523) data time 0.0006 (0.0023) model time 0.4503 (0.4494) loss 2.3259 (2.5018) grad_norm 1.7624 (3.3642) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:31:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][480/625] eta 0:01:05 lr 0.000066 wd 0.0500 time 0.4476 (0.4522) data time 0.0006 (0.0023) model time 0.4470 (0.4493) loss 2.3825 (2.5038) grad_norm 2.0798 (3.3519) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][490/625] eta 0:01:01 lr 0.000066 wd 0.0500 time 0.4446 (0.4520) data time 0.0006 (0.0022) model time 0.4440 (0.4492) loss 3.0238 (2.5035) grad_norm 2.0662 (3.3386) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][500/625] eta 0:00:56 lr 0.000066 wd 0.0500 time 0.4426 (0.4519) data time 0.0008 (0.0022) model time 0.4417 (0.4492) loss 2.7169 (2.5051) grad_norm 2.2276 (3.3200) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][510/625] eta 0:00:51 lr 
0.000066 wd 0.0500 time 0.4552 (0.4519) data time 0.0006 (0.0022) model time 0.4546 (0.4491) loss 2.6534 (2.5097) grad_norm 2.4681 (3.3308) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][520/625] eta 0:00:47 lr 0.000066 wd 0.0500 time 0.4465 (0.4518) data time 0.0009 (0.0021) model time 0.4456 (0.4491) loss 2.8206 (2.5085) grad_norm 3.0065 (3.3293) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][530/625] eta 0:00:42 lr 0.000066 wd 0.0500 time 0.4452 (0.4517) data time 0.0006 (0.0021) model time 0.4446 (0.4491) loss 2.2732 (2.5090) grad_norm 38.2147 (3.3750) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][540/625] eta 0:00:38 lr 0.000066 wd 0.0500 time 0.4475 (0.4517) data time 0.0006 (0.0021) model time 0.4469 (0.4490) loss 2.6144 (2.5082) grad_norm 2.9111 (3.3884) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][550/625] eta 0:00:33 lr 0.000066 wd 0.0500 time 0.4500 (0.4516) data time 0.0006 (0.0021) model time 0.4494 (0.4490) loss 1.8214 (2.5046) grad_norm 2.4931 (3.3839) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][560/625] eta 0:00:29 lr 0.000065 wd 0.0500 time 0.4432 (0.4516) data time 0.0006 (0.0020) model time 0.4426 (0.4490) loss 2.5512 (2.4967) grad_norm 2.1935 (3.3936) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][570/625] eta 0:00:24 lr 0.000065 wd 0.0500 time 0.4459 (0.4515) data time 0.0008 (0.0020) model time 0.4451 (0.4490) loss 2.7179 (2.4961) grad_norm 4.1938 (3.3880) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:42 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [261/300][580/625] eta 0:00:20 lr 0.000065 wd 0.0500 time 0.4487 (0.4514) data time 0.0006 (0.0020) model time 0.4481 (0.4489) loss 1.8660 (2.4928) grad_norm 2.5533 (3.3772) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][590/625] eta 0:00:15 lr 0.000065 wd 0.0500 time 0.4511 (0.4514) data time 0.0006 (0.0020) model time 0.4505 (0.4489) loss 2.3433 (2.4944) grad_norm 2.4896 (3.3736) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][600/625] eta 0:00:11 lr 0.000065 wd 0.0500 time 0.4495 (0.4514) data time 0.0006 (0.0020) model time 0.4489 (0.4489) loss 3.0009 (2.4944) grad_norm 3.7624 (3.3685) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:32:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][610/625] eta 0:00:06 lr 0.000065 wd 0.0500 time 0.4405 (0.4513) data time 0.0006 (0.0020) model time 0.4399 (0.4489) loss 2.9243 (2.4929) grad_norm 2.1838 (3.3581) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [261/300][620/625] eta 0:00:02 lr 0.000065 wd 0.0500 time 0.4457 (0.4512) data time 0.0006 (0.0019) model time 0.4452 (0.4488) loss 2.8543 (2.4943) grad_norm 2.5962 (3.3527) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 261 training takes 0:04:41 [2024-08-11 09:33:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:33:03 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 09:33:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5337 (0.5337) Acc@1 89.014 (89.014) Acc@5 98.730 (98.730) Mem 16699MB [2024-08-11 09:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8516 (0.6250) Acc@1 80.225 (86.901) Acc@5 95.996 (97.727) Mem 16699MB [2024-08-11 09:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9233 (0.7478) Acc@1 79.199 (84.056) Acc@5 95.508 (96.636) Mem 16699MB [2024-08-11 09:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.767 Acc@5 96.603 [2024-08-11 09:33:06 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 09:33:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.875 (0.875) Loss 0.5068 (0.5068) Acc@1 89.648 (89.648) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:33:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.187) Loss 0.8154 (0.6102) Acc@1 81.152 (87.207) Acc@5 96.484 (97.820) Mem 16699MB [2024-08-11 09:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.153) Loss 0.8916 (0.7242) Acc@1 79.199 (84.398) Acc@5 95.703 (96.817) Mem 16699MB [2024-08-11 09:33:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.113 Acc@5 96.777 [2024-08-11 09:33:10 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 09:33:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][0/625] eta 0:12:17 lr 0.000065 wd 0.0500 time 1.1808 (1.1808) data time 0.5764 (0.5764) model time 0.0000 (0.0000) loss 2.6786 (2.6786) grad_norm 9.4619 (9.4619) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][10/625] eta 0:05:16 lr 0.000065 wd 0.0500 time 0.4490 (0.5140) data 
time 0.0006 (0.0531) model time 0.0000 (0.0000) loss 2.2616 (2.4886) grad_norm 2.4157 (4.5339) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][20/625] eta 0:04:51 lr 0.000065 wd 0.0500 time 0.4461 (0.4826) data time 0.0006 (0.0284) model time 0.0000 (0.0000) loss 2.7431 (2.3522) grad_norm 3.1941 (3.5911) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][30/625] eta 0:04:40 lr 0.000065 wd 0.0500 time 0.4552 (0.4721) data time 0.0006 (0.0195) model time 0.0000 (0.0000) loss 2.9712 (2.4739) grad_norm 2.5394 (4.4301) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][40/625] eta 0:04:32 lr 0.000065 wd 0.0500 time 0.4441 (0.4665) data time 0.0008 (0.0149) model time 0.0000 (0.0000) loss 2.7161 (2.4555) grad_norm 2.6482 (3.9737) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][50/625] eta 0:04:28 lr 0.000065 wd 0.0500 time 0.4477 (0.4673) data time 0.0009 (0.0122) model time 0.0000 (0.0000) loss 2.4430 (2.4995) grad_norm 2.5546 (3.7135) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][60/625] eta 0:04:22 lr 0.000065 wd 0.0500 time 0.4465 (0.4642) data time 0.0006 (0.0103) model time 0.4459 (0.4476) loss 2.9078 (2.5040) grad_norm 2.5732 (3.6254) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][70/625] eta 0:04:16 lr 0.000065 wd 0.0500 time 0.4500 (0.4619) data time 0.0006 (0.0090) model time 0.4494 (0.4473) loss 2.4521 (2.4677) grad_norm 2.7821 (3.5299) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[262/300][80/625] eta 0:04:12 lr 0.000065 wd 0.0500 time 0.6512 (0.4625) data time 0.0006 (0.0080) model time 0.6505 (0.4537) loss 2.2381 (2.4744) grad_norm 1.9226 (3.7118) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][90/625] eta 0:04:06 lr 0.000065 wd 0.0500 time 0.4430 (0.4601) data time 0.0008 (0.0072) model time 0.4423 (0.4501) loss 2.6382 (2.4759) grad_norm 9.5651 (4.1166) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][100/625] eta 0:04:01 lr 0.000065 wd 0.0500 time 0.4483 (0.4591) data time 0.0006 (0.0065) model time 0.4476 (0.4500) loss 2.9972 (2.4843) grad_norm 2.1352 (4.0246) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][110/625] eta 0:03:56 lr 0.000065 wd 0.0500 time 0.4527 (0.4584) data time 0.0008 (0.0060) model time 0.4520 (0.4501) loss 2.5458 (2.4705) grad_norm 1.8904 (3.9033) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][120/625] eta 0:03:51 lr 0.000065 wd 0.0500 time 0.4462 (0.4577) data time 0.0006 (0.0056) model time 0.4456 (0.4500) loss 2.2866 (2.4497) grad_norm 3.9529 (3.8028) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][130/625] eta 0:03:46 lr 0.000065 wd 0.0500 time 0.4496 (0.4572) data time 0.0008 (0.0052) model time 0.4488 (0.4499) loss 2.5492 (2.4410) grad_norm 2.2804 (3.7058) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][140/625] eta 0:03:41 lr 0.000065 wd 0.0500 time 0.4434 (0.4564) data time 0.0009 (0.0049) model time 0.4425 (0.4494) loss 1.6840 (2.4509) grad_norm 1.8612 (3.6275) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:34:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][150/625] eta 0:03:36 lr 0.000065 wd 0.0500 time 0.4425 (0.4558) data time 0.0008 (0.0047) model time 0.4416 (0.4491) loss 2.4703 (2.4460) grad_norm 2.9152 (3.5761) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][160/625] eta 0:03:31 lr 0.000064 wd 0.0500 time 0.4498 (0.4554) data time 0.0006 (0.0044) model time 0.4492 (0.4490) loss 2.3142 (2.4562) grad_norm 2.1850 (3.5304) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][170/625] eta 0:03:27 lr 0.000064 wd 0.0500 time 0.4447 (0.4549) data time 0.0008 (0.0042) model time 0.4439 (0.4489) loss 2.4973 (2.4528) grad_norm 2.2273 (3.4681) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][180/625] eta 0:03:22 lr 0.000064 wd 0.0500 time 0.4476 (0.4546) data time 0.0009 (0.0040) model time 0.4467 (0.4487) loss 2.7953 (2.4601) grad_norm 2.5056 (3.4870) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][190/625] eta 0:03:17 lr 0.000064 wd 0.0500 time 0.4481 (0.4543) data time 0.0008 (0.0039) model time 0.4473 (0.4487) loss 2.8663 (2.4645) grad_norm 3.4033 (3.5025) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][200/625] eta 0:03:12 lr 0.000064 wd 0.0500 time 0.4487 (0.4540) data time 0.0009 (0.0037) model time 0.4479 (0.4486) loss 2.5608 (2.4647) grad_norm 1.8513 (3.4529) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][210/625] eta 0:03:08 lr 0.000064 wd 0.0500 time 0.4446 (0.4536) data time 0.0007 (0.0036) model time 0.4439 (0.4485) loss 1.9143 
(2.4698) grad_norm 2.6845 (3.4500) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][220/625] eta 0:03:03 lr 0.000064 wd 0.0500 time 0.4449 (0.4533) data time 0.0008 (0.0034) model time 0.4441 (0.4483) loss 2.9280 (2.4688) grad_norm 2.9711 (3.4534) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][230/625] eta 0:02:58 lr 0.000064 wd 0.0500 time 0.4483 (0.4530) data time 0.0008 (0.0033) model time 0.4474 (0.4482) loss 2.9182 (2.4694) grad_norm 4.4324 (3.4108) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][240/625] eta 0:02:54 lr 0.000064 wd 0.0500 time 0.4443 (0.4527) data time 0.0009 (0.0032) model time 0.4435 (0.4480) loss 2.6138 (2.4763) grad_norm 2.1515 (3.3977) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][250/625] eta 0:02:49 lr 0.000064 wd 0.0500 time 0.4445 (0.4525) data time 0.0010 (0.0031) model time 0.4435 (0.4480) loss 2.8834 (2.4813) grad_norm 6.2579 (3.3760) loss_scale 256.0000 (130.0398) mem 16699MB [2024-08-11 09:35:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][260/625] eta 0:02:45 lr 0.000064 wd 0.0500 time 0.4460 (0.4524) data time 0.0007 (0.0030) model time 0.4453 (0.4479) loss 3.0017 (2.4874) grad_norm 2.5033 (3.3854) loss_scale 256.0000 (134.8659) mem 16699MB [2024-08-11 09:35:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][270/625] eta 0:02:40 lr 0.000064 wd 0.0500 time 0.4505 (0.4523) data time 0.0008 (0.0030) model time 0.4498 (0.4480) loss 3.2128 (2.4818) grad_norm 1.9174 (3.3495) loss_scale 256.0000 (139.3358) mem 16699MB [2024-08-11 09:35:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][280/625] eta 0:02:35 lr 0.000064 wd 0.0500 time 0.4444 
(0.4521) data time 0.0007 (0.0029) model time 0.4437 (0.4479) loss 2.8403 (2.4748) grad_norm 2.8566 (3.3339) loss_scale 256.0000 (143.4875) mem 16699MB [2024-08-11 09:35:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][290/625] eta 0:02:31 lr 0.000064 wd 0.0500 time 0.4469 (0.4520) data time 0.0009 (0.0028) model time 0.4460 (0.4479) loss 2.4787 (2.4746) grad_norm 3.9687 (3.3107) loss_scale 256.0000 (147.3540) mem 16699MB [2024-08-11 09:35:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][300/625] eta 0:02:26 lr 0.000064 wd 0.0500 time 0.4481 (0.4518) data time 0.0007 (0.0028) model time 0.4474 (0.4478) loss 1.4949 (2.4679) grad_norm 3.0758 (3.3099) loss_scale 256.0000 (150.9635) mem 16699MB [2024-08-11 09:35:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][310/625] eta 0:02:22 lr 0.000064 wd 0.0500 time 0.4471 (0.4517) data time 0.0008 (0.0027) model time 0.4463 (0.4478) loss 2.7964 (2.4652) grad_norm 2.4385 (3.3308) loss_scale 256.0000 (154.3408) mem 16699MB [2024-08-11 09:35:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][320/625] eta 0:02:17 lr 0.000064 wd 0.0500 time 0.4485 (0.4516) data time 0.0007 (0.0026) model time 0.4478 (0.4478) loss 1.3589 (2.4624) grad_norm 2.0186 (3.3157) loss_scale 256.0000 (157.5078) mem 16699MB [2024-08-11 09:35:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][330/625] eta 0:02:13 lr 0.000064 wd 0.0500 time 0.4473 (0.4515) data time 0.0009 (0.0026) model time 0.4464 (0.4478) loss 2.6140 (2.4675) grad_norm 3.1022 (3.2991) loss_scale 256.0000 (160.4834) mem 16699MB [2024-08-11 09:35:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][340/625] eta 0:02:08 lr 0.000064 wd 0.0500 time 0.4493 (0.4515) data time 0.0008 (0.0025) model time 0.4485 (0.4479) loss 2.8359 (2.4618) grad_norm 2.3888 (3.2813) loss_scale 256.0000 (163.2845) mem 16699MB [2024-08-11 09:35:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [262/300][350/625] eta 0:02:04 lr 0.000064 wd 0.0500 time 0.4450 (0.4515) data time 0.0007 (0.0025) model time 0.4443 (0.4479) loss 2.8331 (2.4652) grad_norm 2.2894 (3.3264) loss_scale 256.0000 (165.9259) mem 16699MB [2024-08-11 09:35:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][360/625] eta 0:01:59 lr 0.000064 wd 0.0500 time 0.4496 (0.4514) data time 0.0008 (0.0024) model time 0.4488 (0.4479) loss 2.8212 (2.4606) grad_norm 3.2729 (3.3131) loss_scale 256.0000 (168.4211) mem 16699MB [2024-08-11 09:35:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][370/625] eta 0:01:55 lr 0.000064 wd 0.0500 time 0.4518 (0.4513) data time 0.0006 (0.0024) model time 0.4512 (0.4479) loss 3.0331 (2.4594) grad_norm 2.0646 (3.2925) loss_scale 256.0000 (170.7817) mem 16699MB [2024-08-11 09:36:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][380/625] eta 0:01:50 lr 0.000064 wd 0.0500 time 0.4478 (0.4517) data time 0.0007 (0.0024) model time 0.4471 (0.4485) loss 1.5001 (2.4570) grad_norm 3.6015 (3.2725) loss_scale 256.0000 (173.0184) mem 16699MB [2024-08-11 09:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][390/625] eta 0:01:46 lr 0.000063 wd 0.0500 time 0.4462 (0.4516) data time 0.0006 (0.0023) model time 0.4456 (0.4484) loss 2.1672 (2.4594) grad_norm 2.2412 (3.2518) loss_scale 256.0000 (175.1407) mem 16699MB [2024-08-11 09:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][400/625] eta 0:01:41 lr 0.000063 wd 0.0500 time 0.4485 (0.4515) data time 0.0007 (0.0023) model time 0.4478 (0.4484) loss 2.4836 (2.4624) grad_norm 2.0409 (3.2317) loss_scale 256.0000 (177.1571) mem 16699MB [2024-08-11 09:36:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][410/625] eta 0:01:37 lr 0.000063 wd 0.0500 time 0.4518 (0.4515) data time 0.0009 (0.0023) model time 0.4509 (0.4484) loss 2.7275 (2.4640) grad_norm 2.1257 (3.2110) loss_scale 256.0000 (179.0754) mem 16699MB 
[2024-08-11 09:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][420/625] eta 0:01:32 lr 0.000063 wd 0.0500 time 0.4510 (0.4519) data time 0.0006 (0.0022) model time 0.4503 (0.4489) loss 2.5541 (2.4613) grad_norm 2.7314 (3.1918) loss_scale 256.0000 (180.9026) mem 16699MB [2024-08-11 09:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][430/625] eta 0:01:28 lr 0.000063 wd 0.0500 time 0.4463 (0.4518) data time 0.0010 (0.0022) model time 0.4453 (0.4488) loss 2.5465 (2.4633) grad_norm 1.7684 (3.1694) loss_scale 256.0000 (182.6450) mem 16699MB [2024-08-11 09:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][440/625] eta 0:01:23 lr 0.000063 wd 0.0500 time 0.4475 (0.4517) data time 0.0006 (0.0022) model time 0.4469 (0.4488) loss 2.8163 (2.4619) grad_norm 2.7837 (3.1499) loss_scale 256.0000 (184.3084) mem 16699MB [2024-08-11 09:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][450/625] eta 0:01:19 lr 0.000063 wd 0.0500 time 0.4472 (0.4516) data time 0.0007 (0.0021) model time 0.4465 (0.4488) loss 2.5122 (2.4599) grad_norm 2.2135 (3.1434) loss_scale 256.0000 (185.8980) mem 16699MB [2024-08-11 09:36:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][460/625] eta 0:01:14 lr 0.000063 wd 0.0500 time 0.4497 (0.4516) data time 0.0007 (0.0021) model time 0.4490 (0.4488) loss 2.4460 (2.4628) grad_norm 2.0805 (3.1259) loss_scale 256.0000 (187.4187) mem 16699MB [2024-08-11 09:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][470/625] eta 0:01:09 lr 0.000063 wd 0.0500 time 0.4493 (0.4515) data time 0.0008 (0.0021) model time 0.4485 (0.4488) loss 2.2672 (2.4663) grad_norm 3.4116 (3.1503) loss_scale 256.0000 (188.8747) mem 16699MB [2024-08-11 09:36:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][480/625] eta 0:01:05 lr 0.000063 wd 0.0500 time 0.4542 (0.4515) data time 0.0008 (0.0021) model time 0.4535 (0.4488) loss 2.4825 
(2.4666) grad_norm 3.4658 (3.1909) loss_scale 256.0000 (190.2703) mem 16699MB [2024-08-11 09:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][490/625] eta 0:01:00 lr 0.000063 wd 0.0500 time 0.4450 (0.4514) data time 0.0007 (0.0020) model time 0.4443 (0.4488) loss 2.4228 (2.4660) grad_norm 2.5726 (3.2034) loss_scale 256.0000 (191.6090) mem 16699MB [2024-08-11 09:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][500/625] eta 0:00:56 lr 0.000063 wd 0.0500 time 0.4445 (0.4514) data time 0.0007 (0.0020) model time 0.4437 (0.4488) loss 2.2539 (2.4662) grad_norm 2.6991 (3.1899) loss_scale 256.0000 (192.8942) mem 16699MB [2024-08-11 09:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][510/625] eta 0:00:51 lr 0.000063 wd 0.0500 time 0.4476 (0.4513) data time 0.0006 (0.0020) model time 0.4470 (0.4487) loss 2.6376 (2.4681) grad_norm 3.2143 (3.1938) loss_scale 256.0000 (194.1292) mem 16699MB [2024-08-11 09:37:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][520/625] eta 0:00:47 lr 0.000063 wd 0.0500 time 0.4452 (0.4512) data time 0.0008 (0.0020) model time 0.4443 (0.4487) loss 2.9211 (2.4684) grad_norm 3.4174 (3.2064) loss_scale 256.0000 (195.3167) mem 16699MB [2024-08-11 09:37:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][530/625] eta 0:00:42 lr 0.000063 wd 0.0500 time 0.4474 (0.4512) data time 0.0006 (0.0019) model time 0.4468 (0.4486) loss 3.0534 (2.4675) grad_norm 3.9431 (3.1982) loss_scale 256.0000 (196.4595) mem 16699MB [2024-08-11 09:37:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][540/625] eta 0:00:38 lr 0.000063 wd 0.0500 time 0.4460 (0.4511) data time 0.0006 (0.0019) model time 0.4454 (0.4486) loss 1.9623 (2.4650) grad_norm 2.6711 (3.1849) loss_scale 256.0000 (197.5601) mem 16699MB [2024-08-11 09:37:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][550/625] eta 0:00:33 lr 0.000063 wd 0.0500 time 0.4475 
(0.4511) data time 0.0008 (0.0019) model time 0.4467 (0.4486) loss 2.2136 (2.4651) grad_norm 4.4656 (3.2434) loss_scale 256.0000 (198.6207) mem 16699MB [2024-08-11 09:37:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][560/625] eta 0:00:29 lr 0.000063 wd 0.0500 time 0.4514 (0.4510) data time 0.0008 (0.0019) model time 0.4506 (0.4486) loss 2.6989 (2.4681) grad_norm 2.4076 (3.2365) loss_scale 256.0000 (199.6435) mem 16699MB [2024-08-11 09:37:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][570/625] eta 0:00:24 lr 0.000063 wd 0.0500 time 0.4462 (0.4513) data time 0.0008 (0.0019) model time 0.4455 (0.4489) loss 2.0801 (2.4685) grad_norm 2.6022 (3.3123) loss_scale 256.0000 (200.6305) mem 16699MB [2024-08-11 09:37:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][580/625] eta 0:00:20 lr 0.000063 wd 0.0500 time 0.4471 (0.4512) data time 0.0010 (0.0018) model time 0.4461 (0.4488) loss 2.9179 (2.4689) grad_norm 2.1025 (3.2992) loss_scale 256.0000 (201.5835) mem 16699MB [2024-08-11 09:37:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][590/625] eta 0:00:15 lr 0.000063 wd 0.0500 time 0.4442 (0.4512) data time 0.0006 (0.0018) model time 0.4436 (0.4488) loss 2.7559 (2.4691) grad_norm 1.9217 (3.2975) loss_scale 256.0000 (202.5042) mem 16699MB [2024-08-11 09:37:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][600/625] eta 0:00:11 lr 0.000063 wd 0.0500 time 0.4436 (0.4511) data time 0.0007 (0.0018) model time 0.4429 (0.4487) loss 1.5829 (2.4692) grad_norm 2.2501 (3.2888) loss_scale 256.0000 (203.3943) mem 16699MB [2024-08-11 09:37:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [262/300][610/625] eta 0:00:06 lr 0.000063 wd 0.0500 time 0.4423 (0.4511) data time 0.0004 (0.0018) model time 0.4419 (0.4488) loss 1.9138 (2.4708) grad_norm 2.7075 (3.2753) loss_scale 256.0000 (204.2553) mem 16699MB [2024-08-11 09:37:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [262/300][620/625] eta 0:00:02 lr 0.000062 wd 0.0500 time 0.4434 (0.4510) data time 0.0006 (0.0018) model time 0.4427 (0.4487) loss 2.9221 (2.4676) grad_norm 2.6911 (3.2635) loss_scale 256.0000 (205.0886) mem 16699MB [2024-08-11 09:37:52 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 262 training takes 0:04:41 [2024-08-11 09:37:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:37:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 09:37:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.470 (0.470) Loss 0.5190 (0.5190) Acc@1 89.209 (89.209) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 09:37:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8477 (0.6235) Acc@1 80.371 (86.967) Acc@5 96.143 (97.785) Mem 16699MB [2024-08-11 09:37:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9092 (0.7466) Acc@1 79.688 (84.047) Acc@5 95.459 (96.647) Mem 16699MB [2024-08-11 09:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.739 Acc@5 96.605 [2024-08-11 09:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.803 (0.803) Loss 0.5083 (0.5083) Acc@1 89.648 (89.648) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:37:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8179 (0.6108) Acc@1 81.006 (87.216) Acc@5 96.436 (97.820) Mem 16699MB [2024-08-11 09:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.152) Loss 0.8926 (0.7252) Acc@1 79.248 (84.426) Acc@5 95.752 (96.815) Mem 16699MB [2024-08-11 09:38:00 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.131 Acc@5 96.777 [2024-08-11 09:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 09:38:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][0/625] eta 0:12:14 lr 0.000062 wd 0.0500 time 1.1745 (1.1745) data time 0.5685 (0.5685) model time 0.0000 (0.0000) loss 2.8547 (2.8547) grad_norm 6.1384 (6.1384) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][10/625] eta 0:05:16 lr 0.000062 wd 0.0500 time 0.4476 (0.5144) data time 0.0008 (0.0524) model time 0.0000 (0.0000) loss 2.8943 (2.4707) grad_norm 6.2981 (3.3830) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][20/625] eta 0:04:52 lr 0.000062 wd 0.0500 time 0.4431 (0.4827) data time 0.0010 (0.0278) model time 0.0000 (0.0000) loss 1.8299 (2.3560) grad_norm 7.3950 (3.2499) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][30/625] eta 0:04:40 lr 0.000062 wd 0.0500 time 0.4484 (0.4716) data time 0.0008 (0.0191) model time 0.0000 (0.0000) loss 2.6124 (2.3484) grad_norm 1.9895 (3.5350) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][40/625] eta 0:04:32 lr 0.000062 wd 0.0500 time 0.4495 (0.4659) data time 0.0007 (0.0147) model time 0.0000 (0.0000) loss 1.7070 (2.3111) grad_norm 3.0634 (3.5318) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][50/625] eta 0:04:26 lr 0.000062 wd 0.0500 time 0.4530 (0.4629) data time 0.0008 (0.0120) model time 0.0000 (0.0000) loss 2.9632 (2.3453) grad_norm 2.4883 (3.3996) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:28 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [263/300][60/625] eta 0:04:20 lr 0.000062 wd 0.0500 time 0.4519 (0.4610) data time 0.0009 (0.0102) model time 0.4510 (0.4505) loss 2.7991 (2.3826) grad_norm 3.0068 (3.2079) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][70/625] eta 0:04:15 lr 0.000062 wd 0.0500 time 0.4534 (0.4596) data time 0.0009 (0.0089) model time 0.4526 (0.4503) loss 2.8819 (2.4058) grad_norm 1.8044 (3.3300) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][80/625] eta 0:04:09 lr 0.000062 wd 0.0500 time 0.4485 (0.4582) data time 0.0008 (0.0079) model time 0.4477 (0.4495) loss 2.3419 (2.4259) grad_norm 2.4774 (3.2674) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][90/625] eta 0:04:04 lr 0.000062 wd 0.0500 time 0.4460 (0.4571) data time 0.0007 (0.0071) model time 0.4453 (0.4489) loss 2.8447 (2.4309) grad_norm 1.9437 (3.1668) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][100/625] eta 0:03:59 lr 0.000062 wd 0.0500 time 0.4498 (0.4562) data time 0.0007 (0.0065) model time 0.4491 (0.4485) loss 2.3772 (2.4348) grad_norm 2.1276 (3.1228) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][110/625] eta 0:03:54 lr 0.000062 wd 0.0500 time 0.4447 (0.4554) data time 0.0007 (0.0060) model time 0.4440 (0.4482) loss 2.0042 (2.4346) grad_norm 2.6811 (3.0607) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:38:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][120/625] eta 0:03:49 lr 0.000062 wd 0.0500 time 0.4566 (0.4549) data time 0.0006 (0.0056) model time 0.4560 (0.4482) loss 2.6265 (2.4292) grad_norm 11.9812 (3.1002) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][130/625] eta 0:03:44 lr 0.000062 wd 0.0500 time 0.4527 (0.4545) data time 0.0007 (0.0052) model time 0.4520 (0.4483) loss 2.3745 (2.4239) grad_norm 2.9201 (3.0400) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][140/625] eta 0:03:40 lr 0.000062 wd 0.0500 time 0.4448 (0.4553) data time 0.0007 (0.0049) model time 0.4441 (0.4502) loss 2.5453 (2.4283) grad_norm 1.7197 (3.0475) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][150/625] eta 0:03:36 lr 0.000062 wd 0.0500 time 0.4463 (0.4550) data time 0.0008 (0.0046) model time 0.4455 (0.4502) loss 2.4303 (2.4327) grad_norm 1.8103 (2.9931) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][160/625] eta 0:03:31 lr 0.000062 wd 0.0500 time 0.4462 (0.4546) data time 0.0007 (0.0044) model time 0.4455 (0.4499) loss 2.1361 (2.4319) grad_norm 2.8864 (2.9798) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][170/625] eta 0:03:26 lr 0.000062 wd 0.0500 time 0.4486 (0.4542) data time 0.0009 (0.0042) model time 0.4478 (0.4497) loss 2.4171 (2.4393) grad_norm 3.1814 (3.0101) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][180/625] eta 0:03:22 lr 0.000062 wd 0.0500 time 0.4470 (0.4541) data time 0.0006 (0.0040) model time 0.4464 (0.4499) loss 2.9326 (2.4505) grad_norm 1.6235 (2.9773) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][190/625] eta 0:03:17 lr 0.000062 wd 0.0500 time 0.4506 (0.4538) data time 0.0007 (0.0038) model time 
0.4500 (0.4497) loss 2.1249 (2.4520) grad_norm 2.0807 (2.9847) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][200/625] eta 0:03:12 lr 0.000062 wd 0.0500 time 0.4480 (0.4536) data time 0.0006 (0.0037) model time 0.4474 (0.4496) loss 2.8988 (2.4592) grad_norm 2.4259 (2.9648) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][210/625] eta 0:03:08 lr 0.000062 wd 0.0500 time 0.4487 (0.4534) data time 0.0008 (0.0035) model time 0.4479 (0.4495) loss 2.2978 (2.4526) grad_norm 3.4625 (2.9468) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][220/625] eta 0:03:03 lr 0.000062 wd 0.0500 time 0.4656 (0.4533) data time 0.0008 (0.0034) model time 0.4648 (0.4495) loss 2.8104 (2.4605) grad_norm 2.5450 (2.9353) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][230/625] eta 0:02:58 lr 0.000061 wd 0.0500 time 0.4470 (0.4531) data time 0.0008 (0.0033) model time 0.4462 (0.4495) loss 2.1853 (2.4631) grad_norm 1.9166 (2.9519) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][240/625] eta 0:02:54 lr 0.000061 wd 0.0500 time 0.4464 (0.4529) data time 0.0007 (0.0032) model time 0.4457 (0.4493) loss 3.1638 (2.4709) grad_norm 3.2813 (2.9393) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][250/625] eta 0:02:49 lr 0.000061 wd 0.0500 time 0.4413 (0.4526) data time 0.0009 (0.0031) model time 0.4404 (0.4492) loss 2.0272 (2.4678) grad_norm 2.9663 (2.9331) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][260/625] eta 0:02:45 lr 
0.000061 wd 0.0500 time 0.4505 (0.4525) data time 0.0008 (0.0030) model time 0.4497 (0.4492) loss 2.4422 (2.4732) grad_norm 1.9086 (2.9351) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][270/625] eta 0:02:40 lr 0.000061 wd 0.0500 time 0.4521 (0.4525) data time 0.0009 (0.0029) model time 0.4512 (0.4492) loss 2.7797 (2.4697) grad_norm 3.1450 (2.9253) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][280/625] eta 0:02:36 lr 0.000061 wd 0.0500 time 0.4518 (0.4524) data time 0.0007 (0.0029) model time 0.4511 (0.4493) loss 2.5858 (2.4676) grad_norm 2.0101 (2.9604) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][290/625] eta 0:02:31 lr 0.000061 wd 0.0500 time 0.4695 (0.4524) data time 0.0006 (0.0028) model time 0.4689 (0.4494) loss 2.4628 (2.4686) grad_norm 2.9929 (2.9625) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][300/625] eta 0:02:26 lr 0.000061 wd 0.0500 time 0.4458 (0.4523) data time 0.0008 (0.0027) model time 0.4450 (0.4493) loss 2.1959 (2.4708) grad_norm 6.1226 (2.9549) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][310/625] eta 0:02:22 lr 0.000061 wd 0.0500 time 0.4438 (0.4521) data time 0.0009 (0.0027) model time 0.4429 (0.4492) loss 2.6401 (2.4682) grad_norm 1.9855 (2.9426) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][320/625] eta 0:02:17 lr 0.000061 wd 0.0500 time 0.4474 (0.4524) data time 0.0010 (0.0026) model time 0.4464 (0.4496) loss 1.7353 (2.4669) grad_norm 2.2187 (2.9314) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:30 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [263/300][330/625] eta 0:02:13 lr 0.000061 wd 0.0500 time 0.4478 (0.4523) data time 0.0009 (0.0026) model time 0.4469 (0.4495) loss 2.5730 (2.4683) grad_norm 2.4735 (2.9209) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][340/625] eta 0:02:08 lr 0.000061 wd 0.0500 time 0.4487 (0.4522) data time 0.0006 (0.0025) model time 0.4481 (0.4495) loss 3.0833 (2.4708) grad_norm 2.0594 (2.9120) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][350/625] eta 0:02:04 lr 0.000061 wd 0.0500 time 0.4487 (0.4521) data time 0.0008 (0.0025) model time 0.4479 (0.4494) loss 2.6679 (2.4697) grad_norm 2.9156 (2.9162) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][360/625] eta 0:01:59 lr 0.000061 wd 0.0500 time 0.4513 (0.4526) data time 0.0006 (0.0024) model time 0.4506 (0.4500) loss 2.2936 (2.4704) grad_norm 5.2172 (3.0023) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][370/625] eta 0:01:55 lr 0.000061 wd 0.0500 time 0.4489 (0.4525) data time 0.0009 (0.0024) model time 0.4480 (0.4500) loss 2.2717 (2.4699) grad_norm 2.5434 (3.0256) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][380/625] eta 0:01:50 lr 0.000061 wd 0.0500 time 0.4448 (0.4524) data time 0.0008 (0.0023) model time 0.4440 (0.4499) loss 2.6266 (2.4693) grad_norm 2.0610 (3.0256) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][390/625] eta 0:01:46 lr 0.000061 wd 0.0500 time 0.4467 (0.4522) data time 0.0007 (0.0023) model time 0.4460 (0.4498) loss 1.4720 (2.4700) grad_norm 2.1757 (3.0164) loss_scale 
256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][400/625] eta 0:01:41 lr 0.000061 wd 0.0500 time 0.4486 (0.4521) data time 0.0008 (0.0023) model time 0.4478 (0.4497) loss 2.4832 (2.4670) grad_norm 2.3794 (3.0053) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][410/625] eta 0:01:37 lr 0.000061 wd 0.0500 time 0.4448 (0.4520) data time 0.0009 (0.0022) model time 0.4439 (0.4497) loss 2.4396 (2.4618) grad_norm 2.2168 (3.0006) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][420/625] eta 0:01:32 lr 0.000061 wd 0.0500 time 0.4478 (0.4520) data time 0.0009 (0.0022) model time 0.4469 (0.4496) loss 1.7404 (2.4576) grad_norm 1.8697 (2.9953) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][430/625] eta 0:01:28 lr 0.000061 wd 0.0500 time 0.4486 (0.4519) data time 0.0009 (0.0022) model time 0.4477 (0.4496) loss 1.5818 (2.4552) grad_norm 2.6852 (2.9914) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][440/625] eta 0:01:23 lr 0.000061 wd 0.0500 time 0.4475 (0.4519) data time 0.0009 (0.0021) model time 0.4466 (0.4496) loss 2.9387 (2.4562) grad_norm 3.9064 (3.0017) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][450/625] eta 0:01:19 lr 0.000061 wd 0.0500 time 0.4443 (0.4518) data time 0.0007 (0.0021) model time 0.4436 (0.4495) loss 2.8334 (2.4602) grad_norm 2.9144 (2.9896) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][460/625] eta 0:01:14 lr 0.000060 wd 0.0500 time 0.4490 (0.4517) data time 0.0010 (0.0021) model time 
0.4480 (0.4495) loss 2.6794 (2.4619) grad_norm 3.5487 (2.9908) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][470/625] eta 0:01:09 lr 0.000060 wd 0.0500 time 0.4481 (0.4516) data time 0.0007 (0.0020) model time 0.4474 (0.4494) loss 2.1580 (2.4565) grad_norm 3.3725 (2.9956) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][480/625] eta 0:01:05 lr 0.000060 wd 0.0500 time 0.4460 (0.4515) data time 0.0009 (0.0020) model time 0.4451 (0.4493) loss 2.7602 (2.4567) grad_norm 2.8019 (2.9898) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][490/625] eta 0:01:00 lr 0.000060 wd 0.0500 time 0.4518 (0.4515) data time 0.0008 (0.0020) model time 0.4510 (0.4493) loss 2.7474 (2.4622) grad_norm 3.0958 (2.9849) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][500/625] eta 0:00:56 lr 0.000060 wd 0.0500 time 0.4466 (0.4514) data time 0.0010 (0.0020) model time 0.4455 (0.4493) loss 2.3220 (2.4584) grad_norm 2.4294 (3.0417) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][510/625] eta 0:00:51 lr 0.000060 wd 0.0500 time 0.4533 (0.4518) data time 0.0007 (0.0020) model time 0.4526 (0.4497) loss 2.7188 (2.4609) grad_norm 2.4956 (3.0350) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:41:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][520/625] eta 0:00:47 lr 0.000060 wd 0.0500 time 0.4504 (0.4517) data time 0.0008 (0.0019) model time 0.4496 (0.4497) loss 2.7869 (2.4591) grad_norm 2.6967 (3.0415) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][530/625] eta 0:00:42 lr 
0.000060 wd 0.0500 time 0.4459 (0.4517) data time 0.0007 (0.0019) model time 0.4453 (0.4496) loss 2.1554 (2.4587) grad_norm 2.2432 (3.0556) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][540/625] eta 0:00:38 lr 0.000060 wd 0.0500 time 0.4472 (0.4516) data time 0.0006 (0.0019) model time 0.4466 (0.4496) loss 2.8105 (2.4629) grad_norm 2.1897 (3.0448) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][550/625] eta 0:00:33 lr 0.000060 wd 0.0500 time 0.4478 (0.4515) data time 0.0008 (0.0019) model time 0.4469 (0.4495) loss 2.6292 (2.4644) grad_norm 1.8366 (3.0399) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][560/625] eta 0:00:29 lr 0.000060 wd 0.0500 time 0.4496 (0.4514) data time 0.0008 (0.0019) model time 0.4487 (0.4495) loss 2.4844 (2.4654) grad_norm 2.0739 (3.0691) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][570/625] eta 0:00:24 lr 0.000060 wd 0.0500 time 0.4598 (0.4514) data time 0.0009 (0.0018) model time 0.4589 (0.4495) loss 2.3938 (2.4643) grad_norm 2.0604 (3.0609) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][580/625] eta 0:00:20 lr 0.000060 wd 0.0500 time 0.4499 (0.4514) data time 0.0007 (0.0018) model time 0.4492 (0.4495) loss 2.3957 (2.4672) grad_norm 2.1144 (3.0567) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][590/625] eta 0:00:15 lr 0.000060 wd 0.0500 time 0.4466 (0.4514) data time 0.0007 (0.0018) model time 0.4459 (0.4495) loss 2.6852 (2.4681) grad_norm 56.8923 (3.1397) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:32 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [263/300][600/625] eta 0:00:11 lr 0.000060 wd 0.0500 time 0.4453 (0.4513) data time 0.0006 (0.0018) model time 0.4446 (0.4494) loss 2.8638 (2.4685) grad_norm 3.0491 (3.1437) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][610/625] eta 0:00:06 lr 0.000060 wd 0.0500 time 0.4426 (0.4513) data time 0.0004 (0.0018) model time 0.4421 (0.4494) loss 2.4949 (2.4678) grad_norm 3.3068 (3.1423) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [263/300][620/625] eta 0:00:02 lr 0.000060 wd 0.0500 time 0.4456 (0.4512) data time 0.0006 (0.0018) model time 0.4450 (0.4493) loss 2.6251 (2.4684) grad_norm 2.9754 (3.1368) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:42 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 263 training takes 0:04:41 [2024-08-11 09:42:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:42:44 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 09:42:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5352 (0.5352) Acc@1 88.770 (88.770) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 09:42:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8623 (0.6313) Acc@1 80.322 (86.856) Acc@5 95.654 (97.696) Mem 16699MB [2024-08-11 09:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9321 (0.7506) Acc@1 79.248 (83.938) Acc@5 95.410 (96.647) Mem 16699MB [2024-08-11 09:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.673 Acc@5 96.613 [2024-08-11 09:42:47 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:42:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.822 (0.822) Loss 0.5088 (0.5088) Acc@1 89.648 (89.648) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.183) Loss 0.8188 (0.6116) Acc@1 81.055 (87.189) Acc@5 96.387 (97.807) Mem 16699MB [2024-08-11 09:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.8936 (0.7263) Acc@1 79.395 (84.433) Acc@5 95.801 (96.815) Mem 16699MB [2024-08-11 09:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.141 Acc@5 96.767 [2024-08-11 09:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 09:42:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][0/625] eta 0:12:51 lr 0.000060 wd 0.0500 time 1.2343 (1.2343) data time 0.4067 (0.4067) model time 0.0000 (0.0000) loss 2.6485 (2.6485) grad_norm 2.5133 (2.5133) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:42:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][10/625] eta 0:05:20 lr 0.000060 wd 0.0500 time 0.4515 (0.5212) data 
time 0.0009 (0.0378) model time 0.0000 (0.0000) loss 2.4824 (2.5715) grad_norm 2.4097 (2.7515) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][20/625] eta 0:04:54 lr 0.000060 wd 0.0500 time 0.4484 (0.4865) data time 0.0007 (0.0202) model time 0.0000 (0.0000) loss 1.9675 (2.5037) grad_norm 3.1642 (2.7898) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][30/625] eta 0:04:41 lr 0.000060 wd 0.0500 time 0.4492 (0.4739) data time 0.0009 (0.0140) model time 0.0000 (0.0000) loss 2.6997 (2.4016) grad_norm 2.5398 (2.8101) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][40/625] eta 0:04:33 lr 0.000060 wd 0.0500 time 0.4442 (0.4671) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 2.4784 (2.4247) grad_norm 2.3406 (2.9225) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][50/625] eta 0:04:26 lr 0.000060 wd 0.0500 time 0.4508 (0.4633) data time 0.0006 (0.0088) model time 0.0000 (0.0000) loss 2.8496 (2.4546) grad_norm 2.7517 (3.0295) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][60/625] eta 0:04:20 lr 0.000060 wd 0.0500 time 0.4437 (0.4605) data time 0.0007 (0.0075) model time 0.4431 (0.4450) loss 2.3571 (2.4436) grad_norm 10.2701 (3.1523) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][70/625] eta 0:04:14 lr 0.000060 wd 0.0500 time 0.4546 (0.4592) data time 0.0007 (0.0066) model time 0.4539 (0.4478) loss 2.6731 (2.4533) grad_norm 2.4705 (3.0281) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[264/300][80/625] eta 0:04:09 lr 0.000059 wd 0.0500 time 0.4455 (0.4578) data time 0.0007 (0.0059) model time 0.4448 (0.4477) loss 2.9284 (2.4412) grad_norm 4.5330 (2.9924) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][90/625] eta 0:04:04 lr 0.000059 wd 0.0500 time 0.4504 (0.4567) data time 0.0007 (0.0053) model time 0.4498 (0.4475) loss 2.8151 (2.4658) grad_norm 2.7281 (2.9222) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][100/625] eta 0:04:00 lr 0.000059 wd 0.0500 time 0.4449 (0.4580) data time 0.0008 (0.0049) model time 0.4441 (0.4518) loss 2.9180 (2.4683) grad_norm 2.3090 (2.8782) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][110/625] eta 0:03:55 lr 0.000059 wd 0.0500 time 0.4600 (0.4577) data time 0.0007 (0.0045) model time 0.4594 (0.4521) loss 1.5476 (2.4702) grad_norm 2.0995 (3.1299) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][120/625] eta 0:03:50 lr 0.000059 wd 0.0500 time 0.4635 (0.4570) data time 0.0008 (0.0042) model time 0.4626 (0.4516) loss 2.4850 (2.4613) grad_norm 2.5669 (3.2914) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][130/625] eta 0:03:45 lr 0.000059 wd 0.0500 time 0.4520 (0.4564) data time 0.0006 (0.0039) model time 0.4514 (0.4512) loss 2.7151 (2.4836) grad_norm 35.0902 (3.4849) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:43:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][140/625] eta 0:03:41 lr 0.000059 wd 0.0500 time 0.4618 (0.4561) data time 0.0006 (0.0037) model time 0.4611 (0.4511) loss 3.1150 (2.4961) grad_norm 2.6776 (3.4511) loss_scale 256.0000 (256.0000) mem 16699MB 
[2024-08-11 09:44:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][150/625] eta 0:03:36 lr 0.000059 wd 0.0500 time 0.4482 (0.4565) data time 0.0007 (0.0035) model time 0.4476 (0.4521) loss 2.9409 (2.4936) grad_norm 2.2345 (3.6082) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:44:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][160/625] eta 0:03:32 lr 0.000059 wd 0.0500 time 0.4478 (0.4560) data time 0.0006 (0.0034) model time 0.4472 (0.4517) loss 2.8818 (2.5060) grad_norm 2.7671 (3.6870) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:44:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][170/625] eta 0:03:27 lr 0.000059 wd 0.0500 time 0.4502 (0.4555) data time 0.0007 (0.0032) model time 0.4495 (0.4514) loss 2.7982 (2.4945) grad_norm 1.7581 (3.6338) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 09:44:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][180/625] eta 0:03:22 lr 0.000059 wd 0.0500 time 0.4518 (0.4550) data time 0.0008 (0.0031) model time 0.4510 (0.4509) loss 2.5436 (2.4953) grad_norm 2.4348 (inf) loss_scale 128.0000 (252.4641) mem 16699MB [2024-08-11 09:44:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][190/625] eta 0:03:17 lr 0.000059 wd 0.0500 time 0.4519 (0.4547) data time 0.0006 (0.0030) model time 0.4513 (0.4507) loss 1.8589 (2.4947) grad_norm 8.2835 (inf) loss_scale 128.0000 (245.9476) mem 16699MB [2024-08-11 09:44:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][200/625] eta 0:03:13 lr 0.000059 wd 0.0500 time 0.4460 (0.4543) data time 0.0008 (0.0029) model time 0.4452 (0.4504) loss 2.1988 (2.4975) grad_norm 2.4004 (inf) loss_scale 128.0000 (240.0796) mem 16699MB [2024-08-11 09:44:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][210/625] eta 0:03:08 lr 0.000059 wd 0.0500 time 0.4505 (0.4540) data time 0.0007 (0.0028) model time 0.4499 (0.4502) loss 1.6129 (2.4916) 
grad_norm 2.0931 (inf) loss_scale 128.0000 (234.7678) mem 16699MB [2024-08-11 09:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][220/625] eta 0:03:03 lr 0.000059 wd 0.0500 time 0.4467 (0.4537) data time 0.0006 (0.0027) model time 0.4460 (0.4500) loss 2.6879 (2.4833) grad_norm 2.2175 (inf) loss_scale 128.0000 (229.9367) mem 16699MB [2024-08-11 09:44:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][230/625] eta 0:02:59 lr 0.000059 wd 0.0500 time 0.4454 (0.4535) data time 0.0008 (0.0026) model time 0.4445 (0.4499) loss 1.6732 (2.4725) grad_norm 3.1657 (inf) loss_scale 128.0000 (225.5238) mem 16699MB [2024-08-11 09:44:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][240/625] eta 0:02:54 lr 0.000059 wd 0.0500 time 0.4472 (0.4532) data time 0.0008 (0.0025) model time 0.4464 (0.4497) loss 2.4241 (2.4748) grad_norm 3.7452 (inf) loss_scale 128.0000 (221.4772) mem 16699MB [2024-08-11 09:44:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][250/625] eta 0:02:49 lr 0.000059 wd 0.0500 time 0.4467 (0.4529) data time 0.0009 (0.0025) model time 0.4459 (0.4495) loss 2.5089 (2.4751) grad_norm 2.4844 (inf) loss_scale 128.0000 (217.7530) mem 16699MB [2024-08-11 09:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][260/625] eta 0:02:45 lr 0.000059 wd 0.0500 time 0.4413 (0.4527) data time 0.0010 (0.0024) model time 0.4403 (0.4493) loss 2.8134 (2.4760) grad_norm 3.6789 (inf) loss_scale 128.0000 (214.3142) mem 16699MB [2024-08-11 09:44:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][270/625] eta 0:02:40 lr 0.000059 wd 0.0500 time 0.4474 (0.4525) data time 0.0008 (0.0023) model time 0.4466 (0.4492) loss 2.6679 (2.4841) grad_norm 2.0575 (inf) loss_scale 128.0000 (211.1292) mem 16699MB [2024-08-11 09:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][280/625] eta 0:02:36 lr 0.000059 wd 0.0500 time 0.4477 (0.4523) data time 0.0009 
(0.0023) model time 0.4468 (0.4490) loss 2.0795 (2.4813) grad_norm 2.1294 (inf) loss_scale 128.0000 (208.1708) mem 16699MB [2024-08-11 09:45:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][290/625] eta 0:02:31 lr 0.000059 wd 0.0500 time 0.4500 (0.4522) data time 0.0009 (0.0022) model time 0.4491 (0.4491) loss 2.3414 (2.4765) grad_norm 4.0092 (inf) loss_scale 128.0000 (205.4158) mem 16699MB [2024-08-11 09:45:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][300/625] eta 0:02:26 lr 0.000059 wd 0.0500 time 0.4515 (0.4521) data time 0.0007 (0.0022) model time 0.4508 (0.4490) loss 2.6123 (2.4852) grad_norm 2.5392 (inf) loss_scale 128.0000 (202.8439) mem 16699MB [2024-08-11 09:45:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][310/625] eta 0:02:22 lr 0.000059 wd 0.0500 time 0.4458 (0.4520) data time 0.0009 (0.0021) model time 0.4450 (0.4489) loss 2.5588 (2.4834) grad_norm 1.6673 (inf) loss_scale 128.0000 (200.4373) mem 16699MB [2024-08-11 09:45:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][320/625] eta 0:02:17 lr 0.000058 wd 0.0500 time 0.4456 (0.4518) data time 0.0009 (0.0021) model time 0.4447 (0.4488) loss 1.9745 (2.4867) grad_norm 2.4984 (inf) loss_scale 128.0000 (198.1807) mem 16699MB [2024-08-11 09:45:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][330/625] eta 0:02:13 lr 0.000058 wd 0.0500 time 0.4547 (0.4518) data time 0.0006 (0.0021) model time 0.4541 (0.4488) loss 1.4142 (2.4797) grad_norm 2.3484 (inf) loss_scale 128.0000 (196.0604) mem 16699MB [2024-08-11 09:45:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][340/625] eta 0:02:08 lr 0.000058 wd 0.0500 time 0.4482 (0.4517) data time 0.0008 (0.0020) model time 0.4474 (0.4488) loss 2.8202 (2.4812) grad_norm 2.2381 (inf) loss_scale 128.0000 (194.0645) mem 16699MB [2024-08-11 09:45:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][350/625] eta 0:02:04 lr 
0.000058 wd 0.0500 time 0.4495 (0.4516) data time 0.0008 (0.0020) model time 0.4487 (0.4488) loss 2.4374 (2.4782) grad_norm 2.2364 (inf) loss_scale 128.0000 (192.1823) mem 16699MB [2024-08-11 09:45:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][360/625] eta 0:01:59 lr 0.000058 wd 0.0500 time 0.4497 (0.4516) data time 0.0008 (0.0020) model time 0.4489 (0.4488) loss 1.9128 (2.4780) grad_norm 2.8991 (inf) loss_scale 128.0000 (190.4044) mem 16699MB [2024-08-11 09:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][370/625] eta 0:01:55 lr 0.000058 wd 0.0500 time 0.4521 (0.4515) data time 0.0008 (0.0019) model time 0.4513 (0.4488) loss 2.7143 (2.4784) grad_norm 2.2926 (inf) loss_scale 128.0000 (188.7224) mem 16699MB [2024-08-11 09:45:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][380/625] eta 0:01:50 lr 0.000058 wd 0.0500 time 0.4500 (0.4515) data time 0.0008 (0.0019) model time 0.4492 (0.4488) loss 1.9800 (2.4691) grad_norm 1.8204 (inf) loss_scale 128.0000 (187.1286) mem 16699MB [2024-08-11 09:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][390/625] eta 0:01:46 lr 0.000058 wd 0.0500 time 0.4468 (0.4514) data time 0.0006 (0.0019) model time 0.4462 (0.4488) loss 2.6692 (2.4716) grad_norm 3.3043 (inf) loss_scale 128.0000 (185.6164) mem 16699MB [2024-08-11 09:45:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][400/625] eta 0:01:41 lr 0.000058 wd 0.0500 time 0.4457 (0.4513) data time 0.0007 (0.0019) model time 0.4450 (0.4488) loss 2.3693 (2.4712) grad_norm 2.9041 (inf) loss_scale 128.0000 (184.1796) mem 16699MB [2024-08-11 09:45:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][410/625] eta 0:01:37 lr 0.000058 wd 0.0500 time 0.4502 (0.4512) data time 0.0009 (0.0018) model time 0.4493 (0.4487) loss 1.6391 (2.4699) grad_norm 2.9363 (inf) loss_scale 128.0000 (182.8127) mem 16699MB [2024-08-11 09:46:01 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [264/300][420/625] eta 0:01:32 lr 0.000058 wd 0.0500 time 0.4491 (0.4512) data time 0.0006 (0.0018) model time 0.4484 (0.4487) loss 3.0656 (2.4678) grad_norm 2.4509 (inf) loss_scale 128.0000 (181.5107) mem 16699MB [2024-08-11 09:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][430/625] eta 0:01:28 lr 0.000058 wd 0.0500 time 0.4478 (0.4516) data time 0.0007 (0.0018) model time 0.4471 (0.4492) loss 1.9167 (2.4647) grad_norm 2.1146 (inf) loss_scale 128.0000 (180.2691) mem 16699MB [2024-08-11 09:46:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][440/625] eta 0:01:23 lr 0.000058 wd 0.0500 time 0.4470 (0.4515) data time 0.0007 (0.0018) model time 0.4463 (0.4492) loss 2.9600 (2.4610) grad_norm 2.7758 (inf) loss_scale 128.0000 (179.0839) mem 16699MB [2024-08-11 09:46:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][450/625] eta 0:01:19 lr 0.000058 wd 0.0500 time 0.4517 (0.4515) data time 0.0007 (0.0017) model time 0.4510 (0.4491) loss 2.6023 (2.4566) grad_norm 1.7754 (inf) loss_scale 128.0000 (177.9512) mem 16699MB [2024-08-11 09:46:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][460/625] eta 0:01:14 lr 0.000058 wd 0.0500 time 0.4495 (0.4514) data time 0.0008 (0.0017) model time 0.4487 (0.4491) loss 2.7007 (2.4560) grad_norm 2.3165 (inf) loss_scale 128.0000 (176.8677) mem 16699MB [2024-08-11 09:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][470/625] eta 0:01:09 lr 0.000058 wd 0.0500 time 0.4482 (0.4514) data time 0.0006 (0.0017) model time 0.4475 (0.4491) loss 3.0540 (2.4607) grad_norm 1.8768 (inf) loss_scale 128.0000 (175.8301) mem 16699MB [2024-08-11 09:46:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][480/625] eta 0:01:05 lr 0.000058 wd 0.0500 time 0.4484 (0.4517) data time 0.0008 (0.0017) model time 0.4476 (0.4494) loss 1.9591 (2.4596) grad_norm 5.4363 (inf) loss_scale 128.0000 (174.8358) mem 16699MB 
[2024-08-11 09:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][490/625] eta 0:01:00 lr 0.000058 wd 0.0500 time 0.4503 (0.4516) data time 0.0008 (0.0017) model time 0.4496 (0.4494) loss 1.6415 (2.4599) grad_norm 3.8716 (inf) loss_scale 128.0000 (173.8819) mem 16699MB [2024-08-11 09:46:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][500/625] eta 0:00:56 lr 0.000058 wd 0.0500 time 0.4460 (0.4515) data time 0.0006 (0.0016) model time 0.4453 (0.4493) loss 2.3483 (2.4593) grad_norm 2.2822 (inf) loss_scale 128.0000 (172.9661) mem 16699MB [2024-08-11 09:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][510/625] eta 0:00:51 lr 0.000058 wd 0.0500 time 0.4461 (0.4515) data time 0.0006 (0.0016) model time 0.4455 (0.4493) loss 2.6058 (2.4601) grad_norm 2.4481 (inf) loss_scale 128.0000 (172.0861) mem 16699MB [2024-08-11 09:46:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][520/625] eta 0:00:47 lr 0.000058 wd 0.0500 time 0.4512 (0.4515) data time 0.0006 (0.0016) model time 0.4506 (0.4493) loss 2.5652 (2.4606) grad_norm 2.9977 (inf) loss_scale 128.0000 (171.2399) mem 16699MB [2024-08-11 09:46:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][530/625] eta 0:00:42 lr 0.000058 wd 0.0500 time 0.4507 (0.4514) data time 0.0008 (0.0016) model time 0.4499 (0.4493) loss 2.7537 (2.4660) grad_norm 2.2669 (inf) loss_scale 128.0000 (170.4256) mem 16699MB [2024-08-11 09:46:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][540/625] eta 0:00:38 lr 0.000058 wd 0.0500 time 0.4457 (0.4514) data time 0.0008 (0.0016) model time 0.4449 (0.4494) loss 2.9265 (2.4685) grad_norm 3.2935 (inf) loss_scale 128.0000 (169.6414) mem 16699MB [2024-08-11 09:47:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][550/625] eta 0:00:33 lr 0.000058 wd 0.0500 time 0.4543 (0.4514) data time 0.0006 (0.0016) model time 0.4537 (0.4493) loss 3.0566 (2.4684) grad_norm 
2.7911 (inf) loss_scale 128.0000 (168.8857) mem 16699MB [2024-08-11 09:47:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][560/625] eta 0:00:29 lr 0.000057 wd 0.0500 time 0.4452 (0.4514) data time 0.0008 (0.0016) model time 0.4443 (0.4493) loss 2.8658 (2.4673) grad_norm 2.9425 (inf) loss_scale 128.0000 (168.1569) mem 16699MB [2024-08-11 09:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][570/625] eta 0:00:24 lr 0.000057 wd 0.0500 time 0.4469 (0.4513) data time 0.0009 (0.0016) model time 0.4461 (0.4493) loss 2.7683 (2.4712) grad_norm 2.1025 (inf) loss_scale 128.0000 (167.4536) mem 16699MB [2024-08-11 09:47:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][580/625] eta 0:00:20 lr 0.000057 wd 0.0500 time 0.4549 (0.4513) data time 0.0006 (0.0015) model time 0.4543 (0.4493) loss 2.1092 (2.4727) grad_norm 7.0227 (inf) loss_scale 128.0000 (166.7745) mem 16699MB [2024-08-11 09:47:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][590/625] eta 0:00:15 lr 0.000057 wd 0.0500 time 0.4482 (0.4513) data time 0.0007 (0.0015) model time 0.4475 (0.4493) loss 2.7923 (2.4751) grad_norm 2.0816 (inf) loss_scale 128.0000 (166.1184) mem 16699MB [2024-08-11 09:47:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][600/625] eta 0:00:11 lr 0.000057 wd 0.0500 time 0.4494 (0.4513) data time 0.0008 (0.0015) model time 0.4486 (0.4493) loss 1.8898 (2.4738) grad_norm 4.8547 (inf) loss_scale 128.0000 (165.4842) mem 16699MB [2024-08-11 09:47:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][610/625] eta 0:00:06 lr 0.000057 wd 0.0500 time 0.4457 (0.4512) data time 0.0007 (0.0015) model time 0.4450 (0.4493) loss 2.8363 (2.4726) grad_norm 1.9543 (inf) loss_scale 128.0000 (164.8707) mem 16699MB [2024-08-11 09:47:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [264/300][620/625] eta 0:00:02 lr 0.000057 wd 0.0500 time 0.4435 (0.4514) data time 0.0006 (0.0015) model 
time 0.4429 (0.4495) loss 3.0220 (2.4747) grad_norm 1.8751 (inf) loss_scale 128.0000 (164.2770) mem 16699MB [2024-08-11 09:47:33 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 264 training takes 0:04:42 [2024-08-11 09:47:33 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:47:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 09:47:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.477 (0.477) Loss 0.5264 (0.5264) Acc@1 88.770 (88.770) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.153) Loss 0.8408 (0.6245) Acc@1 81.006 (86.914) Acc@5 96.191 (97.803) Mem 16699MB [2024-08-11 09:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9214 (0.7442) Acc@1 79.639 (84.089) Acc@5 95.166 (96.687) Mem 16699MB [2024-08-11 09:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.807 Acc@5 96.651 [2024-08-11 09:47:38 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 09:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.782 (0.782) Loss 0.5103 (0.5103) Acc@1 89.453 (89.453) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 09:47:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.181) Loss 0.8203 (0.6121) Acc@1 81.104 (87.225) Acc@5 96.387 (97.812) Mem 16699MB [2024-08-11 09:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.149) Loss 0.8945 (0.7269) Acc@1 79.590 (84.468) Acc@5 95.752 (96.810) Mem 16699MB [2024-08-11 09:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.177 Acc@5 96.763 [2024-08-11 09:47:42 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 09:47:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][0/625] eta 0:13:04 lr 0.000057 wd 0.0500 time 1.2557 (1.2557) data time 0.5047 (0.5047) model time 0.0000 (0.0000) loss 2.3337 (2.3337) grad_norm 3.0361 (3.0361) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:47:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][10/625] eta 0:05:20 lr 0.000057 wd 0.0500 time 0.4474 (0.5209) data time 0.0006 (0.0466) model time 0.0000 (0.0000) loss 2.9616 (2.4170) grad_norm 7.8455 (7.6107) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:47:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][20/625] eta 0:04:53 lr 0.000057 wd 0.0500 time 0.4471 (0.4859) data time 0.0008 (0.0248) model time 0.0000 (0.0000) loss 2.6182 (2.4695) grad_norm 3.7080 (5.7284) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:47:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][30/625] eta 0:04:44 lr 0.000057 wd 0.0500 time 0.4517 (0.4782) data time 0.0007 (0.0171) model time 0.0000 (0.0000) loss 2.7559 (2.4533) grad_norm 1.7765 (4.7251) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][40/625] eta 0:04:35 lr 0.000057 wd 0.0500 time 0.4498 (0.4710) data time 0.0009 (0.0131) model time 0.0000 (0.0000) loss 2.0893 (2.4386) grad_norm 2.1223 (4.1882) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][50/625] eta 0:04:28 lr 0.000057 wd 0.0500 time 0.4484 (0.4668) data time 0.0006 (0.0107) model time 0.0000 (0.0000) loss 2.6724 (2.4719) grad_norm 3.6087 (3.8761) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][60/625] eta 0:04:21 lr 0.000057 wd 0.0500 time 0.4433 (0.4636) data time 0.0009 
(0.0091) model time 0.4424 (0.4463) loss 2.8710 (2.4678) grad_norm 2.3538 (3.6528) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][70/625] eta 0:04:15 lr 0.000057 wd 0.0500 time 0.4442 (0.4611) data time 0.0009 (0.0079) model time 0.4433 (0.4455) loss 2.8598 (2.4829) grad_norm 1.9945 (3.4680) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][80/625] eta 0:04:10 lr 0.000057 wd 0.0500 time 0.4487 (0.4595) data time 0.0009 (0.0071) model time 0.4478 (0.4462) loss 2.0485 (2.4695) grad_norm 2.1346 (3.3808) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][90/625] eta 0:04:05 lr 0.000057 wd 0.0500 time 0.4474 (0.4583) data time 0.0008 (0.0064) model time 0.4466 (0.4466) loss 2.1428 (2.4895) grad_norm 2.5029 (3.2928) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][100/625] eta 0:04:00 lr 0.000057 wd 0.0500 time 0.4457 (0.4575) data time 0.0006 (0.0058) model time 0.4451 (0.4471) loss 2.7283 (2.4897) grad_norm 11.8315 (3.6536) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][110/625] eta 0:03:55 lr 0.000057 wd 0.0500 time 0.4475 (0.4566) data time 0.0008 (0.0054) model time 0.4466 (0.4470) loss 2.7465 (2.5122) grad_norm 2.1738 (3.5982) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][120/625] eta 0:03:50 lr 0.000057 wd 0.0500 time 0.4450 (0.4558) data time 0.0008 (0.0050) model time 0.4443 (0.4470) loss 2.4934 (2.5280) grad_norm 2.5615 (3.5434) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][130/625] 
eta 0:03:45 lr 0.000057 wd 0.0500 time 0.4480 (0.4552) data time 0.0008 (0.0047) model time 0.4472 (0.4470) loss 2.6326 (2.5304) grad_norm 2.2754 (3.4745) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][140/625] eta 0:03:41 lr 0.000057 wd 0.0500 time 0.6637 (0.4561) data time 0.0008 (0.0044) model time 0.6629 (0.4492) loss 2.8926 (2.5381) grad_norm 1.9709 (3.4042) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][150/625] eta 0:03:36 lr 0.000057 wd 0.0500 time 0.4442 (0.4554) data time 0.0007 (0.0042) model time 0.4435 (0.4487) loss 1.7135 (2.5264) grad_norm 3.0340 (3.3392) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][160/625] eta 0:03:31 lr 0.000057 wd 0.0500 time 0.4478 (0.4548) data time 0.0006 (0.0039) model time 0.4472 (0.4484) loss 2.2540 (2.5019) grad_norm 3.1188 (3.4059) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:48:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][170/625] eta 0:03:27 lr 0.000057 wd 0.0500 time 0.4483 (0.4552) data time 0.0007 (0.0038) model time 0.4476 (0.4495) loss 2.8765 (2.4955) grad_norm 2.0301 (3.4292) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][180/625] eta 0:03:22 lr 0.000056 wd 0.0500 time 0.4467 (0.4549) data time 0.0006 (0.0036) model time 0.4461 (0.4494) loss 2.6470 (2.4887) grad_norm 2.6254 (3.4010) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][190/625] eta 0:03:17 lr 0.000056 wd 0.0500 time 0.4497 (0.4547) data time 0.0008 (0.0035) model time 0.4489 (0.4495) loss 2.9637 (2.4951) grad_norm 2.3374 (3.4124) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:13 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][200/625] eta 0:03:13 lr 0.000056 wd 0.0500 time 0.4480 (0.4544) data time 0.0008 (0.0033) model time 0.4472 (0.4493) loss 2.8462 (2.4814) grad_norm 4.6529 (3.4036) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][210/625] eta 0:03:08 lr 0.000056 wd 0.0500 time 0.4471 (0.4540) data time 0.0006 (0.0032) model time 0.4465 (0.4491) loss 2.3290 (2.4769) grad_norm 1.9059 (3.3874) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][220/625] eta 0:03:03 lr 0.000056 wd 0.0500 time 0.4474 (0.4538) data time 0.0008 (0.0031) model time 0.4466 (0.4491) loss 2.2281 (2.4822) grad_norm 2.0738 (3.3741) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][230/625] eta 0:02:59 lr 0.000056 wd 0.0500 time 0.4477 (0.4535) data time 0.0006 (0.0030) model time 0.4471 (0.4490) loss 2.9502 (2.4894) grad_norm 24.7178 (3.4327) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][240/625] eta 0:02:54 lr 0.000056 wd 0.0500 time 0.4501 (0.4534) data time 0.0006 (0.0029) model time 0.4495 (0.4490) loss 2.1652 (2.4887) grad_norm 2.7531 (3.4973) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][250/625] eta 0:02:49 lr 0.000056 wd 0.0500 time 0.4488 (0.4533) data time 0.0008 (0.0028) model time 0.4480 (0.4490) loss 2.6288 (2.4892) grad_norm 3.0933 (3.4821) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][260/625] eta 0:02:45 lr 0.000056 wd 0.0500 time 0.4452 (0.4531) data time 0.0008 (0.0027) model time 0.4444 (0.4490) loss 2.5975 (2.4909) grad_norm 
18.8940 (3.5069) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][270/625] eta 0:02:40 lr 0.000056 wd 0.0500 time 0.4482 (0.4530) data time 0.0006 (0.0027) model time 0.4476 (0.4490) loss 2.0942 (2.4825) grad_norm 4.4041 (3.4912) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][280/625] eta 0:02:36 lr 0.000056 wd 0.0500 time 0.4461 (0.4529) data time 0.0007 (0.0026) model time 0.4454 (0.4490) loss 2.6512 (2.4758) grad_norm 16.7992 (3.5146) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][290/625] eta 0:02:31 lr 0.000056 wd 0.0500 time 0.4492 (0.4527) data time 0.0008 (0.0025) model time 0.4484 (0.4489) loss 2.7479 (2.4766) grad_norm 2.8230 (3.5140) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:49:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][300/625] eta 0:02:27 lr 0.000056 wd 0.0500 time 0.4445 (0.4525) data time 0.0008 (0.0025) model time 0.4437 (0.4487) loss 2.8500 (2.4749) grad_norm 1.9288 (3.4916) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][310/625] eta 0:02:22 lr 0.000056 wd 0.0500 time 0.4560 (0.4524) data time 0.0010 (0.0024) model time 0.4550 (0.4487) loss 2.4956 (2.4767) grad_norm 2.6293 (3.4643) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][320/625] eta 0:02:17 lr 0.000056 wd 0.0500 time 0.4468 (0.4522) data time 0.0006 (0.0024) model time 0.4462 (0.4487) loss 2.7540 (2.4786) grad_norm 3.3398 (3.4479) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][330/625] eta 0:02:13 lr 0.000056 wd 0.0500 time 0.4481 (0.4522) data time 
0.0006 (0.0023) model time 0.4474 (0.4487) loss 2.6424 (2.4851) grad_norm 2.0858 (3.4087) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][340/625] eta 0:02:08 lr 0.000056 wd 0.0500 time 0.4510 (0.4521) data time 0.0008 (0.0023) model time 0.4502 (0.4487) loss 2.8529 (2.4868) grad_norm 2.9333 (3.3859) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][350/625] eta 0:02:04 lr 0.000056 wd 0.0500 time 0.4482 (0.4520) data time 0.0009 (0.0022) model time 0.4473 (0.4486) loss 2.6758 (2.4937) grad_norm 7.0336 (3.3765) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][360/625] eta 0:01:59 lr 0.000056 wd 0.0500 time 0.4450 (0.4518) data time 0.0006 (0.0022) model time 0.4444 (0.4485) loss 1.8379 (2.4874) grad_norm 2.2452 (3.3509) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][370/625] eta 0:01:55 lr 0.000056 wd 0.0500 time 0.4463 (0.4517) data time 0.0008 (0.0022) model time 0.4455 (0.4485) loss 2.0252 (2.4891) grad_norm 2.3393 (3.3639) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][380/625] eta 0:01:50 lr 0.000056 wd 0.0500 time 0.4459 (0.4516) data time 0.0008 (0.0021) model time 0.4451 (0.4484) loss 2.5797 (2.4898) grad_norm 2.6207 (3.3528) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][390/625] eta 0:01:46 lr 0.000056 wd 0.0500 time 0.4493 (0.4515) data time 0.0008 (0.0021) model time 0.4485 (0.4484) loss 2.3994 (2.4847) grad_norm 2.7161 (3.3590) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[265/300][400/625] eta 0:01:41 lr 0.000056 wd 0.0500 time 0.4483 (0.4514) data time 0.0006 (0.0021) model time 0.4477 (0.4484) loss 2.8141 (2.4856) grad_norm 1.9793 (3.3449) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][410/625] eta 0:01:37 lr 0.000056 wd 0.0500 time 0.4470 (0.4514) data time 0.0006 (0.0020) model time 0.4464 (0.4484) loss 1.7417 (2.4846) grad_norm 1.7705 (3.3324) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][420/625] eta 0:01:32 lr 0.000056 wd 0.0500 time 0.4451 (0.4514) data time 0.0008 (0.0020) model time 0.4443 (0.4484) loss 1.7305 (2.4811) grad_norm 2.1702 (3.3245) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:50:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][430/625] eta 0:01:27 lr 0.000055 wd 0.0500 time 0.4457 (0.4512) data time 0.0006 (0.0020) model time 0.4451 (0.4484) loss 2.8755 (2.4803) grad_norm 1.9424 (3.3058) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][440/625] eta 0:01:23 lr 0.000055 wd 0.0500 time 0.4446 (0.4512) data time 0.0009 (0.0019) model time 0.4437 (0.4483) loss 1.7406 (2.4757) grad_norm 1.7425 (3.2879) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][450/625] eta 0:01:18 lr 0.000055 wd 0.0500 time 0.4486 (0.4511) data time 0.0008 (0.0019) model time 0.4478 (0.4483) loss 2.5218 (2.4773) grad_norm 2.0337 (3.2613) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][460/625] eta 0:01:14 lr 0.000055 wd 0.0500 time 0.4569 (0.4510) data time 0.0009 (0.0019) model time 0.4560 (0.4483) loss 2.0656 (2.4786) grad_norm 2.2657 (3.2438) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:51:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][470/625] eta 0:01:09 lr 0.000055 wd 0.0500 time 0.4504 (0.4510) data time 0.0006 (0.0019) model time 0.4497 (0.4483) loss 1.7084 (2.4780) grad_norm 1.9557 (3.2251) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][480/625] eta 0:01:05 lr 0.000055 wd 0.0500 time 0.4505 (0.4514) data time 0.0008 (0.0019) model time 0.4497 (0.4488) loss 1.6945 (2.4739) grad_norm 2.5780 (3.2247) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][490/625] eta 0:01:00 lr 0.000055 wd 0.0500 time 0.4464 (0.4513) data time 0.0006 (0.0018) model time 0.4457 (0.4487) loss 2.7701 (2.4684) grad_norm 2.0082 (3.2101) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][500/625] eta 0:00:56 lr 0.000055 wd 0.0500 time 0.4454 (0.4513) data time 0.0009 (0.0018) model time 0.4445 (0.4487) loss 1.7587 (2.4662) grad_norm 2.8094 (3.2005) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][510/625] eta 0:00:51 lr 0.000055 wd 0.0500 time 0.4454 (0.4515) data time 0.0008 (0.0018) model time 0.4446 (0.4490) loss 2.5386 (2.4686) grad_norm 2.0609 (3.3161) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][520/625] eta 0:00:47 lr 0.000055 wd 0.0500 time 0.4465 (0.4514) data time 0.0008 (0.0018) model time 0.4457 (0.4489) loss 2.6604 (2.4676) grad_norm 2.5957 (3.2974) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][530/625] eta 0:00:42 lr 0.000055 wd 0.0500 time 0.4487 (0.4514) data time 0.0009 (0.0018) model time 0.4478 (0.4489) loss 2.3908 
(2.4653) grad_norm 3.3081 (3.2904) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][540/625] eta 0:00:38 lr 0.000055 wd 0.0500 time 0.4443 (0.4513) data time 0.0009 (0.0017) model time 0.4434 (0.4489) loss 2.5094 (2.4670) grad_norm 2.9690 (3.2984) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][550/625] eta 0:00:33 lr 0.000055 wd 0.0500 time 0.4521 (0.4513) data time 0.0008 (0.0017) model time 0.4513 (0.4489) loss 2.7967 (2.4677) grad_norm 2.1811 (3.2871) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][560/625] eta 0:00:29 lr 0.000055 wd 0.0500 time 0.4683 (0.4513) data time 0.0009 (0.0017) model time 0.4673 (0.4490) loss 2.6906 (2.4638) grad_norm 3.8019 (3.2934) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:51:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][570/625] eta 0:00:24 lr 0.000055 wd 0.0500 time 0.4461 (0.4512) data time 0.0008 (0.0017) model time 0.4452 (0.4489) loss 2.2914 (2.4672) grad_norm 3.6283 (3.2861) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][580/625] eta 0:00:20 lr 0.000055 wd 0.0500 time 0.4488 (0.4512) data time 0.0008 (0.0017) model time 0.4480 (0.4489) loss 2.6658 (2.4708) grad_norm 2.0905 (3.2989) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][590/625] eta 0:00:15 lr 0.000055 wd 0.0500 time 0.4450 (0.4511) data time 0.0007 (0.0017) model time 0.4443 (0.4488) loss 2.1823 (2.4736) grad_norm 2.2367 (3.3105) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][600/625] eta 0:00:11 lr 0.000055 wd 0.0500 time 0.4557 
(0.4511) data time 0.0006 (0.0016) model time 0.4551 (0.4488) loss 2.5646 (2.4778) grad_norm 2.8973 (3.3033) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][610/625] eta 0:00:06 lr 0.000055 wd 0.0500 time 0.4473 (0.4510) data time 0.0006 (0.0016) model time 0.4467 (0.4488) loss 2.0796 (2.4772) grad_norm 7.2049 (3.4188) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [265/300][620/625] eta 0:00:02 lr 0.000055 wd 0.0500 time 0.4470 (0.4509) data time 0.0006 (0.0016) model time 0.4464 (0.4487) loss 1.9761 (2.4782) grad_norm 3.9932 (3.4109) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:23 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 265 training takes 0:04:41 [2024-08-11 09:52:23 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:52:25 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 09:52:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.5317 (0.5317) Acc@1 88.477 (88.477) Acc@5 98.779 (98.779) Mem 16699MB [2024-08-11 09:52:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8501 (0.6256) Acc@1 80.371 (86.745) Acc@5 96.289 (97.749) Mem 16699MB [2024-08-11 09:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9312 (0.7477) Acc@1 79.590 (83.968) Acc@5 95.166 (96.608) Mem 16699MB [2024-08-11 09:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.697 Acc@5 96.567 [2024-08-11 09:52:28 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:52:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.786 (0.786) Loss 0.5112 (0.5112) Acc@1 89.404 (89.404) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 09:52:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.182) Loss 0.8223 (0.6127) Acc@1 81.104 (87.198) Acc@5 96.338 (97.798) Mem 16699MB [2024-08-11 09:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.150) Loss 0.8955 (0.7281) Acc@1 79.639 (84.433) Acc@5 95.752 (96.805) Mem 16699MB [2024-08-11 09:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.155 Acc@5 96.755 [2024-08-11 09:52:32 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 09:52:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][0/625] eta 0:13:12 lr 0.000055 wd 0.0500 time 1.2673 (1.2673) data time 0.7331 (0.7331) model time 0.0000 (0.0000) loss 2.4158 (2.4158) grad_norm 4.7288 (4.7288) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][10/625] eta 0:05:20 lr 0.000055 wd 0.0500 time 0.4476 (0.5206) data 
time 0.0009 (0.0675) model time 0.0000 (0.0000) loss 2.3449 (2.2743) grad_norm 2.2254 (2.7434) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][20/625] eta 0:04:54 lr 0.000055 wd 0.0500 time 0.4484 (0.4860) data time 0.0007 (0.0357) model time 0.0000 (0.0000) loss 2.2085 (2.3314) grad_norm 2.1341 (3.5584) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][30/625] eta 0:04:41 lr 0.000055 wd 0.0500 time 0.4473 (0.4731) data time 0.0009 (0.0245) model time 0.0000 (0.0000) loss 2.9056 (2.4037) grad_norm 3.0003 (3.2971) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][40/625] eta 0:04:33 lr 0.000055 wd 0.0500 time 0.4459 (0.4669) data time 0.0008 (0.0187) model time 0.0000 (0.0000) loss 2.8105 (2.4312) grad_norm 2.6071 (3.0743) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:52:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][50/625] eta 0:04:27 lr 0.000055 wd 0.0500 time 0.4475 (0.4645) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 2.7367 (2.4347) grad_norm 2.9849 (2.9570) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][60/625] eta 0:04:20 lr 0.000054 wd 0.0500 time 0.4470 (0.4618) data time 0.0009 (0.0129) model time 0.4462 (0.4472) loss 3.0317 (2.4466) grad_norm 2.1252 (2.9124) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][70/625] eta 0:04:15 lr 0.000054 wd 0.0500 time 0.4500 (0.4600) data time 0.0009 (0.0112) model time 0.4491 (0.4476) loss 2.4480 (2.4399) grad_norm 2.0276 (3.0668) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[266/300][80/625] eta 0:04:12 lr 0.000054 wd 0.0500 time 0.4472 (0.4630) data time 0.0008 (0.0099) model time 0.4464 (0.4595) loss 2.3868 (2.4460) grad_norm 2.1759 (3.0348) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][90/625] eta 0:04:06 lr 0.000054 wd 0.0500 time 0.4462 (0.4611) data time 0.0009 (0.0089) model time 0.4453 (0.4560) loss 2.9317 (2.4353) grad_norm 1.8305 (2.9687) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][100/625] eta 0:04:01 lr 0.000054 wd 0.0500 time 0.4479 (0.4597) data time 0.0008 (0.0081) model time 0.4471 (0.4541) loss 2.2488 (2.4503) grad_norm 3.1262 (3.1124) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][110/625] eta 0:03:56 lr 0.000054 wd 0.0500 time 0.4560 (0.4588) data time 0.0008 (0.0075) model time 0.4552 (0.4531) loss 2.7467 (2.4606) grad_norm 2.7630 (3.1441) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][120/625] eta 0:03:51 lr 0.000054 wd 0.0500 time 0.4463 (0.4579) data time 0.0009 (0.0069) model time 0.4455 (0.4523) loss 2.4166 (2.4533) grad_norm 2.5248 (3.1229) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][130/625] eta 0:03:46 lr 0.000054 wd 0.0500 time 0.4469 (0.4573) data time 0.0009 (0.0064) model time 0.4460 (0.4518) loss 2.8627 (2.4627) grad_norm 2.1870 (3.0793) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][140/625] eta 0:03:41 lr 0.000054 wd 0.0500 time 0.4470 (0.4566) data time 0.0009 (0.0061) model time 0.4461 (0.4513) loss 2.7591 (2.4540) grad_norm 2.1240 (3.0216) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 09:53:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][150/625] eta 0:03:36 lr 0.000054 wd 0.0500 time 0.4453 (0.4560) data time 0.0007 (0.0057) model time 0.4446 (0.4508) loss 2.7437 (2.4520) grad_norm 3.8099 (3.0398) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][160/625] eta 0:03:31 lr 0.000054 wd 0.0500 time 0.4444 (0.4555) data time 0.0009 (0.0054) model time 0.4436 (0.4504) loss 1.9798 (2.4466) grad_norm 2.5055 (3.1009) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][170/625] eta 0:03:27 lr 0.000054 wd 0.0500 time 0.4469 (0.4552) data time 0.0010 (0.0051) model time 0.4459 (0.4504) loss 2.4357 (2.4463) grad_norm 3.1846 (3.1251) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][180/625] eta 0:03:22 lr 0.000054 wd 0.0500 time 0.4619 (0.4549) data time 0.0009 (0.0049) model time 0.4610 (0.4503) loss 2.6509 (2.4608) grad_norm 2.0778 (3.0926) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:53:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][190/625] eta 0:03:17 lr 0.000054 wd 0.0500 time 0.4485 (0.4546) data time 0.0009 (0.0047) model time 0.4476 (0.4502) loss 2.5339 (2.4727) grad_norm 2.3382 (3.0734) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][200/625] eta 0:03:13 lr 0.000054 wd 0.0500 time 0.4510 (0.4552) data time 0.0006 (0.0046) model time 0.4503 (0.4510) loss 2.9180 (2.4735) grad_norm 2.0954 (3.0716) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][210/625] eta 0:03:08 lr 0.000054 wd 0.0500 time 0.4482 (0.4549) data time 0.0009 (0.0044) model time 0.4474 (0.4509) loss 2.7453 
(2.4595) grad_norm 3.3015 (3.0923) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][220/625] eta 0:03:04 lr 0.000054 wd 0.0500 time 0.4473 (0.4546) data time 0.0009 (0.0042) model time 0.4464 (0.4507) loss 1.7182 (2.4476) grad_norm 3.2754 (3.0855) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][230/625] eta 0:02:59 lr 0.000054 wd 0.0500 time 0.4454 (0.4542) data time 0.0006 (0.0041) model time 0.4448 (0.4504) loss 1.8794 (2.4482) grad_norm 2.0778 (3.0577) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][240/625] eta 0:02:54 lr 0.000054 wd 0.0500 time 0.4500 (0.4540) data time 0.0009 (0.0040) model time 0.4491 (0.4502) loss 2.1108 (2.4493) grad_norm 2.5631 (3.0355) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][250/625] eta 0:02:50 lr 0.000054 wd 0.0500 time 0.4498 (0.4537) data time 0.0009 (0.0038) model time 0.4489 (0.4500) loss 2.5114 (2.4499) grad_norm 3.4594 (3.0531) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][260/625] eta 0:02:45 lr 0.000054 wd 0.0500 time 0.4469 (0.4535) data time 0.0007 (0.0037) model time 0.4462 (0.4499) loss 2.5410 (2.4425) grad_norm 2.8460 (3.0896) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][270/625] eta 0:02:41 lr 0.000054 wd 0.0500 time 0.4458 (0.4546) data time 0.0009 (0.0036) model time 0.4449 (0.4514) loss 2.6917 (2.4355) grad_norm 3.4621 (3.1554) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][280/625] eta 0:02:36 lr 0.000054 wd 0.0500 time 0.4482 
(0.4544) data time 0.0008 (0.0035) model time 0.4473 (0.4512) loss 2.1767 (2.4280) grad_norm 16.6763 (3.1911) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][290/625] eta 0:02:32 lr 0.000054 wd 0.0500 time 0.4502 (0.4542) data time 0.0008 (0.0034) model time 0.4494 (0.4511) loss 2.9679 (2.4299) grad_norm 7.5258 (3.2083) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][300/625] eta 0:02:27 lr 0.000054 wd 0.0500 time 0.4460 (0.4539) data time 0.0009 (0.0033) model time 0.4450 (0.4509) loss 2.5718 (2.4336) grad_norm 3.3049 (3.3569) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][310/625] eta 0:02:22 lr 0.000053 wd 0.0500 time 0.4449 (0.4537) data time 0.0009 (0.0033) model time 0.4440 (0.4507) loss 2.8060 (2.4321) grad_norm 2.9404 (3.3368) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:54:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][320/625] eta 0:02:18 lr 0.000053 wd 0.0500 time 0.4478 (0.4535) data time 0.0008 (0.0032) model time 0.4470 (0.4505) loss 2.6899 (2.4305) grad_norm 2.0981 (3.3180) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][330/625] eta 0:02:13 lr 0.000053 wd 0.0500 time 0.4478 (0.4533) data time 0.0007 (0.0031) model time 0.4471 (0.4504) loss 3.0872 (2.4331) grad_norm 3.8772 (3.3295) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][340/625] eta 0:02:09 lr 0.000053 wd 0.0500 time 0.4461 (0.4531) data time 0.0007 (0.0031) model time 0.4455 (0.4502) loss 2.4875 (2.4338) grad_norm 1.5467 (3.3102) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [266/300][350/625] eta 0:02:04 lr 0.000053 wd 0.0500 time 0.4473 (0.4530) data time 0.0007 (0.0030) model time 0.4466 (0.4501) loss 2.7460 (2.4353) grad_norm 2.9365 (3.2954) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][360/625] eta 0:02:00 lr 0.000053 wd 0.0500 time 0.4478 (0.4528) data time 0.0008 (0.0029) model time 0.4470 (0.4500) loss 2.6209 (2.4415) grad_norm 2.3500 (3.2956) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][370/625] eta 0:01:55 lr 0.000053 wd 0.0500 time 0.4470 (0.4527) data time 0.0007 (0.0029) model time 0.4463 (0.4499) loss 1.4977 (2.4381) grad_norm 2.4147 (3.2848) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][380/625] eta 0:01:50 lr 0.000053 wd 0.0500 time 0.4470 (0.4526) data time 0.0008 (0.0028) model time 0.4462 (0.4498) loss 1.8237 (2.4400) grad_norm 2.4740 (3.2803) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][390/625] eta 0:01:46 lr 0.000053 wd 0.0500 time 0.4428 (0.4524) data time 0.0009 (0.0028) model time 0.4420 (0.4498) loss 2.6838 (2.4420) grad_norm 2.5440 (3.2760) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][400/625] eta 0:01:41 lr 0.000053 wd 0.0500 time 0.4430 (0.4523) data time 0.0008 (0.0027) model time 0.4423 (0.4497) loss 2.7783 (2.4432) grad_norm 3.0168 (3.2622) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][410/625] eta 0:01:37 lr 0.000053 wd 0.0500 time 0.4494 (0.4523) data time 0.0008 (0.0027) model time 0.4486 (0.4497) loss 2.7934 (2.4436) grad_norm 2.3222 (3.2613) loss_scale 128.0000 (128.0000) mem 
16699MB [2024-08-11 09:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][420/625] eta 0:01:32 lr 0.000053 wd 0.0500 time 0.4572 (0.4523) data time 0.0008 (0.0026) model time 0.4564 (0.4498) loss 2.3681 (2.4384) grad_norm 3.1346 (3.2508) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][430/625] eta 0:01:28 lr 0.000053 wd 0.0500 time 0.4470 (0.4524) data time 0.0007 (0.0026) model time 0.4463 (0.4498) loss 2.2728 (2.4346) grad_norm 1.9373 (3.2305) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][440/625] eta 0:01:23 lr 0.000053 wd 0.0500 time 0.4472 (0.4523) data time 0.0009 (0.0025) model time 0.4463 (0.4498) loss 1.7742 (2.4318) grad_norm 2.3373 (3.2224) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][450/625] eta 0:01:19 lr 0.000053 wd 0.0500 time 0.4467 (0.4522) data time 0.0006 (0.0025) model time 0.4461 (0.4498) loss 2.3241 (2.4364) grad_norm 2.1656 (3.2093) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:56:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][460/625] eta 0:01:14 lr 0.000053 wd 0.0500 time 0.4445 (0.4521) data time 0.0006 (0.0025) model time 0.4438 (0.4497) loss 2.5122 (2.4405) grad_norm 2.5807 (3.2065) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:56:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][470/625] eta 0:01:10 lr 0.000053 wd 0.0500 time 0.4459 (0.4520) data time 0.0007 (0.0024) model time 0.4453 (0.4496) loss 1.6496 (2.4398) grad_norm 15.0389 (3.3558) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:56:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][480/625] eta 0:01:05 lr 0.000053 wd 0.0500 time 0.4484 (0.4519) data time 0.0008 (0.0024) model time 0.4476 (0.4496) loss 
2.6112 (2.4397) grad_norm 1.8484 (3.3558) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:56:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][490/625] eta 0:01:01 lr 0.000053 wd 0.0500 time 0.4489 (0.4522) data time 0.0009 (0.0024) model time 0.4480 (0.4499) loss 2.2266 (2.4385) grad_norm 1.9990 (3.3385) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 09:56:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][500/625] eta 0:00:56 lr 0.000053 wd 0.0500 time 0.4476 (0.4521) data time 0.0009 (0.0023) model time 0.4467 (0.4499) loss 2.8365 (2.4421) grad_norm 2.3113 (inf) loss_scale 64.0000 (127.4890) mem 16699MB [2024-08-11 09:56:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][510/625] eta 0:00:51 lr 0.000053 wd 0.0500 time 0.4468 (0.4521) data time 0.0007 (0.0023) model time 0.4461 (0.4498) loss 2.6120 (2.4404) grad_norm 1.9992 (inf) loss_scale 64.0000 (126.2466) mem 16699MB [2024-08-11 09:56:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][520/625] eta 0:00:47 lr 0.000053 wd 0.0500 time 0.4467 (0.4520) data time 0.0008 (0.0023) model time 0.4459 (0.4497) loss 2.3059 (2.4367) grad_norm 2.7371 (inf) loss_scale 64.0000 (125.0518) mem 16699MB [2024-08-11 09:56:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][530/625] eta 0:00:42 lr 0.000053 wd 0.0500 time 0.4473 (0.4522) data time 0.0008 (0.0023) model time 0.4464 (0.4500) loss 2.1097 (2.4389) grad_norm 3.7074 (inf) loss_scale 64.0000 (123.9021) mem 16699MB [2024-08-11 09:56:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][540/625] eta 0:00:38 lr 0.000053 wd 0.0500 time 0.4449 (0.4522) data time 0.0008 (0.0022) model time 0.4441 (0.4500) loss 2.5837 (2.4369) grad_norm 1.9679 (inf) loss_scale 64.0000 (122.7948) mem 16699MB [2024-08-11 09:56:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][550/625] eta 0:00:33 lr 0.000053 wd 0.0500 time 0.4480 (0.4521) data 
time 0.0008 (0.0022) model time 0.4472 (0.4500) loss 2.1698 (2.4396) grad_norm 3.6577 (inf) loss_scale 64.0000 (121.7278) mem 16699MB [2024-08-11 09:56:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][560/625] eta 0:00:29 lr 0.000053 wd 0.0500 time 0.4535 (0.4521) data time 0.0008 (0.0022) model time 0.4527 (0.4499) loss 2.5160 (2.4378) grad_norm 3.9009 (inf) loss_scale 64.0000 (120.6988) mem 16699MB [2024-08-11 09:56:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][570/625] eta 0:00:24 lr 0.000052 wd 0.0500 time 0.4520 (0.4520) data time 0.0007 (0.0022) model time 0.4513 (0.4499) loss 2.1539 (2.4373) grad_norm 2.7607 (inf) loss_scale 64.0000 (119.7058) mem 16699MB [2024-08-11 09:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][580/625] eta 0:00:20 lr 0.000052 wd 0.0500 time 0.4467 (0.4520) data time 0.0006 (0.0021) model time 0.4460 (0.4499) loss 2.1725 (2.4379) grad_norm 3.9824 (inf) loss_scale 64.0000 (118.7470) mem 16699MB [2024-08-11 09:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][590/625] eta 0:00:15 lr 0.000052 wd 0.0500 time 0.4483 (0.4519) data time 0.0009 (0.0021) model time 0.4474 (0.4499) loss 2.6883 (2.4413) grad_norm 3.4196 (inf) loss_scale 64.0000 (117.8206) mem 16699MB [2024-08-11 09:57:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][600/625] eta 0:00:11 lr 0.000052 wd 0.0500 time 0.4587 (0.4519) data time 0.0006 (0.0021) model time 0.4581 (0.4499) loss 1.4592 (2.4432) grad_norm 2.8956 (inf) loss_scale 64.0000 (116.9251) mem 16699MB [2024-08-11 09:57:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][610/625] eta 0:00:06 lr 0.000052 wd 0.0500 time 0.4439 (0.4518) data time 0.0004 (0.0021) model time 0.4435 (0.4498) loss 2.6980 (2.4446) grad_norm 4.3402 (inf) loss_scale 64.0000 (116.0589) mem 16699MB [2024-08-11 09:57:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [266/300][620/625] eta 0:00:02 lr 
0.000052 wd 0.0500 time 0.4446 (0.4517) data time 0.0006 (0.0021) model time 0.4439 (0.4497) loss 1.7511 (2.4446) grad_norm 2.5143 (inf) loss_scale 64.0000 (115.2206) mem 16699MB [2024-08-11 09:57:14 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 266 training takes 0:04:42 [2024-08-11 09:57:14 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 09:57:16 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 09:57:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.475 (0.475) Loss 0.5146 (0.5146) Acc@1 89.648 (89.648) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 09:57:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8433 (0.6202) Acc@1 80.225 (86.856) Acc@5 96.094 (97.727) Mem 16699MB [2024-08-11 09:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9243 (0.7430) Acc@1 79.688 (84.049) Acc@5 95.654 (96.684) Mem 16699MB [2024-08-11 09:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.743 Acc@5 96.647 [2024-08-11 09:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 09:57:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.897 (0.897) Loss 0.5122 (0.5122) Acc@1 89.453 (89.453) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 09:57:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.190) Loss 0.8237 (0.6132) Acc@1 81.201 (87.207) Acc@5 96.338 (97.812) Mem 16699MB [2024-08-11 09:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.154) Loss 0.8984 (0.7290) Acc@1 79.736 (84.426) Acc@5 95.752 (96.803) Mem 16699MB [2024-08-11 09:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.153 Acc@5 96.757 
[2024-08-11 09:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.2% [2024-08-11 09:57:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][0/625] eta 0:13:18 lr 0.000052 wd 0.0500 time 1.2783 (1.2783) data time 0.4748 (0.4748) model time 0.0000 (0.0000) loss 1.7787 (1.7787) grad_norm 9.8519 (9.8519) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][10/625] eta 0:05:34 lr 0.000052 wd 0.0500 time 0.4487 (0.5436) data time 0.0009 (0.0440) model time 0.0000 (0.0000) loss 2.1228 (2.2340) grad_norm 3.2777 (3.2737) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][20/625] eta 0:05:01 lr 0.000052 wd 0.0500 time 0.4475 (0.4986) data time 0.0007 (0.0234) model time 0.0000 (0.0000) loss 1.4749 (2.2895) grad_norm 3.4675 (3.1534) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][30/625] eta 0:04:46 lr 0.000052 wd 0.0500 time 0.4469 (0.4822) data time 0.0006 (0.0161) model time 0.0000 (0.0000) loss 2.7331 (2.3238) grad_norm 2.3063 (3.0144) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][40/625] eta 0:04:37 lr 0.000052 wd 0.0500 time 0.4443 (0.4738) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 2.2815 (2.3841) grad_norm 5.4871 (3.1425) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][50/625] eta 0:04:29 lr 0.000052 wd 0.0500 time 0.4463 (0.4687) data time 0.0009 (0.0101) model time 0.0000 (0.0000) loss 1.7038 (2.4002) grad_norm 2.4590 (3.0949) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][60/625] eta 0:04:22 lr 
0.000052 wd 0.0500 time 0.4486 (0.4655) data time 0.0009 (0.0086) model time 0.4478 (0.4478) loss 1.9239 (2.4108) grad_norm 3.4580 (3.0006) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:57:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][70/625] eta 0:04:16 lr 0.000052 wd 0.0500 time 0.4458 (0.4630) data time 0.0008 (0.0075) model time 0.4449 (0.4477) loss 2.4451 (2.3894) grad_norm 4.6168 (3.0446) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][80/625] eta 0:04:11 lr 0.000052 wd 0.0500 time 0.4510 (0.4619) data time 0.0006 (0.0067) model time 0.4504 (0.4494) loss 3.0455 (2.4005) grad_norm 4.6227 (2.9797) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][90/625] eta 0:04:06 lr 0.000052 wd 0.0500 time 0.4479 (0.4606) data time 0.0006 (0.0061) model time 0.4472 (0.4494) loss 2.5908 (2.4137) grad_norm 1.8864 (2.9458) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][100/625] eta 0:04:01 lr 0.000052 wd 0.0500 time 0.4454 (0.4593) data time 0.0006 (0.0055) model time 0.4447 (0.4487) loss 2.2047 (2.4039) grad_norm 3.4852 (2.9754) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][110/625] eta 0:03:55 lr 0.000052 wd 0.0500 time 0.4482 (0.4582) data time 0.0008 (0.0051) model time 0.4474 (0.4485) loss 2.8738 (2.4018) grad_norm 7.6399 (2.9490) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][120/625] eta 0:03:50 lr 0.000052 wd 0.0500 time 0.4455 (0.4572) data time 0.0006 (0.0048) model time 0.4449 (0.4480) loss 1.6392 (2.3913) grad_norm 4.5658 (3.0434) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:22 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [267/300][130/625] eta 0:03:45 lr 0.000052 wd 0.0500 time 0.4454 (0.4565) data time 0.0008 (0.0045) model time 0.4446 (0.4479) loss 2.5998 (2.4135) grad_norm 2.4508 (3.0930) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][140/625] eta 0:03:41 lr 0.000052 wd 0.0500 time 0.4476 (0.4559) data time 0.0007 (0.0042) model time 0.4469 (0.4478) loss 2.3388 (2.4146) grad_norm 2.3505 (3.1006) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][150/625] eta 0:03:36 lr 0.000052 wd 0.0500 time 0.4480 (0.4555) data time 0.0007 (0.0040) model time 0.4473 (0.4480) loss 2.4940 (2.4180) grad_norm 1.7988 (3.0984) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][160/625] eta 0:03:32 lr 0.000052 wd 0.0500 time 0.4483 (0.4564) data time 0.0008 (0.0038) model time 0.4475 (0.4499) loss 2.6315 (2.4129) grad_norm 3.4838 (3.1298) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][170/625] eta 0:03:27 lr 0.000052 wd 0.0500 time 0.4483 (0.4560) data time 0.0006 (0.0036) model time 0.4477 (0.4497) loss 2.0362 (2.4110) grad_norm 1.8884 (3.1038) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][180/625] eta 0:03:22 lr 0.000052 wd 0.0500 time 0.4493 (0.4555) data time 0.0008 (0.0035) model time 0.4484 (0.4495) loss 1.8104 (2.4109) grad_norm 2.6218 (3.0548) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][190/625] eta 0:03:17 lr 0.000052 wd 0.0500 time 0.4453 (0.4551) data time 0.0006 (0.0033) model time 0.4447 (0.4493) loss 2.5519 (2.4122) grad_norm 1.9517 (3.0253) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 09:58:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][200/625] eta 0:03:13 lr 0.000051 wd 0.0500 time 0.4443 (0.4547) data time 0.0006 (0.0032) model time 0.4437 (0.4491) loss 2.4972 (2.4083) grad_norm 2.2565 (3.0614) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:58:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][210/625] eta 0:03:08 lr 0.000051 wd 0.0500 time 0.4468 (0.4544) data time 0.0007 (0.0031) model time 0.4461 (0.4491) loss 2.3465 (2.4170) grad_norm 3.0031 (3.0392) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][220/625] eta 0:03:03 lr 0.000051 wd 0.0500 time 0.4438 (0.4542) data time 0.0006 (0.0030) model time 0.4432 (0.4490) loss 1.9139 (2.4151) grad_norm 3.6727 (3.0293) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][230/625] eta 0:02:59 lr 0.000051 wd 0.0500 time 0.4439 (0.4546) data time 0.0006 (0.0029) model time 0.4433 (0.4498) loss 2.7974 (2.4226) grad_norm 2.3544 (3.0106) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][240/625] eta 0:02:54 lr 0.000051 wd 0.0500 time 0.4452 (0.4543) data time 0.0008 (0.0028) model time 0.4444 (0.4496) loss 1.9469 (2.4098) grad_norm 2.5803 (2.9933) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][250/625] eta 0:02:50 lr 0.000051 wd 0.0500 time 0.4457 (0.4540) data time 0.0009 (0.0027) model time 0.4449 (0.4494) loss 2.4174 (2.4134) grad_norm 1.9412 (2.9851) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][260/625] eta 0:02:45 lr 0.000051 wd 0.0500 time 0.4461 (0.4538) data time 0.0007 (0.0026) model time 0.4454 (0.4493) loss 
2.0012 (2.4161) grad_norm 2.3402 (3.0356) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][270/625] eta 0:02:41 lr 0.000051 wd 0.0500 time 0.4461 (0.4535) data time 0.0009 (0.0026) model time 0.4452 (0.4492) loss 2.8328 (2.4193) grad_norm 3.0966 (3.0290) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][280/625] eta 0:02:36 lr 0.000051 wd 0.0500 time 0.4501 (0.4533) data time 0.0007 (0.0025) model time 0.4494 (0.4491) loss 2.0489 (2.4131) grad_norm 11.3254 (3.0481) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][290/625] eta 0:02:31 lr 0.000051 wd 0.0500 time 0.4484 (0.4532) data time 0.0008 (0.0025) model time 0.4476 (0.4490) loss 2.8921 (2.4206) grad_norm 1.9028 (3.0461) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][300/625] eta 0:02:27 lr 0.000051 wd 0.0500 time 0.4466 (0.4530) data time 0.0009 (0.0024) model time 0.4457 (0.4490) loss 2.9656 (2.4253) grad_norm 1.5441 (3.0298) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][310/625] eta 0:02:22 lr 0.000051 wd 0.0500 time 0.4495 (0.4529) data time 0.0009 (0.0024) model time 0.4487 (0.4490) loss 2.7592 (2.4250) grad_norm 2.8389 (3.0742) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][320/625] eta 0:02:18 lr 0.000051 wd 0.0500 time 0.4475 (0.4528) data time 0.0006 (0.0023) model time 0.4469 (0.4489) loss 2.0482 (2.4273) grad_norm 2.7678 (3.1388) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][330/625] eta 0:02:13 lr 0.000051 wd 0.0500 time 0.4456 
(0.4526) data time 0.0008 (0.0023) model time 0.4448 (0.4489) loss 2.5511 (2.4248) grad_norm 2.4313 (3.1314) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 09:59:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][340/625] eta 0:02:08 lr 0.000051 wd 0.0500 time 0.4466 (0.4525) data time 0.0006 (0.0022) model time 0.4460 (0.4488) loss 1.7354 (2.4240) grad_norm 2.2927 (3.1267) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][350/625] eta 0:02:04 lr 0.000051 wd 0.0500 time 0.4490 (0.4524) data time 0.0007 (0.0022) model time 0.4483 (0.4488) loss 1.7929 (2.4209) grad_norm 9.4323 (3.1335) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][360/625] eta 0:01:59 lr 0.000051 wd 0.0500 time 0.4498 (0.4523) data time 0.0007 (0.0021) model time 0.4491 (0.4488) loss 2.3297 (2.4225) grad_norm 2.4058 (3.1636) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][370/625] eta 0:01:55 lr 0.000051 wd 0.0500 time 0.4467 (0.4522) data time 0.0007 (0.0021) model time 0.4460 (0.4488) loss 2.0383 (2.4234) grad_norm 2.4087 (3.1475) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][380/625] eta 0:01:50 lr 0.000051 wd 0.0500 time 0.4446 (0.4521) data time 0.0009 (0.0021) model time 0.4437 (0.4487) loss 2.7420 (2.4293) grad_norm 2.4346 (3.1708) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][390/625] eta 0:01:46 lr 0.000051 wd 0.0500 time 0.4473 (0.4520) data time 0.0008 (0.0020) model time 0.4465 (0.4486) loss 2.2722 (2.4318) grad_norm 16.7328 (3.1898) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[267/300][400/625] eta 0:01:41 lr 0.000051 wd 0.0500 time 0.4492 (0.4518) data time 0.0008 (0.0020) model time 0.4483 (0.4485) loss 2.7615 (2.4368) grad_norm 2.5541 (3.1844) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][410/625] eta 0:01:37 lr 0.000051 wd 0.0500 time 0.4467 (0.4517) data time 0.0007 (0.0020) model time 0.4459 (0.4485) loss 2.9548 (2.4410) grad_norm 2.6939 (3.1743) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][420/625] eta 0:01:32 lr 0.000051 wd 0.0500 time 0.4445 (0.4516) data time 0.0007 (0.0020) model time 0.4438 (0.4484) loss 2.7273 (2.4442) grad_norm 2.2977 (3.1976) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][430/625] eta 0:01:28 lr 0.000051 wd 0.0500 time 0.4469 (0.4516) data time 0.0009 (0.0019) model time 0.4460 (0.4484) loss 2.4409 (2.4440) grad_norm 2.6769 (3.1949) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][440/625] eta 0:01:23 lr 0.000051 wd 0.0500 time 0.4465 (0.4516) data time 0.0008 (0.0019) model time 0.4458 (0.4485) loss 2.6500 (2.4466) grad_norm 2.8701 (3.2020) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][450/625] eta 0:01:19 lr 0.000051 wd 0.0500 time 0.4495 (0.4516) data time 0.0008 (0.0019) model time 0.4487 (0.4485) loss 2.0210 (2.4465) grad_norm 2.2982 (3.1955) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][460/625] eta 0:01:14 lr 0.000050 wd 0.0500 time 0.4457 (0.4515) data time 0.0010 (0.0019) model time 0.4447 (0.4485) loss 2.2984 (2.4489) grad_norm 2.0406 (3.1895) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:00:55 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][470/625] eta 0:01:09 lr 0.000050 wd 0.0500 time 0.4447 (0.4514) data time 0.0009 (0.0018) model time 0.4437 (0.4484) loss 1.9000 (2.4478) grad_norm 2.1892 (3.1758) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][480/625] eta 0:01:05 lr 0.000050 wd 0.0500 time 0.4451 (0.4513) data time 0.0008 (0.0018) model time 0.4443 (0.4484) loss 2.6735 (2.4493) grad_norm 2.1235 (3.1702) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][490/625] eta 0:01:00 lr 0.000050 wd 0.0500 time 0.4454 (0.4516) data time 0.0008 (0.0018) model time 0.4446 (0.4488) loss 2.9732 (2.4514) grad_norm 2.2233 (3.2258) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][500/625] eta 0:00:56 lr 0.000050 wd 0.0500 time 0.4498 (0.4515) data time 0.0009 (0.0018) model time 0.4489 (0.4487) loss 2.5221 (2.4533) grad_norm 2.4377 (3.2177) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][510/625] eta 0:00:51 lr 0.000050 wd 0.0500 time 0.4464 (0.4515) data time 0.0007 (0.0018) model time 0.4457 (0.4487) loss 2.1604 (2.4534) grad_norm 3.9706 (3.2137) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][520/625] eta 0:00:47 lr 0.000050 wd 0.0500 time 0.4481 (0.4514) data time 0.0006 (0.0017) model time 0.4475 (0.4487) loss 1.8460 (2.4553) grad_norm 2.2468 (3.2076) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][530/625] eta 0:00:42 lr 0.000050 wd 0.0500 time 0.4477 (0.4514) data time 0.0006 (0.0017) model time 0.4471 (0.4487) loss 2.0702 (2.4540) grad_norm 3.0634 (3.2094) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][540/625] eta 0:00:38 lr 0.000050 wd 0.0500 time 0.4451 (0.4513) data time 0.0008 (0.0017) model time 0.4442 (0.4486) loss 2.6314 (2.4568) grad_norm 3.1665 (3.1897) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][550/625] eta 0:00:33 lr 0.000050 wd 0.0500 time 0.4512 (0.4512) data time 0.0007 (0.0017) model time 0.4505 (0.4485) loss 2.8479 (2.4559) grad_norm 3.2417 (3.1786) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][560/625] eta 0:00:29 lr 0.000050 wd 0.0500 time 0.4448 (0.4513) data time 0.0006 (0.0017) model time 0.4442 (0.4488) loss 2.5768 (2.4561) grad_norm 3.3090 (3.1740) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][570/625] eta 0:00:24 lr 0.000050 wd 0.0500 time 0.4603 (0.4513) data time 0.0009 (0.0017) model time 0.4595 (0.4488) loss 2.5315 (2.4566) grad_norm 2.6449 (3.1658) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][580/625] eta 0:00:20 lr 0.000050 wd 0.0500 time 0.4474 (0.4513) data time 0.0007 (0.0016) model time 0.4467 (0.4487) loss 2.8141 (2.4580) grad_norm 2.1824 (3.1628) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][590/625] eta 0:00:15 lr 0.000050 wd 0.0500 time 0.4481 (0.4512) data time 0.0006 (0.0016) model time 0.4475 (0.4487) loss 2.1263 (2.4567) grad_norm 2.3500 (3.1718) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][600/625] eta 0:00:11 lr 0.000050 wd 0.0500 time 0.4513 (0.4512) data time 0.0006 (0.0016) model time 
0.4507 (0.4487) loss 2.2607 (2.4536) grad_norm 2.2363 (3.1786) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:01:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][610/625] eta 0:00:06 lr 0.000050 wd 0.0500 time 0.4426 (0.4511) data time 0.0006 (0.0016) model time 0.4419 (0.4487) loss 2.6436 (2.4533) grad_norm 2.8739 (3.1694) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [267/300][620/625] eta 0:00:02 lr 0.000050 wd 0.0500 time 0.4443 (0.4510) data time 0.0006 (0.0016) model time 0.4437 (0.4486) loss 2.5950 (2.4535) grad_norm 5.0352 (3.1684) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 267 training takes 0:04:41 [2024-08-11 10:02:04 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:02:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 10:02:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.478 (0.478) Loss 0.5405 (0.5405) Acc@1 88.428 (88.428) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8584 (0.6347) Acc@1 80.664 (86.719) Acc@5 96.240 (97.714) Mem 16699MB [2024-08-11 10:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9214 (0.7539) Acc@1 79.443 (83.954) Acc@5 95.410 (96.622) Mem 16699MB [2024-08-11 10:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.723 Acc@5 96.563 [2024-08-11 10:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 10:02:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.917 (0.917) Loss 0.5132 (0.5132) Acc@1 89.502 (89.502) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:02:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.191) Loss 0.8242 (0.6137) Acc@1 81.152 (87.189) Acc@5 96.289 (97.798) Mem 16699MB [2024-08-11 10:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.155) Loss 0.8984 (0.7299) Acc@1 79.736 (84.398) Acc@5 95.654 (96.780) Mem 16699MB [2024-08-11 10:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.119 Acc@5 96.733 [2024-08-11 10:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 10:02:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][0/625] eta 0:13:03 lr 0.000050 wd 0.0500 time 1.2528 (1.2528) data time 0.6854 (0.6854) model time 0.0000 (0.0000) loss 2.4717 (2.4717) grad_norm 3.9040 (3.9040) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][10/625] eta 0:05:20 lr 0.000050 wd 0.0500 time 0.4524 (0.5215) data time 
0.0006 (0.0631) model time 0.0000 (0.0000) loss 1.4858 (2.4205) grad_norm 5.1982 (2.9753) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][20/625] eta 0:04:54 lr 0.000050 wd 0.0500 time 0.4469 (0.4868) data time 0.0006 (0.0335) model time 0.0000 (0.0000) loss 2.4266 (2.5060) grad_norm 27.3221 (4.3079) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][30/625] eta 0:04:42 lr 0.000050 wd 0.0500 time 0.4549 (0.4752) data time 0.0007 (0.0230) model time 0.0000 (0.0000) loss 2.7107 (2.5451) grad_norm 3.3479 (4.0155) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][40/625] eta 0:04:34 lr 0.000050 wd 0.0500 time 0.4462 (0.4693) data time 0.0006 (0.0176) model time 0.0000 (0.0000) loss 2.5621 (2.5743) grad_norm 2.1020 (3.6433) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][50/625] eta 0:04:27 lr 0.000050 wd 0.0500 time 0.4493 (0.4653) data time 0.0006 (0.0143) model time 0.0000 (0.0000) loss 2.4058 (2.5152) grad_norm 3.0881 (3.4553) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][60/625] eta 0:04:21 lr 0.000050 wd 0.0500 time 0.4510 (0.4623) data time 0.0009 (0.0121) model time 0.4502 (0.4463) loss 2.5377 (2.4967) grad_norm 2.6655 (3.4734) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][70/625] eta 0:04:15 lr 0.000050 wd 0.0500 time 0.4461 (0.4603) data time 0.0007 (0.0105) model time 0.4453 (0.4467) loss 2.9042 (2.4913) grad_norm 64.1649 (4.2087) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][80/625] eta 0:04:09 
lr 0.000050 wd 0.0500 time 0.4464 (0.4586) data time 0.0007 (0.0093) model time 0.4458 (0.4464) loss 2.7452 (2.4815) grad_norm 3.6003 (4.6127) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][90/625] eta 0:04:06 lr 0.000050 wd 0.0500 time 0.4480 (0.4599) data time 0.0006 (0.0084) model time 0.4473 (0.4522) loss 1.5504 (2.4712) grad_norm 3.3457 (4.4052) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:02:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][100/625] eta 0:04:00 lr 0.000050 wd 0.0500 time 0.4531 (0.4590) data time 0.0008 (0.0076) model time 0.4523 (0.4517) loss 2.1968 (2.4586) grad_norm 21.2773 (4.3977) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][110/625] eta 0:03:56 lr 0.000049 wd 0.0500 time 0.4610 (0.4583) data time 0.0007 (0.0070) model time 0.4603 (0.4515) loss 1.8148 (2.4499) grad_norm 2.4143 (4.2400) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][120/625] eta 0:03:51 lr 0.000049 wd 0.0500 time 0.4486 (0.4579) data time 0.0008 (0.0065) model time 0.4479 (0.4517) loss 1.6210 (2.4404) grad_norm 2.5770 (4.1207) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][130/625] eta 0:03:46 lr 0.000049 wd 0.0500 time 0.4471 (0.4570) data time 0.0008 (0.0061) model time 0.4464 (0.4509) loss 2.4210 (2.4314) grad_norm 3.9890 (4.0645) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][140/625] eta 0:03:41 lr 0.000049 wd 0.0500 time 0.4430 (0.4563) data time 0.0009 (0.0057) model time 0.4421 (0.4503) loss 2.2929 (2.4297) grad_norm 3.8606 (3.9917) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:22 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [268/300][150/625] eta 0:03:36 lr 0.000049 wd 0.0500 time 0.4462 (0.4557) data time 0.0009 (0.0054) model time 0.4453 (0.4500) loss 2.6122 (2.4514) grad_norm 2.2128 (3.9076) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][160/625] eta 0:03:31 lr 0.000049 wd 0.0500 time 0.4485 (0.4552) data time 0.0008 (0.0051) model time 0.4477 (0.4497) loss 2.6396 (2.4601) grad_norm 2.3972 (4.0983) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][170/625] eta 0:03:26 lr 0.000049 wd 0.0500 time 0.4507 (0.4548) data time 0.0006 (0.0048) model time 0.4501 (0.4496) loss 2.5267 (2.4605) grad_norm 4.0141 (4.0444) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][180/625] eta 0:03:22 lr 0.000049 wd 0.0500 time 0.4503 (0.4545) data time 0.0007 (0.0046) model time 0.4497 (0.4495) loss 2.7986 (2.4585) grad_norm 1.9180 (3.9843) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][190/625] eta 0:03:17 lr 0.000049 wd 0.0500 time 0.4463 (0.4542) data time 0.0009 (0.0044) model time 0.4454 (0.4493) loss 2.7376 (2.4598) grad_norm 2.3396 (3.9104) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][200/625] eta 0:03:12 lr 0.000049 wd 0.0500 time 0.4475 (0.4538) data time 0.0008 (0.0042) model time 0.4467 (0.4491) loss 2.6757 (2.4717) grad_norm 4.6929 (4.3923) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][210/625] eta 0:03:08 lr 0.000049 wd 0.0500 time 0.4420 (0.4535) data time 0.0006 (0.0041) model time 0.4415 (0.4490) loss 2.8371 (2.4674) grad_norm 3.1867 (4.3067) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 10:03:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][220/625] eta 0:03:03 lr 0.000049 wd 0.0500 time 0.4486 (0.4533) data time 0.0006 (0.0039) model time 0.4480 (0.4489) loss 2.5324 (2.4725) grad_norm 2.0408 (4.2732) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:03:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][230/625] eta 0:02:58 lr 0.000049 wd 0.0500 time 0.4486 (0.4531) data time 0.0007 (0.0038) model time 0.4480 (0.4488) loss 2.2506 (2.4676) grad_norm 2.1963 (4.2066) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][240/625] eta 0:02:54 lr 0.000049 wd 0.0500 time 0.4521 (0.4530) data time 0.0008 (0.0037) model time 0.4513 (0.4489) loss 2.5079 (2.4692) grad_norm 2.5099 (4.1810) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][250/625] eta 0:02:49 lr 0.000049 wd 0.0500 time 0.4458 (0.4529) data time 0.0006 (0.0036) model time 0.4452 (0.4489) loss 2.6679 (2.4641) grad_norm 1.9115 (4.1276) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][260/625] eta 0:02:45 lr 0.000049 wd 0.0500 time 0.4535 (0.4528) data time 0.0008 (0.0035) model time 0.4527 (0.4490) loss 2.3551 (2.4609) grad_norm 2.3048 (4.0677) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][270/625] eta 0:02:40 lr 0.000049 wd 0.0500 time 0.4470 (0.4532) data time 0.0007 (0.0034) model time 0.4463 (0.4496) loss 2.3431 (2.4639) grad_norm 2.1579 (4.0034) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][280/625] eta 0:02:36 lr 0.000049 wd 0.0500 time 0.4441 (0.4535) data time 0.0008 (0.0033) model time 0.4433 (0.4501) loss 
2.5545 (2.4720) grad_norm 3.6758 (3.9537) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][290/625] eta 0:02:31 lr 0.000049 wd 0.0500 time 0.4460 (0.4533) data time 0.0006 (0.0032) model time 0.4454 (0.4499) loss 2.5793 (2.4719) grad_norm 3.0107 (3.9328) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][300/625] eta 0:02:27 lr 0.000049 wd 0.0500 time 0.4501 (0.4531) data time 0.0009 (0.0031) model time 0.4493 (0.4498) loss 2.6505 (2.4747) grad_norm 4.0340 (3.8889) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][310/625] eta 0:02:22 lr 0.000049 wd 0.0500 time 0.4473 (0.4530) data time 0.0009 (0.0030) model time 0.4464 (0.4497) loss 2.5322 (2.4694) grad_norm 2.4775 (3.8714) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][320/625] eta 0:02:18 lr 0.000049 wd 0.0500 time 0.4453 (0.4528) data time 0.0008 (0.0030) model time 0.4445 (0.4496) loss 2.6013 (2.4769) grad_norm 1.9866 (3.8500) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][330/625] eta 0:02:13 lr 0.000049 wd 0.0500 time 0.4477 (0.4527) data time 0.0008 (0.0029) model time 0.4469 (0.4495) loss 2.4761 (2.4696) grad_norm 1.8357 (3.8115) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][340/625] eta 0:02:08 lr 0.000049 wd 0.0500 time 0.4453 (0.4525) data time 0.0006 (0.0028) model time 0.4446 (0.4494) loss 2.8887 (2.4696) grad_norm 2.3325 (3.7776) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][350/625] eta 0:02:04 lr 0.000049 wd 0.0500 time 0.4468 (0.4523) 
data time 0.0008 (0.0028) model time 0.4460 (0.4493) loss 2.4884 (2.4747) grad_norm 2.6474 (3.7385) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:04:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][360/625] eta 0:01:59 lr 0.000049 wd 0.0500 time 0.4453 (0.4521) data time 0.0008 (0.0027) model time 0.4444 (0.4491) loss 2.2665 (2.4815) grad_norm 1.5312 (3.7040) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][370/625] eta 0:01:55 lr 0.000049 wd 0.0500 time 0.4439 (0.4520) data time 0.0008 (0.0027) model time 0.4431 (0.4491) loss 2.1375 (2.4804) grad_norm 2.0362 (3.6740) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][380/625] eta 0:01:50 lr 0.000048 wd 0.0500 time 0.4451 (0.4519) data time 0.0009 (0.0026) model time 0.4442 (0.4490) loss 2.7730 (2.4845) grad_norm 2.5934 (3.6544) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][390/625] eta 0:01:46 lr 0.000048 wd 0.0500 time 0.4477 (0.4518) data time 0.0007 (0.0026) model time 0.4470 (0.4489) loss 1.9475 (2.4796) grad_norm 2.2210 (3.6222) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][400/625] eta 0:01:41 lr 0.000048 wd 0.0500 time 0.4542 (0.4517) data time 0.0007 (0.0025) model time 0.4535 (0.4488) loss 2.2310 (2.4842) grad_norm 1.8424 (3.5866) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][410/625] eta 0:01:37 lr 0.000048 wd 0.0500 time 0.4480 (0.4515) data time 0.0007 (0.0025) model time 0.4473 (0.4488) loss 2.1200 (2.4788) grad_norm 2.6959 (3.5766) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[268/300][420/625] eta 0:01:32 lr 0.000048 wd 0.0500 time 0.4448 (0.4514) data time 0.0007 (0.0025) model time 0.4441 (0.4486) loss 1.8914 (2.4822) grad_norm 3.3607 (3.5566) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][430/625] eta 0:01:27 lr 0.000048 wd 0.0500 time 0.4451 (0.4513) data time 0.0007 (0.0024) model time 0.4444 (0.4486) loss 2.7744 (2.4783) grad_norm 2.7267 (3.6066) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][440/625] eta 0:01:23 lr 0.000048 wd 0.0500 time 0.4501 (0.4512) data time 0.0006 (0.0024) model time 0.4495 (0.4485) loss 1.6252 (2.4795) grad_norm 3.0497 (3.5901) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][450/625] eta 0:01:18 lr 0.000048 wd 0.0500 time 0.4451 (0.4511) data time 0.0007 (0.0023) model time 0.4445 (0.4484) loss 2.5629 (2.4821) grad_norm 13.4665 (3.5993) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][460/625] eta 0:01:14 lr 0.000048 wd 0.0500 time 0.4498 (0.4510) data time 0.0008 (0.0023) model time 0.4490 (0.4484) loss 2.7020 (2.4801) grad_norm 1.9001 (3.5721) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][470/625] eta 0:01:09 lr 0.000048 wd 0.0500 time 0.4478 (0.4510) data time 0.0007 (0.0023) model time 0.4472 (0.4484) loss 2.0821 (2.4782) grad_norm 2.5647 (3.5812) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][480/625] eta 0:01:05 lr 0.000048 wd 0.0500 time 0.4467 (0.4509) data time 0.0006 (0.0023) model time 0.4461 (0.4484) loss 2.6142 (2.4779) grad_norm 2.9620 (3.5612) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 
10:05:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][490/625] eta 0:01:00 lr 0.000048 wd 0.0500 time 0.4470 (0.4509) data time 0.0007 (0.0022) model time 0.4463 (0.4484) loss 2.2422 (2.4793) grad_norm 2.7081 (3.5379) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:05:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][500/625] eta 0:00:56 lr 0.000048 wd 0.0500 time 0.4441 (0.4512) data time 0.0007 (0.0022) model time 0.4435 (0.4488) loss 1.7423 (2.4754) grad_norm 2.0981 (3.5178) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][510/625] eta 0:00:51 lr 0.000048 wd 0.0500 time 0.4472 (0.4511) data time 0.0008 (0.0022) model time 0.4464 (0.4488) loss 2.0564 (2.4715) grad_norm 3.1101 (3.5065) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][520/625] eta 0:00:47 lr 0.000048 wd 0.0500 time 0.4467 (0.4511) data time 0.0008 (0.0021) model time 0.4459 (0.4487) loss 2.5711 (2.4736) grad_norm 2.4274 (3.4816) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][530/625] eta 0:00:42 lr 0.000048 wd 0.0500 time 0.4477 (0.4510) data time 0.0008 (0.0021) model time 0.4470 (0.4487) loss 2.6462 (2.4717) grad_norm 2.3027 (3.4586) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][540/625] eta 0:00:38 lr 0.000048 wd 0.0500 time 0.4480 (0.4510) data time 0.0008 (0.0021) model time 0.4472 (0.4486) loss 2.4859 (2.4751) grad_norm 2.3022 (3.4588) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][550/625] eta 0:00:33 lr 0.000048 wd 0.0500 time 0.4475 (0.4509) data time 0.0007 (0.0021) model time 0.4469 (0.4486) loss 2.5898 (2.4765) grad_norm 2.4936 
(3.4572) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][560/625] eta 0:00:29 lr 0.000048 wd 0.0500 time 0.4467 (0.4508) data time 0.0006 (0.0020) model time 0.4461 (0.4486) loss 2.0563 (2.4713) grad_norm 2.6479 (3.4387) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][570/625] eta 0:00:24 lr 0.000048 wd 0.0500 time 0.4466 (0.4508) data time 0.0006 (0.0020) model time 0.4460 (0.4485) loss 2.6281 (2.4707) grad_norm 2.8107 (3.4276) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][580/625] eta 0:00:20 lr 0.000048 wd 0.0500 time 0.4430 (0.4507) data time 0.0007 (0.0020) model time 0.4422 (0.4485) loss 2.9522 (2.4747) grad_norm 3.2232 (3.4127) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][590/625] eta 0:00:15 lr 0.000048 wd 0.0500 time 0.4520 (0.4507) data time 0.0008 (0.0020) model time 0.4512 (0.4485) loss 2.7120 (2.4778) grad_norm 2.0202 (3.4144) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][600/625] eta 0:00:11 lr 0.000048 wd 0.0500 time 0.4530 (0.4509) data time 0.0006 (0.0020) model time 0.4524 (0.4487) loss 1.8234 (2.4767) grad_norm 2.7247 (3.4085) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][610/625] eta 0:00:06 lr 0.000048 wd 0.0500 time 0.4436 (0.4508) data time 0.0006 (0.0020) model time 0.4430 (0.4487) loss 2.6154 (2.4787) grad_norm 3.8686 (3.3975) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [268/300][620/625] eta 0:00:02 lr 0.000048 wd 0.0500 time 0.4426 (0.4507) data time 0.0006 (0.0019) model 
time 0.4420 (0.4486) loss 2.0519 (2.4780) grad_norm 3.3724 (3.3968) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:06:55 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 268 training takes 0:04:41 [2024-08-11 10:06:55 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:06:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 10:06:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.483 (0.483) Loss 0.5244 (0.5244) Acc@1 88.818 (88.818) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8433 (0.6224) Acc@1 80.811 (86.963) Acc@5 96.191 (97.745) Mem 16699MB [2024-08-11 10:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9258 (0.7452) Acc@1 78.906 (84.077) Acc@5 95.361 (96.677) Mem 16699MB [2024-08-11 10:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.785 Acc@5 96.623 [2024-08-11 10:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:07:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.810 (0.810) Loss 0.5146 (0.5146) Acc@1 89.502 (89.502) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:07:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8262 (0.6143) Acc@1 81.250 (87.167) Acc@5 96.387 (97.803) Mem 16699MB [2024-08-11 10:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.9009 (0.7308) Acc@1 79.736 (84.368) Acc@5 95.605 (96.782) Mem 16699MB [2024-08-11 10:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.095 Acc@5 96.733 [2024-08-11 10:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 10:07:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][0/625] eta 0:12:35 lr 0.000048 wd 0.0500 time 1.2084 (1.2084) data time 0.6812 (0.6812) model time 0.0000 (0.0000) loss 1.7736 (1.7736) grad_norm 2.0855 (2.0855) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][10/625] eta 0:05:17 lr 0.000048 wd 0.0500 time 0.4468 (0.5161) data time 0.0010 (0.0627) model time 0.0000 (0.0000) loss 2.7273 (2.4965) grad_norm 2.2850 (2.7252) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][20/625] eta 0:04:52 lr 0.000047 wd 0.0500 time 0.4430 (0.4831) data time 0.0007 (0.0333) model time 0.0000 (0.0000) loss 2.7124 (2.5188) grad_norm 3.0537 (2.8216) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][30/625] eta 0:04:43 lr 0.000047 wd 0.0500 time 0.4473 (0.4766) data time 0.0006 (0.0228) model time 0.0000 (0.0000) loss 2.8214 (2.5164) grad_norm 1.9459 (2.8252) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][40/625] eta 0:04:34 lr 0.000047 wd 0.0500 time 0.4499 (0.4695) data time 0.0008 (0.0175) model time 0.0000 (0.0000) loss 2.3847 (2.4900) grad_norm 1.8861 (2.7436) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][50/625] eta 0:04:27 lr 0.000047 wd 0.0500 time 0.4517 (0.4654) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 2.7124 (2.4856) grad_norm 4.1897 (4.2783) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][60/625] eta 0:04:21 lr 0.000047 wd 0.0500 time 0.4476 (0.4625) data time 0.0007 (0.0120) model 
time 0.4470 (0.4467) loss 2.6283 (2.4703) grad_norm 2.5141 (4.0804) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][70/625] eta 0:04:15 lr 0.000047 wd 0.0500 time 0.4504 (0.4603) data time 0.0007 (0.0104) model time 0.4496 (0.4467) loss 2.7246 (2.4798) grad_norm 1.8605 (4.0012) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][80/625] eta 0:04:10 lr 0.000047 wd 0.0500 time 0.4507 (0.4588) data time 0.0006 (0.0092) model time 0.4501 (0.4469) loss 2.1582 (2.4467) grad_norm 3.2481 (3.8452) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][90/625] eta 0:04:04 lr 0.000047 wd 0.0500 time 0.4478 (0.4576) data time 0.0008 (0.0083) model time 0.4470 (0.4469) loss 1.7130 (2.4438) grad_norm 3.0263 (3.7007) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][100/625] eta 0:03:59 lr 0.000047 wd 0.0500 time 0.4453 (0.4568) data time 0.0006 (0.0076) model time 0.4447 (0.4473) loss 2.5413 (2.4118) grad_norm 1.7214 (3.6300) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][110/625] eta 0:03:54 lr 0.000047 wd 0.0500 time 0.4508 (0.4562) data time 0.0006 (0.0069) model time 0.4501 (0.4476) loss 1.7715 (2.4159) grad_norm 2.5720 (3.5520) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:07:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][120/625] eta 0:03:50 lr 0.000047 wd 0.0500 time 0.4498 (0.4556) data time 0.0006 (0.0064) model time 0.4491 (0.4477) loss 2.4710 (2.4139) grad_norm 3.1252 (3.6775) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][130/625] eta 0:03:45 lr 0.000047 wd 
0.0500 time 0.4496 (0.4551) data time 0.0008 (0.0060) model time 0.4488 (0.4478) loss 2.5242 (2.4356) grad_norm 3.1021 (3.6282) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][140/625] eta 0:03:40 lr 0.000047 wd 0.0500 time 0.4496 (0.4550) data time 0.0008 (0.0057) model time 0.4488 (0.4484) loss 2.6131 (2.4494) grad_norm 2.2242 (3.5948) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][150/625] eta 0:03:35 lr 0.000047 wd 0.0500 time 0.4469 (0.4545) data time 0.0009 (0.0053) model time 0.4460 (0.4482) loss 2.5209 (2.4464) grad_norm 7.8686 (3.5675) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][160/625] eta 0:03:31 lr 0.000047 wd 0.0500 time 0.4485 (0.4541) data time 0.0006 (0.0051) model time 0.4479 (0.4480) loss 2.6114 (2.4474) grad_norm 2.2134 (3.7668) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][170/625] eta 0:03:26 lr 0.000047 wd 0.0500 time 0.4440 (0.4536) data time 0.0008 (0.0048) model time 0.4432 (0.4478) loss 2.6651 (2.4433) grad_norm 2.3399 (3.7707) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][180/625] eta 0:03:22 lr 0.000047 wd 0.0500 time 0.4481 (0.4544) data time 0.0006 (0.0046) model time 0.4475 (0.4493) loss 2.7396 (2.4367) grad_norm 3.2648 (3.7207) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][190/625] eta 0:03:17 lr 0.000047 wd 0.0500 time 0.4460 (0.4540) data time 0.0007 (0.0044) model time 0.4453 (0.4491) loss 2.6472 (2.4412) grad_norm 3.0770 (3.7619) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [269/300][200/625] eta 0:03:12 lr 0.000047 wd 0.0500 time 0.4472 (0.4538) data time 0.0008 (0.0042) model time 0.4464 (0.4490) loss 2.7975 (2.4484) grad_norm 2.8607 (3.8321) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][210/625] eta 0:03:08 lr 0.000047 wd 0.0500 time 0.4460 (0.4535) data time 0.0009 (0.0040) model time 0.4451 (0.4489) loss 1.7461 (2.4342) grad_norm 3.6611 (3.8016) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][220/625] eta 0:03:03 lr 0.000047 wd 0.0500 time 0.4440 (0.4532) data time 0.0007 (0.0039) model time 0.4433 (0.4488) loss 1.5565 (2.4369) grad_norm 2.2902 (3.7530) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][230/625] eta 0:02:58 lr 0.000047 wd 0.0500 time 0.4484 (0.4529) data time 0.0009 (0.0038) model time 0.4475 (0.4486) loss 2.5050 (2.4417) grad_norm 2.4971 (3.7115) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][240/625] eta 0:02:54 lr 0.000047 wd 0.0500 time 0.4520 (0.4527) data time 0.0006 (0.0036) model time 0.4514 (0.4485) loss 2.7031 (2.4372) grad_norm 3.8729 (3.7214) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][250/625] eta 0:02:49 lr 0.000047 wd 0.0500 time 0.4474 (0.4525) data time 0.0006 (0.0035) model time 0.4467 (0.4484) loss 1.9401 (2.4361) grad_norm 2.6476 (3.6891) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][260/625] eta 0:02:45 lr 0.000047 wd 0.0500 time 0.4507 (0.4523) data time 0.0008 (0.0034) model time 0.4499 (0.4483) loss 2.9783 (2.4400) grad_norm 2.2374 (3.6445) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 10:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][270/625] eta 0:02:40 lr 0.000047 wd 0.0500 time 0.4499 (0.4522) data time 0.0007 (0.0033) model time 0.4492 (0.4483) loss 2.4238 (2.4412) grad_norm 16.4418 (3.6781) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][280/625] eta 0:02:35 lr 0.000047 wd 0.0500 time 0.4482 (0.4521) data time 0.0006 (0.0032) model time 0.4476 (0.4484) loss 2.7922 (2.4389) grad_norm 3.3758 (3.6463) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][290/625] eta 0:02:31 lr 0.000047 wd 0.0500 time 0.4467 (0.4520) data time 0.0009 (0.0032) model time 0.4458 (0.4484) loss 1.9922 (2.4370) grad_norm 2.7699 (3.6219) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][300/625] eta 0:02:26 lr 0.000046 wd 0.0500 time 0.4414 (0.4519) data time 0.0008 (0.0031) model time 0.4406 (0.4483) loss 2.7062 (2.4340) grad_norm 3.7640 (3.7641) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][310/625] eta 0:02:22 lr 0.000046 wd 0.0500 time 0.4490 (0.4518) data time 0.0007 (0.0030) model time 0.4483 (0.4483) loss 2.5953 (2.4407) grad_norm 2.5575 (3.7551) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][320/625] eta 0:02:17 lr 0.000046 wd 0.0500 time 0.4477 (0.4516) data time 0.0008 (0.0029) model time 0.4469 (0.4482) loss 2.6497 (2.4382) grad_norm 4.3994 (3.7426) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][330/625] eta 0:02:13 lr 0.000046 wd 0.0500 time 0.4502 (0.4516) data time 0.0010 (0.0029) model time 0.4492 (0.4483) loss 2.2004 (2.4354) 
grad_norm 2.2955 (3.7780) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][340/625] eta 0:02:08 lr 0.000046 wd 0.0500 time 0.4479 (0.4515) data time 0.0008 (0.0028) model time 0.4471 (0.4482) loss 2.5315 (2.4336) grad_norm 3.1067 (3.7808) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][350/625] eta 0:02:04 lr 0.000046 wd 0.0500 time 0.4506 (0.4515) data time 0.0007 (0.0028) model time 0.4499 (0.4483) loss 2.7965 (2.4406) grad_norm 4.8721 (3.7577) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][360/625] eta 0:01:59 lr 0.000046 wd 0.0500 time 0.4473 (0.4518) data time 0.0009 (0.0027) model time 0.4464 (0.4487) loss 2.7458 (2.4424) grad_norm 2.2468 (3.7414) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][370/625] eta 0:01:55 lr 0.000046 wd 0.0500 time 0.4452 (0.4517) data time 0.0008 (0.0027) model time 0.4443 (0.4487) loss 2.0731 (2.4436) grad_norm 1.9883 (3.7065) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][380/625] eta 0:01:50 lr 0.000046 wd 0.0500 time 0.4492 (0.4516) data time 0.0009 (0.0026) model time 0.4483 (0.4486) loss 2.6933 (2.4428) grad_norm 2.7741 (3.6815) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][390/625] eta 0:01:46 lr 0.000046 wd 0.0500 time 0.4469 (0.4515) data time 0.0006 (0.0026) model time 0.4463 (0.4486) loss 2.2528 (2.4383) grad_norm 3.9190 (3.6598) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][400/625] eta 0:01:41 lr 0.000046 wd 0.0500 time 0.4488 (0.4515) data time 
0.0007 (0.0025) model time 0.4481 (0.4486) loss 2.9639 (2.4393) grad_norm 3.9114 (3.6330) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][410/625] eta 0:01:37 lr 0.000046 wd 0.0500 time 0.4537 (0.4515) data time 0.0009 (0.0025) model time 0.4528 (0.4487) loss 2.6952 (2.4463) grad_norm 2.6059 (3.6038) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][420/625] eta 0:01:32 lr 0.000046 wd 0.0500 time 0.4506 (0.4514) data time 0.0010 (0.0024) model time 0.4496 (0.4487) loss 2.3898 (2.4438) grad_norm 4.0534 (3.5939) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][430/625] eta 0:01:28 lr 0.000046 wd 0.0500 time 0.4413 (0.4513) data time 0.0008 (0.0024) model time 0.4405 (0.4486) loss 2.8791 (2.4423) grad_norm 2.6226 (3.5979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][440/625] eta 0:01:23 lr 0.000046 wd 0.0500 time 0.4451 (0.4512) data time 0.0009 (0.0024) model time 0.4443 (0.4485) loss 2.4814 (2.4378) grad_norm 3.6837 (3.5739) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][450/625] eta 0:01:18 lr 0.000046 wd 0.0500 time 0.4477 (0.4512) data time 0.0008 (0.0023) model time 0.4469 (0.4485) loss 2.4068 (2.4364) grad_norm 2.8870 (3.6548) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][460/625] eta 0:01:14 lr 0.000046 wd 0.0500 time 0.4523 (0.4511) data time 0.0006 (0.0023) model time 0.4517 (0.4485) loss 2.6017 (2.4370) grad_norm 3.6112 (3.6383) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][470/625] eta 
0:01:09 lr 0.000046 wd 0.0500 time 0.4477 (0.4511) data time 0.0007 (0.0023) model time 0.4470 (0.4485) loss 2.7486 (2.4352) grad_norm 1.9485 (3.6116) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][480/625] eta 0:01:05 lr 0.000046 wd 0.0500 time 0.4505 (0.4510) data time 0.0008 (0.0022) model time 0.4497 (0.4485) loss 2.6090 (2.4347) grad_norm 2.1539 (3.6463) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][490/625] eta 0:01:00 lr 0.000046 wd 0.0500 time 0.4487 (0.4510) data time 0.0006 (0.0022) model time 0.4481 (0.4485) loss 2.5465 (2.4322) grad_norm 2.6335 (3.6283) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][500/625] eta 0:00:56 lr 0.000046 wd 0.0500 time 0.4498 (0.4510) data time 0.0006 (0.0022) model time 0.4492 (0.4485) loss 2.7890 (2.4343) grad_norm 3.6237 (3.6520) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][510/625] eta 0:00:51 lr 0.000046 wd 0.0500 time 0.4458 (0.4516) data time 0.0007 (0.0022) model time 0.4451 (0.4492) loss 1.3017 (2.4331) grad_norm 3.7133 (3.6414) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:10:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][520/625] eta 0:00:47 lr 0.000046 wd 0.0500 time 0.4462 (0.4515) data time 0.0009 (0.0021) model time 0.4453 (0.4492) loss 1.5620 (2.4318) grad_norm 6.4846 (3.6368) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][530/625] eta 0:00:42 lr 0.000046 wd 0.0500 time 0.4494 (0.4514) data time 0.0009 (0.0021) model time 0.4486 (0.4491) loss 2.7547 (2.4374) grad_norm 2.6629 (3.6271) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:07 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [269/300][540/625] eta 0:00:38 lr 0.000046 wd 0.0500 time 0.4689 (0.4514) data time 0.0006 (0.0021) model time 0.4683 (0.4491) loss 3.1210 (2.4394) grad_norm 2.8533 (3.6139) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][550/625] eta 0:00:33 lr 0.000046 wd 0.0500 time 0.4494 (0.4514) data time 0.0009 (0.0021) model time 0.4486 (0.4492) loss 2.7505 (2.4366) grad_norm 2.4375 (3.5961) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][560/625] eta 0:00:29 lr 0.000046 wd 0.0500 time 0.4464 (0.4514) data time 0.0009 (0.0020) model time 0.4455 (0.4492) loss 2.6220 (2.4362) grad_norm 2.1769 (3.5939) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][570/625] eta 0:00:24 lr 0.000046 wd 0.0500 time 0.4519 (0.4514) data time 0.0006 (0.0020) model time 0.4514 (0.4492) loss 2.6345 (2.4374) grad_norm 2.6314 (3.5976) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][580/625] eta 0:00:20 lr 0.000045 wd 0.0500 time 0.4459 (0.4514) data time 0.0009 (0.0020) model time 0.4451 (0.4492) loss 2.0114 (2.4373) grad_norm 2.7653 (3.6182) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][590/625] eta 0:00:15 lr 0.000045 wd 0.0500 time 0.4491 (0.4513) data time 0.0009 (0.0020) model time 0.4483 (0.4492) loss 2.8571 (2.4378) grad_norm 1.9470 (3.6155) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][600/625] eta 0:00:11 lr 0.000045 wd 0.0500 time 0.4453 (0.4513) data time 0.0008 (0.0020) model time 0.4444 (0.4491) loss 2.8648 (2.4388) grad_norm 2.4799 (3.6255) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 10:11:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][610/625] eta 0:00:06 lr 0.000045 wd 0.0500 time 0.4475 (0.4512) data time 0.0004 (0.0020) model time 0.4471 (0.4491) loss 2.0402 (2.4346) grad_norm 2.5938 (3.6060) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [269/300][620/625] eta 0:00:02 lr 0.000045 wd 0.0500 time 0.4459 (0.4511) data time 0.0007 (0.0019) model time 0.4452 (0.4490) loss 2.7660 (2.4368) grad_norm 11.5603 (3.6198) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:11:45 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 269 training takes 0:04:41 [2024-08-11 10:11:45 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:11:47 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 10:11:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.486 (0.486) Loss 0.5347 (0.5347) Acc@1 89.062 (89.062) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 10:11:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.153) Loss 0.8574 (0.6331) Acc@1 80.469 (86.879) Acc@5 96.143 (97.741) Mem 16699MB [2024-08-11 10:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9258 (0.7517) Acc@1 79.443 (84.047) Acc@5 95.166 (96.652) Mem 16699MB [2024-08-11 10:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.801 Acc@5 96.597 [2024-08-11 10:11:50 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:11:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.868 (0.868) Loss 0.5146 (0.5146) Acc@1 89.502 (89.502) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:11:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.188) Loss 0.8257 (0.6146) Acc@1 81.104 (87.114) Acc@5 96.387 (97.798) Mem 16699MB [2024-08-11 10:11:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.153) Loss 0.9033 (0.7314) Acc@1 79.443 (84.317) Acc@5 95.557 (96.777) Mem 16699MB [2024-08-11 10:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.053 Acc@5 96.727 [2024-08-11 10:11:54 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.1% [2024-08-11 10:11:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][0/625] eta 0:12:20 lr 0.000045 wd 0.0500 time 1.1846 (1.1846) data time 0.6209 (0.6209) model time 0.0000 (0.0000) loss 2.2350 (2.2350) grad_norm 3.1742 (3.1742) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:11:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][10/625] eta 0:05:17 lr 0.000045 wd 0.0500 time 0.4501 (0.5166) data 
time 0.0008 (0.0573) model time 0.0000 (0.0000) loss 2.0240 (2.3329) grad_norm 2.0882 (2.4920) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][20/625] eta 0:04:53 lr 0.000045 wd 0.0500 time 0.4460 (0.4847) data time 0.0008 (0.0304) model time 0.0000 (0.0000) loss 2.5477 (2.4356) grad_norm 2.2964 (2.9160) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][30/625] eta 0:04:41 lr 0.000045 wd 0.0500 time 0.4497 (0.4727) data time 0.0007 (0.0208) model time 0.0000 (0.0000) loss 1.9121 (2.4156) grad_norm 2.1113 (2.7854) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][40/625] eta 0:04:33 lr 0.000045 wd 0.0500 time 0.4485 (0.4667) data time 0.0006 (0.0159) model time 0.0000 (0.0000) loss 2.4643 (2.4106) grad_norm 3.0473 (2.8520) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][50/625] eta 0:04:26 lr 0.000045 wd 0.0500 time 0.4523 (0.4631) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 2.9787 (2.4238) grad_norm 2.2153 (2.8870) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][60/625] eta 0:04:20 lr 0.000045 wd 0.0500 time 0.4454 (0.4610) data time 0.0006 (0.0110) model time 0.4448 (0.4492) loss 2.5478 (2.4415) grad_norm 5.0754 (2.9651) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][70/625] eta 0:04:14 lr 0.000045 wd 0.0500 time 0.4493 (0.4594) data time 0.0009 (0.0096) model time 0.4485 (0.4490) loss 2.6707 (2.4418) grad_norm 2.8918 (3.6032) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[270/300][80/625] eta 0:04:09 lr 0.000045 wd 0.0500 time 0.4454 (0.4580) data time 0.0008 (0.0085) model time 0.4447 (0.4484) loss 2.8151 (2.4581) grad_norm 2.2467 (3.7386) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][90/625] eta 0:04:04 lr 0.000045 wd 0.0500 time 0.4424 (0.4569) data time 0.0008 (0.0076) model time 0.4416 (0.4481) loss 2.8042 (2.4770) grad_norm 2.2554 (3.6995) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][100/625] eta 0:03:59 lr 0.000045 wd 0.0500 time 0.4452 (0.4570) data time 0.0006 (0.0070) model time 0.4446 (0.4499) loss 2.4969 (2.4939) grad_norm 2.5531 (3.5952) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][110/625] eta 0:03:54 lr 0.000045 wd 0.0500 time 0.4530 (0.4561) data time 0.0006 (0.0064) model time 0.4524 (0.4494) loss 2.7805 (2.5038) grad_norm 7.5228 (3.5335) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][120/625] eta 0:03:49 lr 0.000045 wd 0.0500 time 0.4467 (0.4554) data time 0.0006 (0.0059) model time 0.4461 (0.4489) loss 3.0123 (2.5146) grad_norm 3.2092 (3.4974) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][130/625] eta 0:03:45 lr 0.000045 wd 0.0500 time 0.6570 (0.4564) data time 0.0006 (0.0055) model time 0.6564 (0.4514) loss 2.6464 (2.5022) grad_norm 2.9021 (3.4516) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:12:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][140/625] eta 0:03:41 lr 0.000045 wd 0.0500 time 0.4482 (0.4559) data time 0.0009 (0.0052) model time 0.4473 (0.4510) loss 2.6028 (2.4944) grad_norm 2.5918 (3.3884) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 10:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][150/625] eta 0:03:36 lr 0.000045 wd 0.0500 time 0.4500 (0.4555) data time 0.0006 (0.0049) model time 0.4494 (0.4508) loss 1.6544 (2.4814) grad_norm 3.4495 (3.3440) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][160/625] eta 0:03:31 lr 0.000045 wd 0.0500 time 0.4452 (0.4551) data time 0.0006 (0.0046) model time 0.4446 (0.4506) loss 3.0281 (2.4819) grad_norm 2.0157 (3.3076) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][170/625] eta 0:03:26 lr 0.000045 wd 0.0500 time 0.4447 (0.4547) data time 0.0009 (0.0044) model time 0.4438 (0.4503) loss 1.8057 (2.4848) grad_norm 2.7037 (3.2542) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][180/625] eta 0:03:22 lr 0.000045 wd 0.0500 time 0.4482 (0.4542) data time 0.0008 (0.0042) model time 0.4475 (0.4500) loss 2.1262 (2.4852) grad_norm 3.4097 (3.2154) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][190/625] eta 0:03:17 lr 0.000045 wd 0.0500 time 0.4490 (0.4539) data time 0.0008 (0.0041) model time 0.4481 (0.4498) loss 2.6300 (2.4743) grad_norm 2.6695 (3.2115) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][200/625] eta 0:03:12 lr 0.000045 wd 0.0500 time 0.4473 (0.4536) data time 0.0006 (0.0039) model time 0.4467 (0.4496) loss 2.1531 (2.4684) grad_norm 3.8644 (3.1869) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][210/625] eta 0:03:08 lr 0.000045 wd 0.0500 time 0.4491 (0.4534) data time 0.0007 (0.0037) model time 0.4483 (0.4495) loss 2.7901 
(2.4704) grad_norm 5.2834 (3.1676) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][220/625] eta 0:03:03 lr 0.000045 wd 0.0500 time 0.4475 (0.4532) data time 0.0008 (0.0036) model time 0.4467 (0.4494) loss 2.0593 (2.4701) grad_norm 3.1225 (3.1426) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][230/625] eta 0:02:58 lr 0.000045 wd 0.0500 time 0.4503 (0.4530) data time 0.0009 (0.0035) model time 0.4494 (0.4493) loss 2.0014 (2.4641) grad_norm 3.1688 (3.1347) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][240/625] eta 0:02:54 lr 0.000044 wd 0.0500 time 0.4442 (0.4528) data time 0.0006 (0.0034) model time 0.4436 (0.4492) loss 1.5329 (2.4577) grad_norm 2.1485 (3.1142) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][250/625] eta 0:02:49 lr 0.000044 wd 0.0500 time 0.4489 (0.4526) data time 0.0006 (0.0033) model time 0.4483 (0.4491) loss 2.3076 (2.4634) grad_norm 1.6466 (3.0868) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][260/625] eta 0:02:45 lr 0.000044 wd 0.0500 time 0.4445 (0.4524) data time 0.0008 (0.0032) model time 0.4436 (0.4490) loss 2.3203 (2.4602) grad_norm 2.0151 (3.0659) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:13:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][270/625] eta 0:02:40 lr 0.000044 wd 0.0500 time 0.4483 (0.4522) data time 0.0006 (0.0031) model time 0.4476 (0.4489) loss 2.0447 (2.4542) grad_norm 3.1921 (3.0437) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:14:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][280/625] eta 0:02:35 lr 0.000044 wd 0.0500 time 0.4512 
(0.4521) data time 0.0008 (0.0030) model time 0.4504 (0.4488) loss 2.1601 (2.4534) grad_norm 3.2805 (3.0542) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:14:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][290/625] eta 0:02:31 lr 0.000044 wd 0.0500 time 0.4486 (0.4520) data time 0.0007 (0.0029) model time 0.4479 (0.4488) loss 2.7518 (2.4522) grad_norm 3.0572 (3.0794) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:14:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][300/625] eta 0:02:26 lr 0.000044 wd 0.0500 time 0.4468 (0.4519) data time 0.0009 (0.0029) model time 0.4459 (0.4488) loss 2.4486 (2.4501) grad_norm 3.0689 (3.0669) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:14:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][310/625] eta 0:02:22 lr 0.000044 wd 0.0500 time 0.4462 (0.4518) data time 0.0007 (0.0028) model time 0.4455 (0.4487) loss 1.4717 (2.4443) grad_norm 3.0866 (3.0589) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:14:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][320/625] eta 0:02:18 lr 0.000044 wd 0.0500 time 0.6188 (0.4526) data time 0.0007 (0.0028) model time 0.6181 (0.4498) loss 2.8439 (2.4450) grad_norm 2.9279 (3.0515) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:14:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][330/625] eta 0:02:13 lr 0.000044 wd 0.0500 time 0.4491 (0.4524) data time 0.0007 (0.0027) model time 0.4485 (0.4496) loss 2.3106 (2.4466) grad_norm 2.5868 (inf) loss_scale 64.0000 (127.2266) mem 16699MB [2024-08-11 10:14:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][340/625] eta 0:02:08 lr 0.000044 wd 0.0500 time 0.4452 (0.4523) data time 0.0007 (0.0026) model time 0.4445 (0.4495) loss 1.7022 (2.4411) grad_norm 3.7231 (inf) loss_scale 64.0000 (125.3724) mem 16699MB [2024-08-11 10:14:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[270/300][350/625] eta 0:02:04 lr 0.000044 wd 0.0500 time 0.4489 (0.4522) data time 0.0006 (0.0026) model time 0.4483 (0.4495) loss 2.5287 (2.4367) grad_norm 2.1510 (inf) loss_scale 64.0000 (123.6239) mem 16699MB [2024-08-11 10:14:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][360/625] eta 0:01:59 lr 0.000044 wd 0.0500 time 0.4499 (0.4521) data time 0.0007 (0.0025) model time 0.4492 (0.4494) loss 2.8586 (2.4400) grad_norm 2.1674 (inf) loss_scale 64.0000 (121.9723) mem 16699MB [2024-08-11 10:14:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][370/625] eta 0:01:55 lr 0.000044 wd 0.0500 time 0.4523 (0.4520) data time 0.0006 (0.0025) model time 0.4517 (0.4494) loss 2.1850 (2.4402) grad_norm 1.8408 (inf) loss_scale 64.0000 (120.4097) mem 16699MB [2024-08-11 10:14:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][380/625] eta 0:01:50 lr 0.000044 wd 0.0500 time 0.4519 (0.4519) data time 0.0006 (0.0024) model time 0.4512 (0.4494) loss 2.2715 (2.4335) grad_norm 2.9438 (inf) loss_scale 64.0000 (118.9291) mem 16699MB [2024-08-11 10:14:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][390/625] eta 0:01:46 lr 0.000044 wd 0.0500 time 0.4491 (0.4518) data time 0.0008 (0.0024) model time 0.4483 (0.4493) loss 2.4465 (2.4347) grad_norm 2.0420 (inf) loss_scale 64.0000 (117.5243) mem 16699MB [2024-08-11 10:14:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][400/625] eta 0:01:41 lr 0.000044 wd 0.0500 time 0.4483 (0.4517) data time 0.0006 (0.0024) model time 0.4477 (0.4492) loss 2.5956 (2.4372) grad_norm 1.9032 (inf) loss_scale 64.0000 (116.1895) mem 16699MB [2024-08-11 10:14:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][410/625] eta 0:01:37 lr 0.000044 wd 0.0500 time 0.4484 (0.4516) data time 0.0010 (0.0023) model time 0.4474 (0.4492) loss 2.8679 (2.4380) grad_norm 3.8492 (inf) loss_scale 64.0000 (114.9197) mem 16699MB [2024-08-11 10:15:04 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][420/625] eta 0:01:32 lr 0.000044 wd 0.0500 time 0.4714 (0.4517) data time 0.0008 (0.0023) model time 0.4705 (0.4493) loss 2.5765 (2.4412) grad_norm 2.7469 (inf) loss_scale 64.0000 (113.7102) mem 16699MB [2024-08-11 10:15:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][430/625] eta 0:01:28 lr 0.000044 wd 0.0500 time 0.4460 (0.4516) data time 0.0008 (0.0023) model time 0.4452 (0.4492) loss 1.9476 (2.4396) grad_norm 1.7013 (inf) loss_scale 64.0000 (112.5568) mem 16699MB [2024-08-11 10:15:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][440/625] eta 0:01:23 lr 0.000044 wd 0.0500 time 0.4465 (0.4515) data time 0.0007 (0.0022) model time 0.4458 (0.4492) loss 2.8689 (2.4385) grad_norm 2.1691 (inf) loss_scale 64.0000 (111.4558) mem 16699MB [2024-08-11 10:15:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][450/625] eta 0:01:19 lr 0.000044 wd 0.0500 time 0.4464 (0.4514) data time 0.0007 (0.0022) model time 0.4457 (0.4491) loss 1.8587 (2.4386) grad_norm 2.1407 (inf) loss_scale 64.0000 (110.4035) mem 16699MB [2024-08-11 10:15:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][460/625] eta 0:01:14 lr 0.000044 wd 0.0500 time 0.4463 (0.4513) data time 0.0010 (0.0022) model time 0.4453 (0.4491) loss 2.2537 (2.4368) grad_norm 2.5339 (inf) loss_scale 64.0000 (109.3970) mem 16699MB [2024-08-11 10:15:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][470/625] eta 0:01:09 lr 0.000044 wd 0.0500 time 0.4453 (0.4516) data time 0.0008 (0.0021) model time 0.4445 (0.4493) loss 2.7746 (2.4377) grad_norm 3.6536 (inf) loss_scale 64.0000 (108.4331) mem 16699MB [2024-08-11 10:15:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][480/625] eta 0:01:05 lr 0.000044 wd 0.0500 time 0.4478 (0.4515) data time 0.0006 (0.0021) model time 0.4472 (0.4493) loss 2.7634 (2.4385) grad_norm 2.3010 (inf) loss_scale 64.0000 
(107.5094) mem 16699MB [2024-08-11 10:15:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][490/625] eta 0:01:00 lr 0.000044 wd 0.0500 time 0.4492 (0.4514) data time 0.0006 (0.0021) model time 0.4486 (0.4492) loss 2.4975 (2.4347) grad_norm 3.1468 (inf) loss_scale 64.0000 (106.6232) mem 16699MB [2024-08-11 10:15:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][500/625] eta 0:00:56 lr 0.000044 wd 0.0500 time 0.4514 (0.4513) data time 0.0008 (0.0021) model time 0.4506 (0.4492) loss 2.8280 (2.4386) grad_norm 2.4095 (inf) loss_scale 64.0000 (105.7725) mem 16699MB [2024-08-11 10:15:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][510/625] eta 0:00:51 lr 0.000044 wd 0.0500 time 0.4478 (0.4513) data time 0.0011 (0.0020) model time 0.4467 (0.4492) loss 2.7917 (2.4439) grad_norm 2.2549 (inf) loss_scale 64.0000 (104.9550) mem 16699MB [2024-08-11 10:15:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][520/625] eta 0:00:47 lr 0.000044 wd 0.0500 time 0.4489 (0.4513) data time 0.0007 (0.0020) model time 0.4482 (0.4492) loss 2.6049 (2.4436) grad_norm 2.9953 (inf) loss_scale 64.0000 (104.1689) mem 16699MB [2024-08-11 10:15:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][530/625] eta 0:00:42 lr 0.000043 wd 0.0500 time 0.4485 (0.4513) data time 0.0008 (0.0020) model time 0.4477 (0.4492) loss 2.3694 (2.4421) grad_norm 11.3264 (inf) loss_scale 64.0000 (103.4124) mem 16699MB [2024-08-11 10:15:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][540/625] eta 0:00:38 lr 0.000043 wd 0.0500 time 0.6388 (0.4516) data time 0.0007 (0.0020) model time 0.6381 (0.4495) loss 2.6765 (2.4396) grad_norm 2.0162 (inf) loss_scale 64.0000 (102.6839) mem 16699MB [2024-08-11 10:16:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][550/625] eta 0:00:33 lr 0.000043 wd 0.0500 time 0.4460 (0.4515) data time 0.0007 (0.0019) model time 0.4453 (0.4495) loss 2.4981 
(2.4395) grad_norm 2.0507 (inf) loss_scale 64.0000 (101.9819) mem 16699MB [2024-08-11 10:16:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][560/625] eta 0:00:29 lr 0.000043 wd 0.0500 time 0.4478 (0.4514) data time 0.0008 (0.0019) model time 0.4470 (0.4494) loss 1.7118 (2.4400) grad_norm 3.2605 (inf) loss_scale 64.0000 (101.3048) mem 16699MB [2024-08-11 10:16:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][570/625] eta 0:00:24 lr 0.000043 wd 0.0500 time 0.4523 (0.4514) data time 0.0009 (0.0019) model time 0.4514 (0.4494) loss 2.3077 (2.4390) grad_norm 3.0912 (inf) loss_scale 64.0000 (100.6515) mem 16699MB [2024-08-11 10:16:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][580/625] eta 0:00:20 lr 0.000043 wd 0.0500 time 0.4510 (0.4513) data time 0.0007 (0.0019) model time 0.4503 (0.4494) loss 2.7322 (2.4384) grad_norm 3.2959 (inf) loss_scale 64.0000 (100.0207) mem 16699MB [2024-08-11 10:16:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][590/625] eta 0:00:15 lr 0.000043 wd 0.0500 time 0.4467 (0.4513) data time 0.0010 (0.0019) model time 0.4456 (0.4493) loss 2.6812 (2.4381) grad_norm 3.9138 (inf) loss_scale 64.0000 (99.4112) mem 16699MB [2024-08-11 10:16:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][600/625] eta 0:00:11 lr 0.000043 wd 0.0500 time 0.4473 (0.4512) data time 0.0009 (0.0019) model time 0.4464 (0.4493) loss 3.1209 (2.4394) grad_norm 2.0231 (inf) loss_scale 64.0000 (98.8220) mem 16699MB [2024-08-11 10:16:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][610/625] eta 0:00:06 lr 0.000043 wd 0.0500 time 0.4456 (0.4512) data time 0.0004 (0.0018) model time 0.4452 (0.4492) loss 2.6091 (2.4398) grad_norm 3.8795 (inf) loss_scale 64.0000 (98.2520) mem 16699MB [2024-08-11 10:16:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [270/300][620/625] eta 0:00:02 lr 0.000043 wd 0.0500 time 0.4424 (0.4510) data time 0.0006 
(0.0018) model time 0.4418 (0.4491) loss 2.7397 (2.4383) grad_norm 3.0314 (inf) loss_scale 64.0000 (97.7005) mem 16699MB [2024-08-11 10:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 270 training takes 0:04:41 [2024-08-11 10:16:36 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:16:37 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 10:16:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.485 (0.485) Loss 0.5259 (0.5259) Acc@1 89.111 (89.111) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 10:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.153) Loss 0.8564 (0.6331) Acc@1 80.518 (86.910) Acc@5 96.191 (97.749) Mem 16699MB [2024-08-11 10:16:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9316 (0.7537) Acc@1 79.102 (84.073) Acc@5 95.166 (96.666) Mem 16699MB [2024-08-11 10:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.621 [2024-08-11 10:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.790 (0.790) Loss 0.5151 (0.5151) Acc@1 89.453 (89.453) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:16:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.183) Loss 0.8271 (0.6154) Acc@1 81.250 (87.092) Acc@5 96.387 (97.789) Mem 16699MB [2024-08-11 10:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.9043 (0.7323) Acc@1 79.395 (84.298) Acc@5 95.508 (96.768) Mem 16699MB [2024-08-11 10:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.043 Acc@5 96.717 [2024-08-11 10:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 
272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:16:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][0/625] eta 0:12:39 lr 0.000043 wd 0.0500 time 1.2149 (1.2149) data time 0.5427 (0.5427) model time 0.0000 (0.0000) loss 1.8329 (1.8329) grad_norm inf (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][10/625] eta 0:05:18 lr 0.000043 wd 0.0500 time 0.4451 (0.5183) data time 0.0009 (0.0501) model time 0.0000 (0.0000) loss 1.6382 (2.2647) grad_norm 2.3095 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:16:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][20/625] eta 0:04:53 lr 0.000043 wd 0.0500 time 0.4473 (0.4854) data time 0.0008 (0.0267) model time 0.0000 (0.0000) loss 2.7339 (2.3670) grad_norm 1.8495 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][30/625] eta 0:04:41 lr 0.000043 wd 0.0500 time 0.4475 (0.4735) data time 0.0009 (0.0183) model time 0.0000 (0.0000) loss 2.6062 (2.4053) grad_norm 2.3191 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][40/625] eta 0:04:33 lr 0.000043 wd 0.0500 time 0.4498 (0.4672) data time 0.0006 (0.0141) model time 0.0000 (0.0000) loss 1.4773 (2.4209) grad_norm 2.7650 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][50/625] eta 0:04:26 lr 0.000043 wd 0.0500 time 0.4525 (0.4634) data time 0.0008 (0.0115) model time 0.0000 (0.0000) loss 2.8312 (2.4409) grad_norm 2.0569 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][60/625] eta 0:04:20 lr 0.000043 wd 0.0500 time 0.4452 (0.4608) data time 0.0006 (0.0097) model time 0.4446 
(0.4464) loss 2.8871 (2.4138) grad_norm 2.1218 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][70/625] eta 0:04:14 lr 0.000043 wd 0.0500 time 0.4480 (0.4587) data time 0.0006 (0.0085) model time 0.4474 (0.4458) loss 2.3909 (2.4263) grad_norm 2.9971 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][80/625] eta 0:04:09 lr 0.000043 wd 0.0500 time 0.4461 (0.4579) data time 0.0007 (0.0076) model time 0.4453 (0.4476) loss 1.9769 (2.4393) grad_norm 2.5358 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][90/625] eta 0:04:04 lr 0.000043 wd 0.0500 time 0.4523 (0.4569) data time 0.0009 (0.0069) model time 0.4515 (0.4476) loss 2.7094 (2.4486) grad_norm 2.3989 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][100/625] eta 0:04:00 lr 0.000043 wd 0.0500 time 0.4474 (0.4582) data time 0.0007 (0.0063) model time 0.4467 (0.4520) loss 2.3289 (2.4286) grad_norm 2.2949 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][110/625] eta 0:03:55 lr 0.000043 wd 0.0500 time 0.4497 (0.4574) data time 0.0006 (0.0058) model time 0.4491 (0.4513) loss 2.0600 (2.4403) grad_norm 3.0992 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][120/625] eta 0:03:50 lr 0.000043 wd 0.0500 time 0.4469 (0.4566) data time 0.0006 (0.0054) model time 0.4463 (0.4507) loss 2.8417 (2.4321) grad_norm 8.2131 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][130/625] eta 0:03:45 lr 0.000043 wd 0.0500 time 0.4477 (0.4559) data time 
0.0006 (0.0050) model time 0.4471 (0.4502) loss 3.3027 (2.4421) grad_norm 3.5854 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][140/625] eta 0:03:40 lr 0.000043 wd 0.0500 time 0.4479 (0.4553) data time 0.0008 (0.0047) model time 0.4472 (0.4498) loss 2.3722 (2.4475) grad_norm 3.2286 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][150/625] eta 0:03:36 lr 0.000043 wd 0.0500 time 0.4424 (0.4548) data time 0.0007 (0.0045) model time 0.4417 (0.4495) loss 2.6766 (2.4441) grad_norm 1.7871 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:17:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][160/625] eta 0:03:31 lr 0.000043 wd 0.0500 time 0.4512 (0.4544) data time 0.0008 (0.0042) model time 0.4504 (0.4494) loss 2.4538 (2.4430) grad_norm 1.9318 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][170/625] eta 0:03:26 lr 0.000043 wd 0.0500 time 0.4471 (0.4541) data time 0.0006 (0.0040) model time 0.4465 (0.4493) loss 2.6741 (2.4423) grad_norm 17.7941 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][180/625] eta 0:03:21 lr 0.000043 wd 0.0500 time 0.4467 (0.4538) data time 0.0006 (0.0039) model time 0.4461 (0.4492) loss 2.1786 (2.4529) grad_norm 2.0719 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][190/625] eta 0:03:17 lr 0.000043 wd 0.0500 time 0.4487 (0.4536) data time 0.0006 (0.0037) model time 0.4481 (0.4492) loss 1.5612 (2.4418) grad_norm 2.2864 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][200/625] eta 0:03:12 lr 0.000042 wd 
0.0500 time 0.4498 (0.4534) data time 0.0007 (0.0036) model time 0.4491 (0.4491) loss 2.6936 (2.4488) grad_norm 3.4038 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][210/625] eta 0:03:08 lr 0.000042 wd 0.0500 time 0.4448 (0.4531) data time 0.0007 (0.0034) model time 0.4441 (0.4490) loss 2.5013 (2.4520) grad_norm 2.2786 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][220/625] eta 0:03:03 lr 0.000042 wd 0.0500 time 0.4425 (0.4529) data time 0.0007 (0.0033) model time 0.4419 (0.4489) loss 1.6706 (2.4483) grad_norm 2.2750 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][230/625] eta 0:02:58 lr 0.000042 wd 0.0500 time 0.4481 (0.4527) data time 0.0006 (0.0032) model time 0.4474 (0.4488) loss 2.3431 (2.4502) grad_norm 2.3385 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][240/625] eta 0:02:54 lr 0.000042 wd 0.0500 time 0.4563 (0.4534) data time 0.0007 (0.0031) model time 0.4556 (0.4499) loss 2.1581 (2.4596) grad_norm 4.1541 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][250/625] eta 0:02:49 lr 0.000042 wd 0.0500 time 0.4496 (0.4533) data time 0.0009 (0.0030) model time 0.4487 (0.4499) loss 2.9528 (2.4597) grad_norm 2.8622 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][260/625] eta 0:02:45 lr 0.000042 wd 0.0500 time 0.4493 (0.4531) data time 0.0006 (0.0029) model time 0.4487 (0.4498) loss 2.9073 (2.4623) grad_norm 2.3795 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[271/300][270/625] eta 0:02:40 lr 0.000042 wd 0.0500 time 0.4507 (0.4530) data time 0.0006 (0.0029) model time 0.4501 (0.4497) loss 1.9589 (2.4580) grad_norm 2.0891 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][280/625] eta 0:02:36 lr 0.000042 wd 0.0500 time 0.4511 (0.4529) data time 0.0006 (0.0028) model time 0.4505 (0.4497) loss 2.4490 (2.4626) grad_norm 2.2664 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:18:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][290/625] eta 0:02:31 lr 0.000042 wd 0.0500 time 0.4454 (0.4527) data time 0.0007 (0.0027) model time 0.4448 (0.4496) loss 2.7130 (2.4574) grad_norm 2.6654 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][300/625] eta 0:02:27 lr 0.000042 wd 0.0500 time 0.4470 (0.4530) data time 0.0007 (0.0026) model time 0.4463 (0.4500) loss 2.1295 (2.4566) grad_norm 2.2176 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][310/625] eta 0:02:22 lr 0.000042 wd 0.0500 time 0.4555 (0.4529) data time 0.0008 (0.0026) model time 0.4547 (0.4500) loss 2.9943 (2.4572) grad_norm 3.1842 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][320/625] eta 0:02:18 lr 0.000042 wd 0.0500 time 0.4538 (0.4528) data time 0.0008 (0.0025) model time 0.4530 (0.4500) loss 2.7214 (2.4606) grad_norm 2.7105 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][330/625] eta 0:02:13 lr 0.000042 wd 0.0500 time 0.4464 (0.4528) data time 0.0007 (0.0025) model time 0.4457 (0.4500) loss 2.8948 (2.4617) grad_norm 2.3455 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:18 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [271/300][340/625] eta 0:02:08 lr 0.000042 wd 0.0500 time 0.4497 (0.4526) data time 0.0007 (0.0024) model time 0.4489 (0.4499) loss 2.7381 (2.4610) grad_norm 2.2352 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][350/625] eta 0:02:04 lr 0.000042 wd 0.0500 time 0.4459 (0.4524) data time 0.0006 (0.0024) model time 0.4453 (0.4497) loss 2.8803 (2.4626) grad_norm 19.6105 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][360/625] eta 0:01:59 lr 0.000042 wd 0.0500 time 0.4465 (0.4522) data time 0.0006 (0.0023) model time 0.4459 (0.4496) loss 2.2926 (2.4614) grad_norm 2.3204 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][370/625] eta 0:01:55 lr 0.000042 wd 0.0500 time 0.4469 (0.4521) data time 0.0007 (0.0023) model time 0.4462 (0.4495) loss 1.9368 (2.4657) grad_norm 2.2490 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][380/625] eta 0:01:50 lr 0.000042 wd 0.0500 time 0.4496 (0.4520) data time 0.0007 (0.0023) model time 0.4489 (0.4494) loss 2.5026 (2.4635) grad_norm 3.1597 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][390/625] eta 0:01:46 lr 0.000042 wd 0.0500 time 0.4480 (0.4519) data time 0.0008 (0.0022) model time 0.4472 (0.4493) loss 2.8176 (2.4581) grad_norm 2.6853 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][400/625] eta 0:01:41 lr 0.000042 wd 0.0500 time 0.4464 (0.4518) data time 0.0010 (0.0022) model time 0.4454 (0.4493) loss 2.0410 (2.4571) grad_norm 2.3391 (inf) loss_scale 32.0000 (32.0000) mem 16699MB 
[2024-08-11 10:19:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][410/625] eta 0:01:37 lr 0.000042 wd 0.0500 time 0.4452 (0.4517) data time 0.0009 (0.0022) model time 0.4443 (0.4492) loss 2.8259 (2.4581) grad_norm 3.3117 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][420/625] eta 0:01:32 lr 0.000042 wd 0.0500 time 0.4471 (0.4516) data time 0.0008 (0.0021) model time 0.4462 (0.4491) loss 2.9643 (2.4571) grad_norm 4.0973 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:19:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][430/625] eta 0:01:28 lr 0.000042 wd 0.0500 time 0.4457 (0.4515) data time 0.0009 (0.0021) model time 0.4448 (0.4490) loss 1.5399 (2.4589) grad_norm 2.5333 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][440/625] eta 0:01:23 lr 0.000042 wd 0.0500 time 0.4464 (0.4517) data time 0.0007 (0.0021) model time 0.4458 (0.4494) loss 1.7299 (2.4620) grad_norm 2.4473 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][450/625] eta 0:01:19 lr 0.000042 wd 0.0500 time 0.4455 (0.4516) data time 0.0006 (0.0020) model time 0.4449 (0.4493) loss 3.0102 (2.4621) grad_norm 2.3596 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][460/625] eta 0:01:14 lr 0.000042 wd 0.0500 time 0.4444 (0.4516) data time 0.0009 (0.0020) model time 0.4436 (0.4493) loss 2.5160 (2.4618) grad_norm 2.5543 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][470/625] eta 0:01:09 lr 0.000042 wd 0.0500 time 0.4443 (0.4515) data time 0.0007 (0.0020) model time 0.4436 (0.4492) loss 1.8293 (2.4603) grad_norm 4.7421 (inf) 
loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][480/625] eta 0:01:05 lr 0.000042 wd 0.0500 time 0.4465 (0.4514) data time 0.0007 (0.0020) model time 0.4458 (0.4492) loss 2.7423 (2.4603) grad_norm 12.1936 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][490/625] eta 0:01:00 lr 0.000042 wd 0.0500 time 0.4454 (0.4513) data time 0.0006 (0.0019) model time 0.4448 (0.4491) loss 3.0935 (2.4644) grad_norm 2.6036 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][500/625] eta 0:00:56 lr 0.000041 wd 0.0500 time 0.4454 (0.4513) data time 0.0007 (0.0019) model time 0.4447 (0.4490) loss 2.6856 (2.4626) grad_norm 6.1948 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][510/625] eta 0:00:51 lr 0.000041 wd 0.0500 time 0.4435 (0.4512) data time 0.0007 (0.0019) model time 0.4428 (0.4490) loss 1.7138 (2.4576) grad_norm 3.9154 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][520/625] eta 0:00:47 lr 0.000041 wd 0.0500 time 0.4449 (0.4511) data time 0.0009 (0.0019) model time 0.4440 (0.4489) loss 2.5242 (2.4561) grad_norm 2.6262 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][530/625] eta 0:00:42 lr 0.000041 wd 0.0500 time 0.4454 (0.4510) data time 0.0009 (0.0019) model time 0.4445 (0.4489) loss 2.5145 (2.4542) grad_norm 2.1085 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][540/625] eta 0:00:38 lr 0.000041 wd 0.0500 time 0.4451 (0.4510) data time 0.0007 (0.0018) model time 0.4444 (0.4488) loss 
2.3260 (2.4541) grad_norm 2.0616 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][550/625] eta 0:00:33 lr 0.000041 wd 0.0500 time 0.4453 (0.4509) data time 0.0006 (0.0018) model time 0.4447 (0.4488) loss 2.4217 (2.4508) grad_norm 2.5995 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:20:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][560/625] eta 0:00:29 lr 0.000041 wd 0.0500 time 0.4458 (0.4508) data time 0.0008 (0.0018) model time 0.4450 (0.4487) loss 2.7418 (2.4497) grad_norm 3.5713 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][570/625] eta 0:00:24 lr 0.000041 wd 0.0500 time 0.4433 (0.4508) data time 0.0009 (0.0018) model time 0.4424 (0.4487) loss 2.6096 (2.4478) grad_norm 3.3983 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][580/625] eta 0:00:20 lr 0.000041 wd 0.0500 time 0.4503 (0.4511) data time 0.0009 (0.0018) model time 0.4495 (0.4490) loss 2.5492 (2.4497) grad_norm 7.2157 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][590/625] eta 0:00:15 lr 0.000041 wd 0.0500 time 0.4525 (0.4510) data time 0.0008 (0.0018) model time 0.4517 (0.4490) loss 2.4790 (2.4513) grad_norm 2.1728 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][600/625] eta 0:00:11 lr 0.000041 wd 0.0500 time 0.4490 (0.4509) data time 0.0009 (0.0017) model time 0.4481 (0.4489) loss 2.6332 (2.4535) grad_norm 2.1558 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][610/625] eta 0:00:06 lr 0.000041 wd 0.0500 time 0.4451 (0.4509) data time 0.0006 
(0.0017) model time 0.4445 (0.4489) loss 2.2537 (2.4536) grad_norm 2.8884 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [271/300][620/625] eta 0:00:02 lr 0.000041 wd 0.0500 time 0.4432 (0.4508) data time 0.0006 (0.0017) model time 0.4426 (0.4488) loss 2.5806 (2.4510) grad_norm 3.3598 (inf) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 271 training takes 0:04:41 [2024-08-11 10:21:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:21:28 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 10:21:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.474 (0.474) Loss 0.5288 (0.5288) Acc@1 88.867 (88.867) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8560 (0.6311) Acc@1 80.371 (86.892) Acc@5 96.045 (97.741) Mem 16699MB [2024-08-11 10:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9502 (0.7522) Acc@1 79.199 (84.066) Acc@5 95.410 (96.656) Mem 16699MB [2024-08-11 10:21:31 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.807 Acc@5 96.603 [2024-08-11 10:21:31 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:21:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.830 (0.830) Loss 0.5156 (0.5156) Acc@1 89.453 (89.453) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 10:21:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.185) Loss 0.8281 (0.6159) Acc@1 81.104 (87.114) Acc@5 96.338 (97.789) Mem 16699MB [2024-08-11 10:21:34 vssm_base_ms_e300] 
(main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.152) Loss 0.9058 (0.7333) Acc@1 79.443 (84.296) Acc@5 95.508 (96.770) Mem 16699MB [2024-08-11 10:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.047 Acc@5 96.715 [2024-08-11 10:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:21:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][0/625] eta 0:12:55 lr 0.000041 wd 0.0500 time 1.2407 (1.2407) data time 0.4167 (0.4167) model time 0.0000 (0.0000) loss 2.8890 (2.8890) grad_norm 2.3237 (2.3237) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][10/625] eta 0:05:19 lr 0.000041 wd 0.0500 time 0.4500 (0.5201) data time 0.0008 (0.0386) model time 0.0000 (0.0000) loss 2.5634 (2.4056) grad_norm 2.6241 (3.9516) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][20/625] eta 0:04:53 lr 0.000041 wd 0.0500 time 0.4457 (0.4857) data time 0.0009 (0.0206) model time 0.0000 (0.0000) loss 2.8457 (2.4859) grad_norm 24.2655 (4.5540) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][30/625] eta 0:04:42 lr 0.000041 wd 0.0500 time 0.4478 (0.4740) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 2.1446 (2.4761) grad_norm 2.6463 (4.1852) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][40/625] eta 0:04:33 lr 0.000041 wd 0.0500 time 0.4505 (0.4682) data time 0.0006 (0.0110) model time 0.0000 (0.0000) loss 2.3756 (2.4503) grad_norm 2.1378 (3.9847) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:21:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][50/625] eta 0:04:27 lr 0.000041 wd 0.0500 time 0.4520 (0.4647) data time 
0.0008 (0.0090) model time 0.0000 (0.0000) loss 2.0557 (2.4568) grad_norm 2.4424 (3.6663) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][60/625] eta 0:04:21 lr 0.000041 wd 0.0500 time 0.4471 (0.4628) data time 0.0007 (0.0077) model time 0.4465 (0.4523) loss 1.8632 (2.4743) grad_norm 5.1988 (3.5763) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][70/625] eta 0:04:15 lr 0.000041 wd 0.0500 time 0.4496 (0.4607) data time 0.0008 (0.0067) model time 0.4488 (0.4497) loss 2.4776 (2.4649) grad_norm 9.8582 (3.5755) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][80/625] eta 0:04:10 lr 0.000041 wd 0.0500 time 0.4520 (0.4591) data time 0.0007 (0.0060) model time 0.4514 (0.4488) loss 2.9317 (2.4760) grad_norm 2.6678 (3.5294) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][90/625] eta 0:04:04 lr 0.000041 wd 0.0500 time 0.4497 (0.4578) data time 0.0007 (0.0054) model time 0.4491 (0.4481) loss 2.2396 (2.4739) grad_norm 2.1308 (3.5310) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][100/625] eta 0:03:59 lr 0.000041 wd 0.0500 time 0.4479 (0.4567) data time 0.0006 (0.0049) model time 0.4473 (0.4478) loss 1.6385 (2.4574) grad_norm 1.9450 (3.4345) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][110/625] eta 0:03:54 lr 0.000041 wd 0.0500 time 0.4504 (0.4559) data time 0.0008 (0.0046) model time 0.4495 (0.4477) loss 1.4954 (2.4534) grad_norm 12.3972 (3.6183) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][120/625] eta 
0:03:49 lr 0.000041 wd 0.0500 time 0.4515 (0.4553) data time 0.0006 (0.0043) model time 0.4509 (0.4477) loss 2.3437 (2.4394) grad_norm 2.2846 (3.6001) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][130/625] eta 0:03:45 lr 0.000041 wd 0.0500 time 0.4495 (0.4548) data time 0.0009 (0.0040) model time 0.4486 (0.4476) loss 2.2578 (2.4346) grad_norm 2.3067 (3.5147) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][140/625] eta 0:03:40 lr 0.000041 wd 0.0500 time 0.4460 (0.4542) data time 0.0009 (0.0038) model time 0.4451 (0.4475) loss 2.6171 (2.4396) grad_norm 2.3891 (3.4879) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][150/625] eta 0:03:35 lr 0.000041 wd 0.0500 time 0.4501 (0.4538) data time 0.0008 (0.0036) model time 0.4494 (0.4474) loss 2.5810 (2.4404) grad_norm 3.0934 (3.4390) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][160/625] eta 0:03:30 lr 0.000041 wd 0.0500 time 0.4440 (0.4534) data time 0.0006 (0.0034) model time 0.4434 (0.4474) loss 2.5232 (2.4413) grad_norm 3.8186 (3.4349) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][170/625] eta 0:03:26 lr 0.000041 wd 0.0500 time 0.4453 (0.4530) data time 0.0008 (0.0032) model time 0.4445 (0.4473) loss 2.4457 (2.4340) grad_norm 2.4877 (3.4446) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:22:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][180/625] eta 0:03:21 lr 0.000040 wd 0.0500 time 0.4438 (0.4528) data time 0.0008 (0.0031) model time 0.4430 (0.4474) loss 2.5789 (2.4326) grad_norm 3.1254 (3.4351) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:01 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [272/300][190/625] eta 0:03:16 lr 0.000040 wd 0.0500 time 0.4474 (0.4526) data time 0.0008 (0.0030) model time 0.4467 (0.4474) loss 2.1684 (2.4187) grad_norm 4.4071 (3.4010) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][200/625] eta 0:03:12 lr 0.000040 wd 0.0500 time 0.4510 (0.4524) data time 0.0008 (0.0029) model time 0.4503 (0.4474) loss 2.8892 (2.4252) grad_norm 2.8489 (3.3649) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][210/625] eta 0:03:07 lr 0.000040 wd 0.0500 time 0.4483 (0.4522) data time 0.0007 (0.0028) model time 0.4477 (0.4475) loss 2.8018 (2.4201) grad_norm 2.3056 (3.3456) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][220/625] eta 0:03:03 lr 0.000040 wd 0.0500 time 0.4565 (0.4520) data time 0.0008 (0.0027) model time 0.4556 (0.4474) loss 2.5441 (2.4189) grad_norm 2.4960 (3.3336) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][230/625] eta 0:02:58 lr 0.000040 wd 0.0500 time 0.4468 (0.4518) data time 0.0008 (0.0026) model time 0.4460 (0.4473) loss 2.8264 (2.4244) grad_norm 2.8235 (3.3176) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][240/625] eta 0:02:53 lr 0.000040 wd 0.0500 time 0.4472 (0.4516) data time 0.0005 (0.0025) model time 0.4467 (0.4472) loss 2.5472 (2.4240) grad_norm 2.5069 (3.2862) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][250/625] eta 0:02:49 lr 0.000040 wd 0.0500 time 0.4364 (0.4523) data time 0.0011 (0.0025) model time 0.4353 (0.4484) loss 2.4307 (2.4283) grad_norm 2.3446 (3.2470) loss_scale 32.0000 
(32.0000) mem 16699MB [2024-08-11 10:23:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][260/625] eta 0:02:45 lr 0.000040 wd 0.0500 time 0.4508 (0.4522) data time 0.0006 (0.0024) model time 0.4501 (0.4484) loss 2.6089 (2.4207) grad_norm 2.9663 (3.2333) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][270/625] eta 0:02:40 lr 0.000040 wd 0.0500 time 0.4504 (0.4522) data time 0.0007 (0.0023) model time 0.4497 (0.4485) loss 3.0337 (2.4310) grad_norm 1.6908 (3.2192) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][280/625] eta 0:02:36 lr 0.000040 wd 0.0500 time 0.4503 (0.4525) data time 0.0007 (0.0023) model time 0.4496 (0.4491) loss 2.7044 (2.4366) grad_norm 1.9947 (3.1981) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][290/625] eta 0:02:31 lr 0.000040 wd 0.0500 time 0.4512 (0.4524) data time 0.0006 (0.0022) model time 0.4506 (0.4490) loss 3.0862 (2.4429) grad_norm 2.6247 (3.1729) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][300/625] eta 0:02:26 lr 0.000040 wd 0.0500 time 0.4461 (0.4522) data time 0.0006 (0.0022) model time 0.4455 (0.4489) loss 2.5496 (2.4422) grad_norm 2.3396 (3.1732) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][310/625] eta 0:02:22 lr 0.000040 wd 0.0500 time 0.4510 (0.4521) data time 0.0006 (0.0022) model time 0.4504 (0.4488) loss 2.1532 (2.4434) grad_norm 2.4698 (3.1930) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:23:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][320/625] eta 0:02:17 lr 0.000040 wd 0.0500 time 0.4498 (0.4519) data time 0.0006 (0.0021) model time 0.4492 (0.4487) loss 
2.9740 (2.4451) grad_norm 1.8052 (3.1722) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][330/625] eta 0:02:13 lr 0.000040 wd 0.0500 time 0.4466 (0.4518) data time 0.0006 (0.0021) model time 0.4460 (0.4487) loss 2.9003 (2.4492) grad_norm 2.3006 (3.2145) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][340/625] eta 0:02:08 lr 0.000040 wd 0.0500 time 0.4496 (0.4518) data time 0.0006 (0.0020) model time 0.4490 (0.4487) loss 2.8899 (2.4557) grad_norm 7.1192 (3.2095) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][350/625] eta 0:02:04 lr 0.000040 wd 0.0500 time 0.4542 (0.4517) data time 0.0007 (0.0020) model time 0.4535 (0.4487) loss 2.3343 (2.4534) grad_norm 11.8808 (3.2099) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][360/625] eta 0:01:59 lr 0.000040 wd 0.0500 time 0.4505 (0.4517) data time 0.0008 (0.0020) model time 0.4496 (0.4487) loss 2.9996 (2.4510) grad_norm 2.4474 (3.2139) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][370/625] eta 0:01:55 lr 0.000040 wd 0.0500 time 0.4456 (0.4515) data time 0.0006 (0.0019) model time 0.4450 (0.4486) loss 1.9465 (2.4523) grad_norm 2.8055 (3.2280) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][380/625] eta 0:01:50 lr 0.000040 wd 0.0500 time 0.4479 (0.4514) data time 0.0007 (0.0019) model time 0.4472 (0.4485) loss 2.3041 (2.4462) grad_norm 2.3997 (3.2226) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][390/625] eta 0:01:46 lr 0.000040 wd 0.0500 time 0.4499 
(0.4513) data time 0.0008 (0.0019) model time 0.4491 (0.4485) loss 2.7851 (2.4516) grad_norm 3.2614 (3.2072) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][400/625] eta 0:01:41 lr 0.000040 wd 0.0500 time 0.4480 (0.4512) data time 0.0008 (0.0019) model time 0.4473 (0.4484) loss 2.4037 (2.4483) grad_norm 4.2758 (3.2132) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][410/625] eta 0:01:37 lr 0.000040 wd 0.0500 time 0.4548 (0.4512) data time 0.0009 (0.0018) model time 0.4539 (0.4485) loss 2.8171 (2.4495) grad_norm 2.4409 (3.2012) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][420/625] eta 0:01:32 lr 0.000040 wd 0.0500 time 0.4501 (0.4511) data time 0.0006 (0.0018) model time 0.4495 (0.4485) loss 1.6355 (2.4496) grad_norm 2.2359 (3.1891) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][430/625] eta 0:01:28 lr 0.000040 wd 0.0500 time 0.4445 (0.4514) data time 0.0009 (0.0018) model time 0.4436 (0.4488) loss 1.9558 (2.4480) grad_norm 3.3854 (3.1745) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][440/625] eta 0:01:23 lr 0.000040 wd 0.0500 time 0.4475 (0.4517) data time 0.0007 (0.0018) model time 0.4468 (0.4492) loss 2.3515 (2.4453) grad_norm 2.9560 (3.1976) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:24:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][450/625] eta 0:01:19 lr 0.000040 wd 0.0500 time 0.4491 (0.4516) data time 0.0007 (0.0018) model time 0.4484 (0.4491) loss 1.6676 (2.4423) grad_norm 1.9005 (3.1822) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[272/300][460/625] eta 0:01:14 lr 0.000040 wd 0.0500 time 0.4464 (0.4515) data time 0.0008 (0.0017) model time 0.4456 (0.4490) loss 2.6003 (2.4450) grad_norm 2.3747 (3.1709) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][470/625] eta 0:01:09 lr 0.000040 wd 0.0500 time 0.4541 (0.4515) data time 0.0006 (0.0017) model time 0.4535 (0.4491) loss 2.6204 (2.4451) grad_norm 2.0833 (3.1590) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][480/625] eta 0:01:05 lr 0.000040 wd 0.0500 time 0.4485 (0.4515) data time 0.0007 (0.0017) model time 0.4478 (0.4491) loss 1.5024 (2.4461) grad_norm 2.1437 (3.1697) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][490/625] eta 0:01:00 lr 0.000039 wd 0.0500 time 0.4461 (0.4514) data time 0.0006 (0.0017) model time 0.4455 (0.4491) loss 2.7666 (2.4484) grad_norm 2.3859 (3.2074) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][500/625] eta 0:00:56 lr 0.000039 wd 0.0500 time 0.4457 (0.4514) data time 0.0009 (0.0017) model time 0.4448 (0.4490) loss 3.0785 (2.4503) grad_norm 3.5430 (3.1988) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][510/625] eta 0:00:51 lr 0.000039 wd 0.0500 time 0.4467 (0.4513) data time 0.0006 (0.0016) model time 0.4461 (0.4490) loss 1.9796 (2.4507) grad_norm 2.8858 (3.1855) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][520/625] eta 0:00:47 lr 0.000039 wd 0.0500 time 0.4481 (0.4512) data time 0.0007 (0.0016) model time 0.4474 (0.4489) loss 2.2155 (2.4483) grad_norm 1.7435 (3.1789) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:34 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][530/625] eta 0:00:42 lr 0.000039 wd 0.0500 time 0.4441 (0.4511) data time 0.0008 (0.0016) model time 0.4433 (0.4488) loss 2.9534 (2.4456) grad_norm 2.1308 (3.1678) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][540/625] eta 0:00:38 lr 0.000039 wd 0.0500 time 0.4455 (0.4510) data time 0.0006 (0.0016) model time 0.4449 (0.4488) loss 1.7671 (2.4458) grad_norm 2.2298 (3.1614) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][550/625] eta 0:00:33 lr 0.000039 wd 0.0500 time 0.4462 (0.4510) data time 0.0008 (0.0016) model time 0.4454 (0.4488) loss 2.5229 (2.4459) grad_norm 2.6431 (3.6282) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][560/625] eta 0:00:29 lr 0.000039 wd 0.0500 time 0.4494 (0.4510) data time 0.0011 (0.0016) model time 0.4483 (0.4488) loss 2.2525 (2.4490) grad_norm 3.3463 (3.6181) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][570/625] eta 0:00:24 lr 0.000039 wd 0.0500 time 0.4500 (0.4509) data time 0.0009 (0.0016) model time 0.4491 (0.4488) loss 2.8389 (2.4512) grad_norm 2.3896 (3.6073) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:25:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][580/625] eta 0:00:20 lr 0.000039 wd 0.0500 time 0.4497 (0.4509) data time 0.0006 (0.0016) model time 0.4491 (0.4487) loss 2.3772 (2.4492) grad_norm 2.5168 (3.6017) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][590/625] eta 0:00:15 lr 0.000039 wd 0.0500 time 0.4445 (0.4508) data time 0.0010 (0.0015) model time 0.4436 (0.4487) loss 2.4822 (2.4478) grad_norm 2.3741 (3.5830) 
loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][600/625] eta 0:00:11 lr 0.000039 wd 0.0500 time 0.4479 (0.4507) data time 0.0009 (0.0015) model time 0.4470 (0.4486) loss 2.6894 (2.4474) grad_norm 2.7606 (3.6067) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][610/625] eta 0:00:06 lr 0.000039 wd 0.0500 time 0.4434 (0.4507) data time 0.0006 (0.0015) model time 0.4428 (0.4486) loss 2.2394 (2.4468) grad_norm 2.9328 (3.6123) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [272/300][620/625] eta 0:00:02 lr 0.000039 wd 0.0500 time 0.4446 (0.4506) data time 0.0005 (0.0015) model time 0.4442 (0.4485) loss 3.1027 (2.4443) grad_norm 3.0146 (3.6018) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 272 training takes 0:04:41 [2024-08-11 10:26:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:26:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 10:26:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.482 (0.482) Loss 0.5269 (0.5269) Acc@1 88.916 (88.916) Acc@5 99.072 (99.072) Mem 16699MB [2024-08-11 10:26:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8584 (0.6312) Acc@1 80.225 (86.963) Acc@5 95.898 (97.785) Mem 16699MB [2024-08-11 10:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9312 (0.7538) Acc@1 79.688 (84.149) Acc@5 95.117 (96.659) Mem 16699MB [2024-08-11 10:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.907 Acc@5 96.609 [2024-08-11 10:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 10:26:21 vssm_base_ms_e300] (main_hfai_mnodes.py 265): INFO New max accuracy: 83.91% [2024-08-11 10:26:21 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saving...... [2024-08-11 10:26:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt.pth saved !!! 
[2024-08-11 10:26:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5161 (0.5161) Acc@1 89.453 (89.453) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 10:26:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.149) Loss 0.8311 (0.6167) Acc@1 80.957 (87.069) Acc@5 96.387 (97.776) Mem 16699MB [2024-08-11 10:26:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.133) Loss 0.9072 (0.7341) Acc@1 79.492 (84.287) Acc@5 95.508 (96.752) Mem 16699MB [2024-08-11 10:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.035 Acc@5 96.703 [2024-08-11 10:26:26 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:26:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][0/625] eta 0:12:49 lr 0.000039 wd 0.0500 time 1.2319 (1.2319) data time 0.5134 (0.5134) model time 0.0000 (0.0000) loss 2.8226 (2.8226) grad_norm 4.4957 (4.4957) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][10/625] eta 0:05:20 lr 0.000039 wd 0.0500 time 0.4554 (0.5216) data time 0.0007 (0.0475) model time 0.0000 (0.0000) loss 2.0906 (2.5207) grad_norm 2.7748 (4.3238) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][20/625] eta 0:04:58 lr 0.000039 wd 0.0500 time 0.4544 (0.4937) data time 0.0008 (0.0253) model time 0.0000 (0.0000) loss 2.1272 (2.4889) grad_norm 2.7965 (3.6727) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][30/625] eta 0:04:48 lr 0.000039 wd 0.0500 time 0.4542 (0.4844) data time 0.0009 (0.0175) model time 0.0000 (0.0000) loss 2.7156 (2.4321) grad_norm 8.6840 (3.8946) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:45 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [273/300][40/625] eta 0:04:38 lr 0.000039 wd 0.0500 time 0.4496 (0.4760) data time 0.0006 (0.0134) model time 0.0000 (0.0000) loss 2.6479 (2.4355) grad_norm 2.4232 (4.1483) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][50/625] eta 0:04:30 lr 0.000039 wd 0.0500 time 0.4559 (0.4711) data time 0.0006 (0.0109) model time 0.0000 (0.0000) loss 2.9561 (2.4597) grad_norm 2.6787 (3.9020) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][60/625] eta 0:04:24 lr 0.000039 wd 0.0500 time 0.4538 (0.4677) data time 0.0008 (0.0093) model time 0.4531 (0.4494) loss 2.6698 (2.4538) grad_norm 2.1925 (3.7555) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:26:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][70/625] eta 0:04:18 lr 0.000039 wd 0.0500 time 0.4459 (0.4651) data time 0.0010 (0.0081) model time 0.4449 (0.4489) loss 2.5926 (2.4486) grad_norm 2.9351 (3.5816) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][80/625] eta 0:04:12 lr 0.000039 wd 0.0500 time 0.4541 (0.4633) data time 0.0010 (0.0072) model time 0.4532 (0.4491) loss 2.4999 (2.4506) grad_norm 2.1716 (3.5301) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][90/625] eta 0:04:06 lr 0.000039 wd 0.0500 time 0.4473 (0.4615) data time 0.0009 (0.0065) model time 0.4464 (0.4485) loss 2.7013 (2.4587) grad_norm 2.6253 (3.4580) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][100/625] eta 0:04:01 lr 0.000039 wd 0.0500 time 0.4492 (0.4605) data time 0.0008 (0.0060) model time 0.4484 (0.4487) loss 2.7435 (2.4528) grad_norm 2.0573 (3.4556) loss_scale 32.0000 (32.0000) 
mem 16699MB [2024-08-11 10:27:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][110/625] eta 0:03:56 lr 0.000039 wd 0.0500 time 0.4473 (0.4594) data time 0.0007 (0.0055) model time 0.4467 (0.4485) loss 2.4403 (2.4399) grad_norm 2.1667 (3.4204) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][120/625] eta 0:03:51 lr 0.000039 wd 0.0500 time 0.4492 (0.4585) data time 0.0007 (0.0051) model time 0.4486 (0.4484) loss 2.7262 (2.4383) grad_norm 8.4440 (3.4351) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][130/625] eta 0:03:46 lr 0.000039 wd 0.0500 time 0.4557 (0.4581) data time 0.0008 (0.0048) model time 0.4549 (0.4489) loss 2.7390 (2.4342) grad_norm 3.1854 (3.4023) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][140/625] eta 0:03:41 lr 0.000039 wd 0.0500 time 0.4543 (0.4576) data time 0.0009 (0.0045) model time 0.4534 (0.4491) loss 2.0730 (2.4323) grad_norm 3.2630 (3.4642) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][150/625] eta 0:03:37 lr 0.000039 wd 0.0500 time 0.4458 (0.4570) data time 0.0009 (0.0043) model time 0.4449 (0.4490) loss 2.6641 (2.4216) grad_norm 3.2526 (3.4917) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][160/625] eta 0:03:32 lr 0.000039 wd 0.0500 time 0.4475 (0.4564) data time 0.0008 (0.0041) model time 0.4467 (0.4488) loss 2.3462 (2.4280) grad_norm 1.7719 (3.4239) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][170/625] eta 0:03:27 lr 0.000039 wd 0.0500 time 0.4528 (0.4559) data time 0.0006 (0.0039) model time 0.4522 (0.4485) loss 2.2983 
(2.4374) grad_norm 1.9077 (3.3662) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][180/625] eta 0:03:22 lr 0.000038 wd 0.0500 time 0.4516 (0.4554) data time 0.0006 (0.0037) model time 0.4510 (0.4484) loss 2.4800 (2.4478) grad_norm 2.4121 (3.3043) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][190/625] eta 0:03:17 lr 0.000038 wd 0.0500 time 0.4454 (0.4550) data time 0.0009 (0.0036) model time 0.4445 (0.4483) loss 2.6081 (2.4458) grad_norm 4.9240 (3.2721) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:27:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][200/625] eta 0:03:13 lr 0.000038 wd 0.0500 time 0.4502 (0.4547) data time 0.0009 (0.0034) model time 0.4493 (0.4483) loss 2.7124 (2.4498) grad_norm 3.6696 (3.2436) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][210/625] eta 0:03:08 lr 0.000038 wd 0.0500 time 0.4501 (0.4545) data time 0.0006 (0.0033) model time 0.4495 (0.4483) loss 2.7836 (2.4529) grad_norm 3.3900 (3.2489) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][220/625] eta 0:03:03 lr 0.000038 wd 0.0500 time 0.4511 (0.4543) data time 0.0006 (0.0032) model time 0.4505 (0.4484) loss 2.4911 (2.4528) grad_norm 2.8289 (3.2151) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][230/625] eta 0:02:59 lr 0.000038 wd 0.0500 time 0.4458 (0.4541) data time 0.0009 (0.0031) model time 0.4449 (0.4484) loss 1.7905 (2.4469) grad_norm 3.6306 (3.1969) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][240/625] eta 0:02:54 lr 0.000038 wd 0.0500 time 0.4471 (0.4538) data 
time 0.0006 (0.0030) model time 0.4465 (0.4483) loss 2.8185 (2.4509) grad_norm 2.5780 (3.1888) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][250/625] eta 0:02:50 lr 0.000038 wd 0.0500 time 0.4494 (0.4544) data time 0.0006 (0.0029) model time 0.4488 (0.4492) loss 1.6614 (2.4511) grad_norm 3.0689 (3.2017) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][260/625] eta 0:02:45 lr 0.000038 wd 0.0500 time 0.4496 (0.4541) data time 0.0006 (0.0028) model time 0.4490 (0.4491) loss 2.7817 (2.4448) grad_norm 2.3455 (3.2068) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][270/625] eta 0:02:41 lr 0.000038 wd 0.0500 time 0.4512 (0.4538) data time 0.0008 (0.0028) model time 0.4504 (0.4490) loss 2.6741 (2.4439) grad_norm 2.0037 (3.1774) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][280/625] eta 0:02:36 lr 0.000038 wd 0.0500 time 0.4476 (0.4536) data time 0.0009 (0.0027) model time 0.4468 (0.4489) loss 2.7494 (2.4424) grad_norm 2.3601 (3.1888) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][290/625] eta 0:02:31 lr 0.000038 wd 0.0500 time 0.4436 (0.4534) data time 0.0006 (0.0026) model time 0.4430 (0.4488) loss 2.4808 (2.4351) grad_norm 2.1957 (3.1600) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][300/625] eta 0:02:27 lr 0.000038 wd 0.0500 time 0.4470 (0.4532) data time 0.0009 (0.0026) model time 0.4461 (0.4487) loss 2.5795 (2.4282) grad_norm 2.9166 (3.1550) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][310/625] 
eta 0:02:22 lr 0.000038 wd 0.0500 time 0.4464 (0.4530) data time 0.0009 (0.0025) model time 0.4455 (0.4486) loss 2.3906 (2.4284) grad_norm 2.5203 (3.1628) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][320/625] eta 0:02:18 lr 0.000038 wd 0.0500 time 0.4505 (0.4529) data time 0.0006 (0.0025) model time 0.4499 (0.4486) loss 2.1950 (2.4287) grad_norm 2.1055 (3.1400) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:28:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][330/625] eta 0:02:13 lr 0.000038 wd 0.0500 time 0.4445 (0.4527) data time 0.0009 (0.0024) model time 0.4436 (0.4485) loss 2.0408 (2.4310) grad_norm 4.1768 (3.1243) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][340/625] eta 0:02:09 lr 0.000038 wd 0.0500 time 0.4520 (0.4527) data time 0.0008 (0.0024) model time 0.4512 (0.4486) loss 2.5223 (2.4368) grad_norm 2.1981 (3.1265) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][350/625] eta 0:02:04 lr 0.000038 wd 0.0500 time 0.4456 (0.4527) data time 0.0008 (0.0023) model time 0.4447 (0.4487) loss 2.4900 (2.4342) grad_norm 3.1579 (3.1166) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][360/625] eta 0:02:00 lr 0.000038 wd 0.0500 time 0.4461 (0.4531) data time 0.0008 (0.0023) model time 0.4453 (0.4492) loss 2.5081 (2.4325) grad_norm 2.6536 (3.1076) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][370/625] eta 0:01:55 lr 0.000038 wd 0.0500 time 0.4501 (0.4530) data time 0.0006 (0.0022) model time 0.4495 (0.4492) loss 2.6565 (2.4337) grad_norm 2.5373 (3.0928) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:18 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [273/300][380/625] eta 0:01:50 lr 0.000038 wd 0.0500 time 0.4475 (0.4529) data time 0.0007 (0.0022) model time 0.4469 (0.4492) loss 2.0779 (2.4344) grad_norm 2.8053 (3.0950) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][390/625] eta 0:01:46 lr 0.000038 wd 0.0500 time 0.4442 (0.4527) data time 0.0009 (0.0022) model time 0.4434 (0.4491) loss 2.2426 (2.4340) grad_norm 2.7520 (3.0836) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][400/625] eta 0:01:41 lr 0.000038 wd 0.0500 time 0.4468 (0.4531) data time 0.0008 (0.0021) model time 0.4459 (0.4496) loss 2.8202 (2.4334) grad_norm 2.0865 (3.0982) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][410/625] eta 0:01:37 lr 0.000038 wd 0.0500 time 0.4483 (0.4530) data time 0.0007 (0.0021) model time 0.4477 (0.4495) loss 2.8748 (2.4306) grad_norm 2.0734 (3.0987) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][420/625] eta 0:01:32 lr 0.000038 wd 0.0500 time 0.4502 (0.4529) data time 0.0009 (0.0021) model time 0.4494 (0.4495) loss 1.5242 (2.4292) grad_norm 2.0202 (3.0811) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][430/625] eta 0:01:28 lr 0.000038 wd 0.0500 time 0.4510 (0.4529) data time 0.0008 (0.0020) model time 0.4501 (0.4496) loss 2.3016 (2.4345) grad_norm 1.6319 (3.0741) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][440/625] eta 0:01:23 lr 0.000038 wd 0.0500 time 0.4463 (0.4528) data time 0.0008 (0.0020) model time 0.4454 (0.4495) loss 2.5722 (2.4338) grad_norm 2.2127 (3.0697) loss_scale 32.0000 
(32.0000) mem 16699MB [2024-08-11 10:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][450/625] eta 0:01:19 lr 0.000038 wd 0.0500 time 0.4481 (0.4527) data time 0.0006 (0.0020) model time 0.4475 (0.4495) loss 2.3034 (2.4299) grad_norm 3.9436 (3.1535) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][460/625] eta 0:01:14 lr 0.000038 wd 0.0500 time 0.4459 (0.4526) data time 0.0009 (0.0020) model time 0.4450 (0.4494) loss 2.5290 (2.4305) grad_norm 2.2072 (3.1494) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][470/625] eta 0:01:10 lr 0.000038 wd 0.0500 time 0.4465 (0.4525) data time 0.0009 (0.0019) model time 0.4456 (0.4493) loss 2.4560 (2.4318) grad_norm 2.8283 (3.1422) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][480/625] eta 0:01:05 lr 0.000038 wd 0.0500 time 0.4456 (0.4523) data time 0.0008 (0.0019) model time 0.4448 (0.4493) loss 2.8774 (2.4334) grad_norm 1.9574 (3.1699) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][490/625] eta 0:01:01 lr 0.000038 wd 0.0500 time 0.4472 (0.4523) data time 0.0008 (0.0019) model time 0.4464 (0.4492) loss 2.7959 (2.4340) grad_norm 2.2213 (3.1941) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][500/625] eta 0:00:56 lr 0.000037 wd 0.0500 time 0.4499 (0.4522) data time 0.0006 (0.0019) model time 0.4493 (0.4492) loss 2.8361 (2.4368) grad_norm 2.4613 (3.2156) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][510/625] eta 0:00:51 lr 0.000037 wd 0.0500 time 0.4481 (0.4521) data time 0.0008 (0.0018) model time 0.4473 (0.4492) loss 
2.7317 (2.4395) grad_norm 2.6166 (3.2116) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][520/625] eta 0:00:47 lr 0.000037 wd 0.0500 time 0.4565 (0.4521) data time 0.0008 (0.0018) model time 0.4557 (0.4492) loss 2.4524 (2.4393) grad_norm 2.8310 (3.2059) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][530/625] eta 0:00:42 lr 0.000037 wd 0.0500 time 0.4471 (0.4520) data time 0.0008 (0.0018) model time 0.4464 (0.4491) loss 2.1135 (2.4401) grad_norm 3.2981 (3.2048) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][540/625] eta 0:00:38 lr 0.000037 wd 0.0500 time 0.4451 (0.4519) data time 0.0007 (0.0018) model time 0.4445 (0.4491) loss 2.3630 (2.4404) grad_norm 2.4908 (3.1968) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][550/625] eta 0:00:33 lr 0.000037 wd 0.0500 time 0.4457 (0.4519) data time 0.0008 (0.0018) model time 0.4449 (0.4491) loss 2.0191 (2.4405) grad_norm 2.0381 (3.1808) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][560/625] eta 0:00:29 lr 0.000037 wd 0.0500 time 0.4484 (0.4518) data time 0.0007 (0.0018) model time 0.4477 (0.4491) loss 2.5900 (2.4420) grad_norm 1.7841 (3.2049) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][570/625] eta 0:00:24 lr 0.000037 wd 0.0500 time 0.4485 (0.4518) data time 0.0008 (0.0017) model time 0.4477 (0.4490) loss 2.5715 (2.4432) grad_norm 2.2259 (3.2050) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][580/625] eta 0:00:20 lr 0.000037 wd 0.0500 time 0.4514 (0.4517) 
data time 0.0008 (0.0017) model time 0.4506 (0.4490) loss 2.7130 (2.4462) grad_norm 2.9219 (3.2245) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][590/625] eta 0:00:15 lr 0.000037 wd 0.0500 time 0.4503 (0.4517) data time 0.0009 (0.0017) model time 0.4494 (0.4490) loss 1.6392 (2.4413) grad_norm 2.1456 (3.2252) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:30:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][600/625] eta 0:00:11 lr 0.000037 wd 0.0500 time 0.4443 (0.4516) data time 0.0008 (0.0017) model time 0.4435 (0.4490) loss 2.2771 (2.4374) grad_norm 55.6011 (3.3083) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][610/625] eta 0:00:06 lr 0.000037 wd 0.0500 time 0.4407 (0.4515) data time 0.0004 (0.0017) model time 0.4402 (0.4489) loss 2.0440 (2.4383) grad_norm 2.9139 (3.5005) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [273/300][620/625] eta 0:00:02 lr 0.000037 wd 0.0500 time 0.4456 (0.4514) data time 0.0004 (0.0017) model time 0.4451 (0.4488) loss 2.7849 (2.4390) grad_norm 3.2982 (3.5034) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:08 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 273 training takes 0:04:42 [2024-08-11 10:31:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:31:09 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 10:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5171 (0.5171) Acc@1 88.916 (88.916) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 10:31:11 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.151) Loss 0.8477 (0.6265) Acc@1 80.762 (86.896) Acc@5 96.045 (97.772) Mem 16699MB [2024-08-11 10:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9268 (0.7479) Acc@1 79.395 (84.091) Acc@5 95.459 (96.659) Mem 16699MB [2024-08-11 10:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.783 Acc@5 96.609 [2024-08-11 10:31:12 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:31:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.797 (0.797) Loss 0.5171 (0.5171) Acc@1 89.453 (89.453) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.182) Loss 0.8325 (0.6179) Acc@1 81.055 (87.078) Acc@5 96.436 (97.785) Mem 16699MB [2024-08-11 10:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.150) Loss 0.9082 (0.7355) Acc@1 79.443 (84.273) Acc@5 95.459 (96.747) Mem 16699MB [2024-08-11 10:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.029 Acc@5 96.695 [2024-08-11 10:31:16 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:31:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][0/625] eta 0:12:50 lr 0.000037 wd 0.0500 time 1.2335 (1.2335) data time 0.5958 (0.5958) model time 0.0000 (0.0000) loss 2.5605 (2.5605) grad_norm 2.4570 (2.4570) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][10/625] eta 0:05:19 lr 0.000037 wd 0.0500 time 0.4525 (0.5202) data time 
0.0006 (0.0549) model time 0.0000 (0.0000) loss 2.4656 (2.4617) grad_norm 6.9952 (2.9193) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][20/625] eta 0:04:54 lr 0.000037 wd 0.0500 time 0.4470 (0.4860) data time 0.0008 (0.0291) model time 0.0000 (0.0000) loss 2.3443 (2.4550) grad_norm 3.1093 (2.6589) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][30/625] eta 0:04:42 lr 0.000037 wd 0.0500 time 0.4456 (0.4740) data time 0.0008 (0.0200) model time 0.0000 (0.0000) loss 2.3636 (2.3785) grad_norm 3.2106 (3.1170) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][40/625] eta 0:04:33 lr 0.000037 wd 0.0500 time 0.4445 (0.4680) data time 0.0008 (0.0153) model time 0.0000 (0.0000) loss 2.6398 (2.3858) grad_norm 3.1924 (3.5052) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][50/625] eta 0:04:29 lr 0.000037 wd 0.0500 time 0.4512 (0.4686) data time 0.0009 (0.0125) model time 0.0000 (0.0000) loss 1.7271 (2.4073) grad_norm 4.3824 (3.4306) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][60/625] eta 0:04:23 lr 0.000037 wd 0.0500 time 0.4481 (0.4658) data time 0.0009 (0.0106) model time 0.4472 (0.4507) loss 2.5708 (2.4093) grad_norm 2.7916 (3.3433) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][70/625] eta 0:04:17 lr 0.000037 wd 0.0500 time 0.4486 (0.4634) data time 0.0007 (0.0092) model time 0.4479 (0.4494) loss 3.1535 (2.4537) grad_norm 2.5672 (3.4792) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][80/625] eta 0:04:12 
lr 0.000037 wd 0.0500 time 0.4460 (0.4635) data time 0.0007 (0.0081) model time 0.4453 (0.4540) loss 2.8473 (2.4467) grad_norm 2.1294 (3.3681) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:31:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][90/625] eta 0:04:07 lr 0.000037 wd 0.0500 time 0.4507 (0.4620) data time 0.0006 (0.0073) model time 0.4501 (0.4529) loss 1.4154 (2.4251) grad_norm 3.2123 (3.3414) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:32:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][100/625] eta 0:04:02 lr 0.000037 wd 0.0500 time 0.4508 (0.4612) data time 0.0006 (0.0067) model time 0.4501 (0.4528) loss 2.6958 (2.4349) grad_norm 19.2227 (3.5103) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:32:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][110/625] eta 0:03:56 lr 0.000037 wd 0.0500 time 0.4517 (0.4601) data time 0.0008 (0.0062) model time 0.4509 (0.4522) loss 2.1258 (2.4313) grad_norm 2.4567 (3.6101) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:32:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][120/625] eta 0:03:51 lr 0.000037 wd 0.0500 time 0.4429 (0.4590) data time 0.0009 (0.0057) model time 0.4420 (0.4511) loss 2.5555 (2.4258) grad_norm 1.9976 (3.6181) loss_scale 32.0000 (32.0000) mem 16699MB [2024-08-11 10:32:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][130/625] eta 0:03:46 lr 0.000037 wd 0.0500 time 0.4505 (0.4581) data time 0.0008 (0.0053) model time 0.4496 (0.4506) loss 2.4077 (2.4277) grad_norm 2.4132 (3.6627) loss_scale 64.0000 (33.4656) mem 16699MB [2024-08-11 10:32:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][140/625] eta 0:03:41 lr 0.000037 wd 0.0500 time 0.4459 (0.4574) data time 0.0009 (0.0050) model time 0.4450 (0.4503) loss 2.7498 (2.4272) grad_norm 6.3440 (3.6470) loss_scale 64.0000 (35.6312) mem 16699MB [2024-08-11 10:32:25 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [274/300][150/625] eta 0:03:37 lr 0.000037 wd 0.0500 time 0.4539 (0.4570) data time 0.0006 (0.0047) model time 0.4533 (0.4503) loss 2.3076 (2.4173) grad_norm 2.5890 (3.6050) loss_scale 64.0000 (37.5099) mem 16699MB [2024-08-11 10:32:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][160/625] eta 0:03:32 lr 0.000037 wd 0.0500 time 0.4461 (0.4565) data time 0.0009 (0.0045) model time 0.4452 (0.4501) loss 2.9148 (2.4141) grad_norm 2.2607 (3.6306) loss_scale 64.0000 (39.1553) mem 16699MB [2024-08-11 10:32:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][170/625] eta 0:03:27 lr 0.000037 wd 0.0500 time 0.4492 (0.4561) data time 0.0006 (0.0043) model time 0.4486 (0.4500) loss 3.0571 (2.4201) grad_norm 2.2960 (3.6102) loss_scale 64.0000 (40.6082) mem 16699MB [2024-08-11 10:32:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][180/625] eta 0:03:22 lr 0.000037 wd 0.0500 time 0.4469 (0.4557) data time 0.0008 (0.0041) model time 0.4462 (0.4499) loss 2.7849 (2.4270) grad_norm 3.5178 (3.6839) loss_scale 64.0000 (41.9006) mem 16699MB [2024-08-11 10:32:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][190/625] eta 0:03:18 lr 0.000037 wd 0.0500 time 0.4452 (0.4553) data time 0.0006 (0.0039) model time 0.4446 (0.4496) loss 2.0744 (2.4319) grad_norm 3.7394 (3.6686) loss_scale 64.0000 (43.0576) mem 16699MB [2024-08-11 10:32:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][200/625] eta 0:03:13 lr 0.000036 wd 0.0500 time 0.4455 (0.4548) data time 0.0008 (0.0038) model time 0.4446 (0.4493) loss 3.0283 (2.4325) grad_norm 3.5467 (3.6539) loss_scale 64.0000 (44.0995) mem 16699MB [2024-08-11 10:32:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][210/625] eta 0:03:08 lr 0.000036 wd 0.0500 time 0.4465 (0.4545) data time 0.0008 (0.0036) model time 0.4457 (0.4492) loss 2.8008 (2.4366) grad_norm 2.2417 (3.6284) loss_scale 64.0000 
(45.0427) mem 16699MB [2024-08-11 10:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][220/625] eta 0:03:03 lr 0.000036 wd 0.0500 time 0.4488 (0.4542) data time 0.0006 (0.0035) model time 0.4482 (0.4491) loss 2.2110 (2.4363) grad_norm 3.1749 (3.6104) loss_scale 64.0000 (45.9005) mem 16699MB [2024-08-11 10:33:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][230/625] eta 0:02:59 lr 0.000036 wd 0.0500 time 0.4481 (0.4540) data time 0.0008 (0.0034) model time 0.4472 (0.4490) loss 2.0124 (2.4454) grad_norm 2.9484 (3.5896) loss_scale 64.0000 (46.6840) mem 16699MB [2024-08-11 10:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][240/625] eta 0:02:54 lr 0.000036 wd 0.0500 time 0.4468 (0.4538) data time 0.0008 (0.0033) model time 0.4459 (0.4490) loss 2.6015 (2.4517) grad_norm 2.0067 (3.5869) loss_scale 64.0000 (47.4025) mem 16699MB [2024-08-11 10:33:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][250/625] eta 0:02:50 lr 0.000036 wd 0.0500 time 0.4491 (0.4536) data time 0.0006 (0.0032) model time 0.4485 (0.4489) loss 2.4511 (2.4437) grad_norm 1.7573 (3.5350) loss_scale 64.0000 (48.0637) mem 16699MB [2024-08-11 10:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][260/625] eta 0:02:45 lr 0.000036 wd 0.0500 time 0.4461 (0.4533) data time 0.0008 (0.0031) model time 0.4453 (0.4488) loss 2.4136 (2.4369) grad_norm 2.4024 (3.5287) loss_scale 64.0000 (48.6743) mem 16699MB [2024-08-11 10:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][270/625] eta 0:02:40 lr 0.000036 wd 0.0500 time 0.4482 (0.4531) data time 0.0006 (0.0030) model time 0.4476 (0.4487) loss 2.1276 (2.4299) grad_norm 1.9552 (3.5573) loss_scale 64.0000 (49.2399) mem 16699MB [2024-08-11 10:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][280/625] eta 0:02:36 lr 0.000036 wd 0.0500 time 0.4471 (0.4529) data time 0.0007 (0.0029) model time 0.4464 (0.4486) loss 
2.3173 (2.4195) grad_norm 3.1474 (3.5374) loss_scale 64.0000 (49.7651) mem 16699MB [2024-08-11 10:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][290/625] eta 0:02:31 lr 0.000036 wd 0.0500 time 0.4486 (0.4527) data time 0.0006 (0.0029) model time 0.4480 (0.4485) loss 2.0465 (2.4174) grad_norm 1.6981 (3.4974) loss_scale 64.0000 (50.2543) mem 16699MB [2024-08-11 10:33:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][300/625] eta 0:02:27 lr 0.000036 wd 0.0500 time 0.4535 (0.4526) data time 0.0007 (0.0028) model time 0.4528 (0.4485) loss 2.2882 (2.4166) grad_norm 2.5039 (3.4870) loss_scale 64.0000 (50.7110) mem 16699MB [2024-08-11 10:33:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][310/625] eta 0:02:22 lr 0.000036 wd 0.0500 time 0.4484 (0.4525) data time 0.0009 (0.0027) model time 0.4476 (0.4485) loss 2.6693 (2.4127) grad_norm 3.1804 (3.4755) loss_scale 64.0000 (51.1383) mem 16699MB [2024-08-11 10:33:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][320/625] eta 0:02:17 lr 0.000036 wd 0.0500 time 0.4495 (0.4524) data time 0.0009 (0.0027) model time 0.4486 (0.4485) loss 2.4825 (2.4074) grad_norm 2.4084 (3.4609) loss_scale 64.0000 (51.5389) mem 16699MB [2024-08-11 10:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][330/625] eta 0:02:13 lr 0.000036 wd 0.0500 time 0.4490 (0.4522) data time 0.0009 (0.0026) model time 0.4481 (0.4484) loss 2.6927 (2.4078) grad_norm 6.9617 (3.4707) loss_scale 64.0000 (51.9154) mem 16699MB [2024-08-11 10:33:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][340/625] eta 0:02:08 lr 0.000036 wd 0.0500 time 0.4469 (0.4521) data time 0.0009 (0.0026) model time 0.4460 (0.4483) loss 2.1162 (2.4095) grad_norm 2.9797 (3.4541) loss_scale 64.0000 (52.2698) mem 16699MB [2024-08-11 10:33:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][350/625] eta 0:02:04 lr 0.000036 wd 0.0500 time 0.4471 (0.4520) 
data time 0.0008 (0.0025) model time 0.4463 (0.4483) loss 2.5309 (2.4117) grad_norm 26.0655 (3.4931) loss_scale 64.0000 (52.6040) mem 16699MB [2024-08-11 10:33:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][360/625] eta 0:01:59 lr 0.000036 wd 0.0500 time 0.4452 (0.4518) data time 0.0010 (0.0025) model time 0.4442 (0.4482) loss 2.5528 (2.4125) grad_norm 1.9325 (3.4806) loss_scale 64.0000 (52.9197) mem 16699MB [2024-08-11 10:34:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][370/625] eta 0:01:55 lr 0.000036 wd 0.0500 time 0.4493 (0.4517) data time 0.0009 (0.0024) model time 0.4484 (0.4482) loss 2.3965 (2.4171) grad_norm 2.5874 (3.4533) loss_scale 64.0000 (53.2183) mem 16699MB [2024-08-11 10:34:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][380/625] eta 0:01:50 lr 0.000036 wd 0.0500 time 0.4532 (0.4522) data time 0.0007 (0.0024) model time 0.4525 (0.4488) loss 1.9749 (2.4173) grad_norm 2.7475 (3.4309) loss_scale 64.0000 (53.5013) mem 16699MB [2024-08-11 10:34:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][390/625] eta 0:01:46 lr 0.000036 wd 0.0500 time 0.4496 (0.4521) data time 0.0008 (0.0024) model time 0.4488 (0.4488) loss 1.8155 (2.4181) grad_norm 2.0216 (3.4091) loss_scale 64.0000 (53.7698) mem 16699MB [2024-08-11 10:34:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][400/625] eta 0:01:41 lr 0.000036 wd 0.0500 time 0.4494 (0.4521) data time 0.0006 (0.0023) model time 0.4488 (0.4488) loss 2.9616 (2.4249) grad_norm 9.9911 (3.4118) loss_scale 64.0000 (54.0249) mem 16699MB [2024-08-11 10:34:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][410/625] eta 0:01:37 lr 0.000036 wd 0.0500 time 0.4454 (0.4522) data time 0.0009 (0.0023) model time 0.4446 (0.4491) loss 1.7784 (2.4277) grad_norm 2.6251 (3.4144) loss_scale 64.0000 (54.2676) mem 16699MB [2024-08-11 10:34:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[274/300][420/625] eta 0:01:32 lr 0.000036 wd 0.0500 time 0.4457 (0.4521) data time 0.0008 (0.0023) model time 0.4449 (0.4490) loss 2.3195 (2.4258) grad_norm 6.7272 (3.4179) loss_scale 64.0000 (54.4988) mem 16699MB [2024-08-11 10:34:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][430/625] eta 0:01:28 lr 0.000036 wd 0.0500 time 0.4558 (0.4520) data time 0.0006 (0.0022) model time 0.4552 (0.4490) loss 1.7958 (2.4209) grad_norm 2.2770 (3.4169) loss_scale 64.0000 (54.7193) mem 16699MB [2024-08-11 10:34:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][440/625] eta 0:01:23 lr 0.000036 wd 0.0500 time 0.4476 (0.4519) data time 0.0006 (0.0022) model time 0.4470 (0.4489) loss 2.8660 (2.4222) grad_norm 3.5204 (3.4261) loss_scale 64.0000 (54.9297) mem 16699MB [2024-08-11 10:34:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][450/625] eta 0:01:19 lr 0.000036 wd 0.0500 time 0.4493 (0.4519) data time 0.0006 (0.0022) model time 0.4487 (0.4489) loss 1.9557 (2.4221) grad_norm 2.1664 (3.4099) loss_scale 64.0000 (55.1308) mem 16699MB [2024-08-11 10:34:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][460/625] eta 0:01:14 lr 0.000036 wd 0.0500 time 0.4474 (0.4518) data time 0.0009 (0.0021) model time 0.4465 (0.4489) loss 2.3338 (2.4248) grad_norm 25.4184 (3.4459) loss_scale 64.0000 (55.3232) mem 16699MB [2024-08-11 10:34:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][470/625] eta 0:01:10 lr 0.000036 wd 0.0500 time 0.4475 (0.4517) data time 0.0006 (0.0021) model time 0.4468 (0.4488) loss 2.4442 (2.4273) grad_norm 2.6555 (3.4719) loss_scale 64.0000 (55.5074) mem 16699MB [2024-08-11 10:34:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][480/625] eta 0:01:05 lr 0.000036 wd 0.0500 time 0.4439 (0.4516) data time 0.0006 (0.0021) model time 0.4433 (0.4487) loss 2.4375 (2.4279) grad_norm 2.8814 (3.4523) loss_scale 64.0000 (55.6840) mem 16699MB [2024-08-11 
10:34:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][490/625] eta 0:01:00 lr 0.000036 wd 0.0500 time 0.4467 (0.4515) data time 0.0009 (0.0021) model time 0.4458 (0.4486) loss 2.7194 (2.4290) grad_norm 2.1996 (3.4610) loss_scale 64.0000 (55.8534) mem 16699MB [2024-08-11 10:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][500/625] eta 0:00:56 lr 0.000036 wd 0.0500 time 0.4491 (0.4514) data time 0.0009 (0.0020) model time 0.4482 (0.4486) loss 2.4374 (2.4302) grad_norm 2.2281 (3.5953) loss_scale 64.0000 (56.0160) mem 16699MB [2024-08-11 10:35:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][510/625] eta 0:00:51 lr 0.000036 wd 0.0500 time 0.4469 (0.4513) data time 0.0006 (0.0020) model time 0.4463 (0.4485) loss 2.4649 (2.4311) grad_norm 3.3813 (3.5805) loss_scale 64.0000 (56.1722) mem 16699MB [2024-08-11 10:35:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][520/625] eta 0:00:47 lr 0.000036 wd 0.0500 time 0.4450 (0.4512) data time 0.0009 (0.0020) model time 0.4441 (0.4485) loss 2.0170 (2.4343) grad_norm 42.6858 (3.6442) loss_scale 64.0000 (56.3225) mem 16699MB [2024-08-11 10:35:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][530/625] eta 0:00:42 lr 0.000035 wd 0.0500 time 0.4493 (0.4512) data time 0.0006 (0.0020) model time 0.4487 (0.4485) loss 2.1987 (2.4359) grad_norm 3.5828 (3.6281) loss_scale 64.0000 (56.4670) mem 16699MB [2024-08-11 10:35:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][540/625] eta 0:00:38 lr 0.000035 wd 0.0500 time 0.4499 (0.4511) data time 0.0008 (0.0019) model time 0.4492 (0.4485) loss 2.7485 (2.4385) grad_norm 2.4047 (3.6248) loss_scale 64.0000 (56.6063) mem 16699MB [2024-08-11 10:35:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][550/625] eta 0:00:33 lr 0.000035 wd 0.0500 time 0.4475 (0.4511) data time 0.0005 (0.0019) model time 0.4469 (0.4484) loss 2.4217 (2.4399) grad_norm 2.3892 
(3.6180) loss_scale 64.0000 (56.7405) mem 16699MB [2024-08-11 10:35:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][560/625] eta 0:00:29 lr 0.000035 wd 0.0500 time 0.4458 (0.4510) data time 0.0008 (0.0019) model time 0.4450 (0.4484) loss 2.7548 (2.4426) grad_norm 2.2615 (3.6181) loss_scale 64.0000 (56.8699) mem 16699MB [2024-08-11 10:35:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][570/625] eta 0:00:24 lr 0.000035 wd 0.0500 time 0.4504 (0.4512) data time 0.0008 (0.0019) model time 0.4495 (0.4486) loss 2.5811 (2.4470) grad_norm 2.6328 (3.6163) loss_scale 64.0000 (56.9947) mem 16699MB [2024-08-11 10:35:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][580/625] eta 0:00:20 lr 0.000035 wd 0.0500 time 0.4447 (0.4511) data time 0.0007 (0.0019) model time 0.4440 (0.4486) loss 2.4966 (2.4453) grad_norm 3.5546 (3.6016) loss_scale 64.0000 (57.1153) mem 16699MB [2024-08-11 10:35:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][590/625] eta 0:00:15 lr 0.000035 wd 0.0500 time 0.4479 (0.4511) data time 0.0006 (0.0018) model time 0.4473 (0.4486) loss 2.4214 (2.4455) grad_norm 2.3726 (3.5979) loss_scale 64.0000 (57.2318) mem 16699MB [2024-08-11 10:35:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][600/625] eta 0:00:11 lr 0.000035 wd 0.0500 time 0.4484 (0.4511) data time 0.0007 (0.0018) model time 0.4477 (0.4487) loss 2.6039 (2.4456) grad_norm 2.8444 (3.5846) loss_scale 64.0000 (57.3444) mem 16699MB [2024-08-11 10:35:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][610/625] eta 0:00:06 lr 0.000035 wd 0.0500 time 0.4425 (0.4511) data time 0.0004 (0.0018) model time 0.4420 (0.4487) loss 2.8922 (2.4489) grad_norm 2.3355 (3.5805) loss_scale 64.0000 (57.4534) mem 16699MB [2024-08-11 10:35:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [274/300][620/625] eta 0:00:02 lr 0.000035 wd 0.0500 time 0.4422 (0.4510) data time 0.0006 (0.0018) model 
time 0.4416 (0.4485) loss 2.9146 (2.4487) grad_norm 2.6838 (3.5679) loss_scale 64.0000 (57.5588) mem 16699MB [2024-08-11 10:35:58 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 274 training takes 0:04:41 [2024-08-11 10:35:58 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:36:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 10:36:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.468 (0.468) Loss 0.5293 (0.5293) Acc@1 88.525 (88.525) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 10:36:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.150) Loss 0.8394 (0.6250) Acc@1 81.250 (86.998) Acc@5 96.240 (97.758) Mem 16699MB [2024-08-11 10:36:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.134) Loss 0.9341 (0.7477) Acc@1 79.297 (84.177) Acc@5 95.410 (96.640) Mem 16699MB [2024-08-11 10:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.905 Acc@5 96.611 [2024-08-11 10:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 10:36:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.791 (0.791) Loss 0.5166 (0.5166) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:36:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.182) Loss 0.8320 (0.6179) Acc@1 81.055 (87.065) Acc@5 96.436 (97.794) Mem 16699MB [2024-08-11 10:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.150) Loss 0.9092 (0.7359) Acc@1 79.492 (84.263) Acc@5 95.410 (96.742) Mem 16699MB [2024-08-11 10:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.019 Acc@5 96.689 [2024-08-11 10:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:36:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][0/625] eta 0:13:29 lr 0.000035 wd 0.0500 time 1.2951 (1.2951) data time 0.7093 (0.7093) model time 0.0000 (0.0000) loss 2.8782 (2.8782) grad_norm 2.3189 (2.3189) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][10/625] eta 0:05:22 lr 0.000035 wd 0.0500 time 0.4468 (0.5242) data time 0.0009 (0.0653) model time 0.0000 (0.0000) loss 2.7042 (2.5525) grad_norm 2.3293 (2.5510) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][20/625] eta 0:04:55 lr 0.000035 wd 0.0500 time 0.4448 (0.4880) data time 0.0007 (0.0346) model time 0.0000 (0.0000) loss 1.5420 (2.4237) grad_norm 2.1666 (2.6897) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][30/625] eta 0:04:42 lr 0.000035 wd 0.0500 time 0.4487 (0.4755) data time 0.0007 (0.0237) model time 0.0000 (0.0000) loss 2.6126 (2.4334) grad_norm 1.9363 (2.8269) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][40/625] eta 0:04:34 lr 0.000035 wd 0.0500 time 0.4494 (0.4690) data time 0.0008 (0.0181) model time 0.0000 (0.0000) loss 2.3336 (2.4482) grad_norm 2.1987 (2.7411) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][50/625] eta 0:04:27 lr 0.000035 wd 0.0500 time 0.4475 (0.4651) data time 0.0006 (0.0148) model time 0.0000 (0.0000) loss 2.7669 (2.4854) grad_norm 2.2025 (2.6801) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][60/625] eta 0:04:21 lr 0.000035 wd 0.0500 time 0.4488 (0.4623) data time 0.0009 (0.0125) model 
time 0.4479 (0.4472) loss 2.7591 (2.4651) grad_norm 3.6321 (2.7041) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][70/625] eta 0:04:15 lr 0.000035 wd 0.0500 time 0.4474 (0.4601) data time 0.0009 (0.0108) model time 0.4465 (0.4466) loss 2.4384 (2.4333) grad_norm 2.2262 (2.7337) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][80/625] eta 0:04:09 lr 0.000035 wd 0.0500 time 0.4450 (0.4585) data time 0.0007 (0.0096) model time 0.4443 (0.4464) loss 3.3915 (2.4080) grad_norm 2.2150 (2.8971) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][90/625] eta 0:04:04 lr 0.000035 wd 0.0500 time 0.4467 (0.4573) data time 0.0009 (0.0086) model time 0.4458 (0.4465) loss 2.0710 (2.4115) grad_norm 4.2928 (2.9586) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][100/625] eta 0:03:59 lr 0.000035 wd 0.0500 time 0.4467 (0.4565) data time 0.0009 (0.0079) model time 0.4458 (0.4469) loss 2.7319 (2.4233) grad_norm 3.7256 (2.9840) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][110/625] eta 0:03:54 lr 0.000035 wd 0.0500 time 0.4518 (0.4559) data time 0.0008 (0.0072) model time 0.4510 (0.4473) loss 2.7547 (2.4269) grad_norm 2.0992 (2.9534) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][120/625] eta 0:03:49 lr 0.000035 wd 0.0500 time 0.4488 (0.4554) data time 0.0006 (0.0067) model time 0.4482 (0.4476) loss 2.3765 (2.4245) grad_norm 2.3999 (3.0401) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][130/625] eta 0:03:45 lr 0.000035 wd 
0.0500 time 0.4513 (0.4550) data time 0.0006 (0.0062) model time 0.4507 (0.4478) loss 3.0151 (2.4308) grad_norm 2.0680 (3.0117) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][140/625] eta 0:03:41 lr 0.000035 wd 0.0500 time 0.6594 (0.4559) data time 0.0007 (0.0059) model time 0.6587 (0.4498) loss 2.3611 (2.4216) grad_norm 3.8910 (3.0026) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][150/625] eta 0:03:36 lr 0.000035 wd 0.0500 time 0.4446 (0.4552) data time 0.0006 (0.0055) model time 0.4440 (0.4494) loss 2.3502 (2.4277) grad_norm 1.8998 (3.2554) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][160/625] eta 0:03:31 lr 0.000035 wd 0.0500 time 0.4441 (0.4555) data time 0.0010 (0.0052) model time 0.4431 (0.4503) loss 2.7346 (2.4352) grad_norm 2.1526 (3.6765) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][170/625] eta 0:03:27 lr 0.000035 wd 0.0500 time 0.4446 (0.4550) data time 0.0008 (0.0050) model time 0.4438 (0.4499) loss 2.6484 (2.4500) grad_norm 2.3700 (3.6632) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][180/625] eta 0:03:22 lr 0.000035 wd 0.0500 time 0.4481 (0.4546) data time 0.0007 (0.0048) model time 0.4474 (0.4497) loss 2.9893 (2.4559) grad_norm 5.6109 (3.6401) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][190/625] eta 0:03:17 lr 0.000035 wd 0.0500 time 0.4481 (0.4543) data time 0.0009 (0.0045) model time 0.4472 (0.4495) loss 2.6872 (2.4599) grad_norm 2.2111 (3.5870) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [275/300][200/625] eta 0:03:12 lr 0.000035 wd 0.0500 time 0.4501 (0.4540) data time 0.0009 (0.0044) model time 0.4492 (0.4494) loss 2.4691 (2.4600) grad_norm 2.8555 (3.6091) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][210/625] eta 0:03:08 lr 0.000035 wd 0.0500 time 0.4462 (0.4536) data time 0.0006 (0.0042) model time 0.4456 (0.4492) loss 2.8323 (2.4657) grad_norm 2.5169 (3.6200) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][220/625] eta 0:03:03 lr 0.000035 wd 0.0500 time 0.4459 (0.4534) data time 0.0007 (0.0040) model time 0.4452 (0.4490) loss 2.6133 (2.4692) grad_norm 3.8549 (3.6244) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][230/625] eta 0:02:58 lr 0.000035 wd 0.0500 time 0.4451 (0.4530) data time 0.0006 (0.0039) model time 0.4445 (0.4488) loss 2.8553 (2.4745) grad_norm 3.0150 (3.6096) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:37:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][240/625] eta 0:02:54 lr 0.000035 wd 0.0500 time 0.4458 (0.4528) data time 0.0009 (0.0038) model time 0.4449 (0.4486) loss 2.3074 (2.4677) grad_norm 2.3644 (3.6281) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][250/625] eta 0:02:49 lr 0.000034 wd 0.0500 time 0.4500 (0.4527) data time 0.0006 (0.0037) model time 0.4494 (0.4487) loss 2.6518 (2.4745) grad_norm 5.1299 (3.6038) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][260/625] eta 0:02:45 lr 0.000034 wd 0.0500 time 0.4452 (0.4525) data time 0.0006 (0.0036) model time 0.4446 (0.4486) loss 2.4258 (2.4811) grad_norm 2.3446 (3.5819) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 10:38:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][270/625] eta 0:02:40 lr 0.000034 wd 0.0500 time 0.4448 (0.4524) data time 0.0009 (0.0035) model time 0.4439 (0.4486) loss 2.0605 (2.4813) grad_norm 2.3092 (3.5547) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][280/625] eta 0:02:36 lr 0.000034 wd 0.0500 time 0.4458 (0.4522) data time 0.0007 (0.0034) model time 0.4452 (0.4485) loss 1.4207 (2.4775) grad_norm 2.1665 (3.5389) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][290/625] eta 0:02:31 lr 0.000034 wd 0.0500 time 0.4428 (0.4520) data time 0.0008 (0.0033) model time 0.4420 (0.4484) loss 2.6967 (2.4765) grad_norm 2.8374 (3.5198) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][300/625] eta 0:02:26 lr 0.000034 wd 0.0500 time 0.4447 (0.4518) data time 0.0007 (0.0032) model time 0.4440 (0.4483) loss 1.6283 (2.4748) grad_norm 3.3291 (3.5278) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][310/625] eta 0:02:22 lr 0.000034 wd 0.0500 time 0.4466 (0.4521) data time 0.0008 (0.0031) model time 0.4458 (0.4487) loss 2.0084 (2.4736) grad_norm 3.9345 (3.5174) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][320/625] eta 0:02:17 lr 0.000034 wd 0.0500 time 0.4522 (0.4520) data time 0.0006 (0.0030) model time 0.4516 (0.4487) loss 1.5509 (2.4656) grad_norm 4.1863 (3.4974) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][330/625] eta 0:02:13 lr 0.000034 wd 0.0500 time 0.4511 (0.4519) data time 0.0008 (0.0030) model time 0.4503 (0.4487) loss 2.7675 (2.4649) 
grad_norm 1.9819 (3.4682) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][340/625] eta 0:02:08 lr 0.000034 wd 0.0500 time 0.4463 (0.4518) data time 0.0008 (0.0029) model time 0.4455 (0.4486) loss 2.4452 (2.4621) grad_norm 2.4334 (3.4514) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][350/625] eta 0:02:04 lr 0.000034 wd 0.0500 time 0.4433 (0.4517) data time 0.0008 (0.0029) model time 0.4425 (0.4486) loss 2.4739 (2.4624) grad_norm 4.8771 (3.4436) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][360/625] eta 0:01:59 lr 0.000034 wd 0.0500 time 0.4442 (0.4515) data time 0.0006 (0.0028) model time 0.4436 (0.4484) loss 1.9491 (2.4640) grad_norm 3.8015 (3.4540) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][370/625] eta 0:01:55 lr 0.000034 wd 0.0500 time 0.4459 (0.4523) data time 0.0006 (0.0028) model time 0.4453 (0.4494) loss 2.1936 (2.4635) grad_norm 1.7492 (3.4353) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:38:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][380/625] eta 0:01:50 lr 0.000034 wd 0.0500 time 0.4491 (0.4521) data time 0.0007 (0.0027) model time 0.4484 (0.4493) loss 2.5698 (2.4626) grad_norm 2.8748 (3.4238) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][390/625] eta 0:01:46 lr 0.000034 wd 0.0500 time 0.4477 (0.4520) data time 0.0013 (0.0027) model time 0.4464 (0.4492) loss 2.7107 (2.4604) grad_norm 3.4168 (3.4224) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][400/625] eta 0:01:41 lr 0.000034 wd 0.0500 time 0.4498 (0.4519) data time 
0.0006 (0.0026) model time 0.4492 (0.4492) loss 3.0503 (2.4653) grad_norm 2.4893 (3.4204) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][410/625] eta 0:01:37 lr 0.000034 wd 0.0500 time 0.4489 (0.4518) data time 0.0009 (0.0026) model time 0.4480 (0.4491) loss 2.6559 (2.4654) grad_norm 3.7729 (3.4191) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][420/625] eta 0:01:32 lr 0.000034 wd 0.0500 time 0.4609 (0.4518) data time 0.0006 (0.0025) model time 0.4603 (0.4492) loss 2.2509 (2.4600) grad_norm 1.9312 (3.3928) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][430/625] eta 0:01:28 lr 0.000034 wd 0.0500 time 0.4463 (0.4517) data time 0.0009 (0.0025) model time 0.4454 (0.4490) loss 2.6145 (2.4626) grad_norm 3.0471 (3.4177) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][440/625] eta 0:01:23 lr 0.000034 wd 0.0500 time 0.4470 (0.4516) data time 0.0008 (0.0025) model time 0.4462 (0.4490) loss 2.1493 (2.4566) grad_norm 2.7482 (3.4174) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][450/625] eta 0:01:19 lr 0.000034 wd 0.0500 time 0.4522 (0.4515) data time 0.0009 (0.0024) model time 0.4514 (0.4489) loss 2.6230 (2.4590) grad_norm 3.4993 (3.4084) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][460/625] eta 0:01:14 lr 0.000034 wd 0.0500 time 0.4467 (0.4515) data time 0.0006 (0.0024) model time 0.4461 (0.4489) loss 2.6822 (2.4637) grad_norm 2.4138 (3.3955) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][470/625] eta 
0:01:09 lr 0.000034 wd 0.0500 time 0.4511 (0.4514) data time 0.0009 (0.0024) model time 0.4502 (0.4489) loss 1.8373 (2.4592) grad_norm 4.2274 (3.3886) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][480/625] eta 0:01:05 lr 0.000034 wd 0.0500 time 0.4455 (0.4514) data time 0.0009 (0.0023) model time 0.4446 (0.4489) loss 2.5777 (2.4629) grad_norm 2.7573 (3.3834) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][490/625] eta 0:01:00 lr 0.000034 wd 0.0500 time 0.4450 (0.4514) data time 0.0007 (0.0023) model time 0.4443 (0.4489) loss 2.3322 (2.4608) grad_norm 2.7827 (3.3720) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][500/625] eta 0:00:56 lr 0.000034 wd 0.0500 time 0.4468 (0.4513) data time 0.0007 (0.0023) model time 0.4461 (0.4489) loss 2.5547 (2.4591) grad_norm 8.3725 (3.4102) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:39:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][510/625] eta 0:00:51 lr 0.000034 wd 0.0500 time 0.4476 (0.4519) data time 0.0009 (0.0022) model time 0.4467 (0.4496) loss 2.6455 (2.4620) grad_norm 2.3464 (3.4072) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][520/625] eta 0:00:47 lr 0.000034 wd 0.0500 time 0.4469 (0.4518) data time 0.0006 (0.0022) model time 0.4462 (0.4496) loss 2.7841 (2.4607) grad_norm 1.7876 (3.3972) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][530/625] eta 0:00:42 lr 0.000034 wd 0.0500 time 0.4482 (0.4518) data time 0.0009 (0.0022) model time 0.4473 (0.4495) loss 2.7302 (2.4600) grad_norm 2.1097 (3.4112) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:11 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [275/300][540/625] eta 0:00:38 lr 0.000034 wd 0.0500 time 0.4456 (0.4517) data time 0.0006 (0.0022) model time 0.4450 (0.4494) loss 2.7359 (2.4615) grad_norm 6.5919 (3.4062) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][550/625] eta 0:00:33 lr 0.000034 wd 0.0500 time 0.4471 (0.4516) data time 0.0007 (0.0021) model time 0.4464 (0.4494) loss 3.1266 (2.4631) grad_norm 3.7996 (3.4028) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][560/625] eta 0:00:29 lr 0.000034 wd 0.0500 time 0.4502 (0.4516) data time 0.0006 (0.0021) model time 0.4495 (0.4494) loss 2.0411 (2.4640) grad_norm 28.4713 (3.4417) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][570/625] eta 0:00:24 lr 0.000034 wd 0.0500 time 0.4423 (0.4515) data time 0.0006 (0.0021) model time 0.4417 (0.4493) loss 2.3518 (2.4646) grad_norm 3.6481 (3.4358) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][580/625] eta 0:00:20 lr 0.000034 wd 0.0500 time 0.4487 (0.4514) data time 0.0006 (0.0021) model time 0.4481 (0.4493) loss 1.9382 (2.4629) grad_norm 4.8590 (3.4363) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][590/625] eta 0:00:15 lr 0.000034 wd 0.0500 time 0.4456 (0.4514) data time 0.0007 (0.0020) model time 0.4449 (0.4492) loss 2.7754 (2.4620) grad_norm 3.0461 (3.4295) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][600/625] eta 0:00:11 lr 0.000033 wd 0.0500 time 0.4455 (0.4513) data time 0.0008 (0.0020) model time 0.4447 (0.4492) loss 2.5538 (2.4649) grad_norm 3.2301 (3.4279) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 10:40:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][610/625] eta 0:00:06 lr 0.000033 wd 0.0500 time 0.4426 (0.4512) data time 0.0006 (0.0020) model time 0.4421 (0.4491) loss 2.3603 (2.4660) grad_norm 3.9948 (3.4391) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [275/300][620/625] eta 0:00:02 lr 0.000033 wd 0.0500 time 0.4455 (0.4511) data time 0.0004 (0.0020) model time 0.4451 (0.4490) loss 3.0418 (2.4678) grad_norm 2.5750 (3.4326) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:40:48 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 275 training takes 0:04:41 [2024-08-11 10:40:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:40:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 10:40:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.479 (0.479) Loss 0.5376 (0.5376) Acc@1 88.281 (88.281) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 10:40:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8467 (0.6311) Acc@1 81.299 (87.012) Acc@5 96.094 (97.736) Mem 16699MB [2024-08-11 10:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.134) Loss 0.9507 (0.7528) Acc@1 79.248 (84.098) Acc@5 95.020 (96.624) Mem 16699MB [2024-08-11 10:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.837 Acc@5 96.587 [2024-08-11 10:40:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:40:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.908 (0.908) Loss 0.5181 (0.5181) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:40:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.191) Loss 0.8325 (0.6186) Acc@1 81.006 (87.065) Acc@5 96.289 (97.798) Mem 16699MB [2024-08-11 10:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.155) Loss 0.9116 (0.7367) Acc@1 79.492 (84.259) Acc@5 95.410 (96.745) Mem 16699MB [2024-08-11 10:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.013 Acc@5 96.691 [2024-08-11 10:40:57 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:40:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][0/625] eta 0:13:42 lr 0.000033 wd 0.0500 time 1.3154 (1.3154) data time 0.5805 (0.5805) model time 0.0000 (0.0000) loss 2.1629 (2.1629) grad_norm 4.4999 (4.4999) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][10/625] eta 0:05:23 lr 0.000033 wd 0.0500 time 0.4459 (0.5265) data time 
0.0007 (0.0535) model time 0.0000 (0.0000) loss 2.4314 (2.4589) grad_norm 2.0941 (2.7366) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][20/625] eta 0:04:55 lr 0.000033 wd 0.0500 time 0.4521 (0.4890) data time 0.0007 (0.0284) model time 0.0000 (0.0000) loss 2.2644 (2.4408) grad_norm 2.5589 (2.8022) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][30/625] eta 0:04:42 lr 0.000033 wd 0.0500 time 0.4465 (0.4753) data time 0.0007 (0.0195) model time 0.0000 (0.0000) loss 1.7935 (2.4564) grad_norm 3.3003 (2.9900) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][40/625] eta 0:04:34 lr 0.000033 wd 0.0500 time 0.4478 (0.4684) data time 0.0009 (0.0149) model time 0.0000 (0.0000) loss 2.5348 (2.4038) grad_norm 2.8388 (3.0192) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][50/625] eta 0:04:27 lr 0.000033 wd 0.0500 time 0.4465 (0.4644) data time 0.0007 (0.0122) model time 0.0000 (0.0000) loss 2.0986 (2.3774) grad_norm 3.6180 (4.6626) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][60/625] eta 0:04:20 lr 0.000033 wd 0.0500 time 0.4484 (0.4619) data time 0.0006 (0.0103) model time 0.4478 (0.4483) loss 3.0423 (2.4008) grad_norm 3.5349 (4.4113) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][70/625] eta 0:04:15 lr 0.000033 wd 0.0500 time 0.4501 (0.4601) data time 0.0007 (0.0090) model time 0.4494 (0.4484) loss 2.4494 (2.4175) grad_norm 4.4120 (4.2506) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][80/625] eta 0:04:09 
lr 0.000033 wd 0.0500 time 0.4468 (0.4587) data time 0.0006 (0.0080) model time 0.4462 (0.4482) loss 2.9988 (2.4468) grad_norm 2.4674 (4.0972) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][90/625] eta 0:04:04 lr 0.000033 wd 0.0500 time 0.4470 (0.4575) data time 0.0007 (0.0072) model time 0.4463 (0.4480) loss 1.8842 (2.4394) grad_norm 2.8710 (4.0250) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][100/625] eta 0:04:00 lr 0.000033 wd 0.0500 time 0.4477 (0.4587) data time 0.0006 (0.0066) model time 0.4471 (0.4521) loss 1.9128 (2.4181) grad_norm 2.3236 (3.8769) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][110/625] eta 0:03:55 lr 0.000033 wd 0.0500 time 0.4459 (0.4577) data time 0.0007 (0.0060) model time 0.4452 (0.4513) loss 1.9667 (2.4268) grad_norm 2.4640 (3.7902) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][120/625] eta 0:03:50 lr 0.000033 wd 0.0500 time 0.4471 (0.4571) data time 0.0006 (0.0056) model time 0.4464 (0.4510) loss 2.2771 (2.4178) grad_norm 2.8688 (3.7212) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:41:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][130/625] eta 0:03:45 lr 0.000033 wd 0.0500 time 0.4500 (0.4566) data time 0.0009 (0.0053) model time 0.4491 (0.4507) loss 2.6073 (2.4118) grad_norm 3.0983 (3.7032) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][140/625] eta 0:03:41 lr 0.000033 wd 0.0500 time 0.4510 (0.4569) data time 0.0007 (0.0049) model time 0.4503 (0.4518) loss 2.6486 (2.4091) grad_norm 2.9317 (3.6258) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:06 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [276/300][150/625] eta 0:03:36 lr 0.000033 wd 0.0500 time 0.4492 (0.4564) data time 0.0008 (0.0047) model time 0.4484 (0.4515) loss 1.7505 (2.4101) grad_norm 3.4523 (3.9330) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][160/625] eta 0:03:31 lr 0.000033 wd 0.0500 time 0.4424 (0.4559) data time 0.0006 (0.0044) model time 0.4418 (0.4512) loss 2.4796 (2.4149) grad_norm 3.5318 (3.9207) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][170/625] eta 0:03:27 lr 0.000033 wd 0.0500 time 0.4481 (0.4555) data time 0.0010 (0.0042) model time 0.4472 (0.4509) loss 3.0609 (2.4236) grad_norm 2.3898 (3.8631) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][180/625] eta 0:03:22 lr 0.000033 wd 0.0500 time 0.4435 (0.4551) data time 0.0008 (0.0040) model time 0.4427 (0.4506) loss 2.3510 (2.4282) grad_norm 2.3706 (3.9254) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][190/625] eta 0:03:17 lr 0.000033 wd 0.0500 time 0.4483 (0.4547) data time 0.0007 (0.0039) model time 0.4476 (0.4503) loss 3.1166 (2.4316) grad_norm 2.6133 (3.9091) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][200/625] eta 0:03:13 lr 0.000033 wd 0.0500 time 0.4488 (0.4544) data time 0.0006 (0.0037) model time 0.4482 (0.4502) loss 2.6426 (2.4388) grad_norm 3.0855 (3.8592) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][210/625] eta 0:03:08 lr 0.000033 wd 0.0500 time 0.4487 (0.4542) data time 0.0006 (0.0036) model time 0.4481 (0.4502) loss 2.5550 (2.4388) grad_norm 3.3078 (3.8040) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 10:42:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][220/625] eta 0:03:03 lr 0.000033 wd 0.0500 time 0.4454 (0.4540) data time 0.0007 (0.0035) model time 0.4447 (0.4500) loss 2.3357 (2.4423) grad_norm 3.3750 (3.7894) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][230/625] eta 0:02:59 lr 0.000033 wd 0.0500 time 0.4502 (0.4537) data time 0.0007 (0.0033) model time 0.4495 (0.4499) loss 2.4203 (2.4281) grad_norm 4.6286 (3.8454) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][240/625] eta 0:02:54 lr 0.000033 wd 0.0500 time 0.4472 (0.4535) data time 0.0007 (0.0032) model time 0.4465 (0.4497) loss 2.2788 (2.4290) grad_norm 2.5462 (3.7917) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][250/625] eta 0:02:49 lr 0.000033 wd 0.0500 time 0.4440 (0.4532) data time 0.0008 (0.0032) model time 0.4432 (0.4495) loss 2.1965 (2.4338) grad_norm 1.8374 (3.7480) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:42:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][260/625] eta 0:02:45 lr 0.000033 wd 0.0500 time 0.4597 (0.4530) data time 0.0009 (0.0031) model time 0.4588 (0.4494) loss 3.0625 (2.4404) grad_norm 4.5456 (3.7381) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][270/625] eta 0:02:40 lr 0.000033 wd 0.0500 time 0.4484 (0.4529) data time 0.0008 (0.0030) model time 0.4476 (0.4493) loss 2.3109 (2.4448) grad_norm 3.1224 (3.9286) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][280/625] eta 0:02:36 lr 0.000033 wd 0.0500 time 0.4492 (0.4527) data time 0.0008 (0.0029) model time 0.4484 (0.4493) loss 
2.8556 (2.4470) grad_norm 3.7925 (3.8950) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][290/625] eta 0:02:31 lr 0.000033 wd 0.0500 time 0.4460 (0.4526) data time 0.0007 (0.0028) model time 0.4453 (0.4492) loss 2.5223 (2.4514) grad_norm 4.3028 (3.9248) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][300/625] eta 0:02:27 lr 0.000033 wd 0.0500 time 0.4392 (0.4524) data time 0.0008 (0.0028) model time 0.4383 (0.4491) loss 1.6206 (2.4452) grad_norm 2.3679 (3.9177) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][310/625] eta 0:02:22 lr 0.000033 wd 0.0500 time 0.4448 (0.4522) data time 0.0010 (0.0027) model time 0.4438 (0.4490) loss 2.1416 (2.4466) grad_norm 3.0980 (3.9035) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][320/625] eta 0:02:17 lr 0.000033 wd 0.0500 time 0.4567 (0.4521) data time 0.0006 (0.0026) model time 0.4561 (0.4489) loss 2.6953 (2.4469) grad_norm 6.7055 (3.8854) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][330/625] eta 0:02:13 lr 0.000032 wd 0.0500 time 0.4460 (0.4519) data time 0.0008 (0.0026) model time 0.4452 (0.4488) loss 1.8928 (2.4439) grad_norm 3.7091 (3.8548) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][340/625] eta 0:02:08 lr 0.000032 wd 0.0500 time 0.4467 (0.4518) data time 0.0008 (0.0025) model time 0.4459 (0.4487) loss 3.0331 (2.4483) grad_norm 5.1550 (3.8419) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][350/625] eta 0:02:04 lr 0.000032 wd 0.0500 time 0.4486 (0.4517) 
data time 0.0007 (0.0025) model time 0.4479 (0.4487) loss 2.1241 (2.4406) grad_norm 8.6058 (3.8142) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][360/625] eta 0:01:59 lr 0.000032 wd 0.0500 time 0.4459 (0.4516) data time 0.0006 (0.0024) model time 0.4453 (0.4486) loss 2.2158 (2.4457) grad_norm 4.2015 (3.7898) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][370/625] eta 0:01:55 lr 0.000032 wd 0.0500 time 0.4452 (0.4515) data time 0.0006 (0.0024) model time 0.4445 (0.4486) loss 2.5844 (2.4484) grad_norm 3.1294 (3.7812) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][380/625] eta 0:01:50 lr 0.000032 wd 0.0500 time 0.4492 (0.4514) data time 0.0006 (0.0024) model time 0.4486 (0.4485) loss 2.5013 (2.4445) grad_norm 2.0867 (3.7776) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][390/625] eta 0:01:46 lr 0.000032 wd 0.0500 time 0.4473 (0.4513) data time 0.0006 (0.0023) model time 0.4467 (0.4485) loss 2.6642 (2.4420) grad_norm 3.7062 (3.7553) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:43:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][400/625] eta 0:01:41 lr 0.000032 wd 0.0500 time 0.4453 (0.4511) data time 0.0008 (0.0023) model time 0.4445 (0.4484) loss 2.7215 (2.4401) grad_norm 1.9748 (3.7395) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][410/625] eta 0:01:36 lr 0.000032 wd 0.0500 time 0.4461 (0.4510) data time 0.0006 (0.0022) model time 0.4454 (0.4483) loss 2.9829 (2.4448) grad_norm 2.7256 (3.7157) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[276/300][420/625] eta 0:01:32 lr 0.000032 wd 0.0500 time 0.4473 (0.4510) data time 0.0007 (0.0022) model time 0.4466 (0.4483) loss 2.5543 (2.4467) grad_norm 2.4973 (3.6988) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][430/625] eta 0:01:28 lr 0.000032 wd 0.0500 time 0.6580 (0.4514) data time 0.0009 (0.0022) model time 0.6571 (0.4488) loss 2.6665 (2.4455) grad_norm 2.9203 (3.6726) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][440/625] eta 0:01:23 lr 0.000032 wd 0.0500 time 0.4458 (0.4513) data time 0.0007 (0.0021) model time 0.4451 (0.4488) loss 2.9345 (2.4482) grad_norm 3.1072 (3.6494) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][450/625] eta 0:01:18 lr 0.000032 wd 0.0500 time 0.4450 (0.4512) data time 0.0008 (0.0021) model time 0.4442 (0.4487) loss 3.0971 (2.4488) grad_norm 2.0813 (3.6700) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][460/625] eta 0:01:14 lr 0.000032 wd 0.0500 time 0.4487 (0.4511) data time 0.0007 (0.0021) model time 0.4481 (0.4487) loss 2.5614 (2.4442) grad_norm 2.7937 (3.6811) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][470/625] eta 0:01:09 lr 0.000032 wd 0.0500 time 0.5756 (0.4513) data time 0.0008 (0.0021) model time 0.5748 (0.4489) loss 2.2109 (2.4422) grad_norm 2.0572 (3.6587) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][480/625] eta 0:01:05 lr 0.000032 wd 0.0500 time 0.4462 (0.4513) data time 0.0008 (0.0020) model time 0.4454 (0.4489) loss 2.5725 (2.4384) grad_norm 4.3484 (3.6446) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:39 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][490/625] eta 0:01:00 lr 0.000032 wd 0.0500 time 0.4460 (0.4512) data time 0.0007 (0.0020) model time 0.4453 (0.4489) loss 2.2071 (2.4398) grad_norm 4.9301 (3.6458) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][500/625] eta 0:00:56 lr 0.000032 wd 0.0500 time 0.4511 (0.4512) data time 0.0006 (0.0020) model time 0.4505 (0.4489) loss 2.2418 (2.4401) grad_norm 3.3604 (3.6342) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][510/625] eta 0:00:51 lr 0.000032 wd 0.0500 time 0.4476 (0.4511) data time 0.0007 (0.0020) model time 0.4470 (0.4488) loss 2.6618 (2.4396) grad_norm 3.0221 (3.6250) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][520/625] eta 0:00:47 lr 0.000032 wd 0.0500 time 0.4462 (0.4511) data time 0.0008 (0.0019) model time 0.4454 (0.4488) loss 2.5240 (2.4368) grad_norm 2.7857 (3.6107) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:44:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][530/625] eta 0:00:42 lr 0.000032 wd 0.0500 time 0.4438 (0.4510) data time 0.0006 (0.0019) model time 0.4432 (0.4487) loss 2.4316 (2.4351) grad_norm 2.8903 (3.5977) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][540/625] eta 0:00:38 lr 0.000032 wd 0.0500 time 0.4466 (0.4509) data time 0.0006 (0.0019) model time 0.4460 (0.4487) loss 2.5208 (2.4341) grad_norm 2.6018 (3.5928) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][550/625] eta 0:00:33 lr 0.000032 wd 0.0500 time 0.4428 (0.4508) data time 0.0009 (0.0019) model time 0.4420 (0.4486) loss 2.7599 (2.4337) grad_norm 2.7172 (3.5783) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][560/625] eta 0:00:29 lr 0.000032 wd 0.0500 time 0.4461 (0.4507) data time 0.0009 (0.0019) model time 0.4452 (0.4485) loss 2.3754 (2.4357) grad_norm 2.3611 (3.5624) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][570/625] eta 0:00:24 lr 0.000032 wd 0.0500 time 0.4471 (0.4507) data time 0.0006 (0.0018) model time 0.4465 (0.4485) loss 2.8180 (2.4382) grad_norm 2.6228 (3.5533) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][580/625] eta 0:00:20 lr 0.000032 wd 0.0500 time 0.4461 (0.4506) data time 0.0008 (0.0018) model time 0.4453 (0.4485) loss 2.7171 (2.4378) grad_norm 2.4791 (3.5724) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][590/625] eta 0:00:15 lr 0.000032 wd 0.0500 time 0.4522 (0.4506) data time 0.0007 (0.0018) model time 0.4516 (0.4485) loss 2.5578 (2.4360) grad_norm 3.2098 (3.6310) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][600/625] eta 0:00:11 lr 0.000032 wd 0.0500 time 0.4462 (0.4505) data time 0.0008 (0.0018) model time 0.4453 (0.4484) loss 2.6227 (2.4383) grad_norm 2.5927 (3.6142) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][610/625] eta 0:00:06 lr 0.000032 wd 0.0500 time 0.4406 (0.4504) data time 0.0004 (0.0018) model time 0.4402 (0.4483) loss 2.5301 (2.4403) grad_norm 2.4948 (3.6220) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [276/300][620/625] eta 0:00:02 lr 0.000032 wd 0.0500 time 0.5936 (0.4505) data time 0.0004 (0.0018) model time 
0.5932 (0.4484) loss 2.9085 (2.4418) grad_norm 2.0902 (3.5990) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:39 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 276 training takes 0:04:41 [2024-08-11 10:45:39 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:45:40 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 10:45:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.497 (0.497) Loss 0.5332 (0.5332) Acc@1 88.672 (88.672) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.154) Loss 0.8364 (0.6292) Acc@1 81.250 (86.981) Acc@5 96.143 (97.794) Mem 16699MB [2024-08-11 10:45:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9395 (0.7512) Acc@1 78.955 (84.136) Acc@5 95.166 (96.631) Mem 16699MB [2024-08-11 10:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.841 Acc@5 96.579 [2024-08-11 10:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:45:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.807 (0.807) Loss 0.5190 (0.5190) Acc@1 89.355 (89.355) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 10:45:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8335 (0.6193) Acc@1 81.104 (87.105) Acc@5 96.289 (97.812) Mem 16699MB [2024-08-11 10:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.9136 (0.7376) Acc@1 79.541 (84.289) Acc@5 95.410 (96.735) Mem 16699MB [2024-08-11 10:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.039 Acc@5 96.685 [2024-08-11 10:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:45:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][0/625] eta 0:13:12 lr 0.000032 wd 0.0500 time 1.2686 (1.2686) data time 0.7568 (0.7568) model time 0.0000 (0.0000) loss 1.7845 (1.7845) grad_norm 3.2415 (3.2415) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][10/625] eta 0:05:22 lr 0.000032 wd 0.0500 time 0.4429 (0.5249) data time 0.0006 (0.0695) model time 0.0000 (0.0000) loss 3.0295 (2.5148) grad_norm 2.2176 (2.4854) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:45:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][20/625] eta 0:04:54 lr 0.000032 wd 0.0500 time 0.4435 (0.4872) data time 0.0008 (0.0368) model time 0.0000 (0.0000) loss 2.6980 (2.5725) grad_norm 3.9895 (2.6238) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][30/625] eta 0:04:41 lr 0.000032 wd 0.0500 time 0.4462 (0.4739) data time 0.0010 (0.0252) model time 0.0000 (0.0000) loss 2.9600 (2.5159) grad_norm 2.1115 (2.6166) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][40/625] eta 0:04:33 lr 0.000032 wd 0.0500 time 0.4457 (0.4668) data time 0.0007 (0.0193) model time 0.0000 (0.0000) loss 2.4060 (2.5137) grad_norm 2.2766 (2.6307) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][50/625] eta 0:04:25 lr 0.000032 wd 0.0500 time 0.4424 (0.4626) data time 0.0008 (0.0157) model time 0.0000 (0.0000) loss 2.1160 (2.5087) grad_norm 1.9686 (2.8431) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][60/625] eta 0:04:19 lr 0.000032 wd 0.0500 time 0.4538 (0.4599) data time 0.0007 (0.0132) model 
time 0.4531 (0.4453) loss 2.0332 (2.5112) grad_norm 6.4522 (2.9299) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][70/625] eta 0:04:14 lr 0.000031 wd 0.0500 time 0.4448 (0.4579) data time 0.0009 (0.0115) model time 0.4439 (0.4453) loss 1.8222 (2.4852) grad_norm 2.4932 (2.8776) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][80/625] eta 0:04:08 lr 0.000031 wd 0.0500 time 0.4462 (0.4568) data time 0.0008 (0.0102) model time 0.4454 (0.4460) loss 2.0504 (2.4868) grad_norm 3.3494 (2.8962) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][90/625] eta 0:04:03 lr 0.000031 wd 0.0500 time 0.4479 (0.4560) data time 0.0006 (0.0091) model time 0.4473 (0.4467) loss 1.9180 (2.4847) grad_norm 14.2258 (2.9953) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][100/625] eta 0:03:58 lr 0.000031 wd 0.0500 time 0.4460 (0.4552) data time 0.0007 (0.0083) model time 0.4452 (0.4468) loss 1.7501 (2.4766) grad_norm 3.5811 (3.1421) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][110/625] eta 0:03:54 lr 0.000031 wd 0.0500 time 0.4467 (0.4545) data time 0.0007 (0.0076) model time 0.4461 (0.4468) loss 2.7458 (2.4697) grad_norm 1.8997 (3.1023) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][120/625] eta 0:03:49 lr 0.000031 wd 0.0500 time 0.4456 (0.4538) data time 0.0008 (0.0071) model time 0.4448 (0.4466) loss 2.8301 (2.4573) grad_norm 4.5015 (3.0985) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][130/625] eta 0:03:44 lr 0.000031 wd 
0.0500 time 0.4446 (0.4534) data time 0.0007 (0.0066) model time 0.4439 (0.4467) loss 1.8649 (2.4545) grad_norm 3.3046 (3.5729) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][140/625] eta 0:03:39 lr 0.000031 wd 0.0500 time 0.4436 (0.4529) data time 0.0007 (0.0062) model time 0.4429 (0.4466) loss 2.9008 (2.4533) grad_norm 3.4486 (3.5786) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:46:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][150/625] eta 0:03:35 lr 0.000031 wd 0.0500 time 0.4480 (0.4540) data time 0.0009 (0.0058) model time 0.4472 (0.4488) loss 1.4950 (2.4454) grad_norm 2.7904 (3.5343) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][160/625] eta 0:03:31 lr 0.000031 wd 0.0500 time 0.4454 (0.4546) data time 0.0008 (0.0055) model time 0.4446 (0.4500) loss 2.4351 (2.4386) grad_norm 2.6508 (3.4846) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][170/625] eta 0:03:26 lr 0.000031 wd 0.0500 time 0.4471 (0.4543) data time 0.0006 (0.0053) model time 0.4465 (0.4500) loss 2.7313 (2.4458) grad_norm 4.5662 (3.5606) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][180/625] eta 0:03:21 lr 0.000031 wd 0.0500 time 0.4495 (0.4539) data time 0.0009 (0.0050) model time 0.4486 (0.4497) loss 2.9864 (2.4528) grad_norm 2.0778 (3.5088) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][190/625] eta 0:03:17 lr 0.000031 wd 0.0500 time 0.4464 (0.4535) data time 0.0008 (0.0048) model time 0.4457 (0.4494) loss 2.7974 (2.4650) grad_norm 2.4643 (3.4969) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [277/300][200/625] eta 0:03:12 lr 0.000031 wd 0.0500 time 0.4443 (0.4532) data time 0.0006 (0.0046) model time 0.4437 (0.4492) loss 2.4355 (2.4685) grad_norm 3.1371 (3.5571) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][210/625] eta 0:03:07 lr 0.000031 wd 0.0500 time 0.4508 (0.4529) data time 0.0006 (0.0044) model time 0.4502 (0.4491) loss 2.7902 (2.4762) grad_norm 3.2651 (3.6172) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][220/625] eta 0:03:03 lr 0.000031 wd 0.0500 time 0.4461 (0.4528) data time 0.0008 (0.0042) model time 0.4452 (0.4490) loss 2.1956 (2.4713) grad_norm 4.2771 (3.5927) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][230/625] eta 0:02:58 lr 0.000031 wd 0.0500 time 0.4561 (0.4527) data time 0.0009 (0.0041) model time 0.4553 (0.4490) loss 2.7894 (2.4695) grad_norm 1.9995 (3.5953) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][240/625] eta 0:02:54 lr 0.000031 wd 0.0500 time 0.4494 (0.4525) data time 0.0006 (0.0040) model time 0.4488 (0.4489) loss 2.8165 (2.4592) grad_norm 2.8084 (3.5526) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 10:47:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][250/625] eta 0:02:49 lr 0.000031 wd 0.0500 time 0.4457 (0.4523) data time 0.0006 (0.0038) model time 0.4451 (0.4489) loss 2.4756 (2.4574) grad_norm 3.2840 (3.5397) loss_scale 128.0000 (64.2550) mem 16699MB [2024-08-11 10:47:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][260/625] eta 0:02:45 lr 0.000031 wd 0.0500 time 0.4511 (0.4523) data time 0.0006 (0.0037) model time 0.4505 (0.4490) loss 1.5160 (2.4609) grad_norm 2.0371 (3.5184) loss_scale 128.0000 (66.6973) mem 16699MB 
[2024-08-11 10:47:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][270/625] eta 0:02:40 lr 0.000031 wd 0.0500 time 0.4510 (0.4522) data time 0.0009 (0.0036) model time 0.4501 (0.4489) loss 2.4419 (2.4598) grad_norm 10.2021 (3.5166) loss_scale 128.0000 (68.9594) mem 16699MB [2024-08-11 10:47:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][280/625] eta 0:02:35 lr 0.000031 wd 0.0500 time 0.4481 (0.4520) data time 0.0008 (0.0035) model time 0.4473 (0.4488) loss 2.7183 (2.4598) grad_norm 2.2785 (3.5223) loss_scale 128.0000 (71.0605) mem 16699MB [2024-08-11 10:47:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][290/625] eta 0:02:31 lr 0.000031 wd 0.0500 time 0.4710 (0.4520) data time 0.0009 (0.0034) model time 0.4701 (0.4490) loss 1.9591 (2.4555) grad_norm 2.7749 (3.4955) loss_scale 128.0000 (73.0172) mem 16699MB [2024-08-11 10:48:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][300/625] eta 0:02:26 lr 0.000031 wd 0.0500 time 0.4478 (0.4519) data time 0.0008 (0.0033) model time 0.4471 (0.4489) loss 2.3024 (2.4486) grad_norm 26.4222 (3.5639) loss_scale 128.0000 (74.8439) mem 16699MB [2024-08-11 10:48:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][310/625] eta 0:02:22 lr 0.000031 wd 0.0500 time 0.4535 (0.4519) data time 0.0007 (0.0033) model time 0.4529 (0.4490) loss 2.9920 (2.4553) grad_norm 56.7785 (3.7380) loss_scale 128.0000 (76.5531) mem 16699MB [2024-08-11 10:48:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][320/625] eta 0:02:17 lr 0.000031 wd 0.0500 time 0.4426 (0.4518) data time 0.0008 (0.0032) model time 0.4418 (0.4489) loss 1.7005 (2.4490) grad_norm 4.3616 (3.7272) loss_scale 128.0000 (78.1558) mem 16699MB [2024-08-11 10:48:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][330/625] eta 0:02:13 lr 0.000031 wd 0.0500 time 0.4452 (0.4517) data time 0.0006 (0.0031) model time 0.4446 (0.4488) loss 2.6495 (2.4493) 
grad_norm 4.0155 (3.8448) loss_scale 128.0000 (79.6616) mem 16699MB [2024-08-11 10:48:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][340/625] eta 0:02:08 lr 0.000031 wd 0.0500 time 0.4476 (0.4516) data time 0.0008 (0.0030) model time 0.4468 (0.4488) loss 2.6906 (2.4496) grad_norm 3.0391 (3.8303) loss_scale 128.0000 (81.0792) mem 16699MB [2024-08-11 10:48:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][350/625] eta 0:02:04 lr 0.000031 wd 0.0500 time 0.4423 (0.4514) data time 0.0009 (0.0030) model time 0.4415 (0.4487) loss 2.5633 (2.4482) grad_norm 2.4337 (3.8056) loss_scale 128.0000 (82.4160) mem 16699MB [2024-08-11 10:48:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][360/625] eta 0:01:59 lr 0.000031 wd 0.0500 time 0.4472 (0.4513) data time 0.0006 (0.0029) model time 0.4466 (0.4486) loss 1.5741 (2.4415) grad_norm 2.6371 (3.7768) loss_scale 128.0000 (83.6787) mem 16699MB [2024-08-11 10:48:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][370/625] eta 0:01:55 lr 0.000031 wd 0.0500 time 0.4491 (0.4513) data time 0.0009 (0.0029) model time 0.4482 (0.4486) loss 2.2209 (2.4429) grad_norm 2.3205 (3.7387) loss_scale 128.0000 (84.8733) mem 16699MB [2024-08-11 10:48:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][380/625] eta 0:01:50 lr 0.000031 wd 0.0500 time 0.4458 (0.4513) data time 0.0006 (0.0028) model time 0.4452 (0.4487) loss 2.3891 (2.4419) grad_norm 3.4183 (3.7259) loss_scale 128.0000 (86.0052) mem 16699MB [2024-08-11 10:48:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][390/625] eta 0:01:46 lr 0.000031 wd 0.0500 time 0.4508 (0.4513) data time 0.0009 (0.0028) model time 0.4499 (0.4488) loss 2.3158 (2.4404) grad_norm 3.0035 (3.7129) loss_scale 128.0000 (87.0793) mem 16699MB [2024-08-11 10:48:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][400/625] eta 0:01:41 lr 0.000031 wd 0.0500 time 0.4444 (0.4512) data 
time 0.0008 (0.0027) model time 0.4436 (0.4487) loss 2.9034 (2.4429) grad_norm 2.4433 (3.6826) loss_scale 128.0000 (88.0998) mem 16699MB [2024-08-11 10:48:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][410/625] eta 0:01:36 lr 0.000031 wd 0.0500 time 0.4508 (0.4512) data time 0.0009 (0.0027) model time 0.4499 (0.4487) loss 1.8300 (2.4391) grad_norm 2.0800 (3.6533) loss_scale 128.0000 (89.0706) mem 16699MB [2024-08-11 10:48:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][420/625] eta 0:01:32 lr 0.000031 wd 0.0500 time 0.4462 (0.4511) data time 0.0009 (0.0026) model time 0.4453 (0.4486) loss 1.4342 (2.4376) grad_norm 4.1082 (3.6367) loss_scale 128.0000 (89.9952) mem 16699MB [2024-08-11 10:49:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][430/625] eta 0:01:27 lr 0.000031 wd 0.0500 time 0.4543 (0.4510) data time 0.0008 (0.0026) model time 0.4535 (0.4486) loss 2.8906 (2.4369) grad_norm 2.4725 (3.6293) loss_scale 128.0000 (90.8770) mem 16699MB [2024-08-11 10:49:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][440/625] eta 0:01:23 lr 0.000030 wd 0.0500 time 0.4498 (0.4510) data time 0.0009 (0.0025) model time 0.4489 (0.4487) loss 3.1155 (2.4390) grad_norm 2.7484 (3.6023) loss_scale 128.0000 (91.7188) mem 16699MB [2024-08-11 10:49:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][450/625] eta 0:01:18 lr 0.000030 wd 0.0500 time 0.4540 (0.4510) data time 0.0007 (0.0025) model time 0.4533 (0.4487) loss 2.3176 (2.4426) grad_norm 45.3147 (3.6826) loss_scale 128.0000 (92.5233) mem 16699MB [2024-08-11 10:49:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][460/625] eta 0:01:14 lr 0.000030 wd 0.0500 time 0.4497 (0.4510) data time 0.0007 (0.0025) model time 0.4490 (0.4487) loss 1.6162 (2.4409) grad_norm 2.4882 (3.6792) loss_scale 128.0000 (93.2928) mem 16699MB [2024-08-11 10:49:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[277/300][470/625] eta 0:01:09 lr 0.000030 wd 0.0500 time 0.4463 (0.4509) data time 0.0006 (0.0024) model time 0.4457 (0.4487) loss 2.1863 (2.4380) grad_norm 3.8769 (3.6685) loss_scale 128.0000 (94.0297) mem 16699MB [2024-08-11 10:49:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][480/625] eta 0:01:05 lr 0.000030 wd 0.0500 time 0.4464 (0.4513) data time 0.0006 (0.0024) model time 0.4458 (0.4491) loss 2.3301 (2.4400) grad_norm 2.8466 (3.6558) loss_scale 128.0000 (94.7360) mem 16699MB [2024-08-11 10:49:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][490/625] eta 0:01:00 lr 0.000030 wd 0.0500 time 0.4470 (0.4515) data time 0.0006 (0.0024) model time 0.4463 (0.4494) loss 2.0842 (2.4360) grad_norm 2.4230 (3.6405) loss_scale 128.0000 (95.4134) mem 16699MB [2024-08-11 10:49:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][500/625] eta 0:00:56 lr 0.000030 wd 0.0500 time 0.4462 (0.4514) data time 0.0009 (0.0023) model time 0.4453 (0.4494) loss 2.5533 (2.4344) grad_norm 2.7274 (3.6191) loss_scale 128.0000 (96.0639) mem 16699MB [2024-08-11 10:49:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][510/625] eta 0:00:51 lr 0.000030 wd 0.0500 time 0.4545 (0.4514) data time 0.0007 (0.0023) model time 0.4538 (0.4493) loss 1.5312 (2.4390) grad_norm 2.8487 (3.6049) loss_scale 128.0000 (96.6888) mem 16699MB [2024-08-11 10:49:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][520/625] eta 0:00:47 lr 0.000030 wd 0.0500 time 0.4480 (0.4513) data time 0.0009 (0.0023) model time 0.4471 (0.4493) loss 2.7209 (2.4412) grad_norm 2.4308 (3.5869) loss_scale 128.0000 (97.2898) mem 16699MB [2024-08-11 10:49:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][530/625] eta 0:00:42 lr 0.000030 wd 0.0500 time 0.4496 (0.4513) data time 0.0009 (0.0023) model time 0.4488 (0.4493) loss 2.4578 (2.4365) grad_norm 2.6778 (3.5703) loss_scale 128.0000 (97.8682) mem 16699MB [2024-08-11 
10:49:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][540/625] eta 0:00:38 lr 0.000030 wd 0.0500 time 0.4555 (0.4513) data time 0.0009 (0.0022) model time 0.4546 (0.4493) loss 2.0346 (2.4372) grad_norm 2.4061 (3.5652) loss_scale 128.0000 (98.4251) mem 16699MB [2024-08-11 10:49:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][550/625] eta 0:00:33 lr 0.000030 wd 0.0500 time 0.4482 (0.4513) data time 0.0007 (0.0022) model time 0.4475 (0.4493) loss 1.8704 (2.4377) grad_norm 2.2173 (3.5576) loss_scale 128.0000 (98.9619) mem 16699MB [2024-08-11 10:50:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][560/625] eta 0:00:29 lr 0.000030 wd 0.0500 time 0.4481 (0.4512) data time 0.0007 (0.0022) model time 0.4473 (0.4492) loss 1.9569 (2.4350) grad_norm 2.0783 (3.6667) loss_scale 128.0000 (99.4795) mem 16699MB [2024-08-11 10:50:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][570/625] eta 0:00:24 lr 0.000030 wd 0.0500 time 0.4468 (0.4511) data time 0.0007 (0.0022) model time 0.4461 (0.4492) loss 2.2873 (2.4315) grad_norm 3.6292 (3.6635) loss_scale 128.0000 (99.9790) mem 16699MB [2024-08-11 10:50:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][580/625] eta 0:00:20 lr 0.000030 wd 0.0500 time 0.4464 (0.4511) data time 0.0007 (0.0021) model time 0.4457 (0.4491) loss 2.4820 (2.4303) grad_norm 2.3289 (3.6504) loss_scale 128.0000 (100.4613) mem 16699MB [2024-08-11 10:50:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][590/625] eta 0:00:15 lr 0.000030 wd 0.0500 time 0.4451 (0.4510) data time 0.0007 (0.0021) model time 0.4444 (0.4491) loss 2.7096 (2.4333) grad_norm 3.9816 (3.6388) loss_scale 128.0000 (100.9272) mem 16699MB [2024-08-11 10:50:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][600/625] eta 0:00:11 lr 0.000030 wd 0.0500 time 0.4470 (0.4510) data time 0.0006 (0.0021) model time 0.4464 (0.4491) loss 3.0796 (2.4374) grad_norm 
4.0518 (3.6292) loss_scale 128.0000 (101.3777) mem 16699MB [2024-08-11 10:50:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][610/625] eta 0:00:06 lr 0.000030 wd 0.0500 time 0.4438 (0.4510) data time 0.0004 (0.0021) model time 0.4434 (0.4490) loss 2.1262 (2.4364) grad_norm 15.8192 (3.6351) loss_scale 128.0000 (101.8134) mem 16699MB [2024-08-11 10:50:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [277/300][620/625] eta 0:00:02 lr 0.000030 wd 0.0500 time 0.4424 (0.4508) data time 0.0005 (0.0021) model time 0.4418 (0.4489) loss 2.2876 (2.4352) grad_norm 2.6112 (3.6229) loss_scale 128.0000 (102.2351) mem 16699MB [2024-08-11 10:50:29 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 277 training takes 0:04:41 [2024-08-11 10:50:29 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:50:31 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 10:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.498 (0.498) Loss 0.5361 (0.5361) Acc@1 88.574 (88.574) Acc@5 98.877 (98.877) Mem 16699MB [2024-08-11 10:50:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.154) Loss 0.8408 (0.6333) Acc@1 80.859 (86.856) Acc@5 96.191 (97.741) Mem 16699MB [2024-08-11 10:50:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.9429 (0.7564) Acc@1 79.395 (83.991) Acc@5 95.264 (96.584) Mem 16699MB [2024-08-11 10:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.749 Acc@5 96.547 [2024-08-11 10:50:34 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.7% [2024-08-11 10:50:35 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.797 (0.797) Loss 0.5195 (0.5195) Acc@1 89.258 (89.258) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 10:50:36 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.183) Loss 0.8345 (0.6199) Acc@1 81.055 (87.096) Acc@5 96.191 (97.807) Mem 16699MB [2024-08-11 10:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.9155 (0.7387) Acc@1 79.590 (84.282) Acc@5 95.410 (96.715) Mem 16699MB [2024-08-11 10:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.037 Acc@5 96.663 [2024-08-11 10:50:37 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:50:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][0/625] eta 0:14:06 lr 0.000030 wd 0.0500 time 1.3549 (1.3549) data time 0.6811 (0.6811) model time 0.0000 (0.0000) loss 3.1625 (3.1625) grad_norm 2.9691 (2.9691) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:50:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][10/625] eta 0:05:26 lr 0.000030 wd 0.0500 time 0.4492 (0.5315) data 
time 0.0008 (0.0627) model time 0.0000 (0.0000) loss 2.9863 (2.5556) grad_norm 2.6287 (6.9199) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:50:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][20/625] eta 0:04:57 lr 0.000030 wd 0.0500 time 0.4507 (0.4920) data time 0.0008 (0.0333) model time 0.0000 (0.0000) loss 2.8057 (2.5715) grad_norm 2.1795 (4.8584) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:50:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][30/625] eta 0:04:45 lr 0.000030 wd 0.0500 time 0.4525 (0.4792) data time 0.0008 (0.0228) model time 0.0000 (0.0000) loss 2.3708 (2.5407) grad_norm 1.7570 (4.3928) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:50:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][40/625] eta 0:04:36 lr 0.000030 wd 0.0500 time 0.4532 (0.4731) data time 0.0008 (0.0175) model time 0.0000 (0.0000) loss 2.9402 (2.5674) grad_norm 2.9476 (3.9755) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][50/625] eta 0:04:29 lr 0.000030 wd 0.0500 time 0.4492 (0.4685) data time 0.0008 (0.0142) model time 0.0000 (0.0000) loss 2.7041 (2.5064) grad_norm 2.5178 (3.7669) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][60/625] eta 0:04:22 lr 0.000030 wd 0.0500 time 0.4461 (0.4654) data time 0.0009 (0.0120) model time 0.4452 (0.4484) loss 2.8829 (2.5265) grad_norm 3.0201 (3.6066) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][70/625] eta 0:04:17 lr 0.000030 wd 0.0500 time 0.4488 (0.4631) data time 0.0007 (0.0105) model time 0.4481 (0.4485) loss 2.9928 (2.5404) grad_norm 3.7902 (3.5532) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[278/300][80/625] eta 0:04:12 lr 0.000030 wd 0.0500 time 0.4483 (0.4638) data time 0.0007 (0.0093) model time 0.4476 (0.4549) loss 2.5862 (2.5250) grad_norm 2.8816 (3.4441) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][90/625] eta 0:04:07 lr 0.000030 wd 0.0500 time 0.4469 (0.4621) data time 0.0009 (0.0084) model time 0.4460 (0.4530) loss 2.7648 (2.5173) grad_norm 2.2429 (3.3311) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][100/625] eta 0:04:01 lr 0.000030 wd 0.0500 time 0.4494 (0.4608) data time 0.0007 (0.0076) model time 0.4487 (0.4520) loss 3.2913 (2.5396) grad_norm 2.1542 (3.3031) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][110/625] eta 0:03:56 lr 0.000030 wd 0.0500 time 0.4474 (0.4596) data time 0.0007 (0.0070) model time 0.4467 (0.4512) loss 1.8437 (2.5391) grad_norm 2.8470 (3.2696) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][120/625] eta 0:03:51 lr 0.000030 wd 0.0500 time 0.4473 (0.4588) data time 0.0008 (0.0065) model time 0.4465 (0.4508) loss 1.5451 (2.5131) grad_norm 2.3018 (3.2736) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][130/625] eta 0:03:46 lr 0.000030 wd 0.0500 time 0.4449 (0.4579) data time 0.0008 (0.0061) model time 0.4441 (0.4502) loss 2.6176 (2.4979) grad_norm 2.1972 (3.2598) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][140/625] eta 0:03:41 lr 0.000030 wd 0.0500 time 0.4486 (0.4571) data time 0.0009 (0.0057) model time 0.4478 (0.4498) loss 2.2329 (2.4788) grad_norm 2.2222 (3.3367) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 10:51:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][150/625] eta 0:03:36 lr 0.000030 wd 0.0500 time 0.4455 (0.4564) data time 0.0007 (0.0054) model time 0.4448 (0.4494) loss 1.7170 (2.4704) grad_norm 3.4414 (3.3383) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][160/625] eta 0:03:31 lr 0.000030 wd 0.0500 time 0.4466 (0.4558) data time 0.0006 (0.0051) model time 0.4460 (0.4491) loss 2.0081 (2.4712) grad_norm 2.7777 (3.3005) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:51:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][170/625] eta 0:03:27 lr 0.000030 wd 0.0500 time 0.4471 (0.4553) data time 0.0006 (0.0048) model time 0.4465 (0.4489) loss 2.5324 (2.4664) grad_norm 2.3724 (3.2806) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][180/625] eta 0:03:22 lr 0.000030 wd 0.0500 time 0.4479 (0.4558) data time 0.0006 (0.0046) model time 0.4473 (0.4500) loss 2.5509 (2.4668) grad_norm 2.5689 (3.3052) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][190/625] eta 0:03:18 lr 0.000030 wd 0.0500 time 0.4505 (0.4554) data time 0.0006 (0.0044) model time 0.4499 (0.4498) loss 2.6251 (2.4657) grad_norm 2.3417 (3.2707) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][200/625] eta 0:03:13 lr 0.000029 wd 0.0500 time 0.4469 (0.4550) data time 0.0009 (0.0042) model time 0.4460 (0.4496) loss 2.7345 (2.4589) grad_norm 3.0243 (3.2567) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][210/625] eta 0:03:08 lr 0.000029 wd 0.0500 time 0.4480 (0.4547) data time 0.0009 (0.0041) model time 0.4470 (0.4494) loss 2.8547 
(2.4529) grad_norm 2.2883 (3.2488) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][220/625] eta 0:03:04 lr 0.000029 wd 0.0500 time 0.4481 (0.4543) data time 0.0008 (0.0039) model time 0.4472 (0.4493) loss 2.3708 (2.4525) grad_norm 2.7536 (3.2627) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][230/625] eta 0:02:59 lr 0.000029 wd 0.0500 time 0.4461 (0.4540) data time 0.0011 (0.0038) model time 0.4450 (0.4491) loss 2.6202 (2.4421) grad_norm 1.9033 (3.2801) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][240/625] eta 0:02:54 lr 0.000029 wd 0.0500 time 0.4472 (0.4538) data time 0.0008 (0.0037) model time 0.4465 (0.4491) loss 2.6531 (2.4403) grad_norm 1.8954 (3.3483) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][250/625] eta 0:02:50 lr 0.000029 wd 0.0500 time 0.4605 (0.4537) data time 0.0008 (0.0035) model time 0.4597 (0.4491) loss 2.4223 (2.4380) grad_norm 2.3749 (3.3364) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][260/625] eta 0:02:45 lr 0.000029 wd 0.0500 time 0.4504 (0.4536) data time 0.0008 (0.0034) model time 0.4496 (0.4491) loss 2.7261 (2.4394) grad_norm 2.3049 (3.3057) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][270/625] eta 0:02:41 lr 0.000029 wd 0.0500 time 0.4490 (0.4540) data time 0.0007 (0.0033) model time 0.4483 (0.4498) loss 2.8244 (2.4501) grad_norm 3.2653 (3.2851) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][280/625] eta 0:02:36 lr 0.000029 wd 0.0500 time 0.4467 
(0.4538) data time 0.0009 (0.0032) model time 0.4458 (0.4497) loss 2.7040 (2.4504) grad_norm 2.7043 (3.2540) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][290/625] eta 0:02:31 lr 0.000029 wd 0.0500 time 0.4505 (0.4536) data time 0.0006 (0.0032) model time 0.4499 (0.4496) loss 2.2898 (2.4551) grad_norm 2.5692 (3.2450) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][300/625] eta 0:02:27 lr 0.000029 wd 0.0500 time 0.4472 (0.4534) data time 0.0006 (0.0031) model time 0.4465 (0.4495) loss 2.5759 (2.4568) grad_norm 4.7653 (3.2264) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:52:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][310/625] eta 0:02:22 lr 0.000029 wd 0.0500 time 0.4490 (0.4532) data time 0.0008 (0.0030) model time 0.4482 (0.4494) loss 2.8324 (2.4586) grad_norm 17.8724 (3.3969) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][320/625] eta 0:02:18 lr 0.000029 wd 0.0500 time 0.4480 (0.4530) data time 0.0008 (0.0029) model time 0.4472 (0.4493) loss 2.3917 (2.4557) grad_norm 3.0790 (3.3763) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][330/625] eta 0:02:13 lr 0.000029 wd 0.0500 time 0.4467 (0.4529) data time 0.0009 (0.0029) model time 0.4458 (0.4493) loss 2.5633 (2.4612) grad_norm 2.8594 (3.3615) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][340/625] eta 0:02:09 lr 0.000029 wd 0.0500 time 0.4498 (0.4528) data time 0.0008 (0.0028) model time 0.4490 (0.4492) loss 2.3295 (2.4591) grad_norm 3.2750 (3.3396) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [278/300][350/625] eta 0:02:04 lr 0.000029 wd 0.0500 time 0.4504 (0.4527) data time 0.0006 (0.0028) model time 0.4498 (0.4492) loss 2.5968 (2.4607) grad_norm 3.9608 (3.3194) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][360/625] eta 0:01:59 lr 0.000029 wd 0.0500 time 0.4433 (0.4526) data time 0.0007 (0.0027) model time 0.4426 (0.4491) loss 2.3649 (2.4636) grad_norm 4.8668 (3.3222) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][370/625] eta 0:01:55 lr 0.000029 wd 0.0500 time 0.4493 (0.4524) data time 0.0008 (0.0027) model time 0.4485 (0.4490) loss 3.0133 (2.4644) grad_norm 2.9813 (3.3080) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][380/625] eta 0:01:50 lr 0.000029 wd 0.0500 time 0.4447 (0.4523) data time 0.0009 (0.0026) model time 0.4438 (0.4490) loss 2.0775 (2.4628) grad_norm 3.1119 (3.3107) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][390/625] eta 0:01:46 lr 0.000029 wd 0.0500 time 0.4482 (0.4522) data time 0.0010 (0.0026) model time 0.4472 (0.4489) loss 1.6696 (2.4546) grad_norm 3.3595 (3.3069) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][400/625] eta 0:01:41 lr 0.000029 wd 0.0500 time 0.4479 (0.4521) data time 0.0006 (0.0025) model time 0.4473 (0.4489) loss 2.6138 (2.4517) grad_norm 2.6182 (3.3066) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][410/625] eta 0:01:37 lr 0.000029 wd 0.0500 time 0.4501 (0.4520) data time 0.0007 (0.0025) model time 0.4494 (0.4489) loss 2.4594 (2.4542) grad_norm 2.9969 (3.2979) loss_scale 128.0000 (128.0000) mem 
16699MB [2024-08-11 10:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][420/625] eta 0:01:32 lr 0.000029 wd 0.0500 time 0.4468 (0.4519) data time 0.0006 (0.0024) model time 0.4462 (0.4488) loss 2.4214 (2.4483) grad_norm 2.7926 (3.2847) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][430/625] eta 0:01:28 lr 0.000029 wd 0.0500 time 0.4436 (0.4518) data time 0.0009 (0.0024) model time 0.4427 (0.4487) loss 2.6766 (2.4470) grad_norm 2.6379 (3.2788) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:53:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][440/625] eta 0:01:23 lr 0.000029 wd 0.0500 time 0.4464 (0.4517) data time 0.0009 (0.0024) model time 0.4455 (0.4487) loss 2.7749 (2.4463) grad_norm 2.2602 (3.2666) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][450/625] eta 0:01:19 lr 0.000029 wd 0.0500 time 0.4458 (0.4516) data time 0.0007 (0.0023) model time 0.4450 (0.4486) loss 2.2702 (2.4485) grad_norm 2.2879 (3.2760) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][460/625] eta 0:01:14 lr 0.000029 wd 0.0500 time 0.4480 (0.4515) data time 0.0008 (0.0023) model time 0.4472 (0.4486) loss 2.6454 (2.4476) grad_norm 2.4167 (3.3213) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][470/625] eta 0:01:09 lr 0.000029 wd 0.0500 time 0.4483 (0.4514) data time 0.0006 (0.0023) model time 0.4477 (0.4486) loss 2.7919 (2.4464) grad_norm 4.2196 (3.3579) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][480/625] eta 0:01:05 lr 0.000029 wd 0.0500 time 0.4508 (0.4514) data time 0.0008 (0.0022) model time 0.4500 (0.4485) loss 
2.4929 (2.4457) grad_norm 1.8253 (3.3580) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][490/625] eta 0:01:00 lr 0.000029 wd 0.0500 time 0.4486 (0.4517) data time 0.0008 (0.0022) model time 0.4479 (0.4490) loss 2.8415 (2.4476) grad_norm 2.4958 (3.3511) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][500/625] eta 0:00:56 lr 0.000029 wd 0.0500 time 0.4470 (0.4516) data time 0.0007 (0.0022) model time 0.4463 (0.4489) loss 2.3129 (2.4446) grad_norm 2.4041 (3.3469) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][510/625] eta 0:00:51 lr 0.000029 wd 0.0500 time 0.4471 (0.4515) data time 0.0007 (0.0022) model time 0.4465 (0.4488) loss 2.5862 (2.4469) grad_norm 1.7753 (3.3394) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][520/625] eta 0:00:47 lr 0.000029 wd 0.0500 time 0.4451 (0.4517) data time 0.0007 (0.0021) model time 0.4444 (0.4491) loss 1.9029 (2.4451) grad_norm 2.2753 (3.3252) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][530/625] eta 0:00:42 lr 0.000029 wd 0.0500 time 0.4484 (0.4516) data time 0.0006 (0.0021) model time 0.4477 (0.4490) loss 2.1134 (2.4484) grad_norm 2.3224 (3.3092) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][540/625] eta 0:00:38 lr 0.000029 wd 0.0500 time 0.4545 (0.4516) data time 0.0008 (0.0021) model time 0.4538 (0.4491) loss 2.0113 (2.4473) grad_norm 2.3418 (3.3013) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][550/625] eta 0:00:33 lr 0.000029 wd 0.0500 time 
0.4482 (0.4517) data time 0.0007 (0.0021) model time 0.4476 (0.4491) loss 2.8827 (2.4503) grad_norm 3.5732 (3.3013) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][560/625] eta 0:00:29 lr 0.000029 wd 0.0500 time 0.4485 (0.4516) data time 0.0008 (0.0020) model time 0.4477 (0.4491) loss 2.5907 (2.4508) grad_norm 2.3054 (3.2903) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:54:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][570/625] eta 0:00:24 lr 0.000029 wd 0.0500 time 0.4455 (0.4516) data time 0.0006 (0.0020) model time 0.4449 (0.4491) loss 2.3350 (2.4527) grad_norm 3.7104 (3.2866) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][580/625] eta 0:00:20 lr 0.000029 wd 0.0500 time 0.4565 (0.4516) data time 0.0006 (0.0020) model time 0.4558 (0.4491) loss 2.5173 (2.4536) grad_norm 5.7590 (3.3244) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][590/625] eta 0:00:15 lr 0.000028 wd 0.0500 time 0.4466 (0.4515) data time 0.0009 (0.0020) model time 0.4457 (0.4491) loss 2.7910 (2.4528) grad_norm 2.9168 (3.3344) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][600/625] eta 0:00:11 lr 0.000028 wd 0.0500 time 0.4527 (0.4515) data time 0.0006 (0.0020) model time 0.4520 (0.4491) loss 1.9179 (2.4539) grad_norm 14.9439 (3.3605) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [278/300][610/625] eta 0:00:06 lr 0.000028 wd 0.0500 time 0.4587 (0.4515) data time 0.0004 (0.0020) model time 0.4582 (0.4491) loss 1.5200 (2.4500) grad_norm 2.0467 (3.3656) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [278/300][620/625] eta 0:00:02 lr 0.000028 wd 0.0500 time 0.4513 (0.4514) data time 0.0005 (0.0019) model time 0.4507 (0.4490) loss 2.4767 (2.4509) grad_norm 2.8445 (3.3560) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:20 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 278 training takes 0:04:42 [2024-08-11 10:55:20 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 10:55:21 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 10:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.519 (0.519) Loss 0.5317 (0.5317) Acc@1 88.672 (88.672) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 10:55:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.156) Loss 0.8608 (0.6298) Acc@1 80.811 (86.963) Acc@5 95.947 (97.745) Mem 16699MB [2024-08-11 10:55:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.137) Loss 0.9395 (0.7522) Acc@1 79.053 (84.089) Acc@5 95.166 (96.636) Mem 16699MB [2024-08-11 10:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.813 Acc@5 96.579 [2024-08-11 10:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 10:55:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.827 (0.827) Loss 0.5200 (0.5200) Acc@1 89.160 (89.160) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 10:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8350 (0.6206) Acc@1 81.152 (87.114) Acc@5 96.143 (97.772) Mem 16699MB [2024-08-11 10:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.9175 (0.7396) Acc@1 79.492 (84.294) Acc@5 95.361 (96.682) Mem 16699MB [2024-08-11 10:55:28 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.045 Acc@5 96.629 [2024-08-11 10:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 10:55:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][0/625] eta 0:13:37 lr 0.000028 wd 0.0500 time 1.3083 (1.3083) data time 0.6844 (0.6844) model time 0.0000 (0.0000) loss 2.7424 (2.7424) grad_norm 2.3373 (2.3373) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][10/625] eta 0:05:34 lr 0.000028 wd 0.0500 time 0.4479 (0.5446) data time 0.0008 (0.0629) model time 0.0000 (0.0000) loss 2.0294 (2.3822) grad_norm 3.1305 (2.6509) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][20/625] eta 0:05:01 lr 0.000028 wd 0.0500 time 0.4470 (0.4985) data time 0.0008 (0.0334) model time 0.0000 (0.0000) loss 1.9738 (2.3646) grad_norm 4.0593 (3.0860) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][30/625] eta 0:04:47 lr 0.000028 wd 0.0500 time 0.4490 (0.4824) data time 0.0008 (0.0229) model time 0.0000 (0.0000) loss 2.6617 (2.3451) grad_norm 2.5283 (3.1589) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][40/625] eta 0:04:37 lr 0.000028 wd 0.0500 time 0.4471 (0.4741) data time 0.0009 (0.0175) model time 0.0000 (0.0000) loss 2.3736 (2.3195) grad_norm 2.4092 (3.0513) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][50/625] eta 0:04:30 lr 0.000028 wd 0.0500 time 0.4565 (0.4701) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 1.4887 (2.3082) grad_norm 3.7594 (3.0294) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:55:57 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [279/300][60/625] eta 0:04:24 lr 0.000028 wd 0.0500 time 0.4496 (0.4673) data time 0.0009 (0.0120) model time 0.4487 (0.4522) loss 2.3426 (2.3320) grad_norm 4.3823 (3.0235) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][70/625] eta 0:04:19 lr 0.000028 wd 0.0500 time 0.4520 (0.4668) data time 0.0007 (0.0105) model time 0.4513 (0.4575) loss 2.4820 (2.3501) grad_norm 2.7286 (3.2172) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][80/625] eta 0:04:13 lr 0.000028 wd 0.0500 time 0.4459 (0.4646) data time 0.0006 (0.0093) model time 0.4453 (0.4543) loss 2.5985 (2.3537) grad_norm 2.6890 (3.2207) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][90/625] eta 0:04:07 lr 0.000028 wd 0.0500 time 0.4451 (0.4628) data time 0.0007 (0.0083) model time 0.4444 (0.4526) loss 3.0435 (2.3875) grad_norm 3.1756 (3.2857) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][100/625] eta 0:04:02 lr 0.000028 wd 0.0500 time 0.4498 (0.4614) data time 0.0009 (0.0076) model time 0.4489 (0.4516) loss 2.6816 (2.4127) grad_norm 2.5873 (3.2099) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][110/625] eta 0:03:57 lr 0.000028 wd 0.0500 time 0.4489 (0.4603) data time 0.0009 (0.0070) model time 0.4480 (0.4511) loss 2.5190 (2.4134) grad_norm 2.8065 (3.1566) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][120/625] eta 0:03:51 lr 0.000028 wd 0.0500 time 0.4469 (0.4594) data time 0.0010 (0.0065) model time 0.4460 (0.4507) loss 2.7965 (2.4296) grad_norm 2.7719 (3.1430) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][130/625] eta 0:03:47 lr 0.000028 wd 0.0500 time 0.4507 (0.4587) data time 0.0009 (0.0061) model time 0.4498 (0.4506) loss 2.7379 (2.4380) grad_norm 1.8151 (3.1117) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][140/625] eta 0:03:42 lr 0.000028 wd 0.0500 time 0.4505 (0.4581) data time 0.0009 (0.0057) model time 0.4496 (0.4504) loss 2.0373 (2.4451) grad_norm 4.1110 (3.0782) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][150/625] eta 0:03:37 lr 0.000028 wd 0.0500 time 0.4433 (0.4574) data time 0.0009 (0.0054) model time 0.4425 (0.4500) loss 2.6007 (2.4562) grad_norm 2.3876 (3.2896) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][160/625] eta 0:03:32 lr 0.000028 wd 0.0500 time 0.4444 (0.4580) data time 0.0007 (0.0051) model time 0.4437 (0.4515) loss 2.7016 (2.4588) grad_norm 4.3273 (3.3543) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][170/625] eta 0:03:28 lr 0.000028 wd 0.0500 time 0.4519 (0.4573) data time 0.0008 (0.0048) model time 0.4511 (0.4511) loss 2.2802 (2.4499) grad_norm 2.1518 (3.3495) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][180/625] eta 0:03:23 lr 0.000028 wd 0.0500 time 0.4468 (0.4568) data time 0.0007 (0.0046) model time 0.4461 (0.4507) loss 2.2954 (2.4599) grad_norm 4.2469 (3.3602) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:56:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][190/625] eta 0:03:18 lr 0.000028 wd 0.0500 time 0.4474 (0.4563) data time 0.0006 (0.0044) model time 
0.4467 (0.4504) loss 2.8341 (2.4585) grad_norm 2.9417 (3.3327) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][200/625] eta 0:03:13 lr 0.000028 wd 0.0500 time 0.4515 (0.4560) data time 0.0008 (0.0042) model time 0.4507 (0.4504) loss 3.0429 (2.4663) grad_norm 4.9395 (3.3214) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][210/625] eta 0:03:09 lr 0.000028 wd 0.0500 time 0.6622 (0.4567) data time 0.0009 (0.0041) model time 0.6613 (0.4516) loss 2.5038 (2.4646) grad_norm 2.5505 (3.3000) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][220/625] eta 0:03:04 lr 0.000028 wd 0.0500 time 0.4497 (0.4561) data time 0.0008 (0.0039) model time 0.4488 (0.4510) loss 2.6649 (2.4731) grad_norm 2.8154 (3.2760) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][230/625] eta 0:03:00 lr 0.000028 wd 0.0500 time 0.4487 (0.4557) data time 0.0006 (0.0038) model time 0.4481 (0.4508) loss 3.0359 (2.4786) grad_norm 5.3264 (3.2862) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][240/625] eta 0:02:55 lr 0.000028 wd 0.0500 time 0.4457 (0.4553) data time 0.0008 (0.0037) model time 0.4449 (0.4506) loss 2.5669 (2.4757) grad_norm 2.6606 (3.4073) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][250/625] eta 0:02:50 lr 0.000028 wd 0.0500 time 0.4498 (0.4550) data time 0.0006 (0.0036) model time 0.4492 (0.4504) loss 2.4567 (2.4840) grad_norm 2.1848 (3.3842) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][260/625] eta 0:02:45 lr 
0.000028 wd 0.0500 time 0.4477 (0.4548) data time 0.0008 (0.0034) model time 0.4469 (0.4502) loss 2.2957 (2.4714) grad_norm 3.1351 (3.3710) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][270/625] eta 0:02:41 lr 0.000028 wd 0.0500 time 0.4464 (0.4545) data time 0.0009 (0.0034) model time 0.4456 (0.4501) loss 2.3324 (2.4615) grad_norm 2.1868 (3.3527) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][280/625] eta 0:02:36 lr 0.000028 wd 0.0500 time 0.4455 (0.4543) data time 0.0009 (0.0033) model time 0.4447 (0.4499) loss 2.4683 (2.4635) grad_norm 3.5558 (3.3518) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][290/625] eta 0:02:32 lr 0.000028 wd 0.0500 time 0.4496 (0.4541) data time 0.0006 (0.0032) model time 0.4489 (0.4499) loss 2.8571 (2.4604) grad_norm 2.6785 (3.4238) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][300/625] eta 0:02:27 lr 0.000028 wd 0.0500 time 0.4447 (0.4538) data time 0.0008 (0.0031) model time 0.4439 (0.4497) loss 2.2922 (2.4620) grad_norm 2.1139 (3.3963) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][310/625] eta 0:02:22 lr 0.000028 wd 0.0500 time 0.4480 (0.4536) data time 0.0008 (0.0030) model time 0.4472 (0.4496) loss 2.5819 (2.4574) grad_norm 2.0959 (3.3863) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][320/625] eta 0:02:18 lr 0.000028 wd 0.0500 time 0.4449 (0.4534) data time 0.0006 (0.0030) model time 0.4443 (0.4494) loss 1.8980 (2.4553) grad_norm 2.6912 (3.3722) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:57:58 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [279/300][330/625] eta 0:02:13 lr 0.000028 wd 0.0500 time 0.4506 (0.4532) data time 0.0009 (0.0029) model time 0.4497 (0.4493) loss 2.6049 (2.4537) grad_norm 2.4152 (3.3531) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][340/625] eta 0:02:09 lr 0.000028 wd 0.0500 time 0.4477 (0.4531) data time 0.0009 (0.0028) model time 0.4469 (0.4493) loss 1.4198 (2.4457) grad_norm 1.9327 (3.3367) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][350/625] eta 0:02:04 lr 0.000028 wd 0.0500 time 0.4528 (0.4530) data time 0.0007 (0.0028) model time 0.4521 (0.4493) loss 2.9203 (2.4448) grad_norm 3.2766 (3.3366) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][360/625] eta 0:02:00 lr 0.000028 wd 0.0500 time 0.4514 (0.4529) data time 0.0008 (0.0027) model time 0.4506 (0.4493) loss 2.5850 (2.4462) grad_norm 2.8116 (3.3270) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][370/625] eta 0:01:55 lr 0.000028 wd 0.0500 time 0.4499 (0.4528) data time 0.0007 (0.0027) model time 0.4492 (0.4492) loss 2.2342 (2.4479) grad_norm 5.0715 (3.3680) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][380/625] eta 0:01:50 lr 0.000027 wd 0.0500 time 0.4488 (0.4527) data time 0.0009 (0.0026) model time 0.4479 (0.4491) loss 2.6213 (2.4434) grad_norm 2.1686 (3.3512) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][390/625] eta 0:01:46 lr 0.000027 wd 0.0500 time 0.4434 (0.4525) data time 0.0007 (0.0026) model time 0.4427 (0.4490) loss 1.9004 (2.4411) grad_norm 2.5047 (3.3344) loss_scale 
128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][400/625] eta 0:01:41 lr 0.000027 wd 0.0500 time 0.4498 (0.4524) data time 0.0006 (0.0025) model time 0.4493 (0.4490) loss 2.9447 (2.4400) grad_norm 2.1887 (3.3139) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][410/625] eta 0:01:37 lr 0.000027 wd 0.0500 time 0.4490 (0.4523) data time 0.0009 (0.0025) model time 0.4481 (0.4490) loss 2.6414 (2.4360) grad_norm 2.1140 (3.3260) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][420/625] eta 0:01:32 lr 0.000027 wd 0.0500 time 0.4495 (0.4523) data time 0.0007 (0.0025) model time 0.4488 (0.4490) loss 2.5153 (2.4417) grad_norm 2.2348 (3.3148) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][430/625] eta 0:01:28 lr 0.000027 wd 0.0500 time 0.4479 (0.4522) data time 0.0008 (0.0024) model time 0.4470 (0.4490) loss 2.5765 (2.4442) grad_norm 2.5645 (3.3050) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][440/625] eta 0:01:23 lr 0.000027 wd 0.0500 time 0.4480 (0.4521) data time 0.0008 (0.0024) model time 0.4472 (0.4490) loss 2.7496 (2.4470) grad_norm 3.6024 (3.2965) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][450/625] eta 0:01:19 lr 0.000027 wd 0.0500 time 0.4464 (0.4521) data time 0.0009 (0.0024) model time 0.4455 (0.4489) loss 2.9263 (2.4470) grad_norm 2.7063 (3.2919) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:58:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][460/625] eta 0:01:14 lr 0.000027 wd 0.0500 time 0.4451 (0.4519) data time 0.0007 (0.0023) model time 
0.4444 (0.4488) loss 2.4113 (2.4482) grad_norm 3.1671 (3.2860) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][470/625] eta 0:01:10 lr 0.000027 wd 0.0500 time 0.4458 (0.4518) data time 0.0006 (0.0023) model time 0.4452 (0.4487) loss 2.4303 (2.4514) grad_norm 3.4063 (3.3046) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][480/625] eta 0:01:05 lr 0.000027 wd 0.0500 time 0.4482 (0.4517) data time 0.0009 (0.0023) model time 0.4473 (0.4487) loss 2.7972 (2.4543) grad_norm 2.3630 (3.3172) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][490/625] eta 0:01:01 lr 0.000027 wd 0.0500 time 0.4487 (0.4521) data time 0.0006 (0.0022) model time 0.4481 (0.4492) loss 1.6498 (2.4559) grad_norm 3.0548 (3.3160) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][500/625] eta 0:00:56 lr 0.000027 wd 0.0500 time 0.4451 (0.4520) data time 0.0009 (0.0022) model time 0.4442 (0.4491) loss 2.4461 (2.4558) grad_norm 2.3115 (3.3089) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][510/625] eta 0:00:51 lr 0.000027 wd 0.0500 time 0.4477 (0.4519) data time 0.0009 (0.0022) model time 0.4468 (0.4491) loss 2.4150 (2.4575) grad_norm 3.3793 (3.3192) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][520/625] eta 0:00:47 lr 0.000027 wd 0.0500 time 0.4410 (0.4518) data time 0.0008 (0.0022) model time 0.4401 (0.4490) loss 2.4837 (2.4574) grad_norm 2.0402 (3.3160) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][530/625] eta 0:00:42 lr 
0.000027 wd 0.0500 time 0.4480 (0.4517) data time 0.0009 (0.0021) model time 0.4472 (0.4489) loss 2.2803 (2.4535) grad_norm 22.3898 (3.3509) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][540/625] eta 0:00:38 lr 0.000027 wd 0.0500 time 0.4479 (0.4516) data time 0.0009 (0.0021) model time 0.4470 (0.4488) loss 2.3546 (2.4501) grad_norm 2.4879 (3.3408) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][550/625] eta 0:00:33 lr 0.000027 wd 0.0500 time 0.4465 (0.4518) data time 0.0007 (0.0021) model time 0.4458 (0.4491) loss 2.6412 (2.4514) grad_norm 19.0995 (3.3702) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][560/625] eta 0:00:29 lr 0.000027 wd 0.0500 time 0.4500 (0.4518) data time 0.0008 (0.0021) model time 0.4491 (0.4491) loss 1.9314 (2.4508) grad_norm 2.4553 (3.3636) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][570/625] eta 0:00:24 lr 0.000027 wd 0.0500 time 0.4522 (0.4517) data time 0.0006 (0.0020) model time 0.4516 (0.4491) loss 2.0091 (2.4465) grad_norm 2.6308 (3.3646) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][580/625] eta 0:00:20 lr 0.000027 wd 0.0500 time 0.4485 (0.4517) data time 0.0008 (0.0020) model time 0.4477 (0.4490) loss 2.8944 (2.4448) grad_norm 3.7688 (3.3667) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 10:59:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][590/625] eta 0:00:15 lr 0.000027 wd 0.0500 time 0.4413 (0.4516) data time 0.0006 (0.0020) model time 0.4406 (0.4490) loss 2.6697 (2.4443) grad_norm 2.9726 (3.3678) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:00 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][600/625] eta 0:00:11 lr 0.000027 wd 0.0500 time 0.4542 (0.4515) data time 0.0007 (0.0020) model time 0.4535 (0.4489) loss 2.0648 (2.4472) grad_norm 2.4270 (3.3731) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][610/625] eta 0:00:06 lr 0.000027 wd 0.0500 time 0.4414 (0.4514) data time 0.0006 (0.0020) model time 0.4408 (0.4489) loss 2.4803 (2.4435) grad_norm 2.7782 (3.3663) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [279/300][620/625] eta 0:00:02 lr 0.000027 wd 0.0500 time 0.4447 (0.4513) data time 0.0006 (0.0019) model time 0.4441 (0.4488) loss 2.8243 (2.4441) grad_norm 3.8141 (3.3579) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:10 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 279 training takes 0:04:42 [2024-08-11 11:00:10 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:00:12 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 11:00:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.484 (0.484) Loss 0.5298 (0.5298) Acc@1 88.867 (88.867) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 11:00:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.152) Loss 0.8657 (0.6323) Acc@1 80.566 (86.972) Acc@5 96.045 (97.776) Mem 16699MB [2024-08-11 11:00:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.135) Loss 0.9565 (0.7557) Acc@1 79.248 (84.096) Acc@5 95.020 (96.645) Mem 16699MB [2024-08-11 11:00:15 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.819 Acc@5 96.577 [2024-08-11 11:00:15 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.816 (0.816) Loss 0.5210 (0.5210) Acc@1 89.111 (89.111) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:00:17 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8379 (0.6213) Acc@1 81.152 (87.092) Acc@5 96.143 (97.763) Mem 16699MB [2024-08-11 11:00:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.151) Loss 0.9209 (0.7405) Acc@1 79.395 (84.282) Acc@5 95.459 (96.675) Mem 16699MB [2024-08-11 11:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.037 Acc@5 96.623 [2024-08-11 11:00:19 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 11:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][0/625] eta 0:12:55 lr 0.000027 wd 0.0500 time 1.2414 (1.2414) data time 0.6813 (0.6813) model time 0.0000 (0.0000) loss 2.5288 (2.5288) grad_norm 7.0912 (7.0912) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][10/625] eta 0:05:20 lr 0.000027 wd 0.0500 time 0.4451 (0.5210) data 
time 0.0009 (0.0627) model time 0.0000 (0.0000) loss 2.3778 (2.3481) grad_norm 2.2677 (3.3418) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][20/625] eta 0:04:54 lr 0.000027 wd 0.0500 time 0.4504 (0.4869) data time 0.0009 (0.0333) model time 0.0000 (0.0000) loss 2.6568 (2.3737) grad_norm 3.2262 (3.1006) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][30/625] eta 0:04:42 lr 0.000027 wd 0.0500 time 0.4497 (0.4749) data time 0.0008 (0.0228) model time 0.0000 (0.0000) loss 2.4785 (2.3569) grad_norm 2.9850 (2.9787) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][40/625] eta 0:04:34 lr 0.000027 wd 0.0500 time 0.4426 (0.4693) data time 0.0008 (0.0174) model time 0.0000 (0.0000) loss 2.4984 (2.3689) grad_norm 3.3524 (5.2768) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][50/625] eta 0:04:27 lr 0.000027 wd 0.0500 time 0.4481 (0.4656) data time 0.0008 (0.0142) model time 0.0000 (0.0000) loss 1.8571 (2.3407) grad_norm 3.6990 (4.8409) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][60/625] eta 0:04:21 lr 0.000027 wd 0.0500 time 0.4489 (0.4630) data time 0.0008 (0.0120) model time 0.4481 (0.4490) loss 2.5102 (2.3848) grad_norm 2.3114 (4.5428) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][70/625] eta 0:04:15 lr 0.000027 wd 0.0500 time 0.4478 (0.4610) data time 0.0008 (0.0104) model time 0.4470 (0.4486) loss 2.7068 (2.4098) grad_norm 4.1713 (4.4719) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[280/300][80/625] eta 0:04:10 lr 0.000027 wd 0.0500 time 0.4508 (0.4595) data time 0.0006 (0.0092) model time 0.4502 (0.4483) loss 2.2153 (2.3851) grad_norm 2.9560 (5.1573) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][90/625] eta 0:04:06 lr 0.000027 wd 0.0500 time 0.4478 (0.4608) data time 0.0009 (0.0083) model time 0.4469 (0.4539) loss 2.4765 (2.3912) grad_norm 4.1702 (4.9073) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][100/625] eta 0:04:01 lr 0.000027 wd 0.0500 time 0.4472 (0.4607) data time 0.0006 (0.0076) model time 0.4465 (0.4548) loss 1.6048 (2.3785) grad_norm 2.5733 (4.6730) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][110/625] eta 0:03:56 lr 0.000027 wd 0.0500 time 0.4445 (0.4595) data time 0.0007 (0.0070) model time 0.4439 (0.4535) loss 2.5758 (2.3808) grad_norm 2.9797 (4.5581) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][120/625] eta 0:03:51 lr 0.000027 wd 0.0500 time 0.4449 (0.4585) data time 0.0009 (0.0065) model time 0.4440 (0.4525) loss 2.5386 (2.3995) grad_norm 2.0909 (4.4322) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][130/625] eta 0:03:46 lr 0.000027 wd 0.0500 time 0.4448 (0.4575) data time 0.0009 (0.0060) model time 0.4439 (0.4516) loss 2.6765 (2.4086) grad_norm 1.6917 (4.3202) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][140/625] eta 0:03:41 lr 0.000027 wd 0.0500 time 0.4483 (0.4569) data time 0.0008 (0.0057) model time 0.4476 (0.4511) loss 2.2932 (2.4083) grad_norm 3.5787 (4.2669) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 11:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][150/625] eta 0:03:36 lr 0.000027 wd 0.0500 time 0.4474 (0.4563) data time 0.0007 (0.0054) model time 0.4468 (0.4507) loss 2.7396 (2.4032) grad_norm 3.2037 (4.1849) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][160/625] eta 0:03:31 lr 0.000027 wd 0.0500 time 0.4484 (0.4559) data time 0.0007 (0.0051) model time 0.4478 (0.4505) loss 2.7142 (2.4140) grad_norm 3.3441 (4.1514) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][170/625] eta 0:03:27 lr 0.000026 wd 0.0500 time 0.4469 (0.4555) data time 0.0006 (0.0048) model time 0.4463 (0.4503) loss 1.8377 (2.3942) grad_norm 3.0195 (4.4742) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][180/625] eta 0:03:22 lr 0.000026 wd 0.0500 time 0.4498 (0.4551) data time 0.0008 (0.0046) model time 0.4490 (0.4502) loss 2.5576 (2.4053) grad_norm 3.1450 (4.4494) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][190/625] eta 0:03:17 lr 0.000026 wd 0.0500 time 0.4452 (0.4547) data time 0.0006 (0.0044) model time 0.4446 (0.4498) loss 1.6317 (2.3884) grad_norm 3.1084 (4.4148) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][200/625] eta 0:03:13 lr 0.000026 wd 0.0500 time 0.4515 (0.4543) data time 0.0006 (0.0042) model time 0.4508 (0.4496) loss 2.2665 (2.3812) grad_norm 1.6813 (4.3215) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][210/625] eta 0:03:08 lr 0.000026 wd 0.0500 time 0.4521 (0.4541) data time 0.0006 (0.0041) model time 0.4515 (0.4496) loss 1.8315 
(2.3755) grad_norm 2.2551 (4.2353) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][220/625] eta 0:03:03 lr 0.000026 wd 0.0500 time 0.4521 (0.4539) data time 0.0006 (0.0039) model time 0.4515 (0.4496) loss 1.4050 (2.3641) grad_norm 2.6787 (4.1565) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][230/625] eta 0:02:59 lr 0.000026 wd 0.0500 time 0.4481 (0.4537) data time 0.0006 (0.0038) model time 0.4475 (0.4495) loss 3.0683 (2.3707) grad_norm 2.0416 (4.0976) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][240/625] eta 0:02:54 lr 0.000026 wd 0.0500 time 0.4533 (0.4536) data time 0.0008 (0.0037) model time 0.4525 (0.4495) loss 2.4205 (2.3758) grad_norm 3.3721 (4.0430) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][250/625] eta 0:02:50 lr 0.000026 wd 0.0500 time 0.4484 (0.4540) data time 0.0006 (0.0036) model time 0.4478 (0.4501) loss 2.6242 (2.3713) grad_norm 3.2072 (3.9907) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][260/625] eta 0:02:45 lr 0.000026 wd 0.0500 time 0.4474 (0.4537) data time 0.0007 (0.0035) model time 0.4467 (0.4500) loss 1.9402 (2.3644) grad_norm 2.5427 (3.9684) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][270/625] eta 0:02:40 lr 0.000026 wd 0.0500 time 0.4449 (0.4534) data time 0.0010 (0.0034) model time 0.4439 (0.4498) loss 2.3000 (2.3741) grad_norm 2.1834 (3.9599) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][280/625] eta 0:02:36 lr 0.000026 wd 0.0500 time 0.4473 
(0.4538) data time 0.0007 (0.0033) model time 0.4466 (0.4503) loss 2.6780 (2.3773) grad_norm 2.4590 (3.9264) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][290/625] eta 0:02:31 lr 0.000026 wd 0.0500 time 0.4464 (0.4536) data time 0.0007 (0.0032) model time 0.4457 (0.4502) loss 2.1812 (2.3810) grad_norm 3.3463 (3.9206) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][300/625] eta 0:02:27 lr 0.000026 wd 0.0500 time 0.4468 (0.4534) data time 0.0006 (0.0031) model time 0.4462 (0.4501) loss 2.4565 (2.3880) grad_norm 2.9354 (3.8875) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][310/625] eta 0:02:22 lr 0.000026 wd 0.0500 time 0.4489 (0.4533) data time 0.0006 (0.0030) model time 0.4482 (0.4500) loss 2.8968 (2.3895) grad_norm 1.7406 (3.8525) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][320/625] eta 0:02:18 lr 0.000026 wd 0.0500 time 0.4452 (0.4531) data time 0.0006 (0.0030) model time 0.4446 (0.4499) loss 2.4535 (2.3935) grad_norm 2.8438 (3.8252) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][330/625] eta 0:02:13 lr 0.000026 wd 0.0500 time 0.4509 (0.4530) data time 0.0008 (0.0029) model time 0.4500 (0.4499) loss 2.9239 (2.3948) grad_norm 4.6590 (3.7916) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][340/625] eta 0:02:09 lr 0.000026 wd 0.0500 time 0.4484 (0.4529) data time 0.0009 (0.0028) model time 0.4475 (0.4498) loss 1.6087 (2.3868) grad_norm 2.9765 (3.7702) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:02:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [280/300][350/625] eta 0:02:04 lr 0.000026 wd 0.0500 time 0.4492 (0.4527) data time 0.0008 (0.0028) model time 0.4483 (0.4497) loss 2.6064 (2.3918) grad_norm 4.6448 (3.8852) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][360/625] eta 0:01:59 lr 0.000026 wd 0.0500 time 0.4460 (0.4526) data time 0.0006 (0.0027) model time 0.4454 (0.4496) loss 1.8927 (2.3954) grad_norm 2.9154 (3.8531) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:03:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][370/625] eta 0:01:55 lr 0.000026 wd 0.0500 time 0.4498 (0.4526) data time 0.0008 (0.0027) model time 0.4490 (0.4496) loss 2.0590 (2.4006) grad_norm 3.3474 (3.8289) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][380/625] eta 0:01:50 lr 0.000026 wd 0.0500 time 0.4523 (0.4525) data time 0.0006 (0.0026) model time 0.4517 (0.4496) loss 2.4260 (2.4026) grad_norm 1.9655 (3.7999) loss_scale 256.0000 (130.0157) mem 16699MB [2024-08-11 11:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][390/625] eta 0:01:46 lr 0.000026 wd 0.0500 time 0.4447 (0.4524) data time 0.0006 (0.0026) model time 0.4441 (0.4496) loss 2.1643 (2.4039) grad_norm 1.7575 (3.7738) loss_scale 256.0000 (133.2379) mem 16699MB [2024-08-11 11:03:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][400/625] eta 0:01:41 lr 0.000026 wd 0.0500 time 0.4462 (0.4523) data time 0.0008 (0.0025) model time 0.4453 (0.4495) loss 2.6904 (2.4070) grad_norm 2.6443 (3.7640) loss_scale 256.0000 (136.2993) mem 16699MB [2024-08-11 11:03:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][410/625] eta 0:01:37 lr 0.000026 wd 0.0500 time 0.4492 (0.4521) data time 0.0009 (0.0025) model time 0.4483 (0.4494) loss 2.7375 (2.4069) grad_norm 3.5019 (3.7449) loss_scale 256.0000 (139.2117) mem 16699MB 
[2024-08-11 11:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][420/625] eta 0:01:32 lr 0.000026 wd 0.0500 time 0.4507 (0.4520) data time 0.0006 (0.0025) model time 0.4501 (0.4493) loss 2.2330 (2.4018) grad_norm 3.5489 (3.7551) loss_scale 256.0000 (141.9857) mem 16699MB [2024-08-11 11:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][430/625] eta 0:01:28 lr 0.000026 wd 0.0500 time 0.4510 (0.4520) data time 0.0008 (0.0024) model time 0.4502 (0.4493) loss 2.3832 (2.4002) grad_norm 2.8892 (3.7456) loss_scale 256.0000 (144.6311) mem 16699MB [2024-08-11 11:03:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][440/625] eta 0:01:23 lr 0.000026 wd 0.0500 time 0.4487 (0.4519) data time 0.0007 (0.0024) model time 0.4480 (0.4493) loss 2.3765 (2.3994) grad_norm 3.9051 (3.7436) loss_scale 256.0000 (147.1565) mem 16699MB [2024-08-11 11:03:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][450/625] eta 0:01:19 lr 0.000026 wd 0.0500 time 0.4489 (0.4518) data time 0.0009 (0.0023) model time 0.4481 (0.4492) loss 2.2679 (2.4039) grad_norm 2.5643 (3.7257) loss_scale 256.0000 (149.5698) mem 16699MB [2024-08-11 11:03:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][460/625] eta 0:01:14 lr 0.000026 wd 0.0500 time 0.4448 (0.4518) data time 0.0008 (0.0023) model time 0.4440 (0.4492) loss 2.5294 (2.4016) grad_norm 2.9655 (3.7119) loss_scale 256.0000 (151.8785) mem 16699MB [2024-08-11 11:03:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][470/625] eta 0:01:10 lr 0.000026 wd 0.0500 time 0.4459 (0.4517) data time 0.0009 (0.0023) model time 0.4450 (0.4492) loss 2.4032 (2.4032) grad_norm 2.1581 (3.6872) loss_scale 256.0000 (154.0892) mem 16699MB [2024-08-11 11:03:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][480/625] eta 0:01:05 lr 0.000026 wd 0.0500 time 0.4467 (0.4516) data time 0.0008 (0.0023) model time 0.4459 (0.4491) loss 2.6612 
(2.4081) grad_norm 2.8774 (3.6710) loss_scale 256.0000 (156.2079) mem 16699MB [2024-08-11 11:04:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][490/625] eta 0:01:00 lr 0.000026 wd 0.0500 time 0.4479 (0.4515) data time 0.0006 (0.0022) model time 0.4473 (0.4491) loss 3.0331 (2.4103) grad_norm 2.9852 (3.6680) loss_scale 256.0000 (158.2403) mem 16699MB [2024-08-11 11:04:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][500/625] eta 0:00:56 lr 0.000026 wd 0.0500 time 0.4375 (0.4519) data time 0.0010 (0.0022) model time 0.4365 (0.4495) loss 2.5333 (2.4111) grad_norm 2.9279 (3.7269) loss_scale 256.0000 (160.1916) mem 16699MB [2024-08-11 11:04:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][510/625] eta 0:00:51 lr 0.000026 wd 0.0500 time 0.4502 (0.4518) data time 0.0007 (0.0022) model time 0.4495 (0.4495) loss 2.7349 (2.4115) grad_norm 2.1931 (3.7368) loss_scale 256.0000 (162.0665) mem 16699MB [2024-08-11 11:04:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][520/625] eta 0:00:47 lr 0.000026 wd 0.0500 time 0.4510 (0.4518) data time 0.0008 (0.0021) model time 0.4502 (0.4494) loss 2.7338 (2.4122) grad_norm 2.0056 (3.7682) loss_scale 256.0000 (163.8695) mem 16699MB [2024-08-11 11:04:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][530/625] eta 0:00:42 lr 0.000026 wd 0.0500 time 0.4479 (0.4517) data time 0.0006 (0.0021) model time 0.4473 (0.4494) loss 2.1434 (2.4140) grad_norm 2.3310 (3.7505) loss_scale 256.0000 (165.6045) mem 16699MB [2024-08-11 11:04:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][540/625] eta 0:00:38 lr 0.000026 wd 0.0500 time 0.4436 (0.4516) data time 0.0007 (0.0021) model time 0.4429 (0.4493) loss 1.6839 (2.4112) grad_norm 2.6692 (3.7460) loss_scale 256.0000 (167.2754) mem 16699MB [2024-08-11 11:04:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][550/625] eta 0:00:33 lr 0.000026 wd 0.0500 time 0.4475 
(0.4516) data time 0.0007 (0.0021) model time 0.4467 (0.4493) loss 1.5176 (2.4107) grad_norm 3.2872 (3.7350) loss_scale 256.0000 (168.8857) mem 16699MB [2024-08-11 11:04:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][560/625] eta 0:00:29 lr 0.000026 wd 0.0500 time 0.4479 (0.4515) data time 0.0006 (0.0020) model time 0.4473 (0.4493) loss 1.8667 (2.4125) grad_norm 3.9165 (3.7252) loss_scale 256.0000 (170.4385) mem 16699MB [2024-08-11 11:04:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][570/625] eta 0:00:24 lr 0.000026 wd 0.0500 time 0.4476 (0.4514) data time 0.0006 (0.0020) model time 0.4471 (0.4492) loss 2.2259 (2.4146) grad_norm 4.0610 (3.7084) loss_scale 256.0000 (171.9370) mem 16699MB [2024-08-11 11:04:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][580/625] eta 0:00:20 lr 0.000026 wd 0.0500 time 0.4475 (0.4516) data time 0.0006 (0.0020) model time 0.4468 (0.4494) loss 2.1428 (2.4165) grad_norm 2.3262 (3.6946) loss_scale 256.0000 (173.3838) mem 16699MB [2024-08-11 11:04:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][590/625] eta 0:00:15 lr 0.000026 wd 0.0500 time 0.4488 (0.4515) data time 0.0009 (0.0020) model time 0.4479 (0.4494) loss 2.6364 (2.4130) grad_norm 2.4239 (3.6806) loss_scale 256.0000 (174.7817) mem 16699MB [2024-08-11 11:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][600/625] eta 0:00:11 lr 0.000026 wd 0.0500 time 0.4485 (0.4515) data time 0.0007 (0.0020) model time 0.4479 (0.4493) loss 2.1169 (2.4149) grad_norm 2.0431 (3.6605) loss_scale 256.0000 (176.1331) mem 16699MB [2024-08-11 11:04:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [280/300][610/625] eta 0:00:06 lr 0.000025 wd 0.0500 time 0.4402 (0.4514) data time 0.0005 (0.0020) model time 0.4397 (0.4493) loss 1.9499 (2.4113) grad_norm 2.7500 (3.6535) loss_scale 256.0000 (177.4403) mem 16699MB [2024-08-11 11:04:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [280/300][620/625] eta 0:00:02 lr 0.000025 wd 0.0500 time 0.4408 (0.4512) data time 0.0004 (0.0019) model time 0.4404 (0.4491) loss 1.7558 (2.4083) grad_norm 4.1337 (3.6404) loss_scale 256.0000 (178.7053) mem 16699MB [2024-08-11 11:05:01 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 280 training takes 0:04:41 [2024-08-11 11:05:01 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:05:02 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 11:05:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.515 (0.515) Loss 0.5298 (0.5298) Acc@1 88.916 (88.916) Acc@5 99.072 (99.072) Mem 16699MB [2024-08-11 11:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.155) Loss 0.8716 (0.6323) Acc@1 80.518 (86.963) Acc@5 95.898 (97.749) Mem 16699MB [2024-08-11 11:05:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.136) Loss 0.9536 (0.7549) Acc@1 79.053 (84.026) Acc@5 94.971 (96.619) Mem 16699MB [2024-08-11 11:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.771 Acc@5 96.559 [2024-08-11 11:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:05:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.902 (0.902) Loss 0.5215 (0.5215) Acc@1 89.111 (89.111) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:05:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.190) Loss 0.8398 (0.6219) Acc@1 81.104 (87.100) Acc@5 96.191 (97.772) Mem 16699MB [2024-08-11 11:05:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.154) Loss 0.9238 (0.7414) Acc@1 79.346 (84.282) Acc@5 95.459 (96.680) Mem 16699MB [2024-08-11 11:05:09 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 84.029 Acc@5 96.623 [2024-08-11 11:05:09 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 11:05:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][0/625] eta 0:13:03 lr 0.000025 wd 0.0500 time 1.2537 (1.2537) data time 0.7069 (0.7069) model time 0.0000 (0.0000) loss 2.5564 (2.5564) grad_norm 2.7280 (2.7280) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][10/625] eta 0:05:20 lr 0.000025 wd 0.0500 time 0.4443 (0.5209) data time 0.0006 (0.0650) model time 0.0000 (0.0000) loss 2.2906 (2.2447) grad_norm 2.2120 (2.5702) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][20/625] eta 0:04:54 lr 0.000025 wd 0.0500 time 0.4561 (0.4870) data time 0.0006 (0.0344) model time 0.0000 (0.0000) loss 2.3493 (2.3298) grad_norm 2.7259 (3.2139) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][30/625] eta 0:04:45 lr 0.000025 wd 0.0500 time 0.4545 (0.4800) data time 0.0008 (0.0236) model time 0.0000 (0.0000) loss 2.5246 (2.3249) grad_norm 3.0737 (3.2510) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][40/625] eta 0:04:36 lr 0.000025 wd 0.0500 time 0.4483 (0.4729) data time 0.0009 (0.0181) model time 0.0000 (0.0000) loss 2.6691 (2.4008) grad_norm 2.3540 (3.1221) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][50/625] eta 0:04:29 lr 0.000025 wd 0.0500 time 0.4510 (0.4681) data time 0.0006 (0.0147) model time 0.0000 (0.0000) loss 1.4205 (2.3853) grad_norm 2.2434 (3.3783) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:38 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [281/300][60/625] eta 0:04:22 lr 0.000025 wd 0.0500 time 0.4498 (0.4649) data time 0.0006 (0.0124) model time 0.4492 (0.4479) loss 2.7492 (2.4389) grad_norm 4.2098 (3.2610) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][70/625] eta 0:04:16 lr 0.000025 wd 0.0500 time 0.4450 (0.4629) data time 0.0009 (0.0108) model time 0.4442 (0.4488) loss 2.5920 (2.4734) grad_norm 2.5165 (3.2295) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][80/625] eta 0:04:11 lr 0.000025 wd 0.0500 time 0.4439 (0.4610) data time 0.0007 (0.0095) model time 0.4432 (0.4481) loss 2.8171 (2.5130) grad_norm 2.3731 (3.2503) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][90/625] eta 0:04:05 lr 0.000025 wd 0.0500 time 0.4451 (0.4595) data time 0.0006 (0.0086) model time 0.4445 (0.4478) loss 2.7236 (2.5075) grad_norm 3.3795 (3.3088) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:05:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][100/625] eta 0:04:00 lr 0.000025 wd 0.0500 time 0.4488 (0.4584) data time 0.0009 (0.0078) model time 0.4479 (0.4478) loss 2.6489 (2.4988) grad_norm 2.6104 (3.2400) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][110/625] eta 0:03:55 lr 0.000025 wd 0.0500 time 0.4473 (0.4575) data time 0.0006 (0.0072) model time 0.4467 (0.4477) loss 2.6288 (2.4804) grad_norm 2.9265 (3.2001) loss_scale 256.0000 (256.0000) mem 16699MB [2024-08-11 11:06:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][120/625] eta 0:03:50 lr 0.000025 wd 0.0500 time 0.4472 (0.4567) data time 0.0009 (0.0067) model time 0.4464 (0.4476) loss 1.9563 (2.4808) grad_norm 2.3365 (inf) loss_scale 
128.0000 (251.7686) mem 16699MB [2024-08-11 11:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][130/625] eta 0:03:45 lr 0.000025 wd 0.0500 time 0.4501 (0.4560) data time 0.0008 (0.0062) model time 0.4493 (0.4475) loss 2.6244 (2.4858) grad_norm 4.3590 (inf) loss_scale 128.0000 (242.3206) mem 16699MB [2024-08-11 11:06:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][140/625] eta 0:03:41 lr 0.000025 wd 0.0500 time 0.4486 (0.4558) data time 0.0009 (0.0058) model time 0.4477 (0.4480) loss 2.5903 (2.4817) grad_norm 2.2742 (inf) loss_scale 128.0000 (234.2128) mem 16699MB [2024-08-11 11:06:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][150/625] eta 0:03:36 lr 0.000025 wd 0.0500 time 0.4464 (0.4553) data time 0.0008 (0.0055) model time 0.4455 (0.4479) loss 2.9434 (2.4868) grad_norm 2.7751 (inf) loss_scale 128.0000 (227.1788) mem 16699MB [2024-08-11 11:06:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][160/625] eta 0:03:31 lr 0.000025 wd 0.0500 time 0.4479 (0.4548) data time 0.0006 (0.0052) model time 0.4473 (0.4479) loss 1.8375 (2.4734) grad_norm 2.4238 (inf) loss_scale 128.0000 (221.0186) mem 16699MB [2024-08-11 11:06:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][170/625] eta 0:03:26 lr 0.000025 wd 0.0500 time 0.4486 (0.4545) data time 0.0008 (0.0050) model time 0.4478 (0.4479) loss 1.5527 (2.4611) grad_norm 2.4811 (inf) loss_scale 128.0000 (215.5789) mem 16699MB [2024-08-11 11:06:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][180/625] eta 0:03:22 lr 0.000025 wd 0.0500 time 0.4486 (0.4553) data time 0.0006 (0.0047) model time 0.4480 (0.4495) loss 2.7517 (2.4625) grad_norm 3.0844 (inf) loss_scale 128.0000 (210.7403) mem 16699MB [2024-08-11 11:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][190/625] eta 0:03:17 lr 0.000025 wd 0.0500 time 0.4557 (0.4550) data time 0.0008 (0.0045) model time 0.4549 (0.4494) 
loss 2.6897 (2.4651) grad_norm 2.3425 (inf) loss_scale 128.0000 (206.4084) mem 16699MB [2024-08-11 11:06:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][200/625] eta 0:03:13 lr 0.000025 wd 0.0500 time 0.4459 (0.4546) data time 0.0008 (0.0044) model time 0.4451 (0.4492) loss 2.6517 (2.4649) grad_norm 2.8862 (inf) loss_scale 128.0000 (202.5075) mem 16699MB [2024-08-11 11:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][210/625] eta 0:03:08 lr 0.000025 wd 0.0500 time 0.4457 (0.4543) data time 0.0009 (0.0042) model time 0.4448 (0.4491) loss 2.6385 (2.4636) grad_norm 2.6821 (inf) loss_scale 128.0000 (198.9763) mem 16699MB [2024-08-11 11:06:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][220/625] eta 0:03:03 lr 0.000025 wd 0.0500 time 0.4470 (0.4540) data time 0.0006 (0.0040) model time 0.4464 (0.4489) loss 2.2400 (2.4594) grad_norm 3.3280 (inf) loss_scale 128.0000 (195.7647) mem 16699MB [2024-08-11 11:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][230/625] eta 0:02:59 lr 0.000025 wd 0.0500 time 0.4503 (0.4537) data time 0.0009 (0.0039) model time 0.4494 (0.4488) loss 2.6393 (2.4636) grad_norm 3.6664 (inf) loss_scale 128.0000 (192.8312) mem 16699MB [2024-08-11 11:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][240/625] eta 0:02:54 lr 0.000025 wd 0.0500 time 0.4486 (0.4535) data time 0.0010 (0.0038) model time 0.4476 (0.4487) loss 2.7899 (2.4665) grad_norm 3.2853 (inf) loss_scale 128.0000 (190.1411) mem 16699MB [2024-08-11 11:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][250/625] eta 0:02:49 lr 0.000025 wd 0.0500 time 0.4491 (0.4533) data time 0.0006 (0.0037) model time 0.4485 (0.4487) loss 1.9004 (2.4596) grad_norm 2.7783 (inf) loss_scale 128.0000 (187.6653) mem 16699MB [2024-08-11 11:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][260/625] eta 0:02:45 lr 0.000025 wd 0.0500 time 0.4477 (0.4531) 
data time 0.0006 (0.0035) model time 0.4471 (0.4487) loss 2.2347 (2.4541) grad_norm 4.0285 (inf) loss_scale 128.0000 (185.3793) mem 16699MB [2024-08-11 11:07:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][270/625] eta 0:02:40 lr 0.000025 wd 0.0500 time 0.4459 (0.4529) data time 0.0007 (0.0034) model time 0.4453 (0.4486) loss 2.5234 (2.4567) grad_norm 4.0980 (inf) loss_scale 128.0000 (183.2620) mem 16699MB [2024-08-11 11:07:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][280/625] eta 0:02:36 lr 0.000025 wd 0.0500 time 0.5698 (0.4532) data time 0.0007 (0.0034) model time 0.5691 (0.4491) loss 2.1667 (2.4536) grad_norm 2.5352 (inf) loss_scale 128.0000 (181.2954) mem 16699MB [2024-08-11 11:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][290/625] eta 0:02:31 lr 0.000025 wd 0.0500 time 0.4497 (0.4530) data time 0.0008 (0.0033) model time 0.4489 (0.4490) loss 2.6676 (2.4582) grad_norm 2.4985 (inf) loss_scale 128.0000 (179.4639) mem 16699MB [2024-08-11 11:07:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][300/625] eta 0:02:27 lr 0.000025 wd 0.0500 time 0.4523 (0.4528) data time 0.0009 (0.0032) model time 0.4514 (0.4489) loss 2.5700 (2.4639) grad_norm 2.0270 (inf) loss_scale 128.0000 (177.7542) mem 16699MB [2024-08-11 11:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][310/625] eta 0:02:22 lr 0.000025 wd 0.0500 time 0.4499 (0.4527) data time 0.0010 (0.0031) model time 0.4490 (0.4488) loss 2.1455 (2.4585) grad_norm 2.2017 (inf) loss_scale 128.0000 (176.1543) mem 16699MB [2024-08-11 11:07:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][320/625] eta 0:02:18 lr 0.000025 wd 0.0500 time 0.4514 (0.4526) data time 0.0010 (0.0030) model time 0.4505 (0.4488) loss 1.9177 (2.4647) grad_norm 3.2383 (inf) loss_scale 128.0000 (174.6542) mem 16699MB [2024-08-11 11:07:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][330/625] eta 
0:02:13 lr 0.000025 wd 0.0500 time 0.4494 (0.4525) data time 0.0007 (0.0030) model time 0.4487 (0.4488) loss 2.8346 (2.4695) grad_norm 2.1630 (inf) loss_scale 128.0000 (173.2447) mem 16699MB [2024-08-11 11:07:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][340/625] eta 0:02:08 lr 0.000025 wd 0.0500 time 0.4489 (0.4524) data time 0.0006 (0.0029) model time 0.4483 (0.4488) loss 2.8757 (2.4685) grad_norm 3.8109 (inf) loss_scale 128.0000 (171.9179) mem 16699MB [2024-08-11 11:07:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][350/625] eta 0:02:04 lr 0.000025 wd 0.0500 time 0.4498 (0.4523) data time 0.0008 (0.0029) model time 0.4490 (0.4488) loss 1.7042 (2.4612) grad_norm 2.7224 (inf) loss_scale 128.0000 (170.6667) mem 16699MB [2024-08-11 11:07:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][360/625] eta 0:01:59 lr 0.000025 wd 0.0500 time 0.4461 (0.4522) data time 0.0006 (0.0028) model time 0.4455 (0.4487) loss 2.2230 (2.4556) grad_norm 2.2161 (inf) loss_scale 128.0000 (169.4848) mem 16699MB [2024-08-11 11:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][370/625] eta 0:01:55 lr 0.000025 wd 0.0500 time 0.4465 (0.4521) data time 0.0008 (0.0027) model time 0.4457 (0.4487) loss 2.8869 (2.4581) grad_norm 2.0172 (inf) loss_scale 128.0000 (168.3666) mem 16699MB [2024-08-11 11:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][380/625] eta 0:01:50 lr 0.000025 wd 0.0500 time 0.4536 (0.4519) data time 0.0006 (0.0027) model time 0.4530 (0.4486) loss 1.4676 (2.4547) grad_norm 2.5482 (inf) loss_scale 128.0000 (167.3071) mem 16699MB [2024-08-11 11:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][390/625] eta 0:01:46 lr 0.000025 wd 0.0500 time 0.4526 (0.4519) data time 0.0014 (0.0026) model time 0.4513 (0.4486) loss 1.6931 (2.4515) grad_norm 3.3799 (inf) loss_scale 128.0000 (166.3018) mem 16699MB [2024-08-11 11:08:10 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [281/300][400/625] eta 0:01:41 lr 0.000025 wd 0.0500 time 0.4477 (0.4518) data time 0.0006 (0.0026) model time 0.4471 (0.4486) loss 2.0746 (2.4489) grad_norm 2.1397 (inf) loss_scale 128.0000 (165.3466) mem 16699MB [2024-08-11 11:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][410/625] eta 0:01:37 lr 0.000025 wd 0.0500 time 0.4500 (0.4518) data time 0.0009 (0.0026) model time 0.4492 (0.4486) loss 3.0203 (2.4508) grad_norm 3.0744 (inf) loss_scale 128.0000 (164.4380) mem 16699MB [2024-08-11 11:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][420/625] eta 0:01:32 lr 0.000025 wd 0.0500 time 0.4486 (0.4517) data time 0.0008 (0.0025) model time 0.4478 (0.4486) loss 2.9521 (2.4482) grad_norm 2.5424 (inf) loss_scale 128.0000 (163.5724) mem 16699MB [2024-08-11 11:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][430/625] eta 0:01:28 lr 0.000024 wd 0.0500 time 0.4509 (0.4516) data time 0.0008 (0.0025) model time 0.4501 (0.4485) loss 2.6806 (2.4487) grad_norm 2.9352 (inf) loss_scale 128.0000 (162.7471) mem 16699MB [2024-08-11 11:08:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][440/625] eta 0:01:23 lr 0.000024 wd 0.0500 time 0.4480 (0.4515) data time 0.0008 (0.0024) model time 0.4471 (0.4485) loss 2.7698 (2.4505) grad_norm 3.0917 (inf) loss_scale 128.0000 (161.9592) mem 16699MB [2024-08-11 11:08:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][450/625] eta 0:01:18 lr 0.000024 wd 0.0500 time 0.4462 (0.4514) data time 0.0008 (0.0024) model time 0.4454 (0.4485) loss 2.8318 (2.4463) grad_norm 2.8854 (inf) loss_scale 128.0000 (161.2062) mem 16699MB [2024-08-11 11:08:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][460/625] eta 0:01:14 lr 0.000024 wd 0.0500 time 0.4471 (0.4514) data time 0.0009 (0.0024) model time 0.4462 (0.4484) loss 2.1171 (2.4467) grad_norm 1.8430 (inf) loss_scale 128.0000 (160.4859) 
mem 16699MB [2024-08-11 11:08:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][470/625] eta 0:01:09 lr 0.000024 wd 0.0500 time 0.4462 (0.4513) data time 0.0008 (0.0023) model time 0.4454 (0.4484) loss 2.2804 (2.4473) grad_norm 3.4219 (inf) loss_scale 128.0000 (159.7962) mem 16699MB [2024-08-11 11:08:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][480/625] eta 0:01:05 lr 0.000024 wd 0.0500 time 0.4493 (0.4512) data time 0.0008 (0.0023) model time 0.4485 (0.4484) loss 2.8819 (2.4475) grad_norm 2.2406 (inf) loss_scale 128.0000 (159.1351) mem 16699MB [2024-08-11 11:08:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][490/625] eta 0:01:00 lr 0.000024 wd 0.0500 time 0.4473 (0.4512) data time 0.0008 (0.0023) model time 0.4465 (0.4484) loss 2.4514 (2.4515) grad_norm 2.6812 (inf) loss_scale 128.0000 (158.5010) mem 16699MB [2024-08-11 11:08:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][500/625] eta 0:00:56 lr 0.000024 wd 0.0500 time 0.4487 (0.4511) data time 0.0006 (0.0022) model time 0.4481 (0.4483) loss 2.3271 (2.4553) grad_norm 2.4884 (inf) loss_scale 128.0000 (157.8922) mem 16699MB [2024-08-11 11:09:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][510/625] eta 0:00:51 lr 0.000024 wd 0.0500 time 0.6605 (0.4514) data time 0.0006 (0.0022) model time 0.6599 (0.4487) loss 2.1269 (2.4561) grad_norm 2.5778 (inf) loss_scale 128.0000 (157.3072) mem 16699MB [2024-08-11 11:09:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][520/625] eta 0:00:47 lr 0.000024 wd 0.0500 time 0.4451 (0.4513) data time 0.0008 (0.0022) model time 0.4443 (0.4487) loss 2.6498 (2.4552) grad_norm 1.8479 (inf) loss_scale 128.0000 (156.7447) mem 16699MB [2024-08-11 11:09:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][530/625] eta 0:00:42 lr 0.000024 wd 0.0500 time 0.4437 (0.4512) data time 0.0008 (0.0022) model time 0.4429 (0.4486) loss 3.0602 (2.4552) 
grad_norm 2.5082 (inf) loss_scale 128.0000 (156.2034) mem 16699MB [2024-08-11 11:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][540/625] eta 0:00:38 lr 0.000024 wd 0.0500 time 0.4486 (0.4512) data time 0.0008 (0.0021) model time 0.4478 (0.4486) loss 2.3803 (2.4553) grad_norm 2.6028 (inf) loss_scale 128.0000 (155.6821) mem 16699MB [2024-08-11 11:09:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][550/625] eta 0:00:33 lr 0.000024 wd 0.0500 time 0.4475 (0.4511) data time 0.0008 (0.0021) model time 0.4467 (0.4485) loss 3.0165 (2.4582) grad_norm 3.3816 (inf) loss_scale 128.0000 (155.1797) mem 16699MB [2024-08-11 11:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][560/625] eta 0:00:29 lr 0.000024 wd 0.0500 time 0.4453 (0.4510) data time 0.0006 (0.0021) model time 0.4447 (0.4485) loss 2.3277 (2.4605) grad_norm 1.8456 (inf) loss_scale 128.0000 (154.6952) mem 16699MB [2024-08-11 11:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][570/625] eta 0:00:24 lr 0.000024 wd 0.0500 time 0.4472 (0.4509) data time 0.0009 (0.0021) model time 0.4463 (0.4484) loss 2.7910 (2.4613) grad_norm 3.2233 (inf) loss_scale 128.0000 (154.2277) mem 16699MB [2024-08-11 11:09:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][580/625] eta 0:00:20 lr 0.000024 wd 0.0500 time 0.4471 (0.4509) data time 0.0007 (0.0021) model time 0.4464 (0.4484) loss 2.4882 (2.4630) grad_norm 2.0449 (inf) loss_scale 128.0000 (153.7762) mem 16699MB [2024-08-11 11:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][590/625] eta 0:00:15 lr 0.000024 wd 0.0500 time 0.4451 (0.4508) data time 0.0008 (0.0020) model time 0.4443 (0.4483) loss 2.5909 (2.4605) grad_norm 3.6818 (inf) loss_scale 128.0000 (153.3401) mem 16699MB [2024-08-11 11:09:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][600/625] eta 0:00:11 lr 0.000024 wd 0.0500 time 0.4517 (0.4508) data time 0.0006 
(0.0020) model time 0.4511 (0.4483) loss 1.9898 (2.4601) grad_norm 2.8895 (inf) loss_scale 128.0000 (152.9185) mem 16699MB [2024-08-11 11:09:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][610/625] eta 0:00:06 lr 0.000024 wd 0.0500 time 0.4434 (0.4507) data time 0.0007 (0.0020) model time 0.4427 (0.4483) loss 2.8091 (2.4603) grad_norm 9.1521 (inf) loss_scale 128.0000 (152.5106) mem 16699MB [2024-08-11 11:09:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [281/300][620/625] eta 0:00:02 lr 0.000024 wd 0.0500 time 0.4478 (0.4509) data time 0.0004 (0.0020) model time 0.4474 (0.4485) loss 2.3473 (2.4656) grad_norm 2.1905 (inf) loss_scale 128.0000 (152.1159) mem 16699MB [2024-08-11 11:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 281 training takes 0:04:41 [2024-08-11 11:09:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:09:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 11:09:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.497 (0.497) Loss 0.5322 (0.5322) Acc@1 88.867 (88.867) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:09:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.154) Loss 0.8555 (0.6316) Acc@1 80.664 (87.029) Acc@5 96.338 (97.807) Mem 16699MB [2024-08-11 11:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9473 (0.7540) Acc@1 79.199 (84.119) Acc@5 95.215 (96.661) Mem 16699MB [2024-08-11 11:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.857 Acc@5 96.609 [2024-08-11 11:09:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 11:09:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.842 (0.842) Loss 0.5220 (0.5220) Acc@1 89.111 (89.111) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:09:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.186) Loss 0.8408 (0.6225) Acc@1 81.152 (87.105) Acc@5 96.240 (97.763) Mem 16699MB [2024-08-11 11:09:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.153) Loss 0.9248 (0.7422) Acc@1 79.297 (84.263) Acc@5 95.361 (96.668) Mem 16699MB [2024-08-11 11:10:00 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 84.015 Acc@5 96.609 [2024-08-11 11:10:00 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 11:10:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][0/625] eta 0:12:26 lr 0.000024 wd 0.0500 time 1.1943 (1.1943) data time 0.5280 (0.5280) model time 0.0000 (0.0000) loss 3.0307 (3.0307) grad_norm 2.8248 (2.8248) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][10/625] eta 0:05:16 lr 0.000024 wd 0.0500 time 0.4488 (0.5153) data 
time 0.0008 (0.0488) model time 0.0000 (0.0000) loss 2.8896 (2.4670) grad_norm 17.6555 (5.3708) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][20/625] eta 0:04:52 lr 0.000024 wd 0.0500 time 0.4482 (0.4832) data time 0.0009 (0.0260) model time 0.0000 (0.0000) loss 1.7337 (2.3580) grad_norm 2.1788 (4.2483) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][30/625] eta 0:04:40 lr 0.000024 wd 0.0500 time 0.4476 (0.4717) data time 0.0009 (0.0179) model time 0.0000 (0.0000) loss 1.8848 (2.3290) grad_norm 6.0597 (3.8500) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][40/625] eta 0:04:32 lr 0.000024 wd 0.0500 time 0.4490 (0.4666) data time 0.0009 (0.0137) model time 0.0000 (0.0000) loss 2.3536 (2.3353) grad_norm 6.6367 (3.7777) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][50/625] eta 0:04:26 lr 0.000024 wd 0.0500 time 0.4488 (0.4632) data time 0.0006 (0.0112) model time 0.0000 (0.0000) loss 1.5216 (2.3346) grad_norm 3.7660 (3.6167) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][60/625] eta 0:04:20 lr 0.000024 wd 0.0500 time 0.4505 (0.4610) data time 0.0009 (0.0095) model time 0.4495 (0.4486) loss 2.7891 (2.3656) grad_norm 2.8354 (3.4881) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][70/625] eta 0:04:14 lr 0.000024 wd 0.0500 time 0.4458 (0.4592) data time 0.0007 (0.0083) model time 0.4452 (0.4481) loss 2.4629 (2.3655) grad_norm 4.6736 (3.3998) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[282/300][80/625] eta 0:04:09 lr 0.000024 wd 0.0500 time 0.4469 (0.4577) data time 0.0008 (0.0074) model time 0.4461 (0.4474) loss 1.8134 (2.3584) grad_norm 3.0592 (3.4124) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][90/625] eta 0:04:04 lr 0.000024 wd 0.0500 time 0.4516 (0.4567) data time 0.0006 (0.0067) model time 0.4510 (0.4476) loss 2.3729 (2.3523) grad_norm 3.9156 (4.0885) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:10:41 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 11:10:41 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:10:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 11:15:10 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 11:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 11:15:22 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 11:15:36 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 11:15:36 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 11:15:39 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 11:15:41 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 11:15:41 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 282) [2024-08-11 11:15:41 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 11:16:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][100/625] eta 0:20:37 lr 0.000024 wd 0.0500 time 0.4421 (2.3563) data time 0.0008 (0.0767) model time 0.4413 (2.2795) loss 2.7640 (2.7264) grad_norm 3.1906 (2.7312) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][110/625] eta 0:12:01 lr 0.000024 wd 0.0500 time 0.4440 (1.4005) data time 0.0006 (0.0388) model time 0.4433 (1.3617) loss 2.9469 (2.6332) grad_norm 3.3509 (2.5898) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][120/625] eta 0:09:10 lr 0.000024 wd 0.0500 time 0.4435 (1.0901) data time 0.0008 (0.0261) model time 0.4427 (1.0639) loss 2.9445 (2.6376) grad_norm 2.4179 (3.1295) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][130/625] eta 0:07:42 lr 0.000024 wd 0.0500 time 0.4474 (0.9334) data time 0.0006 (0.0198) model time 0.4468 (0.9136) loss 1.9935 (2.5658) grad_norm 2.5097 (3.1283) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][140/625] eta 0:06:45 lr 0.000024 wd 0.0500 time 0.4462 (0.8359) data time 0.0009 (0.0160) model time 0.4453 (0.8199) loss 2.1867 (2.5502) grad_norm 3.4006 (3.3457) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][150/625] eta 
0:06:06 lr 0.000024 wd 0.0500 time 0.4425 (0.7708) data time 0.0007 (0.0135) model time 0.4418 (0.7574) loss 2.7019 (2.5301) grad_norm 2.9864 (3.2802) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][160/625] eta 0:05:36 lr 0.000024 wd 0.0500 time 0.4440 (0.7243) data time 0.0006 (0.0116) model time 0.4434 (0.7127) loss 1.8087 (2.5045) grad_norm 3.1650 (3.5098) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][170/625] eta 0:05:13 lr 0.000024 wd 0.0500 time 0.4448 (0.6894) data time 0.0008 (0.0103) model time 0.4440 (0.6791) loss 2.7450 (2.5060) grad_norm 2.7506 (3.7655) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][180/625] eta 0:04:54 lr 0.000024 wd 0.0500 time 0.4473 (0.6622) data time 0.0006 (0.0092) model time 0.4467 (0.6530) loss 2.8528 (2.4951) grad_norm 2.2687 (3.7272) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][190/625] eta 0:04:38 lr 0.000024 wd 0.0500 time 0.4458 (0.6405) data time 0.0008 (0.0084) model time 0.4450 (0.6321) loss 2.7592 (2.5039) grad_norm 2.2788 (3.8036) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][200/625] eta 0:04:24 lr 0.000024 wd 0.0500 time 0.4458 (0.6229) data time 0.0008 (0.0077) model time 0.4450 (0.6152) loss 2.4671 (2.5050) grad_norm 2.4305 (3.6995) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:16:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][210/625] eta 0:04:12 lr 0.000024 wd 0.0500 time 0.4483 (0.6083) data time 0.0006 (0.0071) model time 0.4477 (0.6011) loss 3.1729 (2.5165) grad_norm 2.2837 (3.6585) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:02 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][220/625] eta 0:04:01 lr 0.000024 wd 0.0500 time 0.4471 (0.5959) data time 0.0006 (0.0066) model time 0.4465 (0.5892) loss 2.6565 (2.4997) grad_norm 3.3859 (3.6375) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][230/625] eta 0:03:51 lr 0.000024 wd 0.0500 time 0.4521 (0.5854) data time 0.0006 (0.0062) model time 0.4515 (0.5792) loss 1.2806 (2.4843) grad_norm 4.5098 (3.5843) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][240/625] eta 0:03:41 lr 0.000024 wd 0.0500 time 0.4482 (0.5763) data time 0.0008 (0.0059) model time 0.4474 (0.5705) loss 2.6620 (2.4837) grad_norm 2.1428 (3.5307) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][250/625] eta 0:03:33 lr 0.000024 wd 0.0500 time 0.4483 (0.5682) data time 0.0008 (0.0055) model time 0.4476 (0.5626) loss 2.7556 (2.4877) grad_norm 4.1142 (3.5363) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][260/625] eta 0:03:24 lr 0.000024 wd 0.0500 time 0.4484 (0.5610) data time 0.0007 (0.0053) model time 0.4477 (0.5557) loss 1.8069 (2.4858) grad_norm 2.4559 (3.4916) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][270/625] eta 0:03:16 lr 0.000024 wd 0.0500 time 0.4443 (0.5546) data time 0.0006 (0.0050) model time 0.4436 (0.5496) loss 2.1040 (2.4782) grad_norm 2.1307 (3.4491) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][280/625] eta 0:03:09 lr 0.000023 wd 0.0500 time 0.4471 (0.5496) data time 0.0006 (0.0048) model time 0.4465 (0.5448) loss 2.0563 (2.4774) grad_norm 2.4986 
(3.4181) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][290/625] eta 0:03:02 lr 0.000023 wd 0.0500 time 0.4464 (0.5445) data time 0.0008 (0.0046) model time 0.4456 (0.5399) loss 2.5273 (2.4686) grad_norm 3.3869 (3.4342) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][300/625] eta 0:02:55 lr 0.000023 wd 0.0500 time 0.4516 (0.5400) data time 0.0006 (0.0044) model time 0.4510 (0.5355) loss 2.3898 (2.4619) grad_norm 2.4532 (3.4296) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][310/625] eta 0:02:48 lr 0.000023 wd 0.0500 time 0.4480 (0.5358) data time 0.0008 (0.0042) model time 0.4472 (0.5315) loss 2.2483 (2.4540) grad_norm 2.2682 (3.4245) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][320/625] eta 0:02:42 lr 0.000023 wd 0.0500 time 0.4443 (0.5320) data time 0.0008 (0.0041) model time 0.4435 (0.5279) loss 2.8816 (2.4547) grad_norm 2.5167 (3.4682) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][330/625] eta 0:02:35 lr 0.000023 wd 0.0500 time 0.4515 (0.5285) data time 0.0008 (0.0040) model time 0.4507 (0.5245) loss 2.6836 (2.4457) grad_norm 1.9986 (3.4484) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][340/625] eta 0:02:29 lr 0.000023 wd 0.0500 time 0.4468 (0.5253) data time 0.0007 (0.0038) model time 0.4461 (0.5214) loss 1.8637 (2.4410) grad_norm 2.3398 (3.4151) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][350/625] eta 0:02:23 lr 0.000023 wd 0.0500 time 0.4482 (0.5223) data time 0.0008 
(0.0037) model time 0.4474 (0.5185) loss 1.9839 (2.4378) grad_norm 3.2342 (3.4192) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:18:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][360/625] eta 0:02:17 lr 0.000023 wd 0.0500 time 0.4490 (0.5195) data time 0.0007 (0.0036) model time 0.4483 (0.5159) loss 2.7545 (2.4373) grad_norm 2.3155 (3.4010) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:18:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][370/625] eta 0:02:11 lr 0.000023 wd 0.0500 time 0.4464 (0.5170) data time 0.0009 (0.0035) model time 0.4455 (0.5135) loss 2.8024 (2.4424) grad_norm 5.6577 (nan) loss_scale 64.0000 (125.7143) mem 16695MB [2024-08-11 11:18:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][380/625] eta 0:02:06 lr 0.000023 wd 0.0500 time 0.4478 (0.5146) data time 0.0008 (0.0034) model time 0.4470 (0.5112) loss 2.2427 (2.4385) grad_norm 1.8946 (nan) loss_scale 64.0000 (123.5862) mem 16695MB [2024-08-11 11:18:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][390/625] eta 0:02:00 lr 0.000023 wd 0.0500 time 0.4468 (0.5124) data time 0.0006 (0.0033) model time 0.4462 (0.5091) loss 2.0853 (2.4296) grad_norm 3.6456 (nan) loss_scale 64.0000 (121.6000) mem 16695MB [2024-08-11 11:18:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][400/625] eta 0:01:54 lr 0.000023 wd 0.0500 time 0.4445 (0.5103) data time 0.0008 (0.0033) model time 0.4438 (0.5070) loss 2.1998 (2.4288) grad_norm 6.8896 (nan) loss_scale 64.0000 (119.7419) mem 16695MB [2024-08-11 11:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][410/625] eta 0:01:49 lr 0.000023 wd 0.0500 time 0.4459 (0.5083) data time 0.0007 (0.0032) model time 0.4452 (0.5051) loss 2.3774 (2.4357) grad_norm 2.0792 (nan) loss_scale 64.0000 (118.0000) mem 16695MB [2024-08-11 11:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][420/625] eta 0:01:43 lr 
0.000023 wd 0.0500 time 0.4505 (0.5064) data time 0.0006 (0.0031) model time 0.4498 (0.5033) loss 2.8920 (2.4387) grad_norm 3.1619 (nan) loss_scale 64.0000 (116.3636) mem 16695MB [2024-08-11 11:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][430/625] eta 0:01:38 lr 0.000023 wd 0.0500 time 0.4482 (0.5047) data time 0.0007 (0.0031) model time 0.4475 (0.5016) loss 2.3699 (2.4372) grad_norm 2.8588 (nan) loss_scale 64.0000 (114.8235) mem 16695MB [2024-08-11 11:18:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][440/625] eta 0:01:33 lr 0.000023 wd 0.0500 time 0.4425 (0.5030) data time 0.0009 (0.0030) model time 0.4416 (0.5000) loss 1.7891 (2.4397) grad_norm 1.9997 (nan) loss_scale 64.0000 (113.3714) mem 16695MB [2024-08-11 11:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][450/625] eta 0:01:27 lr 0.000023 wd 0.0500 time 0.4463 (0.5014) data time 0.0006 (0.0029) model time 0.4457 (0.4985) loss 2.7103 (2.4410) grad_norm 3.4121 (nan) loss_scale 64.0000 (112.0000) mem 16695MB [2024-08-11 11:18:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][460/625] eta 0:01:22 lr 0.000023 wd 0.0500 time 0.4464 (0.5006) data time 0.0008 (0.0029) model time 0.4456 (0.4977) loss 2.9027 (2.4396) grad_norm 4.7582 (nan) loss_scale 64.0000 (110.7027) mem 16695MB [2024-08-11 11:18:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][470/625] eta 0:01:17 lr 0.000023 wd 0.0500 time 0.4476 (0.4992) data time 0.0006 (0.0028) model time 0.4470 (0.4964) loss 2.2223 (2.4376) grad_norm 3.8258 (nan) loss_scale 64.0000 (109.4737) mem 16695MB [2024-08-11 11:18:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][480/625] eta 0:01:12 lr 0.000023 wd 0.0500 time 0.4520 (0.4978) data time 0.0009 (0.0028) model time 0.4511 (0.4951) loss 1.7144 (2.4305) grad_norm 1.9313 (nan) loss_scale 64.0000 (108.3077) mem 16695MB [2024-08-11 11:19:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [282/300][490/625] eta 0:01:07 lr 0.000023 wd 0.0500 time 0.4453 (0.4966) data time 0.0007 (0.0027) model time 0.4445 (0.4938) loss 2.6331 (2.4350) grad_norm 2.5394 (nan) loss_scale 64.0000 (107.2000) mem 16695MB [2024-08-11 11:19:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][500/625] eta 0:01:01 lr 0.000023 wd 0.0500 time 0.4496 (0.4954) data time 0.0006 (0.0027) model time 0.4490 (0.4927) loss 2.7550 (2.4386) grad_norm 2.5937 (nan) loss_scale 64.0000 (106.1463) mem 16695MB [2024-08-11 11:19:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][510/625] eta 0:00:56 lr 0.000023 wd 0.0500 time 0.4481 (0.4942) data time 0.0006 (0.0026) model time 0.4475 (0.4916) loss 2.2895 (2.4353) grad_norm 3.4766 (nan) loss_scale 64.0000 (105.1429) mem 16695MB [2024-08-11 11:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][520/625] eta 0:00:51 lr 0.000023 wd 0.0500 time 0.4480 (0.4931) data time 0.0008 (0.0026) model time 0.4471 (0.4905) loss 2.4718 (2.4369) grad_norm 2.5073 (nan) loss_scale 64.0000 (104.1860) mem 16695MB [2024-08-11 11:19:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][530/625] eta 0:00:46 lr 0.000023 wd 0.0500 time 0.4456 (0.4921) data time 0.0008 (0.0025) model time 0.4448 (0.4896) loss 2.1675 (2.4380) grad_norm 2.4566 (nan) loss_scale 64.0000 (103.2727) mem 16695MB [2024-08-11 11:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][540/625] eta 0:00:41 lr 0.000023 wd 0.0500 time 0.4503 (0.4911) data time 0.0006 (0.0025) model time 0.4497 (0.4886) loss 2.6725 (2.4391) grad_norm 2.8121 (nan) loss_scale 64.0000 (102.4000) mem 16695MB [2024-08-11 11:19:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][550/625] eta 0:00:36 lr 0.000023 wd 0.0500 time 0.4455 (0.4901) data time 0.0006 (0.0025) model time 0.4449 (0.4877) loss 2.7500 (2.4359) grad_norm 2.4931 (nan) loss_scale 64.0000 (101.5652) mem 16695MB [2024-08-11 11:19:34 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][560/625] eta 0:00:31 lr 0.000023 wd 0.0500 time 0.4469 (0.4892) data time 0.0009 (0.0024) model time 0.4461 (0.4868) loss 2.4568 (2.4301) grad_norm 2.8920 (nan) loss_scale 64.0000 (100.7660) mem 16695MB [2024-08-11 11:19:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][570/625] eta 0:00:26 lr 0.000023 wd 0.0500 time 0.4470 (0.4884) data time 0.0008 (0.0024) model time 0.4462 (0.4860) loss 2.0423 (2.4299) grad_norm 2.9262 (nan) loss_scale 64.0000 (100.0000) mem 16695MB [2024-08-11 11:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][580/625] eta 0:00:21 lr 0.000023 wd 0.0500 time 0.4451 (0.4875) data time 0.0006 (0.0024) model time 0.4445 (0.4851) loss 2.2571 (2.4319) grad_norm 2.3182 (nan) loss_scale 64.0000 (99.2653) mem 16695MB [2024-08-11 11:19:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][590/625] eta 0:00:17 lr 0.000023 wd 0.0500 time 0.4491 (0.4867) data time 0.0006 (0.0023) model time 0.4485 (0.4844) loss 1.7389 (2.4292) grad_norm 1.9592 (nan) loss_scale 64.0000 (98.5600) mem 16695MB [2024-08-11 11:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][600/625] eta 0:00:12 lr 0.000023 wd 0.0500 time 0.4521 (0.4860) data time 0.0008 (0.0023) model time 0.4513 (0.4837) loss 2.3941 (2.4336) grad_norm 3.2393 (nan) loss_scale 64.0000 (97.8824) mem 16695MB [2024-08-11 11:19:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][610/625] eta 0:00:07 lr 0.000023 wd 0.0500 time 0.4438 (0.4856) data time 0.0004 (0.0023) model time 0.4433 (0.4833) loss 2.2820 (2.4343) grad_norm 2.2867 (nan) loss_scale 64.0000 (97.2308) mem 16695MB [2024-08-11 11:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [282/300][620/625] eta 0:00:02 lr 0.000023 wd 0.0500 time 0.4478 (0.4848) data time 0.0006 (0.0022) model time 0.4472 (0.4826) loss 2.8829 (2.4292) grad_norm 2.6622 (nan) loss_scale 64.0000 
(96.6038) mem 16695MB [2024-08-11 11:20:03 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 282 training takes 0:04:18 [2024-08-11 11:20:03 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:20:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 11:20:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5356 (0.5356) Acc@1 88.721 (88.721) Acc@5 98.975 (98.975) Mem 16695MB [2024-08-11 11:20:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8643 (0.6338) Acc@1 80.518 (86.927) Acc@5 96.094 (97.732) Mem 16695MB [2024-08-11 11:20:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9355 (0.7560) Acc@1 79.297 (84.077) Acc@5 95.410 (96.652) Mem 16695MB [2024-08-11 11:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.835 Acc@5 96.605 [2024-08-11 11:20:12 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:20:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.779 (0.779) Loss 0.5229 (0.5229) Acc@1 89.062 (89.062) Acc@5 98.975 (98.975) Mem 16695MB [2024-08-11 11:20:14 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.181) Loss 0.8418 (0.6231) Acc@1 81.152 (87.078) Acc@5 96.240 (97.758) Mem 16695MB [2024-08-11 11:20:15 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.150) Loss 0.9268 (0.7430) Acc@1 79.199 (84.226) Acc@5 95.361 (96.666) Mem 16695MB [2024-08-11 11:20:16 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.983 Acc@5 96.607 [2024-08-11 11:20:16 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 84.0% [2024-08-11 11:20:16 
vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.98% [2024-08-11 11:20:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 11:20:17 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-11 11:20:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][0/625] eta 0:08:50 lr 0.000023 wd 0.0500 time 0.8490 (0.8490) data time 0.3814 (0.3814) model time 0.0000 (0.0000) loss 1.8810 (1.8810) grad_norm 2.6819 (2.6819) loss_scale 64.0000 (64.0000) mem 16704MB [2024-08-11 11:20:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][10/625] eta 0:04:56 lr 0.000023 wd 0.0500 time 0.4495 (0.4824) data time 0.0006 (0.0354) model time 0.0000 (0.0000) loss 2.0856 (2.3862) grad_norm 2.7217 (2.9165) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][20/625] eta 0:04:41 lr 0.000023 wd 0.0500 time 0.4403 (0.4658) data time 0.0007 (0.0189) model time 0.0000 (0.0000) loss 2.6410 (2.4964) grad_norm 7.0376 (2.8584) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][30/625] eta 0:04:33 lr 0.000023 wd 0.0500 time 0.4470 (0.4595) data time 0.0008 (0.0130) model time 0.0000 (0.0000) loss 2.7403 (2.5038) grad_norm 3.8363 (3.1173) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][40/625] eta 0:04:30 lr 0.000023 wd 0.0500 time 0.4469 (0.4618) data time 0.0008 (0.0101) model time 0.0000 (0.0000) loss 2.3644 (2.4799) grad_norm 2.5313 (3.0830) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][50/625] eta 0:04:23 lr 0.000023 wd 0.0500 time 0.4480 (0.4589) data 
time 0.0009 (0.0082) model time 0.0000 (0.0000) loss 2.6411 (2.4699) grad_norm 2.8168 (3.1330) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][60/625] eta 0:04:18 lr 0.000023 wd 0.0500 time 0.4479 (0.4568) data time 0.0006 (0.0070) model time 0.4473 (0.4451) loss 2.6766 (2.4863) grad_norm 4.0009 (3.1272) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][70/625] eta 0:04:12 lr 0.000023 wd 0.0500 time 0.4433 (0.4552) data time 0.0006 (0.0062) model time 0.4427 (0.4450) loss 2.8299 (2.4628) grad_norm 1.9795 (3.1079) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][80/625] eta 0:04:07 lr 0.000023 wd 0.0500 time 0.4439 (0.4542) data time 0.0008 (0.0055) model time 0.4431 (0.4453) loss 2.8279 (2.4621) grad_norm 2.7153 (3.0874) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:20:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][90/625] eta 0:04:02 lr 0.000023 wd 0.0500 time 0.4505 (0.4534) data time 0.0008 (0.0050) model time 0.4497 (0.4456) loss 2.6905 (2.4827) grad_norm 2.9415 (3.0805) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][100/625] eta 0:03:57 lr 0.000023 wd 0.0500 time 0.4471 (0.4528) data time 0.0008 (0.0046) model time 0.4464 (0.4458) loss 2.3407 (2.4738) grad_norm 2.5731 (3.1485) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][110/625] eta 0:03:52 lr 0.000023 wd 0.0500 time 0.4484 (0.4523) data time 0.0008 (0.0042) model time 0.4477 (0.4459) loss 1.9458 (2.4508) grad_norm 4.0251 (3.3054) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][120/625] eta 
0:03:48 lr 0.000023 wd 0.0500 time 0.4509 (0.4519) data time 0.0007 (0.0039) model time 0.4502 (0.4460) loss 2.4118 (2.4485) grad_norm 2.7962 (3.2664) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][130/625] eta 0:03:43 lr 0.000023 wd 0.0500 time 0.4462 (0.4515) data time 0.0006 (0.0037) model time 0.4456 (0.4460) loss 2.4191 (2.4317) grad_norm 2.8158 (3.2609) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][140/625] eta 0:03:38 lr 0.000022 wd 0.0500 time 0.4476 (0.4512) data time 0.0007 (0.0035) model time 0.4469 (0.4460) loss 2.3683 (2.4422) grad_norm 3.2259 (3.2526) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][150/625] eta 0:03:34 lr 0.000022 wd 0.0500 time 0.4493 (0.4509) data time 0.0008 (0.0033) model time 0.4485 (0.4460) loss 2.7554 (2.4428) grad_norm 2.0729 (3.2318) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][160/625] eta 0:03:29 lr 0.000022 wd 0.0500 time 0.4529 (0.4506) data time 0.0008 (0.0032) model time 0.4521 (0.4460) loss 2.2143 (2.4370) grad_norm 3.6822 (3.2860) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][170/625] eta 0:03:24 lr 0.000022 wd 0.0500 time 0.4470 (0.4504) data time 0.0006 (0.0030) model time 0.4464 (0.4460) loss 2.6760 (2.4347) grad_norm 3.3318 (3.2818) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][180/625] eta 0:03:20 lr 0.000022 wd 0.0500 time 0.4470 (0.4503) data time 0.0008 (0.0029) model time 0.4462 (0.4462) loss 2.6422 (2.4172) grad_norm 6.2592 (3.2712) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:43 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [283/300][190/625] eta 0:03:15 lr 0.000022 wd 0.0500 time 0.4459 (0.4502) data time 0.0006 (0.0028) model time 0.4453 (0.4462) loss 2.3857 (2.4182) grad_norm 2.3245 (3.2459) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][200/625] eta 0:03:11 lr 0.000022 wd 0.0500 time 0.4470 (0.4500) data time 0.0006 (0.0027) model time 0.4464 (0.4462) loss 2.2916 (2.4264) grad_norm 2.6499 (3.2428) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][210/625] eta 0:03:06 lr 0.000022 wd 0.0500 time 0.4542 (0.4499) data time 0.0007 (0.0026) model time 0.4535 (0.4463) loss 3.0568 (2.4303) grad_norm 2.9954 (3.2652) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:21:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][220/625] eta 0:03:02 lr 0.000022 wd 0.0500 time 0.4490 (0.4506) data time 0.0008 (0.0025) model time 0.4482 (0.4473) loss 1.8487 (2.4232) grad_norm 2.4120 (3.3036) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][230/625] eta 0:02:58 lr 0.000022 wd 0.0500 time 0.4462 (0.4513) data time 0.0006 (0.0024) model time 0.4456 (0.4484) loss 2.4545 (2.4301) grad_norm 2.5870 (3.2916) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][240/625] eta 0:02:53 lr 0.000022 wd 0.0500 time 0.4526 (0.4512) data time 0.0008 (0.0024) model time 0.4517 (0.4483) loss 1.8753 (2.4315) grad_norm 3.6624 (3.2896) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][250/625] eta 0:02:49 lr 0.000022 wd 0.0500 time 0.4470 (0.4511) data time 0.0009 (0.0023) model time 0.4461 (0.4483) loss 1.6400 (2.4338) grad_norm 2.1320 (3.3129) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 11:22:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][260/625] eta 0:02:44 lr 0.000022 wd 0.0500 time 0.4491 (0.4509) data time 0.0006 (0.0023) model time 0.4485 (0.4482) loss 2.3615 (2.4396) grad_norm 4.0146 (3.3209) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][270/625] eta 0:02:40 lr 0.000022 wd 0.0500 time 0.4461 (0.4508) data time 0.0006 (0.0022) model time 0.4455 (0.4482) loss 2.8757 (2.4334) grad_norm 2.6003 (3.3079) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][280/625] eta 0:02:35 lr 0.000022 wd 0.0500 time 0.4440 (0.4508) data time 0.0006 (0.0021) model time 0.4433 (0.4482) loss 2.5896 (2.4186) grad_norm 2.8880 (3.3304) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][290/625] eta 0:02:30 lr 0.000022 wd 0.0500 time 0.4461 (0.4506) data time 0.0006 (0.0021) model time 0.4455 (0.4481) loss 1.6558 (2.4145) grad_norm 2.2630 (3.3197) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][300/625] eta 0:02:26 lr 0.000022 wd 0.0500 time 0.4453 (0.4505) data time 0.0008 (0.0021) model time 0.4445 (0.4480) loss 1.9080 (2.4079) grad_norm 19.2540 (3.3602) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][310/625] eta 0:02:21 lr 0.000022 wd 0.0500 time 0.4467 (0.4504) data time 0.0006 (0.0020) model time 0.4461 (0.4479) loss 1.7026 (2.4033) grad_norm 2.6096 (3.3514) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][320/625] eta 0:02:17 lr 0.000022 wd 0.0500 time 0.4453 (0.4503) data time 0.0006 (0.0020) model time 0.4447 (0.4479) loss 
2.6639 (2.4042) grad_norm 2.6529 (3.3353) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][330/625] eta 0:02:12 lr 0.000022 wd 0.0500 time 0.4459 (0.4502) data time 0.0008 (0.0019) model time 0.4451 (0.4478) loss 2.4939 (2.4034) grad_norm 3.1750 (3.3345) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][340/625] eta 0:02:08 lr 0.000022 wd 0.0500 time 0.4449 (0.4501) data time 0.0006 (0.0019) model time 0.4442 (0.4478) loss 2.4334 (2.4061) grad_norm 3.5434 (3.3852) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][350/625] eta 0:02:03 lr 0.000022 wd 0.0500 time 0.4432 (0.4501) data time 0.0006 (0.0019) model time 0.4427 (0.4478) loss 2.2167 (2.4024) grad_norm 7.2687 (3.3942) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:22:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][360/625] eta 0:01:59 lr 0.000022 wd 0.0500 time 0.4474 (0.4500) data time 0.0008 (0.0018) model time 0.4466 (0.4478) loss 2.5343 (2.4061) grad_norm 2.2151 (3.4105) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][370/625] eta 0:01:54 lr 0.000022 wd 0.0500 time 0.4457 (0.4500) data time 0.0006 (0.0018) model time 0.4451 (0.4477) loss 2.4895 (2.4067) grad_norm 2.1781 (3.3916) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][380/625] eta 0:01:50 lr 0.000022 wd 0.0500 time 0.4448 (0.4499) data time 0.0006 (0.0018) model time 0.4442 (0.4477) loss 2.5506 (2.4035) grad_norm 4.2365 (3.4176) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][390/625] eta 0:01:45 lr 0.000022 wd 0.0500 time 0.4514 (0.4498) 
data time 0.0006 (0.0018) model time 0.4508 (0.4477) loss 2.9651 (2.4081) grad_norm 2.3248 (3.4025) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][400/625] eta 0:01:41 lr 0.000022 wd 0.0500 time 0.5828 (0.4501) data time 0.0006 (0.0017) model time 0.5822 (0.4481) loss 1.7751 (2.4086) grad_norm 2.5705 (3.4132) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][410/625] eta 0:01:36 lr 0.000022 wd 0.0500 time 0.4442 (0.4499) data time 0.0006 (0.0017) model time 0.4435 (0.4478) loss 2.5946 (2.4043) grad_norm 5.3249 (3.4154) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][420/625] eta 0:01:32 lr 0.000022 wd 0.0500 time 0.4460 (0.4498) data time 0.0008 (0.0017) model time 0.4453 (0.4478) loss 2.8078 (2.4000) grad_norm 2.1630 (3.3953) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][430/625] eta 0:01:27 lr 0.000022 wd 0.0500 time 0.4487 (0.4499) data time 0.0008 (0.0017) model time 0.4479 (0.4479) loss 2.5049 (2.4005) grad_norm 2.0551 (3.3809) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][440/625] eta 0:01:23 lr 0.000022 wd 0.0500 time 0.4458 (0.4499) data time 0.0006 (0.0017) model time 0.4452 (0.4479) loss 2.6539 (2.4054) grad_norm 2.5717 (3.3697) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][450/625] eta 0:01:18 lr 0.000022 wd 0.0500 time 0.4580 (0.4506) data time 0.0007 (0.0016) model time 0.4572 (0.4487) loss 2.8781 (2.4035) grad_norm 3.4985 (3.3626) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[283/300][460/625] eta 0:01:14 lr 0.000022 wd 0.0500 time 0.4478 (0.4505) data time 0.0006 (0.0016) model time 0.4472 (0.4487) loss 2.5909 (2.4026) grad_norm 2.5409 (3.3488) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][470/625] eta 0:01:09 lr 0.000022 wd 0.0500 time 0.4478 (0.4505) data time 0.0006 (0.0016) model time 0.4472 (0.4487) loss 1.9381 (2.4004) grad_norm 2.9590 (3.3595) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][480/625] eta 0:01:05 lr 0.000022 wd 0.0500 time 0.4480 (0.4504) data time 0.0006 (0.0016) model time 0.4474 (0.4486) loss 2.8397 (2.4083) grad_norm 3.1815 (3.3459) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:23:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][490/625] eta 0:01:00 lr 0.000022 wd 0.0500 time 0.4481 (0.4504) data time 0.0006 (0.0016) model time 0.4475 (0.4486) loss 1.6663 (2.4042) grad_norm 3.0013 (3.3529) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][500/625] eta 0:00:56 lr 0.000022 wd 0.0500 time 0.4448 (0.4504) data time 0.0006 (0.0015) model time 0.4442 (0.4486) loss 2.9388 (2.4040) grad_norm 11.1264 (3.3605) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][510/625] eta 0:00:51 lr 0.000022 wd 0.0500 time 0.4496 (0.4504) data time 0.0008 (0.0015) model time 0.4488 (0.4486) loss 2.2021 (2.4071) grad_norm 2.9035 (3.3579) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][520/625] eta 0:00:47 lr 0.000022 wd 0.0500 time 0.4530 (0.4503) data time 0.0006 (0.0015) model time 0.4524 (0.4486) loss 2.7924 (2.4102) grad_norm 2.2204 (3.3636) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 
11:24:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][530/625] eta 0:00:42 lr 0.000022 wd 0.0500 time 0.4505 (0.4503) data time 0.0007 (0.0015) model time 0.4498 (0.4486) loss 2.7430 (2.4124) grad_norm 2.4088 (3.3755) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][540/625] eta 0:00:38 lr 0.000022 wd 0.0500 time 0.4505 (0.4503) data time 0.0008 (0.0015) model time 0.4497 (0.4486) loss 2.6932 (2.4095) grad_norm 3.3958 (3.3710) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][550/625] eta 0:00:33 lr 0.000022 wd 0.0500 time 0.4488 (0.4503) data time 0.0006 (0.0015) model time 0.4482 (0.4486) loss 1.6642 (2.4096) grad_norm 2.1916 (3.3767) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][560/625] eta 0:00:29 lr 0.000022 wd 0.0500 time 0.4501 (0.4503) data time 0.0006 (0.0015) model time 0.4495 (0.4486) loss 2.2159 (2.4082) grad_norm 2.3252 (3.3724) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][570/625] eta 0:00:24 lr 0.000022 wd 0.0500 time 0.4537 (0.4503) data time 0.0006 (0.0015) model time 0.4531 (0.4487) loss 2.6111 (2.4088) grad_norm 3.2612 (3.3716) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][580/625] eta 0:00:20 lr 0.000022 wd 0.0500 time 0.4474 (0.4503) data time 0.0006 (0.0014) model time 0.4468 (0.4486) loss 3.1562 (2.4101) grad_norm 2.4044 (3.3722) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][590/625] eta 0:00:15 lr 0.000022 wd 0.0500 time 0.4473 (0.4502) data time 0.0007 (0.0014) model time 0.4466 (0.4486) loss 2.6527 (2.4112) grad_norm 3.9552 
(3.3801) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][600/625] eta 0:00:11 lr 0.000022 wd 0.0500 time 0.4465 (0.4508) data time 0.0008 (0.0014) model time 0.4458 (0.4492) loss 2.4584 (2.4111) grad_norm 4.9130 (3.3958) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][610/625] eta 0:00:06 lr 0.000022 wd 0.0500 time 0.4421 (0.4507) data time 0.0006 (0.0014) model time 0.4415 (0.4492) loss 2.1578 (2.4112) grad_norm 2.9672 (3.3879) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [283/300][620/625] eta 0:00:02 lr 0.000022 wd 0.0500 time 0.6481 (0.4509) data time 0.0006 (0.0014) model time 0.6475 (0.4494) loss 2.8266 (2.4118) grad_norm 2.7829 (3.3880) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:24:59 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 283 training takes 0:04:41 [2024-08-11 11:24:59 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:25:00 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 11:25:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.459 (0.459) Loss 0.5356 (0.5356) Acc@1 88.916 (88.916) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:25:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8618 (0.6357) Acc@1 80.566 (86.967) Acc@5 96.338 (97.794) Mem 16699MB [2024-08-11 11:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.135) Loss 0.9463 (0.7601) Acc@1 79.004 (84.043) Acc@5 95.166 (96.642) Mem 16699MB [2024-08-11 11:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.785 Acc@5 96.597 [2024-08-11 11:25:03 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:25:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.865 (0.865) Loss 0.5244 (0.5244) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:25:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.189) Loss 0.8423 (0.6237) Acc@1 81.201 (87.074) Acc@5 96.289 (97.767) Mem 16699MB [2024-08-11 11:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.154) Loss 0.9277 (0.7439) Acc@1 79.150 (84.201) Acc@5 95.361 (96.668) Mem 16699MB [2024-08-11 11:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.949 Acc@5 96.607 [2024-08-11 11:25:07 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 11:25:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][0/625] eta 0:12:36 lr 0.000022 wd 0.0500 time 1.2105 (1.2105) data time 0.5706 (0.5706) model time 0.0000 (0.0000) loss 1.5739 (1.5739) grad_norm 3.2274 (3.2274) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][10/625] eta 0:05:17 lr 0.000022 wd 0.0500 time 0.4468 (0.5166) data time 
0.0008 (0.0526) model time 0.0000 (0.0000) loss 2.7404 (2.3708) grad_norm 4.3911 (4.0484) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][20/625] eta 0:04:52 lr 0.000022 wd 0.0500 time 0.4455 (0.4837) data time 0.0008 (0.0279) model time 0.0000 (0.0000) loss 2.5633 (2.5071) grad_norm 2.7762 (3.4635) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][30/625] eta 0:04:40 lr 0.000021 wd 0.0500 time 0.4440 (0.4716) data time 0.0007 (0.0191) model time 0.0000 (0.0000) loss 2.3918 (2.5185) grad_norm 2.9119 (3.2246) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][40/625] eta 0:04:32 lr 0.000021 wd 0.0500 time 0.4445 (0.4655) data time 0.0006 (0.0147) model time 0.0000 (0.0000) loss 2.6524 (2.4871) grad_norm 1.9711 (3.0986) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][50/625] eta 0:04:25 lr 0.000021 wd 0.0500 time 0.4502 (0.4619) data time 0.0007 (0.0119) model time 0.0000 (0.0000) loss 2.4439 (2.4820) grad_norm 2.9515 (2.9832) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][60/625] eta 0:04:19 lr 0.000021 wd 0.0500 time 0.4477 (0.4596) data time 0.0008 (0.0101) model time 0.4469 (0.4472) loss 2.5809 (2.4712) grad_norm 2.4761 (2.9710) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][70/625] eta 0:04:14 lr 0.000021 wd 0.0500 time 0.4422 (0.4577) data time 0.0009 (0.0088) model time 0.4412 (0.4461) loss 2.4241 (2.4459) grad_norm 3.4991 (3.0070) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][80/625] eta 0:04:08 
lr 0.000021 wd 0.0500 time 0.4468 (0.4563) data time 0.0007 (0.0078) model time 0.4461 (0.4458) loss 2.3421 (2.4350) grad_norm 2.1381 (3.0244) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][90/625] eta 0:04:03 lr 0.000021 wd 0.0500 time 0.4463 (0.4553) data time 0.0008 (0.0071) model time 0.4455 (0.4459) loss 1.8415 (2.4159) grad_norm 3.1988 (3.0033) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][100/625] eta 0:03:58 lr 0.000021 wd 0.0500 time 0.4445 (0.4544) data time 0.0008 (0.0064) model time 0.4437 (0.4459) loss 2.7433 (2.4300) grad_norm 2.6938 (3.0146) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:25:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][110/625] eta 0:03:53 lr 0.000021 wd 0.0500 time 0.4445 (0.4537) data time 0.0008 (0.0059) model time 0.4437 (0.4459) loss 2.5713 (2.4470) grad_norm 2.6447 (2.9883) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][120/625] eta 0:03:48 lr 0.000021 wd 0.0500 time 0.4499 (0.4532) data time 0.0006 (0.0055) model time 0.4494 (0.4461) loss 2.7858 (2.4525) grad_norm 4.5296 (3.0662) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][130/625] eta 0:03:44 lr 0.000021 wd 0.0500 time 0.6710 (0.4545) data time 0.0008 (0.0051) model time 0.6703 (0.4489) loss 2.7431 (2.4574) grad_norm 3.7424 (3.1040) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][140/625] eta 0:03:40 lr 0.000021 wd 0.0500 time 0.4437 (0.4539) data time 0.0008 (0.0048) model time 0.4429 (0.4486) loss 2.7508 (2.4592) grad_norm 2.0349 (3.0726) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:15 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [284/300][150/625] eta 0:03:35 lr 0.000021 wd 0.0500 time 0.4435 (0.4535) data time 0.0006 (0.0046) model time 0.4429 (0.4483) loss 2.6047 (2.4479) grad_norm 2.6954 (3.0887) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][160/625] eta 0:03:30 lr 0.000021 wd 0.0500 time 0.3912 (0.4535) data time 0.0006 (0.0043) model time 0.3905 (0.4488) loss 2.5969 (2.4317) grad_norm 17.9025 (3.1833) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][170/625] eta 0:03:26 lr 0.000021 wd 0.0500 time 0.4501 (0.4531) data time 0.0007 (0.0041) model time 0.4494 (0.4485) loss 2.8037 (2.4308) grad_norm 2.3430 (3.1895) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][180/625] eta 0:03:21 lr 0.000021 wd 0.0500 time 0.4464 (0.4528) data time 0.0008 (0.0039) model time 0.4456 (0.4484) loss 2.8708 (2.4391) grad_norm 4.4432 (3.1962) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][190/625] eta 0:03:16 lr 0.000021 wd 0.0500 time 0.4447 (0.4525) data time 0.0006 (0.0038) model time 0.4441 (0.4482) loss 2.7963 (2.4370) grad_norm 4.9427 (3.1938) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][200/625] eta 0:03:12 lr 0.000021 wd 0.0500 time 0.4486 (0.4522) data time 0.0007 (0.0036) model time 0.4478 (0.4481) loss 2.4753 (2.4417) grad_norm 2.6229 (3.1935) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][210/625] eta 0:03:07 lr 0.000021 wd 0.0500 time 0.4524 (0.4520) data time 0.0007 (0.0035) model time 0.4517 (0.4481) loss 2.3254 (2.4461) grad_norm 3.4218 (3.1616) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 11:26:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][220/625] eta 0:03:02 lr 0.000021 wd 0.0500 time 0.4461 (0.4518) data time 0.0008 (0.0034) model time 0.4454 (0.4480) loss 2.4726 (2.4512) grad_norm 2.7664 (3.1390) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][230/625] eta 0:02:58 lr 0.000021 wd 0.0500 time 0.4476 (0.4515) data time 0.0006 (0.0033) model time 0.4470 (0.4478) loss 2.4719 (2.4499) grad_norm 6.9684 (3.1666) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:26:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][240/625] eta 0:02:53 lr 0.000021 wd 0.0500 time 0.4449 (0.4514) data time 0.0008 (0.0031) model time 0.4442 (0.4478) loss 2.4435 (2.4443) grad_norm 3.0627 (3.1533) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][250/625] eta 0:02:49 lr 0.000021 wd 0.0500 time 0.4502 (0.4512) data time 0.0007 (0.0031) model time 0.4495 (0.4477) loss 2.9319 (2.4497) grad_norm 2.2399 (3.1205) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][260/625] eta 0:02:44 lr 0.000021 wd 0.0500 time 0.4484 (0.4511) data time 0.0008 (0.0030) model time 0.4475 (0.4477) loss 2.5016 (2.4473) grad_norm 2.1949 (3.1430) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][270/625] eta 0:02:40 lr 0.000021 wd 0.0500 time 0.4485 (0.4511) data time 0.0006 (0.0029) model time 0.4479 (0.4478) loss 2.6287 (2.4438) grad_norm 8.8815 (3.1704) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][280/625] eta 0:02:35 lr 0.000021 wd 0.0500 time 0.4462 (0.4511) data time 0.0008 (0.0028) model time 0.4454 (0.4479) loss 
3.1260 (2.4448) grad_norm 2.5459 (3.2734) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][290/625] eta 0:02:31 lr 0.000021 wd 0.0500 time 0.4645 (0.4512) data time 0.0006 (0.0028) model time 0.4639 (0.4481) loss 2.3637 (2.4489) grad_norm 2.9098 (3.3185) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][300/625] eta 0:02:26 lr 0.000021 wd 0.0500 time 0.4445 (0.4511) data time 0.0006 (0.0027) model time 0.4439 (0.4481) loss 2.3381 (2.4499) grad_norm 2.1127 (3.3087) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][310/625] eta 0:02:22 lr 0.000021 wd 0.0500 time 0.4510 (0.4510) data time 0.0008 (0.0027) model time 0.4502 (0.4481) loss 2.8208 (2.4533) grad_norm 20.5269 (3.3564) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][320/625] eta 0:02:17 lr 0.000021 wd 0.0500 time 0.4526 (0.4510) data time 0.0006 (0.0026) model time 0.4520 (0.4481) loss 1.6990 (2.4447) grad_norm 2.2832 (3.3403) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][330/625] eta 0:02:13 lr 0.000021 wd 0.0500 time 0.4537 (0.4509) data time 0.0008 (0.0025) model time 0.4529 (0.4481) loss 1.6893 (2.4352) grad_norm 2.4555 (3.3193) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][340/625] eta 0:02:08 lr 0.000021 wd 0.0500 time 0.4473 (0.4509) data time 0.0008 (0.0025) model time 0.4465 (0.4481) loss 2.9183 (2.4325) grad_norm 2.4146 (3.3352) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][350/625] eta 0:02:03 lr 0.000021 wd 0.0500 time 0.4475 
(0.4509) data time 0.0008 (0.0024) model time 0.4468 (0.4481) loss 1.6510 (2.4336) grad_norm 2.1694 (3.3328) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][360/625] eta 0:01:59 lr 0.000021 wd 0.0500 time 0.4469 (0.4508) data time 0.0006 (0.0024) model time 0.4463 (0.4481) loss 2.2978 (2.4366) grad_norm 3.0134 (3.4106) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][370/625] eta 0:01:54 lr 0.000021 wd 0.0500 time 0.4518 (0.4507) data time 0.0008 (0.0024) model time 0.4510 (0.4481) loss 2.3927 (2.4371) grad_norm 2.1297 (3.4270) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:27:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][380/625] eta 0:01:50 lr 0.000021 wd 0.0500 time 0.3870 (0.4510) data time 0.0007 (0.0023) model time 0.3863 (0.4485) loss 2.8473 (2.4383) grad_norm 2.8372 (3.4187) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][390/625] eta 0:01:45 lr 0.000021 wd 0.0500 time 0.4477 (0.4509) data time 0.0007 (0.0023) model time 0.4470 (0.4485) loss 2.5818 (2.4335) grad_norm 3.6736 (3.3999) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][400/625] eta 0:01:41 lr 0.000021 wd 0.0500 time 0.4487 (0.4508) data time 0.0008 (0.0022) model time 0.4479 (0.4484) loss 1.7742 (2.4330) grad_norm 2.9113 (3.3949) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][410/625] eta 0:01:36 lr 0.000021 wd 0.0500 time 0.4484 (0.4508) data time 0.0008 (0.0022) model time 0.4476 (0.4483) loss 1.8332 (2.4256) grad_norm 18.8544 (3.4341) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[284/300][420/625] eta 0:01:32 lr 0.000021 wd 0.0500 time 0.4464 (0.4507) data time 0.0008 (0.0022) model time 0.4456 (0.4483) loss 2.3116 (2.4263) grad_norm 3.4689 (3.4247) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][430/625] eta 0:01:27 lr 0.000021 wd 0.0500 time 0.4462 (0.4505) data time 0.0006 (0.0021) model time 0.4456 (0.4482) loss 2.6037 (2.4263) grad_norm 4.4873 (3.4201) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][440/625] eta 0:01:23 lr 0.000021 wd 0.0500 time 0.4487 (0.4504) data time 0.0006 (0.0021) model time 0.4481 (0.4481) loss 2.3877 (2.4288) grad_norm 2.9936 (3.4182) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][450/625] eta 0:01:18 lr 0.000021 wd 0.0500 time 0.4477 (0.4504) data time 0.0006 (0.0021) model time 0.4471 (0.4481) loss 2.7690 (2.4336) grad_norm 3.4259 (3.4060) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][460/625] eta 0:01:14 lr 0.000021 wd 0.0500 time 0.4460 (0.4503) data time 0.0008 (0.0020) model time 0.4453 (0.4480) loss 2.1296 (2.4339) grad_norm 2.5667 (3.4102) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][470/625] eta 0:01:09 lr 0.000021 wd 0.0500 time 0.4435 (0.4507) data time 0.0006 (0.0020) model time 0.4429 (0.4485) loss 2.9126 (2.4334) grad_norm 3.5694 (3.4038) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][480/625] eta 0:01:05 lr 0.000021 wd 0.0500 time 0.4435 (0.4506) data time 0.0007 (0.0020) model time 0.4428 (0.4484) loss 2.0178 (2.4357) grad_norm 4.3813 (3.3937) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:48 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][490/625] eta 0:01:00 lr 0.000021 wd 0.0500 time 0.4458 (0.4505) data time 0.0007 (0.0020) model time 0.4450 (0.4484) loss 2.6618 (2.4290) grad_norm 189.9015 (3.7820) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][500/625] eta 0:00:56 lr 0.000021 wd 0.0500 time 0.4513 (0.4504) data time 0.0008 (0.0019) model time 0.4505 (0.4483) loss 2.5544 (2.4238) grad_norm 2.8795 (3.7610) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:28:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][510/625] eta 0:00:51 lr 0.000021 wd 0.0500 time 0.4439 (0.4504) data time 0.0006 (0.0019) model time 0.4433 (0.4483) loss 2.6628 (2.4226) grad_norm 3.5033 (3.7499) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][520/625] eta 0:00:47 lr 0.000021 wd 0.0500 time 0.4436 (0.4503) data time 0.0007 (0.0019) model time 0.4429 (0.4482) loss 2.5903 (2.4213) grad_norm 2.8961 (3.7381) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][530/625] eta 0:00:42 lr 0.000021 wd 0.0500 time 0.4473 (0.4505) data time 0.0006 (0.0019) model time 0.4468 (0.4485) loss 2.1466 (2.4210) grad_norm 2.9952 (3.7281) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][540/625] eta 0:00:38 lr 0.000021 wd 0.0500 time 0.4483 (0.4505) data time 0.0008 (0.0019) model time 0.4475 (0.4485) loss 2.9526 (2.4240) grad_norm 2.6695 (3.7131) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][550/625] eta 0:00:33 lr 0.000021 wd 0.0500 time 0.4487 (0.4504) data time 0.0007 (0.0018) model time 0.4480 (0.4485) loss 2.0042 (2.4265) grad_norm 2.9452 (3.6994) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][560/625] eta 0:00:29 lr 0.000021 wd 0.0500 time 0.4471 (0.4504) data time 0.0008 (0.0018) model time 0.4463 (0.4485) loss 2.2354 (2.4262) grad_norm 3.0833 (3.6833) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][570/625] eta 0:00:24 lr 0.000020 wd 0.0500 time 0.4454 (0.4504) data time 0.0006 (0.0018) model time 0.4448 (0.4484) loss 2.4143 (2.4281) grad_norm 3.0023 (3.6727) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][580/625] eta 0:00:20 lr 0.000020 wd 0.0500 time 0.4502 (0.4503) data time 0.0008 (0.0018) model time 0.4494 (0.4484) loss 2.4116 (2.4267) grad_norm 3.0818 (3.6576) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][590/625] eta 0:00:15 lr 0.000020 wd 0.0500 time 0.4498 (0.4503) data time 0.0008 (0.0018) model time 0.4490 (0.4484) loss 1.7860 (2.4242) grad_norm 1.9446 (3.6559) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][600/625] eta 0:00:11 lr 0.000020 wd 0.0500 time 0.4485 (0.4502) data time 0.0006 (0.0017) model time 0.4479 (0.4483) loss 2.8375 (2.4231) grad_norm 2.0801 (3.6621) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][610/625] eta 0:00:06 lr 0.000020 wd 0.0500 time 0.4370 (0.4502) data time 0.0004 (0.0017) model time 0.4366 (0.4483) loss 2.5328 (2.4256) grad_norm 2.7274 (3.6462) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [284/300][620/625] eta 0:00:02 lr 0.000020 wd 0.0500 time 0.4424 (0.4501) data time 0.0006 (0.0017) model time 
0.4419 (0.4482) loss 2.7632 (2.4250) grad_norm 3.4414 (3.6618) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:29:48 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 284 training takes 0:04:41 [2024-08-11 11:29:48 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:29:50 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 11:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.465 (0.465) Loss 0.5278 (0.5278) Acc@1 89.209 (89.209) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 11:29:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.152) Loss 0.8516 (0.6275) Acc@1 81.055 (87.012) Acc@5 96.094 (97.789) Mem 16699MB [2024-08-11 11:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9351 (0.7528) Acc@1 79.395 (84.117) Acc@5 95.312 (96.642) Mem 16699MB [2024-08-11 11:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.835 Acc@5 96.581 [2024-08-11 11:29:53 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:29:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.804 (0.804) Loss 0.5249 (0.5249) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 11:29:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.183) Loss 0.8438 (0.6243) Acc@1 81.201 (87.087) Acc@5 96.289 (97.776) Mem 16699MB [2024-08-11 11:29:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.151) Loss 0.9277 (0.7446) Acc@1 79.102 (84.189) Acc@5 95.361 (96.673) Mem 16699MB [2024-08-11 11:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.935 Acc@5 96.613 [2024-08-11 11:29:57 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 11:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][0/625] eta 0:12:29 lr 0.000020 wd 0.0500 time 1.1984 (1.1984) data time 0.5124 (0.5124) model time 0.0000 (0.0000) loss 2.1460 (2.1460) grad_norm 3.9055 (3.9055) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][10/625] eta 0:05:16 lr 0.000020 wd 0.0500 time 0.4424 (0.5154) data time 0.0006 (0.0473) model time 0.0000 (0.0000) loss 2.9361 (2.5410) grad_norm 3.6012 (3.3134) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][20/625] eta 0:04:52 lr 0.000020 wd 0.0500 time 0.4487 (0.4833) data time 0.0007 (0.0251) model time 0.0000 (0.0000) loss 1.9101 (2.4648) grad_norm 2.1566 (2.9692) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][30/625] eta 0:04:40 lr 0.000020 wd 0.0500 time 0.4474 (0.4718) data time 0.0008 (0.0173) model time 0.0000 (0.0000) loss 2.6602 (2.4481) grad_norm 4.2615 (2.9180) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][40/625] eta 0:04:32 lr 0.000020 wd 0.0500 time 0.4510 (0.4663) data time 0.0007 (0.0133) model time 0.0000 (0.0000) loss 2.6885 (2.4772) grad_norm 3.4396 (2.9358) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][50/625] eta 0:04:25 lr 0.000020 wd 0.0500 time 0.4459 (0.4623) data time 0.0007 (0.0108) model time 0.0000 (0.0000) loss 2.1911 (2.4903) grad_norm 2.3192 (2.9282) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][60/625] eta 0:04:21 lr 0.000020 wd 0.0500 time 0.4465 (0.4627) data time 0.0006 (0.0092) model 
time 0.4459 (0.4639) loss 2.1880 (2.4607) grad_norm 1.9748 (2.9501) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][70/625] eta 0:04:15 lr 0.000020 wd 0.0500 time 0.4467 (0.4605) data time 0.0006 (0.0080) model time 0.4460 (0.4553) loss 2.0138 (2.4262) grad_norm 2.9782 (2.9398) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][80/625] eta 0:04:10 lr 0.000020 wd 0.0500 time 0.4475 (0.4589) data time 0.0005 (0.0071) model time 0.4469 (0.4523) loss 2.8454 (2.4182) grad_norm 2.7585 (2.9810) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][90/625] eta 0:04:04 lr 0.000020 wd 0.0500 time 0.4508 (0.4576) data time 0.0009 (0.0064) model time 0.4499 (0.4510) loss 2.9518 (2.4136) grad_norm 2.0975 (3.0563) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][100/625] eta 0:03:59 lr 0.000020 wd 0.0500 time 0.4472 (0.4566) data time 0.0007 (0.0058) model time 0.4465 (0.4502) loss 2.7087 (2.4226) grad_norm 2.9852 (3.0182) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][110/625] eta 0:03:55 lr 0.000020 wd 0.0500 time 0.4517 (0.4573) data time 0.0009 (0.0054) model time 0.4508 (0.4522) loss 2.5926 (2.4107) grad_norm 2.3215 (2.9962) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][120/625] eta 0:03:50 lr 0.000020 wd 0.0500 time 0.4478 (0.4564) data time 0.0007 (0.0050) model time 0.4470 (0.4514) loss 2.7342 (2.4208) grad_norm 2.8179 (3.0821) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:30:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][130/625] eta 0:03:45 lr 0.000020 wd 
0.0500 time 0.4446 (0.4557) data time 0.0008 (0.0047) model time 0.4438 (0.4508) loss 1.6954 (2.4082) grad_norm 2.7842 (3.0507) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][140/625] eta 0:03:40 lr 0.000020 wd 0.0500 time 0.4445 (0.4550) data time 0.0007 (0.0044) model time 0.4438 (0.4501) loss 2.5792 (2.3984) grad_norm 2.6300 (3.0245) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][150/625] eta 0:03:36 lr 0.000020 wd 0.0500 time 0.4498 (0.4548) data time 0.0007 (0.0042) model time 0.4491 (0.4501) loss 2.5448 (2.3951) grad_norm 3.6631 (3.0075) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][160/625] eta 0:03:31 lr 0.000020 wd 0.0500 time 0.4469 (0.4543) data time 0.0006 (0.0040) model time 0.4463 (0.4498) loss 2.4311 (2.3940) grad_norm 2.5980 (3.1470) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][170/625] eta 0:03:26 lr 0.000020 wd 0.0500 time 0.4513 (0.4540) data time 0.0006 (0.0038) model time 0.4507 (0.4497) loss 2.3407 (2.3838) grad_norm 4.0004 (3.1675) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][180/625] eta 0:03:21 lr 0.000020 wd 0.0500 time 0.4445 (0.4536) data time 0.0006 (0.0036) model time 0.4439 (0.4494) loss 2.8150 (2.3829) grad_norm 2.3073 (3.2548) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][190/625] eta 0:03:17 lr 0.000020 wd 0.0500 time 0.4499 (0.4532) data time 0.0007 (0.0035) model time 0.4492 (0.4492) loss 2.4548 (2.3799) grad_norm 1.6389 (3.2335) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [285/300][200/625] eta 0:03:12 lr 0.000020 wd 0.0500 time 0.4454 (0.4529) data time 0.0005 (0.0033) model time 0.4449 (0.4490) loss 2.7004 (2.3703) grad_norm 6.2210 (3.2549) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][210/625] eta 0:03:07 lr 0.000020 wd 0.0500 time 0.4481 (0.4527) data time 0.0006 (0.0032) model time 0.4476 (0.4489) loss 1.6729 (2.3700) grad_norm 2.2978 (3.2368) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][220/625] eta 0:03:03 lr 0.000020 wd 0.0500 time 0.4445 (0.4525) data time 0.0006 (0.0031) model time 0.4439 (0.4487) loss 2.6158 (2.3767) grad_norm 2.7160 (3.2164) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][230/625] eta 0:02:58 lr 0.000020 wd 0.0500 time 0.4596 (0.4523) data time 0.0006 (0.0030) model time 0.4590 (0.4487) loss 2.0810 (2.3832) grad_norm 2.2315 (3.1898) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][240/625] eta 0:02:54 lr 0.000020 wd 0.0500 time 0.6508 (0.4529) data time 0.0009 (0.0029) model time 0.6499 (0.4496) loss 2.4607 (2.3831) grad_norm 2.8510 (3.1885) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][250/625] eta 0:02:49 lr 0.000020 wd 0.0500 time 0.4480 (0.4528) data time 0.0006 (0.0028) model time 0.4474 (0.4496) loss 1.5894 (2.3846) grad_norm 1.9469 (3.1760) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:31:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][260/625] eta 0:02:45 lr 0.000020 wd 0.0500 time 0.4419 (0.4526) data time 0.0007 (0.0027) model time 0.4412 (0.4495) loss 2.5470 (2.3912) grad_norm 1.7177 (3.1623) loss_scale 64.0000 (64.0000) mem 16699MB 
[2024-08-11 11:31:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][270/625] eta 0:02:40 lr 0.000020 wd 0.0500 time 0.4435 (0.4524) data time 0.0006 (0.0027) model time 0.4428 (0.4494) loss 2.0331 (2.3942) grad_norm 4.7034 (3.1655) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][280/625] eta 0:02:36 lr 0.000020 wd 0.0500 time 0.4477 (0.4523) data time 0.0006 (0.0026) model time 0.4471 (0.4493) loss 2.6978 (2.3927) grad_norm 2.6104 (3.1545) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][290/625] eta 0:02:31 lr 0.000020 wd 0.0500 time 0.4461 (0.4521) data time 0.0006 (0.0025) model time 0.4455 (0.4492) loss 2.6119 (2.3986) grad_norm 2.9526 (3.1316) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][300/625] eta 0:02:26 lr 0.000020 wd 0.0500 time 0.4447 (0.4519) data time 0.0006 (0.0025) model time 0.4441 (0.4491) loss 2.8466 (2.4003) grad_norm 2.5362 (3.6627) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][310/625] eta 0:02:22 lr 0.000020 wd 0.0500 time 0.4452 (0.4518) data time 0.0006 (0.0024) model time 0.4445 (0.4490) loss 3.2251 (2.4050) grad_norm 2.9198 (3.6518) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][320/625] eta 0:02:17 lr 0.000020 wd 0.0500 time 0.4465 (0.4517) data time 0.0008 (0.0024) model time 0.4458 (0.4490) loss 2.5819 (2.4031) grad_norm 2.1400 (3.6277) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][330/625] eta 0:02:13 lr 0.000020 wd 0.0500 time 0.4494 (0.4518) data time 0.0007 (0.0023) model time 0.4487 (0.4491) loss 2.7633 (2.4075) 
grad_norm 2.8382 (3.6252) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][340/625] eta 0:02:08 lr 0.000020 wd 0.0500 time 0.4437 (0.4516) data time 0.0006 (0.0023) model time 0.4431 (0.4490) loss 2.6429 (2.4085) grad_norm 2.3900 (3.6596) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:32:31 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 11:32:31 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:32:32 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 11:34:41 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 11:34:42 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 11:34:56 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 11:35:09 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 11:35:09 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 11:35:12 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 11:35:14 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 11:35:14 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 285) [2024-08-11 11:35:14 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 11:35:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][350/625] eta 0:11:18 lr 0.000020 wd 0.0500 time 0.4787 (2.4676) data time 0.0010 (0.0645) model time 0.4777 (2.4031) loss 2.5997 (2.6939) grad_norm 4.2967 (3.7017) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:35:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][360/625] eta 0:06:30 lr 0.000020 wd 0.0500 time 0.4806 (1.4736) data time 0.0009 (0.0329) model time 0.4797 (1.4408) loss 2.6341 (2.6322) grad_norm 2.3204 (7.8564) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:35:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][370/625] eta 0:04:53 lr 0.000020 wd 0.0500 time 0.4730 (1.1506) data time 0.0011 (0.0223) model time 0.4719 (1.1283) loss 2.4712 (2.6590) grad_norm 2.2075 (6.3219) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:35:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][380/625] eta 0:04:01 lr 0.000020 wd 0.0500 time 0.4723 (0.9872) data time 0.0008 (0.0170) model time 0.4714 (0.9702) loss 2.0622 (2.6008) grad_norm 4.1609 (5.5100) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][390/625] eta 0:03:27 lr 0.000020 wd 0.0500 time 0.4652 (0.8851) data time 0.0010 (0.0139) model time 0.4642 (0.8712) loss 2.4127 (2.5747) grad_norm 2.2930 (5.0064) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][400/625] eta 0:03:03 lr 
0.000020 wd 0.0500 time 0.4852 (0.8176) data time 0.0009 (0.0117) model time 0.4843 (0.8058) loss 2.4154 (2.5390) grad_norm 4.2186 (4.6894) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][410/625] eta 0:02:45 lr 0.000020 wd 0.0500 time 0.4812 (0.7696) data time 0.0009 (0.0102) model time 0.4803 (0.7594) loss 2.0112 (2.5194) grad_norm 4.3156 (4.5632) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][420/625] eta 0:02:30 lr 0.000020 wd 0.0500 time 0.4824 (0.7337) data time 0.0012 (0.0091) model time 0.4813 (0.7246) loss 2.8690 (2.5219) grad_norm 2.4666 (4.3239) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][430/625] eta 0:02:17 lr 0.000020 wd 0.0500 time 0.4800 (0.7057) data time 0.0008 (0.0082) model time 0.4792 (0.6975) loss 2.5852 (2.5053) grad_norm 2.7347 (4.2089) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][440/625] eta 0:02:06 lr 0.000020 wd 0.0500 time 0.4755 (0.6828) data time 0.0011 (0.0075) model time 0.4744 (0.6753) loss 2.6391 (2.5047) grad_norm 4.0527 (4.1124) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][450/625] eta 0:01:56 lr 0.000020 wd 0.0500 time 0.4770 (0.6642) data time 0.0010 (0.0069) model time 0.4759 (0.6573) loss 2.4212 (2.5034) grad_norm 2.8264 (4.0170) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][460/625] eta 0:01:47 lr 0.000020 wd 0.0500 time 0.4726 (0.6486) data time 0.0009 (0.0064) model time 0.4717 (0.6422) loss 2.8655 (2.5097) grad_norm 2.4534 (3.9210) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:41 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [285/300][470/625] eta 0:01:38 lr 0.000020 wd 0.0500 time 0.4837 (0.6356) data time 0.0009 (0.0060) model time 0.4828 (0.6296) loss 2.6157 (2.4982) grad_norm 2.4287 (3.8769) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][480/625] eta 0:01:30 lr 0.000020 wd 0.0500 time 0.4838 (0.6247) data time 0.0008 (0.0057) model time 0.4829 (0.6191) loss 1.5023 (2.4968) grad_norm 2.4319 (4.1130) loss_scale 64.0000 (64.0000) mem 16700MB [2024-08-11 11:36:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][490/625] eta 0:01:23 lr 0.000020 wd 0.0500 time 0.4786 (0.6153) data time 0.0011 (0.0054) model time 0.4774 (0.6099) loss 2.8217 (2.4925) grad_norm 3.2890 (4.0724) loss_scale 128.0000 (66.1333) mem 16700MB [2024-08-11 11:36:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][500/625] eta 0:01:15 lr 0.000020 wd 0.0500 time 0.4829 (0.6070) data time 0.0011 (0.0051) model time 0.4817 (0.6019) loss 2.4146 (2.4898) grad_norm 3.0480 (4.0225) loss_scale 128.0000 (70.0000) mem 16700MB [2024-08-11 11:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][510/625] eta 0:01:08 lr 0.000020 wd 0.0500 time 0.4844 (0.5996) data time 0.0008 (0.0049) model time 0.4836 (0.5948) loss 1.6092 (2.4871) grad_norm 3.9516 (3.9620) loss_scale 128.0000 (73.4118) mem 16700MB [2024-08-11 11:37:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][520/625] eta 0:01:02 lr 0.000019 wd 0.0500 time 0.4820 (0.5929) data time 0.0009 (0.0047) model time 0.4810 (0.5882) loss 1.7886 (2.4788) grad_norm 2.6901 (3.9505) loss_scale 128.0000 (76.4444) mem 16700MB [2024-08-11 11:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][530/625] eta 0:00:55 lr 0.000019 wd 0.0500 time 0.4794 (0.5876) data time 0.0008 (0.0045) model time 0.4785 (0.5832) loss 2.0888 (2.4781) grad_norm 3.1824 (4.0835) loss_scale 
128.0000 (79.1579) mem 16700MB [2024-08-11 11:37:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][540/625] eta 0:00:49 lr 0.000019 wd 0.0500 time 0.4757 (0.5822) data time 0.0011 (0.0043) model time 0.4746 (0.5779) loss 2.6997 (2.4660) grad_norm 4.2809 (4.0672) loss_scale 128.0000 (81.6000) mem 16700MB [2024-08-11 11:37:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][550/625] eta 0:00:43 lr 0.000019 wd 0.0500 time 0.4830 (0.5774) data time 0.0008 (0.0041) model time 0.4823 (0.5733) loss 2.4560 (2.4575) grad_norm 3.4167 (4.0151) loss_scale 128.0000 (83.8095) mem 16700MB [2024-08-11 11:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][560/625] eta 0:00:37 lr 0.000019 wd 0.0500 time 0.4832 (0.5731) data time 0.0010 (0.0040) model time 0.4822 (0.5691) loss 2.5134 (2.4586) grad_norm 2.9299 (3.9476) loss_scale 128.0000 (85.8182) mem 16700MB [2024-08-11 11:37:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][570/625] eta 0:00:31 lr 0.000019 wd 0.0500 time 0.4787 (0.5691) data time 0.0010 (0.0039) model time 0.4777 (0.5653) loss 2.5379 (2.4623) grad_norm 3.2468 (3.9018) loss_scale 128.0000 (87.6522) mem 16700MB [2024-08-11 11:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][580/625] eta 0:00:25 lr 0.000019 wd 0.0500 time 0.5022 (0.5656) data time 0.0011 (0.0038) model time 0.5011 (0.5619) loss 2.6555 (2.4543) grad_norm 3.2285 (3.8783) loss_scale 128.0000 (89.3333) mem 16700MB [2024-08-11 11:37:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][590/625] eta 0:00:19 lr 0.000019 wd 0.0500 time 0.4780 (0.5622) data time 0.0009 (0.0037) model time 0.4771 (0.5586) loss 1.8069 (2.4466) grad_norm 3.3185 (3.8280) loss_scale 128.0000 (90.8800) mem 16700MB [2024-08-11 11:37:42 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 11:37:42 vssm_base_ms_e300] (utils.py 118): INFO 
./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:37:48 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 11:47:09 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 11:47:10 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 11:47:23 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 11:52:34 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 11:52:35 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 11:52:45 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 11:53:00 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 11:53:00 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 11:53:02 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 11:53:04 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 11:53:04 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 285) [2024-08-11 11:53:04 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 11:53:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][600/625] eta 0:02:09 lr 0.000019 wd 0.0500 time 0.4416 (5.1600) data time 0.0006 (0.2864) model time 0.4410 (4.8736) loss 3.0641 (2.7233) grad_norm 3.1165 (4.8184) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:53:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][610/625] eta 0:00:26 lr 0.000019 wd 0.0500 time 0.4388 (1.7905) data time 0.0004 (0.0827) model time 0.4384 (1.7078) loss 2.4779 (2.5328) grad_norm 17.2962 (4.6589) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:53:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [285/300][620/625] eta 0:00:06 lr 0.000019 wd 0.0500 time 0.4383 (1.2275) data time 0.0006 (0.0485) model time 0.4377 (1.1791) loss 2.3943 (2.5869) grad_norm 2.5524 (3.8968) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 11:53:38 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 285 training takes 0:00:31 [2024-08-11 11:53:38 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:53:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 11:53:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.463 (0.463) Loss 0.5396 (0.5396) Acc@1 88.770 (88.770) Acc@5 98.828 (98.828) Mem 16695MB [2024-08-11 11:53:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.114 (0.151) Loss 0.8613 (0.6320) Acc@1 80.811 (87.056) Acc@5 96.289 (97.763) Mem 16695MB [2024-08-11 11:53:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.144) Loss 0.9365 (0.7564) Acc@1 79.443 (84.066) Acc@5 95.166 (96.633) Mem 16695MB [2024-08-11 11:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.805 Acc@5 96.561 [2024-08-11 11:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:53:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.821 (0.821) Loss 0.5249 (0.5249) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16695MB [2024-08-11 11:53:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.209) Loss 0.8447 (0.6248) Acc@1 81.152 (87.087) Acc@5 96.289 (97.781) Mem 16695MB [2024-08-11 11:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.114 (0.164) Loss 0.9282 (0.7453) Acc@1 79.102 (84.189) Acc@5 95.459 (96.670) Mem 16695MB [2024-08-11 11:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.939 Acc@5 96.605 [2024-08-11 11:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 11:53:51 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.94% [2024-08-11 11:53:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 11:53:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-11 11:53:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][0/625] eta 0:08:55 lr 0.000019 wd 0.0500 time 0.8566 (0.8566) data time 0.3732 (0.3732) model time 0.0000 (0.0000) loss 2.8092 (2.8092) grad_norm 2.9563 (2.9563) loss_scale 128.0000 (128.0000) mem 16704MB [2024-08-11 11:54:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][10/625] eta 0:04:55 lr 0.000019 wd 0.0500 time 0.4447 (0.4808) data time 0.0008 (0.0348) model time 0.0000 (0.0000) loss 2.6912 (2.5420) grad_norm 7.3001 (3.4390) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][20/625] eta 0:04:40 lr 0.000019 wd 0.0500 time 0.4381 (0.4630) data time 0.0008 (0.0187) model time 0.0000 (0.0000) loss 2.3626 (2.4802) grad_norm 3.4757 (3.3607) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][30/625] eta 0:04:32 lr 0.000019 wd 0.0500 time 0.4438 (0.4572) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 1.5748 (2.4562) grad_norm 5.8743 (3.4118) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][40/625] eta 0:04:25 lr 0.000019 wd 0.0500 time 0.4470 (0.4544) data time 0.0008 (0.0100) model time 0.0000 (0.0000) loss 2.6986 (2.4329) grad_norm 2.4139 (3.2892) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][50/625] eta 0:04:20 lr 0.000019 wd 0.0500 time 0.4385 (0.4524) data time 0.0008 (0.0082) model time 0.0000 (0.0000) loss 2.7561 (2.4229) grad_norm 2.7606 (3.1706) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][60/625] eta 0:04:14 lr 0.000019 wd 0.0500 time 0.4430 (0.4510) data time 0.0006 (0.0070) model time 0.4424 (0.4431) loss 2.2292 (2.4140) 
grad_norm 3.1946 (3.1440) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][70/625] eta 0:04:09 lr 0.000019 wd 0.0500 time 0.4436 (0.4501) data time 0.0007 (0.0061) model time 0.4428 (0.4435) loss 2.5466 (2.4347) grad_norm 3.0102 (3.2062) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][80/625] eta 0:04:04 lr 0.000019 wd 0.0500 time 0.4431 (0.4495) data time 0.0007 (0.0055) model time 0.4425 (0.4438) loss 2.7252 (2.4463) grad_norm 2.6542 (3.1683) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][90/625] eta 0:04:00 lr 0.000019 wd 0.0500 time 0.4453 (0.4490) data time 0.0008 (0.0050) model time 0.4444 (0.4438) loss 2.9181 (2.4565) grad_norm 2.8905 (3.1558) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][100/625] eta 0:03:55 lr 0.000019 wd 0.0500 time 0.4453 (0.4487) data time 0.0007 (0.0046) model time 0.4446 (0.4440) loss 2.3936 (2.4480) grad_norm 3.1456 (3.2235) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][110/625] eta 0:03:50 lr 0.000019 wd 0.0500 time 0.4475 (0.4484) data time 0.0006 (0.0042) model time 0.4469 (0.4441) loss 2.7514 (2.4506) grad_norm 2.8907 (3.2100) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][120/625] eta 0:03:46 lr 0.000019 wd 0.0500 time 0.4424 (0.4482) data time 0.0006 (0.0040) model time 0.4418 (0.4442) loss 2.4607 (2.4408) grad_norm 3.2625 (3.4355) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:54:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][130/625] eta 0:03:41 lr 0.000019 wd 0.0500 time 0.4455 (0.4479) data 
time 0.0006 (0.0037) model time 0.4448 (0.4442) loss 2.9203 (2.4373) grad_norm 12.7778 (3.4668) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][140/625] eta 0:03:37 lr 0.000019 wd 0.0500 time 0.4420 (0.4478) data time 0.0009 (0.0035) model time 0.4411 (0.4443) loss 2.7730 (2.4504) grad_norm 3.2595 (3.4375) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][150/625] eta 0:03:32 lr 0.000019 wd 0.0500 time 0.4471 (0.4476) data time 0.0007 (0.0034) model time 0.4464 (0.4443) loss 2.4896 (2.4447) grad_norm 2.2610 (3.4350) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][160/625] eta 0:03:28 lr 0.000019 wd 0.0500 time 0.4447 (0.4475) data time 0.0012 (0.0032) model time 0.4435 (0.4444) loss 2.9270 (2.4523) grad_norm 9.3269 (3.4436) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][170/625] eta 0:03:23 lr 0.000019 wd 0.0500 time 0.4442 (0.4474) data time 0.0009 (0.0031) model time 0.4433 (0.4444) loss 1.9623 (2.4364) grad_norm 2.2103 (3.4046) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][180/625] eta 0:03:19 lr 0.000019 wd 0.0500 time 0.4474 (0.4474) data time 0.0007 (0.0030) model time 0.4468 (0.4445) loss 2.5713 (2.4349) grad_norm 4.4959 (3.4126) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][190/625] eta 0:03:14 lr 0.000019 wd 0.0500 time 0.4467 (0.4481) data time 0.0006 (0.0029) model time 0.4461 (0.4457) loss 3.0417 (2.4334) grad_norm 4.5401 (3.4040) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[286/300][200/625] eta 0:03:10 lr 0.000019 wd 0.0500 time 0.4418 (0.4481) data time 0.0006 (0.0028) model time 0.4413 (0.4458) loss 2.0171 (2.4340) grad_norm 3.0910 (3.4911) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][210/625] eta 0:03:05 lr 0.000019 wd 0.0500 time 0.4456 (0.4480) data time 0.0006 (0.0027) model time 0.4450 (0.4457) loss 1.5630 (2.4271) grad_norm 2.9898 (3.4539) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][220/625] eta 0:03:01 lr 0.000019 wd 0.0500 time 0.4449 (0.4479) data time 0.0006 (0.0026) model time 0.4443 (0.4456) loss 2.5229 (2.4258) grad_norm 6.9047 (3.4456) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][230/625] eta 0:02:56 lr 0.000019 wd 0.0500 time 0.4470 (0.4478) data time 0.0008 (0.0025) model time 0.4462 (0.4456) loss 2.6756 (2.4191) grad_norm 2.5280 (3.4416) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][240/625] eta 0:02:52 lr 0.000019 wd 0.0500 time 0.4447 (0.4477) data time 0.0006 (0.0024) model time 0.4440 (0.4456) loss 1.6923 (2.4095) grad_norm 2.9677 (3.4336) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][250/625] eta 0:02:47 lr 0.000019 wd 0.0500 time 0.4410 (0.4477) data time 0.0008 (0.0024) model time 0.4401 (0.4455) loss 2.6751 (2.4159) grad_norm 2.4140 (3.4140) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:55:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][260/625] eta 0:02:43 lr 0.000019 wd 0.0500 time 0.4465 (0.4476) data time 0.0008 (0.0023) model time 0.4458 (0.4456) loss 2.4358 (2.4177) grad_norm 2.9550 (3.4362) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 11:55:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][270/625] eta 0:02:38 lr 0.000019 wd 0.0500 time 0.4455 (0.4476) data time 0.0007 (0.0023) model time 0.4448 (0.4456) loss 2.8981 (2.4068) grad_norm 7.7913 (3.4347) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][280/625] eta 0:02:34 lr 0.000019 wd 0.0500 time 0.4460 (0.4482) data time 0.0008 (0.0022) model time 0.4452 (0.4463) loss 2.8552 (2.4028) grad_norm 7.4085 (3.4386) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][290/625] eta 0:02:30 lr 0.000019 wd 0.0500 time 0.4461 (0.4480) data time 0.0007 (0.0022) model time 0.4455 (0.4462) loss 2.7726 (2.4141) grad_norm 2.5984 (3.4830) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][300/625] eta 0:02:25 lr 0.000019 wd 0.0500 time 0.4429 (0.4479) data time 0.0008 (0.0021) model time 0.4421 (0.4461) loss 1.7337 (2.4175) grad_norm 2.3235 (3.4628) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][310/625] eta 0:02:21 lr 0.000019 wd 0.0500 time 0.4438 (0.4478) data time 0.0008 (0.0021) model time 0.4430 (0.4460) loss 2.1350 (2.4197) grad_norm 2.4818 (3.5355) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][320/625] eta 0:02:16 lr 0.000019 wd 0.0500 time 0.4440 (0.4477) data time 0.0006 (0.0021) model time 0.4435 (0.4460) loss 2.4883 (2.4246) grad_norm 2.5618 (3.6279) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][330/625] eta 0:02:12 lr 0.000019 wd 0.0500 time 0.4439 (0.4477) data time 0.0007 (0.0020) model time 0.4433 (0.4459) loss 2.3859 
(2.4226) grad_norm 3.6294 (3.6108) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][340/625] eta 0:02:07 lr 0.000019 wd 0.0500 time 0.4436 (0.4480) data time 0.0008 (0.0020) model time 0.4428 (0.4464) loss 2.4670 (2.4217) grad_norm 2.5107 (3.5981) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][350/625] eta 0:02:03 lr 0.000019 wd 0.0500 time 0.4445 (0.4479) data time 0.0006 (0.0020) model time 0.4439 (0.4462) loss 2.5794 (2.4220) grad_norm 3.6946 (3.6057) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][360/625] eta 0:01:58 lr 0.000019 wd 0.0500 time 0.4419 (0.4478) data time 0.0006 (0.0019) model time 0.4413 (0.4461) loss 2.6151 (2.4172) grad_norm 3.8592 (3.6764) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][370/625] eta 0:01:54 lr 0.000019 wd 0.0500 time 0.4467 (0.4477) data time 0.0007 (0.0019) model time 0.4460 (0.4461) loss 2.7189 (2.4201) grad_norm 2.3858 (3.6561) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][380/625] eta 0:01:49 lr 0.000019 wd 0.0500 time 0.4471 (0.4476) data time 0.0009 (0.0019) model time 0.4462 (0.4460) loss 2.5098 (2.4246) grad_norm 4.7454 (3.6357) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][390/625] eta 0:01:45 lr 0.000019 wd 0.0500 time 0.4468 (0.4476) data time 0.0006 (0.0018) model time 0.4461 (0.4460) loss 1.6775 (2.4238) grad_norm 2.0620 (3.6116) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:56:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][400/625] eta 0:01:40 lr 0.000019 wd 0.0500 time 0.4455 
(0.4476) data time 0.0006 (0.0018) model time 0.4448 (0.4460) loss 2.2794 (2.4246) grad_norm 5.2827 (3.5897) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][410/625] eta 0:01:36 lr 0.000019 wd 0.0500 time 0.4470 (0.4480) data time 0.0009 (0.0018) model time 0.4461 (0.4465) loss 2.2407 (2.4278) grad_norm 3.6045 (3.5904) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][420/625] eta 0:01:31 lr 0.000019 wd 0.0500 time 0.4460 (0.4480) data time 0.0008 (0.0018) model time 0.4452 (0.4465) loss 2.4661 (2.4283) grad_norm 2.7064 (3.5694) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][430/625] eta 0:01:27 lr 0.000019 wd 0.0500 time 0.4459 (0.4479) data time 0.0008 (0.0018) model time 0.4450 (0.4464) loss 2.1949 (2.4241) grad_norm 3.8304 (3.5629) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][440/625] eta 0:01:22 lr 0.000019 wd 0.0500 time 0.4351 (0.4479) data time 0.0008 (0.0017) model time 0.4343 (0.4464) loss 1.3866 (2.4181) grad_norm 3.2450 (3.5696) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][450/625] eta 0:01:18 lr 0.000019 wd 0.0500 time 0.4422 (0.4478) data time 0.0009 (0.0017) model time 0.4414 (0.4463) loss 2.4742 (2.4183) grad_norm 2.9802 (3.5623) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][460/625] eta 0:01:13 lr 0.000019 wd 0.0500 time 0.4468 (0.4478) data time 0.0009 (0.0017) model time 0.4459 (0.4463) loss 2.4417 (2.4212) grad_norm 2.9611 (3.5486) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [286/300][470/625] eta 0:01:09 lr 0.000019 wd 0.0500 time 0.4454 (0.4479) data time 0.0006 (0.0017) model time 0.4448 (0.4465) loss 2.4120 (2.4204) grad_norm 2.5185 (3.5539) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][480/625] eta 0:01:04 lr 0.000019 wd 0.0500 time 0.4475 (0.4479) data time 0.0007 (0.0017) model time 0.4468 (0.4465) loss 2.7719 (2.4210) grad_norm 3.0628 (3.5428) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][490/625] eta 0:01:00 lr 0.000019 wd 0.0500 time 0.4434 (0.4478) data time 0.0008 (0.0016) model time 0.4426 (0.4464) loss 1.9491 (2.4244) grad_norm 2.7383 (3.5404) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][500/625] eta 0:00:55 lr 0.000019 wd 0.0500 time 0.4455 (0.4478) data time 0.0006 (0.0016) model time 0.4449 (0.4463) loss 2.7145 (2.4190) grad_norm 2.3516 (3.5394) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][510/625] eta 0:00:51 lr 0.000018 wd 0.0500 time 0.4458 (0.4477) data time 0.0008 (0.0016) model time 0.4450 (0.4463) loss 2.6859 (2.4184) grad_norm 2.1978 (3.5278) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][520/625] eta 0:00:47 lr 0.000018 wd 0.0500 time 0.4443 (0.4477) data time 0.0006 (0.0016) model time 0.4437 (0.4463) loss 3.1143 (2.4200) grad_norm 2.9604 (3.5183) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:57:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][530/625] eta 0:00:42 lr 0.000018 wd 0.0500 time 0.4436 (0.4477) data time 0.0007 (0.0016) model time 0.4429 (0.4463) loss 2.6167 (2.4229) grad_norm 1.9720 (3.5112) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 11:57:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][540/625] eta 0:00:38 lr 0.000018 wd 0.0500 time 0.4470 (0.4477) data time 0.0008 (0.0016) model time 0.4462 (0.4463) loss 2.4417 (2.4256) grad_norm 2.2969 (3.5029) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][550/625] eta 0:00:33 lr 0.000018 wd 0.0500 time 0.4453 (0.4477) data time 0.0006 (0.0016) model time 0.4446 (0.4463) loss 2.7120 (2.4262) grad_norm 2.5022 (3.5445) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][560/625] eta 0:00:29 lr 0.000018 wd 0.0500 time 0.4498 (0.4480) data time 0.0008 (0.0015) model time 0.4489 (0.4467) loss 1.9386 (2.4282) grad_norm 3.6420 (3.5806) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][570/625] eta 0:00:24 lr 0.000018 wd 0.0500 time 0.4411 (0.4480) data time 0.0008 (0.0015) model time 0.4403 (0.4467) loss 2.2849 (2.4291) grad_norm 3.1543 (3.5728) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][580/625] eta 0:00:20 lr 0.000018 wd 0.0500 time 0.4481 (0.4479) data time 0.0006 (0.0015) model time 0.4475 (0.4466) loss 2.7937 (2.4278) grad_norm 3.1823 (3.5624) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][590/625] eta 0:00:15 lr 0.000018 wd 0.0500 time 0.4465 (0.4479) data time 0.0007 (0.0015) model time 0.4458 (0.4466) loss 2.5475 (2.4294) grad_norm 1.9942 (3.5597) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][600/625] eta 0:00:11 lr 0.000018 wd 0.0500 time 0.4485 (0.4479) data time 0.0009 (0.0015) model time 0.4476 (0.4466) loss 2.4184 
(2.4303) grad_norm 5.4014 (3.5754) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][610/625] eta 0:00:06 lr 0.000018 wd 0.0500 time 0.4415 (0.4479) data time 0.0004 (0.0015) model time 0.4411 (0.4466) loss 2.6736 (2.4321) grad_norm 2.3826 (3.5858) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 11:58:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [286/300][620/625] eta 0:00:02 lr 0.000018 wd 0.0500 time 0.4416 (0.4477) data time 0.0004 (0.0015) model time 0.4413 (0.4464) loss 2.2466 (2.4275) grad_norm 2.3262 (nan) loss_scale 64.0000 (127.2786) mem 16699MB [2024-08-11 11:58:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 286 training takes 0:04:39 [2024-08-11 11:58:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 11:58:38 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 11:58:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.471 (0.471) Loss 0.5361 (0.5361) Acc@1 88.818 (88.818) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 11:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8511 (0.6330) Acc@1 80.957 (86.950) Acc@5 96.387 (97.763) Mem 16699MB [2024-08-11 11:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9297 (0.7565) Acc@1 79.492 (84.061) Acc@5 95.361 (96.631) Mem 16699MB [2024-08-11 11:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.805 Acc@5 96.563 [2024-08-11 11:58:41 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 11:58:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.794 (0.794) Loss 0.5259 (0.5259) Acc@1 89.014 (89.014) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 11:58:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.184) Loss 0.8452 (0.6251) Acc@1 81.152 (87.092) Acc@5 96.289 (97.798) Mem 16699MB [2024-08-11 11:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.151) Loss 0.9287 (0.7461) Acc@1 79.102 (84.180) Acc@5 95.410 (96.666) Mem 16699MB [2024-08-11 11:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.925 Acc@5 96.601 [2024-08-11 11:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 11:58:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][0/625] eta 0:13:26 lr 0.000018 wd 0.0500 time 1.2900 (1.2900) data time 0.6284 (0.6284) model time 0.0000 (0.0000) loss 2.4786 (2.4786) grad_norm 2.1840 (2.1840) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:58:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][10/625] eta 0:05:20 lr 0.000018 wd 0.0500 time 0.4434 (0.5206) data time 
0.0009 (0.0579) model time 0.0000 (0.0000) loss 2.5910 (2.2661) grad_norm 2.1796 (2.8613) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][20/625] eta 0:04:57 lr 0.000018 wd 0.0500 time 0.4462 (0.4917) data time 0.0006 (0.0307) model time 0.0000 (0.0000) loss 2.5485 (2.4021) grad_norm 25.4163 (3.8653) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][30/625] eta 0:04:43 lr 0.000018 wd 0.0500 time 0.4450 (0.4769) data time 0.0006 (0.0211) model time 0.0000 (0.0000) loss 2.5714 (2.4124) grad_norm 2.3554 (3.5488) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][40/625] eta 0:04:34 lr 0.000018 wd 0.0500 time 0.4459 (0.4696) data time 0.0008 (0.0162) model time 0.0000 (0.0000) loss 2.8914 (2.4060) grad_norm 3.0293 (3.5098) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][50/625] eta 0:04:27 lr 0.000018 wd 0.0500 time 0.4425 (0.4646) data time 0.0009 (0.0132) model time 0.0000 (0.0000) loss 2.5626 (2.3892) grad_norm 2.7251 (3.3792) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][60/625] eta 0:04:20 lr 0.000018 wd 0.0500 time 0.4526 (0.4618) data time 0.0006 (0.0111) model time 0.4520 (0.4466) loss 1.7790 (2.3524) grad_norm 1.9569 (3.3460) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][70/625] eta 0:04:14 lr 0.000018 wd 0.0500 time 0.4423 (0.4594) data time 0.0006 (0.0097) model time 0.4416 (0.4454) loss 2.2077 (2.3551) grad_norm 3.7722 (3.3199) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][80/625] eta 0:04:09 
lr 0.000018 wd 0.0500 time 0.4432 (0.4575) data time 0.0009 (0.0086) model time 0.4423 (0.4445) loss 2.4342 (2.3860) grad_norm 2.0181 (3.2585) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][90/625] eta 0:04:03 lr 0.000018 wd 0.0500 time 0.4447 (0.4560) data time 0.0006 (0.0078) model time 0.4441 (0.4443) loss 1.6200 (2.3975) grad_norm 4.5948 (3.2343) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][100/625] eta 0:03:58 lr 0.000018 wd 0.0500 time 0.4426 (0.4550) data time 0.0009 (0.0071) model time 0.4418 (0.4443) loss 1.6144 (2.3902) grad_norm 3.6711 (3.2206) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][110/625] eta 0:03:54 lr 0.000018 wd 0.0500 time 0.4442 (0.4546) data time 0.0008 (0.0065) model time 0.4434 (0.4451) loss 2.6087 (2.4110) grad_norm 3.5205 (3.2032) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][120/625] eta 0:03:50 lr 0.000018 wd 0.0500 time 0.4331 (0.4557) data time 0.0009 (0.0061) model time 0.4322 (0.4483) loss 2.5271 (2.4240) grad_norm 3.1145 (3.1888) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][130/625] eta 0:03:45 lr 0.000018 wd 0.0500 time 0.4469 (0.4549) data time 0.0007 (0.0057) model time 0.4462 (0.4479) loss 2.1507 (2.4276) grad_norm 2.2201 (3.1953) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][140/625] eta 0:03:40 lr 0.000018 wd 0.0500 time 0.4475 (0.4542) data time 0.0009 (0.0053) model time 0.4466 (0.4475) loss 2.7294 (2.4324) grad_norm 2.6855 (3.1610) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:53 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [287/300][150/625] eta 0:03:35 lr 0.000018 wd 0.0500 time 0.4420 (0.4536) data time 0.0008 (0.0050) model time 0.4412 (0.4471) loss 2.2814 (2.4188) grad_norm 3.3332 (3.1507) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 11:59:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][160/625] eta 0:03:30 lr 0.000018 wd 0.0500 time 0.4432 (0.4531) data time 0.0008 (0.0048) model time 0.4424 (0.4469) loss 2.9059 (2.4049) grad_norm 2.2467 (3.1757) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][170/625] eta 0:03:25 lr 0.000018 wd 0.0500 time 0.4498 (0.4527) data time 0.0009 (0.0045) model time 0.4489 (0.4468) loss 2.7418 (2.4074) grad_norm 3.7825 (3.1710) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][180/625] eta 0:03:21 lr 0.000018 wd 0.0500 time 0.4438 (0.4523) data time 0.0006 (0.0044) model time 0.4432 (0.4466) loss 1.9979 (2.3928) grad_norm 2.7062 (3.1728) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][190/625] eta 0:03:16 lr 0.000018 wd 0.0500 time 0.4439 (0.4519) data time 0.0007 (0.0042) model time 0.4432 (0.4464) loss 1.7160 (2.3838) grad_norm 2.3401 (3.1368) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][200/625] eta 0:03:11 lr 0.000018 wd 0.0500 time 0.4445 (0.4516) data time 0.0010 (0.0040) model time 0.4436 (0.4463) loss 2.5194 (2.3828) grad_norm 3.2004 (3.1282) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][210/625] eta 0:03:07 lr 0.000018 wd 0.0500 time 0.4478 (0.4514) data time 0.0008 (0.0039) model time 0.4470 (0.4463) loss 2.5903 (2.3845) grad_norm 2.5195 (3.1457) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 12:00:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][220/625] eta 0:03:02 lr 0.000018 wd 0.0500 time 0.4430 (0.4511) data time 0.0009 (0.0037) model time 0.4421 (0.4462) loss 2.4974 (2.3911) grad_norm 3.5745 (3.1739) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][230/625] eta 0:02:58 lr 0.000018 wd 0.0500 time 0.4428 (0.4509) data time 0.0006 (0.0036) model time 0.4422 (0.4461) loss 2.4367 (2.3843) grad_norm 2.6950 (3.1814) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][240/625] eta 0:02:53 lr 0.000018 wd 0.0500 time 0.4473 (0.4507) data time 0.0008 (0.0035) model time 0.4465 (0.4461) loss 2.6286 (2.3838) grad_norm 2.3673 (3.2658) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][250/625] eta 0:02:48 lr 0.000018 wd 0.0500 time 0.4483 (0.4505) data time 0.0006 (0.0034) model time 0.4476 (0.4461) loss 2.1382 (2.3838) grad_norm 2.7128 (3.2661) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][260/625] eta 0:02:44 lr 0.000018 wd 0.0500 time 0.4492 (0.4504) data time 0.0006 (0.0033) model time 0.4485 (0.4461) loss 2.0287 (2.3787) grad_norm 2.1028 (3.4011) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][270/625] eta 0:02:39 lr 0.000018 wd 0.0500 time 0.4479 (0.4503) data time 0.0006 (0.0032) model time 0.4472 (0.4461) loss 2.7661 (2.3929) grad_norm 2.3814 (3.4162) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][280/625] eta 0:02:35 lr 0.000018 wd 0.0500 time 0.4494 (0.4502) data time 0.0008 (0.0031) model time 0.4486 (0.4461) loss 
2.2031 (2.3975) grad_norm 2.9472 (3.4269) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:00:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][290/625] eta 0:02:30 lr 0.000018 wd 0.0500 time 0.4472 (0.4500) data time 0.0009 (0.0030) model time 0.4464 (0.4460) loss 1.7490 (2.3903) grad_norm 4.2075 (3.4618) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][300/625] eta 0:02:26 lr 0.000018 wd 0.0500 time 0.4456 (0.4499) data time 0.0006 (0.0030) model time 0.4450 (0.4461) loss 1.6559 (2.3825) grad_norm 2.4267 (3.4476) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][310/625] eta 0:02:21 lr 0.000018 wd 0.0500 time 0.4499 (0.4498) data time 0.0006 (0.0029) model time 0.4493 (0.4460) loss 2.2971 (2.3846) grad_norm 2.3801 (3.4547) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][320/625] eta 0:02:17 lr 0.000018 wd 0.0500 time 0.4490 (0.4497) data time 0.0008 (0.0028) model time 0.4482 (0.4460) loss 2.5076 (2.3898) grad_norm 3.5111 (3.4343) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][330/625] eta 0:02:12 lr 0.000018 wd 0.0500 time 0.4448 (0.4501) data time 0.0008 (0.0028) model time 0.4440 (0.4465) loss 2.7696 (2.3886) grad_norm 2.2905 (3.4260) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][340/625] eta 0:02:08 lr 0.000018 wd 0.0500 time 0.4483 (0.4500) data time 0.0008 (0.0027) model time 0.4475 (0.4465) loss 2.2547 (2.3871) grad_norm 2.1588 (3.4087) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][350/625] eta 0:02:03 lr 0.000018 wd 0.0500 time 0.4471 (0.4500) 
data time 0.0006 (0.0027) model time 0.4465 (0.4466) loss 2.3326 (2.3871) grad_norm 2.6133 (3.4169) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][360/625] eta 0:01:59 lr 0.000018 wd 0.0500 time 0.4497 (0.4504) data time 0.0007 (0.0026) model time 0.4490 (0.4472) loss 2.5898 (2.3912) grad_norm 2.4689 (3.4330) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][370/625] eta 0:01:54 lr 0.000018 wd 0.0500 time 0.4444 (0.4503) data time 0.0008 (0.0026) model time 0.4436 (0.4471) loss 2.1426 (2.3858) grad_norm 3.6217 (3.4236) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][380/625] eta 0:01:50 lr 0.000018 wd 0.0500 time 0.4462 (0.4503) data time 0.0006 (0.0025) model time 0.4455 (0.4472) loss 2.5247 (2.3828) grad_norm 5.1149 (3.4111) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][390/625] eta 0:01:45 lr 0.000018 wd 0.0500 time 0.4491 (0.4502) data time 0.0008 (0.0025) model time 0.4483 (0.4471) loss 1.7134 (2.3884) grad_norm 2.3682 (3.3980) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][400/625] eta 0:01:41 lr 0.000018 wd 0.0500 time 0.4464 (0.4500) data time 0.0008 (0.0024) model time 0.4456 (0.4470) loss 3.0348 (2.3910) grad_norm 3.1887 (3.4037) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][410/625] eta 0:01:36 lr 0.000018 wd 0.0500 time 0.4452 (0.4499) data time 0.0009 (0.0024) model time 0.4443 (0.4469) loss 2.6036 (2.3961) grad_norm 2.2560 (3.3824) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[287/300][420/625] eta 0:01:32 lr 0.000018 wd 0.0500 time 0.4432 (0.4498) data time 0.0008 (0.0024) model time 0.4424 (0.4469) loss 2.7469 (2.3919) grad_norm 3.1728 (3.3749) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][430/625] eta 0:01:27 lr 0.000018 wd 0.0500 time 0.4467 (0.4497) data time 0.0008 (0.0023) model time 0.4459 (0.4468) loss 2.1447 (2.3942) grad_norm 3.7148 (3.3743) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][440/625] eta 0:01:23 lr 0.000018 wd 0.0500 time 0.4476 (0.4496) data time 0.0008 (0.0023) model time 0.4468 (0.4468) loss 2.3314 (2.3919) grad_norm 2.4433 (3.3972) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][450/625] eta 0:01:18 lr 0.000018 wd 0.0500 time 0.4441 (0.4500) data time 0.0008 (0.0023) model time 0.4433 (0.4472) loss 2.7333 (2.3930) grad_norm 3.9440 (3.3897) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][460/625] eta 0:01:14 lr 0.000018 wd 0.0500 time 0.4441 (0.4499) data time 0.0008 (0.0022) model time 0.4433 (0.4472) loss 2.4067 (2.3952) grad_norm 2.5024 (3.3713) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][470/625] eta 0:01:09 lr 0.000018 wd 0.0500 time 0.4453 (0.4501) data time 0.0008 (0.0022) model time 0.4445 (0.4474) loss 2.0042 (2.3959) grad_norm 3.2397 (3.4153) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][480/625] eta 0:01:05 lr 0.000018 wd 0.0500 time 0.4459 (0.4500) data time 0.0006 (0.0022) model time 0.4453 (0.4474) loss 2.4572 (2.3969) grad_norm 4.0335 (3.4078) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:26 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][490/625] eta 0:01:00 lr 0.000018 wd 0.0500 time 0.4450 (0.4499) data time 0.0008 (0.0022) model time 0.4441 (0.4473) loss 2.5984 (2.4003) grad_norm 2.7637 (3.4189) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][500/625] eta 0:00:56 lr 0.000018 wd 0.0500 time 0.4476 (0.4499) data time 0.0007 (0.0021) model time 0.4470 (0.4473) loss 1.8035 (2.3993) grad_norm 3.2716 (3.4208) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][510/625] eta 0:00:51 lr 0.000018 wd 0.0500 time 0.4430 (0.4498) data time 0.0006 (0.0021) model time 0.4423 (0.4473) loss 1.5159 (2.3951) grad_norm 4.1942 (3.4084) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][520/625] eta 0:00:47 lr 0.000018 wd 0.0500 time 0.4436 (0.4497) data time 0.0006 (0.0021) model time 0.4430 (0.4472) loss 2.8054 (2.3995) grad_norm 2.7420 (3.4769) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][530/625] eta 0:00:42 lr 0.000018 wd 0.0500 time 0.4439 (0.4496) data time 0.0008 (0.0021) model time 0.4431 (0.4472) loss 2.6658 (2.4036) grad_norm 2.5300 (3.4670) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][540/625] eta 0:00:38 lr 0.000017 wd 0.0500 time 0.4442 (0.4496) data time 0.0008 (0.0020) model time 0.4433 (0.4471) loss 2.5130 (2.4056) grad_norm 3.0658 (3.4595) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][550/625] eta 0:00:33 lr 0.000017 wd 0.0500 time 0.4517 (0.4496) data time 0.0008 (0.0020) model time 0.4509 (0.4472) loss 2.5014 (2.4084) grad_norm 2.4028 (3.4498) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][560/625] eta 0:00:29 lr 0.000017 wd 0.0500 time 0.4463 (0.4496) data time 0.0008 (0.0020) model time 0.4455 (0.4472) loss 2.5368 (2.4093) grad_norm 3.1816 (3.4428) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][570/625] eta 0:00:24 lr 0.000017 wd 0.0500 time 0.4471 (0.4495) data time 0.0009 (0.0020) model time 0.4462 (0.4471) loss 2.4249 (2.4072) grad_norm 3.2547 (3.5719) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][580/625] eta 0:00:20 lr 0.000017 wd 0.0500 time 0.4475 (0.4494) data time 0.0008 (0.0020) model time 0.4467 (0.4471) loss 2.4239 (2.4053) grad_norm 2.0010 (3.5588) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][590/625] eta 0:00:15 lr 0.000017 wd 0.0500 time 0.4459 (0.4494) data time 0.0008 (0.0019) model time 0.4451 (0.4471) loss 2.4052 (2.4075) grad_norm 2.4529 (3.5452) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][600/625] eta 0:00:11 lr 0.000017 wd 0.0500 time 0.4467 (0.4493) data time 0.0008 (0.0019) model time 0.4458 (0.4470) loss 2.5009 (2.4051) grad_norm 3.0681 (3.5342) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][610/625] eta 0:00:06 lr 0.000017 wd 0.0500 time 0.4396 (0.4492) data time 0.0006 (0.0019) model time 0.4390 (0.4469) loss 2.6046 (2.4061) grad_norm 3.3102 (3.5446) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [287/300][620/625] eta 0:00:02 lr 0.000017 wd 0.0500 time 0.4383 (0.4491) data time 0.0004 (0.0019) model time 
0.4380 (0.4468) loss 2.6520 (2.4076) grad_norm 2.5111 (3.5353) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 287 training takes 0:04:40 [2024-08-11 12:03:26 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:03:27 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 12:03:27 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.472 (0.472) Loss 0.5337 (0.5337) Acc@1 89.062 (89.062) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 12:03:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.153) Loss 0.8608 (0.6322) Acc@1 80.713 (86.994) Acc@5 96.387 (97.785) Mem 16699MB [2024-08-11 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.136) Loss 0.9399 (0.7558) Acc@1 79.199 (84.070) Acc@5 95.508 (96.663) Mem 16699MB [2024-08-11 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.815 Acc@5 96.591 [2024-08-11 12:03:30 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 12:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.813 (0.813) Loss 0.5269 (0.5269) Acc@1 89.014 (89.014) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 12:03:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.184) Loss 0.8462 (0.6257) Acc@1 81.152 (87.087) Acc@5 96.387 (97.807) Mem 16699MB [2024-08-11 12:03:33 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.152) Loss 0.9297 (0.7467) Acc@1 79.297 (84.189) Acc@5 95.410 (96.677) Mem 16699MB [2024-08-11 12:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.941 Acc@5 96.613 [2024-08-11 12:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 12:03:34 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.94% [2024-08-11 12:03:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 12:03:35 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-11 12:03:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][0/625] eta 0:08:02 lr 0.000017 wd 0.0500 time 0.7723 (0.7723) data time 0.3713 (0.3713) model time 0.0000 (0.0000) loss 2.1072 (2.1072) grad_norm 2.4587 (2.4587) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][10/625] eta 0:04:51 lr 0.000017 wd 0.0500 time 0.4412 (0.4744) data time 0.0008 (0.0346) model time 0.0000 (0.0000) loss 2.3451 (2.5437) grad_norm 2.8315 (3.2578) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][20/625] eta 0:04:38 lr 0.000017 wd 0.0500 time 0.4434 (0.4603) data time 0.0008 (0.0185) model time 0.0000 (0.0000) loss 2.7443 (2.5161) grad_norm 2.5155 (3.0994) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][30/625] eta 0:04:38 lr 0.000017 wd 0.0500 time 0.4451 (0.4683) data time 0.0006 (0.0128) model time 0.0000 (0.0000) loss 2.0856 (2.4762) grad_norm 2.9983 (3.2601) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][40/625] eta 0:04:31 lr 0.000017 wd 0.0500 time 0.4454 (0.4634) data time 0.0008 (0.0099) model time 0.0000 (0.0000) loss 2.5268 (2.3934) grad_norm 2.7602 (3.1848) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[288/300][50/625] eta 0:04:24 lr 0.000017 wd 0.0500 time 0.4466 (0.4599) data time 0.0009 (0.0081) model time 0.0000 (0.0000) loss 2.7121 (2.3869) grad_norm 2.4834 (3.3953) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][60/625] eta 0:04:18 lr 0.000017 wd 0.0500 time 0.4444 (0.4579) data time 0.0006 (0.0069) model time 0.4438 (0.4472) loss 2.7059 (2.3926) grad_norm 2.7307 (3.4071) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][70/625] eta 0:04:14 lr 0.000017 wd 0.0500 time 0.4444 (0.4578) data time 0.0006 (0.0061) model time 0.4438 (0.4516) loss 2.6905 (2.4014) grad_norm 2.3314 (3.3198) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][80/625] eta 0:04:09 lr 0.000017 wd 0.0500 time 0.4450 (0.4579) data time 0.0008 (0.0054) model time 0.4442 (0.4537) loss 2.1667 (2.4222) grad_norm 1.7703 (3.3555) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][90/625] eta 0:04:04 lr 0.000017 wd 0.0500 time 0.4458 (0.4567) data time 0.0008 (0.0049) model time 0.4450 (0.4517) loss 1.8686 (2.4451) grad_norm 3.8973 (3.3573) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][100/625] eta 0:03:59 lr 0.000017 wd 0.0500 time 0.4463 (0.4557) data time 0.0008 (0.0045) model time 0.4455 (0.4505) loss 2.6719 (2.4550) grad_norm 3.9578 (3.3810) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][110/625] eta 0:03:54 lr 0.000017 wd 0.0500 time 0.4469 (0.4548) data time 0.0006 (0.0042) model time 0.4462 (0.4496) loss 2.7916 (2.4577) grad_norm 5.2448 (3.4527) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:30 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][120/625] eta 0:03:49 lr 0.000017 wd 0.0500 time 0.4467 (0.4541) data time 0.0007 (0.0039) model time 0.4461 (0.4490) loss 2.1851 (2.4549) grad_norm 3.2529 (3.4444) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][130/625] eta 0:03:44 lr 0.000017 wd 0.0500 time 0.4554 (0.4535) data time 0.0006 (0.0037) model time 0.4548 (0.4485) loss 2.4485 (2.4639) grad_norm 2.7821 (3.4328) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][140/625] eta 0:03:39 lr 0.000017 wd 0.0500 time 0.4460 (0.4529) data time 0.0009 (0.0035) model time 0.4451 (0.4481) loss 2.2788 (2.4563) grad_norm 3.3768 (3.4039) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][150/625] eta 0:03:34 lr 0.000017 wd 0.0500 time 0.4463 (0.4525) data time 0.0006 (0.0033) model time 0.4457 (0.4479) loss 2.4185 (2.4533) grad_norm 3.1252 (3.4766) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][160/625] eta 0:03:30 lr 0.000017 wd 0.0500 time 0.4439 (0.4521) data time 0.0008 (0.0032) model time 0.4431 (0.4477) loss 1.9566 (2.4507) grad_norm 2.3849 (3.4404) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][170/625] eta 0:03:25 lr 0.000017 wd 0.0500 time 0.4458 (0.4518) data time 0.0006 (0.0030) model time 0.4452 (0.4475) loss 1.9601 (2.4555) grad_norm 2.3455 (3.4031) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:04:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][180/625] eta 0:03:20 lr 0.000017 wd 0.0500 time 0.4438 (0.4515) data time 0.0008 (0.0030) model time 0.4430 (0.4472) loss 1.6529 (2.4570) grad_norm 2.7110 (3.3822) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][190/625] eta 0:03:16 lr 0.000017 wd 0.0500 time 0.4459 (0.4512) data time 0.0009 (0.0028) model time 0.4451 (0.4472) loss 2.6621 (2.4603) grad_norm 2.1486 (3.3492) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][200/625] eta 0:03:11 lr 0.000017 wd 0.0500 time 0.4430 (0.4509) data time 0.0006 (0.0027) model time 0.4424 (0.4469) loss 2.1239 (2.4511) grad_norm 2.8103 (3.3352) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][210/625] eta 0:03:07 lr 0.000017 wd 0.0500 time 0.4459 (0.4507) data time 0.0006 (0.0027) model time 0.4452 (0.4468) loss 2.3884 (2.4359) grad_norm 2.7846 (3.3069) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][220/625] eta 0:03:03 lr 0.000017 wd 0.0500 time 0.4445 (0.4520) data time 0.0006 (0.0026) model time 0.4438 (0.4487) loss 2.3083 (2.4294) grad_norm 6.0610 (3.3108) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][230/625] eta 0:02:58 lr 0.000017 wd 0.0500 time 0.4457 (0.4516) data time 0.0008 (0.0025) model time 0.4450 (0.4484) loss 2.1746 (2.4334) grad_norm 2.8253 (3.2969) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][240/625] eta 0:02:53 lr 0.000017 wd 0.0500 time 0.4431 (0.4514) data time 0.0008 (0.0024) model time 0.4424 (0.4482) loss 2.4206 (2.4366) grad_norm 2.5908 (3.2862) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][250/625] eta 0:02:49 lr 0.000017 wd 0.0500 time 0.4436 (0.4511) data time 0.0008 (0.0024) model time 
0.4428 (0.4479) loss 2.4944 (2.4384) grad_norm 2.5609 (3.3300) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][260/625] eta 0:02:44 lr 0.000017 wd 0.0500 time 0.4426 (0.4508) data time 0.0008 (0.0023) model time 0.4418 (0.4477) loss 2.6979 (2.4378) grad_norm 2.1645 (3.3183) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][270/625] eta 0:02:39 lr 0.000017 wd 0.0500 time 0.4457 (0.4506) data time 0.0008 (0.0023) model time 0.4449 (0.4475) loss 1.5360 (2.4301) grad_norm 2.0517 (3.2996) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][280/625] eta 0:02:35 lr 0.000017 wd 0.0500 time 0.4467 (0.4503) data time 0.0008 (0.0022) model time 0.4459 (0.4473) loss 2.9578 (2.4324) grad_norm 3.8233 (3.2937) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][290/625] eta 0:02:30 lr 0.000017 wd 0.0500 time 0.4407 (0.4507) data time 0.0008 (0.0022) model time 0.4399 (0.4478) loss 2.6806 (2.4189) grad_norm 2.4867 (3.2944) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][300/625] eta 0:02:26 lr 0.000017 wd 0.0500 time 0.4433 (0.4505) data time 0.0008 (0.0021) model time 0.4425 (0.4477) loss 2.7021 (2.4193) grad_norm 3.4235 (3.5119) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:05:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][310/625] eta 0:02:21 lr 0.000017 wd 0.0500 time 0.4452 (0.4503) data time 0.0006 (0.0021) model time 0.4446 (0.4475) loss 2.6047 (2.4187) grad_norm 4.7356 (3.5347) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][320/625] eta 0:02:17 lr 0.000017 wd 0.0500 
time 0.4421 (0.4501) data time 0.0008 (0.0020) model time 0.4414 (0.4474) loss 2.7305 (2.4247) grad_norm 3.7832 (3.5111) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][330/625] eta 0:02:12 lr 0.000017 wd 0.0500 time 0.4465 (0.4500) data time 0.0009 (0.0020) model time 0.4456 (0.4473) loss 2.7353 (2.4289) grad_norm 3.2396 (3.5021) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][340/625] eta 0:02:08 lr 0.000017 wd 0.0500 time 0.4480 (0.4498) data time 0.0008 (0.0020) model time 0.4473 (0.4472) loss 1.7120 (2.4267) grad_norm 2.8904 (3.4828) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][350/625] eta 0:02:03 lr 0.000017 wd 0.0500 time 0.4467 (0.4497) data time 0.0008 (0.0019) model time 0.4459 (0.4471) loss 2.4476 (2.4260) grad_norm 3.0473 (3.4893) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][360/625] eta 0:01:59 lr 0.000017 wd 0.0500 time 0.4440 (0.4496) data time 0.0006 (0.0019) model time 0.4434 (0.4470) loss 2.0785 (2.4316) grad_norm 3.0419 (3.4854) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][370/625] eta 0:01:54 lr 0.000017 wd 0.0500 time 0.4480 (0.4496) data time 0.0006 (0.0019) model time 0.4473 (0.4470) loss 3.0587 (2.4229) grad_norm 3.2745 (3.4838) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][380/625] eta 0:01:50 lr 0.000017 wd 0.0500 time 0.4459 (0.4495) data time 0.0006 (0.0019) model time 0.4453 (0.4470) loss 2.5771 (2.4164) grad_norm 2.6688 (3.4860) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [288/300][390/625] eta 0:01:45 lr 0.000017 wd 0.0500 time 0.4485 (0.4494) data time 0.0006 (0.0018) model time 0.4479 (0.4469) loss 3.0580 (2.4179) grad_norm 4.1575 (3.4777) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][400/625] eta 0:01:41 lr 0.000017 wd 0.0500 time 0.4426 (0.4493) data time 0.0007 (0.0018) model time 0.4419 (0.4468) loss 2.2592 (2.4160) grad_norm 4.1837 (3.4740) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][410/625] eta 0:01:36 lr 0.000017 wd 0.0500 time 0.4437 (0.4491) data time 0.0009 (0.0018) model time 0.4428 (0.4467) loss 2.7610 (2.4190) grad_norm 2.6502 (3.4743) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][420/625] eta 0:01:32 lr 0.000017 wd 0.0500 time 0.4456 (0.4494) data time 0.0009 (0.0018) model time 0.4447 (0.4471) loss 2.7379 (2.4223) grad_norm 3.1348 (3.4598) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][430/625] eta 0:01:27 lr 0.000017 wd 0.0500 time 0.4485 (0.4494) data time 0.0007 (0.0018) model time 0.4478 (0.4471) loss 1.8599 (2.4237) grad_norm 2.0933 (3.4689) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][440/625] eta 0:01:23 lr 0.000017 wd 0.0500 time 0.4484 (0.4502) data time 0.0006 (0.0017) model time 0.4478 (0.4480) loss 2.5640 (2.4221) grad_norm 4.1840 (3.4711) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:06:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][450/625] eta 0:01:18 lr 0.000017 wd 0.0500 time 0.4469 (0.4501) data time 0.0009 (0.0017) model time 0.4460 (0.4479) loss 2.5320 (2.4216) grad_norm 2.8683 (3.4639) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 
12:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][460/625] eta 0:01:14 lr 0.000017 wd 0.0500 time 0.4431 (0.4500) data time 0.0008 (0.0017) model time 0.4423 (0.4479) loss 2.2581 (2.4205) grad_norm 2.7537 (3.4978) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][470/625] eta 0:01:09 lr 0.000017 wd 0.0500 time 0.4423 (0.4499) data time 0.0007 (0.0017) model time 0.4417 (0.4478) loss 1.7148 (2.4180) grad_norm 8.2272 (3.5052) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][480/625] eta 0:01:05 lr 0.000017 wd 0.0500 time 0.4527 (0.4498) data time 0.0009 (0.0017) model time 0.4518 (0.4477) loss 2.9586 (2.4145) grad_norm 2.2973 (3.4886) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][490/625] eta 0:01:00 lr 0.000017 wd 0.0500 time 0.4477 (0.4497) data time 0.0008 (0.0016) model time 0.4469 (0.4476) loss 2.5689 (2.4167) grad_norm 3.4027 (3.4703) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][500/625] eta 0:00:56 lr 0.000017 wd 0.0500 time 0.4435 (0.4496) data time 0.0008 (0.0016) model time 0.4427 (0.4476) loss 2.1218 (2.4179) grad_norm 2.5084 (3.4840) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][510/625] eta 0:00:51 lr 0.000017 wd 0.0500 time 0.4456 (0.4495) data time 0.0007 (0.0016) model time 0.4449 (0.4475) loss 2.2702 (2.4179) grad_norm 2.2775 (3.4721) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][520/625] eta 0:00:47 lr 0.000017 wd 0.0500 time 0.4469 (0.4495) data time 0.0006 (0.0016) model time 0.4462 (0.4474) loss 1.7698 (2.4215) grad_norm 2.2782 
(3.4775) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][530/625] eta 0:00:42 lr 0.000017 wd 0.0500 time 0.4452 (0.4494) data time 0.0009 (0.0016) model time 0.4443 (0.4474) loss 2.5957 (2.4223) grad_norm 3.2664 (3.4691) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][540/625] eta 0:00:38 lr 0.000017 wd 0.0500 time 0.4453 (0.4493) data time 0.0009 (0.0016) model time 0.4445 (0.4473) loss 2.1683 (2.4218) grad_norm 2.9265 (3.4688) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][550/625] eta 0:00:33 lr 0.000017 wd 0.0500 time 0.4473 (0.4492) data time 0.0008 (0.0016) model time 0.4465 (0.4472) loss 2.3388 (2.4244) grad_norm 3.7045 (3.4778) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][560/625] eta 0:00:29 lr 0.000017 wd 0.0500 time 0.4479 (0.4492) data time 0.0008 (0.0016) model time 0.4471 (0.4472) loss 2.3809 (2.4255) grad_norm 2.4601 (3.6227) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][570/625] eta 0:00:24 lr 0.000017 wd 0.0500 time 0.4457 (0.4491) data time 0.0007 (0.0015) model time 0.4450 (0.4471) loss 2.8634 (2.4239) grad_norm 3.3734 (3.6136) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:07:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][580/625] eta 0:00:20 lr 0.000017 wd 0.0500 time 0.4455 (0.4494) data time 0.0007 (0.0015) model time 0.4448 (0.4475) loss 2.6451 (2.4209) grad_norm 3.7897 (3.6591) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][590/625] eta 0:00:15 lr 0.000017 wd 0.0500 time 0.4457 (0.4498) data time 0.0007 (0.0015) model 
time 0.4450 (0.4479) loss 1.5566 (2.4173) grad_norm 4.6461 (3.6468) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][600/625] eta 0:00:11 lr 0.000017 wd 0.0500 time 0.4468 (0.4497) data time 0.0007 (0.0015) model time 0.4461 (0.4479) loss 2.9856 (2.4226) grad_norm 3.0854 (3.7081) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][610/625] eta 0:00:06 lr 0.000017 wd 0.0500 time 0.4459 (0.4497) data time 0.0004 (0.0015) model time 0.4454 (0.4479) loss 1.6619 (2.4220) grad_norm 5.3638 (3.7009) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [288/300][620/625] eta 0:00:02 lr 0.000017 wd 0.0500 time 0.4455 (0.4496) data time 0.0004 (0.0015) model time 0.4451 (0.4478) loss 2.4585 (2.4204) grad_norm 2.7710 (3.6970) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 288 training takes 0:04:40 [2024-08-11 12:08:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:08:18 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 12:08:18 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.464 (0.464) Loss 0.5229 (0.5229) Acc@1 89.209 (89.209) Acc@5 99.072 (99.072) Mem 16699MB [2024-08-11 12:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8506 (0.6275) Acc@1 80.957 (87.034) Acc@5 96.387 (97.816) Mem 16699MB [2024-08-11 12:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.135) Loss 0.9297 (0.7512) Acc@1 79.199 (84.080) Acc@5 95.264 (96.642) Mem 16699MB [2024-08-11 12:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.803 Acc@5 96.571 [2024-08-11 12:08:21 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 12:08:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.819 (0.819) Loss 0.5273 (0.5273) Acc@1 89.014 (89.014) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 12:08:23 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.185) Loss 0.8462 (0.6264) Acc@1 81.201 (87.078) Acc@5 96.387 (97.816) Mem 16699MB [2024-08-11 12:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.152) Loss 0.9297 (0.7474) Acc@1 79.248 (84.175) Acc@5 95.410 (96.677) Mem 16699MB [2024-08-11 12:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.925 Acc@5 96.619 [2024-08-11 12:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 12:08:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][0/625] eta 0:13:11 lr 0.000017 wd 0.0500 time 1.2669 (1.2669) data time 0.7036 (0.7036) model time 0.0000 (0.0000) loss 2.5912 (2.5912) grad_norm 3.3187 (3.3187) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][10/625] eta 0:05:19 lr 0.000017 wd 0.0500 time 0.4456 (0.5197) data time 
0.0005 (0.0649) model time 0.0000 (0.0000) loss 1.5490 (2.5091) grad_norm 2.2891 (2.7016) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][20/625] eta 0:04:53 lr 0.000016 wd 0.0500 time 0.4486 (0.4849) data time 0.0010 (0.0344) model time 0.0000 (0.0000) loss 2.3033 (2.3444) grad_norm 2.6272 (2.5797) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][30/625] eta 0:04:41 lr 0.000016 wd 0.0500 time 0.4475 (0.4723) data time 0.0008 (0.0235) model time 0.0000 (0.0000) loss 2.5571 (2.3125) grad_norm 2.4691 (3.0571) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][40/625] eta 0:04:32 lr 0.000016 wd 0.0500 time 0.4469 (0.4663) data time 0.0006 (0.0180) model time 0.0000 (0.0000) loss 2.1554 (2.3541) grad_norm 3.7706 (2.9788) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][50/625] eta 0:04:25 lr 0.000016 wd 0.0500 time 0.4428 (0.4621) data time 0.0008 (0.0147) model time 0.0000 (0.0000) loss 1.8289 (2.3149) grad_norm 3.2686 (2.9496) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][60/625] eta 0:04:20 lr 0.000016 wd 0.0500 time 0.4462 (0.4609) data time 0.0007 (0.0124) model time 0.4455 (0.4540) loss 1.5922 (2.3215) grad_norm 2.6562 (2.9147) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][70/625] eta 0:04:14 lr 0.000016 wd 0.0500 time 0.4453 (0.4588) data time 0.0006 (0.0108) model time 0.4446 (0.4498) loss 2.5574 (2.3353) grad_norm 2.8492 (2.9181) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][80/625] eta 0:04:09 
lr 0.000016 wd 0.0500 time 0.4576 (0.4576) data time 0.0006 (0.0095) model time 0.4570 (0.4493) loss 1.7295 (2.3195) grad_norm 10.1368 (3.8174) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][90/625] eta 0:04:04 lr 0.000016 wd 0.0500 time 0.4464 (0.4566) data time 0.0009 (0.0086) model time 0.4456 (0.4488) loss 2.3681 (2.3304) grad_norm 2.4191 (3.7241) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][100/625] eta 0:03:59 lr 0.000016 wd 0.0500 time 0.4455 (0.4557) data time 0.0006 (0.0078) model time 0.4450 (0.4483) loss 2.9168 (2.3403) grad_norm 1.7545 (3.6359) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][110/625] eta 0:03:54 lr 0.000016 wd 0.0500 time 0.4438 (0.4550) data time 0.0006 (0.0072) model time 0.4432 (0.4482) loss 2.3446 (2.3600) grad_norm 2.2254 (3.5407) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][120/625] eta 0:03:49 lr 0.000016 wd 0.0500 time 0.4461 (0.4543) data time 0.0006 (0.0067) model time 0.4454 (0.4477) loss 1.9824 (2.3561) grad_norm 4.4555 (3.4979) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][130/625] eta 0:03:45 lr 0.000016 wd 0.0500 time 0.4438 (0.4550) data time 0.0008 (0.0062) model time 0.4430 (0.4497) loss 2.1979 (2.3695) grad_norm 3.2980 (3.4669) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][140/625] eta 0:03:40 lr 0.000016 wd 0.0500 time 0.4440 (0.4543) data time 0.0007 (0.0058) model time 0.4434 (0.4490) loss 1.6331 (2.3652) grad_norm 2.5803 (3.5298) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:33 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [289/300][150/625] eta 0:03:35 lr 0.000016 wd 0.0500 time 0.4428 (0.4536) data time 0.0007 (0.0055) model time 0.4422 (0.4485) loss 2.6987 (2.3760) grad_norm 2.3951 (3.5021) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][160/625] eta 0:03:30 lr 0.000016 wd 0.0500 time 0.4482 (0.4531) data time 0.0006 (0.0052) model time 0.4476 (0.4482) loss 2.7525 (2.3637) grad_norm 2.7492 (3.4664) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][170/625] eta 0:03:25 lr 0.000016 wd 0.0500 time 0.4483 (0.4527) data time 0.0008 (0.0050) model time 0.4475 (0.4479) loss 2.4997 (2.3739) grad_norm 3.1234 (3.4866) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][180/625] eta 0:03:21 lr 0.000016 wd 0.0500 time 0.4470 (0.4524) data time 0.0006 (0.0047) model time 0.4463 (0.4477) loss 2.6460 (2.3696) grad_norm 2.6292 (3.5311) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][190/625] eta 0:03:16 lr 0.000016 wd 0.0500 time 0.4453 (0.4521) data time 0.0006 (0.0045) model time 0.4447 (0.4476) loss 1.5570 (2.3746) grad_norm 53.9741 (3.8329) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:09:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][200/625] eta 0:03:12 lr 0.000016 wd 0.0500 time 0.4478 (0.4518) data time 0.0008 (0.0043) model time 0.4470 (0.4475) loss 2.6760 (2.3761) grad_norm 2.8969 (3.8308) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][210/625] eta 0:03:07 lr 0.000016 wd 0.0500 time 0.4433 (0.4515) data time 0.0006 (0.0042) model time 0.4427 (0.4473) loss 3.0688 (2.3758) grad_norm 5.6475 (3.8192) loss_scale 64.0000 
(64.0000) mem 16699MB [2024-08-11 12:10:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][220/625] eta 0:03:02 lr 0.000016 wd 0.0500 time 0.4433 (0.4513) data time 0.0006 (0.0040) model time 0.4427 (0.4472) loss 2.5892 (2.3788) grad_norm 1.8205 (3.7830) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][230/625] eta 0:02:58 lr 0.000016 wd 0.0500 time 0.4476 (0.4510) data time 0.0006 (0.0039) model time 0.4470 (0.4470) loss 2.6491 (2.3868) grad_norm 4.7268 (3.7626) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][240/625] eta 0:02:53 lr 0.000016 wd 0.0500 time 0.4457 (0.4508) data time 0.0006 (0.0038) model time 0.4451 (0.4469) loss 2.3838 (2.3887) grad_norm 2.1261 (3.8787) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][250/625] eta 0:02:48 lr 0.000016 wd 0.0500 time 0.4483 (0.4506) data time 0.0006 (0.0037) model time 0.4477 (0.4469) loss 2.6554 (2.3924) grad_norm 2.4766 (3.8466) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][260/625] eta 0:02:44 lr 0.000016 wd 0.0500 time 0.4453 (0.4505) data time 0.0007 (0.0036) model time 0.4446 (0.4468) loss 2.5396 (2.3964) grad_norm 6.8167 (3.8232) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][270/625] eta 0:02:39 lr 0.000016 wd 0.0500 time 0.4427 (0.4502) data time 0.0009 (0.0035) model time 0.4418 (0.4466) loss 3.0431 (2.3945) grad_norm 3.4511 (3.8678) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][280/625] eta 0:02:35 lr 0.000016 wd 0.0500 time 0.4438 (0.4506) data time 0.0007 (0.0034) model time 0.4431 (0.4472) loss 
2.2120 (2.3976) grad_norm 2.1352 (3.8517) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][290/625] eta 0:02:30 lr 0.000016 wd 0.0500 time 0.4462 (0.4504) data time 0.0008 (0.0033) model time 0.4454 (0.4470) loss 2.3138 (2.4061) grad_norm 2.3269 (3.8233) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][300/625] eta 0:02:26 lr 0.000016 wd 0.0500 time 0.5786 (0.4507) data time 0.0006 (0.0032) model time 0.5780 (0.4475) loss 2.5105 (2.4020) grad_norm 2.9597 (3.8346) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][310/625] eta 0:02:21 lr 0.000016 wd 0.0500 time 0.4426 (0.4505) data time 0.0008 (0.0031) model time 0.4418 (0.4473) loss 2.4889 (2.4014) grad_norm 2.3689 (3.8376) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][320/625] eta 0:02:17 lr 0.000016 wd 0.0500 time 0.4466 (0.4503) data time 0.0008 (0.0031) model time 0.4458 (0.4472) loss 2.2783 (2.3955) grad_norm 2.3008 (3.8005) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][330/625] eta 0:02:12 lr 0.000016 wd 0.0500 time 0.4431 (0.4502) data time 0.0006 (0.0030) model time 0.4425 (0.4472) loss 2.0326 (2.3907) grad_norm 2.2536 (3.8387) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:10:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][340/625] eta 0:02:08 lr 0.000016 wd 0.0500 time 0.4474 (0.4500) data time 0.0008 (0.0029) model time 0.4466 (0.4470) loss 2.0955 (2.3928) grad_norm 2.2990 (3.8028) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][350/625] eta 0:02:03 lr 0.000016 wd 0.0500 time 0.4440 (0.4499) 
data time 0.0008 (0.0029) model time 0.4432 (0.4469) loss 2.6661 (2.3922) grad_norm 2.0253 (3.8804) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][360/625] eta 0:01:59 lr 0.000016 wd 0.0500 time 0.4457 (0.4497) data time 0.0007 (0.0028) model time 0.4450 (0.4468) loss 2.7469 (2.3935) grad_norm 3.7811 (3.8903) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][370/625] eta 0:01:54 lr 0.000016 wd 0.0500 time 0.4455 (0.4496) data time 0.0006 (0.0028) model time 0.4449 (0.4468) loss 2.6796 (2.3970) grad_norm 3.8719 (3.8626) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][380/625] eta 0:01:50 lr 0.000016 wd 0.0500 time 0.4445 (0.4495) data time 0.0008 (0.0027) model time 0.4437 (0.4467) loss 1.5510 (2.3954) grad_norm 2.8793 (3.8358) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][390/625] eta 0:01:45 lr 0.000016 wd 0.0500 time 0.4424 (0.4494) data time 0.0008 (0.0027) model time 0.4416 (0.4467) loss 2.2175 (2.3955) grad_norm 6.6181 (3.8274) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][400/625] eta 0:01:41 lr 0.000016 wd 0.0500 time 0.4460 (0.4493) data time 0.0008 (0.0026) model time 0.4452 (0.4466) loss 2.6481 (2.3939) grad_norm 2.1273 (3.8133) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][410/625] eta 0:01:36 lr 0.000016 wd 0.0500 time 0.4427 (0.4492) data time 0.0007 (0.0026) model time 0.4420 (0.4466) loss 2.6448 (2.3939) grad_norm 2.6069 (3.7942) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[289/300][420/625] eta 0:01:32 lr 0.000016 wd 0.0500 time 0.4487 (0.4492) data time 0.0006 (0.0025) model time 0.4481 (0.4465) loss 2.7666 (2.3976) grad_norm 1.9479 (3.7619) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][430/625] eta 0:01:27 lr 0.000016 wd 0.0500 time 0.4470 (0.4494) data time 0.0007 (0.0025) model time 0.4463 (0.4469) loss 3.0107 (2.4008) grad_norm 2.7488 (3.7492) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][440/625] eta 0:01:23 lr 0.000016 wd 0.0500 time 0.4459 (0.4494) data time 0.0006 (0.0025) model time 0.4453 (0.4468) loss 2.6239 (2.4027) grad_norm 2.4496 (3.7453) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][450/625] eta 0:01:18 lr 0.000016 wd 0.0500 time 0.4467 (0.4493) data time 0.0008 (0.0024) model time 0.4459 (0.4468) loss 2.4734 (2.4027) grad_norm 2.4809 (3.7286) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][460/625] eta 0:01:14 lr 0.000016 wd 0.0500 time 0.4459 (0.4492) data time 0.0007 (0.0024) model time 0.4453 (0.4468) loss 2.7038 (2.4013) grad_norm 3.9250 (3.7107) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:11:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][470/625] eta 0:01:09 lr 0.000016 wd 0.0500 time 0.4452 (0.4496) data time 0.0006 (0.0024) model time 0.4445 (0.4472) loss 2.8316 (2.4024) grad_norm 8.8861 (3.7097) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][480/625] eta 0:01:05 lr 0.000016 wd 0.0500 time 0.4456 (0.4495) data time 0.0008 (0.0023) model time 0.4448 (0.4472) loss 1.4726 (2.4014) grad_norm 3.8708 (3.6918) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:05 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][490/625] eta 0:01:00 lr 0.000016 wd 0.0500 time 0.4843 (0.4495) data time 0.0008 (0.0023) model time 0.4835 (0.4472) loss 2.7582 (2.4025) grad_norm 2.0131 (3.6844) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][500/625] eta 0:00:56 lr 0.000016 wd 0.0500 time 0.4464 (0.4495) data time 0.0006 (0.0023) model time 0.4458 (0.4472) loss 2.5485 (2.4029) grad_norm 3.0125 (3.8325) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][510/625] eta 0:00:51 lr 0.000016 wd 0.0500 time 0.4492 (0.4494) data time 0.0008 (0.0022) model time 0.4484 (0.4472) loss 1.8223 (2.4016) grad_norm 14.4092 (3.8359) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][520/625] eta 0:00:47 lr 0.000016 wd 0.0500 time 0.4456 (0.4493) data time 0.0006 (0.0022) model time 0.4449 (0.4471) loss 2.7686 (2.4002) grad_norm 2.2095 (3.8812) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][530/625] eta 0:00:42 lr 0.000016 wd 0.0500 time 0.4446 (0.4493) data time 0.0009 (0.0022) model time 0.4437 (0.4471) loss 2.5260 (2.4006) grad_norm 2.6286 (3.8722) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][540/625] eta 0:00:38 lr 0.000016 wd 0.0500 time 0.4458 (0.4492) data time 0.0009 (0.0022) model time 0.4449 (0.4470) loss 2.9847 (2.4036) grad_norm 2.6670 (3.8577) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][550/625] eta 0:00:33 lr 0.000016 wd 0.0500 time 0.4485 (0.4492) data time 0.0006 (0.0021) model time 0.4479 (0.4470) loss 2.4621 (2.4087) grad_norm 3.0720 (3.8393) 
loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][560/625] eta 0:00:29 lr 0.000016 wd 0.0500 time 0.4430 (0.4491) data time 0.0006 (0.0021) model time 0.4424 (0.4469) loss 1.7382 (2.4055) grad_norm 2.9006 (3.8179) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][570/625] eta 0:00:24 lr 0.000016 wd 0.0500 time 0.4486 (0.4490) data time 0.0007 (0.0021) model time 0.4480 (0.4469) loss 2.0791 (2.4039) grad_norm 5.9848 (3.8072) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][580/625] eta 0:00:20 lr 0.000016 wd 0.0500 time 0.4419 (0.4490) data time 0.0006 (0.0021) model time 0.4413 (0.4468) loss 2.5992 (2.4044) grad_norm 4.1833 (3.8336) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][590/625] eta 0:00:15 lr 0.000016 wd 0.0500 time 0.4413 (0.4489) data time 0.0006 (0.0021) model time 0.4407 (0.4468) loss 1.9421 (2.4061) grad_norm 4.2758 (3.8387) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][600/625] eta 0:00:11 lr 0.000016 wd 0.0500 time 0.4419 (0.4489) data time 0.0006 (0.0020) model time 0.4413 (0.4468) loss 2.0177 (2.4041) grad_norm 2.1650 (3.8294) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:12:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][610/625] eta 0:00:06 lr 0.000016 wd 0.0500 time 0.4433 (0.4488) data time 0.0004 (0.0020) model time 0.4429 (0.4467) loss 2.6579 (2.4039) grad_norm 3.0418 (3.8268) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [289/300][620/625] eta 0:00:02 lr 0.000016 wd 0.0500 time 0.4416 (0.4487) data time 0.0004 (0.0020) model time 
0.4412 (0.4466) loss 2.8539 (2.4035) grad_norm 3.4677 (3.8173) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:05 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 289 training takes 0:04:40 [2024-08-11 12:13:05 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:13:06 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 12:13:07 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5356 (0.5356) Acc@1 88.721 (88.721) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 12:13:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.123 (0.151) Loss 0.8496 (0.6339) Acc@1 80.664 (86.950) Acc@5 96.387 (97.812) Mem 16699MB [2024-08-11 12:13:09 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9370 (0.7563) Acc@1 79.443 (84.036) Acc@5 95.459 (96.684) Mem 16699MB [2024-08-11 12:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.799 Acc@5 96.605 [2024-08-11 12:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 12:13:10 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.774 (0.774) Loss 0.5283 (0.5283) Acc@1 89.014 (89.014) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 12:13:12 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.181) Loss 0.8477 (0.6270) Acc@1 81.055 (87.056) Acc@5 96.387 (97.807) Mem 16699MB [2024-08-11 12:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.115 (0.150) Loss 0.9302 (0.7482) Acc@1 79.297 (84.159) Acc@5 95.459 (96.675) Mem 16699MB [2024-08-11 12:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.907 Acc@5 96.613 [2024-08-11 12:13:13 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO 
Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 12:13:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][0/625] eta 0:13:34 lr 0.000016 wd 0.0500 time 1.3026 (1.3026) data time 0.7508 (0.7508) model time 0.0000 (0.0000) loss 2.6287 (2.6287) grad_norm 2.3812 (2.3812) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][10/625] eta 0:05:22 lr 0.000016 wd 0.0500 time 0.4487 (0.5237) data time 0.0006 (0.0691) model time 0.0000 (0.0000) loss 1.5754 (2.3419) grad_norm 2.9648 (2.8804) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][20/625] eta 0:04:54 lr 0.000016 wd 0.0500 time 0.4426 (0.4862) data time 0.0008 (0.0366) model time 0.0000 (0.0000) loss 2.6844 (2.3744) grad_norm 3.1129 (4.1815) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][30/625] eta 0:04:41 lr 0.000016 wd 0.0500 time 0.4437 (0.4732) data time 0.0006 (0.0251) model time 0.0000 (0.0000) loss 2.8457 (2.4409) grad_norm 2.0085 (3.6640) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][40/625] eta 0:04:32 lr 0.000016 wd 0.0500 time 0.4443 (0.4664) data time 0.0009 (0.0192) model time 0.0000 (0.0000) loss 2.5603 (2.4901) grad_norm 3.1390 (3.5714) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][50/625] eta 0:04:30 lr 0.000016 wd 0.0500 time 0.4408 (0.4696) data time 0.0006 (0.0156) model time 0.0000 (0.0000) loss 2.9412 (2.4872) grad_norm 2.9647 (3.6256) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][60/625] eta 0:04:22 lr 0.000016 wd 0.0500 time 0.4432 (0.4654) data time 0.0006 (0.0132) model 
time 0.4426 (0.4434) loss 1.8500 (2.4603) grad_norm 2.0671 (3.6635) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][70/625] eta 0:04:16 lr 0.000016 wd 0.0500 time 0.4450 (0.4624) data time 0.0006 (0.0115) model time 0.4444 (0.4434) loss 2.7111 (2.4616) grad_norm 2.1654 (3.5888) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][80/625] eta 0:04:11 lr 0.000016 wd 0.0500 time 0.4441 (0.4622) data time 0.0007 (0.0101) model time 0.4434 (0.4488) loss 2.2422 (2.4487) grad_norm 6.4327 (3.6936) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:13:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][90/625] eta 0:04:06 lr 0.000016 wd 0.0500 time 0.4428 (0.4603) data time 0.0008 (0.0091) model time 0.4420 (0.4476) loss 2.4797 (2.4445) grad_norm 2.5857 (3.6467) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:14:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][100/625] eta 0:04:00 lr 0.000016 wd 0.0500 time 0.4444 (0.4589) data time 0.0008 (0.0083) model time 0.4437 (0.4471) loss 1.9795 (2.4385) grad_norm 5.8011 (3.6445) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:14:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][110/625] eta 0:03:55 lr 0.000016 wd 0.0500 time 0.4478 (0.4577) data time 0.0008 (0.0076) model time 0.4470 (0.4468) loss 2.4937 (2.4470) grad_norm 2.7357 (3.5719) loss_scale 64.0000 (64.0000) mem 16699MB [2024-08-11 12:14:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][120/625] eta 0:03:50 lr 0.000016 wd 0.0500 time 0.4459 (0.4569) data time 0.0006 (0.0071) model time 0.4452 (0.4469) loss 2.5885 (2.4632) grad_norm 3.5144 (3.5792) loss_scale 128.0000 (67.7025) mem 16699MB [2024-08-11 12:14:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][130/625] eta 0:03:45 lr 0.000016 wd 
0.0500 time 0.4422 (0.4561) data time 0.0008 (0.0066) model time 0.4414 (0.4466) loss 2.1991 (2.4480) grad_norm 3.1533 (3.5460) loss_scale 128.0000 (72.3053) mem 16699MB [2024-08-11 12:14:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][140/625] eta 0:03:40 lr 0.000016 wd 0.0500 time 0.4427 (0.4553) data time 0.0009 (0.0062) model time 0.4418 (0.4463) loss 2.8738 (2.4558) grad_norm 2.9336 (3.5011) loss_scale 128.0000 (76.2553) mem 16699MB [2024-08-11 12:14:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][150/625] eta 0:03:35 lr 0.000016 wd 0.0500 time 0.4449 (0.4546) data time 0.0006 (0.0058) model time 0.4443 (0.4460) loss 3.0357 (2.4625) grad_norm 2.7872 (3.4604) loss_scale 128.0000 (79.6821) mem 16699MB [2024-08-11 12:14:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][160/625] eta 0:03:31 lr 0.000016 wd 0.0500 time 0.4439 (0.4541) data time 0.0009 (0.0055) model time 0.4431 (0.4460) loss 2.7761 (2.4670) grad_norm 2.8307 (3.4476) loss_scale 128.0000 (82.6832) mem 16699MB [2024-08-11 12:14:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][170/625] eta 0:03:26 lr 0.000016 wd 0.0500 time 0.4425 (0.4535) data time 0.0006 (0.0053) model time 0.4419 (0.4458) loss 2.4847 (2.4672) grad_norm 2.4768 (3.4079) loss_scale 128.0000 (85.3333) mem 16699MB [2024-08-11 12:14:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][180/625] eta 0:03:21 lr 0.000016 wd 0.0500 time 0.4461 (0.4530) data time 0.0008 (0.0050) model time 0.4453 (0.4457) loss 2.2999 (2.4769) grad_norm 2.1796 (3.3711) loss_scale 128.0000 (87.6906) mem 16699MB [2024-08-11 12:14:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][190/625] eta 0:03:16 lr 0.000016 wd 0.0500 time 0.4454 (0.4527) data time 0.0008 (0.0048) model time 0.4446 (0.4457) loss 2.8054 (2.4785) grad_norm 3.8033 (3.3987) loss_scale 128.0000 (89.8010) mem 16699MB [2024-08-11 12:14:44 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [290/300][200/625] eta 0:03:12 lr 0.000015 wd 0.0500 time 0.4450 (0.4524) data time 0.0007 (0.0046) model time 0.4444 (0.4457) loss 2.4988 (2.4822) grad_norm 4.0576 (3.3910) loss_scale 128.0000 (91.7015) mem 16699MB [2024-08-11 12:14:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][210/625] eta 0:03:07 lr 0.000015 wd 0.0500 time 0.4459 (0.4521) data time 0.0008 (0.0044) model time 0.4451 (0.4456) loss 2.2797 (2.4713) grad_norm 2.7481 (3.3903) loss_scale 128.0000 (93.4218) mem 16699MB [2024-08-11 12:14:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][220/625] eta 0:03:02 lr 0.000015 wd 0.0500 time 0.4481 (0.4518) data time 0.0008 (0.0043) model time 0.4473 (0.4455) loss 1.9284 (2.4647) grad_norm 6.1563 (3.4131) loss_scale 128.0000 (94.9864) mem 16699MB [2024-08-11 12:14:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][230/625] eta 0:02:58 lr 0.000015 wd 0.0500 time 0.4433 (0.4515) data time 0.0009 (0.0041) model time 0.4425 (0.4455) loss 1.7966 (2.4601) grad_norm 2.4996 (3.3971) loss_scale 128.0000 (96.4156) mem 16699MB [2024-08-11 12:15:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][240/625] eta 0:02:54 lr 0.000015 wd 0.0500 time 0.4456 (0.4521) data time 0.0006 (0.0040) model time 0.4450 (0.4466) loss 2.0517 (2.4682) grad_norm 2.0878 (3.3783) loss_scale 128.0000 (97.7261) mem 16699MB [2024-08-11 12:15:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][250/625] eta 0:02:49 lr 0.000015 wd 0.0500 time 0.4457 (0.4519) data time 0.0008 (0.0039) model time 0.4449 (0.4465) loss 2.4376 (2.4682) grad_norm 3.2937 (3.3635) loss_scale 128.0000 (98.9323) mem 16699MB [2024-08-11 12:15:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][260/625] eta 0:02:44 lr 0.000015 wd 0.0500 time 0.4485 (0.4517) data time 0.0008 (0.0037) model time 0.4477 (0.4465) loss 1.9014 (2.4591) grad_norm 2.5431 (3.3621) loss_scale 
128.0000 (100.0460) mem 16699MB [2024-08-11 12:15:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][270/625] eta 0:02:40 lr 0.000015 wd 0.0500 time 0.3916 (0.4520) data time 0.0009 (0.0036) model time 0.3907 (0.4471) loss 2.2685 (2.4573) grad_norm 2.4599 (3.3763) loss_scale 128.0000 (101.0775) mem 16699MB [2024-08-11 12:15:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][280/625] eta 0:02:35 lr 0.000015 wd 0.0500 time 0.4457 (0.4519) data time 0.0010 (0.0035) model time 0.4447 (0.4471) loss 2.4860 (2.4553) grad_norm 2.7081 (3.3773) loss_scale 128.0000 (102.0356) mem 16699MB [2024-08-11 12:15:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][290/625] eta 0:02:31 lr 0.000015 wd 0.0500 time 0.4490 (0.4517) data time 0.0009 (0.0034) model time 0.4481 (0.4470) loss 2.4239 (2.4558) grad_norm 3.7574 (3.3587) loss_scale 128.0000 (102.9278) mem 16699MB [2024-08-11 12:15:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][300/625] eta 0:02:26 lr 0.000015 wd 0.0500 time 0.4470 (0.4517) data time 0.0008 (0.0034) model time 0.4462 (0.4472) loss 2.5708 (2.4522) grad_norm 2.5782 (3.3462) loss_scale 128.0000 (103.7608) mem 16699MB [2024-08-11 12:15:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][310/625] eta 0:02:22 lr 0.000015 wd 0.0500 time 0.4446 (0.4515) data time 0.0007 (0.0033) model time 0.4440 (0.4471) loss 2.4793 (2.4549) grad_norm 2.7334 (3.3387) loss_scale 128.0000 (104.5402) mem 16699MB [2024-08-11 12:15:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][320/625] eta 0:02:17 lr 0.000015 wd 0.0500 time 0.4479 (0.4513) data time 0.0006 (0.0032) model time 0.4472 (0.4470) loss 2.4299 (2.4493) grad_norm 3.4913 (3.3358) loss_scale 128.0000 (105.2710) mem 16699MB [2024-08-11 12:15:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][330/625] eta 0:02:13 lr 0.000015 wd 0.0500 time 0.4477 (0.4512) data time 0.0006 (0.0031) model time 
0.4471 (0.4470) loss 1.8639 (2.4415) grad_norm 4.2479 (3.3461) loss_scale 128.0000 (105.9577) mem 16699MB [2024-08-11 12:15:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][340/625] eta 0:02:08 lr 0.000015 wd 0.0500 time 0.4474 (0.4510) data time 0.0006 (0.0031) model time 0.4468 (0.4469) loss 2.8007 (2.4393) grad_norm 2.3004 (3.3320) loss_scale 128.0000 (106.6041) mem 16699MB [2024-08-11 12:15:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][350/625] eta 0:02:04 lr 0.000015 wd 0.0500 time 0.4469 (0.4509) data time 0.0008 (0.0030) model time 0.4460 (0.4469) loss 1.4721 (2.4326) grad_norm 2.9813 (3.3267) loss_scale 128.0000 (107.2137) mem 16699MB [2024-08-11 12:15:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][360/625] eta 0:01:59 lr 0.000015 wd 0.0500 time 0.4444 (0.4508) data time 0.0008 (0.0029) model time 0.4436 (0.4469) loss 2.4592 (2.4293) grad_norm 2.3309 (3.3454) loss_scale 128.0000 (107.7895) mem 16699MB [2024-08-11 12:16:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][370/625] eta 0:01:54 lr 0.000015 wd 0.0500 time 0.4439 (0.4506) data time 0.0008 (0.0029) model time 0.4432 (0.4468) loss 2.9420 (2.4355) grad_norm 3.9547 (3.3518) loss_scale 128.0000 (108.3342) mem 16699MB [2024-08-11 12:16:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][380/625] eta 0:01:50 lr 0.000015 wd 0.0500 time 0.4440 (0.4505) data time 0.0006 (0.0028) model time 0.4434 (0.4467) loss 1.4589 (2.4301) grad_norm 3.9344 (3.3644) loss_scale 128.0000 (108.8504) mem 16699MB [2024-08-11 12:16:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][390/625] eta 0:01:45 lr 0.000015 wd 0.0500 time 0.4472 (0.4504) data time 0.0006 (0.0028) model time 0.4465 (0.4466) loss 1.8987 (2.4319) grad_norm 3.1758 (3.3719) loss_scale 128.0000 (109.3402) mem 16699MB [2024-08-11 12:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][400/625] eta 0:01:41 lr 
0.000015 wd 0.0500 time 0.4455 (0.4503) data time 0.0007 (0.0027) model time 0.4448 (0.4466) loss 2.8177 (2.4340) grad_norm 36.3615 (3.4540) loss_scale 128.0000 (109.8055) mem 16699MB [2024-08-11 12:16:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][410/625] eta 0:01:36 lr 0.000015 wd 0.0500 time 0.4435 (0.4501) data time 0.0009 (0.0027) model time 0.4426 (0.4465) loss 2.9327 (2.4380) grad_norm 4.9408 (3.4726) loss_scale 128.0000 (110.2482) mem 16699MB [2024-08-11 12:16:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][420/625] eta 0:01:32 lr 0.000015 wd 0.0500 time 0.4455 (0.4505) data time 0.0008 (0.0026) model time 0.4447 (0.4470) loss 2.2365 (2.4358) grad_norm 2.9007 (3.4531) loss_scale 128.0000 (110.6698) mem 16699MB [2024-08-11 12:16:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][430/625] eta 0:01:27 lr 0.000015 wd 0.0500 time 0.4455 (0.4504) data time 0.0007 (0.0026) model time 0.4448 (0.4469) loss 2.0456 (2.4338) grad_norm 5.6430 (3.4370) loss_scale 128.0000 (111.0719) mem 16699MB [2024-08-11 12:16:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][440/625] eta 0:01:23 lr 0.000015 wd 0.0500 time 0.4438 (0.4506) data time 0.0007 (0.0026) model time 0.4431 (0.4472) loss 2.6874 (2.4355) grad_norm 2.4793 (3.4197) loss_scale 128.0000 (111.4558) mem 16699MB [2024-08-11 12:16:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][450/625] eta 0:01:18 lr 0.000015 wd 0.0500 time 0.4482 (0.4505) data time 0.0008 (0.0025) model time 0.4474 (0.4472) loss 2.3558 (2.4386) grad_norm 2.1951 (3.4064) loss_scale 128.0000 (111.8226) mem 16699MB [2024-08-11 12:16:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][460/625] eta 0:01:14 lr 0.000015 wd 0.0500 time 0.4471 (0.4508) data time 0.0006 (0.0025) model time 0.4464 (0.4476) loss 2.4170 (2.4359) grad_norm 3.9886 (3.4001) loss_scale 128.0000 (112.1735) mem 16699MB [2024-08-11 12:16:46 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [290/300][470/625] eta 0:01:09 lr 0.000015 wd 0.0500 time 0.4420 (0.4507) data time 0.0007 (0.0025) model time 0.4414 (0.4476) loss 1.7861 (2.4368) grad_norm 2.2435 (3.3961) loss_scale 128.0000 (112.5096) mem 16699MB [2024-08-11 12:16:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][480/625] eta 0:01:05 lr 0.000015 wd 0.0500 time 0.4479 (0.4506) data time 0.0008 (0.0024) model time 0.4470 (0.4475) loss 2.3219 (2.4339) grad_norm 2.2865 (3.3837) loss_scale 128.0000 (112.8316) mem 16699MB [2024-08-11 12:16:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][490/625] eta 0:01:00 lr 0.000015 wd 0.0500 time 0.4453 (0.4505) data time 0.0009 (0.0024) model time 0.4444 (0.4475) loss 2.5408 (2.4361) grad_norm 2.9069 (3.6016) loss_scale 128.0000 (113.1405) mem 16699MB [2024-08-11 12:16:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][500/625] eta 0:00:56 lr 0.000015 wd 0.0500 time 0.4486 (0.4504) data time 0.0006 (0.0024) model time 0.4479 (0.4474) loss 2.4552 (2.4346) grad_norm 77.8175 (3.7451) loss_scale 128.0000 (113.4371) mem 16699MB [2024-08-11 12:17:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][510/625] eta 0:00:51 lr 0.000015 wd 0.0500 time 0.4568 (0.4504) data time 0.0009 (0.0023) model time 0.4559 (0.4474) loss 2.6274 (2.4373) grad_norm 16.0547 (3.7680) loss_scale 128.0000 (113.7221) mem 16699MB [2024-08-11 12:17:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][520/625] eta 0:00:47 lr 0.000015 wd 0.0500 time 0.4472 (0.4503) data time 0.0007 (0.0023) model time 0.4465 (0.4473) loss 2.5039 (2.4348) grad_norm 8.9747 (3.7601) loss_scale 128.0000 (113.9962) mem 16699MB [2024-08-11 12:17:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][530/625] eta 0:00:42 lr 0.000015 wd 0.0500 time 0.4438 (0.4502) data time 0.0009 (0.0023) model time 0.4429 (0.4472) loss 2.1191 (2.4362) grad_norm 2.8914 (3.8132) 
loss_scale 128.0000 (114.2599) mem 16699MB [2024-08-11 12:17:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][540/625] eta 0:00:38 lr 0.000015 wd 0.0500 time 0.4509 (0.4501) data time 0.0007 (0.0023) model time 0.4502 (0.4472) loss 2.7869 (2.4337) grad_norm 2.5161 (3.8200) loss_scale 128.0000 (114.5139) mem 16699MB [2024-08-11 12:17:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][550/625] eta 0:00:33 lr 0.000015 wd 0.0500 time 0.4417 (0.4500) data time 0.0008 (0.0022) model time 0.4409 (0.4471) loss 2.8716 (2.4336) grad_norm 2.9712 (3.8195) loss_scale 128.0000 (114.7586) mem 16699MB [2024-08-11 12:17:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][560/625] eta 0:00:29 lr 0.000015 wd 0.0500 time 0.4471 (0.4499) data time 0.0009 (0.0022) model time 0.4462 (0.4471) loss 1.5905 (2.4282) grad_norm 2.8826 (3.8087) loss_scale 128.0000 (114.9947) mem 16699MB [2024-08-11 12:17:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][570/625] eta 0:00:24 lr 0.000015 wd 0.0500 time 0.4514 (0.4498) data time 0.0008 (0.0022) model time 0.4506 (0.4470) loss 2.0863 (2.4300) grad_norm 2.7616 (3.7936) loss_scale 128.0000 (115.2224) mem 16699MB [2024-08-11 12:17:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][580/625] eta 0:00:20 lr 0.000015 wd 0.0500 time 0.4443 (0.4498) data time 0.0008 (0.0022) model time 0.4435 (0.4470) loss 2.1185 (2.4291) grad_norm 2.2255 (3.7789) loss_scale 128.0000 (115.4423) mem 16699MB [2024-08-11 12:17:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][590/625] eta 0:00:15 lr 0.000015 wd 0.0500 time 0.4461 (0.4497) data time 0.0008 (0.0021) model time 0.4453 (0.4470) loss 1.9700 (2.4281) grad_norm 2.4611 (3.8023) loss_scale 128.0000 (115.6548) mem 16699MB [2024-08-11 12:17:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][600/625] eta 0:00:11 lr 0.000015 wd 0.0500 time 0.4436 (0.4496) data time 0.0007 (0.0021) 
model time 0.4430 (0.4469) loss 1.4616 (2.4269) grad_norm 7.7744 (3.7979) loss_scale 128.0000 (115.8602) mem 16699MB [2024-08-11 12:17:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][610/625] eta 0:00:06 lr 0.000015 wd 0.0500 time 0.4455 (0.4499) data time 0.0006 (0.0021) model time 0.4450 (0.4473) loss 1.7179 (2.4252) grad_norm 3.6080 (3.8022) loss_scale 128.0000 (116.0589) mem 16699MB [2024-08-11 12:17:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [290/300][620/625] eta 0:00:02 lr 0.000015 wd 0.0500 time 0.4444 (0.4498) data time 0.0006 (0.0021) model time 0.4439 (0.4472) loss 2.5869 (2.4266) grad_norm 42.1208 (3.8752) loss_scale 128.0000 (116.2512) mem 16699MB [2024-08-11 12:17:54 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 290 training takes 0:04:41 [2024-08-11 12:17:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:17:56 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 12:17:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.461 (0.461) Loss 0.5322 (0.5322) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16699MB [2024-08-11 12:17:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.150) Loss 0.8564 (0.6315) Acc@1 80.713 (87.021) Acc@5 96.143 (97.834) Mem 16699MB [2024-08-11 12:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.134) Loss 0.9380 (0.7559) Acc@1 79.590 (84.087) Acc@5 95.166 (96.656) Mem 16699MB [2024-08-11 12:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.821 Acc@5 96.575 [2024-08-11 12:17:59 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 12:18:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.875 (0.875) Loss 0.5288 (0.5288) Acc@1 88.965 (88.965) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 12:18:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.189) Loss 0.8481 (0.6272) Acc@1 81.104 (87.052) Acc@5 96.387 (97.807) Mem 16699MB [2024-08-11 12:18:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.154) Loss 0.9307 (0.7485) Acc@1 79.297 (84.154) Acc@5 95.459 (96.677) Mem 16699MB [2024-08-11 12:18:03 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.899 Acc@5 96.615 [2024-08-11 12:18:03 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 12:18:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][0/625] eta 0:13:33 lr 0.000015 wd 0.0500 time 1.3012 (1.3012) data time 0.5134 (0.5134) model time 0.0000 (0.0000) loss 2.7344 (2.7344) grad_norm 3.2650 (3.2650) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][10/625] eta 0:05:22 lr 0.000015 wd 0.0500 time 0.4460 (0.5250) data 
time 0.0008 (0.0475) model time 0.0000 (0.0000) loss 2.6890 (2.4249) grad_norm 3.1193 (3.2507) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][20/625] eta 0:04:55 lr 0.000015 wd 0.0500 time 0.4450 (0.4876) data time 0.0008 (0.0253) model time 0.0000 (0.0000) loss 2.4167 (2.4639) grad_norm 2.0936 (3.0716) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][30/625] eta 0:04:42 lr 0.000015 wd 0.0500 time 0.4445 (0.4743) data time 0.0007 (0.0174) model time 0.0000 (0.0000) loss 2.0050 (2.4458) grad_norm 3.5547 (3.1014) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][40/625] eta 0:04:33 lr 0.000015 wd 0.0500 time 0.4465 (0.4674) data time 0.0008 (0.0134) model time 0.0000 (0.0000) loss 2.6460 (2.4264) grad_norm 2.0243 (3.0590) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][50/625] eta 0:04:27 lr 0.000015 wd 0.0500 time 0.4448 (0.4658) data time 0.0008 (0.0109) model time 0.0000 (0.0000) loss 2.3777 (2.4329) grad_norm 9.3232 (3.2420) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][60/625] eta 0:04:21 lr 0.000015 wd 0.0500 time 0.4501 (0.4626) data time 0.0007 (0.0092) model time 0.4495 (0.4455) loss 2.7247 (2.4596) grad_norm 3.2153 (3.3370) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][70/625] eta 0:04:16 lr 0.000015 wd 0.0500 time 0.4487 (0.4628) data time 0.0008 (0.0081) model time 0.4479 (0.4544) loss 2.3909 (2.4398) grad_norm 3.9463 (3.7069) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[291/300][80/625] eta 0:04:11 lr 0.000015 wd 0.0500 time 0.4441 (0.4606) data time 0.0006 (0.0072) model time 0.4435 (0.4510) loss 2.4756 (2.4584) grad_norm 2.9718 (3.6395) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][90/625] eta 0:04:05 lr 0.000015 wd 0.0500 time 0.4475 (0.4590) data time 0.0006 (0.0065) model time 0.4468 (0.4495) loss 2.9224 (2.4862) grad_norm 2.7665 (3.5837) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][100/625] eta 0:04:00 lr 0.000015 wd 0.0500 time 0.4447 (0.4579) data time 0.0006 (0.0059) model time 0.4441 (0.4491) loss 2.7630 (2.4741) grad_norm 5.5295 (3.5707) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][110/625] eta 0:03:55 lr 0.000015 wd 0.0500 time 0.4472 (0.4570) data time 0.0006 (0.0055) model time 0.4466 (0.4487) loss 2.6779 (2.4734) grad_norm 2.5856 (3.5542) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:18:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][120/625] eta 0:03:50 lr 0.000015 wd 0.0500 time 0.4430 (0.4561) data time 0.0008 (0.0051) model time 0.4422 (0.4482) loss 2.5284 (2.4580) grad_norm 4.1565 (3.5357) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][130/625] eta 0:03:45 lr 0.000015 wd 0.0500 time 0.4454 (0.4555) data time 0.0006 (0.0048) model time 0.4448 (0.4480) loss 2.2945 (2.4540) grad_norm 4.5750 (3.5058) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][140/625] eta 0:03:40 lr 0.000015 wd 0.0500 time 0.4483 (0.4548) data time 0.0009 (0.0045) model time 0.4474 (0.4477) loss 2.6753 (2.4506) grad_norm 5.2451 (3.4875) loss_scale 128.0000 (128.0000) mem 16699MB 
[2024-08-11 12:19:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][150/625] eta 0:03:35 lr 0.000015 wd 0.0500 time 0.4459 (0.4543) data time 0.0007 (0.0043) model time 0.4453 (0.4476) loss 2.2963 (2.4434) grad_norm 3.0368 (3.4414) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][160/625] eta 0:03:31 lr 0.000015 wd 0.0500 time 0.4434 (0.4539) data time 0.0006 (0.0041) model time 0.4429 (0.4474) loss 3.2694 (2.4543) grad_norm 2.4389 (3.4166) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][170/625] eta 0:03:26 lr 0.000015 wd 0.0500 time 0.4438 (0.4534) data time 0.0007 (0.0039) model time 0.4431 (0.4472) loss 1.6586 (2.4426) grad_norm 2.4045 (3.3963) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][180/625] eta 0:03:21 lr 0.000015 wd 0.0500 time 0.4464 (0.4530) data time 0.0007 (0.0037) model time 0.4457 (0.4471) loss 2.2065 (2.4335) grad_norm 2.9690 (3.4427) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][190/625] eta 0:03:16 lr 0.000015 wd 0.0500 time 0.4458 (0.4527) data time 0.0006 (0.0036) model time 0.4452 (0.4470) loss 2.5517 (2.4207) grad_norm 2.8790 (3.4320) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][200/625] eta 0:03:12 lr 0.000015 wd 0.0500 time 0.4490 (0.4524) data time 0.0008 (0.0034) model time 0.4482 (0.4470) loss 1.9584 (2.4242) grad_norm 3.5469 (3.4099) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][210/625] eta 0:03:07 lr 0.000015 wd 0.0500 time 0.4442 (0.4529) data time 0.0010 (0.0033) model time 0.4432 (0.4479) loss 1.8480 
(2.4190) grad_norm 64.5262 (3.6947) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][220/625] eta 0:03:03 lr 0.000015 wd 0.0500 time 0.4457 (0.4527) data time 0.0008 (0.0032) model time 0.4449 (0.4478) loss 2.5026 (2.4199) grad_norm 2.9813 (3.7740) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][230/625] eta 0:02:58 lr 0.000015 wd 0.0500 time 0.4460 (0.4525) data time 0.0006 (0.0031) model time 0.4453 (0.4478) loss 2.6929 (2.4165) grad_norm 3.8742 (3.7600) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][240/625] eta 0:02:54 lr 0.000015 wd 0.0500 time 0.4447 (0.4522) data time 0.0006 (0.0030) model time 0.4440 (0.4476) loss 2.6634 (2.4232) grad_norm 2.1255 (3.7692) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:19:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][250/625] eta 0:02:49 lr 0.000015 wd 0.0500 time 0.4449 (0.4519) data time 0.0006 (0.0029) model time 0.4443 (0.4475) loss 2.7065 (2.4158) grad_norm 2.7358 (3.7495) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][260/625] eta 0:02:44 lr 0.000015 wd 0.0500 time 0.4443 (0.4516) data time 0.0006 (0.0028) model time 0.4437 (0.4472) loss 2.9335 (2.4191) grad_norm 2.8116 (3.7374) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][270/625] eta 0:02:40 lr 0.000015 wd 0.0500 time 0.4437 (0.4516) data time 0.0008 (0.0028) model time 0.4428 (0.4474) loss 2.5815 (2.4219) grad_norm 4.9923 (3.7322) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][280/625] eta 0:02:35 lr 0.000015 wd 0.0500 time 0.4451 
(0.4513) data time 0.0008 (0.0027) model time 0.4443 (0.4472) loss 2.2979 (2.4155) grad_norm 2.7984 (3.7104) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][290/625] eta 0:02:31 lr 0.000015 wd 0.0500 time 0.4445 (0.4511) data time 0.0009 (0.0026) model time 0.4436 (0.4471) loss 1.7949 (2.4126) grad_norm 2.0552 (3.7213) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][300/625] eta 0:02:26 lr 0.000015 wd 0.0500 time 0.4487 (0.4509) data time 0.0006 (0.0026) model time 0.4480 (0.4470) loss 1.6588 (2.4124) grad_norm 3.8239 (3.7443) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][310/625] eta 0:02:21 lr 0.000015 wd 0.0500 time 0.4450 (0.4507) data time 0.0008 (0.0025) model time 0.4442 (0.4469) loss 1.5261 (2.4106) grad_norm 3.6690 (3.7319) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][320/625] eta 0:02:17 lr 0.000015 wd 0.0500 time 0.4466 (0.4506) data time 0.0007 (0.0025) model time 0.4460 (0.4468) loss 2.6944 (2.4074) grad_norm 77.5671 (3.9624) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][330/625] eta 0:02:12 lr 0.000015 wd 0.0500 time 0.4408 (0.4504) data time 0.0006 (0.0024) model time 0.4401 (0.4467) loss 1.9973 (2.4056) grad_norm 3.6834 (3.9438) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][340/625] eta 0:02:08 lr 0.000015 wd 0.0500 time 0.4454 (0.4502) data time 0.0006 (0.0024) model time 0.4448 (0.4466) loss 2.4895 (2.4105) grad_norm 3.2708 (3.9023) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): 
INFO Train: [291/300][350/625] eta 0:02:03 lr 0.000015 wd 0.0500 time 0.6731 (0.4507) data time 0.0006 (0.0023) model time 0.6725 (0.4473) loss 2.7000 (2.4101) grad_norm 2.7354 (3.8914) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][360/625] eta 0:01:59 lr 0.000015 wd 0.0500 time 0.4471 (0.4506) data time 0.0009 (0.0023) model time 0.4462 (0.4473) loss 1.4852 (2.4090) grad_norm 2.9686 (3.8834) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][370/625] eta 0:01:54 lr 0.000015 wd 0.0500 time 0.4495 (0.4506) data time 0.0006 (0.0022) model time 0.4489 (0.4473) loss 2.9606 (2.4099) grad_norm 3.2903 (3.9408) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][380/625] eta 0:01:50 lr 0.000015 wd 0.0500 time 0.4454 (0.4505) data time 0.0007 (0.0022) model time 0.4448 (0.4473) loss 2.1105 (2.4095) grad_norm 3.4142 (3.9283) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:20:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][390/625] eta 0:01:45 lr 0.000015 wd 0.0500 time 0.4466 (0.4504) data time 0.0007 (0.0022) model time 0.4459 (0.4472) loss 2.2934 (2.4074) grad_norm 2.7954 (3.8962) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][400/625] eta 0:01:41 lr 0.000015 wd 0.0500 time 0.4456 (0.4503) data time 0.0008 (0.0021) model time 0.4448 (0.4472) loss 1.5109 (2.4067) grad_norm 3.0014 (3.9506) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][410/625] eta 0:01:36 lr 0.000015 wd 0.0500 time 0.4489 (0.4506) data time 0.0007 (0.0021) model time 0.4482 (0.4476) loss 2.7808 (2.4132) grad_norm 1.9708 (3.9289) loss_scale 128.0000 (128.0000) mem 
16699MB [2024-08-11 12:21:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][420/625] eta 0:01:32 lr 0.000015 wd 0.0500 time 0.4493 (0.4509) data time 0.0006 (0.0021) model time 0.4487 (0.4480) loss 2.2142 (2.4109) grad_norm 3.4152 (3.9348) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][430/625] eta 0:01:27 lr 0.000015 wd 0.0500 time 0.4427 (0.4508) data time 0.0008 (0.0021) model time 0.4420 (0.4479) loss 2.8371 (2.4147) grad_norm 2.9392 (4.0651) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][440/625] eta 0:01:23 lr 0.000015 wd 0.0500 time 0.4472 (0.4507) data time 0.0008 (0.0020) model time 0.4464 (0.4479) loss 2.5523 (2.4168) grad_norm 2.5287 (4.0357) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][450/625] eta 0:01:18 lr 0.000015 wd 0.0500 time 0.4437 (0.4507) data time 0.0009 (0.0020) model time 0.4428 (0.4479) loss 2.9422 (2.4143) grad_norm 3.3635 (4.1266) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][460/625] eta 0:01:14 lr 0.000015 wd 0.0500 time 0.4473 (0.4506) data time 0.0008 (0.0020) model time 0.4465 (0.4478) loss 2.8485 (2.4182) grad_norm 2.6449 (4.1064) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][470/625] eta 0:01:09 lr 0.000015 wd 0.0500 time 0.4408 (0.4505) data time 0.0009 (0.0020) model time 0.4400 (0.4478) loss 2.6897 (2.4208) grad_norm 11.1622 (4.1120) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][480/625] eta 0:01:05 lr 0.000015 wd 0.0500 time 0.4435 (0.4504) data time 0.0006 (0.0019) model time 0.4429 (0.4477) loss 
2.9018 (2.4265) grad_norm 4.1868 (4.0944) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][490/625] eta 0:01:00 lr 0.000015 wd 0.0500 time 0.4464 (0.4503) data time 0.0008 (0.0019) model time 0.4456 (0.4477) loss 2.4811 (2.4261) grad_norm 2.0050 (4.0708) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][500/625] eta 0:00:56 lr 0.000015 wd 0.0500 time 0.4470 (0.4503) data time 0.0006 (0.0019) model time 0.4463 (0.4476) loss 1.8271 (2.4259) grad_norm 2.1902 (4.0487) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][510/625] eta 0:00:51 lr 0.000015 wd 0.0500 time 0.4526 (0.4502) data time 0.0006 (0.0019) model time 0.4520 (0.4476) loss 2.5700 (2.4269) grad_norm 1.7445 (4.0293) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:21:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][520/625] eta 0:00:47 lr 0.000014 wd 0.0500 time 0.4450 (0.4501) data time 0.0008 (0.0018) model time 0.4442 (0.4475) loss 2.8724 (2.4276) grad_norm 3.7773 (4.0089) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][530/625] eta 0:00:42 lr 0.000014 wd 0.0500 time 0.4425 (0.4500) data time 0.0008 (0.0018) model time 0.4417 (0.4474) loss 2.2586 (2.4287) grad_norm 2.2418 (3.9839) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][540/625] eta 0:00:38 lr 0.000014 wd 0.0500 time 0.4468 (0.4499) data time 0.0009 (0.0018) model time 0.4459 (0.4474) loss 2.6527 (2.4294) grad_norm 2.4433 (3.9678) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][550/625] eta 0:00:33 lr 0.000014 wd 0.0500 time 
0.4448 (0.4498) data time 0.0008 (0.0018) model time 0.4440 (0.4473) loss 2.5041 (2.4288) grad_norm 3.3634 (3.9549) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][560/625] eta 0:00:29 lr 0.000014 wd 0.0500 time 0.4470 (0.4498) data time 0.0006 (0.0018) model time 0.4464 (0.4473) loss 2.8436 (2.4288) grad_norm 2.7203 (3.9414) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][570/625] eta 0:00:24 lr 0.000014 wd 0.0500 time 0.4430 (0.4497) data time 0.0006 (0.0018) model time 0.4423 (0.4472) loss 1.7098 (2.4300) grad_norm 4.9342 (3.9351) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][580/625] eta 0:00:20 lr 0.000014 wd 0.0500 time 0.4489 (0.4496) data time 0.0008 (0.0017) model time 0.4481 (0.4472) loss 2.6214 (2.4336) grad_norm 3.8888 (3.9442) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][590/625] eta 0:00:15 lr 0.000014 wd 0.0500 time 0.4457 (0.4496) data time 0.0006 (0.0017) model time 0.4451 (0.4472) loss 2.3969 (2.4353) grad_norm 2.7131 (3.9585) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][600/625] eta 0:00:11 lr 0.000014 wd 0.0500 time 0.4428 (0.4496) data time 0.0006 (0.0017) model time 0.4422 (0.4472) loss 2.3849 (2.4333) grad_norm 2.7739 (3.9388) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [291/300][610/625] eta 0:00:06 lr 0.000014 wd 0.0500 time 0.4457 (0.4495) data time 0.0004 (0.0017) model time 0.4453 (0.4472) loss 2.5117 (2.4349) grad_norm 3.2122 (3.9225) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:42 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [291/300][620/625] eta 0:00:02 lr 0.000014 wd 0.0500 time 0.4409 (0.4494) data time 0.0004 (0.0017) model time 0.4405 (0.4471) loss 2.0613 (2.4327) grad_norm 2.9688 (3.9219) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:44 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 291 training takes 0:04:40 [2024-08-11 12:22:44 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:22:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 12:22:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.456 (0.456) Loss 0.5342 (0.5342) Acc@1 88.818 (88.818) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 12:22:47 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.151) Loss 0.8467 (0.6310) Acc@1 80.713 (86.963) Acc@5 96.191 (97.776) Mem 16699MB [2024-08-11 12:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.134) Loss 0.9365 (0.7548) Acc@1 79.346 (84.122) Acc@5 95.264 (96.629) Mem 16699MB [2024-08-11 12:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.853 Acc@5 96.563 [2024-08-11 12:22:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 12:22:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.764 (0.764) Loss 0.5288 (0.5288) Acc@1 88.965 (88.965) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 12:22:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.115 (0.180) Loss 0.8477 (0.6274) Acc@1 81.055 (87.061) Acc@5 96.387 (97.820) Mem 16699MB [2024-08-11 12:22:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.116 (0.149) Loss 0.9302 (0.7490) Acc@1 79.346 (84.168) Acc@5 95.459 (96.684) Mem 16699MB [2024-08-11 12:22:52 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 83.907 Acc@5 96.619 [2024-08-11 12:22:52 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 12:22:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][0/625] eta 0:12:59 lr 0.000014 wd 0.0500 time 1.2473 (1.2473) data time 0.6305 (0.6305) model time 0.0000 (0.0000) loss 2.5394 (2.5394) grad_norm 2.7761 (2.7761) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:22:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][10/625] eta 0:05:20 lr 0.000014 wd 0.0500 time 0.4444 (0.5204) data time 0.0007 (0.0581) model time 0.0000 (0.0000) loss 2.5663 (2.3994) grad_norm 3.7017 (3.3270) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:23:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][20/625] eta 0:04:53 lr 0.000014 wd 0.0500 time 0.4489 (0.4857) data time 0.0008 (0.0309) model time 0.0000 (0.0000) loss 2.7004 (2.4501) grad_norm 2.1698 (3.0797) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 12:23:05 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 12:23:06 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:23:07 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 12:42:27 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 12:42:28 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 12:44:05 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 12:44:06 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 12:44:19 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 12:44:28 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 12:44:28 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 12:44:30 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 12:44:32 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 12:44:32 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 292) [2024-08-11 12:44:32 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 12:46:05 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 12:46:06 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 12:46:19 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-11 12:46:27 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 12:46:27 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 12:46:29 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 12:46:31 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 12:46:31 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 292) [2024-08-11 12:46:31 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 12:46:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][30/625] eta 1:00:35 lr 0.000014 wd 0.0500 time 0.4456 (6.1109) data time 0.0007 (0.2970) model time 0.0000 (0.0000) loss 2.3966 (2.5617) grad_norm 2.3460 (3.1433) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:46:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][40/625] eta 0:17:06 lr 0.000014 wd 0.0500 time 0.4456 (1.7545) data time 0.0008 (0.0692) model time 0.0000 (0.0000) loss 2.6834 (2.6405) grad_norm 2.3047 (2.8250) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][50/625] eta 0:11:21 lr 0.000014 wd 0.0500 time 0.4439 (1.1852) data time 0.0006 (0.0395) model time 0.0000 (0.0000) loss 2.6974 (2.6306) grad_norm 2.6042 (2.6533) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][60/625] eta 0:09:07 lr 0.000014 wd 0.0500 time 0.4443 (0.9692) data time 0.0007 (0.0278) model time 0.4436 (0.4716) loss 3.0134 (2.6449) grad_norm 3.9701 (3.0762) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:12 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [292/300][70/625] eta 0:07:53 lr 0.000014 wd 0.0500 time 0.4452 (0.8527) data time 0.0008 (0.0216) model time 0.4444 (0.4693) loss 2.2680 (2.5873) grad_norm 2.3055 (3.0908) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][80/625] eta 0:07:02 lr 0.000014 wd 0.0500 time 0.4494 (0.7761) data time 0.0009 (0.0177) model time 0.4485 (0.4616) loss 2.4548 (2.5782) grad_norm 3.2033 (3.1980) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][90/625] eta 0:06:27 lr 0.000014 wd 0.0500 time 0.4473 (0.7242) data time 0.0008 (0.0150) model time 0.4465 (0.4583) loss 2.4091 (2.5533) grad_norm 3.1029 (3.4301) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][100/625] eta 0:06:00 lr 0.000014 wd 0.0500 time 0.4485 (0.6864) data time 0.0008 (0.0131) model time 0.4476 (0.4561) loss 2.5870 (2.5165) grad_norm 3.1102 (3.4619) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][110/625] eta 0:05:38 lr 0.000014 wd 0.0500 time 0.4436 (0.6578) data time 0.0007 (0.0116) model time 0.4429 (0.4548) loss 1.8860 (2.5083) grad_norm 5.5397 (3.5219) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][120/625] eta 0:05:20 lr 0.000014 wd 0.0500 time 0.4495 (0.6353) data time 0.0006 (0.0104) model time 0.4488 (0.4538) loss 2.7562 (2.5053) grad_norm 2.6664 (3.5342) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][130/625] eta 0:05:05 lr 0.000014 wd 0.0500 time 0.4527 (0.6173) data time 0.0008 (0.0095) model time 0.4519 (0.4531) loss 2.5489 (2.5129) grad_norm 5.2923 (3.7440) loss_scale 
128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][140/625] eta 0:04:52 lr 0.000014 wd 0.0500 time 0.4449 (0.6023) data time 0.0007 (0.0088) model time 0.4442 (0.4525) loss 2.4265 (2.5050) grad_norm 3.1030 (3.6430) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][150/625] eta 0:04:40 lr 0.000014 wd 0.0500 time 0.4480 (0.5898) data time 0.0009 (0.0081) model time 0.4471 (0.4519) loss 2.3796 (2.5043) grad_norm 2.5021 (3.7675) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][160/625] eta 0:04:29 lr 0.000014 wd 0.0500 time 0.4463 (0.5792) data time 0.0009 (0.0076) model time 0.4454 (0.4515) loss 2.9048 (2.5077) grad_norm 2.1931 (3.7310) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:47:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][170/625] eta 0:04:19 lr 0.000014 wd 0.0500 time 0.4477 (0.5700) data time 0.0009 (0.0071) model time 0.4468 (0.4512) loss 2.7514 (2.4963) grad_norm 3.3749 (3.7606) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][180/625] eta 0:04:10 lr 0.000014 wd 0.0500 time 0.4469 (0.5622) data time 0.0007 (0.0067) model time 0.4462 (0.4510) loss 2.4097 (2.4875) grad_norm 3.5502 (3.7415) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][190/625] eta 0:04:01 lr 0.000014 wd 0.0500 time 0.4507 (0.5553) data time 0.0007 (0.0064) model time 0.4500 (0.4509) loss 2.0644 (2.4852) grad_norm 2.5859 (3.7864) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][200/625] eta 0:03:53 lr 0.000014 wd 0.0500 time 0.4464 (0.5493) data time 0.0009 (0.0060) model time 
0.4456 (0.4508) loss 2.7797 (2.4821) grad_norm 2.4763 (4.0421) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][210/625] eta 0:03:46 lr 0.000014 wd 0.0500 time 0.6830 (0.5452) data time 0.0009 (0.0058) model time 0.6821 (0.4522) loss 2.8043 (2.4754) grad_norm 4.4816 (4.1031) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][220/625] eta 0:03:38 lr 0.000014 wd 0.0500 time 0.4484 (0.5398) data time 0.0007 (0.0055) model time 0.4478 (0.4516) loss 2.4773 (2.4659) grad_norm 3.9653 (4.0653) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][230/625] eta 0:03:31 lr 0.000014 wd 0.0500 time 0.4480 (0.5354) data time 0.0007 (0.0053) model time 0.4474 (0.4515) loss 1.6167 (2.4535) grad_norm 2.2766 (4.0139) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][240/625] eta 0:03:24 lr 0.000014 wd 0.0500 time 0.4501 (0.5315) data time 0.0010 (0.0051) model time 0.4491 (0.4515) loss 1.6443 (2.4482) grad_norm 3.6475 (4.0028) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][250/625] eta 0:03:17 lr 0.000014 wd 0.0500 time 0.4516 (0.5279) data time 0.0007 (0.0049) model time 0.4510 (0.4514) loss 1.6543 (2.4450) grad_norm 10.1711 (3.9898) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][260/625] eta 0:03:11 lr 0.000014 wd 0.0500 time 0.4492 (0.5246) data time 0.0009 (0.0047) model time 0.4483 (0.4514) loss 2.1661 (2.4421) grad_norm 4.7691 (3.9761) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][270/625] eta 0:03:05 lr 
0.000014 wd 0.0500 time 0.4504 (0.5215) data time 0.0008 (0.0046) model time 0.4496 (0.4512) loss 2.5602 (2.4427) grad_norm 2.8551 (3.9423) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][280/625] eta 0:02:58 lr 0.000014 wd 0.0500 time 0.4507 (0.5187) data time 0.0008 (0.0044) model time 0.4499 (0.4511) loss 2.7315 (2.4359) grad_norm 3.8010 (3.9265) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][290/625] eta 0:02:52 lr 0.000014 wd 0.0500 time 0.4497 (0.5160) data time 0.0008 (0.0043) model time 0.4490 (0.4510) loss 2.1119 (2.4248) grad_norm 4.0912 (3.9378) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:48:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][300/625] eta 0:02:46 lr 0.000014 wd 0.0500 time 0.4481 (0.5136) data time 0.0007 (0.0042) model time 0.4474 (0.4509) loss 2.7001 (2.4209) grad_norm 3.6172 (3.9243) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][310/625] eta 0:02:41 lr 0.000014 wd 0.0500 time 0.4514 (0.5114) data time 0.0010 (0.0040) model time 0.4504 (0.4509) loss 2.7775 (2.4234) grad_norm 2.7945 (3.9613) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][320/625] eta 0:02:35 lr 0.000014 wd 0.0500 time 0.4490 (0.5093) data time 0.0011 (0.0039) model time 0.4479 (0.4509) loss 2.3178 (2.4172) grad_norm 2.1627 (3.9265) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][330/625] eta 0:02:29 lr 0.000014 wd 0.0500 time 0.4479 (0.5074) data time 0.0006 (0.0038) model time 0.4473 (0.4509) loss 2.5143 (2.4122) grad_norm 3.6694 (4.0543) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:14 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [292/300][340/625] eta 0:02:24 lr 0.000014 wd 0.0500 time 0.4478 (0.5056) data time 0.0007 (0.0037) model time 0.4471 (0.4508) loss 2.9297 (2.4134) grad_norm 4.1259 (4.0345) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][350/625] eta 0:02:18 lr 0.000014 wd 0.0500 time 0.4482 (0.5039) data time 0.0009 (0.0036) model time 0.4473 (0.4508) loss 2.9719 (2.4205) grad_norm 1.8000 (4.0337) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][360/625] eta 0:02:13 lr 0.000014 wd 0.0500 time 0.4482 (0.5022) data time 0.0009 (0.0036) model time 0.4473 (0.4507) loss 2.7859 (2.4203) grad_norm 2.4178 (3.9987) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][370/625] eta 0:02:07 lr 0.000014 wd 0.0500 time 0.4500 (0.5007) data time 0.0009 (0.0035) model time 0.4491 (0.4506) loss 2.5685 (2.4233) grad_norm 2.8717 (3.9792) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][380/625] eta 0:02:02 lr 0.000014 wd 0.0500 time 0.4494 (0.4993) data time 0.0006 (0.0034) model time 0.4488 (0.4506) loss 2.7207 (2.4254) grad_norm 3.0894 (3.9464) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][390/625] eta 0:01:57 lr 0.000014 wd 0.0500 time 0.4575 (0.4979) data time 0.0007 (0.0033) model time 0.4568 (0.4505) loss 2.6230 (2.4227) grad_norm 1.9379 (3.9095) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][400/625] eta 0:01:51 lr 0.000014 wd 0.0500 time 0.4510 (0.4972) data time 0.0006 (0.0033) model time 0.4504 (0.4511) loss 3.1476 (2.4221) grad_norm 2.9160 (3.8948) loss_scale 
128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][410/625] eta 0:01:46 lr 0.000014 wd 0.0500 time 0.4462 (0.4960) data time 0.0008 (0.0032) model time 0.4454 (0.4510) loss 1.6454 (2.4172) grad_norm 2.8786 (3.8611) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][420/625] eta 0:01:41 lr 0.000014 wd 0.0500 time 0.4512 (0.4948) data time 0.0010 (0.0032) model time 0.4501 (0.4509) loss 3.0577 (2.4150) grad_norm 3.3907 (3.8418) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][430/625] eta 0:01:36 lr 0.000014 wd 0.0500 time 0.4491 (0.4937) data time 0.0008 (0.0031) model time 0.4482 (0.4510) loss 2.2527 (2.4164) grad_norm 4.0513 (3.8505) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:49:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][440/625] eta 0:01:31 lr 0.000014 wd 0.0500 time 0.4480 (0.4927) data time 0.0009 (0.0031) model time 0.4471 (0.4509) loss 2.7565 (2.4201) grad_norm 3.2158 (3.8470) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][450/625] eta 0:01:26 lr 0.000014 wd 0.0500 time 0.4505 (0.4917) data time 0.0007 (0.0030) model time 0.4498 (0.4509) loss 1.9582 (2.4197) grad_norm 2.8857 (3.8520) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][460/625] eta 0:01:20 lr 0.000014 wd 0.0500 time 0.4466 (0.4908) data time 0.0007 (0.0029) model time 0.4459 (0.4509) loss 2.9177 (2.4275) grad_norm 2.5229 (3.8259) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][470/625] eta 0:01:15 lr 0.000014 wd 0.0500 time 0.4522 (0.4898) data time 0.0007 (0.0029) model time 
0.4515 (0.4509) loss 2.4075 (2.4301) grad_norm 4.5797 (3.8438) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][480/625] eta 0:01:10 lr 0.000014 wd 0.0500 time 0.4525 (0.4890) data time 0.0006 (0.0029) model time 0.4519 (0.4508) loss 1.5965 (2.4284) grad_norm 4.4763 (3.8450) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][490/625] eta 0:01:05 lr 0.000014 wd 0.0500 time 0.4504 (0.4881) data time 0.0006 (0.0028) model time 0.4498 (0.4508) loss 2.1363 (2.4240) grad_norm 2.3745 (3.8340) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][500/625] eta 0:01:00 lr 0.000014 wd 0.0500 time 0.4477 (0.4873) data time 0.0007 (0.0028) model time 0.4470 (0.4508) loss 2.2179 (2.4208) grad_norm 4.1148 (3.8282) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][510/625] eta 0:00:55 lr 0.000014 wd 0.0500 time 0.4472 (0.4866) data time 0.0006 (0.0027) model time 0.4466 (0.4507) loss 2.8561 (2.4202) grad_norm 3.4036 (3.8154) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][520/625] eta 0:00:51 lr 0.000014 wd 0.0500 time 0.4494 (0.4859) data time 0.0007 (0.0027) model time 0.4487 (0.4508) loss 1.8818 (2.4227) grad_norm 2.7257 (3.8159) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][530/625] eta 0:00:46 lr 0.000014 wd 0.0500 time 0.4496 (0.4852) data time 0.0006 (0.0027) model time 0.4489 (0.4508) loss 1.9968 (2.4197) grad_norm 2.8452 (3.7947) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 12:50:42 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint 
and exiting [2024-08-11 12:50:42 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 12:50:46 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 14:00:05 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:00:06 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:00:09 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 14:00:30 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 14:00:30 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 14:00:32 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 14:00:34 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 14:00:35 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 292) [2024-08-11 14:00:35 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 14:00:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][540/625] eta 0:07:09 lr 0.000014 wd 0.0500 time 0.4381 (5.0480) data time 0.0006 (0.1883) model time 0.4375 (4.8597) loss 2.9922 (2.7328) grad_norm 3.1256 (3.2425) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][550/625] eta 0:02:11 lr 0.000014 wd 0.0500 time 0.4402 (1.7575) data time 0.0007 (0.0544) model time 0.4395 (1.7031) loss 2.5853 (2.5796) grad_norm 2.4945 (3.0501) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][560/625] eta 0:01:18 lr 0.000014 wd 0.0500 time 0.4371 (1.2087) data time 0.0008 (0.0321) model time 0.4363 (1.1766) loss 2.6511 (2.6226) grad_norm 3.2432 (3.0649) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][570/625] eta 0:00:54 lr 0.000014 wd 0.0500 time 0.4397 (0.9909) data time 0.0007 (0.0229) model time 0.4391 (0.9680) loss 1.8154 (2.6100) grad_norm 2.4410 (3.0461) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][580/625] eta 0:00:39 lr 0.000014 wd 0.0500 time 0.4445 (0.8714) data time 0.0006 (0.0179) model time 0.4439 (0.8535) loss 2.7023 (2.5830) grad_norm 3.1719 (3.6609) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][590/625] eta 
0:00:27 lr 0.000014 wd 0.0500 time 0.4411 (0.7920) data time 0.0007 (0.0147) model time 0.4404 (0.7773) loss 2.8164 (2.5767) grad_norm 2.8741 (3.5182) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][600/625] eta 0:00:18 lr 0.000014 wd 0.0500 time 0.4394 (0.7373) data time 0.0007 (0.0125) model time 0.4388 (0.7247) loss 2.9258 (2.5399) grad_norm 2.4393 (3.4724) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][610/625] eta 0:00:10 lr 0.000014 wd 0.0500 time 0.4352 (0.6974) data time 0.0004 (0.0110) model time 0.4347 (0.6863) loss 2.6523 (2.5136) grad_norm 2.2634 (3.3697) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [292/300][620/625] eta 0:00:03 lr 0.000014 wd 0.0500 time 0.4388 (0.6665) data time 0.0006 (0.0098) model time 0.4382 (0.6568) loss 2.4741 (2.4884) grad_norm 5.4623 (3.3644) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:01:37 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 292 training takes 0:00:57 [2024-08-11 14:01:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 14:01:42 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 14:01:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5278 (0.5278) Acc@1 89.209 (89.209) Acc@5 98.926 (98.926) Mem 16699MB [2024-08-11 14:01:44 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.152) Loss 0.8530 (0.6315) Acc@1 80.615 (86.990) Acc@5 96.143 (97.758) Mem 16699MB [2024-08-11 14:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.135) Loss 0.9355 (0.7549) Acc@1 79.150 (84.075) Acc@5 95.264 (96.617) Mem 16699MB [2024-08-11 14:01:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.819 Acc@5 96.557 [2024-08-11 14:01:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 14:01:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.929 (0.929) Loss 0.5293 (0.5293) Acc@1 89.014 (89.014) Acc@5 99.023 (99.023) Mem 16699MB [2024-08-11 14:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.116 (0.193) Loss 0.8467 (0.6274) Acc@1 80.957 (87.047) Acc@5 96.436 (97.834) Mem 16699MB [2024-08-11 14:01:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.157) Loss 0.9307 (0.7494) Acc@1 79.297 (84.166) Acc@5 95.508 (96.689) Mem 16699MB [2024-08-11 14:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.905 Acc@5 96.619 [2024-08-11 14:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 14:01:52 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.91% [2024-08-11 14:01:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 14:01:53 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-11 14:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][0/625] eta 0:09:46 lr 0.000014 wd 0.0500 time 0.9382 (0.9382) data time 0.4617 (0.4617) model time 0.0000 (0.0000) loss 2.3521 (2.3521) grad_norm 2.7277 (2.7277) loss_scale 128.0000 (128.0000) mem 16712MB [2024-08-11 14:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][10/625] eta 0:05:00 lr 0.000014 wd 0.0500 time 0.4443 (0.4879) data time 0.0008 (0.0427) model time 0.0000 (0.0000) loss 2.7023 (2.4752) grad_norm 2.9767 (4.8916) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][20/625] eta 0:04:43 lr 0.000014 wd 0.0500 time 0.4488 (0.4683) data time 0.0007 (0.0227) model time 0.0000 (0.0000) loss 2.4131 (2.4995) grad_norm 2.1668 (3.8115) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][30/625] eta 0:04:34 lr 0.000014 wd 0.0500 time 0.4446 (0.4608) data time 0.0008 (0.0157) model time 0.0000 (0.0000) loss 2.4100 (2.5163) grad_norm 2.5782 (3.4967) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][40/625] eta 0:04:27 lr 0.000014 wd 0.0500 time 0.4465 (0.4570) data time 0.0007 (0.0121) model time 0.0000 (0.0000) loss 2.4040 (2.4762) grad_norm 2.8915 (3.6064) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][50/625] eta 0:04:21 lr 0.000014 wd 0.0500 time 0.4440 (0.4546) data time 0.0006 (0.0099) model time 0.0000 (0.0000) loss 2.7482 (2.4773) grad_norm 2.5357 (3.5667) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][60/625] eta 0:04:16 lr 0.000014 wd 0.0500 time 0.4453 (0.4532) data time 0.0007 (0.0084) model time 0.4446 (0.4452) loss 2.4686 (2.4558) 
grad_norm 4.3700 (3.5088) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][70/625] eta 0:04:10 lr 0.000014 wd 0.0500 time 0.4475 (0.4519) data time 0.0007 (0.0073) model time 0.4468 (0.4440) loss 2.9453 (2.4634) grad_norm 3.1888 (3.4341) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][80/625] eta 0:04:05 lr 0.000014 wd 0.0500 time 0.4411 (0.4508) data time 0.0009 (0.0065) model time 0.4402 (0.4435) loss 2.4832 (2.4721) grad_norm 2.2454 (3.7137) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][90/625] eta 0:04:00 lr 0.000014 wd 0.0500 time 0.4483 (0.4501) data time 0.0006 (0.0059) model time 0.4477 (0.4435) loss 2.5176 (2.4452) grad_norm 2.2691 (3.7923) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][100/625] eta 0:03:56 lr 0.000014 wd 0.0500 time 0.4442 (0.4495) data time 0.0006 (0.0054) model time 0.4435 (0.4434) loss 2.8973 (2.4528) grad_norm 2.5368 (3.9818) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][110/625] eta 0:03:51 lr 0.000014 wd 0.0500 time 0.4427 (0.4492) data time 0.0008 (0.0050) model time 0.4419 (0.4437) loss 1.9248 (2.4275) grad_norm 2.1485 (3.9194) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][120/625] eta 0:03:47 lr 0.000014 wd 0.0500 time 0.6785 (0.4508) data time 0.0006 (0.0047) model time 0.6779 (0.4472) loss 2.5350 (2.4213) grad_norm 2.9144 (3.8296) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 14:02:50 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 14:02:50 
vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 14:02:51 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 14:04:26 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:04:27 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:12:18 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:12:20 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:12:33 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 14:12:45 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 14:12:45 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 14:12:48 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 14:16:14 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:16:15 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:16:17 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-11 14:24:30 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:24:32 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:24:45 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 14:24:56 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 14:24:56 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 14:24:58 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 14:25:00 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 14:25:01 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 293) [2024-08-11 14:25:01 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 14:25:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][130/625] eta 0:30:01 lr 0.000014 wd 0.0500 time 0.4744 (3.6390) data time 0.0009 (0.1205) model time 0.4736 (3.5185) loss 2.8202 (2.5975) grad_norm 4.5535 (3.0953) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:25:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][140/625] eta 0:13:26 lr 0.000014 wd 0.0500 time 0.4765 (1.6627) data time 0.0011 (0.0459) model time 0.4755 (1.6168) loss 2.7051 (2.5596) grad_norm 2.4925 (2.9921) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:25:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][150/625] eta 0:09:32 lr 0.000014 wd 0.0500 time 0.4697 (1.2052) data time 0.0008 (0.0287) model time 0.4689 (1.1765) loss 2.4020 (2.5384) grad_norm 
2.9086 (4.9573) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:25:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][160/625] eta 0:07:49 lr 0.000014 wd 0.0500 time 0.4740 (1.0097) data time 0.0011 (0.0210) model time 0.4730 (0.9887) loss 2.5854 (2.5295) grad_norm 3.4737 (4.4059) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:25:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][170/625] eta 0:06:48 lr 0.000014 wd 0.0500 time 0.4731 (0.8976) data time 0.0011 (0.0167) model time 0.4720 (0.8809) loss 2.4364 (2.5361) grad_norm 3.4094 (4.2738) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:25:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][180/625] eta 0:06:06 lr 0.000014 wd 0.0500 time 0.4832 (0.8226) data time 0.0011 (0.0139) model time 0.4821 (0.8087) loss 2.7611 (2.5241) grad_norm 3.4852 (4.0521) loss_scale 128.0000 (128.0000) mem 16699MB [2024-08-11 14:25:54 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 14:25:54 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 14:25:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 14:36:10 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:36:22 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-11 14:36:37 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 14:36:37 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 14:36:39 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 14:36:41 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 14:36:41 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 293) [2024-08-11 14:36:41 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 14:37:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][190/625] eta 0:22:45 lr 0.000014 wd 0.0500 time 0.4456 (3.1393) data time 0.0011 (0.0877) model time 0.4445 (3.0516) loss 2.4506 (2.6404) grad_norm 5.8345 (4.6631) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 14:37:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][200/625] eta 0:11:01 lr 0.000014 wd 0.0500 time 0.4486 (1.5564) data time 0.0008 (0.0366) model time 0.4477 (1.5197) loss 2.1968 (2.6129) grad_norm 2.7231 (4.4923) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 14:37:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][210/625] eta 0:07:55 lr 0.000014 wd 0.0500 time 0.4469 (1.1446) data time 0.0007 (0.0235) model time 0.4462 (1.1211) loss 2.9152 (2.5876) grad_norm 2.5430 (3.8761) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 14:37:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][220/625] eta 0:06:32 lr 0.000014 wd 0.0500 time 0.6962 (0.9692) data time 0.0008 (0.0174) model time 0.6954 (0.9518) loss 2.4209 (2.5832) grad_norm 2.4845 (3.5601) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 14:37:26 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [293/300][230/625] eta 0:05:38 lr 0.000014 wd 0.0500 time 0.4445 (0.8566) data time 0.0006 (0.0139) model time 0.4438 (0.8427) loss 2.7500 (2.5780) grad_norm 2.1315 (3.4328) loss_scale 128.0000 (128.0000) mem 16695MB [2024-08-11 14:37:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][240/625] eta 0:05:02 lr 0.000014 wd 0.0500 time 0.4482 (0.7845) data time 0.0009 (0.0116) model time 0.4474 (0.7729) loss 2.3426 (2.5673) grad_norm 1.8241 (4.8547) loss_scale 256.0000 (132.4912) mem 16695MB [2024-08-11 14:37:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][250/625] eta 0:04:35 lr 0.000014 wd 0.0500 time 0.4512 (0.7339) data time 0.0010 (0.0100) model time 0.4502 (0.7239) loss 2.2541 (2.5329) grad_norm 2.7623 (4.5698) loss_scale 256.0000 (150.9254) mem 16695MB [2024-08-11 14:37:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][260/625] eta 0:04:14 lr 0.000014 wd 0.0500 time 0.4471 (0.6966) data time 0.0008 (0.0088) model time 0.4463 (0.6878) loss 2.5740 (2.4984) grad_norm 2.2100 (4.3261) loss_scale 256.0000 (164.5714) mem 16695MB [2024-08-11 14:37:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][270/625] eta 0:03:57 lr 0.000014 wd 0.0500 time 0.4469 (0.6680) data time 0.0008 (0.0079) model time 0.4460 (0.6601) loss 2.6454 (2.4779) grad_norm 2.0516 (4.2026) loss_scale 256.0000 (175.0805) mem 16695MB [2024-08-11 14:37:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][280/625] eta 0:03:42 lr 0.000014 wd 0.0500 time 0.4402 (0.6452) data time 0.0008 (0.0072) model time 0.4394 (0.6380) loss 2.4784 (2.4755) grad_norm 3.0415 (4.1426) loss_scale 256.0000 (183.4227) mem 16695MB [2024-08-11 14:37:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][290/625] eta 0:03:29 lr 0.000014 wd 0.0500 time 0.4511 (0.6266) data time 0.0007 (0.0066) model time 0.4504 (0.6200) loss 2.4154 (2.4891) grad_norm 2.4143 (4.0937) loss_scale 
256.0000 (190.2056) mem 16695MB [2024-08-11 14:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][300/625] eta 0:03:18 lr 0.000014 wd 0.0500 time 0.4488 (0.6113) data time 0.0008 (0.0061) model time 0.4480 (0.6052) loss 2.5779 (2.4823) grad_norm 3.2709 (4.1950) loss_scale 256.0000 (195.8291) mem 16695MB [2024-08-11 14:38:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][310/625] eta 0:03:08 lr 0.000014 wd 0.0500 time 0.4489 (0.5983) data time 0.0008 (0.0057) model time 0.4481 (0.5926) loss 2.5008 (2.4665) grad_norm 1.8186 (4.2396) loss_scale 256.0000 (200.5669) mem 16695MB [2024-08-11 14:38:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][320/625] eta 0:02:59 lr 0.000014 wd 0.0500 time 0.4490 (0.5874) data time 0.0006 (0.0054) model time 0.4484 (0.5820) loss 2.2654 (2.4666) grad_norm 3.4813 (4.1957) loss_scale 256.0000 (204.6131) mem 16695MB [2024-08-11 14:38:08 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 14:38:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 14:38:13 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 14:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 14:55:07 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 14:55:08 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... 
[2024-08-11 14:55:28 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 14:55:28 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 14:55:31 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 14:55:33 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 14:55:33 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 293) [2024-08-11 14:55:33 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 15:04:54 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 15:04:55 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 15:05:09 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 15:05:21 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 15:05:21 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 15:05:24 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 15:05:26 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 15:05:26 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 293) [2024-08-11 15:05:26 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 15:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 15:05:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 15:05:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 15:07:42 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 15:07:44 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 15:08:10 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 15:08:19 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 15:08:19 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 15:08:22 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 15:08:24 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 15:08:24 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 293) [2024-08-11 15:08:24 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 15:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][330/625] eta 0:23:39 lr 0.000014 wd 0.0500 time 0.4158 (4.8125) data time 0.0008 (0.1999) model time 0.4150 (4.6126) loss 2.7389 (2.6595) grad_norm 2.6363 (3.0943) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:08:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][340/625] eta 0:08:55 lr 0.000014 wd 0.0500 time 0.4120 (1.8797) data time 0.0011 (0.0673) model time 0.4109 (1.8124) loss 2.4767 (2.5283) grad_norm 3.6897 (4.6257) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][350/625] eta 0:05:55 lr 0.000014 wd 0.0500 time 0.4233 (1.2936) data time 0.0011 (0.0408) model time 0.4222 (1.2528) loss 2.5894 (2.5660) grad_norm 3.2665 (3.9435) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][360/625] eta 0:04:38 lr 0.000014 wd 0.0500 time 0.4053 (1.0514) data time 0.0012 (0.0295) model time 0.4041 (1.0219) loss 2.4967 (2.5589) grad_norm 2.5240 (3.8825) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][370/625] eta 0:03:53 lr 0.000014 wd 0.0500 time 0.4124 (0.9154) data time 0.0010 (0.0232) model time 0.4114 (0.8922) loss 2.8994 (2.5501) grad_norm 17.6999 (4.0870) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][380/625] eta 
0:03:21 lr 0.000014 wd 0.0500 time 0.4115 (0.8244) data time 0.0008 (0.0191) model time 0.4107 (0.8053) loss 2.0963 (2.5415) grad_norm 1.9809 (3.9723) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][390/625] eta 0:02:58 lr 0.000014 wd 0.0500 time 0.4143 (0.7614) data time 0.0011 (0.0163) model time 0.4132 (0.7451) loss 2.5056 (2.5200) grad_norm 3.5892 (3.8212) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][400/625] eta 0:02:40 lr 0.000014 wd 0.0500 time 0.4108 (0.7150) data time 0.0010 (0.0143) model time 0.4098 (0.7007) loss 2.0184 (2.4949) grad_norm 2.1402 (3.6335) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][410/625] eta 0:02:26 lr 0.000014 wd 0.0500 time 0.4116 (0.6796) data time 0.0008 (0.0128) model time 0.4109 (0.6668) loss 2.3517 (2.4816) grad_norm 2.3558 (3.6182) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][420/625] eta 0:02:13 lr 0.000013 wd 0.0500 time 0.4111 (0.6517) data time 0.0010 (0.0115) model time 0.4101 (0.6401) loss 2.8608 (2.4777) grad_norm 2.4663 (3.5438) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][430/625] eta 0:02:02 lr 0.000013 wd 0.0500 time 0.4136 (0.6288) data time 0.0010 (0.0105) model time 0.4126 (0.6183) loss 2.2732 (2.4901) grad_norm 2.2162 (3.4840) loss_scale 256.0000 (256.0000) mem 16687MB [2024-08-11 15:09:37 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 15:09:37 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-11 15:09:43 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 15:27:33 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 15:27:33 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 15:27:46 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 15:27:58 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 15:27:58 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... [2024-08-11 15:28:00 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 15:28:02 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 15:28:03 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 293) [2024-08-11 15:28:03 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 15:28:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][440/625] eta 0:16:09 lr 0.000013 wd 0.0500 time 0.4589 (5.2429) data time 0.0006 (0.1416) model time 0.4583 (5.1013) loss 2.7914 (2.5843) grad_norm 2.3453 (2.8470) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:28:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][450/625] eta 0:05:19 lr 0.000013 wd 0.0500 time 0.4538 (1.8229) data time 0.0005 (0.0410) model time 0.4533 (1.7819) loss 2.5382 (2.4939) grad_norm 2.8809 (3.9873) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:28:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[293/300][460/625] eta 0:03:26 lr 0.000013 wd 0.0500 time 0.4520 (1.2525) data time 0.0009 (0.0243) model time 0.4511 (1.2282) loss 2.3854 (2.4962) grad_norm 3.9980 (4.5424) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:28:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][470/625] eta 0:02:38 lr 0.000013 wd 0.0500 time 0.4498 (1.0251) data time 0.0006 (0.0174) model time 0.4491 (1.0077) loss 1.9921 (2.5326) grad_norm 2.1332 (4.0483) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:28:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][480/625] eta 0:02:10 lr 0.000013 wd 0.0500 time 0.4573 (0.8998) data time 0.0007 (0.0136) model time 0.4567 (0.8862) loss 2.5459 (2.5303) grad_norm 2.6968 (3.7552) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:28:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][490/625] eta 0:01:50 lr 0.000013 wd 0.0500 time 0.4551 (0.8171) data time 0.0006 (0.0113) model time 0.4544 (0.8058) loss 2.8044 (2.5319) grad_norm 3.0210 (3.6623) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:28:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][500/625] eta 0:01:35 lr 0.000013 wd 0.0500 time 0.4540 (0.7604) data time 0.0007 (0.0096) model time 0.4533 (0.7508) loss 2.8503 (2.5241) grad_norm 3.4594 (3.5952) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][510/625] eta 0:01:22 lr 0.000013 wd 0.0500 time 0.4541 (0.7193) data time 0.0007 (0.0084) model time 0.4534 (0.7109) loss 2.4827 (2.5095) grad_norm 3.7614 (3.5059) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][520/625] eta 0:01:12 lr 0.000013 wd 0.0500 time 0.4563 (0.6880) data time 0.0009 (0.0075) model time 0.4554 (0.6805) loss 2.4682 (2.4774) grad_norm 8.8327 (3.5111) loss_scale 256.0000 (256.0000) mem 16700MB 
[2024-08-11 15:29:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][530/625] eta 0:01:03 lr 0.000013 wd 0.0500 time 0.4533 (0.6633) data time 0.0010 (0.0068) model time 0.4523 (0.6565) loss 2.3958 (2.4664) grad_norm 2.3537 (3.5889) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][540/625] eta 0:00:54 lr 0.000013 wd 0.0500 time 0.4575 (0.6436) data time 0.0008 (0.0062) model time 0.4566 (0.6374) loss 2.6745 (2.4880) grad_norm 2.7013 (3.7584) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][550/625] eta 0:00:47 lr 0.000013 wd 0.0500 time 0.4591 (0.6270) data time 0.0009 (0.0058) model time 0.4582 (0.6213) loss 2.6993 (2.4773) grad_norm 2.8045 (3.6766) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][560/625] eta 0:00:39 lr 0.000013 wd 0.0500 time 0.4573 (0.6132) data time 0.0008 (0.0054) model time 0.4565 (0.6078) loss 2.2528 (2.4748) grad_norm 1.8080 (3.6085) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][570/625] eta 0:00:33 lr 0.000013 wd 0.0500 time 0.4563 (0.6014) data time 0.0009 (0.0050) model time 0.4554 (0.5964) loss 2.8068 (2.4702) grad_norm 2.8523 (3.6458) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][580/625] eta 0:00:26 lr 0.000013 wd 0.0500 time 0.4575 (0.5915) data time 0.0006 (0.0047) model time 0.4569 (0.5868) loss 2.3179 (2.4582) grad_norm 3.2160 (3.6394) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][590/625] eta 0:00:20 lr 0.000013 wd 0.0500 time 0.4594 (0.5829) data time 0.0007 (0.0045) model time 0.4587 (0.5784) loss 2.5729 
(2.4544) grad_norm 3.3564 (3.5789) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][600/625] eta 0:00:14 lr 0.000013 wd 0.0500 time 0.4585 (0.5753) data time 0.0007 (0.0043) model time 0.4578 (0.5710) loss 2.2645 (2.4543) grad_norm 2.3601 (3.5440) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][610/625] eta 0:00:08 lr 0.000013 wd 0.0500 time 0.4470 (0.5684) data time 0.0006 (0.0041) model time 0.4464 (0.5643) loss 1.4634 (2.4511) grad_norm 1.9910 (3.5473) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [293/300][620/625] eta 0:00:02 lr 0.000013 wd 0.0500 time 0.3954 (0.5627) data time 0.0005 (0.0039) model time 0.3949 (0.5588) loss 2.2233 (2.4503) grad_norm 2.1756 (3.5550) loss_scale 256.0000 (256.0000) mem 16700MB [2024-08-11 15:29:52 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 293 training takes 0:01:45 [2024-08-11 15:29:52 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 15:29:57 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 15:29:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.455 (0.455) Loss 0.5337 (0.5337) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16700MB [2024-08-11 15:29:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.151) Loss 0.8481 (0.6306) Acc@1 80.859 (87.016) Acc@5 96.436 (97.807) Mem 16700MB [2024-08-11 15:30:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.135) Loss 0.9365 (0.7546) Acc@1 79.346 (84.103) Acc@5 95.312 (96.656) Mem 16700MB [2024-08-11 15:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.839 Acc@5 96.577 [2024-08-11 15:30:04 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 15:30:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 1.349 (1.349) Loss 0.5298 (0.5298) Acc@1 89.014 (89.014) Acc@5 98.975 (98.975) Mem 16700MB [2024-08-11 15:30:06 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.233) Loss 0.8481 (0.6280) Acc@1 80.957 (87.056) Acc@5 96.387 (97.812) Mem 16700MB [2024-08-11 15:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.178) Loss 0.9316 (0.7500) Acc@1 79.395 (84.170) Acc@5 95.508 (96.675) Mem 16700MB [2024-08-11 15:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.911 Acc@5 96.605 [2024-08-11 15:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 15:30:08 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.91% [2024-08-11 15:30:08 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... [2024-08-11 15:30:10 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! 
[2024-08-11 15:30:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][0/625] eta 0:09:26 lr 0.000013 wd 0.0500 time 0.9065 (0.9065) data time 0.3602 (0.3602) model time 0.0000 (0.0000) loss 2.7717 (2.7717) grad_norm 2.7680 (2.7680) loss_scale 256.0000 (256.0000) mem 16712MB [2024-08-11 15:30:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][10/625] eta 0:05:05 lr 0.000013 wd 0.0500 time 0.4590 (0.4970) data time 0.0008 (0.0335) model time 0.0000 (0.0000) loss 1.6848 (2.1987) grad_norm 2.6173 (3.0544) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][20/625] eta 0:04:49 lr 0.000013 wd 0.0500 time 0.4557 (0.4791) data time 0.0006 (0.0179) model time 0.0000 (0.0000) loss 2.5679 (2.2200) grad_norm 2.4754 (2.9288) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][30/625] eta 0:04:41 lr 0.000013 wd 0.0500 time 0.4567 (0.4724) data time 0.0007 (0.0124) model time 0.0000 (0.0000) loss 2.6658 (2.2623) grad_norm 4.3667 (2.8787) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][40/625] eta 0:04:34 lr 0.000013 wd 0.0500 time 0.4659 (0.4690) data time 0.0006 (0.0096) model time 0.0000 (0.0000) loss 1.8641 (2.3046) grad_norm 2.4925 (2.9139) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][50/625] eta 0:04:28 lr 0.000013 wd 0.0500 time 0.4599 (0.4668) data time 0.0006 (0.0079) model time 0.0000 (0.0000) loss 2.0026 (2.3058) grad_norm 2.0814 (3.0511) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][60/625] eta 0:04:22 lr 0.000013 wd 0.0500 time 0.4528 (0.4652) data time 0.0007 (0.0067) model time 0.4521 (0.4557) loss 2.1118 (2.3101) 
grad_norm 3.1007 (2.9978) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][70/625] eta 0:04:17 lr 0.000013 wd 0.0500 time 0.4614 (0.4642) data time 0.0009 (0.0059) model time 0.4605 (0.4565) loss 2.9263 (2.3017) grad_norm 3.9249 (3.3871) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][80/625] eta 0:04:13 lr 0.000013 wd 0.0500 time 0.4596 (0.4657) data time 0.0006 (0.0053) model time 0.4590 (0.4630) loss 1.5221 (2.2894) grad_norm 3.0033 (3.4303) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][90/625] eta 0:04:08 lr 0.000013 wd 0.0500 time 0.4609 (0.4649) data time 0.0008 (0.0048) model time 0.4601 (0.4615) loss 2.6537 (2.3210) grad_norm 2.4632 (3.4515) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:30:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][100/625] eta 0:04:03 lr 0.000013 wd 0.0500 time 0.4591 (0.4643) data time 0.0010 (0.0044) model time 0.4581 (0.4609) loss 2.4994 (2.3330) grad_norm 2.6526 (3.4324) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][110/625] eta 0:03:58 lr 0.000013 wd 0.0500 time 0.4594 (0.4638) data time 0.0008 (0.0041) model time 0.4586 (0.4604) loss 2.6041 (2.3142) grad_norm 3.2995 (3.4384) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][120/625] eta 0:03:54 lr 0.000013 wd 0.0500 time 0.4601 (0.4636) data time 0.0008 (0.0038) model time 0.4592 (0.4603) loss 2.3543 (2.3134) grad_norm 2.1871 (3.4240) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][130/625] eta 0:03:49 lr 0.000013 wd 0.0500 time 0.4602 (0.4630) data 
time 0.0006 (0.0036) model time 0.4596 (0.4597) loss 2.9433 (2.3364) grad_norm 2.2925 (3.4366) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][140/625] eta 0:03:44 lr 0.000013 wd 0.0500 time 0.4537 (0.4629) data time 0.0009 (0.0034) model time 0.4527 (0.4599) loss 1.7045 (2.3514) grad_norm 2.7090 (3.4322) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][150/625] eta 0:03:39 lr 0.000013 wd 0.0500 time 0.4587 (0.4626) data time 0.0008 (0.0032) model time 0.4578 (0.4595) loss 2.1595 (2.3550) grad_norm 2.5276 (3.4021) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][160/625] eta 0:03:34 lr 0.000013 wd 0.0500 time 0.4575 (0.4623) data time 0.0006 (0.0031) model time 0.4569 (0.4593) loss 2.5407 (2.3624) grad_norm 3.7375 (3.4018) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][170/625] eta 0:03:30 lr 0.000013 wd 0.0500 time 0.4586 (0.4621) data time 0.0006 (0.0029) model time 0.4580 (0.4593) loss 2.3920 (2.3666) grad_norm 2.1838 (3.4057) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][180/625] eta 0:03:25 lr 0.000013 wd 0.0500 time 0.4571 (0.4620) data time 0.0008 (0.0028) model time 0.4563 (0.4593) loss 2.6839 (2.3700) grad_norm 2.5449 (3.4992) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][190/625] eta 0:03:20 lr 0.000013 wd 0.0500 time 0.4571 (0.4619) data time 0.0008 (0.0027) model time 0.4563 (0.4593) loss 2.4812 (2.3725) grad_norm 2.2860 (3.4573) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[294/300][200/625] eta 0:03:16 lr 0.000013 wd 0.0500 time 0.4589 (0.4617) data time 0.0007 (0.0026) model time 0.4582 (0.4591) loss 2.4757 (2.3652) grad_norm 1.8751 (3.4470) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][210/625] eta 0:03:11 lr 0.000013 wd 0.0500 time 0.4558 (0.4615) data time 0.0007 (0.0025) model time 0.4551 (0.4590) loss 2.7731 (2.3732) grad_norm 3.4194 (3.4373) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][220/625] eta 0:03:06 lr 0.000013 wd 0.0500 time 0.4516 (0.4613) data time 0.0008 (0.0024) model time 0.4508 (0.4588) loss 2.4037 (2.3782) grad_norm 3.1742 (3.4056) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:31:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][230/625] eta 0:03:02 lr 0.000013 wd 0.0500 time 0.4593 (0.4612) data time 0.0006 (0.0024) model time 0.4587 (0.4588) loss 1.7404 (2.3782) grad_norm 5.1867 (3.3936) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][240/625] eta 0:02:57 lr 0.000013 wd 0.0500 time 0.4637 (0.4611) data time 0.0006 (0.0023) model time 0.4630 (0.4588) loss 2.5034 (2.3869) grad_norm 1.9662 (3.3779) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][250/625] eta 0:02:52 lr 0.000013 wd 0.0500 time 0.4574 (0.4610) data time 0.0008 (0.0023) model time 0.4565 (0.4587) loss 2.5885 (2.3968) grad_norm 3.2813 (3.3705) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][260/625] eta 0:02:48 lr 0.000013 wd 0.0500 time 0.4612 (0.4610) data time 0.0010 (0.0022) model time 0.4602 (0.4587) loss 2.5357 (2.3973) grad_norm 3.3843 (3.3945) loss_scale 256.0000 (256.0000) mem 16703MB 
[2024-08-11 15:32:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][270/625] eta 0:02:43 lr 0.000013 wd 0.0500 time 0.4598 (0.4609) data time 0.0009 (0.0022) model time 0.4589 (0.4586) loss 2.1178 (2.3898) grad_norm 3.5976 (3.4652) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][280/625] eta 0:02:38 lr 0.000013 wd 0.0500 time 0.4539 (0.4607) data time 0.0007 (0.0021) model time 0.4532 (0.4585) loss 1.3714 (2.3800) grad_norm 2.8489 (3.4730) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][290/625] eta 0:02:34 lr 0.000013 wd 0.0500 time 0.4591 (0.4606) data time 0.0008 (0.0021) model time 0.4583 (0.4584) loss 2.7620 (2.3808) grad_norm 3.9869 (3.4723) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][300/625] eta 0:02:29 lr 0.000013 wd 0.0500 time 0.4651 (0.4605) data time 0.0008 (0.0020) model time 0.4643 (0.4583) loss 2.2274 (2.3845) grad_norm 1.9441 (3.4482) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][310/625] eta 0:02:25 lr 0.000013 wd 0.0500 time 0.4550 (0.4604) data time 0.0006 (0.0020) model time 0.4544 (0.4582) loss 2.3799 (2.3845) grad_norm 2.9036 (3.4631) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][320/625] eta 0:02:20 lr 0.000013 wd 0.0500 time 0.4598 (0.4603) data time 0.0006 (0.0019) model time 0.4592 (0.4582) loss 2.8824 (2.3908) grad_norm 2.9573 (3.4433) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][330/625] eta 0:02:15 lr 0.000013 wd 0.0500 time 0.4572 (0.4602) data time 0.0008 (0.0019) model time 0.4564 (0.4582) loss 1.7252 
(2.3956) grad_norm 2.9574 (3.4228) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][340/625] eta 0:02:11 lr 0.000013 wd 0.0500 time 0.4590 (0.4602) data time 0.0006 (0.0019) model time 0.4584 (0.4582) loss 2.6387 (2.3877) grad_norm 2.9273 (3.4014) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][350/625] eta 0:02:06 lr 0.000013 wd 0.0500 time 0.4519 (0.4600) data time 0.0008 (0.0018) model time 0.4511 (0.4581) loss 2.5492 (2.3900) grad_norm 3.6699 (3.3943) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:32:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][360/625] eta 0:02:01 lr 0.000013 wd 0.0500 time 0.4570 (0.4603) data time 0.0006 (0.0018) model time 0.4563 (0.4584) loss 2.7533 (2.3916) grad_norm 3.7071 (3.3974) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][370/625] eta 0:01:57 lr 0.000013 wd 0.0500 time 0.4525 (0.4602) data time 0.0006 (0.0018) model time 0.4519 (0.4584) loss 2.6696 (2.3946) grad_norm 2.5302 (3.3858) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][380/625] eta 0:01:52 lr 0.000013 wd 0.0500 time 0.4595 (0.4602) data time 0.0010 (0.0018) model time 0.4586 (0.4583) loss 2.2295 (2.3989) grad_norm 2.2986 (3.3783) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][390/625] eta 0:01:48 lr 0.000013 wd 0.0500 time 0.4588 (0.4601) data time 0.0006 (0.0017) model time 0.4582 (0.4583) loss 2.4042 (2.4006) grad_norm 4.3691 (3.3912) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][400/625] eta 0:01:43 lr 0.000013 wd 0.0500 time 0.4623 
(0.4601) data time 0.0007 (0.0017) model time 0.4616 (0.4583) loss 1.8612 (2.4019) grad_norm 2.7271 (3.3812) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][410/625] eta 0:01:39 lr 0.000013 wd 0.0500 time 0.6790 (0.4606) data time 0.0009 (0.0017) model time 0.6781 (0.4589) loss 2.4307 (2.4042) grad_norm 2.8255 (3.3921) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][420/625] eta 0:01:34 lr 0.000013 wd 0.0500 time 0.4569 (0.4606) data time 0.0006 (0.0017) model time 0.4562 (0.4589) loss 2.7325 (2.4036) grad_norm 3.8090 (3.3830) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][430/625] eta 0:01:29 lr 0.000013 wd 0.0500 time 0.4556 (0.4605) data time 0.0006 (0.0017) model time 0.4550 (0.4588) loss 2.3822 (2.4052) grad_norm 3.2231 (3.3824) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][440/625] eta 0:01:25 lr 0.000013 wd 0.0500 time 0.4625 (0.4605) data time 0.0008 (0.0016) model time 0.4617 (0.4588) loss 2.5701 (2.4070) grad_norm 2.6003 (3.3996) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][450/625] eta 0:01:20 lr 0.000013 wd 0.0500 time 0.4567 (0.4604) data time 0.0006 (0.0016) model time 0.4561 (0.4587) loss 3.1739 (2.4098) grad_norm 2.6672 (3.3992) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][460/625] eta 0:01:15 lr 0.000013 wd 0.0500 time 0.4605 (0.4603) data time 0.0006 (0.0016) model time 0.4599 (0.4587) loss 2.2624 (2.4046) grad_norm 2.4563 (3.3930) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [294/300][470/625] eta 0:01:11 lr 0.000013 wd 0.0500 time 0.4631 (0.4603) data time 0.0008 (0.0016) model time 0.4623 (0.4587) loss 2.2862 (2.4036) grad_norm 2.0233 (3.3791) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][480/625] eta 0:01:06 lr 0.000013 wd 0.0500 time 0.4609 (0.4603) data time 0.0006 (0.0016) model time 0.4603 (0.4587) loss 2.3041 (2.4036) grad_norm 2.3313 (3.3634) loss_scale 256.0000 (256.0000) mem 16703MB [2024-08-11 15:33:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][490/625] eta 0:01:02 lr 0.000013 wd 0.0500 time 0.4580 (0.4602) data time 0.0006 (0.0015) model time 0.4574 (0.4587) loss 2.6731 (2.4054) grad_norm 2.9659 (inf) loss_scale 128.0000 (254.6965) mem 16703MB [2024-08-11 15:34:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][500/625] eta 0:00:57 lr 0.000013 wd 0.0500 time 0.4584 (0.4602) data time 0.0008 (0.0015) model time 0.4576 (0.4587) loss 2.1075 (2.4055) grad_norm 3.2973 (inf) loss_scale 128.0000 (252.1677) mem 16703MB [2024-08-11 15:34:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][510/625] eta 0:00:52 lr 0.000013 wd 0.0500 time 0.4576 (0.4605) data time 0.0008 (0.0015) model time 0.4567 (0.4590) loss 2.0993 (2.4015) grad_norm 2.4711 (inf) loss_scale 128.0000 (249.7378) mem 16703MB [2024-08-11 15:34:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][520/625] eta 0:00:48 lr 0.000013 wd 0.0500 time 0.4619 (0.4605) data time 0.0007 (0.0015) model time 0.4611 (0.4589) loss 1.9961 (2.4010) grad_norm 3.1302 (inf) loss_scale 128.0000 (247.4012) mem 16703MB [2024-08-11 15:34:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][530/625] eta 0:00:43 lr 0.000013 wd 0.0500 time 0.4571 (0.4604) data time 0.0007 (0.0015) model time 0.4564 (0.4589) loss 2.7060 (2.3980) grad_norm 2.2727 (inf) loss_scale 128.0000 (245.1525) mem 16703MB [2024-08-11 
15:34:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][540/625] eta 0:00:39 lr 0.000013 wd 0.0500 time 0.4560 (0.4604) data time 0.0006 (0.0015) model time 0.4554 (0.4589) loss 2.6016 (2.3977) grad_norm 1.9373 (inf) loss_scale 128.0000 (242.9871) mem 16703MB [2024-08-11 15:34:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][550/625] eta 0:00:34 lr 0.000013 wd 0.0500 time 0.4590 (0.4604) data time 0.0009 (0.0015) model time 0.4581 (0.4589) loss 2.6444 (2.4028) grad_norm 10.1569 (inf) loss_scale 128.0000 (240.9002) mem 16703MB [2024-08-11 15:34:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][560/625] eta 0:00:29 lr 0.000013 wd 0.0500 time 0.4576 (0.4603) data time 0.0008 (0.0015) model time 0.4568 (0.4588) loss 2.3603 (2.4012) grad_norm 3.6033 (inf) loss_scale 128.0000 (238.8877) mem 16703MB [2024-08-11 15:34:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][570/625] eta 0:00:25 lr 0.000013 wd 0.0500 time 0.4519 (0.4603) data time 0.0008 (0.0015) model time 0.4510 (0.4588) loss 2.8956 (2.4007) grad_norm 3.7477 (inf) loss_scale 128.0000 (236.9457) mem 16703MB [2024-08-11 15:34:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][580/625] eta 0:00:20 lr 0.000013 wd 0.0500 time 0.4560 (0.4603) data time 0.0006 (0.0014) model time 0.4554 (0.4588) loss 2.3518 (2.4035) grad_norm 3.2419 (inf) loss_scale 128.0000 (235.0706) mem 16703MB [2024-08-11 15:34:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][590/625] eta 0:00:16 lr 0.000013 wd 0.0500 time 0.4580 (0.4602) data time 0.0006 (0.0014) model time 0.4574 (0.4587) loss 2.0097 (2.4044) grad_norm 2.5989 (inf) loss_scale 128.0000 (233.2589) mem 16703MB [2024-08-11 15:34:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][600/625] eta 0:00:11 lr 0.000013 wd 0.0500 time 0.6151 (0.4604) data time 0.0009 (0.0014) model time 0.6142 (0.4590) loss 2.4031 (2.4046) grad_norm 1.9277 (inf) 
loss_scale 128.0000 (231.5075) mem 16703MB [2024-08-11 15:34:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][610/625] eta 0:00:06 lr 0.000013 wd 0.0500 time 0.4568 (0.4604) data time 0.0006 (0.0014) model time 0.4562 (0.4590) loss 2.8126 (2.4061) grad_norm 8.9921 (inf) loss_scale 128.0000 (229.8134) mem 16703MB [2024-08-11 15:34:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [294/300][620/625] eta 0:00:02 lr 0.000013 wd 0.0500 time 0.4528 (0.4603) data time 0.0004 (0.0014) model time 0.4524 (0.4589) loss 1.6775 (2.4006) grad_norm 2.4620 (inf) loss_scale 128.0000 (228.1739) mem 16703MB [2024-08-11 15:34:57 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 294 training takes 0:04:47 [2024-08-11 15:34:57 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 15:34:59 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 15:34:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.450 (0.450) Loss 0.5278 (0.5278) Acc@1 88.916 (88.916) Acc@5 98.975 (98.975) Mem 16703MB [2024-08-11 15:35:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.152) Loss 0.8496 (0.6310) Acc@1 80.713 (86.954) Acc@5 96.387 (97.781) Mem 16703MB [2024-08-11 15:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.136) Loss 0.9346 (0.7559) Acc@1 79.297 (84.080) Acc@5 95.459 (96.626) Mem 16703MB [2024-08-11 15:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.823 Acc@5 96.551 [2024-08-11 15:35:02 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 15:35:03 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.799 (0.799) Loss 0.5298 (0.5298) Acc@1 89.014 (89.014) Acc@5 98.975 (98.975) Mem 16703MB [2024-08-11 15:35:04 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.185) Loss 0.8481 (0.6282) Acc@1 81.104 (87.043) Acc@5 96.436 (97.807) Mem 16703MB [2024-08-11 15:35:05 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.153) Loss 0.9321 (0.7504) Acc@1 79.248 (84.142) Acc@5 95.508 (96.668) Mem 16703MB [2024-08-11 15:35:06 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.889 Acc@5 96.599 [2024-08-11 15:35:06 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 15:35:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][0/625] eta 0:12:39 lr 0.000013 wd 0.0500 time 1.2148 (1.2148) data time 0.5914 (0.5914) model time 0.0000 (0.0000) loss 2.6199 (2.6199) grad_norm 2.5600 (2.5600) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][10/625] eta 0:05:24 lr 0.000013 wd 0.0500 time 0.4576 (0.5271) data 
time 0.0009 (0.0545) model time 0.0000 (0.0000) loss 2.7663 (2.4282) grad_norm 4.8877 (7.4860) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][20/625] eta 0:04:58 lr 0.000013 wd 0.0500 time 0.4593 (0.4941) data time 0.0006 (0.0289) model time 0.0000 (0.0000) loss 2.2425 (2.3385) grad_norm 2.4888 (5.3030) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][30/625] eta 0:04:46 lr 0.000013 wd 0.0500 time 0.4544 (0.4822) data time 0.0007 (0.0198) model time 0.0000 (0.0000) loss 1.6602 (2.3271) grad_norm 2.1382 (4.5218) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][40/625] eta 0:04:38 lr 0.000013 wd 0.0500 time 0.4570 (0.4759) data time 0.0009 (0.0152) model time 0.0000 (0.0000) loss 2.8560 (2.3495) grad_norm 2.5795 (4.0852) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][50/625] eta 0:04:31 lr 0.000013 wd 0.0500 time 0.4584 (0.4722) data time 0.0008 (0.0124) model time 0.0000 (0.0000) loss 2.4216 (2.3393) grad_norm 2.6151 (3.9461) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][60/625] eta 0:04:25 lr 0.000013 wd 0.0500 time 0.4559 (0.4700) data time 0.0009 (0.0105) model time 0.4550 (0.4578) loss 2.6642 (2.3492) grad_norm 2.8279 (3.9891) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][70/625] eta 0:04:19 lr 0.000013 wd 0.0500 time 0.4564 (0.4682) data time 0.0007 (0.0091) model time 0.4557 (0.4569) loss 2.7767 (2.3368) grad_norm 3.9891 (3.8679) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[295/300][80/625] eta 0:04:14 lr 0.000013 wd 0.0500 time 0.4561 (0.4669) data time 0.0008 (0.0081) model time 0.4553 (0.4569) loss 2.7475 (2.3429) grad_norm 56.1800 (4.5245) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][90/625] eta 0:04:09 lr 0.000013 wd 0.0500 time 0.4562 (0.4657) data time 0.0006 (0.0073) model time 0.4556 (0.4566) loss 1.9637 (2.3434) grad_norm 2.0944 (4.4241) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][100/625] eta 0:04:04 lr 0.000013 wd 0.0500 time 0.4522 (0.4663) data time 0.0006 (0.0066) model time 0.4515 (0.4595) loss 1.9877 (2.3457) grad_norm 4.5617 (4.2855) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:35:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][110/625] eta 0:03:59 lr 0.000013 wd 0.0500 time 0.4602 (0.4657) data time 0.0006 (0.0061) model time 0.4595 (0.4593) loss 3.0768 (2.3713) grad_norm 2.6307 (4.2239) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][120/625] eta 0:03:54 lr 0.000013 wd 0.0500 time 0.4585 (0.4651) data time 0.0008 (0.0057) model time 0.4577 (0.4591) loss 2.2209 (2.3895) grad_norm 3.5942 (4.1581) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][130/625] eta 0:03:50 lr 0.000013 wd 0.0500 time 0.4586 (0.4647) data time 0.0008 (0.0053) model time 0.4578 (0.4591) loss 2.1669 (2.3719) grad_norm 4.4432 (4.0928) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][140/625] eta 0:03:45 lr 0.000013 wd 0.0500 time 0.4570 (0.4643) data time 0.0006 (0.0050) model time 0.4564 (0.4589) loss 1.9881 (2.3592) grad_norm 1.8303 (4.0252) loss_scale 128.0000 (128.0000) mem 16703MB 
[2024-08-11 15:36:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][150/625] eta 0:03:40 lr 0.000013 wd 0.0500 time 0.4573 (0.4638) data time 0.0006 (0.0047) model time 0.4567 (0.4587) loss 2.4216 (2.3653) grad_norm 2.9343 (3.9768) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][160/625] eta 0:03:35 lr 0.000013 wd 0.0500 time 0.4581 (0.4633) data time 0.0009 (0.0045) model time 0.4572 (0.4584) loss 2.6823 (2.3779) grad_norm 3.5335 (3.9327) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][170/625] eta 0:03:30 lr 0.000013 wd 0.0500 time 0.4557 (0.4630) data time 0.0008 (0.0043) model time 0.4549 (0.4583) loss 2.8452 (2.3778) grad_norm 2.3890 (3.8652) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][180/625] eta 0:03:25 lr 0.000013 wd 0.0500 time 0.4753 (0.4629) data time 0.0008 (0.0041) model time 0.4744 (0.4584) loss 2.3428 (2.3731) grad_norm 2.2157 (3.9263) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][190/625] eta 0:03:21 lr 0.000013 wd 0.0500 time 0.4608 (0.4627) data time 0.0007 (0.0039) model time 0.4602 (0.4584) loss 2.0164 (2.3714) grad_norm 2.7034 (4.0453) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][200/625] eta 0:03:16 lr 0.000013 wd 0.0500 time 0.4534 (0.4624) data time 0.0006 (0.0037) model time 0.4528 (0.4583) loss 2.7764 (2.3852) grad_norm 3.2540 (4.0019) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][210/625] eta 0:03:11 lr 0.000013 wd 0.0500 time 0.4607 (0.4622) data time 0.0008 (0.0036) model time 0.4599 (0.4583) loss 1.9368 
(2.3731) grad_norm 4.7162 (3.9770) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:48 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][220/625] eta 0:03:07 lr 0.000013 wd 0.0500 time 0.4548 (0.4621) data time 0.0006 (0.0035) model time 0.4542 (0.4582) loss 2.4812 (2.3675) grad_norm 3.4003 (3.9275) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][230/625] eta 0:03:02 lr 0.000013 wd 0.0500 time 0.4564 (0.4618) data time 0.0008 (0.0034) model time 0.4556 (0.4581) loss 1.5517 (2.3768) grad_norm 6.2195 (3.9056) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:36:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][240/625] eta 0:02:57 lr 0.000013 wd 0.0500 time 0.4547 (0.4616) data time 0.0008 (0.0032) model time 0.4539 (0.4579) loss 2.4485 (2.3777) grad_norm 3.0894 (3.9113) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][250/625] eta 0:02:52 lr 0.000013 wd 0.0500 time 0.4564 (0.4613) data time 0.0008 (0.0032) model time 0.4556 (0.4578) loss 2.4132 (2.3826) grad_norm 2.6209 (3.9276) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][260/625] eta 0:02:48 lr 0.000013 wd 0.0500 time 0.4597 (0.4613) data time 0.0008 (0.0031) model time 0.4589 (0.4578) loss 2.6741 (2.3757) grad_norm 3.4656 (3.9926) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][270/625] eta 0:02:43 lr 0.000013 wd 0.0500 time 0.4566 (0.4613) data time 0.0009 (0.0030) model time 0.4558 (0.4579) loss 2.3982 (2.3838) grad_norm 3.2906 (3.9717) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][280/625] eta 0:02:39 lr 0.000013 wd 0.0500 time 0.4569 
(0.4612) data time 0.0009 (0.0029) model time 0.4560 (0.4579) loss 2.0351 (2.3791) grad_norm 2.9243 (3.9225) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][290/625] eta 0:02:34 lr 0.000013 wd 0.0500 time 0.4586 (0.4611) data time 0.0008 (0.0028) model time 0.4578 (0.4579) loss 2.7965 (2.3835) grad_norm 2.9605 (3.8995) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][300/625] eta 0:02:30 lr 0.000013 wd 0.0500 time 0.4564 (0.4617) data time 0.0008 (0.0028) model time 0.4556 (0.4587) loss 2.4545 (2.3885) grad_norm 3.8081 (3.9782) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][310/625] eta 0:02:25 lr 0.000013 wd 0.0500 time 0.4549 (0.4615) data time 0.0008 (0.0027) model time 0.4541 (0.4586) loss 2.0948 (2.3907) grad_norm 3.3101 (4.0295) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][320/625] eta 0:02:20 lr 0.000013 wd 0.0500 time 0.4549 (0.4615) data time 0.0008 (0.0026) model time 0.4541 (0.4587) loss 2.4890 (2.3934) grad_norm 4.2285 (4.0059) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][330/625] eta 0:02:16 lr 0.000013 wd 0.0500 time 0.4570 (0.4614) data time 0.0009 (0.0026) model time 0.4562 (0.4586) loss 2.7344 (2.3963) grad_norm 1.9227 (3.9989) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][340/625] eta 0:02:11 lr 0.000013 wd 0.0500 time 0.4589 (0.4613) data time 0.0006 (0.0025) model time 0.4584 (0.4586) loss 1.9187 (2.3950) grad_norm 2.5628 (3.9703) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [295/300][350/625] eta 0:02:06 lr 0.000013 wd 0.0500 time 0.4563 (0.4613) data time 0.0006 (0.0025) model time 0.4557 (0.4586) loss 2.0507 (2.3908) grad_norm 2.0079 (3.9504) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][360/625] eta 0:02:02 lr 0.000013 wd 0.0500 time 0.4575 (0.4612) data time 0.0006 (0.0024) model time 0.4569 (0.4586) loss 2.6425 (2.3951) grad_norm 3.1504 (4.0857) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:37:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][370/625] eta 0:01:57 lr 0.000013 wd 0.0500 time 0.4602 (0.4611) data time 0.0008 (0.0024) model time 0.4593 (0.4585) loss 2.9914 (2.4039) grad_norm 2.8815 (4.1136) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][380/625] eta 0:01:52 lr 0.000013 wd 0.0500 time 0.4586 (0.4611) data time 0.0007 (0.0024) model time 0.4579 (0.4585) loss 2.4131 (2.4063) grad_norm 2.1706 (4.0970) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][390/625] eta 0:01:48 lr 0.000013 wd 0.0500 time 0.4575 (0.4610) data time 0.0007 (0.0023) model time 0.4568 (0.4585) loss 2.7653 (2.4085) grad_norm 6.4784 (4.0914) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][400/625] eta 0:01:43 lr 0.000013 wd 0.0500 time 0.4569 (0.4609) data time 0.0008 (0.0023) model time 0.4561 (0.4585) loss 2.5590 (2.4099) grad_norm 2.5615 (4.0651) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][410/625] eta 0:01:39 lr 0.000013 wd 0.0500 time 0.4579 (0.4608) data time 0.0008 (0.0022) model time 0.4570 (0.4584) loss 2.3244 (2.4082) grad_norm 2.1812 (4.0472) loss_scale 128.0000 (128.0000) mem 16703MB 
[2024-08-11 15:38:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][420/625] eta 0:01:34 lr 0.000013 wd 0.0500 time 0.4617 (0.4608) data time 0.0007 (0.0022) model time 0.4609 (0.4584) loss 2.3057 (2.4046) grad_norm 1.8602 (4.0198) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][430/625] eta 0:01:29 lr 0.000013 wd 0.0500 time 0.4589 (0.4607) data time 0.0008 (0.0022) model time 0.4581 (0.4584) loss 2.4831 (2.4087) grad_norm 4.2363 (4.0037) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][440/625] eta 0:01:25 lr 0.000013 wd 0.0500 time 0.4558 (0.4607) data time 0.0008 (0.0021) model time 0.4550 (0.4584) loss 2.7357 (2.4057) grad_norm 2.6322 (3.9825) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][450/625] eta 0:01:20 lr 0.000013 wd 0.0500 time 0.4586 (0.4606) data time 0.0008 (0.0021) model time 0.4578 (0.4583) loss 2.6578 (2.4083) grad_norm 2.6128 (3.9589) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][460/625] eta 0:01:16 lr 0.000013 wd 0.0500 time 0.4567 (0.4609) data time 0.0006 (0.0021) model time 0.4561 (0.4587) loss 2.8531 (2.4087) grad_norm 2.6518 (3.9689) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][470/625] eta 0:01:11 lr 0.000013 wd 0.0500 time 0.4588 (0.4608) data time 0.0008 (0.0021) model time 0.4580 (0.4586) loss 2.3286 (2.4123) grad_norm 2.7782 (3.9539) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][480/625] eta 0:01:06 lr 0.000013 wd 0.0500 time 0.4649 (0.4608) data time 0.0006 (0.0020) model time 0.4643 (0.4586) loss 2.3414 
(2.4152) grad_norm 3.8504 (3.9440) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][490/625] eta 0:01:02 lr 0.000013 wd 0.0500 time 0.4592 (0.4610) data time 0.0006 (0.0020) model time 0.4586 (0.4589) loss 2.7249 (2.4180) grad_norm 3.4902 (3.9244) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:38:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][500/625] eta 0:00:57 lr 0.000013 wd 0.0500 time 0.4596 (0.4610) data time 0.0006 (0.0020) model time 0.4590 (0.4589) loss 2.8929 (2.4145) grad_norm 5.0448 (3.9529) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][510/625] eta 0:00:53 lr 0.000013 wd 0.0500 time 0.4566 (0.4610) data time 0.0006 (0.0020) model time 0.4560 (0.4589) loss 2.2064 (2.4091) grad_norm 3.5912 (3.9536) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][520/625] eta 0:00:48 lr 0.000013 wd 0.0500 time 0.4647 (0.4610) data time 0.0008 (0.0019) model time 0.4639 (0.4589) loss 2.6818 (2.4123) grad_norm 2.7974 (3.9360) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][530/625] eta 0:00:43 lr 0.000013 wd 0.0500 time 0.4584 (0.4609) data time 0.0008 (0.0019) model time 0.4576 (0.4589) loss 2.4635 (2.4101) grad_norm 2.0670 (3.9160) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][540/625] eta 0:00:39 lr 0.000013 wd 0.0500 time 0.4536 (0.4609) data time 0.0008 (0.0019) model time 0.4528 (0.4589) loss 2.4473 (2.4128) grad_norm 3.6042 (3.9073) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][550/625] eta 0:00:34 lr 0.000013 wd 0.0500 time 0.4590 
(0.4609) data time 0.0008 (0.0019) model time 0.4582 (0.4589) loss 2.3519 (2.4150) grad_norm 3.3546 (3.9023) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:24 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][560/625] eta 0:00:29 lr 0.000013 wd 0.0500 time 0.4617 (0.4608) data time 0.0008 (0.0019) model time 0.4609 (0.4589) loss 2.7366 (2.4176) grad_norm 2.7256 (3.8873) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][570/625] eta 0:00:25 lr 0.000013 wd 0.0500 time 0.4523 (0.4608) data time 0.0009 (0.0018) model time 0.4515 (0.4589) loss 1.8094 (2.4177) grad_norm 3.1907 (3.8813) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][580/625] eta 0:00:20 lr 0.000013 wd 0.0500 time 0.4606 (0.4608) data time 0.0006 (0.0018) model time 0.4600 (0.4589) loss 2.6425 (2.4161) grad_norm 2.5780 (3.8684) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][590/625] eta 0:00:16 lr 0.000013 wd 0.0500 time 0.4624 (0.4608) data time 0.0007 (0.0018) model time 0.4617 (0.4589) loss 2.3901 (2.4182) grad_norm 3.3803 (3.8540) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][600/625] eta 0:00:11 lr 0.000013 wd 0.0500 time 0.4550 (0.4607) data time 0.0008 (0.0018) model time 0.4542 (0.4588) loss 2.4947 (2.4193) grad_norm 2.6568 (3.8450) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [295/300][610/625] eta 0:00:06 lr 0.000013 wd 0.0500 time 0.4501 (0.4606) data time 0.0006 (0.0018) model time 0.4494 (0.4588) loss 2.9116 (2.4200) grad_norm 3.3748 (3.9323) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [295/300][620/625] eta 0:00:02 lr 0.000013 wd 0.0500 time 0.4559 (0.4605) data time 0.0004 (0.0018) model time 0.4555 (0.4586) loss 1.5951 (2.4179) grad_norm 3.6380 (3.9258) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:39:53 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 295 training takes 0:04:47 [2024-08-11 15:39:53 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 15:39:55 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 15:39:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.466 (0.466) Loss 0.5337 (0.5337) Acc@1 89.062 (89.062) Acc@5 98.926 (98.926) Mem 16703MB [2024-08-11 15:39:57 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.152) Loss 0.8491 (0.6316) Acc@1 80.762 (86.954) Acc@5 96.240 (97.807) Mem 16703MB [2024-08-11 15:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.136) Loss 0.9351 (0.7550) Acc@1 79.590 (84.112) Acc@5 95.410 (96.689) Mem 16703MB [2024-08-11 15:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.847 Acc@5 96.619 [2024-08-11 15:39:58 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 15:39:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.902 (0.902) Loss 0.5298 (0.5298) Acc@1 89.014 (89.014) Acc@5 98.975 (98.975) Mem 16703MB [2024-08-11 15:40:00 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.193) Loss 0.8491 (0.6284) Acc@1 81.104 (87.038) Acc@5 96.484 (97.820) Mem 16703MB [2024-08-11 15:40:01 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.158) Loss 0.9316 (0.7506) Acc@1 79.297 (84.145) Acc@5 95.508 (96.680) Mem 16703MB [2024-08-11 15:40:02 vssm_base_ms_e300] 
(main_hfai_mnodes.py 445): INFO * Acc@1 83.893 Acc@5 96.613 [2024-08-11 15:40:02 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 15:40:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][0/625] eta 0:13:19 lr 0.000013 wd 0.0500 time 1.2800 (1.2800) data time 0.4968 (0.4968) model time 0.0000 (0.0000) loss 1.7025 (1.7025) grad_norm 2.3302 (2.3302) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][10/625] eta 0:05:28 lr 0.000013 wd 0.0500 time 0.4613 (0.5349) data time 0.0005 (0.0459) model time 0.0000 (0.0000) loss 2.3192 (2.4376) grad_norm 4.4639 (3.5202) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][20/625] eta 0:05:02 lr 0.000013 wd 0.0500 time 0.4525 (0.4994) data time 0.0008 (0.0244) model time 0.0000 (0.0000) loss 1.5591 (2.4370) grad_norm 2.6524 (3.2905) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][30/625] eta 0:04:49 lr 0.000013 wd 0.0500 time 0.4615 (0.4866) data time 0.0008 (0.0168) model time 0.0000 (0.0000) loss 2.5589 (2.4649) grad_norm 4.8853 (3.2491) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][40/625] eta 0:04:40 lr 0.000013 wd 0.0500 time 0.4566 (0.4803) data time 0.0006 (0.0129) model time 0.0000 (0.0000) loss 2.4490 (2.4078) grad_norm 2.7693 (3.6404) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][50/625] eta 0:04:34 lr 0.000013 wd 0.0500 time 0.4592 (0.4765) data time 0.0006 (0.0105) model time 0.0000 (0.0000) loss 2.8366 (2.3449) grad_norm 2.1774 (3.4754) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:31 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [296/300][60/625] eta 0:04:27 lr 0.000013 wd 0.0500 time 0.4624 (0.4739) data time 0.0006 (0.0089) model time 0.4619 (0.4600) loss 2.6894 (2.3391) grad_norm 3.0527 (3.4532) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][70/625] eta 0:04:24 lr 0.000013 wd 0.0500 time 0.3992 (0.4769) data time 0.0009 (0.0077) model time 0.3983 (0.4770) loss 2.2586 (2.3672) grad_norm 2.6240 (3.4691) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][80/625] eta 0:04:18 lr 0.000013 wd 0.0500 time 0.4606 (0.4749) data time 0.0008 (0.0069) model time 0.4597 (0.4713) loss 2.5402 (2.3854) grad_norm 3.5020 (3.4541) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][90/625] eta 0:04:13 lr 0.000013 wd 0.0500 time 0.4608 (0.4733) data time 0.0008 (0.0062) model time 0.4600 (0.4684) loss 2.7678 (2.3973) grad_norm 4.2672 (3.4807) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][100/625] eta 0:04:07 lr 0.000013 wd 0.0500 time 0.4593 (0.4720) data time 0.0008 (0.0057) model time 0.4585 (0.4666) loss 2.4314 (2.4003) grad_norm 2.4541 (3.5292) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][110/625] eta 0:04:02 lr 0.000013 wd 0.0500 time 0.4563 (0.4706) data time 0.0008 (0.0052) model time 0.4555 (0.4649) loss 1.5134 (2.3765) grad_norm 2.3369 (4.2417) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:40:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][120/625] eta 0:03:57 lr 0.000013 wd 0.0500 time 0.4590 (0.4697) data time 0.0008 (0.0049) model time 0.4582 (0.4639) loss 2.4819 (2.3913) grad_norm 3.3441 (4.1413) loss_scale 
128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][130/625] eta 0:03:52 lr 0.000013 wd 0.0500 time 0.4619 (0.4688) data time 0.0007 (0.0046) model time 0.4612 (0.4631) loss 2.6003 (2.3707) grad_norm 2.1135 (4.0703) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][140/625] eta 0:03:47 lr 0.000013 wd 0.0500 time 0.4601 (0.4682) data time 0.0008 (0.0043) model time 0.4593 (0.4627) loss 2.4414 (2.3704) grad_norm 2.3921 (4.5637) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][150/625] eta 0:03:42 lr 0.000013 wd 0.0500 time 0.4607 (0.4677) data time 0.0006 (0.0040) model time 0.4601 (0.4624) loss 2.7993 (2.3739) grad_norm 2.1624 (4.4756) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][160/625] eta 0:03:37 lr 0.000013 wd 0.0500 time 0.4619 (0.4672) data time 0.0008 (0.0038) model time 0.4611 (0.4621) loss 2.2860 (2.3813) grad_norm 2.7117 (4.3786) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][170/625] eta 0:03:32 lr 0.000013 wd 0.0500 time 0.4621 (0.4668) data time 0.0008 (0.0037) model time 0.4613 (0.4620) loss 2.9793 (2.3931) grad_norm 3.8332 (4.3065) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][180/625] eta 0:03:27 lr 0.000013 wd 0.0500 time 0.4549 (0.4664) data time 0.0008 (0.0035) model time 0.4541 (0.4616) loss 1.7059 (2.3940) grad_norm 3.9969 (4.2568) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][190/625] eta 0:03:22 lr 0.000013 wd 0.0500 time 0.4621 (0.4660) data time 0.0008 (0.0034) model time 
0.4613 (0.4614) loss 2.7995 (2.3953) grad_norm 2.8290 (4.2028) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][200/625] eta 0:03:17 lr 0.000013 wd 0.0500 time 0.4614 (0.4656) data time 0.0006 (0.0032) model time 0.4608 (0.4612) loss 1.9179 (2.4040) grad_norm 2.8750 (4.2052) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][210/625] eta 0:03:13 lr 0.000013 wd 0.0500 time 0.4602 (0.4654) data time 0.0005 (0.0031) model time 0.4596 (0.4610) loss 2.7872 (2.3830) grad_norm 2.9656 (4.2109) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][220/625] eta 0:03:08 lr 0.000012 wd 0.0500 time 0.4603 (0.4651) data time 0.0006 (0.0030) model time 0.4598 (0.4609) loss 2.5617 (2.3712) grad_norm 3.9106 (4.1736) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][230/625] eta 0:03:03 lr 0.000012 wd 0.0500 time 0.4579 (0.4649) data time 0.0006 (0.0029) model time 0.4574 (0.4608) loss 2.5987 (2.3698) grad_norm 19.8608 (4.1995) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][240/625] eta 0:02:58 lr 0.000012 wd 0.0500 time 0.4581 (0.4647) data time 0.0006 (0.0028) model time 0.4575 (0.4607) loss 2.3383 (2.3682) grad_norm 2.2492 (4.1610) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:41:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][250/625] eta 0:02:54 lr 0.000012 wd 0.0500 time 0.4587 (0.4644) data time 0.0009 (0.0027) model time 0.4579 (0.4606) loss 2.7451 (2.3750) grad_norm 2.1402 (4.1023) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][260/625] eta 0:02:49 lr 
0.000012 wd 0.0500 time 0.4569 (0.4642) data time 0.0008 (0.0027) model time 0.4561 (0.4604) loss 2.9841 (2.3808) grad_norm 3.6089 (4.1370) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][270/625] eta 0:02:44 lr 0.000012 wd 0.0500 time 0.4604 (0.4640) data time 0.0006 (0.0026) model time 0.4598 (0.4604) loss 2.0655 (2.3854) grad_norm 3.4621 (4.1109) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][280/625] eta 0:02:40 lr 0.000012 wd 0.0500 time 0.4546 (0.4638) data time 0.0006 (0.0025) model time 0.4540 (0.4602) loss 2.2261 (2.3849) grad_norm 2.8335 (4.0848) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][290/625] eta 0:02:35 lr 0.000012 wd 0.0500 time 0.6370 (0.4648) data time 0.0008 (0.0025) model time 0.6362 (0.4615) loss 2.8850 (2.3859) grad_norm 2.8042 (4.2601) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][300/625] eta 0:02:30 lr 0.000012 wd 0.0500 time 0.4568 (0.4645) data time 0.0009 (0.0024) model time 0.4559 (0.4613) loss 2.3848 (2.3884) grad_norm 4.4448 (4.2404) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][310/625] eta 0:02:26 lr 0.000012 wd 0.0500 time 0.4574 (0.4643) data time 0.0006 (0.0024) model time 0.4568 (0.4612) loss 1.8402 (2.3848) grad_norm 3.9229 (4.2220) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][320/625] eta 0:02:21 lr 0.000012 wd 0.0500 time 0.4588 (0.4641) data time 0.0008 (0.0023) model time 0.4580 (0.4610) loss 3.0931 (2.3793) grad_norm 4.3061 (4.1942) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:35 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [296/300][330/625] eta 0:02:16 lr 0.000012 wd 0.0500 time 0.4587 (0.4639) data time 0.0008 (0.0023) model time 0.4579 (0.4608) loss 2.7741 (2.3833) grad_norm 3.5530 (4.1647) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][340/625] eta 0:02:12 lr 0.000012 wd 0.0500 time 0.4569 (0.4637) data time 0.0008 (0.0022) model time 0.4561 (0.4607) loss 1.7014 (2.3821) grad_norm 2.4177 (4.1245) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][350/625] eta 0:02:07 lr 0.000012 wd 0.0500 time 0.4603 (0.4636) data time 0.0006 (0.0022) model time 0.4597 (0.4606) loss 2.2720 (2.3783) grad_norm 2.7157 (4.1069) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][360/625] eta 0:02:02 lr 0.000012 wd 0.0500 time 0.4581 (0.4635) data time 0.0006 (0.0022) model time 0.4575 (0.4606) loss 1.6625 (2.3812) grad_norm 2.5238 (4.0957) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][370/625] eta 0:01:58 lr 0.000012 wd 0.0500 time 0.4591 (0.4634) data time 0.0008 (0.0021) model time 0.4583 (0.4605) loss 2.4195 (2.3832) grad_norm 3.8704 (4.0678) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:42:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][380/625] eta 0:01:53 lr 0.000012 wd 0.0500 time 0.4578 (0.4633) data time 0.0008 (0.0021) model time 0.4570 (0.4604) loss 2.0712 (2.3818) grad_norm 2.9897 (4.1233) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][390/625] eta 0:01:48 lr 0.000012 wd 0.0500 time 0.4544 (0.4632) data time 0.0008 (0.0021) model time 0.4536 (0.4604) loss 2.5939 (2.3855) grad_norm 2.7958 (4.0987) loss_scale 
128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][400/625] eta 0:01:44 lr 0.000012 wd 0.0500 time 0.4591 (0.4630) data time 0.0008 (0.0020) model time 0.4584 (0.4603) loss 2.3628 (2.3867) grad_norm 2.0841 (4.1011) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][410/625] eta 0:01:39 lr 0.000012 wd 0.0500 time 0.4758 (0.4630) data time 0.0006 (0.0020) model time 0.4752 (0.4603) loss 2.6924 (2.3852) grad_norm 3.2500 (4.0766) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][420/625] eta 0:01:34 lr 0.000012 wd 0.0500 time 0.4559 (0.4629) data time 0.0006 (0.0020) model time 0.4553 (0.4602) loss 2.3935 (2.3823) grad_norm 4.5517 (4.0491) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][430/625] eta 0:01:30 lr 0.000012 wd 0.0500 time 0.4556 (0.4628) data time 0.0007 (0.0019) model time 0.4548 (0.4601) loss 1.6122 (2.3780) grad_norm 2.5233 (4.0463) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][440/625] eta 0:01:25 lr 0.000012 wd 0.0500 time 0.4616 (0.4635) data time 0.0006 (0.0019) model time 0.4609 (0.4610) loss 2.9005 (2.3826) grad_norm 6.0915 (4.2057) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][450/625] eta 0:01:21 lr 0.000012 wd 0.0500 time 0.4591 (0.4635) data time 0.0006 (0.0019) model time 0.4585 (0.4610) loss 1.5255 (2.3826) grad_norm 4.3015 (4.2044) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][460/625] eta 0:01:16 lr 0.000012 wd 0.0500 time 0.4576 (0.4634) data time 0.0006 (0.0019) model time 
0.4570 (0.4610) loss 2.7705 (2.3815) grad_norm 2.9713 (4.1811) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][470/625] eta 0:01:11 lr 0.000012 wd 0.0500 time 0.4577 (0.4633) data time 0.0008 (0.0019) model time 0.4569 (0.4609) loss 2.1786 (2.3832) grad_norm 2.9828 (4.1685) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][480/625] eta 0:01:07 lr 0.000012 wd 0.0500 time 0.4586 (0.4631) data time 0.0008 (0.0018) model time 0.4579 (0.4607) loss 2.7896 (2.3792) grad_norm 2.7014 (4.1559) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][490/625] eta 0:01:02 lr 0.000012 wd 0.0500 time 0.4561 (0.4630) data time 0.0007 (0.0018) model time 0.4555 (0.4606) loss 2.8639 (2.3782) grad_norm 4.3493 (4.1433) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][500/625] eta 0:00:57 lr 0.000012 wd 0.0500 time 0.4621 (0.4629) data time 0.0006 (0.0018) model time 0.4615 (0.4605) loss 2.1468 (2.3778) grad_norm 3.6533 (4.1199) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:43:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][510/625] eta 0:00:53 lr 0.000012 wd 0.0500 time 0.4654 (0.4629) data time 0.0008 (0.0018) model time 0.4646 (0.4605) loss 2.5321 (2.3737) grad_norm 3.9333 (4.1043) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][520/625] eta 0:00:48 lr 0.000012 wd 0.0500 time 0.4587 (0.4628) data time 0.0006 (0.0018) model time 0.4581 (0.4604) loss 2.7555 (2.3749) grad_norm 7.3129 (4.0961) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][530/625] eta 0:00:43 lr 
0.000012 wd 0.0500 time 0.4576 (0.4627) data time 0.0008 (0.0017) model time 0.4568 (0.4604) loss 1.8410 (2.3705) grad_norm 12.5556 (4.1000) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:12 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][540/625] eta 0:00:39 lr 0.000012 wd 0.0500 time 0.4580 (0.4627) data time 0.0007 (0.0017) model time 0.4573 (0.4604) loss 1.4523 (2.3717) grad_norm 3.8202 (4.0838) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][550/625] eta 0:00:34 lr 0.000012 wd 0.0500 time 0.4672 (0.4627) data time 0.0008 (0.0017) model time 0.4664 (0.4604) loss 2.6672 (2.3677) grad_norm 5.8038 (4.0767) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][560/625] eta 0:00:30 lr 0.000012 wd 0.0500 time 0.4547 (0.4626) data time 0.0008 (0.0017) model time 0.4539 (0.4604) loss 2.4691 (2.3675) grad_norm 3.7868 (4.0618) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][570/625] eta 0:00:25 lr 0.000012 wd 0.0500 time 0.4561 (0.4625) data time 0.0006 (0.0017) model time 0.4555 (0.4603) loss 2.7928 (2.3721) grad_norm 3.5985 (4.0683) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][580/625] eta 0:00:20 lr 0.000012 wd 0.0500 time 0.4596 (0.4625) data time 0.0008 (0.0017) model time 0.4588 (0.4603) loss 2.4190 (2.3719) grad_norm 2.7268 (4.0507) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][590/625] eta 0:00:16 lr 0.000012 wd 0.0500 time 0.4551 (0.4624) data time 0.0007 (0.0017) model time 0.4544 (0.4602) loss 2.7276 (2.3726) grad_norm 2.7561 (4.0363) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:40 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [296/300][600/625] eta 0:00:11 lr 0.000012 wd 0.0500 time 0.4562 (0.4623) data time 0.0006 (0.0016) model time 0.4556 (0.4602) loss 2.6829 (2.3759) grad_norm 5.9753 (4.0209) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][610/625] eta 0:00:06 lr 0.000012 wd 0.0500 time 0.4552 (0.4623) data time 0.0006 (0.0016) model time 0.4546 (0.4601) loss 2.1466 (2.3760) grad_norm 4.4358 (4.0281) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [296/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.4555 (0.4621) data time 0.0004 (0.0016) model time 0.4550 (0.4600) loss 2.5129 (2.3765) grad_norm 3.2546 (4.0111) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:44:51 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 296 training takes 0:04:48 [2024-08-11 15:44:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 15:44:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 15:44:53 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.467 (0.467) Loss 0.5352 (0.5352) Acc@1 88.818 (88.818) Acc@5 99.072 (99.072) Mem 16703MB [2024-08-11 15:44:54 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.154) Loss 0.8516 (0.6333) Acc@1 80.811 (86.923) Acc@5 96.436 (97.820) Mem 16703MB [2024-08-11 15:44:55 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.137) Loss 0.9409 (0.7571) Acc@1 79.248 (84.068) Acc@5 95.312 (96.677) Mem 16703MB [2024-08-11 15:44:56 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.811 Acc@5 96.593 [2024-08-11 15:44:56 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 15:44:56 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.782 (0.782) Loss 0.5303 (0.5303) Acc@1 88.965 (88.965) Acc@5 98.926 (98.926) Mem 16703MB [2024-08-11 15:44:58 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.184) Loss 0.8486 (0.6287) Acc@1 81.006 (87.021) Acc@5 96.338 (97.785) Mem 16703MB [2024-08-11 15:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.152) Loss 0.9321 (0.7512) Acc@1 79.248 (84.142) Acc@5 95.508 (96.668) Mem 16703MB [2024-08-11 15:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.893 Acc@5 96.599 [2024-08-11 15:44:59 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 15:45:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][0/625] eta 0:13:43 lr 0.000012 wd 0.0500 time 1.3170 (1.3170) data time 0.4167 (0.4167) model time 0.0000 (0.0000) loss 3.1693 (3.1693) grad_norm 2.9065 (2.9065) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][10/625] eta 0:05:30 lr 0.000012 wd 0.0500 time 0.4617 (0.5368) data 
time 0.0008 (0.0386) model time 0.0000 (0.0000) loss 2.4431 (2.5479) grad_norm 2.9332 (3.9094) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][20/625] eta 0:05:02 lr 0.000012 wd 0.0500 time 0.4568 (0.5002) data time 0.0006 (0.0205) model time 0.0000 (0.0000) loss 2.7030 (2.4496) grad_norm 2.0442 (3.5178) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][30/625] eta 0:04:50 lr 0.000012 wd 0.0500 time 0.4601 (0.4876) data time 0.0007 (0.0142) model time 0.0000 (0.0000) loss 1.5789 (2.4677) grad_norm 4.8478 (3.6593) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][40/625] eta 0:04:41 lr 0.000012 wd 0.0500 time 0.4605 (0.4807) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 2.2316 (2.4533) grad_norm 2.9508 (4.9417) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][50/625] eta 0:04:33 lr 0.000012 wd 0.0500 time 0.4581 (0.4763) data time 0.0007 (0.0089) model time 0.0000 (0.0000) loss 2.6049 (2.4664) grad_norm 3.0487 (4.8251) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][60/625] eta 0:04:28 lr 0.000012 wd 0.0500 time 0.4546 (0.4754) data time 0.0007 (0.0076) model time 0.4540 (0.4701) loss 2.7665 (2.4765) grad_norm 2.7639 (4.5737) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][70/625] eta 0:04:24 lr 0.000012 wd 0.0500 time 0.4684 (0.4759) data time 0.0006 (0.0067) model time 0.4677 (0.4741) loss 2.8296 (2.4847) grad_norm 3.7902 (4.6589) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[297/300][80/625] eta 0:04:18 lr 0.000012 wd 0.0500 time 0.4387 (0.4737) data time 0.0007 (0.0060) model time 0.4380 (0.4683) loss 2.5352 (2.4734) grad_norm 6.0530 (4.6203) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][90/625] eta 0:04:12 lr 0.000012 wd 0.0500 time 0.4601 (0.4719) data time 0.0007 (0.0054) model time 0.4594 (0.4655) loss 2.6497 (2.4685) grad_norm 2.0489 (4.5127) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][100/625] eta 0:04:07 lr 0.000012 wd 0.0500 time 0.4583 (0.4706) data time 0.0008 (0.0049) model time 0.4575 (0.4640) loss 2.4705 (2.4682) grad_norm 2.2919 (4.3830) loss_scale 128.0000 (128.0000) mem 16703MB [2024-08-11 15:45:51 vssm_base_ms_e300] (main_hfai_mnodes.py 379): INFO Suspend command received, saving checkpoint and exiting [2024-08-11 15:45:51 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 15:45:52 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 18:54:07 vssm_base_ms_e300] (main_hfai_mnodes.py 529): INFO Full config saved to ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/config.json [2024-08-11 18:54:08 vssm_base_ms_e300] (main_hfai_mnodes.py 129): INFO Creating model:vssm/vssm_base_ms_e300 [2024-08-11 18:54:22 vssm_base_ms_e300] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-08-11 18:54:36 vssm_base_ms_e300] (main_hfai_mnodes.py 193): INFO auto resuming from ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth [2024-08-11 18:54:36 vssm_base_ms_e300] (utils.py 21): INFO ==============> Resuming form ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth.................... 
[2024-08-11 18:54:38 vssm_base_ms_e300] (utils.py 30): INFO resuming model: [2024-08-11 18:54:40 vssm_base_ms_e300] (utils.py 37): INFO resuming model_ema: [2024-08-11 18:54:41 vssm_base_ms_e300] (utils.py 61): INFO => loaded successfully './exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth' (epoch 297) [2024-08-11 18:54:41 vssm_base_ms_e300] (main_hfai_mnodes.py 233): INFO Start training [2024-08-11 18:55:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][110/625] eta 1:36:52 lr 0.000012 wd 0.0500 time 1.2037 (11.2860) data time 0.0011 (0.4387) model time 1.2027 (10.8473) loss 2.7458 (2.7842) grad_norm 5.9799 (4.2791) loss_scale 128.0000 (128.0000) mem 16724MB [2024-08-11 18:55:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][120/625] eta 0:19:12 lr 0.000012 wd 0.0500 time 0.4798 (2.2817) data time 0.0008 (0.0741) model time 0.4789 (2.2076) loss 2.1450 (2.5760) grad_norm 3.1364 (3.7784) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-11 18:55:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][130/625] eta 0:12:03 lr 0.000012 wd 0.0500 time 0.4767 (1.4618) data time 0.0012 (0.0410) model time 0.4755 (1.4208) loss 2.3138 (2.5818) grad_norm 4.7892 (4.1517) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-11 18:55:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][140/625] eta 0:09:23 lr 0.000012 wd 0.0500 time 0.4787 (1.1619) data time 0.0008 (0.0285) model time 0.4779 (1.1333) loss 2.5379 (2.5652) grad_norm 3.3848 (3.7840) loss_scale 128.0000 (128.0000) mem 16721MB [2024-08-11 18:55:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][150/625] eta 0:07:56 lr 0.000012 wd 0.0500 time 0.4769 (1.0033) data time 0.0011 (0.0220) model time 0.4758 (0.9813) loss 2.5864 (2.5451) grad_norm 4.0127 (inf) loss_scale 64.0000 (115.8095) mem 16721MB [2024-08-11 18:55:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][160/625] eta 
0:06:59 lr 0.000012 wd 0.0500 time 0.4749 (0.9020) data time 0.0008 (0.0180) model time 0.4740 (0.8840) loss 2.3026 (2.5185) grad_norm 3.4900 (inf) loss_scale 64.0000 (105.8462) mem 16721MB [2024-08-11 18:55:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][170/625] eta 0:06:19 lr 0.000012 wd 0.0500 time 0.4827 (0.8340) data time 0.0008 (0.0153) model time 0.4819 (0.8188) loss 2.6329 (2.5065) grad_norm 2.2637 (inf) loss_scale 64.0000 (99.0968) mem 16721MB [2024-08-11 18:55:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][180/625] eta 0:05:49 lr 0.000012 wd 0.0500 time 0.4816 (0.7851) data time 0.0011 (0.0133) model time 0.4805 (0.7718) loss 2.4690 (2.4663) grad_norm 3.4020 (inf) loss_scale 64.0000 (94.2222) mem 16721MB [2024-08-11 18:55:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][190/625] eta 0:05:25 lr 0.000012 wd 0.0500 time 0.4801 (0.7483) data time 0.0011 (0.0118) model time 0.4790 (0.7364) loss 2.5910 (2.4589) grad_norm 3.8105 (inf) loss_scale 64.0000 (90.5366) mem 16721MB [2024-08-11 18:55:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][200/625] eta 0:05:05 lr 0.000012 wd 0.0500 time 0.4770 (0.7193) data time 0.0008 (0.0107) model time 0.4762 (0.7086) loss 1.5694 (2.4424) grad_norm 3.0096 (inf) loss_scale 64.0000 (87.6522) mem 16721MB [2024-08-11 18:55:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][210/625] eta 0:04:48 lr 0.000012 wd 0.0500 time 0.4761 (0.6956) data time 0.0008 (0.0097) model time 0.4753 (0.6859) loss 2.8294 (2.4634) grad_norm 2.6499 (inf) loss_scale 64.0000 (85.3333) mem 16721MB [2024-08-11 18:56:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][220/625] eta 0:04:33 lr 0.000012 wd 0.0500 time 0.4775 (0.6763) data time 0.0012 (0.0090) model time 0.4763 (0.6673) loss 2.8697 (2.4731) grad_norm 3.4806 (inf) loss_scale 64.0000 (83.4286) mem 16721MB [2024-08-11 18:56:06 vssm_base_ms_e300] (main_hfai_mnodes.py 
367): INFO Train: [297/300][230/625] eta 0:04:20 lr 0.000012 wd 0.0500 time 0.4805 (0.6601) data time 0.0008 (0.0083) model time 0.4797 (0.6518) loss 2.6206 (2.4768) grad_norm 2.8287 (inf) loss_scale 64.0000 (81.8361) mem 16721MB [2024-08-11 18:56:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][240/625] eta 0:04:08 lr 0.000012 wd 0.0500 time 0.4827 (0.6466) data time 0.0011 (0.0078) model time 0.4816 (0.6388) loss 2.6084 (2.4729) grad_norm 6.1858 (inf) loss_scale 64.0000 (80.4848) mem 16721MB [2024-08-11 18:56:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][250/625] eta 0:03:58 lr 0.000012 wd 0.0500 time 0.4847 (0.6350) data time 0.0011 (0.0073) model time 0.4836 (0.6277) loss 2.4276 (2.4667) grad_norm 2.1262 (inf) loss_scale 64.0000 (79.3239) mem 16721MB [2024-08-11 18:56:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][260/625] eta 0:03:48 lr 0.000012 wd 0.0500 time 0.4830 (0.6251) data time 0.0011 (0.0069) model time 0.4819 (0.6182) loss 2.5285 (2.4594) grad_norm 2.2867 (inf) loss_scale 64.0000 (78.3158) mem 16721MB [2024-08-11 18:56:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][270/625] eta 0:03:38 lr 0.000012 wd 0.0500 time 0.4835 (0.6163) data time 0.0011 (0.0065) model time 0.4824 (0.6098) loss 2.5381 (2.4641) grad_norm 2.5051 (inf) loss_scale 64.0000 (77.4321) mem 16721MB [2024-08-11 18:56:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][280/625] eta 0:03:29 lr 0.000012 wd 0.0500 time 0.4775 (0.6083) data time 0.0011 (0.0062) model time 0.4764 (0.6021) loss 2.1118 (2.4592) grad_norm 2.6485 (inf) loss_scale 64.0000 (76.6512) mem 16721MB [2024-08-11 18:56:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][290/625] eta 0:03:21 lr 0.000012 wd 0.0500 time 0.4812 (0.6012) data time 0.0011 (0.0059) model time 0.4801 (0.5952) loss 2.4491 (2.4505) grad_norm 3.8802 (inf) loss_scale 64.0000 (75.9560) mem 16721MB [2024-08-11 18:56:40 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][300/625] eta 0:03:13 lr 0.000012 wd 0.0500 time 0.4826 (0.5956) data time 0.0012 (0.0057) model time 0.4814 (0.5899) loss 2.7529 (2.4488) grad_norm 2.6976 (inf) loss_scale 64.0000 (75.3333) mem 16721MB [2024-08-11 18:56:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][310/625] eta 0:03:05 lr 0.000012 wd 0.0500 time 0.4840 (0.5899) data time 0.0008 (0.0055) model time 0.4832 (0.5844) loss 2.7257 (2.4412) grad_norm 5.9732 (inf) loss_scale 64.0000 (74.7723) mem 16721MB [2024-08-11 18:56:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][320/625] eta 0:02:58 lr 0.000012 wd 0.0500 time 0.4817 (0.5848) data time 0.0011 (0.0053) model time 0.4806 (0.5796) loss 2.5823 (2.4346) grad_norm 2.4070 (inf) loss_scale 64.0000 (74.2642) mem 16721MB [2024-08-11 18:56:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][330/625] eta 0:02:51 lr 0.000012 wd 0.0500 time 0.4813 (0.5802) data time 0.0009 (0.0051) model time 0.4804 (0.5752) loss 2.4922 (2.4316) grad_norm 3.6284 (inf) loss_scale 64.0000 (73.8018) mem 16721MB [2024-08-11 18:56:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][340/625] eta 0:02:44 lr 0.000012 wd 0.0500 time 0.4862 (0.5761) data time 0.0008 (0.0049) model time 0.4854 (0.5712) loss 2.5380 (2.4255) grad_norm 1.9474 (inf) loss_scale 64.0000 (73.3793) mem 16721MB [2024-08-11 18:57:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][350/625] eta 0:02:37 lr 0.000012 wd 0.0500 time 0.4828 (0.5721) data time 0.0009 (0.0047) model time 0.4819 (0.5674) loss 2.7266 (2.4247) grad_norm 2.7553 (inf) loss_scale 64.0000 (72.9917) mem 16721MB [2024-08-11 18:57:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][360/625] eta 0:02:30 lr 0.000012 wd 0.0500 time 0.4812 (0.5684) data time 0.0008 (0.0046) model time 0.4804 (0.5638) loss 2.7905 (2.4199) grad_norm 3.8243 (inf) loss_scale 64.0000 
(72.6349) mem 16721MB [2024-08-11 18:57:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][370/625] eta 0:02:24 lr 0.000012 wd 0.0500 time 0.4747 (0.5649) data time 0.0014 (0.0045) model time 0.4733 (0.5604) loss 2.8011 (2.4152) grad_norm 3.0902 (inf) loss_scale 64.0000 (72.3053) mem 16721MB [2024-08-11 18:57:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][380/625] eta 0:02:17 lr 0.000012 wd 0.0500 time 0.4813 (0.5618) data time 0.0012 (0.0043) model time 0.4801 (0.5574) loss 2.2395 (2.4093) grad_norm 5.4089 (inf) loss_scale 64.0000 (72.0000) mem 16721MB [2024-08-11 18:57:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][390/625] eta 0:02:11 lr 0.000012 wd 0.0500 time 0.4841 (0.5590) data time 0.0011 (0.0042) model time 0.4830 (0.5548) loss 2.0203 (2.4127) grad_norm 2.0932 (inf) loss_scale 64.0000 (71.7163) mem 16721MB [2024-08-11 18:57:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][400/625] eta 0:02:05 lr 0.000012 wd 0.0500 time 0.5689 (0.5567) data time 0.0011 (0.0041) model time 0.5678 (0.5526) loss 2.7748 (2.4116) grad_norm 4.7269 (inf) loss_scale 64.0000 (71.4521) mem 16721MB [2024-08-11 18:57:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][410/625] eta 0:01:59 lr 0.000012 wd 0.0500 time 0.4841 (0.5543) data time 0.0011 (0.0041) model time 0.4830 (0.5502) loss 2.5944 (2.4073) grad_norm 2.6764 (inf) loss_scale 64.0000 (71.2053) mem 16721MB [2024-08-11 18:57:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][420/625] eta 0:01:53 lr 0.000012 wd 0.0500 time 0.4788 (0.5520) data time 0.0011 (0.0040) model time 0.4776 (0.5480) loss 2.8724 (2.4074) grad_norm 1.6967 (inf) loss_scale 64.0000 (70.9744) mem 16721MB [2024-08-11 18:57:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][430/625] eta 0:01:47 lr 0.000012 wd 0.0500 time 0.4776 (0.5498) data time 0.0011 (0.0039) model time 0.4765 (0.5459) loss 2.2489 (2.4131) 
grad_norm 3.8613 (inf) loss_scale 64.0000 (70.7578) mem 16721MB [2024-08-11 18:57:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][440/625] eta 0:01:41 lr 0.000012 wd 0.0500 time 0.5005 (0.5477) data time 0.0011 (0.0038) model time 0.4994 (0.5439) loss 2.7130 (2.4098) grad_norm 4.5084 (inf) loss_scale 64.0000 (70.5542) mem 16721MB [2024-08-11 18:57:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][450/625] eta 0:01:35 lr 0.000012 wd 0.0500 time 0.4783 (0.5456) data time 0.0009 (0.0037) model time 0.4774 (0.5419) loss 2.6181 (2.4126) grad_norm 2.6176 (inf) loss_scale 64.0000 (70.3626) mem 16721MB [2024-08-11 18:57:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][460/625] eta 0:01:29 lr 0.000012 wd 0.0500 time 0.4831 (0.5438) data time 0.0008 (0.0036) model time 0.4822 (0.5402) loss 2.8850 (2.4152) grad_norm 15.1054 (inf) loss_scale 64.0000 (70.1818) mem 16721MB [2024-08-11 18:58:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][470/625] eta 0:01:24 lr 0.000012 wd 0.0500 time 0.4816 (0.5421) data time 0.0008 (0.0036) model time 0.4808 (0.5386) loss 3.0818 (2.4157) grad_norm 3.1835 (inf) loss_scale 64.0000 (70.0110) mem 16721MB [2024-08-11 18:58:07 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][480/625] eta 0:01:18 lr 0.000012 wd 0.0500 time 0.4845 (0.5412) data time 0.0011 (0.0035) model time 0.4834 (0.5377) loss 2.4941 (2.4119) grad_norm 4.3928 (inf) loss_scale 64.0000 (69.8495) mem 16721MB [2024-08-11 18:58:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][490/625] eta 0:01:12 lr 0.000012 wd 0.0500 time 0.4843 (0.5397) data time 0.0008 (0.0034) model time 0.4835 (0.5363) loss 2.7967 (2.4109) grad_norm 3.3376 (inf) loss_scale 64.0000 (69.6963) mem 16721MB [2024-08-11 18:58:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][500/625] eta 0:01:07 lr 0.000012 wd 0.0500 time 0.4817 (0.5382) data time 0.0008 (0.0034) model 
time 0.4809 (0.5349) loss 2.2497 (2.4063) grad_norm 3.1986 (inf) loss_scale 64.0000 (69.5510) mem 16721MB [2024-08-11 18:58:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][510/625] eta 0:01:01 lr 0.000012 wd 0.0500 time 0.4754 (0.5368) data time 0.0008 (0.0033) model time 0.4746 (0.5334) loss 2.6996 (2.4100) grad_norm 26.3909 (inf) loss_scale 64.0000 (69.4129) mem 16721MB [2024-08-11 18:58:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][520/625] eta 0:00:56 lr 0.000012 wd 0.0500 time 0.4827 (0.5354) data time 0.0011 (0.0033) model time 0.4815 (0.5321) loss 2.5978 (2.4143) grad_norm 2.4028 (inf) loss_scale 64.0000 (69.2816) mem 16721MB [2024-08-11 18:58:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][530/625] eta 0:00:50 lr 0.000012 wd 0.0500 time 0.4901 (0.5342) data time 0.0008 (0.0032) model time 0.4893 (0.5309) loss 2.2880 (2.4109) grad_norm 2.2298 (inf) loss_scale 64.0000 (69.1564) mem 16721MB [2024-08-11 18:58:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][540/625] eta 0:00:45 lr 0.000012 wd 0.0500 time 0.4867 (0.5330) data time 0.0009 (0.0032) model time 0.4858 (0.5298) loss 2.4281 (2.4133) grad_norm 2.3572 (inf) loss_scale 64.0000 (69.0370) mem 16721MB [2024-08-11 18:58:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][550/625] eta 0:00:39 lr 0.000012 wd 0.0500 time 0.4814 (0.5319) data time 0.0009 (0.0031) model time 0.4806 (0.5287) loss 2.7676 (2.4144) grad_norm 2.5775 (inf) loss_scale 64.0000 (68.9231) mem 16721MB [2024-08-11 18:58:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][560/625] eta 0:00:34 lr 0.000012 wd 0.0500 time 0.4818 (0.5308) data time 0.0011 (0.0031) model time 0.4807 (0.5277) loss 2.1397 (2.4144) grad_norm 3.5370 (inf) loss_scale 64.0000 (68.8142) mem 16721MB [2024-08-11 18:58:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][570/625] eta 0:00:29 lr 0.000012 wd 0.0500 time 0.4790 
(0.5297) data time 0.0009 (0.0030) model time 0.4781 (0.5267) loss 1.9209 (2.4102) grad_norm 3.4384 (inf) loss_scale 64.0000 (68.7100) mem 16721MB [2024-08-11 18:58:55 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][580/625] eta 0:00:23 lr 0.000012 wd 0.0500 time 0.4812 (0.5287) data time 0.0011 (0.0030) model time 0.4801 (0.5257) loss 2.7068 (2.4079) grad_norm 3.3187 (inf) loss_scale 64.0000 (68.6102) mem 16721MB [2024-08-11 18:59:00 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][590/625] eta 0:00:18 lr 0.000012 wd 0.0500 time 0.4772 (0.5277) data time 0.0008 (0.0030) model time 0.4764 (0.5248) loss 2.6126 (2.4066) grad_norm 2.7290 (inf) loss_scale 64.0000 (68.5145) mem 16721MB [2024-08-11 18:59:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][600/625] eta 0:00:13 lr 0.000012 wd 0.0500 time 0.4844 (0.5267) data time 0.0012 (0.0029) model time 0.4833 (0.5238) loss 2.8040 (2.4096) grad_norm 3.9043 (inf) loss_scale 64.0000 (68.4228) mem 16721MB [2024-08-11 18:59:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][610/625] eta 0:00:07 lr 0.000012 wd 0.0500 time 0.4823 (0.5258) data time 0.0008 (0.0029) model time 0.4815 (0.5230) loss 2.4441 (2.4091) grad_norm 3.3426 (inf) loss_scale 64.0000 (68.3347) mem 16721MB [2024-08-11 18:59:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [297/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.4859 (0.5250) data time 0.0006 (0.0028) model time 0.4853 (0.5221) loss 2.5011 (2.4100) grad_norm 2.6018 (inf) loss_scale 64.0000 (68.2500) mem 16721MB [2024-08-11 18:59:16 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 297 training takes 0:04:30 [2024-08-11 18:59:16 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... 
[2024-08-11 18:59:22 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! [2024-08-11 18:59:22 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.513 (0.513) Loss 0.5366 (0.5366) Acc@1 88.770 (88.770) Acc@5 98.975 (98.975) Mem 16721MB [2024-08-11 18:59:24 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.160) Loss 0.8535 (0.6349) Acc@1 80.664 (86.901) Acc@5 96.436 (97.785) Mem 16721MB [2024-08-11 18:59:25 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.9399 (0.7590) Acc@1 79.492 (84.031) Acc@5 95.166 (96.622) Mem 16721MB [2024-08-11 18:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.759 Acc@5 96.553 [2024-08-11 18:59:28 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 18:59:29 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.883 (0.883) Loss 0.5303 (0.5303) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16721MB [2024-08-11 18:59:30 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.117 (0.196) Loss 0.8486 (0.6288) Acc@1 81.006 (87.003) Acc@5 96.338 (97.798) Mem 16721MB [2024-08-11 18:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.117 (0.158) Loss 0.9316 (0.7516) Acc@1 79.346 (84.131) Acc@5 95.459 (96.673) Mem 16721MB [2024-08-11 18:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.873 Acc@5 96.603 [2024-08-11 18:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 18:59:32 vssm_base_ms_e300] (main_hfai_mnodes.py 281): INFO New max accuracy ema: 83.87% [2024-08-11 18:59:32 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saving...... 
[2024-08-11 18:59:34 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/best_ckpt_ema.pth saved !!! [2024-08-11 18:59:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][0/625] eta 0:09:55 lr 0.000012 wd 0.0500 time 0.9535 (0.9535) data time 0.4037 (0.4037) model time 0.0000 (0.0000) loss 2.4702 (2.4702) grad_norm 2.6508 (2.6508) loss_scale 64.0000 (64.0000) mem 16725MB [2024-08-11 18:59:39 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][10/625] eta 0:05:20 lr 0.000012 wd 0.0500 time 0.4748 (0.5206) data time 0.0011 (0.0377) model time 0.0000 (0.0000) loss 2.9135 (2.1534) grad_norm 3.0194 (3.2644) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 18:59:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][20/625] eta 0:05:02 lr 0.000012 wd 0.0500 time 0.4673 (0.5000) data time 0.0009 (0.0203) model time 0.0000 (0.0000) loss 1.6802 (2.2199) grad_norm 3.3872 (3.8341) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 18:59:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][30/625] eta 0:04:53 lr 0.000012 wd 0.0500 time 0.4775 (0.4931) data time 0.0008 (0.0141) model time 0.0000 (0.0000) loss 2.2756 (2.3028) grad_norm 10.8927 (3.8191) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 18:59:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][40/625] eta 0:04:47 lr 0.000012 wd 0.0500 time 0.5079 (0.4909) data time 0.0009 (0.0109) model time 0.0000 (0.0000) loss 1.6417 (2.3419) grad_norm 6.3687 (3.6643) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 18:59:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][50/625] eta 0:04:40 lr 0.000012 wd 0.0500 time 0.4749 (0.4885) data time 0.0012 (0.0090) model time 0.0000 (0.0000) loss 2.4411 (2.3598) grad_norm 2.3691 (3.5743) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[298/300][60/625] eta 0:04:35 lr 0.000012 wd 0.0500 time 0.4749 (0.4876) data time 0.0008 (0.0079) model time 0.4741 (0.4806) loss 2.8869 (2.3784) grad_norm 2.9563 (3.6962) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][70/625] eta 0:04:31 lr 0.000012 wd 0.0500 time 0.4781 (0.4898) data time 0.0009 (0.0070) model time 0.4772 (0.4914) loss 2.6398 (2.4086) grad_norm 2.4453 (3.9219) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][80/625] eta 0:04:26 lr 0.000012 wd 0.0500 time 0.4749 (0.4887) data time 0.0009 (0.0062) model time 0.4741 (0.4875) loss 2.5855 (2.4194) grad_norm 3.1083 (3.8308) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][90/625] eta 0:04:20 lr 0.000012 wd 0.0500 time 0.4724 (0.4875) data time 0.0011 (0.0057) model time 0.4713 (0.4846) loss 2.4784 (2.4092) grad_norm 2.3428 (4.0144) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][100/625] eta 0:04:15 lr 0.000012 wd 0.0500 time 0.5040 (0.4875) data time 0.0011 (0.0056) model time 0.5029 (0.4842) loss 3.1138 (2.4238) grad_norm 4.4851 (3.9582) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][110/625] eta 0:04:10 lr 0.000012 wd 0.0500 time 0.4552 (0.4868) data time 0.0007 (0.0053) model time 0.4545 (0.4831) loss 2.4832 (2.4355) grad_norm 4.3598 (3.9425) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][120/625] eta 0:04:05 lr 0.000012 wd 0.0500 time 0.4781 (0.4862) data time 0.0009 (0.0050) model time 0.4772 (0.4825) loss 2.3779 (2.4391) grad_norm 2.9820 (3.8934) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:37 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][130/625] eta 0:04:00 lr 0.000012 wd 0.0500 time 0.4801 (0.4858) data time 0.0008 (0.0047) model time 0.4793 (0.4821) loss 2.4689 (2.4212) grad_norm 5.8167 (3.8597) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][140/625] eta 0:03:55 lr 0.000012 wd 0.0500 time 0.4832 (0.4853) data time 0.0011 (0.0044) model time 0.4821 (0.4817) loss 1.9463 (2.4199) grad_norm 2.4322 (3.8018) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][150/625] eta 0:03:50 lr 0.000012 wd 0.0500 time 0.4767 (0.4852) data time 0.0011 (0.0042) model time 0.4756 (0.4818) loss 2.5018 (2.4125) grad_norm 2.5074 (3.7487) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][160/625] eta 0:03:45 lr 0.000012 wd 0.0500 time 0.4782 (0.4849) data time 0.0009 (0.0040) model time 0.4773 (0.4815) loss 2.4366 (2.4220) grad_norm 2.5779 (3.7476) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:00:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][170/625] eta 0:03:40 lr 0.000012 wd 0.0500 time 0.4772 (0.4857) data time 0.0008 (0.0038) model time 0.4764 (0.4828) loss 2.6759 (2.4293) grad_norm 2.6668 (3.7146) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][180/625] eta 0:03:35 lr 0.000012 wd 0.0500 time 0.4818 (0.4852) data time 0.0011 (0.0037) model time 0.4807 (0.4823) loss 1.4151 (2.4133) grad_norm 4.5826 (3.6854) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][190/625] eta 0:03:30 lr 0.000012 wd 0.0500 time 0.4980 (0.4850) data time 0.0007 (0.0036) model time 0.4973 (0.4821) loss 1.8840 (2.4136) grad_norm 3.3229 (3.7034) 
loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][200/625] eta 0:03:25 lr 0.000012 wd 0.0500 time 0.4743 (0.4846) data time 0.0011 (0.0034) model time 0.4732 (0.4818) loss 1.9332 (2.4018) grad_norm 2.3857 (3.6850) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][210/625] eta 0:03:21 lr 0.000012 wd 0.0500 time 0.4777 (0.4844) data time 0.0008 (0.0033) model time 0.4768 (0.4816) loss 1.7546 (2.3972) grad_norm 2.2658 (3.7081) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][220/625] eta 0:03:16 lr 0.000012 wd 0.0500 time 0.4763 (0.4842) data time 0.0009 (0.0033) model time 0.4754 (0.4813) loss 2.9029 (2.4119) grad_norm 3.7705 (3.6941) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][230/625] eta 0:03:11 lr 0.000012 wd 0.0500 time 0.4773 (0.4840) data time 0.0007 (0.0032) model time 0.4766 (0.4812) loss 2.9245 (2.4160) grad_norm 2.7094 (3.6700) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][240/625] eta 0:03:06 lr 0.000012 wd 0.0500 time 0.4871 (0.4838) data time 0.0011 (0.0031) model time 0.4860 (0.4810) loss 2.6145 (2.4085) grad_norm 2.2465 (3.6798) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][250/625] eta 0:03:01 lr 0.000012 wd 0.0500 time 0.4784 (0.4835) data time 0.0011 (0.0030) model time 0.4773 (0.4808) loss 2.3362 (2.4171) grad_norm 40.8650 (3.8121) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][260/625] eta 0:02:56 lr 0.000012 wd 0.0500 time 0.4773 (0.4840) data time 0.0011 (0.0029) model time 
0.4762 (0.4814) loss 2.0831 (2.4207) grad_norm 2.8318 (3.8174) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][270/625] eta 0:02:51 lr 0.000012 wd 0.0500 time 0.4755 (0.4838) data time 0.0009 (0.0029) model time 0.4747 (0.4812) loss 2.6069 (2.4204) grad_norm 2.7350 (3.8142) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][280/625] eta 0:02:46 lr 0.000012 wd 0.0500 time 0.4766 (0.4836) data time 0.0008 (0.0028) model time 0.4758 (0.4810) loss 2.5837 (2.4249) grad_norm 2.8327 (3.7916) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][290/625] eta 0:02:41 lr 0.000012 wd 0.0500 time 0.4878 (0.4834) data time 0.0009 (0.0028) model time 0.4869 (0.4809) loss 1.5481 (2.4183) grad_norm 3.5367 (3.7705) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:01:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][300/625] eta 0:02:37 lr 0.000012 wd 0.0500 time 0.4818 (0.4833) data time 0.0011 (0.0027) model time 0.4808 (0.4808) loss 2.6702 (2.4169) grad_norm 4.1028 (3.7757) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][310/625] eta 0:02:32 lr 0.000012 wd 0.0500 time 0.4755 (0.4833) data time 0.0008 (0.0027) model time 0.4747 (0.4808) loss 2.1189 (2.4097) grad_norm 2.2394 (3.7478) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][320/625] eta 0:02:27 lr 0.000012 wd 0.0500 time 0.4818 (0.4837) data time 0.0011 (0.0026) model time 0.4808 (0.4814) loss 1.3960 (2.4056) grad_norm 2.8481 (3.7375) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:14 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][330/625] eta 0:02:22 lr 0.000012 wd 0.0500 
time 0.5412 (0.4837) data time 0.0011 (0.0026) model time 0.5401 (0.4815) loss 2.7302 (2.4032) grad_norm 4.2361 (3.7272) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:19 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][340/625] eta 0:02:17 lr 0.000012 wd 0.0500 time 0.4755 (0.4836) data time 0.0012 (0.0025) model time 0.4743 (0.4813) loss 2.5197 (2.4039) grad_norm 2.3980 (3.7088) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][350/625] eta 0:02:12 lr 0.000012 wd 0.0500 time 0.4788 (0.4834) data time 0.0008 (0.0025) model time 0.4780 (0.4812) loss 2.2519 (2.4017) grad_norm 4.9467 (3.6928) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][360/625] eta 0:02:08 lr 0.000012 wd 0.0500 time 0.4871 (0.4833) data time 0.0008 (0.0025) model time 0.4863 (0.4811) loss 1.3906 (2.4009) grad_norm 3.0817 (3.6782) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:33 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][370/625] eta 0:02:03 lr 0.000012 wd 0.0500 time 0.4722 (0.4832) data time 0.0008 (0.0024) model time 0.4714 (0.4810) loss 2.1169 (2.3993) grad_norm 11.8531 (3.7373) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:38 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][380/625] eta 0:01:58 lr 0.000012 wd 0.0500 time 0.4764 (0.4831) data time 0.0008 (0.0024) model time 0.4756 (0.4808) loss 2.3863 (2.3971) grad_norm 1.8928 (3.7331) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:43 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][390/625] eta 0:01:53 lr 0.000012 wd 0.0500 time 0.4772 (0.4830) data time 0.0008 (0.0024) model time 0.4763 (0.4808) loss 1.4104 (2.3954) grad_norm 3.8516 (3.7506) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO 
Train: [298/300][400/625] eta 0:01:48 lr 0.000012 wd 0.0500 time 0.4874 (0.4830) data time 0.0011 (0.0023) model time 0.4863 (0.4808) loss 2.0961 (2.3944) grad_norm 3.0929 (3.7319) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][410/625] eta 0:01:43 lr 0.000012 wd 0.0500 time 0.4854 (0.4829) data time 0.0011 (0.0023) model time 0.4842 (0.4807) loss 2.5676 (2.3997) grad_norm 3.2039 (3.7384) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:02:57 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][420/625] eta 0:01:38 lr 0.000012 wd 0.0500 time 0.4817 (0.4828) data time 0.0011 (0.0023) model time 0.4806 (0.4806) loss 2.6304 (2.3994) grad_norm 2.1880 (3.7190) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:02 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][430/625] eta 0:01:34 lr 0.000012 wd 0.0500 time 0.4750 (0.4827) data time 0.0009 (0.0023) model time 0.4741 (0.4805) loss 1.7945 (2.3954) grad_norm 2.1522 (3.7259) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][440/625] eta 0:01:29 lr 0.000012 wd 0.0500 time 0.4746 (0.4824) data time 0.0012 (0.0022) model time 0.4735 (0.4803) loss 2.5210 (2.3930) grad_norm 3.6670 (3.7194) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][450/625] eta 0:01:24 lr 0.000012 wd 0.0500 time 0.4759 (0.4823) data time 0.0011 (0.0022) model time 0.4748 (0.4802) loss 2.5257 (2.3943) grad_norm 8.4083 (3.7351) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][460/625] eta 0:01:19 lr 0.000012 wd 0.0500 time 0.4814 (0.4822) data time 0.0010 (0.0022) model time 0.4804 (0.4801) loss 1.6303 (2.3984) grad_norm 2.5740 (3.7281) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 
19:03:21 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][470/625] eta 0:01:14 lr 0.000012 wd 0.0500 time 0.4766 (0.4822) data time 0.0010 (0.0022) model time 0.4756 (0.4801) loss 2.5191 (2.3985) grad_norm 2.7717 (3.7248) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:26 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][480/625] eta 0:01:09 lr 0.000012 wd 0.0500 time 0.4753 (0.4825) data time 0.0008 (0.0022) model time 0.4745 (0.4804) loss 1.6810 (2.3992) grad_norm 3.2487 (3.7047) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:31 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][490/625] eta 0:01:05 lr 0.000012 wd 0.0500 time 0.4751 (0.4824) data time 0.0011 (0.0021) model time 0.4740 (0.4803) loss 2.3809 (2.4006) grad_norm 2.1756 (3.6867) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][500/625] eta 0:01:00 lr 0.000012 wd 0.0500 time 0.4734 (0.4823) data time 0.0011 (0.0021) model time 0.4722 (0.4802) loss 1.7990 (2.4027) grad_norm 5.1233 (3.6977) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][510/625] eta 0:00:55 lr 0.000012 wd 0.0500 time 0.4747 (0.4821) data time 0.0010 (0.0021) model time 0.4737 (0.4801) loss 2.6692 (2.3996) grad_norm 2.2767 (3.7123) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][520/625] eta 0:00:50 lr 0.000012 wd 0.0500 time 0.4802 (0.4820) data time 0.0010 (0.0021) model time 0.4792 (0.4800) loss 2.7249 (2.4008) grad_norm 2.6421 (3.7161) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][530/625] eta 0:00:45 lr 0.000012 wd 0.0500 time 0.4737 (0.4819) data time 0.0008 (0.0021) model time 0.4729 (0.4799) loss 2.1515 (2.4024) grad_norm 3.6566 
(3.7157) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][540/625] eta 0:00:40 lr 0.000012 wd 0.0500 time 0.4740 (0.4818) data time 0.0011 (0.0020) model time 0.4729 (0.4798) loss 2.5943 (2.4056) grad_norm 3.6331 (3.7199) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:03:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][550/625] eta 0:00:36 lr 0.000012 wd 0.0500 time 0.4757 (0.4817) data time 0.0010 (0.0020) model time 0.4747 (0.4797) loss 2.2096 (2.4067) grad_norm 3.5639 (3.7095) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][560/625] eta 0:00:31 lr 0.000012 wd 0.0500 time 0.4755 (0.4816) data time 0.0011 (0.0020) model time 0.4744 (0.4796) loss 2.8543 (2.4047) grad_norm 2.9431 (3.6920) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][570/625] eta 0:00:26 lr 0.000012 wd 0.0500 time 0.4751 (0.4815) data time 0.0008 (0.0020) model time 0.4743 (0.4795) loss 2.2502 (2.4041) grad_norm 2.7059 (3.6787) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][580/625] eta 0:00:21 lr 0.000012 wd 0.0500 time 0.4708 (0.4814) data time 0.0008 (0.0020) model time 0.4700 (0.4794) loss 3.4068 (2.4053) grad_norm 2.3625 (3.6944) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][590/625] eta 0:00:16 lr 0.000012 wd 0.0500 time 0.4781 (0.4813) data time 0.0009 (0.0020) model time 0.4771 (0.4793) loss 2.4189 (2.4051) grad_norm 4.7443 (3.7185) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][600/625] eta 0:00:12 lr 0.000012 wd 0.0500 time 0.4784 (0.4812) data time 0.0011 (0.0019) model 
time 0.4774 (0.4792) loss 2.6396 (2.4063) grad_norm 3.0331 (3.7578) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][610/625] eta 0:00:07 lr 0.000012 wd 0.0500 time 0.4720 (0.4811) data time 0.0005 (0.0019) model time 0.4714 (0.4791) loss 2.7850 (2.4068) grad_norm 2.6850 (3.7459) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [298/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.4756 (0.4810) data time 0.0007 (0.0019) model time 0.4748 (0.4790) loss 1.8702 (2.4068) grad_norm 3.9039 (3.8269) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:34 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 298 training takes 0:05:00 [2024-08-11 19:04:34 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving...... [2024-08-11 19:04:36 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!! 
[2024-08-11 19:04:37 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.517 (0.517) Loss 0.5288 (0.5288) Acc@1 89.111 (89.111) Acc@5 99.023 (99.023) Mem 16721MB [2024-08-11 19:04:38 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.120 (0.161) Loss 0.8564 (0.6321) Acc@1 80.713 (86.923) Acc@5 96.387 (97.803) Mem 16721MB [2024-08-11 19:04:39 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.141) Loss 0.9404 (0.7574) Acc@1 79.297 (84.033) Acc@5 95.410 (96.656) Mem 16721MB [2024-08-11 19:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.753 Acc@5 96.587 [2024-08-11 19:04:40 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8% [2024-08-11 19:04:41 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.885 (0.885) Loss 0.5308 (0.5308) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16721MB [2024-08-11 19:04:42 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.195) Loss 0.8486 (0.6292) Acc@1 80.957 (86.990) Acc@5 96.387 (97.798) Mem 16721MB [2024-08-11 19:04:43 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.119 (0.159) Loss 0.9331 (0.7522) Acc@1 79.297 (84.129) Acc@5 95.410 (96.654) Mem 16721MB [2024-08-11 19:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.871 Acc@5 96.587 [2024-08-11 19:04:44 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.9% [2024-08-11 19:04:45 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][0/625] eta 0:13:01 lr 0.000012 wd 0.0500 time 1.2511 (1.2511) data time 0.6551 (0.6551) model time 0.0000 (0.0000) loss 2.6678 (2.6678) grad_norm 2.7870 (2.7870) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:50 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][10/625] eta 0:05:36 lr 0.000012 wd 0.0500 time 0.4773 (0.5466) data time 
0.0010 (0.0607) model time 0.0000 (0.0000) loss 2.4891 (2.4640) grad_norm 2.4270 (2.8893) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][20/625] eta 0:05:10 lr 0.000012 wd 0.0500 time 0.4756 (0.5131) data time 0.0009 (0.0323) model time 0.0000 (0.0000) loss 2.2854 (2.4127) grad_norm 2.8245 (3.0726) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:04:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][30/625] eta 0:04:57 lr 0.000012 wd 0.0500 time 0.4738 (0.5008) data time 0.0009 (0.0223) model time 0.0000 (0.0000) loss 2.1505 (2.4276) grad_norm 3.0249 (3.1509) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:04 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][40/625] eta 0:04:49 lr 0.000012 wd 0.0500 time 0.4757 (0.4947) data time 0.0008 (0.0171) model time 0.0000 (0.0000) loss 2.9170 (2.4952) grad_norm 4.4572 (3.2371) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:09 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][50/625] eta 0:04:42 lr 0.000012 wd 0.0500 time 0.4768 (0.4914) data time 0.0011 (0.0140) model time 0.0000 (0.0000) loss 2.1188 (2.5047) grad_norm 2.6950 (3.1901) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][60/625] eta 0:04:36 lr 0.000012 wd 0.0500 time 0.4765 (0.4895) data time 0.0011 (0.0119) model time 0.4754 (0.4785) loss 1.4123 (2.5030) grad_norm 2.7884 (3.2744) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][70/625] eta 0:04:30 lr 0.000012 wd 0.0500 time 0.4800 (0.4879) data time 0.0010 (0.0104) model time 0.4791 (0.4776) loss 3.0660 (2.5026) grad_norm 5.0679 (3.3006) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:23 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][80/625] eta 0:04:25 
lr 0.000012 wd 0.0500 time 0.4811 (0.4868) data time 0.0011 (0.0092) model time 0.4801 (0.4779) loss 2.8651 (2.4684) grad_norm 4.3862 (3.2560) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:28 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][90/625] eta 0:04:19 lr 0.000012 wd 0.0500 time 0.4743 (0.4857) data time 0.0008 (0.0083) model time 0.4735 (0.4772) loss 1.7710 (2.4524) grad_norm 3.8625 (3.3226) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][100/625] eta 0:04:14 lr 0.000012 wd 0.0500 time 0.4722 (0.4846) data time 0.0009 (0.0076) model time 0.4713 (0.4766) loss 2.7667 (2.4492) grad_norm 4.5707 (3.2974) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][110/625] eta 0:04:09 lr 0.000012 wd 0.0500 time 0.4691 (0.4837) data time 0.0011 (0.0071) model time 0.4681 (0.4760) loss 2.4403 (2.4305) grad_norm 2.4821 (3.2895) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:42 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][120/625] eta 0:04:04 lr 0.000012 wd 0.0500 time 0.4780 (0.4833) data time 0.0008 (0.0066) model time 0.4772 (0.4762) loss 2.5112 (2.4448) grad_norm 3.8649 (3.3469) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:47 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][130/625] eta 0:03:59 lr 0.000012 wd 0.0500 time 0.4752 (0.4829) data time 0.0011 (0.0062) model time 0.4741 (0.4764) loss 1.9411 (2.4383) grad_norm 3.5916 (3.4537) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:52 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][140/625] eta 0:03:54 lr 0.000012 wd 0.0500 time 0.4780 (0.4826) data time 0.0012 (0.0058) model time 0.4768 (0.4765) loss 2.6827 (2.4539) grad_norm 1.9340 (3.4540) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:05:57 vssm_base_ms_e300] 
(main_hfai_mnodes.py 367): INFO Train: [299/300][150/625] eta 0:03:49 lr 0.000012 wd 0.0500 time 0.4735 (0.4833) data time 0.0011 (0.0055) model time 0.4724 (0.4780) loss 1.9956 (2.4592) grad_norm 4.6603 (3.4176) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][160/625] eta 0:03:44 lr 0.000012 wd 0.0500 time 0.4729 (0.4828) data time 0.0009 (0.0052) model time 0.4720 (0.4777) loss 2.0850 (2.4621) grad_norm 2.0610 (3.5269) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:06 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][170/625] eta 0:03:39 lr 0.000012 wd 0.0500 time 0.4730 (0.4823) data time 0.0010 (0.0050) model time 0.4719 (0.4773) loss 2.4323 (2.4567) grad_norm 3.5041 (3.4856) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:11 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][180/625] eta 0:03:34 lr 0.000012 wd 0.0500 time 0.4728 (0.4818) data time 0.0009 (0.0048) model time 0.4719 (0.4769) loss 2.1431 (2.4423) grad_norm 3.5230 (3.4798) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:16 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][190/625] eta 0:03:29 lr 0.000012 wd 0.0500 time 0.4765 (0.4825) data time 0.0009 (0.0046) model time 0.4755 (0.4781) loss 2.1304 (2.4449) grad_norm 2.7646 (3.4766) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][200/625] eta 0:03:24 lr 0.000012 wd 0.0500 time 0.4768 (0.4822) data time 0.0012 (0.0044) model time 0.4756 (0.4779) loss 1.6405 (2.4456) grad_norm 3.1645 (3.4627) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][210/625] eta 0:03:20 lr 0.000012 wd 0.0500 time 0.4772 (0.4819) data time 0.0008 (0.0043) model time 0.4764 (0.4778) loss 2.7733 (2.4487) grad_norm 3.6962 (3.4442) loss_scale 64.0000 
(64.0000) mem 16721MB [2024-08-11 19:06:30 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][220/625] eta 0:03:15 lr 0.000012 wd 0.0500 time 0.4794 (0.4817) data time 0.0011 (0.0041) model time 0.4783 (0.4776) loss 2.4182 (2.4530) grad_norm 3.0783 (3.4250) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:35 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][230/625] eta 0:03:10 lr 0.000012 wd 0.0500 time 0.4795 (0.4816) data time 0.0011 (0.0040) model time 0.4784 (0.4777) loss 2.4097 (2.4574) grad_norm 5.7634 (3.4869) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:40 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][240/625] eta 0:03:05 lr 0.000012 wd 0.0500 time 0.4764 (0.4814) data time 0.0012 (0.0039) model time 0.4752 (0.4775) loss 2.3512 (2.4538) grad_norm 3.3930 (3.4827) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][250/625] eta 0:03:00 lr 0.000012 wd 0.0500 time 0.4794 (0.4813) data time 0.0008 (0.0038) model time 0.4786 (0.4775) loss 1.7586 (2.4509) grad_norm 4.2114 (3.4765) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][260/625] eta 0:02:55 lr 0.000012 wd 0.0500 time 0.4752 (0.4811) data time 0.0008 (0.0037) model time 0.4744 (0.4774) loss 2.4904 (2.4523) grad_norm 2.7759 (3.5546) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:54 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][270/625] eta 0:02:50 lr 0.000012 wd 0.0500 time 0.4735 (0.4808) data time 0.0008 (0.0036) model time 0.4727 (0.4773) loss 2.5949 (2.4551) grad_norm 2.3624 (3.5475) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:06:59 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][280/625] eta 0:02:45 lr 0.000012 wd 0.0500 time 0.4730 (0.4807) data time 0.0010 (0.0035) model time 0.4720 (0.4772) loss 
2.7255 (2.4556) grad_norm 3.6728 (3.5405) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][290/625] eta 0:02:40 lr 0.000012 wd 0.0500 time 0.4783 (0.4805) data time 0.0008 (0.0034) model time 0.4775 (0.4771) loss 2.7242 (2.4538) grad_norm 3.1870 (3.5433) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][300/625] eta 0:02:36 lr 0.000012 wd 0.0500 time 0.4782 (0.4804) data time 0.0009 (0.0033) model time 0.4774 (0.4770) loss 2.7996 (2.4543) grad_norm 4.3616 (3.5416) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][310/625] eta 0:02:31 lr 0.000012 wd 0.0500 time 0.4777 (0.4803) data time 0.0011 (0.0032) model time 0.4766 (0.4770) loss 2.8691 (2.4584) grad_norm 2.9587 (3.5540) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:18 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][320/625] eta 0:02:26 lr 0.000012 wd 0.0500 time 0.4733 (0.4802) data time 0.0008 (0.0032) model time 0.4724 (0.4769) loss 2.9545 (2.4610) grad_norm 3.8141 (3.5477) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][330/625] eta 0:02:21 lr 0.000012 wd 0.0500 time 0.4728 (0.4800) data time 0.0012 (0.0031) model time 0.4716 (0.4768) loss 2.7420 (2.4568) grad_norm 9.9369 (3.5769) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][340/625] eta 0:02:16 lr 0.000012 wd 0.0500 time 0.4798 (0.4800) data time 0.0009 (0.0031) model time 0.4789 (0.4768) loss 1.7697 (2.4470) grad_norm 3.6992 (3.5732) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][350/625] eta 0:02:11 lr 0.000012 wd 0.0500 time 0.4777 (0.4799) 
data time 0.0009 (0.0030) model time 0.4768 (0.4768) loss 2.4272 (2.4390) grad_norm 11.4748 (3.5710) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:37 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][360/625] eta 0:02:07 lr 0.000012 wd 0.0500 time 0.4763 (0.4798) data time 0.0008 (0.0029) model time 0.4754 (0.4767) loss 2.3245 (2.4373) grad_norm 2.6348 (3.5727) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][370/625] eta 0:02:02 lr 0.000012 wd 0.0500 time 0.4724 (0.4797) data time 0.0008 (0.0029) model time 0.4715 (0.4767) loss 2.7398 (2.4402) grad_norm 2.2464 (3.6046) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:46 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][380/625] eta 0:01:57 lr 0.000012 wd 0.0500 time 0.4699 (0.4795) data time 0.0008 (0.0029) model time 0.4691 (0.4766) loss 2.5053 (2.4427) grad_norm 3.0845 (3.6297) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:51 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][390/625] eta 0:01:52 lr 0.000012 wd 0.0500 time 0.4719 (0.4794) data time 0.0011 (0.0028) model time 0.4708 (0.4765) loss 2.5505 (2.4401) grad_norm 3.5802 (3.6507) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:07:56 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][400/625] eta 0:01:47 lr 0.000012 wd 0.0500 time 0.4721 (0.4793) data time 0.0011 (0.0028) model time 0.4710 (0.4764) loss 2.6918 (2.4422) grad_norm 5.0579 (3.6404) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:01 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][410/625] eta 0:01:43 lr 0.000012 wd 0.0500 time 0.4801 (0.4797) data time 0.0009 (0.0027) model time 0.4793 (0.4769) loss 1.8986 (2.4347) grad_norm 3.2269 (3.6499) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:05 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: 
[299/300][420/625] eta 0:01:38 lr 0.000012 wd 0.0500 time 0.4801 (0.4797) data time 0.0011 (0.0027) model time 0.4790 (0.4769) loss 1.7323 (2.4328) grad_norm 2.4428 (3.6420) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:10 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][430/625] eta 0:01:33 lr 0.000012 wd 0.0500 time 0.4758 (0.4796) data time 0.0009 (0.0026) model time 0.4749 (0.4769) loss 2.8390 (2.4313) grad_norm 2.8894 (3.6346) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:15 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][440/625] eta 0:01:28 lr 0.000012 wd 0.0500 time 0.4848 (0.4796) data time 0.0008 (0.0026) model time 0.4839 (0.4769) loss 2.3545 (2.4295) grad_norm 3.1565 (3.6272) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:20 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][450/625] eta 0:01:23 lr 0.000012 wd 0.0500 time 0.4762 (0.4795) data time 0.0011 (0.0026) model time 0.4751 (0.4769) loss 2.4480 (2.4313) grad_norm 3.2996 (3.6533) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:25 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][460/625] eta 0:01:19 lr 0.000012 wd 0.0500 time 0.4851 (0.4795) data time 0.0011 (0.0025) model time 0.4841 (0.4769) loss 2.4314 (2.4349) grad_norm 5.9784 (3.6775) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:29 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][470/625] eta 0:01:14 lr 0.000012 wd 0.0500 time 0.4780 (0.4795) data time 0.0008 (0.0025) model time 0.4772 (0.4769) loss 2.8038 (2.4364) grad_norm 3.7401 (3.6690) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:34 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][480/625] eta 0:01:09 lr 0.000012 wd 0.0500 time 0.4775 (0.4798) data time 0.0011 (0.0025) model time 0.4764 (0.4772) loss 2.6506 (2.4350) grad_norm 3.4044 (3.6549) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:39 
vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][490/625] eta 0:01:04 lr 0.000012 wd 0.0500 time 0.4765 (0.4797) data time 0.0011 (0.0025) model time 0.4755 (0.4772) loss 2.5256 (2.4366) grad_norm 10.3537 (3.9708) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:44 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][500/625] eta 0:00:59 lr 0.000012 wd 0.0500 time 0.4770 (0.4796) data time 0.0011 (0.0024) model time 0.4759 (0.4772) loss 2.3514 (2.4357) grad_norm 3.7325 (3.9525) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:49 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][510/625] eta 0:00:55 lr 0.000012 wd 0.0500 time 0.4817 (0.4796) data time 0.0009 (0.0024) model time 0.4808 (0.4772) loss 1.5785 (2.4265) grad_norm 2.1121 (3.9413) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:53 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][520/625] eta 0:00:50 lr 0.000012 wd 0.0500 time 0.4755 (0.4795) data time 0.0012 (0.0024) model time 0.4743 (0.4771) loss 2.1684 (2.4213) grad_norm 2.5975 (3.9320) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:08:58 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][530/625] eta 0:00:45 lr 0.000012 wd 0.0500 time 0.4725 (0.4795) data time 0.0011 (0.0024) model time 0.4713 (0.4771) loss 1.5760 (2.4191) grad_norm 2.7141 (3.9091) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:03 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][540/625] eta 0:00:40 lr 0.000012 wd 0.0500 time 0.4756 (0.4794) data time 0.0011 (0.0023) model time 0.4745 (0.4770) loss 1.8537 (2.4178) grad_norm 2.6913 (3.9045) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:08 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][550/625] eta 0:00:35 lr 0.000012 wd 0.0500 time 0.4747 (0.4794) data time 0.0009 (0.0023) model time 0.4737 (0.4770) loss 2.8444 (2.4190) grad_norm 3.3613 (3.8884) 
loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:13 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][560/625] eta 0:00:31 lr 0.000012 wd 0.0500 time 0.4779 (0.4797) data time 0.0011 (0.0023) model time 0.4768 (0.4774) loss 2.9546 (2.4241) grad_norm 4.3186 (3.8796) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:17 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][570/625] eta 0:00:26 lr 0.000012 wd 0.0500 time 0.4789 (0.4796) data time 0.0010 (0.0023) model time 0.4779 (0.4773) loss 2.7304 (2.4225) grad_norm 2.4987 (3.8666) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:22 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][580/625] eta 0:00:21 lr 0.000012 wd 0.0500 time 0.4833 (0.4796) data time 0.0011 (0.0023) model time 0.4822 (0.4773) loss 2.7005 (2.4203) grad_norm 2.9346 (3.8598) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:27 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][590/625] eta 0:00:16 lr 0.000012 wd 0.0500 time 0.4753 (0.4795) data time 0.0009 (0.0022) model time 0.4745 (0.4773) loss 2.6461 (2.4221) grad_norm 2.4919 (3.8476) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:32 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][600/625] eta 0:00:11 lr 0.000012 wd 0.0500 time 0.4787 (0.4795) data time 0.0011 (0.0022) model time 0.4776 (0.4772) loss 2.4608 (2.4207) grad_norm 2.2182 (3.8484) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:36 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][610/625] eta 0:00:07 lr 0.000012 wd 0.0500 time 0.4745 (0.4794) data time 0.0008 (0.0022) model time 0.4737 (0.4772) loss 2.2795 (2.4171) grad_norm 4.5782 (3.8444) loss_scale 64.0000 (64.0000) mem 16721MB [2024-08-11 19:09:41 vssm_base_ms_e300] (main_hfai_mnodes.py 367): INFO Train: [299/300][620/625] eta 0:00:02 lr 0.000012 wd 0.0500 time 0.4750 (0.4793) data time 0.0008 (0.0022) model time 
0.4741 (0.4771) loss 2.5128 (2.4155) grad_norm 12.3402 (3.8480) loss_scale 64.0000 (64.0000) mem 16721MB
[2024-08-11 19:09:43 vssm_base_ms_e300] (main_hfai_mnodes.py 394): INFO EPOCH 299 training takes 0:04:59
[2024-08-11 19:09:43 vssm_base_ms_e300] (utils.py 118): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saving......
[2024-08-11 19:09:45 vssm_base_ms_e300] (utils.py 120): INFO ./exclude/output_msvmamba/vssm_base_ms_e300/20240804093509/latest_ckpt.pth saved !!!
[2024-08-11 19:09:45 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.518 (0.518) Loss 0.5352 (0.5352) Acc@1 88.770 (88.770) Acc@5 99.072 (99.072) Mem 16721MB
[2024-08-11 19:09:46 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.118 (0.160) Loss 0.8418 (0.6316) Acc@1 81.152 (87.012) Acc@5 96.533 (97.820) Mem 16721MB
[2024-08-11 19:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.140) Loss 0.9336 (0.7577) Acc@1 79.443 (84.068) Acc@5 95.410 (96.654) Mem 16721MB
[2024-08-11 19:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.805 Acc@5 96.577
[2024-08-11 19:09:48 vssm_base_ms_e300] (main_hfai_mnodes.py 257): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-11 19:09:49 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [0/25] Time 0.934 (0.934) Loss 0.5308 (0.5308) Acc@1 88.965 (88.965) Acc@5 98.975 (98.975) Mem 16721MB
[2024-08-11 19:09:50 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [10/25] Time 0.119 (0.200) Loss 0.8486 (0.6294) Acc@1 80.908 (86.985) Acc@5 96.338 (97.803) Mem 16721MB
[2024-08-11 19:09:51 vssm_base_ms_e300] (main_hfai_mnodes.py 438): INFO Test: [20/25] Time 0.118 (0.161) Loss 0.9336 (0.7526) Acc@1 79.297 (84.101) Acc@5 95.459 (96.661) Mem 16721MB
[2024-08-11 19:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 445): INFO * Acc@1 83.843 Acc@5 96.593
[2024-08-11 19:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 272): INFO Accuracy of the network on the 50000 test images: 83.8%
[2024-08-11 19:09:52 vssm_base_ms_e300] (main_hfai_mnodes.py 291): INFO Training time 0:15:11