SentenceTransformer based on Shuu12121/CodeModernBERT-Owl-4.1

This is a sentence-transformers model finetuned from Shuu12121/CodeModernBERT-Owl-4.1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Shuu12121/CodeModernBERT-Owl-4.1
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
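
The Pooling module above has pooling_mode_cls_token set to True, so the sentence embedding is taken from the first token's hidden state rather than from an average over all tokens. The following is a minimal pure-Python sketch of that idea on toy data. The helper names cls_pool and mean_pool are illustrative and are not part of the Sentence Transformers API; the real module operates on batched tensors.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return a copy of the first token's vector (CLS pooling)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Average over all tokens. Shown only for contrast; this model does not use it."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


# Toy hidden states: 3 tokens x 4 dimensions (the real model uses 768 dimensions).
hidden = [
    [1.0, 0.0, 2.0, -1.0],  # first token, used by CLS pooling
    [3.0, 2.0, 0.0, 1.0],
    [2.0, 4.0, 1.0, 0.0],
]

cls_vec = cls_pool(hidden)
mean_vec = mean_pool(hidden)
print(cls_vec)   # [1.0, 0.0, 2.0, -1.0]
print(mean_vec)  # [2.0, 2.0, 1.0, 0.0]
```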

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Owl-4.1-Plus")
# Run inference
sentences = [
    'Register the router instance.\n\n@return void',
    "protected function registerRouter()\n\t{\n\t\t$this->app['router'] = $this->app->share(function($app)\n\t\t{\n\t\t\treturn new Router($app['events'], $app);\n\t\t});\n\t}",
    'func (d *Driver) DiffSize(id string, idMappings *idtools.IDMappings, parent string, parentMappings *idtools.IDMappings, mountLabel string) (size int64, err error) {\n\tif d.useNaiveDiff() || !d.isParent(id, parent) {\n\t\treturn d.naiveDiff.DiffSize(id, idMappings, parent, parentMappings, mountLabel)\n\t}\n\treturn directory.Size(d.getDiffPath(id))\n}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
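
For code search, the similarity scores are typically used to rank candidate snippets against a natural-language query. The sketch below shows that ranking step in pure Python, with small toy vectors standing in for the output of model.encode. The helpers cosine and rank_candidates are hypothetical names introduced for illustration, not library functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query_vec: Sequence[float],
    candidate_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors (assumption: real embeddings are 768-dimensional).
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partial overlap
]

ranking = rank_candidates(query, candidates, top_k=2)
for index, score in ranking:
    print(index, round(score, 3))
```

With real embeddings, the query vector would come from encoding a docstring and the candidate vectors from encoding function bodies.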

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,960,000 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
              sentence_0      sentence_1      label
    type      string          string          float
    min       3 tokens        28 tokens       1.0
    mean      50.31 tokens    164.73 tokens   1.0
    max       1024 tokens     1024 tokens     1.0
  • Samples:
    sentence_0 sentence_1 label
    // GetNodeID returns the NodeID field if it's non-nil, zero value otherwise. func (a *App) GetNodeID() string {
    if a == nil
    // SignVote signs a canonical representation of the vote, along with the
    // chainID. Implements PrivValidator.
    func (pv *FilePV) SignVote(chainID string, vote *types.Vote) error {
    if err := pv.signVote(chainID, vote); err != nil {
    return fmt.Errorf("error signing vote: %v", err)
    }
    return nil
    }
    1.0
    //GetQyAccessToken ่Žทๅ–access_token func (ctx *Context) GetQyAccessToken() (accessToken string, err error) {
    ctx.accessTokenLock.Lock()
    defer ctx.accessTokenLock.Unlock()

    accessTokenCacheKey := fmt.Sprintf("qy_access_token_%s", ctx.AppID)
    val := ctx.Cache.Get(accessTokenCacheKey)
    if val != nil {
    accessToken = val.(string)
    return
    }

    //ไปŽๅพฎไฟกๆœๅŠกๅ™จ่Žทๅ–
    var resQyAccessToken ResQyAccessToken
    resQyAccessToken, err = ctx.GetQyAccessTokenFromServer()
    if err != nil {
    return
    }

    accessToken = resQyAccessToken.AccessToken
    return
    }
    1.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
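
To make the loss parameters concrete: MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and every other sentence_1 in the same batch as a negative, then applies cross-entropy over the scaled cosine similarities. The following is a simplified pure-Python sketch under those assumptions, not the library's tensor implementation. The function name mnrl_loss and the toy 2-dimensional vectors are illustrative.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors have non-zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where the correct 'class' for anchor i is positive i."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and equal in length")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # One logit per in-batch candidate: scaled cosine similarity.
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch of two pairs: aligned pairs give a near-zero loss,
# swapped pairs give a loss close to the scale value.
aligned = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

A larger batch supplies more in-batch negatives per anchor, which is one reason this loss is usually paired with large batch sizes.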
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 250
  • per_device_eval_batch_size: 250
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
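
As a quick sanity check, the batch size and dataset size reported in this card can be related to the step counts in the training logs. The sketch below assumes training ran on a single device with no gradient accumulation; num_devices is an assumption, since the device count is not stated in this card.

```python
import math

train_samples = 6_960_000   # from "Training Dataset"
per_device_batch = 250      # per_device_train_batch_size
num_devices = 1             # assumption: not stated in this card
epochs = 3                  # num_train_epochs

# dataloader_drop_last is False, so a partial final batch would still count.
steps_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
total_steps = steps_per_epoch * epochs


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given optimizer step."""
    return step / steps_per_epoch


print(steps_per_epoch)              # 27840
print(total_steps)                  # 83520
print(round(epoch_at(500), 4))      # 0.018
print(round(epoch_at(83500), 4))    # 2.9993
```

Under these assumptions, the computed values line up with the epoch fractions logged at steps 500 and 83500.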

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 250
  • per_device_eval_batch_size: 250
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
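
With lr_scheduler_type set to linear, learning_rate at 5e-05, and no warmup, the learning rate decays linearly from its initial value to zero over the run. The sketch below reproduces that shape in plain Python. The function name linear_lr is illustrative, and the default total_steps of 83,520 is an assumption derived from 6,960,000 samples, batch size 250, 3 epochs, and a single device.

```python
def linear_lr(
    step: int,
    base_lr: float = 5e-5,
    warmup_steps: int = 0,
    total_steps: int = 83_520,  # assumption: single device, no gradient accumulation
) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 41_760, 83_520):
    print(step, linear_lr(step))
```

Halfway through training the rate is half the initial value, and it reaches zero at the final step.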

Training Logs

Epoch Step Training Loss
0.0180 500 0.9746
0.0359 1000 0.1636
0.0539 1500 0.1502
0.0718 2000 0.1374
0.0898 2500 0.1314
0.1078 3000 0.1241
0.1257 3500 0.1152
0.1437 4000 0.1146
0.1616 4500 0.1065
0.1796 5000 0.1014
0.1976 5500 0.0983
0.2155 6000 0.0987
0.2335 6500 0.0917
0.2514 7000 0.0912
0.2694 7500 0.0896
0.2874 8000 0.086
0.3053 8500 0.0811
0.3233 9000 0.0813
0.3412 9500 0.082
0.3592 10000 0.0759
0.3772 10500 0.0753
0.3951 11000 0.0722
0.4131 11500 0.0707
0.4310 12000 0.0699
0.4490 12500 0.0698
0.4670 13000 0.0679
0.4849 13500 0.0653
0.5029 14000 0.0641
0.5208 14500 0.063
0.5388 15000 0.0621
0.5568 15500 0.061
0.5747 16000 0.0581
0.5927 16500 0.0555
0.6106 17000 0.0552
0.6286 17500 0.0551
0.6466 18000 0.0533
0.6645 18500 0.0521
0.6825 19000 0.051
0.7004 19500 0.0509
0.7184 20000 0.0499
0.7364 20500 0.0468
0.7543 21000 0.0484
0.7723 21500 0.0466
0.7902 22000 0.0446
0.8082 22500 0.0453
0.8261 23000 0.0442
0.8441 23500 0.0424
0.8621 24000 0.0434
0.8800 24500 0.0416
0.8980 25000 0.0406
0.9159 25500 0.0404
0.9339 26000 0.0398
0.9519 26500 0.0406
0.9698 27000 0.0387
0.9878 27500 0.0386
1.0057 28000 0.0311
1.0237 28500 0.0193
1.0417 29000 0.0197
1.0596 29500 0.0186
1.0776 30000 0.0192
1.0955 30500 0.0194
1.1135 31000 0.0196
1.1315 31500 0.0198
1.1494 32000 0.0203
1.1674 32500 0.02
1.1853 33000 0.0184
1.2033 33500 0.0181
1.2213 34000 0.0195
1.2392 34500 0.0186
1.2572 35000 0.0184
1.2751 35500 0.0184
1.2931 36000 0.0194
1.3111 36500 0.0191
1.3290 37000 0.0183
1.3470 37500 0.0179
1.3649 38000 0.0179
1.3829 38500 0.0178
1.4009 39000 0.018
1.4188 39500 0.0182
1.4368 40000 0.0188
1.4547 40500 0.0172
1.4727 41000 0.0169
1.4907 41500 0.0173
1.5086 42000 0.0166
1.5266 42500 0.0157
1.5445 43000 0.0168
1.5625 43500 0.0158
1.5805 44000 0.016
1.5984 44500 0.0166
1.6164 45000 0.0168
1.6343 45500 0.0162
1.6523 46000 0.0153
1.6703 46500 0.0149
1.6882 47000 0.0158
1.7062 47500 0.0152
1.7241 48000 0.0147
1.7421 48500 0.0146
1.7601 49000 0.0145
1.7780 49500 0.0148
1.7960 50000 0.015
1.8139 50500 0.0145
1.8319 51000 0.0142
1.8499 51500 0.014
1.8678 52000 0.0139
1.8858 52500 0.0133
1.9037 53000 0.0135
1.9217 53500 0.0131
1.9397 54000 0.0134
1.9576 54500 0.013
1.9756 55000 0.0132
1.9935 55500 0.0122
2.0115 56000 0.0089
2.0295 56500 0.0061
2.0474 57000 0.0061
2.0654 57500 0.006
2.0833 58000 0.0062
2.1013 58500 0.0058
2.1193 59000 0.0059
2.1372 59500 0.0059
2.1552 60000 0.0059
2.1731 60500 0.0058
2.1911 61000 0.0059
2.2091 61500 0.0058
2.2270 62000 0.0059
2.2450 62500 0.0058
2.2629 63000 0.0057
2.2809 63500 0.0055
2.2989 64000 0.0056
2.3168 64500 0.0056
2.3348 65000 0.0056
2.3527 65500 0.0057
2.3707 66000 0.0055
2.3886 66500 0.0056
2.4066 67000 0.0054
2.4246 67500 0.0055
2.4425 68000 0.0052
2.4605 68500 0.0053
2.4784 69000 0.0052
2.4964 69500 0.0053
2.5144 70000 0.0052
2.5323 70500 0.0052
2.5503 71000 0.0051
2.5682 71500 0.0049
2.5862 72000 0.005
2.6042 72500 0.0047
2.6221 73000 0.0048
2.6401 73500 0.0047
2.6580 74000 0.0048
2.6760 74500 0.0048
2.6940 75000 0.0048
2.7119 75500 0.0047
2.7299 76000 0.0047
2.7478 76500 0.0046
2.7658 77000 0.0046
2.7838 77500 0.0044
2.8017 78000 0.0046
2.8197 78500 0.0047
2.8376 79000 0.0045
2.8556 79500 0.0043
2.8736 80000 0.0045
2.8915 80500 0.0044
2.9095 81000 0.0045
2.9274 81500 0.0045
2.9454 82000 0.0043
2.9634 82500 0.0042
2.9813 83000 0.0041
2.9993 83500 0.0044

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model Files

  • Model ID: Shuu12121/CodeSearch-ModernBERT-Owl-4.1-Plus
  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32