SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
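The module stack above can be illustrated with plain NumPy. This is a minimal sketch with made-up token embeddings (the real model gets these from BertModel): CLS pooling keeps only the first token's 768-dimensional vector, and Normalize() rescales it to unit length.

```python
import numpy as np

# Made-up token embeddings standing in for the BertModel output:
# one 768-d vector per token.
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((7, 768))  # 7 tokens x 768 dims

# pooling_mode_cls_token=True: keep only the first ([CLS]) token's vector.
cls_vector = token_embeddings[0]

# Normalize(): rescale to unit length, so cosine similarity between
# sentence embeddings reduces to a plain dot product.
sentence_embedding = cls_vector / np.linalg.norm(cls_vector)

print(sentence_embedding.shape)  # (768,)
```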

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-9-epochs")
# Run inference
sentences = [
    'What is a WISP? ',
    'A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking. Technology may include commonplace Wi-Fi wireless mesh networking, or proprietary equipment designed to operate over open 900 MHz, 2.4 GHz, 4.9, 5.2, 5.4, 5.7, and 5.8 GHz bands or licensed frequencies such as 2.5 GHz (EBS/BRS), 3.65 GHz (NN) and in the UHF band (including the MMDS frequency band) and LMDS.[citation needed]',
    'A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking. Technology may include commonplace Wi-Fi wireless mesh networking, or proprietary equipment designed to operate over open 900 MHz, 2.4 GHz, 4.9, 5.2, 5.4, 5.7, and 5.8 GHz bands or licensed frequencies such as 2.5 GHz (EBS/BRS), 3.65 GHz (NN) and in the UHF band (including the MMDS frequency band) and LMDS.[citation needed]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Triplet

  • cosine_accuracy: 0.4112
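For reference, triplet cosine_accuracy is the fraction of (question, context, negative) triplets in which the question embedding is closer to its context than to its negative. A sketch of that computation (not the evaluator's actual code):

```python
import numpy as np

# Sketch of triplet cosine_accuracy: the share of triplets where
# sim(anchor, positive) > sim(anchor, negative) under cosine similarity.
def triplet_accuracy(anchors, positives, negatives):
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    a, p, n = unit(anchors), unit(positives), unit(negatives)
    return float(((a * p).sum(axis=1) > (a * n).sum(axis=1)).mean())

# Toy check: the first triplet ranks correctly, the second does not.
a = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[1.0, 0.1], [1.0, 0.0]])
n = np.array([[0.0, 1.0], [0.0, 1.0]])
print(triplet_accuracy(a, p, n))  # 0.5
```

Note that several evaluation samples in this card show a negative identical to its context; for such triplets the strict inequality cannot hold, which may help explain the low accuracy.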

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,284 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 6, mean 14.32, max 41 tokens
    • context (string): min 29, mean 153.05, max 481 tokens
    • negative (string): min 31, mean 156.11, max 486 tokens
  • Samples:
    • question: What is the state song of New York?
      context: I Love New York (stylized I ❤ NY) is both a logo and a song that are the basis of an advertising campaign and have been used since 1977 to promote tourism in New York City, and later to promote New York State as well. The trademarked logo, owned by New York State Empire State Development, appears in souvenir shops and brochures throughout the city and state, some licensed, many not. The song is the state song of New York.
      negative: New York—often called New York City or the City of New York to distinguish it from the State of New York, of which it is a part—is the most populous city in the United States and the center of the New York metropolitan area, the premier gateway for legal immigration to the United States and one of the most populous urban agglomerations in the world. A global power city, New York exerts a significant impact upon commerce, finance, media, art, fashion, research, technology, education, and entertainment, its fast pace defining the term New York minute. Home to the headquarters of the United Nations, New York is an important center for international diplomacy and has been described as the cultural and financial capital of the world.
    • question: What was the mandated closing time of pubs in Knightsbridge in the 1960s?
      context: There was a special case established under the State Management Scheme where the brewery and licensed premises were bought and run by the state until 1973, most notably in Carlisle. During the 20th century elsewhere, both the licensing laws and enforcement were progressively relaxed, and there were differences between parishes; in the 1960s, at closing time in Kensington at 10:30 pm, drinkers would rush over the parish boundary to be in good time for "Last Orders" in Knightsbridge before 11 pm, a practice observed in many pubs adjoining licensing area boundaries. Some Scottish and Welsh parishes remained officially "dry" on Sundays (although often this merely required knocking at the back door of the pub). These restricted opening hours led to the tradition of lock-ins.
      negative: Pubs that cater for a niche clientele, such as sports fans or people of certain nationalities are known as theme pubs. Examples of theme pubs include sports bars, rock pubs, biker pubs, Goth pubs, strip pubs, gay bars, karaoke bars and Irish pubs.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
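As a rough illustration (not the library's implementation), MultipleNegativesRankingLoss treats every other context in the batch as a negative: with scale=20.0 and cosine similarity, it is a softmax cross-entropy over scaled question-context similarities, with each question's own context as the correct class.

```python
import numpy as np

# Illustrative sketch (not the library code) of in-batch-negatives loss.
# Row i of `scores` ranks question i against every context in the batch;
# the diagonal entries are the correct (question, context) pairs.
def mnrl(questions, contexts, scale=20.0):
    q = questions / np.linalg.norm(questions, axis=1, keepdims=True)
    c = contexts / np.linalg.norm(contexts, axis=1, keepdims=True)
    scores = scale * (q @ c.T)                   # cos_sim, scaled by 20
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))

rng = np.random.default_rng(2)
q = rng.standard_normal((8, 768))
print(mnrl(q, q))  # near 0: each question exactly matches its own context
print(mnrl(q, rng.standard_normal((8, 768))))  # much larger: random pairs
```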
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 7, mean 14.61, max 39 tokens
    • context (string): min 28, mean 148.94, max 512 tokens
    • negative_1 (string): min 28, mean 147.0, max 486 tokens
  • Samples:
    • question: When the UK changes their clocks forward in the spring, what do they call the time they're then observing?
      context: The name of local time typically changes when DST is observed. American English replaces standard with daylight: for example, Pacific Standard Time (PST) becomes Pacific Daylight Time (PDT). In the United Kingdom, the standard term for UK time when advanced by one hour is British Summer Time (BST), and British English typically inserts summer into other time zone names, e.g. Central European Time (CET) becomes Central European Summer Time (CEST).
      negative_1: Daylight saving time (DST) or summer time is the practice of advancing clocks during summer months by one hour so that in the evening daylight is experienced an hour longer, while sacrificing normal sunrise times. Typically, regions with summer time adjust clocks forward one hour close to the start of spring and adjust them backward in the autumn to standard time.
    • question: When did the U.N. vote to adopt the Sustainable Development Goals?
      context: In September 2015, the 193 member states of the United Nations unanimously adopted the Sustainable Development Goals, a set of 17 goals aiming to transform the world over the next 15 years. These goals are designed to eliminate poverty, discrimination, abuse and preventable deaths, address environmental destruction, and usher in an era of development for all people, everywhere.
      negative_1: (identical to the context above)
    • question: Where in India is Sanskrit still spoken by the population?
      context: Samskrita Bharati is an organisation working for Sanskrit revival. The "All-India Sanskrit Festival" (since 2002) holds composition contests. The 1991 Indian census reported 49,736 fluent speakers of Sanskrit. Sanskrit learning programmes also feature on the lists of most AIR broadcasting centres. The Mattur village in central Karnataka claims to have native speakers of Sanskrit among its population. Inhabitants of all castes learn Sanskrit starting in childhood and converse in the language. Even the local Muslims converse in Sanskrit. Historically, the village was given by king Krishnadevaraya of the Vijayanagara Empire to Vedic scholars and their families, while people in his kingdom spoke Kannada and Telugu. Another effort concentrates on preserving and passing along the oral tradition of the Vedas, www.shrivedabharathi.in is one such organisation based out of Hyderabad that has been digitising the Vedas by recording recitations of Vedic Pandits.
      negative_1: (identical to the context above)
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 9
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
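A hedged sketch of how the non-default values above map onto sentence-transformers training arguments; `output_dir` is a placeholder, not taken from this card:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

# Placeholder output_dir; all other values mirror the list above.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    num_train_epochs=9,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```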

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 9
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3516
0.2890 100 0.7706 0.8278 0.3768
0.5780 200 0.4913 0.7635 0.4002
0.8671 300 0.4249 0.7408 0.4014
1.1561 400 0.3396 0.7256 0.4112
1.4451 500 0.2886 0.7149 0.4136
1.7341 600 0.2854 0.7187 0.4142
2.0231 700 0.2596 0.7110 0.4128
2.3121 800 0.1537 0.7251 0.4122
2.6012 900 0.1538 0.7261 0.4166
2.8902 1000 0.1626 0.7219 0.4124
3.1792 1100 0.1234 0.7360 0.4196
3.4682 1200 0.104 0.7424 0.4150
3.7572 1300 0.1042 0.7431 0.4130
4.0462 1400 0.1025 0.7395 0.4176
4.3353 1500 0.0705 0.7591 0.4112
4.6243 1600 0.0766 0.7578 0.4106
4.9133 1700 0.0768 0.7589 0.4154
5.2023 1800 0.0623 0.7693 0.4124
5.4913 1900 0.0561 0.7673 0.4132
5.7803 2000 0.0592 0.7658 0.4188
6.0694 2100 0.0553 0.7723 0.4102
6.3584 2200 0.0455 0.7823 0.4090
6.6474 2300 0.0476 0.7784 0.4158
6.9364 2400 0.0478 0.7825 0.4168
7.2254 2500 0.0439 0.7877 0.4082
7.5145 2600 0.0407 0.7875 0.4106
7.8035 2700 0.041 0.7888 0.4182
8.0925 2800 0.0401 0.7922 0.4100
8.3815 2900 0.0385 0.7914 0.4116
8.6705 3000 0.0375 0.7930 0.4110
8.9595 3100 0.0377 0.7926 0.4130
-1 -1 - - 0.4112

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}