SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
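Because the pipeline ends with a Normalize() module, embeddings leave the model with unit L2 norm, so cosine similarity between outputs reduces to a plain dot product. A minimal NumPy sketch of that equivalence, using random toy vectors in place of real model outputs:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Mirror of the model's final Normalize() layer: scale rows to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between two batches of raw vectors."""
    return l2_normalize(a) @ l2_normalize(b).T

# Toy stand-ins for 768-dimensional model outputs
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 768))
emb = l2_normalize(raw)  # what the Normalize() module emits

dot = emb @ emb.T                         # plain dot products of normalized vectors
cos = cosine_similarity_matrix(raw, raw)  # cosine similarity of the raw vectors
print(np.allclose(dot, cos))  # True
```

This is why downstream vector stores can index these embeddings with inner-product search and still get cosine rankings.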

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-2-epochs")
# Run inference
sentences = [
    'Who wrote 182 papers dealing with the hibernation of swallows?',
    'Aristotle however suggested that swallows and other birds hibernated. This belief persisted as late as 1878, when Elliott Coues listed the titles of no less than 182 papers dealing with the hibernation of swallows. Even the "highly observant" Gilbert White, in his posthumously published 1789 The Natural History of Selborne, quoted a man\'s story about swallows being found in a chalk cliff collapse "while he was a schoolboy at Brighthelmstone", though the man denied being an eyewitness. However, he also writes that "as to swallows being found in a torpid state during the winter in the Isle of Wight or any part of this country, I never heard any such account worth attending to", and that if early swallows "happen to find frost and snow they immediately withdraw for a time—a circumstance this much more in favour of hiding than migration", since he doubts they would "return for a week or two to warmer latitudes".',
    'Aristotle however suggested that swallows and other birds hibernated. This belief persisted as late as 1878, when Elliott Coues listed the titles of no less than 182 papers dealing with the hibernation of swallows. Even the "highly observant" Gilbert White, in his posthumously published 1789 The Natural History of Selborne, quoted a man\'s story about swallows being found in a chalk cliff collapse "while he was a schoolboy at Brighthelmstone", though the man denied being an eyewitness. However, he also writes that "as to swallows being found in a torpid state during the winter in the Isle of Wight or any part of this country, I never heard any such account worth attending to", and that if early swallows "happen to find frost and snow they immediately withdraw for a time—a circumstance this much more in favour of hiding than migration", since he doubts they would "return for a week or two to warmer latitudes".',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
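For semantic search, the same encode-then-score pattern applies: embed a query and a corpus, then rank passages by cosine score. A toy sketch with hand-made unit vectors standing in for `model.encode` output (no model download required; the vectors and dimensionality here are illustrative only):

```python
import numpy as np

# Toy unit-norm embeddings standing in for model.encode output:
# two "corpus" passages and one "query" (3-D instead of 768-D for readability)
corpus_emb = np.array([
    [0.8, 0.6, 0.0],
    [0.0, 0.6, 0.8],
])
query_emb = np.array([0.6, 0.8, 0.0])

# Because the embeddings are L2-normalized, cosine scores are dot products
scores = corpus_emb @ query_emb
best = int(np.argmax(scores))
print(best)  # 0 -> the first passage is the closest match
```

In practice you would replace the toy arrays with `model.encode(queries)` and `model.encode(corpus)`.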

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4116
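The cosine_accuracy above is the fraction of (question, context, negative) triplets for which the question embedding is closer by cosine similarity to its context than to its negative. A minimal NumPy sketch of that metric with toy 2-D vectors (the actual evaluation uses sentence-transformers' triplet evaluator on real embeddings):

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    a, p, n = unit(anchors), unit(positives), unit(negatives)
    return float(((a * p).sum(axis=1) > (a * n).sum(axis=1)).mean())

# Toy 2-D triplets: each positive points near its anchor, each negative does not
a = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [0.1, 0.9]])
n = np.array([[0.0, 1.0], [1.0, 0.2]])
print(triplet_cosine_accuracy(a, p, n))  # 1.0
```

A score of 0.4116 therefore means the positive context outranked the sampled negative for roughly 41% of evaluation triplets.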

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,282 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:

                 question        context          negative
    type         string          string           string
    min tokens   7               28               31
    mean tokens  14.79           148.38           152.75
    max tokens   42              494              512
  • Samples:

    question: How long did the Hollywood round air for in season eight of American Idol?
    context: In the first major change to the judging panel, a fourth judge, Kara DioGuardi, was introduced. This was also the first season without executive producer Nigel Lythgoe who left to focus on the international versions of his show So You Think You Can Dance. The Hollywood round was moved to the Kodak Theatre for 2009 and was also extended to two weeks. Idol Gives Back was canceled for this season due to the global recession at the time.
    negative: American Idol is an American singing competition series created by Simon Fuller and produced by 19 Entertainment, and is distributed by FremantleMedia North America. It began airing on Fox on June 11, 2002, as an addition to the Idols format based on the British series Pop Idol and has since become one of the most successful shows in the history of American television. The concept of the series is to find new solo recording artists, with the winner being determined by the viewers in America. Winners chosen by viewers through telephone, Internet, and SMS text voting were Kelly Clarkson, Ruben Studdard, Fantasia Barrino, Carrie Underwood, Taylor Hicks, Jordin Sparks, David Cook, Kris Allen, Lee DeWyze, Scotty McCreery, Phillip Phillips, Candice Glover, Caleb Johnson, and Nick Fradiani.

    question: What is one example of how students can benefit from videoconferencing?
    context: Videoconferencing provides students with the opportunity to learn by participating in two-way communication forums. Furthermore, teachers and lecturers worldwide can be brought to remote or otherwise isolated educational facilities. Students from diverse communities and backgrounds can come together to learn about one another, although language barriers will continue to persist. Such students are able to explore, communicate, analyze and share information and ideas with one another. Through videoconferencing, students can visit other parts of the world to speak with their peers, and visit museums and educational facilities. Such virtual field trips can provide enriched learning opportunities to students, especially those in geographically isolated locations, and to the economically disadvantaged. Small schools can use these technologies to pool resources and provide courses, such as in foreign languages, which could not otherwise be offered.
    negative: Typical use of the various technologies described above include calling or conferencing on a one-on-one, one-to-many or many-to-many basis for personal, business, educational, deaf Video Relay Service and tele-medical, diagnostic and rehabilitative use or services. New services utilizing videocalling and videoconferencing, such as teachers and psychologists conducting online sessions, personal videocalls to inmates incarcerated in penitentiaries, and videoconferencing to resolve airline engineering issues at maintenance facilities, are being created or evolving on an ongoing basis.

    question: What names were used by The Sun to characterize the French and Germans?
    context: The Sun has been openly antagonistic towards other European nations, particularly the French and Germans. During the 1980s and 1990s, the nationalities were routinely described in copy and headlines as "frogs", "krauts" or "hun". As the paper is opposed to the EU it has referred to foreign leaders who it deemed hostile to the UK in unflattering terms. Former President Jacques Chirac of France, for instance, was branded "le Worm". An unflattering picture of German chancellor Angela Merkel, taken from the rear, bore the headline "I'm Big in the Bumdestag" (17 April 2006).
    negative: On 10 October, hostilities began between German and French republican forces near Orléans. At first, the Germans were victorious but the French drew reinforcements and defeated the Germans at the Battle of Coulmiers on 9 November. After the surrender of Metz, more than 100,000 well-trained and experienced German troops joined the German 'Southern Army'. The French were forced to abandon Orléans on 4 December, and were finally defeated at the Battle of Le Mans (10–12 January). A second French army which operated north of Paris was turned back at the Battle of Amiens (27 November), the Battle of Bapaume (3 January 1871) and the Battle of St. Quentin (13 January).
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
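MultipleNegativesRankingLoss treats each (question, context) pair in a batch as a positive, with every other context in the same batch serving as a negative, and applies cross-entropy over cosine similarities multiplied by the scale of 20. A minimal NumPy sketch of that objective (the real sentence-transformers implementation is a differentiable PyTorch module; this is illustration only):

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch MultipleNegativesRankingLoss sketch: each anchor's matching
    positive (the diagonal) is the correct class; all other positives in the
    batch act as negatives for that anchor."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = scale * (unit(anchors) @ unit(positives).T)  # (batch, batch) scaled cosine scores
    sims = sims - sims.max(axis=1, keepdims=True)       # stabilize the softmax
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())

# Perfectly matched pairs give near-zero loss; mismatched pairs are penalized
aligned = mnr_loss(np.eye(2), np.eye(2))
swapped = mnr_loss(np.eye(2), np.eye(2)[::-1])
print(aligned < swapped)  # True
```

The scale of 20 sharpens the softmax so that small cosine gaps between the positive and in-batch negatives still produce a strong gradient signal.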
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:

                 question        context          negative_1
    type         string          string           string
    min tokens   3               28               28
    mean tokens  14.7            153.44           150.79
    max tokens   39              510              510
  • Samples:

    question: Bilateral treaties are concluded between how many states or entities?
    context: Bilateral treaties are concluded between two states or entities. It is possible, however, for a bilateral treaty to have more than two parties; consider for instance the bilateral treaties between Switzerland and the European Union (EU) following the Swiss rejection of the European Economic Area agreement. Each of these treaties has seventeen parties. These however are still bilateral, not multilateral, treaties. The parties are divided into two groups, the Swiss ("on the one part") and the EU and its member states ("on the other part"). The treaty establishes rights and obligations between the Swiss and the EU and the member states severally—it does not establish any rights and obligations amongst the EU and its member states.[citation needed]
    negative_1: Bilateral treaties are concluded between two states or entities. It is possible, however, for a bilateral treaty to have more than two parties; consider for instance the bilateral treaties between Switzerland and the European Union (EU) following the Swiss rejection of the European Economic Area agreement. Each of these treaties has seventeen parties. These however are still bilateral, not multilateral, treaties. The parties are divided into two groups, the Swiss ("on the one part") and the EU and its member states ("on the other part"). The treaty establishes rights and obligations between the Swiss and the EU and the member states severally—it does not establish any rights and obligations amongst the EU and its member states.[citation needed]

    question: What period is referred to as a "golden age"?
    context: The term "classical music" did not appear until the early 19th century, in an attempt to distinctly canonize the period from Johann Sebastian Bach to Beethoven as a golden age. The earliest reference to "classical music" recorded by the Oxford English Dictionary is from about 1836.
    negative_1: In European history, the Middle Ages or medieval period lasted from the 5th to the 15th century. It began with the collapse of the Western Roman Empire and merged into the Renaissance and the Age of Discovery. The Middle Ages is the middle period of the three traditional divisions of Western history: Antiquity, Medieval period, and Modern period. The Medieval period is itself subdivided into the Early, the High, and the Late Middle Ages.

    question: What is the name of the area of land around Nanjing?
    context: Nanjing, with a total land area of 6,598 square kilometres (2,548 sq mi), is situated in the heartland of drainage area of lower reaches of Yangtze River, and in Yangtze River Delta, one of the largest economic zones of China. The Yangtze River flows past the west side and then north side of Nanjing City, while the Ningzheng Ridge surrounds the north, east and south side of the city. The city is 300 kilometres (190 mi) west-northwest of Shanghai, 1,200 kilometres (750 mi) south-southeast of Beijing, and 1,400 kilometres (870 mi) east-northeast of Chongqing. The downstream Yangtze River flows from Jiujiang, Jiangxi, through Anhui and Jiangsu to East Sea, north to drainage basin of downstream Yangtze is Huai River basin and south to it is Zhe River basin, and they are connected by the Grand Canal east to Nanjing. The area around Nanjing is called Hsiajiang (下江, Downstream River) region, with Jianghuai (江淮) stressing northern part and Jiangzhe (江浙) stressing southern part. The region is a...
    negative_1: Nanjing ( listen; Chinese: 南京, "Southern Capital") is the city situated in the heartland of lower Yangtze River region in China, which has long been a major centre of culture, education, research, politics, economy, transport networks and tourism. It is the capital city of Jiangsu province of People's Republic of China and the second largest city in East China, with a total population of 8,216,100, and legally the capital of Republic of China which lost the mainland during the civil war. The city whose name means "Southern Capital" has a prominent place in Chinese history and culture, having served as the capitals of various Chinese dynasties, kingdoms and republican governments dating from the 3rd century AD to 1949. Prior to the advent of pinyin romanization, Nanjing's city name was spelled as Nanking or Nankin. Nanjing has a number of other names, and some historical names are now used as names of districts of the city, and among them there is the name Jiangning (江寧), whose former c...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
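With lr_scheduler_type: linear and warmup_ratio: 0.1, the learning rate climbs linearly from 0 to the base 5e-05 over the first 10% of training steps, then decays linearly back to 0. A small sketch of that schedule (step counts are illustrative; the exact Transformers scheduler differs slightly in rounding):

```python
def linear_schedule_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 1000  # illustrative; this run is roughly 692 optimizer steps (2 epochs x ~346 steps)
print(linear_schedule_lr(0, total))     # 0.0
print(linear_schedule_lr(100, total))   # peak, ~5e-05, at the end of warmup
print(linear_schedule_lr(1000, total))  # 0.0
```

With 44,282 samples at batch size 128, each epoch is about 346 steps, which matches the step counts in the training logs below.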

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Validation Loss  gooqa-dev_cosine_accuracy
-1      -1    -              -                0.3524
0.2890  100   0.638          0.7838           0.3856
0.5780  200   0.4271         0.7404           0.3988
0.8671  300   0.3997         0.7197           0.4014
1.1561  400   0.3068         0.7188           0.4142
1.4451  500   0.2351         0.7110           0.4106
1.7341  600   0.2389         0.7110           0.4120
-1      -1    -              -                0.4116

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}