SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-3-epochs")
# Run inference
sentences = [
    'How many people outside the UK were under British rule in 1945?',
    'Though Britain and the empire emerged victorious from the Second World War, the effects of the conflict were profound, both at home and abroad. Much of Europe, a continent that had dominated the world for several centuries, was in ruins, and host to the armies of the United States and the Soviet Union, who now held the balance of global power. Britain was left essentially bankrupt, with insolvency only averted in 1946 after the negotiation of a $US 4.33 billion loan (US$56 billion in 2012) from the United States, the last instalment of which was repaid in 2006. At the same time, anti-colonial movements were on the rise in the colonies of European nations. The situation was complicated further by the increasing Cold War rivalry of the United States and the Soviet Union. In principle, both nations were opposed to European colonialism. In practice, however, American anti-communism prevailed over anti-imperialism, and therefore the United States supported the continued existence of the British Empire to keep Communist expansion in check. The "wind of change" ultimately meant that the British Empire\'s days were numbered, and on the whole, Britain adopted a policy of peaceful disengagement from its colonies once stable, non-Communist governments were available to transfer power to. This was in contrast to other European powers such as France and Portugal, which waged costly and ultimately unsuccessful wars to keep their empires intact. Between 1945 and 1965, the number of people under British rule outside the UK itself fell from 700 million to five million, three million of whom were in Hong Kong.',
    "By the start of the 20th century, Germany and the United States challenged Britain's economic lead. Subsequent military and economic tensions between Britain and Germany were major causes of the First World War, during which Britain relied heavily upon its empire. The conflict placed enormous strain on the military, financial and manpower resources of Britain. Although the British Empire achieved its largest territorial extent immediately after World War I, Britain was no longer the world's pre-eminent industrial or military power. In the Second World War, Britain's colonies in South-East Asia were occupied by Imperial Japan. Despite the final victory of Britain and its allies, the damage to British prestige helped to accelerate the decline of the empire. India, Britain's most valuable and populous possession, achieved independence as part of a larger decolonisation movement in which Britain granted independence to most territories of the Empire. The transfer of Hong Kong to China in 1997 marked for many the end of the British Empire. Fourteen overseas territories remain under British sovereignty. After independence, many former British colonies joined the Commonwealth of Nations, a free association of independent states. The United Kingdom is now one of 16 Commonwealth nations, a grouping known informally as the Commonwealth realms, that share one monarch—Queen Elizabeth II.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.4192

Training Details

Training Dataset

Unnamed Dataset

  • Size: 44,283 training samples
  • Columns: question, context, and negative
  • Approximate statistics based on the first 1000 samples:
    question context negative
    type string string string
    details
    • min: 6 tokens
    • mean: 14.62 tokens
    • max: 35 tokens
    • min: 31 tokens
    • mean: 152.26 tokens
    • max: 512 tokens
    • min: 32 tokens
    • mean: 153.89 tokens
    • max: 510 tokens
  • Samples:
    question context negative
    About how many cubic meters of wood was used in 1991 to make products like glulam, LVL, and structural composite lumber? These products include glued laminated timber (glulam), wood structural panels (including plywood, oriented strand board and composite panels), laminated veneer lumber (LVL) and other structural composite lumber (SCL) products, parallel strand lumber, and I-joists. Approximately 100 million cubic meters of wood was consumed for this purpose in 1991. The trends suggest that particle board and fiber board will overtake plywood. Engineered wood products, glued building products "engineered" for application-specific performance requirements, are often used in construction and industrial applications. Glued engineered wood products are manufactured by bonding together wood strands, veneers, lumber or other forms of wood fiber with glue to form a larger, more efficient composite structural unit.
    Which Eclogues discusses homosexual love? The biographical tradition asserts that Virgil began the hexameter Eclogues (or Bucolics) in 42 BC and it is thought that the collection was published around 39–38 BC, although this is controversial. The Eclogues (from the Greek for "selections") are a group of ten poems roughly modeled on the bucolic hexameter poetry ("pastoral poetry") of the Hellenistic poet Theocritus. After his victory in the Battle of Philippi in 42 BC, fought against the army led by the assassins of Julius Caesar, Octavian tried to pay off his veterans with land expropriated from towns in northern Italy, supposedly including, according to the tradition, an estate near Mantua belonging to Virgil. The loss of his family farm and the attempt through poetic petitions to regain his property have traditionally been seen as Virgil's motives in the composition of the Eclogues. This is now thought to be an unsupported inference from interpretations of the Eclogues. In Eclogues 1 and 9, Virgil indeed dramatizes the contra... Sometime after the publication of the Eclogues (probably before 37 BC), Virgil became part of the circle of Maecenas, Octavian's capable agent d'affaires who sought to counter sympathy for Antony among the leading families by rallying Roman literary figures to Octavian's side. Virgil came to know many of the other leading literary figures of the time, including Horace, in whose poetry he is often mentioned, and Varius Rufus, who later helped finish the Aeneid.
    What nation lies to the west of the Marshall Islands? The islands are located about halfway between Hawaii and Australia, north of Nauru and Kiribati, east of the Federated States of Micronesia, and south of the U.S. territory of Wake Island, to which it lays claim. The atolls and islands form two groups: the Ratak (sunrise) and the Ralik (sunset). The two island chains lie approximately parallel to one another, running northwest to southeast, comprising about 750,000 square miles (1,900,000 km2) of ocean but only about 70 square miles (180 km2) of land mass. Each includes 15 to 18 islands and atolls. The country consists of a total of 29 atolls and five isolated islands. The climate is hot and humid, with a wet season from May to November. Many Pacific typhoons begin as tropical storms in the Marshall Islands region, and grow stronger as they move west toward the Mariana Islands and the Philippines.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

Unnamed Dataset

  • Size: 5,000 evaluation samples
  • Columns: question, context, and negative_1
  • Approximate statistics based on the first 1000 samples:
    question context negative_1
    type string string string
    details
    • min: 3 tokens
    • mean: 14.61 tokens
    • max: 52 tokens
    • min: 28 tokens
    • mean: 151.48 tokens
    • max: 494 tokens
    • min: 28 tokens
    • mean: 151.7 tokens
    • max: 512 tokens
  • Samples:
    question context negative_1
    When did Simon Cowell announce he was no longer going to be a judge? In season eight, Latin Grammy Award-nominated singer–songwriter and record producer Kara DioGuardi was added as a fourth judge. She stayed for two seasons and left the show before season ten. Paula Abdul left the show before season nine after failing to agree terms with the show producers. Emmy Award-winning talk show host Ellen DeGeneres replaced Paula Abdul for that season, but left after just one season. On January 11, 2010, Simon Cowell announced that he was leaving the show to pursue introducing the American version of his show The X Factor to the USA for 2011. Jennifer Lopez and Steven Tyler joined the judging panel in season ten, but both left after two seasons. They were replaced by three new judges, Mariah Carey, Nicki Minaj and Keith Urban, who joined Randy Jackson in season 12. However both Carey and Minaj left after one season, and Randy Jackson also announced that he would depart the show after twelve seasons as a judge but would return as a mentor. Urban is the only judge... A special tribute to Simon Cowell was presented in the finale for his final season with the show. Many figures from the show's past, including Paula Abdul, made an appearance.
    When did Hodgson publish his DNA study? According to an autosomal DNA study by Hodgson et al. (2014), the Afro-Asiatic languages were likely spread across Africa and the Near East by an ancestral population(s) carrying a newly identified non-African genetic component, which the researchers dub the "Ethio-Somali". This Ethio-Somali component is today most common among Afro-Asiatic-speaking populations in the Horn of Africa. It reaches a frequency peak among ethnic Somalis, representing the majority of their ancestry. The Ethio-Somali component is most closely related to the Maghrebi non-African genetic component, and is believed to have diverged from all other non-African ancestries at least 23,000 years ago. On this basis, the researchers suggest that the original Ethio-Somali carrying population(s) probably arrived in the pre-agricultural period from the Near East, having crossed over into northeastern Africa via the Sinai Peninsula. The population then likely split into two branches, with one group heading westward toward ... Advances in understanding genes and inheritance continued throughout the 20th century. Deoxyribonucleic acid (DNA) was shown to be the molecular repository of genetic information by experiments in the 1940s to 1950s. The structure of DNA was studied by Rosalind Franklin using X-ray crystallography, which led James D. Watson and Francis Crick to publish a model of the double-stranded DNA molecule whose paired nucleotide bases indicated a compelling hypothesis for the mechanism of genetic replication. Collectively, this body of research established the central dogma of molecular biology, which states that proteins are translated from RNA, which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses. The modern study of genetics at the level of DNA is known as molecular genetics.
    What campus did Yale buy in 2008? Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven forests in Connecticut, Vermont, and New Hampshire—the largest of which is the 7,840-acre (31.7 km2) Yale-Myers Forest in Connecticut's Quiet Corner—and nature preserves including Horse Island. Yale's central campus in downtown New Haven covers 260 acres (1.1 km2) and comprises its main, historic campus and a medical campus adjacent to the Yale-New Haven Hospital. In western New Haven, the university holds 500 acres (2.0 km2) of athletic facilities, including the Yale Golf Course. In 2008, Yale purchased the 136-acre (0.55 km2) former Bayer Pharmaceutical campus in West Haven, Connecticut, the buildings of which are now used as laboratory and research space. Yale also owns seven forests in Connecticut, Vermont, and New Hampshire—the largest of which is the 7,840-acre (31.7 km2) Yale-Myers Forest in Connecticut's Quiet Corner—and nature preserves including Horse Island.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss gooqa-dev_cosine_accuracy
-1 -1 - - 0.3536
0.2890 100 0.6663 0.7830 0.3922
0.5780 200 0.4374 0.7399 0.4064
0.8671 300 0.4048 0.7294 0.4086
1.1561 400 0.3149 0.7244 0.4136
1.4451 500 0.2378 0.7246 0.4182
1.7341 600 0.2358 0.7179 0.4158
2.0231 700 0.2338 0.7170 0.4240
2.3121 800 0.1602 0.7293 0.4148
2.6012 900 0.1595 0.7237 0.4230
2.8902 1000 0.1545 0.7229 0.4146
-1 -1 - - 0.4192

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
-
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ayushexel/emb-bge-base-en-v1.5-squad-3-epochs

Finetuned
(429)
this model

Evaluation results