SentenceTransformer based on BAAI/bge-base-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
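The Pooling and Normalize stages listed above can be illustrated with a minimal pure-Python sketch. It assumes that CLS pooling simply takes the first token's vector and that Normalize applies L2 normalization; the helper names and the 4-dimensional toy vectors are hypothetical (the real model produces 768 dimensions).

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector, eps=1e-12):
    """Scale the vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings for a 3-token input; row 0 plays the role of [CLS].
tokens = [
    [3.0, 4.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]
embedding = l2_normalize(cls_pool(tokens))
print(embedding)
print(dot(embedding, embedding))
```

Because the output vectors have unit length, the dot product of two embeddings equals their cosine similarity.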
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-4-epochs")
# Run inference
sentences = [
'How is The Bahre-Nagassi translated?',
'The Scottish traveler James Bruce reported in 1770 that Medri Bahri was a distinct political entity from Abyssinia, noting that the two territories were frequently in conflict. The Bahre-Nagassi ("Kings of the Sea") alternately fought with or against the Abyssinians and the neighbouring Muslim Adal Sultanate depending on the geopolitical circumstances. Medri Bahri was thus part of the Christian resistance against Imam Ahmad ibn Ibrahim al-Ghazi of Adal\'s forces, but later joined the Adalite states and the Ottoman Empire front against Abyssinia in 1572. That 16th century also marked the arrival of the Ottomans, who began making inroads in the Red Sea area.',
'The Scottish traveler James Bruce reported in 1770 that Medri Bahri was a distinct political entity from Abyssinia, noting that the two territories were frequently in conflict. The Bahre-Nagassi ("Kings of the Sea") alternately fought with or against the Abyssinians and the neighbouring Muslim Adal Sultanate depending on the geopolitical circumstances. Medri Bahri was thus part of the Christian resistance against Imam Ahmad ibn Ibrahim al-Ghazi of Adal\'s forces, but later joined the Adalite states and the Ottoman Empire front against Abyssinia in 1572. That 16th century also marked the arrival of the Ottomans, who began making inroads in the Red Sea area.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
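For semantic search, the similarity scores are typically used to rank candidate passages against a question. The following is a minimal sketch of that ranking step in plain Python; the function names and the 3-dimensional toy vectors are hypothetical stand-ins for real embeddings returned by model.encode.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Toy vectors standing in for a question embedding and three passage embeddings.
query = [1.0, 0.0, 1.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partial overlap
]
ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```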
Evaluation
Metrics
Triplet
- Dataset: gooqa-dev
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.412 |
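As a rough illustration of what cosine_accuracy measures, the sketch below counts a triplet as correct when the anchor is more similar to the positive than to the negative. This is an assumption about the metric's definition rather than the evaluator's actual code, and the toy triplets are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)

# Hypothetical 2-dimensional embeddings: (anchor, positive, negative).
triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.2, 1.0], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 1.0]),  # negative is closer: incorrect
    ([1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]),  # positive is closer: correct
]
accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)
```

Under this reading, the reported value of 0.412 would mean that roughly 41% of evaluation triplets place the positive context above the negative one.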
Training Details
Training Dataset
Unnamed Dataset
- Size: 44,288 training samples
- Columns: question, context, and negative
- Approximate statistics based on the first 1000 samples:
| | question | context | negative |
|---|---|---|---|
| type | string | string | string |
| min | 6 tokens | 33 tokens | 31 tokens |
| mean | 14.68 tokens | 149.38 tokens | 156.95 tokens |
| max | 63 tokens | 512 tokens | 510 tokens |
- Samples:
| question | context | negative |
|---|---|---|
| What can happen to vaginal flora? | Antibiotics are screened for any negative effects on humans or other mammals before approval for clinical use, and are usually considered safe and most are well tolerated. However, some antibiotics have been associated with a range of adverse side effects. Side-effects range from mild to very serious depending on the antibiotics used, the microbial organisms targeted, and the individual patient. Side effects may reflect the pharmacological or toxicological properties of the antibiotic or may involve hypersensitivity reactions or anaphylaxis. Safety profiles of newer drugs are often not as well established as for those that have a long history of use. Adverse effects range from fever and nausea to major allergic reactions, including photodermatitis and anaphylaxis. Common side-effects include diarrhea, resulting from disruption of the species composition in the intestinal flora, resulting, for example, in overgrowth of pathogenic bacteria, such as Clostridium difficile. Antibacterials c... | Different methods can be used to define southern Europe, including its political, economic, and cultural attributes. Southern Europe can also be defined by its natural features — its geography, climate, and flora. |
| How did Whitehead identify his system of metaphysics? | Perhaps foremost among what Whitehead considered faulty metaphysical assumptions was the Cartesian idea that reality is fundamentally constructed of bits of matter that exist totally independently of one another, which he rejected in favor of an event-based or "process" ontology in which events are primary and are fundamentally interrelated and dependent on one another. He also argued that the most basic elements of reality can all be regarded as experiential, indeed that everything is constituted by its experience. He used the term "experience" very broadly, so that even inanimate processes such as electron collisions are said to manifest some degree of experience. In this, he went against Descartes' separation of two different kinds of real existence, either exclusively material or else exclusively mental. Whitehead referred to his metaphysical system as "philosophy of organism", but it would become known more widely as "process philosophy." | Alfred North Whitehead was born in Ramsgate, Kent, England, in 1861. His father, Alfred Whitehead, was a minister and schoolmaster of Chatham House Academy, a successful school for boys established by Thomas Whitehead, Alfred North's grandfather. Whitehead himself recalled both of them as being very successful schoolmasters, but that his grandfather was the more extraordinary man. Whitehead's mother was Maria Sarah Whitehead, formerly Maria Sarah Buckmaster. Whitehead was apparently not particularly close with his mother, as he never mentioned her in any of his writings, and there is evidence that Whitehead's wife, Evelyn, had a low opinion of her. |
| What do cotton linters look like? | Cotton linters are fine, silky fibers which adhere to the seeds of the cotton plant after ginning. These curly fibers typically are less than 1⁄8 inch (3.2 mm) long. The term also may apply to the longer textile fiber staple lint as well as the shorter fuzzy fibers from some upland species. Linters are traditionally used in the manufacture of paper and as a raw material in the manufacture of cellulose. In the UK, linters are referred to as "cotton wool". This can also be a refined product (absorbent cotton in U.S. usage) which has medical, cosmetic and many other practical uses. The first medical use of cotton wool was by Sampson Gamgee at the Queen's Hospital (later the General Hospital) in Birmingham, England. | Cotton is used to make a number of textile products. These include terrycloth for highly absorbent bath towels and robes; denim for blue jeans; cambric, popularly used in the manufacture of blue work shirts (from which we get the term "blue-collar"); and corduroy, seersucker, and cotton twill. Socks, underwear, and most T-shirts are made from cotton. Bed sheets often are made from cotton. Cotton also is used to make yarn used in crochet and knitting. Fabric also can be made from recycled or recovered cotton that otherwise would be thrown away during the spinning, weaving, or cutting process. While many fabrics are made completely of cotton, some materials blend cotton with other fibers, including rayon and synthetic fibers such as polyester. It can either be used in knitted or woven fabrics, as it can be blended with elastine to make a stretchier thread for knitted fabrics, and apparel such as stretch jeans. |
- Loss: MultipleNegativesRankingLoss with these parameters:
{ "scale": 20.0, "similarity_fct": "cos_sim" }
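To make the loss parameters concrete, here is a minimal pure-Python sketch of a multiple-negatives ranking loss with scale 20.0 and cosine similarity. It assumes the common formulation in which each question is scored against every context and every negative in the batch, and cross-entropy is taken with the question's own context as the target; the function names and toy vectors are hypothetical, and this is not the library's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus explicit negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_z - logits[i]
    return total / len(anchors)

# Toy batch of two (question, context, negative) triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good_loss = mnr_loss(anchors, positives, negatives)
# Swapping the positives makes each question prefer the wrong context.
bad_loss = mnr_loss(anchors, positives[::-1], negatives)
print(good_loss, bad_loss)
```

Since every other context in the batch acts as a negative, a repeated context within one batch would be penalized as a false negative, which is a plausible reason for the no_duplicates batch sampler listed in the hyperparameters.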
Evaluation Dataset
Unnamed Dataset
- Size: 5,000 evaluation samples
- Columns: question, context, and negative_1
- Approximate statistics based on the first 1000 samples:
| | question | context | negative_1 |
|---|---|---|---|
| type | string | string | string |
| min | 6 tokens | 31 tokens | 32 tokens |
| mean | 14.34 tokens | 151.88 tokens | 150.81 tokens |
| max | 31 tokens | 450 tokens | 512 tokens |
- Samples:
| question | context | negative_1 |
|---|---|---|
| Which season of American Idol had the highest profit on its tour? | The top ten (eleven in season ten) toured at the end of every season. In the season twelve tour a semi-finalist who won a sing-off was also added to the tour. Kellogg's Pop-Tarts was the sponsor for the first seven seasons, and Guitar Hero was added for the season seven tour. M&M's Pretzel Chocolate Candies was a sponsor of the season nine tour. The season five tour was the most successful tour with gross of over $35 million. | The impact of American Idol is also strongly felt in musical theatre, where many of Idol alumni have forged successful careers. The striking effect of former American Idol contestants on Broadway has been noted and commented on. The casting of a popular Idol contestant can lead to significantly increased ticket sales. Other alumni have gone on to work in television and films, the most notable being Jennifer Hudson who, on the recommendation of the Idol vocal coach Debra Byrd, won a role in Dreamgirls and subsequently received an Academy Award for her performance. |
| How does interchanging allophones of the same pheneme render words? | The findings and insights of speech perception and articulation research complicate the traditional and somewhat intuitive idea of interchangeable allophones being perceived as the same phoneme. First, interchanged allophones of the same phoneme can result in unrecognizable words. Second, actual speech, even at a word level, is highly co-articulated, so it is problematic to expect to be able to splice words into simple segments without affecting speech perception. | The findings and insights of speech perception and articulation research complicate the traditional and somewhat intuitive idea of interchangeable allophones being perceived as the same phoneme. First, interchanged allophones of the same phoneme can result in unrecognizable words. Second, actual speech, even at a word level, is highly co-articulated, so it is problematic to expect to be able to splice words into simple segments without affecting speech perception. |
| How many days does someone in the United Kingdom have to wait to watch American Idol after its original broadcast? | In Latin America, the show is broadcast and subtitled by Sony Entertainment Television. In southeast Asia, it is broadcast by STAR World every Thursday and Friday nine or ten hours after. In Philippines, it is aired every Thursday and Friday nine or ten hours after its United States telecast; from 2002 to 2007 on ABC 5; 2008–11 on QTV, then GMA News TV; and since 2012 on ETC. On Philippine television history. In Australia, it is aired a few hours after the U.S. telecast. It was aired on Network Ten from 2002 to 2007 and then again in 2013, from 2008 to 2012 on Fox8, from season 13 onwards it airs on digital channel, Eleven, a sister channel to Network Ten. In the United Kingdom, episodes are aired one day after the U.S. broadcast on digital channel ITV2. As of season 12, the episodes air on 5*. It is also aired in Ireland on TV3 two days after the telecast. In Brazil and Israel, the show airs two days after its original broadcast. In the instances where the airing is delayed, the shows... | American Idol is broadcast to over 100 nations outside of the United States. In most nations these are not live broadcasts and may be tape delayed by several days or weeks. In Canada, the first thirteen seasons of American Idol were aired live by CTV and/or CTV Two, in simulcast with Fox. CTV dropped Idol after its thirteenth season and in August 2014, Yes TV announced that it had picked up Canadian rights to American Idol beginning in its 2015 season. |
- Loss: MultipleNegativesRankingLoss with these parameters:
{ "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- num_train_epochs: 4
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
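The learning_rate, warmup_ratio, and lr_scheduler_type values interact to determine the learning rate at each step. The sketch below works through that arithmetic in plain Python, assuming a single training device, warmup steps rounded up from the ratio, and a linear ramp to the peak rate followed by linear decay to zero; the constant and function names are hypothetical.

```python
import math

TRAIN_SAMPLES = 44_288   # training set size from this card
BATCH_SIZE = 128         # per_device_train_batch_size (single device assumed)
EPOCHS = 4               # num_train_epochs
PEAK_LR = 5e-5           # learning_rate
WARMUP_RATIO = 0.1       # warmup_ratio

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, 100, warmup_steps, 700, total_steps):
    print(step, linear_lr(step))
```

The step count per epoch derived here is consistent with the training logs, where step 100 corresponds to epoch 0.2890.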
Training Logs
| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.3522 |
| 0.2890 | 100 | 0.6923 | 0.7927 | 0.3864 |
| 0.5780 | 200 | 0.4467 | 0.7568 | 0.4010 |
| 0.8671 | 300 | 0.4108 | 0.7170 | 0.4100 |
| 1.1561 | 400 | 0.3124 | 0.7227 | 0.4062 |
| 1.4451 | 500 | 0.2431 | 0.7183 | 0.4082 |
| 1.7341 | 600 | 0.2447 | 0.7078 | 0.4184 |
| 2.0231 | 700 | 0.2355 | 0.7059 | 0.4184 |
| 2.3121 | 800 | 0.1468 | 0.7160 | 0.4194 |
| 2.6012 | 900 | 0.1517 | 0.7243 | 0.4148 |
| 2.8902 | 1000 | 0.1520 | 0.7145 | 0.4108 |
| 3.1792 | 1100 | 0.1281 | 0.7251 | 0.4114 |
| 3.4682 | 1200 | 0.1135 | 0.7195 | 0.4134 |
| 3.7572 | 1300 | 0.1128 | 0.7236 | 0.4064 |
| -1 | -1 | - | - | 0.4120 |
Framework Versions
- Python: 3.11.0
- Sentence Transformers: 4.0.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}