---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- loss:SoftmaxLoss
- loss:CosineSimilarityLoss
base_model: google-bert/bert-base-uncased
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: the guy is dead
  sentences:
  - The dog is dead.
  - Men are sitting in the park.
  - People are outside.
- source_sentence: Women are running.
  sentences:
  - Two women are running.
  - A animated airplane is landing.
  - The man sang and played his guitar.
- source_sentence: The gate is yellow.
  sentences:
  - The gate is blue.
  - The cook is kneading the flour.
  - A woman puts flour on a piece of meat.
- source_sentence: A parrot is talking.
  sentences:
  - A man is singing.
  - Two men are standing in a room.
  - Three dogs playing in the snow.
- source_sentence: the guy is paid
  sentences:
  - A man is receiving a contract.
  - A man is racing on his bike.
  - a dog chases a cat
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 6.489379533908795
  energy_consumed: 0.01669499908389665
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.097
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: SentenceTransformer based on google-bert/bert-base-uncased
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.8287682657838144
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8350670289838767
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.796834648877542
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8041000103101458
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7968015917572032
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.803879972820206
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7572392072098838
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7696731029709327
      name: Spearman Dot
    - type: pearson_max
      value: 0.8287682657838144
      name: Pearson Max
    - type: spearman_max
      value: 0.8350670289838767
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.8014245911006761
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8049359058371248
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7934883900951029
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.793480619733962
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7940198430253176
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7942686805824551
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.698878713916111
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6967434595564439
      name: Spearman Dot
    - type: pearson_max
      value: 0.8014245911006761
      name: Pearson Max
    - type: spearman_max
      value: 0.8049359058371248
      name: Spearman Max
---

# SentenceTransformer based on google-bert/bert-base-uncased
This is a sentence-transformers model finetuned from google-bert/bert-base-uncased on the all-nli and sts datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: google-bert/bert-base-uncased
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Datasets:
  - all-nli
  - sts
- Language: en
### Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
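The listing above reads as a recipe: a BERT encoder followed by mean pooling over token embeddings. As a minimal sketch (not the training script), an equivalent model could be assembled by hand with the `models` submodule:

```python
from sentence_transformers import SentenceTransformer, models

# BERT encoder truncating inputs at 512 tokens, as in the architecture above
word_embedding_model = models.Transformer("google-bert/bert-base-uncased", max_seq_length=512)
# Mean pooling over token embeddings (pooling_mode_mean_tokens: True above)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```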
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/bert-base-uncased-multi-task")
# Run inference
sentences = [
    'the guy is paid',
    'A man is receiving a contract.',
    'A man is racing on his bike.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
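The card also lists semantic search as a use case. A short sketch, reusing the widget sentences above as an illustrative corpus (the corpus and query are arbitrary examples, not part of the model):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("tomaarsen/bert-base-uncased-multi-task")

corpus = [
    "A man is receiving a contract.",
    "A man is racing on his bike.",
    "a dog chases a cat",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("the guy is paid", convert_to_tensor=True)

# One result list per query; each hit holds a corpus_id and a cosine similarity score
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))
```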
## Evaluation

### Metrics

#### Semantic Similarity
- Dataset: `sts-dev`
- Evaluated with `EmbeddingSimilarityEvaluator` (a usage sketch follows the table below)
| Metric | Value |
|---|---|
| pearson_cosine | 0.8288 |
| spearman_cosine | 0.8351 |
| pearson_manhattan | 0.7968 |
| spearman_manhattan | 0.8041 |
| pearson_euclidean | 0.7968 |
| spearman_euclidean | 0.8039 |
| pearson_dot | 0.7572 |
| spearman_dot | 0.7697 |
| pearson_max | 0.8288 |
| spearman_max | 0.8351 |
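As referenced above, the numbers in this table come from `EmbeddingSimilarityEvaluator`. A minimal sketch of reproducing the dev evaluation, assuming the sts dev split corresponds to the validation split of `sentence-transformers/stsb` on the Hub:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("tomaarsen/bert-base-uncased-multi-task")

eval_dataset = load_dataset("sentence-transformers/stsb", split="validation")
dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=eval_dataset["sentence1"],
    sentences2=eval_dataset["sentence2"],
    scores=eval_dataset["score"],
    name="sts-dev",
)
# Returns a dict of metrics, e.g. sts-dev_spearman_cosine
print(dev_evaluator(model))
```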
#### Semantic Similarity
- Dataset: `sts-test`
- Evaluated with `EmbeddingSimilarityEvaluator`
| Metric | Value |
|---|---|
| pearson_cosine | 0.8014 |
| spearman_cosine | 0.8049 |
| pearson_manhattan | 0.7935 |
| spearman_manhattan | 0.7935 |
| pearson_euclidean | 0.7940 |
| spearman_euclidean | 0.7943 |
| pearson_dot | 0.6989 |
| spearman_dot | 0.6967 |
| pearson_max | 0.8014 |
| spearman_max | 0.8049 |
## Training Details

### Training Datasets

#### all-nli
- Dataset: all-nli at cc6c526
- Size: 942,069 training samples
- Columns: `premise`, `hypothesis`, and `label`
- Approximate statistics based on the first 1000 samples:

  |  | premise | hypothesis | label |
  |:--|:--|:--|:--|
  | type | string | string | int |
  | details | min: 6 tokens<br>mean: 17.38 tokens<br>max: 52 tokens | min: 4 tokens<br>mean: 10.7 tokens<br>max: 31 tokens | 0: ~33.40%<br>1: ~33.30%<br>2: ~33.30% |

- Samples:

  | premise | hypothesis | label |
  |:--|:--|:--|
  | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1 |
  | A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2 |
  | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0 |

- Loss: `SoftmaxLoss`
#### sts
- Dataset: sts at ab7a5ac
- Size: 5,749 training samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:

  |  | sentence1 | sentence2 | score |
  |:--|:--|:--|:--|
  | type | string | string | float |
  | details | min: 6 tokens<br>mean: 10.0 tokens<br>max: 28 tokens | min: 5 tokens<br>mean: 9.95 tokens<br>max: 25 tokens | min: 0.0<br>mean: 0.54<br>max: 1.0 |

- Samples:

  | sentence1 | sentence2 | score |
  |:--|:--|:--|
  | A plane is taking off. | An air plane is taking off. | 1.0 |
  | A man is playing a large flute. | A man is playing a flute. | 0.76 |
  | A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76 |

- Loss: `CosineSimilarityLoss` with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
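Putting the two datasets together: each training dataset is paired with its own loss (SoftmaxLoss for the NLI labels, CosineSimilarityLoss for the STS scores). A minimal multi-task sketch, assuming the Hub datasets `sentence-transformers/all-nli` (config `pair-class`) and `sentence-transformers/stsb`:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CosineSimilarityLoss, SoftmaxLoss

model = SentenceTransformer("google-bert/bert-base-uncased")

all_nli = load_dataset("sentence-transformers/all-nli", "pair-class", split="train")
sts = load_dataset("sentence-transformers/stsb", split="train")

# One loss per dataset, keyed by the same names as train_dataset
losses = {
    "all-nli": SoftmaxLoss(
        model=model,
        sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
        num_labels=3,  # entailment / neutral / contradiction
    ),
    "sts": CosineSimilarityLoss(model),
}

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset={"all-nli": all_nli, "sts": sts},
    loss=losses,
)
trainer.train()
```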
### Evaluation Datasets

#### all-nli
- Dataset: all-nli at cc6c526
- Size: 1,000 evaluation samples
- Columns: `premise`, `hypothesis`, and `label`
- Approximate statistics based on the first 1000 samples:

  |  | premise | hypothesis | label |
  |:--|:--|:--|:--|
  | type | string | string | int |
  | details | min: 6 tokens<br>mean: 18.44 tokens<br>max: 57 tokens | min: 5 tokens<br>mean: 10.57 tokens<br>max: 25 tokens | 0: ~33.10%<br>1: ~33.30%<br>2: ~33.60% |

- Samples:

  | premise | hypothesis | label |
  |:--|:--|:--|
  | Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1 |
  | Two women are embracing while holding to go packages. | Two woman are holding packages. | 0 |
  | Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2 |

- Loss: `SoftmaxLoss`
#### sts
- Dataset: sts at ab7a5ac
- Size: 1,500 evaluation samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:

  |  | sentence1 | sentence2 | score |
  |:--|:--|:--|:--|
  | type | string | string | float |
  | details | min: 5 tokens<br>mean: 15.1 tokens<br>max: 45 tokens | min: 6 tokens<br>mean: 15.11 tokens<br>max: 53 tokens | min: 0.0<br>mean: 0.47<br>max: 1.0 |

- Samples:

  | sentence1 | sentence2 | score |
  |:--|:--|:--|
  | A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0 |
  | A young child is riding a horse. | A child is riding a horse. | 0.95 |
  | A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0 |

- Loss: `CosineSimilarityLoss` with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin
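These non-default values map directly onto `SentenceTransformerTrainingArguments`. A sketch (the `output_dir` is hypothetical):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="models/bert-base-uncased-multi-task",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    # Round-robin: alternate batches between the two datasets
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```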
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: False
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: None
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
### Training Logs
| Epoch | Step | Training Loss | sts loss | all-nli loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|---|---|---|---|---|---|---|
| 0.1389 | 100 | 0.5961 | 0.0470 | 1.1005 | 0.8096 | - |
| 0.2778 | 200 | 0.5408 | 0.0354 | 0.9687 | 0.8229 | - |
| 0.4167 | 300 | 0.5185 | 0.0373 | 0.9398 | 0.8265 | - |
| 0.5556 | 400 | 0.4978 | 0.0368 | 0.9304 | 0.8200 | - |
| 0.6944 | 500 | 0.5026 | 0.0347 | 0.9044 | 0.8234 | - |
| 0.8333 | 600 | 0.4702 | 0.0326 | 0.8727 | 0.8300 | - |
| 0.9722 | 700 | 0.4649 | 0.0328 | 0.8723 | 0.8351 | - |
| 1.0 | 720 | - | - | - | - | 0.8049 |
## Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Energy Consumed: 0.017 kWh
- Carbon Emitted: 0.006 kg of CO2
- Hours Used: 0.097 hours
### Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.0.0.dev0
- Transformers: 4.41.0.dev0
- PyTorch: 2.3.0+cu121
- Accelerate: 0.26.1
- Datasets: 2.18.0
- Tokenizers: 0.19.1
## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```