metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:44282
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: Does strength come from the walls or cavities of wood cells?
sentences:
- >-
In temperate softwoods there often is a marked difference between
latewood and earlywood. The latewood will be denser than that formed
early in the season. When examined under a microscope the cells of dense
latewood are seen to be very thick-walled and with very small cell
cavities, while those formed first in the season have thin walls and
large cell cavities. The strength is in the walls, not the cavities.
Hence the greater the proportion of latewood the greater the density and
strength. In choosing a piece of pine where strength or stiffness is the
important consideration, the principal thing to observe is the
comparative amounts of earlywood and latewood. The width of ring is not
nearly so important as the proportion and nature of the latewood in the
ring.
- >-
In temperate softwoods there often is a marked difference between
latewood and earlywood. The latewood will be denser than that formed
early in the season. When examined under a microscope the cells of dense
latewood are seen to be very thick-walled and with very small cell
cavities, while those formed first in the season have thin walls and
large cell cavities. The strength is in the walls, not the cavities.
Hence the greater the proportion of latewood the greater the density and
strength. In choosing a piece of pine where strength or stiffness is the
important consideration, the principal thing to observe is the
comparative amounts of earlywood and latewood. The width of ring is not
nearly so important as the proportion and nature of the latewood in the
ring.
- >-
All of these elements struck a chord with the older readers, such as
college-aged adults, and they successfully gained in a way not seen
before. In 1965, Spider-Man and the Hulk were both featured in Esquire
magazine's list of 28 college campus heroes, alongside John F. Kennedy
and Bob Dylan. In 2009 writer Geoff Boucher reflected that, "Superman
and DC Comics instantly seemed like boring old Pat Boone; Marvel felt
like The Beatles and the British Invasion. It was Kirby's artwork with
its tension and psychedelia that made it perfect for the times—or was it
Lee's bravado and melodrama, which was somehow insecure and brash at the
same time?"
- source_sentence: What was Brian May's zodiac sign?
sentences:
- >-
After the decline of Aksum, the Eritrean highlands were under the domain
of Bahr Negash ruled by the Bahr Negus. The area was then known as
Ma'ikele Bahr ("between the seas/rivers," i.e. the land between the Red
Sea and the Mereb river). It was later renamed under Emperor Zara Yaqob
as the domain of the Bahr Negash, the Medri Bahri ("Sea land" in
Tingrinya, although it included some areas like Shire on the other side
of the Mereb, today in Ethiopia). With its capital at Debarwa, the
state's main provinces were Hamasien, Serae and Akele Guzai.
- >-
In 1963, the teenage Brian May and his father custom-built his signature
guitar Red Special, which was purposely designed to feedback. Sonic
experimentation figured heavily in Queen's songs. A distinctive
characteristic of Queen's music are the vocal harmonies which are
usually composed of the voices of May, Mercury, and Taylor best heard on
the studio albums A Night at the Opera and A Day at the Races. Some of
the ground work for the development of this sound can be attributed to
their former producer Roy Thomas Baker, and their engineer Mike Stone.
Besides vocal harmonies, Queen were also known for multi-tracking voices
to imitate the sound of a large choir through overdubs. For instance,
according to Brian May, there are over 180 vocal overdubs in "Bohemian
Rhapsody". The band's vocal structures have been compared with the Beach
Boys, but May stated they were not "much of an influence".
- >-
Having attended art college, Mercury also designed Queen's logo, called
the Queen crest, shortly before the release of the band's first album.
The logo combines the zodiac signs of all four members: two lions for
Leo (Deacon and Taylor), a crab for Cancer (May), and two fairies for
Virgo (Mercury). The lions embrace a stylised letter Q, the crab rests
atop the letter with flames rising directly above it, and the fairies
are each sheltering below a lion. There is also a crown inside the Q and
the whole logo is over-shadowed by an enormous phoenix. The whole symbol
bears a passing resemblance to the Royal coat of arms of the United
Kingdom, particularly with the lion supporters. The original logo, as
found on the reverse-side of the first album cover, was a simple line
drawing but more intricate colour versions were used on later sleeves.
- source_sentence: What musician joined Kanye West on the song "Only One"?
sentences:
- >-
Modern Nationalism, as developed especially since the French Revolution,
has made the distinction between "language" and "dialect" an issue of
great political importance. A group speaking a separate "language" is
often seen as having a greater claim to being a separate "people", and
thus to be more deserving of its own independent state, while a group
speaking a "dialect" tends to be seen not as "a people" in its own
right, but as a sub-group, part of a bigger people, which must content
itself with regional autonomy.[citation needed] The distinction between
language and dialect is thus inevitably made at least as much on a
political basis as on a linguistic one, and can lead to great political
controversy, or even armed conflict.
- >-
In June 2013, West and television personality Kim Kardashian announced
the birth of their first child, North. In October 2013, the couple
announced their engagement to widespread media attention. November 2013,
West stated that he was beginning work on his next studio album, hoping
to release it by mid-2014, with production by Rick Rubin and Q-Tip. In
December 2013, Adidas announced the beginning of an official apparel
collaboration with West, to be premiered the following year. In May
2014, West and Kardashian were married in a private ceremony in
Florence, Italy, with a variety of artists and celebrities in
attendance. West released a single, "Only One", featuring Paul
McCartney, on December 31, 2014. "FourFiveSeconds", a single jointly
produced with Rihanna and McCartney, was released in January 2015. West
also appeared on the Saturday Night Live 40th Anniversary Special, where
he premiered a new song entitled "Wolves", featuring Sia Furler and
fellow Chicago rapper, Vic Mensa. In February 2015, West premiered his
clothing collaboration with Adidas, entitled Yeezy Season 1, to
generally positive reviews. This would include West's Yeezy Boost
sneakers. In March 2015, West released the single "All Day" featuring
Theophilus London, Allan Kingdom and Paul McCartney. West performed the
song at the 2015 BRIT Awards with a number of US rappers and UK grime
MC's including: Skepta, Wiley, Novelist, Fekky, Krept & Konan, Stormzy,
Allan Kingdom, Theophilus London and Vic Mensa. He would premiere the
second iteration of his clothing line, Yeezy Season 2, in September 2015
at New York Fashion Week.
- >-
As of 2013, West has won a total of 21 Grammy Awards, making him one of
the most awarded artists of all-time. About.com ranked Kanye West No. 8
on their "Top 50 Hip-Hop Producers" list. On May 16, 2008, Kanye West
was crowned by MTV as the year's No. 1 "Hottest MC in the Game." On
December 17, 2010, Kanye West was voted as the MTV Man of the Year by
MTV. Billboard ranked Kanye West No. 3 on their list of Top 10 Producers
of the Decade. West ties with Bob Dylan for having topped the annual
Pazz & Jop critic poll the most number of times ever, with four
number-one albums each. West has also been included twice in the Time
100 annual lists of the most influential people in the world as well as
being listed in a number of Forbes annual lists.
- source_sentence: What was Berners-Lee a director of?
sentences:
- >-
The first web browser was invented in 1990 by Sir Tim Berners-Lee.
Berners-Lee is the director of the World Wide Web Consortium (W3C),
which oversees the Web's continued development, and is also the founder
of the World Wide Web Foundation. His browser was called WorldWideWeb
and later renamed Nexus.
- >-
A particular criticism of the Buddha was Vedic animal sacrifice.[web 18]
He also mocked the Vedic "hymn of the cosmic man". However, the Buddha
was not anti-Vedic, and declared that the Veda in its true form was
declared by "Kashyapa" to certain rishis, who by severe penances had
acquired the power to see by divine eyes. He names the Vedic rishis, and
declared that the original Veda of the rishis[note 25] was altered by a
few Brahmins who introduced animal sacrifices. The Buddha says that it
was on this alteration of the true Veda that he refused to pay respect
to the Vedas of his time. However, he did not denounce the union with
Brahman,[note 26] or the idea of the self uniting with the Self. At the
same time, the traditional Hindu itself gradually underwent profound
changes, transforming it into what is recognized as early Hinduism.
- >-
The first web browser was invented in 1990 by Sir Tim Berners-Lee.
Berners-Lee is the director of the World Wide Web Consortium (W3C),
which oversees the Web's continued development, and is also the founder
of the World Wide Web Foundation. His browser was called WorldWideWeb
and later renamed Nexus.
- source_sentence: Who wrote 182 papers dealing with the hibernation of swallows?
sentences:
- >-
This means that a 5% reduction in operating voltage will more than
double the life of the bulb, at the expense of reducing its light output
by about 16%. This may be a very acceptable trade off for a light bulb
that is in a difficult-to-access location (for example, traffic lights
or fixtures hung from high ceilings). Long-life bulbs take advantage of
this trade-off. Since the value of the electric power they consume is
much more than the value of the lamp, general service lamps emphasize
efficiency over long operating life. The objective is to minimize the
cost of light, not the cost of lamps. Early bulbs had a life of up to
2500 hours, but in 1924 a cartel agreed to limit life to 1000 hours.
When this was exposed in 1953, General Electric and other leading
American manufacturers were banned from limiting the life.
- >-
Aristotle however suggested that swallows and other birds hibernated.
This belief persisted as late as 1878, when Elliott Coues listed the
titles of no less than 182 papers dealing with the hibernation of
swallows. Even the "highly observant" Gilbert White, in his posthumously
published 1789 The Natural History of Selborne, quoted a man's story
about swallows being found in a chalk cliff collapse "while he was a
schoolboy at Brighthelmstone", though the man denied being an
eyewitness. However, he also writes that "as to swallows being found in
a torpid state during the winter in the Isle of Wight or any part of
this country, I never heard any such account worth attending to", and
that if early swallows "happen to find frost and snow they immediately
withdraw for a time—a circumstance this much more in favour of hiding
than migration", since he doubts they would "return for a week or two to
warmer latitudes".
- >-
Aristotle however suggested that swallows and other birds hibernated.
This belief persisted as late as 1878, when Elliott Coues listed the
titles of no less than 182 papers dealing with the hibernation of
swallows. Even the "highly observant" Gilbert White, in his posthumously
published 1789 The Natural History of Selborne, quoted a man's story
about swallows being found in a chalk cliff collapse "while he was a
schoolboy at Brighthelmstone", though the man denied being an
eyewitness. However, he also writes that "as to swallows being found in
a torpid state during the winter in the Isle of Wight or any part of
this country, I never heard any such account worth attending to", and
that if early swallows "happen to find frost and snow they immediately
withdraw for a time—a circumstance this much more in favour of hiding
than migration", since he doubts they would "return for a week or two to
warmer latitudes".
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on BAAI/bge-base-en-v1.5
results:
- task:
type: triplet
name: Triplet
dataset:
name: gooqa dev
type: gooqa-dev
metrics:
- type: cosine_accuracy
value: 0.4115999937057495
name: Cosine Accuracy
SentenceTransformer based on BAAI/bge-base-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-2-epochs")
# Run inference
sentences = [
'Who wrote 182 papers dealing with the hibernation of swallows?',
'Aristotle however suggested that swallows and other birds hibernated. This belief persisted as late as 1878, when Elliott Coues listed the titles of no less than 182 papers dealing with the hibernation of swallows. Even the "highly observant" Gilbert White, in his posthumously published 1789 The Natural History of Selborne, quoted a man\'s story about swallows being found in a chalk cliff collapse "while he was a schoolboy at Brighthelmstone", though the man denied being an eyewitness. However, he also writes that "as to swallows being found in a torpid state during the winter in the Isle of Wight or any part of this country, I never heard any such account worth attending to", and that if early swallows "happen to find frost and snow they immediately withdraw for a time—a circumstance this much more in favour of hiding than migration", since he doubts they would "return for a week or two to warmer latitudes".',
'Aristotle however suggested that swallows and other birds hibernated. This belief persisted as late as 1878, when Elliott Coues listed the titles of no less than 182 papers dealing with the hibernation of swallows. Even the "highly observant" Gilbert White, in his posthumously published 1789 The Natural History of Selborne, quoted a man\'s story about swallows being found in a chalk cliff collapse "while he was a schoolboy at Brighthelmstone", though the man denied being an eyewitness. However, he also writes that "as to swallows being found in a torpid state during the winter in the Isle of Wight or any part of this country, I never heard any such account worth attending to", and that if early swallows "happen to find frost and snow they immediately withdraw for a time—a circumstance this much more in favour of hiding than migration", since he doubts they would "return for a week or two to warmer latitudes".',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Triplet
- Dataset:
gooqa-dev - Evaluated with
TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.4116 |
Training Details
Training Dataset
Unnamed Dataset
- Size: 44,282 training samples
- Columns:
question,context, andnegative - Approximate statistics based on the first 1000 samples:
question context negative type string string string details - min: 7 tokens
- mean: 14.79 tokens
- max: 42 tokens
- min: 28 tokens
- mean: 148.38 tokens
- max: 494 tokens
- min: 31 tokens
- mean: 152.75 tokens
- max: 512 tokens
- Samples:
question context negative How long did the Hollywood round air for in season eight of American Idol?In the first major change to the judging panel, a fourth judge, Kara DioGuardi, was introduced. This was also the first season without executive producer Nigel Lythgoe who left to focus on the international versions of his show So You Think You Can Dance. The Hollywood round was moved to the Kodak Theatre for 2009 and was also extended to two weeks. Idol Gives Back was canceled for this season due to the global recession at the time.American Idol is an American singing competition series created by Simon Fuller and produced by 19 Entertainment, and is distributed by FremantleMedia North America. It began airing on Fox on June 11, 2002, as an addition to the Idols format based on the British series Pop Idol and has since become one of the most successful shows in the history of American television. The concept of the series is to find new solo recording artists, with the winner being determined by the viewers in America. Winners chosen by viewers through telephone, Internet, and SMS text voting were Kelly Clarkson, Ruben Studdard, Fantasia Barrino, Carrie Underwood, Taylor Hicks, Jordin Sparks, David Cook, Kris Allen, Lee DeWyze, Scotty McCreery, Phillip Phillips, Candice Glover, Caleb Johnson, and Nick Fradiani.What is one example of how students can benefit from videoconferencing?Videoconferencing provides students with the opportunity to learn by participating in two-way communication forums. Furthermore, teachers and lecturers worldwide can be brought to remote or otherwise isolated educational facilities. Students from diverse communities and backgrounds can come together to learn about one another, although language barriers will continue to persist. Such students are able to explore, communicate, analyze and share information and ideas with one another. Through videoconferencing, students can visit other parts of the world to speak with their peers, and visit museums and educational facilities. Such virtual field trips can provide enriched learning opportunities to students, especially those in geographically isolated locations, and to the economically disadvantaged. Small schools can use these technologies to pool resources and provide courses, such as in foreign languages, which could not otherwise be offered.Typical use of the various technologies described above include calling or conferencing on a one-on-one, one-to-many or many-to-many basis for personal, business, educational, deaf Video Relay Service and tele-medical, diagnostic and rehabilitative use or services. New services utilizing videocalling and videoconferencing, such as teachers and psychologists conducting online sessions, personal videocalls to inmates incarcerated in penitentiaries, and videoconferencing to resolve airline engineering issues at maintenance facilities, are being created or evolving on an ongoing basis.What names were used by The Sun to characterize the French and Germans?The Sun has been openly antagonistic towards other European nations, particularly the French and Germans. During the 1980s and 1990s, the nationalities were routinely described in copy and headlines as "frogs", "krauts" or "hun". As the paper is opposed to the EU it has referred to foreign leaders who it deemed hostile to the UK in unflattering terms. Former President Jacques Chirac of France, for instance, was branded "le Worm". An unflattering picture of German chancellor Angela Merkel, taken from the rear, bore the headline "I'm Big in the Bumdestag" (17 April 2006).On 10 October, hostilities began between German and French republican forces near Orléans. At first, the Germans were victorious but the French drew reinforcements and defeated the Germans at the Battle of Coulmiers on 9 November. After the surrender of Metz, more than 100,000 well-trained and experienced German troops joined the German 'Southern Army'. The French were forced to abandon Orléans on 4 December, and were finally defeated at the Battle of Le Mans (10–12 January). A second French army which operated north of Paris was turned back at the Battle of Amiens (27 November), the Battle of Bapaume (3 January 1871) and the Battle of St. Quentin (13 January). - Loss:
MultipleNegativesRankingLosswith these parameters:{ "scale": 20.0, "similarity_fct": "cos_sim" }
Evaluation Dataset
Unnamed Dataset
- Size: 5,000 evaluation samples
- Columns:
question,context, andnegative_1 - Approximate statistics based on the first 1000 samples:
question context negative_1 type string string string details - min: 3 tokens
- mean: 14.7 tokens
- max: 39 tokens
- min: 28 tokens
- mean: 153.44 tokens
- max: 510 tokens
- min: 28 tokens
- mean: 150.79 tokens
- max: 510 tokens
- Samples:
question context negative_1 Bilateral treaties are concluded between how many states or entities?Bilateral treaties are concluded between two states or entities. It is possible, however, for a bilateral treaty to have more than two parties; consider for instance the bilateral treaties between Switzerland and the European Union (EU) following the Swiss rejection of the European Economic Area agreement. Each of these treaties has seventeen parties. These however are still bilateral, not multilateral, treaties. The parties are divided into two groups, the Swiss ("on the one part") and the EU and its member states ("on the other part"). The treaty establishes rights and obligations between the Swiss and the EU and the member states severally—it does not establish any rights and obligations amongst the EU and its member states.[citation needed]Bilateral treaties are concluded between two states or entities. It is possible, however, for a bilateral treaty to have more than two parties; consider for instance the bilateral treaties between Switzerland and the European Union (EU) following the Swiss rejection of the European Economic Area agreement. Each of these treaties has seventeen parties. These however are still bilateral, not multilateral, treaties. The parties are divided into two groups, the Swiss ("on the one part") and the EU and its member states ("on the other part"). The treaty establishes rights and obligations between the Swiss and the EU and the member states severally—it does not establish any rights and obligations amongst the EU and its member states.[citation needed]What period is referred to as a "golden age"?The term "classical music" did not appear until the early 19th century, in an attempt to distinctly canonize the period from Johann Sebastian Bach to Beethoven as a golden age. The earliest reference to "classical music" recorded by the Oxford English Dictionary is from about 1836.In European history, the Middle Ages or medieval period lasted from the 5th to the 15th century. It began with the collapse of the Western Roman Empire and merged into the Renaissance and the Age of Discovery. The Middle Ages is the middle period of the three traditional divisions of Western history: Antiquity, Medieval period, and Modern period. The Medieval period is itself subdivided into the Early, the High, and the Late Middle Ages.What is the name of the area of land around Nanjing?Nanjing, with a total land area of 6,598 square kilometres (2,548 sq mi), is situated in the heartland of drainage area of lower reaches of Yangtze River, and in Yangtze River Delta, one of the largest economic zones of China. The Yangtze River flows past the west side and then north side of Nanjing City, while the Ningzheng Ridge surrounds the north, east and south side of the city. The city is 300 kilometres (190 mi) west-northwest of Shanghai, 1,200 kilometres (750 mi) south-southeast of Beijing, and 1,400 kilometres (870 mi) east-northeast of Chongqing. The downstream Yangtze River flows from Jiujiang, Jiangxi, through Anhui and Jiangsu to East Sea, north to drainage basin of downstream Yangtze is Huai River basin and south to it is Zhe River basin, and they are connected by the Grand Canal east to Nanjing. The area around Nanjing is called Hsiajiang (下江, Downstream River) region, with Jianghuai (江淮) stressing northern part and Jiangzhe (江浙) stressing southern part. The region is a...Nanjing ( listen; Chinese: 南京, "Southern Capital") is the city situated in the heartland of lower Yangtze River region in China, which has long been a major centre of culture, education, research, politics, economy, transport networks and tourism. It is the capital city of Jiangsu province of People's Republic of China and the second largest city in East China, with a total population of 8,216,100, and legally the capital of Republic of China which lost the mainland during the civil war. The city whose name means "Southern Capital" has a prominent place in Chinese history and culture, having served as the capitals of various Chinese dynasties, kingdoms and republican governments dating from the 3rd century AD to 1949. Prior to the advent of pinyin romanization, Nanjing's city name was spelled as Nanking or Nankin. Nanjing has a number of other names, and some historical names are now used as names of districts of the city, and among them there is the name Jiangning (江寧), whose former c... - Loss:
MultipleNegativesRankingLosswith these parameters:{ "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: stepsper_device_train_batch_size: 128per_device_eval_batch_size: 128num_train_epochs: 2warmup_ratio: 0.1fp16: Truebatch_sampler: no_duplicates
All Hyperparameters
Click to expand
overwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 128per_device_eval_batch_size: 128per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 5e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1.0num_train_epochs: 2max_steps: -1lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.1warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 42data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Falsefp16: Truefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Falseignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}tp_size: 0fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}deepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torchoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters:auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Nonedispatch_batches: Nonesplit_batches: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: Nonebatch_sampler: no_duplicatesmulti_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.3524 |
| 0.2890 | 100 | 0.638 | 0.7838 | 0.3856 |
| 0.5780 | 200 | 0.4271 | 0.7404 | 0.3988 |
| 0.8671 | 300 | 0.3997 | 0.7197 | 0.4014 |
| 1.1561 | 400 | 0.3068 | 0.7188 | 0.4142 |
| 1.4451 | 500 | 0.2351 | 0.7110 | 0.4106 |
| 1.7341 | 600 | 0.2389 | 0.7110 | 0.4120 |
| -1 | -1 | - | - | 0.4116 |
Framework Versions
- Python: 3.11.0
- Sentence Transformers: 4.0.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}