# Quickstart
Once you have SentenceTransformers [installed](installation.md), the usage is simple:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Our sentences we would like to encode
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of string.',
             'The quick brown fox jumps over the lazy dog.']

# Sentences are encoded by calling model.encode()
sentence_embeddings = model.encode(sentences)

# Print the embeddings
for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
```
With `SentenceTransformer('all-MiniLM-L6-v2')` we define which sentence transformer model we would like to load. In this example, we load *all-MiniLM-L6-v2*, a MiniLM model fine-tuned on a large dataset of over 1 billion training pairs.
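If you are unsure what a loaded checkpoint produces, you can inspect the model object; a quick sketch (the printed values depend on the checkpoint, e.g. *all-MiniLM-L6-v2* outputs 384-dimensional vectors):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Inspect the model: all-MiniLM-L6-v2 produces 384-dimensional embeddings
print("Embedding dimension:", model.get_sentence_embedding_dimension())
print("Max sequence length:", model.max_seq_length)
```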
BERT (and other transformer networks) output an embedding for each token in the input text. To create a fixed-sized sentence embedding from this, the model applies mean pooling, i.e., the output embeddings for all tokens are averaged to yield a fixed-sized vector.
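To make the pooling step concrete, here is a minimal sketch that reproduces it with the Hugging Face `transformers` library directly. It assumes the `sentence-transformers/all-MiniLM-L6-v2` checkpoint from above; the attention mask is used so that padding tokens do not contribute to the average:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

encoded = tokenizer(['This is an example sentence'], padding=True, return_tensors='pt')
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # one embedding per token

# Mean pooling: average the token embeddings, ignoring padding via the attention mask
mask = encoded['attention_mask'].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1)
print(sentence_embedding.shape)  # torch.Size([1, 384])
```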
## Comparing Sentence Similarities

The sentences (texts) are mapped such that sentences with similar meanings are close in vector space. One common method to measure similarity in vector space is [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity). For two sentences, this can be done as follows:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Sentences are encoded by calling model.encode()
emb1 = model.encode("This is a red cat with a hat.")
emb2 = model.encode("Have you seen my red cat?")

cos_sim = util.cos_sim(emb1, emb2)
print("Cosine-Similarity:", cos_sim)
```
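`util.cos_sim` returns a torch tensor, here of shape `[1, 1]`. Cosine similarity is simply the dot product of the length-normalized vectors, which you can verify by hand; a sketch continuing from the snippet above:

```python
import torch

# model.encode returns 1-D numpy arrays; make them 2-D torch tensors
a = torch.tensor(emb1).unsqueeze(0)
b = torch.tensor(emb2).unsqueeze(0)

# Cosine similarity = dot product of the L2-normalized vectors
a_norm = a / a.norm(dim=1, keepdim=True)
b_norm = b / b.norm(dim=1, keepdim=True)
print(a_norm @ b_norm.T)  # should match util.cos_sim(emb1, emb2)
```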
If you have a list with more sentences, you can use the following code example:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = ['A man is eating food.',
             'A man is eating a piece of bread.',
             'The girl is carrying a baby.',
             'A man is riding a horse.',
             'A woman is playing violin.',
             'Two men pushed carts through the woods.',
             'A man is riding a white horse on an enclosed ground.',
             'A monkey is playing drums.',
             'Someone in a gorilla costume is playing a set of drums.'
             ]

# Encode all sentences
embeddings = model.encode(sentences)

# Compute cosine similarity between all pairs
cos_sim = util.cos_sim(embeddings, embeddings)

# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])

# Sort the list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)

print("Top-5 most similar pairs:")
for score, i, j in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))
```
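For longer sentence lists, the nested Python loop becomes slow. The library also provides `util.paraphrase_mining`, which performs the same all-pairs search more efficiently and returns the triplets already sorted by score; a sketch reusing the `sentences` list from above:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Returns a list of [score, i, j] triplets, sorted by decreasing similarity
pairs = util.paraphrase_mining(model, sentences)
for score, i, j in pairs[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], score))
```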
See the *Usage* sections on the left for more examples of how to use SentenceTransformers.
## Pre-Trained Models

Various pre-trained models optimized for different tasks exist. For a full list, see **[Pretrained Models](pretrained_models.md)**.
## Training your own Embeddings

Training your own sentence embedding models for all types of use cases is easy and often requires only minimal coding effort. For a comprehensive tutorial, see [Training/Overview](training/overview.md).
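As a first impression of the training API, here is a minimal sketch that fine-tunes a model on two toy similarity-labeled pairs with the classic `model.fit` interface (the data and labels are made up for illustration):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('all-MiniLM-L6-v2')

# Toy training data: pairs of sentences with a similarity label in [0, 1]
train_examples = [
    InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
    InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# One pass over the toy data; real training needs more data and epochs
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```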
You can also easily extend existing sentence embedding models to **further languages**. For details, see [Multi-Lingual Training](../examples/training/multilingual/README).