Member-only story
Large Language Models: SBERT — Sentence-BERT
Learn how siamese BERT networks accurately transform sentences into embeddings
Introduction
It is no secret that transformers made evolutionary progress in NLP. Based on transformers, many other machine learning models have evolved. One of them is BERT which primarily consists of several stacked transformer encoders. Apart from being used for a set of different problems like sentiment analysis or question answering, BERT became increasingly popular for constructing word embeddings — vectors of numbers representing semantic meanings of words.
Representing words in the form of embeddings gave a huge advantage as machine learning algorithms cannot work with raw texts but can operate on vectors of vectors. This allows comparing different words by their similarity by using a standard metric like Euclidean or cosine distance.
The problem is that, in practice, we often need to construct embeddings not for single words but instead for whole sentences. However, the basic BERT version builds embeddings only on the word level. Due to this, several BERT-like approaches were later developed to solve this problem which will be discussed in this article. By progressively discussing them, we will then reach to the state-of-the-art model called…

