Large Language Models: BERT — Bidirectional Encoder Representations from Transformers

Understand how BERT constructs state-of-the-art embeddings

11 min read · Aug 30, 2023

Introduction

2017 was a historic year in machine learning: the Transformer model made its first appearance. It has performed strongly on many benchmarks and proved suitable for a wide range of problems in data science. Thanks to its efficient architecture, many other Transformer-based models have since been developed, each specialising in particular tasks.

One such model is BERT. It is primarily known for constructing embeddings that accurately represent text and capture the semantic meaning of long text sequences. As a result, BERT embeddings have become widely used in machine learning. Understanding how BERT builds text representations is crucial because it opens the door to tackling a large range of NLP tasks.

In this article, we will refer to the original BERT paper, look at the BERT architecture, and examine the core mechanisms behind it. In the first sections, we will give a high-level overview of BERT. After that, we will gradually dive into its internal workflow and how information is passed through the model. Finally, we will learn how BERT can be…

Published in TDS Archive

Written by Vyacheslav Efimov