TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Large Language Models: BERT (Bidirectional Encoder Representations from Transformers)

Understand how BERT constructs state-of-the-art embeddings

11 min read · Aug 30, 2023

Introduction

2017 was a historic year for machine learning: the Transformer model made its first appearance. It has since performed strongly on many benchmarks and proved suitable for a wide range of problems in data science. Thanks to its efficient architecture, many other Transformer-based models were later developed, each specialising in particular tasks.

One such model is BERT. It is primarily known for constructing embeddings that accurately represent text and capture the semantic meaning of long text sequences. As a result, BERT embeddings have become widely used in machine learning. Understanding how BERT builds text representations is crucial because it opens the door to tackling a wide range of NLP tasks.

In this article, we will refer to the original BERT paper, look at the BERT architecture, and examine the core mechanisms behind it. In the first sections, we will give a high-level overview of BERT. After that, we will gradually dive into its internal workflow and how information is passed through the model. Finally, we will learn how BERT can be…

Published in TDS Archive

Written by Vyacheslav Efimov

Senior ML Engineer 👨‍💻 | Passionate about Data Science ⭐️ | Content Creator ✍️