Large Language Models: DistilBERT — Smaller, Faster, Cheaper and Lighter

Unlocking the secrets of BERT compression: a student-teacher framework for maximum efficiency

7 min read · Oct 7, 2023

Introduction

In recent years, the development of large language models has accelerated rapidly. BERT became one of the most popular and efficient models, making it possible to solve a wide range of NLP tasks with high accuracy. After BERT, a number of other models appeared on the scene, demonstrating outstanding results as well.

An obvious trend is that, over time, large language models (LLMs) tend to become more complex, with exponential growth in the number of parameters and the amount of data they are trained on. Research in deep learning has shown that such scaling usually leads to better results. Unfortunately, the machine learning world has already run into several problems with LLMs, and scalability has become the main obstacle to training, storing and using them effectively.

To address this issue, special techniques have been developed for compressing LLMs. The objectives of compression algorithms are to decrease training time, reduce memory consumption or accelerate model inference. The three most common compression techniques used in…

Published in TDS Archive

Written by Vyacheslav Efimov