TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Large Language Models: DistilBERT — Smaller, Faster, Cheaper and Lighter

Unlocking the secrets of BERT compression: a student-teacher framework for maximum efficiency

7 min read · Oct 7, 2023


Introduction

In recent years, large language models have evolved rapidly. BERT became one of the most popular and efficient models, solving a wide range of NLP tasks with high accuracy. Other models followed BERT and demonstrated outstanding results as well.

An easily observed trend is that large language models (LLMs) grow more complex over time, with exponential increases in both the number of parameters and the amount of data they are trained on. Deep learning research has shown that such scaling usually leads to better results. Unfortunately, it also creates problems, and scalability has become the main obstacle to training, storing and using LLMs effectively.

To address this issue, special techniques have been developed for compressing LLMs. Compression algorithms aim to decrease training time, reduce memory consumption or accelerate model inference. The three most common compression techniques used in…



Published in TDS Archive



Written by Vyacheslav Efimov

Senior ML Engineer 👨‍💻 | Passionate about Data Science ⭐️ | Content Creator ✍️