TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Similarity Search, Part 3: Blending Inverted File Index and Product Quantization

Discover how to combine two basic similarity search indexes to get the advantages of both

8 min read · May 19, 2023


Similarity search is the problem of finding, for a given query, the documents most similar to it among all documents in a database.

Introduction

In data science, similarity search often appears in NLP, search engines, and recommender systems, where the most relevant documents or items need to be retrieved for a query. There are many ways to improve search performance on massive volumes of data.
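As a baseline for what these methods improve on, the retrieval task described above can be sketched as an exhaustive search that compares the query against every database vector. This is only an illustrative sketch: the toy 2-D vectors, the choice of Euclidean distance, and the function names `euclidean` and `brute_force_search` are assumptions made for the example and do not come from the article.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, database, k=1):
    """Return the indices of the k database vectors closest to the query."""
    # Every vector is compared with the query, so the cost grows
    # linearly with the size of the database.
    order = sorted(range(len(database)), key=lambda i: euclidean(query, database[i]))
    return order[:k]

# Toy database of 2-D embeddings (made-up values for illustration).
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
query = [1.0, 1.1]

nearest = brute_force_search(query, database, k=2)
print(nearest)
```

Because every query touches every stored vector, this approach is exact but becomes slow and memory-hungry as the database grows, which is the motivation for specialized indexes.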

In the first two parts of this series, we discussed two fundamental algorithms in information retrieval: the inverted file index and product quantization. Both optimize search performance but focus on different aspects: the former speeds up search, while the latter compresses vectors into a smaller, memory-efficient representation.
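The two ideas recalled above can be illustrated side by side with a small sketch. It assumes hand-picked centroids and codebooks (in practice these are usually learned, for example with k-means), a toy 4-D database, and hypothetical helper names such as `ivf_search` and `pq_encode`; it is not the article's implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))

# Toy 4-D database (made-up values for illustration).
database = [
    [0.1, 0.2, 0.0, 0.1],
    [0.0, 0.1, 0.2, 0.0],
    [5.0, 5.1, 4.9, 5.2],
    [5.2, 4.8, 5.0, 5.1],
]

# --- Inverted file index: partition the vectors, search one partition ---
# Assumption: centroids are fixed by hand instead of being learned.
coarse_centroids = [[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]
inverted_lists = {i: [] for i in range(len(coarse_centroids))}
for idx, vec in enumerate(database):
    inverted_lists[nearest(vec, coarse_centroids)].append(idx)

def ivf_search(query):
    """Scan only the partition whose centroid is closest to the query."""
    cell = nearest(query, coarse_centroids)
    candidates = inverted_lists[cell]
    return min(candidates, key=lambda i: sq_dist(query, database[i]))

# --- Product quantization: store short codes instead of full vectors ---
# Assumption: two 2-D subspaces, each with a hand-picked 2-entry codebook.
codebooks = [
    [[0.0, 0.0], [5.0, 5.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]

def pq_encode(vec):
    """Replace each subvector with the id of its nearest codebook entry."""
    sub = len(vec) // len(codebooks)
    return [nearest(vec[j * sub:(j + 1) * sub], codebooks[j])
            for j in range(len(codebooks))]

codes = [pq_encode(v) for v in database]

print(inverted_lists)
print(ivf_search([4.9, 5.0, 5.0, 5.0]))
print(codes)
```

In this sketch the inverted file reduces the number of distance computations per query, while the PQ codes replace four floats per vector with two small integers. The two mechanisms are independent, which is what makes combining them possible.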



Published in TDS Archive



Written by Vyacheslav Efimov

Senior ML Engineer 👨‍💻 | Passionate about Data Science ⭐️ | Content Creator ✍️