Similarity Search, Part 3: Blending Inverted File Index and Product Quantization

Discover how to combine two basic similarity search indexes to get the advantages of both

8 min read · May 19, 2023

Similarity search is the problem of finding, for a given query, the documents most similar to it among all documents in a database.

Introduction

In data science, similarity search often appears in NLP, search engines, and recommender systems, where the most relevant documents or items must be retrieved for a query. There are many different ways to improve search performance on massive volumes of data.

In the first two parts of this series, we discussed two fundamental algorithms in information retrieval: the inverted file index and product quantization. Both optimize search performance but focus on different aspects: the former speeds up the search, while the latter compresses vectors into a smaller, memory-efficient representation.
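
To make the two benefits concrete, here is a minimal back-of-the-envelope sketch. The function names and the numbers (128-dimensional float32 vectors, 8 subvectors, 256 centroids per subspace, 100 partitions, 4 probed partitions) are illustrative assumptions, not values from this series, and the scanned fraction assumes partitions of roughly equal size.

```python
import math


def raw_vector_bytes(d: int, bytes_per_component: int = 4) -> int:
    """Memory for one uncompressed vector (float32 by default)."""
    return d * bytes_per_component


def pq_code_bytes(m: int, k: int = 256) -> int:
    """Memory for one product-quantized vector.

    The vector is split into m subvectors, and each one is stored as the
    id of its nearest centroid among k, which takes ceil(log2(k)) bits.
    """
    bits = m * math.ceil(math.log2(k))
    return math.ceil(bits / 8)


def ivf_scanned_fraction(nlist: int, nprobe: int) -> float:
    """Rough share of the database scanned by an inverted file index.

    Assumes the nlist partitions are about equally sized and that
    nprobe of them are visited per query (hypothetical parameter names).
    """
    return nprobe / nlist


raw = raw_vector_bytes(d=128)          # 512 bytes per vector
compressed = pq_code_bytes(m=8)        # 8 bytes per vector
fraction = ivf_scanned_fraction(nlist=100, nprobe=4)

print(f"compression ratio: {raw // compressed}x")
print(f"scanned fraction:  {fraction:.0%}")
```

Under these assumptions, product quantization shrinks each vector from 512 bytes to 8, while the inverted file index limits each query to about 4% of the database. The two savings are independent of each other, which is what makes combining the indexes attractive.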

Published in TDS Archive

Written by Vyacheslav Efimov