Similarity Search, Part 3: Blending Inverted File Index and Product Quantization
Discover how to combine two basic similarity search indexes to get the advantages of both
Similarity search is the problem of finding, for a given query, the documents most similar to it among all the documents in a database.
Introduction
In data science, similarity search often appears in NLP, search engines and recommender systems, where the most relevant documents or items need to be retrieved for a query. Many techniques exist for improving search performance over massive volumes of data.
In the first two parts of this series, we discussed two fundamental algorithms in information retrieval: the inverted file index and product quantization. Both optimize search performance but focus on different aspects: the former speeds up the search, while the latter compresses vectors into a smaller, memory-efficient representation.
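To make the memory side of this trade-off concrete, here is a rough back-of-the-envelope sketch. The numbers (128-dimensional float32 vectors, 8 subvectors, 8 bits per subvector code) are illustrative assumptions rather than values from this series, the helper names are hypothetical, and the storage needed for the codebooks themselves is ignored.

```python
def raw_vector_bytes(dim: int, bytes_per_component: int = 4) -> int:
    """Memory needed to store one uncompressed vector (float32 by default)."""
    return dim * bytes_per_component


def pq_code_bytes(n_subvectors: int, bits_per_code: int = 8) -> int:
    """Memory needed to store one product-quantized vector.

    Each subvector is replaced by the ID of its nearest centroid,
    which takes bits_per_code bits. The total is rounded up to whole bytes.
    """
    return (n_subvectors * bits_per_code + 7) // 8


# Assumed example setup: 128-dimensional float32 vectors split into 8 subvectors,
# each encoded with one of 2**8 = 256 centroid IDs.
dim = 128
n_subvectors = 8

original = raw_vector_bytes(dim)          # bytes per original vector
compressed = pq_code_bytes(n_subvectors)  # bytes per PQ code
ratio = original / compressed             # compression factor

print(f"original: {original} B, compressed: {compressed} B, ratio: {ratio:.0f}x")
```

Under these assumptions, each vector shrinks from 512 bytes to 8 bytes. The inverted file index does not change this footprint. It only restricts the search to a subset of the stored vectors.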

