Various types of Distance Metrics in Machine Learning

Aug 14, 2019

Many Machine Learning algorithms, supervised or unsupervised, use distance metrics to learn the pattern in the input data before making any data-based decision. A good distance metric can significantly improve the performance of classification, clustering, and information retrieval. In this article, we will discuss different distance metrics and how they help in Machine Learning modelling.

In many real-world applications, we use Machine Learning algorithms to classify or recognize images and to retrieve information based on an image’s content. Examples include face recognition, censoring images online, retail catalogs, and recommendation systems. Choosing a good distance metric becomes really important here, because the distance metric is what lets an algorithm recognize similarities between contents.

Distance Function

The most basic distance function we all know is the Pythagorean theorem. To calculate the distance between two data points A and B, the Pythagorean theorem uses how far apart they are along the X-axis and the Y-axis: for A = (x1, y1) and B = (x2, y2), distance = sqrt((x2 - x1)^2 + (y2 - y1)^2).

Machine Learning algorithms use this formula as a distance function.
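As a quick illustration of the Pythagorean distance between two points in a plane, here is a minimal sketch. The function name `distance_2d` and the sample points are made up for this example.

```python
import math

def distance_2d(a, b):
    """Straight-line distance between two 2-D points given as (x, y) tuples."""
    dx = b[0] - a[0]  # separation along the X-axis
    dy = b[1] - a[1]  # separation along the Y-axis
    return math.sqrt(dx * dx + dy * dy)

# Hypothetical sample points: a 3-4-5 right triangle.
A = (1.0, 2.0)
B = (4.0, 6.0)
print(distance_2d(A, B))  # 5.0
```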

Now we will discuss some of these distance metrics and implement them in Python.

Distance Metrics

The set of input attributes, for which we want to make a prediction about the resulting output attributes, is called the query, or query point. The first step in making a prediction with MBL (Memory-Based Learning) is to look through the database to find all the data points whose input attributes are similar to those of the query point. In order to do that, we have to define what is meant by similar: we need a distance metric that tells us how close two points are.
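The lookup step described above can be sketched as follows. This is only an illustration under assumptions: the database is a small in-memory list, the metric defaults to the straight-line (Euclidean) distance, and the names `nearest_neighbors`, `database`, and `query` are hypothetical.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def nearest_neighbors(database, query, k, metric=euclidean):
    """Return the k stored points closest to the query under the given metric."""
    return sorted(database, key=lambda point: metric(point, query))[:k]

# Hypothetical stored data points (input attributes only) and a query point.
database = [(1.0, 1.0), (2.0, 3.5), (8.0, 9.0), (0.0, 1.5)]
query = (1.0, 2.0)
print(nearest_neighbors(database, query, k=2))  # [(1.0, 1.0), (0.0, 1.5)]
```

Passing a different function as `metric` changes which points count as similar, which is why the choice of distance metric matters.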

Now we will look at the math behind these distance metrics and how to implement them.

Euclidean Distance

Euclidean distance is the most commonly used distance. In most cases, when people talk about distance, they mean Euclidean distance, which is why it is also known simply as distance. When data is dense or continuous, it is usually a good choice of proximity measure.

In Cartesian coordinates, if p = (p1, p2, …, pn) and q = (q1, q2, …, qn) are two points in Euclidean n-space, then the distance (d) from p to q, or from q to p, is given by the Pythagorean formula:

d(p, q) = sqrt((q1 - p1)^2 + (q2 - p2)^2 + … + (qn - pn)^2)
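The Pythagorean formula for n coordinates, the square root of the sum of the squared coordinate differences, can be sketched in plain Python. The function name `euclidean_distance` and the sample points are assumptions made for this example.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, sum them, and take the square root.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical sample points in 3-dimensional space.
p = (1, 2, 3)
q = (4, 6, 3)
print(euclidean_distance(p, q))  # 5.0
```

On Python 3.8 or newer, the standard library's `math.dist(p, q)` computes the same quantity.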

Manhattan Distance

Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Put simply, in two dimensions it is the absolute difference between the x-coordinates plus the absolute difference between the y-coordinates.

Suppose we have two points A and B. To find the Manhattan distance between them, we just sum the absolute variation along the x-axis and along the y-axis; that is, we measure how far apart A and B are along each axis. Stated more mathematically, the Manhattan distance between two points is the distance measured along axes at right angles.

In a plane with p1 at (x1, y1) and p2 at (x2, y2):

Manhattan distance = |x1 - x2| + |y1 - y2|

This Manhattan distance metric is also known as Manhattan length, rectilinear distance, L1 distance or L1 norm, city block distance, Minkowski’s L1 distance, or the taxi-cab metric.
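The sum of absolute coordinate differences extends naturally from two coordinates to any number of them. A minimal sketch follows; the function name `manhattan_distance` and the sample points are made up for illustration.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences between two points."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical sample points: |1 - 4| + |2 - 6| = 3 + 4
print(manhattan_distance((1, 2), (4, 6)))  # 7
```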

Minkowski Distance

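The Minkowski distance is a generalized form of the Euclidean and Manhattan distances.

The Minkowski distance of order p between two points X = (x1, x2, …, xn) and Y = (y1, y2, …, yn) is

D(X, Y) = (|x1 - y1|^p + |x2 - y2|^p + … + |xn - yn|^p)^(1/p)

Setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.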
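To see how one formula covers both earlier metrics, here is a hedged sketch that raises each absolute coordinate difference to the power p, sums the results, and takes the p-th root. The function name `minkowski_distance` and the sample points are assumptions for this example, and the check that p is at least 1 reflects the usual requirement for the result to be a true metric.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two points of equal length."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    if p < 1:
        raise ValueError("order p must be at least 1")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

# Hypothetical sample points.
x, y = (1, 2), (4, 6)
print(minkowski_distance(x, y, 1))  # 7.0, same as the Manhattan distance
print(minkowski_distance(x, y, 2))  # 5.0, same as the Euclidean distance
```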

Cosine Distance

Cosine distance is mostly used to find similarities between documents. With the cosine metric, we measure the angle between two documents represented as vectors (for example, the term frequencies of each document). This metric is used when the orientation of the vectors matters but their magnitude does not.

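The cosine similarity formula can be derived from the equation for the dot product: A · B = ||A|| ||B|| cos θ, so cos θ = (A · B) / (||A|| ||B||). Cosine distance is then commonly taken as 1 - cos θ.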
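Rearranging the dot product equation A · B = ||A|| ||B|| cos θ gives cos θ = (A · B) / (||A|| ||B||). The sketch below computes that value and, as one common convention, takes the distance to be 1 minus the similarity. The function names and the toy term-frequency vectors are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """One common convention: distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical term-frequency vectors for three documents.
doc1 = (3, 0, 1)
doc2 = (6, 0, 2)  # same direction as doc1, twice the magnitude
doc3 = (0, 5, 0)  # shares no terms with doc1

print(cosine_distance(doc1, doc2))  # close to 0: orientation matches
print(cosine_distance(doc1, doc3))  # 1.0: the vectors are orthogonal
```

Note that `doc2` is twice as long as `doc1` yet the distance is essentially zero, which shows that only orientation counts here.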

Jaccard Index

The Jaccard similarity index (sometimes called the Jaccard similarity coefficient) compares the members of two sets to see which members are shared and which are distinct. It is a measure of similarity for the two sets of data, with a range from 0% to 100%. The higher the percentage, the more similar the two populations. Although it is easy to interpret, it is extremely sensitive to small sample sizes and may give erroneous results, especially with very small samples or data sets with missing observations.
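The index is the number of shared members divided by the number of members in either set, J(A, B) = |A ∩ B| / |A ∪ B|. A minimal sketch using Python sets follows; the function name `jaccard_index` and the sample sets are made up for this example, and raising an error for two empty sets is just one possible convention.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: treat the empty-vs-empty case as undefined.
        raise ValueError("Jaccard index is undefined for two empty sets")
    return len(a & b) / len(union)

# Hypothetical sample sets: 2 shared members out of 5 distinct members.
A = {"apple", "banana", "cherry"}
B = {"banana", "cherry", "date", "fig"}
print(jaccard_index(A, B))  # 0.4, i.e. 40% similar
```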