
Cosine Similarity Vs Euclidean Distance

3 min read · Dec 26, 2019
Photo by Marcus Dall Col on Unsplash

In this article, I explain what Cosine similarity and Euclidean distance are, and the scenarios in which each of them is the better choice.

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any angle in the interval (0,π] radians. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude. (source: Wikipedia)
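The definition above can be checked with a few lines of Python. This is a minimal sketch that uses only the standard library; the helper name cosine_similarity and the sample vectors are illustrative assumptions and are not taken from the figure.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative vectors (assumed for this sketch)
same_direction = cosine_similarity([1, 2], [3, 6])    # same orientation, different lengths
perpendicular = cosine_similarity([1, 0], [0, 5])     # 90 degrees apart
opposite = cosine_similarity([1, 2], [-2, -4])        # diametrically opposed

print(round(same_direction, 6), round(perpendicular, 6), round(opposite, 6))
```

The three results should come out at approximately 1, 0 and -1, whatever the lengths of the vectors, which is the "orientation, not magnitude" property described in the definition.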

To explain, let’s consider the two cases illustrated in figure 1, in each of which one of the two measures (cosine similarity or Euclidean distance) is the more effective one.

Case 1: When Cosine Similarity is better than Euclidean distance

Let’s assume OA, OB and OC are three vectors, as illustrated in figure 1. The points A, B and C form an equilateral triangle, so the Euclidean distances between these points are the same (AB = BC = CA). In this case, the Euclidean distance cannot tell us which of the three vectors are more similar to each other. Cosine similarity ignores the differing magnitudes (lengths) of the vectors and compares only their directions, and it shows that OA is more similar to OB than to OC.

#Python code for Case 1: Where Cosine similarity measure is better than Euclidean distance
from scipy.spatial import distance
# The points below have been selected to demonstrate the case for Cosine similarity
O = [0.00, 0.00]
A = [1.45, 7.56]
B = [7.81, 12.41]
C = [8.83, 4.48]
#Cosine similarity
cos_simA_B = 1 - distance.cosine(A, B)
cos_simB_C = 1 - distance.cosine(B, C)
cos_simA_C = 1 - distance.cosine(A, C)
#Measuring Euclidean distances
euc_dstA_B = distance.euclidean(A,B)
euc_dstB_C = distance.euclidean(B,C)
euc_dstA_C = distance.euclidean(C,A)
#Output:
Case 1: Where Cosine similarity measure is better than Euclidean distance
Cosine Similarity measure:
Between OA and OB: 0.93
Between OB and OC: 0.86
Between OC and OA: 0.61
Euclidean Distances:
From A to B: 8.0
From B to C: 8.0
From C to A: 8.0

As the above output shows, Cosine similarity is the more informative measure in this case. The three Euclidean distances are all about 8.0 and so cannot rank the pairs, whereas Cosine similarity indicates that OA is closer to OB (0.93) than to OC (0.61).
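Since Cosine similarity is the cosine of the angle between two vectors, the same ranking can be read off the angles themselves. The following is a small sketch using only the standard library and the Case 1 points; the helper name angle_between_deg is an assumption made for illustration.

```python
import math

# Case 1 points from the article
A = [1.45, 7.56]
B = [7.81, 12.41]
C = [8.83, 4.48]

def angle_between_deg(p, q):
    """Angle in degrees between the vectors O->p and O->q."""
    dot = p[0] * q[0] + p[1] * q[1]
    cos = dot / (math.hypot(*p) * math.hypot(*q))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

angle_AB = angle_between_deg(A, B)
angle_BC = angle_between_deg(B, C)
angle_AC = angle_between_deg(A, C)

print(f"Angle AOB: {angle_AB:.1f} degrees")
print(f"Angle BOC: {angle_BC:.1f} degrees")
print(f"Angle AOC: {angle_AC:.1f} degrees")
```

The smallest angle is the one between OA and OB, which matches the highest Cosine similarity in the output above, even though A, B and C are equally far apart.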

Case 2: When Euclidean distance is better than Cosine similarity

Consider another case where the points A’, B’ and C’ are collinear and lie on the same line through the origin O, as illustrated in figure 1. In this case, the Cosine similarity between every pair of the three vectors (OA’, OB’ and OC’) is the same (equal to 1), because they all point in the same direction. The Euclidean distance is more effective here: it indicates that A’ is closer (more similar) to B’ than to C’.

#Python code for Case 2: Euclidean distance is better than Cosine similarity
from scipy.spatial import distance
A_ = [8.00, 2.00]
B_ = [12.00, 3.00]
C_ = [32.00, 8.00]
#Cosine similarity
cos_simA_B_ = 1 - distance.cosine(A_, B_)
cos_simB_C_ = 1 - distance.cosine(B_, C_)
cos_simA_C_ = 1 - distance.cosine(A_, C_)
#Euclidean distance
dstA_B_ = distance.euclidean(A_,B_)
dstB_C_ = distance.euclidean(B_,C_)
dstA_C_ = distance.euclidean(C_,A_)
#Output:
Case 2: Euclidean distance is a better measure than Cosine similarity
Cosine Similarity measure:
Between OA' and OB': 1.0
Between OB' and OC': 1.0
Between OC' and OA': 1.0
Euclidean Distances:
From A' to B': 4.123105625617661
From B' to C': 20.615528128088304
From A' to C': 24.73863375370596

As the above output shows, the Cosine similarity is the same (1.0) for every pair, but the Euclidean distance indicates that points A’ and B’ are closer to each other than either is to C’, and hence more similar.
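The reason the Cosine similarity is 1 for every pair in Case 2 is that B’ and C’ are scaled copies of A’ (1.5 times and 4 times, respectively). The sketch below illustrates this with only the standard library (math.dist needs Python 3.8 or later); the helper cosine_similarity and the loop over scale factors are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero 2D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

A_ = [8.00, 2.00]          # point A' from Case 2

results = []
for k in (1.5, 4.0):       # 1.5 * A' gives B', 4 * A' gives C'
    scaled = [k * x for x in A_]
    results.append((k, cosine_similarity(A_, scaled), math.dist(A_, scaled)))

for k, cos_sim, euc in results:
    print(f"k={k}: cosine similarity={cos_sim:.2f}, Euclidean distance={euc:.2f}")
```

Scaling a vector leaves its direction unchanged, so the Cosine similarity stays at 1, while the Euclidean distance grows with the scale factor. That is why only the Euclidean distance can separate these points.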

When to use Cosine similarity or Euclidean distance?

Please read the article by Chris Emmery for more information.
