SIFT Interest Point Detector Using Python - OpenCV

SIFT (Scale-Invariant Feature Transform) is an algorithm used to detect and describe local features in images. It identifies distinctive keypoints that remain stable under changes in scale and rotation, and to a useful degree under changes in illumination and viewpoint, making it highly effective for image matching and object recognition.

Detects stable keypoints that can be reliably identified across different images.
Generates distinctive descriptors that enable accurate feature matching.

Unlike the Harris corner detector, whose output is sensitive to image scale (and therefore to viewpoint and depth changes that alter the apparent size of objects), SIFT detects features largely independently of these variations by transforming image data into scale-invariant coordinates.

Working

Figure: Sequence of steps followed in the SIFT detector

Step 1: Scale Space Peak Selection

In this step, the image is analyzed at multiple scales by applying Gaussian filters with different sigma values. This helps detect features that remain stable even when the image size changes.

Multiple Gaussian-blurred versions of the image are created.
Difference of Gaussian (DoG) images are generated from adjacent scales.
Local maxima and minima are identified as candidate keypoints.
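
The three operations above can be sketched with NumPy alone. This is a simplified, single-octave illustration and not OpenCV's implementation: the helper names (gaussian_kernel, gaussian_blur, dog_extrema), the sigma values, the threshold, and the synthetic blob image are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_extrema(img, sigmas, threshold=0.02):
    """Return the DoG stack and the (scale, row, col) positions of its extrema."""
    blurred = np.stack([gaussian_blur(img, s) for s in sigmas])
    dog = blurred[1:] - blurred[:-1]            # differences of adjacent scales
    found = []
    for s in range(1, dog.shape[0] - 1):
        for r in range(1, dog.shape[1] - 1):
            for c in range(1, dog.shape[2] - 1):
                v = dog[s, r, c]
                if abs(v) < threshold:          # skip very weak responses
                    continue
                # Compare with the 26 neighbors in the 3x3x3 cube around the sample
                cube = dog[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
                if v == cube.max() or v == cube.min():
                    found.append((s, r, c))
    return dog, found

# Synthetic test image: one Gaussian blob centered at row 32, column 32
yy, xx = np.mgrid[0:64, 0:64]
image = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 2.85 ** 2))
sigmas = [1.6 * 2 ** (i / 3) for i in range(6)]
dog, keypoints = dog_extrema(image, sigmas)
print(dog.shape, keypoints)
```

For this blob, the candidate list should contain the blob center, found at the scale whose sigma pair brackets the blob's own width. A full SIFT implementation repeats the same process over several octaves, downsampling the image between them.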

Step 2: Key Point Localization

The candidate keypoints obtained from the previous step are refined to improve accuracy and stability. Weak and unstable keypoints are removed.

Refines the position of candidate keypoints.
Removes low-contrast keypoints that may be affected by noise.
Eliminates keypoints located along edges.

z = -\left(\frac{\partial^2 D}{\partial x^2}\right)^{-1} \frac{\partial D}{\partial x}

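In the above expression, D represents the Difference of Gaussian function, x is the sample position in (x, y, scale), and z is the offset of the true extremum from that sampled position. To remove unstable keypoints, z is calculated and D is evaluated at the refined location; if the magnitude of that value is below a threshold (0.03 in Lowe's original paper, for pixel values scaled to [0, 1]), the point is excluded as low-contrast.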
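
As a rough illustration of this refinement, the sketch below estimates the derivatives of D with finite differences on a small DoG stack, solves for the offset z, and applies the contrast and edge checks. The function name refine_keypoint and the test data are hypothetical; the thresholds (0.03 for contrast, a curvature ratio of 10 for edges) are the values from Lowe's paper rather than anything stated in this article, and OpenCV's internal code differs in detail.

```python
import numpy as np

def refine_keypoint(dog, s, r, c, contrast_threshold=0.03, edge_ratio=10.0):
    """Return (offset, value) for a refined keypoint, or None if it is rejected.

    dog is indexed as [scale, row, col]; (s, r, c) must be an interior sample.
    The offset is ordered (x, y, scale), i.e. (column, row, scale).
    """
    d = dog
    v = d[s, r, c]
    # First derivatives by central differences
    g = 0.5 * np.array([
        d[s, r, c + 1] - d[s, r, c - 1],
        d[s, r + 1, c] - d[s, r - 1, c],
        d[s + 1, r, c] - d[s - 1, r, c],
    ])
    # Second derivatives (entries of the 3x3 Hessian)
    dxx = d[s, r, c + 1] - 2 * v + d[s, r, c - 1]
    dyy = d[s, r + 1, c] - 2 * v + d[s, r - 1, c]
    dss = d[s + 1, r, c] - 2 * v + d[s - 1, r, c]
    dxy = 0.25 * (d[s, r + 1, c + 1] - d[s, r + 1, c - 1]
                  - d[s, r - 1, c + 1] + d[s, r - 1, c - 1])
    dxs = 0.25 * (d[s + 1, r, c + 1] - d[s + 1, r, c - 1]
                  - d[s - 1, r, c + 1] + d[s - 1, r, c - 1])
    dys = 0.25 * (d[s + 1, r + 1, c] - d[s + 1, r - 1, c]
                  - d[s - 1, r + 1, c] + d[s - 1, r - 1, c])
    hessian = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])

    try:
        z = -np.linalg.solve(hessian, g)   # z = -(d2D/dx2)^-1 (dD/dx)
    except np.linalg.LinAlgError:
        return None                        # singular Hessian: cannot refine
    value = v + 0.5 * g.dot(z)             # D evaluated at the refined location
    if abs(value) < contrast_threshold:
        return None                        # low contrast, likely noise
    # Edge check on the 2x2 spatial Hessian: reject elongated (edge-like) responses
    trace, det = dxx + dyy, dxx * dyy - dxy ** 2
    if det <= 0 or trace ** 2 / det >= (edge_ratio + 1) ** 2 / edge_ratio:
        return None
    return z, value

# Test data: a quadratic peak located at x=2.2, y=1.7, scale=1.1
ss, rr, cc = np.mgrid[0:3, 0:5, 0:5].astype(float)
dog = 1.0 - ((cc - 2.2) ** 2 + (rr - 1.7) ** 2 + (ss - 1.1) ** 2)
offset, value = refine_keypoint(dog, 1, 2, 2)
print(offset, value)
weak = refine_keypoint(0.001 * dog, 1, 2, 2)   # same shape, but very low contrast
print(weak)
```

Because the test function is exactly quadratic, the finite differences are exact and the recovered offset points from the sample (2, 2, 1) to the true peak; the scaled-down copy is rejected by the contrast check.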

Step 3: Orientation Assignment

Each keypoint is assigned an orientation based on the gradient information of its neighborhood. This makes the detector invariant to image rotation.

Gradient magnitude and direction are computed around each keypoint.
An orientation histogram is created from the gradient values.
The dominant orientation is assigned to the keypoint.
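
A minimal sketch of these three steps is shown below. It assumes a 36-bin histogram (10 degrees per bin, as in Lowe's paper; the article does not state a bin count) and leaves out the Gaussian weighting of the neighborhood and the peak interpolation used by full implementations. The function name dominant_orientation is hypothetical.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36):
    """Return (angle_in_degrees, histogram) for a grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    # Gradients by central differences on the interior pixels
    dx = patch[1:-1, 2:] - patch[1:-1, :-2]
    dy = patch[2:, 1:-1] - patch[:-2, 1:-1]
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0

    # Assign each gradient to the nearest bin and weight it by its magnitude
    bin_width = 360.0 / num_bins
    bins = np.round(angle / bin_width).astype(int) % num_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=num_bins)

    peak = int(np.argmax(hist))
    return peak * bin_width, hist

# Test patch: intensity increases from the top row to the bottom row
patch = np.tile(np.arange(16.0)[:, None], (1, 16))
theta, hist = dominant_orientation(patch)
print(theta)
```

Since image rows grow downward, a gradient pointing down the image is reported as 90 degrees in this convention.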

Step 4: Keypoint Descriptor Generation

A descriptor is generated for each keypoint using information from its local neighborhood. These descriptors are later used to match features between images.

A 16×16 neighborhood around the keypoint is selected.
The neighborhood is divided into sixteen 4×4 subregions.
An 8-bin orientation histogram is computed for each subregion.
The 16 histograms are concatenated into a 128-dimensional descriptor (16 × 8 = 128) that characterizes the keypoint.
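
The layout described above can be sketched as follows. This is a simplified version under stated assumptions: it skips the rotation of the patch to the keypoint orientation, the Gaussian weighting, and the interpolation between bins, and the final normalize, clamp at 0.2, renormalize step follows Lowe's paper rather than this article. The function name sift_like_descriptor is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Build a simplified 128-D descriptor from a 16x16 grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")

    dy, dx = np.gradient(patch)                 # derivatives along rows, then columns
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (angle // 45.0).astype(int) % 8      # 8 orientation bins of 45 degrees

    cells = []
    for r0 in range(0, 16, 4):                  # 4 x 4 grid of subregions
        for c0 in range(0, 16, 4):
            b = bins[r0:r0 + 4, c0:c0 + 4].ravel()
            m = magnitude[r0:r0 + 4, c0:c0 + 4].ravel()
            cells.append(np.bincount(b, weights=m, minlength=8))
    descriptor = np.concatenate(cells)          # 16 subregions x 8 bins = 128 values

    # Normalize, clamp large entries, and renormalize to reduce lighting effects
    norm = np.linalg.norm(descriptor)
    if norm > 0:
        descriptor = np.minimum(descriptor / norm, 0.2)
        descriptor /= np.linalg.norm(descriptor)
    return descriptor

rng = np.random.default_rng(0)
descriptor = sift_like_descriptor(rng.random((16, 16)))
print(descriptor.shape)
```

Two such vectors can then be compared with an ordinary Euclidean distance, which is how descriptors are typically matched between images.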

Implementation

Let's consider an image and use the SIFT algorithm to detect keypoints and generate feature descriptors. The detected keypoints are then visualized on the image.

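Note: In OpenCV 4.4 and later, SIFT is available through cv2.SIFT_create(). If needed, install a suitable version (the quotes keep the shell from treating > as an output redirection):
pip install "opencv-contrib-python>=4.4"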

Python

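import cv2

# Load the input image (BGR); imread returns None if the file cannot be read
img = cv2.imread('geeks.jpg')
if img is None:
    raise FileNotFoundError("Could not read 'geeks.jpg'")

# SIFT works on a single-channel image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect keypoints and compute their 128-dimensional descriptors
sift = cv2.SIFT_create()
kp, des = sift.detectAndCompute(gray, None)

# Draw each keypoint as a circle that shows its size and orientation
img = cv2.drawKeypoints(gray, kp, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

cv2.imwrite('image-with-keypoints.jpg', img)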

Output:

The image on the left is the original; the image on the right shows the detected interest points, each drawn as a circle indicating its scale and orientation.