Text & Data Mining by practical means: Image processing

Showing posts with label Image processing. Show all posts

Saturday, June 7, 2014

Fractals: a deterministic and recursive algorithm to reproduce it (Part II)

Abstract:

Fern fractal estimation thru recursive and deterministic algorithm.

A fractal can be described as fixed point of a Markov process: given the proper contractive maps it's easy to generate it.

The question is: given the contractive maps, is it possible to generate the fractal using a pure deterministic approach?

Problem

We already observed that the iteration of the contractive maps starting from a fixed point that lyes on the border of the fractal returns points of the convex hull of the fractal.

What's happen if we recursively apply such schema to the points obtained from the the above procedure?
Is it possible recreate the markov process (or at least approximate it) removing any probabilistic aspects?

The below algorithm, given the contractive maps of the fractal, returns at each iteration better approximation of it.

At each iteration it considers the fractal as better approximation of the contour of the original image.

The Algorithm (by practical means)

Consider a fixed point $P_0$ of a contractive map $\tau_1$ that lyes on the convex hull of the fractal.
Choose randomly a point $P_i$ of the fractal and apply the above contractive map until the distance to $P_0$ is negligible.
Map the point calculated at step 2 using sequentially all the contractive maps.
Map each point obtained from point 3 the former step with $\tau_1$ till the distance to $P_0$ is negligible.
If[ITERATIONS< K]:

K--;
For each point obtained from point 4 go to point 3.

To explain how it works I plotted the list of points got using $K=1$ iterations of the algorithm:

Bigger is $K$, more accurate will be the result.
The above procedure works only if the contractive map $\tau_1$ has a fixed point that lyes on the convex hull of the fractal!!

Results:

I iterated the algorithm with $K=4$ times. At each iteration the algorithm returns a deeper description of the contour of the fractal (...even though definition of contour for a fractal doesn't make any sense, it gives at least a practical meaning):

Results of the Recursive Algorithm iterated with K=1,2,3,4.

If we try to overlap the results obtained with the original fern we get:

Original Fern, and the overlap with the results obtained using the recursive algorithm (K=4).

Conclusion and next steps
I showed how to depict the IFS generated thru the markovian process as a recursive and completely deterministic process.
We noticed also, that in the fractal there are special points (as for instance $P_0$) that play a crucial role to describe the IFS.

The question now is the following:
is it possible leverage such special points and the above recursive algorithm to estimate the parameters of the contractive maps?
... I will show you a procedure that partially answer the question and some other example of the algorithm above described!
Stay Tuned.
Cristian

Wednesday, May 15, 2013

Image comparison: SURF algorithm

In the former post I showed a naif approach that can be easily implemented and that works quite well under the condition that the image is not subjected to distortions.
I decided to deepen a bit the topic just for the sake of the curiosity ( curiosity and imagination are the skills that I most appreciate at work!).

SURF stands for "Speeded Up Robust Features" and it has been published by Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool in 2008. The original paper is available here.

I strongly recommend to read the paper: it's self contained and it's very clear to be read also by neophyte as I.
As usual I go straight to the experiments and to discuss the results.

Experiment
I considered again the same data set of the last post, and I tried to compare all the possible combinations of pairs of images to check the accuracy of the algorithm.

The first step of the algorithm is to determine the key features of an image.

key features detection

Once the features have been detected it's possible to use them for the comparison through euclidean measures or similar.

In the below image I plotted the common points and their position between two images:

Common points found between two images. In yellow I depicted them and their position.

If you zoom a bit the image you will notice that the common points found are thousands. To find them the algorithm took around four minutes.

In a productive environment this amount of time could be not acceptable.

To improve the situation and reduce the computational time, I processed the image with binarization filters; after this preprocessing the time has been reduced from 4 minutes to 25 seconds.

Results

In the below matrix I listed all the comparison done to check the accuracy.

The correct pairs lay on the principal diagonal of the matrix.

In red have been listed the correct pairs

To show graphically the results I plotted over the matrix the number of common points found for each pair. The colors chosen highlight that number.

Conclusions

As you can see for all the pairs on the principal diagonal the algorithm detected the highest number of common points.

In the next post I will show how to use this algorithm to identify logos in a document.

Stay tuned

Cristian

Saturday, March 30, 2013

Image comparison for dummies: an easy and fast approach

Since I started to blog, I presented different methods, techniques and tricks to solve problems related to Data mining, and text mining (... I'm still puzzled about the difference between text mining and text analytics).

Today I would like to present a different approach to solve a common problem in the text analytics area: the document comparison problem.

The problem
Usually if you want to compare two documents in order to check if they are pretty much the same you can apply some easy tricks in order to check frequency and position of the words.
Sometime the documents require an OCR processing in order to get the "text rendition" for the documents, and such process usually is a very expensive in term of computational time (often it takes more than 10 seconds per page): what if you have to perform many comparison checks and you are short in time?
...In that case I suggest to consider the documents as images!

Image processing approach
This branch of the computer science evolved quite fast in the last decade and it achieves a level of complexity extremely high.
I'm not an expert at all, and many libraries (often free as opencv) are available and they offer several functionalities: image comparison, image recognition and so on.
Unfortunately I'm stubborn and curious and I like try to do experiments :).

So I decided to implement a very easy algorithm able to compare with high accuracy documents very similar.

Algorithm:
1. Resize the image (in order to have images having the same number of pixels)
2. Binarize the image
3. compute the histogram distribution
4. perform the comparison considering the Kullback distance between the two empirical distributions got from the histograms.

Considerations:

such algorithm is not robust to linear transformation/distortions of the image (but this is not case when you compare documents...).
it's very fast respect the solutions that require OCR processing.
it could be easily optimized considering sub-block of the images (and give higher weights to blocks in the middle of the image).
through some assumptions based on eigenvectors it could be helpful also in case of image linear transformations.

Experiment

I selected 10 random images (pictures, medical analysis, text, screenshot):

Original sample set

For each document, after some image adjust filter, I computed the histograms:

Histograms for the above images

I introduced some noise in each of the documents and I computed again the histograms:

Noised sample set.

Histograms for the noised images

As you can see, despite the noise introduced is quite big, the characteristic histogram seems to preserve quite well the shape.

Results:

I compared each possible possible combination of the pairs composed by original image and noised image and I ranked the result of the comparison.

...The test has been passed with an accuracy of 100%.

I have to say that the images are quite heterogeneous among each other, but further tests (I'll publish them in one of the next posts) highlighted that the method is quite robust (with all the limitations explained above).

Here is some of the best matching found:

Some of the best matching found

...I have to say that the image processing is a very fascinating science!
Stay tuned
cristian

Pages