Python fractal box count - fractal dimension - python

I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
10.jpg:
24.jpg:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import scipy
def rgb2gray(rgb):
r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
return gray
def fractal_dimension(Z, threshold=0.9):
# Only for 2d image
assert(len(Z.shape) == 2)
# From https://github.com/rougier/numpy-100 (#87)
def boxcount(Z, k):
S = np.add.reduceat(
np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
np.arange(0, Z.shape[1], k), axis=1)
# We count non-empty (0) and non-full boxes (k*k)
return len(np.where((S > 0) & (S < k*k))[0])
# Transform Z into a binary array
Z = (Z < threshold)
# Minimal dimension of image
p = min(Z.shape)
# Greatest power of 2 less than or equal to p
n = 2**np.floor(np.log(p)/np.log(2))
# Extract the exponent
n = int(np.log(n)/np.log(2))
# Build successive box sizes (from 2**n down to 2**1)
sizes = 2**np.arange(n, 1, -1)
# Actual box counting with decreasing size
counts = []
for size in sizes:
counts.append(boxcount(Z, size))
# Fit the successive log(sizes) with log (counts)
coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
return -coeffs[0]
I = rgb2gray(scipy.misc.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value
The results it gives me are in the opposite direction than what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?

With fractal dimension of something physical the dimension might converge at different stages to different values. For example, a very thin line (but of finite width) would initially seem one dimensional, then eventual two dimensional as its width becomes of comparable size to the boxes used.
Lets see the dimensions that you have produced:
What do you see? Well the linear fits are not so good. And the dimensions is going towards a value of two.
To diagnose, lets take a look at the grey-scale images produced, with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimensions would go to a value of 2 very soon, as the graphs told us. That is because we pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for urban and nature respectively. Keep in mind, that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future good threshold values would be ones closer to the mean of the grey scale images, that way your image does not turn into a blob of ink!
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.

Related

Calculating the nearest neighbour in a 2d grid using multilevel solution

I have a problem where in a grid of x*y size I am provided a single dot, and I need to find the nearest neighbour. In practice, I am trying to find the closest dot to the cursor in pygame that crosses a color distance threshold that is calculated as following:
sqrt(((rgb1[0]-rgb2[0])**2)+((rgb1[1]-rgb2[1])**2)+((rgb1[2]-rgb2[2])**2))
So far I have a function that calculates the different resolutions for the grid and reduces it by a factor of two while always maintaining the darkest pixel. It looks as following:
from PIL import Image
from typing import Dict
import numpy as np
#we input a pillow image object and retrieve a dictionary with every grid version of the 3 dimensional array:
def calculate_resolutions(image: Image) -> Dict[int, np.ndarray]:
resolutions = {}
#we start with the highest resolution image, the size of which we initially divide by 1, then 2, then 4 etc.:
divisor = 1
#reduce the grid by 5 iterations
resolution_iterations = 5
for i in range(resolution_iterations):
pixel_lookup = image.load() #convert image to PixelValues object, which allows for pixellookup via [x,y] index
#calculate the resolution of the new grid, round upwards:
resolution = (int((image.size[0] - 1) // divisor + 1), int((image.size[1] - 1) // divisor + 1))
#generate 3d array with new grid resolution, fill in values that are darker than white:
new_grid = np.full((resolution[0],resolution[1],3),np.array([255,255,255]))
for x in range(image.size[0]):
for y in range(image.size[1]):
if not x%divisor and not y%divisor:
darkest_pixel = (255,255,255)
x_range = divisor if x+divisor<image.size[0] else (0 if image.size[0]-x<0 else image.size[0]-x)
y_range = divisor if y+divisor<image.size[1] else (0 if image.size[1]-y<0 else image.size[1]-y)
for x_ in range(x,x+x_range):
for y_ in range(y,y+y_range):
if pixel_lookup[x_,y_][0]+pixel_lookup[x_,y_][1]+pixel_lookup[x_,y_][2] < darkest_pixel[0]+darkest_pixel[1]+darkest_pixel[2]:
darkest_pixel = pixel_lookup[x_,y_]
if darkest_pixel != (255,255,255):
new_grid[int(x/divisor)][int(y/divisor)] = np.array(darkest_pixel)
resolutions[i] = new_grid
divisor = divisor*2
return resolutions
This is the most performance efficient solution I was able to come up with. If this function is run on a grid that continually changes, like a video with x fps, it will be very performance intensive. I also considered using a kd-tree algorithm that simply adds and removes any dots that happen to change on the grid, but when it comes to finding individual nearest neighbours on a static grid this solution has the potential to be more resource efficient. I am open to any kinds of suggestions in terms of how this function could be improved in terms of performance.
Now, I am in a position where for example, I try to find the nearest neighbour of the current cursor position in a 100x100 grid. The resulting reduced grids are 50^2, 25^2, 13^2, and 7^2. In a situation where a part of the grid looks as following:
And I am on the aggregation step where a part of the grid consisting of six large squares, the black one being the current cursor position and the orange dots being dots where the color distance threshold is crossed, I would not know which diagonally located closest neighbour I would want to pick to search next. In this case, going one aggregation step down shows that the lower left would be the right choice. Depending on how many grid layers I have this could result in a very large error in terms of the nearest neighbour search. Is there a good way how I can solve this problem? If there are multiple squares that show they have a relevant location, do I have to search them all in the next step to be sure? And if that is the case, the further away I get the more I would need to make use of math functions such as the pythagorean theorem to assert whether the two positive squares I find are overlapping in terms of distance and could potentially contain the closest neighbour, which would start to be performance intensive again if the function is called frequently. Would it still make sense to pursue this solution over a regular kd tree? For now the grid size is still fairly small (~800-600) but if the grid gets larger the performance may start suffering again. Is there a good scalable solution to this problem that could be applied here?

Inverse FFT returns negative values when it should not

I have several points (x,y,z coordinates) in a 3D box with associated masses. I want to draw an histogram of the mass-density that is found in spheres of a given radius R.
I have written a code that, providing I did not make any errors which I think I may have, works in the following way:
My "real" data is something huge thus I wrote a little code to generate non overlapping points randomly with arbitrary mass in a box.
I compute a 3D histogram (weighted by mass) with a binning about 10 times smaller than the radius of my spheres.
I take the FFT of my histogram, compute the wave-modes (kx, ky and kz) and use them to multiply my histogram in Fourier space by the analytic expression of the 3D top-hat window (sphere filtering) function in Fourier space.
I inverse FFT my newly computed grid.
Thus drawing a 1D-histogram of the values on each bin would give me what I want.
My issue is the following: given what I do there should not be any negative values in my inverted FFT grid (step 4), but I get some, and with values much higher that the numerical error.
If I run my code on a small box (300x300x300 cm3 and the points of separated by at least 1 cm) I do not get the issue. I do get it for 600x600x600 cm3 though.
If I set all the masses to 0, thus working on an empty grid, I do get back my 0 without any noted issues.
I here give my code in a full block so that it is easily copied.
import numpy as np
import matplotlib.pyplot as plt
import random
from numba import njit
# 1. Generate a bunch of points with masses from 1 to 3 separated by a radius of 1 cm
radius = 1
rangeX = (0, 100)
rangeY = (0, 100)
rangeZ = (0, 100)
rangem = (1,3)
qty = 20000 # or however many points you want
# Generate a set of all points within 1 of the origin, to be used as offsets later
deltas = set()
for x in range(-radius, radius+1):
for y in range(-radius, radius+1):
for z in range(-radius, radius+1):
if x*x + y*y + z*z<= radius*radius:
deltas.add((x,y,z))
X = []
Y = []
Z = []
M = []
excluded = set()
for i in range(qty):
x = random.randrange(*rangeX)
y = random.randrange(*rangeY)
z = random.randrange(*rangeZ)
m = random.uniform(*rangem)
if (x,y,z) in excluded: continue
X.append(x)
Y.append(y)
Z.append(z)
M.append(m)
excluded.update((x+dx, y+dy, z+dz) for (dx,dy,dz) in deltas)
print("There is ",len(X)," points in the box")
# Compute the 3D histogram
a = np.vstack((X, Y, Z)).T
b = 200
H, edges = np.histogramdd(a, weights=M, bins = b)
# Compute the FFT of the grid
Fh = np.fft.fftn(H, axes=(-3,-2, -1))
# Compute the different wave-modes
kx = 2*np.pi*np.fft.fftfreq(len(edges[0][:-1]))*len(edges[0][:-1])/(np.amax(X)-np.amin(X))
ky = 2*np.pi*np.fft.fftfreq(len(edges[1][:-1]))*len(edges[1][:-1])/(np.amax(Y)-np.amin(Y))
kz = 2*np.pi*np.fft.fftfreq(len(edges[2][:-1]))*len(edges[2][:-1])/(np.amax(Z)-np.amin(Z))
# I create a matrix containing the values of the filter in each point of the grid in Fourier space
R = 5
Kh = np.empty((len(kx),len(ky),len(kz)))
#njit(parallel=True)
def func_njit(kx, ky, kz, Kh):
for i in range(len(kx)):
for j in range(len(ky)):
for k in range(len(kz)):
if np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2) != 0:
Kh[i][j][k] = (np.sin((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)-(np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R*np.cos((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R))*3/((np.sqrt(kx[i]**2+ky[j]**2+kz[k]**2))*R)**3
else:
Kh[i][j][k] = 1
return Kh
Kh = func_njit(kx, ky, kz, Kh)
# I multiply each point of my grid by the associated value of the filter (multiplication in Fourier space = convolution in real space)
Gh = np.multiply(Fh, Kh)
# I take the inverse FFT of my filtered grid. I take the real part to get back floats but there should only be zeros for the imaginary part.
Density = np.real(np.fft.ifftn(Gh,axes=(-3,-2, -1)))
# Here it shows if there are negative values the magnitude of the error
print(np.min(Density))
D = Density.flatten()
N = np.mean(D)
# I then compute the histogram I want
hist, bins = np.histogram(D/N, bins='auto', density=True)
bin_centers = (bins[1:]+bins[:-1])*0.5
plt.plot(bin_centers, hist)
plt.xlabel('rho/rhom')
plt.ylabel('P(rho)')
plt.show()
Do you know why I'm getting these negative values? Do you think there is a simpler way to proceed?
Sorry if this is a very long post, I tried to make it very clear and will edit it with your comments, thanks a lot!
-EDIT-
A follow-up question on the issue can be found [here].1
The filter you create in the frequency domain is only an approximation to the filter you want to create. The problem is that we are dealing with the DFT here, not the continuous-domain FT (with its infinite frequencies). The Fourier transform of a ball is indeed the function you describe, however this function is infinitely large -- it is not band-limited!
By sampling this function only within a window, you are effectively multiplying it with an ideal low-pass filter (the rectangle of the domain). This low-pass filter, in the spatial domain, has negative values. Therefore, the filter you create also has negative values in the spatial domain.
This is a slice through the origin of the inverse transform of Kh (after I applied fftshift to move the origin to the middle of the image, for better display):
As you can tell here, there is some ringing that leads to negative values.
One way to overcome this ringing is to apply a windowing function in the frequency domain. Another option is to generate a ball in the spatial domain, and compute its Fourier transform. This second option would be the simplest to achieve. Do remember that the kernel in the spatial domain must also have the origin at the top-left pixel to obtain a correct FFT.
A windowing function is typically applied in the spatial domain to avoid issues with the image border when computing the FFT. Here, I propose to apply such a window in the frequency domain to avoid similar issues when computing the IFFT. Note, however, that this will always further reduce the bandwidth of the kernel (the windowing function would work as a low-pass filter after all), and therefore yield a smoother transition of foreground to background in the spatial domain (i.e. the spatial domain kernel will not have as sharp a transition as you might like). The best known windowing functions are Hamming and Hann windows, but there are many others worth trying out.
Unsolicited advice:
I simplified your code to compute Kh to the following:
kr = np.sqrt(kx[:,None,None]**2 + ky[None,:,None]**2 + kz[None,None,:]**2)
kr *= R
Kh = (np.sin(kr)-kr*np.cos(kr))*3/(kr)**3
Kh[0,0,0] = 1
I find this easier to read than the nested loops. It should also be significantly faster, and avoid the need for njit. Note that you were computing the same distance (what I call kr here) 5 times. Factoring out such computation is not only faster, but yields more readable code.
Just a guess:
Where do you get the idea that the imaginary part MUST be zero? Have you ever tried to take the absolute values (sqrt(re^2 + im^2)) and forget about the phase instead of just taking the real part? Just something that came to my mind.

How to segment nearby elements in a binary image using Python/Opencv

I have this binary image in where each ‘curve' represents a hat from a pile of these objects. It was obtained by thresholding a region of the original image of stacked straw hats.
As you can see, these curves have many gaps and holes inside of its shapes, which dificults the use of a technique like cv.connectedcomponentes in order to obtain the amount of objects in the image, which is my goal.
I think if there was some technique to fill in these gaps and/or, mainly, the holes, in smaller parts of the original binary image, like the ones I'm showing bellow, maybe by connecting nearby elements or detecting and filling contours, would be possible to segment each curve as an individual element.
Not the most elegant way, but it should be simple enough.
Consider a vertical slice of with w (the same as the slices you posted in your question). If you sum the white pixels along the rows of the slice, you should get six nice "peaks" corresponding to the six rims of the hats:
However, since the rims are rounded, some vertical slices would be better than others for this sort of estimation.
Therefore, I suggest looking at all slices of width w and counting the peaks for each slice.
Here's a Matlab code that does this
img = imread('http://i.stack.imgur.com/69FfJ.jpg'); % read the image
bw = img(:,:,1)>128; % convert to binary
w = 75; % width of slice
all_slices = imfilter(single(bw), ones(1,w)/w, 'symmetric')>.5; % compute horizontal sum of all slices using filter
% a peak is a slice with more than 50% "white" pixels
peaks = diff( all_slices, 1, 1 ) > 0; % detect the peaks using vertical diff
count_per_slice = sum( peaks, 1 ); % how many peaks each slice think it sees
Looking at the distribution of the count_per_slice:
You see that although many slices predict the wrong number of hats (between 4 to 9) the majority votes for the correct number 6:
num_hats = mode(count_per_slice); % take the mode of the distribution.
A python code that does the same (assuming bw is a numpy array of shape (h,w) and of dtype bool):
from scipy import signal, stats
import numpy as np
w = 75;
all_slices = signal.convolve2d( bw.astype('f4'), np.ones((1,w),dtype='f4')/float(w), mode='same', boundary='symmetric')>0.5
peaks = np.diff( all_slices, n=1, axis=0 ) > 0
count_per_slice = peaks.sum( axis=0 )
num_hats = stats.mode( count_per_slice )

Peak detection in a noisy 2d array

I'm trying to get python to return, as close as possible, the center of the most obvious clustering in an image like the one below:
In my previous question I asked how to get the global maximum and the local maximums of a 2d array, and the answers given worked perfectly. The issue is that the center estimation I can get by averaging the global maximum obtained with different bin sizes is always slightly off than the one I would set by eye, because I'm only accounting for the biggest bin instead of a group of biggest bins (like one does by eye).
I tried adapting the answer to this question to my problem, but it turns out my image is too noisy for that algorithm to work. Here's my code implementing that answer:
import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
from os import getcwd
from os.path import join, realpath, dirname
# Save path to dir where this code exists.
mypath = realpath(join(getcwd(), dirname(__file__)))
myfile = 'data_file.dat'
x, y = np.loadtxt(join(mypath,myfile), usecols=(1, 2), unpack=True)
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
rang = [[xmin, xmax], [ymin, ymax]]
paws = []
for d_b in range(25, 110, 25):
# Number of bins in x,y given the bin width 'd_b'
binsxy = [int((xmax - xmin) / d_b), int((ymax - ymin) / d_b)]
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
paws.append(H)
def detect_peaks(image):
"""
Takes an image and detect the peaks usingthe local maximum filter.
Returns a boolean mask of the peaks (i.e. 1 when
the pixel's value is the neighborhood maximum, 0 otherwise)
"""
# define an 8-connected neighborhood
neighborhood = generate_binary_structure(2,2)
#apply the local maximum filter; all pixel of maximal value
#in their neighborhood are set to 1
local_max = maximum_filter(image, footprint=neighborhood)==image
#local_max is a mask that contains the peaks we are
#looking for, but also the background.
#In order to isolate the peaks we must remove the background from the mask.
#we create the mask of the background
background = (image==0)
#a little technicality: we must erode the background in order to
#successfully subtract it form local_max, otherwise a line will
#appear along the background border (artifact of the local maximum filter)
eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
#we obtain the final mask, containing only peaks,
#by removing the background from the local_max mask
detected_peaks = local_max - eroded_background
return detected_peaks
#applying the detection and plotting results
for i, paw in enumerate(paws):
detected_peaks = detect_peaks(paw)
pp.subplot(4,2,(2*i+1))
pp.imshow(paw)
pp.subplot(4,2,(2*i+2) )
pp.imshow(detected_peaks)
pp.show()
and here's the result of that (varying the bin size):
Clearly my background is too noisy for that algorithm to work, so the question is: how can I make that algorithm less sensitive? If an alternative solution exists then please let me know.
EDIT
Following Bi Rico advise I attempted smoothing my 2d array before passing it on to the local maximum finder, like so:
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
H1 = gaussian_filter(H, 2, mode='nearest')
paws.append(H1)
These were the results with a sigma of 2, 4 and 8:
EDIT 2
A mode ='constant' seems to work much better than nearest. It converges to the right center with a sigma=2 for the largest bin size:
So, how do I get the coordinates of the maximum that shows in the last image?
Answering the last part of your question, always you have points in an image, you can find their coordinates by searching, in some order, the local maximums of the image. In case your data is not a point source, you can apply a mask to each peak in order to avoid the peak neighborhood from being a maximum while performing a future search. I propose the following code:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import copy
def get_std(image):
return np.std(image)
def get_max(image,sigma,alpha=20,size=10):
i_out = []
j_out = []
image_temp = copy.deepcopy(image)
while True:
k = np.argmax(image_temp)
j,i = np.unravel_index(k, image_temp.shape)
if(image_temp[j,i] >= alpha*sigma):
i_out.append(i)
j_out.append(j)
x = np.arange(i-size, i+size)
y = np.arange(j-size, j+size)
xv,yv = np.meshgrid(x,y)
image_temp[yv.clip(0,image_temp.shape[0]-1),
xv.clip(0,image_temp.shape[1]-1) ] = 0
print xv
else:
break
return i_out,j_out
#reading the image
image = mpimg.imread('ggd4.jpg')
#computing the standard deviation of the image
sigma = get_std(image)
#getting the peaks
i,j = get_max(image[:,:,0],sigma, alpha=10, size=10)
#let's see the results
plt.imshow(image, origin='lower')
plt.plot(i,j,'ro', markersize=10, alpha=0.5)
plt.show()
The image ggd4 for the test can be downloaded from:
http://www.ipac.caltech.edu/2mass/gallery/spr99/ggd4.jpg
The first part is to get some information about the noise in the image. I did it by computing the standard deviation of the full image (actually is better to select an small rectangle without signal). This is telling us how much noise is present in the image.
The idea to get the peaks is to ask for successive maximums, which are above of certain threshold (let's say, 3, 4, 5, 10, or 20 times the noise). This is what the function get_max is actually doing. It performs the search of maximums until one of them is below the threshold imposed by the noise. In order to avoid finding the same maximum many times it is necessary to remove the peaks from the image. In the general way, the shape of the mask to do so depends strongly on the problem that one want to solve. for the case of stars, it should be good to remove the star by using a Gaussian function, or something similar. I have chosen for simplicity a square function, and the size of the function (in pixels) is the variable "size".
I think that from this example, anybody can improve the code by adding more general things.
EDIT:
The original image looks like:
While the image after identifying the luminous points looks like this:
Too much of a n00b on Stack Overflow to comment on Alejandro's answer elsewhere here. I would refine his code a bit to use a preallocated numpy array for output:
def get_max(image,sigma,alpha=3,size=10):
from copy import deepcopy
import numpy as np
# preallocate a lot of peak storage
k_arr = np.zeros((10000,2))
image_temp = deepcopy(image)
peak_ct=0
while True:
k = np.argmax(image_temp)
j,i = np.unravel_index(k, image_temp.shape)
if(image_temp[j,i] >= alpha*sigma):
k_arr[peak_ct]=[j,i]
# this is the part that masks already-found peaks.
x = np.arange(i-size, i+size)
y = np.arange(j-size, j+size)
xv,yv = np.meshgrid(x,y)
# the clip here handles edge cases where the peak is near the
# image edge
image_temp[yv.clip(0,image_temp.shape[0]-1),
xv.clip(0,image_temp.shape[1]-1) ] = 0
peak_ct+=1
else:
break
# trim the output for only what we've actually found
return k_arr[:peak_ct]
In profiling this and Alejandro's code using his example image, this code about 33% faster (0.03 sec for Alejandro's code, 0.02 sec for mine.) I expect on images with larger numbers of peaks, it would be even faster - appending the output to a list will get slower and slower for more peaks.
I think the first step needed here is to express the values in H in terms of the standard deviation of the field:
import numpy as np
H = H / np.std(H)
Now you can put a threshold on the values of this H. If the noise is assumed to be Gaussian, picking a threshold of 3 you can be quite sure (99.7%) that this pixel can be associated with a real peak and not noise. See here.
Now the further selection can start. It is not exactly clear to me what exactly you want to find. Do you want the exact location of peak values? Or do you want one location for a cluster of peaks which is in the middle of this cluster?
Anyway, starting from this point with all pixel values expressed in standard deviations of the field, you should be able to get what you want. If you want to find clusters you could perform a nearest neighbour search on the >3-sigma gridpoints and put a threshold on the distance. I.e. only connect them when they are close enough to each other. If several gridpoints are connected you can define this as a group/cluster and calculate some (sigma-weighted?) center of the cluster.
Hope my first contribution on Stackoverflow is useful for you!
The way I would do it:
1) normalize H between 0 and 1.
2) pick a threshold value, as tcaswell suggests. It could be between .9 and .99 for example
3) use masked arrays to keep only the x,y coordinates with H above threshold:
import numpy.ma as ma
x_masked=ma.masked_array(x, mask= H < thresold)
y_masked=ma.masked_array(y, mask= H < thresold)
4) now you can weight-average on the masked coordinates, with weight something like (H-threshold)^2, or any other power greater or equal to one, depending on your taste/tests.
Comment:
1) This is not robust with respect to the type of peaks you have, since you may have to adapt the thresold. This is the minor problem;
2) This DOES NOT work with two peaks as it is, and will give wrong results if the 2nd peak is above threshold.
Nonetheless, it will always give you an answer without crashing (with pros and cons of the thing..)
I'm adding this answer because it's the solution I ended up using. It's a combination of Bi Rico's comment here (May 30 at 18:54) and the answer given in this question: Find peak of 2d histogram.
As it turns out using the peak detection algorithm from this question Peak detection in a 2D array only complicates matters. After applying the Gaussian filter to the image all that needs to be done is to ask for the maximum bin (as Bi Rico pointed out) and then obtain the maximum in coordinates.
So instead of using the detect-peaks function as I did above, I simply add the following code after the Gaussian 2D histogram is obtained:
# Get 2D histogram.
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
# Get Gaussian filtered 2D histogram.
H1 = gaussian_filter(H, 2, mode='nearest')
# Get center of maximum in bin coordinates.
x_cent_bin, y_cent_bin = np.unravel_index(H1.argmax(), H1.shape)
# Get center in x,y coordinates.
x_cent_coor , y_cent_coord = np.average(xedges[x_cent_bin:x_cent_bin + 2]), np.average(yedges[y_cent_g:y_cent_g + 2])

Compare similarity of images using OpenCV with Python

I'm trying to compare a image to a list of other images and return a selection of images (like Google search images) of this list with up to 70% of similarity.
I get this code in this post and change for my context
# Load the images
img =cv2.imread(MEDIA_ROOT + "/uploads/imagerecognize/armchair.jpg")
# Convert them to grayscale
imgg =cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# SURF extraction
surf = cv2.FeatureDetector_create("SURF")
surfDescriptorExtractor = cv2.DescriptorExtractor_create("SURF")
kp = surf.detect(imgg)
kp, descritors = surfDescriptorExtractor.compute(imgg,kp)
# Setting up samples and responses for kNN
samples = np.array(descritors)
responses = np.arange(len(kp),dtype = np.float32)
# kNN training
knn = cv2.KNearest()
knn.train(samples,responses)
modelImages = [MEDIA_ROOT + "/uploads/imagerecognize/1.jpg", MEDIA_ROOT + "/uploads/imagerecognize/2.jpg", MEDIA_ROOT + "/uploads/imagerecognize/3.jpg"]
for modelImage in modelImages:
# Now loading a template image and searching for similar keypoints
template = cv2.imread(modelImage)
templateg= cv2.cvtColor(template,cv2.COLOR_BGR2GRAY)
keys = surf.detect(templateg)
keys,desc = surfDescriptorExtractor.compute(templateg, keys)
for h,des in enumerate(desc):
des = np.array(des,np.float32).reshape((1,128))
retval, results, neigh_resp, dists = knn.find_nearest(des,1)
res,dist = int(results[0][0]),dists[0][0]
if dist<0.1: # draw matched keypoints in red color
color = (0,0,255)
else: # draw unmatched in blue color
#print dist
color = (255,0,0)
#Draw matched key points on original image
x,y = kp[res].pt
center = (int(x),int(y))
cv2.circle(img,center,2,color,-1)
#Draw matched key points on template image
x,y = keys[h].pt
center = (int(x),int(y))
cv2.circle(template,center,2,color,-1)
cv2.imshow('img',img)
cv2.imshow('tm',template)
cv2.waitKey(0)
cv2.destroyAllWindows()
My question is, how can I compare the image with the list of images and get only the similar images? Is there any method to do this?
I suggest you to take a look to the earth mover's distance (EMD) between the images.
This metric gives a feeling on how hard it is to tranform a normalized grayscale image into another, but can be generalized for color images. A very good analysis of this method can be found in the following paper:
robotics.stanford.edu/~rubner/papers/rubnerIjcv00.pdf
It can be done both on the whole image and on the histogram (which is really faster than the whole image method). I'm not sure of which method allow a full image comparision, but for histogram comparision you can use the cv.CalcEMD2 function.
The only problem is that this method does not define a percentage of similarity, but a distance that you can filter on.
I know that this is not a full working algorithm, but is still a base for it, so I hope it helps.
EDIT:
Here is a spoof of how the EMD works in principle. The main idea is having two normalized matrices (two grayscale images divided by their sum), and defining a flux matrix that describe how you move the gray from one pixel to the other from the first image to obtain the second (it can be defined even for non normalized one, but is more difficult).
In mathematical terms the flow matrix is actually a quadridimensional tensor that gives the flow from the point (i,j) of the old image to the point (k,l) of the new one, but if you flatten your images you can transform it to a normal matrix, just a little more hard to read.
This Flow matrix has three constraints: each terms should be positive, the sum of each row should return the same value of the desitnation pixel and the sum of each column should return the value of the starting pixel.
Given this you have to minimize the cost of the transformation, given by the sum of the products of each flow from (i,j) to (k,l) for the distance between (i,j) and (k,l).
It looks a little complicated in words, so here is the test code. The logic is correct, I'm not sure why the scipy solver complains about it (you should look maybe to openOpt or something similar):
#original data, two 2x2 images, normalized
x = rand(2,2)
x/=sum(x)
y = rand(2,2)
y/=sum(y)
#initial guess of the flux matrix
# just the product of the image x as row for the image y as column
#This is a working flux, but is not an optimal one
F = (y.flatten()*x.flatten().reshape((y.size,-1))).flatten()
#distance matrix, based on euclidean distance
row_x,col_x = meshgrid(range(x.shape[0]),range(x.shape[1]))
row_y,col_y = meshgrid(range(y.shape[0]),range(y.shape[1]))
rows = ((row_x.flatten().reshape((row_x.size,-1)) - row_y.flatten().reshape((-1,row_x.size)))**2)
cols = ((col_x.flatten().reshape((row_x.size,-1)) - col_y.flatten().reshape((-1,row_x.size)))**2)
D = np.sqrt(rows+cols)
D = D.flatten()
x = x.flatten()
y = y.flatten()
#COST=sum(F*D)
#cost function
fun = lambda F: sum(F*D)
jac = lambda F: D
#array of constraint
#the constraint of sum one is implicit given the later constraints
cons = []
#each row and columns should sum to the value of the start and destination array
cons += [ {'type': 'eq', 'fun': lambda F: sum(F.reshape((x.size,y.size))[i,:])-x[i]} for i in range(x.size) ]
cons += [ {'type': 'eq', 'fun': lambda F: sum(F.reshape((x.size,y.size))[:,i])-y[i]} for i in range(y.size) ]
#the values of F should be positive
bnds = (0, None)*F.size
from scipy.optimize import minimize
res = minimize(fun=fun, x0=F, method='SLSQP', jac=jac, bounds=bnds, constraints=cons)
the variable res contains the result of the minimization...but as I said I'm not sure why it complains about a singular matrix.
The only problem with this algorithm is that is not very fast, so it's not possible to do it on demand, but you have to perform it with patience on the creation of the dataset and store somewhere the results
You are embarking on a massive problem, referred to as "content based image retrieval", or CBIR. It's a massive and active field. There are no finished algorithms or standard approaches yet, although there are a lot of techniques all with varying levels of success.
Even Google image search doesn't do this (yet) - they do text-based image search - e.g., search for text in a page that's like the text you searched for. (And I'm sure they're working on using CBIR; it's the holy grail for a lot of image processing researchers)
If you have a tight deadline or need to get this done and working soon... yikes.
Here's a ton of papers on the topic:
http://scholar.google.com/scholar?q=content+based+image+retrieval
Generally you will need to do a few things:
Extract features (either at local interest points, or globally, or somehow, SIFT, SURF, histograms, etc.)
Cluster / build a model of image distributions
This can involve feature descriptors, image gists, multiple instance learning. etc.
I wrote a program to do something very similar maybe 2 years ago using Python/Cython. Later I rewrote it to Go to get better performance. The base idea comes from findimagedupes IIRC.
It basically computes a "fingerprint" for each image, and then compares these fingerprints to match similar images.
The fingerprint is generated by resizing the image to 160x160, converting it to grayscale, adding some blur, normalizing it, then resizing it to 16x16 monochrome. At the end you have 256 bits of output: that's your fingerprint. This is very easy to do using convert:
convert path[0] -sample 160x160! -modulate 100,0 -blur 3x99 \
-normalize -equalize -sample 16x16 -threshold 50% -monochrome mono:-
(The [0] in path[0] is used to only extract the first frame of animated GIFs; if you're not interested in such images you can just remove it.)
After applying this to 2 images, you will have 2 (256-bit) fingerprints, fp1 and fp2.
The similarity score of these 2 images is then computed by XORing these 2 values and counting the bits set to 1. To do this bit counting, you can use the bitsoncount() function from this answer:
# fp1 and fp2 are stored as lists of 8 (32-bit) integers
score = 0
for n in range(8):
score += bitsoncount(fp1[n] ^ fp2[n])
score will be a number between 0 and 256 indicating how similar your images are. In my application I divide it by 2.56 (normalize to 0-100) and I've found that images with a normalized score of 20 or less are often identical.
If you want to implement this method and use it to compare lots of images, I strongly suggest you use Cython (or just plain C) as much as possible: XORing and bit counting is very slow with pure Python integers.
I'm really sorry but I can't find my Python code anymore. Right now I only have a Go version, but I'm afraid I can't post it here (tightly integrated in some other code, and probably a little ugly as it was my first serious program in Go...).
There's also a very good "find by similarity" function in GQView/Geeqie; its source is here.
For a simpler implementation of Earth Mover's Distance (aka Wasserstein Distance) in Python, you could use Scipy:
from keras.preprocessing.image import load_img, img_to_array
from scipy.stats import wasserstein_distance
import numpy as np
def get_histogram(img):
'''
Get the histogram of an image. For an 8-bit, grayscale image, the
histogram will be a 256 unit vector in which the nth value indicates
the percent of the pixels in the image with the given darkness level.
The histogram's values sum to 1.
'''
h, w = img.shape[:2]
hist = [0.0] * 256
for i in range(h):
for j in range(w):
hist[img[i, j]] += 1
return np.array(hist) / (h * w)
a = img_to_array(load_img('a.jpg', grayscale=True))
b = img_to_array(load_img('b.jpg', grayscale=True))
a_hist = get_histogram(a)
b_hist = get_histogram(b)
dist = wasserstein_distance(a_hist, b_hist)
print(dist)

Categories

Resources