How to segment nearby elements in a binary image using Python/Opencv - python

I have this binary image in which each 'curve' represents a hat from a pile of these objects. It was obtained by thresholding a region of the original image of stacked straw hats.
As you can see, these curves have many gaps and holes inside their shapes, which makes it difficult to use a technique like cv.connectedComponents to count the objects in the image, which is my goal.
I think that if there were some technique to fill in these gaps and, mainly, the holes in smaller parts of the original binary image, like the ones I'm showing below, maybe by connecting nearby elements or by detecting and filling contours, it would be possible to segment each curve as an individual element.

Not the most elegant way, but it should be simple enough.
Consider a vertical slice of width w (the same as the slices you posted in your question). If you sum the white pixels along the rows of the slice, you should get six nice "peaks" corresponding to the six rims of the hats:
However, since the rims are rounded, some vertical slices would be better than others for this sort of estimation.
Therefore, I suggest looking at all slices of width w and counting the peaks for each slice.
Here's a Matlab code that does this
img = imread(''); % read the image
bw = img(:,:,1)>128; % convert to binary
w = 75; % width of slice
all_slices = imfilter(single(bw), ones(1,w)/w, 'symmetric')>.5; % compute horizontal sum of all slices using filter
% a peak is a slice with more than 50% "white" pixels
peaks = diff( all_slices, 1, 1 ) > 0; % detect the peaks using vertical diff
count_per_slice = sum( peaks, 1 ); % how many peaks each slice think it sees
Looking at the distribution of the count_per_slice:
You see that although many slices predict the wrong number of hats (between 4 and 9), the majority votes for the correct number, 6:
num_hats = mode(count_per_slice); % take the mode of the distribution.
A python code that does the same (assuming bw is a numpy array of shape (h,w) and of dtype bool):
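from scipy import signal
import numpy as np
w = 75
# fraction of white pixels in every horizontal window of width w, thresholded at 50%
all_slices = signal.convolve2d(bw.astype('f4'), np.ones((1, w), dtype='f4') / float(w), mode='same', boundary='symm') > 0.5
# cast to a signed integer type first: np.diff on a boolean array marks rising and falling edges alike
peaks = np.diff(all_slices.astype(np.int8), n=1, axis=0) > 0
count_per_slice = peaks.sum(axis=0)
# most frequent count over all slices
num_hats = np.bincount(count_per_slice).argmax()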
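As a sanity check of the voting idea, here is a minimal NumPy-only sketch run on a synthetic image with six horizontal bands. The function name count_bands and the synthetic image are illustrative assumptions, not part of the original answer, and the moving average uses np.convolve with mode="valid" rather than a symmetric boundary. It also assumes the image is at least w pixels wide.

```python
import numpy as np

def count_bands(bw, w=75):
    """Majority vote, over all width-w slices, of the number of white bands."""
    bw = np.asarray(bw, dtype=float)
    kernel = np.ones(w) / w
    # Fraction of white pixels in every horizontal window of width w.
    frac = np.array([np.convolve(row, kernel, mode="valid") for row in bw])
    on = (frac > 0.5).astype(np.int8)
    # Prepend an empty row so a band touching the top edge still has a rising edge.
    on = np.vstack([np.zeros((1, on.shape[1]), dtype=np.int8), on])
    rising = np.diff(on, axis=0) == 1
    counts = rising.sum(axis=0)
    # The most frequent per-slice count wins the vote.
    return int(np.bincount(counts).argmax())

# Synthetic test image (assumption): six 5-pixel-thick bands, 20 rows apart.
demo = np.zeros((130, 200), dtype=bool)
for top in range(10, 130, 20):
    demo[top:top + 5, :] = True
print(count_bands(demo))
```

On real data the bands are curved and broken, so the per-slice counts will disagree; the vote is what makes the estimate robust.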


Numpy array - Two unknown dimensions - png files

I have a dataset consisting of a bunch of PNG files with different heights and widths.
I read in these files with the following code to get a NumPy array. In this case it is 2D, but what I actually want is a 3D array of shape (n, h, w), where n is the number of images, h the image height and w the image width.
import os.path
import glob
import numpy as np
def open_images(images_directory):
    pattern_to_match = os.path.join(images_directory, "*.png")
    png_files = (x for x in glob.iglob(pattern_to_match)
                 if os.path.isfile(x))
    for current_png_filename in png_files:
        print("Opening file", current_png_filename)
        with open(current_png_filename, "rb") as current_png_file:
            data = current_png_file.read()
        return np.frombuffer(data, dtype=np.uint8, offset=16)\
            .reshape(-1, 3)
directory_to_search = r"C:\Users\tobis\OneDrive\Desktop\Masterarbeit\data\2017-IWT4S-HDR_LP-dataset\crop_h1"
At the moment, I get an array with a shape like this: (21559, 3). I think the first number is a combination of width and height and the last is the RGB value. I would like to get an array that looks like this one: (n, h, w).
Is there a way to get such an array? Unfortunately, I have two unknown dimensions. This seems to be the problem...
You can't just read an image file like that. You need to use a library to read it and interpret the height, width, colourspace, bits per pixel, date, the GPS data, the camera make and model and all the compressed, encoded pixels.
For example, with PIL/Pillow:
from PIL import Image
import numpy as np
# Open image and make sure it is RGB - not palette
im = Image.open('image.png').convert('RGB')
# Make into Numpy array
na = np.array(im)
# Check shape
print(na.shape) # prints (480,640,3) for height, width, channels
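To get from individual images to the (n, h, w) array asked for, the images have to share one height and width. One possible approach, sketched below under assumptions, is to convert each image to greyscale and zero-pad it to the largest height and width in the set. The helper names stack_padded and load_png_stack are hypothetical, and padding (rather than resizing or cropping) is a choice made here for illustration only.

```python
import numpy as np

def stack_padded(arrays, fill=0):
    """Stack 2-D arrays of differing sizes into one (n, h, w) array, padding with `fill`."""
    h = max(a.shape[0] for a in arrays)
    w = max(a.shape[1] for a in arrays)
    out = np.full((len(arrays), h, w), fill, dtype=arrays[0].dtype)
    for k, a in enumerate(arrays):
        # Each image goes in the top-left corner of its padded slot.
        out[k, :a.shape[0], :a.shape[1]] = a
    return out

def load_png_stack(directory):
    """Read every PNG in `directory` as greyscale and stack the results."""
    import glob
    import os
    from PIL import Image
    files = sorted(glob.glob(os.path.join(directory, "*.png")))
    arrays = [np.array(Image.open(f).convert("L")) for f in files]
    return stack_padded(arrays)
```

If the downstream code cannot tolerate padding, resizing every image to a fixed size before stacking is the usual alternative.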
If you have a flattened image and would like to recover the original row and column dimensions, you can apply a heuristic that tests various possible combinations and checks the "smoothness" of the image along the row axis. This smoothness can be checked, for example, via the mean squared error of consecutive rows. This assumes that the original image has some kind of structure, also along the row axis, so the change between consecutive rows in the original image will be relatively small when compared to other possible shapes.
For example let's say the original image is 155 x 79 pixels and it has been flattened into an array of 155 * 79 == 12245. The prime factorization of this is 5, 31, 79. So the possible row dimensions are all unique combinations of these prime factors, i.e. 5, 31, 79, 155, 395, 2449. Now these possible row dimensions, in the following referred to as estimates, can be sorted into two different categories:
Estimates which are a divisor of the original row dimension: 5, 31 and 155. This means effectively that multiple row-skipped copies of the original image are stacked next to each other. So the resulting image will retain the original column grouping. Since similar columns remain together each element of the stack will have roughly the same smoothness. For example if the estimate is 31 this means that the original shape 31 x 5 , 79 is transformed to 31 , 5 x 79, i.e. only every 5-th row of the original image is considered and five such copies are stacked next to each other. For the original image (i.e. an estimate of 155) length-1 correlations are considered (i.e. each pair of consecutive rows is compared), while for an estimate of 31 length-5 correlations are considered (i.e. comparing row-pairs that have another 4 rows between them). Since the original image is expected to have some smooth structure, the smoothness should decrease when longer ranges are compared. The decrease in smoothness will be bigger when the skip-range increases, but it can also completely vanish if the image contains some degree of periodicity along the row axis.
All other estimates: 79, 395, 2449. For estimates of this category different columns of the original image are mixed in the test image corresponding to the estimate. For example if the estimate is 79 we have 155 % 79 == 76, i.e. each new row in the test image shifts the original columns by 3 with respect to the previous row. Assuming that the original image varies along the column dimension these shifts will introduce an increasingly strong deviation for the emerging consecutive rows. Since this column shift increases from row to row the resulting decrease in row-smoothness should be strong unless the number of rows is small. If the original image is column-periodic with the shift number of the estimate this can lead to a perfect agreement however.
So to summarize, if we compute the smoothness for all row dimension estimates we expect the smoothness to decrease for a wrong estimate and the decrease will be small if the estimate falls in category (1) and bigger if it falls in category (2).
Important: If the images are periodic along either the row or column dimension this can lead to a false estimate.
The implementation needs to cover the following steps:
Compute the prime factorization of the length of the flattened image.
Compute all unique row dimension estimates from combinations of the prime factors.
For each estimate compute the row-smoothness of the resulting test image. For example use the mean squared error of consecutive rows (actually this will be a non-smoothness score).
Find the best estimate from the scores.
Here is some example code for the implementation:
import itertools as it
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
image = np.array(Image.open('example.jpg'))
original_shape = image.shape[:2]
image = image.reshape(-1, 3)
def compute_prime_factors(n):
    i = 2
    while i <= n:
        if n % i == 0:
            n //= i
            yield i
        else:
            # only advance once i no longer divides n, so repeated factors are kept
            i += 1
prime_factors = list(compute_prime_factors(len(image)))
combinations = it.chain.from_iterable(it.combinations(prime_factors, r=i) for i in range(1, len(prime_factors)))
row_dims = sorted({int(np.prod(x)) for x in combinations})
def test_row_dim(r):
    c = len(image) // r
    # cast to float so that differences of uint8 pixels do not wrap around
    test = image.reshape(r, c, 3).astype(float)
    return np.mean((test[1:] - test[:-1])**2)
scores = [test_row_dim(r) for r in row_dims]
best_estimate = row_dims[np.argmin(scores)]
fig, ax = plt.subplots()
ax.set(xlabel='row dimension', ylabel='score')
ax.plot(row_dims, scores, '-o', label='Estimations')
ax.plot([best_estimate], [np.min(scores)], '*', ms=12, label=f'Best Estimate ({best_estimate})')
ax.axvline(original_shape[0], label=f'Actual Dim ({original_shape[0]})', color='#2ca02c', zorder=-100, lw=1.5, ls='--')
plt.imshow(image.reshape(205, -1, 3)) # second best score
Let's test it on some image (H x W: 410 x 640):
Photo by Cameron Venti on Unsplash
This produces the following estimate scores:
The peaks to the left of the best estimate are the category (1) estimates that have the smallest row-skip. The prime factorization of 410 and 640 is 2*5*41 and 2**7 * 5 respectively. So the category (1) estimates that get closest to the original row dimension are 205, 82 and 41 (the side peaks from right to left). A decreasing estimate implies an increasing row-skip range and hence an increasing MSE score. The peak to the left of the best estimate corresponds to an estimate of 205, i.e. each second row gets skipped and hence two such row-skipped versions are stacked next to each other:
As you can imagine, by skipping every second row, the image doesn't change too much and the change is the same for the two side-by-side versions. Hence the small difference to the original image's MSE score.
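The arithmetic of the 155 x 79 example can be checked with a few lines of plain Python. This is only a sketch of the candidate enumeration and of the split into the two categories; the function names are hypothetical, and the divisors are found by trial division rather than from the prime factorization.

```python
def candidate_row_dims(n_pixels):
    """All divisors of n_pixels except 1 and n_pixels itself, in ascending order."""
    small = [d for d in range(2, int(n_pixels ** 0.5) + 1) if n_pixels % d == 0]
    large = [n_pixels // d for d in small]
    return sorted(set(small + large))

def split_by_category(estimates, true_rows):
    """Category (1): divisors of the true row count; category (2): everything else."""
    cat1 = [r for r in estimates if true_rows % r == 0]
    cat2 = [r for r in estimates if true_rows % r != 0]
    return cat1, cat2

dims = candidate_row_dims(155 * 79)
print(dims)
print(split_by_category(dims, true_rows=155))
```

In practice the true row count is unknown, so split_by_category is useful only for understanding the score plot after the fact.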

Smoothing without filling missing values with zeros

I'd like to smooth a map that doesn't cover the full sky. This map is not Gaussian nor has it mean zero, so the default behavior of healpy in which it fills missing values with 0 leads to a bias towards lower values at the edges of this mask:
import numpy as np
import healpy as hp
nside = 128
npix = hp.nside2npix(nside)
arr = np.ones(npix)
mask = np.zeros(npix, dtype=bool)
mask[:mask.size//2] = True
arr[~mask] = hp.UNSEEN
arr_sm = hp.smoothing(arr, fwhm=np.radians(5.))
hp.mollview(arr, title='Input array')
hp.mollview(arr_sm, title='Smoothed array')
I would like to preserve the sharp edge by setting the weight of the masked values to zero, instead of setting the values to zero. This appears to be difficult because healpy performs the smoothing in harmonic space.
To be more specific, I'd like to mimic the mode keyword of scipy.ndimage.gaussian_filter(). healpy.smoothing() implicitly uses mode='constant' with cval=0, but I'd require something like mode='reflect'.
Is there any reasonable way to overcome this issue?
The easiest way to handle this is to remove the mean of the map, perform the smoothing with hp.smoothing, then add the offset back.
This works around the issue because now the map is zero-mean so zero-filling does not create a border effect.
def masked_smoothing(m, fwhm_deg=5.0):
    # make sure m is a masked healpy array
    m = hp.ma(m)
    offset = m.mean()
    smoothed = hp.smoothing(m - offset, fwhm=np.radians(fwhm_deg))
    return smoothed + offset
The other option I can think of is an iterative algorithm that fills the map in "reflect" mode before smoothing, possibly implemented in Cython or Numba. The main issue is how complex your boundary is. If it is simple, like a latitude cut, all of this is easy; the general case is very complex and could have a lot of corner cases that you need to handle:
Identify "border layers"
    get all the missing pixels
    find their neighbors, find which missing pixels have a valid neighbor and mark them as the "first border"
    repeat this step to find pixels that have a "first border" pixel as a neighbor and mark them as the "second border"
    repeat until you have all the layers you need
Fill reflected values
    loop on border layers
        loop on each layer pixel
            find the valid neighbors, compute their barycenter, now assume that the line between the border pixel center and the barycenter is going perpendicular through the mask boundary and the mask boundary is halfway
            now extend this line by doubling it in the direction inside the mask, take the interpolated value of the map at that location and assign it to the current missing pixel
    repeat this for the other layers by playing with the length of the line.
This problem is related to the following question and answer (disclaimer: from me):
It can be transferred to your case as follows:
import numpy as np
import healpy as hp
nside = 128
npix = hp.nside2npix(nside)
# using random numbers here to see the actual smoothing
arr = np.random.rand(npix)
mask = np.zeros(npix, dtype=bool)
mask[:mask.size//2] = True
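def masked_smoothing(U, rad=5.0):
    # smooth the zero-filled map and the validity weights separately, then divide
    V = np.where(np.isnan(U), 0.0, U)
    W = (~np.isnan(U)).astype(float)
    VV = hp.smoothing(V, fwhm=np.radians(rad))
    WW = hp.smoothing(W, fwhm=np.radians(rad))
    return VV / WW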
# setting array to np.nan to exclude the pixels in the smoothing
arr[~mask] = np.nan
arr_sm = masked_smoothing(arr)
arr_sm[~mask] = hp.UNSEEN
hp.mollview(arr, title='Input array')
hp.mollview(arr_sm, title='Smoothed array')
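The idea of giving masked pixels zero weight can be seen in isolation on a flat 1-D array, without healpy. The sketch below is a planar analogue only: the function name is hypothetical, the kernel is a truncated Gaussian built by hand, and the input is assumed to be at least as long as the kernel. It smooths the zero-filled data and the validity weights with the same kernel and divides the two.

```python
import numpy as np

def masked_gaussian_smooth_1d(values, sigma=2.0):
    """Smooth a 1-D array while giving NaN samples zero weight."""
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    # Truncated, normalized Gaussian kernel (about 3 sigma on each side).
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # Smooth zero-filled data and the weights with the same kernel.
    v = np.convolve(np.where(valid, values, 0.0), kernel, mode="same")
    w = np.convolve(valid.astype(float), kernel, mode="same")
    # Where no valid sample contributes, the result stays NaN.
    out = np.full(values.shape, np.nan)
    np.divide(v, w, out=out, where=w > 0)
    return out

data = np.ones(50)
data[25:] = np.nan          # second half is "unobserved"
smoothed = masked_gaussian_smooth_1d(data)
print(smoothed[20:25])
```

With plain zero-filling the values next to the mask edge would be pulled towards zero; dividing by the smoothed weights removes that bias for a constant map.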

Python fractal box count - fractal dimension

I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import scipy
def rgb2gray(rgb):
    r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray
def fractal_dimension(Z, threshold=0.9):
    # Only for 2d image
    assert(len(Z.shape) == 2)
    # From (#87)
    def boxcount(Z, k):
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
        # We count non-empty (0) and non-full boxes (k*k)
        return len(np.where((S > 0) & (S < k*k))[0])
    # Transform Z into a binary array
    Z = (Z < threshold)
    # Minimal dimension of image
    p = min(Z.shape)
    # Greatest power of 2 less than or equal to p
    n = 2**np.floor(np.log(p)/np.log(2))
    # Extract the exponent
    n = int(np.log(n)/np.log(2))
    # Build successive box sizes (from 2**n down to 2**1)
    sizes = 2**np.arange(n, 1, -1)
    # Actual box counting with decreasing size
    counts = []
    for size in sizes:
        counts.append(boxcount(Z, size))
    # Fit the successive log(sizes) with log (counts)
    coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -coeffs[0]
I = rgb2gray(scipy.misc.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value
The results it gives me are in the opposite direction than what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension for the natural image to be larger than for the urban
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
With the fractal dimension of something physical, the dimension might converge to different values at different stages. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable in size to the boxes used.
Let's see the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is going towards a value of two.
To diagnose, let's take a look at the grey-scale images produced with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimension would go to a value of 2 very soon, as the graphs told us. That is because we pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for urban and nature respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future, good threshold values would be ones closer to the mean of the grey-scale image; that way your image does not turn into a blob of ink!
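A minimal sketch of that advice, thresholding at the image mean before counting boxes, could look as follows. The names binarize_at_mean and box_count_dimension are hypothetical. Note that this variant counts every occupied box, which is simpler than the boundary-box count in the question, and it assumes the mask is not empty (otherwise the logarithm of a zero count appears).

```python
import numpy as np

def binarize_at_mean(gray):
    """Foreground = pixels darker than the mean grey level."""
    gray = np.asarray(gray, dtype=float)
    return gray < gray.mean()

def box_count_dimension(mask):
    """Box-counting dimension from the number of occupied boxes at dyadic sizes."""
    mask = np.asarray(mask, dtype=bool)
    # Crop to the largest power-of-two square so the boxes tile the image exactly.
    n = 2 ** int(np.floor(np.log2(min(mask.shape))))
    mask = mask[:n, :n]
    sizes, counts = [], []
    k = n // 2
    while k >= 1:
        # Group pixels into k-by-k boxes and mark boxes containing any foreground.
        blocks = mask.reshape(n // k, k, n // k, k)
        occupied = blocks.any(axis=(1, 3))
        sizes.append(k)
        counts.append(int(occupied.sum()))
        k //= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

print(box_count_dimension(np.eye(256, dtype=bool)))        # a thin diagonal line
print(box_count_dimension(np.ones((256, 256), dtype=bool)))  # a filled square
```

Checking the estimator on shapes of known dimension, such as a line and a filled square, is a quick way to see whether a surprising result comes from the image or from the code.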
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.

Peak detection in a noisy 2d array

I'm trying to get python to return, as close as possible, the center of the most obvious clustering in an image like the one below:
In my previous question I asked how to get the global maximum and the local maxima of a 2D array, and the answers given worked perfectly. The issue is that the center estimate I get by averaging the global maximum obtained with different bin sizes is always slightly off from the one I would set by eye, because I'm only accounting for the biggest bin instead of a group of the biggest bins (as one does by eye).
I tried adapting the answer to this question to my problem, but it turns out my image is too noisy for that algorithm to work. Here's my code implementing that answer:
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.ndimage import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
from os import getcwd
from os.path import join, realpath, dirname
# Save path to dir where this code exists.
mypath = realpath(join(getcwd(), dirname(__file__)))
myfile = 'data_file.dat'
x, y = np.loadtxt(join(mypath,myfile), usecols=(1, 2), unpack=True)
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
rang = [[xmin, xmax], [ymin, ymax]]
paws = []
for d_b in range(25, 110, 25):
    # Number of bins in x,y given the bin width 'd_b'
    binsxy = [int((xmax - xmin) / d_b), int((ymax - ymin) / d_b)]
    H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
    paws.append(H)
def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when
    the pixel's value is the neighborhood maximum, 0 otherwise)
    """
    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2,2)
    #apply the local maximum filter; all pixel of maximal value
    #in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood)==image
    #local_max is a mask that contains the peaks we are
    #looking for, but also the background.
    #In order to isolate the peaks we must remove the background from the mask.
    #we create the mask of the background
    background = (image==0)
    #a little technicality: we must erode the background in order to
    #successfully subtract it from local_max, otherwise a line will
    #appear along the background border (artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
    #we obtain the final mask, containing only peaks,
    #by removing the background from the local_max mask
    #(XOR, because NumPy does not allow subtracting boolean arrays)
    detected_peaks = local_max ^ eroded_background
    return detected_peaks
#applying the detection and plotting results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4,2,(2*i+2) )
and here's the result of that (varying the bin size):
Clearly my background is too noisy for that algorithm to work, so the question is: how can I make that algorithm less sensitive? If an alternative solution exists then please let me know.
Following Bi Rico's advice, I attempted smoothing my 2D array before passing it on to the local maximum finder, like so:
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
H1 = gaussian_filter(H, 2, mode='nearest')
These were the results with a sigma of 2, 4 and 8:
Using mode='constant' seems to work much better than 'nearest'. It converges to the right center with sigma=2 for the largest bin size:
So, how do I get the coordinates of the maximum that shows in the last image?
To answer the last part of your question: whenever you have points in an image, you can find their coordinates by searching, in some order, for the local maxima of the image. If your data is not a point source, you can apply a mask to each peak so that the peak's neighborhood is not picked up as a maximum in a later search. I propose the following code:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import copy
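def get_std(image):
    return np.std(image)
def get_max(image,sigma,alpha=20,size=10):
    i_out = []
    j_out = []
    image_temp = copy.deepcopy(image)
    while True:
        k = np.argmax(image_temp)
        j,i = np.unravel_index(k, image_temp.shape)
        if(image_temp[j,i] >= alpha*sigma):
            # record the peak, then zero a square of half-width 'size' around it
            i_out.append(i)
            j_out.append(j)
            x = np.arange(i-size, i+size)
            y = np.arange(j-size, j+size)
            xv,yv = np.meshgrid(x,y)
            image_temp[yv.clip(0,image_temp.shape[0]-1),
                       xv.clip(0,image_temp.shape[1]-1) ] = 0
        else:
            # the largest remaining value is below the noise threshold
            break
    return i_out,j_out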
#reading the image
image = mpimg.imread('ggd4.jpg')
#computing the standard deviation of the image
sigma = get_std(image)
#getting the peaks
i,j = get_max(image[:,:,0],sigma, alpha=10, size=10)
#let's see the results
plt.imshow(image, origin='lower')
plt.plot(i,j,'ro', markersize=10, alpha=0.5)
The image ggd4 for the test can be downloaded from:
The first part is to get some information about the noise in the image. I did it by computing the standard deviation of the full image (it is actually better to select a small rectangle without signal). This tells us how much noise is present in the image.
The idea for getting the peaks is to ask for successive maxima that are above a certain threshold (let's say 3, 4, 5, 10, or 20 times the noise). This is what the function get_max is actually doing: it keeps searching for maxima until one of them is below the threshold imposed by the noise. To avoid finding the same maximum many times, it is necessary to remove each found peak from the image. In general, the shape of the mask used to do so depends strongly on the problem one wants to solve. For the case of stars, it would be good to remove the star using a Gaussian function or something similar. For simplicity I have chosen a square function, and the size of the function (in pixels) is the variable "size".
I think that from this example, anybody can improve the code by adding more general things.
The original image looks like:
While the image after identifying the luminous points looks like this:
Too much of a n00b on Stack Overflow to comment on Alejandro's answer elsewhere here. I would refine his code a bit to use a preallocated numpy array for output:
def get_max(image,sigma,alpha=3,size=10):
    from copy import deepcopy
    import numpy as np
    # preallocate a lot of peak storage
    k_arr = np.zeros((10000,2))
    image_temp = deepcopy(image)
    peak_ct = 0
    while True:
        k = np.argmax(image_temp)
        j,i = np.unravel_index(k, image_temp.shape)
        if(image_temp[j,i] >= alpha*sigma):
            k_arr[peak_ct] = [j,i]
            # this is the part that masks already-found peaks.
            x = np.arange(i-size, i+size)
            y = np.arange(j-size, j+size)
            xv,yv = np.meshgrid(x,y)
            # the clip here handles edge cases where the peak is near the
            # image edge
            image_temp[yv.clip(0,image_temp.shape[0]-1),
                       xv.clip(0,image_temp.shape[1]-1) ] = 0
            peak_ct += 1
        else:
            break
    # trim the output for only what we've actually found
    return k_arr[:peak_ct]
In profiling this and Alejandro's code using his example image, this code is about 33% faster (0.03 sec for Alejandro's code, 0.02 sec for mine). I expect that on images with larger numbers of peaks it would be even faster, since appending the output to a list will get slower and slower for more peaks.
I think the first step needed here is to express the values in H in terms of the standard deviation of the field:
import numpy as np
H = H / np.std(H)
Now you can put a threshold on the values of this H. If the noise is assumed to be Gaussian, picking a threshold of 3 you can be quite sure (99.7%) that this pixel can be associated with a real peak and not noise. See here.
Now the further selection can start. It is not exactly clear to me what exactly you want to find. Do you want the exact location of peak values? Or do you want one location for a cluster of peaks which is in the middle of this cluster?
Anyway, starting from this point with all pixel values expressed in standard deviations of the field, you should be able to get what you want. If you want to find clusters you could perform a nearest neighbour search on the >3-sigma gridpoints and put a threshold on the distance. I.e. only connect them when they are close enough to each other. If several gridpoints are connected you can define this as a group/cluster and calculate some (sigma-weighted?) center of the cluster.
Hope my first contribution on Stackoverflow is useful for you!
The way I would do it:
1) normalize H between 0 and 1.
2) pick a threshold value, as tcaswell suggests. It could be between .9 and .99 for example
3) use masked arrays to keep only the x,y coordinates with H above threshold:
import numpy.ma as ma
x_masked = ma.masked_array(x, mask=H < threshold)
y_masked = ma.masked_array(y, mask=H < threshold)
4) now you can weight-average on the masked coordinates, with weight something like (H-threshold)^2, or any other power greater or equal to one, depending on your taste/tests.
1) This is not robust with respect to the type of peaks you have, since you may have to adapt the threshold. This is the minor problem;
2) This DOES NOT work with two peaks as it is, and will give wrong results if the 2nd peak is above threshold.
Nonetheless, it will always give you an answer without crashing (with pros and cons of the thing..)
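The four steps can be sketched on the 2D histogram itself, using bin centres as coordinates. This is a minimal sketch under assumptions: the function name weighted_peak_center is hypothetical, H is assumed to come from np.histogram2d (so its first axis is x) and not to be constant, and the weight is the (H - threshold)^2 suggested above.

```python
import numpy as np

def weighted_peak_center(H, xedges, yedges, threshold=0.9, power=2):
    """Weighted mean position of the bins whose normalized height exceeds `threshold`."""
    H = np.asarray(H, dtype=float)
    # Step 1: normalize H to the range [0, 1].
    Hn = (H - H.min()) / (H.max() - H.min())
    # Bin centres; indexing="ij" matches H[i, j] = (x bin i, y bin j).
    xc = 0.5 * (xedges[:-1] + xedges[1:])
    yc = 0.5 * (yedges[:-1] + yedges[1:])
    X, Y = np.meshgrid(xc, yc, indexing="ij")
    # Steps 2-4: zero weight below threshold, (Hn - threshold)**power above it.
    w = np.where(Hn >= threshold, (Hn - threshold) ** power, 0.0)
    return float((w * X).sum() / w.sum()), float((w * Y).sum() / w.sum())

H_demo = np.zeros((5, 4))
H_demo[2, 1] = 10.0
print(weighted_peak_center(H_demo, np.arange(6.0), np.arange(5.0)))
```

As noted in the caveats, a second peak above the threshold would drag the estimate towards the midpoint between the two peaks.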
I'm adding this answer because it's the solution I ended up using. It's a combination of Bi Rico's comment here (May 30 at 18:54) and the answer given in this question: Find peak of 2d histogram.
As it turns out using the peak detection algorithm from this question Peak detection in a 2D array only complicates matters. After applying the Gaussian filter to the image all that needs to be done is to ask for the maximum bin (as Bi Rico pointed out) and then obtain the maximum in coordinates.
So instead of using the detect-peaks function as I did above, I simply add the following code after the Gaussian 2D histogram is obtained:
# Get 2D histogram.
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
# Get Gaussian filtered 2D histogram.
H1 = gaussian_filter(H, 2, mode='nearest')
# Get center of maximum in bin coordinates.
x_cent_bin, y_cent_bin = np.unravel_index(H1.argmax(), H1.shape)
# Get center in x,y coordinates.
x_cent_coord, y_cent_coord = np.average(xedges[x_cent_bin:x_cent_bin + 2]), np.average(yedges[y_cent_bin:y_cent_bin + 2])

Image comparison algorithm

I'm trying to compare images to each other to find out whether they are different. First I tried computing a Pearson correlation of the RGB values, which works quite well unless the pictures are shifted a little bit. So if I have two 100% identical images but one is moved a little, I get a bad correlation value.
Any suggestions for a better algorithm?
BTW, I'm talking about comparing thousands of images...
Here is an example of my pictures (microscopic):
im1 and im2 are the same but a little bit shifted/cropped; im3 should be recognized as completely different...
Problem is solved with the suggestions of Peter Hansen! Works very well! Thanks to all answers! Some results can be found here
A similar question was asked a year ago and has numerous responses, including one regarding pixelizing the images, which I was going to suggest as at least a pre-qualification step (as it would exclude very non-similar images quite quickly).
There are also links there to still-earlier questions which have even more references and good answers.
Here's an implementation using some of the ideas with Scipy, using your above three images (saved as im1.jpg, im2.jpg, im3.jpg, respectively). The final output shows im1 compared with itself, as a baseline, and then each image compared with the others.
>>> import scipy as sp
>>> from scipy.misc import imread
>>> from scipy.signal.signaltools import correlate2d as c2d
>>> def get(i):
...     # get JPG image as Scipy array, RGB (3 layer)
...     data = imread('im%s.jpg' % i)
...     # convert to grey-scale using W3C luminance calc
...     data = sp.inner(data, [299, 587, 114]) / 1000.0
...     # normalize per
...     return (data - data.mean()) / data.std()
>>> im1 = get(1)
>>> im2 = get(2)
>>> im3 = get(3)
>>> im1.shape
(105, 401)
>>> im2.shape
(109, 373)
>>> im3.shape
(121, 457)
>>> c11 = c2d(im1, im1, mode='same') # baseline
>>> c12 = c2d(im1, im2, mode='same')
>>> c13 = c2d(im1, im3, mode='same')
>>> c23 = c2d(im2, im3, mode='same')
>>> c11.max(), c12.max(), c13.max(), c23.max()
(42105.00000000259, 39898.103896795357, 16482.883608327804, 15873.465425120798)
So note that im1 compared with itself gives a score of 42105, im2 compared with im1 is not far off that, but im3 compared with either of the others gives well under half that value. You'd have to experiment with other images to see how well this might perform and how you might improve it.
Run time is long... several minutes on my machine. I would try some pre-filtering to avoid wasting time comparing very dissimilar images, maybe with the "compare jpg file size" trick mentioned in responses to the other question, or with pixelization. The fact that you have images of different sizes complicates things, but you didn't give enough information about the extent of butchering one might expect, so it's hard to give a specific answer that takes that into account.
I have once done this with an image histogram comparison. My basic algorithm was this:
Split the image into red, green and blue
Create normalized histograms for the red, green and blue channels and concatenate them into a vector (r0...rn, g0...gn, b0...bn), where n is the number of "buckets"; 256 should be enough
Subtract this histogram from the histogram of another image and calculate the distance
Here is some code with numpy and PIL:
r = numpy.asarray(im.convert( "RGB", (1,0,0,0, 1,0,0,0, 1,0,0,0) ))
g = numpy.asarray(im.convert( "RGB", (0,1,0,0, 0,1,0,0, 0,1,0,0) ))
b = numpy.asarray(im.convert( "RGB", (0,0,1,0, 0,0,1,0, 0,0,1,0) ))
hr, h_bins = numpy.histogram(r, bins=256, density=True)
hg, h_bins = numpy.histogram(g, bins=256, density=True)
hb, h_bins = numpy.histogram(b, bins=256, density=True)
hist = numpy.array([hr, hg, hb]).ravel()
if you have two histograms, you can get the distance like this:
diff = hist1 - hist2
distance = numpy.sqrt(numpy.dot(diff, diff))
If the two images are identical, the distance is 0, the more they diverge, the greater the distance.
It worked quite well for photos for me but failed on graphics like texts and logos.
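For reference, the same histogram comparison can be written against plain NumPy arrays. This is a sketch under assumptions: the function names are hypothetical, the inputs are taken to be (h, w, 3) arrays with values in 0..255, and the channels are split by indexing rather than with the PIL conversion matrices above.

```python
import numpy as np

def rgb_histogram(img, bins=256):
    """Concatenated, normalized per-channel histograms of an (h, w, 3) array."""
    img = np.asarray(img)
    hists = [
        np.histogram(img[..., c], bins=bins, range=(0, 256), density=True)[0]
        for c in range(3)
    ]
    return np.concatenate(hists)

def histogram_distance(img1, img2, bins=256):
    """Euclidean distance between the concatenated histograms of two images."""
    diff = rgb_histogram(img1, bins) - rgb_histogram(img2, bins)
    return float(np.sqrt(np.dot(diff, diff)))

black = np.zeros((4, 4, 3), dtype=np.uint8)
white = np.full((4, 4, 3), 255, dtype=np.uint8)
print(histogram_distance(black, black), histogram_distance(black, white))
```

Because the histograms are normalized, the two images do not need to have the same size, and a pure shift of the content leaves the distance unchanged.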
You really need to specify the question better, but, looking at those 5 images, the organisms all seem to be oriented the same way. If this is always the case, you can try doing a normalized cross-correlation between the two images and taking the peak value as your degree of similarity. I don't know of a normalized cross-correlation function in Python, but there is a similar fftconvolve() function and you can do the circular cross-correlation yourself:
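a = asarray(Image.open('c603225337.jpg').convert('L'))
b = asarray(Image.open('9b78f22f42.jpg').convert('L'))
f1 = rfftn(a)
f2 = rfftn(b)
# conjugating one spectrum gives a correlation; the plain product would be a convolution
g = f1 * conj(f2)
c = irfftn(g)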
This won't work as written since the images are different sizes, and the output isn't weighted or normalized at all.
The location of the peak value of the output indicates the offset between the two images, and the magnitude of the peak indicates the similarity. There should be a way to weight/normalize it so that you can tell the difference between a good match and a poor match.
This isn't as good of an answer as I want, since I haven't figured out how to normalize it yet, but I'll update it if I figure it out, and it will give you an idea to look into.
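One possible way to handle both open points, the different image sizes and the missing normalization, is sketched below. It is an assumption-laden sketch, not the answerer's method: both images have their mean removed, are zero-padded to a common shape, and the correlation surface is divided by the product of their norms so that the peak is at most 1. The function names are hypothetical and constant images (zero variance) are not handled.

```python
import numpy as np

def circular_ncc(a, b):
    """Normalized circular cross-correlation surface of two 2-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    # Zero-pad both images to a common shape (zero equals the mean after centering).
    shape = (max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1]))
    fa = np.fft.rfft2(a, s=shape)
    fb = np.fft.rfft2(b, s=shape)
    corr = np.fft.irfft2(fa * np.conj(fb), s=shape)
    # Cauchy-Schwarz bounds the result by 1 after this normalization.
    return corr / np.sqrt((a ** 2).sum() * (b ** 2).sum())

def best_offset(a, b):
    """Location and height of the correlation peak: (dy, dx, score)."""
    corr = circular_ncc(a, b)
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx), float(corr[dy, dx])

rng = np.random.default_rng(0)
reference = rng.random((32, 40))
moved = np.roll(reference, (3, 5), axis=(0, 1))
print(best_offset(moved, reference))
```

A score near 1 indicates the same content up to a circular shift; what score separates a good match from a poor one on real images has to be found by experiment.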
If your problem is about shifted pixels, maybe you should compare against a frequency transform.
The FFT should be OK (numpy has an implementation for 2D matrices), but I'm always hearing that Wavelets are better for this kind of tasks ^_^
About the performance, if all the images are of the same size, if I remember well, the FFTW package created an specialised function for each FFT input size, so you can get a nice performance boost reusing the same code... I don't know if numpy is based on FFTW, but if it's not maybe you could try to investigate a little bit there.
Here you have a prototype... you can play a little bit with it to see which threshold fits with your images.
from PIL import Image
import numpy
import sys
def main():
    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])
    # Only images with the same size and the same bands can be compared.
    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1
    s = 0
    for band in img1.getbands():
        # Converting a single-band image yields a (height, width) array.
        m1 = numpy.fft.fft2(numpy.asarray(img1.getchannel(band), dtype=float))
        m2 = numpy.fft.fft2(numpy.asarray(img2.getchannel(band), dtype=float))
        # Accumulate the total magnitude of the spectral difference.
        s += numpy.sum(numpy.abs(m1 - m2))
    print(s)
if __name__ == "__main__":
    sys.exit(main())
Another way to proceed might be blurring the images, then subtracting the pixel values of one image from the other. If the difference is nonzero, you can shift one of the images 1 px in each direction and compare again. If the difference is lower than in the previous step, you can keep shifting in the direction of the gradient and subtracting until the difference falls below a certain threshold or increases again. That should work if the radius of the blurring kernel is larger than the shift between the images.
Also, you can try some of the tools commonly used in photography workflows for blending multiple exposures or building panoramas, like the Pano Tools.
I took an image processing course long ago, and I remember that when matching I normally started by making the image grayscale and then sharpening the edges of the image so you only see edges. You (the software) can then shift and subtract the images until the difference is minimal.
If that difference is larger than the threshold you set, the images are not equal and you can move on to the next. Images with a difference below the threshold can then be analyzed further.
I do think that at best you can radically thin out possible matches, but will need to personally compare possible matches to determine they're really equal.
I can't really show code as it was a long time ago, and I used Khoros/Cantata for that course.
First off, correlation is a very CPU-intensive and rather inaccurate measure of similarity. Why not just go for the sum of the squares of the differences between individual pixels?
A simple solution, if the maximum shift is limited: generate all possible shifted images and find the one that is the best match. Make sure you calculate your match variable (i.e. correlation) only over the subset of pixels that can be matched in all shifted images. Also, your maximum shift should be significantly smaller than the size of your images.
If you want to use some more advanced image processing techniques, I suggest you look at SIFT. This is a very powerful method that (theoretically, anyway) can properly match items in images independent of translation, rotation and scale.
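The exhaustive search over a limited shift suggested above could look roughly like this. It is only a sketch under a few assumptions: both images are 2-D grayscale numpy arrays of the same shape, the match variable is the mean squared difference, and the function name `best_shift` is made up. The score is computed on a central window that stays inside both images for every candidate shift, so all shifts are compared over the same pixels.

```python
import numpy as np

def best_shift(reference, image, max_shift):
    """Try every integer shift up to max_shift; return ((dy, dx), score).

    The returned shift satisfies image[y + dy, x + dx] ~ reference[y, x],
    and score is the mean squared difference over the common window.
    """
    reference = np.asarray(reference, dtype=float)
    image = np.asarray(image, dtype=float)
    if reference.shape != image.shape:
        raise ValueError("images must have the same shape")
    h, w = reference.shape
    m = max_shift
    if h <= 2 * m or w <= 2 * m:
        raise ValueError("max_shift must be much smaller than the image size")
    # Central window of the reference that every shifted window can cover.
    core = reference[m:h - m, m:w - m]
    best_vec, best_score = (0, 0), np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            window = image[m + dy:h - m + dy, m + dx:w - m + dx]
            score = np.mean((core - window) ** 2)
            if score < best_score:
                best_vec, best_score = (dy, dx), score
    return best_vec, float(best_score)
```

The cost grows with the square of `max_shift`, which is one more reason to keep the maximum shift small relative to the image.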
I guess you could do something like this:
Estimate the vertical / horizontal displacement of the reference image vs. the comparison image. A simple SAD (sum of absolute differences) with motion vectors would do.
Shift the comparison image accordingly.
Compute the Pearson correlation you were trying to do.
Shift measurement is not difficult.
Take a region (say about 32x32) in comparison image.
Shift it by x pixels in horizontal and y pixels in vertical direction.
Compute the SAD (sum of absolute difference) w.r.t. original image
Do this for several values of x and y in a small range (-10, +10)
Find the place where the difference is minimum
Pick that value as the shift motion vector
If the SAD is very high for all values of x and y, you can assume that the images are highly dissimilar and that shift measurement is not necessary.
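The steps above can be sketched as follows, assuming two grayscale numpy arrays of the same shape. The function names `estimate_shift_sad` and `pearson_after_shift` are hypothetical, and using a single block taken from the centre of the comparison image is a simplification chosen for the example:

```python
import numpy as np

def estimate_shift_sad(reference, comparison, block=32, search=10):
    """Return ((dy, dx), sad) so that comparison[y, x] ~ reference[y + dy, x + dx]."""
    reference = np.asarray(reference, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    if reference.shape != comparison.shape:
        raise ValueError("images must have the same shape")
    h, w = comparison.shape
    top, left = (h - block) // 2, (w - block) // 2
    if top < search or left < search:
        raise ValueError("image too small for this block size and search range")
    # Central block of the comparison image.
    patch = comparison[top:top + block, left:left + block]
    best_vec, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top + dy, left + dx
            candidate = reference[r0:r0 + block, c0:c0 + block]
            sad = np.abs(patch - candidate).sum()
            if sad < best_sad:
                best_vec, best_sad = (dy, dx), sad
    return best_vec, float(best_sad)

def pearson_after_shift(reference, comparison, vec):
    """Pearson correlation over the region where the shifted images overlap."""
    reference = np.asarray(reference, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    dy, dx = vec
    h, w = comparison.shape
    # Rows and columns of the comparison image that have a partner in the reference.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    a = comparison[y0:y1, x0:x1].ravel()
    b = reference[y0 + dy:y1 + dy, x0 + dx:x1 + dx].ravel()
    return float(np.corrcoef(a, b)[0, 1])
```

If the returned SAD is still large relative to the block size, the images can be treated as dissimilar without computing the correlation at all.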
To get the imports to work correctly on my Ubuntu 16.04 (as of April 2017), I installed python 2.7 and these:
sudo apt-get install python-dev
sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk
sudo apt-get install python-scipy
sudo pip install pillow
Then I changed Snowflake's imports to these:
import scipy as sp
from scipy.ndimage import imread
from scipy.signal.signaltools import correlate2d as c2d
How awesome that Snowflake's script worked for me 8 years later!
I propose a solution based on the Jaccard index of similarity on the image histograms. See:
You can compute the difference in the distribution of the pixel colors. This is indeed pretty invariant to translations.
from PIL.Image import Image
from typing import List
def jaccard_similarity(im1: Image, im2: Image) -> float:
    """Compute the similarity between two images.
    First, a histogram of the pixel distribution is extracted for each image.
    Then the histograms are compared using the weighted Jaccard index of
    similarity (also called Ruzicka similarity), defined as:
        Jsimilarity = sum(min(b1_i, b2_i)) / sum(max(b1_i, b2_i))
    where b1_i and b2_i are the ith histogram bins of images 1 and 2, respectively.
    The two images must have the same resolution and number of channels (depth)."""
    if im1.size != im2.size:
        raise ValueError("Images must have the same size. Found {} and {}".format(im1.size, im2.size))
    n_channels_1 = len(im1.getbands())
    n_channels_2 = len(im2.getbands())
    if n_channels_1 != n_channels_2:
        raise ValueError("Images must have the same number of channels. Found {} and {}".format(n_channels_1, n_channels_2))
    sum_mins = 0
    sum_maxs = 0
    hi1 = im1.histogram()  # type: List[int]
    hi2 = im2.histogram()  # type: List[int]
    # Since the two images have the same number of channels, they must have the same number of histogram bins.
    assert len(hi1) == len(hi2)
    for b1, b2 in zip(hi1, hi2):
        sum_mins += min(b1, b2)
        sum_maxs += max(b1, b2)
    jaccard_index = sum_mins / sum_maxs
    return jaccard_index
Unlike mean squared error, the Jaccard index always lies in the range [0, 1], which allows comparisons among different image sizes.
You can then compare two images, but only after rescaling them to the same size! Otherwise the pixel counts have to be normalized somehow. I used this:
import sys
from skincare.common.utils import jaccard_similarity
import PIL.Image
from PIL.Image import Image
file1 = sys.argv[1]
file2 = sys.argv[2]
im1 = PIL.Image.open(file1)  # type: Image
im2 = PIL.Image.open(file2)  # type: Image
print("Image 1: mode={}, size={}".format(im1.mode, im1.size))
print("Image 2: mode={}, size={}".format(im2.mode, im2.size))
if im1.size != im2.size:
    print("Resizing image 2 to {}".format(im1.size))
    im2 = im2.resize(im1.size, resample=PIL.Image.BILINEAR)
j = jaccard_similarity(im1, im2)
print("Jaccard similarity index = {}".format(j))
Testing on your images:
$ python im1.jpg im2.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(373, 109)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.7238955686269157
$ python im1.jpg im3.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.22785529941822316
$ python im2.jpg im3.jpg
Image 1: mode=RGB, size=(373, 109)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (373, 109)
Jaccard similarity index = 0.29066426814105445
You might also consider experimenting with different resampling filters (like NEAREST or LANCZOS), as they of course alter the color distribution when resizing.
Additionally, keep in mind that swapping the images changes the results, as the second image might be downsampled instead of upsampled. (Cropping might suit your case better than rescaling.)
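As an alternative to resizing, the pixel counts can be normalized instead. The sketch below is an assumption about how that could be done rather than what I actually ran: it works on plain histogram lists (such as the output of `histogram()`), and the function name `weighted_jaccard` and its `normalize` flag are made up for this example.

```python
import numpy as np

def weighted_jaccard(hist1, hist2, normalize=True):
    """Weighted Jaccard (Ruzicka) similarity between two histograms.

    With normalize=True each histogram is divided by its total count first,
    so images with different numbers of pixels can be compared directly.
    """
    h1 = np.asarray(hist1, dtype=float)
    h2 = np.asarray(hist2, dtype=float)
    if h1.shape != h2.shape:
        raise ValueError("histograms must have the same number of bins")
    if h1.sum() == 0 or h2.sum() == 0:
        raise ValueError("histograms must not be empty")
    if normalize:
        h1 = h1 / h1.sum()
        h2 = h2 / h2.sum()
    return float(np.minimum(h1, h2).sum() / np.maximum(h1, h2).sum())
```

Calling `weighted_jaccard(im1.histogram(), im2.histogram())` would skip the resize step, so the result no longer depends on the resampling filter or on which image is resized, though the values will differ from the raw-count index shown above.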

