Subtract two histograms - python

I am trying to find the residual left behind when you subtract the pixel distributions of two different images (the images are 2D arrays).
I am trying to do something like this:
import numpy as np
hist1, bins1 = np.histogram(img1, bins=100)
hist2, bins2 = np.histogram(img2, bins=100)
residual = hist1 - hist2
However, the problem with this approach is that the two images have different minimum and maximum values, so np.histogram places the bin edges differently for each image, and the bins being subtracted in hist1 - hist2 do not cover the same value ranges.
I was wondering if there is an alternative, more elegant way of doing this.
Thanks.

import numpy as np
nbins = 100
# global minimum over both images (a scalar, not an element-wise array)
min_val = min(img1.min(), img2.min())
# global maximum over both images
max_val = max(img1.max(), img2.max())
# both histograms are built over the same fixed range, so their bins line up
hist1, _ = np.histogram(img1, range=(min_val, max_val), bins=nbins)
hist2, _ = np.histogram(img2, range=(min_val, max_val), bins=nbins)
# absolute difference, if only the magnitude of the residual matters
diff = np.absolute(hist1 - hist2)

You can explicitly define the bins in the np.histogram() call. If you pass the same bin edges to both calls, your code will work.
If your values are, say, between 0 and 255, you could do the following:
import numpy as np
hist1, bins1 = np.histogram(img1, bins=np.linspace(0, 255, 101))
hist2, bins2 = np.histogram(img2, bins=np.linspace(0, 255, 101))
residual = hist1 - hist2
This way you have 100 bins (101 edges) with the same boundaries, and the simple difference now makes sense (the code is not tested, but you get the idea).
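Both answers come down to giving the two np.histogram calls the same bin edges. A minimal sketch that derives those edges from the pooled pixel values with np.histogram_bin_edges (available in NumPy 1.15 and later); histogram_residual is a hypothetical helper name and the two images are synthetic.

```python
import numpy as np

def histogram_residual(img1, img2, nbins=100):
    """Difference of two image histograms computed on one shared set of bin edges."""
    # Pool both images so the edges span the smallest and largest value of the pair.
    pooled = np.concatenate([np.ravel(img1), np.ravel(img2)])
    edges = np.histogram_bin_edges(pooled, bins=nbins)
    hist1, _ = np.histogram(img1, bins=edges)
    hist2, _ = np.histogram(img2, bins=edges)
    return hist1 - hist2, edges

rng = np.random.default_rng(0)
img1 = rng.normal(100, 20, size=(64, 64))
img2 = rng.normal(120, 25, size=(64, 64))
residual, edges = histogram_residual(img1, img2)
print(residual.shape, edges.shape)  # (100,) (101,)
print(residual.sum())  # 0: both images contribute 64*64 counts
```

Because every pixel of both images falls inside the shared range, the residual always sums to zero; a nonzero bin means the two images put different numbers of pixels in that value range.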

Related

how to generate per-pixel histogram from many images in numpy?

I have tens of thousands of images, and I want to generate a histogram for each pixel. I have come up with the following NumPy code, which works:
import numpy as np
import matplotlib.pyplot as plt
nimages = 1000
im_shape = (64,64)
nbins = 100
#predefine the histogram bins
hist_bins = np.linspace(0,1,nbins)
#create an array to store histograms for each pixel
perpix_hist = np.zeros((64,64,nbins))
for ni in range(nimages):
    #create a simple image with normally distributed pixel values
    im = np.random.normal(loc=0.5,scale=0.05,size=im_shape)
    #sort each pixel into the predefined histogram
    bins_for_this_image = np.searchsorted(hist_bins, im.ravel())
    bins_for_this_image = bins_for_this_image.reshape(im_shape)
    #this next part adds one to each of those bins
    #but this is slow as it loops through each pixel
    #how to vectorize?
    for i in range(im_shape[0]):
        for j in range(im_shape[1]):
            perpix_hist[i,j,bins_for_this_image[i,j]] += 1
#plot histogram for a single pixel
plt.plot(hist_bins,perpix_hist[0,0])
plt.xlabel('pixel values')
plt.ylabel('counts')
plt.title('histogram for a single pixel')
plt.show()
Can anyone help me vectorize the for loops? I can't figure out how to index into the perpix_hist array properly. I have tens or hundreds of thousands of images, each about 1500x1500 pixels, and this is too slow.
You can vectorize it using np.meshgrid to provide the indices for the first and second dimensions (you already have the indices for the last dimension).
# with the default indexing='xy', x_grid[i, j] == i and y_grid[i, j] == j
y_grid, x_grid = np.meshgrid(np.arange(64), np.arange(64))
for i in range(nimages):
    #create a simple image with normally distributed pixel values
    im = np.random.normal(loc=0.5,scale=0.05,size=im_shape)
    #sort each pixel into the predefined histogram
    bins_for_this_image = np.searchsorted(hist_bins, im.ravel())
    bins_for_this_image = bins_for_this_image.reshape(im_shape)
    #one fancy-indexed update per image instead of a double loop over pixels
    perpix_hist[x_grid, y_grid, bins_for_this_image] += 1
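A self-contained version of the same idea can be sketched with np.indices in place of np.meshgrid to get the row and column index of every pixel. It also clips the np.searchsorted result, since a value above the last boundary would otherwise produce the out-of-range index nbins. accumulate_perpixel_hist is a hypothetical name and the image stack is synthetic.

```python
import numpy as np

def accumulate_perpixel_hist(images, hist_bins):
    """Per-pixel histogram of a stack of images, shape (H, W, len(hist_bins))."""
    nimages, h, w = images.shape
    nbins = len(hist_bins)
    perpix_hist = np.zeros((h, w, nbins), dtype=np.int64)
    # Row and column index of every pixel, each of shape (H, W).
    rows, cols = np.indices((h, w))
    for im in images:
        # searchsorted returns nbins for values above the last boundary, so clip it.
        idx = np.clip(np.searchsorted(hist_bins, im), 0, nbins - 1)
        # Each (row, col) pair occurs once per image, so plain fancy-index += is safe.
        perpix_hist[rows, cols, idx] += 1
    return perpix_hist

rng = np.random.default_rng(0)
hist_bins = np.linspace(0, 1, 100)
images = rng.normal(loc=0.5, scale=0.05, size=(20, 64, 64))
perpix_hist = accumulate_perpixel_hist(images, hist_bins)
print(perpix_hist.shape)  # (64, 64, 100)
```

If a whole stack fits in memory, the remaining loop over images could also be replaced by one np.add.at(perpix_hist, (rows, cols, idx), 1) call with idx of shape (nimages, H, W); np.add.at is required there because the same (row, col, bin) triple can repeat across images, which plain += would count only once.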

Use a numpy mask to determine indices for imshow

Using the small reproducible example below, I create a mask, and I would then like to programmatically determine the min and max x and y indices of where the mask is False (i.e., where the values are not masked). In this and the larger 'real-world' example, the masked values will always be spatially continuous; there are no 'islands' in the mask. The goal is to use the programmatically determined indices to zoom into the non-masked values with imshow. I attempt to depict what I'm seeking to do in the image at the end of the post.
import numpy as np
import matplotlib.pyplot as plt
# Generate a large array
arr1 = np.random.rand(100,100)
# Generate a smaller array that will help
# set the mask used below
arr2 = np.random.rand(20,10) + 1
# Insert the smaller array into the larger
# array for demonstration purposes
arr1[60:80,10:20] = arr2
# boost a few values neighboring the inserted array for demonstration purposes
arr1[59,12] += 2
arr1[70:75,20] += 2
arr1[80,13:16] += 2
arr1[64:72,9] += 2
# For demonstration, plot arr1
fig, ax = plt.subplots(figsize=(20, 15))
im = ax.imshow(arr1)
plt.show()
# Generate a mask with an example condition
mask = arr1 < 1
Using the mask, how does one determine what the values of x_min, x_max, y_min and y_max in the following line of code should be
im = ax.imshow(arr1[y_min:y_max, x_min:x_max])
such that imshow would be zoomed in to where the red box is in the following figure? Unless I have my wires crossed, I think the answer for this small example would be y_min=59, y_max=80, x_min=9 and x_max=20.
The following code should work:
y, x = np.where(~mask) # ~ negates the boolean array
x_min = x.min()
x_max = x.max()
y_min = y.min()
y_max = y.max()
plt.imshow(arr1[y_min:y_max+1, x_min:x_max+1]) # +1 because slice end points are exclusive
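An equivalent sketch that avoids materialising the coordinates of every unmasked pixel reduces the mask along each axis with np.any and takes the first and last row and column that contain an unmasked value. It rebuilds the question's example with a seeded generator (np.random.default_rng(0) is an assumption made only for reproducibility); unmasked_bbox is a hypothetical helper name.

```python
import numpy as np

def unmasked_bbox(mask):
    """Inclusive (y_min, y_max, x_min, x_max) bounds of the False (unmasked) region."""
    keep = ~mask
    # A row or column is kept if it holds at least one unmasked value.
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    return rows[0], rows[-1], cols[0], cols[-1]

# Rebuild the question's example with a seeded generator.
rng = np.random.default_rng(0)
arr1 = rng.random((100, 100))
arr1[60:80, 10:20] = rng.random((20, 10)) + 1
arr1[59, 12] += 2
arr1[70:75, 20] += 2
arr1[80, 13:16] += 2
arr1[64:72, 9] += 2
mask = arr1 < 1

y_min, y_max, x_min, x_max = unmasked_bbox(mask)
print(y_min, y_max, x_min, x_max)  # 59 80 9 20
# Slice ends are exclusive, hence the + 1 when zooming in.
zoomed = arr1[y_min:y_max + 1, x_min:x_max + 1]
```

For a single contiguous unmasked region this gives the same bounds as np.where; the difference is only that the intermediate arrays have length H and W instead of one entry per unmasked pixel.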

Find the centre value of the two highest peaks in a histogram Python

I am trying to find the 'middle' value between the highest and second highest peak in a histogram. I can do this manually, of course, but I want to create an automated method. To calculate my histogram I use:
hist= cv2.calcHist([gray_scale_img], [0], None, [256], [0, 256])
So far I have only figured out how to find the maximum peak using max = np.argmax(hist). I have attached an image; the red marks what I am aiming to find.
HISTOGRAM IMAGE
Here is how you can compute the index and value between the top 2 peaks of the histogram (using OpenCV and Python 3).
import numpy as np
import cv2
img = cv2.imread('../test.jpg', cv2.IMREAD_GRAYSCALE)
#Compute histogram
hist = cv2.calcHist([img], [0], None, [256], [0, 256])
#Convert histogram to simple list
hist = [val[0] for val in hist]
#Generate a list of indices
indices = list(range(0, 256))
#Descending sort-by-key with histogram value as key
s = [(x,y) for y,x in sorted(zip(hist,indices), reverse=True)]
#Index of highest peak in histogram
index_of_highest_peak = s[0][0]
#Index of second highest peak in histogram
index_of_second_highest_peak = s[1][0]
print(index_of_highest_peak)
print(index_of_second_highest_peak)
#If top 2 indices are adjacent to each other, there won't be a midpoint
if abs(index_of_highest_peak - index_of_second_highest_peak) < 2:
    raise Exception('Midpoint does not exist')
else: #Compute mid index
    midpoint = int( (index_of_highest_peak + index_of_second_highest_peak) / 2.0 )
    print('Index Between Top 2 Peaks = ', midpoint)
    print('Histogram Value At MidPoint = ', hist[midpoint])
I have assumed that if the top 2 peaks are adjacent to each other, there is no midpoint. You can handle this case according to your needs.
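Sorting raw bin counts picks the two tallest bins, which in a real histogram are often neighbours on the same peak. A hedged alternative is to consider only local maxima. The NumPy-only sketch below treats a bin as a peak when it is strictly above its left neighbour and not below its right one, then takes the two tallest such peaks; midpoint_between_top_two_peaks is a hypothetical name and the histogram is synthetic. Noisy histograms have many tiny local maxima, so smoothing first (for example np.convolve with a small box kernel) is likely to help, and scipy.signal.find_peaks offers finer control (height, distance, prominence) if SciPy is available.

```python
import numpy as np

def midpoint_between_top_two_peaks(hist):
    """Midpoint index between the two tallest local maxima of a 1-D histogram."""
    h = np.asarray(hist, dtype=float).ravel()  # also flattens cv2's (256, 1) output
    # Interior local maxima: above the left neighbour, not below the right one.
    is_peak = (h[1:-1] > h[:-2]) & (h[1:-1] >= h[2:])
    peaks = np.flatnonzero(is_peak) + 1
    if peaks.size < 2:
        raise ValueError('fewer than two local maxima found')
    # The two peaks with the largest counts, in left-to-right order.
    top2 = np.sort(peaks[np.argsort(h[peaks])[-2:]])
    return (top2[0] + top2[1]) // 2, top2

x = np.arange(256)
hist = (1000 * np.exp(-(x - 60) ** 2 / (2 * 15 ** 2))
        + 600 * np.exp(-(x - 180) ** 2 / (2 * 20 ** 2)))
midpoint, peaks = midpoint_between_top_two_peaks(hist)
print(list(peaks), midpoint)  # peaks at 60 and 180, midpoint 120
```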
What you are attempting seems very similar to the Otsu thresholding algorithm. In that case you can use
ret, otsu = cv2.threshold(gray_scale_img , 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
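For intuition about the value ret returned by cv2.threshold with THRESH_OTSU, here is a minimal NumPy sketch of Otsu's criterion applied to a 1-D histogram such as the 256-bin one from cv2.calcHist. It picks the level that maximises the between-class variance, which is equivalent to minimising the within-class variance. otsu_threshold is a hypothetical name, and its result is not guaranteed to match OpenCV's exactly (tie-breaking and bin conventions may differ), so cv2.threshold remains the practical choice.

```python
import numpy as np

def otsu_threshold(hist):
    """Histogram level that maximises Otsu's between-class variance."""
    p = np.asarray(hist, dtype=float).ravel()
    p = p / p.sum()                    # counts -> probabilities
    levels = np.arange(p.size)
    w0 = np.cumsum(p)                  # weight of the class at or below each level
    w1 = 1.0 - w0                      # weight of the class above it
    mu = np.cumsum(p * levels)         # cumulative first moment
    mu_total = mu[-1]
    # sigma_b^2(t) = (mu_total*w0 - mu)^2 / (w0*w1); 0/0 where a class is empty.
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))

x = np.arange(256)
hist = (1000 * np.exp(-(x - 60) ** 2 / (2 * 15 ** 2))
        + 1000 * np.exp(-(x - 180) ** 2 / (2 * 15 ** 2)))
t = otsu_threshold(hist)
print(t)  # a level roughly halfway between the two modes
```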
If your data are strongly trimodal and Otsu's method is not satisfactory, the counterpart would be to apply k-means clustering with 3 clusters. Both Otsu's method and k-means classify data by minimizing the within-class variance. In code:
import numpy as np
from sklearn.cluster import KMeans
# Make trimodal data
# (if image is large, consider downsampling it)
n1 = np.random.normal(80, 10, 200)
n2 = np.random.normal(120, 10, 200)
n3 = np.random.normal(180, 10, 400)
imflat = np.r_[n1, n2, n3]
# shuffle and reshape
np.random.shuffle(imflat)
im = imflat.reshape(-1, 1)
km = KMeans(n_clusters=3, random_state=0).fit(im)
lab = km.labels_
# maximum data value in each cluster
[im[np.argwhere(lab==i)].flatten().max() for i in range(3)]
This doesn't identify which of the 3 clusters has the highest peak in the histogram, though. Apologies for the incomplete solution. Another suggestion would be to fit a 6th-order polynomial to your histogram and then find its turning points.
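The polynomial suggestion can be sketched with numpy.polynomial.Polynomial: fit the histogram, take the real roots of the derivative inside the data range, and keep those where the second derivative is negative (local maxima). polynomial_maxima is a hypothetical name and deg=6 simply follows the suggestion above. A low-degree fit smooths the histogram heavily, so the located maxima are approximate, and edge effects can introduce spurious turning points.

```python
import numpy as np
from numpy.polynomial import Polynomial

def polynomial_maxima(hist, deg=6):
    """Local maxima of a least-squares polynomial fitted to a 1-D histogram, tallest first."""
    h = np.asarray(hist, dtype=float).ravel()
    x = np.arange(h.size)
    p = Polynomial.fit(x, h, deg)  # fitted on an internally scaled domain
    dp, d2p = p.deriv(), p.deriv(2)
    # Turning points: real roots of the first derivative inside the data range.
    roots = dp.roots()
    roots = roots[np.isreal(roots)].real
    roots = roots[(roots >= x[0]) & (roots <= x[-1])]
    # Negative curvature marks a maximum; order the maxima by fitted height.
    maxima = roots[d2p(roots) < 0]
    return maxima[np.argsort(p(maxima))[::-1]]

x = np.arange(256)
hist = (1000 * np.exp(-(x - 60) ** 2 / (2 * 20 ** 2))
        + 1000 * np.exp(-(x - 190) ** 2 / (2 * 20 ** 2)))
maxima = polynomial_maxima(hist)
if maxima.size >= 2:
    print('midpoint between the two tallest fitted peaks:', maxima[:2].mean())
```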

matplotlib: plot hist2d piecewise

I would like to plot a large sample stored in the arrays a and b with matplotlib's hist2d feature. However, generating H, xedges, yedges, img does not work directly for this data, as it uses too much memory. It works for half the number of samples, though, so I would like to do something like
H_1, xedges_1, yedges_1, img_1 = plt.hist2d(a[:len(a)//2], b[:len(b)//2], bins = 10)
followed by
H_2, xedges_2, yedges_2, img_2 = plt.hist2d(a[len(a)//2:], b[len(b)//2:], bins = 10)
while perhaps deleting the first half of the arrays after calculating the first set of variables. Is there a way to merge these two sets of variables and generate a combined plot for the data?
If (and only if!) you specify the bin edges manually, your histograms will be compatible. You can then simply add the occurrences in each bin for both subsets, and you'll recover the full histogram:
import numpy as np
import matplotlib.pyplot as plt
a=np.random.rand(200)*10
b=np.random.rand(200)*10
binmin=min(a.min(),b.min())
binmax=max(a.max(),b.max())
H_1, xedges_1, yedges_1, img_1 = plt.hist2d(a[:len(a)//2], b[:len(b)//2], bins = np.linspace(binmin,binmax,10+1))
H_2, xedges_2, yedges_2, img_2 = plt.hist2d(a[len(a)//2:], b[len(b)//2:], bins = np.linspace(binmin,binmax,10+1))
H_3, xedges_3, yedges_3, img_3 = plt.hist2d(a, b, bins = np.linspace(binmin,binmax,10+1))
Result:
In [150]: (H_1+H_2==H_3).all()
Out[150]: True
You can easily plot this with plt.pcolor. That's what hist2d seems to use, albeit with an additional transpose of the data:
plt.figure()
plt.pcolor((H_1+H_2).T)
img_3 (left) vs (H_1+H_2).T (right):
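Following the same idea, the partial histograms need not be drawn at all: np.histogram2d counts can be accumulated over slices with fixed edges and plotted once at the end. A minimal sketch; chunked_hist2d and its chunk size are hypothetical choices, not part of Matplotlib's API, and this only helps if a and b themselves fit in memory.

```python
import numpy as np

def chunked_hist2d(a, b, xedges, yedges, chunk=1_000_000):
    """Accumulate 2-D histogram counts slice by slice using fixed bin edges."""
    H = np.zeros((len(xedges) - 1, len(yedges) - 1))
    for start in range(0, len(a), chunk):
        # Only one slice of a and b is histogrammed at a time.
        h, _, _ = np.histogram2d(a[start:start + chunk], b[start:start + chunk],
                                 bins=[xedges, yedges])
        H += h
    return H

rng = np.random.default_rng(0)
a = rng.random(200) * 10
b = rng.random(200) * 10
edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), 10 + 1)
H = chunked_hist2d(a, b, edges, edges, chunk=50)
H_full, _, _ = np.histogram2d(a, b, bins=[edges, edges])
print(np.array_equal(H, H_full))  # True
```

The accumulated counts can then be drawn with plt.pcolormesh(edges, edges, H.T); the transpose is needed because np.histogram2d puts the first coordinate along the rows of H.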

Simultaneously fit linearly every line of a 2d numpy array

I am working in Python on image analysis. I have an image (2d numpy array) with some intensity drift in it. I want to level it.
To remove the increasing/decreasing intensity over the width of the image, I want to fit every row of the 2d numpy array with a line. I however do not want to loop through every row index.
MWE:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
width=1500
height=2500
fill_fun = lambda x,a,b : a*x+b
play_image = fill_fun(np.tile(np.arange(width),(height,1)),0.15,2)+np.random.random( (height,width) )
#For representation purposes:
#plt.imshow(play_image,cmap='Greys_r')
#plt.show()
#1) Fit every row and kill the intensity decrease/increase tendency
fit_func = lambda p,x: p[0]*x+p[1]
errfunc = lambda p, x, y: fit_func(p, x) - y # residuals to the target function
x_axis=np.arange(width)
for i in range(height):
    row_val=play_image[i,:]
    p0=[(row_val[-1]-row_val[0])/float(width),row_val[0]] #guess
    p1, success = optimize.leastsq(errfunc, p0[:], args=(x_axis,row_val))
    play_image[i,:]-= fit_func(p1,x_axis)-p1[1]
By doing this I effectively level my image intensity horizontally. Is there any way I can replace the loop with a matrix operation, i.e. somehow fit all the lines at the same time with a (height,2) parameter vector?
Thanks for the help
Fitting a line is a simple formula to use directly, which can be done in about three short lines of numpy (most of the code below just makes and plots the data and fits):
import numpy as np
import matplotlib.pyplot as plt
# make the data as sequential sections of a circle
theta = np.linspace(np.pi, 0, 120)
y = np.reshape(np.sin(theta), (10,12))
x = np.repeat(np.arange(12)[None,:], 10, axis=0)
# fit the line
m = lambda x: np.mean(x, axis=1)
beta = ( m(y*x) - m(x)*m(y) )/(m(x*x) - m(x)**2)
alpha = m(y) - beta*m(x)
# plot the data and fits
plt.plot([y[:,i] for i in range(12)], ".") # plot the data
plt.gca().set_prop_cycle(None) # reset the color cycle (replaces the older set_color_cycle)
fits = alpha[:,None] + beta[:,None]*x # make lines from the fits for the plots
plt.plot(fits.T)
plt.show()
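The three "fit the line" lines are the closed-form ordinary least-squares estimates, evaluated row by row; the bars denote the mean over each row, which is what m computes:

```latex
\hat\beta = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2},
\qquad
\hat\alpha = \bar{y} - \hat\beta\,\bar{x}
```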
You can implement the normal equations and their solution pretty easily. The main challenge is keeping track of the appropriate dimensions so all the vectorized operations work correctly. Here's one method:
import numpy as np
# image size
m = 100
n = 125
# A random image to work with.
np.random.seed(123)
img = np.random.randint(0, 100, size=(m, n))
# X is the design matrix. It is the same for each row. It has shape (n, 2).
X = np.column_stack((np.ones(n), np.arange(n)))
# A is X.T.dot(X), but in this case we can use an explicit formula for each term.
s1 = 0.5*n*(n - 1) # Sum of integers
s2 = n*(n - 0.5)*(n - 1)/3.0 # Sum of squared integers
A = np.array([[n, s1], [s1, s2]])
# Y has shape (2, m). Each column is a vector on the right-hand-side of the
# normal equations.
Y = X.T.dot(img.T)
# Solve the normal equations. beta has shape (2, m). Each column gives the
# coefficients of the linear fit for each row of img.
beta = np.linalg.solve(A, Y)
# Create an array that holds the linear drift for each row.
# X has shape (n, 2) and beta has shape (2, m), so row_drift has shape (m, n),
# the same as img.
row_drift = X.dot(beta).T
# Remove the drift from img.
img2 = img - row_drift
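np.polyfit can also fit every row at once: when y is 2-D it fits each column as an independent data set sharing the same x and returns coefficients of shape (deg + 1, K). A minimal sketch (remove_row_drift is a hypothetical name); passing img.T turns the rows into columns.

```python
import numpy as np

def remove_row_drift(img):
    """Fit y = slope*x + intercept to every row of img in one call and subtract it."""
    img = np.asarray(img, dtype=float)
    x = np.arange(img.shape[1])
    # polyfit fits each column of a 2-D y separately, so pass the rows as columns.
    slope, intercept = np.polyfit(x, img.T, 1)  # each has shape (n_rows,)
    drift = slope[:, None] * x + intercept[:, None]
    return img - drift, slope, intercept

# Synthetic image whose rows are exact lines with known coefficients.
true_slope = 0.1 * np.arange(5)
true_intercept = np.arange(5, dtype=float)
x = np.arange(7)
img = true_slope[:, None] * x + true_intercept[:, None]
leveled, slope, intercept = remove_row_drift(img)
print(np.allclose(slope, true_slope), np.allclose(intercept, true_intercept))  # True True
```

This subtracts the full fitted line; to keep each row's offset as the question's loop does, subtract only slope[:, None] * x. np.polyfit solves the problem with a least-squares routine rather than forming the normal equations explicitly.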
