Create an Undistorted Top-Down View of Camera Image - python

I have a fixed camera mounted on a wall viewing a rectangular lawn at an angle. My goal is to obtain an undistorted, top-down view of the lawn.
I have an image from the camera as a Python NumPy array, which looks like this:
[Image: raw camera image]
I use an inverse matrix with skimage.transform.warp to correct the image to a top-down view:
[Image: top-down view, still distorted]
This works perfectly; however, the camera lens introduces barrel distortion.
Separately, I can correct the distortion with a lookup table generated by skimage.transform.warp_coords, passing it a simple undistort callable based on the algorithm described here.
The image is then generated using scipy.ndimage.map_coordinates:
[Image: undistorted camera view]
These two processes work individually, but how do I combine them to create an undistorted top-down view without creating an intermediate image?
I could run each point in the lookup table through the matrix to create a new table, but the table is massive and memory is tight (Raspberry Pi Zero).
I would like to define the undistortion as a matrix and just combine the two matrices, but as I understand it, the projective homography is linear (in homogeneous coordinates) while the undistortion is non-linear, so this can't be done. I can't use OpenCV due to resource constraints, and the calibration procedure involving multiple chessboard images is impractical. Currently, I calibrate by taking 4 lawn corner points and generating the matrix from them, which works well.
I would have expected this to be a common problem in computer vision, but I can't find any suitable solutions.

The barrel distortion is nonlinear, but it is also smooth. This means it can be well approximated piecewise by linear functions.
So you do not need a large, per-pixel look-up table of undistortion displacements. Rather, you can subsample it (or just scale it down) and use bilinear interpolation for the in-between pixels.
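A minimal sketch of that idea, assuming the coordinate-mapping callable follows the warp_coords convention of taking and returning an (N, 2) array of (col, row) coordinates. The helper names coarse_coord_map and upsample_coord_map and the grid spacing step are hypothetical. The map is evaluated only on a coarse grid and then bilinearly interpolated with scipy.ndimage.map_coordinates (order=1) back to a (2, rows, cols) map in the same layout warp_coords produces.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def coarse_coord_map(func, shape, step):
    """Evaluate an inverse mapping only on every `step`-th pixel.

    `func` takes an (N, 2) array of (col, row) output coordinates and returns
    the matching (N, 2) input coordinates. The grid is extended one step past
    the image edge so that every full-resolution pixel lies inside it.
    """
    rows, cols = shape
    r = np.arange(0, rows + step, step, dtype=float)
    c = np.arange(0, cols + step, step, dtype=float)
    cc, rr = np.meshgrid(c, r)
    src = func(np.column_stack([cc.ravel(), rr.ravel()]))
    # Layout (2, n_r, n_c): [0] = input rows, [1] = input cols.
    return np.stack([src[:, 1].reshape(rr.shape), src[:, 0].reshape(rr.shape)])


def upsample_coord_map(coarse, shape, step):
    """Bilinearly interpolate a coarse map to a full (2, rows, cols) map."""
    rows, cols = shape
    rr, cc = np.mgrid[0:rows, 0:cols].astype(float)
    # Position of each full-resolution pixel in coarse-grid units.
    pos = np.array([rr / step, cc / step])
    return np.stack([map_coordinates(coarse[k], pos, order=1) for k in range(2)])
```

With step=16 the coarse map holds roughly 1/256 as many entries. This sketch rebuilds the whole full-resolution map for simplicity; to actually save memory you would interpolate one horizontal strip of the output at a time.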

I have found a solution that appears to work: I created separate functions for the undistortion and the transformation, then chained them together.
The skimage source code here has the _apply_mat method for generating a mapping from a matrix. I based my unwarp function on that:
import numpy as np
def unwarp(coords, matrix):
    coords = np.atleast_2d(np.asarray(coords, dtype=float))
    x, y = np.transpose(coords)
    src = np.vstack((x, y, np.ones_like(x)))
    dst = src.T @ matrix.T
    # below, we will divide by the last dimension of the homogeneous
    # coordinate matrix. In order to avoid division by zero,
    # we replace exact zeros in this column with a very small number.
    dst[dst[:, 2] == 0, 2] = np.finfo(float).eps
    # divide by the homogeneous coordinate to get back to Cartesian (x, y)
    dst[:, :2] /= dst[:, 2:3]
    return dst[:, :2]
I created a similar function for undistorting, based on Tanner Helland's algorithm:
def undistort(coords, cols, rows, correction_radius, zoom):
    half_width = cols / 2
    half_height = rows / 2
    new_x = coords[:, 0] - half_width
    new_y = coords[:, 1] - half_height
    distance = np.hypot(new_x, new_y)
    r = distance / correction_radius
    theta = np.ones_like(r)
    # only process non-zero values
    np.divide(np.arctan(r), r, out=theta, where=r!=0)
    source_x = half_width + theta * new_x * zoom
    source_y = half_height + theta * new_y * zoom
    result = np.column_stack([source_x, source_y])
    return result
The only tricky bit here is the divide where we need to prevent division by zero.
Once we have both coordinate-mapping functions, we can chain them together:
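def undistort_unwarp(coords):
    # cols, rows, correction_radius, zoom and matrix come from the enclosing
    # scope, because warp_coords cannot pass extra arguments (see below)
    undistorted = undistort(coords, cols, rows, correction_radius, zoom)
    both = unwarp(undistorted, matrix)
    return both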
Note that it is this chained function that is passed as the callable to skimage.transform.warp_coords:
import skimage.transform as tf
mymap = tf.warp_coords(undistort_unwarp, shape=(rows, cols), dtype=np.int16)
The map can then be passed to the skimage.transform.warp function.
Francesco's answer was helpful; however, I needed the full pixel resolution for the transformation, so I used full resolution for the undistortion as well and looked for other ways to reduce the memory consumption.
Each map consumes rows * cols * bytes-per-item * 2 (x and y) bytes. The default datatype is float64, which requires 8 bytes per item, and the documentation suggests that sane choices would be the default or float32 at 4 bytes per item. I was able to reduce this to 2 bytes per item using int16 with no visible ill effects, but I suspect the spline interpolation is not being used fully (or at all).
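The memory arithmetic above as a quick sketch (the helper name coord_map_bytes is hypothetical, and the 640 x 480 frame size is only an example, not the camera's actual resolution). It also illustrates the int16 suspicion: with integer coordinates every sample lands exactly on a pixel, and an interpolating spline reproduces the stored value there, so no sub-pixel blending takes place whatever the interpolation order.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def coord_map_bytes(rows, cols, dtype):
    """rows * cols * bytes-per-item * 2 (one plane for rows, one for cols)."""
    return rows * cols * np.dtype(dtype).itemsize * 2


for dt in (np.float64, np.float32, np.int16):
    print(np.dtype(dt).name, coord_map_bytes(480, 640, dt))
# float64 4915200, float32 2457600, int16 1228800

# Integer-valued coordinates: even a cubic spline (order=3) just returns the
# original pixel values, because the spline interpolates the samples exactly.
img = np.random.default_rng(0).random((20, 30))
rr, cc = np.mgrid[0:20, 0:30]
int_coords = np.array([rr, cc]).astype(np.int16).astype(float)
sampled = map_coordinates(img, int_coords, order=3, mode='mirror')
```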
The map is the same for each channel of a colour RGB image. However, when I called warp_coords with shape=(rows, cols, 3) I got 3 duplicate maps, so I created a function to handle colour images by processing each channel separately:
def warp_colour(img_arr, coord_map):
    if img_arr.ndim == 3:
        # colour: warp each channel with the same 2-D coordinate map
        rows, cols, _chans = img_arr.shape
        r_arr = tf.warp(img_arr[:, :, 0], inverse_map=coord_map, output_shape=(rows, cols))
        g_arr = tf.warp(img_arr[:, :, 1], inverse_map=coord_map, output_shape=(rows, cols))
        b_arr = tf.warp(img_arr[:, :, 2], inverse_map=coord_map, output_shape=(rows, cols))
        return np.dstack([r_arr, g_arr, b_arr])
    # grayscale
    rows, cols = img_arr.shape
    return tf.warp(img_arr, inverse_map=coord_map, output_shape=(rows, cols))
One issue with skimage.transform.warp_coords is that it does not have the map_args dictionary parameter that skimage.transform.warp has. I had to call my unwarp and undistort functions through an intermediate function to add the parameters.
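One way to supply those extra parameters without writing the intermediate function by hand is functools.partial (a standard-library technique swapped in here, not something skimage provides). The sketch re-declares compact versions of the two functions so it runs on its own; the identity matrix, the 640 x 480 size and the correction_radius and zoom values are placeholders, not calibrated values.

```python
from functools import partial

import numpy as np


def unwarp(coords, matrix):
    """Apply a 3x3 homography to (N, 2) (col, row) coordinates."""
    coords = np.atleast_2d(np.asarray(coords, dtype=float))
    src = np.column_stack([coords, np.ones(len(coords))])
    dst = src @ matrix.T
    dst[dst[:, 2] == 0, 2] = np.finfo(float).eps  # avoid division by zero
    return dst[:, :2] / dst[:, 2:3]


def undistort(coords, cols, rows, correction_radius, zoom):
    """Tanner Helland-style arctan undistortion, as in the function above."""
    new_x = coords[:, 0] - cols / 2
    new_y = coords[:, 1] - rows / 2
    r = np.hypot(new_x, new_y) / correction_radius
    theta = np.ones_like(r)
    np.divide(np.arctan(r), r, out=theta, where=r != 0)
    return np.column_stack([cols / 2 + theta * new_x * zoom,
                            rows / 2 + theta * new_y * zoom])


def undistort_unwarp(coords, matrix, **undistort_args):
    # Same chaining order as above: undistort first, then the homography.
    return unwarp(undistort(coords, **undistort_args), matrix)


# Bind every parameter up front; the result takes only `coords`,
# which is all warp_coords passes to its callable.
coord_map = partial(undistort_unwarp, matrix=np.eye(3), cols=640, rows=480,
                    correction_radius=300.0, zoom=1.0)
# e.g. mymap = tf.warp_coords(coord_map, shape=(480, 640))
```

A lambda or a small closure would work equally well; partial simply keeps the bound values visible in one place.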


Calculate and use inverse of OpenCV distortion parameters

Basically, I want to find out what the inverse distortion parameters would be for a calibration I do as shown here.
NOTE: I know that we can perform undistortion and then use remapping to do what I have done below, but my goal is to find the actual inverted distortion parameters and use them to distort other images, not just to revert what cv2.undistort() does.
I have tried passing in the negation of the distortion parameters:
# _, mat, distortion, _, _ = cv2.calibrateCamera(...)
# undistorted_image = cv2.undistort(with distortion)
# redistorted_image = cv2.undistort(with np.negative(distortion))
In theory, I thought that if redistorted_image is similar to the original image, then the np.negative(distortion) parameters are what I am looking for, but this turned out to be false.
Actual method I use:
import cv2 as cv
import numpy as np
def save_undistorted_images(image, matrix, distortion):
    test_image_su = cv.imread(image)
    height, width = test_image_su.shape[:2]
    new_camera_mtx, roi = cv.getOptimalNewCameraMatrix(matrix, distortion, (width, height), 1, (width, height))
    distortion_u = np.negative(distortion)
    # unsure if this line helps
    new_camera_mtx_inv, roi = cv.getOptimalNewCameraMatrix(matrix, distortion_u, (width, height), 1, (width, height))
    # undistort the image
    undistorted_image = cv.undistort(test_image_su, matrix, distortion, None, new_camera_mtx)
    cv.imwrite('undistorted_frame.png', undistorted_image)
    # redistort, trying to get something like the original image (test_image_su)
    distorted_image = cv.undistort(undistorted_image, matrix, distortion_u, None, new_camera_mtx_inv)
    cv.imwrite('redistorted_frame.png', distorted_image)
The results:
[Images: (a, left) original; (b, right) undistorted]
[Images: (c, left) distorted using np.negative(distortion); (d, right) undistorted image redistorted using np.negative(distortion)]
Image d is basically the operation that produced c, applied to b, which I expected to look similar to a.
Why is b here overpowering the effect of c?
Another way of calculating the inverse that I tried:
The following is my Python implementation of this paper:
distortion_u = distortion.copy()  # copy, otherwise the calibrated distortion is modified in place
k1 = distortion_u[0][0]
k2 = distortion_u[0][1]
k3 = distortion_u[0][4]
b1 = -1 * k1
b2 = 3 * (k1 * k1) - k2
b3 = (8 * k1 * k2) + (-1 * (12 * (k1 * k1 * k1))) - k3
# radial:
distortion_u[0][0] = b1
distortion_u[0][1] = b2
distortion_u[0][4] = b3
# tangential:
#distortion_u[0][2] = -1 * distortion_u[0][2]
#distortion_u[0][3] = -1 * distortion_u[0][3]
The result of applying distortion to the undistorted image using the above distortion parameters is also not good; it looks very similar to the results above.
So, this brings us to:
Why is the effect of normal distortion always overpowering np.negative(distortion) or anything else?
Does all distortion work this way (negated values do not produce the opposite effect)?
How do I get the actual opposite distortion parameters?
I'm afraid you are doing it wrong. The OpenCV distortion model you have calibrated computes distorted image coordinates from undistorted, normalized ones. It is a nonlinear model, so inverting it (going from distorted back to undistorted coordinates) involves solving a system of nonlinear (polynomial) equations.
A closed form (parametric) solution exists AFAIK only for the case of single-parameter pure radial distortion, i.e. when the only nonzero distortion parameter is k1, the coefficient of r^2. In this case the model inversion equation reduces to a cubic equation in r, and you can then express the inverse model using Cardano's formula for the solution of the cubic.
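A sketch of that closed-form inversion, assuming the OpenCV-style single-coefficient radial model r_d = r_u (1 + k1 r_u^2) in normalized coordinates; the function name undistort_radius is hypothetical. Dividing by k1 gives the depressed cubic r_u^3 + p r_u + q = 0 with p = 1/k1 and q = -r_d/k1. Cardano's formula solves it when the discriminant is non-negative; when it is negative (three real roots, which happens for barrel distortion with k1 < 0) the equivalent trigonometric form is used and the root closest to r_d is kept.

```python
import numpy as np


def undistort_radius(r_d, k1):
    """Solve r_d = r_u * (1 + k1 * r_u**2) for r_u, elementwise."""
    r_d = np.atleast_1d(np.asarray(r_d, dtype=float))
    if k1 == 0:
        return r_d.copy()
    p = 1.0 / k1            # depressed cubic: r_u**3 + p*r_u + q = 0
    q = -r_d / k1
    disc = (q / 2) ** 2 + (p / 3) ** 3
    out = np.empty_like(r_d)
    one = disc >= 0
    # Cardano: one real root. For k1 < 0 this only happens beyond the radius
    # where the model folds over, and that root is then not physical.
    s = np.sqrt(disc[one])
    out[one] = np.cbrt(-q[one] / 2 + s) + np.cbrt(-q[one] / 2 - s)
    three = ~one
    if np.any(three):
        # Trigonometric form (p < 0 here); keep the root closest to r_d.
        m = 2 * np.sqrt(-p / 3)
        arg = np.clip(3 * q[three] / (2 * p) * np.sqrt(-3 / p), -1.0, 1.0)
        phi = np.arccos(arg) / 3
        roots = np.stack([m * np.cos(phi - 2 * np.pi * k / 3) for k in range(3)])
        best = np.argmin(np.abs(roots - r_d[three]), axis=0)
        out[three] = roots[best, np.arange(roots.shape[1])]
    return out
```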
In all other cases one inverts the model numerically, using various algorithms for solving the nonlinear system of equations. OpenCV uses an iterative "false-position" method.
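As an illustration of numerical inversion, the sketch below implements the five-coefficient model with coefficients in the order (k1, k2, p1, p2, k3), as in OpenCV's distortion vector, on normalized coordinates, and undoes it with a plain fixed-point iteration. This is not necessarily the exact scheme OpenCV uses internally, and the function names are hypothetical.

```python
import numpy as np


def distort_normalized(xy, k1, k2, p1, p2, k3):
    """Forward model: undistorted normalized (x, y) -> distorted (x, y)."""
    x, y = xy[:, 0], xy[:, 1]
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.column_stack([xd, yd])


def undistort_normalized(xy_d, k1, k2, p1, p2, k3, iterations=50):
    """Numerically invert distort_normalized by fixed-point iteration."""
    xd, yd = xy_d[:, 0], xy_d[:, 1]
    x, y = xd.copy(), yd.copy()          # start from "no distortion"
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        # Solve xd = x * radial + dx for x, holding radial and dx fixed.
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return np.column_stack([x, y])
```

The iteration converges quickly for moderate distortion near the image center; for strong distortion at the image corners it may need more iterations or may not converge at all.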
Since you want to use the inverse model to un-distort a set of images (which is the normal use case), you should use initUndistortRectifyMap to calculate the undistortion solution for the image once and for all, and then pass it for every image to remap to actually undistort the images.
If you really need a parametric model for the inverse model, my advice would be to look into approximating the maps returned by initUndistortRectifyMap with a pair of higher order polynomials, or thin-plate splines.
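A minimal sketch of that approximation idea, simplified to the radial-only case so it stays self-contained (no OpenCV maps involved). It samples the forward model and fits inverse coefficients b1, b2, b3 in r_u ≈ r_d (1 + b1 r_d^2 + b2 r_d^4 + b3 r_d^6) by linear least squares. This has the same form as the inverse parameters in the question, but it is an approximation fitted over the chosen radius range, not an exact inverse. The function names and the sampled range are assumptions.

```python
import numpy as np


def apply_radial(r, coeffs):
    """r * (1 + c1 r^2 + c2 r^4 + c3 r^6)."""
    c1, c2, c3 = coeffs
    return r * (1 + c1 * r**2 + c2 * r**4 + c3 * r**6)


def fit_inverse_radial(k, r_max, n_samples=200):
    """Least-squares inverse coefficients (b1, b2, b3) for radial coefficients k = (k1, k2, k3)."""
    r_u = np.linspace(0.0, r_max, n_samples)
    r_d = apply_radial(r_u, k)
    # r_u - r_d = b1 r_d^3 + b2 r_d^5 + b3 r_d^7, which is linear in b.
    A = np.column_stack([r_d**3, r_d**5, r_d**7])
    b, *_ = np.linalg.lstsq(A, r_u - r_d, rcond=None)
    return b
```

Choose r_max to cover the normalized radius of the image corners; outside the fitted range the polynomial can diverge quickly.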

Manually writing code for warpAffine in Python

I want to implement affine transformation by not using library functions.
I have an image named "transformed" and I want to apply the inverse transformation to obtain the "img_org" image. Right now, I am using my own basic GetBilinearPixel function to set the intensity value, but the image is not transforming properly. This is what I came up with:
This is the image ("transformed.png"):
This is the image ("img_org.png"):
But my goal is to produce this image:
You can see the transformation matrix here:
import cv2
import numpy as np
pts1 = np.float32([[693, 349], [605, 331], [445, 59]])
pts2 = np.float32([[1379, 895], [1213, 970], [684, 428]])
Mat = cv2.getAffineTransform(pts2, pts1)
def GetBilinearPixel(imArr, posX, posY):
    # note: despite its name, this returns the pixel without any interpolation
    return imArr[posX][posY]
for i in range(1, img.shape[0] - 1):
    for j in range(1, img.shape[1] - 1):
        # (the lines computing pos, r and c from Mat were not included in the question)
        #print pos
        #print r,c
        if(c <= 1024 and r <= 768 and c >= 0 and r >= 0):
            img_org[r][c] += GetBilinearPixel(img, i, j)
for i in range(0, img_org.shape[0]):
    for j in range(0, img_org.shape[1]):
        img_org[i][j] = img_org[i][j] / img_size[i][j]
Is my logic wrong? I know that I have applied a very inefficient algorithm.
Is there any insight that I am missing?
Or can you suggest another algorithm that will work?
(Request:) I don't want to use the warpAffine function.
So I vectorized the code, and this method works. I can't find the exact issue with your implementation, but maybe this will shed some light (plus it is much faster).
The setup to vectorize is to create a linear (homogeneous) array containing every point in the image. We want an array that looks like
x0 x1 ... xN   x0 x1 ... xN   .....   x0 x1 ... xN
y0 y0 ... y0   y1 y1 ... y1   .....   yM yM ... yM
 1  1 ...  1    1  1 ...  1   .....    1  1 ...  1
So that every point (xi, yi, 1) is included. Then transforming is just a single matrix multiplication with your transformation matrix and this array.
To simplify matters (partially because your image naming conventions confused me), I'll say the original starting image is the "destination" or dst, because we want to transform back to the "source" or src image. Bearing that in mind, creating this linear homogeneous array could look something like this:
import cv2
import numpy as np
dst = cv2.imread('img.jpg', 0)
h, w = dst.shape[:2]
dst_y, dst_x = np.indices((h, w))  # similar to meshgrid/mgrid
dst_lin_homg_pts = np.stack((dst_x.ravel(), dst_y.ravel(), np.ones(dst_y.size)))
Then, to transform the points, just create the transformation matrix and multiply. I'll round the transformed pixel locations because I'm using them as an index and not bothering with interpolation:
src_pts = np.float32([[693, 349], [605, 331], [445, 59]])
dst_pts = np.float32([[1379, 895], [1213, 970], [684, 428]])
transf = cv2.getAffineTransform(dst_pts, src_pts)
src_lin_pts = np.round(transf.dot(dst_lin_homg_pts)).astype(int)
Now, this transformation will send some pixels to negative indices, and if we index with those, they will wrap around the image, which is probably not what we want. In the OpenCV implementation, those pixels are simply cut off. But we can shift all the transformed pixels so that all of the locations are non-negative and we don't cut any off (you can of course do whatever you want in this regard):
min_x, min_y = np.amin(src_lin_pts, axis=1)
src_lin_pts -= np.array([[min_x], [min_y]])
Then we'll need to create the source image src which the transform maps into. I'll create it with a gray background so we can see the extent of the black from the dst image.
trans_max_x, trans_max_y = np.amax(src_lin_pts, axis=1)
src = np.ones((trans_max_y+1, trans_max_x+1), dtype=np.uint8)*127
Now all we have to do is place the corresponding pixels from the destination image into the source image. Since I didn't cut off any of the pixels and both linear point arrays contain the same number of points, I can just assign each transformed pixel the color it had in the original image.
src[src_lin_pts[1], src_lin_pts[0]] = dst.ravel()
Of course, this isn't interpolating on the image. But there are no built-ins in OpenCV for interpolating these scattered points onto a grid (there are backend C functions used by other methods, but none that you can access from Python, AFAIK). However, you have the important parts, namely where the destination image gets mapped to and the original image, so you can use any number of libraries to interpolate onto that grid, or just implement linear interpolation yourself, as it's not too difficult. You'll probably want to un-round the warped pixel locations before then, of course.
cv2.imshow('src', src)
cv2.waitKey(0)
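Since the question asks specifically for bilinear interpolation, here is a sketch of a different but common approach: inverse ("pull") mapping instead of the forward mapping above. For every output pixel, compute where it comes from in the input and blend the four surrounding input pixels. Every output pixel is visited once, so there are no holes and no need for a per-pixel count like img_size in the question. The matrix must map output coordinates to input coordinates; assuming pts1 lies in img_org and pts2 in "transformed", that would be cv2.getAffineTransform(pts1, pts2) when producing img_org. The function name and the fill argument are hypothetical.

```python
import numpy as np


def warp_affine_bilinear(src, M_inv, out_shape, fill=0.0):
    """Inverse-mapping affine warp with bilinear interpolation.

    M_inv is a 2x3 matrix mapping OUTPUT pixel (x, y) to INPUT (x, y).
    Samples that fall outside `src` contribute `fill`.
    """
    src = np.asarray(src, dtype=float)
    src_h, src_w = src.shape
    h, w = out_shape
    ys, xs = np.indices((h, w))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy = M_inv @ pts                      # input location of every output pixel
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    fx, fy = sx - x0, sy - y0                 # fractional offsets in [0, 1)

    def pixel(yy, xx):
        inside = (xx >= 0) & (xx < src_w) & (yy >= 0) & (yy < src_h)
        vals = np.full(yy.shape, fill, dtype=float)
        vals[inside] = src[yy[inside], xx[inside]]
        return vals

    top = pixel(y0, x0) * (1 - fx) + pixel(y0, x0 + 1) * fx
    bottom = pixel(y0 + 1, x0) * (1 - fx) + pixel(y0 + 1, x0 + 1) * fx
    return (top * (1 - fy) + bottom * fy).reshape(h, w)
```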
Edit: This same method will also work for warpPerspective, although the matrix multiplication will then give a three-row (homogeneous) result, and you'll need to divide the first two rows by the third row to get back to Cartesian coordinates. Everything else stays the same.
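A short sketch of that extra step, assuming a 3x3 matrix H (for example from cv2.getPerspectiveTransform) and the same (3, N) homogeneous point layout as dst_lin_homg_pts. The function name is hypothetical.

```python
import numpy as np


def transform_points_perspective(H, homg_pts):
    """Map (3, N) homogeneous points through a 3x3 matrix; return (2, N) Cartesian points."""
    out = H @ homg_pts
    w = out[2]
    # Guard against points sent to infinity (w == 0), as skimage's _apply_mat does.
    w = np.where(w == 0, np.finfo(float).eps, w)
    return out[:2] / w
```

For an affine matrix, appending the row [0, 0, 1] gives a 3x3 matrix whose third output row is all ones, so this reduces to the affine case.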

Python fractal box count - fractal dimension

I have some images for which I want to calculate the Minkowski/box count dimension to determine the fractal characteristics in the image. Here are 2 example images:
I'm using the following code to calculate the fractal dimension:
import numpy as np
import imageio.v2 as imageio  # scipy.misc.imread was removed in SciPy 1.2
def rgb2gray(rgb):
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray
def fractal_dimension(Z, threshold=0.9):
    # Only for 2d image
    assert(len(Z.shape) == 2)
    # From (#87)
    def boxcount(Z, k):
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
                               np.arange(0, Z.shape[1], k), axis=1)
        # We count non-empty (0) and non-full boxes (k*k)
        return len(np.where((S > 0) & (S < k*k))[0])
    # Transform Z into a binary array
    Z = (Z < threshold)
    # Minimal dimension of image
    p = min(Z.shape)
    # Greatest power of 2 less than or equal to p
    n = 2**np.floor(np.log(p)/np.log(2))
    # Extract the exponent
    n = int(np.log(n)/np.log(2))
    # Build successive box sizes (from 2**n down to 2**2)
    sizes = 2**np.arange(n, 1, -1)
    # Actual box counting with decreasing size
    counts = []
    for size in sizes:
        counts.append(boxcount(Z, size))
    # Fit the successive log(sizes) with log (counts)
    coeffs = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -coeffs[0]
I = rgb2gray(imageio.imread("24.jpg"))
print("Minkowski–Bouligand dimension (computed): ", fractal_dimension(I))
From the literature I've read, it has been suggested that natural scenes (e.g. 24.jpg) are more fractal in nature, and thus should have a larger fractal dimension value.
The results it gives me are in the opposite direction from what the literature would suggest:
10.jpg: 1.259
24.jpg: 1.073
I would expect the fractal dimension of the natural image to be larger than that of the urban one.
Am I calculating the value incorrectly in my code? Or am I just interpreting the results incorrectly?
With the fractal dimension of something physical, the dimension might converge to different values at different scales. For example, a very thin line (but of finite width) would initially seem one-dimensional, then eventually two-dimensional as its width becomes comparable in size to the boxes used.
Let's look at the dimensions that you have produced:
What do you see? Well, the linear fits are not so good, and the dimension is heading towards a value of two.
To diagnose, let's take a look at the greyscale images produced with the threshold that you have (that is, 0.9):
The nature picture has almost become an ink blob. The dimension would go to a value of 2 very soon, as the graphs showed. That is because we have pretty much lost the image.
And now with a threshold of 50?
With new linear fits that are much better, the dimensions are 1.6 and 1.8 for urban and nature respectively. Keep in mind that the urban picture actually has a lot of structure to it, in particular on the textured walls.
In future, good threshold values would be ones closer to the mean of the greyscale image; that way your image does not turn into a blob of ink!
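A compact sketch of that suggestion. It reuses the non-empty/non-full box rule from the question's boxcount(), but defaults the threshold to the image mean and also returns the local slopes between successive box sizes, so you can check whether the log-log relationship is really linear before trusting the fitted dimension. The function name and the returned local slopes are additions for illustration.

```python
import numpy as np


def box_count_dimension(gray, threshold=None):
    """Box-counting dimension of `gray < threshold` (threshold defaults to the mean).

    Returns the fitted dimension and the local slopes between successive box
    sizes; roughly constant local slopes indicate a trustworthy fit.
    """
    gray = np.asarray(gray, dtype=float)
    if threshold is None:
        threshold = gray.mean()
    Z = (gray < threshold).astype(np.int64)
    n = int(np.floor(np.log2(min(Z.shape))))
    sizes = 2 ** np.arange(n, 1, -1)          # 2**n down to 2**2
    counts = []
    for k in sizes:
        S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
        # same rule as boxcount(): non-empty and non-full boxes
        counts.append(np.count_nonzero((S > 0) & (S < k * k)))
    counts = np.array(counts, dtype=float)
    slope = np.polyfit(np.log(sizes), np.log(counts), 1)[0]
    local = -np.diff(np.log(counts)) / np.diff(np.log(sizes))
    return -slope, local
```

For example, dim, local = box_count_dimension(rgb2gray(img)). If some box size yields a count of zero, its logarithm is undefined and the fit fails, which is itself a sign that the threshold has thrown away most of the image.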
A good text book on this is "Fractals everywhere" by Michael F. Barnsley.

HOG Feature extraction

I want to extract HOG features from line images of Arabic handwriting. The code is as follows. I want help with how to input the image and how to output the features. Can anyone please help me with this?
import numpy as np
from numpy import sqrt, pi, arctan2, cos, sin  # no longer importable from the top-level scipy namespace
from scipy.ndimage import uniform_filter
def hog(image, orientations=9, pixels_per_cell=(8, 8),
        cells_per_block=(3, 3), visualise=False, normalise=False):
    """Extract Histogram of Oriented Gradients (HOG) for a given image.
    Compute a Histogram of Oriented Gradients (HOG) by
        1. (optional) global image normalisation
        2. computing the gradient image in x and y
        3. computing gradient histograms
        4. normalising across blocks
        5. flattening into a feature vector
    Parameters
    image : (M, N) ndarray
        Input image (greyscale).
    orientations : int
        Number of orientation bins.
    pixels_per_cell : 2 tuple (int, int)
        Size (in pixels) of a cell.
    cells_per_block : 2 tuple (int,int)
        Number of cells in each block.
    visualise : bool, optional
        Also return an image of the HOG.
    normalise : bool, optional
        Apply power law compression to normalise the image before
        processing.
    Returns
    newarr : ndarray
        HOG for the image as a 1D (flattened) array.
    hog_image : ndarray (if visualise=True)
        A visualisation of the HOG image.
    References
    * Dalal, N and Triggs, B, Histograms of Oriented Gradients for
      Human Detection, IEEE Computer Society Conference on Computer
      Vision and Pattern Recognition 2005 San Diego, CA, USA
    """
    image = np.atleast_2d(image)
    # The first stage applies an optional global image normalisation
    # equalisation that is designed to reduce the influence of illumination
    # effects. In practice we use gamma (power law) compression, either
    # computing the square root or the log of each colour channel.
    # Image texture strength is typically proportional to the local surface
    # illumination so this compression helps to reduce the effects of local
    # shadowing and illumination variations.
    if image.ndim > 2:
        raise ValueError("Currently only supports grey-level images")
    if normalise:
        image = sqrt(image)
    # The second stage computes first order image gradients. These capture
    # contour, silhouette and some texture information, while providing
    # further resistance to illumination variations. The locally dominant
    # colour channel is used, which provides colour invariance to a large
    # extent. Variant methods may also include second order image derivatives,
    # which act as primitive bar detectors - a useful feature for capturing,
    # e.g. bar like structures in bicycles and limbs in humans.
    gx = np.zeros(image.shape)
    gy = np.zeros(image.shape)
    gx[:, :-1] = np.diff(image, n=1, axis=1)
    gy[:-1, :] = np.diff(image, n=1, axis=0)
    # The third stage aims to produce an encoding that is sensitive to
    # local image content while remaining resistant to small changes in
    # pose or appearance. The adopted method pools gradient orientation
    # information locally in the same way as the SIFT [Lowe 2004]
    # feature. The image window is divided into small spatial regions,
    # called "cells". For each cell we accumulate a local 1-D histogram
    # of gradient or edge orientations over all the pixels in the
    # cell. This combined cell-level 1-D histogram forms the basic
    # "orientation histogram" representation. Each orientation histogram
    # divides the gradient angle range into a fixed number of
    # predetermined bins. The gradient magnitudes of the pixels in the
    # cell are used to vote into the orientation histogram.
    magnitude = sqrt(gx ** 2 + gy ** 2)
    orientation = arctan2(gy, (gx + 1e-15)) * (180 / pi) + 90
    sy, sx = image.shape
    cx, cy = pixels_per_cell
    bx, by = cells_per_block
    n_cellsx = int(np.floor(sx // cx))  # number of cells in x
    n_cellsy = int(np.floor(sy // cy))  # number of cells in y
    # compute orientations integral images
    orientation_histogram = np.zeros((n_cellsy, n_cellsx, orientations))
    for i in range(orientations):
        # create new integral image for this orientation
        # isolate orientations in this range
        temp_ori = np.where(orientation < 180 / orientations * (i + 1),
                            orientation, 0)
        temp_ori = np.where(orientation >= 180 / orientations * i,
                            temp_ori, 0)
        # select magnitudes for those orientations
        cond2 = temp_ori > 0
        temp_mag = np.where(cond2, magnitude, 0)
        # integer slice start (cy / 2 is a float in Python 3), trimmed to whole cells
        orientation_histogram[:, :, i] = uniform_filter(temp_mag, size=(cy, cx))[
            cy // 2::cy, cx // 2::cx][:n_cellsy, :n_cellsx]
    # now for each cell, compute the histogram
    #orientation_histogram = np.zeros((n_cellsx, n_cellsy, orientations))
    radius = min(cx, cy) // 2 - 1
    hog_image = None
    if visualise:
        from skimage import draw  # current scikit-image provides draw.line (integer coordinates)
        hog_image = np.zeros((sy, sx), dtype=float)
        for x in range(n_cellsx):
            for y in range(n_cellsy):
                for o in range(orientations):
                    centre = tuple([y * cy + cy // 2, x * cx + cx // 2])
                    dx = int(round(radius * cos(float(o) / orientations * np.pi)))
                    dy = int(round(radius * sin(float(o) / orientations * np.pi)))
                    rr, cc = draw.line(centre[0] - dx, centre[1] - dy,
                                       centre[0] + dx, centre[1] + dy)
                    hog_image[rr, cc] += orientation_histogram[y, x, o]
    # The fourth stage computes normalisation, which takes local groups of
    # cells and contrast normalises their overall responses before passing
    # to next stage. Normalisation introduces better invariance to illumination,
    # shadowing, and edge contrast. It is performed by accumulating a measure
    # of local histogram "energy" over local groups of cells that we call
    # "blocks". The result is used to normalise each cell in the block.
    # Typically each individual cell is shared between several blocks, but
    # its normalisations are block dependent and thus different. The cell
    # thus appears several times in the final output vector with different
    # normalisations. This may seem redundant but it improves the performance.
    # We refer to the normalised block descriptors as Histogram of Oriented
    # Gradient (HOG) descriptors.
    n_blocksx = (n_cellsx - bx) + 1
    n_blocksy = (n_cellsy - by) + 1
    normalised_blocks = np.zeros((n_blocksy, n_blocksx,
                                  by, bx, orientations))
    for x in range(n_blocksx):
        for y in range(n_blocksy):
            block = orientation_histogram[y:y + by, x:x + bx, :]
            eps = 1e-5
            normalised_blocks[y, x, :] = block / sqrt(block.sum() ** 2 + eps)
    # The final step collects the HOG descriptors from all blocks of a dense
    # overlapping grid of blocks covering the detection window into a combined
    # feature vector for use in the window classifier.
    if visualise:
        return normalised_blocks.ravel(), hog_image
    return normalised_blocks.ravel()
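You can use the OpenCV library to read image files into NumPy arrays: cv2.imread(path, cv2.IMREAD_GRAYSCALE) gives the 2-D greyscale array that hog() expects (convert it to float first). The return value of hog() is the flattened feature vector, which you can store with NumPy, for example with np.save.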
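A sketch of the input and output side, with hypothetical helper names. to_hog_input turns a loaded image into the 2-D float array hog() expects; colour input is reduced by a plain channel mean, which is an assumption (any greyscale conversion works, and it sidesteps OpenCV's BGR channel order). hog_feature_shape repeats the cell and block arithmetic from hog(), so you know how long the feature vector will be and can reshape it back into blocks.

```python
import numpy as np


def to_hog_input(img):
    """Return a 2-D float array in [0, 1] suitable for hog()."""
    img = np.asarray(img)
    scale = 255.0 if img.dtype == np.uint8 else 1.0
    img = img.astype(float) / scale
    if img.ndim == 3:
        img = img.mean(axis=2)       # plain channel mean (assumption)
    return img


def hog_feature_shape(image_shape, orientations=9, pixels_per_cell=(8, 8),
                      cells_per_block=(3, 3)):
    """Shape of normalised_blocks inside hog(); the output has prod(shape) values."""
    sy, sx = image_shape
    cx, cy = pixels_per_cell
    bx, by = cells_per_block
    n_cellsx, n_cellsy = sx // cx, sy // cy
    return (n_cellsy - by + 1, n_cellsx - bx + 1, by, bx, orientations)

# Usage (file name is hypothetical):
# img = cv2.imread('line_001.png', cv2.IMREAD_GRAYSCALE)
# features = hog(to_hog_input(img))
# np.save('line_001_hog.npy', features)
# blocks = features.reshape(hog_feature_shape(img.shape))
```

Because the vector length depends on the image size, line images of different widths give feature vectors of different lengths; resizing every line image to one fixed size before calling hog() is one common way to get comparable vectors for a classifier.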

Peak detection in a noisy 2d array

I'm trying to get python to return, as close as possible, the center of the most obvious clustering in an image like the one below:
In my previous question I asked how to get the global maximum and the local maxima of a 2D array, and the answers given worked perfectly. The issue is that the center estimate I get by averaging the global maximum obtained with different bin sizes is always slightly off from the one I would set by eye, because I'm only accounting for the biggest bin instead of a group of biggest bins (as one does by eye).
I tried adapting the answer to this question to my problem, but it turns out my image is too noisy for that algorithm to work. Here's my code implementing that answer:
import numpy as np
from scipy.ndimage import maximum_filter, generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp
from os import getcwd
from os.path import join, realpath, dirname
# Save path to dir where this code exists.
mypath = realpath(join(getcwd(), dirname(__file__)))
myfile = 'data_file.dat'
x, y = np.loadtxt(join(mypath, myfile), usecols=(1, 2), unpack=True)
xmin, xmax = min(x), max(x)
ymin, ymax = min(y), max(y)
rang = [[xmin, xmax], [ymin, ymax]]
paws = []
for d_b in range(25, 110, 25):
    # Number of bins in x,y given the bin width 'd_b'
    binsxy = [int((xmax - xmin) / d_b), int((ymax - ymin) / d_b)]
    H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
    paws.append(H)
def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when
    the pixel's value is the neighborhood maximum, 0 otherwise)
    """
    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2, 2)
    # apply the local maximum filter; all pixel of maximal value
    # in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood) == image
    # local_max is a mask that contains the peaks we are
    # looking for, but also the background.
    # In order to isolate the peaks we must remove the background from the mask.
    # we create the mask of the background
    background = (image == 0)
    # a little technicality: we must erode the background in order to
    # successfully subtract it from local_max, otherwise a line will
    # appear along the background border (artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)
    # we obtain the final mask, containing only peaks,
    # by removing the background from the local_max mask
    # (boolean arrays cannot be subtracted in recent NumPy, so use & ~)
    detected_peaks = local_max & ~eroded_background
    return detected_peaks
# applying the detection and plotting results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4, 2, (2*i + 1))
    pp.imshow(paw)
    pp.subplot(4, 2, (2*i + 2))
    pp.imshow(detected_peaks)
pp.show()
And here's the result of that (varying the bin size):
Clearly my background is too noisy for that algorithm to work, so the question is: how can I make that algorithm less sensitive? If an alternative solution exists then please let me know.
Following Bi Rico's advice, I attempted smoothing my 2D array before passing it on to the local maximum finder, like so:
from scipy.ndimage import gaussian_filter
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
H1 = gaussian_filter(H, 2, mode='nearest')
These were the results with a sigma of 2, 4 and 8:
mode='constant' seems to work much better than mode='nearest'. It converges to the right center with sigma=2 for the largest bin size:
So, how do I get the coordinates of the maximum that shows in the last image?
Answering the last part of your question: whenever you have points in an image, you can find their coordinates by searching, in some order, for the local maxima of the image. If your data is not a point source, you can apply a mask to each peak to prevent the peak's neighborhood from being picked up as a maximum in a later search. I propose the following code:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import copy
def get_std(image):
    return np.std(image)
def get_max(image, sigma, alpha=20, size=10):
    i_out = []
    j_out = []
    image_temp = copy.deepcopy(image)
    while True:
        k = np.argmax(image_temp)
        j, i = np.unravel_index(k, image_temp.shape)
        if(image_temp[j, i] >= alpha*sigma):
            i_out.append(i)
            j_out.append(j)
            # mask a square around the peak so it is not found again
            x = np.arange(i-size, i+size)
            y = np.arange(j-size, j+size)
            xv, yv = np.meshgrid(x, y)
            image_temp[yv.clip(0, image_temp.shape[0]-1),
                       xv.clip(0, image_temp.shape[1]-1)] = 0
        else:
            break
    return i_out, j_out
#reading the image
image = mpimg.imread('ggd4.jpg')
#computing the standard deviation of the image
sigma = get_std(image)
#getting the peaks
i, j = get_max(image[:, :, 0], sigma, alpha=10, size=10)
#let's see the results
plt.imshow(image, origin='lower')
plt.plot(i, j, 'ro', markersize=10, alpha=0.5)
plt.show()
The image ggd4 for the test can be downloaded from:
The first part is to get some information about the noise in the image. I did it by computing the standard deviation of the full image (it is actually better to select a small rectangle without signal). This tells us how much noise is present in the image.
The idea for getting the peaks is to ask for successive maxima that are above a certain threshold (say 3, 4, 5, 10, or 20 times the noise). This is what the function get_max does: it keeps searching for maxima until one of them is below the threshold imposed by the noise. To avoid finding the same maximum many times, the peaks must be removed from the image. In general, the shape of the mask used to do so depends strongly on the problem one wants to solve; for stars, for example, it would be good to remove the star using a Gaussian function or something similar. For simplicity I have chosen a square function, and the size of the function (in pixels) is the variable "size".
I think that from this example, anybody can improve the code by adding more general things.
The original image looks like:
While the image after identifying the luminous points looks like this:
Too much of a n00b on Stack Overflow to comment on Alejandro's answer elsewhere here. I would refine his code a bit to use a preallocated numpy array for output:
def get_max(image, sigma, alpha=3, size=10):
    from copy import deepcopy
    import numpy as np
    # preallocate a lot of peak storage
    k_arr = np.zeros((10000, 2))
    peak_ct = 0
    image_temp = deepcopy(image)
    while peak_ct < len(k_arr):
        k = np.argmax(image_temp)
        j, i = np.unravel_index(k, image_temp.shape)
        if(image_temp[j, i] >= alpha*sigma):
            # store the peak as (i, j), i.e. (column, row)
            k_arr[peak_ct] = [i, j]
            peak_ct += 1
            # this is the part that masks already-found peaks.
            x = np.arange(i-size, i+size)
            y = np.arange(j-size, j+size)
            xv, yv = np.meshgrid(x, y)
            # the clip here handles edge cases where the peak is near the
            # image edge
            image_temp[yv.clip(0, image_temp.shape[0]-1),
                       xv.clip(0, image_temp.shape[1]-1)] = 0
        else:
            break
    # trim the output for only what we've actually found
    return k_arr[:peak_ct]
In profiling this and Alejandro's code using his example image, this code is about 33% faster (0.03 s for Alejandro's code, 0.02 s for mine). I expect it would be even faster on images with larger numbers of peaks, since appending the output to a list will get slower and slower as more peaks are found.
I think the first step needed here is to express the values in H in terms of the standard deviation of the field:
import numpy as np
H = H / np.std(H)
Now you can put a threshold on the values of this H. If the noise is assumed to be Gaussian, then with a threshold of 3 you can be quite sure (99.7%) that a pixel above it is associated with a real peak and not with noise. See here.
Now the further selection can start. It is not exactly clear to me what exactly you want to find. Do you want the exact location of peak values? Or do you want one location for a cluster of peaks which is in the middle of this cluster?
Anyway, starting from this point with all pixel values expressed in standard deviations of the field, you should be able to get what you want. If you want to find clusters you could perform a nearest neighbour search on the >3-sigma gridpoints and put a threshold on the distance. I.e. only connect them when they are close enough to each other. If several gridpoints are connected you can define this as a group/cluster and calculate some (sigma-weighted?) center of the cluster.
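A sketch of the clustering step. It swaps in connected-component labelling (scipy.ndimage.label) for the nearest-neighbour search described above: pixels above the sigma threshold that touch horizontally, vertically or diagonally form one cluster, and scipy.ndimage.center_of_mass gives a sigma-weighted center per cluster. Unlike the snippet above, it also subtracts the mean before dividing by the standard deviation; the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def sigma_clusters(H, nsigma=3.0):
    """Group pixels above nsigma into clusters; return their weighted centers (row, col)."""
    Z = (H - H.mean()) / H.std()        # values in units of the field's std
    mask = Z > nsigma
    labels, n = ndimage.label(mask, structure=np.ones((3, 3)))
    # Weight each pixel by its significance when computing the center.
    centers = ndimage.center_of_mass(np.where(mask, Z, 0), labels, range(1, n + 1))
    return np.array(centers).reshape(n, 2)   # bin coordinates, one row per cluster
```

To allow small gaps inside a cluster, you could dilate the mask (ndimage.binary_dilation) before labelling, which plays the role of the distance threshold.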
Hope my first contribution on Stackoverflow is useful for you!
The way I would do it:
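1) Normalize H between 0 and 1.
2) Pick a threshold value, as tcaswell suggests. It could be between 0.9 and 0.99, for example.
3) Use masked arrays to keep only the x,y coordinates with H above the threshold (here x and y are the bin-center coordinates of H, with the same shape as H):
import numpy as np
import numpy.ma as ma
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])
x, y = np.meshgrid(xc, yc, indexing='ij')
x_masked = ma.masked_array(x, mask=H < threshold)
y_masked = ma.masked_array(y, mask=H < threshold)
4) Now you can compute a weighted average of the masked coordinates, with weights such as (H - threshold)^2, or any other power greater than or equal to one, depending on your taste/tests.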
1) This is not robust with respect to the type of peaks you have, since you may have to adapt the threshold. This is the minor problem.
2) This DOES NOT work with two peaks as it is, and will give wrong results if the 2nd peak is above the threshold.
Nonetheless, it will always give you an answer without crashing (with all the pros and cons that implies).
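A sketch of steps 1) to 4) in one function, assuming H, xedges and yedges come from np.histogram2d as in the question, so the masked x, y are bin-center grids with the same shape as H. The function name, the default power of 2, and clipping negative weights to zero are choices made for illustration.

```python
import numpy as np
import numpy.ma as ma


def weighted_peak_center(H, xedges, yedges, threshold=0.9, power=2):
    """Steps 1) to 4): normalize, threshold, mask, then weight-average."""
    Hn = (H - H.min()) / (H.max() - H.min())            # 1) normalize to [0, 1]
    xc = 0.5 * (xedges[:-1] + xedges[1:])                # bin centers
    yc = 0.5 * (yedges[:-1] + yedges[1:])
    x, y = np.meshgrid(xc, yc, indexing='ij')            # same shape as H
    below = Hn < threshold                               # 2) + 3) mask low bins
    x_masked = ma.masked_array(x, mask=below)
    y_masked = ma.masked_array(y, mask=below)
    w = np.clip(Hn - threshold, 0, None) ** power        # 4) weights
    return ma.average(x_masked, weights=w), ma.average(y_masked, weights=w)
```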
I'm adding this answer because it's the solution I ended up using. It's a combination of Bi Rico's comment here (May 30 at 18:54) and the answer given in this question: Find peak of 2d histogram.
As it turns out, using the peak detection algorithm from the question Peak detection in a 2D array only complicates matters. After applying the Gaussian filter to the image, all that needs to be done is to ask for the maximum bin (as Bi Rico pointed out) and then convert that maximum to data coordinates.
So instead of using the detect_peaks function as I did above, I simply add the following code after the 2D histogram is obtained:
import numpy as np
from scipy.ndimage import gaussian_filter
# Get 2D histogram.
H, xedges, yedges = np.histogram2d(x, y, range=rang, bins=binsxy)
# Get Gaussian filtered 2D histogram.
H1 = gaussian_filter(H, 2, mode='nearest')
# Get center of maximum in bin coordinates.
x_cent_bin, y_cent_bin = np.unravel_index(H1.argmax(), H1.shape)
# Get center in x,y coordinates (midpoint of the maximum bin's edges).
x_cent_coord = np.average(xedges[x_cent_bin:x_cent_bin + 2])
y_cent_coord = np.average(yedges[y_cent_bin:y_cent_bin + 2])
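A self-contained sketch of the same approach on synthetic data, useful for checking it before running it on the real file. The point cloud and the function name are assumptions for illustration; mode='constant' is used as the default because the question found it worked better than 'nearest'.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def histogram_peak_center(x, y, bins, sigma=2, mode='constant'):
    """Center of the densest region: 2D histogram, Gaussian smoothing, max bin."""
    H, xedges, yedges = np.histogram2d(x, y, bins=bins)
    H1 = gaussian_filter(H, sigma, mode=mode)
    ix, iy = np.unravel_index(H1.argmax(), H1.shape)
    # Midpoint of the winning bin's edges, in data coordinates.
    return 0.5 * (xedges[ix] + xedges[ix + 1]), 0.5 * (yedges[iy] + yedges[iy + 1])
```

The result can only be as precise as one bin width; a weighted average over the bins around the maximum, as in the masked-array answer, refines it further.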

