I am using the Python bindings for OpenCV. I am using keypoint detection and description (i.e. SURF, SIFT, ...) to find a template image contained within a target image, but there is a catch: the template can be "squeezed" in the target image, so that its aspect ratio there differs from that of the original template.
This does not work with findHomography(), since it assumes a simple perspective transform, which cannot have this sort of stretching.
Are there any ways to do this? I have thought about incrementally stretching the target image different amounts to change the aspect ratio, and using findHomography at each iteration, but as far as I can tell there is no way of comparing the quality of a fit (since I'm using RANSAC to find the best fit), so I can't tell at which squeeze level it fits best.
Perhaps counting the number of points that matched correctly from the RANSAC by looking at the length of the returned mask? This seems sorta gross.
This does not work with findHomography(), since it assumes a simple perspective transform, which cannot have this sort of stretching.
This is not true; even an affine warp can stretch the aspect ratio and apply shear, and homographies extend this with perspective (non-uniform) distortion. For example, the affine transformation given by the matrix
2 0 0
0 1 0
will stretch an image horizontally by a factor of two, as seen with this short program:
import cv2
import numpy as np
img = cv2.imread('lena.png')
affine_warp = np.array([[2, 0, 0], [0, 1, 0]], dtype=np.float32)
dsize = (img.shape[1]*2, img.shape[0])
warped_img = cv2.warpAffine(img, affine_warp, dsize)
cv2.imshow("2x Horizontal Stretching", warped_img)
cv2.waitKey(0)
Producing the output:
So that is not your issue. Homographies allow even stronger warping. Are you running RANSAC yourself or letting the findHomography() function decide your points via RANSAC? Please post your expected output and your current code, possibly in a new question that reflects the problems you're facing.
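To make the point concrete without needing an image, here is a minimal sketch that fits an affine transform to matched point pairs by least squares and recovers a 2x horizontal squeeze. It uses plain NumPy rather than findHomography(), and the helper name fit_affine as well as the sample points are made up for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix M with dst ~= M[:, :2] @ src + M[:, 2]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source coordinates: one row [x, y, 1] per point.
    A = np.hstack([src, np.ones((len(src), 1))])
    solution, *_ = np.linalg.lstsq(A, dst, rcond=None)  # shape (3, 2)
    return solution.T                                   # shape (2, 3)

# Hypothetical keypoint locations in the template image.
template_pts = np.array([[10, 20], [200, 30], [50, 180], [220, 210], [120, 90]], dtype=float)

# The same points in the target: squeezed to half width, then shifted.
true_M = np.array([[0.5, 0.0, 40.0],
                   [0.0, 1.0, 25.0]])
target_pts = template_pts @ true_M[:, :2].T + true_M[:, 2]

M = fit_affine(template_pts, target_pts)
print(np.round(M, 3))
```

The recovered matrix has different scale factors on x and y, so a changed aspect ratio is well within what an affine (and therefore a homography) model can express. As for judging fit quality: the mask returned by findHomography() with RANSAC holds one 0/1 flag per input match, so the inlier count is mask.sum() rather than the length of the mask.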
My problem is as follows. I have an image img0 (array shape (A,B,3)) and then a face img1 cut out from the middle of that image (by an algorithm I don't have access to: my input is only the whole image, and the face cut out from it), now an array shaped (C,D,3) where C<A and D<B. Now, I want to perform operations on the face (e.g., colour it differently) and then stick it back inside the original background (which is not coloured differently) -- these operations will not affect the shape of img1 array containing the face alone, it will remain (C,D,3). Something like img0-img1 doesn't work because of the shape mismatch.
I guess an approach like finding the starting coordinate of the face in img0 would work in the case that the face cut out is rectangular (which is possible for me to use, though not ideal), since it is guaranteed that the face is exactly identical in img1 and img0. That means, to get the background, we only need to find the starting coordinate of the img1 array in img0, cut out the subsequent elements (that correspond to img1) from img0, and we're left with the background. After I've done whatever I want to the face, I can use the new (C,D,3) array in place of the previous img1 part of the whole image (img0).
Is there a way to do this in Python? i.e., compute the difference between two images of different sizes, where one image is a 'subimage' of the other? Or, failing that, if we can find the starting coordinate of the rectangular portion of an image (img0) which corresponds to a rectangular cutout available to us (img1)?
Or, failing that, if we can find the starting coordinate of the rectangular portion of an image (img0) which corresponds to a rectangular cutout available to us (img1)?
One easy way to do that would be to cross-correlate your zero-mean cut-out with the zero-mean original image. As you have no noise added to the image, any maximum of the cross-correlation is a possible candidate.
However:
(i) If you don't use faces but e.g. blocks, there will be multiple maxima and you won't have a unique solution.
(ii) It is not exactly an elegant solution to your problem.
I modified the code example from [1] to make it clearer:
from scipy import signal, datasets  # scipy.misc.face became scipy.datasets.face in newer SciPy releases
import numpy as np
face = datasets.face(gray=True)
face = face - np.mean(face)
face_cutout = np.copy(face[300:365, 670:750])
face_cutout = face_cutout - np.mean(face_cutout)
corr = signal.correlate2d(face, face_cutout, mode='valid')
y, x = np.unravel_index(np.argmax(corr), corr.shape) # find the match
print(f'x: {x} y: {y}')
[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate2d.html
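Since the cutout is guaranteed to be pixel-identical to the original, an exact search is also possible, and once the offset is known the edited face can be pasted back with plain slicing. The following is only a sketch under that exact-match assumption; find_patch is a hypothetical helper and the random arrays stand in for img0 and img1.

```python
import numpy as np

def find_patch(image, patch):
    """Return (row, col) of the first exact occurrence of patch in image, or None."""
    A, B = image.shape[:2]
    C, D = patch.shape[:2]
    # Candidate corners: positions whose pixel equals the patch's top-left pixel.
    corner_match = image[:A - C + 1, :B - D + 1] == patch[0, 0]
    if image.ndim == 3:
        corner_match = corner_match.all(axis=-1)
    for y, x in zip(*np.nonzero(corner_match)):
        # Full comparison only at the candidate positions.
        if np.array_equal(image[y:y + C, x:x + D], patch):
            return int(y), int(x)
    return None

# Stand-in data: a random "whole image" and a cutout taken from it.
rng = np.random.default_rng(0)
img0 = rng.integers(0, 256, size=(40, 50, 3), dtype=np.uint8)
img1 = img0[10:22, 15:30].copy()

y, x = find_patch(img0, img1)

# Modify the cutout (here: invert colours) and paste it back into a copy.
new_face = 255 - img1
result = img0.copy()
result[y:y + img1.shape[0], x:x + img1.shape[1]] = new_face
print(f'x: {x} y: {y}')
```

Everything outside the pasted rectangle is the untouched background, which is what the "difference" between the two images amounts to here.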
I've managed to do it using the method below, but I'm sure there must be more time-efficient alternatives that provide the exact angle of rotation instead of an approximation. I'll be pleased to hear your feedback.
The procedure is based on the following steps:
Import a template image (i.e.: with orientation at 0º)
Create a discrete array of the same image but each one rotated at 360º/rotatesteps compared to its nearest neighbour (i.e.: 30 to 50 rotated images)
# python 3 / opencv 3
# Settings:
rotate_steps = 36
step_angle = round((360/rotate_steps), 0) # one image at each 10º
# Rotation function
def rotate_image(image, angle):
    # ../..
    return rotated_image
# Importing a sample image and creating an n-dimensional array to store the rotated images in:
image = cv2.imread('sample_image.png')
image_array = np.zeros((image.shape[0], image.shape[1], rotate_steps), dtype='uint8')
# Rotating sample image and saving it into the array as a new channel:
while rotation_angle <= (360 - step_angle):
    angles.append(rotation_angle)
    image_array[:,:,channel] = rotate_image(image.copy(), rotation_angle)
    # ../..
So I get:
angles = [0, 10.0, 20.0, 30.0, .../..., 340.0, 350.0]
image_array = [image_1, image_2, image_3, ...] where image_i is a different channel on a numpy array.
Retrieve the 'test_image' for which I'm looking at the angle compared to the sample image we have previously rotated and stored into an array
Follow a series of cv2.matchTemplate() and cv2.minMaxLoc() to find what rotated image's angle best matches the 'test_image'
for i in range(len(angles)):
    res = cv2.matchTemplate(test_image, image_array[:,:,i], cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
    # ../..
And finally I pick the discretized angle whose rotated template gives the highest 'max_val' against the test image.
This has proved to work well, bearing in mind that the resulting precision is an approximation that depends on the number of rotated template images, and that the time taken rises as that number increases...
I'm sure there must be other smarter alternatives based on different methods such as generating a kind of "orientation vector" of an image, and so comparing just the resulting number with a previously known one from a sample template...
Your feedback will be highly appreciated.
I think your problem doesn't have an easy solution. It's in fact a registration problem, warping (in this case, rotating) an image to fit another. And it's a known difficult problem, as segmentation is.
I heard image processing researchers say that "he who masters segmentation and registration masters image processing", which might be a little bit of a hyperbole, but it gives the general idea.
Anyway, your technique is how I would have gone with it. Looking on researchgate, https://www.researchgate.net/post/How_can_one_determine_the_rotation_angle_between_two_images, lots of answers also go your way. The alternative would be using feature matching, but I'm not sure it would be faster than your solution.
Maybe you can have a look at OpenCV registration methods http://docs.opencv.org/trunk/db/d61/group__reg.html (the method in this link uses pixel matching and not feature matching, maybe it's faster)
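For the feature-matching route, once you have matched keypoint coordinates in both images, the rotation angle has a closed-form least-squares solution and does not need a bank of rotated templates. A rough sketch with NumPy only; the function name and the synthetic points are assumptions, and in practice the point pairs would come from a keypoint matcher and contain outliers that need filtering first.

```python
import numpy as np

def rotation_angle_deg(pts_a, pts_b):
    """Least-squares rotation angle (degrees) taking pts_a onto pts_b.

    Both inputs are (N, 2) arrays of matched (x, y) coordinates.
    Translation is removed by centring each point set first.
    """
    a = np.asarray(pts_a, dtype=float)
    b = np.asarray(pts_b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    cross = np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])
    dot = np.sum(a * b)
    return float(np.degrees(np.arctan2(cross, dot)))

# Synthetic check: rotate random points by 37 degrees and shift them.
rng = np.random.default_rng(0)
pts_a = rng.uniform(0, 100, size=(20, 2))
theta = np.radians(37.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
pts_b = pts_a @ R.T + np.array([5.0, -3.0])

angle = rotation_angle_deg(pts_a, pts_b)
print(round(angle, 3))
```

The sign convention is counter-clockwise for an x-right, y-up frame; with image coordinates (y pointing down) the same number describes a rotation that looks clockwise on screen.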
I have a problem with FFT implementation in Python. I have completely strange results.
Ok so, I want to open image, get value of every pixel in RGB, then I need to use fft on it, and convert to image again.
My steps:
1) I'm opening image with PIL library in Python like this
from PIL import Image
im = Image.open("test.png")
2) I'm getting pixels
pixels = list(im.getdata())
3) I'm separating every pixel into r, g, b values
for x in range(width):
    for y in range(height):
        r,g,b = pixels[x*width+y]
        red[x][y] = r
        green[x][y] = g
        blue[x][y] = b
4). Let's assume that I have one pixel (111,111,111). And use fft on all red values like this
red = np.fft.fft(red)
And then:
print (red[0][0], green[0][0], blue[0][0])
My output is:
(53866+0j) 111 111
It's completely wrong, I think. My image is 64x64, and the FFT from GIMP is completely different. Actually, my FFT gives me only arrays with huge values, and that's why my output image is black.
Do you have any idea where the problem is?
[EDIT]
I've changed as suggested to
red= np.fft.fft2(red)
And after that I scale it
scale = 1/(width*height)
red= abs(red* scale)
And still, I'm getting only black image.
[EDIT2]
Ok, so lets take one image.
Assume that I dont want to open it and save as greyscale image. So I'm doing like this.
def getGray(pixel):
    r,g,b = pixel
    return (r+g+b)/3
im = Image.open("test.png")
im.load()
pixels = list(im.getdata())
width, height = im.size
for x in range(width):
    for y in range(height):
        greyscale[x][y] = getGray(pixels[x*width+y])
data = []
for x in range(width):
    for y in range(height):
        pix = greyscale[x][y]
        data.append(pix)
img = Image.new("L", (width,height), "white")
img.putdata(data)
img.save('out.png')
After this, I'm getting this image , which is ok. So now, I want to make fft on my image before I'll save it to new one, so I'm doing like this
scale = 1/(width*height)
greyscale = np.fft.fft2(greyscale)
greyscale = abs(greyscale * scale)
after loading it. After saving it to a file, I have this result. So let's now try opening test.png with GIMP and using the FFT filter plugin. I'm getting this image, which is correct
How can I handle it?
Great question. I’ve never heard of it but the Gimp Fourier plugin seems really neat:
A simple plug-in to do fourier transform on you image. The major advantage of this plugin is to be able to work with the transformed image inside GIMP. You can so draw or apply filters in fourier space, and get the modified image with an inverse FFT.
This idea—of doing Gimp-style manipulation on frequency-domain data and transforming back to an image—is very cool! Despite years of working with FFTs, I’ve never thought about doing this. Instead of messing with Gimp plugins and C executables and ugliness, let’s do this in Python!
Caveat. I experimented with a number of ways to do this, attempting to get something close to the output Gimp Fourier image (gray with moiré pattern) from the original input image, but I simply couldn’t. The Gimp image appears to be somewhat symmetric around the middle of the image, but it’s not flipped vertically or horizontally, nor is it transpose-symmetric. I’d expect the plugin to be using a real 2D FFT to transform an H×W image into a H×W array of real-valued data in the frequency domain, in which case there would be no symmetry (it’s just the to-complex FFT that’s conjugate-symmetric for real-valued inputs like images). So I gave up trying to reverse-engineer what the Gimp plugin is doing and looked at how I’d do this from scratch.
The code. Very simple: read an image, apply scipy.fftpack.rfft in the leading two dimensions to get the “frequency-image”, rescale to 0–255, and save.
Note how this is different from the other answers! No grayscaling—the 2D real-to-real FFT happens independently on all three channels. No abs needed: the frequency-domain image can legitimately have negative values, and if you make them positive, you can’t recover your original image. (Also a nice feature: no compromises on image size. The size of the array remains the same before and after the FFT, whether the width/height is even or odd.)
from PIL import Image
import numpy as np
import scipy.fftpack as fp
## Functions to go from image to frequency-image and back
im2freq = lambda data: fp.rfft(fp.rfft(data, axis=0),
                               axis=1)
freq2im = lambda f: fp.irfft(fp.irfft(f, axis=1),
                             axis=0)
## Read in data file and transform
data = np.array(Image.open('test.png'))
freq = im2freq(data)
back = freq2im(freq)
# Make sure the forward and backward transforms work!
assert(np.allclose(data, back))
## Helper functions to rescale a frequency-image to [0, 255] and save
remmax = lambda x: x/x.max()
remmin = lambda x: x - np.amin(x, axis=(0,1), keepdims=True)
touint8 = lambda x: (remmax(remmin(x))*(256-1e-4)).astype(int)
def arr2im(data, fname):
    out = Image.new('RGB', data.shape[1::-1])
    # putdata expects a sequence, so materialize the map on Python 3
    out.putdata(list(map(tuple, data.reshape(-1, 3))))
    out.save(fname)
arr2im(touint8(freq), 'freq.png')
(Aside: FFT-lover geek note. Look at the documentation for rfft for details, but I used Scipy’s FFTPACK module because its rfft interleaves real and imaginary components of a single frequency bin as two adjacent real values, guaranteeing that the output for any-sized 2D image (even vs odd, width vs height) has the same size as the input. This is in contrast to Numpy’s numpy.fft.rfft2 which, because it returns complex data with the last axis cut down to width/2+1 entries, forces you to deal with an array of a different shape and with deinterleaving complex-to-real yourself. Who needs that hassle for this application.)
Results. Given input named test.png:
this snippet produces the following output (global min/max have been rescaled and quantized to 0-255):
And upscaled:
In this frequency-image, the DC (0 Hz frequency) component is in the top-left, and frequencies move higher as you go right and down.
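A quick way to convince yourself of the layout is to run the same two transforms on a small array with odd dimensions. This is just a sanity-check sketch on random data (not on test.png); the function names mirror the lambdas above.

```python
import numpy as np
import scipy.fftpack as fp

def im2freq(data):
    # Real-to-real FFT along rows, then along columns.
    return fp.rfft(fp.rfft(data, axis=0), axis=1)

def freq2im(freq):
    # Inverse transforms in the reverse order.
    return fp.irfft(fp.irfft(freq, axis=1), axis=0)

rng = np.random.default_rng(1)
data = rng.random((5, 7))        # odd height and width on purpose
freq = im2freq(data)
back = freq2im(freq)

print(freq.shape, freq.dtype)    # same shape as the input, real-valued
print(np.isclose(freq[0, 0], data.sum()))
```

The top-left entry equals the sum of all pixels, which is the DC term, and the round trip reproduces the input up to floating-point error.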
Now, let’s see what happens when you manipulate this image in a couple of ways. Instead of this test image, let’s use a cat photo.
I made a few mask images in Gimp that I then load into Python and multiply the frequency-image with to see what effect the mask has on the image.
Here’s the code:
# Make frequency-image of cat photo
freq = im2freq(np.array(Image.open('cat.jpg')))
# Load three frequency-domain masks (DSP "filters")
bpfMask = np.array(Image.open('cat-mask-bpfcorner.png')).astype(float) / 255
hpfMask = np.array(Image.open('cat-mask-hpfcorner.png')).astype(float) / 255
lpfMask = np.array(Image.open('cat-mask-corner.png')).astype(float) / 255
# Apply each filter and save the output
arr2im(touint8(freq2im(freq * bpfMask)), 'cat-bpf.png')
arr2im(touint8(freq2im(freq * hpfMask)), 'cat-hpf.png')
arr2im(touint8(freq2im(freq * lpfMask)), 'cat-lpf.png')
Here’s a low-pass filter mask on the left, and on the right, the result—click to see the full-res image:
In the mask, black = 0.0, white = 1.0. So the lowest frequencies are kept here (white), while the high ones are blocked (black). This blurs the image by attenuating high frequencies. Low-pass filters are used all over the place, including when decimating (“downsampling”) an image (though they will be shaped much more carefully than me drawing in Gimp 😜).
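If you would rather build such a mask in code than draw it, a corner mask is just an array slice. Below is a rough sketch for a single-channel image; the lowpass helper and the keep parameter are made up for illustration, and a hard-edged square mask like this is about the crudest low-pass filter there is.

```python
import numpy as np
import scipy.fftpack as fp

def lowpass(image, keep):
    """Keep only the keep x keep lowest-frequency corner of the frequency-image."""
    freq = fp.rfft(fp.rfft(image, axis=0), axis=1)
    mask = np.zeros_like(freq)
    mask[:keep, :keep] = 1.0     # white corner = pass, everything else = block
    return fp.irfft(fp.irfft(freq * mask, axis=1), axis=0)

rng = np.random.default_rng(2)
image = rng.random((32, 32))     # stand-in for one channel of a photo
blurred = lowpass(image, keep=8)

print(image.var(), blurred.var())
```

Because the DC term sits inside the kept corner, the mean brightness is unchanged, while the pixel-to-pixel variation drops, which is the blur.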
Here’s a band-pass filter, where the lowest frequencies (see that bit of white in the top-left corner?) and high frequencies are kept, but the middling-frequencies are blocked. Quite bizarre!
Here’s a high-pass filter, where the top-left corner that was left white in the above mask is blacked out:
High-pass filtering like this keeps the rapid intensity changes, which is the basic idea behind edge detection.
Postscript. Someone, make a webapp using this technique that lets you draw masks and apply them to an image real-time!!!
There are several issues here.
1) Manual conversion to grayscale isn't good. Use Image.open("test.png").convert('L')
2) Most likely there is an issue with types. You shouldn't pass np.ndarray from fft2 to a PIL image without being sure their types are compatible. abs(np.fft.fft2(something)) will return you a floating-point array (np.float64 by default), whereas PIL image is going to receive something like an array of type np.uint8.
3) Scaling suggested in the comments looks wrong. You actually need your values to fit into 0..255 range.
Here's my code that addresses these 3 points:
import numpy as np
from PIL import Image
def fft(channel):
    magnitude = np.absolute(np.fft.fft2(channel)) # magnitude of the complex spectrum
    magnitude *= 255.0 / magnitude.max() # proper scaling into 0..255 range
    return magnitude
input_image = Image.open("test.png")
channels = input_image.split() # splits an image into R, G, B channels
result_array = np.zeros_like(input_image) # make sure data types,
# sizes and numbers of channels of input and output numpy arrays are the same
if len(channels) > 1: # grayscale images have only one channel
    for i, channel in enumerate(channels):
        result_array[..., i] = fft(channel)
else:
    result_array[...] = fft(channels[0])
result_image = Image.fromarray(result_array)
result_image.save('out.png')
I must admit I haven't managed to get results identical to the GIMP FFT plugin. As far as I can see, it does some post-processing. My results are all a kinda very low contrast mess, and GIMP seems to overcome this by tuning contrast and scaling down non-informative channels (in your case all channels except Red are just empty). Refer to the image:
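One common reason a linearly scaled spectrum looks black is that the DC term is orders of magnitude larger than everything else, so after scaling to 0..255 almost all other bins round to 0. A usual workaround, which I am not claiming is what the GIMP plugin does, is to shift DC to the centre and display the log of the magnitude. A small sketch on synthetic data, with a made-up function name:

```python
import numpy as np

def log_spectrum_uint8(channel):
    """Centred log-magnitude spectrum of a 2D array, scaled to 0..255 (uint8)."""
    spectrum = np.fft.fftshift(np.fft.fft2(channel))   # DC moved to the centre
    log_mag = np.log1p(np.abs(spectrum))               # compress the dynamic range
    log_mag -= log_mag.min()
    peak = log_mag.max()
    if peak > 0:
        log_mag *= 255.0 / peak
    # Round before casting so the peak maps to exactly 255.
    return np.clip(np.rint(log_mag), 0, 255).astype(np.uint8)

rng = np.random.default_rng(3)
channel = rng.integers(0, 256, size=(64, 64)).astype(float)
out = log_spectrum_uint8(channel)
print(out.dtype, out.min(), out.max())
```

The result can go straight into Image.fromarray(out) as an 8-bit grayscale image. Note that this view is for display only: the phase is discarded, so you cannot invert it back to the original picture.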
I am trying to segment some microscopy bright-field images showing some E. coli bacteria.
The picture I am working with resembles this one (even if this one is obtained with phase contrast):
my problem is that after running my segmentation function (OtsuMask below) I cannot distinguish dividing bacteria (you can try my code below on the sample image). This means that I get one single labeled region for a couple of bacteria which are joined by their end, instead of two different labeled images.
The boundary between two dividing bacteria is too narrow to be highlighted by the morphological operations I perform on the thresholded image, but I guess there must be a way to achieve my goal.
Any ideas/suggestions?
import scipy as sp
import numpy as np
from scipy import optimize
import mahotas as mht
from scipy import ndimage
import pylab as plt
def OtsuMask(img,dilation_size=2,erosion_size=1,remove_size=500):
    img_thres=np.asarray(img)
    s=np.shape(img)
    p0=np.zeros(3)  # float array, so the slope estimates are not truncated to integers
    p0[0]=(img[0,0]-img[0,-1])/512.
    p0[1]=(img[1,0]-img[1,-1])/512.
    p0[2]=img.mean()
    [x,y]=np.meshgrid(np.arange(s[1]),np.arange(s[0]))
    p=fitplane(img,p0)
    img=img-myplane(p,x,y)
    m=img.min()
    img=img-m
    img=abs(img)
    img=img.astype(np.uint16)
    # perform thresholding with Otsu
    T = mht.thresholding.otsu(img,2)
    print(T)
    img_thres=img
    img_thres[img<T*0.9]=0
    img_thres[img>T*0.9]=1
    img_thres=-img_thres+1
    # morphological operations
    diskD=createDisk(dilation_size)
    diskE=createDisk(erosion_size)
    img_thres=ndimage.binary_dilation(img_thres,diskD)
    labeled_im,N=mht.label(img_thres)
    label_sizes=mht.labeled.labeled_size(labeled_im)
    labeled_im=mht.labeled.remove_regions(labeled_im,np.where(label_sizes<remove_size))
    plt.figure()
    plt.imshow(labeled_im)
    return labeled_im
def myplane(p,x,y):
    return p[0]*x+p[1]*y+p[2]
def res(p,data,x,y):
    a=(data-myplane(p,x,y))
    return np.array(np.sum(np.abs(a**2)))
def fitplane(data,p0):
    s=np.shape(data)
    [x,y]=np.meshgrid(np.arange(s[1]),np.arange(s[0]))
    print(np.shape(x), np.shape(y))
    p=optimize.fmin(res,p0,args=(data,x,y))
    print(p)
    return p
def createDisk( size ):
    x, y = np.meshgrid( np.arange( -size, size ), np.arange( -size, size ) )
    diskMask = ( ( x + .5 )**2 + ( y + .5 )**2 < size**2)
    return diskMask
The first part of the code in OtsuMask consists of a plane fitting and subtraction.
A similar approach to the one described in this related stackoverflow answer can be used here.
It goes basically like this:
threshold your image, as you have done
apply a distance transform on the thresholded image
threshold the distance transform, so that only a small 'seed' part of each bacterium remains
label these seeds, giving each one a different shade of gray
(also add a labeled seed for the background)
execute the watershed algorithm with these seeds and the distance transformed image, to get the separated contours of your bacteria
Check out the linked answer for some pictures that will make this much clearer.
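The steps above can be sketched roughly as follows. This is not the code from the linked answer: the distance transform and seed labelling come from scipy.ndimage, and the flooding step is a deliberately simplified priority-queue watershed written out by hand so the example has no further dependencies. Instead of a separate background seed it simply restricts the flooding to the thresholded mask. The function name, the seed_fraction value and the two-disk test shape are all assumptions for illustration.

```python
import heapq
import numpy as np
from scipy import ndimage

def split_touching_objects(mask, seed_fraction=0.7):
    """Label objects in a boolean mask, splitting blobs joined by a narrow neck."""
    # Distance of every foreground pixel to the nearest background pixel.
    dist = ndimage.distance_transform_edt(mask)
    # Threshold the distance map so only a small core of each object remains.
    seeds, n_seeds = ndimage.label(dist > seed_fraction * dist.max())
    labels = seeds.copy()
    # Flood outwards from the seeds, always expanding the deepest pixel first.
    heap = [(-float(dist[y, x]), int(y), int(x)) for y, x in zip(*np.nonzero(seeds))]
    heapq.heapify(heap)
    rows, cols = mask.shape
    while heap:
        _, y, x = heapq.heappop(heap)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]
                heapq.heappush(heap, (-float(dist[ny, nx]), ny, nx))
    return labels, n_seeds

# Two overlapping disks as a stand-in for a pair of dividing bacteria.
yy, xx = np.mgrid[:40, :48]
mask = ((yy - 20) ** 2 + (xx - 15) ** 2 <= 100) | ((yy - 20) ** 2 + (xx - 33) ** 2 <= 100)

labels, n_seeds = split_touching_objects(mask)
print(n_seeds, labels[20, 15], labels[20, 33])
```

A single global seed threshold only works when the objects have similar widths; for real images you would likely want per-region local maxima of the distance map as seeds, and a library watershed (skimage or OpenCV) instead of this hand-rolled flood.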
A few thoughts:
Otsu may not be a good choice, as you may even use a fixed threshold (your bacteria are black).
Thresholding the image with any method will remove a lot of useful information.
I do not have a complete recipe for you, but even this very simple thing seems to give a lot of interesting information:
import matplotlib.pyplot as plt
import cv2
# cv2 is only used to read the image into an array, use only green channel
bact = cv2.imread("/tmp/bacteria.png")[:,:,1]
# draw a contour image with fixed threshold 50
fig = plt.figure()
ax = fig.add_subplot(111)
ax.contourf(bact, levels=[0, 50], colors='k')
This gives:
This suggests that if you use contour-tracing techniques with fixed contours, you will receive quite nice-looking starting points for dilation and erosion. So, two differences in thresholding:
Contouring uses much more of the grayscale information than simple black/white thresholding.
The fixed threshold seems to work well with these images, and if illumination correction is needed, Otsu is not the best choice.
The skimage watershed segmentation example was once more useful to me than any of the OpenCV samples. It uses some code borrowed from the CellProfiler project (a Python-based tool for sophisticated cell image analysis). Hint: use the Euclidean distance transform from OpenCV, it's faster than the scipy implementation. Also, the peak_local_max function has a min_distance parameter, which is useful for telling single cells apart precisely. I think this function is more robust at finding cell peaks than a crude threshold (because the intensity of cells may vary).
SciPy also has a watershed implementation (scipy.ndimage.watershed_ift), but it has weird behavior.
I have a code:
def compare_frames(frame1, frame2):
    # cropping ranges of two images
    frame1, frame2 = similize(frame1, frame2)
    sc = 0
    h = numpy.zeros((300,256,3))
    frame1= cv2.cvtColor(frame1,cv2.COLOR_BGR2HSV)
    frame2= cv2.cvtColor(frame2,cv2.COLOR_BGR2HSV)
    bins = numpy.arange(256).reshape(256,1)
    color = [ (255,0,0),(0,255,0),(0,0,255) ]
    for ch, col in enumerate(color):
        hist_item1 = cv2.calcHist([frame1],[ch],None,[256],[0,255])
        hist_item2 = cv2.calcHist([frame2],[ch],None,[256],[0,255])
        cv2.normalize(hist_item1,hist_item1,0,255,cv2.NORM_MINMAX)
        cv2.normalize(hist_item2,hist_item2,0,255,cv2.NORM_MINMAX)
        # OpenCV 2.x constant; on OpenCV 3+ the equivalent is cv2.HISTCMP_CORREL
        sc = sc + (cv2.compareHist(hist_item1, hist_item2, cv2.cv.CV_COMP_CORREL)/len(color))
    return sc
It works, but if an image has color noise (a darker or lighter tint) it stops working and gives a similarity of 0.5 (I need 0.8).
Image 2 is darker than image 1.
Can you suggest a FAST comparison algorithm that ignores light, blur and noise in images, or a way to modify this one?
Note:
I have a template matching algorithm too:
But it is slower than I need, although the similarity is 0.95.
def match_frames(frame1, frame2):
    # cropping ranges of two images
    frame1, frame2 = similize(frame1, frame2)
    result = cv2.matchTemplate(frame1,frame2,cv2.TM_CCOEFF_NORMED)
    return numpy.amax(result)
Thanks
Your question is one of the classic ones in computer vision and image processing. Many doctoral theses have been written and scores of papers in conferences and journals.
In short direct pixel comparisons will not work in this case. A transformation of some kind is needed to take you to a different feature space. You could do something simple or complex depending on the requirements you have in mind. You could compute edges or corners. One suggestion already mentioned is the FAST corner detection. This would be a good choice as would SIFT etc... There are many others you could use but it will depend on how much the two images can vary and in what ways.
For example, if there is only going to be global color changes, tint, etc the approach would be different than if the images could be rotated or the object position changing in size (i.e. camera zoom).
Strictly speaking for the case you mention features such as FAST, SIFT, or even edges would work reasonably well. Check http://en.wikipedia.org/wiki/Feature_detection_%28computer_vision%29 for more information
Image patch descriptors (SIFT, SURF...) are usually monochromatic and expect grayscale images. Thus, for any approach (point matching, frame matching...) I would advise you to change the color space to Lab or YUV first and then work on the luminance plane.
FAST is a (fast) corner detection algorithm. A corner is largely insensitive to moderate noise and contrast changes, but may be affected by blur (bad position, bad corner response for example). FAST does not include a descriptor part however, so your matching should then rely on geometric proximity. If you need a descriptor part, then you need to switch to one of the many other keypoint descriptors (SIFT, SURF, FAST + BRIEF/BRISK/ORB/FREAK...).
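To illustrate the "work on the luminance plane" advice for two frames of equal size, here is a small sketch of a similarity score that ignores global brightness and contrast changes: convert to luminance, remove the mean, and take the normalized correlation. It uses NumPy only, the function names are made up, and the luma weights are the usual Rec. 601 ones. It does nothing for blur or for shifted content.

```python
import numpy as np

def luminance(bgr):
    """Rec. 601 luma from an image in OpenCV's B, G, R channel order."""
    return 0.114 * bgr[..., 0] + 0.587 * bgr[..., 1] + 0.299 * bgr[..., 2]

def brightness_invariant_similarity(frame1, frame2):
    """Zero-mean normalized correlation of the two luminance planes, in [-1, 1]."""
    a = luminance(frame1.astype(float))
    b = luminance(frame2.astype(float))
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0:
        return 0.0               # at least one frame is completely flat
    return float(np.sum(a * b) / denom)

rng = np.random.default_rng(4)
frame = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
darker = (0.6 * frame + 10).astype(np.uint8)      # same content, darker tint
unrelated = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)

print(brightness_invariant_similarity(frame, darker))
print(brightness_invariant_similarity(frame, unrelated))
```

The darkened copy scores close to 1 and the unrelated frame close to 0, whereas a histogram comparison shifts as soon as the overall brightness changes.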