I was wondering if anyone knew why there is no documentation for HOGDescriptors in the Python bindings of OpenCV.
Maybe I've just missed them, but the only code I've found of them is this thread: Get HOG image features from OpenCV + Python?
If you scroll down in that thread, this code is found in there:
import cv2
hog = cv2.HOGDescriptor()
im = cv2.imread(sample)
h = hog.compute(im)
I've tested this and it works -- so the Python Bindings do exist, just the documentation doesn't. I was wondering if anyone knew why documentation for the Python bindings for HOG is so difficult to find / non-existent. Does anyone know if there is a tutorial I can read anywhere about HOG (especially via the Python Bindings)? I'm new to HOG and would like to see a few examples of how OpenCV does stuff before I start writing my own stuff.
1. Get Inbuilt Documentation:
Following command on your python console will help you know the structure of class HOGDescriptor:
import cv2
help(cv2.HOGDescriptor())
2. Example code:
Here is a snippet of code to initialize an cv2.HOGDescriptor with different parameters (The terms I used here are standard terms which are well defined in OpenCV documentation here):
import cv2
image = cv2.imread("test.jpg",0)
winSize = (64,64)
blockSize = (16,16)
blockStride = (8,8)
cellSize = (8,8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0
L2HysThreshold = 2.0000000000000001e-01
gammaCorrection = 0
nlevels = 64
hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins,derivAperture,winSigma,
histogramNormType,L2HysThreshold,gammaCorrection,nlevels)
#compute(img[, winStride[, padding[, locations]]]) -> descriptors
winStride = (8,8)
padding = (8,8)
locations = ((10,20),)
hist = hog.compute(image,winStride,padding,locations)
3. Reasoning:
The resultant hog descriptor will have dimension as:
9 orientations X (4 corner blocks that get 1 normalization + 6x4 blocks on the edges that get 2 normalizations + 6x6 blocks that get 4 normalizations) = 1764. as I have given only one location for hog.compute().
4. Different way to initialize HOGDescriptor:
One more way to initialize is from xml file which contains all parameter values:
hog = cv2.HOGDescriptor("hog.xml")
To get an xml file one can do following:
hog = cv2.HOGDescriptor()
hog.save("hog.xml")
and edit the respective parameter values in xml file.
I was wondering the same. Almost none documentation can be found for OpenCV HOGDescriptor, other than the source cpp code.
Scikit-image has a good example page on extracting and illustrating HOG feature. It provides an alternative to explore HOG. It is documented here.
However, there is one thing to point out about scikit-image's hog implementation. Its Python code for hog function does not implement weighted vote for histogram orientation binning, but only does simple binning based on magnitude value falling into which bin. See its hog_histogram function. This is not following exactly Dalal and Triggs's paper.
Actually, I found that object detection based on OpenCV's implementation of HOG is more accurate than with the api from scikit-image. It makes sense to me, because weighted vote is important. By casting weighted votes to bins, variation in histogram is greatly reduced when gradient magnitude falls on or around the boundary. Chris McCormick wrote a very insightful blog on hog, in which orientation binning is clearly described as
For each gradient vector, it’s contribution to the histogram is given by the magnitude of the vector (so stronger gradients have a bigger impact on the histogram). We split the contribution between the two closest bins. So, for example, if a gradient vector has an angle of 85 degrees, then we add 1/4th of its magnitude to the bin centered at 70 degrees, and 3/4ths of its magnitude to the bin centered at 90.
I believe the intent of splitting the contribution is to minimize the problem of gradients which are right on the boundary between two bins. Otherwise, if a strong gradient was right on the edge of a bin, a slight change in the gradient angle (which nudges the gradient into the next bin) could have a strong impact on the histogram.
So, use OpenCV to compute hog if possible (haven't digged into its code and don't feel like doing so, but I suppose OpenCV's way of hog implementation is more appropriate). Not only I found an improvement in detection accuracy, but it also runs faster. Compared to scikit-image's hog code with wonderful comments, its documentation is almost none. Yet it is still feasible that one could get OpenCV's version working in practice - it's a matter of passing the right parameter for window size, cell size, block size, block stride, number of orientations, etc. Other parameters I just went with default.
Related
I have been dipping my toes into OpenCV and the stereovision functions it contains, and am struggling to get good results while following instructions in both the OpenCV documentation and many articles online. Specifically, I believe that at this point I have managed to obtain a decent calibration of my cameras, a decent stereo calibration, and even a decent rectification, but when moving to create the disparity map I seem to get nonsense back.
I am using a set of self-acquired images taken with a Pentax K-3 ii camera using a Loreo Lens-in-a-cap CCD splitter which gives me "two" images taken on one CCD. I can then split the image in half (and trim some of the pixels near the overlap) to have a reliable baseline distance in world coordinates with the camera. I unfortunately have no information on the true focal length of this configuration but I would guess it is around 9cm.
I have performed camera calibration on each split-image set to get camera matrices, distance coefficients, and object and image points for use in epipolar geometry. Then, following the procedure laid out in [1,2], perform stereo calibration and rectification. I do not have the required reputation to embed images, so please click here. By my understanding, the fact that similar features in both images are similar distances to the true horizontal lines I have drawn across them means that this is a good rectification result and should be usable.
However, when I implement the following code to create the disparity map:
# Settings for cv.StereoSGBM_create
minDisparity = 1
numDisparities = 64
blockSize = 1
disp12MaxDiff = 1
uniquenessRatio = 10
speckleWindowSize = 0
speckleRange = 8
stereo = cv.StereoSGBM_create(minDisparity=minDisparity, numDisparities=numDisparities, blockSize=blockSize, disp12MaxDiff=disp12MaxDiff, uniquenessRatio=uniquenessRatio,
speckleWindowSize=speckleWindowSize, speckleRange=speckleRange)
# Calculate the disparity map
disp = stereo.compute(imgL, imgR).astype(np.float32)
# Normalize the values to spread them across the viewable range
disp = cv.normalize(disp,0,255,cv.NORM_MINMAX)
# Resize for display
disp = cv.resize(disp, (1000,1000))
cv.imshow("disparity",disp)
cv.waitKey(0)
The result is disheartening. Intuitively, seeing a lot of black space surrounding edges which actually are fairly well-defined (such as in the chessboard pattern or near my hands) would suggest that there is very little disparity. However it seems clear to me that the images are quite different in terms of translation, so I am a bit confused. I have been delving through the documentation and run out of ideas. I tried reusing the code that produced the initial set of epipolar lines provided here which seemed to work on the original image quite nicely. However, it produces epipolar lines which are certainly not horizontal. This tells me that something is wrong, but I do not understand what could be, especially given the "visual test" I described above. I suspect I am misapplying that section of the code.
One thought I have is that I need to use an ROI to select the valid parts of the image, but I am unsure how to go about this. I think this is supported by the odd streaking behavior at the right edge of the left image post-rectification.
This is a link to a pastebin of all of my code, aside from the initial camera calibration which has significant runtime due to the size of the images.
I would appreciate any help that can be offered as at this point I am going a bit codeblind. I am limited to only 8 links due to my reputation, so please let me know if I can provide better images or documentation of my work.
Given two images of the same scene with potential alignment, focus, lighting differences and noise, I am looking for an operation that I can run on these images that produces another image of the difference between them that minimizes these differences or is more sensitive to the structural differences between them than the global differences. My initial thought was a comparison between the corresponding neighborhoods around a pixel in image A and the same pixel in image B might work.
Is this function already implemented in OpenCV or some other Python library (scipy, numpy etc)?
My musings:
A simple frame difference would tell me where absolute differences occur but is very brittle to alignment, lighting differences. Maybe there is a way to find the standard deviation over a pixel's neighborhood. Numpy's std only works by axis...
This seems like I want the correlation between two signals but I don't know how to extend this to a non-repeating 2D world. scipy.signal.correlate2d seems like it may work if there was an efficient way to just pass corresponding neighborhoods to it. However, I don't have a good feel for what is going on under the hood.
A convolution of one image where the kernel comes from corresponding locations in the other images would give a comparison that would handle noise and focus issues well but I don't know how to use a dynamic kernel for a convolution.
If I had a library of basically identical images (not an original assumption but doable) to compare one image to, I could use a mean difference or mixture of gaussians. But I don't think this would help with alignment or lighting. I could align the image first and then do the comparison.
Per the comment below, I looked up the skimage SSIM (Structural Similarity Index) method that is used to measure image degradation due to things like lossy compression and decompression. It actually expects two copies of the same image - one a truth source, one in a potentially degraded state due to lossy compression and decompression. This method is soft on global bias (lighting), which is good, but sensitive to noise (by design) and especially sensitive to misalignment.
The comment led me to MSE which acts globally, but if iterated over an image by neighborhood, it gives a good result - insensitive to bias, noise but not structural differences. However, it is fairly sensitive to alignment differences and very slow in python...
# Mean Squared Error
def mse(imageA, imageB):
return np.mean((imageA - imageB)**2)
from scipy import misc
import numpy as np
face = misc.face()
nface = np.array(face)
nface[295:305,395:415] = face[195:205,495:515] #discontinuous region
nface = cv2.blur(nface, (3,3)) # focus effects
nface = nface + (np.random.randn(*face.shape) * 10 - 5) # noise
nface = (nface * .9 + 20).astype(int) #lighting
n = m = 3
output = np.zeros(face.shape[:2])
for i in range(face.shape[0]):
for j in range(face.shape[1]):
if i > n and i < face.shape[0]-n and j > m and j < face.shape[1]-m:
output[i,j] = mse(nface[i-n:i+n, j-m:j+m], face[i-n:i+n, j-m:j+m])
Is this a common image processing technique? Is there an name for this or an optimized implementation in openCV or Numpy?
I have tried 3 algorithms:
Compare by Compare_ssim.
Difference detection by PIL (ImageChops.difference).
Images subtraction.
The first algorithm:
(score, diff) = compare_ssim(img1, img2, full=True)
diff = (diff * 255).astype("uint8")
The second algorithm:
from PIL import Image ,ImageChops
img1=Image.open("canny1.jpg")
img2=Image.open("canny2.jpg")
diff=ImageChops.difference(img1,img2)
if diff.getbbox():
diff.show()
The third algorithm:
image3= cv2.subtract(image1,image2)
The problem is these algorithms are so sensitive. If the images have different noise, they consider that the two images are totally different. Any ideas to fix that?
These pictures are different in many ways (deformation, lighting, colors, shape) and simple image processing just cannot handle all of this.
I would recommend a higher level method that tries to extract the geometry and color of those tubes, in the form of a simple geometric graph. Then compare the graphs rather than the images.
I acknowledge that this is easier said than done, and will only work with this particular kind of scene.
It is very difficult to help since we don't really know which parameters you can change, like can you keep your camera fixed? Will it always be just about tubes? What about tubes colors?
Nevertheless, I think what you are looking for is a framework for image registration and I propose you to use SimpleElastix. It is mainly used for medical images so you might have to get familiar with the library SimpleITK. What's interesting is that you have a lot of parameters to control the registration. I think that you will have to look into the documentation to find out how to control a specific image frequency, the one that create the waves and deform the images. Hereafter I did not configured it to have enough local distortion, you'll have to find the best trade-off, but I think it should be flexible enough.
Anyway, you can get such result with the following code, I don't know if it helps, I hope so:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import SimpleITK as sitk
fixedImage = sitk.ReadImage('1.jpg', sitk.sitkFloat32)
movingImage = sitk.ReadImage('2.jpg', sitk.sitkFloat32)
elastixImageFilter = sitk.ElastixImageFilter()
affine_registration_parameters = sitk.GetDefaultParameterMap('affine')
affine_registration_parameters["NumberOfResolutions"] = ['6']
affine_registration_parameters["WriteResultImage"] = ['false']
affine_registration_parameters["MaximumNumberOfSamplingAttempts"] = ['4']
parameterMapVector = sitk.VectorOfParameterMap()
parameterMapVector.append(affine_registration_parameters)
parameterMapVector.append(sitk.GetDefaultParameterMap("bspline"))
elastixImageFilter.SetFixedImage(fixedImage)
elastixImageFilter.SetMovingImage(movingImage)
elastixImageFilter.SetParameterMap(parameterMapVector)
elastixImageFilter.Execute()
registeredImage = elastixImageFilter.GetResultImage()
transformParameterMap = elastixImageFilter.GetTransformParameterMap()
resultImage = sitk.Subtract(registeredImage, fixedImage)
resultImageNp = np.sqrt(sitk.GetArrayFromImage(resultImage) ** 2)
cv2.imwrite('gray_1.png', sitk.GetArrayFromImage(fixedImage))
cv2.imwrite('gray_2.png', sitk.GetArrayFromImage(movingImage))
cv2.imwrite('gray_2r.png', sitk.GetArrayFromImage(registeredImage))
cv2.imwrite('gray_diff.png', resultImageNp)
Your first image resized to 256x256:
Your second image:
Your second image registered with the first one:
Here is the difference between the first and second image which could show what's different:
This is one of the classical problems of image treatment - and one which does not have an answer which holds universally. The possible answers depend highly on what type of images you have, and what type of information you want to extract from them and the differences between them.
You can reduce noise by two means:
a) take several images of the same object, such that the object does not change. You can stack the images and noise is reduced by square-root of the number of images.
b) You can run a blur filter over the image. The more you blur, the more noise is averaged. Noise is here reduced by square-root of the number of pixels you average over. But so is detail in the images.
In both cases (a) and (b) you run the difference analysis after you applied either method.
Probably not applicable to you as you likely cannot get hold of either: it helps, if you can get hold of flatfields which give the inhomogeneity of illumination and pixel sensitivity of your camera and allow correcting the images prior to any treatment. Similar goes for darkfields which give an estimate of the influence of the read-out noise of the camera and allow correcting images for those.
There is somewhat another 3rd option, which is more high-level: run your object analysis first at a detailed-enough level. And compare the results.
I am trying to deblur an image in Python but have run into some problems. Here is what I've tried, but keep in mind that I am not an expert on this topic. According to my understanding, if you know the point spread function, you should be able to deblur the image quite simply by performing a deconvolution. However, this doesn't seem to work and I don't know if I'm doing something stupid or if I just don't understand things correctly. In Mark Newman's Computational Physics book (using Python), he touches on this subject in problem 7.9. In this problem he supplies an image that he deliberately blurred using a Gaussian point spread function (psf), and the objective of the problem is to deblur the image using a Gaussian. This is accomplished by dividing the 2D FFT of the blurred image by the 2D FFT of the psf and then taking the inverse transform. This works reasonably well.
To extend this problem, I wanted to deblur a real image taken with a camera that was deliberately out of focus. So I set up a camera and took two sets of pictures. The first set of pictures were in focus. The first was of a very small LED light in a completely darkened room and the second was of a piece of paper with text on it (using the flash). Then, without changing any of the distances or anything, I changed the focus setting on the camera so that the text was very out of focus. I then took a picture of the text using the flash and took a second picture of the LED (without the flash). Here are the blurred images.
Now, according to my understanding, the image of the blurred point light source should be the point spread function, and as such I should be able to use it to deblur my image. The problem is that when I do so I get an image that just looks like noise. After doing a little research, it seems as though noise can be a big problem when using deconvolution techniques. However, given that I have measured what I believe to be the exact point spread function, I am surprised that noise would be an issue here.
One thing I did try was to replace small values (less than epsilon) in the psf transform with either 1 or with epsilon, and I tried this with a huge range of values for epsilon. This yielded an image that was not just noise, but is also not a deblurred version of the image; it looks like a weird, blurry version of the original (non-blurred) image. Here is an image from my program (you can ignore the value of sigma, which was not used in this program).
I believe I am dealing with a noise issue, but I don't know why and I don't know what to do about it. Any advice would be much appreciated (keeping in mind that I am no expert in this area).
Note that I have deliberately not posted the code because I think that is somewhat irrelevant at this point. But I would be happy to do so if anyone thinks that would be useful. I don't think it's a programming issue because I used the same technique and it works fine when I have the known point spread function (such as when I divide the FFT of the original in-focus image by the FFT of the out-of-focus image and then inverse transform). I just don't understand why I can't seem to use my experimentally measured point spread function.
The problem you have sought to solve is, unfortunately, more difficult than you might expect. Let me explain it in four parts. The first section assumes that you are comfortable with the Fourier transform.
Why you cannot solve this problem with a simple deconvolution.
An outline to how image deblurring can be performed.
Deconvolution by FFT and why it is a bad idea
An alternative method to perform deconvolution
But first, some notation:
I use I to represent an image and K to represent a convolution kernel. I * K is the convolution of the image I with the kernel K. F(I) is the (n-dimensional) Fourier transform of the image I and F(K) is the Fourier transform of the convolution kernel K (this is also called the point spread function, or PSF). Similarly, Fi is the inverse Fourier transform.
Why you cannot solve this problem with a simple deconvolution:
You are correct when you say that we can recover a blurred image Ib = I * K by dividing the Fourier transform of Ib by the Fourier transform of K. However, lens blur is not a convolution blurring operation. It is a modified convolution blurring operation where the blurring kernel K is dependent on the distance to the object you have photographed. Thus, the kernel changes from pixel to pixel.
You might think that this is not an issue with your image, as you have measured the correct kernel at the position of the image. However, this might not be the case, as the part of the image that is far away can influence the part of the image that is close. One way to fix this problem is to crop the image so that it is only the paper that is visible.
Why deconvolution by FFT is a bad idea:
The Convolution Theorem states that I * K = Fi(F(I)F(K)). This theorem leads to the reasonable assumption that if we have an image, Ib = I * K that is blurred by a convolution kernel K, then we can recover the deblurred image by computing I = (F(Ib)/F(K)).
Before we look at why this is a bad idea, I want to get some intuition for what the Convolution Theorem means. When we convolve an image with a kernel, then that is the same as taking the frequency components of the image and multiplying it elementwise with the frequency components of the kernel.
Now, let me explain why it is difficult to deconvolve an image with the FFT. Blurring, by default, removes high-frequency information. Thus, the high frequencies of K must go towards zero. The reason for this is that the high-frequency information of I is lost when it is blurred -- thus, the high-frequency components of Ib must go towards zero. For that to happen, the high-frequency components of K must also go towards zero.
As a result of the high-frequency components of K being almost zero, we see that the high-frequency components of Ib is amplified significantly (as we almost divide by zero) when we deconvolve with the FFT. This is not a problem in the noise-free case.
In the noisy case, however, this is a problem. The reason for this is that noise is, by definition, high-frequency information. So when we try to deconvolve Ib, the noise is amplified to an almost infinite extent. This is the reason that deconvolution by the FFT is a bad idea.
Furthermore, you need to consider how the FFT based convolution algorithm deals with boundary conditions. Normally, when we convolve images, the resolution decreases somewhat. This is unwanted behaviour, so we introduce boundary conditions that specify the pixel values of pixels outside the image. Example of such boundary conditions are
Pixels outside the image has the same value as the closest pixel inside the image
Pixels outside the image has a constant value (e.g. 0)
The image is part of a periodic signal, thus the row of pixel above the topmost row is equal to the bottom row of pixels.
The final boundary condition often makes sense for 1D signals. For images, however, it makes little sense. Unfortunately, the convolution theorem specifies that periodic boundary conditions are used.
In addition to this, it seems that the FFT based inversion method is significantly more sensitive to erroneous kernels than iterative methods (e.g. Gradient descent and FISTA).
An alternative method to perform deconvolution
It might seem like all hope is lost now, as all images are noisy, and deconvolving will increase the noise. However, this is not the case, as we have iterative methods to perform deconvolution. Let me start by showing you the simplest iterative method.
Let || I ||² be the squared sum of all of I's pixels. Solving the equation
Ib = I * K
with respect to I is then equivalent to solving the following optimisation problem:
min L(I) = min ||I * K - Ib||²
with respect to I. This can be done using gradient descent, as the gradient of L is given by
DL = Q * (I * K - Ib)
where Q is the kernel you get by transposing K (this is also called the matched filter in the signal processing litterature).
Thus, you can get the following iterative algorithm that will deblur an image.
from scipy.ndimage import convolve
blurred_image = # Load image
kernel = # Load kernel/psf
learning_rate = # You need to find this yourself, do a logarithmic line search. Small rate will always converge, but slowly. Start with 0.4 and divide by 2 every time it fails.
maxit = 100
def loss(image):
return 0.5 * np.sum((convolve(image, kernel) - blurred_image)**2)
def gradient(image):
return convolve(convolve(image, kernel) - blurred_image, kernel.T)
deblurred = blurred_image.copy()
for _ in range(maxit):
deblurred -= learning_rate*gradient(image)
The above method is perhaps the simplest of the iterative deconvolution algorithms. The way these are used in practice are through so-called regularised deconvolution algorithms. These algorithms work by firstly specifying a function that measures the amount of noise in an image, e.g. TV(I) (the total variation of I). Then the optimisation procedure is performed on L(I) + wTV(I). If you are interested in such algorithms, I recommend reading the FISTA paper by Amir Beck and Marc Teboulle. The paper is quite maths heavy, but you don't need to understand most of it -- only how to implement the TV deblurring algorithm.
In addition to using a regulariser, we use accelerated methods to minimise the loss L(I). One such example is Nesterov accelerated gradient descent. See Adaptive Restart for Accelerated Gradient Schemes by Brendan O'Donoghue, Emmanuel Candes for information about such methods.
An outline to how image deblurring can be performed.
Crop your image so that everything has same distance from the camera
Find the convolution kernel the same way you did now (Test your deconvolution algorithm on synthetically blurred images first)
Implement an iterative method to compute deconvolutoin
Deconvolve the image.
I'm new in image analysis (with Python) and I would like to apply the richardson_lucy deconvolution (from skimage) on my data (CT scans). For this reason, I estimated the PSF in "number of voxels" by means of a specific software. Its value is roughly 6.73 voxels, but I don't know how to use it as a paramter in the function.
The function uses the PSF parameter as a ndarray, so I tried in this way:
from skimage import io
from pylab import array
img = io.imread ("Slice1.tif")
import skimage.restoration as rst
PSF = array (6.7)
img_dbl = rst.richardson_lucy (img, PSF, iterations=10)
It shows me this error: IndexError: too many indices for array
In CT scans, blurring between two different materials can be linked to a Gaussian PSF. If you have more tips for deblurring (maybe better than RL) just write them.
Can any one please help me.
Have a similar problem and still researching it. In my case it didn't work if I didn't use np.uint8 as type. CT data should be 16 bit but only uses the first 12 bits (which are mapped to values between [-1024, 3096]. So I had to rescale my image data to [0-255] before getting anything than black or white out it.
If I understood that correctly, the sum of the PSF should always be 1. What I can guess from your question is that you assume the point spread function to be a gaussian with a meaningful (95% of values?) spread of 6.7 pixels. In that case you would have to model the PSF as a gaussian (that's what I came here for).
You can create one with the method described by #FuzzyDuck in this post.
PSF = gkern(5,2)
This would create a gaussian 5x5 kernel with sum 1 using the proposed method by #FuzzyDuck with a sigma of 2. Note that point spread functions could be applied several times so you have to experiment a little bit with the values (or use an algorithm to approximate that).