I got a good result from the paper LoFTR: Detector-Free Local Feature Matching with Transformers.
Now I want to get a depth map from the feature-matching result.
I hope someone can give me a link or code to reach this goal.
Thank you so much.
You will not be able to achieve a reliable depth or disparity map using the feature matches, especially with the example you posted. A general algorithm to get you started on a low quality depth map would be:
Find a rotation for one or both images that minimizes the sum of the absolute y-offsets across the feature matches.
Iterate through each feature and record the x-offset from image A to image B. This will give you a sparse disparity map.
Now the difficult part... use an inpainting method (there are lots of them, look them up) to fill in the missing pixel values based on the existent values. (This would give you an unreliable result even if your initial images were well aligned, but it's your only option given your starting point.)
Now you have a dense disparity map. Conversion from disparity to depth is a simple calculation, but it requires knowledge of the camera's position, rotation, and properties (focal length, sensor size, etc.) when each image was taken. You can make those values up in order to create a fake depth map, but that will reduce accuracy even further.
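A minimal sketch of the x-offset, inpainting, and disparity-to-depth steps above, assuming already-rectified images and LoFTR-style matched keypoint arrays mkpts0 and mkpts1 of shape (N, 2) in (x, y) pixel order; the image size, inpainting radius, focal length, and baseline are made-up example values:

import numpy as np
import cv2

H, W = 480, 640                               # assumed image size
disparity = np.zeros((H, W), dtype=np.float32)
mask = np.ones((H, W), dtype=np.uint8)        # 1 = missing, 0 = known

# Record the x-offset of each match to build a sparse disparity map.
for (x0, y0), (x1, y1) in zip(mkpts0, mkpts1):
    disparity[int(y0), int(x0)] = x0 - x1
    mask[int(y0), int(x0)] = 0

# Fill in the missing pixels with an inpainting method (here Telea's).
disp8 = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
dense = cv2.inpaint(disp8, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

# Disparity to depth: depth = f * B / disparity, where f is the focal
# length in pixels and B the baseline. Made-up values give a fake map.
f, B = 700.0, 0.1
depth = f * B / np.maximum(dense.astype(np.float32), 1e-6)

Keep in mind that the result will only be as good as the matches and the inpainting allow.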
I'm trying to reproduce this article.
To make the images noisy, as described in the article, I need to manually add deterministic-distribution noise to the MNIST dataset. The article says
" This noise has been added manually using deterministic distribution"
on page 973, in the first row. I have looked almost everywhere but was unable to find out how to do it. This distribution is usually used for other purposes, not for making images noisy.
The article adds and measures noise by percent, for example a 50% noisy image.
How can we add noise by percent in Python?
I really need help with that.
They never really make it clear, but it looks like they're randomly assigning pixels a value of 0 for each of their inputs. In the case you mention, 50% of the pixels were assigned 0, and those pixels were chosen at random.
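If that interpretation is right, a minimal sketch looks like this, assuming the images come as a NumPy array (e.g. MNIST with shape (n, 28, 28)); x_train below stands for whatever array you load:

import numpy as np

def add_noise(images, percent):
    # Set `percent`% of the pixels in each image to 0, chosen at random.
    noisy = images.copy().reshape(len(images), -1)
    n_zero = int(noisy.shape[1] * percent / 100)
    for img in noisy:
        idx = np.random.choice(noisy.shape[1], n_zero, replace=False)
        img[idx] = 0
    return noisy.reshape(images.shape)

noisy_50 = add_noise(x_train, 50)  # 50% of pixels zeroed at random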
I am trying to deblur an image in Python but have run into some problems. Here is what I've tried, but keep in mind that I am not an expert on this topic. According to my understanding, if you know the point spread function, you should be able to deblur the image quite simply by performing a deconvolution. However, this doesn't seem to work and I don't know if I'm doing something stupid or if I just don't understand things correctly. In Mark Newman's Computational Physics book (using Python), he touches on this subject in problem 7.9. In this problem he supplies an image that he deliberately blurred using a Gaussian point spread function (psf), and the objective of the problem is to deblur the image using a Gaussian. This is accomplished by dividing the 2D FFT of the blurred image by the 2D FFT of the psf and then taking the inverse transform. This works reasonably well.
To extend this problem, I wanted to deblur a real image taken with a camera that was deliberately out of focus. So I set up a camera and took two sets of pictures. The first set of pictures were in focus. The first was of a very small LED light in a completely darkened room and the second was of a piece of paper with text on it (using the flash). Then, without changing any of the distances or anything, I changed the focus setting on the camera so that the text was very out of focus. I then took a picture of the text using the flash and took a second picture of the LED (without the flash). Here are the blurred images.
Now, according to my understanding, the image of the blurred point light source should be the point spread function, and as such I should be able to use it to deblur my image. The problem is that when I do so I get an image that just looks like noise. After doing a little research, it seems as though noise can be a big problem when using deconvolution techniques. However, given that I have measured what I believe to be the exact point spread function, I am surprised that noise would be an issue here.
One thing I did try was to replace small values (less than epsilon) in the psf transform with either 1 or with epsilon, and I tried this with a huge range of values for epsilon. This yielded an image that was not just noise, but is also not a deblurred version of the image; it looks like a weird, blurry version of the original (non-blurred) image. Here is an image from my program (you can ignore the value of sigma, which was not used in this program).
I believe I am dealing with a noise issue, but I don't know why and I don't know what to do about it. Any advice would be much appreciated (keeping in mind that I am no expert in this area).
Note that I have deliberately not posted the code because I think that is somewhat irrelevant at this point. But I would be happy to do so if anyone thinks that would be useful. I don't think it's a programming issue because I used the same technique and it works fine when I have the known point spread function (such as when I divide the FFT of the original in-focus image by the FFT of the out-of-focus image and then inverse transform). I just don't understand why I can't seem to use my experimentally measured point spread function.
The problem you have sought to solve is, unfortunately, more difficult than you might expect. Let me explain it in four parts. I assume throughout that you are comfortable with the Fourier transform.
Why you cannot solve this problem with a simple deconvolution.
Deconvolution by FFT and why it is a bad idea.
An alternative method to perform deconvolution.
An outline of how image deblurring can be performed.
But first, some notation:
I use I to represent an image and K to represent a convolution kernel (the kernel is also called the point spread function, or PSF). I * K is the convolution of the image I with the kernel K. F(I) is the (n-dimensional) Fourier transform of the image I and F(K) is the Fourier transform of the convolution kernel K. Similarly, Fi is the inverse Fourier transform.
Why you cannot solve this problem with a simple deconvolution:
You are correct when you say that we can recover the original image I from a blurred image Ib = I * K by dividing the Fourier transform of Ib by the Fourier transform of K. However, lens blur is not a convolution blurring operation. It is a modified convolution blurring operation where the blurring kernel K depends on the distance to the object you have photographed. Thus, the kernel changes from pixel to pixel.
You might think that this is not an issue with your image, as you have measured the correct kernel at the position of the image. However, this might not be the case, as the part of the image that is far away can influence the part of the image that is close. One way to fix this problem is to crop the image so that it is only the paper that is visible.
Why deconvolution by FFT is a bad idea:
The Convolution Theorem states that I * K = Fi(F(I)F(K)). This theorem leads to the reasonable assumption that if we have an image Ib = I * K that is blurred by a convolution kernel K, then we can recover the deblurred image by computing I = Fi(F(Ib)/F(K)).
Before we look at why this is a bad idea, I want to build some intuition for what the Convolution Theorem means. When we convolve an image with a kernel, that is the same as taking the frequency components of the image and multiplying them elementwise with the frequency components of the kernel.
Now, let me explain why it is difficult to deconvolve an image with the FFT. Blurring, by default, removes high-frequency information. Thus, the high frequencies of K must go towards zero. The reason for this is that the high-frequency information of I is lost when it is blurred -- thus, the high-frequency components of Ib must go towards zero. For that to happen, the high-frequency components of K must also go towards zero.
As a result of the high-frequency components of K being almost zero, we see that the high-frequency components of Ib are amplified significantly (as we almost divide by zero) when we deconvolve with the FFT. This is not a problem in the noise-free case.
In the noisy case, however, this is a problem. The reason for this is that noise is, by definition, high-frequency information. So when we try to deconvolve Ib, the noise is amplified to an almost infinite extent. This is the reason that deconvolution by the FFT is a bad idea.
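Here is a toy demonstration of this effect on a synthetically blurred image; a sketch under the assumption of a Gaussian PSF with a made-up sigma and noise level, where image is any grayscale float array:

import numpy as np
from scipy.ndimage import gaussian_filter

# Blur with periodic boundaries so the convolution theorem applies exactly.
blurred = gaussian_filter(image, sigma=3, mode='wrap')
noisy = blurred + np.random.normal(0, 1e-3, blurred.shape)

# Build the PSF on the same grid and take its transfer function.
psf = np.zeros(image.shape)
psf[0, 0] = 1.0
otf = np.fft.fft2(gaussian_filter(psf, sigma=3, mode='wrap'))

# Naive inverse filter: the tiny high-frequency values of `otf`
# amplify the tiny amount of noise into garbage.
deconvolved = np.fft.ifft2(np.fft.fft2(noisy) / otf).real

Even with a noise standard deviation of 0.001, the output is dominated by amplified noise.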
Furthermore, you need to consider how the FFT-based convolution algorithm deals with boundary conditions. Normally, when we convolve images, the resolution decreases somewhat. This is unwanted behaviour, so we introduce boundary conditions that specify the pixel values of pixels outside the image. Examples of such boundary conditions are:
Pixels outside the image have the same value as the closest pixel inside the image
Pixels outside the image have a constant value (e.g. 0)
The image is part of a periodic signal, so the row of pixels above the topmost row is equal to the bottom row of pixels.
The final boundary condition often makes sense for 1D signals. For images, however, it makes little sense. Unfortunately, FFT-based convolution implicitly assumes periodic boundary conditions.
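For reference, these three boundary conditions correspond to NumPy's padding modes, which you can use to pad an image before convolving:

import numpy as np

padded_edge = np.pad(image, 10, mode='edge')      # closest pixel repeated
padded_zero = np.pad(image, 10, mode='constant')  # constant value (0)
padded_wrap = np.pad(image, 10, mode='wrap')      # periodic signal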
In addition to this, the FFT-based inversion method seems to be significantly more sensitive to erroneous kernels than iterative methods (e.g. gradient descent and FISTA) are.
An alternative method to perform deconvolution
It might seem like all hope is lost now, as all images are noisy, and deconvolving will increase the noise. However, this is not the case, as we have iterative methods to perform deconvolution. Let me start by showing you the simplest iterative method.
Let || I ||² be the squared sum of all of I's pixels. Solving the equation
Ib = I * K
with respect to I is then equivalent to solving the following optimisation problem:
min L(I) = min ||I * K - Ib||²
with respect to I. This can be done using gradient descent, as the gradient of L is given by
DL = Q * (I * K - Ib)
where Q is the kernel you get by mirroring (flipping) K along both axes (this is also called the matched filter in the signal-processing literature).
Thus, you can get the following iterative algorithm that will deblur an image.
import numpy as np
from scipy.ndimage import convolve

blurred_image = ...  # Load image
kernel = ...         # Load kernel/psf

# You need to find the learning rate yourself; do a logarithmic line
# search. A small rate will always converge, but slowly. Start with
# 0.4 and divide by 2 every time it fails.
learning_rate = 0.4
maxit = 100

def loss(image):
    return 0.5 * np.sum((convolve(image, kernel) - blurred_image)**2)

def gradient(image):
    # The matched filter Q is the kernel mirrored along both axes.
    return convolve(convolve(image, kernel) - blurred_image, kernel[::-1, ::-1])

deblurred = blurred_image.copy()
for _ in range(maxit):
    deblurred -= learning_rate * gradient(deblurred)
The above method is perhaps the simplest of the iterative deconvolution algorithms. The way these are used in practice is through so-called regularised deconvolution algorithms. These algorithms work by first specifying a function that measures the amount of noise in an image, e.g. TV(I) (the total variation of I). Then the optimisation procedure is performed on L(I) + wTV(I). If you are interested in such algorithms, I recommend reading the FISTA paper by Amir Beck and Marc Teboulle. The paper is quite maths heavy, but you don't need to understand most of it -- only how to implement the TV deblurring algorithm.
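As a rough sketch of the idea (not the full FISTA algorithm), you can alternate a gradient step on L(I) with a TV denoising step, which acts as the proximal operator of the regulariser. This reuses gradient() and learning_rate from the block above; the weight w is a made-up value you would tune, and the exact correspondence between w and the prox weight is glossed over:

from skimage.restoration import denoise_tv_chambolle

w = 0.01  # regularisation weight (hand-tuned)
deblurred = blurred_image.copy()
for _ in range(maxit):
    deblurred -= learning_rate * gradient(deblurred)        # data term
    deblurred = denoise_tv_chambolle(deblurred, weight=w)   # TV prox step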
In addition to using a regulariser, we use accelerated methods to minimise the loss L(I). One such example is Nesterov accelerated gradient descent. See Adaptive Restart for Accelerated Gradient Schemes by Brendan O'Donoghue and Emmanuel Candes for information about such methods.
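A minimal sketch of such an accelerated loop (plain Nesterov momentum with a made-up momentum factor; the restart scheme from the paper is omitted):

import numpy as np

beta = 0.9  # momentum factor (assumed)
velocity = np.zeros_like(blurred_image)
deblurred = blurred_image.copy()
for _ in range(maxit):
    lookahead = deblurred + beta * velocity
    velocity = beta * velocity - learning_rate * gradient(lookahead)
    deblurred = deblurred + velocity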
An outline of how image deblurring can be performed.
Crop your image so that everything is at the same distance from the camera
Find the convolution kernel the same way you already did (test your deconvolution algorithm on synthetically blurred images first)
Implement an iterative method to compute the deconvolution
Deconvolve the image.
So, I want to detect cars in video recorded by a dashcam. I've read and researched quite a lot but am still not quite getting it. I am thinking of using a HOG descriptor with a linear SVM. But how can it be improved to make it easier to implement and more robust, since this will be a kind of research project for me?
I am thinking of combining another technique/algorithm with the HOG, but I am still kind of lost. I am quite new to this.
Any help is greatly appreciated. I am also open to other better ideas.
HOG (histogram of oriented gradients) is merely a certain type of feature vector that can be computed from your data. You compute the gradient vector at each pixel in your image and then you divide up the possible angles into a discrete number of bins. Within a given image sub-region, you add the total magnitude of the gradient pointing in a given direction as the entry for the relevant angular bin containing that direction.
This leaves you with a vector that has a length equal to the number of bins you've chosen for dividing up the range of angles and acts as an unnormalized histogram.
If you want to compute other image features for the same sub-region, such as the sum of the pixels, some measurement of sharp angles or lines, aspects of the color distribution, or so forth, you can compute as many or as few as you would like, arrange them into a long vector as well, and simply concatenate that feature vector with the HOG vector.
You may also want to repeat the computation of the HOG vector at several different scales to help capture some scale variability, concatenating each scale-specific HOG vector onto the overall feature vector. There are other feature types, such as SIFT, that are designed to automatically account for scale invariance.
You may need to do some normalization or scaling, which you can read about in any standard SVM guide. The standard LIBSVM guide is a great place to start.
You will have to be careful to organize your feature vector correctly since you will likely have a very large number of components to the feature vector, and you have to ensure they are always calculated and placed into the same ordering and undergo exactly the same scaling or normalization treatments.
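As a minimal sketch of this feature-building step (the parameter values, patch size, and the extra mean-intensity feature are just examples, not recommendations):

import numpy as np
from skimage.feature import hog
from skimage.transform import rescale

def features_for(patch):
    # HOG at the original scale and at half scale, plus one extra feature,
    # concatenated in a fixed order so every sample is treated identically.
    f1 = hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    f2 = hog(rescale(patch, 0.5), orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    extra = np.array([patch.mean()])
    return np.concatenate([f1, f2, extra])

The resulting vectors can then be scaled/normalized and fed to a linear SVM (e.g. LIBSVM or sklearn.svm.LinearSVC).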
I would like to get a plot of how much each spatial frequency is present in a grayscale image.
I have been told to try np.fft.fft2 but apparently this is not what I need (according to this question). I was then told to look into np.fft.fftfreq - and while that sounds like what I need it will only take an integer as input, so
np.fft.fftfreq(np.fft.fft2(image))
won't work. Nor does:
np.fft.fftfreq(np.abs(np.fft.fft2(image)))
How else could I try to do this? It seems like a rather trivial task for a Fourier transform; it's actually the task of the Fourier transform. I don't understand why np.fft.fft2 doesn't have a flag to make the frequency analysis orientation-agnostic.
Maybe you should reread the comments in the linked question, as well as the documentation also posted in the last comment. You are supposed to pass the image shape to np.fft.fftfreq:
freqx = np.fft.fftfreq(image.shape[0])
for x-direction and
freqy = np.fft.fftfreq(image.shape[1])
for y-direction.
The results will be the centers of the frequency bins returned by fft2, for example:
image_fft = np.fft.fft2(image)
Then the frequency corresponding to the amplitude image_fft[i,j] is freqx[i] in the x-direction and freqy[j] in the y-direction.
Your last sentence indicates that you want to do something completely different, though. The Fourier transform of a two-dimensional input is by common definition also two-dimensional. What deviating definition do you want to use?
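If by "orientation-agnostic" you mean a single value per radial frequency, one common reduction (my interpretation, not a built-in option of np.fft) is the radially averaged power spectrum:

import numpy as np

image_fft = np.fft.fft2(image)
power = np.abs(image_fft)**2
freqx = np.fft.fftfreq(image.shape[0])
freqy = np.fft.fftfreq(image.shape[1])

# Radial frequency of every bin, then average the power per radius.
radius = np.sqrt(freqx[:, None]**2 + freqy[None, :]**2)
bins = np.linspace(0, radius.max(), 50)
which = np.digitize(radius.ravel(), bins)
profile = (np.bincount(which, weights=power.ravel())
           / np.maximum(np.bincount(which), 1))

Plotting profile[1:] against bins then shows how much each radial spatial frequency is present.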