I am trying to implement an algorithm in Python, but I am not sure when I should use fftshift(fft(fftshift(x))) and when only fft(x) (from numpy). Is there a rule of thumb based on the shape of the input data?
I am using fftshift instead of ifftshift due to the even number of values in the vector x.
It really just depends on what you want. The DFT (and hence the FFT) is periodic in the frequency domain with period equal to 2pi.
The fft() function returns the DFT, whose bins cover omega (radians/sample) from 0 to 2pi (i.e. 0 to fs, where fs is the sampling frequency). All fftshift() does is swap the two halves of the output vector of fft() right down the middle, so the output of fftshift(fft()) runs from -pi to pi (i.e. -fs/2 to fs/2).
Usually, people like to plot a good approximation of the DTFT (or maybe even the CTFT) using the FFT, so they zero-pad the input with a huge number of zeros (fft() does this for you if you pass it a transform length n larger than the signal), and then they use fftshift() to plot between -pi and pi.
In other words, use fftshift(fft()) for plotting, and fft() for the math!
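For example, a minimal sketch of the plotting use case (the 50 Hz test tone and sampling rate here are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt

fs = 1000                                            # assumed sampling frequency in Hz
t = np.arange(0, 1, 1.0/fs)
x = np.sin(2 * np.pi * 50 * t)                       # 50 Hz test tone

X = np.fft.fft(x, 4096)                              # zero-pad by passing a larger n
f = np.fft.fftshift(np.fft.fftfreq(4096, d=1.0/fs))  # axis from -fs/2 to fs/2

plt.plot(f, np.abs(np.fft.fftshift(X)))              # fftshift only for display
plt.xlabel('Frequency (Hz)')
plt.show()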
fft(fftshift(x)) rotates the input vector so that the phase of the complex FFT result is relative to the center of the original data window. If the input waveform is not exactly integer periodic in the FFT width, phase relative to the center of the original window of data may make more sense than phase relative to some averaging between the discontinuous beginning and end. fft(fftshift(x)) also has the property that the imaginary component of a result will always be positive for a positive zero crossing at the center of the window of any antisymmetric waveform component.
fftshift(fft(y)) rotates the FFT results so that the DC bin is in the center of the result, halfway between -Fs/2 and Fs/2, which is a common spectrum display format.
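A quick sketch to see the phase-referencing effect, using an impulse placed at the center of an even-length window (recall that for even n, fftshift and ifftshift coincide):

import numpy as np

n = 8
x = np.zeros(n)
x[n//2] = 1.0                                 # impulse at the center of the window

X_plain = np.fft.fft(x)                       # phase referenced to sample 0: alternating signs
X_centered = np.fft.fft(np.fft.ifftshift(x))  # phase referenced to the window center: all real

print(np.round(X_plain, 3))                   # [ 1. -1.  1. -1. ...]
print(np.round(X_centered, 3))                # [ 1.  1.  1.  1. ...]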
I computed derivatives using different methods, such as:
convolution with an array [[-1, 1]].
Using the Fourier (convolution) theorem: computing the DFT of the image and of the array mentioned above, multiplying them, and performing the IDFT.
Directly through the derivative formula (computing the Fourier transform, multiplying by the frequency index and a constant, and computing the inverse).
All methods seem to work almost identically, but have slight differences.
An explanation why they end up with slightly different results would be appreciated.
After computing those I started playing with the result to learn about it, and I found out something that confused me:
The main thing that baffles me is that when I try computing the median of this derivative, it's ALWAYS 0.0.
Why is that?
I added the code I used to compute this (the first method at least) because maybe I'm doing something wrong.
import numpy as np
from scipy.signal import convolve2d

im = sl.read_image(r'C:\Users\ahhal\Desktop\Essentials\Uni\year3\SemesterA\ImageProcessing\Exercises\Ex2\external\monkey.jpg', 1)
b = [[-1, 1]]  # finite-difference kernel
print(np.median(convolve2d(im, b)))
output: 0.0
The read_image function is my own and this is the implementation:
from imageio import imread
from skimage.color import rgb2gray
import numpy as np
def read_image(filename, representation):
    """
    Receives an image file and converts it into one of two given representations.
    :param filename: The file name of an image on disk (could be grayscale or RGB).
    :param representation: representation code, either 1 or 2, defining whether the output
        should be a grayscale image (1) or an RGB image (2). If the input image is grayscale,
        we won't call it with representation = 2.
    :return: An image, represented by a matrix of type np.float64 with intensities
        normalized to the range [0, 1].
    """
    assert representation in [1, 2]
    # read the image
    im = imread(filename)
    if representation == 1:     # If the user specified they need a grayscale image,
        if len(im.shape) == 3:  # AND the image is not grayscale yet,
            im = rgb2gray(im)   # convert to grayscale (assuming it's RGB and not a different format).
    im_float = im.astype(np.float64)  # Convert the image type to one we can work with.
    if im_float.max() > 1:            # If image values are out of bounds, normalize them.
        im_float = im_float / 255
    return im_float
Edit 2:
I tried it on several different images, and got 0.0 at all of them.
The image I'm using in the example is: [image omitted]
I computed derivatives using different methods, such as:
convolution with an array [[-1, 1]].
Using the Fourier (convolution) theorem: computing the DFT of the image and of the array mentioned above, multiplying them, and performing the IDFT.
Directly through the derivative formula (computing the Fourier transform, multiplying by the frequency index and a constant, and computing the inverse).
These derivative methods are all approximate and make different assumptions:
Convolution by [[-1, 1]] computes differences between adjacent elements,
derivative ~= data[n+1] - data[n]
You can interpret this like interpolating the data with a line segment, then taking the derivative of that interpolant:
I(x) = data[n] + (data[n+1] - data[n]) * (x - n)
So the approximation assumes the underlying function is locally linear. You can analyze the error by Taylor expansion to find that the error comes from the ignored higher-order terms. In other words, the approximation is accurate provided the function doesn't have strong nonlinear terms. This is a simple case of finite differences.
This is the same as method 1, except with different boundary handling for the convolution of samples near the edges of the image. By default, scipy.signal.convolve2d does zero padding (though you can use the boundary option to choose other methods). However, when computing the convolution through the DFT, the boundary handling is implicitly periodic, wrapping around at the image edges. So the results of methods 1 and 2 differ in a margin of pixels near the edges because of the different boundary handling.
Computing the derivative by multiplying by iω in the DFT representation can be interpreted as evaluating the derivative of the sinc interpolation of the data. Sinc interpolation assumes the data is bandlimited. The error comes from spectra beyond the Nyquist frequency. In particular, if there is a hard jump discontinuity from an object boundary, then the image is not bandlimited, and the DFT-based derivative will have substantial error in the vicinity of the jump, appearing as ringing artifacts. A sketch comparing the three methods is shown below.
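This sketch compares the three methods on a small, smooth test image (the test image and sizes here are made up for illustration):

import numpy as np
from scipy.signal import convolve2d

im = np.tile(np.sin(2 * np.pi * np.arange(64) / 64.0), (64, 1))  # smooth, periodic stand-in image

# Method 1: finite difference by spatial convolution (zero padding at the edges)
d1 = convolve2d(im, [[-1, 1]], mode='same')

# Method 2: the same kernel applied through the DFT (implicitly periodic boundaries)
k = np.zeros_like(im)
k[0, 0] = -1.0
k[0, 1] = 1.0  # [[-1, 1]] embedded at the origin for circular convolution
d2 = np.real(np.fft.ifft2(np.fft.fft2(im) * np.fft.fft2(k)))

# Method 3: multiply by i*omega in the frequency domain (spectral/sinc derivative);
# the sign is flipped to match the orientation of the [[-1, 1]] kernel
w = -2j * np.pi * np.fft.fftfreq(im.shape[1])
d3 = np.real(np.fft.ifft2(np.fft.fft2(im) * w[np.newaxis, :]))

print(np.abs(d1 - d2).max())  # nonzero only in the leftmost column (boundary handling)
print(np.abs(d1 - d3).max())  # small in the interior, larger where method 1's zero padding bites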
The main thing that baffles me is that when I try computing the median of this derivative, it's ALWAYS 0.0.
I don't know why this happened here, but it shouldn't always be the case. For instance, if each image row is the unit ramp data[n] = n, then the convolution with [[-1, 1]] is equal to 1 everywhere (except possibly at the edges, depending on boundary handling), so the median is 1.
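A quick check of that claim (note that scipy's convolve2d flips the kernel, so with [[-1, 1]] the interior values come out as -1 rather than +1; either way, the median is not 0):

import numpy as np
from scipy.signal import convolve2d

im = np.tile(np.arange(8, dtype=float), (8, 1))  # each row is the unit ramp 0..7
d = convolve2d(im, [[-1, 1]])
print(np.median(d))                              # -1.0, not 0.0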
Pascal already gave a wonderful explanation of the differences between the various approximations to the derivative. So I'll focus here on the "why always 0.0?" question.
The median of the derivative is 0.0 only by approximation. When I compute it, based on the finite difference approximation (method #1), I get -5.15e-5 as the median. Close to zero, but not exactly zero.
The derivative is 0 in uniform (flat) regions of the image such as the out-of-focus background. Other features in the image tend to have both a positive and a negative edge, making the histogram of the derivative image very symmetric: [histogram image omitted]
This symmetry causes the median (as well as the mean) to be close to zero for such an image. However, this is not always the case. For example, if the image is brighter on the left edge than the right edge (or the other way around), then there must be a net gradient across the image, causing the mean or median to be different from zero.
I am trying to get the phase distribution of a 2D aperture using FFT.
The input is a circle, where everything inside the circle has value 1, outside it has value 0.
In order to make a good transform, I use an input array that is 200x as large as the radius of the circle, making a 5000x5000 grid out of it. This ensures that the circle is actually circular and that there is enough room around it so that no Nyquist/aliasing problems happen.
I need to 2D Fourier transform the aperture and then calculate the phase of the Fourier transform at every point.
The function I use for creating the input (aperture):
[code posted as an image: creating the input aperture]
Next do the numpy fft2 2D fourier transform:
[code posted as an image: Fourier transforming aperture]
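The snippets were posted as screenshots; here is a minimal sketch of what they presumably do (using a smaller grid than the 5000x5000 one described above, to keep the example light):

import numpy as np

n = 1000            # grid size (the question used 5000)
radius = n / 200.0  # aperture radius, 200x smaller than the array

yy, xx = np.mgrid[:n, :n]
aperture = ((xx - n // 2)**2 + (yy - n // 2)**2 <= radius**2).astype(float)

ft = np.fft.fft2(aperture)
phase = np.angle(np.fft.fftshift(ft))  # phase at every point of the spectrum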
And the result of this is a 2D complex array (as expected!), BUT with the imaginary parts much, much smaller than the real parts (a difference of about 17 orders of magnitude; imaginary parts ~1e-17).
This is not expected and most probably wrong. What went wrong?
The FFT phase result of a perfectly symmetric input is zero, i.e. a strictly real result, thus atan2(Im, Re) == 0 (imaginary components all zero, except for rounding noise).
(even symmetry with respect to (0,0) circularly, or to (n/2,n/2))
The phase will become non-zero (thus a non-zero imaginary component in the FFT result) when the input is moved off center or otherwise made non-symmetric.
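A small sketch demonstrating this (sizes made up): an aperture centered on the grid is even-symmetric, so its FFT is real up to rounding noise; nudging it off center makes the imaginary parts, and hence the phase, genuinely non-zero:

import numpy as np

n = 256
r = 20
yy, xx = np.mgrid[:n, :n]

centered = ((xx - n//2)**2 + (yy - n//2)**2 <= r**2).astype(float)
shifted = ((xx - n//2 - 7)**2 + (yy - n//2)**2 <= r**2).astype(float)

print(np.abs(np.fft.fft2(centered).imag).max())  # tiny: rounding noise only
print(np.abs(np.fft.fft2(shifted).imag).max())   # large: genuinely non-zero phase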
I'm having a little trouble with doing a Fourier Deconvolution using numpy. I'm currently attempting this with a test case of 3 Gaussians so I know exactly what to expect at each end.
What I'm trying to recover is the input signal given the exact form of the filter and the output.
Here, I have used a naive constraint to suppress the high-frequency ends by setting them to zero (because the signals are all Gaussians in Fourier space as well). I expected to recover my original input with a tiny bit of ringing due to this constraint.
# Dummy case for Gaussian convolved with Gaussian
import numpy as np
import matplotlib.pyplot as plt

N = 128
x = np.arange(-5, 5, 10./(2 * N))
epsilon = 1e-18

def gaus(x, sigma):
    return 1./np.sqrt(2*np.pi)/sigma * np.exp(-(x * x)/(2 * sigma**2))

y_g = gaus(x, 0.3)                       # output: gaussian-blurred signal
y_b = gaus(x, 0.1)                       # gaussian blur filter
y_i = gaus(x, np.sqrt(0.3**2 - 0.1**2))  # original gaussian input

f_yg = np.fft.fft(y_g)  # fft the blurred output
f_yb = np.fft.fft(y_b)  # fft the filter
f_yi = np.fft.fft(y_i)

r_f = (np.fft.fftshift(f_yg) + epsilon)/(np.fft.fftshift(f_yb) + epsilon)  # deconvolve by division in fourier space
r_f[np.abs(x) > 0.5] = 0  # naive constraint to remove the artifacts, knowing the final form is gaussian
r_f = np.fft.ifftshift(r_f)
r_if = np.fft.ifft(r_f)

y_gf = np.fft.ifft(f_yg)
y_bf = np.fft.ifft(f_yb)
y_if = np.fft.ifft(f_yi)

plt.plot(x, y_if.real, label='fft true input')  # .real: matplotlib would otherwise discard the imaginary part with a warning
plt.plot(x, r_if.real, label='fft recv. input')
plt.legend(framealpha=0.)
plt.show()
Here the orange is the recovered input signal using the deconvolution of the output and the blur.
There are a few questions I have with this:
There is clearly a scaling issue. The only area where I can think that this may come in is when I applied the naive constraint. Should I renormalize in this step, knowing that 1/sqrt(N)*integral over my fourier space is equal to 1?
It looks like the position of the recovered Gaussian is messed up, with half of the curve at either side of the plot. Is this due to the division in Fourier space? How do I recover the original position (or have I done this completely wrong to begin with)?
I've attached the script used to generate the two curves, the original input and recovered input in physical space.
Cheers,
Keven
EDIT: I should add that I have no problem restoring the signal using scipy.signal.deconvolve + some small edits. This must mean my method here is somehow wrong?
1) As you correctly understood, the requirement for a scaling is related to the Discrete Fourier Transform. The best way to get it is to compute the deconvolution of two uniform unit signals. Their DFT is (n, 0, 0, 0, ...), where n is the number of points of the DFT. Hence the ratio r_f is (1, 0, 0, 0, ...) and its backward FFT computed by np.fft.ifft() is (1/n, 1/n, 1/n, ...).
The correct signal resulting from the deconvolution should have been 1/T 1/T 1/T ..., where T=10. is the length of the frame.
As a consequence, the correct scaling to perform the deconvolution is n/T= len(r_f)/10.
r_if=r_if*len(r_if)/10.
2) The deconvoluted signal is translated by half a period. This is due to the fact that the gaussian kernel is centered on the middle of the frame. Simply shift the kernel by half a period and the problem is solved. The function np.fft.fftshift() can be applied to this end:
f_yb = np.fft.fft(np.fft.fftshift(y_b)) #fft the filter
EDIT: To investigate the reason for the translation, let's focus on the case of the deconvolution kernel being a very narrow gaussian distribution, nearly corresponding to a Dirac distribution. Your input signal is a gaussian curve centered at zero, the frame being sampled between -5 and 5. Similarly, the deconvolution kernel is a Dirac centered at zero. As a consequence, the deconvolved signal must be identical to the input signal: a gaussian curve centered at zero. Nevertheless, the DFT as computed by np.fft.fft() treats the frame as starting at 0 and ending at 10, sampled at points 10j/n where j is in [0..n-1], the frequencies in Fourier space being k/10 where k is in [0..n/2, -n/2+1..-1]. As a consequence, this DFT sees your signal as a gaussian centered at 5 and the deconvolution kernel as a Dirac centered at 5. The convolution of a function f(t) with a Dirac delta(t-t0) centered at t0 is simply the translated function f(t-t0). Hence, the result of the deconvolution as computed by np.fft.fft() is the input signal translated by half a period. Since the input signal is centered at 0 in the [-5, 5] frame, the output signal computed by np.fft.fft() is centered at -5 (or equivalently 5, due to periodicity). Shifting the kernel resolves the mismatch between us thinking of the frame as [-5, 5] and np.fft.ifft() handling it as if it were [0, 10].
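Putting both fixes together, the core of the corrected script might look like this (reusing x, y_g, y_b and epsilon from the script above; the naive frequency cutoff can still be applied to r_f before the inverse transform):

import numpy as np

f_yg = np.fft.fft(y_g)                     # fft the blurred output
f_yb = np.fft.fft(np.fft.fftshift(y_b))    # shift the kernel by half a period first
r_f = (f_yg + epsilon) / (f_yb + epsilon)  # deconvolve by division in fourier space
r_if = np.fft.ifft(r_f) * len(r_f) / 10.   # rescale by n/T, with frame length T = 10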
Filters are often designed to reduce the effect of high-frequency noise. Deconvolution therefore potentially magnifies high-frequency noise. Screening out the high frequencies as you did is a potential solution. Notice that it is exactly equivalent to convolving the signal with a particular filter!
In the field of tomographic reconstruction, the filtered back-projection algorithm requires applying a ramp filter, which dramatically inflates the high-frequency noise. A Wiener filter is often proposed there: this kind of filter can be designed to minimize the mean square error on the deconvolved signal, given the SNR of the convolved signal. It nevertheless requires some assumptions regarding the power spectral densities of the signal and noise.
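For reference, a minimal sketch of a Wiener-style deconvolution in this 1D setting (the constant nsr is an assumed, hand-tuned noise-to-signal power ratio; a full Wiener filter would make it frequency-dependent based on the actual spectral densities):

import numpy as np

def wiener_deconvolve(blurred, kernel, nsr=1e-3):
    # Regularized division: conj(H)*G / (|H|^2 + nsr) instead of a bare G/H
    H = np.fft.fft(np.fft.fftshift(kernel))  # kernel shifted as discussed above
    G = np.fft.fft(blurred)
    F = np.conj(H) * G / (np.abs(H)**2 + nsr)
    return np.real(np.fft.ifft(F))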
I'm currently computing a spectrogram with matplotlib. I specify NFFT=512, but the resulting image has a height of 257. I then tried to just do an STFT (short-time Fourier transform), which gives me 512-dimensional vectors (as expected). If I plot the result of the STFT, I can see that half of the 512 values are just mirrored, so really I only get 257 values (like in matplotlib). Can somebody explain to me why that is the case? I always thought of the FT as a basis transform; why would it introduce this redundancy?
Thank you.
The redundancy is because you input a strictly real signal to your FFT, so the DFT result is complex-conjugate (Hermitian) symmetric. This redundancy is due to the fact that all the imaginary components of strictly real input are zero. But the output of the DFT can include non-zero imaginary components to indicate phase. Thus, the DFT result has to be conjugate symmetric so that all the imaginary components in the result cancel out between the two halves (same magnitudes, but opposite phases), indicating strictly real input. Also, the lower 257 bins of the basis transform still have 512 degrees of (scalar) freedom, just like the input. However, a spectrogram throws away all phase information, so it can only display 257 unique values (magnitude only).
If you input a complex (quadrature, for instance) signal to a DFT, then there would likely not be Hermitian redundancy, and you would have 1024 degrees of freedom from a 512 length DFT.
If you want an image height of 512 (given real input), try an FFT size of 1024.
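You can check the Hermitian symmetry directly, and see how rfft keeps only the non-redundant half:

import numpy as np

x = np.random.randn(512)                         # strictly real input
X = np.fft.fft(x)

# Conjugate symmetry: X[k] == conj(X[N-k]) for k = 1..N-1
print(np.allclose(X[1:], np.conj(X[1:][::-1])))  # True

print(np.fft.rfft(x).shape)                      # (257,): 512/2 + 1 non-redundant bins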
I have a WAV file which I would like to visualize in the frequency domain. Next, I would like to write a simple script that takes in a WAV file and outputs whether the energy at a certain frequency "F" exceeds a threshold "Z" (whether a certain tone has a strong presence in the WAV file). There are a bunch of code snippets online that show how to plot an FFT spectrum in Python, but I don't understand a lot of the steps.
I know that wavfile.read(myfile) returns the sampling rate (fs) and the data array (data), but when I run an FFT on it (y = numpy.fft.fft(data)), what units is y in?
To get the array of frequencies for the x-axis, some posters do this where n = len(data):
X = numpy.linspace(0.0, 1.0/(2.0*T), n/2)
and others do this:
X = (numpy.fft.fftfreq(n) * fs)[range(n/2)]
Is there a difference between these two methods and is there a good online explanation for what these operations do conceptually?
Some of the online tutorials about FFTs mention windowing, but not a lot of posters use windowing in their code snippets. I see that numpy has a numpy.hamming(N), but what should I use as the input to that method and how do I "apply" the output window to my FFT arrays?
For my threshold computation, is it correct to find the frequency in X that's closest to my desired tone/frequency and check if the corresponding element (same index) in Y has an amplitude greater than the threshold?
The FFT bins are at equally spaced frequencies, where the first point is 0 Hz and one past the last point would be fs Hz. You can create the frequency axis yourself with linspace(0.0, (1.0 - 1.0/n)*fs, n). You can also use fftfreq, but then the second half of the components will come out negative.
These are the same if n is even. You can also use rfftfreq I think. Note that this is only the "positive half" of your frequencies, which is probably what you want for audio (which is real-valued). Note that you can use rfft to just produce the positive half of the spectrum, and then get the frequencies with rfftfreq(n,1.0/fs).
Windowing will decrease sidelobe levels, at the cost of widening the mainlobe of any frequencies that are present. N is the length of your signal, and you multiply your signal by the window. However, if you are analyzing a long signal, you might want to "chop" it up into pieces, window them, and then add the absolute values of their spectra.
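A minimal sketch of applying a Hamming window to one chunk before the FFT (the sample rate and chunk length here are made up):

import numpy as np

fs = 44100                                     # assumed sample rate
n = 4096                                       # chunk length
data = np.random.randn(n)                      # stand-in for one chunk of WAV samples

window = np.hamming(n)                         # N is just the chunk length
spectrum = np.abs(np.fft.rfft(data * window))  # multiply by the window, then transform
freqs = np.fft.rfftfreq(n, 1.0/fs)             # matching frequency axis in Hz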
"is it correct" is hard to answer. The simple approach is as you said, find the bin closest to your frequency and check its amplitude.