Matplotlib spectrogram versus STFT

I'm currently computing the spectrogram with matplotlib. I specify NFFT=512, but the resulting image has a height of 257. I then tried a plain STFT (short-time Fourier transform), which gives me 512-dimensional vectors (as expected). If I plot the result of the STFT, I can see that half of the 512 values are just mirrored, so really I only get 257 values (like matplotlib). Can somebody explain to me why that is the case? I always thought of the FT as a basis transform, so why would it introduce this redundancy?
Thank you.

The redundancy arises because you input a strictly real signal to your FFT, so the DFT result is complex-conjugate (Hermitian) symmetric. All the imaginary components of strictly real input are zero, but the output of the DFT can include non-zero imaginary components to indicate phase. Thus, the DFT result has to be conjugate symmetric so that all the imaginary components in the result cancel out between the two halves (same magnitudes, but opposite phases), indicating strictly real input. Also, the lower 257 bins of the basis transform still have 512 (scalar) degrees of freedom, just like the input. However, a spectrogram throws away all phase information, so it can only display 257 unique values (magnitude only).
If you input a complex (quadrature, for instance) signal to a DFT, then there would likely not be Hermitian redundancy, and you would have 1024 degrees of freedom from a 512 length DFT.
If you want an image height of 512 (given real input), try an FFT size of 1024.
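A minimal sketch of the bin counting described above, using NumPy only. The random frame is synthetic stand-in data (an assumption, not the asker's signal); it checks the mirror symmetry and the 512 degrees of freedom in the 257 non-redundant bins.

```python
import numpy as np

nfft = 512
rng = np.random.default_rng(0)
frame = rng.standard_normal(nfft)   # one real-valued STFT frame (synthetic data)

full = np.fft.fft(frame)            # all 512 complex bins
half = np.fft.rfft(frame)           # only the non-redundant bins 0..256

# Bins 257..511 are the complex conjugates of bins 255..1.
mirrored = np.allclose(full[nfft // 2 + 1:], np.conj(full[1:nfft // 2][::-1]))

# The DC and Nyquist bins are purely real, the other 255 bins are complex:
# 2 + 2 * 255 real numbers in total.
dof = 2 + 2 * (half.size - 2)

print(full.shape, half.shape, mirrored, dof)
```

With NFFT=1024 the same count gives 513 rows, which is the closest a real-input spectrogram gets to a height of 512.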

Related

Median of derivative in X axis of an image

I computed derivatives using different methods such as :
convolution with an array [[-1, 1]].
Using the fourier theorem by computing DFT of the image and the array mentioned above, multiplying them and performing IDFT.
Directly through the derivative formula (Computing Fourier, multiplying by index and a constant and computing the inverse).
All methods seem to work almost identically, but have slight differences.
An explanation why they end up with slightly different results would be appreciated.
After computing those I started playing with the result to learn about it, and I found out something that confused me:
The main thing that baffles me is that when I try computing the median of this derivative, its ALWAYS 0.0.
Why is that?
I added the code I used to compute this (the first method at least) because maybe I'm doing something wrong.
import numpy as np
from scipy.signal import convolve2d
im = sl.read_image(r'C:\Users\ahhal\Desktop\Essentials\Uni\year3\SemesterA\ImageProcessing\Exercises\Ex2\external\monkey.jpg', 1)
b = [[-1, 1]]
print(np.median(convolve2d(im, b)))
output: 0.0
The read_image function is my own and this is the implementation:
from imageio import imread
from skimage.color import rgb2gray
import numpy as np
def read_image(filename, representation):
    """
    Receives an image file and converts it into one of two given representations.
    :param filename: The file name of an image on disk (could be grayscale or RGB).
    :param representation: representation code, either 1 or 2, defining whether the output
        should be a grayscale image (1) or an RGB image (2). If the input image is grayscale,
        we won't call it with representation = 2.
    :return: An image, represented by a matrix of type (np.float64) with intensities
        normalized to the range [0,1].
    """
    assert representation in [1, 2]
    # reads the image
    im = imread(filename)
    if representation == 1:  # If the user specified they need grayscale image,
        if len(im.shape) == 3:  # AND the image is not grayscale yet
            im = rgb2gray(im)  # convert to grayscale (**Assuming it's RGB and not a different format**)
    im_float = im.astype(np.float64)  # Convert the image type to one we can work with.
    if im_float.max() > 1:  # If image values are out of bound, normalize them.
        im_float = im_float / 255
    return im_float
Edit 2:
I tried it on several different images, and got 0.0 at all of them.
The image I'm using in the example is:

I computed derivatives using different methods such as:
1. Convolution with an array [[-1, 1]].
2. Using the Fourier theorem by computing DFT of the image and the array mentioned above, multiplying them and performing IDFT.
3. Directly through the derivative formula (computing Fourier, multiplying by index and a constant and computing the inverse).
These derivative methods are all approximate and make different assumptions:
1. Convolution by [[-1, 1]] computes differences between adjacent elements (up to sign, since true convolution flips the kernel),
derivative ~= data[n+1] − data[n]
You can interpret this like interpolating the data with a line segment, then taking the derivative of that interpolant:
I(x) = data[n] + (data[n+1] − data[n]) * (x − n)
So the approximation assumes the underlying function is locally linear. You can analyze the error by Taylor expansion to find that the error comes from the ignored higher-order terms. In other words, the approximation is accurate provided the function doesn't have strong nonlinear terms. This is a simple case of finite differences.
2. This is the same as 1, except with different boundary handling for samples near the edges of the image. By default, scipy.signal.convolve2d does zero padding (though you can use the boundary option to choose some other methods). However, when computing the convolution through the DFT, the boundary handling is implicitly periodic, wrapping around at the image edges. So the results of 1 and 2 differ for a margin of pixels near the edge because of the different boundary handling.
3. Computing the derivative by multiplying by iω in the DFT representation can be interpreted as evaluating the derivative of the sinc interpolation of the data. Sinc interpolation assumes the data is band limited. The error comes from spectral content beyond the Nyquist frequency. In particular, if there is a hard jump discontinuity from an object boundary, then the image is not band limited and the DFT-based derivative will have substantial error in the vicinity of the jump, appearing as ringing artifacts.
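The boundary difference between methods 1 and 2 can be checked on a small example. This is a sketch under assumptions: the random array stands in for the image, and the "full" output of convolve2d is cropped to the image width so both results are aligned the same way.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
im = rng.random((8, 16))            # synthetic stand-in for the image
b = np.array([[-1.0, 1.0]])

# Method 1: spatial convolution with zero padding (the convolve2d default),
# cropped to the image width.
d_conv = convolve2d(im, b, mode="full")[:, :im.shape[1]]

# Method 2: multiply the DFTs, which is circular (wrap-around) convolution.
B = np.fft.fft2(b, s=im.shape)      # kernel zero-padded to the image size
d_fft = np.real(np.fft.ifft2(np.fft.fft2(im) * B))

# Away from the edge the two agree; the first column differs because one
# method sees zeros to the left and the other sees the last column.
interior_equal = np.allclose(d_conv[:, 1:], d_fft[:, 1:])
edge_equal = np.allclose(d_conv[:, 0], d_fft[:, 0])
print(interior_equal, edge_equal)
```

For a two-tap kernel the affected margin is a single column; longer kernels widen it accordingly.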
The main thing that baffles me is that when I try computing the median of this derivative, its ALWAYS 0.0.
I don't know why this happened here, but it shouldn't always be the case. For instance, if each image row is the unit ramp data[n] = n, then the difference data[n+1] − data[n] is equal to 1 everywhere (−1 with convolve2d, which flips the kernel), except possibly at the edges depending on boundary handling, so the median has magnitude 1 rather than 0.
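The ramp counterexample can be tried directly. A small sketch, assuming a synthetic 8x16 ramp image and mode="valid" so that boundary handling does not enter; note that convolve2d performs true convolution, which reverses the kernel and therefore flips the sign of the difference.

```python
import numpy as np
from scipy.signal import convolve2d

ramp = np.tile(np.arange(16.0), (8, 1))         # every row is 0, 1, 2, ..., 15
d = convolve2d(ramp, [[-1, 1]], mode="valid")   # interior samples only
median = np.median(d)

# Every entry is data[n-1] - data[n], i.e. the negated forward difference.
print(median, np.unique(d))
```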

Pascal already gave a wonderful explanation of the differences between the various approximations to the derivative. So I'll focus here on the "why always 0.0?" question.
The median of the derivative is 0.0 only by approximation. When I compute it, based on the finite difference approximation (method #1), I get -5.15e-5 as the median. Close to zero, but not exactly zero.
The derivative is 0 in uniform (flat) regions of the image such as the out-of-focus background. Other features in the image tend to have both a positive and a negative edge, making the histogram of the derivative image very symmetric:
This symmetry causes the median (as well as the mean) to be close to zero for such an image. However, this is not always the case. For example, if the image is brighter on the left edge than the right edge (or the other way around), then there must be a net gradient across the image, causing the mean or median to be different from zero.
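The "net gradient" remark follows from the differences along a row telescoping: their sum is just the last pixel minus the first. A minimal sketch, assuming a synthetic noise image with an added left-to-right brightness falloff, and using np.diff as a stand-in for the [[-1, 1]] convolution (same values up to sign and edge handling):

```python
import numpy as np

rng = np.random.default_rng(1)
im = rng.random((32, 64))
im_grad = im + np.linspace(0.5, 0.0, 64)   # brighter on the left than on the right

d = np.diff(im_grad, axis=1)               # data[n+1] - data[n] along each row

# Telescoping: the differences in a row sum to (last pixel - first pixel).
row_sums = d.sum(axis=1)
end_to_end = im_grad[:, -1] - im_grad[:, 0]

print(np.allclose(row_sums, end_to_end), d.mean())
```

Because the right edge is darker on average, the mean of the derivative comes out negative instead of near zero.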

Python Numpy FFT fast fourier transform - weird results

I am trying to get the phase distribution of a 2D aperture using FFT.
The input is a circle, where everything inside the circle has value 1, outside it has value 0.
In order to make a good transform, I use an input array that is 200x as large as the radius of the circle, and make a 5000x5000 grid out of it. This ensures that the circle is actually circular and there is enough room around in order that no Nyquist things happen.
I need to 2D Fourier transform the aperture and then calculate the phase of the Fourier transform at every point.
The function I use for creating the input (aperture):
[image: creating the input aperture]
Next do the numpy fft2 2D Fourier transform:
[image: Fourier transforming aperture]
And the result of this is a 2D complex array (as expected!), BUT with the imaginary parts so much much much smaller than the real parts (17 orders of magnitude difference imaginary parts ~10E-17).
This is not expected and most probably wrong. What went wrong?

The FFT of a perfectly symmetric input is strictly real, i.e. the imaginary components are all zero except for rounding noise, so the phase atan2(Im, Re) is 0 (or pi wherever the real part is negative).
"Symmetric" here means even symmetry with respect to (0,0) circularly, or to (n/2,n/2).
The phase will become non-zero (thus a non-zero imaginary component in the FFT result) when the input is moved off center or otherwise made non-symmetric.
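A quick way to see this is to compare a centred aperture with one moved off centre. This is a sketch under assumptions: a 256x256 grid and a radius of 20 pixels are chosen for speed and are not the asker's 5000x5000 setup, and the helper disk() is hypothetical.

```python
import numpy as np

n = 256
yy, xx = np.indices((n, n))

def disk(cx, cy, r=20):
    """Binary circular aperture of radius r centred on pixel (cx, cy)."""
    return ((xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2).astype(float)

F_centred = np.fft.fft2(disk(n // 2, n // 2))       # even symmetry about (n/2, n/2)
F_shifted = np.fft.fft2(disk(n // 2 + 5, n // 2))   # moved 5 pixels off centre

# Largest imaginary part relative to the largest real part.
ratio_centred = np.abs(F_centred.imag).max() / np.abs(F_centred.real).max()
ratio_shifted = np.abs(F_shifted.imag).max() / np.abs(F_shifted.real).max()
print(ratio_centred, ratio_shifted)
```

The centred case gives a ratio at the level of floating-point rounding, which matches the tiny imaginary parts seen in the question; the shifted case does not.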

Why do FFT result show 2 non-zero amplitudes for a samples of single frequency?

Doing a simple FFT run to learn the operation, I create a NumPy array with 100 elements holding a sine wave with exactly one period in the array. This code is used:
...
n = 100
x = np.fromfunction(lambda a: np.sin(2 * np.pi * a / n), (n,), dtype=float)
res = np.fft.fft(x)
...
The result in res shows a non-zero amplitude at 2 different index values:
idx real imag abs
--- ---------- ---------- ----------
...
1: 0 -50.000 50.000
...
99: 0 50.000 50.000
I had only expected to see a single non-zero amplitude at index 1.
Why is amplitude non-zero for both index 1 and 99, and how can I understand this mathematically?
ADDITION: Maybe the high frequency actually represents an aliased frequency, where the sample rate is too low relative to the Nyquist rate.

The np.fft.fft() function returns the two-sided DFT spectra.
What you are seeing are the peaks for frequencies w and -w, where w is the frequency of the sine wave.
You can check this yourself by running np.fft.fftfreq and plotting the results:
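import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2)
y = np.sin(2*np.pi*x)
Y = np.fft.fft(y)
freqs = np.fft.fftfreq(len(x), d=x[1]-x[0])
# Plot the signal (top) and the magnitude of its two-sided spectrum (bottom)
fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.plot(x, y)
ax2.plot(freqs, np.abs(Y))
plt.show()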

The discrete Fourier transform is
X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}
where the X_k are complex numbers. Because your x_n are real numbers, you get X[N-m] = X[m]* (the complex conjugate). In your case, N=100 and m=1, so X[99] = X[1]*.
The link below explains everything,
Why is the FFT “mirrored”?
When dealing with real numbers, numpy provides another function numpy.fft.rfft
When the DFT is computed for purely real input, the output is Hermitian-symmetric, i.e. the negative frequency terms are just the complex conjugates of the corresponding positive-frequency terms, and the negative-frequency terms are therefore redundant. This function does not compute the negative frequency terms, and the length of the transformed axis of the output is therefore n//2 + 1.
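The following sketch reproduces the question's signal (built with np.arange instead of np.fromfunction, which is an equivalent substitution) and compares the two-sided and one-sided transforms.

```python
import numpy as np

n = 100
x = np.sin(2 * np.pi * np.arange(n) / n)   # one full period, as in the question

res = np.fft.fft(x)     # two-sided result: bins 0..99
half = np.fft.rfft(x)   # one-sided result: bins 0..50 only

print(np.round(res[1], 6), np.round(res[99], 6))
print(len(res), len(half))
```

Bin 99 is the conjugate of bin 1, so rfft simply omits it, and the single peak at index 1 is what remains.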

A standard full DFT or FFT is a NxN complex-to-complex linear basis transform that returns its result as an N element vector consisting of complex elements, each complex result element consisting of a real and imaginary component. A complex result is required to represent both the magnitude and phase of each frequency component (and thus not be information lossy). The arctangent of the imaginary and real components represents the phase of each frequency component.
If you feed an FFT with a strictly real input (with no non-zero imaginary components), then you want the FFT result to represent a strictly real signal. How is this possible when the FFT returns a complex result with non-zero imaginary components (required if the phase is non-zero)? By returning two components for each signal, where those two components are equal in magnitude but opposite in their imaginary components, so the imaginary parts cancel out. You still need the imaginary component of each result element so you can measure the phase. But looking at the entire FFT result, the imaginary components in the two complex values sum to zero, thus representing a strictly real input signal (with no imaginary stuff).
Thus a full FFT has to be complex-conjugate mirror symmetric when given strictly real input.
Thus you see (at least) two equal magnitude values in an FFT result for each frequency component in the input. This is not true when feeding an FFT complex input with non-zero imaginary components, as common in many physics equations and signal processing algorithms.
Added: Why does an FFT have to return a complex result instead of just a magnitude and a phase angle? FFT stands for Fast Fourier Transform. One of the things that makes an FFT fast is that it is a linear transform that can be computed with just multiplies and adds for the arithmetic (plus a bit of clever data shuffling along the way). The real and imaginary components can be computed with just linear arithmetic, whereas computing the phase requires an arctangent (or atan2()), which is a much slower non-linear transcendental operation.
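To illustrate the complex-input case mentioned above, here is a minimal sketch that replaces the real sine with a complex exponential of the same frequency (an assumed test signal, not one from the question). The mirrored peak disappears.

```python
import numpy as np

n = 100
t = np.arange(n)
z = np.exp(2j * np.pi * t / n)     # complex (quadrature) tone, one period

Z = np.fft.fft(z)
nonzero_bins = np.flatnonzero(np.abs(Z) > 1e-6)   # bins that are not rounding noise
print(nonzero_bins, np.round(Z[1], 6))
```

Only bin 1 is populated, since a complex tone carries its own imaginary part and needs no conjugate partner to cancel it.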

When should I use fftshift(fft(fftshift(x))) and when fft(x)?

I am trying to implement an algorithm in python, but I am not sure when I should use fftshift(fft(fftshift(x))) and when only fft(x) (from numpy). Is there a rule of thumb based on the shape of input data?
I am using fftshift instead of ifftshift due to the even number of values in the vector x.

It really just depends on what you want. The DFT (and hence the FFT) is periodic in the frequency domain with period equal to 2pi.
The fft() function returns the DFT with omega (radians/sample) running from 0 to 2pi (i.e. 0 to fs, where fs is the sampling frequency). All fftshift() does is swap the two halves of the fft() output vector. So the output of fftshift(fft()) now runs from -pi to pi (i.e. -fs/2 to fs/2).
Usually, people like to plot a good approximation of the DTFT (or maybe even the CTFT) using the FFT, so they zero-pad the input with a large number of zeros (fft() does this for you when you pass a length n larger than the input) and then they use the fftshift() function to plot between -pi and pi.
In other words, use fftshift(fft()) for plotting, and fft() for the math!
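The reordering is easiest to see on the frequency axis itself. A small sketch, assuming an 8-sample test tone at fs/4 with fs = 1 (both chosen only for illustration):

```python
import numpy as np

n, fs = 8, 1.0
x = np.cos(2 * np.pi * 0.25 * np.arange(n))   # hypothetical test tone at fs/4
X = np.fft.fft(x)

freqs = np.fft.fftfreq(n, d=1 / fs)      # fft() order: 0, positive, then negative
freqs_centred = np.fft.fftshift(freqs)   # plotting order: -fs/2 ... just below fs/2
X_centred = np.fft.fftshift(X)

print(freqs)
print(freqs_centred)
print(freqs_centred[np.abs(X_centred) > 1e-6])   # where the tone shows up
```

Shifting X and freqs together keeps each value paired with its frequency, so only the display order changes.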

fft(fftshift(x)) rotates the input vector so that the phase of the complex FFT result is relative to the center of the original data window. If the input waveform is not exactly integer periodic in the FFT width, phase relative to the center of the original window of data may make more sense than the phase relative to some averaging between the discontinuous beginning and end. fft(fftshift(x)) also has the property that the imaginary component of a result will always be positive for a positive zero crossing at the center of the window of any antisymmetric waveform component.
fftshift(fft(y)) rotates the FFT results so that the DC bin is in the center of the result, halfway between -Fs/2 and Fs/2, which is a common spectrum display format.

Discrete Fourier Transform: How to use fftshift correctly with fft

I want to numerically compute the FFT on a numpy array Y. For testing, I'm using the Gaussian function Y = exp(-x^2). The (symbolic) Fourier Transform is Y' = constant * exp(-k^2/4).
import numpy
X = numpy.arange(-100,100)
Y = numpy.exp(-(X/5.0)**2)
The naive approach fails:
from numpy.fft import *
from matplotlib import pyplot
def plotReIm(x,y):
    f = pyplot.figure()
    ax = f.add_subplot(111)
    ax.plot(x, numpy.real(y), 'b', label='R()')
    ax.plot(x, numpy.imag(y), 'r:', label='I()')
    ax.plot(x, numpy.abs(y), 'k--', label='abs()')
    ax.legend()
Y_k = fftshift(fft(Y))
k = fftshift(fftfreq(len(Y)))
plotReIm(k,Y_k)
real(Y_k) jumps between positive and negative values, which corresponds to a jumping phase that is not present in the symbolic result. This is certainly not desirable. (The result is technically correct in the sense that abs(Y_k) gives the amplitudes as expected and ifft(Y_k) is Y.)
Here, the function fftshift() renders the array k monotonically increasing and changes Y_k accordingly. The pairs zip(k, Y_k) are not changed by applying this operation to both vectors.
This change appears to fix the issue:
Y_k = fftshift(fft(ifftshift(Y)))
k = fftshift(fftfreq(len(Y)))
plotReIm(k,Y_k)
Is this the correct way to employ the fft() function if monotonic Y and Y_k are required?
The reverse operation of the above is:
Yx = fftshift(ifft(ifftshift(Y_k)))
x = fftshift(fftfreq(len(Y_k), k[1] - k[0]))
plotReIm(x,Yx)
For this case, the documentation clearly states that Y_k must be sorted compatible with the output of fft() and fftfreq(), which we can achieve by applying ifftshift().
Those questions have been bothering me for a long time: Are the output and input arrays of both fft() and ifft() always such that a[0] should contain the zero frequency term, a[1:n/2+1] should contain the positive-frequency terms, and a[n/2+1:] should contain the negative-frequency terms, in order of decreasingly negative frequency [numpy reference], where 'frequency' is the independent variable?
The answer on Fourier Transform of a Gaussian is not a Gaussian does not answer my question.

The FFT can be thought of as producing a set of vectors, each with an amplitude and phase. The fft_shift operation changes the reference point for a phase angle of zero, from the edge of the FFT aperture, to the center of the original input data vector.
The phase (and thus the real component of the complex vector) of the result is sometimes less "jumpy" when this is done, especially if some input function is windowed such that it is discontinuous around the edges of the FFT aperture. Or if the input is symmetric around the center of the FFT aperture, the phase of the FFT result will always be zero after an fft_shift.
An fft_shift can be done by a vector rotate of N/2, or by simply flipping alternating sign bits in the FFT result, which may be more CPU dcache friendly.
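The equivalence between rotating the input by N/2 and flipping the sign of every other output bin can be verified numerically. A minimal sketch, assuming an even length N = 64 and random test data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                # even length, so the rotation is exactly N/2
x = rng.standard_normal(n)

# Option 1: rotate the input by N/2 before the FFT.
via_rotation = np.fft.fft(np.fft.fftshift(x))

# Option 2: leave the input alone and multiply output bin k by (-1)**k.
signs = (-1.0) ** np.arange(n)
via_sign_flip = np.fft.fft(x) * signs

same = np.allclose(via_rotation, via_sign_flip)
print(same)
```

This is the shift theorem with a shift of N/2: the phase factor exp(i*pi*k) reduces to (-1)**k. It also explains the alternating sign of real(Y_k) in the question's naive plot.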

The definition for the output of fft (and ifft) is here: http://docs.scipy.org/doc/numpy/reference/routines.fft.html#background-information
This is what the routines compute, no more and no less. Observe that the discrete Fourier transform is rather different from the continuous Fourier transform. For a densely sampled function there is a relation between the two, but the relation also involves phase factors and scaling in addition to fftshift. This is the cause of the oscillations you see in your plot. The necessary phase factor you can work out yourself from the above mathematical formula for the DFT.
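For the Gaussian in the question, the shift and the scaling can be checked against the continuous transform. This is a sketch under stated assumptions: the ordinary-frequency convention (exponent -2*pi*i*k*x), under which exp(-(x/a)^2) transforms to a*sqrt(pi)*exp(-(pi*a*k)^2), and a scaling by the sample spacing dx to turn the DFT sum into an approximation of the integral. The names a, dx, and analytic are introduced here for illustration.

```python
import numpy as np

X = np.arange(-100, 100)
a = 5.0
Y = np.exp(-(X / a) ** 2)
dx = X[1] - X[0]                     # sample spacing (1 here)

# Frequencies in cycles per unit x, reordered to be monotonically increasing.
k = np.fft.fftshift(np.fft.fftfreq(len(Y), d=dx))

# ifftshift moves x = 0 to index 0 (removing the alternating phase factor);
# multiplying by dx approximates the continuous integral.
Y_k = dx * np.fft.fftshift(np.fft.fft(np.fft.ifftshift(Y)))

analytic = a * np.sqrt(np.pi) * np.exp(-(np.pi * a * k) ** 2)
max_err = np.abs(Y_k - analytic).max()
print(max_err, np.abs(Y_k.imag).max())
```

The agreement is this close only because the Gaussian is well sampled and has decayed to essentially zero at both ends of the window; for other functions, aliasing and truncation add further error.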
