FFTshift causes oscillations - why? (numpy)

My original problem was the following:
I have a pulse envelope in an array a (element 0 = time 0, last element = time T).
I want the Fourier spectrum of the pulse. So what I did was np.fft.fftshift(np.fft.fft(a)). All good.
But then I was told to do a shift beforehand, too: np.fft.fftshift(np.fft.fft(np.fft.fftshift(a))). Then oscillations arose.
Now I wonder why one would do two shifts as shown above, and why the oscillations arise...
Here is the example:
I have the following code
x = np.arange(100)
a = np.sin(np.pi * x**2 / 1000)
[plot of a]
a_fft = np.fft.fft(a)
[plot of a_fft]
a_fft_shift = np.fft.fftshift(a_fft)
[plot of a_fft_shift]
a_shift = np.fft.fftshift(a)
a_shift_fft = np.fft.fft(a_shift)
[plot of a_shift_fft]
a_shift_fft_shift = np.fft.fftshift(a_shift_fft)
[plot of a_shift_fft_shift]

Your line
a_shift = np.fft.fftshift(a)
reorders your original time-domain signal. In terms of the FFT, that means you are altering the phases.
Note also that there is a discontinuity in your signal. The line above shifts this discontinuity to the center of the signal, which causes the FFT to produce a large number of high-frequency components. If you shift it to another place, the energy will be distributed accordingly.
The other problem is that you are only considering the real part of the spectrum, i.e., the cosine components. Always look at the imaginary part, too!
Take a look at the magnitude spectrum as well, to see that the position of the discontinuity only affects the phase. The total energy always remains the same.
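To see this numerically, here is a minimal check (a sketch, not part of the original answer) that the pre-shift changes only the phases, never the magnitudes:
import numpy as np

x = np.arange(100)
a = np.sin(np.pi * x**2 / 1000)
a_fft = np.fft.fft(a)
a_shift_fft = np.fft.fft(np.fft.fftshift(a))
# a circular shift by N/2 multiplies bin k by (-1)**k, so magnitudes are untouched
print(np.allclose(np.abs(a_fft), np.abs(a_shift_fft)))    # True
print(np.allclose(np.angle(a_fft), np.angle(a_shift_fft)))  # False: phases differ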

Related

Moving mean square error between 2 arrays, 'valid', where they fully overlap

I have a noisy square signal which looks like this:
[plot of the noisy square signal]
The amplitude is known. To match the complete square, I can create a pattern of the square and apply np.correlate to find where the signal and the pattern maximally overlap. I wanted to apply a similar approach to find the edge, trying to correlate with the two patterns below:
[plots of the two edge patterns]
As the correlation is nothing more than a convolution, this doesn't work. Half the pattern is equal to 0, and the convolution of this half will return 0 no matter the position on the signal, while the other half is equal to -X, with X the amplitude. This second half convolved with the signal will be maximal when the signal amplitude is maximal. On the signal plot, you can observe that the square is not perfect and that the beginning has a slightly larger amplitude. Basically, both correlations lead to a match on the beginning of the square, where the convolution is maximal. The ramp up (end of the square) is not detected.
To avoid this problem, I would like to use a different operation. As I know the amplitude of the square signal, I can generate a pattern with the correct amplitude, in this case about -0.3. I would then take the pattern and slide it across the signal. At each step, I would compute the mean square error, and my pattern would match the signal at the position where the mean square error is minimized. Moreover, I would like to use the same kind of setting as for a convolution, 'valid', where the operation is performed only when the two arrays fully overlap.
Do you know of another method, and/or which function or methods I should use? I couldn't find an all-in-one function like np.convolve or np.correlate.
EDIT: Since I couldn't find a pre-coded function in a library, I've coded my own with a while loop. It's pretty inefficient... It's up here on Code Review for upgrades.
I think that convolving/correlating your signal with a step function is still pretty close to the optimal solution, since this is similar to matched filtering, which can be proven to be optimal (under certain conditions; the noise likely needs to be Gaussian).
The only issue is that your template (the step function) contains a DC component. Removing this will give you the result you want:
import numpy as np
import matplotlib.pyplot as plt
# simulate the signal
signal = np.zeros(4000)
signal[200:-400] = -0.3
signal += 0.005 * np.random.randn(*signal.shape)
plt.plot(signal)
plt.title('Simulated signal')
plt.show()
# convolve with template with non-zero DC
templ = np.zeros(200)
templ[100:] = 1 # step from 0 to 1
plt.plot(np.convolve(signal, templ))
plt.title('Convolution with template with DC component')
plt.show()
# convolve with template without DC
templ_ac = templ - templ.mean() # step from -0.5 to +0.5
plt.plot(np.convolve(signal, templ_ac))
plt.title('Convolution with template without DC component')
plt.show()
Results:
[plots: convolution with the DC component vs. without]
The way to understand this is that convolve(signal, template) = convolve(signal, template_DC) + convolve(signal, template_AC), where template_DC = mean(template) and template_AC = template - template_DC. The first part is the convolution of the signal with a flat template, which is just a smoothed version of your signal. The second part is the 'edge detection' signal you want. If you do not remove the DC part of the template, the uninteresting first part dominates the interesting second part.
Note that the scaling of the template is not important: the step in the template doesn't have to be 0.3; it will just cause a scale factor in the end result. Also note that this method does not depend on the exact value of the step, so a larger step in your signal will cause a larger effect in the edge detection.
If you know that the step is always exactly 0.3, and you want to be insensitive to steps of different amplitude, you could do some sort of least-squares fitting of the signal with the template for every possible shift of the template, and only trigger a detection of an edge if the residual is small enough. This will be slow, but might give better rejection of steps with the wrong amplitude.
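For completeness, the sliding mean squared error the question asks about can be computed without an explicit Python loop by expanding the square; a minimal sketch (sliding_mse is a hypothetical helper, not a library function, reusing signal and templ from the code above):
import numpy as np

def sliding_mse(signal, pattern):
    # MSE between `pattern` and each fully-overlapping window w of `signal`:
    # mean((w - p)**2) = (sum(w**2) - 2*sum(w*p) + sum(p**2)) / len(p)
    n = len(pattern)
    win_sq = np.convolve(signal**2, np.ones(n), mode='valid')  # moving sum of squares
    cross = np.correlate(signal, pattern, mode='valid')        # sliding dot product
    return (win_sq - 2 * cross + np.sum(pattern**2)) / n

# e.g. with a pattern scaled to the known amplitude of about -0.3:
best_shift = np.argmin(sliding_mse(signal, -0.3 * templ))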
Since you have very little noise, you can calculate where the signal changes drastically with a loop, for example:
for i in range(begin + 10, end):
    if abs(data[i - 10] - data[i]) > 0.1:
        foundChange()  # placeholder: handle the edge detected near index i

Numpy correlate x-axis is shifted

I have two signals whose correlation lag I'm trying to find:
[plot of the two signals]
It looks like they are synced, so I expect the correlate function to give a minimum at zero (because they are anti-correlated every ~100 timesteps).
However, using this code:
import numpy as np
import matplotlib.pyplot as plt

yhat1 = np.load('cor1.npy')
yhat2 = np.load('cor2.npy')
corr = np.correlate(yhat1 - np.mean(yhat1),
                    yhat2 - np.mean(yhat2),
                    mode='same')
plt.plot(corr)
plt.show()
I'm getting the following (I tried both 'full' and 'same' modes and got the same result):
[plot of the correlation output]
Why is the minimum not at 0 as expected, but at 250?
Why do there seem to be other significant peaks on both sides of the minimum?
The data is here.
Numpy's correlation function returns the auto-/cross-correlation depending on the inputs you give. Correlation is the same as convolution, except you don't apply time reversal to one of the signals. In other words, it applies a sliding dot product between the signals.
At t=0, it's normal to get zero correlation, as one signal has zero at t=0. However, as you slide further, the signals fluctuate both in magnitude and sign. Due to the (relatively) extreme peaks of the signals at different times, the correlation fluctuates. The huge peak is at t=500 because at that time full overlap occurs between the two signals: their extreme peaks are aligned at that moment. After t=500, the overlapping region decreases, and the behavior is similar to the case before t=500.
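One way to sanity-check where zero lag actually sits is to plot the correlation against an explicit lag axis; a minimal sketch reusing the arrays from the question (with mode='full', lag 0 sits at index len(yhat2) - 1):
import numpy as np
import matplotlib.pyplot as plt

yhat1 = np.load('cor1.npy')
yhat2 = np.load('cor2.npy')
corr = np.correlate(yhat1 - np.mean(yhat1), yhat2 - np.mean(yhat2), mode='full')
lags = np.arange(-(len(yhat2) - 1), len(yhat1))  # one lag value per output sample
plt.plot(lags, corr)
plt.axvline(0, color='k', linestyle='--')  # mark lag 0 explicitly
plt.xlabel('lag')
plt.show()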

How do I get the frequencies from a signal?

I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents the samples of a recorded sound (x = milliseconds).
sin(pi*x/2) => 250 Hz
How can we go from the signal (list of points) to obtaining the frequencies from this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos. I am yet to find an answer. Please use simple words.
(I am thankful for every answer.)
What you're looking for is known as the Fourier transform.
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is in essence a mathematical operation that, when applied to a signal, tells you how present each frequency is in the time series. In order to get some intuition behind this, it might be helpful to look at the mathematical definition of the DFT:

X(k) = Σ_{n=0}^{N-1} x(n) · e^(−j·2πkn/N)

where k is swept all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that this definition resembles that of the correlation of two functions, in this case x(n) and the negative exponential function. While this may seem a little abstract, by using Euler's formula and playing around with the definition, the DFT can be expressed as the correlation with both a sine wave and a cosine wave, which account for the imaginary and the real parts of the DFT.
So, keeping in mind that this is in essence computing a correlation: whenever a corresponding sine or cosine from the decomposition of the complex exponential matches that of x(n), there will be a peak in X(k), meaning that such a frequency is present in the signal.
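As a small numerical illustration of this correlation view (a sketch, not from the original answer), a single DFT coefficient is exactly the dot product of the signal with a cosine (real part) and a sine (imaginary part):
import numpy as np

N, k = 8, 2
n = np.arange(N)
x = np.cos(2 * np.pi * k * n / N)  # a test signal sitting exactly at bin k
X_k = np.sum(x * np.exp(-2j * np.pi * k * n / N))  # k-th DFT coefficient
re = np.sum(x * np.cos(2 * np.pi * k * n / N))   # correlation with a cosine
im = -np.sum(x * np.sin(2 * np.pi * k * n / N))  # correlation with a sine
print(np.allclose(X_k, re + 1j * im))  # True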
How can we do the same with numpy?
So, having given a very brief theoretical background, let's consider an example to see how this can be implemented in Python, with the following signal:
import numpy as np
import matplotlib.pyplot as plt
Fs = 150.0  # sampling rate
Ts = 1.0/Fs  # sampling interval
t = np.arange(0, 1, Ts)  # time vector
ff = 50  # frequency of the signal
y = np.sin(2*np.pi*ff*t)
plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now, the DFT can be computed by using np.fft.fft, which as mentioned, will be telling you which is the contribution of each frequency in the signal now in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see that we get a peak at the frequency of 50 Hz, which in mathematical terms is a delta function centred at the fundamental frequency of 50 Hz. This can be checked in a table of Fourier transform pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()

Why do I need to "fftshift" the product of DFTs to recover the convolution product

I need to compute a convolution product using the convolution theorem. However, I do not understand why I need to apply fftshift on the inverse Fourier transform to get the correct result. Otherwise, the result is swapped (well, I know that is what fftshift is made for, but I don't understand why I obtain a swapped result from the inverse FFT). Here is a minimal example with two functions that decrease quickly, so that I do not have to bother with padding. The result is checked against scipy.signal.convolve:
import numpy as np
import scipy.signal as sig
Nx = 400
xp = np.arange(Nx) - Nx/2.
Lg = 20
Lb = 25
ff = np.exp(-(xp/Lg)**2) * xp/Lg # function (two bumps of opposite signs)
gg = np.zeros(Nx) # convolution kernel (just a box)
gg[abs(xp)<Lb] = 1
conv_pure = sig.convolve(ff, gg, mode="same") # that is the correct one
tff = np.fft.rfft(ff) # DFT of the function
tfg = np.fft.rfft(gg) # DFT of the kernel
conv_dfts = np.fft.irfft(tff*tfg).real # should be the convolution product
conv_dftshift = np.fft.fftshift(conv_dfts)
And here is what it looks like:
[plot comparing conv_pure, conv_dfts, and conv_dftshift]
So, why is conv_dfts swapped?
For the calculations in scipy.signal.convolve with mode='full' or mode='same' to be properly defined, the data in the first argument is (effectively) extended with zeros. Your FFT calculation, on the other hand, does circular convolution, which corresponds to using the periodic extension of the data. To see the consequences of this difference, consider how the first point of the result is calculated.
(It is helpful to have in mind the usual "sliding window" view of convolution, such as shown at http://mathworld.wolfram.com/Convolution.html or https://en.wikipedia.org/wiki/Convolution#Visual_explanation. In your case, the sliding window is gg.)
For scipy.signal.convolve with mode='same', you can visualize the calculation of the first point by aligning the right half of gg over the left end of ff, and summing the elementwise product of those two signals. ff is very small at its left end, so this calculation is very close to 0. Subsequent points of the convolution remain zero until the sliding window starts encountering larger values of ff. So the "interesting" part of the result is in the middle of the convolution.
For the first point of the FFT calculation, imagine the right end of gg aligned with the left end of ff. Again take the sum of the elementwise product. There are two big differences here. First, gg is not shifted by half its length like it is with mode='same' in scipy.signal.convolve. Second, the values that gg is multiplied by are not all zero--they are the periodic extension of ff, so in this "sliding window" visualization, we have the rectangular window aligned directly over the center of the double pulse (in the periodic extension). Because of the symmetry of gg and the antisymmetry of ff, this first value is 0. As gg slides right, the symmetry is broken, the positive pulse dominates the calculation, and nontrivial values are computed. Once the window passes the double pulse, the values of the convolution become very small. They become very big again near the end of the convolution, when the rectangular pulse encounters the other side of the double pulse.
To get your FFT calculation to match the scipy.signal.convolve calculation, you can adjust the phase of the rectangular pulse in gg (assuming Nx is even). For example, if you add this line
gg2 = np.roll(gg, -(Nx//2 - 1))
and use gg2 in place of gg in the calculation of tfg:
tfg = np.fft.rfft(gg2) # DFT of the kernel
then conv_dfts and conv_pure agree. There are other ways you can tweak things to get the results to align as you expected. The main point of this answer is to explain why the results you calculated were different.
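Putting the suggested fix together as a runnable check (a sketch continuing from the question's code):
gg2 = np.roll(gg, -(Nx//2 - 1))        # move the kernel's reference point to sample 0
tfg2 = np.fft.rfft(gg2)                # DFT of the rolled kernel
conv_dfts2 = np.fft.irfft(tff * tfg2)  # circular convolution, now phase-aligned
print(np.allclose(conv_dfts2, conv_pure))  # should print True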

extracting phase information using numpy fft

I am trying to use a fast Fourier transform to extract the phase shift of a single sinusoidal function. I know that on paper, if we denote the transform of our function by T, then we have the following relations:

amplitude = sqrt(Re(T)^2 + Im(T)^2),  phase = arctan2(Im(T), Re(T))
However, I am finding that while I am able to accurately capture the frequency of my cosine wave, the phase is inaccurate unless I sample at an extremely high rate. For example:
import numpy as np
import pylab as pl
num_t = 100000
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
w = 2.0*np.pi*30.0
phase = np.pi/2.0
amp = np.fft.rfft(np.cos(w*t+phase))
freqs = np.fft.rfftfreq(t.shape[-1],dt)
print(np.arctan2(amp.imag, amp.real)[30])
pl.subplot(211)
pl.plot(freqs[:60],np.sqrt(amp.real**2+amp.imag**2)[:60])
pl.subplot(212)
pl.plot(freqs[:60],(np.arctan2(amp.imag,amp.real))[:60])
pl.show()
Using num_t=100000 points I get a phase of 1.57173880459.
Using num_t=10000 points I get a phase of 1.58022110476.
Using num_t=1000 points I get a phase of 1.6650441064.
What's going wrong? Even with 1000 points I have 33 points per cycle, which should be enough to resolve it. Is there maybe a way to increase the number of computed frequency points? Is there any way to do this with a "low" number of points?
EDIT: from further experimentation it seems that I need ~1000 points per cycle in order to accurately extract a phase. Why?!
EDIT 2: further experiments indicate that accuracy is related to number of points per cycle, rather than absolute numbers. Increasing the number of sampled points per cycle makes phase more accurate, but if both signal frequency and number of sampled points are increased by the same factor, the accuracy stays the same.
Your points are not distributed equally over the interval: you have the end point doubled, since 0 is the same point as 1 for the periodic signal. This gets less important the more points you take, obviously, but it still introduces some error. You can avoid it entirely; np.linspace has a flag for this (endpoint=False), and another flag (retstep=True) to return dt directly along with the array.
Do
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
instead of
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
then it works :)
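As a quick check of this fix (a sketch using num_t = 1000, where the uncorrected version gave 1.665):
import numpy as np

num_t = 1000
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
w = 2.0 * np.pi * 30.0
phase = np.pi / 2.0
amp = np.fft.rfft(np.cos(w * t + phase))
print(np.arctan2(amp.imag, amp.real)[30])  # 1.5707963... = pi/2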
The phase value in the result bin of an unrotated FFT is only correct if the input signal is exactly integer-periodic within the FFT length. Your test signal is not, so the FFT partly measures the phase of the signal discontinuity between the end points of the test sinusoid. A higher sample rate will create a slightly different last end point from the sinusoid, and thus a possibly smaller discontinuity.
If you want to decrease this FFT phase-measurement error, create your test signal so that your test phase is referenced to the exact center (sample N/2) of the test vector (not the first sample), and then do an fftshift operation (rotate by N/2) so that there is no signal discontinuity between the first and last points of your resulting FFT input vector of length N.
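A minimal sketch of that recipe (assuming a 1-second window; the 30.25 Hz tone is deliberately not integer-periodic, and np.fft.ifftshift rotates the center sample to index 0):
import numpy as np

N = 1000
t = np.arange(N) / N                       # 1-second window
f, phase = 30.25, np.pi / 2.0              # non-integer number of cycles in the window
sig = np.cos(2 * np.pi * f * (t - t[N // 2]) + phase)  # phase referenced to the center
amp = np.fft.rfft(np.fft.ifftshift(sig))   # rotate so the reference sample comes first
k = np.argmax(np.abs(amp))
print(np.arctan2(amp.imag[k], amp.real[k]))  # close to pi/2 despite the leakage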
This snippet of code might help:
import numpy as np
from numpy.fft import rfft, irfft

THRESHOLD = 0.01  # keep modes whose amplitude exceeds 1% of the maximum

def reconstruct_ifft(data):
    """
    Take in a signal, find its FFT, retain the dominant modes,
    and reconstruct the signal from those.

    Parameters
    ----------
    data : signal to run the fft/ifft on

    Returns
    -------
    reconstructed_signal : the reconstructed signal
    """
    yf = rfft(data)
    amp_yf = np.abs(yf)  # amplitude of each mode
    yf = yf * (amp_yf > (THRESHOLD * np.amax(amp_yf)))  # zero out the weak modes
    reconstructed_signal = irfft(yf)
    return reconstructed_signal
The 0.01 is the threshold on the FFT amplitudes that you want to retain. Making THRESHOLD greater (more than 1 does not make any sense) will keep fewer modes and cause a higher RMS error, but ensures higher frequency selectivity.
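A brief usage sketch (the 5 Hz tone and the noise level are made up for illustration):
t = np.linspace(0, 1, 500, endpoint=False)
noisy = np.sin(2 * np.pi * 5 * t) + 0.05 * np.random.randn(500)
clean = reconstruct_ifft(noisy)  # most noise bins fall below the 1% threshold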
