I have two time signals, each containing the same two pulses but at different positions.
Picture describing the two signals:
How can I get, with Python, the time shift between the two signals for each pulse?
Cross-correlation does not seem to be a robust way to do the job.
Here you can see the cross-correlation function and the two time shifts I would like to recover:
Although the time shift is obtained perfectly from the maximum of the cross-correlation function when there is only one pulse, it does not help much in the case of multiple pulses.
This is my test program:
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
N = 200 # Number of points in initial, unshifted signals
N_pad = 500 # Total number of points at the end
t = np.linspace(-1, 1, N) # Dummy time vector
dt = t[1]-t[0] # Time step
Fs = 1.0/dt # Sampling frequency
pulse1 = signal.gausspulse(t, fc=5) # Create a pulse at 5 Hz
pulse2 = signal.gausspulse(t, fc=8) # Create a pulse at 8 Hz
# Shift and pad the pulses
pulse1_shifted = np.concatenate((pulse1,np.zeros(50)), axis=0)
pulse2_shifted = np.concatenate((pulse2,np.zeros(100)), axis=0)
pulse1_shifted_padded = np.concatenate((np.zeros(N_pad-len(pulse1_shifted)),pulse1_shifted), axis=0)
pulse2_shifted_padded = np.concatenate((np.zeros(N_pad-len(pulse2_shifted)),pulse2_shifted), axis=0)
# Create signal 1 as the sum of the two pulses
sig1 = pulse1_shifted_padded + pulse2_shifted_padded
# Different time shift
pulse1_shifted = np.concatenate((pulse1,np.zeros(60)), axis=0)
pulse2_shifted = np.concatenate((pulse2,np.zeros(150)), axis=0)
pulse1_shifted_padded = np.concatenate((np.zeros(N_pad-len(pulse1_shifted)),pulse1_shifted), axis=0)
pulse2_shifted_padded = np.concatenate((np.zeros(N_pad-len(pulse2_shifted)),pulse2_shifted), axis=0)
# Create signal 2 as the sum of the two pulses
sig2 = pulse1_shifted_padded + pulse2_shifted_padded
# Create new time vector at the same sampling rate
t = np.arange(dt*N_pad,step=dt)
# Plot the two signals
plt.figure()
plt.plot(t,sig1,label="Signal 1")
plt.plot(t,sig2,label="Signal 2")
plt.legend()
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("The two signals. Orange and blue has been recorded at 100 m distance")
# Plot the cross correlation between the two signals
corr = signal.correlate(sig1,sig2)
dt = np.arange(1-N_pad,N_pad)/Fs # Time shift vector
plt.figure()
plt.plot(dt,corr)
plt.plot([0.1,0.1],[-20,20],"--")
plt.plot([0.5,0.5],[-20,20],"--")
plt.ylim([-15,15])
plt.xlabel("Time shift (s)")
plt.ylabel("Cross correlation function")
Do you have a workaround?
Thank you so much
Since cross-correlation works perfectly when each signal contains a single pulse, a quick workaround is to pick one pulse as a baseline and cross-correlate it pairwise with the rest to determine the offsets.
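A minimal sketch of that idea, assuming each pulse shape is known (or can be cut out of one recording) and that the pulses occupy different frequency bands. The helper names pulse_delay and place, the pulse parameters and the pulse positions are all made up for illustration; they are not taken from the question's data.

```python
import numpy as np
from scipy import signal

def pulse_delay(sig_a, sig_b, template):
    """Delay (in samples) of `template` in sig_b relative to sig_a (hypothetical helper)."""
    # Sliding dot product with the template: the maximum marks where the pulse starts.
    pos_a = np.argmax(signal.correlate(sig_a, template, mode="valid"))
    pos_b = np.argmax(signal.correlate(sig_b, template, mode="valid"))
    return pos_b - pos_a

def place(pulse, start, total=600):
    """Embed `pulse` in a zero signal of length `total`, starting at sample `start`."""
    out = np.zeros(total)
    out[start:start + len(pulse)] = pulse
    return out

# Assumed test pulses: same envelope width, well-separated centre frequencies
t = np.linspace(-1, 1, 200)
p1 = signal.gausspulse(t, fc=5, bw=0.8)
p2 = signal.gausspulse(t, fc=20, bw=0.2)

# Each signal is the sum of both pulses; the pulses move by different amounts
sig1 = place(p1, 250) + place(p2, 200)
sig2 = place(p1, 240) + place(p2, 150)

shift1 = pulse_delay(sig1, sig2, p1)  # pulse 1 arrives 10 samples earlier in sig2
shift2 = pulse_delay(sig1, sig2, p2)  # pulse 2 arrives 50 samples earlier in sig2
print(shift1, shift2)
```

Multiply the sample shifts by the time step to get shifts in seconds. If the pulses overlap strongly in frequency, the cross-terms grow and this simple argmax may no longer be reliable.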
If I understand correctly, you want an automated method to extract the time shift between the orange and green cursors in a cross-correlation result such as the one in your question, i.e. the marginal time shift between the two pulse pairs. It can be inferred directly from the cross-correlation. First, detect the local maxima of the cross-correlation. Next, sort the local maxima in descending order of amplitude, retaining the time stamp associated with each one. The marginal shift between the two pulses is then the difference between the time stamps of the top two local maxima in the sorted list.
This is my first post so apologies for any formatting related issues.
So I have a dataset which was obtained from an atomic microscope. The data looks like a 1024x1024 matrix which is composed of different measurements taken from the sample in units of meters, eg.
data = [[1e-07 ... 4e-08][ ... ... ... ][3e-09 ... 12e-06]]
np.shape(data) == (1024, 1024)
From this data, I was hoping to 1) derive some statistics about the real data; and 2) using the power spectrum density (PSD) distribution, hopefully create a new dataset which is different, but statistically similar to the characteristics of the original data. My plan to do this was 2a) take a 2d fft of data, calculate the power spectrum density 2b) some method?, 2c) take the 2d ifft of the modified signal to turn it back into a new sample with the same power spectrum density as the original.
Moreover, regarding part 2b) this was the closest link I could find regarding a time series based solution; however, I am not understanding exactly how to implement this so far, since I am not exactly sure what the phase, frequency, and amplitudes of the fft data represent in this 2d case, and also since we are now talking about a 2d ifft I'm not exactly sure how to construct this complex matrix while incorporating the random number generation, and amplitude/phase shifts in a way that will translate back to something meaningful.
So basically, I have been having some trouble with my intuition. For this problem, we are working with a 2d Fourier transform of spatial data with no temporal component, so I believe that methods which are applied to time series data could be applied here as well. Since the fft of the original data is the 'frequency in the spatial domain', the x-axis of the PSD should be either pixels or meters, but then what is the 'power' in the y-axis describing? I was hoping that someone could help me figure this problem out.
My code is below, hopefully someone could help me solve my problem. Bonus if someone could help me understand what this shifted frequency vs amplitude plot is saying:
Here is the image with the fft, shifted fft, and freq. vs amplitude plots.
Fortunately the power spectrum density function is a bit easier to understand
Thank you all for your time.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
data = np.genfromtxt('asample3.0_00001-filter.txt')
x = np.arange(0,int(np.size(data,0)),1)
y = np.arange(0,int(np.size(data,1)),1)
z = data
npix = data.shape[0]
#taking the fourier transform
fourier_image = np.fft.fft2(data)
#Get power spectral density
fourier_amplitudes = np.abs(fourier_image)**2
#calculate sampling frequency fs (physical distance between pixels)
fs = 92e-07/npix
freq_shifted = fs/2 * np.linspace(-1,1,npix)
freq = fs/2 * np.linspace(0,1,int(npix/2))
print("Plotting 2d Fourier Transform ...")
fig, axs = plt.subplots(2,2,figsize=(15, 15))
axs[0,0].imshow(10*np.log10(np.abs(fourier_image)))
axs[0,0].set_title('fft')
axs[0,1].imshow(10*np.log10(np.abs(np.fft.fftshift(fourier_image))))
axs[0,1].set_title('shifted fft')
axs[1,0].plot(freq,10*np.log10(np.abs(fourier_amplitudes[:npix//2])))
axs[1,0].set_title('freq vs amplitude')
for ii in list(range(npix//2)):
    axs[1,1].plot(freq_shifted,10*np.log10(np.fft.fftshift(np.abs(fourier_amplitudes[ii]))))
axs[1,1].set_title('shifted freq vs amplitude')
#constructing a wave vector array
## Get frequencies corresponding to signal PSD
kfreq = np.fft.fftfreq(npix) * npix
kfreq2D = np.meshgrid(kfreq, kfreq)
knrm = np.sqrt(kfreq2D[0]**2 + kfreq2D[1]**2)
knrm = knrm.flatten()
fourier_amplitudes = fourier_amplitudes.flatten()
#creating the power spectrum
kbins = np.arange(0.5, npix//2+1, 1.)
kvals = 0.5 * (kbins[1:] + kbins[:-1])
Abins, _, _ = stats.binned_statistic(knrm, fourier_amplitudes,
                                     statistic = "mean",
                                     bins = kbins)
Abins *= np.pi * (kbins[1:]**2 - kbins[:-1]**2)
print("Plotting power spectrum of surface ...")
fig = plt.figure(figsize=(10, 10))
plt.loglog(fs/kvals, Abins)
plt.xlabel("Spatial Frequency $k$ [meters]")
plt.ylabel("Power per Spatial Frequency $P(k)$")
plt.tight_layout()
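One common way to fill in step 2b is phase randomization: keep the Fourier amplitudes of the measured surface and replace the phases with random ones. The sketch below is only an illustration under assumptions. The function name phase_randomized_surrogate is made up, and the 64x64 test array is a synthetic stand-in for the 1024x1024 scan. The random phases are taken from the FFT of real white noise so that they have the conjugate symmetry needed for a real-valued inverse transform.

```python
import numpy as np

def phase_randomized_surrogate(data, seed=None):
    """Return a real array with the same 2D power spectrum as `data` but random phases."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fft2(data)
    # Phases of the FFT of real white noise are conjugate-symmetric by construction
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(data.shape)))
    new_spectrum = np.abs(spectrum) * np.exp(1j * noise_phase)
    new_spectrum[0, 0] = spectrum[0, 0]  # keep the original mean (DC term)
    # The imaginary part is only rounding error, so it is dropped
    return np.fft.ifft2(new_spectrum).real

# Synthetic stand-in for the measured surface (assumption, not the real data)
rng = np.random.default_rng(0)
data = rng.standard_normal((64, 64)).cumsum(axis=0).cumsum(axis=1)
surrogate = phase_randomized_surrogate(data, seed=1)

same_psd = np.allclose(np.abs(np.fft.fft2(surrogate)), np.abs(np.fft.fft2(data)))
print(same_psd)
```

The surrogate shares the power spectrum (and therefore the autocorrelation) of the original, but any structure encoded in the phases, such as sharp edges or localized features, is not preserved.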
Suppose one wanted to find the period of a given sinusoidal wave signal. From what I have read online, it appears that the two main approaches employ either fourier analysis or autocorrelation. I am trying to automate the process using python and my usage case is to apply this concept to similar signals that come from the time-series of positions (or speeds or accelerations) of simulated bodies orbiting a star.
For simple-examples-sake, consider x = sin(t) for 0 ≤ t ≤ 10 pi.
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
## sample data
t = np.linspace(0, 10 * np.pi, 100)
x = np.sin(t)
fig, ax = plt.subplots()
ax.plot(t, x, color='b', marker='o')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
Given a sine-wave of the form x = a sin(b(t+c)) + d, the period of the sine-wave is obtained as 2 * pi / b. Since b=1 (or by visual inspection), the period of our sine wave is 2 * pi. I can check the results obtained from other methods against this baseline.
Attempt 1: Autocorrelation
As I understand it (please correct me if I'm wrong), correlation can be used to see if one signal is a time-lagged copy of another signal (similar to how cosine and sine differ by a phase difference). So autocorrelation is testing a signal against itself to measure the times at which the time-lag repeats said signal. Using the example posted here:
result = np.correlate(x, x, mode='full')
Since x and t each consist of 100 elements and result consists of 199 elements, I am not sure why I should arbitrarily select the last 100 elements.
print("\n autocorrelation (shape={}):\n{}\n".format(result.shape, result))
autocorrelation (shape=(199,)):
[ 0.00000000e+00 -3.82130761e-16 -9.73648712e-02 -3.70014208e-01
-8.59889695e-01 -1.56185995e+00 -2.41986054e+00 -3.33109112e+00
-4.15799070e+00 -4.74662427e+00 -4.94918053e+00 -4.64762251e+00
-3.77524157e+00 -2.33298717e+00 -3.97976240e-01 1.87752669e+00
4.27722402e+00 6.54129270e+00 8.39434617e+00 9.57785701e+00
9.88331103e+00 9.18204933e+00 7.44791758e+00 4.76948221e+00
1.34963425e+00 -2.50822289e+00 -6.42666652e+00 -9.99116299e+00
-1.27937834e+01 -1.44791297e+01 -1.47873668e+01 -1.35893098e+01
-1.09091510e+01 -6.93157447e+00 -1.99159756e+00 3.45267493e+00
8.86228186e+00 1.36707567e+01 1.73433176e+01 1.94357232e+01
1.96463736e+01 1.78556800e+01 1.41478477e+01 8.81191526e+00
2.32100171e+00 -4.70897483e+00 -1.15775811e+01 -1.75696560e+01
-2.20296487e+01 -2.44327920e+01 -2.44454330e+01 -2.19677060e+01
-1.71533510e+01 -1.04037163e+01 -2.33560966e+00 6.27458308e+00
1.45655029e+01 2.16769872e+01 2.68391837e+01 2.94553896e+01
2.91697473e+01 2.59122266e+01 1.99154591e+01 1.17007613e+01
2.03381596e+00 -8.14633251e+00 -1.78184255e+01 -2.59814393e+01
-3.17580589e+01 -3.44884934e+01 -3.38046447e+01 -2.96763956e+01
-2.24244433e+01 -1.26974172e+01 -1.41464998e+00 1.03204331e+01
2.13281784e+01 3.04712823e+01 3.67721634e+01 3.95170295e+01
3.83356037e+01 3.32477037e+01 2.46710643e+01 1.33886439e+01
4.77778141e-01 -1.27924775e+01 -2.50860560e+01 -3.51343866e+01
-4.18671622e+01 -4.45258983e+01 -4.27482779e+01 -3.66140001e+01
-2.66465884e+01 -1.37700036e+01 7.76494745e-01 1.55574483e+01
2.90828312e+01 3.99582426e+01 4.70285203e+01 4.95000000e+01
4.70285203e+01 3.99582426e+01 2.90828312e+01 1.55574483e+01
7.76494745e-01 -1.37700036e+01 -2.66465884e+01 -3.66140001e+01
-4.27482779e+01 -4.45258983e+01 -4.18671622e+01 -3.51343866e+01
-2.50860560e+01 -1.27924775e+01 4.77778141e-01 1.33886439e+01
2.46710643e+01 3.32477037e+01 3.83356037e+01 3.95170295e+01
3.67721634e+01 3.04712823e+01 2.13281784e+01 1.03204331e+01
-1.41464998e+00 -1.26974172e+01 -2.24244433e+01 -2.96763956e+01
-3.38046447e+01 -3.44884934e+01 -3.17580589e+01 -2.59814393e+01
-1.78184255e+01 -8.14633251e+00 2.03381596e+00 1.17007613e+01
1.99154591e+01 2.59122266e+01 2.91697473e+01 2.94553896e+01
2.68391837e+01 2.16769872e+01 1.45655029e+01 6.27458308e+00
-2.33560966e+00 -1.04037163e+01 -1.71533510e+01 -2.19677060e+01
-2.44454330e+01 -2.44327920e+01 -2.20296487e+01 -1.75696560e+01
-1.15775811e+01 -4.70897483e+00 2.32100171e+00 8.81191526e+00
1.41478477e+01 1.78556800e+01 1.96463736e+01 1.94357232e+01
1.73433176e+01 1.36707567e+01 8.86228186e+00 3.45267493e+00
-1.99159756e+00 -6.93157447e+00 -1.09091510e+01 -1.35893098e+01
-1.47873668e+01 -1.44791297e+01 -1.27937834e+01 -9.99116299e+00
-6.42666652e+00 -2.50822289e+00 1.34963425e+00 4.76948221e+00
7.44791758e+00 9.18204933e+00 9.88331103e+00 9.57785701e+00
8.39434617e+00 6.54129270e+00 4.27722402e+00 1.87752669e+00
-3.97976240e-01 -2.33298717e+00 -3.77524157e+00 -4.64762251e+00
-4.94918053e+00 -4.74662427e+00 -4.15799070e+00 -3.33109112e+00
-2.41986054e+00 -1.56185995e+00 -8.59889695e-01 -3.70014208e-01
-9.73648712e-02 -3.82130761e-16 0.00000000e+00]
Attempt 2: Fourier
Since I am not sure where to go from the last attempt, I sought a new attempt. To my understanding, Fourier analysis basically shifts a signal from/to the time-domain (x(t) vs t) to/from the frequency domain (x(t) vs f=1/t); the signal in frequency-space should appear as a sinusoidal wave that dampens over time. The period is obtained from the most observed frequency since this is the location of the peak of the distribution of frequencies.
Since my values are all real-valued, applying the Fourier transform should mean my output values are all complex-valued. I wouldn't think this is a problem, except for the fact that scipy has methods for real-values. I do not fully understand the differences between all of the different scipy methods. That makes following the algorithm proposed in this posted solution hard for me to follow (ie, how/why is the threshold value picked?).
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
threshold = 0
idx = np.where(abs(omega)>threshold)[0][-1]
max_f = abs(freq[idx])
print(max_f)
This outputs 0.01, meaning the period is 1/0.01 = 100. This doesn't make sense either.
Attempt 3: Power Spectral Density
According to the scipy docs, I should be able to estimate the power spectral density (psd) of the signal using a periodogram (which, according to wikipedia, is the fourier transform of the autocorrelation function). By selecting the dominant frequency fmax at which the signal peaks, the period of the signal can be obtained as 1 / fmax.
freq, pdensity = signal.periodogram(x)
fig, ax = plt.subplots()
ax.plot(freq, pdensity, color='r')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
The periodogram shown below peaks at 49.076... at a frequency of fmax = 0.05. So, period = 1/fmax = 20. This doesn't make sense to me. I have a feeling it has something to do with the sampling rate, but don't know enough to confirm or progress further.
I realize I am missing some fundamental gaps in understanding how these things work. There are a lot of resources online, but it's hard to find this needle in the haystack. Can someone help me learn more about this?
Let's first look at your signal (I've added endpoint=False to make the division even):
t = np.linspace(0, 10*np.pi, 100, endpoint=False)
x = np.sin(t)
Let's divide out the radians (essentially by taking t /= 2*np.pi) and create the same signal by relating to frequencies:
fs = 20 # Sampling rate of 100/5 = 20 (e.g. Hz)
f = 1 # Signal frequency of 1 (e.g. Hz)
t = np.linspace(0, 5, 5*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
This makes it more salient that f/fs == 1/20 == 0.05 (i.e. the periodicity of the signal is exactly 20 samples). Frequencies in a digital signal always relate to its sampling rate, as you have already guessed. Note that the actual signal is exactly the same no matter what the values of f and fs are, as long as their ratio is the same:
fs = 1 # Natural units
f = 0.05
t = np.linspace(0, 100, 100*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
In the following I'll use these natural units (fs = 1). The only difference will be in t and hence the generated frequency axes.
Autocorrelation
Your understanding of what the autocorrelation function does is correct. It detects the correlation of a signal with a time-lagged version of itself. It does this by sliding the signal over itself as seen in the right column here (from Wikipedia):
Note that as both inputs to the correlation function are the same, the resulting signal is necessarily symmetric. That is why the output of np.correlate is usually sliced from the middle:
acf = np.correlate(x, x, 'full')[-len(x):]
Now index 0 corresponds to 0 delay between the two copies of the signal.
Next you'll want to find the index or delay that presents the largest correlation. Due to the shrinking overlap this will by default also be index 0, so the following won't work:
acf.argmax() # Always returns 0
Instead, I recommend finding the largest peak, where a peak is defined to be any index with a larger value than both its direct neighbours:
inflection = np.diff(np.sign(np.diff(acf))) # Find the second-order differences
peaks = (inflection < 0).nonzero()[0] + 1 # Find where they are negative
delay = peaks[acf[peaks].argmax()] # Of those, find the index with the maximum value
Now delay == 20, which tells you that the signal has a frequency of 1/20 of its sampling rate:
signal_freq = fs/delay # Gives 0.05
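The same peak search can also be written with scipy.signal.find_peaks, which does not report the edge sample at index 0. This is a sketch in the natural units used above (fs = 1, f = 0.05), not a replacement for the code shown:

```python
import numpy as np
from scipy.signal import find_peaks

fs = 1      # natural units, as above
f = 0.05
t = np.arange(100) / fs
x = np.sin(2*np.pi*f*t)

# One-sided autocorrelation: index 0 corresponds to zero lag
acf = np.correlate(x, x, 'full')[-len(x):]

# Interior local maxima only; the zero-lag sample is an edge and is skipped
peaks, _ = find_peaks(acf)
delay = peaks[np.argmax(acf[peaks])]  # lag of the highest peak, in samples
period = delay / fs                   # period in time units
signal_freq = fs / delay
print(delay, period, signal_freq)
```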
Fourier transform
You used the following to calculate the FFT:
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
These functions are designed for complex-valued signals. They will work for real-valued signals, but you'll get a symmetric output, as the negative-frequency components mirror the positive-frequency components (they are complex conjugates, so their magnitudes are identical). NumPy provides separate functions for real-valued signals:
ft = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0]) # Get frequency axis from the time axis
mags = abs(ft) # We don't care about the phase information here
Let's have a look:
plt.plot(freqs, mags)
plt.show()
Note two things: the peak is at frequency 0.05, and the maximum frequency on the axis is 0.5 (the Nyquist frequency, which is exactly half the sampling rate). If we had picked fs = 20, this would be 10.
Now let's find the maximum. The thresholding method you have tried can work, but the target frequency bin is selected blindly and so this method would suffer in the presence of other signals. We could just select the maximum value:
signal_freq = freqs[mags.argmax()] # Gives 0.05
However, this would fail if, e.g., we have a large DC offset (and hence a large component in index 0). In that case we could just select the highest peak again, to make it more robust:
inflection = np.diff(np.sign(np.diff(mags)))
peaks = (inflection < 0).nonzero()[0] + 1
peak = peaks[mags[peaks].argmax()]
signal_freq = freqs[peak] # Gives 0.05
If we had picked fs = 20, this would have given signal_freq == 1.0 due to the different time axis from which the frequency axis was generated.
Periodogram
The method here is essentially the same. The autocorrelation function of x has the same time axis and period as x, so we can use the FFT as above to find the signal frequency:
pdg = np.fft.rfft(acf)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0])
plt.plot(freqs, abs(pdg))
plt.show()
This curve obviously has slightly different characteristics from the direct FFT on x, but the main takeaways are the same: the frequency axis ranges from 0 to 0.5*fs, and we find a peak at the same signal frequency as before: freqs[abs(pdg).argmax()] == 0.05.
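The same reasoning explains the result from scipy.signal.periodogram in the question: without an fs argument the frequency axis is in cycles per sample, so 1/fmax is a period in samples. A short sketch, assuming the signal from the question with endpoint=False, that passes the actual sampling rate so the period comes out in the units of t:

```python
import numpy as np
from scipy import signal

t = np.linspace(0, 10*np.pi, 100, endpoint=False)
x = np.sin(t)

dt = t[1] - t[0]                                 # spacing of the time axis
freq, pdensity = signal.periodogram(x, fs=1/dt)  # frequency axis in cycles per unit of t
fmax = freq[np.argmax(pdensity)]
period = 1 / fmax                                # period in units of t
print(period)
```

Equivalently, the period of 20 samples found in the question, multiplied by dt, gives the same value.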
Edit:
To measure the actual periodicity of np.sin, we can just use the "angle axis" that we passed to np.sin instead of the time axis when generating the frequency axis:
freqs = np.fft.rfftfreq(len(x), 2*np.pi*f*(t[1]-t[0]))
rad_period = 1/freqs[mags.argmax()] # 6.283185307179586
Though that seems pointless, right? We pass in 2*np.pi and we get 2*np.pi. However, we can do the same with any regular time axis, without presupposing pi at any point:
fs = 10
t = np.arange(1000)/fs
x = np.sin(t)
rad_period = 1/np.fft.rfftfreq(len(x), 1/fs)[abs(np.fft.rfft(x)).argmax()] # 6.25
Naturally, the true value now lies in between two bins. That's where interpolation comes in and the associated need to choose a suitable window function.
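A rough sketch of what such an interpolation could look like. The choices here (a Hann window and a parabola fitted through the log-magnitudes of the three bins around the maximum) are just one common option, not the only one:

```python
import numpy as np

fs = 10
t = np.arange(1000) / fs
x = np.sin(t)

# Window the signal to reduce leakage, then take the magnitude spectrum
window = np.hanning(len(x))
mags = np.abs(np.fft.rfft(x * window))

k = mags.argmax()  # coarse estimate: index of the strongest bin

# Fit a parabola through the log-magnitudes of bins k-1, k, k+1
a, b, c = np.log(mags[k-1:k+2])
offset = 0.5 * (a - c) / (a - 2*b + c)  # vertex position relative to bin k, in bins

peak_freq = (k + offset) * fs / len(x)
rad_period = 1 / peak_freq
print(rad_period)
```

The estimate should land noticeably closer to 2*np.pi than the 6.25 obtained from the raw bin, though it is still an approximation whose bias depends on the window.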
I would like to filter out unwanted frequencies and keep only a 60 Hz signal.
Here is what I have done so far:
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
#
time = np.linspace(0,1,1000)
in_sig = np.cos(54*np.pi*time) + np.cos(60*np.pi*time) + np.sin(66*np.pi*time);
high_freq = 62;
low_freq = 58;
freqs = fftfreq(len(in_sig), d=time[1]-time[0])
filt_sig = rfft(in_sig)
cut_filt_sig = filt_sig.copy()
cut_filt_sig[(freqs<low_freq)] = 0
cut_filt_sig[(freqs>high_freq)] = 0
cut_in_sig = irfft(cut_filt_sig)
from pylab import *
figure(figsize=(10, 6))
subplot(221);plot(time,in_sig); title('Input signal');
subplot(222);plot(freqs,filt_sig);xlim(0,100);title('FFT of the input signal');
subplot(223);plot(time,cut_in_sig); title('Filtered signal');
xlabel('Time (s)')
subplot(224);plot(freqs,cut_filt_sig);xlim(0,100); title('FFT of the filtered signal');
xlabel('Freq. (Hz)')
show()
Plotted results
As far as I can see, the filtered signal has lower amplitudes at the edges. I assume this could be due to the rectangular window being applied. Which window would you recommend to improve the output?
The issue likely comes from numpy's linspace(). By default it includes the endpoint stop, so time is 0, 1/999, 2/999, ..., 1. The FFT, by contrast, treats a signal of length N as a periodic signal sampled at 0, T/N, ..., T(N-1)/N, thus avoiding the redundancy of the endpoint.
The computed DFT therefore uses a frame of length T=1000/999. Hence the frequencies of the DFT are k*999/1000, not k. Since the length of the frame is not a multiple of the period of the signal (1/6s), a problem named spectral leakage occurs.
To avoid the spectral leakage, the length of the frame can be shortened to a multiple of the period by removing the endpoint:
time = np.linspace(0,1,1000,endpoint=False)
It returns time as 0, 1/1000, ..., 999/1000, handled by the DFT as a frame of length 1, which is a multiple of the period of the input signal (1/6s).
If the length of the frame is not a multiple of the period of the signal, the input signal can be windowed so as to partly mitigate the effect of the discontinuity at the edge of the frame, but spurious frequencies still exist.
Finally, the actual frequencies can be properly computed by estimating the frequency of a peak as its mean frequency with respect to power density. See my answer to
Why are frequency values rounded in signal using FFT?
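As an illustration of the endpoint fix, here is a small band-pass sketch. It makes two assumptions that differ from the question's code: the components are written with an explicit 2*pi so that they sit at 54, 60 and 66 Hz, and numpy's rfft/rfftfreq are used so that the spectrum and the frequency axis line up bin for bin.

```python
import numpy as np

fs = 1000
time = np.linspace(0, 1, fs, endpoint=False)  # frame of exactly 1 s
in_sig = (np.cos(2*np.pi*54*time)
          + np.cos(2*np.pi*60*time)
          + np.sin(2*np.pi*66*time))

spectrum = np.fft.rfft(in_sig)
freqs = np.fft.rfftfreq(len(in_sig), d=1/fs)  # 0, 1, ..., 500 Hz

# Keep only the bins between 58 and 62 Hz, zero everything else
keep = (freqs >= 58) & (freqs <= 62)
filtered = np.fft.irfft(np.where(keep, spectrum, 0), n=len(in_sig))

# With no leakage, the result is the 60 Hz component with constant amplitude
max_error = np.max(np.abs(filtered - np.cos(2*np.pi*60*time)))
print(max_error)
```

Because every component completes a whole number of cycles in the frame, each one falls on a single bin and the filtered signal shows no amplitude drop at the edges.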
So, I am probably missing something obvious, but I have searched through lots of tutorials and documentation and can't seem to find a straight answer. How do you find the frequency axis of a function that you performed an fft on in Python(specifically the fft in the scipy library)?
I am trying to get a raw EMG signal, perform a bandpass filter on it, and then perform an fft to see the remaining frequency components. However, I am not sure how to find an accurate x component list. The specific signal I am working on currently was sampled at 1000 Hz and has 5378 samples.
Is it just creating a linear x starting from 0 and going to the length of the fft'd data? I see a lot of people creating a linspace from 0 to sample points times the sample spacing. But what would be my sample spacing in this case? Would it just be samples/sampling rate? Or is it something else completely?
Here is an example.
First create a sine wave with a pre-determined sampling interval. We will combine two sine waves with frequencies 10 and 20. Remember that high frequencies might be aliased if the time interval is large.
#Import the necessary packages
from scipy import fftpack
import matplotlib.pyplot as plt
import numpy as np
# signal frequencies in hertz: 10 Hz and 20 Hz
freq_sampling1 = 10
freq_sampling2 = 20
amplitude1 = 2 # amplitude of first sine wave
amplitude2 = 4 # amplitude of second sine wave
time = np.linspace(0, 6, 500, endpoint=False) # 500 samples from 0 to 6 (endpoint excluded), so the time interval equals 6/500
y = amplitude1*np.sin(2*np.pi*freq_sampling1*time) + amplitude2*np.sin(2*np.pi*freq_sampling2*time)
plt.figure(figsize=(10, 4))
plt.plot(time,y, 'k', lw=0.8)
plt.xlim(0,6)
plt.show()
Notice in the figure that two sine waves are superimposed. One with freq. 10 and amplitude 2 and the other with freq. 20 and amplitude 4.
# apply fft function
yf = fftpack.fft(y, time.size)
amp = np.abs(yf) # get amplitude spectrum
freq = np.linspace(0.0, 1.0/(2.0*(6/500)), time.size//2, endpoint=False) # get freq axis (bins are 1/6 Hz apart)
# plot the amp spectrum
plt.figure(figsize=(10,6))
plt.plot(freq, (2/amp.size)*amp[0:amp.size//2])
plt.show()
Notice that in the amplitude spectrum the two frequencies are recovered, while the amplitude is zero at other frequencies. The amplitude values are also 2 and 4 respectively.
You can instead use fftpack.fftfreq to obtain the frequency axis, as suggested by tom10.
The code then changes to
yf = fftpack.fft(y, time.size)
amp = np.abs(yf) # get amplitude spectrum
freq = fftpack.fftfreq(time.size, 6/500)
plt.figure(figsize=(10,6))
plt.plot(freq[0:freq.size//2], (2/amp.size)*amp[0:amp.size//2])
plt.show()
We are only plotting the positive part of the amplitude spectrum [0:amp.size//2]
Once you feed your window of samples into the FFT call, it will return an array of complex values ... the frequency separation between adjacent elements of the returned array is determined by
freq_resolution = sampling_freq / number_of_samples
the 0th element is your DC offset, which will be zero if your input curve is balanced around the zero crossing point ... so in your case
freq_resolution = 1000 / 5378
In general, for efficiency, you will want to feed a power-of-2 number of samples into your FFT call; this is important if you are, say, sliding your window of samples forward in time and repeatedly calling FFT on each window
To calculate the magnitude of a frequency in a given freq_bin (an element of the returned complex array)
X = A + jB
A on real axis
B on imag axis
for the above formula it's
mag = 2.0 * math.sqrt(A*A + B*B) / number_of_samples
phase = math.atan2(B, A)
you iterate across each element only up to the Nyquist limit, which is why you double the magnitude above
So yes, it's a linear increment with the same frequency spacing between each freq_bin
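For the numbers in the question (1000 Hz, 5378 samples), the axis and the magnitude formula can be sketched with NumPy as follows. The test tone on bin 269 with amplitude 3 is invented purely to check the scaling; the real EMG data would take its place.

```python
import numpy as np

fs = 1000.0  # sampling rate in Hz
n = 5378     # number of samples

# Frequency axis for the one-sided FFT of a real signal: 0 ... fs/2
freqs = np.fft.rfftfreq(n, d=1/fs)
freq_resolution = fs / n  # spacing between neighbouring bins

# Made-up test tone that falls exactly on bin 269
k = 269
x = 3.0 * np.sin(2*np.pi * k * np.arange(n) / n)

X = np.fft.rfft(x)
mag = 2.0 * np.abs(X) / n  # doubling is valid for all bins except DC and Nyquist
phase = np.angle(X)        # same as atan2(B, A) for X = A + jB

print(freq_resolution, freqs[k], mag[k])
```

So the sample spacing to pass as d is 1/sampling_rate, and the resulting axis is linear with step sampling_rate/number_of_samples.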
The figure I plot via the code below is just a peak around ZERO, no matter how I change the data. My data is just one column which records every timing point of some kind of signal. Is the time_step a value I should define according to the interval between two neighbouring points in my data?
data=np.loadtxt("timesequence",delimiter=",",usecols=(0,),unpack=True)
ps = np.abs(np.fft.fft(data))**2
time_step = 1
freqs = np.fft.fftfreq(data.size, time_step)
idx = np.argsort(freqs)
pl.plot(freqs[idx], ps[idx])
pl.show()
As others have hinted, your signal must have a large nonzero mean. A peak at 0 (DC) indicates the average value of your signal. This follows from the Fourier transform itself: the zero-frequency term multiplies the signal by cos(0) = 1 and sums it, so it is a measure of the average value of the signal. The other Fourier transform components are cosine waves of varying amplitude, which show the frequency content at those frequencies.
Note that zero-mean signals will not have a large DC component. If you do not want a large DC component, compute the mean of your signal and subtract it from every sample. Regardless of whether your data is 0,...,999 or 1,...,1000, or even 1000,...,2000, you will get a peak at 0 Hz. The only difference will be the magnitude of the peak, since it measures the average value.
import numpy as np
from matplotlib.pyplot import figure, plot, show
data1 = np.arange(1000)
data2 = np.arange(1000)+1000
dataTransformed3 = data1 - np.mean(data1)
data4 = np.zeros(1000)
data4[::10] = 1 #simulate a photon counter where a 1 indicates a photon came in at time indexed by array.
# we could assume that the sample rate was 10 Hz for example
ps1 = np.abs(np.fft.fft(data1))**2
ps2 = np.abs(np.fft.fft(data2))**2
ps3 = np.abs(np.fft.fft(dataTransformed3))**2
figure()
plot(ps1) #shows the peak at 0 Hz
figure()
plot(ps2) #shows the peak at 0 Hz
figure()
plot(ps3) #the peak at 0 Hz is gone because we removed the mean value; since
#the data is a ramp, the largest remaining component is the lowest nonzero frequency bin.
#notice the order of magnitude difference in the two plots.
show()
Here is a bare-bones example that shows input and output with a peak as you'd expect it:
import numpy as np
import matplotlib.pyplot as plt
time = np.linspace(0,10,2000,endpoint=False)
signal = np.cos(5*np.pi*time) # 2.5 Hz cosine
# numpy's rfft and rfftfreq are used together so that bins and frequencies line up
# (scipy.fftpack.rfft packs real and imaginary parts differently from what fftfreq assumes)
W = np.fft.rfftfreq(signal.size, d=time[1]-time[0])
f_signal = np.fft.rfft(signal)
plt.subplot(121)
plt.plot(time,signal)
plt.subplot(122)
plt.plot(W,np.abs(f_signal))
plt.xlim(0,10)
plt.show()
I use rfft since, more than likely, your input signal is from a physical data source and as such is real.
If you make your data all positive:
ps = np.abs(np.fft.fft(data))**2
time_step = 1
then most probably you will create a large 'DC', or 0 Hz component. So if your actual data has little amplitude, compared to that component, it will disappear from the plot, by the autoscaling feature.
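A small sketch of that effect and of the usual remedy, subtracting the mean before the transform. The data here is synthetic (an offset of 5 plus a weak 0.1 cycles-per-sample sine), chosen only to make the point; it is not the asker's data.

```python
import numpy as np

n = np.arange(1000)
# Large offset plus a weak oscillation at 0.1 cycles per sample (assumed test data)
data = 5.0 + 0.1 * np.sin(2*np.pi*0.1*n)

time_step = 1
freqs = np.fft.rfftfreq(data.size, d=time_step)

ps_raw = np.abs(np.fft.rfft(data))**2                     # dominated by the 0 Hz bin
ps_centered = np.abs(np.fft.rfft(data - data.mean()))**2  # mean removed first

peak_raw = freqs[ps_raw.argmax()]
peak_centered = freqs[ps_centered.argmax()]
print(peak_raw, peak_centered)
```

Plotting ps_centered instead of ps_raw (or using a logarithmic y-axis) makes the actual signal content visible.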