I'm trying to write a convolution code entirely in the spectral domain. I'm taking a spike series in time (example below only has one spike for simplicity) of n samples and calculating the Fourier series with numpy.fft.fft. I create a 'Ricker wavelet' of m samples (m << n) and calculate its Fourier series with numpy.fft.fft, but specifying that its output Fourier series be n samples long. Both the spike series and wavelet have the same sampling interval. The resulting convolved series is shifted (peak of wavelet is shifted along the time axis with respect to the spike). This shift seems to depend on the size, m, of the wavelet.
I thought it had something to do with the parameters of numpy.fft.fft(a, n=None, axis=-1, norm=None), particularly the 'axis' parameter. But, I do not understand the documentation for this parameter at all.
Can anyone help me understand why I'm getting this shift (if it isn't clear, let me be explicit and say that the peak of the wavelet in the convolved series must be at the same time sample as the spike in the input spike series)?
My code follows:
################################################################################
#
# import libraries
#
import math
import numpy as np
import scipy
import matplotlib.pyplot as plt
import os
from matplotlib.ticker import MultipleLocator
from random import random
# Define lists
#
Time=[]; Ricker=[]; freq=25; rickersize=51; timeiter=0.002; serieslength=501; TIMElong=[]; Reflectivity=[];
Series=[]; IMPEDANCE=[]; CONVOLUTION=[];
#
# Create ricker wavelet and its time sequence
#
for i in range(0,rickersize):
    time=(float(i-rickersize//2)*timeiter)
    ricker=(1-2*math.pi*math.pi*freq*freq*time*time)*math.exp(-1*math.pi*math.pi*freq*freq*time*time)
    Time.append(time)
    Ricker.append(ricker)
#
# Do various FFT operations on the Ricker wavelet:
# Normal FFT, FFT of longer Ricker, Amplitude of the FFTs, their inverse FFTs and their frequency sequence
#
FFT=np.fft.fft(Ricker); FFTlong=np.fft.fft(Ricker,n=serieslength,axis=0,norm=None);
AMP=abs(FFT); AMPlong=abs(FFTlong);
RICKER=np.fft.ifft(FFT); RICKERlong=np.fft.ifft(FFTlong);
FREQ=np.fft.fftfreq(len(Ricker),d=timeiter); FREQlong=np.fft.fftfreq(len(RICKERlong),d=timeiter)
PHASE=np.angle(FFT); PHASElong=np.angle(FFTlong);
#
# Create a single spike in the otherwise empty (all-zero) series of length 'serieslength' (=len(RICKERlong))
# this spike mimics a very simple seismic reflectivity series in time
#
for i in range(0,serieslength):
    time=(float(i)*timeiter)
    TIMElong.append(time)
    if i==int(serieslength/2):
        Series.append(1)
    else:
        Series.append(0)
#
# Do various FFT operations on the spike series
# Normal FFT, Amplitude of the FFT, its inverse FFT and frequency sequence
#
FFTSeries=np.fft.fft(Series)
AMPSeries=abs(FFTSeries)
SERIES=np.fft.ifft(FFTSeries)
FREQSeries=np.fft.fftfreq(len(Series),d=timeiter)
#
# Do convolution of the spike series with the (long) Ricker wavelet in the frequency domain and see result via inverse FFT
#
FFTConvolution=[FFTlong[i]*FFTSeries[i] for i in range(len(Series))]
CON=np.fft.ifft(FFTConvolution)
CONVOLUTION=[CON[i].real for i in range(len(Series))]
#
# plotting routines
#
fig,axs = plt.subplots(nrows=1,ncols=3, figsize=(14,8))
axs[0].barh(TIMElong,Series,height=0.005, color='black')
axs[1].plot(Ricker,Time,color='black', linestyle='solid',linewidth=1)
axs[2].plot(CONVOLUTION,TIMElong,color='black', linestyle='solid',linewidth=1)
#
axs[0].set_aspect(aspect=8); axs[0].set_title('Reflectivity',fontsize=12); axs[0].yaxis.grid(); axs[0].xaxis.grid();
axs[0].set_xlim(-2,2); axs[0].set_ylim(min(TIMElong),max(TIMElong)); axs[0].invert_yaxis(); axs[0].tick_params(axis='both',which='major',labelsize=12);
#
axs[1].set_aspect(aspect=6.2); axs[1].set_title('Ricker',fontsize=12); axs[1].yaxis.grid(); axs[1].xaxis.grid();
axs[1].set_xlim(-1.0,1.02); axs[1].set_ylim(min(Time),max(Time)); axs[1].invert_yaxis(); axs[1].tick_params(axis='both',which='major',labelsize=12);
#
axs[2].set_aspect(aspect=8); axs[2].set_title('Convolution',fontsize=12); axs[2].yaxis.grid(); axs[2].xaxis.grid();
axs[2].set_xlim(-2,2); axs[2].set_ylim(min(TIMElong),max(TIMElong)); axs[2].invert_yaxis(); axs[2].tick_params(axis='both',which='major',labelsize=12);
#
fig.tight_layout()
fig.show()
####
It turns out that, as far as I can understand, my question has nothing to do with the peculiarities of Python and NumPy. The problem is 'circular convolution'. That is, the linear convolution of two sequences is as long as the sum of their lengths minus one, and this extra length has to be accounted for (by zero-padding) before the fft and ifft. I wasn't doing this. I still don't know exactly how to handle it, but it should be simpler now that I know what the problem is.
Apologies to those who tried to answer my malformed question.
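A minimal sketch of one way to handle it (assuming the Series and Ricker lists from the code above): zero-pad both sequences to the full linear-convolution length before transforming, then trim the wavelet half-width from the front so the peak lines up with the spike.
n = len(Series)
m = len(Ricker)
L = n + m - 1                              # length of the full linear convolution

SPEC = np.fft.fft(Series, n=L)*np.fft.fft(Ricker, n=L)
full = np.fft.ifft(SPEC).real              # equivalent to np.convolve(Series, Ricker)

# The Ricker peak sits at sample m//2, so drop m//2 samples from the front
# to get the 'same'-aligned result with the peak at the spike position.
aligned = full[m//2:m//2 + n]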
I have a dataset with a signal and a 1/distance (Angstrom^-1) column.
This is the dataset (fourier.csv): https://pastebin.com/ucFekzc6
After applying these steps:
import pandas as pd
import numpy as np
from numpy.fft import fft
df = pd.read_csv (r'fourier.csv')
df.plot(x ='1/distance', y ='signal', kind = 'line')
I generated this plot:
To generate the Fast Fourier Transformation data, I used the numpy library for its fft function and I applied it like this:
df['signal_fft'] = fft(df['signal'])
df.plot(x ='1/distance', y ='signal_fft', kind = 'line')
Now the plot looks like this, with the FFT data plotted instead of the initial "signal" data:
What I hoped to generate is something like this (This signal is extremely similar to mine, yet yields a vastly different FFT picture):
Theory Signal before windowing:
Theory FFT:
As you can see, my initial plot looks somewhat similar to graphic (a), but my FFT plot doesn't look anywhere near as clear as graphic (b). I'm still using the 1/distance data for both horizontal axes, but I don't see anything wrong with it, only that it should be interpreted as distance (Angstrom) instead of 1/distance (1/Angstrom) in the FFT plot.
How should I apply FFT in order to get a result that resembles the theoretical FFT curve?
Here's another slide that shows a similar initial signal to mine and a yet again vastly different FFT:
Addendum: I have been asked to provide some additional information on this problem, so I hope this helps.
The origin of the dataset that I have linked is an XAS (X-Ray Absorption Spectroscopy) spectrum of iron oxide. Such an experimentally obtained spectrum looks similar to the one shown in the "Schematic of XAFS data processing" on the top left, i.e. absorbance [a.u.] plotted against the photon energy [eV]. Firstly I processed the spectrum (pre-edge baseline correction + post-edge normalization). Then I converted the data on the x-axis from energy E to wavenumber k (thus dimension 1/Angstrom) and cut off the signal at the L-edge jump, leaving only the signal in the post-edge EXAFS region, referred to as fine structure function χ(k). The mentioned dataset includes k^2 weighted χ(k) (to emphasize oscillations at large k). All of this is not entirely relevant as the only thing I want to do now is a Fourier transformation on this signal ( k^2 χ(k) vs. k). In theory, as we are dealing with photoelectrons and (back)scattering phenomena, the EXAFS region of the XAS spectrum can be approximated using a superposition of many sinusoidal waves such as described in this equation with f(k) being the amplitude and δ(k) the phase shift of the scattered wave.
The aim is to gain an understanding of the chemical environment and the coordination spheres around the absorbing atom. The goal of the Fourier transform is to obtain some sort of signal in dependence of the "radius" R [Angstrom], which could later on be correlated to e.g. an oxygen being in ~2 Angstrom distance to the Mn atom (see "Schematic of XAFS data processing" on the right).
I only want to be able to reproduce the theoretically expected output after the FFT. My main concern is to get rid of the weird output signal and produce something that in some way resembles a curve with somewhat distinct local maxima (as shown in the 4th picture).
I don't have a 100% solution for you, but here's part of the problem.
The fft function you're using assumes that your X values are equally spaced. I checked this assumption by taking the difference between each 1/distance value, and graphing it:
df['1/distance'].diff().plot()
(Y is the difference, X is the index in the dataframe.)
This is supposed to be a constant line.
In order to fix this, one solution is to resample the signal through linear interpolation so that the timestep is constant.
from scipy import interpolate
rs_df = df.drop_duplicates().copy() # Needed because 0 is present twice in dataset
x = rs_df['1/distance']
y = rs_df['signal']
flinear = interpolate.interp1d(x, y, kind='linear')
xnew = np.linspace(np.min(x), np.max(x), rs_df.index.size)
ylinear = flinear(xnew)
rs_df['signal'] = ylinear
rs_df['1/distance'] = xnew
df.plot(x ='1/distance', y ='signal', kind = 'line')
rs_df.plot(x ='1/distance', y ='signal', kind = 'line')
The new line looks visually identical, but has a constant timestep.
I still don't get your intended result from the FFT, so this is only a partial solution.
MCVE
We import required dependencies:
import numpy as np
import pandas as pd
from scipy import signal, interpolate, fft
import matplotlib.pyplot as plt
And we load your dataset:
raw = pd.read_csv("https://pastebin.com/raw/ucFekzc6", sep="\t",
names=["k", "wchi"], header=0)
We clean the dataset a bit, as it contains duplicates and a problematic point with a null wave number (i.e. infinite distance), and we ensure a zero-mean signal:
raw = raw.drop_duplicates()
raw = raw.iloc[1:, :]
raw["wchi"] = raw["wchi"] - raw["wchi"].mean()
The cleaned signal looks like this:
As noticed by @NickODell, the signal is not equally sampled, which is a problem if you aim to perform FFT signal processing.
We can resample your signal to have equally spaced sampling:
N = 65536
k = np.linspace(raw["k"].min(), raw["k"].max(), N)
interpolant = interpolate.interp1d(raw["k"], raw["wchi"], kind="linear")
g = interpolant(k)
Notice that the FFT returns the spectrum with the null-frequency component at the edges of the array rather than in the centre (that's why your FFT signal does not look the way it is usually presented in books). This can be corrected with the classic fftshift method, or by ad hoc indexing as done below.
R = 2*np.pi*fft.fftfreq(N, np.diff(k)[0])[:N//2]
G = (1/N)*fft.fft(g)[0:N//2]
Mind the 2π factor which is involved in the units scaling of your transformation.
You also mentioned windowing (at least in a picture), which is not applied anywhere here. This kind of filtering can help a lot when performing signal processing, as it suppresses edge artifacts and unwanted leakage. I leave it up to you.
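As a purely illustrative sketch (not part of the original processing), a Hann window from NumPy could be applied to the resampled signal g before the transform:
window = np.hanning(N)
G_windowed = (1/N)*fft.fft(g*window)[:N//2]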
Least Square Spectral Analysis
An alternative way to process your signal has been available since the advent of modern linear algebra. There is a way to estimate the periodogram of an irregularly sampled signal by a method called Least Square Spectral Analysis.
You are looking for the square root of the periodogram of your signal, and scipy implements an easy way to compute it via the Lomb-Scargle method.
To do so, we simply create a frequency vector (in this case the desired output distances) and perform the regression of your signal against each of those distances:
# Output distances R (Angstrom) at which to evaluate the periodogram. The range
# below is an assumption, since the dataset itself only provides "k" and "wchi".
Rhat = np.linspace(0.01, 8.0, 5000)
Ghat = signal.lombscargle(raw["k"], raw["wchi"], freqs=Rhat, normalize=True)
Graphically it leads to:
Comparison
If we compare both methodologies, we can confirm that the major peaks definitely match.
LSSA gives a smoother curve, but do not assume it to be more accurate, as it is a statistical smoothing of the signal. Anyway, it fits the bill for your requirement:
I only want to be able to reproduce the theoretically expected output
after the FFT. My main concern is to get rid of the weird output
signal and produce something that in some way resembles a curve with
somewhat distinct local maxima (as shown in the 4th picture).
Conclusions
I think you have enough information to process your signal, either by resampling and using the FFT or by using LSSA. Both methods have advantages and drawbacks.
Of course, this needs to be validated against well-known cases. Why not reproduce the figures you posted from the data of the paper you are working with, to check that you can reconstruct them?
You also need to dig into the signal conditioning (resampling, windowing, filtering) before performing the post-processing.
I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents the samples of a recorded sound (x = milliseconds).
sin(pi*x/2) => 250 Hz
How can we go from the signal (list of points) to obtaining the frequencies from this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos. I am yet to find an answer. Please use simple words.
(I am thankful for every answer.)
What you're looking for is known as the Fourier Transform
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is in essence a mathematical operation that, when applied to a signal, gives you an idea of how present each frequency is in the time series. In order to get some intuition behind this, it might be helpful to look at the mathematical definition of the DFT:
X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}
where k is swept all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that, this definition resembles somewhat that of the correlation of two functions, in this case x(n) and the negative exponential function. While this may seem a little bit abstract, by using Euler's formula and by playing a bit around with the definition, the DFT can be expressed as the correlation with both a sine wave and a cosine wave, which will account for the imaginary and the real parts of the DFT.
So, keeping in mind that this is in essence computing a correlation: whenever a sine or cosine from the decomposition of the complex exponential matches a component of x(n), there will be a peak in X(k), meaning that such a frequency is present in the signal.
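To make the correlation view concrete, here is a small sketch (not from the original answer) that computes a single DFT coefficient directly as a correlation with a complex exponential and checks it against numpy:
import numpy as np

x = np.sin(2*np.pi*5*np.arange(64)/64)            # 5 cycles over 64 samples
k = 5
n = np.arange(len(x))
X_k = np.sum(x*np.exp(-2j*np.pi*k*n/len(x)))      # correlation with e^{-j2πkn/N}
print(np.allclose(X_k, np.fft.fft(x)[k]))         # True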
How can we do the same with numpy?
Having given a very brief theoretical background, let's see how this can be implemented in Python by considering the following signal:
import numpy as np
import matplotlib.pyplot as plt
Fs = 150.0                # sampling rate
Ts = 1.0/Fs               # sampling interval
t = np.arange(0, 1, Ts)   # time vector
ff = 50                   # frequency of the signal
y = np.sin(2*np.pi*ff*t)
plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now, the DFT can be computed by using np.fft.fft, which, as mentioned, tells you the contribution of each frequency to the signal in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see that we get a peak at the frequency of 50 Hz, which in mathematical terms is a delta function centred at the fundamental frequency of 50 Hz. This can be checked in any table of Fourier transform pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
I am trying to perform Fourier transform using numpy's fft as follows:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0,1, 128)
x = np.cos(2*np.pi*t)
s_fft = np.fft.fft(x)
s_fft_freq = np.fft.fftshift(np.fft.fftfreq(t.shape[-1], t[1]-t[0]))
plt.plot(s_fft_freq, np.abs(s_fft))
The result I get is
which is wrong, as I know the FT should peak at f = 1, as the frequency of the cos is 1.
What am I doing wrong?
You are only applying fftshift to the x-axis labels, not the actual FFT magnitudes - you just need to apply s_fft = np.fft.fftshift(np.fft.fft(x)) too.
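Putting that suggestion into the question's code, it would look like this:
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 128)
x = np.cos(2*np.pi*t)
s_fft = np.fft.fftshift(np.fft.fft(x))
s_fft_freq = np.fft.fftshift(np.fft.fftfreq(t.shape[-1], t[1]-t[0]))
plt.plot(s_fft_freq, np.abs(s_fft))
plt.show()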
There are 2 or 3 things you have gotten wrong:
The FFT will peak at two positions for a pure real-valued frequency. These are the plus and minus frequencies. The only way to get a single peak in the Fourier domain is by having a complex-valued signal (or the trivial DC component); see the short demonstration after this list.
(if with f you mean frequency index) When using the DFT, the number of samples determines how many frequency components you have. At the highest frequency index, you are always close to the per-sample oscillation: (-1)^t
(if with f, you mean amplitude) There are many definitions of the DFT, affecting both the forward and backward transform. This will affect how the values are interpreted when reading the spectrum.
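A small demonstration of the first point (an illustrative sketch, not from the answer): a real cosine shows two symmetric peaks, while a complex exponential shows only one.
import numpy as np

t = np.arange(256)/256
real_sig = np.cos(2*np.pi*8*t)
complex_sig = np.exp(2j*np.pi*8*t)

print(np.argsort(np.abs(np.fft.fft(real_sig)))[-2:])   # bins 8 and 248 (i.e. -8)
print(np.argmax(np.abs(np.fft.fft(complex_sig))))      # bin 8 only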
I am trying to use a fast Fourier transform to extract the phase shift of a single sinusoidal function. I know that on paper, if we denote the transform of our function by T, then the amplitude is sqrt(Re(T)^2 + Im(T)^2) and the phase is arctan(Im(T)/Re(T)).
However, I am finding that while I am able to accurately capture the frequency of my cosine wave, the phase is inaccurate unless I sample at an extremely high rate. For example:
import numpy as np
import pylab as pl
num_t = 100000
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
w = 2.0*np.pi*30.0
phase = np.pi/2.0
amp = np.fft.rfft(np.cos(w*t+phase))
freqs = np.fft.rfftfreq(t.shape[-1],dt)
print(np.arctan2(amp.imag, amp.real)[30])
pl.subplot(211)
pl.plot(freqs[:60],np.sqrt(amp.real**2+amp.imag**2)[:60])
pl.subplot(212)
pl.plot(freqs[:60],(np.arctan2(amp.imag,amp.real))[:60])
pl.show()
Using num=100000 points I get a phase of 1.57173880459.
Using num=10000 points I get a phase of 1.58022110476.
Using num=1000 points I get a phase of 1.6650441064.
What's going wrong? Even with 1000 points I have 33 points per cycle, which should be enough to resolve it. Is there maybe a way to increase the number of computed frequency points? Is there any way to do this with a "low" number of points?
EDIT: from further experimentation it seems that I need ~1000 points per cycle in order to accurately extract a phase. Why?!
EDIT 2: further experiments indicate that accuracy is related to number of points per cycle, rather than absolute numbers. Increasing the number of sampled points per cycle makes phase more accurate, but if both signal frequency and number of sampled points are increased by the same factor, the accuracy stays the same.
Your points are not distributed equally over the interval: you have the endpoint doubled, since 0 is effectively the same point as 1. This matters less the more points you take, obviously, but it still introduces some error. You can avoid it entirely, as linspace has a flag for this. It also has a flag to return dt directly along with the array.
Do
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
instead of
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
then it works :)
The phase value in the result bin of an unrotated FFT is only correct if the input signal is exactly integer periodic within the FFT length. Your test signal is not, thus the FFT measures something partially related to the phase difference of the signal discontinuity between end-points of the test sinusoid. A higher sample rate will create a slightly different last end-point from the sinusoid, and thus a possibly smaller discontinuity.
If you want to decrease this FFT phase-measurement error, create your test signal so that your test phase is referenced to the exact centre (sample N/2) of the test vector (not the 1st sample), and then do an fftshift operation (rotate by N/2) so that there will be no signal discontinuity between the 1st and last points of your resulting FFT input vector of length N.
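A rough sketch of that idea (the test values are assumptions, not from the answer): reference the phase of the test cosine to the centre sample and rotate the vector by N/2 before the FFT, then compare with the naive version.
import numpy as np

N = 1000
t, dt = np.linspace(0, 1, N, endpoint=False, retstep=True)
f = 30.5                                            # deliberately not integer-periodic
phase = np.pi/2.0

naive = np.cos(2*np.pi*f*t + phase)                 # phase referenced to sample 0
centred = np.cos(2*np.pi*f*(t - t[N//2]) + phase)   # phase referenced to sample N/2

k = int(round(f))                                   # nearest frequency bin
print(np.angle(np.fft.rfft(naive)[k]))                      # far from pi/2
print(np.angle(np.fft.rfft(np.fft.fftshift(centred))[k]))   # close to pi/2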
This snippet of code might help:
import numpy as np
from numpy.fft import rfft, irfft

THRESHOLD = 0.01  # fraction of the peak amplitude below which modes are discarded

def reconstruct_ifft(data):
    """
    Take in a signal, find its FFT, retain the dominant modes and
    reconstruct the signal from them.

    Parameters
    ----------
    data : signal to do the fft, ifft on

    Returns
    -------
    reconstructed_signal : the reconstructed signal
    """
    N = data.size
    yf = rfft(data)
    amp_yf = np.abs(yf)                              # amplitude spectrum
    yf = yf*(amp_yf > (THRESHOLD*np.amax(amp_yf)))   # zero out the weak modes
    reconstructed_signal = irfft(yf, n=N)            # keep the original length
    return reconstructed_signal
Here 0.01 is the threshold on the FFT amplitudes that you want to retain. Making THRESHOLD larger (more than 1 does not make any sense) keeps fewer modes and causes a higher RMS error, but ensures higher frequency selectivity.
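A hypothetical usage example (the test signal below is my own, not from the snippet's author):
t = np.linspace(0, 1, 512, endpoint=False)
noisy = np.sin(2*np.pi*5*t) + 0.5*np.sin(2*np.pi*12*t) + 0.1*np.random.randn(t.size)
smooth = reconstruct_ifft(noisy)   # keeps only modes above 1% of the peak amplitude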
What I am trying to do is filter my data with an FFT. I have a noisy signal recorded at 500 Hz as a 1-D array. My high-frequency cutoff should be 20 Hz and my low-frequency cutoff 10 Hz.
What I have tried is:
fft=scipy.fft(signal)
bp=fft[:]
for i in range(len(bp)):
    if not 10<i<20:
        bp[i]=0
ibp=scipy.ifft(bp)
What I get now are complex numbers. So something must be wrong. What? How can I correct my code?
It's worth noting that the indices of bp are not necessarily in Hz; they depend on the sampling frequency of signal, and you should use scipy.fftpack.fftfreq for the conversion. Also, if your signal is real you should be using scipy.fftpack.rfft. Here is a minimal working example that filters out all frequencies less than a specified amount:
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
time = np.linspace(0,10,2000)
signal = np.cos(5*np.pi*time) + np.cos(7*np.pi*time)
W = fftfreq(signal.size, d=time[1]-time[0])
f_signal = rfft(signal)
# If our original signal time was in seconds, this is now in Hz
cut_f_signal = f_signal.copy()
cut_f_signal[(W<6)] = 0
cut_signal = irfft(cut_f_signal)
We can plot the evolution of the signal in real and Fourier space:
import pylab as plt
plt.subplot(221)
plt.plot(time,signal)
plt.subplot(222)
plt.plot(W,f_signal)
plt.xlim(0,10)
plt.subplot(223)
plt.plot(W,cut_f_signal)
plt.xlim(0,10)
plt.subplot(224)
plt.plot(time,cut_signal)
plt.show()
There's a fundamental flaw in what you are trying to do here - you're applying a rectangular window in the frequency domain which will result in a time domain signal which has been convolved with a sinc function. In other words there will be a large amount of "ringing" in the time domain signal due to the step changes you have introduced in the frequency domain. The proper way to do this kind of frequency domain filtering is to apply a suitable window function in the frequency domain. Any good introductory DSP book should cover this.
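As an illustrative sketch only (the 500 Hz rate and the 15 Hz test tone are assumptions matching the question, not the answer's code), a smooth taper in the frequency domain could replace the hard rectangular mask like this:
import numpy as np

fs = 500.0                                              # sampling rate from the question
t = np.arange(0, 4, 1/fs)
sig = np.sin(2*np.pi*15*t) + np.random.randn(t.size)    # toy tone inside the 10-20 Hz band

freqs = np.fft.rfftfreq(sig.size, d=1/fs)
spectrum = np.fft.rfft(sig)

# Gaussian-shaped pass band centred at 15 Hz instead of a hard 10-20 Hz step,
# so the frequency-domain mask has no abrupt edges.
mask = np.exp(-0.5*((freqs - 15.0)/3.0)**2)
filtered = np.fft.irfft(spectrum*mask, n=sig.size)
A properly designed band-pass filter (e.g. scipy.signal.butter applied with filtfilt) is the more conventional alternative.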