Scipy welch and MATLAB pwelch does not provide same answer - python

I'm having trouble in python with the scipy.signal method called welch (https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.welch.html), which estimates the frequency spectrum of a time signal, because it does not (at all) provide the same output as MATLAB's method called pwelch, given the same parameters (window size, overlap, etc.). Beneath is my code in each language, and the input files and output files are in the link here:
https://www.dropbox.com/s/2ch36phbbmjfhqg/inputs_outputs.zip?dl=0
The input is a 2D array with rows being time steps and each column is a signal segment. The columns in the output is the spectrum from the corresponding columns in the input.
Python:
import numpy as np
from scipy.signal import welch, get_window
input = np.genfromtxt('python_input.csv', delimiter=',')
fs = 128
window = get_window('hamming', fs*1)
ff,yy = welch(input, fs=fs, window = window, noverlap = fs/2, nfft=fs*2,
axis=0, scaling="density", detrend=False)
np.savetxt("python_spectrum.csv", 10*np.log10(yy), delimiter=",")
MATLAB:
input = csvread('matlab_input.csv');
fs = 128
win = hamming(fs);
[pxx,f] = pwelch(input ,win,[],[],fs,'psd');
csvwrite('matlab_spectrum.csv',pxx);
I suspect scipy to be the problem, since it's output does not make sense in terms of reflecting the filters I have used (4'th order butterworth bandpass from 0.3 to 35 Hz with filtfilt) beforehand - MATLAB's output, however, does:
Each methods output using imagesc in MATLAB
And some plots of the elementwise differences are here
Elementwise differences (y-axis should be 0-64!) (the third plot I have excluded the most extreme values)
I did try with a simple sinusoid and it worked well in both programming languages - so I am completely lost.

Your Python and MATLAB files are not ordered in the same way, which gives you the error. Even though the shapes of the arrays are the same, the values are ordered row-wise in Python, and column-wise in MATLAB.
You can fix this by reshaping your input array and transposing. This will give you the same ordering of the values as in MATLAB:
input = input.reshape((input.shape[1], input.shape[0])).T

Related

Have I applied the Fourier Transformation correctly to this Dataframe? [EXAFS X-Ray Absorption Dataframe]

I have a dataset with a signal and a 1/distance (Angstrom^-1) column.
This is the dataset (fourier.csv): https://pastebin.com/ucFekzc6
After applying these steps:
import pandas as pd
import numpy as np
from numpy.fft import fft
df = pd.read_csv (r'fourier.csv')
df.plot(x ='1/distance', y ='signal', kind = 'line')
I generated this plot:
To generate the Fast Fourier Transformation data, I used the numpy library for its fft function and I applied it like this:
df['signal_fft'] = fft(df['signal'])
df.plot(x ='1/distance', y ='signal_fft', kind = 'line')
Now the plot looks like this, with the FFT data plotted instead of the initial "signal" data:
What I hoped to generate is something like this (This signal is extremely similar to mine, yet yields a vastly different FFT picture):
Theory Signal before windowing:
Theory FFT:
As you can see, my initial plot looks somewhat similar to graphic (a), but my FFT plot doesn't look anywhere near as clear as graphic (b). I'm still using the 1/distance data for both horizontal axes, but I don't see anything wrong with it, only that it should be interpreted as distance (Angstrom) instead of 1/distance (1/Angstrom) in the FFT plot.
How should I apply FFT in order to get a result that resembles the theoretical FFT curve?
Here's another slide that shows a similar initial signal to mine and a yet again vastly different FFT:
Addendum: I have been asked to provide some additional information on this problem, so I hope this helps.
The origin of the dataset that I have linked is an XAS (X-Ray Absorption Spectroscopy) spectrum of iron oxide. Such an experimentally obtained spectrum looks similar to the one shown in the "Schematic of XAFS data processing" on the top left, i.e. absorbance [a.u.] plotted against the photon energy [eV]. Firstly I processed the spectrum (pre-edge baseline correction + post-edge normalization). Then I converted the data on the x-axis from energy E to wavenumber k (thus dimension 1/Angstrom) and cut off the signal at the L-edge jump, leaving only the signal in the post-edge EXAFS region, referred to as fine structure function χ(k). The mentioned dataset includes k^2 weighted χ(k) (to emphasize oscillations at large k). All of this is not entirely relevant as the only thing I want to do now is a Fourier transformation on this signal ( k^2 χ(k) vs. k). In theory, as we are dealing with photoelectrons and (back)scattering phenomena, the EXAFS region of the XAS spectrum can be approximated using a superposition of many sinusoidal waves such as described in this equation with f(k) being the amplitude and δ(k) the phase shift of the scattered wave.
The aim is to gain an understanding of the chemical environment and the coordination spheres around the absorbing atom. The goal of the Fourier transform is to obtain some sort of signal in dependence of the "radius" R [Angstrom], which could later on be correlated to e.g. an oxygen being in ~2 Angstrom distance to the Mn atom (see "Schematic of XAFS data processing" on the right).
I only want to be able to reproduce the theoretically expected output after the FFT. My main concern is to get rid of the weird output signal and produce something that in some way resembles a curve with somewhat distinct local maxima (as shown in the 4th picture).
I don't have a 100% solution for you, but here's part of the problem.
The fft function you're using assumes that your X values are equally spaced. I checked this assumption by taking the difference between each 1/distance value, and graphing it:
df['1/distance'].diff().plot()
(Y is the difference, X is the index in the dataframe.)
This is supposed to be a constant line.
In order to fix this, one solution is to resample the signal through linear interpolation so that the timestep is constant.
from scipy import interpolate
rs_df = df.drop_duplicates().copy() # Needed because 0 is present twice in dataset
x = rs_df['1/distance']
y = rs_df['signal']
flinear = interpolate.interp1d(x, y, kind='linear')
xnew = np.linspace(np.min(x), np.max(x), rs_df.index.size)
ylinear = flinear(xnew)
rs_df['signal'] = ylinear
rs_df['1/distance'] = xnew
df.plot(x ='1/distance', y ='signal', kind = 'line')
rs_df.plot(x ='1/distance', y ='signal', kind = 'line')
The new line looks visually identical, but has a constant timestep.
I still don't get your intended result from the FFT, so this is only a partial solution.
MCVE
We import required dependencies:
import numpy as np
import pandas as pd
from scipy import signal
import matplotlib.pyplot as plt
And we load your dataset:
raw = pd.read_csv("https://pastebin.com/raw/ucFekzc6", sep="\t",
names=["k", "wchi"], header=0)
We clean the dataset a bit as it contains duplicates and a problematic point with null wave number (or infinite distance) and ensure a zero mean signal:
raw = raw.drop_duplicates()
raw = raw.iloc[1:, :]
raw["wchi"] = raw["wchi"] - raw["wchi"].mean()
The signal is about:
As noticed by #NickODell, signal is not equally sampled which is a problem if you aim to perform FFT signal processing.
We can resample your signal to have equally spaced sampling:
N = 65536
k = np.linspace(raw["k"].min(), raw["k"].max(), N)
interpolant = interpolate.interp1d(raw["k"], raw["wchi"], kind="linear")
g = interpolant(k)
Notice for performance concerns FFT does split the signal with the null frequency component at the borders (that's why your FFT signal does not look as it is usually presented in books). This indeed can be corrected by using classic fftshift method or performing ad hoc indexing.
R = 2*np.pi*fft.fftfreq(N, np.diff(k)[0])[:N//2]
G = (1/N)*fft.fft(g)[0:N//2]
Mind the 2π factor which is involved in the units scaling of your transformation.
You also have mentioned a windowing (at least in a picture) that is not referenced anywhere. This kind of filtering may help a lot when performing signal processing as it filter out artifacts and unwanted noise. I leave it up to you.
Least Square Spectral Analysis
An alternative to process your signal is available since the advent of modern Linear Algebra. There is a way to estimate the periodogram of an irregular sampled signal by a method called Least Square Spectral Analysis.
You are looking for the square root of the periodogram of your signal and scipy does implement an easy way to compute it by the Lomb-Scargle method.
To do so, we simply create a frequency vector (in this case they are desired output distances) and perform the regression for each of those distances w.r.t. your signal:
Rhat = np.linspace(raw["R"].min(), raw["R"].max()*2, 5000)
Ghat = signal.lombscargle(raw["k"], raw["wchi"], freqs=Rhat, normalize=True)
Graphically it leads to:
Comparison
If we compare both methodology we can confirm that the major peaks definitely match.
LSSA gives a smoother curve but do not assume it to be more accurate as this is statistical smooth of an interpolated curve. Anyway it fit the bill for your requirement:
I only want to be able to reproduce the theoretically expected output
after the FFT. My main concern is to get rid of the weird output
signal and produce something that in some way resembles a curve with
somewhat distinct local maxima (as shown in the 4th picture).
Conclusions
I think you have enough information to process your signal either by resampling and using FFT or by using LSSA. Both method has advantages and drawbacks.
Of course this needs to be validated with well know cases. Why not to reproduce with the data of the experience of the paper you are working on to check out you can reconstruct figures you posted.
You also need to dig in the signal conditioning before performing post processing (resampling, windowing, filtering).

Vectorization of numpy matrix that contains pdf of multiple gaussians and multiple samples

My problem is this: I have GMM model with K multi-variate gaussians, and also I have N samples.
I want to create a N*K numpy matrix, which in it's [i,k] cell there is the pdf function of the k'th gaussian on the i'th sample, i.e. in this cell there is
In short, I'm intrested in the following matrix:
pdf matrix
This what I have now (I'm working with python):
Q = np.array([scipy.stats.multivariate_normal(mu_t[k], cov_t[k]).pdf(X) for k in range(self.K)]).T
X in the code is a matrix whose lines are my samples.
It's works fine on small toy dataset from small dimension, but the dataset I'm working with is 10,000 28*28 pictures, and on it this line run extremely slowly...
I want to find a solution that doesn't envolve loops but only vector\matrix operation (i.e. vectorization). The scipy 'multivariate_normal' function cannot parameters of more than 1 gaussians, as far as I understand it (but it's 'pdf' function can calculates on multiple samples at once).
Does someone have an Idea?
I am afraid, that the main speed killer in your problem is the inversion and deteminant calculation for the cov_t matrices. If you somehow managed to precalculate these, you could enroll the calculation and use np.add.outer to get all combinations of x_i - mu_k and then use array comprehension to calculate the probabilities with the full formula of the normal distribution function.
Try
S = np.add.outer(X,-mu_t)
cov_t_inv = ??
cov_t_inv_det = ??
Q = 1/(2*np.pi*cov_t_inv_det)**0.5 * np.exp(-0.5*np.einsum('ikr,krs,kis->ik',S,cov_t_inv,S))
Where you insert precalculated arrays cov_t_inv for the inverse covariance matrices and cov_t_inv_det for their determinants.

Python, fast computation of rolling percentile

Given a multidimensional array, I want to compute a rolling percentile over one of its axes, with the rolling windows truncated near the boundaries of the array. Below is a minimal example implementation using only numpy via np.nanpercentile() applied to stacked, rolled (through np.roll()) arrays. However, the input array may be very large (~ 1 GB or more), so two issues arise:
For the current implementation, the stacked, rolled array may
not fit into RAM memory. Avoidable with for-loops over all axes
unaffected by the rolling, but may be slow.
Even fully vectorized (as below), the computation time is quite long,
understandably due to the sheer amount of computations performed.
Questions: Is there a more efficient python implementation of a rolling percentile (with axis/axes argument or the like and with truncated windows near the boundaries)?** If not, how could the computation be sped up (and, if possible, without exceeding the RAM)? C-code called from Python? Computation of percentiles at fewer "central" points, and approximation in between via (e.g. linear) interpolation? Other ideas?
Related post (implementing rolling percentiles): How to compute moving (or rolling, if you will) percentile/quantile for a 1d array in numpy? Issues are:
pandas implementation via pd.Series().rolling().quantile() works only for pd.Series or pd.DataFrame objects, not multidimensional (4D or arbitrary D) arrays;
implementation via np.lib.stride_tricks.as_strided() with np.nanpercentile() is similar to the one below and should not be much faster given that np.nanpercentile() is the speed bottleneck, see below
Minimal example implementation:
import numpy as np
np.random.seed(100)
# random array of numbers
a = np.random.rand(10000,1,70,70)
# size of rolling window
n_window = 150
# percentile to compute
p = 0.7
# NaN values to prepend/append to array before rolling
nan_temp = np.full(tuple([n_window] + list(np.array(a.shape)[1:])), fill_value=np.nan)
# prepend and append NaN values to array
a_temp = np.concatenate((nan_temp, a, nan_temp), axis=0)
# roll array, stack rolled arrays along new dimension, compute percentile (ignoring NaNs) using np.nanpercentile()
res = np.nanpercentile(np.concatenate([np.roll(a_temp, shift=i, axis=0)[...,None] for i in range(-n_window, n_window+1)],axis=-1),p*100,axis=-1)
# cut away the prepended/appended NaN values
res = res[n_window:-n_window]
Computation times (in seconds), example (for the case of a having a shape of (1000,1,70,70) instead of (10000,1,70,70)):
create random array: 0.0688176155090332
prepend/append NaN values: 0.03478217124938965
stack rolled arrays: 38.17830514907837
compute nanpercentile: 1145.1418626308441
cut out result: 0.0004646778106689453

How to combine the phase of one image and magnitude of different image into 1 image by using python

I want to combine phase spectrum of one image and magnitude spectrum of different image into one image.
I have got phase spectrum and magnitude spectrum of image A and image B.
Here is the code.
f = np.fft.fft2(grayA)
fshift1 = np.fft.fftshift(f)
phase_spectrumA = np.angle(fshift1)
magnitude_spectrumB = 20*np.log(np.abs(fshift1))
f2 = np.fft.fft2(grayB)
fshift2 = np.fft.fftshift(f2)
phase_spectrumB = np.angle(fshift2)
magnitude_spectrumB = 20*np.log(np.abs(fshift2))
I trying to figure out , but still i do not know how to do that.
Below is my test code.
imgCombined = abs(f) * math.exp(1j*np.angle(f2))
I wish i can come out just like that
Here are the few things that you would need to fix for your code to work as intended:
The math.exp function supports scalar exponentiation. For an element-wise matrix exponentiation you should use numpy.exp instead.
Similary, the * operator would attempt to perform matrix multiplication. In your case you want to instead perform element-wise multiplication which can be done with np.multiply
With these fixes you should get the frequency-domain combined matrix as follows:
combined = np.multiply(np.abs(f), np.exp(1j*np.angle(f2)))
To obtain the corresponding spatial-domain image, you would then need compute the inverse transform (and take the real part since there my be residual small imaginary parts due to numerical errors) with:
imgCombined = np.real(np.fft.ifft2(combined))
Finally the result can be shown with:
import matplotlib.pyplot as plt
plt.imshow(imgCombined, cmap='gray')
Note that imgCombined may contain values outside the [0,1] range. You would then need to decide how you want to rescale the values to fit the expected [0,1] range.
The default scaling (resulting in the image shown above) is to linearly scale the values such that the minimum value is set to 0, and the maximum value is set to 0.
Another way could be to limit the values to that range (i.e. forcing all negative values to 0 and all values greater than 1 to 1).
Finally another approach, which seems to provide a result closer to the screenshot provided, would be to take the absolute value with imgCombined = np.abs(imgCombined)

FFT filter vs lfilter in python

I have some troubles when applying bandpass filter to signal in Python. Have tried the following to things:
Do the box window "by hand", i.e. do the FFT on the signal, apply the
filter with a box window and the do the IFFT to get back to the time
domain.
Use the scipy.signal module where I use firwin2 to construct the
filter and then lfilter to to the filtering.
Futhermore I have done the same filtering in the audio program Cool Edit and compared the result from the above two tests.
As can be seen (I am a new user so I can not post my png fig), the results from the FFT and scipy.signal are very different. When compare to the result from Cool edit, the FFT is close, however not identical. Code as below:
# imports
from pylab import *
import os
import scipy.signal as signal
# load data
tr=loadtxt('tr6516.txt',skiprows=1)
sr = 500 # [samples/s]
nf = sr/2.0 # Nyquist frequence
W = 512 # Window widht for filtering
N=float(8192) # Fourier settings
Ns = len(tr[:,0]) # Total number of samples
# Create inpulse responce from the filter
fv=12.25
w =0.5
r =0.5*w
Hz=[0, fv-w-r, fv-w, fv+w, fv+w+r, nf]
ff=[0, 0, 1, 1, 0, 0]
b = signal.firwin2(W,Hz,ff,nfreqs=N+1,nyq=nf)
SigFilter = signal.lfilter(b, 1, tr[:,1])
# Fourier transform
X1 = fft(tr[:,1],n=int(N))
X1 = fftshift(X1)
F1 = arange(-N/2.0,N/2.0)/N*sr
# Filter data
ff=[0,1,1,0]
fv=12.25
w =0.5
r =0.5*w
Hz=[fv-w-r,fv-w,fv+w,fv+w+r]
k1=interp(-F1,Hz,ff)+interp(F1,Hz,ff)
X1_f=X1*k1
X1_f=ifftshift(X1_f)
x1_f=ifft(X1_f,n=int(N))
Can anyone explain to me why this difference? The filtering in Cool edit has been done using the same settings as in scipy.signal (hanning window, window width 512). Or have I got this all totaly wrong.
Best regards,
Anders
Above code:
Compared with Cool Edit:
Small differences can be explained by the libraries using different algorithms that accumulate error slightly differently.
For example, if you compute the DFT using a radix-2 FFT, a split-radix FFT and an ordinary DFT, the results will all be slightly different. In fact the ordinary DFT has worse accuracy than all decent implementations of an FFT because it uses many more floating point operations, and thus it accumulates more error.
Could this explain the close (but not identical) results you are seeing?

Categories

Resources