I'm reading a specific column of a csv file as a numpy array. When I try to do the fft of this array I get an array of NaNs. How do I get the fft to work? Here's what I have so far:
#!/usr/bin/env python
from __future__ import division
import numpy as np
from numpy import fft
import matplotlib.pyplot as plt
fileName = '/Users/Name/Documents/file.csv'
#read csv file
df = np.genfromtxt(fileName, dtype = float, delimiter = ',', names = True)
X = df['X'] #get X from file
rate = 1000. #rate of data collection in points per second
Hx = abs(fft.fft(X))
freqX = fft.fftfreq(len(Hx), 1/rate)
plt.plot(freqX,Hx) #plot freqX vs Hx
Presumably there are some missing values in your csv file. By default, np.genfromtxt will replace the missing values with NaN.
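To see this default in action, here is a minimal sketch that uses io.StringIO as a stand-in for the csv file. The column names and values are made up for illustration; only the genfromtxt call mirrors the question.

```python
import io
import numpy as np

# Hypothetical csv contents; the second data row has an empty X field
csv_text = "X,Y\n0.1,1.0\n,2.0\n0.3,3.0\n"

df = np.genfromtxt(io.StringIO(csv_text), dtype=float, delimiter=',', names=True)
X = df['X']

# Count the entries that would poison the FFT (NaN or Inf)
n_missing = np.count_nonzero(~np.isfinite(X))
print(X)
print(n_missing)
```

Running the same count on your own column is a quick way to confirm whether missing values are the cause.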
If there are any NaNs or Infs in an array, the fft will be all NaNs or Infs.
For example:
import numpy as np
x = [0.1, 0.2, np.nan, 0.4, 0.5]
print(np.fft.fft(x))
And we'll get:
array([ nan +0.j, nan+nanj, nan+nanj, nan+nanj, nan+nanj])
However, because an FFT operates on a regularly-spaced series of values, removing the non-finite values from an array is a bit more complex than just dropping them.
pandas has several specialized operations to do this, if you're open to using it (e.g. fillna). However, it's not too difficult to do with "pure" numpy.
First, I'm going to assume that you're working with a continuous series of data because you're taking the FFT of the values. In that case, we'd want to interpolate the NaN values based on the values around them. Linear interpolation (np.interp) may not be ideal in all situations, but it's not a bad default choice:
For example:
import numpy as np
x = np.array([0.1, 0.2, np.nan, 0.4, 0.5])
xi = np.arange(len(x))
mask = np.isfinite(x)
xfiltered = np.interp(xi, xi[mask], x[mask])
And we'll get:
In [18]: xfiltered
Out[18]: array([ 0.1, 0.2, 0.3, 0.4, 0.5])
We can then calculate the FFT normally:
In [19]: np.fft.fft(xfiltered)
Out[19]:
array([ 1.50+0.j , -0.25+0.34409548j, -0.25+0.08122992j,
-0.25-0.08122992j, -0.25-0.34409548j])
...and get a valid result.
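Applied to the script in the question, the same steps could be wrapped in a small helper. This is only a sketch: the function name interpolate_nonfinite and the synthetic 50 Hz signal are made up for illustration and stand in for df['X'].

```python
import numpy as np

def interpolate_nonfinite(x):
    """Linearly interpolate over NaN/Inf entries of a 1-D, evenly sampled array."""
    x = np.asarray(x, dtype=float)
    xi = np.arange(len(x))
    mask = np.isfinite(x)
    return np.interp(xi, xi[mask], x[mask])

rate = 1000.0  # samples per second, as in the question

# Hypothetical stand-in for df['X']: a 50 Hz sine with a few dropped samples
t = np.arange(1000) / rate
X = np.sin(2 * np.pi * 50 * t)
X[[10, 11, 500]] = np.nan

X_clean = interpolate_nonfinite(X)
Hx = np.abs(np.fft.fft(X_clean))
freqX = np.fft.fftfreq(len(Hx), 1 / rate)

# Frequency of the largest spectral peak (abs() folds the negative-frequency twin)
peak = abs(freqX[np.argmax(Hx)])
print(peak)
```

Note that np.interp holds the first or last finite value constant if the NaNs sit at the very start or end of the array, which may or may not be what you want.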
If your data contains NaN values, you need to interpolate them. Alternatively, you can calculate the spectrum directly from the Fourier sum, with np.sum replaced by np.nansum. With this approach you don't need to interpolate the NaN values, although the amount of missing data will affect the spectrum: more missing data results in a noisier spectrum and hence less accurate spectral values.
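In symbols, the quantity evaluated at each frequency can be written roughly as follows. The symbols N_valid (number of non-NaN samples) and T_s (sampling period) are notation introduced here, not taken from the original post; for the single-sided spectrum every bin except DC is then doubled.

```latex
A(f) = \frac{1}{N_{\mathrm{valid}}}
       \left| \sum_{n \,:\, x_n \neq \mathrm{NaN}} x_n \, e^{-2\pi i f t_n} \right| ,
\qquad t_n = n\,T_s , \quad n = 0, \dots, N-1
```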
Below is a MWE to illustrate the concept, with a graph showing the result. The MWE illustrates how to calculate the single-sided amplitude spectrum of a simple reference signal containing a number of missing values.
#!/usr/bin/python
# Python code to plot amplitude spectrum of signal containing NaN values
# Python version 2.7.13
from __future__ import division
import numpy as np
import pylab as pl
import random
LW = 2 #line width
AC = 0.5 #alpha channel
pi = np.pi
def periodogramSS(inputsignal,fsamp):
    N = len(inputsignal)
    N_notnan = np.count_nonzero(~np.isnan(inputsignal))
    hr = fsamp/N #frequency resolution
    Ts = 1/fsamp #sampling period, derived from fsamp rather than a global
    t = np.arange(N)*Ts #sample times
    #flow,fhih = -fsamp/2,(fsamp/2)+hr #Double-sided spectrum
    flow,fhih = 0,fsamp/2+hr #Single-sided spectrum
    #flow,fhih = hr,fsamp/2
    frange = np.arange(flow,fhih,hr)
    fN = len(frange)
    Aspec = np.zeros(fN)
    n = 0
    for f in frange:
        #Fourier sum over the non-NaN samples only, normalised by their count
        Aspec[n] = np.abs(np.nansum(inputsignal*np.exp(-2j*pi*f*t)))/N_notnan
        n+=1
    Aspec *= 2 #single-sided spectrum
    Aspec[0] /= 2 #DC component restored (i.e. halved)
    return (frange,Aspec)
#construct reference signal:
f1 = 10 #Hz
T = 1/f1
fs = 10*f1
Ts = 1/fs
t = np.arange(0,20*T,Ts)
DC = 3.0
x = DC + 1.5*np.cos(2*pi*f1*t)
#randomly delete values from signal x:
ndel = 10 #number of samples to replace with NaN
random.seed(0)
L = len(x)
randidx = random.sample(range(0,L),ndel)
for idx in randidx:
    x[idx] = np.nan
(fax,Aspectrum) = periodogramSS(x,fs)
fig1 = pl.figure(1,figsize=(6*3.13,4*3.13)) #full screen
pl.ion()
pl.subplot(211)
pl.plot(t, x, 'b.-', lw=LW, ms=2, label='ref', alpha=AC)
#mark NaN values:
for (t_,x_) in zip(t,x):
    if np.isnan(x_):
        pl.axvline(x=t_,color='g',alpha=AC,ls='-',lw=2)
pl.grid()
pl.xlabel('Time [s]')
pl.ylabel('Reference signal')
pl.subplot(212)
pl.stem(fax, Aspectrum, basefmt=' ', markerfmt='r.', linefmt='r-')
pl.grid()
pl.xlabel('Frequency [Hz]')
pl.ylabel('Amplitude spectrum')
fig1name = './signal.png'
print('Saving Fig. 1 to: ' + fig1name)
fig1.savefig(fig1name)
The reference signal (real) is shown in blue with missing values marked with green. The single-sided amplitude spectrum is shown in red. The DC component and amplitude value at 10 Hz are clearly visible. The other values are caused by the reference signal being broken up by the missing data.
Related
I have a plot of three Dirac delta functions after computing an FFT using scipy. I want to find the three frequencies at which the delta functions occur, i.e. where the y component (amplitude) is greater than 0, but how do I then find the corresponding x component (frequency)? Is there a simpler way to print the dominant output frequencies of an FFT?
I tried using the np.interp function but it accepts x values and returns y values. I tried inputting the reverse but it only returned the maximum frequency. I don't have an equation simply relating x and y as I've used an fft on my x and y values.
I found all y values above a certain level, y>100 in this case.
But how do I find their corresponding x values?
Fourier Plot
import matplotlib.pyplot as plt
import numpy as np
import math
import pandas as pd
from scipy.fft import fft, fftfreq
import scipy
sample_rate = 44100
duration = 5
N = sample_rate*duration
time = x = np.linspace(0, duration, N, endpoint=False)
amplitude = np.sin(7000*time*2*np.pi)*10 + np.cos(time*5000*(2*np.pi)) + np.sin(time*200*2*np.pi)*5
plt.plot(amplitude[:1000])
from scipy.fft import rfft, rfftfreq
yf = scipy.fft.rfft(amplitude)
xf = scipy.fft.rfftfreq(N, 1/sample_rate)
plt.plot(xf, np.abs(yf))
plt.xlim(0, 10000)
def check_amp(number):
    if np.abs(number) > 100:
        return True
    return False
fft_outputs_iterator = filter(check_amp, yf)
fft_outputs = list(fft_outputs_iterator)
print(np.abs(fft_outputs))
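One possible way to recover the matching frequencies is to build a boolean mask from the amplitudes and apply that same mask to the frequency array. The sketch below reuses the signal from the question but calls NumPy's np.fft.rfft and np.fft.rfftfreq instead of the scipy.fft versions (an assumption made here to keep the example dependency-free); the names mask, peak_freqs and peak_amps are illustrative.

```python
import numpy as np

sample_rate = 44100
duration = 5
N = sample_rate * duration
time = np.linspace(0, duration, N, endpoint=False)
amplitude = (np.sin(7000 * time * 2 * np.pi) * 10
             + np.cos(time * 5000 * 2 * np.pi)
             + np.sin(time * 200 * 2 * np.pi) * 5)

yf = np.fft.rfft(amplitude)
xf = np.fft.rfftfreq(N, 1 / sample_rate)

# Boolean mask of the bins whose magnitude exceeds the threshold
mask = np.abs(yf) > 100

# Indexing xf with the same mask gives the frequencies of those bins
peak_freqs = xf[mask]
# Scale the raw FFT magnitudes back to sine amplitudes (valid away from DC)
peak_amps = np.abs(yf[mask]) * 2 / N

print(peak_freqs)
print(peak_amps)
```

Because xf and yf have the same length and ordering, any index or mask computed on one can be applied to the other; np.flatnonzero(mask) would give the bin indices instead.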
I have a number of spectra: wavelength/counts at a given temperature. The wavelength range is the same for each spectrum.
I would like to interpolate between temperature and counts to create a large grid of spectra (temperature and counts over a given wavelength range).
The code below is my current progress. When I try to get a spectrum for a given temperature I only get one value of counts when I need a range of counts representing the spectrum (I already know the wavelengths).
I think I am confused about arrays and interpolation. What am I doing wrong?
import pandas as pd
import numpy as np
from scipy import interpolate
image_template_one = pd.read_excel("mr_image_one.xlsx")
counts = np.array(image_template_one['counts'])
temp = np.array(image_template_one['temp'])
inter = interpolate.interp1d(temp, counts, kind='linear')
temp_new = np.linspace(30,50,0.5)
counts_new = inter(temp_new)
I now think that I have two arrays: [wavelength, counts] and [wavelength, temperature]. Is this correct, and do I need to interpolate between the arrays?
Example data
I think what you want to achieve can be done with interp2d:
import numpy as np
import pandas as pd
from scipy import interpolate
# dummy data
data = pd.DataFrame({
'temp': [30]*6 + [40]*6 + [50]*6,
'wave': 3 * [a for a in range(400,460,10)],
'counts': np.random.uniform(.93,.95,18),
})
# make the interpolator
# (note: interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14)
inter = interpolate.interp2d(data['temp'], data['wave'], data['counts'])
# scipy's interpolators return functions,
# which you need to call with the values you want interpolated.
new_x, new_y = np.linspace(30,50,100), np.linspace(400,450,100)
interpolated_values = inter(new_x, new_y)
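On SciPy versions where interp2d is no longer available, a similar result can be sketched with RegularGridInterpolator, assuming the spectra sit on a regular (temperature, wavelength) grid. The synthetic counts array and the helper name spectrum_at below are made up for illustration.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Grid axes: three temperatures, six wavelengths (same layout as the dummy data)
temps = np.array([30.0, 40.0, 50.0])
waves = np.arange(400.0, 460.0, 10.0)

# Synthetic counts with shape (len(temps), len(waves)); replace with real spectra
counts = 0.01 * temps[:, None] + 0.001 * waves[None, :]

# Linear interpolation on the 2-D grid (the default method)
interp = RegularGridInterpolator((temps, waves), counts)

def spectrum_at(temp):
    """Return the interpolated counts at every wavelength for one temperature."""
    pts = np.column_stack([np.full(waves.shape, temp), waves])
    return interp(pts)

spectrum = spectrum_at(37.5)
print(spectrum.shape)
```

Each call returns one value per wavelength, i.e. a whole spectrum for the requested temperature rather than a single count.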
I would like to obtain the frequency of numerical data. Since the data is not equally spaced, I must interpolate it to calculate the Fourier transform. When I get the frequency, I notice that it varies depending on the maximum time up to which the data is interpolated.
For example, I have tried to obtain the frequency of a sine function:
import numpy as np
import scipy.interpolate
t =np.linspace(0, 200, num=2000)
func = np.sin(t)
#Interpolate the function
tmax = 150
fintep = scipy.interpolate.interp1d(t, func)
TimeInterp = np.linspace(0, tmax, num=10000)
funcInterp = fintep(TimeInterp)
#sampling rate
FsInterp = 1/((TimeInterp[1]-TimeInterp[0]))
#fft
fftInterp = np.fft.fft(funcInterp)
f1Interp = 2*fftInterp[0:int(np.ceil((len(fftInterp)+1)/2))-1]
discrfreqInterp = np.linspace(0,FsInterp/2,1+int(FsInterp/2/(FsInterp/len(fftInterp)))-1)
f1Interp = f1Interp /(2*len(discrfreqInterp))
fpos1Interp = np.where(abs(f1Interp )==max(abs(f1Interp )))
freqfftInterp = discrfreqInterp [fpos1Interp]
freqfftInterp
My problem is that depending on the value of the tmax, the frequency shows a variation. Is there any way to improve the accuracy?
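One likely contributor to this variation is the spacing of the FFT frequency bins, which is roughly 1/tmax, so the peak can only land on a multiple of that spacing. The sketch below samples sin(t) directly (skipping the interpolation step, an assumption made to keep it short) and prints the estimate next to the bin spacing; the function name dominant_frequency is illustrative.

```python
import numpy as np

def dominant_frequency(tmax, num=10000):
    """Return (peak frequency, bin spacing) for sin(t) sampled on [0, tmax]."""
    t = np.linspace(0, tmax, num=num)
    x = np.sin(t)
    dt = t[1] - t[0]
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(num, dt)
    k = np.argmax(spectrum[1:]) + 1  # skip the DC bin
    return freqs[k], freqs[1]

f_true = 1 / (2 * np.pi)  # frequency of sin(t) in cycles per unit time
results = {}
for tmax in (100, 150, 200):
    f_est, df = dominant_frequency(tmax)
    results[tmax] = (f_est, df)
    print(tmax, f_est, df, abs(f_est - f_true))
```

If the error tracks the bin spacing, a longer record (larger tmax) narrows the spacing; interpolating between the bins around the peak is another option that is sometimes used.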
How can I validate whether the downsampled output is correct? For example, I made the example below, but I am not sure whether the output is correct.
Any ideas on how to validate it?
Code
import numpy as np
import matplotlib.pyplot as plt # For plotting
from scipy import signal
import mne
fs = 100 # sample rate
rsample=50 # downsample frequency
fTwo=400 # frequency of the signal
x = np.arange(fs)
y = [ np.sin(2*np.pi*fTwo * (i/fs)) for i in x]
f_res = signal.resample(y, rsample)
xnew = np.linspace(0, 100, f_res.size, endpoint=False)
#
# ##############################
#
plt.figure(1)
plt.subplot(211)
plt.stem(x, y)
plt.subplot(212)
plt.stem(xnew, f_res, 'r')
plt.show()
Plotting the data is a good first take at a verification. Here I made a regular plot with the points connected by lines. The lines are useful since they give a guide for where you expect the downsampled data to lie, and also emphasize what the downsampled data is missing. (It would also work to only show lines for the original data, but lines, as in a stem plot, are too confusing, imho.)
import numpy as np
import matplotlib.pyplot as plt # For plotting
from scipy import signal
fs = 100 # sample rate
rsample=43 # downsample frequency
fTwo=13 # frequency of the signal
x = np.arange(fs, dtype=float)
y = np.sin(2*np.pi*fTwo * (x/fs))
print(y)
f_res = signal.resample(y, rsample)
xnew = np.linspace(0, 100, f_res.size, endpoint=False)
#
# ##############################
#
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xnew, f_res, 'or')
plt.show()
A few notes:
If you're trying to make a general algorithm, use non-rounded numbers, otherwise you could easily introduce bugs that don't show up when things are even multiples. Similarly, if you need to zoom in to verify, go to a few random places, not, for example, only the start.
Note that I changed fTwo to be significantly less than the number of samples. You need more than two data points per oscillation (the Nyquist criterion) if you want to make sense of the signal.
I also removed the loop for calculating y: in general, you should try to vectorize calculations when using numpy.
The spectrum of the resampled signal should have a tone at the same frequency as the input signal, just in a smaller Nyquist bandwidth.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
import scipy.fftpack as fft
fs = 100 # sample rate
rsample=50 # downsample frequency
fTwo=10 # frequency of the signal
n = np.arange(1024)
y = np.sin(2*np.pi*fTwo/fs*n)
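y_res = signal.resample(y, len(n)//2) # integer sample count (len(n)/2 is a float in Python 3)
Y = fft.fftshift(fft.fft(y))
f = fs*np.arange(-512, 512)/1024 # frequency axis in fftshift order: -fs/2 up to just below fs/2
Y_res = fft.fftshift(fft.fft(y_res, 1024))
f_res = fs/2*np.arange(-512, 512)/1024 # same axis for the halved sample rate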
plt.figure(1)
plt.subplot(211)
plt.stem(f, abs(Y))
plt.subplot(212)
plt.stem(f_res, abs(Y_res))
plt.show()
The tone is still at 10 Hz.
If you downsample a signal by an integer factor, both signals should still have the same value at any shared time instant, so just loop through "time" and check that the values agree. In your case you go from a sample rate of 100 to 50. Assuming you have 1 second's worth of data (from building your x from fs), loop from t = 0 to t = 1 in 1/50 s increments and make sure that Yd(t) = Ys(t), where Yd is the downsampled signal and Ys is the originally sampled signal. Or, to say it simply, Yd(n) = Ys(2n) for n = 0, 1, 2, ..., total_samples/2 - 1.
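The check can be sketched without an explicit loop. This assumes a tone well below the new Nyquist frequency that completes a whole number of cycles in the record, which is the case where scipy's Fourier-based signal.resample should reproduce every second sample closely; the 10 Hz test tone is an illustrative choice, not the 400 Hz one from the question.

```python
import numpy as np
from scipy import signal

fs = 100      # original sample rate
f_sig = 10    # test tone, below the new Nyquist frequency of 25 Hz (assumed)
n = np.arange(fs)  # one second of data
y = np.sin(2 * np.pi * f_sig * n / fs)

# Downsample from 100 to 50 samples
y_down = signal.resample(y, fs // 2)

# Yd(n) should match Ys(2n); report the largest deviation
max_err = np.max(np.abs(y_down - y[::2]))
print(max_err)
```

For signals with content above the new Nyquist frequency, or that are not periodic in the window, the two will differ, so a small tolerance rather than exact equality is the safer test.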
I've been trying to design a bandpass filter using scipy but I keep getting a LinAlg Singular Matrix error. I read that a singular matrix is one that is not invertible, but I'm not sure how that error is coming up or what I can do to fix it.
The code takes in an EEG signal (which, in the code below, I have just replaced with an int array for testing) and filters out frequencies < 8Hz and > 12Hz (alpha band)
Can anyone shed some light on where the singular matrix error is coming from? Or alternatively, if you know of a better way to filter a signal like this I'd love to test out other options too
import numpy as np
from scipy import signal
from scipy.signal import filter_design as fd
import matplotlib.pylab as plt
#bandpass
Wp = [8, 12] # Cutoff frequency
Ws = [7.5, 12.5] # Stop frequency
Rp = 1 # passband maximum loss (gpass)
As = 100 # stopband min attenuation (gstop)
b,a = fd.iirdesign(Wp,Ws,Rp,As,ftype='butter')
w,H = signal.freqz(b,a) # filter response
plt.plot(w,abs(H)) # H is complex, so plot its magnitude
t = np.linspace(1,256,256)
x = np.arange(256)
plt.plot(t,x)
y = signal.filtfilt(b,a,x)
plt.plot(t,y)
As indicated in the iirdesign documentation, Wp and Ws "are normalized from 0 to 1, where 1 is the Nyquist frequency".
If your sampling rate is Fs (for example 100Hz) you can normalize the cutoff and stop frequencies using:
Wp = [x / (Fs/2.0) for x in Wp]
Ws = [x / (Fs/2.0) for x in Ws]
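Putting the normalization together with the design call might look like the sketch below. Several things here are assumptions rather than part of the question: the sampling rate Fs = 256 Hz, the relaxed stopband edges and attenuation (chosen so the Butterworth order stays moderate), the synthetic two-tone test signal, and the use of second-order sections (output='sos' with sosfiltfilt), which tend to be numerically better behaved than b, a coefficients for higher-order filters.

```python
import numpy as np
from scipy import signal

Fs = 256.0          # sampling rate in Hz (assumed; use your recording's rate)
Wp = [8.0, 12.0]    # passband edges in Hz
Ws = [6.0, 14.0]    # stopband edges in Hz (relaxed for this sketch)
Rp = 1              # passband maximum loss in dB
As = 40             # stopband minimum attenuation in dB (relaxed for this sketch)

# Normalize so that 1.0 corresponds to the Nyquist frequency
wp = [f / (Fs / 2.0) for f in Wp]
ws = [f / (Fs / 2.0) for f in Ws]

sos = signal.iirdesign(wp, ws, Rp, As, ftype='butter', output='sos')

# Synthetic test signal: a 10 Hz tone (in band) plus a 30 Hz tone (out of band)
t = np.arange(0, 16, 1 / Fs)
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 30 * t)
y = signal.sosfiltfilt(sos, x)

# Compare the spectral magnitude at the two tone frequencies
spec = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / Fs)
k10 = np.argmin(np.abs(freqs - 10))
k30 = np.argmin(np.abs(freqs - 30))
print(spec[k10], spec[k30])
```

The 10 Hz component should pass nearly unchanged while the 30 Hz component is strongly attenuated; tightening the transition bands or raising As increases the filter order accordingly.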