I am receiving data from a vibration sensor in the form of two numpy arrays. The first array represents the actual values for the vibration measurement and the second array is the corresponding time information (timestamp). For example:
vibraton_data = np.array([621,1546,262])
timestamps = np.array([1592583531, 1592583548, 1592583555])
That means that for every vibration measurement I have the time information.
I would like to apply now the Fast-Fourier-Transformation. Does anyone know how to do that? My first try would be something like this:
N = 600
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
yf = fft(vibration_data)
xf = np.linspace(0.0, 1.0/(2.0*T), N/2)
But I dont know here how to deal with the time information from my time array.
This is really a signal processing question, not a Python one. You have several options here:
if your data is uniformly sampled - you can ignore the timestamps altogether. All the information you need is in the data, and the (constant) sampling frequency: f_s = 1.0 / (timestamps[1] - timestamps[0])
if not, you can either:
use Non-uniform DFT (here is one implementation, haven't tried)
interpolate the data between non-uniform timestamps so it becomes uniform. Note that effectively, this applies a low-pass filter to your data, which may be not what you want (more on the effects of interpolation here).
In all cases, when you perform FFT, time information is not required anymore, as you are in the frequency domain.
Related
I'm trying to do some tests before I proceed analyzing some real dataset via FFT, and I've found the following problem.
First, I create a signal as the sum of two cosines and then use rfft to to the transformation (since it has only real values):
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
# Number of sample points
N = 800
# Sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = 0.5*np.cos(10*2*np.pi*x) + 0.5*np.cos(200*2*np.pi*x)
# FFT
yf = rfft(y)
xf = rfftfreq(N, T)
fig, ax = plt.subplots(1,2,figsize=(15,5))
ax[0].plot(x,y)
ax[1].plot(xf, 2.0/N*np.abs(yf))
As it can be seen from the definition of the signal, I have two oscillations with amplitude 0.5 and frequency 10 and 200. Now, I would expect the FFT spectrum to be something like two deltas at those points, but apparently increasing the frequency broadens the peaks:
From the first peak it can be infered that the amplitude is 0.5, but not for the second. I've tryied to obtain the area under the peak using np.trapz and use that as an estimate for the amplitude, but as it is close to a dirac delta it's very sensitive to the interval I choose. My problem is that I need to get the amplitude as exact as possible for my data analysis.
EDIT: As it seems to be something related with the number of points, I decided to increment (now that I can) the sample frequency. This seems to solve the problem, as it can be seen in the figure:
However, it still seems strange that for a certain number of points and sample frequency, the high frequency peaks broaden...
It is not strange , you have leakage of the frequency bins. When you discretize the signal (sampling) needed for the Fourier transfrom , frequency bins are created which are frequency intervals where the the amplitude is calculated. And each bin has wide which is given by the sample_rate / num_points . So , the less the number of bins the more difficult is to assign precise amplitudes to every frequency. Other problems in choosing the best sampling rate exist such as the shannon-nyquist theorem to prevent aliasing. https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem . But depending on the problem sometimes there some custom rates used for sampling. E.g. when dealing with audio a sampling rate of 44,100 Hz is widely used , cause is based on the limits of the human hearing. So it depends also on nature of the data you want to perform analysis as you wrote. Anyway , since this question has also theoretical value , you can also check https://dsp.stackexchange.com for some useful info.
I would comment to George's answer, but yet I cannot.
Maybe a starting point for your research are the properties of the Discrete Fourier Transform.
The signal in the time domain is actual the cosines multiplied by a box window which transforms into the frequency domain as the convolution of the deltas with the sinc function. The sinc functions will smear the spectrum.
However, I am not sure we are observing spectral leakage here, since the window fits exactly to the full period of cosines. The discretization of the bins might still play a role here.
Good day
EDIT:
What I want: From any current/voltage waveform on a Power System(PS) I want the filtered 50Hz (fundamental) RMS values magnitudes (and effectively their angles). The current as measured contains all harmonics from 100Hz to 1250Hz depending on the equipment. One cannot analyse and calculate using a wave with these harmonics your error gets so big (depending on equipment) that PS protection equipment calculates incorrect quantities. The signal attached also has MANY many other frequency components involved.
My aim: PS protection Relays are special and calculate a 20ms window in a very short time. I.m not trying to get this. I'm using external recording tech and testing what the relays see are true and they operate correctly. Thus I need to do what they do and only keep 50Hz values without any harmonic and DC.
Important expected result: Given any frequency component that MAY be in the signal I want to see the magnitude of any given harmonic (150,250 - 3rd harmonic magnitudes and 5th harmonic of the fundamental) as well as the magnitude of the DC. This will tell me what type of PS equipment possibly injects these frequencies. It is important that I provide a frequency and the answer is a vector of that frequency only with all other values filtered OUT.
RMS-of-the-fundamental vs RMS differs with a factor of 4000A (50Hz only) and 4500A (with other freqs included)
This code calculates a One Cycle Fourier value (RMS) for given frequency. Sometimes called a Fourier filter I think? I use it for Power System 50Hz/0Hz/150Hz analogues analysis. (The answers have been tested and are correct fundamental RMS values. (https://users.wpi.edu/~goulet/Matlab/overlap/trigfs.html)
For a large sample the code is very slow. For 55000 data points it takes 12seconds. For 3 voltages and 3 currents this gets to be VERY slow. I look at 100s of records a day.
How do I enhance it? What Python tips and tricks/ libraries are there to append my lists/array.
(Also feel free to rewrite or use the code). I use the code to pick out certain values out of a signal at given times. (which is like reading the values from a specialized program for power system analysis)
Edited: with how I load the files and use them, code works with pasting it:
import matplotlib.pyplot as plt
import csv
import math
import numpy as np
import cmath
# FILES ATTACHED TO POST
filenamecfg = r"E:/Python_Practise/2019-10-21 13-54-38-482.CFG"
filename = r"E:/Python_Practise/2019-10-21 13-54-38-482.DAT"
t = []
IR = []
newIR=[]
with open(filenamecfg,'r') as csvfile1:
cfgfile = [row for row in csv.reader(csvfile1, delimiter=',')]
numberofchannels=int(np.array(cfgfile)[1][0])
scaleval = float(np.array(cfgfile)[3][5])
scalevalI = float(np.array(cfgfile)[8][5])
samplingfreq = float(np.array(cfgfile)[numberofchannels+4][0])
numsamples = int(np.array(cfgfile)[numberofchannels+4][1])
freq = float(np.array(cfgfile)[numberofchannels+2][0])
intsample = int(samplingfreq/freq)
#TODO neeeed to get number of samples and frequency and detect
#automatically
#scaleval = np.array(cfgfile)[3]
print('multiplier:',scaleval)
print('SampFrq:',samplingfreq)
print('NumSamples:',numsamples)
print('Freq:',freq)
with open(filename,'r') as csvfile:
plots = csv.reader(csvfile, delimiter=',')
for row in plots:
t.append(float(row[1])/1000000) #get time from us to s
IR.append(float(row[6]))
newIR = np.array(IR) * scalevalI
t = np.array(t)
def mag_and_theta_for_given_freq(f,IVsignal,Tsignal,samples): #samples are the sample window size you want to caclulate for (256 in my case)
# f in hertz, IVsignal, Tsignal in numpy.array
timegap = Tsignal[2]-Tsignal[3]
pi = math.pi
w = 2*pi*f
Xr = []
Xc = []
Cplx = []
mag = []
theta = []
#print("Calculating for frequency:",f)
for i in range(len(IVsignal)-samples):
newspan = range(i,i+samples)
timewindow = Tsignal[newspan]
#print("this is my time: ",timewindow)
Sig20ms = IVsignal[newspan]
N = len(Sig20ms) #get number of samples of my current Freq
RealI = np.multiply(Sig20ms, np.cos(w*timewindow)) #Get Real and Imaginary part of any signal for given frequency
ImagI = -1*np.multiply(Sig20ms, np.sin(w*timewindow)) #Filters and calculates 1 WINDOW RMS (root mean square value).
#calculate for whole signal and create a new vector. This is the RMS vector (used everywhere in power system analysis)
Xr.append((math.sqrt(2)/N)*sum(RealI)) ### TAKES SO MUCH TIME
Xc.append((math.sqrt(2)/N)*sum(ImagI)) ## these steps make RMS
Cplx.append(complex(Xr[i],Xc[i]))
mag.append(abs(Cplx[i]))
theta.append(np.angle(Cplx[i]))#th*180/pi # this can be used to get Degrees if necessary
#also for freq 0 (DC) id the offset is negative how do I return a negative to indicate this when i'm using MAGnitude or Absolute value
return Cplx,mag,theta #mag[:,1]#,theta # BUT THE MAGNITUDE WILL NEVER BE zero
myZ,magn,th = mag_and_theta_for_given_freq(freq,newIR,t,intsample)
plt.plot(newIR[0:30000],'b',linewidth=0.4)#, label='CFG has been loaded!')
plt.plot(magn[0:30000],'r',linewidth=1)
plt.show()
The code as pasted runs smoothly given the files attached
Regards
EDIT: Please find a test csvfile and COMTRADE TEST files here:
CSV:
https://drive.google.com/open?id=18zc4Ms_MtYAeTBm7tNQTcQkTnFWQ4LUu
COMTRADE
https://drive.google.com/file/d/1j3mcBrljgerqIeJo7eiwWo9eDu_ocv9x/view?usp=sharing
https://drive.google.com/file/d/1pwYm2yj2x8sKYQUcw3dPy_a9GrqAgFtD/view?usp=sharing
Forewords
As I said in my previous comment:
Your code mainly relies on a for loop with a lot of indexation and
scalar operations. You already have imported numpy so you should take
advantage of vectorization.
This answer is a start towards your solution.
Light weight MCVE
First we create a trial signal for the MCVE:
import numpy as np
# Synthetic signal sampler: 5s sampled as 400 Hz
fs = 400 # Hz
t = 5 # s
t = np.linspace(0, t, fs*t+1)
# Synthetic Signal: Amplitude is about 325V #50Hz
A = 325 # V
f = 50 # Hz
y = A*np.sin(2*f*np.pi*t) # V
Then we can compute the RMS of this signal using the usual formulae:
# Actual definition of RMS:
yrms = np.sqrt(np.mean(y**2)) # 229.75 V
Or alternatively we can compute it using DFT (implemented as rfft in numpy.fft):
# RMS using FFT:
Y = np.fft.rfft(y)/y.size
Yrms = np.sqrt(np.real(Y[0]**2 + np.sum(Y[1:]*np.conj(Y[1:]))/2)) # 229.64 V
A demonstration of why this last formulae works can be found here. This is valid because of the Parseval Theorem implies Fourier transform does conserve Energy.
Both versions make use of vectorized functions, no need of splitting real and imaginary part to perform computation and then reassemble into a complex number.
MCVE: Windowing
I suspect you want to apply this function as a window on a long term time serie where RMS value is about to change. Then we can tackle this problem using pandas library which provides time series commodities.
import pandas as pd
We encapsulate the RMS function:
def rms(y):
Y = 2*np.fft.rfft(y)/y.size
return np.sqrt(np.real(Y[0]**2 + np.sum(Y[1:]*np.conj(Y[1:]))/2))
We generate a damped signal (variable RMS)
y = np.exp(-0.1*t)*A*np.sin(2*f*np.pi*t)
We wrap our trial signal into a dataframe to take advantage of the rolling or resample methods:
df = pd.DataFrame(y, index=t*pd.Timedelta('1s'), columns=['signal'])
A rolling RMS of your signal is:
df['rms'] = df.rolling(int(fs/f)).agg(rms)
A periodically sampled RMS is:
df['signal'].resample('1s').agg(rms)
The later returns:
00:00:00 2.187840e+02
00:00:01 1.979639e+02
00:00:02 1.791252e+02
00:00:03 1.620792e+02
00:00:04 1.466553e+02
Signal Conditioning
Addressing your need of keeping only fundamental harmonic (50 Hz), a straightforward solution could be a linear detrend (to remove constant and linear trend) followed by a Butterworth filter (bandpass filter).
We generate a synthetic signal with other frequencies and linear trend:
y = np.exp(-0.1*t)*A*(np.sin(2*f*np.pi*t) \
+ 0.2*np.sin(8*f*np.pi*t) + 0.1*np.sin(16*f*np.pi*t)) \
+ A/20*t + A/100
Then we condition the signal:
from scipy import signal
yd = signal.detrend(y, type='linear')
filt = signal.butter(5, [40,60], btype='band', fs=fs, output='sos', analog=False)
yfilt = signal.sosfilt(filt, yd)
Graphically it leads to:
It resumes to apply the signal conditioning before the RMS computation.
I am willing to apply Fourier transform on a time series data to convert data into frequency domain. I am not sure if the method I've used to apply Fourier Transform is correct or not? Following is the link to data that I've used.
After reading the data file I've plotted original data using
t = np.linspace(0,55*24*60*60, 55)
s = df.values
sns.set_style("darkgrid")
plt.ylabel("Amplitude")
plt.xlabel("Time [s]")
plt.plot(t, s)
plt.show()
Since the data is on a daily frequency I've converted it into seconds using 24*60*60 and for a period of 55 days using 55*24*60*60
The graph looks as follows:
Next I've implemeted Fourier Transform using following piece of code and obtained the image as follows:
#Applying Fourier Transform
fft = fftpack.fft(s)
#Time taken by one complete cycle of wave (seconds)
T = t[1] - t[0]
#Calculating sampling frequency
F = 1/T
N = s.size
#Avoid aliasing by multiplying sampling frequency by 1/2
f = np.linspace(0, 0.5*F, N)
#Convert frequency to mHz
f = f * 1000
#Plotting frequency domain against amplitude
sns.set_style("darkgrid")
plt.ylabel("Amplitude")
plt.xlabel("Frequency [mHz]")
plt.plot(f[:N // 2], np.abs(fft)[:N // 2])
plt.show()
I've following questions:
I am not sure if my above methodology is correct to implement Fourier Transform.
I am not sure if the method I am using to avoid aliasing is correct.
If, what I've done is correct than how to interpret the three peaks in Frequency domain plot.
Finally, how would I invert transform using only frequencies that are significant.
While I'd refrain from answering your first two questions (it looks okay to me but I'd love an expert's input), I can weigh in on the latter two:
If, what I've done is correct than how to interpret the three peaks in Frequency domain plot.
Well, that means you've got three main components to your signal at frequencies roughly 0.00025 mHz (not the best choice of units here, possibly!), 0.00125 mHz and 0.00275 mHz.
Finally, how would I invert transform using only frequencies that are significant.
You could just zero out every frequency below a cutoff you decide (say, absolute value of 3 - that should cover your peaks here). Then you can do:
below_cutoff = np.abs(fft) < 3
fft[below_cutoff] = 0
cleaner_signal = fftpack.ifft(fft)
And that should do it, really!
I have a signal generated by a simulation program. Because the solver in this program has a variable time step, I have a signal with unevenly spaced data. I have two lists, a list with the signal values, and another list with the times at which each value occurred. The data could be something like this
npts = 500
t=logspace(0,1,npts)
f1 = 0.5
f2 = 0.6
sig=(1+sin(2*pi*f1*t))+(1+sin(2*pi*f2*t))
I would like to be able to perform a frequency analysis on this signal using python. It seems I cannot use the fft function in numpy, because this requires evenly spaced data. Are there any standard functions which could help me find the frequencies contained in this signal?
The most common algorithm to solve such things is called a Least-Squares Spectral analysis of frequencies. It looks like this will be in a future release of the scipy.signals package. Maybe there is a current version, but I can't seem to find it... In addition, there is some code available from Astropython, which I will not copy in it's entirety, but it essentially creates a lomb class which you can use the following code to get some values out.. What you need to do is the following:
import numpy
import lomb
x = numpy.arange(10)
y = numpy.sin(x)
fx,fy, nout, jmax, prob = lomb.fasper(x,y, 6., 6.)
Very simple, just look up the formula for a Fourier transform, and implement it as a discrete sum over your data values:
given a set of values f(x) over some set of x, then for each frequency k,
F(k) = sum_x ( exp( +/-i * k *x ) )
choose your k's ranging from 0 to 2*pi / min separation in x.
and, you can use 2 * pi / max(x) as the increment size
For a test case, use something for which you known the correct answer, c.f., a single cos( k' * x ) for some k', or a Gaussian.
An easy way out is to interpolate to evenly-spaced time intervals
Essentially I've got an excel files with voltage in the first column, and time in the second. I want to find the period of the voltages, as it returns a graph of voltage in y axis and time in x axis with a periodicity, looking similar to a sine function.
To find the frequency I have uploaded my excel file to python as I think this will make it easier- there may be something I've missed that will simplify this.
So far in python I have:
import xlrd
import numpy as N
import numpy.fft as F
import matplotlib.pyplot as P
wb = xlrd.open_workbook('temp7.xls') #LOADING EXCEL FILE
wb.sheet_names()
sh = wb.sheet_by_index(0)
first_column = sh.col_values(1) #VALUES FROM EXCEL
second_column = sh.col_values(2) #VALUES FROM EXCEL
Now how do I find the frequency from this?
I'm not sure how much you know about the Fourier transform, so forgive me if this is too much background.
Your signal does not have "a frequency", it is but it can be thought of as the sum of many frequencies. The Fourier transform will tell you the weights of all the frequencies that make up your signal. Unfortunately information may be lost when sampling from the analog (continuous time) to digital (discrete time) domain. This puts a constraint on the information we can get about frequency - namely that the maximum frequency component we can determine is related to the digital sampling rate (Nyquist-Shannon criterion):
fs > 2B
Where fs is your sampling rate (samples/unit time, typically in Hz or something like it), and B is the maximum frequency of your signal. If your signal actually has frequencies higher than B they will be "aliased" to some value lower than B.
For your problem, all you have to do is this:
x = N.array(first_column)
X = F.fft(x)
Now X is the frequency-domain representation of your voltage signal. The corresponding frequency axis covers [0, fs), based on the sampling theorem. So, what is fs? You need to calculate that by looking at the number of samples you have divided by the total duration of your sampled signal (note your units here):
fs = len(second_column) / second_column[-1]
Note that this representation of your signal will also (probably) be complex, i.e. each frequency will have an associated amplitude and phase.
Hopefully this helps, and hopefully I didn't cover a bunch of stuff you already knew.