I have an array of some arbitrary data x and associated timestamps t that correspond to the data in x (they are the same length N).
I want to downsample my data x to a smaller length M < N, such that the new data is roughly equally spaced in time (by using the timestamp information). This would be instead of simply decimating the data by taking every nth datapoint. Using the closest time-neighbor is fine.
scipy has some resampling code, but it actually tries to interpolate between data points, which I cannot do for my data. Does numpy or scipy have code that does this?
For example, suppose I want to downsample the letters of the alphabet according to some logarithmic time:
import string
import numpy as np
x = list(string.ascii_lowercase)  # string.lowercase was Python 2 only
t = np.logspace(1, 10, num=26)
y = downsample(x, t, 8)
I'd suggest using pandas, specifically the resample function:
Convenience method for frequency conversion and resampling of regular time-series data.
Note the how parameter in particular (in current pandas versions, how is gone and you instead call an aggregation method on the resampler, e.g. .resample(...).nearest()).
You can convert your numpy array to a DataFrame:
import pandas as pd
YourPandasDF = pd.DataFrame(YourNumpyArray)
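If it helps, here is a rough sketch of the closest-time-neighbor downsampling from the question, done with reindex(method="nearest") on a time-indexed Series (the target grid of 8 equally spaced times is my assumption):

import string
import numpy as np
import pandas as pd

x = list(string.ascii_lowercase)
t = np.logspace(1, 10, num=26)             # irregular timestamps

s = pd.Series(x, index=t)                  # Series indexed by (monotonic) time
targets = np.linspace(t[0], t[-1], num=8)  # 8 equally spaced target times
y = s.reindex(targets, method="nearest")   # pick the closest existing sample, no interpolation
print(y.tolist())

Note that when the target spacing is much finer than the local data spacing, method="nearest" can return the same sample more than once.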
I have a dataset, a CSV file, representing a wave as shown below. I would like to find the frequency of the oscillations, so I computed an FFT, but its output peaks at zero. I am new to Python and FFTs, so I am not sure what I am doing wrong.
The data is captured at 300 Hz (300 data points per second). The data set contains 6317 values.
[image1]
Every peak has a wave following it. Here is an example over data points 250 to 350:
[image2]
import csv
import numpy as np
import matplotlib.pyplot as plt

# read the single row of 6317 samples from the csv
with open('./abc.csv') as csvfile:
    readdata = next(csv.reader(csvfile))
data1 = np.array(readdata, dtype=float)

sp = np.fft.fft(data1)
sp_mag = np.abs(sp) / data1.size        # normalized magnitude spectrum
freq = np.fft.fftfreq(data1.shape[-1])  # frequencies in cycles per sample

plt.subplot(2, 1, 1)
plt.plot(data1)
plt.subplot(2, 1, 2)
plt.plot(freq, sp_mag)
plt.show()
The csv is available here.
The frequency associated with the first three peaks is the same, and likewise for the next three, so in the FFT I expect two peaks at two different frequencies.
Any help is really appreciated. Kindly let me know if any other data is needed to answer this question.
The value of the FFT at 0 is proportional to the sum of the data. Probably the easiest fix is to subtract off the mean of the data before taking the FFT (assuming you don't care about the constant offset).
Adopting the notation from Wikipedia:
X[m] = sum_{n=0}^{N-1} x[n] * exp(-i*2*pi*n*m/N)
(X is the FFT, x is the original data.)
For m = 0, the exponential factors are all equal to 1, so X[0] == sum_n x[n] (for this convention on where to put the normalization factors).
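As a quick illustration with made-up numbers (a 5 Hz tone on a large constant offset, at the 300 Hz rate from the question), subtracting the mean moves the dominant peak off zero:

import numpy as np

fs = 300.0                                    # sampling rate from the question, in Hz
t = np.arange(6317) / fs
data = 10.0 + np.sin(2 * np.pi * 5.0 * t)     # hypothetical: constant offset plus a 5 Hz tone

demeaned = data - data.mean()                 # removes the X[0] (DC) term
sp_mag = np.abs(np.fft.fft(demeaned)) / data.size
freq = np.fft.fftfreq(data.size, d=1.0 / fs)  # frequencies in Hz

print(freq[np.argmax(sp_mag[:data.size // 2])])  # prints ~5.0, not 0.0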
I am receiving data from a vibration sensor in the form of two numpy arrays. The first array represents the actual values for the vibration measurement and the second array is the corresponding time information (timestamp). For example:
vibration_data = np.array([621, 1546, 262])
timestamps = np.array([1592583531, 1592583548, 1592583555])
That means that for every vibration measurement I have the time information.
I would now like to apply the Fast Fourier Transform. Does anyone know how to do that? My first try would be something like this:
import numpy as np
from scipy.fft import fft

N = 600          # number of samples
T = 1.0 / 800.0  # assumed sample spacing, in seconds
x = np.linspace(0.0, N * T, N)
yf = fft(vibration_data)
xf = np.linspace(0.0, 1.0 / (2.0 * T), N // 2)  # N // 2: linspace needs an integer count
But I don't know how to deal with the time information from my time array here.
This is really a signal processing question, not a Python one. You have several options here:
if your data is uniformly sampled - you can ignore the timestamps altogether. All the information you need is in the data, and the (constant) sampling frequency: f_s = 1.0 / (timestamps[1] - timestamps[0])
if not, you can either:
use a non-uniform DFT (here is one implementation; I haven't tried it)
interpolate the data between the non-uniform timestamps so it becomes uniform, as in the sketch after this list. Note that, effectively, this applies a low-pass filter to your data, which may not be what you want (more on the effects of interpolation here).
In all cases, when you perform FFT, time information is not required anymore, as you are in the frequency domain.
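Here is a rough sketch of both options, using the arrays from the question (the 1 Hz uniform rate in the interpolation branch is an arbitrary assumption):

import numpy as np

vibration_data = np.array([621, 1546, 262], dtype=float)
timestamps = np.array([1592583531, 1592583548, 1592583555], dtype=float)

# uniform case: the constant sampling frequency is all you need
# fs = 1.0 / (timestamps[1] - timestamps[0])

# non-uniform case: interpolate onto a uniform grid first (this low-passes the data)
fs = 1.0                                             # chosen uniform rate in Hz (assumption)
t_uniform = np.arange(timestamps[0], timestamps[-1], 1.0 / fs)
x_uniform = np.interp(t_uniform, timestamps, vibration_data)

spectrum = np.fft.rfft(x_uniform)
freqs = np.fft.rfftfreq(x_uniform.size, d=1.0 / fs)  # frequencies in Hz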
I am reading a CSV file in Python and preparing a dataframe out of it. I have a Microsoft Kinect which records an arm-abduction exercise and generates this CSV file.
I have this array of Y coordinates of the ElbowLeft joint. You can visualize this here. Now, I want to come up with a solution that can count the number of peaks (local maxima) in this array.
Can someone please help me to solve this problem?
You can use the find_peaks_cwt function from the scipy.signal module to find peaks within 1-D arrays:
from scipy import signal
import numpy as np
y_coordinates = np.array(y_coordinates)  # convert your 1-D array to a numpy array if it's not, otherwise omit this line
max_peak_width = 10                      # upper bound on the expected peak width, in samples; tune to your data
peak_widths = np.arange(1, max_peak_width)
peak_indices = signal.find_peaks_cwt(y_coordinates, peak_widths)
peak_count = len(peak_indices)           # the number of peaks in the array
More information here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html
It's easy: put the data in a 1-D array and compare each value with its neighbors; it is a peak when the values at n-1 and n+1 are both smaller than the value at n.
Read data as Robert Valencia suggests
max_local = 0
for u in range(1, len(data) - 1):
    if data[u] > data[u - 1] and data[u] > data[u + 1]:  # strictly above both neighbors
        max_local += 1
You could smooth the data with a smoothing filter and then find all values where the value before and the value after are less than the current value. This assumes you want all peaks in the sequence. The reason you need the smoothing filter is to avoid spurious local maxima caused by noise; the level of smoothing required will depend on the noise present in your data.
A simple smoothing filter sets the current value to the average of the N values before and N values after the current value in your sequence along with the current value being analyzed.
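A minimal sketch of that idea (the half-width n of the moving average is an assumption you would tune to your noise level):

import numpy as np

def count_peaks(y, n=2):
    """Count strict local maxima after a (2*n + 1)-point moving average."""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    smooth = np.convolve(y, kernel, mode="same")  # simple smoothing filter
    interior = smooth[1:-1]
    # a peak: larger than both its left and right neighbor
    return int(np.sum((interior > smooth[:-2]) & (interior > smooth[2:])))

For the Kinect data above you would call count_peaks(y_coordinates) and increase n until the count stabilizes.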
I am working on some code optimization. Currently I use the pandas rolling mean and rolling std to compute normalized cross-correlations of time-series data from seismic instruments. For non-pertinent technical reasons I am only interested in every Nth value of the output of these rolling calls, so I am looking for a way to compute only every Nth value. I may have to write Cython code to do this, but I would prefer not to. Here is an example:
import pandas as pd
import numpy as np
As = 5000  # array size
ws = 150   # moving window size ("as" is a reserved word, so it can't be a variable name)
N = 3      # only interested in every Nth value of the output array
ar = np.random.rand(As)                          # generate a generic random array
RSTD = pd.Series(ar).rolling(ws).std()[ws - 1:]  # drop the NaNs from before the windows fully overlap
foo = RSTD[::N]                                  # use indexing to decimate RSTD to every Nth value
Is there a good pandas way to only calculate every Nth value of RSTD rather than calculate all the values and decimate?
Thanks
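(For reference, one numpy-only way to compute just those values is a strided view of every Nth window; sliding_window_view needs numpy >= 1.20, and the names follow the example above:)

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

ar = np.random.rand(5000)
ws, N = 150, 3

windows = sliding_window_view(ar, ws)[::N]    # a view of every Nth length-ws window, no copying
rstd_decimated = windows.std(axis=1, ddof=1)  # ddof=1 matches pandas' rolling std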
I looked into many examples of scipy.fft and numpy.fft. Specifically this example Scipy/Numpy FFT Frequency Analysis is very similar to what I want to do. Therefore, I used the same subplot positioning and everything looks very similar.
I want to import data from a file, which contains just one column to make my first test as easy as possible.
My code writes like this:
import numpy as np
import scipy.fftpack as syfp
import matplotlib.pyplot as plt

# Read in data from file here
array = np.loadtxt("data.csv")
length = len(array)

# Create time data for the x axis based on the array length (10 us sample spacing)
x = np.linspace(0.00001, length * 0.00001, num=length)

# Do FFT analysis of the array
FFT = np.fft.fft(array)

# Get the related frequencies
freqs = syfp.fftfreq(array.size, d=(x[1] - x[0]))

# Create subplot windows and show the plot
plt.subplot(211)
plt.plot(x, array)
plt.subplot(212)
plt.plot(freqs, np.log10(np.abs(FFT)), 'x')  # plot the log magnitude, not the raw complex values
plt.show()
The problem is that I always get my peak at exactly zero, which should not be the case at all; it really should appear at around 200 Hz. Even when I plot a smaller frequency range, the biggest peak is still at zero.
As already mentioned, it seems like your signal has a DC component, which will cause a peak at f=0. Try removing the mean with, e.g., arr2 = array - np.mean(array).
Furthermore, for analyzing signals, you might want to try plotting the power spectral density:
import matplotlib.pyplot as plt
import matplotlib.mlab as mlb

Fs = 1.0 / (x[1] - x[0])  # sampling frequency, from the time axis x built above
plt.psd(array, Fs=Fs, detrend=mlb.detrend_mean)
plt.show()
Take a look at the documentation of plt.psd(), since there are quite a lot of options to fiddle with. For investigating how the spectrum changes over time, plt.specgram() comes in handy.