Issues with scipy find_peaks function when used on an inverted dataset - python

The script below is a mixture of Stack Overflow answers on different topics, all closely related to finding peaks in signals. Finding peaks based on prominence, as noted here, works incredibly well, but my issue is that I need to find the lowest point immediately after each peak. The dataset is a fluorescence signal of a plant captured over 14 continuous hours, and the peaks are saturating pulses used to determine saturation under light conditions. A picture of the dataset (a 68 MB CSV file) is below:
This is my python script:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
# A parser is required to translate the timestamp
custom_date_parser = lambda x: datetime.strptime(x, "%d-%m-%Y %H:%M_%S.%f")
df = pd.read_csv('15-01-2022_05_00.csv', parse_dates=[ 'Timestamp'], date_parser=custom_date_parser)
x = df['Timestamp']
y = df['Mean_values']
# As per accepted answer here:
#https://stackoverflow.com/questions/1713335/peak-finding-algorithm-for-python-scipy
peaks, _ = find_peaks(y, prominence=1)
# Invert the data to find the lowest points of peaks as per answer here:
#https://stackoverflow.com/questions/61365881/is-there-an-opposite-version-of-scipy-find-peaks
valleys, _ = find_peaks(-y, prominence=1)
print(y[peaks])
print(y[valleys])
plt.subplot(2, 1, 1)
plt.plot(peaks, y[peaks], "ob"); plt.plot(y); plt.legend(['Prominence'])
plt.subplot(2, 1, 2)
plt.plot(valleys, y[valleys], "vg"); plt.plot(y); plt.legend(['Prominence Inverted'])
plt.show()
As you can see in the picture, not all the 'prominence inverted' points are below their respective peak. The inverted-prominence approach comes from this post here and simply inverts the dataset. Some valleys are adjacent to the previous peak (difficult to see in the picture). Peaks and valleys below:
Peaks
1817 109.587178
3674 89.191393
56783 72.779385
111593 77.868118
166403 83.288949
221213 84.955026
276023 84.340550
330833 83.186605
385643 81.134827
440453 79.060960
495264 77.457803
550074 76.292243
604884 75.867575
659694 75.511924
714504 74.221657
769314 73.830941
824125 76.977637
878935 78.826169
933745 77.819844
988555 77.298089
1043365 77.188105
1098175 75.340765
1152985 74.311185
1207796 73.163844
1262606 72.613317
1317416 73.460068
1372226 70.388324
1427036 70.835355
1481845 70.154085
Valleys
2521 4.669368
56629 26.551585
56998 26.184984
111791 26.288734
166620 27.717165
221434 28.312708
330432 28.235397
385617 27.535091
440341 26.886589
495174 26.379043
549353 26.040947
550239 25.760023
605051 25.594147
714352 25.354300
714653 25.008184
769472 24.883584
824284 25.135316
879075 25.477464
933907 25.265173
988711 25.160046
1097917 25.058851
1098333 24.626667
1153134 24.357835
1207943 23.982878
1262750 23.938298
1371013 23.766077
1372381 23.351263
1427187 23.368314
Any ideas about this awkward result on the valleys?

You complicate your task by trying to find all valleys. This will always be difficult because they do not stand out as well as your peaks in comparison to the surrounding data. Whatever your parameters for find_peaks, sometimes it will identify two valleys after a peak, sometimes none. Instead, just identify the local minimum after each peak:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
#sample data
from scipy.misc import electrocardiogram
x = electrocardiogram()[2000:4000]
date_range = pd.date_range("20210116", periods=x.size, freq="10ms")
df = pd.DataFrame({"Timestamp": date_range, "Mean_values": x})
x = df['Timestamp']
y = df['Mean_values']
fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12, 8))
#peak finding
peaks, _ = find_peaks(y, prominence=1)
ax1.plot(x[peaks], y[peaks], "ob")
ax1.plot(x, y)
ax1.legend(['Prominence'])
#valley finder general
valleys, _ = find_peaks(-y, prominence=1)
ax2.plot(x[valleys], y[valleys], "vg")
ax2.plot(x, y)
ax2.legend(['Valleys without filtering'])
#valley finding restricted to a short time period after a peak
#set time window, e.g., for 200 ms
time_window_size = pd.Timedelta(200, unit="ms")
time_of_peaks = x[peaks]
peak_end = x.searchsorted(time_of_peaks + time_window_size)
#in case of evenly spaced data points, this can be simplified
#and you just add n data points to your peak index array
#peak_end = peaks + n
true_valleys = peaks.copy()
for i, (start, stop) in enumerate(zip(peaks, peak_end)):
    true_valleys[i] = start + y[start:stop].argmin()
ax3.plot(x[true_valleys], y[true_valleys], "sr")
ax3.plot(x, y)
ax3.legend(['Valleys after events'])
plt.show()
Sample output:
I am not sure what you intend to do with these minima, but if you are only interested in baseline shifts, you can directly calculate the peak-wise baseline values like
baseline_per_peak = peaks.copy().astype(float)
for i, (start, stop) in enumerate(zip(peaks, peak_end)):
    baseline_per_peak[i] = y[start:stop].mean()
print(baseline_per_peak)
Sample output:
[-0.71125 -0.203 0.29225 0.72825 0.6835 0.79125 0.51225 0.23
0.0345 -0.3945 -0.48125 -0.4675 ]
This can, of course, also easily be adapted to the period before the peak:
#valley in the short time period before a peak
#set time window, e.g., for 200 ms
time_window_size = pd.Timedelta(200, unit="ms")
time_of_peaks = x[peaks]
peak_start = x.searchsorted(time_of_peaks - time_window_size)
#in case of evenly spaced data points, this can be simplified
#and you just add n data points to your peak index array
#peak_start = peaks - n
true_valleys = peaks.copy()
for i, (start, stop) in enumerate(zip(peak_start, peaks)):
    true_valleys[i] = start + y[start:stop].argmin()

Related

Is there a way I can find the range of local maxima of histogram?

I'm wondering if there's a way I can find the range of local maxima of a histogram. For instance, suppose I have the following histogram (just ignore the orange curve):
The histogram is actually obtained from a dictionary. I'm hoping to find the range of local maxima of this histogram (on the horizontal axis), which are, say, 1.3-1.6, and 2.1-2.4 in this case. I have no idea which tools would be helpful or which techniques I may want to use. I know there's a tool to find local maxima of a 1-D array:
from scipy.signal import argrelextrema
x = np.random.random(12)
argrelextrema(x, np.greater)
but I don't think it would work here since I'm looking for a range, and there are some 'wiggles' on the histogram. Can anyone give me some suggestions/examples about how I can obtain the ranges I'm looking for? Thanks a lot for the help.
PS: I'm trying not to just search for the ranges of x whose y values are above a certain limit :)
I don't know if I correctly understand what you want to do, but you can treat the histogram as a Probability Density Function (PDF) of a bimodal distribution, then find the modes and the Highest Density Intervals (HDIs) around the two modes.
So, I create some sample data
import numpy as np
import pandas as pd
import scipy.stats as sps
from scipy.signal import find_peaks, argrelextrema
import matplotlib.pyplot as plt
d1 = sps.norm(loc=1.3, scale=.2)
d2 = sps.norm(loc=2.2, scale=.3)
r1 = d1.rvs(size=5000, random_state=1)
r2 = d2.rvs(size=5000, random_state=1)
r = np.concatenate((r1, r2))
h = plt.hist(r, bins=100, density=True);
We have h, the result of the hist function, which contains the densities (100 values) and the bin edges (101 values).
print(h[0].size)
100
print(h[1].size)
101
So we first need to compute the midpoint of each bin
density = h[0]
values = h[1][:-1] + np.diff(h[1])[0] / 2
plt.hist(r, bins=100, density=True, alpha=.25)
plt.plot(values, density);
Now we can normalize the PDF (so that it sums to 1) and smooth the data with a moving average, which we'll use only to locate the peaks (maxima) and minima
norm_density = density / density.sum()
norm_density_ma = pd.Series(norm_density).rolling(7, center=True).mean().values
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density);
Now we can obtain indexes of maxima
peaks = find_peaks(norm_density_ma)[0]
peaks
array([24, 57])
and minima
minima = argrelextrema(norm_density_ma, np.less)[0]
minima
array([40])
and check they're correct
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
for peak in peaks:
    plt.axvline(values[peak], color='r')
plt.axvline(values[minima], color='k', ls='--');
Finally, we have to find the HDIs around the two modes (peaks) from the normalized histogram data h. We can use a simple function to get the HDI of a grid approximation (see HDI_of_grid below and Doing Bayesian Data Analysis by John K. Kruschke for details)
def HDI_of_grid(probMassVec, credMass=0.95):
    sortedProbMass = np.sort(probMassVec, axis=None)[::-1]
    HDIheightIdx = np.min(np.where(np.cumsum(sortedProbMass) >= credMass))
    HDIheight = sortedProbMass[HDIheightIdx]
    HDImass = np.sum(probMassVec[probMassVec >= HDIheight])
    idx = np.where(probMassVec >= HDIheight)[0]
    return {'indexes': idx, 'mass': HDImass, 'height': HDIheight}
Let's say we want the HDIs to contain a mass of 0.3
# HDI around the 1st mode
hdi1 = HDI_of_grid(norm_density, credMass=.3)
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']],
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')
for the 2nd mode, we'll get HDI from minima to avoid the 1st mode
# HDI around the 2nd mode
hdi2 = HDI_of_grid(norm_density[minima[0]:], credMass=.3)
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']],
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
plt.fill_between(
    values[hdi2['indexes']+minima],
    0, norm_density[hdi2['indexes']+minima],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')
And we have the values of the two HDIs
# 1st mode
values[peaks[0]]
1.320249129265321
# 0.3 HDI
values[hdi1['indexes']].take([0, -1])
array([1.12857599, 1.45715851])
# 2nd mode
values[peaks[1]]
2.2238510564735363
# 0.3 HDI
values[hdi2['indexes']+minima].take([0, -1])
array([1.95003229, 2.47028795])

How to validate the downsampling is as intended

How can I validate whether the downsampled output is correct? For example, I made an example below, but I am not sure whether the output is right or not.
Any ideas on the validation?
Code
import numpy as np
import matplotlib.pyplot as plt # For ploting
from scipy import signal
import mne
fs = 100 # sample rate
rsample=50 # downsample frequency
fTwo=400 # frequency of the signal
x = np.arange(fs)
y = [ np.sin(2*np.pi*fTwo * (i/fs)) for i in x]
f_res = signal.resample(y, rsample)
xnew = np.linspace(0, 100, f_res.size, endpoint=False)
#
# ##############################
#
plt.figure(1)
plt.subplot(211)
plt.stem(x, y)
plt.subplot(212)
plt.stem(xnew, f_res, 'r')
plt.show()
Plotting the data is a good first take at verification. Here I made a regular plot with the points connected by lines. The lines are useful since they give a guide for where you expect the down-sampled data to lie, and they also emphasize what the down-sampled data is missing. (It would also work to only show lines for the original data, but lines, as in a stem plot, are too confusing, imho.)
import numpy as np
import matplotlib.pyplot as plt # For ploting
from scipy import signal
fs = 100 # sample rate
rsample=43 # downsample frequency
fTwo=13 # frequency of the signal
x = np.arange(fs, dtype=float)
y = np.sin(2*np.pi*fTwo * (x/fs))
print(y)
f_res = signal.resample(y, rsample)
xnew = np.linspace(0, 100, f_res.size, endpoint=False)
#
# ##############################
#
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xnew, f_res, 'or')
plt.show()
A few notes:
If you're trying to make a general algorithm, use non-rounded numbers, otherwise you could easily introduce bugs that don't show up when things are even multiples. Similarly, if you need to zoom in to verify, go to a few random places, not, for example, only the start.
Note that I changed fTwo to be significantly less than the number of samples. You need more than one data point per oscillation if you want to make sense of it.
I also removed the loop for calculating y: in general, you should try to vectorize calculations when using numpy (a quick check is sketched below).
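As a quick illustration of that last note (not from the original answer), the list-comprehension loop from the question and the vectorized form produce the same values:
import numpy as np

fs = 100
fTwo = 13
x = np.arange(fs, dtype=float)

y_loop = np.array([np.sin(2 * np.pi * fTwo * (i / fs)) for i in x])  # loop/list-comprehension form
y_vec = np.sin(2 * np.pi * fTwo * (x / fs))                          # vectorized form

print(np.allclose(y_loop, y_vec))   # True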
The spectrum of the resampled signal should have a tone at the same frequency as the input signal just in a smaller nyquist bandwidth.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
import scipy.fftpack as fft
fs = 100 # sample rate
rsample=50 # downsample frequency
fTwo=10 # frequency of the signal
n = np.arange(1024)
y = np.sin(2*np.pi*fTwo/fs*n)
y_res = signal.resample(y, len(n)//2)  # integer number of output samples
Y = fft.fftshift(fft.fft(y))
f = -fs*np.arange(-512, 512)/1024
Y_res = fft.fftshift(fft.fft(y_res, 1024))
f_res = -fs/2*np.arange(-512, 512)/1024
plt.figure(1)
plt.subplot(211)
plt.stem(f, abs(Y))
plt.subplot(212)
plt.stem(f_res, abs(Y_res))
plt.show()
The tone is still at 10.
If you downsample a signal, both signals will still have exactly the same value at a given time, so just loop through "time" and check that the values are the same. In your case you go from a sample rate of 100 to 50. Assuming you have 1 second's worth of data from building your x from fs, just loop through t = 0 to t = 1 in 1/50th increments and make sure that Yd(t) = Ys(t), where Yd is the downsampled signal and Ys is the original signal. Or, to put it simply, Yd(n) = Ys(2n) for n = 0, 1, 2, ..., total_samples/2 - 1.
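Here is a minimal sketch of that check (not from the original answer). For plain decimation, i.e. keeping every 2nd sample, the relation Yd(n) = Ys(2n) holds exactly; an FFT-based resample such as scipy.signal.resample only approximates it for band-limited signals, so a tolerance would be needed in that case.
import numpy as np

fs = 100                                # original sample rate
fTwo = 10                               # tone frequency, below the new Nyquist of 25 Hz
n = np.arange(1024)
ys = np.sin(2 * np.pi * fTwo / fs * n)  # original signal Ys, sampled at 100 Hz

yd = ys[::2]                            # downsampled signal Yd at 50 Hz

# loop through the downsampled samples and check Yd(n) == Ys(2n)
ok = all(yd[k] == ys[2 * k] for k in range(yd.size))
print(ok)   # True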

Analyzing seasonality of Google trend time series using FFT

I am trying to evaluate the amplitude spectrum of a Google Trends time series using a fast Fourier transform. If you look at the data for 'diet' in the data provided here, it shows a very strong seasonal pattern:
I thought I could analyze this pattern using an FFT, which presumably should have a strong peak for a period of 1 year.
However, when I apply an FFT like this (a_gtrend_ham being the time series multiplied with a Hamming window):
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import fft, fftshift
import pandas as pd
gtrend = pd.read_csv('multiTimeline.csv',index_col=0)
gtrend.index = pd.to_datetime(gtrend.index, format='%Y-%m')
# Sampling rate
fs = 12 #Points per year
a_gtrend_orig = gtrend['diet: (Worldwide)']
N_gtrend_orig = len(a_gtrend_orig)
length_gtrend_orig = N_gtrend_orig / fs
t_gtrend_orig = np.linspace(0, length_gtrend_orig, num = N_gtrend_orig, endpoint = False)
a_gtrend_sel = a_gtrend_orig.loc['2005-01-01 00:00:00':'2017-12-01 00:00:00']
N_gtrend = len(a_gtrend_sel)
length_gtrend = N_gtrend / fs
t_gtrend = np.linspace(0, length_gtrend, num = N_gtrend, endpoint = False)
a_gtrend_zero_mean = a_gtrend_sel - np.mean(a_gtrend_sel)
ham = np.hamming(len(a_gtrend_zero_mean))
a_gtrend_ham = a_gtrend_zero_mean * ham
N_gtrend = len(a_gtrend_ham)
ampl_gtrend = 1/N_gtrend * abs(fft(a_gtrend_ham))
mag_gtrend = fftshift(ampl_gtrend)
freq_gtrend = np.linspace(-0.5, 0.5, len(ampl_gtrend))
response_gtrend = 20 * np.log10(mag_gtrend)
response_gtrend = np.clip(response_gtrend, -100, 100)
My resulting amplitude spectrum does not show any dominant peak:
Where is my misunderstanding of how to use the FFT to get the spectrum of the data series?
Here is a clean implementation of what I think you are trying to accomplish. I include graphical output and a brief discussion of what it likely means.
First, we use the rfft() because the data is real valued. This saves time and effort (and reduces the bug rate) that otherwise follows from generating the redundant negative frequencies. And we use rfftfreq() to generate the frequency list (again, it is unnecessary to hand code it, and using the api reduces the bug rate).
For your data, the Tukey window is more appropriate than the Hamming and similar cos or sin based window functions. Notice also that we subtract the median before multiplying by the window function. The median() is a fairly robust estimate of the baseline, certainly more so than the mean().
In the graph you can see that the data falls quickly from its initial value and then ends low. The Hamming and similar windows sample the middle too narrowly for this and needlessly attenuate a lot of useful data.
For the FT graphs, we skip the zero frequency bin (the first point) since this only contains the baseline and omitting it provides a more convenient scaling for the y-axes.
You will notice some high frequency components in the graph of the FT output.
I include a sample code below that illustrates a possible origin of those high frequency components.
Okay here is the code:
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import rfft, rfftfreq
from scipy.signal.windows import tukey  # scipy.signal.tukey was removed in newer SciPy versions
import pandas as pd
gtrend = pd.read_csv('multiTimeline.csv',index_col=0,skiprows=2)
#print(gtrend)
gtrend.index = pd.to_datetime(gtrend.index, format='%Y-%m')
#print(gtrend.index)
a_gtrend_orig = gtrend['diet: (Worldwide)']
t_gtrend_orig = np.linspace( 0, len(a_gtrend_orig)/12, len(a_gtrend_orig), endpoint=False )
a_gtrend_windowed = (a_gtrend_orig-np.median( a_gtrend_orig ))*tukey( len(a_gtrend_orig) )
plt.subplot( 2, 1, 1 )
plt.plot( t_gtrend_orig, a_gtrend_orig, label='raw data' )
plt.plot( t_gtrend_orig, a_gtrend_windowed, label='windowed data' )
plt.xlabel( 'years' )
plt.legend()
a_gtrend_psd = abs(rfft( a_gtrend_orig ))
a_gtrend_psdtukey = abs(rfft( a_gtrend_windowed ) )
# Notice that we assert the delta-time here,
# It would be better to get it from the data.
a_gtrend_freqs = rfftfreq( len(a_gtrend_orig), d = 1./12. )
# For the PSD graph, we skip the first point (the zero-frequency bin); this brings us to a more useful scale.
# That point represents the baseline (or mean), and is usually not relevant to the analysis
plt.subplot( 2, 1, 2 )
plt.plot( a_gtrend_freqs[1:], a_gtrend_psd[1:], label='psd raw data' )
plt.plot( a_gtrend_freqs[1:], a_gtrend_psdtukey[1:], label='windowed psd' )
plt.xlabel( 'frequency ($yr^{-1}$)' )
plt.legend()
plt.tight_layout()
plt.show()
And here is the output displayed graphically. There are strong signals at 1/year and at 0.14 (which happens to be 1/2 of 1/14 yrs), and there is a set of higher frequency signals that at first perusal might seem quite mysterious.
We see that the windowing function is actually quite effective in bringing the data to baseline and you see that the relative signal strengths in the FT are not altered very much by applying the window function.
If you look at the data closely, there seems to be some repeated variations within the year. If those occur with some regularity, they can be expected to appear as signals in the FT, and indeed the presence or absence of signals in the FT is often used to distinguish between signal and noise. But as will be shown, there is a better explanation for the high frequency signals.
Okay, now here is a sample code that illustrates one way those high frequency components can be produced. In this code, we create a single tone, and then we create a set of spikes at the same frequency as the tone. Then we Fourier transform the two signals and finally, graph the raw and FT data.
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import rfft, rfftfreq
t = np.linspace( 0, 1, 1000 )
y = np.cos( 50*3.14*t )
y2 = [ 1. if 1.-v < 0.01 else 0. for v in y ]
plt.subplot( 2, 1, 1 )
plt.plot( t, y, label='tone' )
plt.plot( t, y2, label='spikes' )
plt.xlabel('time')
plt.subplot( 2, 1, 2 )
plt.plot( rfftfreq(len(y),d=1/100.), abs( rfft(y) ), label='tone' )
plt.plot( rfftfreq(len(y2),d=1/100.), abs( rfft(y2) ), label='spikes' )
plt.xlabel('frequency')
plt.legend()
plt.tight_layout()
plt.show()
Okay, here are the graphs of the tone, and the spikes, and then their Fourier transforms. Notice that the spikes produce high frequency components that are very similar to those in our data.
In other words, the origin of the high frequency components is very likely in the short time scales associated with the spikey character of signals in the raw data.

Considering Highest Peak Curve From Two Sets of Data Points

I have two columns which correspond to the x and y axes, and I will eventually graph those data points as a curve-like graph.
The problem is that, based on the nature of the data points, when graphing them I end up with two peaks; however, I want to keep only the highest peak when graphing and discard the lower peak(s) (not just the highest point, but the entire highest peak).
Is there a way to do that in Python? I don't show any code here because I am not sure how to approach the coding at all.
Here are the data points (input) as well as the graph!
You can use scipy argrelextrema to get all the peaks, work out the maximum, and then build up a mask array for the peak you want to plot. This gives you full control based on your data, using things like mincutoff to work out what counts as a separate peak:
import numpy as np
from scipy.signal import argrelextrema
import matplotlib.pyplot as plt
#Setup and plot data
fig, ax = plt.subplots(1,2)
y = np.array([0,0,0,0,0,6.14,7.04,5.6,0,0,0,0,0,0,0,0,0,0,0,16.58,60.06,99.58,100,50,0.,0.,0.])
x = np.linspace(3.92,161,y.size)
ax[0].plot(x,y)
#get peaks
peaks_indx = argrelextrema(y, np.greater)[0]
peaks = y[peaks_indx]
ax[0].plot(x[peaks_indx],y[peaks_indx],'o')
#Get maxpeak
maxpeak = 0.
for p in peaks_indx:
    print(p)
    if y[p] > maxpeak:
        maxpeak = y[p]
        maxpeak_indx = p
#Get mask of data around maxpeak to plot
mincutoff = 0.
indx_to_plot = np.zeros(y.size, dtype=bool)
for i in range(maxpeak_indx):
    if y[maxpeak_indx-i] > mincutoff:
        indx_to_plot[maxpeak_indx-i] = True
    else:
        indx_to_plot[maxpeak_indx-i] = True
        break
for i in range(y.size-maxpeak_indx):
    if y[maxpeak_indx+i] > mincutoff:
        indx_to_plot[maxpeak_indx+i] = True
    else:
        indx_to_plot[maxpeak_indx+i] = True
        break
ax[1].plot(x[indx_to_plot],y[indx_to_plot])
plt.show()
The result is then,
UPDATE: Code to plot only the largest peak.
import numpy as np
from scipy.signal import argrelextrema
import matplotlib.pyplot as plt
#Setup and plot data
y = np.array([0,0,0,0,0,6.14,7.04,5.6,0,0,0,0,0,0,
              0,0,0,0,0,16.58,60.06,99.58,100,50,0.,0.,0.])
x = np.linspace(3.92,161,y.size)
#get peaks
peaks_indx = argrelextrema(y, np.greater)[0]
peaks = y[peaks_indx]
#Get maxpeak
maxpeak = 0.
for p in peaks_indx:
    print(p)
    if y[p] > maxpeak:
        maxpeak = y[p]
        maxpeak_indx = p
#Get mask of data around maxpeak to plot
mincutoff = 0.
indx_to_plot = np.zeros(y.size, dtype=bool)
for i in range(maxpeak_indx):
    if y[maxpeak_indx-i] > mincutoff:
        indx_to_plot[maxpeak_indx-i] = True
    else:
        indx_to_plot[maxpeak_indx-i] = True
        break
for i in range(y.size-maxpeak_indx):
    if y[maxpeak_indx+i] > mincutoff:
        indx_to_plot[maxpeak_indx+i] = True
    else:
        indx_to_plot[maxpeak_indx+i] = True
        break
#Plot just the highest peak
plt.plot(x[indx_to_plot],y[indx_to_plot])
plt.show()
I would still suggest plotting both peaks to ensure the algorithm is working correctly... I think you will find that identifying an arbitrary peak is probably not always trivial with messy data.

How to split dataframe according to intersection point in Python?

I am working on a project which aims to show the difference between good form and bad form of an exercise. To do this we collected acceleration data with a wrist-based accelerometer. The image above shows 2 sets of a fitness exercise (bench press). Each set has 10 repetitions, and the image below shows the 10 repetitions of one set. I have a raw dataset which consists of 10 sets of an exercise. What I want to do is split the raw data into 10 parts, each containing the portion between two black lines in the image above, so I can analyze the data easily. My supervisor gave me a starting point, which is choosing a cutpoint in each set: take a cutpoint, find the first interruption time, start cutting 3 seconds before that time, count to 10, and finish cutting.
This is an idea that I don't know how to apply. At the very least, if you can tell me how to cut a dataframe according to a cutpoint, I would be grateful.
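A minimal sketch of the cutting idea described above (not from the question or the answer below; the column name, sampling rate, cutpoint, and segment length are hypothetical), assuming the accelerometer data has a DatetimeIndex:
import numpy as np
import pandas as pd

# hypothetical accelerometer trace sampled at 100 Hz, starting at 11:24
idx = pd.date_range("2022-01-15 11:24:00", periods=6000, freq="10ms")
df = pd.DataFrame({"y": np.random.randn(len(idx))}, index=idx)

cutpoint = pd.Timestamp("2022-01-15 11:24:30")   # chosen cutpoint (e.g. the first interruption time)
start = cutpoint - pd.Timedelta(seconds=3)       # start cutting 3 s before it
end = start + pd.Timedelta(seconds=15)           # assumed length of one segment

segment = df.loc[start:end]                      # rows between the two timestamps
print(segment.shape)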
Well, I found another way to detect the periodic parts of my accelerometer data. So, here is my code:
import numpy as np
from peakdetect import peakdetect
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib import style
import pandas as pd
style.use('ggplot')
def get_periodic(path):
    periodics = []
    # DataFrame.from_csv was removed in newer pandas; read_csv with these
    # options is the documented equivalent
    data_frame = pd.read_csv(path, index_col=0, parse_dates=True)
    data_frame.columns = ['z', 'y', 'x']
    if path.__contains__('1'):
        if path.__contains__('bench'):
            bench_press_1_week = data_frame.between_time('11:24', '11:52')
            peak_indexes = get_peaks(bench_press_1_week.y, lookahead=3000)
            for i in range(0, len(peak_indexes)):
                time_indexes = bench_press_1_week.index.tolist()
                start_time = time_indexes[0]
                # a pandas Timestamp can be shifted directly with a timedelta
                periodic_start = start_time + dt.timedelta(0, peak_indexes[i] / 100)
                periodic_end = periodic_start + dt.timedelta(0, 60)
                periodic = bench_press_1_week.between_time(periodic_start.time(), periodic_end.time())
                periodics.append(periodic)
    return periodics
def get_peaks(data, lookahead):
    peak_indexes = []
    correlation = np.correlate(data, data, mode='full')
    # integer division so the slice index is an int under Python 3
    realcorr = correlation[correlation.size // 2:]
    maxpeaks, minpeaks = peakdetect(realcorr, lookahead=lookahead)
    for i in range(0, len(maxpeaks)):
        peak_indexes.append(maxpeaks[i][0])
    return peak_indexes
def show_segment_plot(data, periodic_area, exercise_name):
    plt.figure(8)
    gs = gridspec.GridSpec(7, 2)
    ax = plt.subplot(gs[:2, :])
    plt.title(exercise_name)
    ax.plot(data)
    k = 0
    for i in range(2, 7):
        for j in range(0, 2):
            ax = plt.subplot(gs[i, j])
            title = "{} {}".format(k + 1, ".Set")
            plt.title(title)
            ax.plot(periodic_area[k])
            k = k + 1
    plt.show()
Firstly, this question gave me another perspective on my problem. The image below shows the raw accelerometer data of the bench press with 10 sets. It has 3 axes (x, y, z) and its major axis is y (blue in the image).
I used the autocorrelation function to detect the periodic parts. In the image above, every peak represents one set of exercises. With this peak detection algorithm I found each peak's x-axis value:
In[196]: maxpeaks
Out[196]:
[[16204, 32910.14013671875],
[32281, 28726.95849609375],
[48515, 24583.898681640625],
[64436, 22088.130859375],
[80335, 19582.248291015625],
[96699, 16436.567626953125],
[113081, 12100.027587890625],
[129027, 8098.98486328125],
[145184, 5387.788818359375]]
Basically, each x-value represents a sample index. My sampling frequency was 100 Hz, so 16204/100 = 162.04 seconds. To find the time of each periodic part I added 162.04 s to the start time. Each bench press set took approximately 1 minute, and in this example the exercise's start time was 11:24, so the first periodic part starts at about 11:26 and ends 1 minute later. There is some lag, but this is the best solution that I found.
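As a small illustration of that conversion (not from the original post; the start time is hypothetical, while the sampling rate and first peak index are taken from the text above):
import datetime as dt

sampling_rate = 100                                   # Hz
start_time = dt.datetime(2016, 1, 4, 11, 24)          # hypothetical exercise start at 11:24
peak_sample = 16204                                   # first autocorrelation peak from the output above

offset = dt.timedelta(seconds=peak_sample / sampling_rate)   # 162.04 s
periodic_start = start_time + offset                         # about 11:26:42
periodic_end = periodic_start + dt.timedelta(minutes=1)      # each set lasts roughly 1 minute
print(periodic_start.time(), periodic_end.time())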
