I am trying to extract frequency features from EMG data in Python with a sliding window. I do not have much knowledge of frequency analysis, so I apologize in advance if I've got some concepts wrong.
I am trying to follow the definitions of median frequency (MDF) and mean frequency (MNF) from a reference website.
My questions are whether I am performing the calculations correctly, and if not, how I can proceed to calculate them.
Window_Size = 0.125
Overlap = 0.5
Bins = 256
START, END = df_filtered['Time'].min(), df_filtered['Time'].max()
Fs = 2000  # EMG sampling frequency
P = 1.0 / Fs
WINDOWS = np.arange(START + Window_Size, END, Window_Size * (1 - Overlap))
FREQ = []
for w in WINDOWS:
    win_start, win_end = w - Window_Size, w
    for var in ['Biceps_Femoris_Sq_Correct']:
        value = df_filtered.loc[(win_start <= df_filtered['Time']) & (df_filtered['Time'] < win_end), var].values
        fft = np.fft.fft(value * np.hamming(value.shape[0]), n=Bins)[1:Bins//2]
        freq = np.fft.fftfreq(Bins, P)[1:Bins//2]
        amp = np.abs(fft)
        energy = amp ** 2  # Is this the right way to calculate the power spectrum?
        mean_freq = np.sum(energy * freq) / np.sum(energy)
For the mean frequency (MNF), the definition I am working from is: the average frequency, calculated as the sum of the product of the EMG power spectrum and the frequency, divided by the total sum of the power spectrum. Is mean_freq above the right way to calculate it?
For the median frequency (MDF), the definition is: the frequency at which the EMG power spectrum is divided into two regions with equal amplitude. My first attempt, median_freq = energy / 2, is clearly not it; I think it should be something like:
        energy_cumsum = np.cumsum(energy)
        MDF = freq[np.where(energy_cumsum > np.max(energy_cumsum) / 2)[0][0]]
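For reference, here is a minimal sketch of the two quantities computed from a one-sided power spectrum estimate. It uses scipy.signal.welch instead of a raw FFT, and signal stands for one window of EMG samples; both are assumptions for illustration, not part of the code above:
from scipy.signal import welch
import numpy as np

# one-sided power spectral density estimate of one window of EMG samples
f, Pxx = welch(signal, fs=2000, nperseg=256)

# MNF: power-weighted average frequency
MNF = np.sum(f * Pxx) / np.sum(Pxx)

# MDF: frequency that splits the total power into two equal halves
cum_power = np.cumsum(Pxx)
MDF = f[np.searchsorted(cum_power, cum_power[-1] / 2)]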
I originally posted this on Physics Stack Exchange, but they requested it be posted here as well.
I am trying to create a known signal with a known wavelength, amplitude, and phase. I then want to break this signal apart into all of its frequencies, find the amplitude, phase, and wavelength for each frequency, and then create an equation for each frequency based on these new wavelengths, amplitudes, and phases. In theory, the equations should be identical to the individual signals. However, they are not. I am almost positive it is an issue with phase, but I cannot figure out how to resolve it. I will post the exact code to reproduce this below. Please help, as my phase, wavelength, and amplitude will vary once I get more complicated signals, so it needs to work for any combination of these.
import numpy as np
from matplotlib import pyplot as plt
from scipy import fftpack
# create signal
time_vec = np.arange(1, 11, 1)
wavelength = 1/.1
phase = 0
amp = 10
created_signal = amp * np.sin((2 * np.pi / wavelength * time_vec) + phase)
# plot it
fig, axs = plt.subplots(2, 1, figsize=(10,6))
axs[0].plot(time_vec, created_signal, label='exact_data')
# get fft and freq array
sig_fft = fftpack.fft(created_signal)
sample_freq = fftpack.fftfreq(created_signal.size, d=1)
# do inverse fft and verify same curve as original signal. This is fine!
filtered_signal = fftpack.ifft(sig_fft)
filtered_signal += np.mean(created_signal)
# create individual signals for each frequency
filtered_signals = []
for i in range(len(sample_freq)):
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) < np.nanmin(sample_freq[i])] = 0
    high_freq_fft[np.abs(sample_freq) > np.nanmax(sample_freq[i])] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    filtered_sig += np.mean(created_signal)
    filtered_signals.append(filtered_sig)
# get phase, amplitude, and wavelength for each individual frequency
sig_size = len(created_signal)
wavelength = []
ph = []
amp = []
indices = []
for j in range(len(sample_freq)):
    wavelength.append(1 / sample_freq[j])
    indices.append(int(sig_size * sample_freq[j]))
for j in indices:
    phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
    ph.append([phase])
    amp.append([np.sqrt((sig_fft[j].real * sig_fft[j].real) + (sig_fft[j].imag * sig_fft[j].imag)) / (sig_size / 2)])
# create an equation for each frequency based on each phase, amp, and wavelength found from above.
def eqn(filtered_si, wavelength, time_vec, phase, amp):
    return amp * np.sin((2 * np.pi / wavelength * time_vec) + phase)

def find_equations(filtered_signals_mean, high_freq_fft, wavelength, filtered_signals, time_vec, ph, amp):
    equations = []
    for i in range(len(wavelength)):
        temp = eqn(filtered_signals[i], wavelength[i], time_vec, ph[i], amp[i])
        equations.append(temp + filtered_signals_mean)
    return equations
filtered_signals_mean = np.abs(np.mean(filtered_signals))
equations = find_equations(filtered_signals_mean, sig_fft, wavelength,
filtered_signals, time_vec, ph, amp)
# at this point each equation, for each frequency should match identically each signal from each frequency,
# however, the phase seems wrong and they do not match!!??
axs[0].plot(time_vec, filtered_signal, '--', linewidth=3, label='filtered_sig_combined')
axs[1].plot(time_vec, filtered_signals[1], label='filtered_sig[-1]')
axs[1].plot(time_vec, equations[1], label='equations[-1]')
axs[0].legend()
axs[1].legend()
fig.tight_layout()
plt.show()
These are the issues with your code:
filtered_signal = fftpack.ifft(sig_fft)
filtered_signal += np.mean(created_signal)
This only works because np.mean(created_signal) is approximately zero. The IFFT already takes the DC component into account, the zero frequency describes the mean of the signal.
filtered_signals = []
for i in range(len(sample_freq)):
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) < np.nanmin(sample_freq[i])] = 0
    high_freq_fft[np.abs(sample_freq) > np.nanmax(sample_freq[i])] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    filtered_sig += np.mean(created_signal)
    filtered_signals.append(filtered_sig)
Here you are, in the first half of the iterations, going through all the frequencies, taking both the negative and positive frequencies into account. For example, when i=1, you take both the -0.1 and the 0.1 frequencies. The second half of the iterations you are applying the IFFT to a zero signal, none of the np.abs(sample_freq) are smaller than zero by definition.
So the filtered_signals[1] contains a sine wave constructed by both the -0.1 and the 0.1 frequency components. This is good. Otherwise it would be a complex-valued function.
for j in range(len(sample_freq)):
    wavelength.append(1 / sample_freq[j])
    indices.append(int(sig_size * sample_freq[j]))
Here the second half of the indices array contains negative values. Not sure what you were planning with this, but it causes subsequent code to index from the end of the array.
for j in indices:
    phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
    ph.append([phase])
    amp.append([np.sqrt((sig_fft[j].real * sig_fft[j].real) + (sig_fft[j].imag * sig_fft[j].imag)) / (sig_size / 2)])
Here, because the indices are not the same as the j in the previous loop, ph[j] doesn't always correspond to wavelength[j]; they refer to values from different frequency components in about half the cases. But we shouldn't be evaluating those cases anyway. The code assumes a real-valued input, for which the magnitude and phase of only the positive frequencies is sufficient to reconstruct the signal. You should skip all the negative frequencies here.
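A minimal sketch of those two loops restricted to the positive frequencies, so that wavelength[j], ph[j], and amp[j] all describe the same component (my rewrite for illustration, not the poster's code):
# keep only the positive frequencies; for a real-valued signal their
# magnitude and phase are sufficient to reconstruct it
pos = sample_freq > 0
wavelength = [1 / f for f in sample_freq[pos]]
amp = [np.abs(c) / (sig_size / 2) for c in sig_fft[pos]]
ph = [np.angle(c) for c in sig_fft[pos]]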
Next, you build sine waves using the collected information, but using a time_vec that starts at 1, not at 0 as the FFT assumes. And therefore the signal is shifted with respect to the expected value. Furthermore, when phase==0, you should create an even signal (i.e. a cosine, not a sine).
Thus, changing the following two lines of code will create the correct output:
time_vec = np.arange(0, 10, 1)
and
def eqn(filtered_si, wavelength, time_vec, phase, amp):
    return amp * np.cos((2 * np.pi / wavelength * time_vec) + phase)
    #                ^^^
Note that these two changes correct the plotted graph, but they don't fix all the other issues in the code discussed above.
I solved this finally after 2 days of frustration. I still have no idea why this is the way it is so any insight would be great. The solution is to use the phase produced by arctan2(Im, Re) and modify it according to this equation.
phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
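# note: ((wavelength[j] / 2 - 2) * np.pi) / wavelength[j] simplifies to
# np.pi / 2 - 2 * np.pi / wavelength[j]: the pi/2 term turns the sine into a
# cosine, and the -2*pi/wavelength[j] term shifts the time origin by one
# sample (time_vec starts at 1, while the FFT assumes it starts at 0)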
formula = ((((wavelength[j]) / 2) - 2) * np.pi) / wavelength[j]
ph.append([phase + formula])
I had to derive this equation from data but I still do not know why this works. Please let me know. Finally!!
I am currently having some issues understanding some discrepancies between the frequency response function calculated through the Z-transform and numpy's FFT algorithm. It is of a simple echo represented by the impulse response:
h[n] = δ[n] + αδ[n-L]
Where α is an attenuation factor for the echo and L is the delay amount in samples. The corresponding transfer function is given by:
H(f) = ( e^(j2πfΔL) + α ) / e^(j2πfΔL)
Where Δ is the sampling period.
I seem to be getting different results using the same number of frequency bins when directly plotting the transfer function magnitude above and when using numpy's fft algorithm.
In particular, the FFT magnitude seems to form an envelope around the overall spectrum. I believe I should be getting a simple comb filter, like the one produced by the transfer function method.
Could anyone clarify why this may be happening and whether I have potentially overlooked anything? Is this due to errors in the DFT algorithms employed?
Appreciate your time, cheers!
import matplotlib.pyplot as pyplt
import numpy as np
fs = 48000 # Sample rate
T = 1/fs # Sample period
L = 3000 # Delay
a = 0.5 # Attenuation factor
# h[n] = dirac[n] + a * dirac[n-L]
h = np.zeros(L)
h[0] = 1
h[L-1] = a
# Transfer function H
freqs = np.arange(0, fs, fs/(2*L))
e = np.exp(1j*freqs*2*np.pi*L*T)
H = (e + a)/(e)
# Transfer function H via FFT - same # of bins
H_FFT = np.fft.fft(h, 2*L)
pyplt.figure()
# Correct comb filter
pyplt.plot(np.abs(H))
# Running the FFT gives a form of strange envelope error
pyplt.plot(np.abs(H_FFT))
pyplt.legend()
Your code is almost OK. What you need to change:
As far as I understand, there is no reason for the length of the Fourier transform vector to be proportional to L. It should match your sampling frequency, i.e. fs.
Your L is too high; the result oscillates too quickly to see the comb shape clearly. Try a lower L.
Here is a modified code to show how it works, plotted in two different figures for clarity:
import matplotlib.pyplot as plt
import numpy as np
fs = 48000 # Sample rate
T = 1/fs # Sample period
L = 3 # Delay
a = 0.5 # Attenuation factor
# h[n] = dirac[n] + a * dirac[n-L]
h = np.zeros(fs)
h[0] = 1
h[L] = a
# Transfer function H
freqs = np.arange(0, fs)
e = np.exp(1j*freqs*2*np.pi*L*T)
H = (e + a)/(e)
# Transfer function H via FFT - same # of bins
H_FFT = np.fft.fft(h)
# Correct comb filter
plt.figure()
plt.plot(np.abs(H))
# Transfer function via FFT, now matching the comb filter
plt.figure()
plt.plot(np.abs(H_FFT))
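With these values the analytic response and the FFT agree bin for bin, which a quick check confirms:
# sanity check: both magnitude responses should now be numerically identical
print(np.allclose(np.abs(H), np.abs(H_FFT)))  # expected: True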
I have a huge image dataset that does not fit in memory. I want to compute the mean and standard deviation, loading images from disk.
I'm currently trying to use this algorithm found on Wikipedia (Welford's online algorithm).
# for a new value newValue, compute the new count, new mean, and the new M2.
# mean accumulates the mean of the entire dataset
# M2 aggregates the squared distance from the mean
# count aggregates the number of samples seen so far
def update(existingAggregate, newValue):
    (count, mean, M2) = existingAggregate
    count = count + 1
    delta = newValue - mean
    mean = mean + delta / count
    delta2 = newValue - mean
    M2 = M2 + delta * delta2
    # return the updated aggregate, not the stale input tuple
    return (count, mean, M2)

# retrieve the mean and variance from an aggregate
def finalize(existingAggregate):
    (count, mean, M2) = existingAggregate
    # check the count before dividing to avoid a ZeroDivisionError
    if count < 2:
        return float('nan')
    return (mean, M2 / (count - 1))
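A quick usage sketch with made-up numbers, to show how the aggregate is threaded through the two functions:
agg = (0, 0.0, 0.0)  # (count, mean, M2)
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    agg = update(agg, x)
mean, variance = finalize(agg)  # mean = 5.0, sample variance = 32/7 ≈ 4.57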
This is my current implementation (computing just for the red channel):
count = 0
mean = 0
delta = 0
delta2 = 0
M2 = 0
for file in tqdm(first):
    image = cv2.imread(file)  # note: OpenCV loads images in BGR order
    for i in range(224):
        for j in range(224):
            b, g, r = image[i, j, :]
            newValue = r
            count = count + 1
            delta = newValue - mean
            mean = mean + delta / count
            delta2 = newValue - mean
            M2 = M2 + delta * delta2
print('first mean', mean)
print('first std', np.sqrt(M2 / (count - 1)))
This implementation gives results close enough to the true values on the subset of the dataset I tried.
The problem is that it is extremely slow, and therefore not viable.
Is there a standard way of doing this?
How can I adapt it for a faster result, or compute the RGB mean and standard deviation for the whole dataset without loading it all into memory at once, and at a reasonable speed?
Since this is a numerically heavy task (a lot of iterations over a matrix, or a tensor), I always suggest using libraries that are good at this: numpy.
A properly installed numpy should be able to utilize the underlying BLAS (Basic Linear Algebra Subroutines) routines which are optimized for operating an array of floating points from the memory hierarchy perspective.
imread should already give you the numpy array. Note that OpenCV loads images in BGR order, so channel 0 is actually blue; use index 2 for the red channel. You can get the flattened 1-D array of the red channel by
import numpy as np
val = np.reshape(image[:, :, 2], -1)
the mean of such by
np.mean(val)
and the standard deviation by
np.std(val)
In this way, you can get rid of two layers of python loops:
count = 0
mean = 0
delta = 0
delta2 = 0
M2 = 0

for i, file in enumerate(tqdm(first)):
    image = cv2.imread(file)
    val = np.reshape(image[:, :, 2], -1)  # channel 2 = red in BGR order
    img_mean = np.mean(val)
    img_std = np.std(val)
    ...
The rest of the incremental update should be straightforward.
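For completeness, here is a sketch of that incremental update: each image contributes per-chunk statistics (count, mean, M2), which are folded into the running aggregate with the pairwise (chunked) form of the algorithm quoted in the question; files is my placeholder for the list of image paths:
count, mean, M2 = 0, 0.0, 0.0
for f in files:
    image = cv2.imread(f)
    val = image[:, :, 2].astype(np.float64).ravel()  # channel 2 = red in BGR order
    n_b = val.size
    mean_b = val.mean()
    M2_b = val.var() * n_b  # sum of squared deviations within this image
    # pairwise combination of (count, mean, M2) with (n_b, mean_b, M2_b)
    delta = mean_b - mean
    total = count + n_b
    mean += delta * n_b / total
    M2 += M2_b + delta ** 2 * count * n_b / total
    count = total

print('mean', mean)
print('std', np.sqrt(M2 / (count - 1)))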
Once you have done this, the bottleneck becomes the image loading speed, which is limited by disk read performance. In that regard, based on my prior experience, I suspect that using multiple threads, as others have suggested, will help a lot.
You can also use OpenCV's method cv2.meanStdDev.
cv2.meanStdDev(src[, mean[, stddev[, mask]]]) → mean, stddev
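For example, for a single image this returns the per-channel statistics (in OpenCV's BGR channel order):
mean, stddev = cv2.meanStdDev(image)  # one entry per channel: B, G, R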
If you don't want to load the entire dataset into memory as one array, you can compute the statistics iteratively:
# can be whatever, I just made this number up
global_mean = 134.0
# build a per-pixel array with the values of (x_i - mu) ** 2 / |x|
sums = ((images[0] - global_mean) ** 2) / len(images)
for img in images[1:]:
    sums = sums + ((img - global_mean) ** 2) / len(images)
# get the mean of all per-pixel variances, then take the sqrt to get the std
dataset_std = np.sqrt(np.mean(sums))
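The global_mean itself can come from a cheap first pass; assuming all images have the same size (224x224 here, so every image carries equal weight), a mean of per-image means works:
global_mean = sum(np.mean(img) for img in images) / len(images)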
I am trying to get the power at a particular frequency, with an RTL-SDR. I'm adapting FFT examples I've found online. Abbreviated code here (removed superfluous stuff):
import math
import matplotlib.mlab as mlab
import rtlsdr
from numpy import hamming
NFFT=1024
dwell = 0.016
sample_rate = 2.4e6
offset = 200e3
freq = 100e6
sdr = rtlsdr.RtlSdr(0)
sdr.set_sample_rate(sample_rate)
sdr.set_manual_gain_enabled(1)
sdr.set_gain(22.9)
sdr.freq_correction = 0
numsamples = next_2_to_pow(int(dwell * sample_rate))  # helper (omitted above) that presumably rounds up to the next power of two
freq = freq - offset # avoid dc spike
sdr.set_center_freq(freq)
samples = sdr.read_samples(numsamples)
powers, freqs = mlab.psd(samples, NFFT=NFFT, Fs=sample_rate/1e6, window=hamming(NFFT))
bin_offset = int(offset / (sample_rate / NFFT))
freq = int(freq + offset)
pwr = float( "{:.2f}".format(10 * math.log10( powers[ int(len(powers)/2) + bin_offset ] )) )
This simply doesn't work. The power at the frequency stays about the same (noise floor level), even when I inject a signal.
I have two theories about why this doesn't work. 1) RTL-SDRs return I/Q data, and this method doesn't account for that (?) 2) My understanding of the FFT is simply not sound enough to perform it right.
Which is it? How can I fix it?
Have you tried using rtl_power and then just picking the right bin in its output? It's a relatively recent addition to the rtl-sdr suite (written in C, not Python): https://github.com/osmocom/rtl-sdr
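For example, something along these lines should log the power in 1 kHz bins over a 200 kHz span around 100 MHz with your gain setting (check rtl_power's usage text for the exact options; the output file name is arbitrary):
rtl_power -f 99.9M:100.1M:1k -i 10 -g 22.9 power.csv
Then pick the bin you care about out of the CSV.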
This is my second attempt at implementing gradient descent in one variable and it always diverges. Any ideas?
This is simple linear regression for minimizing the residual sum of squares in one variable.
def gradient_descent_wtf(xvalues, yvalues):
    tolerance = 0.1

    #y = mx + b
    #some line to predict y-values from x-values
    m = 1.
    b = 1.

    #a predicted y-value has value mx + b
    for i in range(0, 10):
        #calculate y-value predictions for all x-values
        predicted_yvalues = list()
        for x in xvalues:
            predicted_yvalues.append(m*x + b)

        #now calculate the residuals = y-value - predicted y-value for each point
        residuals = list()
        number_of_points = len(yvalues)
        for n in range(0, number_of_points):
            residuals.append(yvalues[n] - predicted_yvalues[n])

        ## calculate the residual sum of squares from the residuals, that is,
        ## square each residual and add them all up. we will try to minimize
        ## the residual sum of squares later.
        residual_sum_of_squares = 0.
        for r in residuals:
            residual_sum_of_squares += r**2
        print("RSS = %s" % residual_sum_of_squares)

        #now make a version of the residuals which is multiplied by the x-values
        residuals_times_xvalues = list()
        for n in range(0, number_of_points):
            residuals_times_xvalues.append(residuals[n] * xvalues[n])

        #now create the sums for the residuals and for the residuals times the x-values
        residuals_sum = sum(residuals)
        residuals_times_xvalues_sum = sum(residuals_times_xvalues)

        #now multiply the sums by a positive scalar and add each to m and b.
        residuals_sum *= 0.1
        residuals_times_xvalues_sum *= 0.1
        b += residuals_sum
        m += residuals_times_xvalues_sum

        #and repeat until convergence, which occurs when
        #||sum vector|| = sqrt(residuals_sum**2 + residuals_times_xvalues_sum**2) < tolerance
        magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        if magnitude_of_sum_vector < tolerance:
            break

    return (b, m)
Result:
gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567])
RSS = 370433.0
RSS = 300170125.7
RSS = 4.86943013045e+11
RSS = 7.90447409339e+14
RSS = 1.28312217794e+18
RSS = 2.08287421094e+21
RSS = 3.38110045417e+24
RSS = 5.48849288217e+27
RSS = 8.90939341376e+30
RSS = 1.44624932026e+34
(-3.475524066284303e+16, -2.4195981188763203e+17)
The gradients are huge -- hence you are following large vectors for long distances (0.1 times a large number is large). Find unit vectors in the appropriate direction. Something like this (with comprehensions replacing your loops):
def gradient_descent_wtf(xvalues, yvalues):
    tolerance = 0.1
    m = 1.
    b = 1.
    for i in range(0, 10):
        predicted_yvalues = [m*x + b for x in xvalues]
        residuals = [y - y_hat for y, y_hat in zip(yvalues, predicted_yvalues)]
        residual_sum_of_squares = sum(r**2 for r in residuals)  # only needed for debugging purposes
        print("RSS = %s" % residual_sum_of_squares)
        residuals_times_xvalues = [r*x for r, x in zip(residuals, xvalues)]
        residuals_sum = sum(residuals)
        residuals_times_xvalues_sum = sum(residuals_times_xvalues)
        # (residuals_sum, residuals_times_xvalues_sum) is a vector which points in the negative
        # gradient direction. *Find a unit vector which points in the same direction*
        magnitude = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        residuals_sum /= magnitude
        residuals_times_xvalues_sum /= magnitude
        b += residuals_sum * 0.1
        m += residuals_times_xvalues_sum * 0.1
        # check for convergence -- this needs work!
        magnitude_of_sum_vector = (residuals_sum**2 + residuals_times_xvalues_sum**2)**0.5
        if magnitude_of_sum_vector < tolerance:
            break
    return (b, m)
For example:
>>> gradient_descent_wtf([1,2,3,4,5,6,7,8,9,10],[6,23,8,56,3,24,234,76,59,567])
RSS = 370433.0
RSS = 368732.1655050716
RSS = 367039.18363896786
RSS = 365354.0543519137
RSS = 363676.7775934381
RSS = 362007.3533123621
RSS = 360345.7814567845
RSS = 358692.061974069
RSS = 357046.1948108295
RSS = 355408.17991291644
(1.1157111313023558, 1.9932828425473605)
which is certainly much more plausible.
It isn't a trivial matter to make a numerically stable gradient-descent algorithm. You might want to consult a decent textbook in numerical analysis.
First, your code is right.
But you should think about the math when doing linear regression.
For example, if the residual is -205.8 and your learning rate is 0.1, you will get a huge descent step of about -20.6.
That step is so large that you can't get back to the correct m and b. You have to make your step small enough.
There are two ways to make the gradient descent step reasonable (see the sketch below):
initialize a small learning rate, such as 0.001 or 0.0003;
divide each step by the total number of your input values.
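A minimal sketch combining both suggestions (my example, reusing the toy data from above): scale the summed gradient by the number of points and use a smaller learning rate, and the iteration becomes stable:
def gradient_descent_scaled(xvalues, yvalues, learning_rate=0.01, iterations=5000):
    n = len(xvalues)
    m, b = 1., 1.
    for _ in range(iterations):
        residuals = [y - (m*x + b) for x, y in zip(xvalues, yvalues)]
        # divide the summed gradients by the number of points
        b += learning_rate * sum(residuals) / n
        m += learning_rate * sum(r*x for r, x in zip(residuals, xvalues)) / n
    return (b, m)
For the data above this converges toward the least-squares fit, roughly m ≈ 37.6 and b ≈ -100.9.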