Calculating event / stimulus triggered averages efficiently in Python

I would like to calculate event / stimulus triggered averages in a computationally efficient way.
Assuming I have got a signal, e.g.
signal = [random.random() for i in range(0, 1000)]
with n_signal datapoints
n_signal = len(signal)
I know that this signal is sampled with a rate of
Fs = 25000 # Hz
In this case I know the total time of the signal:
T_sec = n_signal / float(Fs)
At specific times, certain events occur, e.g.
t_events = [0.01, 0.017, 0.018, 0.022, 0.034, 0.0345, 0.03456]
Now I would like to find the signal from a certain time before these events, e.g.
t_bef = 0.001
until a certain time after these events, e.g.
t_aft = 0.002
And once I have got all of these chunks of the signal, I would like to average these.
In the past I would have created the time vector of the signal
t_signal = numpy.linspace(0, T_sec, n_signal)
and looked for all of the indices of t_events in t_signal, e.g. using numpy.searchsorted.
Since I know the sampling rate of the signal, this can be done much more quickly:
indices = [int(i * Fs) for i in t_events]
This saves me the memory for t_signal and I do not have to go through the whole signal to find my indices.
Next, I would determine how many samples t_bef and t_aft correspond to
nsamples_t_bef = int(t_bef * Fs)
nsamples_t_aft = int(t_aft * Fs)
and I would save the signal chunks in a list
signal_chunks = list()
for i in range(0, len(t_events)):
    signal_chunks.append(signal[indices[i] - nsamples_t_bef : indices[i] + nsamples_t_aft])
And finally I average these:
event_triggered_average = numpy.mean(signal_chunks, axis = 0)
If I am interested in the time vector, I calculate it with
t_event_triggered_average = numpy.linspace(-t_signal[nsamples_t_bef], t_signal[nsamples_t_aft], nsamples_t_bef + nsamples_t_aft)
Now my questions: Is there a computationally more efficient way to do this? If I have a signal with many data points and many events, this computation can take a while. Is a list the best data structure to save these chunks in? Do you know how to get the chunks of data more quickly? Maybe using a buffer?
Thanks in advance for your comments and advice.
Minimal working example
import numpy
import random
random.seed(0)
signal = [random.random() for i in range(0, 1000)]
# sampling rate
Fs = 25000 # Hz
# total time of the signal
n_signal = len(signal)
T_sec = n_signal / float(Fs)
# time of events of interest
t_events = [0.01, 0.017, 0.018, 0.022, 0.034, 0.0345, 0.03456]
# and their corresponding indices
indices = [int(i * Fs) for i in t_events]
# define the time window of interest around each event
t_bef = 0.001
t_aft = 0.002
# and the corresponding index offset
nsamples_t_bef = int(t_bef * Fs)
nsamples_t_aft = int(t_aft * Fs)
# vector of signal times
t_signal = numpy.linspace(0, T_sec, n_signal)
signal_chunks = list()
for i in range(0, len(t_events)):
    signal_chunks.append(signal[indices[i] - nsamples_t_bef : indices[i] + nsamples_t_aft])
# average signal value across chunks
event_triggered_average = numpy.mean(signal_chunks, axis = 0)
# not sure what's going on here
t_event_triggered_average = numpy.linspace(-t_signal[nsamples_t_bef],
                                           t_signal[nsamples_t_aft],
                                           nsamples_t_bef + nsamples_t_aft)

Since your signal is defined on a regular grid, you could do some arithmetic to find indices for all the samples that you require. Then you can construct the array with chunks using a single indexing operation.
import numpy as np
# Making some test data
n_signal = 1000
signal = np.random.rand(n_signal)
Fs = 25000 # Hz
t_events = np.array([0.01, 0.017, 0.018, 0.022, 0.034, 0.0345, 0.03456])
# Preferences
t_bef = 0.001
t_aft = 0.002
# The number of samples in a chunk
nsamples = int((t_bef+t_aft) * Fs)
# Create a vector from 0 up to nsamples
sample_idx = np.arange(nsamples)
# Calculate the index of the first sample for each chunk
# Require integers, because it will be used for indexing
start_idx = ((t_events - t_bef) * Fs).astype(int)
# Use broadcasting to create an array with indices
# Each row contains consecutive indices for each chunk
idx = start_idx[:, None] + sample_idx[None, :]
# Get all the chunks using fancy indexing
signal_chunks = signal[idx]
# Calculate the average like you did earlier
event_triggered_average = signal_chunks.mean(axis=0)
Note that the line with .astype(int) does not round to the nearest integer; it truncates towards zero.
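If truncation matters for your event times, you can round to the nearest sample instead, and a time axis for the average (with t = 0 at the event) can be built from the same quantities. A minimal sketch, reusing the variable names from the snippet above:
# Variant: round to the nearest sample instead of truncating towards zero
start_idx = np.rint((t_events - t_bef) * Fs).astype(int)
# Time axis of the event-triggered average, relative to the event at t = 0
t_axis = (sample_idx - int(t_bef * Fs)) / Fs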

Related

How to compute Dimitrov spectral fatigue index in Python?

I have an existing Python script to calculate mean power frequency. Additionally I would like to calculate Dimitrov's spectral fatigue index. Its formula differs slightly from mean power frequency: instead of the ratio of the spectral moments of order 1 and 0, it uses the moments of order -1 and 5. I thought I could simply add the moments of interest, but I get only inf values in the renamed FInms5 function.
[mean power frequency formula][1]
[Dimitrov spectral fatigue index formula][2]
This is the working mean-frequency function. For the fatigue index, I changed its last line to:
mean_freq[i] = np.dot(P, f.T**-1) / np.dot(P, f.T**5)
from scipy.signal import periodogram
import numpy as np

def get_mean_freq(emg_sig, sfreq, epoch_duration=0.5):
    '''
    Parameters
    ----------
    emg_sig : array
        pre-filtered EMG data.
    sfreq : int
        EMG sampling frequency, in Hz.
    epoch_duration : float
        epoch (time window) duration, in seconds.

    Returns
    -------
    mean_freq : array
        mean frequency at each epoch
    time_points : array
        time point at the center of each evaluated epoch
    samples : array
        sample numbers at the center of each evaluated epoch

    Method according to:
    https://stackoverflow.com/questions/37922928/difference-in-mean-frequency-in-python-and-matlab
    '''
    ons = range(0, len(emg_sig), int(epoch_duration * sfreq))
    mean_freq = np.empty((len(ons),))
    samples = np.empty((len(ons),))
    time_points = np.empty((len(ons),))
    for i, on in enumerate(ons):
        off = ons[i + 1] - 1 if i + 1 < len(ons) else len(emg_sig)
        processing_window = emg_sig[on:off]
        mid_point = (on + off) / 2
        samples[i] = mid_point
        time_points[i] = mid_point / sfreq
        # Processing window power spectrum (PSD) generation
        f, Pxx_den = periodogram(np.array(processing_window), fs=float(sfreq))
        Pxx_den = np.reshape(Pxx_den, (1, -1))
        width = np.tile(f[2] - f[0], (1, Pxx_den.shape[1]))  # shape[1], not shape[2]: Pxx_den is 2-D
        f = np.reshape(f, (1, -1))
        P = Pxx_den * width
        pwr = np.sum(P)
        mean_freq[i] = np.dot(P, f.T) / pwr
    return mean_freq, time_points, samples
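For the fatigue index itself, here is a minimal sketch of the moment ratio (my reconstruction, not tested code from the post). The key point is to exclude the DC bin: f**-1 is infinite at f = 0, which is the most likely source of the inf values. Using P and f as shaped inside the loop above:
# Dimitrov's spectral fatigue index FInms5 = M(-1) / M(5),
# where M(k) = sum(P * f**k) over the power spectrum.
valid = f[0] > 0                      # drop the f = 0 bin to avoid inf
M_minus1 = np.sum(P[0][valid] * f[0][valid] ** -1.0)
M_plus5 = np.sum(P[0][valid] * f[0][valid] ** 5.0)
FInms5 = M_minus1 / M_plus5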
[1]: https://i.stack.imgur.com/He32L.png
[2]: https://i.stack.imgur.com/o6gGD.png

Fourier transform and Full Width Half Maximum

I'm trying to calculate the Fourier transform of three muon polarization signals, which are simply cosine functions multiplied by an exponential decay.
So, doing the Fourier transform, we are going to see broadened peaks centered at the corresponding frequency.
The problem is that I have already tried to do the Fourier transform, but I do not know if it's correct; furthermore, I'm trying to calculate the FWHM using the scipy.stats.moment function with the 2nd moment: is that correct?
Can you tell me if the code is correct?
I put here the three signals in .npy file and the code used for the Fourier analysis.
The signals are signal[0], signal[1] and signal[2], each an array of length 10.
Each signal[k] contains 10 polarization functions (one for each applied magnetic field), each a signal of 400 points.
The corresponding files are signal_100, signal_110, signal_111, provided here:
https://github.com/JonathanFrassineti/UNDI-examples.
Ah, the frequencies range from 0 Hz to 40 MHz.
Thank you!
N = 400        # Number of signal points.
N1 = 40000000
T = 1. / 800.  # Sampling spacing.
xf = np.fft.rfftfreq(N1, T)
yf1 = FWHM1 = sigma1 = delta1 = bhar1 = np.zeros(fields, dtype=object)
yf2 = FWHM2 = sigma2 = delta2 = bhar2 = np.zeros(fields, dtype=object)
yf3 = FWHM3 = sigma3 = delta3 = bhar3 = np.zeros(fields, dtype=object)
for j in range(fields):
    # Fourier transform.
    yf1[j] = np.fft.rfft(signal[0][j])
    yf2[j] = np.fft.rfft(signal[1][j])
    yf3[j] = np.fft.rfft(signal[2][j])
    FWHM1[j] = moment(yf1[j], moment=2)
    FWHM2[j] = moment(yf2[j], moment=2)
    FWHM3[j] = moment(yf3[j], moment=2)
    sigma1[j] = np.sqrt(np.abs(FWHM1[j])) / 2.355
    sigma2[j] = np.sqrt(np.abs(FWHM2[j])) / 2.355
    sigma3[j] = np.sqrt(np.abs(FWHM3[j])) / 2.355
    delta1[j] = sigma1[j] / gamma_Cu
    delta2[j] = sigma2[j] / gamma_Cu
    delta3[j] = sigma3[j] / gamma_Cu
    bhar1[j] = (((a * angtom) ** 3) / (1e-7 * gamma_Cu * hbar)) * delta1[j]
    bhar2[j] = (((a * angtom) ** 3) / (1e-7 * gamma_Cu * hbar)) * delta2[j]
    bhar3[j] = (((a * angtom) ** 3) / (1e-7 * gamma_Cu * hbar)) * delta3[j]
I'm currently working on a Python project with the same goal. I have a set of magnetic field data B(x, y, z); I think the ideal approach is to organize your data periodically per event and deduce Fe (the sampling rate).
f(A, t) = A * (cos(2*pi*fe*t) - sin(2*pi*fe*t))
B = [50, 50, 10, 3]  # where each entry is |B|, sampled once per second
res = [f(a, time) for time, a in enumerate(B)]
fourier_transform = np.fft.fft(res)
frequency = np.fft.fftfreq(len(B))  # you can also use the fftfreq provided by scipy
Please star this project, a research resource open to contributions:
RFSignalToolkit github project

BPM detection from pyAudioAnalysis is producing the wrong number of beats for any signal

Any help would be much appreciated. I am trying to extract the BPM from any .wav file loaded into the Python script using the pyAudioAnalysis library. For some reason it is not outputting the correct BPM. I tried to change the window size in the beat_extraction() function, but it only allows numbers under 1 second, and when I change the window size, the BPM changes as well. But when it is kept at a 1-second window, it outputs 30 every time.
The following is my code:
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures
from pyAudioAnalysis import MidTermFeatures
import matplotlib.pyplot as plt
import numpy as np
import warnings
file_name = "unity_alan.wav"
# Extract the Sampling Rate (Fs) and the Raw Signal Data (signal)
[Fs, signal] = audioBasicIO.read_audio_file(file_name)
# Uncomment if signal has two channels
# signal = signal[:,0]
# Function to Normalize the Signal
def normalize_signal(signal):
    signal = np.double(signal)
    return (signal - signal.mean()) / ((np.abs(signal)).max() + 0.0000000001)
# Short Term Features
# For each short-term window a set of features is extracted. This would result in a sequence of feature
# vectors stored in a np matrix (in this case Features_midTerm)
signal = normalize_signal(signal)
# Fs - frequency sampling Rate
print("The Sampling Freq: ",Fs)
# Total Signal Len
signal_len = len(signal)
print("Total Signal Length: ",signal_len)
# Total Time Len of Song in Seconds
len_of_song = signal_len / Fs
print("Total Song Time (s): ",len_of_song)
# Window Size in mseconds
windowSize_in_ms = 50
windowSize_in_s = windowSize_in_ms/1000
# Window Size in Samples
windowSize_in_samples = Fs * (windowSize_in_ms / 1000) #divide by 1000 to turn to seconds
print("Window Size (samples): ",windowSize_in_samples)
# Window Step in mseconds
wStep_in_ms = 25
# Window Step in Samples
wStep_in_samples = Fs * (wStep_in_ms / 1000) #divide by 1000 to turn to seconds
print("Window Step in Samples: ", wStep_in_samples)
# Oversampling Percentage
oversampling_Percentage = (wStep_in_ms / windowSize_in_ms) * 100
print("Oversampling Percentage (overlap of windows): ",oversampling_Percentage)
# Total Number of Windows in Signal
total_number_windows = signal_len / windowSize_in_samples
print("Total Number of Windows in Signal: ",total_number_windows)
# Total number of feature samples produced (windows/size * total windows)
feature_samples_points_total_cal = int(total_number_windows * (windowSize_in_ms/wStep_in_ms))
print("Calculated Total of Points Produced per Short Term Feature: ",feature_samples_points_total_cal)
# Extract features and their names. Each index has its own vector of features. The total number should be the same
# as the calculated total of points produced per feature.
Features_shortTerm, feature_names = ShortTermFeatures.feature_extraction(signal, Fs, windowSize_in_samples, wStep_in_samples)
# Exact Number of points in the Features
feature_samples_points_total_exact = len(Features_shortTerm[0])
print("Exact Total of Points Produced per Short Term Feature: ",feature_samples_points_total_exact)
# Mid-window (in seconds)
mid_window_seconds = int(1 * Fs)
# Mid-step (in seconds)
mid_step_seconds = int(1 * Fs)
# MID FEATURE Extraction
Features_midTerm, short_Features_ignore, mid_feature_names = MidTermFeatures.mid_feature_extraction(signal,Fs,mid_window_seconds,mid_step_seconds,windowSize_in_samples,wStep_in_samples)
# Exact Mid-Term Feature Total Number of Points
midTerm_features_total_points = len(Features_midTerm)
print("Exact Mid-Term Total Number of Feature Points: ",midTerm_features_total_points)
# Beats per min
# The Tempo of music determines the speed at which it is played (measured in BPM)
bpm,confidence_ratio = MidTermFeatures.beat_extraction(Features_shortTerm,1)
print("Beats per Minute (bpm): ",bpm)
print("Confidence ratio for BPM: ", confidence_ratio)
# Figure out why the BPM does not match the actual reading
# of 115 BPM. It is showing 30 BPM which is for sure wrong.
The output of my script is as follows:
The Sampling Freq: 48000
Total Signal Length: 10992884
Total Song Time (s): 229.01841666666667
Window Size (samples): 2400.0
Window Step in Samples: 1200.0
Oversampling Percentage (overlap of windows): 50.0
Total Number of Windows in Signal: 4580.368333333333
Calculated Total of Points Produced per Short Term Feature: 9160
Exact Total of Points Produced per Short Term Feature: 9159
Exact Mid-Term Total Number of Feature Points: 136
Beats per Minute (bpm): 30.0
Confidence ratio for BPM: 1.0
The library's def of the function is as follows:
def beat_extraction(short_features, window_size, plot=False):
    """
    This function extracts an estimate of the beat rate for a musical signal.
    ARGUMENTS:
     - short_features: a np array (n_feats x numOfShortTermWindows)
     - window_size:    window size in seconds
    RETURNS:
     - bpm:   estimates of beats per minute
     - ratio: a confidence measure
    """
    # Features that are related to the beat tracking task:
    selected_features = [0, 1, 3, 4, 5, 6, 7, 8, 9, 10,
                         11, 12, 13, 14, 15, 16, 17, 18]
    max_beat_time = int(round(2.0 / window_size))
    hist_all = np.zeros((max_beat_time,))
    # for each feature
    for ii, i in enumerate(selected_features):
        # dif threshold (3 x Mean of Difs)
        dif_threshold = 2.0 * (np.abs(short_features[i, 0:-1] -
                                      short_features[i, 1::])).mean()
        if dif_threshold <= 0:
            dif_threshold = 0.0000000000000001
        # detect local maxima
        [pos1, _] = utilities.peakdet(short_features[i, :], dif_threshold)
        position_diffs = []
        # compute histograms of local maxima changes
        for j in range(len(pos1) - 1):
            position_diffs.append(pos1[j + 1] - pos1[j])
        histogram_times, histogram_edges = \
            np.histogram(position_diffs, np.arange(0.5, max_beat_time + 1.5))
        hist_centers = (histogram_edges[0:-1] + histogram_edges[1::]) / 2.0
        histogram_times = \
            histogram_times.astype(float) / short_features.shape[1]
        hist_all += histogram_times
        if plot:
            plt.subplot(9, 2, ii + 1)
            plt.plot(short_features[i, :], 'k')
            for k in pos1:
                plt.plot(k, short_features[i, k], 'k*')
            f1 = plt.gca()
            f1.axes.get_xaxis().set_ticks([])
            f1.axes.get_yaxis().set_ticks([])
    if plot:
        plt.show(block=False)
        plt.figure()
    # Get beat as the argmax of the aggregated histogram:
    max_indices = np.argmax(hist_all)
    bpms = 60 / (hist_centers * window_size)
    bpm = bpms[max_indices]
    # ... and the beat ratio:
    ratio = hist_all[max_indices] / hist_all.sum()
    if plot:
        # filter out >500 beats from plotting:
        hist_all = hist_all[bpms < 500]
        bpms = bpms[bpms < 500]
        plt.plot(bpms, hist_all, 'k')
        plt.xlabel('Beats per minute')
        plt.ylabel('Freq Count')
        plt.show(block=True)
    return bpm, ratio
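Tracing the arithmetic above with the call beat_extraction(Features_shortTerm, 1) shows why the output is pinned at 30. A quick check (my sketch, not library code):
import numpy as np
window_size = 1.0                                      # as passed in the script above
max_beat_time = int(round(2.0 / window_size))          # = 2 histogram bins
histogram_edges = np.arange(0.5, max_beat_time + 1.5)  # [0.5, 1.5, 2.5]
hist_centers = (histogram_edges[0:-1] + histogram_edges[1::]) / 2.0  # [1.0, 2.0]
bpms = 60 / (hist_centers * window_size)               # [60.0, 30.0]
print(bpms)  # the only BPM values this call can ever return
With a 1-second window the histogram has only two bins, so 60 and 30 BPM are the only possible results. This suggests window_size is meant to be the short-term step in seconds (0.025 in the script above) rather than 1.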

Not able to recreate same sound using FFT

I am trying to recreate a musical note using the top 10 frequencies returned by the Fourier transform (FFT). The resulting sound does not match the original. I am not sure whether I am not finding the frequencies correctly or not generating the sound from them correctly. The goal of this code is to match the original sound.
Here is my code:
import numpy as np
from scipy.io import wavfile
from scipy.fftpack import fft
import matplotlib.pyplot as plt
i_framerate = 44100
fs, data = wavfile.read('./Flute.nonvib.ff.A4.stereo.wav') # load the data
def findFrequencies(arr_data, i_framerate=44100, i_top_n=5):
    a = arr_data.T[0]  # this is a two channel soundtrack, I get the first track
    # b = [(ele / 2**8.) * 2 - 1 for ele in a]  # this is 8-bit track, b is now normalized on [-1,1)
    y = fft(a)  # calculate fourier transform (complex numbers list)
    xf = np.linspace(0, int(i_framerate / 2.0), int((i_framerate / 2.0)) + 1) / 2  # Need to find out this last /2 part
    yf = np.abs(y[:int((i_framerate // 2.0)) + 1])
    plt.plot(xf, yf)
    yf_top_n = np.argsort(yf)[-i_top_n:][::-1]
    amp_top_n = yf[yf_top_n] / np.max(yf[yf_top_n])
    freq_top_n = xf[yf_top_n]
    return freq_top_n, amp_top_n

def createSoundData(a_freq, a_amp, i_framerate=44100, i_time=1, f_amp=1000.0):
    n_samples = i_time * i_framerate
    x = np.linspace(0, i_time, n_samples)
    y = np.zeros(n_samples)
    for i in range(len(a_freq)):
        y += np.sin(2 * np.pi * a_freq[i] * x) * f_amp * a_amp[i]
    data2 = np.c_[y, y]  # 2 Channel sound
    return data2

top_freq, top_freq_amp = findFrequencies(data, i_framerate=44100, i_top_n=200)
print('Frequencies: ', top_freq)
print('Amplitudes : ', top_freq_amp)
soundData = createSoundData(top_freq, top_freq_amp, i_time=2, f_amp=50 / len(top_freq))
wavfile.write('createsound_A4_v6.wav', i_framerate, soundData)
The top 10 spectral frequencies in a musical note are not the same as the center frequencies of the top 10 FFT result bin magnitudes. The actual frequency peaks can be between the FFT bins.
Not only can the frequency peak information be between FFT bins, but the phase information required to reproduce any note transients (attack, decay, etc.) can also be between bins. Spectral information that is between FFT bins is carried by a span (up to the full width) of the complex FFT result.
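To illustrate the between-bins point, here is a minimal sketch (my example, not the poster's code): parabolic interpolation on the log-magnitude spectrum recovers a peak frequency that falls between two FFT bins.
import numpy as np

fs = 44100
N = 44100                                    # 1 s of audio -> bin spacing fs/N = 1 Hz
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 440.25 * t)           # true frequency lies between two bins

spec = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(N, 1 / fs)

k = np.argmax(spec)                          # strongest bin (440 Hz here)
a, b, c = np.log(spec[k - 1:k + 2])          # log-magnitudes around the peak
offset = 0.5 * (a - c) / (a - 2 * b + c)     # fractional-bin correction, in (-0.5, 0.5)
print(freqs[k], freqs[k] + offset * fs / N)  # ~440.0 Hz vs ~440.25 Hz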

Python improving function speed

I am coding my own script to calculate the relation between two signals. For that I use the mlab.csd and mlab.psd functions to compute the CSD and PSD of the signals.
My array x has the shape (120, 68, 68, 815). My script runs for several minutes, and this function is the hotspot responsible for most of that time.
Does anyone have an idea what I should do? I am not that familiar with improving script performance. Thanks!
# to read the list of stcs for all the epochs
with open('/home/daniel/Dropbox/F[...]', 'rb') as f:
    label_ts = pickle.load(f)

x = np.asarray(label_ts)
nfft = 512
n_freqs = nfft // 2 + 1
n_epochs = len(x)  # in this case there are 120 epochs
channels = 68
sfreq = 1017.25

def compute_mean_psd_csd(x, n_epochs, nfft, sfreq):
    '''Computes mean of PSD and CSD for signals.'''
    Rxy = np.zeros((n_epochs, channels, channels, n_freqs), dtype=complex)
    Rxx = np.zeros((n_epochs, channels, channels, n_freqs))
    Ryy = np.zeros((n_epochs, channels, channels, n_freqs))
    for i in range(0, n_epochs):
        print('computing connectivity for epoch %s' % (i + 1))
        for j in range(0, channels):
            for k in range(0, channels):
                Rxy[i, j, k], freqs = mlab.csd(x[j], x[k], NFFT=nfft, Fs=sfreq)
                Rxx[i, j, k], _ = mlab.psd(x[j], NFFT=nfft, Fs=sfreq)
                Ryy[i, j, k], _ = mlab.psd(x[k], NFFT=nfft, Fs=sfreq)
    Rxy_mean = np.mean(Rxy, axis=0, dtype=np.float32)
    Rxx_mean = np.mean(Rxx, axis=0, dtype=np.float32)
    Ryy_mean = np.mean(Ryy, axis=0, dtype=np.float32)
    return freqs, Rxy, Rxy_mean, np.real(Rxx_mean), np.real(Ryy_mean)
Something that could help, if the csd and psd methods are computationally intensive: you could cache the results of previous calls and reuse them instead of computing the same values multiple times.
As it stands, you will run 120 * 68 * 68 = 554,880 iterations.
In the case of the psd calculation, it should be possible to cache the values without problem, as the method depends on only one parameter.
Store the value inside a dict, keyed by the channel index (numpy arrays are not hashable, so the index has to serve as the key); check if the value exists, compute and store it if it doesn't, and simply reuse it if it does:
if j not in cache_psd:
    cache_psd[j], _ = mlab.psd(x[j], NFFT=nfft, Fs=sfreq)
Rxx[i, j, k] = cache_psd[j]
if k not in cache_psd:
    cache_psd[k], _ = mlab.psd(x[k], NFFT=nfft, Fs=sfreq)
Ryy[i, j, k] = cache_psd[k]
You can do the same with the csd method. I don't know enough about it to say more, but if the order of the parameters doesn't matter, you can store the two parameters in sorted order to prevent duplicates such as (2, 1) and (1, 2).
The cache will only make the code faster if the memory access time is lower than the computation and storing time. This fix can easily be added with a module that does memoization.
Here's an article about memoization for further reading:
http://www.python-course.eu/python3_memoization.php
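As a minimal sketch of that memoization idea (my code, assuming x, nfft and sfreq are in scope as in the question, and channels are identified by integer index):
from functools import lru_cache
from matplotlib import mlab

@lru_cache(maxsize=None)
def cached_psd(j):
    # PSD depends on a single channel, so its index is a safe cache key
    psd, _freqs = mlab.psd(x[j], NFFT=nfft, Fs=sfreq)
    return psd

@lru_cache(maxsize=None)
def cached_csd(j, k):
    csd, _freqs = mlab.csd(x[j], x[k], NFFT=nfft, Fs=sfreq)
    return csd
One caveat on the ordering trick: in general csd(a, b) is the complex conjugate of csd(b, a) rather than the same value, so only collapse (j, k) and (k, j) into one cache entry if that conjugation does not matter for your analysis.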
