The problem is straightforward, but I am a beginner in Python and stuck on the best way to implement this:
I have a .txt file that contains time (s) and frequency.
I want a visualization of this data as a dot moving up and down a vertical axis.
I need the dot to move at the corresponding time stamp of the frequency, since I plan to output an mp4 file and sync up the animation to the original sound file.
The max and min of the y axis would be those of the frequency in the file.
By "moving" I think making the dot appear and disappear before the next one should work, so it's not continuous.
Below is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation
from matplotlib.colors import LinearSegmentedColormap
# Import Data
time = np.loadtxt("sentence 1.txt", usecols=0, skiprows=1, dtype=float)
print(time)
print(time.shape)
f0 = np.loadtxt("sentence 1.txt", usecols=1, skiprows=1, dtype=float)
print(f0)
print(f0.shape)
# These are the indices of onset times of syllables
# time[0], time[10], time[22], time[34], time[85], time[100]
onset = [0,10,22,34,85,100]
# Do f0 averages for each syllable and create an array
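def utterances(array):
    f0_mean = []
    for i in range(len(array)):
        # Slice ends are exclusive, so each syllable runs up to the next onset
        # (or to the end of the data for the last syllable).
        if i == len(array) - 1:
            end = len(f0)
        else:
            end = array[i + 1]
        avg = np.mean(f0[array[i]:end])
        f0_mean.append(avg)
    return f0_mean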
utterance = utterances(onset)
print(utterance)
print(min(f0), max(f0))
Is there a way to handle specific times in seconds (for example [0.15 0.160714 0.171429 0.182143 0.192857 0.203571 0.214286 0.225 0.235714 0.246429]) in matplotlib.animation.FuncAnimation()?
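FuncAnimation redraws at a fixed interval, so one possible approach is to pick a video frame rate and, for every frame, look up the most recent sample at or before that frame's time. The following is only a sketch under assumptions: the arrays are stand-in data (the f0 values are made up), and fps = 50 and the output file name are arbitrary choices, not something from the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Stand-in data; in the question these arrays come from np.loadtxt.
time = np.array([0.15, 0.160714, 0.171429, 0.182143, 0.192857])
f0 = np.array([210.0, 212.5, 215.0, 213.0, 209.0])  # made-up frequencies

fps = 50  # assumed frame rate of the output video
frame_times = np.arange(0.0, time[-1] + 1.0 / fps, 1.0 / fps)

# For each video frame: index of the latest sample at or before the frame time.
# -1 means no sample has occurred yet, so the dot stays hidden.
sample_idx = np.searchsorted(time, frame_times, side="right") - 1

fig, ax = plt.subplots()
ax.set_xlim(-1, 1)
ax.set_ylim(f0.min(), f0.max())
(dot,) = ax.plot([], [], "o")

def update(frame):
    i = sample_idx[frame]
    if i < 0:
        dot.set_data([], [])
    else:
        dot.set_data([0], [f0[i]])
    return (dot,)

anim = FuncAnimation(fig, update, frames=len(frame_times),
                     interval=1000 / fps, blit=True)
# anim.save("f0.mp4", fps=fps)  # needs ffmpeg installed; file name is a placeholder
```

Because the frame grid starts at t = 0 and the video is saved at the same fps, frame n corresponds to n / fps seconds of audio, which should keep the dot aligned with the sound file.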
I wish to plot 1 or 2 periods of a sine wave from csv data obtained from an oscilloscope.
the DF columns are
I have managed to produce something workable but it requires manual work for each new set of csv data. I am sure there is a better way.
The code I have finds the start and end times of one period by obtaining the row indexes where the amplitude is greater than zero but less than some manually entered value obtained by trial and error. The thinking is that a sine wave will cross the x axis at the start and end of a period.
import pandas as pd
import matplotlib.pyplot as plot
import numpy as np
df = pd.read_csv("tek0000.csv", skiprows=20)
Sin_wave_vals = np.where((df['CH1'] >0)&(df['CH1'] <0.021))
# gives output (array([ 355, 604, 730, 1230, 1480, 1604, 1730, 2980, 3604, 3854, 4979,
#        5230, 5980, 7730, 9355, 9980]),)
# select the rows from the df for a single period
start, end = Sin_wave_vals[0][0], Sin_wave_vals[0][1]
time_df = df['TIME'].iloc[start:end]
amplitude_ch1 = df.iloc[start:end, 1]
plot.plot(time_df, amplitude_ch1)
plot.title('Sine wave')
plot.xlabel('Time')
plot.ylabel('Amplitude = sin(time)')
plot.grid(True, which='both')
plot.axhline(y=0, color='k')
plot.show()
This works OK and plots what I require, but it is too manual, as I have about 20 of these to plot. I tried to obtain the upper limit by using the following:
upper_lim=min(filter(lambda x: x > 0, df['CH1']))
# returns 0.02
Sin_wave_vals = np.where((df['CH1'] >0)&(df['CH1'] <upper_lim))
However, this did not work out how I intended.
@Tim Roberts, that works well, thank you.
This function works for a clean wave. With noise and a low amplitude there is an issue, as the wave can cross the x axis multiple times in a period.
def Plot_wave(df, title, periods, time_col_iloc, amp_col_iloc):
    ## the time and amplitude column locations in the DF are the final 2 arguments and should be int
    if periods not in (1, 2):
        return "please enter period of 1 OR 2"
    zero_crossings = np.where(np.diff(np.sign(df.iloc[:, amp_col_iloc])))[0]
    # one full period spans two zero crossings, so N periods end at crossing 2*N
    start, end = zero_crossings[0], zero_crossings[2 * periods]
    time = df.iloc[start:end, time_col_iloc]
    amplitude = df.iloc[start:end, amp_col_iloc]
    plot.plot(time, amplitude)
    plot.title(title)
    plot.xlabel('Time')
    plot.ylabel('Amplitude = sin(time)')
    plot.grid(True, which='both')
    plot.axhline(y=0, color='k')
    plot.show()
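For the noisy case, one thing that might help is to count only rising crossings (negative to positive), so that consecutive crossings are exactly one period apart, and optionally to smooth the signal first. This is a rough sketch on a synthetic sine, not something tested on real oscilloscope data; the function name and the smooth parameter are made up for illustration, and the moving-average window would need tuning per data set.

```python
import numpy as np

def rising_zero_crossings(values, smooth=1):
    """Indices i where the signal is <= 0 at i and > 0 at i + 1.

    smooth > 1 applies a simple moving average first (hypothetical knob,
    intended to suppress extra crossings caused by noise).
    """
    values = np.asarray(values, dtype=float)
    if smooth > 1:
        kernel = np.ones(smooth) / smooth
        values = np.convolve(values, kernel, mode="same")
    return np.where((values[:-1] <= 0) & (values[1:] > 0))[0]

# Synthetic stand-in for one channel: 5 Hz sine sampled at 1 kHz for 1 s.
t = np.arange(0, 1, 0.001)
ch1 = np.sin(2 * np.pi * 5 * t - 0.3)

crossings = rising_zero_crossings(ch1)
periods = 2
start, stop = crossings[0], crossings[periods]
segment_time = t[start:stop + 1]
segment_amplitude = ch1[start:stop + 1]
```

With a DataFrame, the same start and stop indices could be passed to df.iloc to select the rows to plot.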
I'm trying to plot the frequencies that make up the first 1 second of a voice recording.
My approach was to:
Read the .wav file as a numpy array containing time series data
Slice the array from [0:sample_rate-1], given that the sample rate has units of [samples/1 second], which implies that sample_rate [samples/seconds] * 1 [seconds] = sample_rate [samples]
Perform a fast fourier transform (fft) on the time series array in order to get the frequencies that make up that time-series sample.
Plot the frequencies on the x-axis and the amplitude on the y-axis. The frequency domain would range from 0:(sample_rate/2), since the Nyquist sampling theorem tells us that the sample rate must be at least twice the maximum frequency captured, i.e. sample_rate >= 2*max(frequency). I'll also slice the frequency output array in half, since the output frequency data is symmetrical.
Here is my implementation
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft
from scipy.io import wavfile
sample_rate, audio_time_series = wavfile.read(audio_path)
single_sample_data = audio_time_series[:sample_rate]
def fft_plot(audio, sample_rate):
    N = len(audio) # Number of samples
    T = 1/sample_rate # Period
    y_freq = fft(audio)
    domain = len(y_freq) // 2
    x_freq = np.linspace(0, sample_rate//2, N//2)
    plt.plot(x_freq, abs(y_freq[:domain]))
    plt.xlabel("Frequency [Hz]")
    plt.ylabel("Frequency Amplitude |X(t)|")
    return plt.show()
fft_plot(single_sample_data, sample_rate)
This is the plot that it generated
However, this is incorrect: my spectrogram tells me I should have frequency peaks below the 5 kHz range:
In fact, what this plot is actually showing is the first second of my time series data:
I was able to confirm this by removing the absolute value function from y_freq when plotting it, and passing the entire audio signal into my fft_plot function:
...
sample_rate, audio_time_series = wavfile.read(audio_path)
single_sample_data = audio_time_series[:sample_rate]
def fft_plot(audio, sample_rate):
    N = len(audio) # Number of samples
    y_freq = fft(audio)
    domain = len(y_freq) // 2
    x_freq = np.linspace(0, sample_rate//2, N//2)
    # Changed from abs(y_freq[:domain]) -> y_freq[:domain]
    plt.plot(x_freq, y_freq[:domain])
    plt.xlabel("Frequency [Hz]")
    plt.ylabel("Frequency Amplitude |X(t)|")
    return plt.show()
# Changed from single_sample_data -> audio_time_series
fft_plot(audio_time_series, sample_rate)
The code sample above produced this plot:
Therefore, I think one of two things is going on:
The fft() function is not actually performing an fft on the time series data it is being given
The .wav file does not contain time series data to begin with
What could be the issue? Has anyone else experienced this?
I have essentially replicated the code in the question, and I don't see the problem the OP has described.
In [172]: %reset -f
...: import matplotlib.pyplot as plt
...: import numpy as np
...: from scipy.fftpack import fft
...: from scipy.io import wavfile
...:
...: sr, data = wavfile.read('sample.wav')
...: print(data.shape, sr)
...: signal = data[:sr,0]
...: Signal = fft(signal)
...: fig, (axt, axf) = plt.subplots(2, 1,
...: constrained_layout=1,
...: figsize=(11.8,3))
...: axt.plot(signal, lw=0.15) ; axt.grid(1)
...: axf.plot(np.abs(Signal[:sr//2]), lw=0.15) ; axf.grid(1)
...: plt.show()
(268237, 2) 8000
Hence, I'm voting for closing the question because it is "Not reproducible or was caused by a typo".
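One difference between the two snippets is that the replication selects a single channel (data[:sr,0]), while the question passes the array exactly as read. A possible explanation, assuming the OP's file is also stereo, is that the FFT is taken along the last axis of the (N, 2) array, so each row is transformed on its own and the result still looks like the time series. The sketch below uses numpy.fft on a synthetic signal (scipy.fftpack.fft has the same default axis=-1); the 8000 Hz rate and the 440 Hz tone are arbitrary choices for the demonstration.

```python
import numpy as np

sample_rate = 8000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of samples
left = np.sin(2 * np.pi * 440 * t)
right = 0.5 * np.sin(2 * np.pi * 440 * t)
stereo = np.column_stack([left, right])  # shape (8000, 2), like a stereo wav file

# The default axis is the last one (length 2), so each row [L, R]
# becomes [L + R, L - R]: still a time series, not a spectrum.
per_row = np.fft.fft(stereo)
looks_like_time_series = np.allclose(per_row[:, 0].real, left + right)

# Selecting one channel first gives a spectrum over frequency.
mono = stereo[:, 0]
spectrum = np.abs(np.fft.rfft(mono))
freqs = np.fft.rfftfreq(len(mono), d=1 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]
```

If that is what happened, passing audio_time_series[:sample_rate, 0] (or fft(audio, axis=0)) into the OP's function should give the expected peaks.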
I am working on a project that aims to show the difference between good form and bad form of an exercise. To do this we collected acceleration data with a wrist-based accelerometer. The image above shows 2 sets of a fitness exercise (bench press). Each set has 10 repetitions, and the image below shows the 10 repetitions of 1 set. I have a raw data set which consists of 10 sets of an exercise. What I want to do is split the raw data into 10 parts, each containing the part between 2 black lines in the image above, so I can analyze the data easily. My supervisor gave me a starting point, which is choosing a cutpoint in each set. He said to take a cutpoint, find the first interruption time, start cutting 3 sec before that time, count to 10, and finish cutting.
This is an idea that I don't know how to apply. At the least, if you can tell me how to cut a dataframe according to a cutpoint, I would be grateful.
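Cutting a time-indexed DataFrame around a cutpoint could look roughly like the sketch below. It assumes the frame has a DatetimeIndex and that "count to 10" means a 10-second window, which may not be what the supervisor meant; the data, the cut_window helper, and its parameters are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical accelerometer trace: 100 Hz for 60 s, indexed by timestamp.
index = pd.date_range("2016-01-01 11:24:00", periods=6000, freq="10ms")
data = pd.DataFrame({"y": np.random.default_rng(0).normal(size=6000)}, index=index)

def cut_window(frame, cutpoint, before_s=3, length_s=10):
    """Rows starting `before_s` seconds before `cutpoint` and lasting `length_s` seconds."""
    start = pd.Timestamp(cutpoint) - pd.Timedelta(seconds=before_s)
    stop = start + pd.Timedelta(seconds=length_s)
    return frame.loc[(frame.index >= start) & (frame.index < stop)]

segment = cut_window(data, "2016-01-01 11:24:20")
```

Calling cut_window once per cutpoint and collecting the results in a list would give one DataFrame per set.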
Well, I found another way to detect the periodic parts of my accelerometer data, so here is my code:
import numpy as np
import pandas as pd
from peakdetect import peakdetect
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib import style
style.use('ggplot')
def get_periodic(path):
    periodics = []
    # DataFrame.from_csv was removed from pandas; read_csv with these options is the equivalent
    data_frame = pd.read_csv(path, index_col=0, parse_dates=True)
    data_frame.columns = ['z', 'y', 'x']
    if '1' in path:
        if 'bench' in path:
            bench_press_1_week = data_frame.between_time('11:24', '11:52')
            peak_indexes = get_peaks(bench_press_1_week.y, lookahead=3000)
            start_time = bench_press_1_week.index[0]
            for peak_index in peak_indexes:
                # peak positions are in samples; the sampling frequency is 100 Hz
                periodic_start = start_time.to_pydatetime() + dt.timedelta(0, peak_index / 100)
                periodic_end = periodic_start + dt.timedelta(0, 60)
                periodic = bench_press_1_week.between_time(periodic_start.time(), periodic_end.time())
                periodics.append(periodic)
    return periodics
def get_peaks(data, lookahead):
    peak_indexes = []
    correlation = np.correlate(data, data, mode='full')
    # keep the non-negative lags only; integer division so the slice index is an int
    realcorr = correlation[correlation.size // 2:]
    maxpeaks, minpeaks = peakdetect(realcorr, lookahead=lookahead)
    for i in range(0, len(maxpeaks)):
        peak_indexes.append(maxpeaks[i][0])
    return peak_indexes
def show_segment_plot(data, periodic_area, exercise_name):
    plt.figure(8)
    gs = gridspec.GridSpec(7, 2)
    ax = plt.subplot(gs[:2, :])
    plt.title(exercise_name)
    ax.plot(data)
    k = 0
    for i in range(2, 7):
        for j in range(0, 2):
            ax = plt.subplot(gs[i, j])
            title = "{} {}".format(k + 1, ".Set")
            plt.title(title)
            ax.plot(periodic_area[k])
            k = k + 1
    plt.show()
Firstly, this question gave me another perspective on my problem. The image below shows the raw accelerometer data of a bench press with 10 sets. It has 3 axes (x, y, z) and its major axis is y (blue in the image).
I used the autocorrelation function to detect the periodic parts. In the image above, every peak represents 1 set of exercises. With this peak detection algorithm I found each peak's x-axis value:
In[196]: maxpeaks
Out[196]:
[[16204, 32910.14013671875],
[32281, 28726.95849609375],
[48515, 24583.898681640625],
[64436, 22088.130859375],
[80335, 19582.248291015625],
[96699, 16436.567626953125],
[113081, 12100.027587890625],
[129027, 8098.98486328125],
[145184, 5387.788818359375]]
Basically, each x-value is a sample index. My sampling frequency was 100 Hz, so 16204/100 = 162.04 seconds. To find the time of the periodic part, I added 162.04 s to the start time. Each bench press set took approximately 1 min. In this example the exercise started at 11:24, so the first periodic part starts at about 11:26 and ends 1 min later. There is some lag, but this is the best solution I have found.
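The arithmetic above can be written out as a small sketch. The date is only a placeholder (the post gives just the time of day), and only the first three peak positions from the output above are used.

```python
import datetime as dt

sampling_rate_hz = 100
start_time = dt.datetime(2016, 1, 1, 11, 24)  # placeholder date; 11:24 is from the post
peak_samples = [16204, 32281, 48515]  # first three peak positions from maxpeaks

# Sample index -> seconds since the start -> wall-clock time of each set.
set_starts = [start_time + dt.timedelta(seconds=s / sampling_rate_hz)
              for s in peak_samples]
# Each set is assumed to last about one minute, as described above.
set_windows = [(begin, begin + dt.timedelta(seconds=60)) for begin in set_starts]

first_start, first_end = set_windows[0]
```

Each (begin, end) pair can then be passed to between_time to pull out the rows for that set.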
I'm working with a Geiger counter which can be hooked up to a computer and which records its output in the form of a .txt file, NC.txt, where it records the time since starting and the 'value' of the radiation it recorded. It looks like
import pylab
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt
x1 = []
y1 = []
#Read the time and value columns into lists
f = open("NC.txt", "r")
for line in f:
    line = line.strip()
    parts = line.split(",") #the columns are separated by commas and spaces
    time = float(parts[1]) #time is recorded in the second column of NC.txt
    value = float(parts[2]) #and the value it records is in the third
    x1.append(time)
    y1.append(value)
f.close()
xv = np.array(x1)
yv = np.array(y1)
#Statistics
m = np.mean(yv)
d = np.std(yv)
#Strip out background radiation
trueval = yv - m
#Basic plot of counts
num_bins = 10000
plt.hist(trueval,num_bins)
plt.xlabel('Value')
plt.ylabel('Count')
plt.show()
So this code so far will just create a simple histogram of the radiation counts centred at zero, so the background radiation is ignored.
What I want to do now is perform a chi-squared test to see how well the data fits, say, Poisson statistics (and then go on to compare it with other distributions later). I'm not really sure how to do that. I have access to scipy and numpy, so I feel like this should be a simple task, but I'm just learning Python as I go here, so I'm not a terrific programmer.
Does anyone know of a straightforward way to do this?
Edit for clarity: I'm not asking so much about if there is a chi-squared function or not. I'm more interested in how to compare it with other statistical distributions.
Thanks in advance.
You can use the SciPy library; here is documentation and examples.
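As a rough illustration of what that could look like with scipy.stats, the sketch below bins integer counts, computes the expected Poisson frequencies from the sample mean, and runs chisquare. It makes several assumptions: the counts are synthetic stand-ins, the raw counts are used rather than the mean-subtracted values (a Poisson variable is a non-negative integer), and the cutoff k_max = 10 for the last bin is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
counts = rng.poisson(lam=4.0, size=2000)  # synthetic stand-in for the raw values

lam = counts.mean()  # Poisson rate estimated from the data
k_max = 10           # assumed cutoff: everything >= k_max goes into one final bin

# Observed frequency of each count value 0..k_max-1, plus the tail bin.
observed = np.array([np.sum(counts == k) for k in range(k_max)]
                    + [np.sum(counts >= k_max)])

# Expected frequencies under Poisson(lam) for the same bins.
probs = np.append(stats.poisson.pmf(np.arange(k_max), lam),
                  stats.poisson.sf(k_max - 1, lam))
expected = probs * counts.size

# ddof=1 because one parameter (lam) was estimated from the data.
chi2, p_value = stats.chisquare(observed, expected, ddof=1)
```

To compare against another distribution, only the probs line would change: compute that distribution's probability for each bin, and set ddof to the number of parameters estimated from the data.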
I am trying to create smaller files that can themselves be run standalone in order to test them. One of my files reads data from a csv, and its test is to plot the data. I am now creating a module that makes a moving average of the data; its test will be to read in the data (using the other module), run itself on the data, and plot it. Each file is stored in its own directory (the read file module is stored in _ReadBand).
The issue seems to be with the %matplotlib inline command that I am using (I am working in IPython), which is only found in the test area.
Here is my Read CSV file
##
# Read sensor band data
# Creates Timestamps from RTC ticks
##
import pandas as pd
from ticksConversion import ticks_to_time #converts difference in rtc to float of seconds
def read_sensor(folder_path, file_path):
    ##This function takes csv of raw data from sensor
    ##and returns the individual signals as well as a timestamp made from the real time clock
    #Create full path
    data_path = folder_path + file_path
    #Read CSV
    sensor_data = pd.read_csv(data_path, header=0, names=['time', 'band_rtc', 'ps1', 'ps2', 'ps3'])
    #Extract sensor signals and rtc into lists
    sensorData = [list(sensor_data.ps1), list(sensor_data.ps2), list(sensor_data.ps3)]
    #Create Timestamps based on RTC
    sensorTimestamp = [0] #first timestamp value
    secondsElapsed = 0 #running total of seconds for timestamps
    for indx, val in enumerate(sensor_data.band_rtc):
        if( indx == 0):
            continue #If first rtc value simply continue, this data already has timestamp zero
        secondsElapsed += ticks_to_time(sensor_data.band_rtc[indx-1], sensor_data.band_rtc[indx]) #convert rtc elapsed to seconds and add to total
        sensorTimestamp.append(secondsElapsed) #add timestamp for this data point
    return sensorTimestamp, sensorData
#Test code
if __name__ == "__main__":
    #matplotlib - 2D plotting library
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    %matplotlib inline
    #Test Data Path
    folder_path = './'
    file_path = 'testRead.csv'
    #Test Function
    sensorTimestamp, sensorData = read_sensor(folder_path, file_path)
    #Plot Data
    for indx, data in enumerate(sensorData):
        plt.figure(figsize=(20,3))
        plt.plot(sensorTimestamp, sensorData[indx], label='data from LED %i' %indx)
        plt.title('Raw Reading from Band')
        plt.legend(loc='best')
        plt.show()
This is my moving average file:
##
# Moving average Filter
##
def loop_rolling_mean(numRolls, dataset, window):
    ## Moving average with specified window size moved over data specified number of times
    ## Input:
    # dataset - data to average
    # numRolls - number of times to do a moving average
    # window - window size for average, must be odd for unshifted data
    ##Output
    # rolledMean - averaged data
    rolledMean = Series(dataset) # Copy data over and cast to pandas series
    for x in range(0,numRolls): #iterate how many times that you want to do run the window
        rolledMean = pd.rolling_mean(rolledMean, window, min_periods=1, center=True)
    return rolledMean ## the dataset that had the rolling mean going forward x number of times
def loop_rolling_mean_with_reverse(numRolls, dataset, window):
    ##roll over data set forward and backward to get rid of offset
    ## Input:
    # dataset - data to average
    # numRolls - number of rolls (forward and backward), must be even for unshifted data
    # window - window size for average
    ##Output
    # rolledMean - averaged data
    #Error Checking
    if(numRolls%2 != 0):
        return "Number of rolls must be even for un-shifted data"
    ## Now going to do the alternating rolling
    rolledMean = Series(dataset) # Copy data over and cast to pandas series
    for x in range(0, int(numRolls/2)):
        forwardRoll = pd.rolling_mean(rolledMean, window, min_periods=1) #roll data in forward direction
        reversdData = forwardRoll.reindex(index=forwardRoll.index[::-1]) #reverse data
        reverseRoll = pd.rolling_mean(reversdData, window, min_periods=1) #roll over reversed data
        rolledMean = reverseRoll.reindex(index=reverseRoll.index[::-1]) #reverse data again so its in correct order
    return rolledMean
#Test code
if __name__ == "__main__":
    # import readBand
    import sys
    sys.path.insert(0, '../_ReadBand')
    import readBand
    #matplotlib - 2D plotting library
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    %matplotlib inline
    #Set number of rolls and Window
    numRolls = 2
    windowSize = 3
    rolledSensorData = []
    for indx, val in enumerate(sensorData):
        rolledSensorData.append(loop_rolling_mean(numRolls, sensorData[indx], windowSize))
    #Plot Data
    for indx, val in enumerate(rolledSensorData):
        plt.figure(figsize=(20,3))
        plt.plot(sens_data.timestamp, sensorData[indx], label='raw data')
        plt.plot(sensor_data.timestamp, rolledSensorData[indx], label='Moving Avg forward')
        plt.title('LED %i' %indx)
        #plt.axis([8,22, 7000, 8600])
        plt.legend(loc='best')
        plt.show()
    for indx, val in enumerate(rolledSensorData):
        print len(rolledSensorData[indx])
And this is the error I receive; as you can see, it refers to the test area in readBand.py:
File "../_ReadBand\readBand.py", line 43
%matplotlib inline
^
SyntaxError: invalid syntax
I don't even think this part of the code should be running, since it's being imported and not run as the main script.
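For what it's worth, a SyntaxError is raised when Python compiles the file, before any of it runs, so the if __name__ == "__main__": guard cannot protect a line that is not valid Python (IPython magics like %matplotlib inline are only understood by the IPython shell). The sketch below demonstrates this with the built-in compile(); the source string and the file name are made-up stand-ins for readBand.py.

```python
# A tiny stand-in for readBand.py: a valid function plus a guarded IPython magic.
source = (
    "def read_sensor():\n"
    "    return 'ok'\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    %matplotlib inline\n"
)

def compiles(src):
    """Return True if Python can compile the source, False on a SyntaxError."""
    try:
        compile(src, "readBand_demo.py", "exec")
        return True
    except SyntaxError:
        return False

# Compilation fails even though the magic sits under the __main__ guard.
magic_compiles = compiles(source)

# With the magic line replaced by ordinary Python, the same file compiles.
fixed_compiles = compiles(source.replace("%matplotlib inline", "pass  # magic removed"))
```

So one way out would be to drop the magic from the .py files and run %matplotlib inline once in the IPython session or notebook that imports them.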