I am working on a small project in the lab with an Arduino Mega 2560 board. I want to average the signal (voltage) of the positive-slope portion (rise) of a triangle wave to remove as much noise as possible. The triangle wave's frequency is 20 Hz, and I am transferring data at 115200 bits/second (the fastest rate recommended by Arduino for sending data to a computer).
The raw signal looks like this:
My data is stored in a text file, with each line corresponding to a data point. Since I have thousands of data points, I expect that some averaging would smooth the signal and, in this case, produce a close-to-perfect straight line. However, other experimental conditions might produce a signal with features along the positive-slope portion of the triangle wave, such as a negative peak, and I absolutely need to be able to see such a feature in my averaged signal.
I am a Python beginner, so I might not have the ideal approach and my code might look bad to most of you, but I would still like your hints/ideas on how to improve my signal-processing code to achieve better noise removal by averaging the signal.
#!/usr/bin/python
import matplotlib.pyplot as plt
import math

# *** OPEN AND PLOT THE RAW DATA ***
data_filename = "My_File_Name"
filepath = "My_File_Path" + data_filename + ".txt"

# Open the raw data
with open(filepath, "r") as f:
    rawdata = f.readlines()

# Strip the \n and convert each line to a float
rawdata = [float(s.strip()) for s in rawdata]

# Plot the raw data
plt.plot(rawdata, 'r-')
plt.ylabel('Lightpower (V)')
plt.show()

# *** FIND THE LOCAL MAXIMA AND MINIMA ***
# Number of data points in each range
datarange = 15  # This number can be changed for better processing
max_i_range = int(math.floor(len(rawdata) / datarange)) - 3

# Declare empty lists for the maxima and minima
min_list = []
max_list = []
min_list_index = []
max_list_index = []

for i in range(0, max_i_range):
    delimiter0 = i * datarange
    delimiter1 = (i + 1) * datarange
    delimiter2 = (i + 2) * datarange
    delimiter3 = (i + 3) * datarange
    averagerange1 = sum(rawdata[delimiter0:delimiter1]) / datarange
    averagerange2 = sum(rawdata[delimiter1:delimiter2]) / datarange
    averagerange3 = sum(rawdata[delimiter2:delimiter3]) / datarange
    # Check whether there is a minimum in range 2
    if averagerange1 > averagerange2 and averagerange2 < averagerange3:
        segment = rawdata[delimiter1:delimiter2]
        min_list.append(min(segment))  # Value of the minimum
        # Index of the minimum (first occurrence if there are ties)
        min_list_index.append(delimiter1 + segment.index(min(segment)))
    # Check whether there is a maximum in range 2
    if averagerange1 < averagerange2 and averagerange2 > averagerange3:
        segment = rawdata[delimiter1:delimiter2]
        max_list.append(max(segment))  # Value of the maximum
        # Index of the maximum (first occurrence if there are ties)
        max_list_index.append(delimiter1 + segment.index(max(segment)))

# *** PROCESS EACH RISE PATTERN ***
# One rise pattern goes from a minimum to the following maximum
numb_of_rise_pattern = 50  # Number of rise patterns to average; can be raised or lowered
max_min_diff_total = 0
for i in range(0, numb_of_rise_pattern):
    max_min_diff_total += max_list_index[i] - min_list_index[i]

# Average number of points in a rise pattern
max_min_diff_avg = abs(max_min_diff_total // numb_of_rise_pattern)

# Average the value at each position across the rise patterns
avg_position_value_list = []
for i in range(0, max_min_diff_avg):
    sum_position_value = 0
    for j in range(0, numb_of_rise_pattern):
        sum_position_value += rawdata[min_list_index[j] + i]
    avg_position_value_list.append(sum_position_value / numb_of_rise_pattern)

# Plot the processed signal
plt.plot(avg_position_value_list, 'r-')
plt.title(data_filename)
plt.ylabel('Lightpower (V)')
plt.show()
At the end, the processed signal looks like this:
I would expect a straighter line, but I could be wrong. I believe that there are probably a lot of flaws in my code and there would certainly be better ways to achieve what I want. I have included a link to a text file with some raw data if any of you guys want to have fun with it.
http://www108.zippyshare.com/v/2iba0XMD/file.html
Simpler might be to use a smoothing function, such as a moving-window average. This is pretty simple to implement using the rolling method of a pandas.Series. (Only the first 500 points are shown.) Tweak the window argument (the window size) to get different amounts of smoothing.
import pandas as pd
import matplotlib.pyplot as plt
# Plot the Raw Data
ts = rawdata[0:500]
plt.plot(ts, 'r-')
plt.ylabel('Lightpower (V)')
# previous version
# smooth_data = pd.rolling_mean(rawdata[0:500],5).plot(style='k')
# changes to pandas require a change to the code as follows:
smooth_data = pd.Series(ts).rolling(window=7).mean()
smooth_data.plot(style='k')
plt.show()
Moving Average
A moving average is, basically, a low-pass filter. So, we could also implement a low-pass filter with functions from SciPy as follows:
import scipy.signal as signal

# First, design the Butterworth filter
N = 3     # Filter order
Wn = 0.1  # Cutoff frequency, as a fraction of the Nyquist frequency (half the sample rate)
B, A = signal.butter(N, Wn, output='ba')

# Apply the filter; filtfilt runs it forward and backward, so there is no phase shift
smooth_data = signal.filtfilt(B, A, rawdata[0:500])

plt.plot(ts, 'r-')
plt.plot(smooth_data, 'b-')
plt.show()
Low-Pass Filter
The Butterworth filter method is from OceanPython.org, BTW.
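As an aside, Wn above is the cutoff expressed as a fraction of the Nyquist frequency (half the sample rate). If you know your actual sample rate, newer SciPy releases (1.2 and later) also accept the cutoff in Hz through the fs argument; a small sketch, where the sample rate and cutoff values are only placeholders:
import scipy.signal as signal

fs = 1000.0    # sample rate in Hz -- placeholder, use your real acquisition rate
cutoff = 20.0  # desired cutoff frequency in Hz -- also a placeholder
# equivalent to normalizing by hand: Wn = cutoff / (fs / 2.0)
B, A = signal.butter(3, cutoff, fs=fs, output='ba')
smooth_data = signal.filtfilt(B, A, rawdata)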
I am new to Python so please pardon me if this question is very basic.
I have an accelerometer vector magnitude (acc_VM) signal with a sampling frequency of 100 Hz. I have to take the Fourier transform of this signal and find the fundamental frequency within the range Df.
Df is the family of frequencies corresponding to walking; here we use Df = [1.2, 4] Hz. How can I select the frequency range Df = [1.2, 4] Hz in Python? Should I implement filters, or is combFunction() below the correct approach?
import numpy as np

def combFunction(n):
    combSignal = []
    for element in n:
        if element > 1.2 and element < 4:
            combSignal.append(element)
        else:
            combSignal.append(0)
    return np.maximum(combSignal)

def hann(total_data):
    hann_array = np.zeros(total_data)
    for i in range(total_data):
        hann_array[i] = 0.5 - 0.5 * np.cos((2 * np.pi * i) / (total_data - 1))
    return hann_array

def calculate_FT(x):
    hann_weight = hann(len(x))
    x_multiplied_hann = x * hann_weight
    X = np.abs(np.fft.rfft(x_multiplied_hann))
    combSignal = combFunction(X)

calculate_FT(acc_VM)
The FFT does not return frequencies, but rather an array of amplitudes for a fixed set of evenly spaced frequencies.
As a result your combFunction, as implemented, would pick the components which have a spectrum amplitude between 1.2 and 4.
To be able to select frequencies, you would need the corresponding array of those evenly spaced frequencies, which you can get
from np.fft.rfftfreq.
Note that you will need the sampling rate (and if your data isn't uniformly sampled, you will need to resample it).
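If the samples are not uniformly spaced, one simple (if crude) way to put them on a uniform grid is linear interpolation; a minimal sketch, where t and x are assumed names for the original sample times and values and sampling_rate is the target rate in Hz:
import numpy as np

t_uniform = np.arange(t[0], t[-1], 1.0 / sampling_rate)  # uniform time grid
x_uniform = np.interp(t_uniform, t, x)                    # linear interpolation onto that grid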
In the code that follows I'll use the variable sampling_rate for that. Then the frequencies will be given by:
freqs = np.fft.rfftfreq(len(data), d=1.0 / sampling_rate)
Now let's extract the array indices corresponding to those frequencies that are within the frequency band of interest:
in_band = np.where([f >= 1.2 and f <= 4 for f in freqs])[0]
Then you may get the location within this band where the original spectrum X has a peak:
peak_location = np.argmax(X[in_band])
which gives you a peak spectrum amplitude X[in_band[peak_location]] at a frequency freqs[in_band[peak_location]].
Putting it all together should give you something like the following:
def find_peak_in_frequency_range(X, freqs, fmin, fmax):
    in_band = np.where([f >= fmin and f <= fmax for f in freqs])[0]
    peak_location = np.argmax(X[in_band])
    return freqs[in_band[peak_location]], X[in_band[peak_location]]

def calculate_FT(x, sampling_rate):
    hann_weight = hann(len(x))
    x_multiplied_hann = x * hann_weight
    X = np.abs(np.fft.rfft(x_multiplied_hann))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sampling_rate)
    peakFreq, peakAmp = find_peak_in_frequency_range(X, freqs, 1.2, 4)
    return peakFreq, peakAmp
Note that you may get better results by using a spectrum estimation method such as scipy.signal.welch instead of simply taking the FFT.
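If you want to go that route, here is a minimal sketch of the same peak search on top of Welch's method (the nperseg value is only an assumption to tune for your data):
import numpy as np
from scipy.signal import welch

def find_peak_welch(x, sampling_rate, fmin=1.2, fmax=4.0):
    # Welch's method averages periodograms of overlapping segments,
    # which usually gives a less noisy spectrum estimate than a single FFT
    freqs, Pxx = welch(x, fs=sampling_rate, nperseg=1024)  # nperseg: assumed, tune to your data
    in_band = (freqs >= fmin) & (freqs <= fmax)
    peak_location = np.argmax(Pxx[in_band])
    return freqs[in_band][peak_location], Pxx[in_band][peak_location]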
For the sake of illustration, I've run the above on a sample data set (file 1.csv, with some resampling):
I have a program that connects to a Bluetooth accelerometer and reads its data to determine motion in real time, and I'm trying to figure out how to smooth out the noise so I can better represent the motion. I found a SciPy function for a Butterworth filter (pardon my ignorance about filters), but it seems to only work when you already have the whole signal, since it looks at the points before and after each sample to smooth the noise. How can I smooth out noise dynamically? Here's my code:
def animator():
    global xyz
    fig = plt.figure()
    xyz_mot = fig.add_subplot(111, projection="3d")
    xyz_mot.set_title("Motion")
    xyz_mot.set_xlim3d(-100, 100)
    xyz_mot.set_ylim3d(-100, 100)
    xyz_mot.set_zlim3d(-100, 100)
    xyz = xyz_mot.scatter(0, 0, 0)
    ani = FuncAnimation(fig, updateAni, frames=2, interval=50)
    fig.show()

def updateAni(i):
    t = float(time_data[-1] / 1000)**2
    xmot[0] = .5 * acceleration_data[-1].x * t
    ymot[0] = .5 * acceleration_data[-1].y * t
    zmot[0] = .5 * acceleration_data[-1].z * t
    xyz._offsets3d = (xmot, ymot, zmot)
    #print("X Motion: " + str(xmot) + ", Y Motion: " + str(ymot))
    #print(time_data[-1])
The accelerometer data and time data are being added to the arrays acceleration_data and time_data from another thread. Is there a matplotlib (or other library) function to smooth noise? Any help is appreciated.
Look at running the data through an exponential-averaging or moving-average filter. The exponential average lets you trade off smoothing versus responsiveness with the alpha parameter. Filter the raw data from the accelerometer before adding it to the output array.
The following snippet implements a simple exponential averager:
avg = alpha * x + (1 - alpha) * avg  # avg holds the previous filtered output
buffer.append(avg)
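For completeness, a minimal sketch of how this could sit in a read loop, with the samples arriving one at a time; bluetooth_device, acceleration_data, and the alpha value are placeholder names to adapt to your program:
def read_and_filter(alpha=0.2):
    avg = None
    while True:
        x = bluetooth_device.read()  # placeholder for however you read one raw sample
        # exponential average: blend the new sample with the previous filtered output
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        acceleration_data.append(avg)  # store the filtered value instead of the raw one
A smaller alpha gives heavier smoothing but a slower response to real changes in motion.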
Instead of adding the raw acceleration data to acceleration_data from your secondary thread, I'd filter the data there by using a buffer to constantly take the average of the last several measurements. Without your code for the thread, I can only give you some pseudo-code, but this should give you an idea of how it would work:
import collections

def thread_func():
    buf = collections.deque(maxlen=10)  # take the average of up to the last 10 samples
    while True:
        accel = bluetooth_device.read()
        buf.append(accel)
        if buf:  # make sure the buffer isn't empty to avoid dividing by 0
            acceleration_data.append(sum(buf) / len(buf))
I am working on a project that aims to show the difference between good form and bad form in an exercise. To do this, we collected acceleration data with a wrist-based accelerometer. The image above shows 2 sets of a fitness exercise (bench press); each set has 10 repetitions, and the image below shows the 10 repetitions of 1 set. I have a raw data set consisting of 10 sets of an exercise. What I want to do is split the raw data into 10 parts, each containing the portion between 2 black lines in the image above, so I can analyze the data easily. My supervisor gave me a starting point, which is choosing a cutpoint in each set: take a cutpoint, find the first interruption time, start cutting 3 seconds before that time, count 10 repetitions, and finish cutting.
This is an idea that I don't know how to apply. At the least, if you can tell me how to cut a DataFrame according to a cutpoint, I would be grateful.
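For the narrow question of cutting a time-indexed DataFrame around a cutpoint, here is a minimal sketch; it assumes the DataFrame has a DatetimeIndex and that cutpoint is a timestamp inside the set. The 3-second lead follows the supervisor's rule, while the 60-second set length is only an assumption to adjust:
import pandas as pd

def cut_set(frame, cutpoint, lead_seconds=3, length_seconds=60):
    # Slice the rows of one exercise set out of a time-indexed DataFrame
    start = cutpoint - pd.Timedelta(seconds=lead_seconds)
    end = start + pd.Timedelta(seconds=length_seconds)
    return frame.loc[start:end]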
Well, I found another way to detect the periodic parts of my accelerometer data. Here is my code:
import numpy as np
from peakdetect import peakdetect
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib import style
from pandas import DataFrame as df

style.use('ggplot')

def get_periodic(path):
    periodics = []
    data_frame = df.from_csv(path)
    data_frame.columns = ['z', 'y', 'x']
    if path.__contains__('1'):
        if path.__contains__('bench'):
            bench_press_1_week = data_frame.between_time('11:24', '11:52')
            peak_indexes = get_peaks(bench_press_1_week.y, lookahead=3000)
            for i in range(0, len(peak_indexes)):
                time_indexes = bench_press_1_week.index.tolist()
                start_time = time_indexes[0]
                periodic_start = start_time.to_datetime() + dt.timedelta(0, peak_indexes[i] / 100)
                periodic_end = periodic_start + dt.timedelta(0, 60)
                periodic = bench_press_1_week.between_time(periodic_start.time(), periodic_end.time())
                periodics.append(periodic)
    return periodics

def get_peaks(data, lookahead):
    peak_indexes = []
    correlation = np.correlate(data, data, mode='full')
    realcorr = correlation[correlation.size // 2:]
    maxpeaks, minpeaks = peakdetect(realcorr, lookahead=lookahead)
    for i in range(0, len(maxpeaks)):
        peak_indexes.append(maxpeaks[i][0])
    return peak_indexes

def show_segment_plot(data, periodic_area, exercise_name):
    plt.figure(8)
    gs = gridspec.GridSpec(7, 2)
    ax = plt.subplot(gs[:2, :])
    plt.title(exercise_name)
    ax.plot(data)
    k = 0
    for i in range(2, 7):
        for j in range(0, 2):
            ax = plt.subplot(gs[i, j])
            title = "{} {}".format(k + 1, ".Set")
            plt.title(title)
            ax.plot(periodic_area[k])
            k = k + 1
    plt.show()
Firstly, this question gave me another perspective on my problem. The image below shows the raw accelerometer data of a bench press with 10 sets. It has 3 axes (x, y, z), and its major axis is y (blue in the image).
I used the autocorrelation function to detect the periodic parts. In the image above, every peak represents 1 set of the exercise. With this peak detection algorithm I found each peak's x-axis value:
In[196]: maxpeaks
Out[196]:
[[16204, 32910.14013671875],
[32281, 28726.95849609375],
[48515, 24583.898681640625],
[64436, 22088.130859375],
[80335, 19582.248291015625],
[96699, 16436.567626953125],
[113081, 12100.027587890625],
[129027, 8098.98486328125],
[145184, 5387.788818359375]]
Basically, each x-value represents a sample index. My sampling frequency was 100 Hz, so 16204/100 = 162.04 seconds. To find the time of a periodic part, I added 162.04 s to the start time. Each bench press set took approximately 1 minute; in this example the exercise's start time was 11:24, so the first periodic part starts around 11:26 and ends 1 minute later. There is some lag, but this is the best solution I found.
I implemented an algorithm in Python that is used to reduce the intensity variations (flickering) in an image. The algorithm first calculates the cumulative histogram for each row of the image. Then, it filters the accumulated row cumulative histograms in the vertical direction (across the columns) with a Gaussian kernel, so that the differences between the row cumulative histograms are reduced. In the final part, each row cumulative histogram of the original image should be matched to the corresponding Gaussian-filtered row cumulative histogram. Hence, a histogram matching operation (per row) is performed, and the desired rows are reconstructed. Naturally, the image is reconstructed in the end by simply stacking all the rows vertically on top of each other.
In my code, the last part, where there are two nested for-loops (iterating over each row and, inside each row, over each intensity level [0, 255]), takes a lot of time, so much that using the algorithm is no longer feasible. For a single 4592x3448 (16 MP) image, the execution time is over 10 minutes on my machine. Of course, iterating over 256 intensity values for each of the 3448 rows slows down the algorithm quite a bit, but I can't see any other way to avoid these for-loops given the nature of the algorithm.
I'm very new to Python, so I might be committing serious crimes in terms of programming here. I would appreciate any hints and code reviews. You can find an example image under this link:
http://s3.postimg.org/a17b3otpf/00000_cam0.jpg
import time
import cv2
import numpy as np
import scipy.ndimage as ndi
from matplotlib import pyplot as plt

start_time = time.time()

### Algorithm: Filtering row cumulative histograms with different Gaussian variances
T = 200  # threshold
img = cv2.imread('flicker.jpg', 0)
rows, cols = img.shape
cdf_hist = np.zeros((rows, 256))

for i in range(0, rows):
    # Read one row
    img_row = img[i,]
    # Calculate the row histogram
    hist_row = cv2.calcHist([img_row], [0], None, [256], [0, 256])
    # Calculate the cumulative row histogram
    cdf_hist_row = hist_row.cumsum()
    # Accumulate the cumulative histogram of each row
    cdf_hist[i, :] = cdf_hist_row

# Apply Gaussian filtering on the row cumulative histograms along the columns (vertically)
# For dark pixels, use a higher sigma
Gauss1_cdf = ndi.gaussian_filter1d(cdf_hist, sigma=6, axis=0, output=np.float64, mode='nearest')
# For bright pixels, use a lower sigma
Gauss2_cdf = ndi.gaussian_filter1d(cdf_hist, sigma=3, axis=0, output=np.float64, mode='nearest')
##
print("--- %s seconds ---" % (time.time() - start_time))
### UNTIL HERE: it takes approx. 0.25 sec on my computer with a 16MP image

### This part takes too much time ###### START ######################
# Perform histogram matching (for each row) to either 'Hz1' or 'Hz2'
img_match = np.copy(img)
for r in range(0, rows):
    row = img[r, :]
    Hy = cdf_hist[r, :]     # Original row histogram
    Hz1 = Gauss1_cdf[r, :]  # Histogram 1 to be matched
    Hz2 = Gauss2_cdf[r, :]  # Histogram 2 to be matched
    row_match = img_match[r, :]
    for i in range(0, 255):  # for each intensity value
        # Find the indices of the pixels in the row whose intensity = i
        ind = [m for (m, val) in enumerate(row) if val == i]
        j = Hy[i]
        while True:
            # use the appropriate CDF (Hz1 or Hz2) according to the bin number
            if i < T:
                k = [m for (m, val) in enumerate(Hz1) if val > j - 1 and val < j + 1]
            else:
                k = [m for (m, val) in enumerate(Hz2) if val > j - 1 and val < j + 1]
            if len(k) > 0:
                break
            else:
                j = j + 1
        row_match[ind] = k[0]
###################### END ####################################

# Set an upper bound on the change of intensity values to avoid brightness overflows etc.
alfa = 5
diff_img = cv2.absdiff(img, img_match)
img_match2 = np.copy(img_match)
img_match2[diff_img > alfa] = img[diff_img > alfa] + alfa

## Plots
plt.subplot(121), plt.imshow(img, 'gray')
plt.subplot(122), plt.imshow(img_match2, 'gray')
plt.show()

print("--- %s seconds ---" % (time.time() - start_time))
I'm not going to get into using timeit, but it's worth doing. I'm going to leave the first part alone (last part as well).
### This part takes too much time ###### START ######################
# Perform histogram matching (for each row) to either 'Hz1' or 'Hz2'
import math  # needed for math.ceil / math.floor below

img_match = np.copy(img)
for r in xrange(rows):  # no need to tell range to start from 0, and you should use xrange
    row = img[r, :]
    rowlength = len(row)  # so you don't have to keep recalculating it later
    Hy = cdf_hist[r, :]     # Original row histogram
    Hz1 = Gauss1_cdf[r, :]  # Histogram 1 to be matched
    Hz2 = Gauss2_cdf[r, :]  # Histogram 2 to be matched
    Hz1Dict = {}
    Hz2Dict = {}
    # Walk each target CDF backwards so that, when keys collide, the entry that
    # survives is the one with the smallest original index (same as k[0] above)
    for index, item in enumerate(reversed(Hz1)):
        Hz1Dict[math.ceil(item)] = len(Hz1) - 1 - index
        Hz1Dict[math.floor(item)] = len(Hz1) - 1 - index
    for index, item in enumerate(reversed(Hz2)):
        Hz2Dict[math.ceil(item)] = len(Hz2) - 1 - index
        Hz2Dict[math.floor(item)] = len(Hz2) - 1 - index
    row_match = img_match[r, :]
    for i in xrange(255):  # for each intensity value
        # Find the indices of the pixels in the row whose intensity = i
        ind = [m for m in xrange(rowlength) if row[m] == i]
        j = Hy[i]
        while True:
            # use the appropriate CDF (Hz1 or Hz2) according to the bin number
            if i < T:
                if j in Hz1Dict:
                    row_match[ind] = Hz1Dict[j]
                    break
            else:
                if j in Hz2Dict:
                    row_match[ind] = Hz2Dict[j]
                    break
            j = j + 1
###################### END ####################################
Obviously you should check closely that I haven't changed the logic at all.
Edit: what I've now done is store a dict for each Hz, recording for each j the index of the first number between j-1 and j+1 (note the reverse in the enumeration, so that the first matching index wins). So now I just check whether the dict has the corresponding j as a key. If so, I set row_match[ind] to that value.
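Another angle on the same bottleneck, for what it's worth: each row cumulative histogram is monotonically non-decreasing, so the lookup can also be done with a binary search. Below is a sketch using np.searchsorted; it is not a drop-in replacement for the loop above, because it maps each intensity to the first target bin whose CDF value reaches the source CDF value instead of searching a ±1 window, so ties may resolve slightly differently:
import numpy as np

def match_row(row, Hy, Hz1, Hz2, T=200):
    # Vectorized histogram matching for one image row
    # map1[i] / map2[i] = first bin of the target CDF whose value reaches Hy[i]
    map1 = np.searchsorted(Hz1, Hy)
    map2 = np.searchsorted(Hz2, Hy)
    # dark intensity levels (< T) are matched against Hz1, bright ones against Hz2
    mapping = np.where(np.arange(len(Hy)) < T, map1, map2)
    mapping = np.clip(mapping, 0, 255).astype(row.dtype)
    return mapping[row]  # apply the 256-entry lookup table to the whole row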
I am using matplotlib and I'm running into problems when trying to plot large vectors.
Sometimes I get a MemoryError.
My question is whether there is any way to reduce the number of values that I need to plot.
In this example I'm plotting a vector with 2647296 elements!
Is there any way to plot the same data on a smaller scale, with fewer points?
It is very unlikely that you have so much resolution on your display that you can see 2.6 million data points in your plot. A simple way to plot less data is to sample e.g. every 1000th point: plot(x[::1000]). If that loses too much and it is e.g. important to see the extremal values, you could write some code to split the long vector into suitably many parts and take the minimum and maximum of each part, and plot those:
import numpy as np
import matplotlib.pyplot as plt

tmp = x[:len(x) - len(x) % 1000]  # drop a few points so the length is a multiple of 1000
tmp = tmp.reshape((-1, 1000))     # split into consecutive pieces of 1000 points each
# alternative: tmp = tmp.reshape((1000, -1)) to get exactly 1000 pieces
plt.figure()                      # plot the minimum and maximum of each piece in the same figure
plt.plot(tmp.min(axis=1))
plt.plot(tmp.max(axis=1))
plt.show()
You can use a min/max for each block of data to subsample the signal.
Window size would have to be determined based on how accurately you want to display your signal and/or how large the window is compared to the signal length.
Example code:
from scipy.io import wavfile
import matplotlib.pyplot as plt

def value_for_window_min_max(data, start, stop):
    window_min = data[start]
    window_max = data[start]
    for i in range(start, stop):
        if data[i] < window_min:
            window_min = data[i]
        if data[i] > window_max:
            window_max = data[i]
    # keep whichever extreme is larger in magnitude
    if abs(window_min) > abs(window_max):
        return window_min
    else:
        return window_max

# This will only work properly if window_size divides evenly into len(data)
def subsample_data(data, window_size):
    print(len(data))
    print(len(data) // window_size)
    out_data = []
    for i in range(0, len(data) // window_size):
        out_data.append(value_for_window_min_max(data, i * window_size, (i + 1) * window_size))
    return out_data

sample_rate, data = wavfile.read('<path_to_wav_file>')
sub_amt = 10
sub_data = subsample_data(data, sub_amt)
print(len(data))
print(len(sub_data))

fig = plt.figure(figsize=(8, 6), dpi=100)
fig.add_subplot(211)
plt.plot(data)
plt.title('Original')
plt.xlim([0, len(data)])

fig.add_subplot(212)
plt.plot(sub_data)
plt.xlim([0, len(sub_data)])
plt.title('Subsampled by %d' % sub_amt)
plt.show()
Output: