I have a .wav file. I load it and get the following spectrogram, showing the spectrum in dB:
http://i.stack.imgur.com/22TjY.png
Now I would like to know these values exactly, because I want to compare them with another wav file to recognize whether these 4 values are present there.
http://i.stack.imgur.com/Jun25.png
The source that generates those pictures (taken from another Stack Overflow example):
from array import array
from sys import byteorder
import numpy as np
from matplotlib import mlab, pyplot
## some stuff here
for i in range(0, int(RATE / CHUNK_SIZE * RECORD_SECONDS)):
    # little endian, signed short
    data_chunk = array('h', stream.read(CHUNK_SIZE))
    if byteorder == 'big':
        data_chunk.byteswap()
    data_all.extend(data_chunk)
## some stuff here
Fs = 16000
f = np.arange(1, 9) * 2000
t = np.arange(RECORD_SECONDS * Fs) / Fs
x = np.empty(t.shape)
for i in range(8):
    x[i*Fs:(i+1)*Fs] = np.cos(2*np.pi * f[i] * t[i*Fs:(i+1)*Fs])
w = np.hamming(512)
Pxx, freqs, bins = mlab.specgram(data_all, NFFT=512, Fs=Fs, window=w,
                                 noverlap=464)
# plot the spectrogram in dB (Pxx is a power, so dB = 10*log10)
Pxx_dB = 10 * np.log10(Pxx)
pyplot.subplots_adjust(hspace=0.4)
pyplot.subplot(211)
ex1 = bins[0], bins[-1], freqs[0], freqs[-1]
pyplot.imshow(np.flipud(Pxx_dB), extent=ex1)
pyplot.axis('auto')
pyplot.axis(ex1)
pyplot.xlabel('time (s)')
pyplot.ylabel('freq (Hz)')
I "think" that the information is in Pxx but I don't know how to get it.
From the documentation, I gather that Pxx is a simple 2D numpy array.
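You're interested in the periodogram around 1 s. Pxx has one row per frequency (the entries of freqs, NFFT/2 + 1 = 257 rows here) and one column per time segment (the entries of bins), so take the column whose time is closest to 1 s:
col = np.argmin(np.abs(bins - 1.0))
periodogram_of_interest = Pxx[:, col]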
Then find the 4 maxima. Unfortunately, each of those 4 frequencies has a finite width, so simply looking for the top 4 values will not work. However, assuming your signal is quite clean, there's a function in scipy.signal that lists all local maxima: argrelmax. You could play with the order argument of that function to reduce your search space.
With the values returned from that function, you could get the frequencies like this: freqs[those_4_indices].
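The steps above (pick the column nearest 1 s, list local maxima with argrelmax, keep the strongest four) can be sketched as follows. This is a minimal sketch assuming a clean signal; the helper name top_peak_frequencies, the four-tone test signal and order=5 are hypothetical choices, not values from the question.

```python
import numpy as np
from matplotlib import mlab
from scipy.signal import argrelmax

def top_peak_frequencies(Pxx, freqs, bins, t_target, n_peaks=4, order=5):
    """Frequencies of the n_peaks strongest local maxima in the column nearest t_target."""
    col = np.argmin(np.abs(bins - t_target))            # time segment closest to t_target
    periodogram = Pxx[:, col]
    candidates = argrelmax(periodogram, order=order)[0]  # indices of local maxima
    # keep the n_peaks candidates with the largest power
    strongest = candidates[np.argsort(periodogram[candidates])[::-1][:n_peaks]]
    return np.sort(freqs[strongest])

# Hypothetical test signal: 2 s of four tones sampled at 16 kHz
Fs = 16000
t = np.arange(2 * Fs) / Fs
x = sum(np.cos(2 * np.pi * f0 * t) for f0 in (500, 1200, 2500, 4000))
Pxx, freqs, bins = mlab.specgram(x, NFFT=512, Fs=Fs, window=np.hamming(512), noverlap=464)
result = top_peak_frequencies(Pxx, freqs, bins, t_target=1.0)
print(result)
```

The frequency resolution is Fs/NFFT = 31.25 Hz, so the 1200 Hz tone is reported at the nearest bin, 1187.5 Hz.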
I am new to Python so please pardon me if this question is very basic.
I have an accelerometer vector magnitude (acc_VM) signal with a sampling frequency of 100 Hz. I have to compute the Fourier transform of this signal and find the fundamental frequency within the range Df.
Df is the band of frequencies corresponding to walking; here we use Df = [1.2, 4] Hz. How can I select the frequency range Df = [1.2, 4] Hz in Python? Should I implement filters, or is combFunction() the correct code?
import numpy as np
def combFunction(n):
    combSignal = []
    for element in n:
        if element > 1.2 and element < 4:
            combSignal.append(element)
        else:
            combSignal.append(0)
    return np.max(combSignal)
def hann(total_data):
    hann_array = np.zeros(total_data)
    for i in range(total_data):
        hann_array[i] = 0.5 - 0.5 * np.cos((2 * np.pi * i)/(total_data - 1))
    return hann_array
def calculate_FT(x):
    hann_weight = hann(len(x))
    x_multiplied_hann = x * hann_weight
    X = np.abs(np.fft.rfft(x_multiplied_hann))
    combSignal = combFunction(X)
    return combSignal
calculate_FT(acc_VM)
The FFT does not return frequencies, but rather an array of amplitudes for a fixed set of evenly spaced frequencies.
As a result your combFunction, as implemented, would pick the components which have a spectrum amplitude between 1.2 and 4.
To be able to select frequencies, you need the corresponding array of those evenly spaced frequencies, which you can get from np.fft.rfftfreq.
Note that you will need the sampling rate (and if your data isn't uniformly sampled, you will need to resample it).
In the code that follows I'll use the variable sampling_rate for that. Then the frequencies will be given by:
freqs = np.fft.rfftfreq(len(data), d=1.0/sampling_rate)  # second argument is the sample spacing
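The second argument of np.fft.rfftfreq is the sample spacing d (seconds per sample), not the sampling rate, so with a sampling rate you pass 1/sampling_rate. A tiny sketch with a hypothetical 8-sample signal at 100 Hz shows the difference:

```python
import numpy as np

sampling_rate = 100.0  # Hz, as for the accelerometer signal in the question
n = 8                  # hypothetical, very short signal length just to show the bins

freqs = np.fft.rfftfreq(n, d=1.0 / sampling_rate)  # 0, 12.5, 25, 37.5, 50 Hz
wrong = np.fft.rfftfreq(n, sampling_rate)          # treats 100 as the spacing: bins 1/800 Hz apart
print(freqs)
print(wrong)
```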
Now let's extract the array indices corresponding to those frequencies that are within the frequency band of interest:
in_band = np.where((freqs >= 1.2) & (freqs <= 4))[0]
Then you may get the location within this band where the original spectrum X has a peak:
peak_location = np.argmax(X[in_band])
which gives you a peak spectrum amplitude X[in_band[peak_location]] at the frequency freqs[in_band[peak_location]].
Putting it all together should give you something like the following:
import numpy as np
def find_peak_in_frequency_range(X, freqs, fmin, fmax):
    in_band = np.where((freqs >= fmin) & (freqs <= fmax))[0]
    peak_location = np.argmax(X[in_band])
    return freqs[in_band[peak_location]], X[in_band[peak_location]]
def calculate_FT(x, sampling_rate):
    hann_weight = hann(len(x))
    x_multiplied_hann = x * hann_weight
    X = np.abs(np.fft.rfft(x_multiplied_hann))
    freqs = np.fft.rfftfreq(len(x), d=1.0/sampling_rate)
    peakFreq, peakAmp = find_peak_in_frequency_range(X, freqs, 1.2, 4)
    return peakFreq, peakAmp
Note that you may get better results by using a spectrum estimation method such as scipy.signal.welch instead of simply taking the FFT.
For the sake of illustration, I've run the above on a sample data set (file 1.csv, with some resampling).
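Following the note on scipy.signal.welch, here is a minimal sketch assuming SciPy is available. The function name fundamental_in_band, the nperseg value and the synthetic test signal (a 2 Hz "step" component, a 0.3 Hz drift outside the band, and noise) are hypothetical, not from the original answer.

```python
import numpy as np
from scipy import signal

def fundamental_in_band(x, sampling_rate, fmin=1.2, fmax=4.0, nperseg=1024):
    """Welch PSD estimate, then the frequency of the largest PSD value in [fmin, fmax]."""
    freqs, pxx = signal.welch(x, fs=sampling_rate, window='hann', nperseg=nperseg)
    in_band = (freqs >= fmin) & (freqs <= fmax)  # boolean mask for the walking band Df
    return freqs[in_band][np.argmax(pxx[in_band])]

# Hypothetical test signal: 60 s at 100 Hz
fs = 100.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
x = (np.sin(2 * np.pi * 2.0 * t)            # in-band component
     + 2 * np.sin(2 * np.pi * 0.3 * t)      # stronger, but outside Df
     + 0.2 * rng.standard_normal(t.size))
f_walk = fundamental_in_band(x, fs)
print(f_walk)
```

Averaging over segments lowers the variance of the estimate at the cost of frequency resolution (here 100/1024 ≈ 0.1 Hz per bin).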
I have python 3.4.
I transmitted a 2 MHz signal (for example) and recorded the cavitation signal over time (until I stopped the measurement).
I want to get a spectrogram (cavitation vs frequency) and, more interestingly, a spectrogram of the cavitation over time at the sub-harmonic frequency (1 MHz).
The data is saved in dataA (= cavitation) and t (= measurement time).
I tried to save the FFT in FFTA:
FFTA = np.array([])
FFTA = np.fft.fft(dataA)
FFTA = np.append(FFTA, dataA)
I got complex numbers (real and imaginary parts).
Then I took only the first half (from 0 to 1 MHz) and saved the real and imaginary parts:
nA = int(len(FFTA)/2)
yAre = FFTA[range(nA)].real
yAim = FFTA[range(nA)].imag
I tried to get the frequencies by:
FFTAfreqs = np.fft.fftfreq(len(yAre))
But the result is totally wrong (I checked it by printing print(FFTAfreqs)).
I also plotted the data and again it's wrong:
plt.plot(t, FFTA[range(n)].real, 'b-', t, FFTA[range(n)].imag, 'r--')
plt.legend(('real', 'imaginary'))
plt.show()
How can I output a spectrogram of cavitation over the time of the sub-harmonic (1MHz) frequency?
EDIT:
Data example:
See a sample of 'dataA' and 'time':
dataA = [6.08E-04,2.78E-04,3.64E-04,3.64E-04,4.37E-04,4.09E-04,4.49E-04,4.09E-04,3.52E-04,3.24E-04,3.92E-04,3.24E-04,2.67E-04,3.24E-04,2.95E-04,2.95E-04,4.94E-04,4.09E-04,3.64E-04,3.07E-04]
time = [0.00E+00,4.96E-07,9.92E-07,1.49E-06,1.98E-06,2.48E-06,2.98E-06,3.47E-06,3.97E-06,4.46E-06,4.96E-06,5.46E-06,5.95E-06,6.45E-06,6.94E-06,7.44E-06,7.94E-06,8.43E-06,8.93E-06,9.42E-06]
EDIT II:
Following Martin's example, I tried the following code; please let me know if I did it right.
In this case dataA and time are saved as h5 files (or use the data I already posted):
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dfdata = pd.read_hdf("C:\\data_python\\DataA.h5")
dft = pd.read_hdf("C:\\data_python\\time.h5")
dft_cor = int((len(dft)-2)*4.96E-6) # calculating the measured time
fs = 2000000 #sampling frequency 2MHz
CHUNK = 10000
signal_time = dft_cor # seconds
def sine(freq,fs,secs):
    data=dfdata
    wave = np.sin(freq*2*np.pi*data)
    return wave
a1 = sine(fs,fs,120)
a2 = sine(fs/2,fs,120)
signal = a1+a2
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
spectrogram_chunk = freqs/np.amax(freqs*1.0)
# Plot spectral analysis
plt.plot(freqs[0:1000000],afft[0:1000000]) # 0-1MHz
plt.show()
number_of_chunks = 1000
# Empty spectrogram
Spectrogram = np.zeros(shape = [CHUNK,number_of_chunks])
for i in range(number_of_chunks):
    afft = np.abs(np.fft.fft(signal[i*CHUNK:(1+i)*CHUNK]))
    freqs = np.linspace(0,fs,CHUNK)[0:int(fs/2)]
    spectrogram_chunk = afft/np.amax(afft*1.0)
    try:
        Spectrogram[:,i]=spectrogram_chunk
    except:
        break
import cv2
Spectrogram = Spectrogram[0:1000000,:]
cv2.imshow('spectrogram',np.uint8(255*Spectrogram/np.amax(Spectrogram)))
cv2.waitKey()
cv2.destroyAllWindows()
It seems your problem is not with Python but with understanding what a spectrogram is.
A spectrogram is a sequence of spectral analyses of a signal, one per time slice.
1) Cut your signal into CHUNKS.
2) Do a spectral analysis of each CHUNK and stick the results together.
Example:
You have 1 second of audio recording (44100 Hz sampling). That means the recording has 1 s * 44100 Hz = 44100 samples. You define a CHUNK size of 1024 (for example).
For each chunk you compute an FFT and stick the results together into a 2D matrix (X axis: FFT of the CHUNK, Y axis: CHUNK number). 44100 samples / 1024 gives about 43 FFTs, each covering 1024/44100 ≈ 0.023 seconds of the signal.
The bigger the CHUNK, the finer the frequency resolution (fs/CHUNK Hz per bin), but the less 'realtime' the spectrogram is.
The smaller the CHUNK, the coarser the frequency resolution, but you get more measurements because you measure the frequencies 'more often'.
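The trade-off comes down to two ratios: a chunk of CHUNK samples at sampling rate fs gives FFT bins fs/CHUNK Hz apart and spans CHUNK/fs seconds. A small sketch (chunk_tradeoff is a hypothetical helper name):

```python
def chunk_tradeoff(fs, chunk):
    """Frequency resolution (Hz per FFT bin) and time span (s) of one chunk."""
    return fs / chunk, chunk / fs

for chunk in (256, 1024, 4096):
    df, dt = chunk_tradeoff(44100, chunk)
    print(f"CHUNK={chunk:5d}: {df:7.2f} Hz per bin, {dt * 1000:6.1f} ms per chunk")
```

Quadrupling the chunk makes the bins four times narrower but each column of the spectrogram four times longer; 44100 // 1024 = 43 complete chunks fit in one second.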
If you need 1 MHz: you actually cannot see anything higher than 1 MHz, because 1 MHz is half of your sampling frequency (the Nyquist frequency). Just take half of the resulting FFT array. For a real signal it doesn't matter which half, because the FFT mirrors everything above 1/2 of the sampling frequency.
About the FFT: you don't want the complex numbers themselves. You want to do
FFT = np.abs(FFT) # Edit: I just noticed you use '.real', but I will keep this here
because you want the magnitude, which is real; '.real' alone throws away the energy in the imaginary part.
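To see why .real is not enough, here is a small NumPy check with a made-up 64-sample test signal: for a sine that completes a whole number of cycles in the window, the FFT coefficient at its frequency is purely imaginary, so its real part is about 0 while its magnitude is N/2.

```python
import numpy as np

N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 4 * n / N)   # exactly 4 cycles in the window

X = np.fft.fft(x)
print(abs(X[4]))   # magnitude: N/2 = 32
print(X[4].real)   # real part: essentially 0, the energy sits in the imaginary part
```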
Preparation for the spectrogram: an example
Audio signal with a 150 Hz wave and a 300 Hz wave
import numpy as np
import matplotlib.pyplot as plt
import cv2
fs = 44100  # sampling frequency (Hz)
CHUNK = 10000  # samples per FFT
signal_time = 20  # seconds
def sine(freq, fs, secs):
    data = np.arange(fs*secs)/(fs*1.0)  # time axis in seconds
    wave = np.sin(freq*2*np.pi*data)
    return wave
a1 = sine(150, fs, signal_time)
a2 = sine(300, fs, signal_time)
signal = a1 + a2
# Spectral analysis of the first chunk
afft = np.abs(np.fft.fft(signal[0:CHUNK]))
freqs = np.arange(CHUNK) * fs / CHUNK  # frequency of each FFT bin (Hz)
# Plot spectral analysis (first 250 bins, about 0-1100 Hz)
plt.plot(freqs[0:250], afft[0:250])
plt.show()
number_of_chunks = len(signal) // CHUNK  # only complete chunks
# Empty spectrogram: one row per frequency bin, one column per chunk
Spectrogram = np.zeros(shape=[CHUNK, number_of_chunks])
for i in range(number_of_chunks):
    afft = np.abs(np.fft.fft(signal[i*CHUNK:(1+i)*CHUNK]))
    spectrogram_chunk = afft/np.amax(afft)  # normalise each chunk to its own peak
    Spectrogram[:, i] = spectrogram_chunk
Spectrogram = Spectrogram[0:250, :]
cv2.imshow('spectrogram', np.uint8(255*Spectrogram/np.amax(Spectrogram)))
cv2.waitKey()
cv2.destroyAllWindows()
(Figure: spectral analysis of a single CHUNK)
(Figure: spectrogram)
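A possible alternative to hand-rolled chunking, assuming SciPy is available, is scipy.signal.spectrogram, which does the chunking, windowing and FFT in one call and returns the frequency of every row. Picking the row nearest the sub-harmonic then gives its power over time, which is what the question asks for. The helper name and the test frequencies are hypothetical and scaled down; note that with the posted time step of 4.96e-7 s the sampling rate is about 2.016 MHz, so a 1 MHz sub-harmonic would sit just below the Nyquist frequency of about 1.008 MHz.

```python
import numpy as np
from scipy import signal

def band_power_over_time(x, fs, f_target, nperseg=1024):
    """Spectrogram of x, then the power in the bin closest to f_target for each time segment."""
    f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=nperseg)  # Sxx shape: (len(f), len(t))
    row = np.argmin(np.abs(f - f_target))                      # bin nearest the sub-harmonic
    return t, Sxx[row, :], f[row]

# Hypothetical test: fs = 2 MHz, a 500 kHz "drive" throughout and a 250 kHz
# sub-harmonic that only appears after 5 ms
fs = 2_000_000
time = np.arange(0, 0.01, 1 / fs)
x = np.sin(2 * np.pi * 500e3 * time)
x += np.where(time > 0.005, 0.5 * np.sin(2 * np.pi * 250e3 * time), 0.0)
t_seg, p_sub, f_used = band_power_over_time(x, fs, 250e3)
```

Plotting p_sub against t_seg gives the "cavitation over time" curve for that single frequency.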
My main task is to recognize a human humming from a microphone in real time. As the first step to recognizing signals in general, I have made a 5-second recording of a 440 Hz signal generated from an app on my phone and tried to detect the same frequency.
I used Audacity to plot and verify the spectrum from the same 440Hz wav file and I got this, which shows that 440Hz is indeed the dominant frequency :
(https://i.imgur.com/2UImEkR.png)
To do this with Python, I use the PyAudio library and referred to this blog. The code I have so far, which I run with the wav file, is this:
"""PyAudio Example: Play a WAVE file."""
import pyaudio
import wave
import sys
import struct
import numpy as np
import matplotlib.pyplot as plt
CHUNK = 1024
if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
data = wf.readframes(CHUNK)
i = 0
while data != '':
i += 1
data_unpacked = struct.unpack('{n}h'.format(n= len(data)/2 ), data)
data_np = np.array(data_unpacked)
data_fft = np.fft.fft(data_np)
data_freq = np.abs(data_fft)/len(data_fft) # Dividing by length to normalize the amplitude as per https://www.mathworks.com/matlabcentral/answers/162846-amplitude-of-signal-after-fft-operation
print("Chunk: {} max_freq: {}".format(i,np.argmax(data_freq)))
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(data_freq)
ax.set_xscale('log')
plt.show()
stream.write(data)
data = wf.readframes(CHUNK)
stream.stop_stream()
stream.close()
p.terminate()
The output reports a max frequency of 10 for every chunk; here is an example of one of the plots:
(https://i.imgur.com/zsAXME5.png)
I had expected this value to be 440 instead of 10 for all the chunks. I admit I know very little about the theory of FFTs, and I would appreciate any help in letting me solve this.
EDIT:
The sampling rate is 44100 Hz, the number of channels is 2, and the sample width is 2 bytes.
Foreword
As xdurch0 pointed out, you are reading a bin index instead of a frequency. If you do all the computation yourself, you need to compute your own frequency vector before plotting to get consistent results. Reading this answer may help you towards the solution.
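The frequency vector for the FFT (half plane, matching np.fft.rfft) is:
f = np.fft.rfftfreq(N_fft, d=1/rate)  # N_fft//2 + 1 points from 0 to rate/2
Or (full plane, shifted so it runs from -rate/2 to just below rate/2):
f = np.fft.fftshift(np.fft.fftfreq(N_fft, d=1/rate))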
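To see where the 10 comes from, here is a minimal sketch assuming 16-bit little-endian PCM as described in the question (sample width 2, 2 channels, 44100 Hz); peak_frequency and the synthetic 440 Hz tone are hypothetical. It de-interleaves the stereo frames, keeps one channel, and converts the argmax bin index to Hz with np.fft.rfftfreq.

```python
import numpy as np

def peak_frequency(chunk_bytes, rate, channels=2):
    """Return (bin_index, frequency_Hz) of the strongest component in one chunk of 16-bit PCM."""
    samples = np.frombuffer(chunk_bytes, dtype='<i2')
    mono = samples.reshape(-1, channels)[:, 0].astype(float)  # de-interleave, keep channel 0
    spectrum = np.abs(np.fft.rfft(mono))
    freqs = np.fft.rfftfreq(len(mono), d=1.0 / rate)           # Hz for each rfft bin
    idx = int(np.argmax(spectrum))
    return idx, freqs[idx]

# Synthetic check: a 440 Hz stereo tone at 44100 Hz, one chunk of 1024 frames
rate, n_frames = 44100, 1024
t = np.arange(n_frames) / rate
tone = (10000 * np.sin(2 * np.pi * 440 * t)).astype('<i2')
stereo_bytes = np.column_stack([tone, tone]).tobytes()   # interleaved L, R, L, R, ...
idx, f_peak = peak_frequency(stereo_bytes, rate)
print(idx, f_peak)
```

With 1024 frames per chunk the bins are 44100/1024 ≈ 43.07 Hz apart, so a 440 Hz tone lands in bin 10 (about 430.7 Hz), which is consistent with the constant 10 reported in the question. Longer chunks give finer resolution.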
On the other hand, we can delegate most of the work to the excellent scipy.signal toolbox, which is designed to cope with this kind of problem (and many more).
MCVE
Using the scipy package, it is straightforward to get the desired result for a simple WAV file with a single frequency (source):
import numpy as np
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt
# Read the file (rate and data):
rate, data = wavfile.read('tone.wav') # See source
# Compute PSD:
f, P = signal.periodogram(data, rate) # Frequencies and PSD
# Display PSD:
fig, axe = plt.subplots()
axe.semilogy(f, P)
axe.set_xlim([0,500])
axe.set_ylim([1e-8, 1e10])
axe.set_xlabel(r'Frequency, $\nu$ $[\mathrm{Hz}]$')
axe.set_ylabel(r'PSD, $P$ $[\mathrm{AU^2Hz}^{-1}]$')
axe.set_title('Periodogram')
axe.grid(which='both')
Basically:
Read the wav file and get the sample rate (here 44.1kHz);
Compute the Power Spectrum Density and frequencies;
Then display it with matplotlib.
This outputs:
Find Peak
Then we can find the frequency of the first peak whose height exceeds 1e-2 (this criterion is subject to tuning) using find_peaks:
idx = signal.find_peaks(P, height=1e-2)[0][0]
f[idx] # 440.0 Hz
Putting it all together, it boils down to:
def freq(filename, setup={'height': 1e-2}):
    rate, data = wavfile.read(filename)
    f, P = signal.periodogram(data, rate)
    return f[signal.find_peaks(P, **setup)[0][0]]
Handling multiple channels
I tried this code with my wav file, and got the error for the line axe.semilogy(f, Pxx_den) as follows: ValueError: x and y must have same first dimension. I checked the shapes and f has (2,) while Pxx_den has (220160,2). Also, the Pxx_den array seems to have all zeros only.
A WAV file can hold multiple channels; most files are mono or stereo (the format allows at most 2**16 - 1 channels). The problem you describe occurs because your file has multiple channels (a stereo sample).
rate, data = wavfile.read('aaaah.wav') # Shape: (46447, 2), Rate: 48 kHz
It is not obvious, but signal.periodogram also operates on N-D arrays, and its default orientation is not consistent with the wavfile.read output: periodogram works along the last axis (axis=-1) by default, while wavfile.read returns an array of shape (n_samples, n_channels). So we need to orient the dimensions carefully (using the axis switch) when computing the PSD:
f, P = signal.periodogram(data, rate, axis=0, detrend='linear')
Transposing with data.T also works, but then we need to transpose the result back.
Specifying the axis solves the issue: the frequency vector is correct and the PSD is no longer zero everywhere. Before, it operated on axis=1, which has length 2: in your case it computed 220160 PSDs of 2-sample signals, whereas we wanted the converse.
The detrend='linear' switch removes the least-squares linear trend (and therefore the mean) from the signal before the PSD is computed.
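The axis issue can be reproduced on a synthetic stereo signal (a hypothetical 1 kHz tone duplicated on both channels). It also likely explains the all-zero PSD in the question: with the default axis=-1 every 2-sample "signal" is one left/right pair, and the default detrend='constant' subtracts its mean, which leaves nothing when both channels are equal.

```python
import numpy as np
from scipy import signal

rate = 48000
n = 1000
t = np.arange(n) / rate
tone = np.sin(2 * np.pi * 1000 * t)
data = np.column_stack([tone, tone])  # shape (n, 2): same layout as wavfile.read for stereo

f_bad, P_bad = signal.periodogram(data, rate)                              # axis=-1: n signals of 2 samples
f_good, P_good = signal.periodogram(data, rate, axis=0, detrend='linear')  # 2 signals of n samples

print(f_bad.shape, P_bad.shape)    # (2,) (1000, 2)
print(f_good.shape, P_good.shape)  # (501,) (501, 2)
print(np.allclose(P_bad, 0))       # identical channels, so each 2-sample pair is constant
```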
Real application
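This approach should work for real chunked samples, provided each chunk holds enough data: with N samples per chunk the frequency resolution is rate/N, and the highest frequency available is still rate/2 (see the Nyquist-Shannon sampling theorem). Here data are sub-samples of the signal (chunks), and rate is kept constant since it does not change during the process.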
Having chunks of size 2**10 seems to work, we can identify specific frequencies from them:
f, P = signal.periodogram(data[:2**10,:], rate, axis=0, detrend='linear') # Shapes: (513,) (513, 2)
idx0 = signal.find_peaks(P[:,0], threshold=0.01, distance=50)[0] # Peaks: [46.875, 2625., 13312.5, 16921.875] Hz
fig, axe = plt.subplots(2, 1, sharex=True, sharey=True)
axe[0].loglog(f, P[:,0])
axe[0].loglog(f[idx0], P[idx0,0], '.')
# [...]
At this point, the trickiest part is fine-tuning find_peaks to catch the desired frequencies. You may need to pre-filter your signal or post-process the PSD to make the identification easier.
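Applied chunk by chunk, the same periodogram call can be sketched as below. For simplicity this picks the single largest PSD value per chunk with np.argmax instead of tuning find_peaks; the function name and the 937.5 Hz test tone (chosen to fall exactly on a bin of 48000/1024 = 46.875 Hz) are hypothetical.

```python
import numpy as np
from scipy import signal

def dominant_frequency_per_chunk(data, rate, chunk=2**10, channel=0):
    """Periodogram of each complete chunk (along axis 0) and its strongest frequency."""
    peaks = []
    for start in range(0, data.shape[0] - chunk + 1, chunk):
        f, P = signal.periodogram(data[start:start + chunk, :], rate, axis=0, detrend='linear')
        peaks.append(f[np.argmax(P[:, channel])])
    return np.array(peaks)

# Hypothetical stereo test signal, 1 s long
rate = 48000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 937.5 * t)
data = np.column_stack([tone, 0.5 * tone])
peaks = dominant_frequency_per_chunk(data, rate)
print(peaks)
```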
I'm trying to write a Python program that takes a music file and extracts the (piano) notes from it. I created a spectrogram; how can I now get frequencies from it? And how can I fix the spectrogram (from the halfway point up, I get a mirror reflection)? I need something like this. Here is my code.
import numpy as np
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav
from numpy.lib import stride_tricks
""" short time fourier transform of audio signal """
def stft(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))
    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = int(np.ceil((len(samples) - frameSize) / float(hopSize)) + 1)
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))
    frames = stride_tricks.as_strided(samples, shape=(cols, frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win
    return np.fft.rfft(frames)
""" scale frequency axis logarithmically """
def logscale_spec(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)
    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale)).astype(int)  # integer bin indices, usable for slicing
    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,scale[i]:], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,scale[i]:scale[i+1]], axis=1)
    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[scale[i]:])]
        else:
            freqs += [np.mean(allfreqs[scale[i]:scale[i+1]])]
    return newspec, freqs
""" plot spectrogram"""
def plotstft(audiopath, binsize=2**10, plotpath=None, colormap="jet"):
    samplerate, samples = wav.read(audiopath)
    if samples.ndim > 1:
        # stereo file: keep one channel (np.append in stft would otherwise interleave L/R samples)
        samples = samples[:, 0]
    s = stft(samples, binsize)
    sshow, freq = logscale_spec(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel
    timebins, freqbins = np.shape(ims)
    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    plt.colorbar()
    plt.xlabel("time (s)")
    plt.ylabel("frequency (Hz)")
    plt.xlim([0, timebins-1])
    plt.ylim([0, freqbins])
    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])
    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()
    plt.clf()
plotstft("Sound/piano2.wav")
The audio transcription problem you describe is a well-known problem in the Music Information Retrieval (MIR) research community. It is not easy to solve and consists of two aspects:
detecting pitch frequencies, which is often hard because of harmonics, because notes are often glided into (C# can be detected instead of C), and because of tuning discrepancies;
beat detection: performances are often not played exactly in time, so finding the actual onsets can be tricky.
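As a very rough first step for a single, monophonic note (it ignores the harmonic and timing problems listed above), a detected peak frequency can be mapped to the nearest equal-tempered note with the standard MIDI relation m = 69 + 12·log2(f / 440). The helper name frequency_to_note is hypothetical:

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def frequency_to_note(freq_hz, a4=440.0):
    """Nearest equal-tempered note name for a frequency (A4 = a4 Hz = MIDI note 69)."""
    midi = int(round(69 + 12 * np.log2(freq_hz / a4)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(frequency_to_note(440.0))   # A4
print(frequency_to_note(261.63))  # C4
```

For example, 270 Hz, between C4 (≈261.6 Hz) and C#4 (≈277.2 Hz), rounds to C#4, which illustrates the C/C# confusion mentioned above.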
A promising novel approach is to use deep neural networks to solve this, e.g.:
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392.
More info:
Poliner, G. E., Ellis, D. P., Ehmann, A. F., Gómez, E., Streich, S., & Ong, B. (2007). Melody transcription from music audio: Approaches and evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 15(4), 1247-1256.
I am working on a small project in the lab with an Arduino Mega 2560 board. I want to average the signal (voltage) of the positive-slope portion (rise) of a triangle wave to try to remove as much noise as possible. My frequency is 20Hz and I am working with a data rate of 115200 bits/second (fastest recommended by Arduino for data transfer to a computer).
The raw signal looks like this:
My data is stored in a text file, with each line corresponding to a data point. Since I do have thousands of data points, I expect that some averaging would smooth the way my signal looks and make a close-to-perfect straight line in this case. However, other experimental conditions might lead to a signal where I could have features along the positive-slope portion of the triangle wave, such as a negative peak, and I absolutely do need to be able to see this feature on my averaged signal.
I am a Python beginner so I might not have the ideal approach to do so and my code might look bad for most of you guys but I would still like to get your hints / ideas on how to improve my signal processing code to achieve a better noise removal by averaging the signal.
#!/usr/bin/python
import matplotlib.pyplot as plt
import math
# *** OPEN AND PLOT THE RAW DATA ***
data_filename = "My_File_Name"
filepath = "My_File_Path" + data_filename + ".txt"
# Open the Raw Data
with open(filepath, "r") as f:
    rawdata = f.readlines()
# Remove the \n and convert each line to a number
rawdata = [float(s.strip()) for s in rawdata]
# Plot the Raw Data
plt.plot(rawdata, 'r-')
plt.ylabel('Lightpower (V)')
plt.show()
# *** FIND THE LOCAL MAXIMUM AND MINIMUM
# Number of data points for each range
datarange = 15 # This number can be changed for better processing
max_i_range = int(math.floor(len(rawdata)/datarange))-3
# Declare empty lists for the max and min
min_list = []
max_list = []
min_list_index = []
max_list_index = []
for i in range(0, max_i_range):
    delimiter0 = i * datarange
    delimiter1 = (i+1) * datarange
    delimiter2 = (i+2) * datarange
    delimiter3 = (i+3) * datarange
    # Average of each of the three consecutive ranges (datarange points each)
    averagerange1 = sum(rawdata[delimiter0:delimiter1]) / datarange
    averagerange2 = sum(rawdata[delimiter1:delimiter2]) / datarange
    averagerange3 = sum(rawdata[delimiter2:delimiter3]) / datarange
    segment = rawdata[delimiter1:delimiter2]
    # Find if there is a minimum in range 2
    if averagerange1 > averagerange2 and averagerange2 < averagerange3:
        min_list.append(min(segment)) # Value of the minimum
        min_list_index.append(delimiter1 + segment.index(min(segment))) # First index of the minimum
    # Find if there is a maximum in range 2
    if averagerange1 < averagerange2 and averagerange2 > averagerange3:
        max_list.append(max(segment)) # Value of the maximum
        max_list_index.append(delimiter1 + segment.index(max(segment))) # First index of the maximum
# *** PROCESS EACH RISE PATTERN ***
# One rise pattern goes from a min to a max
numb_of_rise_pattern = 50 # This number can be increased or lowered. This will average 50 rise patterns
max_min_diff_total = 0
for i in range(0, numb_of_rise_pattern):
    max_min_diff_total += max_list_index[i] - min_list_index[i]
# Find the average number of points for each rise pattern (an integer, used as a range bound)
max_min_diff_avg = abs(max_min_diff_total // numb_of_rise_pattern)
# Find the average values for each position of the rise pattern
avg_position_value_list = []
for i in range(0, max_min_diff_avg):
    sum_position_value = 0
    for j in range(0, numb_of_rise_pattern):
        sum_position_value += rawdata[min_list_index[j] + i]
    avg_position_value = sum_position_value / numb_of_rise_pattern
    avg_position_value_list.append(avg_position_value)
# Plot the Processed Signal
plt.plot(avg_position_value_list, 'r-')
plt.title(data_filename)
plt.ylabel('Lightpower (V)')
plt.show()
At the end, the processed signal looks like this:
I would expect a straighter line, but I could be wrong. I believe that there are probably a lot of flaws in my code and there would certainly be better ways to achieve what I want. I have included a link to a text file with some raw data if any of you guys want to have fun with it.
http://www108.zippyshare.com/v/2iba0XMD/file.html
A simpler approach might be a smoothing function, such as a moving-window average. This is easy to implement with the rolling function of pandas.Series. (Only the first 500 points are shown.) Tweak the numerical argument (window size) to get different amounts of smoothing.
import pandas as pd
import matplotlib.pyplot as plt
# Plot the Raw Data (converted to float in case the lines were read as strings)
ts = [float(v) for v in rawdata[0:500]]
plt.plot(ts, 'r-')
plt.ylabel('Lightpower (V)')
# previous version
# smooth_data = pd.rolling_mean(rawdata[0:500],5).plot(style='k')
# changes to pandas require a change to the code as follows:
smooth_data = pd.Series(ts).rolling(window=7).mean()
smooth_data.plot(style='k')
plt.show()
(Figure: Moving Average)
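If pandas is not needed, the same unweighted moving average can be sketched with NumPy alone (moving_average is a hypothetical helper). With mode='valid' its output equals pd.Series(x).rolling(window).mean() once the leading window - 1 NaN values are dropped. Keep in mind that any feature narrower than the window, such as the negative peak mentioned in the question, is attenuated as well.

```python
import numpy as np

def moving_average(x, window=7):
    """Unweighted moving average; 'valid' mode returns len(x) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode='valid')

print(moving_average([1, 2, 3, 4, 5], window=3))  # [2. 3. 4.]
```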
A moving average is, basically, a low-pass filter. So, we could also implement a low-pass filter with functions from SciPy as follows:
import scipy.signal as signal
# First, design the Butterworth filter
N = 3    # Filter order
Wn = 0.1 # Cutoff frequency, as a fraction of the Nyquist frequency (half the sampling rate)
B, A = signal.butter(N, Wn, output='ba')
smooth_data = signal.filtfilt(B, A, ts)
plt.plot(ts, 'r-')
plt.plot(smooth_data[0:500], 'b-')
plt.show()
(Figure: Low-Pass Filter)
The Butterworth filter method is from OceanPython.org, BTW.
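Wn = 0.1 is a normalized frequency (a fraction of the Nyquist frequency, half the sampling rate), so the actual cutoff in Hz depends on the sampling rate of the recorded data, which the question does not state. A sketch that takes the cutoff in Hz instead, assuming a known sampling rate (lowpass and the test values are hypothetical):

```python
import numpy as np
from scipy import signal

def lowpass(x, cutoff_hz, fs_hz, order=3):
    """Zero-phase Butterworth low-pass; Wn is the cutoff as a fraction of Nyquist (fs/2)."""
    wn = cutoff_hz / (fs_hz / 2.0)
    b, a = signal.butter(order, wn, btype='low')
    return signal.filtfilt(b, a, x)  # forward-backward filtering: no phase shift

# Hypothetical test: 1000 samples/s, a 5 Hz signal plus 200 Hz interference
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 200 * t)
smooth = lowpass(noisy, cutoff_hz=20, fs_hz=fs)
```

Because filtfilt runs the filter forwards and backwards, features such as a dip on the rising edge stay in place instead of being delayed.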