Good day
EDIT:
What I want: From any current/voltage waveform on a Power System (PS) I want the filtered 50 Hz (fundamental) RMS magnitudes (and, effectively, their angles). The current as measured contains harmonics from 100 Hz to 1250 Hz, depending on the equipment. One cannot analyse and calculate using a wave with these harmonics: the error gets so big (depending on the equipment) that PS protection devices calculate incorrect quantities. The signal attached also has many other frequency components involved.
My aim: PS protection relays are special devices that calculate over a 20 ms window in a very short time. I'm not trying to replicate that speed. I'm using external recording equipment and testing that what the relays see is true and that they operate correctly. Thus I need to do what they do and keep only the 50 Hz values, without any harmonics or DC.
Important expected result: Given any frequency component that MAY be in the signal, I want to see the magnitude of any given harmonic (150 Hz and 250 Hz, i.e. the 3rd and 5th harmonics of the fundamental) as well as the magnitude of the DC. This will tell me what type of PS equipment possibly injects these frequencies. It is important that I can provide a frequency and get back a vector of that frequency only, with all other values filtered OUT.
For the attached signal, the RMS of the fundamental differs from the plain RMS: about 4000 A (50 Hz only) versus about 4500 A (with the other frequencies included).
This code calculates a one-cycle Fourier value (RMS) for a given frequency; it is sometimes called a Fourier filter, I think. I use it for power-system 50Hz/0Hz/150Hz analogue analysis. The answers have been tested and are correct fundamental RMS values (see https://users.wpi.edu/~goulet/Matlab/overlap/trigfs.html).
For a large sample the code is very slow: for 55,000 data points it takes 12 seconds. For 3 voltages and 3 currents this gets to be VERY slow, and I look at hundreds of records a day.
How do I enhance it? What Python tips and tricks / libraries are there, e.g. for appending to my lists/arrays?
(Also feel free to rewrite or use the code.) I use the code to pick out certain values from a signal at given times, which is like reading the values from a specialized program for power system analysis.
Edited to show how I load the files and use them; the code works when pasted:
import matplotlib.pyplot as plt
import csv
import math
import numpy as np
import cmath

# FILES ATTACHED TO POST
filenamecfg = r"E:/Python_Practise/2019-10-21 13-54-38-482.CFG"
filename = r"E:/Python_Practise/2019-10-21 13-54-38-482.DAT"

t = []
IR = []
newIR = []

with open(filenamecfg, 'r') as csvfile1:
    cfgfile = [row for row in csv.reader(csvfile1, delimiter=',')]

numberofchannels = int(np.array(cfgfile)[1][0])
scaleval = float(np.array(cfgfile)[3][5])
scalevalI = float(np.array(cfgfile)[8][5])
samplingfreq = float(np.array(cfgfile)[numberofchannels + 4][0])
numsamples = int(np.array(cfgfile)[numberofchannels + 4][1])
freq = float(np.array(cfgfile)[numberofchannels + 2][0])
intsample = int(samplingfreq / freq)
# TODO: need to get the number of samples and the frequency from the CFG
# file and detect them automatically
# scaleval = np.array(cfgfile)[3]

print('multiplier:', scaleval)
print('SampFrq:', samplingfreq)
print('NumSamples:', numsamples)
print('Freq:', freq)

with open(filename, 'r') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        t.append(float(row[1]) / 1000000)  # convert time from us to s
        IR.append(float(row[6]))

newIR = np.array(IR) * scalevalI
t = np.array(t)
def mag_and_theta_for_given_freq(f, IVsignal, Tsignal, samples):
    # samples is the sliding-window size you want to calculate for (256 in my case)
    # f in hertz; IVsignal and Tsignal are numpy arrays
    pi = math.pi
    w = 2*pi*f
    Xr = []
    Xc = []
    Cplx = []
    mag = []
    theta = []
    for i in range(len(IVsignal) - samples):
        newspan = range(i, i + samples)
        timewindow = Tsignal[newspan]
        Sig20ms = IVsignal[newspan]
        N = len(Sig20ms)  # number of samples in the current window
        # Real and imaginary parts of the signal for the given frequency;
        # this filters and calculates one window's RMS (root mean square)
        # value. Calculated over the whole signal it creates a new vector:
        # the RMS vector used everywhere in power-system analysis.
        RealI = np.multiply(Sig20ms, np.cos(w*timewindow))
        ImagI = -1*np.multiply(Sig20ms, np.sin(w*timewindow))
        Xr.append((math.sqrt(2)/N)*sum(RealI))  # THESE SUMS TAKE SO MUCH TIME
        Xc.append((math.sqrt(2)/N)*sum(ImagI))  # these steps make the RMS
        Cplx.append(complex(Xr[i], Xc[i]))
        mag.append(abs(Cplx[i]))
        theta.append(np.angle(Cplx[i]))  # multiply by 180/pi to get degrees if necessary
    # Open question: for freq 0 (DC), if the offset is negative, how do I
    # return a negative value to indicate this when I'm using the MAGnitude
    # (absolute value)? The magnitude will never be zero.
    return Cplx, mag, theta

myZ, magn, th = mag_and_theta_for_given_freq(freq, newIR, t, intsample)
plt.plot(newIR[0:30000], 'b', linewidth=0.4)
plt.plot(magn[0:30000], 'r', linewidth=1)
plt.show()
The code as pasted runs smoothly with the files attached.
Regards
EDIT: Please find a test CSV file and COMTRADE test files here:
CSV:
https://drive.google.com/open?id=18zc4Ms_MtYAeTBm7tNQTcQkTnFWQ4LUu
COMTRADE
https://drive.google.com/file/d/1j3mcBrljgerqIeJo7eiwWo9eDu_ocv9x/view?usp=sharing
https://drive.google.com/file/d/1pwYm2yj2x8sKYQUcw3dPy_a9GrqAgFtD/view?usp=sharing
Foreword
As I said in my previous comment:
Your code mainly relies on a for loop with a lot of indexation and
scalar operations. You already have imported numpy so you should take
advantage of vectorization.
This answer is a start towards your solution.
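To make that concrete, here is a minimal sketch (my rewrite, not tested against your COMTRADE files) of the same one-cycle Fourier filter with the inner Python loop replaced by a moving sum via np.convolve. It assumes uniformly spaced samples and returns one phasor per window like your function (mode='valid' yields one window more than your loop):

import numpy as np

def mag_and_theta_vectorized(f, IVsignal, Tsignal, samples):
    w = 2*np.pi*f
    # pointwise products with the cosine/sine reference waves
    re = IVsignal*np.cos(w*Tsignal)
    im = -IVsignal*np.sin(w*Tsignal)
    # moving-window sums replace the per-window Python loop
    win = np.ones(samples)
    Xr = (np.sqrt(2)/samples)*np.convolve(re, win, mode='valid')
    Xc = (np.sqrt(2)/samples)*np.convolve(im, win, mode='valid')
    Cplx = Xr + 1j*Xc
    return Cplx, np.abs(Cplx), np.angle(Cplx)

On tens of thousands of samples this should take milliseconds rather than seconds, since all the work happens inside NumPy.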
Lightweight MCVE
First we create a trial signal for the MCVE:
import numpy as np

# Synthetic signal sampler: 5 s sampled at 400 Hz
fs = 400  # Hz
T = 5     # s
t = np.linspace(0, T, fs*T + 1)

# Synthetic signal: a 50 Hz sine with an amplitude of about 325 V
A = 325  # V
f = 50   # Hz
y = A*np.sin(2*f*np.pi*t)  # V
Then we can compute the RMS of this signal using the usual formula:
# Actual definition of RMS:
yrms = np.sqrt(np.mean(y**2)) # 229.75 V
Or alternatively we can compute it using DFT (implemented as rfft in numpy.fft):
# RMS using FFT:
Y = 2*np.fft.rfft(y)/y.size
Yrms = np.sqrt(np.real(Y[0]**2 + np.sum(Y[1:]*np.conj(Y[1:]))/2)) # 229.64 V
A demonstration of why this last formula works can be found here. It is valid because the Parseval theorem implies that the Fourier transform conserves energy.
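For reference, the identity being used is the discrete Parseval relation. Writing c_k = (1/N) * sum_{n=0}^{N-1} y_n * exp(-2j*pi*k*n/N), a real signal satisfies |c_{N-k}| = |c_k|, hence

(1/N) * sum_n y_n^2 = sum_k |c_k|^2 = c_0^2 + 2 * sum_{k>=1} |c_k|^2 = RMS^2

With the normalisation Y = 2*rfft(y)/N used above, the AC bins come out exactly right (|Y_k|^2/2 = 2|c_k|^2), while the Y[0]**2 term counts the DC bin with an extra factor of 4; that is harmless here only because the test signals have negligible mean.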
Both versions make use of vectorized functions; there is no need to split out the real and imaginary parts to perform the computation and then reassemble them into a complex number.
MCVE: Windowing
I suspect you want to apply this function as a sliding window on a long time series where the RMS value changes over time. Then we can tackle this problem using the pandas library, which provides time-series utilities.
import pandas as pd
We encapsulate the RMS function:
def rms(y):
    Y = 2*np.fft.rfft(y)/y.size
    return np.sqrt(np.real(Y[0]**2 + np.sum(Y[1:]*np.conj(Y[1:]))/2))
We generate a damped signal (variable RMS):
y = np.exp(-0.1*t)*A*np.sin(2*f*np.pi*t)
We wrap our trial signal into a DataFrame to take advantage of the rolling and resample methods:
df = pd.DataFrame(y, index=t*pd.Timedelta('1s'), columns=['signal'])
A rolling RMS of your signal, with a window of one 50 Hz cycle (int(fs/f) = 8 samples), is:
df['rms'] = df['signal'].rolling(int(fs/f)).agg(rms)
A periodically sampled RMS is:
df['signal'].resample('1s').agg(rms)
The latter returns:
00:00:00 2.187840e+02
00:00:01 1.979639e+02
00:00:02 1.791252e+02
00:00:03 1.620792e+02
00:00:04 1.466553e+02
Signal Conditioning
Addressing your need to keep only the fundamental harmonic (50 Hz), a straightforward solution could be a linear detrend (to remove the constant offset and linear trend) followed by a Butterworth band-pass filter.
We generate a synthetic signal with other frequencies and linear trend:
y = np.exp(-0.1*t)*A*(np.sin(2*f*np.pi*t) \
+ 0.2*np.sin(8*f*np.pi*t) + 0.1*np.sin(16*f*np.pi*t)) \
+ A/20*t + A/100
Then we condition the signal:
from scipy import signal
yd = signal.detrend(y, type='linear')
filt = signal.butter(5, [40,60], btype='band', fs=fs, output='sos', analog=False)
yfilt = signal.sosfilt(filt, yd)
Graphically, the conditioned signal keeps the 50 Hz component while the trend and the higher-frequency components are removed (figure omitted). It then remains to apply this signal conditioning before the RMS computation.
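Putting the pieces together, a minimal end-to-end sketch (my combination of the snippets above, reusing fs, f, t, y, rms and filt as already defined) would be:

yd = signal.detrend(y, type='linear')                  # remove DC and linear trend
yfilt = signal.sosfilt(filt, yd)                       # keep only the 50 Hz band
df = pd.DataFrame(yfilt, index=t*pd.Timedelta('1s'), columns=['signal'])
df['rms'] = df['signal'].rolling(int(fs/f)).agg(rms)   # one-cycle rolling RMS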
I'm trying to denoise financial time series data (second by second). I have a very long time series, but I've been working with 100,000 observations just to test how well the wavelet denoising (Haar) works. It doesn't.
No matter what I do, the reconstructed signal ends up invariably almost identical to the original. Obviously, I want to preserve the original signal, but I feel like the series simply isn't being denoised: a financial time series whose only noise occurs at the few-second resolution? Moreover, even at the smallest time scales, the graphs of the reconstructed and original signals remain almost the same.
I've tried changing the mother wavelet, the time series length, the thresholding mode (soft vs hard) and, obviously, I've messed with the threshold value itself. I started at the recommended/standard threshold value of sqrt(2*log(len(signal))), but that did virtually nothing for me, so I gradually increased it until I got to the completely ridiculous 2*len(signal)**2, which should have smoothed the graph beyond recognition but did basically nothing.
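(For reference, the "recommended/standard" value is the Donoho-Johnstone universal threshold, threshold = sigma*sqrt(2*log(N)), where sigma is estimated from the finest-level detail coefficients, typically as sigma = median(|d|)/0.6745, rather than from the raw signal.)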
import numpy as np
import pywt
from statsmodels.robust import mad  # median absolute deviation (import not shown in the original post)

WAVELET = "haar"
LEVEL = 2

signal = training_series          # the raw series
signal = signal - signal.mean()   # remove the mean

coeffs = pywt.wavedec(signal, WAVELET, level=LEVEL)
sigma = mad(coeffs[-LEVEL])
threshold = sigma * np.sqrt(2*np.log(len(signal)))
coeffs[1:] = (pywt.threshold(i, value=threshold, mode="soft") for i in coeffs[1:])
reconstructed_signal = pywt.waverec(coeffs, WAVELET)
I expected that the reconstructed signal would be significantly different from the original signal (as in, smoothed out, denoised, less... identical to the original), but that wasn't the case. At the smallest of scales (think every 10 or 20 seconds on a scale of 100,000 seconds), there is some very minor smoothing that is essentially just ignoring peaks and valleys of size 0.01 (the smallest possible change), but it's almost negligible.
I expected a signal that would be, well, I don't know -- denoised? Am I doing something wrong?
Your threshold might be too high.
You should try setting it by a metric based on the detail coefficients at each level, instead of the original time trace.
Usually starting at:
threshold=np.std(coeff[i])
and going from there will at least get one started.
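As an illustration (my sketch, not the answerer's code), a per-level version of the asker's thresholding step could look like:

import numpy as np
import pywt

coeffs = pywt.wavedec(signal, "haar", level=2)
# give each detail level its own threshold based on its own spread
coeffs[1:] = [pywt.threshold(c, value=np.std(c), mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "haar")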
I had the same problem and found that steadily increasing a scale factor on the threshold helped.
I was attempting to denoise an acoustic emission signal, and only got a reconstruction of the original. By multiplying sigma by an increasing scale factor I could find out how high the thresholds needed to be to stop reproducing the signal.
import pywt
import numpy as np
import matplotlib.pyplot as plt

def madev(d, axis=None):
    """Mean absolute deviation of a signal."""
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def wavelet_denoising(x, wavelet, level, s_factor):
    """
    Deconstructs, thresholds, then reconstructs.
    Higher thresholds = less detailed reconstruction.
    """
    coeff = pywt.wavedec(x, wavelet, mode="per")
    sigma = (1/0.6745) * madev(coeff[-level]) * s_factor
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    coeff[1:] = (pywt.threshold(i, value=uthresh, mode='hard') for i in coeff[1:])
    return pywt.waverec(coeff, wavelet, mode='per')

wav = 'db4'
level = 1
for s_factor in np.arange(0, 20, 2):
    data = wavelet_denoising(signal, wav, level, s_factor)
    plt.plot(data)
    plt.title('scale factor = {}'.format(s_factor))
    fname = 'wavelet_{}_sf_{}_n_{}'.format(wav, s_factor, len(signal))
    plt.savefig(fname)
    plt.show()
I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents samples of a recorded sound (x = milliseconds).
sin(pi*x/2) => 250 Hz, since an angular frequency of pi/2 rad/ms is 0.25 cycles per millisecond.
How can we go from the signal (a list of points) to obtaining the frequencies present in this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos, but I have yet to find an answer. Please use simple words.
(I am thankful for every answer.)
What you're looking for is known as the Fourier transform.
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is in essence a mathematical operation that, when applied to a signal, gives you an idea of how present each frequency is in the time series. In order to get some intuition behind this, it might be helpful to look at the mathematical definition of the DFT:

X(k) = sum_{n=0}^{N-1} x(n) * exp(-j*2*pi*k*n/N)

where k is swept all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that this definition resembles that of the correlation of two functions, in this case x(n) and the negative exponential function. While this may seem a little abstract, by using Euler's formula and playing around with the definition a bit, the DFT can be expressed as the correlation of the signal with both a sine wave and a cosine wave, which account for the imaginary and the real parts of the DFT.
So, keeping in mind that this is in essence computing a correlation: whenever a corresponding sine or cosine from the decomposition of the complex exponential matches the content of x(n), there will be a peak in X(k), meaning that such a frequency is present in the signal.
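This correlation view is easy to check numerically (my toy illustration, not part of the original answer): correlating a 50-cycle sine with the matching cosine/sine pair reproduces the DFT coefficient at bin k = 50.

import numpy as np

N = 150
n = np.arange(N)
x = np.sin(2*np.pi*50*n/N)
k = 50
re = np.sum(x*np.cos(2*np.pi*k*n/N))    # correlation with a cosine
im = -np.sum(x*np.sin(2*np.pi*k*n/N))   # correlation with a sine
print(re + 1j*im)        # ~ -75j
print(np.fft.fft(x)[k])  # same value, by definition of the DFT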
How can we do the same with numpy?
So, having given a very brief theoretical background, let's consider an example to see how this can be implemented in Python. Let's consider the following signal:
import numpy as np
import matplotlib.pyplot as plt
Fs = 150.0   # sampling rate
Ts = 1.0/Fs  # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 50; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now, the DFT can be computed by using np.fft.fft, which, as mentioned, tells you the contribution of each frequency in the signal, now in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see that we get a peak at the frequency of 50 Hz, which in mathematical terms is a delta function centred at the fundamental frequency of 50 Hz. This can be checked in this Table of Fourier Transform Pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
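If you want the number rather than the plot, you can also read the dominant frequency straight off the spectrum (my small addition to the answer's code, skipping the DC bin):

peak_freq = frq[1 + np.argmax(abs(Y[1:]))]
print(peak_freq)  # 50.0 Hz for this signal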
I am trying to use a fast Fourier transform to extract the phase shift of a single sinusoidal function. I know that on paper, if we denote the transform of our function by T, then we have the following relations: the frequency is the bin k where |T(k)| peaks, and the phase at that bin is

phase = arctan2(Im T(k), Re T(k))
However, I am finding that while I am able to accurately capture the frequency of my cosine wave, the phase is inaccurate unless I sample at an extremely high rate. For example:
import numpy as np
import pylab as pl

num_t = 100000
t = np.linspace(0, 1, num_t)
dt = 1.0/num_t

w = 2.0*np.pi*30.0
phase = np.pi/2.0

amp = np.fft.rfft(np.cos(w*t + phase))
freqs = np.fft.rfftfreq(t.shape[-1], dt)
print(np.arctan2(amp.imag, amp.real)[30])

pl.subplot(211)
pl.plot(freqs[:60], np.sqrt(amp.real**2 + amp.imag**2)[:60])
pl.subplot(212)
pl.plot(freqs[:60], np.arctan2(amp.imag, amp.real)[:60])
pl.show()
Using num_t=100000 points I get a phase of 1.57173880459.
Using num_t=10000 points I get a phase of 1.58022110476.
Using num_t=1000 points I get a phase of 1.6650441064.
What's going wrong? Even with 1000 points I have 33 points per cycle, which should be enough to resolve it. Is there maybe a way to increase the number of computed frequency points? Is there any way to do this with a "low" number of points?
EDIT: from further experimentation it seems that I need ~1000 points per cycle in order to accurately extract a phase. Why?!
EDIT 2: further experiments indicate that accuracy is related to number of points per cycle, rather than absolute numbers. Increasing the number of sampled points per cycle makes phase more accurate, but if both signal frequency and number of sampled points are increased by the same factor, the accuracy stays the same.
Your points are not distributed equally over the interval: you have the endpoint doubled, since t = 0 and t = 1 sample the same point of the cycle. This matters less the more points you take, obviously, but it still introduces some error. You can avoid it entirely: np.linspace has a flag for this, and another flag to return dt directly along with the array.
Do
t, dt = np.linspace(0, 1, num_t, endpoint=False, retstep=True)
instead of
t = np.linspace(0,1,num_t)
dt = 1.0/num_t
then it works :)
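As a quick check of why this works (my illustration): with endpoint=False the 30 Hz cosine is exactly integer-periodic within the window, so even num_t = 1000 recovers the phase essentially exactly:

t, dt = np.linspace(0, 1, 1000, endpoint=False, retstep=True)
amp = np.fft.rfft(np.cos(2.0*np.pi*30.0*t + np.pi/2.0))
print(np.arctan2(amp.imag, amp.real)[30])  # ~ 1.5708, i.e. pi/2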
The phase value in a result bin of an unrotated FFT is only correct if the input signal is exactly integer-periodic within the FFT length. Your test signal is not, so the FFT partially measures the phase difference across the signal discontinuity between the end-points of the test sinusoid. A higher sample rate creates a slightly different last end-point, and thus a possibly smaller discontinuity.
If you want to decrease this FFT phase-measurement error, create your test signal so that your test phase is referenced to the exact center (sample N/2) of the test vector (not the 1st sample), and then do an fftshift operation (rotate by N/2) so that there is no signal discontinuity between the first and last points of your resulting FFT input vector of length N.
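A rough sketch of that recipe (my illustration, not the answerer's code), using a deliberately non-integer-periodic tone for which the plain FFT phase would be biased:

import numpy as np

N = 1000
n = np.arange(N)
f0, phase = 30.5, np.pi/2                    # 30.5 cycles: not integer-periodic
x = np.cos(2*np.pi*f0*(n - N/2)/N + phase)   # phase referenced to sample N/2
X = np.fft.rfft(np.fft.fftshift(x))          # rotate by N/2 before the FFT
print(np.angle(X[30]), np.angle(X[31]))      # both close to pi/2

Because the rotated window is symmetric about the reference sample, the leakage into neighbouring bins stays (almost) in phase with the tone, so the measured angles remain near the true pi/2.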
This snippet of code might help:
import numpy as np
from numpy.fft import rfft, irfft

THRESHOLD = 0.01  # fraction of the peak FFT amplitude to retain

def reconstruct_ifft(data):
    """
    In this function, we take in a signal, find its FFT, retain the
    dominant modes and reconstruct the signal from those.

    Parameters
    ----------
    data : signal to do the fft/ifft on

    Returns
    -------
    reconstructed_signal : the reconstructed signal
    """
    yf = rfft(data)
    amp_yf = np.abs(yf)  # amplitudes
    yf = yf*(amp_yf > (THRESHOLD*np.amax(amp_yf)))  # zero out the weak modes
    reconstructed_signal = irfft(yf)
    return reconstructed_signal
THRESHOLD = 0.01 is the fraction of the FFT amplitudes that you want to retain. Making THRESHOLD greater (more than 1 does not make any sense) keeps fewer modes and causes a higher RMS error, but ensures higher frequency selectivity.
Essentially I've got an Excel file with voltage in the first column and time in the second. I want to find the period of the voltage: plotted with voltage on the y axis and time on the x axis, it is periodic, looking similar to a sine function.
To find the frequency I have loaded my Excel file into Python, as I think this will make it easier; there may be something I've missed that would simplify this.
So far in python I have:
import xlrd
import numpy as N
import numpy.fft as F
import matplotlib.pyplot as P
wb = xlrd.open_workbook('temp7.xls') #LOADING EXCEL FILE
wb.sheet_names()
sh = wb.sheet_by_index(0)
first_column = sh.col_values(1) #VALUES FROM EXCEL
second_column = sh.col_values(2) #VALUES FROM EXCEL
Now how do I find the frequency from this?
I'm not sure how much you know about the Fourier transform, so forgive me if this is too much background.
Your signal does not have "a frequency"; rather, it can be thought of as the sum of many frequencies. The Fourier transform will tell you the weights of all the frequencies that make up your signal. Unfortunately, information may be lost when sampling from the analog (continuous-time) to the digital (discrete-time) domain. This puts a constraint on the information we can get about frequency, namely that the maximum frequency component we can determine is limited by the digital sampling rate (the Nyquist-Shannon criterion):
fs > 2B
where fs is your sampling rate (samples per unit time, typically in Hz), and B is the maximum frequency of your signal. If your signal actually has frequencies higher than B, they will be "aliased" to some value lower than B.
For your problem, all you have to do is this:
x = N.array(first_column)
X = F.fft(x)
Now X is the frequency-domain representation of your voltage signal. The corresponding frequency axis covers [0, fs), based on the sampling theorem. So, what is fs? You calculate it by dividing the number of samples you have by the total duration of your sampled signal (note your units here):
fs = len(second_column) / second_column[-1]
Note that this representation of your signal will also (probably) be complex, i.e. each frequency will have an associated amplitude and phase.
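To actually pull out the dominant frequency and the period the asker wants, one could add (my addition, reusing the N and F aliases from the question):

freqs = F.fftfreq(len(x), d=1.0/fs)        # frequency axis for the FFT bins
peak = 1 + N.argmax(abs(X[1:len(x)//2]))   # strongest bin, skipping DC
print(freqs[peak], 1.0/freqs[peak])        # dominant frequency and its period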
Hopefully this helps, and hopefully I didn't cover a bunch of stuff you already knew.
I have two datasets listing the average voltage outputs of two assemblies of neural networks at times t, that look something like this:
A = [-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -79.58, -79.55, -79.08, -78.95, -78.77, -78.45,-77.75, -77.18, -77.08, -77.18, -77.16, -76.6, -76.34, -76.35]
B = [-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -78.74, -78.65, -78.08, -77.75, -77.31, -76.55, -75.55, -75.18, -75.34, -75.32, -75.43, -74.94, -74.7, -74.68]
When two neural assemblies are "in phase" to a reasonable extent, that means that they are interrelated. What I want to do is calculate the phase difference between A and B, preferably over the whole time of the simulation. Since two assemblies are unlikely to be totally in phase, I want to compare that phase difference to a certain threshold.
These are inharmonic oscillators and I don't know their functions, only these values, so I have no idea how to determine the phase or the respective phase difference.
I am doing this project in Python, using numpy and scipy (the two assemblies are numpy arrays).
Any suggestions would be greatly appreciated!
EDIT: Added plots
Example datafile for assembly 1
Example datafile for assembly 2
Here is a plot of what the two datasets look like:
Perhaps you are looking for the cross-correlation:
scipy.signal.correlate(A, B)
The position of the peak in the cross-correlation will be an estimate of the phase difference.
EDIT 3: Update now that I have looked at the real data files. There are two reasons that you find a phase shift of zero. First, the phase shift really is zero between your two time series. You can see this clearly if you zoom in horizontally on your matplotlib graph. Second, it is important to regularize the data first (most importantly, subtract off the mean), otherwise the effect of zero-padding at the ends of the arrays swamps the real signal in the cross-correlation. In the following example, I verify that I am finding the "true" peak by adding an artificial shift and then checking that I recover it correctly.
import numpy, scipy
from scipy.signal import correlate
# Load datasets, taking mean of 100 values in each table row
A = numpy.loadtxt("vb-sync-XReport.txt")[:,1:].mean(axis=1)
B = numpy.loadtxt("vb-sync-YReport.txt")[:,1:].mean(axis=1)
nsamples = A.size
# regularize datasets by subtracting mean and dividing by s.d.
A -= A.mean(); A /= A.std()
B -= B.mean(); B /= B.std()
# Put in an artificial time shift between the two datasets
time_shift = 20
A = numpy.roll(A, time_shift)
# Find cross-correlation
xcorr = correlate(A, B)
# delta time array to match xcorr
dt = numpy.arange(1-nsamples, nsamples)
recovered_time_shift = dt[xcorr.argmax()]
print("Added time shift: %d" % (time_shift))
print("Recovered time shift: %d" % (recovered_time_shift))
# SAMPLE OUTPUT:
# Added time shift: 20
# Recovered time shift: 20
EDIT: Here is an example of how it works with fake data.
EDIT 2: Added a graph of the example.
import numpy, scipy
from scipy.signal import square, sawtooth, correlate
from numpy import pi, random
period = 1.0 # period of oscillations (seconds)
tmax = 10.0 # length of time series (seconds)
nsamples = 1000
noise_amplitude = 0.6
phase_shift = 0.25*pi # in radians
# construct time array
t = numpy.linspace(0.0, tmax, nsamples, endpoint=False)
# Signal A is a square wave (plus some noise)
A = square(2.0*pi*t/period) + noise_amplitude*random.normal(size=(nsamples,))
# Signal B is a phase-shifted saw wave with the same period
B = -sawtooth(phase_shift + 2.0*pi*t/period) + noise_amplitude*random.normal(size=(nsamples,))
# calculate cross correlation of the two signals
xcorr = correlate(A, B)
# The peak of the cross-correlation gives the shift between the two signals
# The xcorr array goes from -nsamples to nsamples
dt = numpy.linspace(-t[-1], t[-1], 2*nsamples-1)
recovered_time_shift = dt[xcorr.argmax()]
# force the phase shift to be in [-pi:pi]
recovered_phase_shift = 2*pi*(((0.5 + recovered_time_shift/period) % 1.0) - 0.5)
relative_error = (recovered_phase_shift - phase_shift)/(2*pi)
print("Original phase shift: %.2f pi" % (phase_shift/pi))
print("Recovered phase shift: %.2f pi" % (recovered_phase_shift/pi))
print("Relative error: %.4f" % (relative_error))
# OUTPUT:
# Original phase shift: 0.25 pi
# Recovered phase shift: 0.24 pi
# Relative error: -0.0050
# Now graph the signals and the cross-correlation
from pyx import canvas, graph, text, color, style, trafo, unit
from pyx.graph import axis, key
text.set(mode="latex")
text.preamble(r"\usepackage{txfonts}")
figwidth = 12
gkey = key.key(pos=None, hpos=0.05, vpos=0.8)
xaxis = axis.linear(title=r"Time, \(t\)")
yaxis = axis.linear(title="Signal", min=-5, max=17)
g = graph.graphxy(width=figwidth, x=xaxis, y=yaxis, key=gkey)
plotdata = [graph.data.values(x=t, y=signal+offset, title=label) for label, signal, offset in [(r"\(A(t) = \mathrm{square}(2\pi t/T)\)", A, 2.5), (r"\(B(t) = \mathrm{sawtooth}(\phi + 2 \pi t/T)\)", B, -2.5)]]
linestyles = [style.linestyle.solid, style.linejoin.round, style.linewidth.Thick, color.gradient.Rainbow, color.transparency(0.5)]
plotstyles = [graph.style.line(linestyles)]
g.plot(plotdata, plotstyles)
g.text(10*unit.x_pt, 0.56*figwidth, r"\textbf{Cross correlation of noisy anharmonic signals}")
g.text(10*unit.x_pt, 0.33*figwidth, r"Phase shift: input \(\phi = %.2f \,\pi\), recovered \(\phi = %.2f \,\pi\)" % (phase_shift/pi, recovered_phase_shift/pi))
xxaxis = axis.linear(title=r"Time Lag, \(\Delta t\)", min=-1.5, max=1.5)
yyaxis = axis.linear(title=r"\(A(t) \star B(t)\)")
gg = graph.graphxy(width=0.2*figwidth, x=xxaxis, y=yyaxis)
plotstyles = [graph.style.line(linestyles + [color.rgb(0.2,0.5,0.2)])]
gg.plot(graph.data.values(x=dt, y=xcorr), plotstyles)
gg.stroke(gg.xgridpath(recovered_time_shift), [style.linewidth.THIck, color.gray(0.5), color.transparency(0.7)])
ggtrafos = [trafo.translate(0.75*figwidth, 0.45*figwidth)]
g.insert(gg, ggtrafos)
g.writePDFfile("so-xcorr-pyx")
So it works pretty well, even for very noisy data and very anharmonic waves.
@deprecated's comments are the exact answer to the question, when it comes to the pure-code Python solution. The comments were very valuable, but I feel like I should add some notes for people searching for an answer in the specific context of neural networks.
When you take the average membrane potential of large assemblies of neurons, like I did, the correlation will be relatively weak. What you want to look at, primarily, is either the correlation between spike trains, the latency or the excitability (i.e. synaptic efficacy) of the individual assemblies. This can be found relatively easily by just looking at points where the potential exceeds a certain threshold. Scipy's correlation function on spike trains will show a much more detailed picture of interdependence between neurons or neural assemblies when you give it spike trains, as opposed to the actual potentials. You can also take a look at Brian's statistics module, which can be found here:
http://neuralensemble.org/trac/brian/browser/trunk/brian/tools/statistics.py
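As a rough sketch of the threshold-crossing idea above (my illustration; A is one of the potential arrays as a numpy array, and -65.0 is an arbitrary placeholder threshold):

import numpy as np

threshold = -65.0
above = A > threshold
spike_idx = np.where(above[1:] & ~above[:-1])[0] + 1  # upward crossings only

The resulting spike trains (or binned spike counts) can then be passed to scipy.signal.correlate in place of the raw potentials.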
As for phase difference, it probably is an inadequate measure, because neurons are not harmonic oscillators. If you want to take very precise measurements of phase, it's best to look at the synchronization of inharmonic oscillators. The mathematical model that describes these kinds of oscillators, and which is very useful in the context of neurons and neural networks, is the Kuramoto model. There is extensive documentation available on the Kuramoto model and integrate-and-fire synchronization, so I'll leave it at that.