Calculating time-varying frequency and phase angle from a timeseries

Calculating time-varying frequency and phase angle from a timeseries - python

I have data from a number of high frequency data capture devices connected to generators on an electricity grid. These meters collect data in ~1 second "bursts" at ~1.25ms frequency, ie. fast enough to actually see the waveform. See below graphs showing voltage and current for the three phases shown in different colours.
This timeseries has a changing fundamental frequency, ie the frequency of the electricity grid is changing over the length of the timeseries. I want to roll this (messy) waveform data up to summary statistics of frequency and phase angle for each phase, calculated/estimated every 20ms (approx once per cycle).
The simplest way that I can think of would be to just count the gap between the 0 passes (y=0) on each wave and use the offset to calculate phase angle. Is there a neat way to achieve this (ie. a table of interpolated x values for which y=0).
However the above may be quite noisy, and I was wondering if there is a more mathematically elegant way of estimating a changing frequency and phase angle with pandas/scipy etc. I know there are some sophisticated techniques available for periodic functions but I'm not familiar enough with them. Any suggestions would be appreciated :)
Here's a "toy" data set of the first few waves as a pandas Series:
import pandas as pd, datetime as dt
ds_waveform = pd.Series(
index = pd.date_range('2020-08-23 12:35:37.017625', '2020-08-23 12:35:37.142212890', periods=100),
data = [ -9982., -110097., -113600., -91812., -48691., -17532.,
24452., 75533., 103644., 110967., 114652., 92864.,
49697., 18402., -23309., -74481., -103047., -110461.,
-113964., -92130., -49373., -18351., 24042., 75033.,
103644., 111286., 115061., 81628., 61614., 19039.,
-34408., -62428., -103002., -110734., -114237., -92858.,
-49919., -19124., 23542., 74987., 103644., 111877.,
115379., 82720., 62251., 19949., -33953., -62382.,
-102820., -111053., -114555., -81941., -62564., -19579.,
34459., 62706., 103325., 111877., 115698., 83084.,
62888., 20949., -33362., -61791., -102547., -111053.,
-114919., -82805., -62882., -20261., 33777., 62479.,
103189., 112195., 116380., 83630., 63843., 21586.,
-32543., -61427., -102410., -111553., -115374., -83442.,
-63565., -21217., 33276., 62024., 103007., 112468.,
116471., 84631., 64707., 22405., -31952., -61108.,
-101955., -111780., -115647., -84261.])

Related

python Spectrogram by using value in timeseries

I am new to spectrogram and try to plot spectrogram by using relative velocity variations value of ambient seismic noise.
So the format of the data I have is 'time', 'station pair', 'velocity variation value' as below. (If error is needed, I can add it on the data)
2013-11-24,05_PK01_05_SS01,0.057039371136200
2013-11-25,05_PK01_05_SS01,-0.003328071661900
2013-11-26,05_PK01_05_SS01,0.137221779659000
2013-11-27,05_PK01_05_SS01,0.068823721831000
2013-11-28,05_PK01_05_SS01,-0.006876687060810
2013-11-29,05_PK01_05_SS01,-0.023895268916200
2013-11-30,05_PK01_05_SS01,-0.105762098404000
2013-12-01,05_PK01_05_SS01,-0.028069540807700
2013-12-02,05_PK01_05_SS01,0.015091601414300
2013-12-03,05_PK01_05_SS01,0.016353885353700
2013-12-04,05_PK01_05_SS01,-0.056654092859700
2013-12-05,05_PK01_05_SS01,-0.044520608528500
2013-12-06,05_PK01_05_SS01,0.020226437197700
...
But I searched for it, I can only see people using data of network, station, location, channel, or wav data.
Therefore, I have no idea what I have to start because the data format is different..
If you know some ways to get spectrogram by using 'value' of timeseries.
p.s. I would compute cross correlation with velocity variation value and other environmental data such as air temperature, air pressure etc.
###Edit (I add two pictures but the notice pops up that I cannot post images yet but only link)
I would write about groundwater level or other environmental data because those are easier to see variations.
The plot that I want to make similarly is from David et al., 2021 as below.
enter image description here
X axis shows time series and y axis shows cycles/day.
So when the light color is located at 1 then it means diurnal cycle (if 2, semidiurnal cycle).
Now I plot spectrogram and make the frequency as cycles / 1day.
enter image description here
But what I found to edit are two.
In the reference, it is normalized as log scale.
So I need to find the way to edit it as log scale.
In the reference, the x axis becomes 1*10^7.
But in my data, there are only 755 points in time series (dates in 2013-2015).
So what do I have to do to make x axis to time series?
p.s. The code I made
fil=pd.read_csv('myfile.csv')
cf=fil.iloc[:,1]
cf=cf/max(abs(cf))
nfft=128 #The number of data points
fs=1/86400 #Hz [0, fs/2] cycles / unit time
n=len(cf)
fr=fs/n
spec, freq, tt, pplot = pylab.specgram(cf, NFFT=nfft, Fs=fs, detrend=pylab.detrend,
window=pylab.window_hanning, noverlap=100, mode='psd')
pylab.title('%s' % e_n)
plt.colorbar()
plt.ylabel("Frequency (cycles / %s Day)" % str(1/fs/86400))
plt.xlabel("days")
plt.show()

If you look closely at it, wav data is basically just an array of numbers (sound amplitude), recorded at a certain interval.
Note: You have an array of equally spaced samples, but they are for velocity difference, not amplitude. So while the following is technically valid, I don't think that the resulting frequencies represent seismic sound frequencies?
So the discrete Fourier transform (in the form of np.fft.rfft) would normally be the right thing to use.
If you give the function np.fft.rfft() n numbers, it will return n/2+1 frequencies. This is because of the inherent symmetry in the transform.
However, one thing to keep in mind is the frequency resolution of FFT. For example if you take n=44100 samples from a wav file sampled at Fs=44100 Hz, you get a convenient frequency resolution of Fs/n = 1 Hz. Which means that the first number in the FFT result is 0 Hz, the second number is 1 Hz et cetera.
It seems that the sampling frequency in your dataset is once per day, i.e. Fs= 1/(24x3600) =0.000012 Hz. Suppose you have n = 10000 samples, then the FFT will return 5001 numbers, with a frequency resolution of Fs/n= 0.0000000012 Hz. That means that the highest frequency you will be able to detect from data sampled at this frequncy is 0.0000000012*5001 = 0.000006 Hz.
So the highest frequency you can detect is approximately Fs/2!
I'm no domain expert, but that value seems to be a bit low for seismic noise?

Fourier Transform Time Series in Python

I've got a time series of sunspot numbers, where the mean number of sunspots is counted per month, and I'm trying to use a Fourier Transform to convert from the time domain to the frequency domain. The data used is from https://wwwbis.sidc.be/silso/infosnmtot.
The first thing I'm confused about is how to express the sampling frequency as once per month. Do I need to convert it to seconds, eg. 1/(seconds in 30 days)? Here's what I've got so far:
fs = 1/2592000
#the sampling frequency is 1/(seconds in a month)
fourier = np.fft.fft(sn_value)
#sn_value is the mean number of sunspots measured each month
freqs = np.fft.fftfreq(sn_value.size,d=fs)
power_spectrum = np.abs(fourier)
plt.plot(freqs,power_spectrum)
plt.xlim(0,max(freqs))
plt.title("Power Spectral Density of the Sunspot Number Time Series")
plt.grid(True)
I don't think this is correct - namely because I don't know what the scale of the x-axis is. However I do know that there should be a peak at (11years)^-1.
The second thing I'm wondering from this graph is why there seems to be two lines - one being a horizontal line just above y=0. It's more clear when I change the x-axis bounds to: plt.xlim(0,1).
Am I using the fourier transform functions incorrectly?

You can use any units you want. Feel free to express your sampling frequency as fs=12 (samples/year), the x-axis will then be 1/year units. Or use fs=1 (sample/month), the units will then be 1/month.
The extra line you spotted comes from the way you plot your data. Look at the output of the np.fft.fftfreq call. The first half of that array contains positive values from 0 to 1.2e6 or so, the other half contain negative values from -1.2e6 to almost 0. By plotting all your data, you get a data line from 0 to the right, then a straight line from the rightmost point to the leftmost point, then the rest of the data line back to zero. Your xlim call makes it so you don’t see half the data plotted.
Typically you’d plot only the first half of your data, just crop the freqs and power_spectrum arrays.

Normalizing time series measurements

I have read the following sentence:
Figure 3 depicts how the pressure develops during a touch event. It
shows the mean over all button touches from all users. To account for
the different hold times of the touch events, the time axis has been
normalized before averaging the pressure values.
They have measured the touch pressure over touch events and made a plot. I think normalizing the time axis means to scale the time axis to 1s, for example. But how can this be made? Let's say for example I have a measurement which spans 3.34 seconds (1000 timestamps and 1000 measurements). How can I normalize this measurement?

If you want to normalize you data you can do as you suggest and simply calculate:
z_i=\frac{x_i-min(x)}{max(x)-min(x)}
(Sorry but i cannot post images yet but you can visit this )
where zi is your i-th normalized time data, and xi is your absolute data.
An example using numpy:
import numpy
x = numpy.random.rand(10) # generate 10 random values
normalized = (x-min(x))/(max(x)-min(x))
print(x,normalized)

fft power spectrum woes

I'm having trouble getting a frequency spectrum out of a fourier transform... I have some data:
That I have mean-centered, and doesn't seem to have too much of a trend...
I plot the fourier transform of it:
And I get something that is not nice....
Here is my code:
def fourier_spectrum(X, sample_freq=1):
ps = np.abs(np.fft.fft(X))**2
freqs = np.fft.fftfreq(X.size, sample_freq)
idx = np.argsort(freqs)
plt.plot(freqs[idx], ps[idx])
As adapted from code taken from here.
It seems to work for some naive sin wave data:
fourier_spectrum(np.sin(2*np.pi*np.linspace(-10,10,400)), 20./400)
So my questions are: I'm expecting a non-zero-almost-everywhere-spectrum, what am I doing wrong? If I'm not doing anything wrong, what features of my data are causing this? Also, if I'm not doing anything wrong, and fft is just unsuited for my data for some reason, what should I do to extract important frequencies from my data?

It turns out that I just didn't understand the units of the x-axis in the frequency spectrum, which is Hz. Because my sample spacings were on the order of a second, and my period was on the order of a day, the only units really visible on my frequency spectrum were ~1/s (at the edges) to about ~1/m (near the middle), and anything with a longer period than that was indistinguishable from 0. My misunderstanding stemmed from the graph on this tutorial, where they do conversions so that the x-axis units are in time, as opposed to inverse time. I rewrote my frequency_spectrum plotting function to do the appropriate "zooming" on the resulting graph...
def fourier_spectrum(X, sample_spacing_in_s=1, min_period_in_s=5):
'''
X: is our data
sample_spacing_in_s: is the time spacing between samples
min_period_in_s: is the minimum period we want to show up in our
graph... this is handy because if our sample spacing is
small compared to the periods in our data, then our spikes
will all cluster near 0 (the infinite period) and we can't
see them. E.g. if you want to see periods on the order of
days, set min_period_in_s=5*60*60 #5 hours
'''
ps = np.abs(np.fft.fft(X))**2
freqs = np.fft.fftfreq(X.size, sample_spacing_in_s)
idx = np.argsort(freqs)
plt.plot(freqs[idx], ps[idx])
plt.xlim(-1./min_period_in_s,1./min_period_in_s) # the x-axis is in Hz

How to correlate two time series with gaps and different time bases?

I have two time series of 3D accelerometer data that have different time bases (clocks started at different times, with some very slight creep during the sampling time), as well as containing many gaps of different size (due to delays associated with writing to separate flash devices).
The accelerometers I'm using are the inexpensive GCDC X250-2. I'm running the accelerometers at their highest gain, so the data has a significant noise floor.
The time series each have about 2 million data points (over an hour at 512 samples/sec), and contain about 500 events of interest, where a typical event spans 100-150 samples (200-300 ms each). Many of these events are affected by data outages during flash writes.
So, the data isn't pristine, and isn't even very pretty. But my eyeball inspection shows it clearly contains the information I'm interested in. (I can post plots, if needed.)
The accelerometers are in similar environments but are only moderately coupled, meaning that I can tell by eye which events match from each accelerometer, but I have been unsuccessful so far doing so in software. Due to physical limitations, the devices are also mounted in different orientations, where the axes don't match, but they are as close to orthogonal as I could make them. So, for example, for 3-axis accelerometers A & B, +Ax maps to -By (up-down), +Az maps to -Bx (left-right), and +Ay maps to -Bz (front-back).
My initial goal is to correlate shock events on the vertical axis, though I would eventually like to a) automatically discover the axis mapping, b) correlate activity on the mapped aces, and c) extract behavior differences between the two accelerometers (such as twisting or flexing).
The nature of the times series data makes Python's numpy.correlate() unusable. I've also looked at R's Zoo package, but have made no headway with it. I've looked to different fields of signal analysis for help, but I've made no progress.
Anyone have any clues for what I can do, or approaches I should research?
Update 28 Feb 2011: Added some plots here showing examples of the data.

My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.
My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).
import numpy
from scipy.ndimage import gaussian_filter
"""
sig1 and sig 2 are assumed to be large, 1D numpy arrays
sig1 is sampled at times t1, sig2 is sampled at times t2
t_start, t_end, is your desired sampling interval
t_len is your desired number of measurements
"""
t = numpy.linspace(t_start, t_end, t_len)
sig1 = numpy.interp(t, t1, sig1)
sig2 = numpy.interp(t, t2, sig2)
#Now sig1 and sig2 are sampled at the same points.
"""
Rectify and smooth, so 'peaks' will stand out.
This makes big assumptions about your data;
these assumptions seem true-ish based on your plots.
"""
sigma = 10 #Tune this parameter to get the right smoothing
sig1, sig2 = abs(sig1), abs(sig2)
sig1, sig2 = gaussian_filter(sig1, sigma), gaussian_filter(sig2, sigma)
"""
Now sig1 and sig2 should look smoothly varying, with humps at each 'event'.
Hopefully we can search a small range of shifts to find the maximum of the
cross-correlation. This assumes your data are *nearly* lined up already.
"""
max_xc = 0
best_shift = 0
for shift in range(-10, 10): #Tune this search range
xc = (numpy.roll(sig1, shift) * sig2).sum()
if xc > max_xc:
max_xc = xc
best_shift = shift
print 'Best shift:', best_shift
"""
If best_shift is at the edges of your search range,
you should expand the search range.
"""

If the data contains gaps of unknown sizes that are different in each time series, then I would give up on trying to correlate entire sequences, and instead try cross correlating pairs of short windows on each time series, say overlapping windows twice the length of a typical event (300 samples long). Find potential high cross correlation matches across all possibilities, and then impose a sequential ordering constraint on the potential matches to get sequences of matched windows.
From there you have smaller problems that are easier to analyze.

This isn't a technical answer, but it might help you come up with one:
Convert the plot to an image, and stick it into a decent image program like gimp or photoshop
break the plots into discrete images whenever there's a gap
put the first series of plots in a horizontal line
put the second series in a horizontal line right underneath it
visually identify the first correlated event
if the two events are not lined up vertically:
select whichever instance is further to the left and everything to the right of it on that row
drag those things to the right until they line up
This is pretty much how an audio editor works, so you if you converted it into a simple audio format like an uncompressed WAV file, you could manipulate it directly in something like Audacity. (It'll sound horrible, of course, but you'll be able to move the data plots around pretty easily.)
Actually, audacity has a scripting language called nyquist, too, so if you don't need the program to detect the correlations (or you're at least willing to defer that step for the time being) you could probably use some combination of audacity's markers and nyquist to automate the alignment and export the clean data in your format of choice once you tag the correlation points.

My guess is, you'll have to manually build an offset table that aligns the "matches" between the series. Below is an example of a way to get those matches. The idea is to shift the data left-right until it lines up and then adjust the scale until it "matches". Give it a try.
library(rpanel)
#Generate the x1 and x2 data
n1 <- rnorm(500)
n2 <- rnorm(200)
x1 <- c(n1, rep(0,100), n2, rep(0,150))
x2 <- c(rep(0,50), 2*n1, rep(0,150), 3*n2, rep(0,50))
#Build the panel function that will draw/update the graph
lvm.draw <- function(panel) {
plot(x=(1:length(panel$dat3))+panel$off, y=panel$dat3, ylim=panel$dat1, xlab="", ylab="y", main=paste("Alignment Graph Offset = ", panel$off, " Scale = ", panel$sca, sep=""), typ="l")
lines(x=1:length(panel$dat3), y=panel$sca*panel$dat4, col="red")
grid()
panel
}
#Build the panel
xlimdat <- c(1, length(x1))
ylimdat <- c(-5, 5)
panel <- rp.control(title = "Eye-Ball-It", dat1=ylimdat, dat2=xlimdat, dat3=x1, dat4=x2, off=100, sca=1.0, size=c(300, 160))
rp.slider(panel, var=off, from=-500, to=500, action=lvm.draw, title="Offset", pos=c(5, 5, 290, 70), showvalue=TRUE)
rp.slider(panel, var=sca, from=0, to=2, action=lvm.draw, title="Scale", pos=c(5, 70, 290, 90), showvalue=TRUE)

It sounds like you want to minimize the function (Ax'+By) + (Az'+Bx) + (Ay'+Bz) for a pair of values: Namely, the time-offset: t0 and a time scale factor: tr. where Ax' = tr*(Ax + t0), etc..
I would look into SciPy's bivariate optimize functions. And I would use a mask or temporarily zero the data (both Ax' and By for example) over the "gaps" (assuming the gaps can be programmatically determined).
To make the process more efficient, start with a coarse sampling of A and B, but set the precision in fmin (or whatever optimizer you've selected) that is commensurate with your sampling. Then proceed with progressively finer-sampled windows of the full dataset until your windows are narrow and are not down-sampled.
Edit - matching axes
Regarding the issue of trying to identify which axis is co-linear with a given axis, and not knowing at thing about the characteristics of your data, i can point towards a similar question. Look into pHash or any of the other methods outlined in this post to help identify similar waveforms.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.