Unwanted periodicity in the autocorrelation result (Python)

I'm calculating the autocorrelation sampled by every kth element.
[Autocorrelation equations, matching the code below: R(τ) = (1/M) Σ_{n=0}^{M−1} x[n]·x[n+τ] (1), and the subsampled version R_k(τ) = (1/M) Σ_{m=0}^{M−1} x[mk]·x[mk+τ] (2)]
I'm currently debugging a function that calculates the autocorrelation. The function uses two methods: equation (1) for Rtau and equation (2) for Rktau. Rtau is the original method, while Rktau is a modified version that takes the sampling interval k into account.
My problem is that unwanted periodicity appears with the modified autocorrelation function, equation (2).
I know that the original autocorrelation function is defined by equation (1). However, to relax the high sample-rate requirement, equation (2) is used for the autocorrelation calculation. This works because, under the WSS assumption, the autocorrelation function depends only on the interval τ rather than on n (or absolute time).
import numpy as np

k = 7       # sampling interval
M = 10000   # list size for autocorrelation calculation
R = 2**9    # autocorrelation size
N = 2**20   # data size

data = np.random.normal(1e-1, 1e-2, N)

Rtau = []
Rktau = []
for tau in range(R):
    r = data[:M] * data[tau:tau+M]
    rk = data[:M*k:k] * data[tau:tau+M*k:k]
    Rtau.append(np.mean(r))
    Rktau.append(np.mean(rk))
As expected, the results from Rktau are quite similar to those from Rtau. However, I've noticed that Rktau shows unwanted periodicity that is determined by the sampling interval k. This periodicity isn't present in Rtau.
Unwanted periodicity in Rktau:

import numpy as np
from matplotlib import pyplot as plt

k = 7       # sampling rate divider
M = 10000   # list size for autocorrelation calculation
R = 2**9    # autocorrelation size
N = 2**20   # data size

data = np.random.normal(1e-1, 1e-2, N)

Rtau = []
Rktau = []
for tau in range(R):
    r = data[:M] * data[tau:tau+M]
    Rtau.append(np.mean(r))
for ktau in range(0, R, k):
    rk = data[:M] * data[ktau:ktau+M]
    Rktau.append(np.mean(rk))

plt.plot(range(R), Rtau)
plt.plot(range(0, R, k), Rktau)

Related

Breaking apart a sin wave but phase doesn't seem to match up sometimes

I originally posted this in physics stack exchange but they requested it be posted here as well....
I am trying to create a known signal with a known wavelength, amplitude, and phase. I then want to break this signal apart into all of its frequencies, find the amplitude, phase, and wavelength of each frequency, and then create an equation for each frequency based on these new wavelengths, amplitudes, and phases. In theory, the equations should be identical to the individual signals, but they are not. I am almost positive it is an issue with phase, but I cannot figure out how to resolve it. I will post the exact code to reproduce this below. Please help, as my phase, wavelength, and amplitudes will vary once I get more complicated signals, so it needs to work for any combination of these.
import numpy as np
from matplotlib import pyplot as plt
from scipy import fftpack

# create signal
time_vec = np.arange(1, 11, 1)
wavelength = 1/.1
phase = 0
amp = 10
created_signal = amp * np.sin((2 * np.pi / wavelength * time_vec) + phase)

# plot it
fig, axs = plt.subplots(2, 1, figsize=(10, 6))
axs[0].plot(time_vec, created_signal, label='exact_data')

# get fft and freq array
sig_fft = fftpack.fft(created_signal)
sample_freq = fftpack.fftfreq(created_signal.size, d=1)

# do inverse fft and verify same curve as original signal. This is fine!
filtered_signal = fftpack.ifft(sig_fft)
filtered_signal += np.mean(created_signal)

# create individual signals for each frequency
filtered_signals = []
for i in range(len(sample_freq)):
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) < np.nanmin(sample_freq[i])] = 0
    high_freq_fft[np.abs(sample_freq) > np.nanmax(sample_freq[i])] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    filtered_sig += np.mean(created_signal)
    filtered_signals.append(filtered_sig)

# get phase, amplitude, and wavelength for each individual frequency
sig_size = len(created_signal)
wavelength = []
ph = []
amp = []
indices = []
for j in range(len(sample_freq)):
    wavelength.append(1 / sample_freq[j])
    indices.append(int(sig_size * sample_freq[j]))
for j in indices:
    phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
    ph.append([phase])
    amp.append([np.sqrt((sig_fft[j].real * sig_fft[j].real) + (sig_fft[j].imag * sig_fft[j].imag)) / (sig_size / 2)])

# create an equation for each frequency based on each phase, amp, and wavelength found from above.
def eqn(filtered_si, wavelength, time_vec, phase, amp):
    return amp * np.sin((2 * np.pi / wavelength * time_vec) + phase)

def find_equations(filtered_signals_mean, high_freq_fft, wavelength, filtered_signals, time_vec, ph, amp):
    equations = []
    for i in range(len(wavelength)):
        temp = eqn(filtered_signals[i], wavelength[i], time_vec, ph[i], amp[i])
        equations.append(temp + filtered_signals_mean)
    return equations

filtered_signals_mean = np.abs(np.mean(filtered_signals))
equations = find_equations(filtered_signals_mean, sig_fft, wavelength,
                           filtered_signals, time_vec, ph, amp)

# at this point each equation, for each frequency, should match each signal from each frequency identically,
# however, the phase seems wrong and they do not match!!??
axs[0].plot(time_vec, filtered_signal, '--', linewidth=3, label='filtered_sig_combined')
axs[1].plot(time_vec, filtered_signals[1], label='filtered_sig[-1]')
axs[1].plot(time_vec, equations[1], label='equations[-1]')
axs[0].legend()
axs[1].legend()
fig.tight_layout()
plt.show()
These are issues with your code:
filtered_signal = fftpack.ifft(sig_fft)
filtered_signal += np.mean(created_signal)
This only works because np.mean(created_signal) is approximately zero. The IFFT already takes the DC component into account; the zero-frequency bin describes the mean of the signal.
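As a quick sanity check (a minimal sketch of my own, not from the original post): bin 0 of the FFT is the sum of the samples, so dividing by the length gives the mean.

import numpy as np
from scipy import fftpack

x = np.array([1.0, 2.0, 3.0, 4.0])
X = fftpack.fft(x)
print(X[0].real / len(x), np.mean(x))  # both print 2.5: bin 0 already carries the mean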
filtered_signals = []
for i in range(len(sample_freq)):
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) < np.nanmin(sample_freq[i])] = 0
    high_freq_fft[np.abs(sample_freq) > np.nanmax(sample_freq[i])] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    filtered_sig += np.mean(created_signal)
    filtered_signals.append(filtered_sig)
Here, in the first half of the iterations, you go through all the frequencies, taking both the negative and positive frequencies into account. For example, when i=1, you take both the -0.1 and the 0.1 frequencies. In the second half of the iterations you are applying the IFFT to an all-zero signal: np.abs(sample_freq) is never negative, so when sample_freq[i] is negative the condition np.abs(sample_freq) > sample_freq[i] holds for every bin and the whole spectrum gets zeroed.
So the filtered_signals[1] contains a sine wave constructed by both the -0.1 and the 0.1 frequency components. This is good. Otherwise it would be a complex-valued function.
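To illustrate with a small sketch of my own (same fftpack API as above): keeping a bin together with its negative-frequency twin yields a real-valued signal after the IFFT.

import numpy as np
from scipy import fftpack

x = 10 * np.sin(2 * np.pi * 0.1 * np.arange(10))  # exactly one period over 10 samples
X = fftpack.fft(x)

pair = np.zeros_like(X)
pair[1], pair[-1] = X[1], X[-1]    # keep the +0.1 and -0.1 bins together
y = fftpack.ifft(pair)
print(np.max(np.abs(y.imag)))      # ~0: the conjugate pair reconstructs a real sine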
for j in range(len(sample_freq)):
    wavelength.append(1 / sample_freq[j])
    indices.append(int(sig_size * sample_freq[j]))
Here the second half of the indices array contains negative values. Not sure what you were planning with this, but it causes subsequent code to index from the end of the array.
for j in indices:
    phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
    ph.append([phase])
    amp.append([np.sqrt((sig_fft[j].real * sig_fft[j].real) + (sig_fft[j].imag * sig_fft[j].imag)) / (sig_size / 2)])
Here, because the indices are not the same as the j in the previous loop, phase[j] doesn't always correspond to wavelength[j]; in about half the cases they refer to values from different frequency components. But we shouldn't be evaluating those cases anyway. The code assumes a real-valued input, for which the magnitude and phase of the positive frequencies alone are sufficient to reconstruct the signal. You should skip all the negative frequencies here.
Next, you build sine waves using the collected information, but using a time_vec that starts at 1, not at 0 as the FFT assumes. And therefore the signal is shifted with respect to the expected value. Furthermore, when phase==0, you should create an even signal (i.e. a cosine, not a sine).
Thus, changing the following two lines of code will create the correct output:
time_vec = np.arange(0, 10, 1)
and
def eqn(filtered_si, wavelength, time_vec, phase, amp):
    return amp * np.cos((2 * np.pi / wavelength * time_vec) + phase)
    #               ^^^
Note that these two changes correct the plotted graph, but they don't fix all the other issues in the code discussed above.
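For reference, here is a minimal self-contained sketch (my own, not the asker's exact code) of the whole recipe: loop only over the non-negative frequencies, double each amplitude to fold in its negative-frequency twin (except the Nyquist bin), use the cosine convention, and start time at 0.

import numpy as np
from scipy import fftpack

N = 10
t = np.arange(N)                       # time starts at 0, as the FFT assumes
x = 10 * np.sin(2 * np.pi * 0.1 * t + 0.3)
X = fftpack.fft(x)
freqs = fftpack.fftfreq(N, d=1)

recon = np.full(N, X[0].real / N)      # the DC bin is the mean
for j in range(1, N // 2 + 1):
    # double every bin except the Nyquist bin (present only for even N)
    scale = 1.0 if (N % 2 == 0 and j == N // 2) else 2.0
    amp = scale * np.abs(X[j]) / N
    phase = np.angle(X[j])
    recon = recon + amp * np.cos(2 * np.pi * np.abs(freqs[j]) * t + phase)

print(np.allclose(recon, x))           # True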
I finally solved this after 2 days of frustration. I still have no idea why this is the way it is, so any insight would be great. The solution is to take the phase produced by arctan2(Im, Re) and modify it according to this equation:
phase = np.arctan2(sig_fft[j].imag, sig_fft[j].real)
formula = ((((wavelength[j]) / 2) - 2) * np.pi) / wavelength[j]
ph.append([phase + formula])
I had to derive this equation from data but I still do not know why this works. Please let me know. Finally!!
(For what it's worth, the correction simplifies to π/2 − 2π/wavelength[j], which is exactly the sine-to-cosine correction plus the one-sample time-origin shift described in the answer above.)

Simulating an elementary stochastic process in Python

I'm trying to simulate a simple stochastic process in Python, but with no success. The process is the following:
x(t + δt) = r(t) * x(t)
where r(t) is a Bernoulli random variable that can assume the values 1.5 or 0.6.
I've tried the following:
n = 10
r = np.zeros((1, n))
for i in range(0, n, 1):
    if r[1, i] == r[1, 0]:
        r[1, i] = 1
    else:
        B = bernoulli.rvs(0.5, size=1)
        if B == 0:
            r[1, i] = r[1, i-1] * 0.6
        else:
            r[1, i] = r[1, i-1] * 1.5
Can you explain why this is wrong and a possible solution?
First of all, the SDE evolves over time, so you need to consider the discretization (a step size) rather than just giving the number of steps through n.
Essentially, what you are asking for is a simple multiplicative random walk with a Bernoulli random variable taking on the values 0.6 and 1.5 instead of a Gaussian (standard normal) random variable.
So I have created an answer here, using NumPy to generate the Bernoulli random variable for efficiency (NumPy is faster than scipy), running the simulation with a step size of 0.01, and then plotting the solution using matplotlib.
One thing to note is that this SDE is one-dimensional, so we can just store the state and time in separate vectors and plot them at the end.
import numpy as np

# Function generating a Bernoulli trial (your r(t))
def get_bernoulli(p=0.5):
    '''
    Function using numpy (faster than scipy.stats)
    to generate a Bernoulli random variable with values 0.6 or 1.5
    '''
    B = np.random.binomial(1, p, 1)
    if B == 0:
        return 0.6
    else:
        return 1.5
This is then used in the simulation as
import numpy as np
import matplotlib.pyplot as plt

dt = 0.01               # step size
x0 = 1                  # initial value
tfinal = 1
sqrtdt = np.sqrt(dt)    # unused here; would appear in a Gaussian Euler-Maruyama step
n = int(tfinal / dt)

# State and time vectors
xtraj = np.zeros(n + 1, float)
trange = np.linspace(start=0, stop=tfinal, num=n + 1)

# initialize
xtraj[0] = x0
for i in range(n):
    xtraj[i + 1] = xtraj[i] * get_bernoulli(p=0.5)

plt.plot(trange, xtraj, label=r'$x(t)$')
plt.xlabel("time")
plt.ylabel(r"$X$")
plt.legend()
plt.show()
Here we assumed the Bernoulli trial is fair, but the probability p can be customized to add some more variation.
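As a side note, since the multiplicative updates are independent of each other, the whole trajectory can also be generated without a Python loop. A small vectorized sketch of the same process (my own variant):

import numpy as np
import matplotlib.pyplot as plt

dt = 0.01
tfinal = 1
n = int(tfinal / dt)
x0 = 1.0

# draw all n Bernoulli factors at once: 1.5 with probability 0.5, otherwise 0.6
r = np.where(np.random.binomial(1, 0.5, size=n) == 1, 1.5, 0.6)

# the cumulative product gives x(t) = x0 * r(0) * r(1) * ... * r(t-1)
xtraj = x0 * np.concatenate(([1.0], np.cumprod(r)))
trange = np.linspace(0, tfinal, n + 1)

plt.plot(trange, xtraj)
plt.xlabel("time")
plt.show()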

Plotting Multiple Realizations of a Stochastic Process in Python

I'm trying to plot the time-evolution graph for the Ornstein-Uhlenbeck process, which is a stochastic process, and then find the probability distribution at each time step. I'm able to plot the graph for 1000 realizations of the process. Each realization has 1000 time steps, with a time-step width of .001. I used a 1000 x 1000 array to store the data: each row holds one realization, and the i-th column corresponds to the values at the i-th time step across the 1000 realizations.
Now I want to bin the results at each time step together and then plot the probability distribution corresponding to each time step. I'm quite confused about how to do this (I tried modifying code from the IPython Cookbook, where they don't store the realizations in memory).
The code that I made from the IPython Cookbook:
import numpy as np
import matplotlib.pyplot as plt

sigma = 1.       # Standard deviation.
mu = 10.         # Mean.
tau = .05        # Time constant.
dt = .001        # Time step.
T = 1.           # Total time.
n = int(T / dt)  # Number of time steps.
ntrails = 1000   # Number of realizations.
t = np.linspace(0., T, n)  # Vector of times.
sigmabis = sigma * np.sqrt(2. / tau)
sqrtdt = np.sqrt(dt)

x = np.zeros((ntrails, n))  # Array containing all successive values of our process
for j in range(ntrails):    # Euler method
    for i in range(n - 1):
        x[j, i + 1] = x[j, i] + dt * (-(x[j, i] - mu) / tau) + sigmabis * sqrtdt * np.random.randn()

for k in range(ntrails):    # plotting 1000 realizations
    plt.plot(t, x[k])

# Time averaging of each time stamp using bins
# Really lost from this point onwards.
bins = np.linspace(-2., 15., 100)
fig, ax = plt.subplots(1, 1, figsize=(12, 4))
for i in range(ntrails):
    hist, _ = np.histogram(x[:, [i]], bins=bins)
    ax.plot(hist)
Graph for 1000 realizations of the Ornstein-Uhlenbeck process:
Distribution generated from the code above:
I'm really lost with assigning the bin values and plotting the histogram from them. I want to know whether my code is correct for plotting the distributions corresponding to each time step using bins. If not, please tell me what modifications I need to make to my code.
The last for loop should iterate over n, not ntrails (which happens to be the same value here), but otherwise the code and plots look correct, apart from a few minor issues, such as that it takes 101 breaks to get 100 bins, so your code should probably read bins = np.linspace(-2., 15., 101).
Your plots could be improved a bit though. A good guiding principle is to use as little ink as necessary to communicate the point that you are trying to make. You are always trying to plot all the data, which ends up obscuring your plots. Also, you could benefit from paying more attention to colour. Colour should carry meaning, or not be used at all.
Here would be my suggestions:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

mpl.rcParams['axes.spines.top'] = False
mpl.rcParams['axes.spines.right'] = False

sigma = 1.       # Standard deviation.
mu = 10.         # Mean.
tau = .05        # Time constant.
dt = .001        # Time step.
T = 1            # Total time.
n = int(T / dt)  # Number of time steps.
ntrails = 10000  # Number of realizations.
t = np.linspace(0., T, n)  # Vector of times.
sigmabis = sigma * np.sqrt(2. / tau)
sqrtdt = np.sqrt(dt)

x = np.zeros((ntrails, n))  # Array containing all successive values of our process
for j in range(ntrails):    # Euler method
    for i in range(n - 1):
        x[j, i + 1] = x[j, i] + dt * (-(x[j, i] - mu) / tau) + sigmabis * sqrtdt * np.random.randn()

fig, ax = plt.subplots()
for k in range(200):  # plotting fewer realizations shows the distribution better in this case
    ax.plot(t, x[k], color='k', alpha=0.02)

bins = np.linspace(-2., 15., 101)  # you need 101 breaks to get 100 bins
fig, ax = plt.subplots(1, 1, figsize=(12, 4))
# plotting a smaller selection of time points spaced out using a log scale prevents
# the asymptotic approach to the mean from dominating the plot
for idx, i in enumerate(np.logspace(0, np.log10(n) - 1, 21)):
    hist, _ = np.histogram(x[:, [int(i)]], bins=bins)
    ax.plot(hist, color=plt.cm.plasma(idx / 20))  # index the colormap in [0, 1]
plt.show()

efficient sampling from beta-binomial distribution in python

For a stochastic simulation I need to draw a lot of random numbers which are beta-binomially distributed.
At the moment I have implemented it this way (using Python):
import scipy as scp
import scipy.special  # make sure the special submodule is actually loaded
from scipy.stats import rv_discrete

class beta_binomial(rv_discrete):
    """
    creating a beta-binomial distribution by defining its pmf
    """
    def _pmf(self, k, a, b, n):
        return scp.special.binom(n, k) * scp.special.beta(k + a, n - k + b) / scp.special.beta(a, b)
so sampling a random number x can be done by:
betabinomial = beta_binomial(name="betabinomial")
x = betabinomial.rvs(0.5, 0.5, 3)  # with some parameters
The problem is that sampling one random number takes ca. 0.5 ms, which in my case dominates the whole simulation speed. The limiting element is the evaluation of the beta functions (or the gamma functions within them).
Does anyone have a good idea how to speed up the sampling?
Well, here is working and lightly tested code which seems to be faster, using the compound-distribution property of the beta-binomial.
We sample p from the beta distribution and then use it as the parameter of the binomial. If you sample large vectors at once, it is even faster.
import numpy as np

def sample_Beta_Binomial(a, b, n, size=None):
    p = np.random.beta(a, b, size=size)
    r = np.random.binomial(n, p)
    return r

np.random.seed(777777)
q = sample_Beta_Binomial(0.5, 0.5, 3, size=10)
print(q)
Output is
[3 1 3 2 0 0 0 3 0 3]
Quick test
np.random.seed(777777)

n = 10
a = 2.
b = 2.
N = 100000

q = sample_Beta_Binomial(a, b, n, size=N)

h = np.zeros(n + 1, dtype=np.float64)  # histogram
for v in q:                            # fill it
    h[v] += 1.0
h /= np.float64(N)                     # normalization
print(h)
prints the histogram
[0.03752 0.07096 0.09314 0.1114 0.12286 0.12569 0.12254 0.1127 0.09548 0.06967 0.03804]
which is quite similar to the green graph on the Wikipedia page for the beta-binomial distribution.
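As an aside, if your SciPy is recent enough (1.4+, if I remember correctly) there is also a built-in scipy.stats.betabinom you can cross-check against. A quick sketch:

import numpy as np
from scipy import stats

np.random.seed(777777)
n, a, b, N = 10, 2., 2., 100000

q = stats.betabinom.rvs(n, a, b, size=N)   # shape parameters: n, a, b
h = np.bincount(q, minlength=n + 1) / N    # normalized histogram
print(h)  # should resemble the histogram printed above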

Numpy FFT error - Comb filter with envelope

I am currently having some issues understanding some discrepancies between the frequency response function calculated through the Z-transform and numpy's FFT algorithm. It is a simple echo represented by the impulse response:
h[n] = δ[n] + αδ[n-L]
Where α is an attenuation factor for the echo and L is the delay amount in samples. The corresponding transfer function is given by:
H(f) = ( e^(j2πfΔL) + α ) / e^(j2πfΔL)
Where Δ is the sampling period.
I seem to be getting different results using the same number of frequency bins when directly plotting the transfer function magnitude above and when using numpy's fft algorithm.
In particular, the FFT magnitude seems to form an envelope around the overall spectrum. I believe I should be getting a simple comb filter like the one from the transfer function method: imgur
Could anyone clarify why this may be happening and whether I have potentially overlooked anything? Is this due to errors in the DFT algorithms employed?
Appreciate your time, cheers!
import matplotlib.pyplot as pyplt
import numpy as np
fs = 48000 # Sample rate
T = 1/fs # Sample period
L = 3000 # Delay
a = 0.5 # Attenuation factor
# h[n] = dirac[n] + a * dirac[n-L]
h = np.zeros(L)
h[0] = 1
h[L-1] = a
# Transfer function H
freqs = np.arange(0, fs, fs/(2*L))
e = np.exp(1j*freqs*2*np.pi*L*T)
H = (e + a)/(e)
# Transfer function H via FFT - same # of bins
H_FFT = np.fft.fft(h, 2*L)
pyplt.figure()
# Correct comb filter
pyplt.plot(np.abs(H))
# Running the FFT gives a form of strange envelope error
pyplt.plot(np.abs(H_FFT))
pyplt.legend()
Your code is almost OK. What you need to change:
As far as I understand, there is no reason for the length of the Fourier transform vector to be proportional to L. It needs to be the size of your sampling frequency, i.e. fs.
Your L is too high, so the result oscillates too quickly. Try a lower L.
Here is a modified version of the code to show how it works, plotted in two different figures for clarity:
import matplotlib.pyplot as plt
import numpy as np
fs = 48000 # Sample rate
T = 1/fs # Sample period
L = 3 # Delay
a = 0.5 # Attenuation factor
# h[n] = dirac[n] + a * dirac[n-L]
h = np.zeros(fs)
h[0] = 1
h[L] = a
# Transfer function H
freqs = np.arange(0, fs)
e = np.exp(1j*freqs*2*np.pi*L*T)
H = (e + a)/(e)
# Transfer function H via FFT - same # of bins
H_FFT = np.fft.fft(h)
# Correct comb filter
plt.figure()
plt.plot(np.abs(H))
# Running the FFT now matches the analytic comb filter
plt.figure()
plt.plot(np.abs(H_FFT))
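To confirm the agreement, note that with N = fs points the FFT of h is exactly 1 + a·e^(−j2πkL/fs), which is the same as H above evaluated at f = k. A quick self-contained check (my own sketch):

import numpy as np

fs = 48000
T = 1 / fs
L = 3
a = 0.5

h = np.zeros(fs)
h[0] = 1
h[L] = a

freqs = np.arange(0, fs)
H = 1 + a * np.exp(-1j * 2 * np.pi * freqs * L * T)  # algebraically equal to (e + a) / e
H_FFT = np.fft.fft(h)
print(np.allclose(H, H_FFT))  # True: the analytic comb filter and the FFT match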
