I need the values of the autocorrelation coefficients that autocorrelation_plot() draws. The function does not return them, so I need another function to compute them. I tried acf() from statsmodels, but its plot does not match the one from autocorrelation_plot(). Here is my code:
from statsmodels.tsa.stattools import acf
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt
import numpy as np
y = np.sin(np.arange(1,6*np.pi,0.1))
plt.plot(acf(y))
plt.show()
So the result is not the same as this:
autocorrelation_plot(y)
plt.show()
This seems to be related to the nlags parameter of acf:
nlags: int, optional
Number of lags to return autocorrelation for.
I don't know what exactly this does but in the source of acf there is a slicing
that shortens the array:
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
acf = avf[:nlags + 1] / avf[0]
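If you use statsmodels.tsa.stattools.acovf directly, you get a value for every lag, which is the range that autocorrelation_plot shows (these are autocovariances, so they still have to be divided by avf[0]):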
avf = acovf(x, unbiased=unbiased, demean=True, fft=fft, missing=missing)
So you can call it like
plt.plot(acf(y, nlags=len(y)))
to make it work.
An explanation of lag: https://math.stackexchange.com/questions/2548314/what-is-lag-in-a-time-series/2548350
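If you only need the numbers, here is a minimal NumPy-only sketch of the same idea. It assumes the biased estimator (dividing by n at every lag), which is my reading of what autocorrelation_plot does. The helper name full_acf is made up for this example and is not part of pandas or statsmodels.

```python
import numpy as np

def full_acf(x):
    """Biased sample autocorrelation at lags 0 .. len(x)-1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    # 'full' mode returns lags -(n-1) .. (n-1); index n-1 is lag 0, so keep lags >= 0
    acov = np.correlate(d, d, mode="full")[n - 1:] / n
    return acov / acov[0]

y = np.sin(np.arange(1, 6 * np.pi, 0.1))
r = full_acf(y)  # one coefficient per lag, starting at lag 0
```

Note that r[0] is the lag-0 coefficient (always 1), whereas the pandas plot, as far as I can tell, starts at lag 1, so the curves may be shifted by one position.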
I am trying to code two random variables with a given correlation. I have been given $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$. I am also given $cor(Z_1,Z_2)=\rho$. So I need the formula to get $Z_2$ from $Z_1$. Initially, I was trying this code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
rho=0.5
N=100
Z1=np.random.randn(N)  # np.random is a module; randn draws standard normal samples
Z2=np.random.randn(N)  # drawn independently of Z1
However, then I realized that $Z_2$ is now no longer correlated to $Z_1$. So I want to ask how I can get the correct $Z_2$.
Let $\alpha$ be such that $\alpha^2+\rho^2 = 1$. Let $X, Y$ be independent $N(0,1)$ distributed variables. Set $Z_1 := \rho X + \alpha Y$ and $Z_2 := X$. Then $Z_1$ is again $N(0,1)$ because its variance is $\rho^2+\alpha^2 = 1$, and $cov(Z_1, Z_2) = \rho$, so $Z_1, Z_2$ fulfill your requirements.
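The construction above can be sketched in NumPy as follows. The seed and the sample size are my own choices (N is much larger than the 100 in the question so that the sample correlation settles close to $\rho$).

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
rho = 0.5
N = 100_000                     # assumption: larger than the question's N=100

alpha = np.sqrt(1.0 - rho**2)   # so that alpha^2 + rho^2 = 1
X = rng.standard_normal(N)      # independent N(0,1) draws
Y = rng.standard_normal(N)

Z1 = rho * X + alpha * Y
Z2 = X

# Sample correlation; it should land near rho, up to sampling noise
sample_rho = np.corrcoef(Z1, Z2)[0, 1]
```

With only N=100 samples the estimate will scatter noticeably around $\rho$, which is expected and not a sign that the formula is wrong.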
I'm trying to fit my exponential data, but I am unable to get a decent answer. I'm using scipy and the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import scipy.optimize
import pylab
def exponential(x, a, k, b):
    return a*np.exp(-x/k) + b
def main():
    filename = 'tek0071ALL.csv'
    df = pd.read_csv(filename, skiprows=14)
    t = df['TIME']
    ch3 = df['CH3']
    idx1 = df.index[df['TIME']==-0.32]
    idx2 = df.index[df['TIME']==-0.18]
    t= t[idx1.values[0]:idx2.values[0]]
    data=ch3[idx1.values[0]:idx2.values[0]]
    popt_exponential, pcov_exponential = scipy.optimize.curve_fit(exponential, t, data, p0=[1,.1, 0])
    # print(popt_exponential,pcov_exponential)
    print(popt_exponential[0])
    print(popt_exponential[1])
    print(popt_exponential[2])
    plt.plot(t,data,'.')
    plt.plot(t,exponential(t,popt_exponential[0],popt_exponential[1],popt_exponential[2]))
    plt.legend(['Data','Fit'])  # add the legend before show() so it is drawn
    plt.show()
main()
This is what the fit looks like:
and I think this means that it's actually a good fit. I think my time constant is correct, and that's what I'm trying to extract. However, the amplitude is really giving me trouble -- I expected the amplitude to be around 0.5 by inspection, but instead I get the following values for equation A*exp(-t/K)+C:
A: 1.2424893552249658e-07
K: 0.0207112474466181
C: 0.010623336832120528
I'm left wondering if this is correct, and that my amplitude really ought to be so tiny to account for the exponential's behavior.
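One way to read those numbers, offered as a sketch rather than a definitive diagnosis: in A*exp(-t/K)+C the parameter A is the value of the exponential term at t = 0, but the fitted window runs from t = -0.32 to t = -0.18. The size of the exponential term where the data start is A*exp(-t_start/K), which can be checked directly from the fitted values quoted above (t_start = -0.32 is taken from the code in the question).

```python
import math

# Fitted parameters as quoted in the question
A = 1.2424893552249658e-07
K = 0.0207112474466181
C = 0.010623336832120528

t_start = -0.32  # first time sample in the fitted window (from the question's code)

# Size of the exponential term where the data begin, rather than at t = 0
amp_at_start = A * math.exp(-t_start / K)
print(amp_at_start)
```

This comes out at roughly 0.6, the same order as the 0.5 estimated by inspection. If the amplitude at the start of the window is the quantity of interest, fitting a*exp(-(t - t_start)/k) + b instead would presumably report it directly while leaving the time constant unchanged.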
Let's say I have the following function:
def f(x):
    return log(3*exp(3*x) + 7*exp(7*x))
I want to do two things:
1) plot the function over a range of x-values
2) find the root of the function using the Newton method from scipy
My problem is that plotting seems best done with a numpy array x=np.linspace(-2,2,1000), but then evaluating the function raises the error "TypeError: only size-1 arrays can be converted to Python scalars". I can fix this by simply changing log and exp to np.log and np.exp, respectively.
But doing so then makes scipy.optimize.newton unhappy.
It seems like I need to define the function twice, once for use in plotting (with np. ...) and once for optimizing in the form given above.
I can't imagine that this is actually the case. Any hints would be greatly appreciated.
Seems legit, you just need to use numpy functions instead of base math functions:
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt
%matplotlib inline
def f(x):
    return np.log(3*np.exp(3*x) + 7*np.exp(7*x))
x = np.linspace(-2,2,1000)
y = f(x)
plt.scatter(x, y)
optimize.root(f, 1)
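Since the question asks about the Newton method specifically, here is a minimal sketch showing that the same NumPy-based f can serve both purposes. The derivative fprime is something I worked out by hand for this example, and x0=1.0 is an arbitrary starting point.

```python
import numpy as np
from scipy import optimize

def f(x):
    # NumPy functions accept both arrays (for plotting) and scalars (for root finding)
    return np.log(3 * np.exp(3 * x) + 7 * np.exp(7 * x))

def fprime(x):
    # Hand-derived derivative of f, supplied so that newton uses Newton-Raphson steps
    return (9 * np.exp(3 * x) + 49 * np.exp(7 * x)) / (3 * np.exp(3 * x) + 7 * np.exp(7 * x))

x = np.linspace(-2, 2, 1000)
y = f(x)  # array in, array out

root = optimize.newton(f, x0=1.0, fprime=fprime)  # scalar in, scalar out
```

If fprime is left out, scipy.optimize.newton falls back to the secant method, which should also work for a function this smooth.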
I have an integral equation for calculating the key rate and need to implement it in Python.
The equation to calculate key rate is given by:
where R(n) is:
and p(n)dn is:
The key rate should be plotted like this:
I have successfully plotted the static model of the graph using the following code:
import numpy as np
import math
from math import pi,e,log
import matplotlib.pyplot as plt
n1=np.arange(10, 55, 1)
n=10**(-n1/10)
Y0=1*(10**-5)
nd=0.25
ed=0.03
nsys=nd*n
QBER=((1/2*Y0)+(ed*nsys))/(Y0+nsys)
H2=-QBER*np.log2(QBER)-(1-QBER)*np.log2(1-QBER)
Rsp=np.log10((Y0+nsys)*(1-(2*H2)))
print (Rsp)
plt.plot(n1,Rsp)
plt.xlabel('Loss (dB)')
plt.ylabel('log10(Rate)')
plt.show()
However, I failed to plot the R^ratewise model. This is my code:
import numpy as np
import matplotlib.pyplot as plt
def h2(x):
    return -x*np.log2(x)-(1-x)*np.log2(1-x)
e0=0.5
ed=0.03
Y0=1e-5
nd=0.25
nt=np.linspace(0.1,0.00001,1000)
y=np.zeros(np.size(nt))
Rate=np.zeros(np.size(nt))
eta_0=0.0015
for (i,eta) in enumerate(nt):
    nsys=eta*nd
    sigma=0.9
    y[i]=1/(eta*sigma*np.sqrt(2*np.pi))*np.exp(-(np.log(eta/eta_0)+(1/2*sigma*sigma))**2/(2*sigma*sigma))
    Rate[i]=(max(0.0,(Y0+nsys)*(1-2*h2(min(0.5,(e0*Y0+ed*nsys)/(Y0+nsys))))))*y[i]
plt.plot(nt,np.log10(Rate))
plt.xlabel('eta')
plt.ylabel('Rate')
plt.show()
I hope someone can help me code the key rate with the integration over p(n)dn as stated above. This is the paper for reference:
key rate
Thank you.
I copied & ran your second code block as-is, and it generated a plot. Is that what you wanted?
Using y as the p(n) in the equation and Rsp as the R(n), you should be able to use NumPy's trapz function to approximate the integral from the sampled p(n) and R(n). Note that Rsp in your first block is already log10 of the rate, so integrate the rate itself and take the logarithm afterwards:
n = np.linspace(0, 1, no_of_samples)
# ...generate y & Rsp from n...
R_rate = np.trapz(y * Rsp, n)
However, you'll have to change your code to sample y and Rsp using the same n, spanning from 0 to 1.
P.S. there's no need for the loop in your second code block; it can be condensed by removing the i's, swapping eta for nt, and using NumPy's minimum and maximum functions, like so:
nsys=nt*nd
sigma=0.9
y=1/(nt*sigma*np.sqrt(2*np.pi))*np.exp(-(np.log(nt/eta_0)+(1/2*sigma*sigma))**2/(2*sigma*sigma))
Rate=(np.maximum(0.0,(Y0+nsys)*(1-2*h2(np.minimum(0.5,(e0*Y0+ed*nsys)/(Y0+nsys))))))*y
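Putting the pieces together, a self-contained sketch of the integration might look like this. It reuses the parameter values and expressions from the question's second code block. The log-spaced grid and its bounds are my own assumption (a uniform grid on [0, 1] would need very many points to resolve the narrow peak of p near eta_0), and the formula images from the question are not available here, so check the integrand against the paper before relying on it.

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever is available
trapezoid = getattr(np, "trapezoid", None) or getattr(np, "trapz")

def h2(x):
    # Binary entropy function, as defined in the question
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

e0, ed, Y0, nd = 0.5, 0.03, 1e-5, 0.25
eta_0, sigma = 0.0015, 0.9

# Assumed grid: log-spaced transmittance values from 1e-7 up to 1
eta = np.logspace(-7, 0, 20001)
nsys = eta * nd

# p(eta): the log-normal density from the question
p = 1 / (eta * sigma * np.sqrt(2 * np.pi)) * np.exp(
    -(np.log(eta / eta_0) + 0.5 * sigma**2) ** 2 / (2 * sigma**2))

# R(eta): the rate itself (not its log10), clipped as in the question
qber = np.minimum(0.5, (e0 * Y0 + ed * nsys) / (Y0 + nsys))
R = np.maximum(0.0, (Y0 + nsys) * (1 - 2 * h2(qber)))

total_prob = trapezoid(p, eta)   # sanity check: should be close to 1
R_rate = trapezoid(R * p, eta)   # approximate integral of R(eta) p(eta) d(eta)
```

Checking that total_prob is close to 1 is a cheap way to confirm the grid is fine enough before trusting R_rate. To reproduce a curve like the one in the question, you would repeat this for each eta_0 corresponding to a loss value and plot log10(R_rate).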
I am running scipy.interpolate.griddata on a set of coordinates that could be of many dimensions (even 1). When the coordinates are 1D the nearest method produces nans instead of the closest values when outside boundaries. An example:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
target_points = [1.,2.,3.,4.,5.,6.,7.]
points = np.random.rand(50)*2*np.pi
values = np.sin(points)
interp = griddata(points, values, target_points, method='nearest')
plt.plot(points,values,'o')
plt.plot(target_points,interp,'ro')
print(interp)
plt.show()
The last value printed is a NaN. Am I doing something wrong? If this is a limitation of scipy do you have a smart workaround?
Note that linear/cubic modes are expected to give NaNs, but this should not be the case for the 'nearest' mode.
When the data is 1-dimensional, griddata defers to interpolate.interp1d:
if ndim == 1 and method in ('nearest', 'linear', 'cubic'):
    from .interpolate import interp1d
    points = points.ravel()
    ...
    ip = interp1d(points, values, kind=method, axis=0, bounds_error=False,
                  fill_value=fill_value)
    return ip(xi)
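So even with method='nearest', griddata will not extrapolate for 1-dimensional data, because interp1d is called with bounds_error=False and returns fill_value (NaN by default) outside the data range.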
However, there are other tools, such as scipy.cluster.vq (vector quantization), which you could use to find the nearest value. For example,
import numpy as np
import scipy.cluster.vq as vq
import matplotlib.pyplot as plt
target_points = np.array([1.,2.,3.,4.,5.,6.,7.])
points = (np.random.rand(50)*2*np.pi)
values = np.sin(points)
code, dist = vq.vq(target_points, points)
interp = values[code]
plt.plot(points,values,'o')
plt.plot(target_points,interp,'ro')
print(interp)
plt.show()
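If you would rather not pull in scipy.cluster for this, the same nearest-value lookup can be sketched in plain NumPy. The helper name nearest_1d is made up for this example. It builds a full distance matrix, so it is only sensible for modest array sizes.

```python
import numpy as np

def nearest_1d(points, values, targets):
    """Return, for each target, the value at the closest point (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values)
    targets = np.atleast_1d(np.asarray(targets, dtype=float))
    # |target_i - point_j| for every pair, then the index of the closest point per target
    idx = np.abs(targets[:, None] - points[None, :]).argmin(axis=1)
    return values[idx]

# Targets outside [0, 2] fall back to the closest end point instead of NaN
interp = nearest_1d([0.0, 1.0, 2.0], [10, 20, 30], [-5.0, 0.4, 1.6, 9.0])
print(interp)  # [10 10 30 30]
```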
This looks like a bug in scipy.interpolate.griddata, because the behaviour contradicts the documentation, which clearly states that the input argument "fill_value" has no effect when method is "nearest".
The output of the following line:
scipy.interpolate.griddata(points=np.array([1,2]), values=np.array([10,20]), xi=3, method='nearest', fill_value=-1)
is array(-1.0) which proves that the fill_value has an impact on the output contrary to what is stated in the documentation.