Using numpy vectorize - python

I'm trying to do some bayesian probit code using data augmentation. I can get it to work if I loop over the rows of the output matrix, but I'd like to vectorize it and do it all in one shot (presumably that's faster).
import numpy as np
from numpy import random
import statsmodels.api as sm
from scipy import stats
from scipy.stats import norm, truncnorm
##################################
### Create some simulated data ###
num_leg = 50
num_bills = 20
a = np.random.uniform(-1,1,num_bills).reshape(num_bills, 1)
b = np.random.uniform(-2,2,num_bills).reshape(num_bills, 1)
x = np.random.standard_normal(num_leg).reshape(num_leg, 1)
ystar_base = a + np.dot(b,x.T)
epsilon = np.random.standard_normal(num_leg * num_bills).reshape(num_bills, num_leg)
ystar = ystar_base + epsilon
y = 1*(ystar >0)
### Initialize some stuff I need ###
avec = [0]*num_bills # These are bill parameters
bvec = [0]*num_bills
betavec = [np.matrix(zip(avec,bvec))]
xvec = [0]*num_leg # these are legislator parameters
_ones = np.ones(num_leg)
def init_y(mat): # initialize a latent y matrix
if mat==1: return truncnorm.rvs(0,10000)
else: return truncnorm.rvs(-10000,0)
vectorize_y = np.vectorize(init_y)
latent_y = np.matrix(vectorize_y(y))
burn = 500 # How long to run the MCMC
runs = 500
### define the functions ###
def sample_params(xnow,ynow): # This is the function I'd like to vectorize
if type(xnow) == list:
xnow = np.array(xnow)
if type(ynow) == list:
ynow = np.array(ynow)
ynow = ynow.T #reshape(ynow.shape[0],1)
sigma = np.linalg.inv(np.dot(xnow.T,xnow)) ###This is the line that produces an error###
xy = np.dot(xnow.T,ynow)
mu = np.dot(sigma, xy) # this is just (x'x)inv x'y
return np.random.multivariate_normal(np.array(mu).flatten(), sigma)
vecparams = np.vectorize(sample_params)
def get_mu(xnow, bnow): # getting the updated mean to draw the latent ys
if type(xnow) == list:
xnow = np.array(xnow)
if type(bnow) == list:
bnow = np.array(bnow)
mu = np.dot(xnow,bnow.T)
mu = np.matrix(mu)
return mu
def sample_y(mu, ynow): # generate latent y matrix
if ynow==1:
a, b = (0 - mu),(10000-mu)
else:
a, b = (-10000 - mu),(0-mu)
return truncnorm.rvs(a,b)
vector_sample = np.vectorize(sample_y) # I'd like to be able to do something like this
### Here's the MCMC loop with the internal loop over rows(bills)
for i in range(burn+runs):
this_beta = []
this_x = []
this_y = []
for j in range(num_bills): #I'd like to get rid of this loop
ex = zip(x_ones, x)
newbeta = sample_params(ex, latent_y[j])
this_beta.append(newbeta)
#ex = np.array(zip(x_ones, x))
#this_beta = vecparams(ex, latent_y[:,]) # and call the vectorized function here
betavec.append(this_beta)
#Note, I can vectorize the latent outputs easily enough here
mean = get_mu(ex, betavec[-1])
latent_y = np.matrix(vector_sample(mean, np.matrix(y).T).T.reshape(latent_y.shape[0], latent_y.shape[1]))
### Now a bit of code to check to see if I've recovered what I want ###
test_beta = [zip(*(z)) for z in betavec[burn:]]
test_a = np.array([z[0] for z in test_beta])
test_b = np.array([z[1] for z in test_beta])
amean = test_a.sum(axis = 0)/float(runs)
bmean = test_b.sum(axis = 0)/float(runs)
print 'a mean'
print np.corrcoef([amean, np.array(a)])
print
print 'b mean'
print np.corrcoef([bmean, np.array(b)])
If I comment out the loop and use the commented out lines just above, I get the following error at the line I indicated earlier (the one that defines sigma):
LinAlgError: 0-dimensional array given. Array must be at least two-dimensional

Related

Does this python integration scheme match the analytic expression?

According to the original paper by Huang
https://arxiv.org/pdf/1401.4211.pdf
The marginal Hibert spectrum is given by:
where A = A(w,t) (i.e., a function time and frequency) and p(w,A)
the joint probability density function of P(ω, A) of the frequency [ωi] and amplitude [Ai].
I am trying to estimate 1) The joint probability density using the plt.hist2d 2) the integral shown below using a sum.
The code I am using is the following:
IA_flat1 = np.ravel(IA) ### Turn matrix to 1 D array
IF_flat1 = np.ravel(IF) ### Here IA corresponds to A
IF_flat = IF_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep only desired frequencies
IA_flat = IA_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep IA that correspond to desired frequencies
### return the Joint probability density
Pjoint,f_edges, A_edges,_ = plt.hist2d(IF_flat,IA_flat,bins=[bins_F,bins_A], density=True)
plt.close()
n1 = np.digitize(IA_flat, A_edges).astype(int) ### Return the indices of the bins to which
n2 = np.digitize(IF_flat, f_edges).astype(int) ### each value in input array belongs.
### define integration function
from numba import jit, prange ### Numba is added for speed
#jit(nopython=True, parallel= True)
def get_int(A_edges, Pjoint ,IA_flat, n1, n2):
dA = np.diff(A_edges)[0] ### Find dx for integration
sum_h = np.zeros(np.shape(Pjoint)[0]) ### Intitalize array
for j in prange(np.shape(Pjoint)[0]):
h = np.zeros(np.shape(Pjoint)[1]) ### Intitalize array
for k in prange(np.shape(Pjoint)[1]):
needed = IA_flat[(n1==k) & (n2==j)] ### Keep only the elements of arrat that
### are related to PJoint[j,k]
h[k] = Pjoint[j,k]*np.nanmean(needed**2)*dA ### Pjoint*A^2*dA
sum_h[j] = np.nansum(h) ### Sum_{i=0}^{N}(Pjoint*A^2*dA)
return sum_h
### Now run previously defined function
sum_h = get_int(A_edges, Pjoint ,IA_flat, n1, n2)
1) I am not sure that everything is correct though. Any suggestions or comments on what I might be doing wrong?
2) Is there a way to do the same using a scipy integration scheme?
You can extract the probability from the 2D histogram and use it for the integration:
# Added some numbers to have something to run
import numpy as np
import matplotlib.pyplot as plt
IA = np.random.rand(100,100)
IF = np.random.rand(100,100)
bins_F = np.linspace(0,1,20)
bins_A = np.linspace(0,1,100)
min_f = 0
fs = 1.0
IA_flat1 = np.ravel(IA) ### Turn matrix to 1 D array
IF_flat1 = np.ravel(IF) ### Here IA corresponds to A
IF_flat = IF_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep only desired frequencies
IA_flat = IA_flat1[(IF_flat1>min_f) & (IF_flat1<fs)] ### Keep IA that correspond to desired frequencies
### return the Joint probability density
Pjoint,f_edges, A_edges,_ = plt.hist2d(IF_flat,IA_flat,bins=[bins_F,bins_A], density=True)
f_values = (f_edges[1:]+f_edges[:-1])/2
A_values = (A_edges[1:]+A_edges[:-1])/2
dA = A_values[1]-A_values[0] # for the integral
#Pjoint.shape (19,99)
h = np.zeros(f_values.shape)
for i in range(len(f_values)):
f = f_values[i]
# column of the histogram with frequency f, probability
p = Pjoint[i]
# summatory equivalent to the integral
integral_result = np.sum(p*A_values**2*dA )
h[i] = integral_result
plt.figure()
plt.plot(f_values,h)

How to calculate Bayesian model selection with Python's pymcmcstat library

I am lost within the pymcmcstat documentation of Python. I managed to plot the parameter distributions etc, but when it comes to the Bayes factor, I need to calculate the integral over the parameter space of likelihood for each model.
I followed this video. Each model has a different model function with different parameters. According to this link, I am supposed to compare the model evidences for model selection. All I have in my hand is the chain results after burnin that returns the distribution for each parameters, chain for sum-of-squares error (SSE) and variances. How do I compare the models with mcmc chain results I have?
Where do I go from here?
Here is my code for one model; for each model, the test_modelfun is changed and the chain results are saved for further comparison of different models;
# Data related lines: input omega and output fm
x = (np.array([76.29395, 152.5879, 305.1758, 610.3516, 1220.703, 2441.406, 4882.813, 9765.625, 19531.25, 39062.5, 78125, 156250, 312500, 625000]))
y = np.array([155.6412886 -63.3826188j , 113.9114436 -79.90544719j, 64.97809441-77.65152741j, 26.87482243-57.38474656j, 7.44462341-34.02438426j, 2.32954856-16.17918216j, 2.30747953 -6.72487436j, 3.39658859 -2.72444011j, 4.0084345 -1.2029167j , 4.25877486 -0.70276446j, 4.11761329 -0.69591231j, 3.83339489 -0.65244854j, 3.47289164 -0.6079278j , 3.07027319 -0.14914359j])
#import mcmc library and add data to the library in the second line below
mcstat = MCMC()
mcstat.data.add_data_set(x,y)
##define transfer function model calculated with theta parameters
def test_modelfun(xdata, theta):
K, alpha_0, alpha_1, Tp_1, Tp_2, Tz_1 = 10**theta[0], 10**theta[1], 10**theta[2], 10**theta[3], 10**theta[4], 10**theta[5]
#####################
Pz_0 = (omega**(alpha_0))
Pz_1 = (np.sqrt(((Tp_1**2)*(omega**(2*alpha_1))) + (2*Tp_1*(omega**alpha_1)*cos(alpha_1*pi/2)) +1))
Pz_2 = (np.sqrt(((Tp_2**2)*(omega**(2*alpha_1))) + (2*Tp_2*(omega**alpha_1)*cos(alpha_1*pi/2)) +1))
Zz_1 = (np.sqrt(((Tz_1**2)*(omega**(2*alpha_1))) + (2*Tz_1*(omega**alpha_1)*cos(alpha_1*pi/2)) +1))
Pp_0 = np.array([(-1*pi*alpha_0)/2]*len(omega)).T#[0]
Pp_1 = np.array([math.atan((Tp_1*(omega[i]**alpha_1)*sin(pi*alpha_1/2))/(1+(Tp_1*(omega[i]**alpha_1)*cos(pi*alpha_1/2)))) for i in range(len(omega))])
Pp_2 = np.array([math.atan((Tp_2*(omega[i]**alpha_1)*sin(pi*alpha_1/2))/(1+(Tp_2*(omega[i]**alpha_1)*cos(pi*alpha_1/2)))) for i in range(len(omega))])
Zp_1 = np.array([math.atan((Tz_1*(omega[i]**alpha_1)*sin(pi*alpha_1/2))/(1+(Tz_1*(omega[i]**alpha_1)*cos(pi*alpha_1/2)))) for i in range(len(omega))])
#####################
Z_est = (K*Zz_1)/(Pz_0*Pz_1*Pz_2)
P_est = Zp_1 + Pp_0 - Pp_1 - Pp_2
#####################
R_est = np.real([cmath.rect(Z_est[i], P_est[i]) for i in range(len(omega))])#abs()#[:,0]
X_est = np.imag([cmath.rect(Z_est[i], P_est[i]) for i in range(len(omega))])#abs()#[:,0]
RX_est = (R_est + 1j*X_est)
return RX_est
def modelfun(xdata, theta):
ymodel = test_modelfun(xdata,theta)
Zest = 20*log10(np.abs(ymodel))
return Zest
##define sum of squares function for the error in evaluating the likelihood function L(Fobs(i)|q)
def test_ssfun(theta,data):
xdata = data.xdata[0]
ydata = data.ydata[0]
ymodel = test_modelfun(xdata,theta)
return (1/len(omega))*(sum((real(fm)- real(ymodel))**2 + (imag(fm)-imag(ymodel))**2))
#sumsquares = sum((ymodel[:,0]-ydata[:,0])**2)
##import mcmc library and add data to the library in the second line below
itr = 50.0e4
verb = 1
wbar = 1
mcstat = MCMC()
mcstat.data.add_data_set(x,y)
## add model parameters
mcstat.parameters.add_model_parameter(name='th_1',theta0=1, minimum=-2,maximum=3) #m_k, M_k = -2, 3
mcstat.parameters.add_model_parameter(name='th_2',theta0=-1, minimum=-4,maximum=0) #m_a0, M_a0 = -4, 0
mcstat.parameters.add_model_parameter(name='th_3',theta0=-1, minimum=-3,maximum=0) #m_a1, M_a1 = -3, 0
mcstat.parameters.add_model_parameter(name='th_4',theta0=-4, minimum=-9,maximum=0) #m_p1, M_p1 = -9, 0
mcstat.parameters.add_model_parameter(name='th_5',theta0=-4, minimum=-9,maximum=0) #m_p2, M_p2 = -9, 0
mcstat.parameters.add_model_parameter(name='th_6',theta0=-4, minimum=-9,maximum=0) #m_z1, M_z1 = -9, 0
## define simulation options: mh=metropolis-hastings, am=adaptive metropolis, dr=delayed rejection, dram=dr+am
mcstat.simulation_options.define_simulation_options(nsimu=int(itr), updatesigma=1, method='dr', adaptint=100, verbosity=verb, waitbar=wbar)
## define model settings
mcstat.model_settings.define_model_settings(sos_function=test_ssfun)
mcstat.run_simulation()
## extract results
results=mcstat.simulation_results.results
chain = results['chain']# chain for each parameter sampled during simulation. s2
s2chain = results['s2chain']# chain for error variances. if updatesigma=0 then s2chain is an empty list
sschain = results['sschain']# chain for sum-of-squares error calculated using each set of parameter values in the cahin
names = results['names']
burnin = int(itr/2)
## display chain statistics
mcstat.chainstats(chain[burnin:,:],results)
mcpl = mcstat.mcmcplot
figcp = mcpl.plot_chain_panel(chain, names, figsizeinches = (7,6))
axes = figcp.get_axes()
for ii, ax in enumerate(axes):
ch = chain[:, ii]
ax.plot([burnin, burnin], [ch.min(), ch.max()], 'r')
figpd = mcpl.plot_density_panel(chain[burnin:,:], names, figsizeinches=(7,6))
figpc = mcpl.plot_pairwise_correlation_panel(chain[burnin:,:], names, figsizeinches = (7,6))
mcstat.PI.setup_prediction_interval_calculation(results=results, data=mcstat.data, modelfunction=modelfun, burnin=burnin)
mcstat.PI.generate_prediction_intervals(calc_pred_int=True, waitbar=False)
fg, ax = mcstat.PI.plot_prediction_intervals(adddata=True, plot_pred_int=True, figsizeinches = (7,5), data_display=dict(color='k'))

Why is my code so much slower in function form?

I wrote a code a while ago that processes spectra using data from text files and performing calculations on them. I started with a code that just does everything line-by-line without any functions, and despite being long, it finishes running in 2.11 seconds (according to %%timeit). Below is that original code, labeled as such.
However, I wanted to put my code into functions instead, to allow for easier readability and usage with different models in the future. Even though I'm using all the same steps as I did before (but this time inside my functions), it is so much slower. This code is also below. Now, I have to wait for about 15-20 minutes to get the same outputs. Why is it so much slower, and is there any way I can make it significantly faster but still use functions?
Original Code:
import re
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate
filename = 'bpass_spectra.txt'
extinctionfile = 'ExtinctionLawPoints.txt' # from R_V = 4.0
pointslist = []
datalist = []
speclist = []
# Constants
Msun = 1.98892e30 # solar mass [kg]
h = 4.1357e-15 # Planck's constant [eV s]
c = float(3e8) # speed of light [m/s]
# Read spectra file
f = open(filename, 'r')
rawspectra = f.readlines()
met = re.findall('Z\s=\s(\d*\.\d+)', rawspectra[0])
del rawspectra[0]
for i in range(len(rawspectra)):
newlist = rawspectra[i].split(' ')
datalist.append(newlist)
# Read extinction curve data file
rawpoints = open(extinctionfile, 'r').readlines()
for i in range(len(rawpoints)):
newlst = re.split('(?!\S)\s(?=\S)|(?!\S)\s+(?=\S)', rawpoints[i])
pointslist.append(newlst)
pointslist = pointslist[3:]
lambdalist = [float(item[0]) for item in pointslist]
k_abslist = [float(item[4]) for item in pointslist]
xvallist = [(c*h)/(lamb*1e-6) for lamb in lambdalist]
k_interp = scipy.interpolate.interp1d(xvallist, k_abslist)
# Create new lists
Elist = [float(item[0]) for item in datalist]
speclambdalist = [h*c*1e9/E for E in Elist]
z1list = [float(item[1]) for item in datalist]
speclist.extend(z1list)
met = met[0]
klist = [None]*len(speclist)
Loutlist = [None]*len(speclist)
Tlist = [None]*len(speclist)
# Define parameters
b = 2.0
R = 1.0
z = 1.0
Mgas = 1.0 # mass of gas, input
Mhalo = 2e41 # mass of dark matter halo, known
if float(met) > 0.0052:
DGRlist = [50.0*np.exp(-2.21)*float(met)]*len(speclist)
elif float(met) <= 0.0052:
DGRlist = [((50.0*float(met))**3.15)*np.exp(-0.96)]*len(speclist)
for i in range(len(speclist)):
if float(Elist[i]) <= 4.1357e-3: # frequencies <= 10^12 Hz
klist[i] = 0.1*(float(Elist[i])/(1000.0*h))**b # extinction law [cm^2/g]
elif float(Elist[i]) > 4.1357e-3: # frequencies > 10^12 Hz
klist[i] = k_interp(Elist[i]) # interpolated function's value at Elist[i]
Mdustlist = [Mgas*DGR for DGR in DGRlist] # dust mass
Rhalo = 0.784*(0.27**2.0)*(0.7**(-2.0/3.0))*float(10.0/(1.0+z))*((Mhalo/(1e8*Msun))**(1.0/3.0))
Rdust = 0.018*Rhalo # [kpc]
for i in range(len(speclist)):
Tlist[i] = 3*Mdustlist[i]*klist[i]/(4*np.pi*Rdust)
Linlist = [float(spectra)*R for spectra in speclist]
# Outgoing luminosity as function of wavelength
for i in range(len(Linlist)):
Loutlist[i] = Linlist[i]*np.exp(-Tlist[i])
# Test the calculation
print "LIN ELEMENTS 0 AND 1000:", Linlist[0], Linlist[1000]
print "LOUT ELEMENTS 0 AND 1000:", Loutlist[0], Loutlist[1000]
New "function-ized" Code (much slower):
import re
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate
# Required files and lists
filename = 'bpass_spectra.txt' # number of columns = 4
extinctionfile = 'ExtinctionLawPoints.txt' # R_V = 4.0
datalist = []
if filename == 'bpass_spectra.txt':
filetype = 4
else:
filetype = 1
if extinctionfile == 'ExtinctionLawPoints.txt':
R_V = 4.0
else:
R_V = 1.0 #to be determined
# Constants
M_sun = 1.98892e30 # solar mass [kg]
h = 4.1357e-15 # Planck's constant [eV s]
c = float(3e8) # speed of light [m/s]
# Inputs
beta = 2.0
R = 1.0
z = 1.0
M_gas = 1.0
M_halo = 2e41
# Read spectra file
f = open(filename, 'r')
rawlines = f.readlines()
met = re.findall('Z\s=\s(\d*\.\d+)', rawlines[0])
del rawlines[0]
for i in range(len(rawlines)):
newlist = rawlines[i].split(' ')
datalist.append(newlist)
# Read extinction curve data file
rawpoints = open(extinctionfile, 'r').readlines()
def interpolate(R_V, rawpoints, Elist, j):
pointslist = []
if R_V == 4.0:
for i in range(len(rawpoints)):
newlst = re.split('(?!\S)\s(?=\S)|(?!\S)\s+(?=\S)', rawpoints[i])
pointslist.append(newlst)
pointslist = pointslist[3:]
lambdalist = [float(item[0]) for item in pointslist]
k_abslist = [float(item[4]) for item in pointslist]
xvallist = [(c*h)/(lamb*1e-6) for lamb in lambdalist]
k_interp = scipy.interpolate.interp1d(xvallist, k_abslist)
return k_interp(Elist[j])
# Dust extinction function
def dust(interpolate, filetype, datalist, beta, R, z, M_gas, M_halo, met):
speclist = []
if filetype == 4:
metallicity = float(met[0])
Elist = [float(item[0]) for item in datalist]
speclambdalist = [h*c*1e9/E for E in Elist]
met1list = [float(item[1]) for item in datalist]
speclist.extend(met1list)
klist, Tlist = [None]*len(speclist), [None]*len(speclist)
if metallicity > 0.0052:
DGRlist = [50.0*np.exp(-2.21)*metallicity]*len(speclist) # dust to gas ratio
elif metallicity <= 0.0052:
DGRlist = [((50.0*metallicity)**3.15)*np.exp(-0.96)]*len(speclist)
for i in range(len(speclist)):
if Elist[i] <= 4.1357e-3: # frequencies <= 10^12 Hz
klist[i] = 0.1*(float(Elist[i])/(1000.0*h))**beta # extinction law [cm^2/g]
elif Elist[i] > 4.1357e-3: # frequencies > 10^12 Hz
klist[i] = interpolate(R_V, rawpoints, Elist, i) # interpolated function's value at Elist[i]
Mdustlist = [M_gas*DGR for DGR in DGRlist] # dust mass
R_halo = 0.784*(0.27**2.0)*(0.7**(-2.0/3.0))*float(10/(1+z))*((M_halo/(1e8*M_sun))**(1.0/3.0))
R_dust = 0.018*R_halo # [kpc]
# Optical depth calculation
Tlist = [3*Mdustlist[i]*klist[i]/(4*np.pi*R_dust) for i in range(len(speclist))]
# Ingoing and outgoing luminosities as functions of wavelength
Linlist = [float(spectra)*R for spectra in speclist]
Loutlist = [Linlist[i]*np.exp(-Tlist[i]) for i in range(len(speclist))]
return speclambdalist, Linlist, Loutlist
print dust(interpolate, filetype, datalist, beta, R, z, M_gas, M_halo, met)
Even when I only have the function return Loutlist instead of the tuple of 3 lists, it's still extremely slow. Any ideas on why this is? Also, I'm going to want to return the tuple and then plot speclambdalist versus Linlist, and also plot speclambdalist versus Loutlist on the same plot. But I'm under the impression that each time I call dust(interpolate, filetype, datalist, beta, R, z, M_gas, M_halo, met)[i] where i = 0, 1, or 2 (I'll be doing this multiple times), it'll have to run the function again each time. Is there any way to bypass these extra runs to further increase speed? Thank you!

Python's fsolve not working

I'm currently trying to find the intercept of 2 equations from my code (pasted below). I'm using fsolve and have used it successfully in one part but I can't get it to work for the second.
Confusingly it's not showing up an error, if you paste this code into your notebook and run it you'll see 2 grphs, on the first graph there's a line at an angle which should be stopping at the eqm line.
The section which wont work is def q_eqm(x_q). Thank you for your help
import numpy as np
import scipy.optimize as opt
import matplotlib.pyplot as plt
AC_LK = np.array([4.02232,1206.53,220.291])
AC_HK = np.array([4.0854,1348.77,219.976])
P_Tot = 1 # Bara
N_Size = 11 # 1001 = 0.1% accuracy for xA
xf = 0.7
q = 0.7
xA = np.linspace(0,1,N_Size)
yA = np.linspace(0.00,0.00,N_Size)
T = np.linspace(0.00,0.00,N_Size)
x = np.array([xA[0:N_Size],yA[0:N_Size],T[0:N_Size]]) # x[xA,yA,T]
F = np.empty((1))
def xA_T(N):
xA_Ant = x[0,N]
def P_Ant(T):
PA = pow(10,AC_LK[0]-(AC_LK[1]/(T+AC_LK[2])))*xA_Ant
PB = pow(10,AC_HK[0]-(AC_HK[1]/(T+AC_HK[2])))*(1-xA_Ant)
F[0] = P_Tot - (PA + PB)
return F
return x
TGuess = [100]
T = opt.fsolve(P_Ant,TGuess)
x[2,N] = T
return x
for N in range(0,len(xA)):
xA_T(N)
x[1,N] = pow(10,AC_LK[0]-(AC_LK[1]/(x[2,N]+AC_LK[2])))*x[0,N]/P_Tot
q_int = ((-q*0)/(1-q)) + (xf/(1-q))
Eqm_Poly = np.polyfit(x[0,0:N_Size], x[1,0:N_Size], 6)
q_Poly = np.polyfit([xf,0], [xf,q_int], 1)
F = np.empty((1))
def q_Eqm(x_q):
y_q = q_Poly[0]*x_q + q_Poly[1]
eqm_y = (Eqm_Poly[0]*pow(x_q,6)+Eqm_Poly[1]*pow(x_q,5)+Eqm_Poly[2]*pow(x_q,4)+Eqm_Poly[3]*pow(x_q,3)+Eqm_Poly[4]*pow(x_q,2)+Eqm_Poly[5]*pow(x_q,1)+Eqm_Poly[6]*pow(x_q,0))
F[0] = y_q - eqm_y
return F
x_qGuess = [0]
x_q = opt.fsolve(q_Eqm,x_qGuess)
print(x,Eqm_Poly,x_q,q_int)
plt.plot(x[0,0:N_Size],x[1,0:N_Size],'k-',linewidth=1)
plt.plot([xf,xf],[0,xf],'b-',linewidth=1)
plt.plot([xf,x_q],[xf,(q_Poly[0]*x_q + q_Poly[1])],'r-',linewidth=1)
plt.legend(['Eqm','Feed'])
plt.xlabel('xA')
plt.ylabel('yA')
plt.xlim([0.00, 1])
plt.ylim([0.00, 1])
plt.savefig('x.png')
plt.savefig('x.eps')
plt.show()
plt.plot(x[0,0:N_Size],x[2,0:N_Size],'r--',linewidth=3)
plt.plot(x[1,0:N_Size],x[2,0:N_Size],'b--',linewidth=3)
plt.legend(['xA','yA'])
plt.xlabel('Mol Frac')
plt.ylabel('Temp degC')
plt.xlim([0, 1])
plt.savefig('Txy.png')
plt.savefig('Txy.eps')
plt.show()
The answer turns out to be relatively simple:
#F = np.empty((1)) # remove this
def q_Eqm(x_q):
y_q = q_Poly[0]*x_q + q_Poly[1]
eqm_y = (Eqm_Poly[0]*pow(x_q,6)+Eqm_Poly[1]*pow(x_q,5)+Eqm_Poly[2]*pow(x_q,4)+Eqm_Poly[3]*pow(x_q,3)+Eqm_Poly[4]*pow(x_q,2)+Eqm_Poly[5]*pow(x_q,1)+Eqm_Poly[6]*pow(x_q,0))
return y_q - eqm_y
The original code defines a global F, which is modified in the function and then returned. So in each iteration the function returns different values but they are the same object. This seems to confuse fsolve (I guess it internally stores references to the results rather than values). Removing this F and simply returning the result of the subtraction resolves the problem.

Is there any numpy autocorrellation function with standardized output?

I followed the advice of defining the autocorrelation function in another post:
def autocorr(x):
result = np.correlate(x, x, mode = 'full')
maxcorr = np.argmax(result)
#print 'maximum = ', result[maxcorr]
result = result / result[maxcorr] # <=== normalization
return result[result.size/2:]
however the maximum value was not "1.0". therefore I introduced the line tagged with "<=== normalization"
I tried the function with the dataset of "Time series analysis" (Box - Jenkins) chapter 2. I expected to get a result like fig. 2.7 in that book. However I got the following:
anybody has an explanation for this strange not expected behaviour of autocorrelation?
Addition (2012-09-07):
I got into Python - programming and did the following:
from ClimateUtilities import *
import phys
#
# the above imports are from R.T.Pierrehumbert's book "principles of planetary
# climate"
# and the homepage of that book at "cambridge University press" ... they mostly
# define the
# class "Curve()" used in the below section which is not necessary in order to solve
# my
# numpy-problem ... :)
#
import numpy as np;
import scipy.spatial.distance;
# functions to be defined ... :
#
#
def autocorr(x):
result = np.correlate(x, x, mode = 'full')
maxcorr = np.argmax(result)
# print 'maximum = ', result[maxcorr]
result = result / result[maxcorr]
#
return result[result.size/2:]
##
# second try ... "Box and Jenkins" chapter 2.1 Autocorrelation Properties
# of stationary models
##
# from table 2.1 I get:
s1 = np.array([47,64,23,71,38,64,55,41,59,48,71,35,57,40,58,44,\
80,55,37,74,51,57,50,60,45,57,50,45,25,59,50,71,56,74,50,58,45,\
54,36,54,48,55,45,57,50,62,44,64,43,52,38,59,\
55,41,53,49,34,35,54,45,68,38,50,\
60,39,59,40,57,54,23],dtype=float);
# alternatively in order to test:
s2 = np.array([47,64,23,71,38,64,55,41,59,48,71])
##################################################################################3
# according to BJ, ch.2
###################################################################################3
print '*************************************************'
global s1short, meanshort, stdShort, s1dev, s1shX, s1shXk
s1short = s1
#s1short = s2 # for testing take s2
meanshort = s1short.mean()
stdShort = s1short.std()
s1dev = s1short - meanshort
#print 's1short = \n', s1short, '\nmeanshort = ', meanshort, '\ns1deviation = \n',\
# s1dev, \
# '\nstdShort = ', stdShort
s1sh_len = s1short.size
s1shX = np.arange(1,s1sh_len + 1)
#print 'Len = ', s1sh_len, '\nx-value = ', s1shX
##########################################################
# c0 to be computed ...
##########################################################
sumY = 0
kk = 1
for ii in s1shX:
#print 'ii-1 = ',ii-1,
if ii > s1sh_len:
break
sumY += s1dev[ii-1]*s1dev[ii-1]
#print 'sumY = ',sumY, 's1dev**2 = ', s1dev[ii-1]*s1dev[ii-1]
c0 = sumY / s1sh_len
print 'c0 = ', c0
##########################################################
# now compute autocorrelation
##########################################################
auCorr = []
s1shXk = s1shX
lenS1 = s1sh_len
nn = 1 # factor by which lenS1 should be divided in order
# to reduce computation length ... 1, 2, 3, 4
# should not exceed 4
#print 's1shX = ',s1shX
for kk in s1shXk:
sumY = 0
for ii in s1shX:
#print 'ii-1 = ',ii-1, ' kk = ', kk, 'kk+ii-1 = ', kk+ii-1
if ii >= s1sh_len or ii + kk - 1>=s1sh_len/nn:
break
sumY += s1dev[ii-1]*s1dev[ii+kk-1]
#print sumY, s1dev[ii-1], '*', s1dev[ii+kk-1]
auCorrElement = sumY / s1sh_len
auCorrElement = auCorrElement / c0
#print 'sum = ', sumY, ' element = ', auCorrElement
auCorr.append(auCorrElement)
#print '', auCorr
#
#manipulate s1shX
#
s1shX = s1shXk[:lenS1-kk]
#print 's1shX = ',s1shX
#print 'AutoCorr = \n', auCorr
#########################################################
#
# first 15 of above Values are consistent with
# Box-Jenkins "Time Series Analysis", p.34 Table 2.2
#
#########################################################
s1sh_sdt = s1dev.std() # Standardabweichung short
#print '\ns1sh_std = ', s1sh_sdt
print '#########################################'
# "Curve()" is a class from RTP ClimateUtilities.py
c2 = Curve()
s1shXfloat = np.ndarray(shape=(1,lenS1),dtype=float)
s1shXfloat = s1shXk # to make floating point from integer
# might be not necessary
#print 'test plotting ... ', s1shXk, s1shXfloat
c2.addCurve(s1shXfloat)
c2.addCurve(auCorr, '', 'Autocorr')
c2.PlotTitle = 'Autokorrelation'
w2 = plot(c2)
##########################################################
#
# now try function "autocorr(arr)" and plot it
#
##########################################################
auCorr = autocorr(s1short)
c3 = Curve()
c3.addCurve( s1shXfloat )
c3.addCurve( auCorr, '', 'Autocorr' )
c3.PlotTitle = 'Autocorr with "autocorr"'
w3 = plot(c3)
#
# well that should it be!
#
So your problem with your initial attempt is that you did not subtract the average from your signal. The following code should work:
timeseries = (your data here)
mean = np.mean(timeseries)
timeseries -= np.mean(timeseries)
autocorr_f = np.correlate(timeseries, timeseries, mode='full')
temp = autocorr_f[autocorr_f.size/2:]/autocorr_f[autocorr_f.size/2]
iact.append(sum(autocorr_f[autocorr_f.size/2:]/autocorr_f[autocorr_f.size/2]))
In my example temp is the variable you are interested in; it is the forward integrated autocorrelation function. If you want the integrated autocorrelation time you are interested in iact.
I'm not sure what the issue is.
The autocorrelation of a vector x has to be 1 at lag 0 since that is just the squared L2 norm divided by itself, i.e., dot(x, x) / dot(x, x) == 1.
In general, for any lags i, j in Z, where i != j the unit-scaled autocorrelation is dot(shift(x, i), shift(x, j)) / dot(x, x) where shift(y, n) is a function that shifts the vector y by n time points and Z is the set of integers since we're talking about the implementation (in theory the lags can be in the set of real numbers).
I get 1.0 as the max with the following code (start on the command line as $ ipython --pylab), as expected:
In[1]: n = 1000
In[2]: x = randn(n)
In[3]: xc = correlate(x, x, mode='full')
In[4]: xc /= xc[xc.argmax()]
In[5]: xchalf = xc[xc.size / 2:]
In[6]: xchalf_max = xchalf.max()
In[7]: print xchalf_max
Out[1]: 1.0
The only time when the lag 0 autocorrelation is not equal to 1 is when x is the zero signal (all zeros).
The answer to your question is: no, there is no NumPy function that automatically performs standardization for you.
Besides, even if it did you would still have to check it against your expected output, and if you're able to say "Yes this performed the standardization correctly", then I would assume that you know how to implement it yourself.
I'm going to suggest that it might be the case that you've implemented their algorithm incorrectly, although I can't be sure since I'm not familiar with it.

Categories

Resources