I am trying to fit an ARMA model to time series data. I haven't found any function that can choose the parameters automatically. Below is the code I wrote; as I am a beginner in Python, I believe it can be optimised.
Can someone give me some ideas on how to:
vectorize the double loop
choose the parameters more quickly
Much appreciated.
parameter_bound = 3
# Creating a 2-D array, storing the residuals of two different parameters of ARMA model
residuals = [[0 for x in range(parameter_bound)] for x in range(parameter_bound)]
model = [[0 for x in range(parameter_bound)] for x in range(parameter_bound)]
# Calculate residuals for each parameter combinations
for i in range(parameter_bound):
    for j in range(parameter_bound):
        model[i][j] = sm.tsa.ARMA(input_data, (i,j)).fit()
        residuals[i][j] = sum(abs(model[i][j].resid))
# Find the parameters with lowest residuals
parameters = np.argmin(residuals)
parameter1 = parameters // parameter_bound  # row index (integer division)
parameter2 = parameters % parameter_bound  # column index
# Use the model with lowest residuals to get prediction data
prediction = model[parameter1][parameter2].resid + input_data
I'm not sure exactly what you're expecting, but you could replace your lists with numpy arrays (I don't think it'll improve your specific code):
import numpy as np
residuals = np.zeros((parameter_bound, parameter_bound))
model = np.zeros((parameter_bound, parameter_bound), dtype=object)
Also, be aware that np.argmin with axis=None returns an index into the flattened array. If you want the model with the lowest residuals, you might try:
prediction = model.ravel()[np.argmin(residuals)].resid + input_data
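If the (p, q) orders themselves are needed rather than just the fitted model, the flat index can be converted back to a row and column. A minimal sketch with made-up residual sums (the numbers are illustrative, not from any real fit):

```python
import numpy as np

# Hypothetical residual sums for a 3x3 grid of (p, q) orders.
residuals = np.array([
    [9.0, 7.5, 8.1],
    [6.2, 4.3, 5.0],
    [6.9, 4.8, 5.5],
])

flat_index = np.argmin(residuals)  # index into the flattened array
p, q = np.unravel_index(flat_index, residuals.shape)  # back to (row, column)
print(flat_index, p, q)
```

np.unravel_index avoids the manual division and remainder arithmetic and works for any array shape.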
You can use the Ljung-Box test:
__, pvalue = sm.diagnostic.acorr_ljungbox(model[i][j].resid)
# if the p-value is above the 0.05 significance level, H0 (no autocorrelation in the residuals) is not rejected
if pvalue > 0.05:
    use_parameters = ...
I'm trying to fit an asymmetric gaussian to my data. My data is just a numpy array called wave (x) and a numpy array called spec (y) that looks like an asymmetric gaussian.
This is the image of the data with an asymmetric gaussian fitted with curve_fit (this has a continuum too, but that is not important right now).
This is the function:
def agauss(amp, cen, b_sigma, r_sigma, x):
    y = np.zeros(len(x))
    for i in range(len(x)):
        if x[i] < cen:
            y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*b_sigma**2))
        else:
            y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*r_sigma**2))
    return y
I'm using this code to fit the parameters:
with pm.Model() as asym:
    cen = pm.Uniform('cen', lower=5173, upper=5179)
    bsigma = pm.HalfCauchy('bsigma', beta=3)
    rsigma = pm.HalfCauchy('rsigma', beta=3)
    amp = pm.Uniform('amp', lower=1e-19, upper=1e-16)
    err = pm.HalfCauchy('err', beta=0.0000001)
    ag_pred = pm.Normal('ag_pred', mu=agauss(amp, cen, bsigma, rsigma, wave), sigma=err, observed=spec)
    agdata = pm.sample(3000, cores=2)
But I get the error "Variables do not support boolean operations" in the theano.tensor module. How should I define the function in order to fit the parameters? Is there a better way to do this? Thanks!!
144 err = pm.HalfCauchy('err', beta=0.0000001)
145
--> 146 ag_pred = pm.Normal('ag_pred', mu=agauss(amp, cen, bsigma, rsigma, wave), sigma=err, observed=spec)
147
148 agdata = pm.sample(3000, cores=2)
~/Documents/OIII_emitters/m2fs_reduction/test/assets/scripts/analysis.py in agauss(amp, cen, b_sigma, r_sigma, x)
118 y = np.zeros(len(x))
119 for i in range(len(x)):
--> 120 if x[i] < cen:
121 y[i] = amp*np.exp(-((x[i] - cen)**2)/(2*b_sigma**2))
122 else:
~/anaconda3/envs/data_science/lib/python3.7/site-packages/theano/tensor/var.py in __bool__(self)
92 else:
93 raise TypeError(
---> 94 "Variables do not support boolean operations."
95 )
96
TypeError: Variables do not support boolean operations.
Here is an approach that follows the switchpoint trick of pymc3. It's basically a non-linear extension of this SO question. Below I show the code and how it works with some mock data that I created. I don't expect the mock data to be too similar to your actual data, but it should give you a starting point. This approach does have some modelling limitations (i.e. fitting two gaussians), but again they could go away with more realistic input data.
Firstly I create some mock input data. Note that two gaussians attached to each other will have different normalisations and will also approach the continuum of your spectrum somewhat differently. I didn't attempt to include the latter subtlety in the mock data, but I kept the normalisation consistent. Another complication is that the amplitudes of the two gaussians differ. This probably won't be a problem with real data, but here I simply add a constant to match them. From these mock data we would like to recover the parameters cen_mock, sigma_mock_1 and sigma_mock_2. Finally, note that I kept the same frequency limits as in your question.
import numpy as np
import pymc3 as pm
import theano.tensor as tt
import matplotlib.pyplot as plt
def gaussian_pdf(x,sigma,cen):
    g1 = (1/(sigma*np.sqrt(2*np.pi)))*np.exp(
        -0.5*np.square(x-cen)/np.square(sigma))
    return g1
# create mock data
wavelength_min = 5173
wavelength_max = 5179
cen_mock = 5175
x1 = np.arange(5173,cen_mock,0.01)
x2 = np.arange(cen_mock,5179,0.01)
x = np.arange(wavelength_min,wavelength_max,0.01)
sigma_mock_1 = 1.4
g1 = gaussian_pdf(x[x<=cen_mock],sigma_mock_1,cen_mock)
sigma_mock_2 = 1.9
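g2 = gaussian_pdf(x[x>cen_mock],sigma_mock_2,cen_mock)
# Assumption: y_mock is used by the model but never defined in this post.
# A plausible definition joins both halves, adding a constant so the peaks match.
y_mock = np.concatenate((g1, g2 + (g1.max() - g2.max())))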
Once we have some mock data we build the model in pymc3 and sample:
# construct model
with pm.Model() as asym:
    switchpoint = pm.DiscreteUniform("switchpoint",lower=x.min(),upper=x.max())
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    sigma_1 = pm.HalfNormal('sigma_1', sd=10)
    sigma_2 = pm.HalfNormal('sigma_2', sd=10)
    sd = pm.math.switch(switchpoint > x, sigma_1, sigma_2)
    likelihood = pm.Normal(
        'y', mu=(1/(sd*np.sqrt(2*np.pi)))*tt.exp(
            -0.5*tt.square(x-switchpoint)/tt.square(sd)),
        sd=sigma, observed=y_mock)
with asym:
    step1 = pm.NUTS([sigma_1,sigma_2])
    step2 = pm.Metropolis([switchpoint])
    trace = pm.sample(4000, step=[step1,step2],cores=4,tune=4000,chains=4)
The parameters of the model are the switchpoint, which marks where the gaussians change, and the two standard deviations. The sampler therefore estimates the "left" or "right" standard deviation depending on the value of x.
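The original error comes from using a Python if on a symbolic variable; the switch selects a value per element instead. The same idea can be illustrated in plain NumPy with np.where. This is only a sketch: the function name agauss_vectorized and the grid values are made up for illustration, and in a pymc3 model pm.math.switch and tt.exp would take the place of np.where and np.exp.

```python
import numpy as np

def agauss_vectorized(x, amp, cen, b_sigma, r_sigma):
    """Asymmetric gaussian: b_sigma left of cen, r_sigma from cen onward."""
    x = np.asarray(x, dtype=float)
    # Choose the width element by element; no Python-level `if` on array values.
    sigma = np.where(x < cen, b_sigma, r_sigma)
    return amp * np.exp(-((x - cen) ** 2) / (2 * sigma ** 2))

wave = np.linspace(5173, 5179, 7)  # illustrative grid: 5173, 5174, ..., 5179
spec = agauss_vectorized(wave, amp=1.0, cen=5175.0, b_sigma=1.4, r_sigma=1.9)
print(spec)
```

Because the branch is expressed as array selection, the loop over x disappears as well.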
We now check the results:
df = pm.summary(trace)
x1_m = x[x<=df.loc['switchpoint']['mean']]
x2_m = x[x>df.loc['switchpoint']['mean']]
g1_m = gaussian_pdf(x1_m,df.loc['sigma_1']['mean'],df.loc['switchpoint']['mean'])
g2_m = gaussian_pdf(x2_m,df.loc['sigma_2']['mean'],df.loc['switchpoint']['mean'])
amp_m = g1_m.max() - g2_m.max()
plt.plot(x[x<=cen_mock],g1,label='left gaussian mock')
plt.plot(x[x>cen_mock],g2,label='right gaussian mock')
plt.plot(x1_m,g1_m,label='left gaussian model')
plt.plot(x2_m,g2_m+amp_m,label='right gaussian model')
plt.text(5177,0.23,'sd_mock1='+str(sigma_mock_1))
plt.text(5177,0.2,'sd_mock2='+str(sigma_mock_2))
plt.legend(frameon=False)
There are of course much better ways of checking the sampling results (which are actually bayesian), but this is a quick and dirty way to get a feel for what is going on. Note that after the modelling I still need to add a constant to one of the gaussians to join them naturally. For three different pairs of standard deviations I get the following plots:
Now for some discussion. To evaluate how good this approach is, we need a better idea of the real data. As you can see, if the standard deviations differ significantly the model starts failing. However, for somewhat similar standard deviations, and in the edge case of equal standard deviations, the model is somewhat acceptable. Another important input is whether the number of frequency datapoints is equal for the two gaussians. This will determine how each gaussian approaches the continuum. It will also determine whether there is any need to arbitrarily add a constant to the second gaussian in order to join them more naturally when fitting real data.
To sum up, this approach closely follows your desired parametrisation, but needs some extra work in evaluation using mock data. From your question it seems to me that the switchpoint approach is required, but only when it is applied to real data (or more realistic mock data) can one know with some certainty whether it is sufficient.
I'm trying to use a conditional asymmetric loss function with a regression model and am having issues. I want to penalize wrong way results but the direction flips depending on the sign of the variable.
import numpy as np
def CustomLoss(predict,true):
    ix = np.logical_and((predict*true)>0,np.abs(true)>=np.abs(predict))
    n = ((predict - true)**2)*2
    y = (predict-true)**2
    out = np.where(ix,y,n)
    return out
# CustomLoss(1,3) = 4
# CustomLoss(1,-1) = 8 ## Bigger loss for wrong way result
# CustomLoss(-2,-4) = 4
# CustomLoss(-2, 0) = 8 ## Bigger loss for wrong way result
I tried using scipy.optimize; it converges for some data but not for others. The function is still convex, so I'd think this should always converge.
I've typically used CVXPY but can't figure out how to implement the conditional part of the cost function.
I am having trouble understanding the output of my function to implement multiple-ridge regression. I am doing this from scratch in Python for the closed form of the method. This closed form is shown below:
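The formula is not reproduced in the text. Judging from the comments in the code that follows, it is presumably the standard ridge solution, where \lambda is the regularisation strength and I the identity matrix:

```latex
\hat{w} = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y
```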
I have a training set X that is 100 rows x 10 columns and a vector y that is 100x1.
My attempt is as follows:
def ridgeRegression(xMatrix, yVector, lambdaRange):
    wList = []
    for i in range(1, lambdaRange+1):
        lambVal = i
        # compute the inner values (X.T X + lambda I)
        xTranspose = np.transpose(xMatrix)
        xTx = xTranspose @ xMatrix
        lamb_I = lambVal * np.eye(xTx.shape[0])
        # invert inner, e.g. (inner)**(-1)
        inner_matInv = np.linalg.inv(xTx + lamb_I)
        # compute outer (X.T y)
        outer_xTy = np.dot(xTranspose, yVector)
        # multiply together
        w = inner_matInv @ outer_xTy
        wList.append(w)
    print(wList)
For testing, I am running it with the first 5 lambda values.
wList becomes 5 numpy.arrays each of length 10 (I'm assuming for the 10 coefficients).
Here is the first of those 5 arrays:
array([ 0.29686755, 1.48420319, 0.36388528, 0.70324668, -0.51604451,
2.39045735, 1.45295857, 2.21437745, 0.98222546, 0.86124358])
My question, and clarification:
Shouldn't there be 11 coefficients, (1 for the y-intercept + 10 slopes)?
How do I get the Minimum Square Error from this computation?
What comes next if I wanted to plot this line?
I think I am just really confused as to what I'm looking at, since I'm still working on my linear-algebra.
Thanks!
First, I would modify your ridge regression to look like the following:
import numpy as np
def ridgeRegression(X, y, lambdaRange):
    wList = []
    # Get normal form of `X`
    A = X.T @ X
    # Get Identity matrix
    I = np.eye(A.shape[0])
    # Get right hand side
    c = X.T @ y
    for lambVal in range(1, lambdaRange+1):
        # Set up equations Bw = c
        lamb_I = lambVal * I
        B = A + lamb_I
        # Solve for w
        w = np.linalg.solve(B,c)
        wList.append(w)
    return wList
Notice that I replaced your inv call to compute the matrix inverse with an implicit solve. This is much more numerically stable, which is an important consideration for these types of problems especially.
I've also taken the A=X.T@X computation, identity matrix I generation, and right hand side vector c=X.T@y computation out of the loop--these don't change within the loop and are relatively expensive to compute.
As was pointed out by @qwr, the number of columns of X will determine the number of coefficients you have. You have not described your model, so it's not clear how the underlying domain, x, is structured into X.
Traditionally, one might use polynomial regression, in which case X is the Vandermonde Matrix. In that case, the first coefficient would be associated with the y-intercept. However, based on the context of your question, you seem to be interested in multivariate linear regression. In any case, the model needs to be clearly defined. Once it is, then the returned weights may be used to further analyze your data.
Typically, to make the notation more compact, the matrix X contains a column of ones for the intercept, so if you have p predictors, the matrix has dimensions n by p+1. See the Wikipedia article on linear regression for an example.
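As a rough illustration, the column of ones can be prepended with NumPy. The data here is randomly generated purely to show the shapes, and X_design is a name chosen for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # illustrative data: n=100 rows, p=10 predictors

# Prepend a column of ones so that the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(X.shape[0]), X])
print(X_design.shape)  # (100, 11)
```

Fitting on X_design instead of X yields 11 coefficients. Note that with a plain lambda * I penalty the intercept is shrunk along with the slopes, which is often not what is wanted.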
To compute in-sample MSE, use the definition for MSE: the average of squared residuals. To compute generalization error, you need cross-validation.
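A minimal sketch of the in-sample MSE for a given weight vector, using small made-up numbers that can be checked by hand (the helper name in_sample_mse is hypothetical):

```python
import numpy as np

def in_sample_mse(X, y, w):
    """Mean of the squared residuals of the linear model X @ w on (X, y)."""
    residuals = y - X @ w
    return float(np.mean(residuals ** 2))

# Tiny hand-checkable example: predictions are [1, 2, 3], targets are [1, 2, 4].
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 4.0])
w = np.array([1.0, 2.0])
print(in_sample_mse(X, y, w))  # one residual of 1 over three points: 1/3
```

The same function could be applied to each entry of wList to compare the lambda values on the training data.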
Also, lambVal does not have to be an integer. It can be small (close to 0) if the aim is just to avoid numerical error when xTx is ill-conditioned.
I would advise you to use a logarithmic range instead of a linear one, starting from 0.001 and going up to 100 or more if you want. For instance, you can change your code to this:
powerMin = -3
powerMax = 3
for i in range(powerMin, powerMax):
    lambVal = 10**i
    print(lambVal)
Then you can try a smaller range, or a linear range, once cross-validation has shown you the correct order of magnitude of lambVal for your data.
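The same logarithmic grid can also be built with np.logspace and passed straight into the solve step. A sketch under the assumption of randomly generated stand-in data (X, y and the variable names are illustrative only):

```python
import numpy as np

# Six values from 1e-3 to 1e2, evenly spaced on a log scale.
lambdas = np.logspace(-3, 2, num=6)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # stand-in for the real training matrix
y = rng.normal(size=100)        # stand-in for the real targets

A = X.T @ X
c = X.T @ y
I = np.eye(A.shape[0])
# One ridge solution per lambda, solved without forming an explicit inverse.
weights = [np.linalg.solve(A + lam * I, c) for lam in lambdas]
print(lambdas)
```

Larger values of lambda shrink the weights more strongly, so the norm of the solution decreases along the grid.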
I am solving a linear model with bounds on the parameters. The simple statsmodels OLS method doesn't allow for bounds on the fitted parameters, so to do this, I maximize a likelihood function using scipy.optimize.minimize. From this, I have my set of parameters for a linear model. All good so far.
All I need to achieve now is to be able to access statistics for my model, such as R^2, the F-statistic, etc. For OLS, these all come with the object returned by model.fit(), along with other nice features.
I'm wondering if it is possible to create this object, manually assign my parameters from the bounded fit, and have it compute the data fields on the fit result object? Obviously, I could just manually compute these things but I want it such that whether I am calling for a bounded or unbounded fit, I get the same object type returned and life is easy downstream.
Pseudo code:
bounded_params = fitBoundedLinear(x, y) # solution to bounded problem - a list of floats
model = statsmodels.api.OLS(y, x)
unbounded_fitResult = model.fit() # solution to unbounded problem - a regression results object
want to do something like:
aFitResult.params = bounded_params # manually set the parameters
aFitResult.calculate() # force it to compute data fields based on these params
rsq = aFitResult.rsquared # etc...
I have something that works - but it is probably not an ideal solution:
aFitResult = statsmodels.regression.linear_model.RegressionResultsWrapper(statsmodels.regression.linear_model.OLSResults(model, bounded_params))
You can add upper_bound and lower_bound to fit_elasticnet in elastic_net.py as:
def fit_elasticnet(model, method="coord_descent", maxiter=100,
                   alpha=0., L1_wt=1., start_params=None, cnvrg_tol=1e-7,
                   zero_tol=1e-8, refit=False, check_step=True,
                   loglike_kwds=None, score_kwds=None, hess_kwds=None, upper_bound=None, lower_bound=None):
Then, inside that function, after the following line:
params[k] = _opt_1d(func, grad, hess, model_1var, params[k], alpha[k]*L1_wt,
                    tol=btol, check_step=check_step)
add:
if upper_bound is not None:
    params[k] = min(params[k], upper_bound[k])
if lower_bound is not None:
    params[k] = max(params[k], lower_bound[k])
Then call the function similar to:
model = lm.OLS(y, x)
results_fu = model.fit()
#results_fu.summary()
results_fr = model.fit_regularized(alpha=0.001
                                   ,start_params=results_fu.params
                                   ,upper_bound=(.60,0,0,1,1,1,1,1)
                                   ,lower_bound=(-1, 0,0,0,-1,1,1,1,-10) )
Set the model's initial parameters to the desired values via start_params= and then fit them using maxiter=0 to do a fit with 0 steps (i.e. don't fit, but still run through all the initialization and metric computation).
result = model.fit(start_params=your_parameters_here, maxiter=0)
result.rsquared # or any other fit index
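If only a few statistics are needed, they can also be computed directly from any parameter vector, bounded or not. A minimal NumPy sketch, assuming x already contains a constant column and using the centered definition of R^2; the helper name r_squared and the data are made up for illustration:

```python
import numpy as np

def r_squared(x, y, params):
    """Centered R^2 of the linear prediction x @ params against y."""
    residuals = y - x @ params
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Design matrix with a constant column; y is exactly 1 + 2*t.
x = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(r_squared(x, y, np.array([1.0, 2.0])))  # unconstrained solution, perfect fit
print(r_squared(x, y, np.array([1.0, 1.5])))  # e.g. a slope capped by a bound
```

This does not give the uniform results object asked for in the question; it only shows what such an object would have to compute from the parameters.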
I am using keras to build a recommender model. Because the item set is quite large, I'd like to calculate the Hits @ N metric as a measure of accuracy. That is, if the observed item is in the top N predicted, it counts as a relevant recommendation.
I was able to build the hits at N function using numpy. But as I'm trying to port it into a custom loss function for keras, I'm having problems with the tensors. Specifically, enumerating over a tensor is different. And when I looked into the syntax to find something equivalent, I started to question the whole approach. It's sloppy and slow, reflective of my general python familiarity.
def hits_at(y_true, y_pred): #numpy version
    a=y_pred.argsort(axis=1) #ascending, sort by row, return index
    a = np.fliplr(a) #reverse to get descending
    a = a[:,0:10] #return only the first 10 columns of each row
    Ybool = [] #initialize 2D array
    for t, idx in enumerate(a):
        ybool = np.zeros(num_items +1) #zero fill; 0 index is reserved
        ybool[idx] = 1 #flip the recommended item from 0 to 1
        Ybool.append(ybool)
    A = map(lambda t: list(t), Ybool)
    right_sum = (A * y_true).max(axis=1) #element-wise multiplication, then find the max
    right_sum = right_sum.sum() #how many times did we score a hit?
    return right_sum/len(y_true) #fraction of observations where we scored a hit
How should I approach this in a more compact, and tensor-friendly way?
Update:
I was able to get a version of Top 1 working. I based it loosely on the GRU4Rec description.
def custom_objective(y_true, y_pred):
    y_pred_idx_sort = T.argsort(-y_pred, axis=1)[:,0] #returns the first element, which is the index of the row with the largest value
    y_act_idx = T.argmax(y_true, axis=1)#returns an array of indexes with the top value
    return T.cast(-T.mean(T.nnet.sigmoid((T.eq(y_pred_idx_sort,y_act_idx)))), theano.config.floatX)
I just had to compare the array of top 1 predictions to the array of the actuals element-wise. And Theano has an eq() function to do that.
Independent of N, the number of possible values of your loss function is finite. Therefore it can't be differentiable in a sensible tensor way, and you cannot use it as a loss function in Keras / Theano. You may try to use a Theano log loss with the top N guys.
UPDATE :
In Keras you may write your own loss functions. They have a declaration of the form:
def loss_function(y_true, y_pred):
Both y_true and y_pred are numpy arrays, so you may easily obtain a vector v which is 1 when a given example is in the top 500 and 0 otherwise. Then you may transform it into a Theano tensor constant vector and apply it as follows:
return theano.tensor.nnet.binary_crossentropy(y_pred * v, y_true * v)
This should work correctly.
UPDATE 2:
Log loss is the same thing as binary_crossentropy.
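Used as an evaluation metric rather than a loss, hits at N can be computed without Python loops. A NumPy sketch, assuming y_true is one-hot with one observed item per row; the function name hits_at_n and the example scores are made up for illustration:

```python
import numpy as np

def hits_at_n(y_true, y_pred, n=10):
    """Fraction of rows whose true item is among the n highest-scored items.

    Assumes y_true is one-hot and y_pred holds one score per item,
    both of shape (num_samples, num_items).
    """
    true_idx = np.argmax(y_true, axis=1)
    # Indices of the n largest scores per row (order inside the top n is irrelevant).
    top_n = np.argpartition(-y_pred, n - 1, axis=1)[:, :n]
    hits = (top_n == true_idx[:, None]).any(axis=1)
    return float(hits.mean())

y_pred = np.array([[0.10, 0.70, 0.20, 0.00],
                   [0.40, 0.10, 0.30, 0.20],
                   [0.05, 0.05, 0.10, 0.80]])
y_true = np.array([[0, 0, 1, 0],   # item 2 has the second-highest score: hit for n=2
                   [0, 1, 0, 0],   # item 1 has the lowest score: miss
                   [0, 0, 0, 1]])  # item 3 has the highest score: hit
print(hits_at_n(y_true, y_pred, n=2))  # two hits out of three rows
```

np.argpartition only separates the top n from the rest instead of fully sorting every row, which matters when the item set is large.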