I'm trying to use a conditional asymmetric loss function with a regression model and am having issues. I want to penalize wrong way results but the direction flips depending on the sign of the variable.
import numpy as np

def CustomLoss(predict, true):
    ix = np.logical_and((predict*true) > 0, np.abs(true) >= np.abs(predict))
    n = ((predict - true)**2)*2
    y = (predict - true)**2
    out = np.where(ix, y, n)
    return out
# CustomLoss(1,3) = 4
# CustomLoss(1,-1) = 8 ## Bigger loss for wrong way result
# CustomLoss(-2,-4) = 4
# CustomLoss(-2, 0) = 8 ## Bigger loss for wrong way result
I tried using scipy optimize, it converges for some data but not others. The function is still convex so I'd think this should always converge.
I've typically used CVXPY but can't figure out how to implement the conditional part of the cost function.
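One thing that may be worth checking before blaming the solver is the convexity assumption itself. The sketch below re-implements the loss above under the hypothetical name custom_loss and evaluates it on both sides of predict = 0 for a fixed true = 3. The test points are arbitrary choices for illustration only.

```python
import numpy as np

def custom_loss(predict, true):
    """Same rule as CustomLoss above: plain squared error when the prediction
    has the right sign and does not overshoot, doubled squared error otherwise."""
    right_way = np.logical_and(predict * true > 0, np.abs(true) >= np.abs(predict))
    squared_error = (predict - true) ** 2
    return np.where(right_way, squared_error, 2 * squared_error)

true = 3.0

# Loss immediately to the left and right of predict = 0
just_below = float(custom_loss(-1e-9, true))
just_above = float(custom_loss(1e-9, true))
print("loss just below 0:", just_below)
print("loss just above 0:", just_above)

# Midpoint test: a convex function satisfies f((a+b)/2) <= (f(a)+f(b))/2
a, b = -0.5, 0.5
f_mid = float(custom_loss(0.5 * (a + b), true))
chord = 0.5 * (float(custom_loss(a, true)) + float(custom_loss(b, true)))
print("f(midpoint):", f_mid, "chord value:", chord)
```

If the midpoint value comes out larger than the chord value, the loss as written is not convex in the prediction, which might explain why a local solver converges on some data sets and not on others.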
Related
I am trying to implement a fitting routine for experimental data.
The function I am trying to optimize is a black box: I don't know its internals right now, but I can call it with a given set of parameters.
I am trying to find the optimal parameters for a function f(x), where x is the list of parameters to optimize.
The function f() returns a single value.
I am trying to use Particle Swarm Optimization to find the optimal x.
I have bounds for all the parameters in x, and initial guesses for almost all of them.
As a toy problem, I am trying to get this code working:
import pyswarms as ps
import numpy as np
# Define the function to optimize
def f1(x:list) -> float:
    return x[0]**2 + x[1]**2 + x[2]**2
# Define the bounds for the parameters to optimize
# Create bounds
max_bound = 5 * np.ones(3)
min_bound = - max_bound
bounds = (min_bound, max_bound)
print(bounds)
# Set up the optimization options
options = {'c1': 0.5, 'c2': 0.3, 'w': 0.9}
# Perform the optimization
dimensions = 3 # because we have 3 inputs for f1()??
# how to give the PSO initial values for all optimization parameters?
# init_pos =
optimizer = ps.single.GlobalBestPSO(n_particles=100, dimensions=dimensions, options=options, bounds=bounds, init_pos=None)
cost, pos = optimizer.optimize(f1, iters=1000)
# Print the optimized parameters and the cost
optimized_params = pos
print("Optimized parameters: ", optimized_params)
print("Cost: ", cost)
It gives an error here:
ValueError: operands could not be broadcast together with shapes (3,) (100,)
What am I doing wrong?
If I set n_particles=3 it actually runs, but it can't find the minimum of the function and is really slow. That is strange, so I am pretty confused.
Note: in my real application the parameter list x is much larger, approximately 100 elements.
And the real application must vary all the components of x.
Maybe someone can suggest a Python module for using PSO efficiently?
How can I give the optimizer the initial guesses for the parameters in this case?
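For reference, the shapes in the error message are consistent with pyswarms passing the objective one array of shape (n_particles, dimensions) and expecting one cost per particle back. The sketch below uses only NumPy to show an objective written that way and one possible way to build starting positions around an initial guess. The names f1_vectorized, make_init_pos, spread, and the guess values are hypothetical and not part of pyswarms.

```python
import numpy as np

def f1_vectorized(x):
    """Sphere function for a whole swarm: x has shape (n_particles, dimensions),
    the result has shape (n_particles,)."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** 2, axis=1)

def make_init_pos(initial_guess, min_bound, max_bound, n_particles, spread=0.1, seed=0):
    """Scatter particles around an initial guess (hypothetical helper).
    The noise scale is `spread` times the width of each bound interval,
    and positions are clipped so they stay inside the bounds."""
    rng = np.random.default_rng(seed)
    guess = np.asarray(initial_guess, dtype=float)
    width = np.asarray(max_bound, dtype=float) - np.asarray(min_bound, dtype=float)
    noise = rng.normal(scale=spread * width, size=(n_particles, guess.size))
    return np.clip(guess + noise, min_bound, max_bound)

max_bound = 5 * np.ones(3)
min_bound = -max_bound

init_pos = make_init_pos([1.0, -2.0, 0.5], min_bound, max_bound, n_particles=100)
costs = f1_vectorized(init_pos)
print(init_pos.shape, costs.shape)
```

Under these assumptions, f1_vectorized would replace f1 in the optimize() call, and init_pos would be passed through the init_pos argument that the constructor call above already exposes.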
I am using GEKKO for fitting, optimising functions that are explicitly defined: I have the full functional form and can create equation objects for the optimisation.
But now I have a different problem.
I can't create equations because the functional dependence is too complex.
But I have a Python function that calculates the output from some inputs: the optimisation parameters, plus some others that can be treated as fixed or known.
The key points: I have experimental data and a complex model implemented in a Python function, f1(set_of_parameters). f1 is nonlinear and nonconvex, and it can't be expressed as one simple equation: it has a lot of conditional parameters, a lot of branches, calls to other Python functions, etc.
So f1 can't actually be converted to a Gekko model equation.
I need to find the parameters, set_of_optimal_parameters, that minimise the distance between f1(set_of_optimal_parameters) and the experimental data I have.
For each parameter in the set I have an initial value and bounds, and there are even some constraints.
So I need to do something like this:
m = GEKKO()
#parameters I need to find
param_1 = m.FV(value = val_1, lb = lb_1, ub=rb_1)
param_2 = m.FV(value = val_2, lb = lb_2, ub=rb_2)
...
param_n = m.FV(value = val_n, lb = lb_n, ub=rb_n)
#constructing the input for the function f1()
params_dataframe = ....()# some function that collects all the parameters and arranges all of them to a proper form to an input of f1()
#exp data description
x = m.Param(value = xData)
z = m.Param(value = yData)
y = m.Var()
#model description - is it possible to use another function inside an equation? f1 is very complex, with a lot of branches and options, and I don't really want to translate it into equation form..
m.Equation(
    y==f1(params_dataframe)
)
#add some constraints
min = m.Param(value=some_val_min)
m.Equation(min <= (param_1+param_2) / (param_1+param_2)**2)
# trying to solve and minimize the sum of squares
m.Minimize(((y-z))**2)
# Options for solver
param_1.STATUS = 1
param_2.STATUS = 1
...
param_n.STATUS = 1
m.options.IMODE = 2
m.options.SOLVER = 1
m.options.MAX_ITER = 1000
m.solve(disp=1)
Is it possible to use GEKKO this way, or is it not allowed? And why?
Gekko compiles equations into byte-code and requires all equations in Gekko format so that it can overload equation operators to provide exact first and second derivatives in sparse form. Black-box functions do not provide the necessary first and second derivatives, but they can provide function evaluations for finite differences (derivative approximations) or for surrogate functions.
To answer your question directly, you can't use f1(params) in a Gekko problem. If you need an optimizer to evaluate arbitrary black box functions, an optimizer such as scipy.optimize.minimize() is a good choice.
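As a minimal sketch of that route, the block below fits a stand-in black-box function with scipy.optimize.minimize, using bounds and an initial guess. The toy f1, the synthetic data, the bounds, and the choice of method="L-BFGS-B" (which approximates gradients by finite differences when none are supplied) are all assumptions made for illustration, not the model from the question.

```python
import numpy as np
from scipy.optimize import minimize

def f1(params, x):
    """Stand-in for the black-box model (hypothetical): a branch on the input
    mimics the conditional logic that cannot be written as a Gekko equation."""
    a, b = params
    return np.where(x < 1.0, a * x, a + b * (x - 1.0))

# Synthetic "experimental" data generated from known parameters
x_data = np.linspace(0.0, 3.0, 31)
true_params = np.array([2.0, 0.5])
y_data = f1(true_params, x_data)

def sum_of_squares(params):
    """Objective: squared distance between the model output and the data."""
    return float(np.sum((f1(params, x_data) - y_data) ** 2))

result = minimize(
    sum_of_squares,
    x0=[1.0, 1.0],                      # initial guesses
    bounds=[(0.0, 5.0), (0.0, 5.0)],    # lower and upper bound per parameter
    method="L-BFGS-B",
)
print(result.x, result.fun)
```

For a genuinely nonconvex f1 a local method like this one can stop in a local minimum, so trying several starting points may be worthwhile.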
If you would still like to use Gekko, there are several options to build a surrogate model for f1 that has continuous first and second derivatives. The choice of surrogate model depends on the number of params:
1D: use cspline()
2D: use bspline()
3D+: use machine learning such as Gaussian Processes, Neural Networks, Linear Regression, etc.
Here is an example that creates a surrogate model for y=f(x) where f(x)=3*np.sin(x) - (x-3). This equation could be modeled directly in Gekko, but it serves as an example of creating a cspline() object that approximates the function and finds the minimum.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
"""
minimize y
s.t. y = f(x)
using cubic spline with random sampling of data
"""
# function to generate data for cspline
def f(x):
    return 3*np.sin(x) - (x-3)
x_data = np.random.rand(50)*10+10
y_data = f(x_data)
c = GEKKO()
x = c.Var(value=np.random.rand(1)*10+10)
y = c.Var()
c.cspline(x,y,x_data,y_data,True)
c.Obj(y)
c.options.IMODE = 3
c.options.CSV_READ = 0
c.options.SOLVER = 3
c.solve(disp=True)
if c.options.SOLVESTATUS == 1:
    plt.figure()
    plt.scatter(x_data,y_data,5,'b')
    plt.scatter(x.value,y.value,200,'r','x')
else:
    print ('Failed!')
    print(x_data,y_data)
    plt.figure()
    plt.scatter(x_data,y_data,5,'b')
plt.show()
I am illustrating hyperopt's TPE algorithm for my master's project and can't seem to get the algorithm to converge. From what I understand from the original paper and a YouTube lecture, the TPE algorithm works in the following steps:
(in the following, x=hyperparameters and y=loss)
1. Start by creating a search history of [x,y], say 10 points.
2. Sort the hyperparameters according to their loss and divide them into two sets using some quantile γ (γ = 0.5 means the sets will be equally sized)
3. Make a kernel density estimation for both the poor hyperparameter group (g(x)) and good hyperparameter group (l(x))
4. Good estimations will have low probability in g(x) and high probability in l(x), so we propose to evaluate the function at argmin(g(x)/l(x))
5. Evaluate (x,y) pair at the proposed point and repeat steps 2-5.
I have implemented this in python on the objective function f(x) = x^2, but the algorithm fails to converge to the minimum.
import numpy as np
import scipy as sp
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde
def objective_func(x):
    return x**2

def measure(x):
    noise = np.random.randn(len(x))*0
    return x**2+noise

def split_meassures(x_obs,y_obs,gamma=1/2):
    #split x and y observations into two sets and return a separation threshold (y_star)
    size = int(len(x_obs)//(1/gamma))
    l = {'x':x_obs[:size],'y':y_obs[:size]}
    g = {'x':x_obs[size:],'y':y_obs[size:]}
    y_star = (l['y'][-1]+g['y'][0])/2
    return l,g,y_star

#sample objective function values for illustration
x_obj = np.linspace(-5,5,10000)
y_obj = objective_func(x_obj)
#start by sampling a parameter search history
x_obs = np.linspace(-5,5,10)
y_obs = measure(x_obs)
nr_iterations = 100
for i in range(nr_iterations):
    #sort observations according to loss
    sort_idx = y_obs.argsort()
    x_obs,y_obs = x_obs[sort_idx],y_obs[sort_idx]
    #split sorted observations in two groups (l and g)
    l,g,y_star = split_meassures(x_obs,y_obs)
    #approximate distributions for both groups using kernel density estimation
    kde_l = gaussian_kde(l['x']).evaluate(x_obj)
    kde_g = gaussian_kde(g['x']).evaluate(x_obj)
    #define our evaluation measure for sampling a new point
    eval_measure = kde_g/kde_l
    if i%10==0:
        plt.figure()
        plt.subplot(2,2,1)
        plt.plot(x_obj,y_obj,label='Objective')
        plt.plot(x_obs,y_obs,'*',label='Observations')
        plt.plot([-5,5],[y_star,y_star],'k')
        plt.subplot(2,2,2)
        plt.plot(x_obj,kde_l)
        plt.subplot(2,2,3)
        plt.plot(x_obj,kde_g)
        plt.subplot(2,2,4)
        plt.semilogy(x_obj,eval_measure)
        plt.draw()
    #find point to evaluate and add the new observation
    best_search = x_obj[np.argmin(eval_measure)]
    x_obs = np.append(x_obs,[best_search])
    y_obs = np.append(y_obs,[measure(np.asarray([best_search]))])
plt.show()
I suspect this happens because we keep sampling where we are most certain, thus making l(x) more and more narrow around this point, which doesn't change where we sample at all. So where is my understanding lacking?
I am still learning about TPE as well, but here are the two problems in this code:
1. This code will only evaluate a few unique points. The next location is always the single best point recommended by the kernel density estimates, so the code has no way to explore the search space the way an acquisition function would.
2. This code simply appends new observations to x_obs and y_obs, so it adds a whole lot of duplicates. The duplicates skew the set of observations, which leads to a very weird split; you can easily see that in the later plots. The eval_measure starts out looking similar to the objective function but diverges later on.
If you remove the duplicates in x_obs and y_obs, you can remove problem no. 2. However, the first problem can only be removed by adding some way of exploring the search space.
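Both points can be sketched together. The block below drops duplicates with np.unique and, instead of taking the argmin over a fixed grid, draws random candidates from the good-group density l(x) and keeps the one with the smallest g(x)/l(x). Drawing candidates from l(x) is one common way to add exploration and is an assumption here, not something taken from the question. The function name propose_next and the n_candidates value are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def propose_next(x_obs, y_obs, gamma=0.5, n_candidates=24, rng=None):
    """Propose one new point (hypothetical helper, 1-D search space only)."""
    rng = np.random.default_rng(rng)
    # Problem 2: remove duplicate observations before splitting
    x_unique, first_idx = np.unique(x_obs, return_index=True)
    y_unique = y_obs[first_idx]
    x_sorted = x_unique[np.argsort(y_unique)]
    n_good = max(2, int(gamma * len(x_sorted)))
    good, bad = x_sorted[:n_good], x_sorted[n_good:]
    kde_l = gaussian_kde(good)
    kde_g = gaussian_kde(bad)
    # Problem 1: explore by sampling candidates from l(x), i.e. pick a random
    # good point and perturb it with the kernel bandwidth of kde_l
    bandwidth = np.sqrt(kde_l.covariance[0, 0])
    centers = rng.choice(good, size=n_candidates)
    candidates = centers + rng.normal(scale=bandwidth, size=n_candidates)
    ratio = kde_g(candidates) / kde_l(candidates)
    return candidates[np.argmin(ratio)]

rng = np.random.default_rng(0)
x_obs = np.linspace(-5, 5, 10)
y_obs = x_obs ** 2
for _ in range(20):
    x_new = propose_next(x_obs, y_obs, rng=rng)
    x_obs = np.append(x_obs, x_new)
    y_obs = np.append(y_obs, x_new ** 2)
print("unique points:", len(np.unique(x_obs)), "best loss:", y_obs.min())
```

Because the candidates are random draws rather than grid points, each iteration adds a new location, so the search history no longer fills up with repeats.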
I am having trouble understanding the output of my function to implement multiple ridge regression. I am doing this from scratch in Python using the closed form of the method: w = (X^T X + lambda*I)^(-1) X^T y.
I have a training set X that is 100 rows x 10 columns and a vector y that is 100x1.
My attempt is as follows:
def ridgeRegression(xMatrix, yVector, lambdaRange):
    wList = []
    for i in range(1, lambdaRange+1):
        lambVal = i
        # compute the inner values (X.T X + lambda I)
        xTranspose = np.transpose(xMatrix)
        xTx = xTranspose @ xMatrix
        lamb_I = lambVal * np.eye(xTx.shape[0])
        # invert inner, e.g. (inner)**(-1)
        inner_matInv = np.linalg.inv(xTx + lamb_I)
        # compute outer (X.T y)
        outer_xTy = np.dot(xTranspose, yVector)
        # multiply together
        w = inner_matInv @ outer_xTy
        wList.append(w)
    print(wList)
For testing, I am running it with the first 5 lambda values.
wList becomes 5 numpy.arrays each of length 10 (I'm assuming for the 10 coefficients).
Here is the first of those 5 arrays:
array([ 0.29686755, 1.48420319, 0.36388528, 0.70324668, -0.51604451,
2.39045735, 1.45295857, 2.21437745, 0.98222546, 0.86124358])
My question, and clarification:
Shouldn't there be 11 coefficients, (1 for the y-intercept + 10 slopes)?
How do I get the Minimum Square Error from this computation?
What comes next if I wanted to plot this line?
I think I am just really confused as to what I'm looking at, since I'm still working on my linear-algebra.
Thanks!
First, I would modify your ridge regression to look like the following:
import numpy as np
def ridgeRegression(X, y, lambdaRange):
    wList = []
    # Get normal form of `X`
    A = X.T @ X
    # Get Identity matrix
    I = np.eye(A.shape[0])
    # Get right hand side
    c = X.T @ y
    for lambVal in range(1, lambdaRange+1):
        # Set up equations Bw = c
        lamb_I = lambVal * I
        B = A + lamb_I
        # Solve for w
        w = np.linalg.solve(B,c)
        wList.append(w)
    return wList
Notice that I replaced your inv call to compute the matrix inverse with an implicit solve. This is much more numerically stable, which is an important consideration for these types of problems especially.
I've also taken the A = X.T @ X computation, the generation of the identity matrix I, and the right hand side vector c = X.T @ y computation out of the loop. These don't change within the loop and are relatively expensive to compute.
As was pointed out by @qwr, the number of columns of X will determine the number of coefficients you have. You have not described your model, so it's not clear how the underlying domain, x, is structured into X.
Traditionally, one might use polynomial regression, in which case X is the Vandermonde Matrix. In that case, the first coefficient would be associated with the y-intercept. However, based on the context of your question, you seem to be interested in multivariate linear regression. In any case, the model needs to be clearly defined. Once it is, then the returned weights may be used to further analyze your data.
Typically to make notation more compact, the matrix X contains a column of ones for an intercept, so if you have p predictors, the matrix is dimensions n by p+1. See Wikipedia article on linear regression for an example.
To compute in-sample MSE, use the definition for MSE: the average of squared residuals. To compute generalization error, you need cross-validation.
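A minimal sketch of both points, on synthetic data with the same 100 x 10 shape as in the question: prepend a column of ones so that the first coefficient is the intercept, then average the squared residuals. The data, the seed, and lam = 1.0 are made up for illustration. Note that this simple version also penalises the intercept, since lam * I covers the ones column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 rows, 10 predictors, known intercept of 3.0
X = rng.normal(size=(100, 10))
true_w = np.arange(1, 11, dtype=float)
y = 3.0 + X @ true_w + rng.normal(scale=0.1, size=100)

# Add a column of ones so the model has p + 1 = 11 coefficients
X1 = np.column_stack([np.ones(len(X)), X])

lam = 1.0
w = np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ y)

# In-sample MSE: the average of the squared residuals
residuals = y - X1 @ w
mse = np.mean(residuals ** 2)
print(w.shape, mse)
```

Here w[0] is the intercept and w[1:] are the ten slopes.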
Also, you shouldn't restrict lambVal to integers. It can be small (close to 0) if the aim is just to avoid numerical error when xTx is ill-conditioned.
I would advise you to use a logarithmic range instead of a linear one, starting from 0.001 and going up to 100 or more if you want to. For instance, you can change your code to this:
powerMin = -3
powerMax = 3
for i in range(powerMin, powerMax):
    lambVal = 10**i
    print(lambVal)
Then you can try a smaller range or a linear range once cross-validation on your data has shown you the right order of magnitude for lambVal.
I am trying to fit an ARMA model to time series data. I haven't found any function that can automatically choose the parameters. Below is the code I wrote; however, as I am a beginner in Python, I believe it can be optimised.
Can someone give me some ideas on how to:
vectorize the double loop
choose the parameters more quickly
Much appreciated.
import numpy as np
import statsmodels.api as sm

parameter_bound = 3
# Creating a 2-D array, storing the residuals of two different parameters of ARMA model
residuals = [[0 for x in range(parameter_bound)] for x in range(parameter_bound)]
model = [[0 for x in range(parameter_bound)] for x in range(parameter_bound)]
# Calculate residuals for each parameter combinations
for i in range(parameter_bound):
    for j in range(parameter_bound):
        model[i][j] = sm.tsa.ARMA(input_data, (i,j)).fit()
        residuals[i][j] = sum(abs(model[i][j].resid))
# Find the parameters with lowest residuals
# (np.argmin returns a flat index; integer division and modulo recover row and column)
parameters = np.argmin(residuals)
parameter1 = parameters // parameter_bound
parameter2 = parameters % parameter_bound
# Use the model with lowest residuals to get prediction data
prediction = model[parameter1][parameter2].resid + input_data
I'm not sure exactly what you're expecting, but you could replace your lists with numpy arrays (I don't think it'll improve your specific code):
import numpy as np
residuals = np.zeros((parameter_bound, parameter_bound))
model = np.zeros((parameter_bound, parameter_bound), dtype=object)
Also, be aware that np.argmin with axis=None returns an index for a flattened array, if you want to return the model parameters of the model with the lowest residuals you might try:
prediction = model.ravel()[np.argmin(residuals)].resid + input_data
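If you also want the (i, j) pair itself rather than just the model, np.unravel_index converts the flat index back into row and column indices. A small sketch with made-up residual values:

```python
import numpy as np

# Made-up residual sums for a 3 x 3 grid of (i, j) orders
residuals = np.array([
    [5.0, 3.0, 4.0],
    [2.0, 0.5, 6.0],
    [7.0, 8.0, 9.0],
])

flat_index = np.argmin(residuals)  # index into the flattened array
best_i, best_j = np.unravel_index(flat_index, residuals.shape)
print(flat_index, best_i, best_j)
```

The same pair can then be used to index the model array, for example model[best_i, best_j].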
You can use the Ljung-Box test on the residuals:
__, pvalue = sm.diagnostic.acorr_ljungbox(model[i][j].resid)
# H0: the residuals are not autocorrelated.
# A p-value above the 0.05 significance level means H0 is not rejected.
if pvalue > 0.05:
    use_parameters = ...