I am trying to use curve fitting to find coefficients for an equation using multiple datasets. The equation itself is piecewise; it is defined as:
In this equation, we don't know the break point Po.
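(The equation image is not reproduced here; judging from the residual and piecewise_linear functions below, the model appears to be a two-segment linear fit,

T = a*P + b   for P <  Po
T = c*P + d   for P >= Po

with the break point Po itself treated as an unknown parameter.)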
I have tried using scipy curve_fit and lmfit. curve_fit successfully fitted the data for some datasets but failed miserably on others. Here is the code for lmfit, inspired by this answer, and for curve_fit, inspired by this answer:
import pandas as pd
import matplotlib
from scipy.signal import savgol_filter
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.transforms as mtrans
from sklearn import linear_model
import csv
from scipy import stats
from sklearn import preprocessing
from scipy.special import erf,erfc
from lmfit import minimize, Parameters, Model
from sklearn.linear_model import LinearRegression
import math  # needed for math.floor below
power_level_for_prediction = [45,50,60,69,71,88]
group_by_column = "mem_pow"
critical_device_power_name = "core_pow"
files = pd.read_csv("file_path")
def residual(params, x, y=None):
    param1 = params['a']
    param2 = params['b']
    param3 = params['x0']
    param4 = params['c']
    param5 = params['d']
    dx = (max(x) - min(x)) / (len(x) - 1)
    xhi = (erf((x - param3) / dx) + 1) / 2.0
    xlo = (erfc((x - param3) / dx) + 1) / 2.0
    # p = xlo*param4*np.exp(param5*x) + xhi*(param1*x+param2)
    p = xlo * (param1 * x + param2) + xhi * (param4 * x + param5)
    # p = param1*x + param2
    # p[np.where(param2 < x)] = param3*x + param2
    if y is None:
        return p
    return p - y

def linear_lmfit(x, y):
    params = Parameters()
    params.add('a', value=0.1)
    params.add('b', value=0.2)
    params.add('c', value=0.3)
    params.add('d', value=0.4, min=-5, max=5)
    params.add('x0', value=120)
    out = minimize(residual, params, args=(x, y))
    fit = residual(out.params, x)
    return fit
def piecewise_linear(x, x0, y0, a, c):
    # Representation of the above equation; b and d from the equation stay the same.
    return np.piecewise(x, [x < x0], [lambda x: a*x + y0 - a*x0, lambda x: c*x + y0 - c*x0])
def linear(files):
    files_grouped = files.groupby(group_by_column)
    rows, columns = (2, 3)
    fig, ax = plt.subplots(rows, columns, figsize=(20, 10))
    k = 0
    for name, group in files_grouped:
        x = group[critical_device_power_name].to_numpy().astype(float)
        y = group['elapsed_time'].to_numpy().astype(float)
        if name in power_level_for_prediction:
            i = math.floor(k / columns)
            j = k % columns
            p, e = curve_fit(piecewise_linear, x, y)
            # pred = piecewise_linear(x, *p)
            pred = linear_lmfit(x, y)
            ax[i][j].plot(x, y, label="Actual Elapsed Time")
            ax[i][j].plot(x, pred, label="Predicted Elapsed Time")
            ax[i][j].grid()
            ax[i][j].set_title(f"Prediction Result for {name}W {group_by_column}")
            ax[i][j].set_ylabel(r"$T_c$ (sec)")
            ax[i][j].set_xlabel(f"{critical_device_power_name}")
            ax[i][j].legend(title=f'{group_by_column}')
            k = k + 1
    fig.suptitle(f"{experiment_name}")  # experiment_name is assumed to be defined elsewhere in the full script
    fig.tight_layout()
    plt.show()
Result using LMFIT:
I have no clue why lmfit is producing this kind of result. Do you think it is because of the initial values?
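For what it's worth, fits like this are often sensitive to the starting value of the break point x0, which is hard-coded to 120 above. A minimal sketch of deriving the initial values from the data instead, reusing the residual function above (the particular guesses are just assumptions, not tuned values):

import numpy as np
from lmfit import Parameters, minimize

def linear_lmfit_guessed(x, y):
    slope_guess = (y[-1] - y[0]) / (x[-1] - x[0])   # rough overall slope
    intercept_guess = y[0] - slope_guess * x[0]     # rough overall intercept
    params = Parameters()
    params.add('a', value=slope_guess)
    params.add('b', value=intercept_guess)
    params.add('c', value=slope_guess)
    params.add('d', value=intercept_guess)
    # keep the break point inside the observed range of x
    params.add('x0', value=x.mean(), min=x.min(), max=x.max())
    out = minimize(residual, params, args=(x, y))
    return residual(out.params, x)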
And here is the result for curve_fit:
As seen in the graph, for some mem_pow values the fit is reasonably good, but for others it is quite bad. I am unable to understand the reason behind this. My suspicion is that the curve fitting is failing for those mem_pow levels because the second piecewise segment is quite flat and the function fails to fit that part.
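One thing that sometimes helps curve_fit here (a sketch, assuming the piecewise_linear signature above; the guesses themselves are arbitrary) is to supply an initial guess and bounds so the break point x0 starts, and stays, inside the observed data range:

p0 = [x.mean(), y.mean(), 1.0, 0.0]               # x0, y0, a, c
bounds = ([x.min(), -np.inf, -np.inf, -np.inf],
          [x.max(),  np.inf,  np.inf,  np.inf])   # constrain only x0
p, e = curve_fit(piecewise_linear, x, y, p0=p0, bounds=bounds)
pred = piecewise_linear(x, *p)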
Here is the csv file :
https://gist.github.com/kulnaman/8952e9c14ec5e8dcf2bbbd40f2dccdaa
I'm studying Gaussian process regression, and I'm trying to use the built-in functions from scikit-learn, and also trying to implement a custom function for doing so.
This is the code when using scikit-learn:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) # Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) #fitting the process to get optimized hyperparameter
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result
And this is the code I wrote manually:
def marglike(par,X,Y): #defining log-marginal-likelihood
    # print(par)
    l,var,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = var*np.exp(-(1/(2*(l**2)))*dist_X)
    inverse = np.linalg.inv(k + (sigma_n**2)*np.eye(len(k)))
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k + (sigma_n**2)*np.eye(len(k)))) + (n/2)*np.log(2*np.pi)
    return ml

b = [0.0005,100]
bnd = [b,b,b] #bounds used for "minimize" function
start = np.array([1.1,1.6,0.05]) #initial hyperparameters values
re = minimize(marglike,start,args=(X,Y),method="L-BFGS-B",options={'disp':True},bounds=bnd) #the method used is the same as the one used by scikit-learn
re.x #Hyperparameter results
Out[]: array([3.55266484e+00, 9.99986210e+01, 5.00000000e-04])
As you can see, the hyperparameters I got from the two methods are different, even though I used the same data (X, Y) and the same minimization method.
Could somebody help me understand why, and maybe how to get the same results?
As suggested by San Mason, adding noise actually works! Otherwise, when you do it manually (in the custom code), set the initial noise reasonably low and run multiple restarts with different initializations; then you will get values that are close. By the way, noiseless data seems to create a stationary ridge in the space of hyperparameters (like Fig. 1.6 in the Surrogates GP book). Note that scikit-learn's noise is sigma_n^2 in terms of your custom function. Below are the snippets for the noise-less and noisy cases.
Noise-less case
scikit-learn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) #+ np.random.normal(10)# Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) #fitting the process to get optimized hyperparameter
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 14.1**2 * RBF(length_scale=3.7) + WhiteKernel(noise_level=1e-05) #result
custom function
def marglike(par,X,Y): #defining log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k)
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]

b = [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options={'disp':True},bounds=bnd) #the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (9.920690495739379, 3.5657912350017575, 1.0000000000000002e-10)
Noisy case
scikit-learn
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as gpr
from sklearn.gaussian_process.kernels import RBF,WhiteKernel,ConstantKernel as C
from scipy.optimize import minimize
import scipy.stats as s
X = np.linspace(0,10,10).reshape(-1,1) # Input Values
Y = 2*X + np.sin(X) + np.random.normal(size=10).reshape(10,1)*0.1 # Function
v = 1
kernel = v*RBF() + WhiteKernel() #Defining kernel
gp = gpr(kernel=kernel,n_restarts_optimizer=50).fit(X,Y) #fitting the process to get optimized hyperparameter
gp.kernel_ #Hyperparameters optimized by the GPR function in scikit-learn
# Out[]: 10.3**2 * RBF(length_scale=3.45) + WhiteKernel(noise_level=0.00792) #result
Custom function
def marglike(par,X,Y): #defining log-marginal-likelihood
    # print(par)
    l,std,sigma_n = par
    n = len(X)
    dist_X = (X - X.T)**2
    # print(dist_X)
    k = std**2*np.exp(-(dist_X/(2*(l**2)))) + (sigma_n**2)*np.eye(n)
    inverse = np.linalg.inv(k)
    ml = (1/2)*np.dot(np.dot(Y.T,inverse),Y) + (1/2)*np.log(np.linalg.det(k)) + (n/2)*np.log(2*np.pi)
    return ml[0,0]

b = [10**-5,10**5]
bnd = [b,b,b] #bounds used for "minimize" function
start = [1,1,10**-5] #initial hyperparameters values
re = minimize(fun=marglike,x0=start,args=(X,Y),method="L-BFGS-B",options={'disp':True},bounds=bnd) #the method used is the same as the one used by scikit-learn
re.x[1], re.x[0], re.x[2]**2
# Output - (10.268943740577331, 3.4462604625225106, 0.007922681239535326)
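The "multiple restarts with different initializations" mentioned above can also be done by hand around minimize; a minimal sketch (random log-uniform starting points, keeping the restart with the lowest negative log-marginal-likelihood, reusing the marglike and bnd defined above):

rng = np.random.default_rng(0)
best = None
for _ in range(20):
    x0 = 10.0**rng.uniform(-2, 2, size=3)    # random starting hyperparameters within the bounds
    res = minimize(fun=marglike, x0=x0, args=(X, Y), method="L-BFGS-B", bounds=bnd)
    if best is None or res.fun < best.fun:   # keep the best restart
        best = res
print(best.x)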
I am new to solving coupled ODEs with Python, and I am wondering if my approach is correct; currently this code outputs a graph that looks nothing like the expected output. These are the equations I am trying to solve:
And here is the code I am using (for the functions f_gr, f_sc_phi and f_gTheta you can just put in any constant value):
import Radial as rd
import ScatteringAzimuthal as sa
import PolarComponent as pc
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
#gamma for now set to 1
g_mm = 1
def f(u,t):
    #y1 = thetadot :: y2 = phidot :: y3 = cdot
    rho, theta, y1, phi, y2, c, y3 = u
    p = [y1, (pc.f_gTheta(theta,524.1+rho)/(c*np.cos(phi)) - (g_mm*y1) + (2*y1*y2*np.tan(phi)) - (2*y3*y1/c)),
         y2, ((sa.f_sc_phi(theta,524.1+rho/c)) - (g_mm*y2) - (2*y3*y2/c) - (np.sin(phi)*np.cos(phi)*y2**2)),
         y3, (rd.f_gr(theta,524.1+rho) - (g_mm*y3) + (c*y2**2) + (c*(y1**2)*(np.cos(phi)**2))), phi]
    return p
time = np.linspace(0,10,100)
z2 = odeint(f,[0.1,np.pi/2,0.1,np.pi/2,0.1,0.1,0.1], time)
rhoPl = z2[:,0]
thetaPl = z2[:,1]
phiPl = z2[:,3]
'''
plt.plot(rhoPl,time)
plt.plot(thetaPl,time)
plt.plot(phiPl,time)
plt.show()
'''
x = rhoPl*np.sin(thetaPl)*np.cos(phiPl)
y = rhoPl*np.sin(thetaPl)*np.sin(phiPl)
z = rhoPl*np.cos(thetaPl)
plt.plot(x,time)
plt.plot(y,time)
plt.plot(z,time)
plt.show()
When I change the time from 0.1 to 5, I get an error:
ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
Any ideas on how to improve this code, or is my approach completely incorrect?
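For reference, the diagnostics behind that warning can be inspected directly, as the message suggests; a minimal sketch, keeping the same initial conditions and simply raising the internal step budget:

z2, info = odeint(f, [0.1, np.pi/2, 0.1, np.pi/2, 0.1, 0.1, 0.1], time,
                  full_output=1, mxstep=5000)
print(info['message'])   # status string reported by the underlying LSODA solver

This does not fix the underlying model, but it shows whether the solver is merely running out of steps or genuinely struggling with a stiff or ill-posed system.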
Code for Radial.py
import numpy as np
from scipy.special import spherical_jn
from scipy.special import spherical_yn
import sympy as sp
import matplotlib.pyplot as plt
R_r = 5.6*10**(-5)
l = 720
n_w = 1.326
#k = 524.5/R_r
X_r = 524.5
# R is constant r is changing
def f_gr(theta,x):
    f = ((sp.sin(theta))**(2*l-2))*(1+(sp.cos(theta))**2)
    b = (spherical_jn(l,n_w*x)*spherical_jn(l,n_w*x,True)) + (spherical_yn(l,n_w*x)*spherical_yn(l,n_w*x,True))
    c = (spherical_jn(l,n_w*X_r)*spherical_jn(l,n_w*X_r,True)) + (spherical_yn(l,n_w*X_r)*spherical_yn(l,n_w*X_r,True))
    n = b/c
    f = f*n
    return f
Code for ScatteringAzimuthal.py
from scipy.special import spherical_jn, spherical_yn
import numpy as np
import matplotlib.pyplot as plt
l = 720
n_w = 1.326
n_p = 1.572
X_r = 524.5
R_r = 5.6*10**(-5)
R_p = 7.5*10**(-7)
k = X_r/R_r
def f_sc_phi(theta,x):
    f = (2/3)*(n_w**2)*((X_r**3)/x)*((R_p**3)/(R_r**3))*(((n_p**2)-(n_w**2))/((n_p**2)+(2*(n_w**2))))
    g = np.sin(theta)**(2*l-3)
    numerator = ((l*(1+np.sin(theta)) - np.cos(2*theta))
                 *((spherical_jn(l,n_w*x)*spherical_jn(l,n_w*x)) + (spherical_yn(l,n_w*x)*spherical_yn(l,n_w*x))))
    denominator = ((spherical_jn(l,n_w*X_r)*spherical_jn(l,n_w*X_r,True))
                   + (spherical_yn(l,n_w*X_r)*spherical_yn(l,n_w*X_r,True)))
    m = numerator/denominator
    final = f*g*m
    return final
And Code for PolarComponent.py
import numpy as np
from scipy.special import spherical_yn, spherical_jn
import matplotlib.pyplot as plt
l = 720
n_w = 1.326
X_r = 524.5 #this value is implemented in the ode file
#define dimensionless polar component
#X_r is radius, x is variable
def f_gTheta(theta,x):
    bessel1 = ((spherical_jn(l,n_w*x)*spherical_jn(l,n_w*x))
               + (spherical_yn(l,n_w*x)*spherical_yn(l,n_w*x)))
    bessel2 = ((spherical_yn(l,n_w*X_r)*spherical_yn(l,n_w*X_r,True))
               + (spherical_yn(l,n_w*X_r)*spherical_yn(l,n_w*X_r,True)))*n_w*x
    bessels = bessel1/bessel2
    rest = (np.sin(theta)**(2*l-3))*((l-1)*(1+(np.cos(theta)**2))
                                     - ((np.sin(theta)**2)*np.cos(theta)))
    final = rest*bessels
    return final
Here is a link that I really like for simulating second-order ODEs. It has an optimization twist on it because it fits the model to match a simulation. It has a couple of examples for odeint and also GEKKO.
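The basic pattern such examples use, rewriting a second-order ODE as a first-order system for odeint, looks roughly like this (a generic damped-oscillator sketch, not the model from the link):

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

def oscillator(state, t, c=0.5, k=2.0):
    # x'' + c*x' + k*x = 0 rewritten as two first-order equations
    x, v = state
    return [v, -c*v - k*x]

t = np.linspace(0, 20, 500)
sol = odeint(oscillator, [1.0, 0.0], t)   # initial position 1, initial velocity 0
plt.plot(t, sol[:, 0])
plt.show()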
The goal is to plot two identical dynamical systems that are coupled.
We have:
X = [x0,x1,x2]
U = [u0,u1,u2]
And
Xdot = f(X) + alpha*(U-X)
Udot = f(U) + alpha*(X-U)
So I wish to plot the solution to this grand system on one set of axes (i.e in xyz for example) and eventually change the coupling strength to investigate the behaviour.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from mpl_toolkits.mplot3d import Axes3D

def couple(s,t,a=0.2,beta=0.2,gamma=5.7,alpha=0.03):
    [x,u] = s
    [u0,u1,u2] = u
    [x0,x1,x2] = x
    xdot = np.zeros(3)
    xdot[0] = -x1-x2
    xdot[1] = x0+a*x1
    xdot[2] = beta + x2*(x0-gamma)
    udot = np.zeros(3)
    udot[0] = -u1-u2
    udot[1] = u0+a*u1
    udot[2] = beta + u2*(u0-gamma)
    sdot = np.zeros(2)
    sdot[0] = xdot + alpha*(u-x)
    sdot[1] = udot + alpha*(x-u)
    return sdot
s_init = [0.1,0.1]
t_init=0; t_final = 300; t_step = 0.01
tpoints = np.arange(t_init,t_final,t_step)
a=0.2; beta=0.2; gamma=5.7; alpha=0.03
y = odeint(couple, s_init, tpoints,args=(a,beta,gamma,alpha), hmax = 0.01)
I imagine that something is wrong with s_init, since it should be TWO initial condition vectors, but when I try that I get "odeint: y0 should be one-dimensional." On the other hand, when I make s_init a 6-vector I get "too many values to unpack (expected two)." With the current setup, I am getting the error
File "C:/Users/Python Scripts/dynsys2019work.py", line 88, in couple
[u0,u1,u2] = u
TypeError: cannot unpack non-iterable numpy.float64 object
Cheers
*Edit: Please note this is basically my first time attempting this kind of thing, and I will be happy to receive further documentation and references.
The ODE function passed to scipy's odeint takes in and returns a 1D vector, and I think some of your confusion is that you actually have one system of ODEs with 6 variables. You have just mentally apportioned it into 2 separate ODEs that are coupled.
You can do it like this:
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import numpy as np
def couple(s,t,a=0.2,beta=0.2,gamma=5.7,alpha=0.03):
    x0, x1, x2, u0, u1, u2 = s
    xdot = np.zeros(3)
    xdot[0] = -x1-x2
    xdot[1] = x0+a*x1
    xdot[2] = beta + x2*(x0-gamma)
    udot = np.zeros(3)
    udot[0] = -u1-u2
    udot[1] = u0+a*u1
    udot[2] = beta + u2*(u0-gamma)
    return np.ravel([xdot, udot])
s_init = [0.1,0.1, 0.1, 0.1, 0.1, 0.1]
t_init=0; t_final = 300; t_step = 0.01
tpoints = np.arange(t_init,t_final,t_step)
a=0.2; beta=0.2; gamma=5.7; alpha=0.03
y = odeint(couple, s_init, tpoints,args=(a,beta,gamma,alpha), hmax = 0.01)
plt.plot(tpoints,y[:,0])
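To match the coupled system in the question, the alpha*(U-X) and alpha*(X-U) terms would still need to be added to xdot and udot before np.ravel. And since the goal was to see both trajectories in xyz, a follow-up plotting sketch (assuming the y array produced above and a matplotlib version recent enough to accept projection='3d' directly) could be:

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot(y[:, 0], y[:, 1], y[:, 2], label='X system')
ax.plot(y[:, 3], y[:, 4], y[:, 5], label='U system')
ax.legend()
plt.show()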
I am using the code below for logistic regression with regularization in Python. It's giving me 80% accuracy on the training set itself.
I am using the minimize method 'TNC'. With BFGS the accuracy is around 50%.
What is the ideal method (equivalent to fminunc in Octave) to use for gradient descent?
How can I increase or decrease the number of iterations?
What is the default number of iterations?
Any other suggestion/approach to improve performance?
The same algorithm in Octave with fminunc gives 83% accuracy on the training set.
import numpy as np
import scipy.optimize as op
from sklearn import preprocessing
import matplotlib.pyplot as plt
from matplotlib import style
from pylab import scatter, show, legend, xlabel, ylabel
from numpy import loadtxt, where
from sklearn.preprocessing import PolynomialFeatures
def sigmoid(z):
    return 1/(1 + np.exp(-z))

def Gradient(theta,X,y,l):
    m,n = X.shape
    #print("theta shape")
    #print(theta.shape)
    theta = theta.reshape((n,1))
    thetaR = theta[1:n,:]
    y = y.reshape((m,1))
    h = sigmoid(X.dot(theta))
    nonRegGrad = ((np.sum(((h-y)*X),axis=0))/m).reshape(n,1)
    reg = np.insert((l/m)*thetaR,0,0,axis=0)
    grad = nonRegGrad + reg
    return grad.flatten()

def CostFunc(theta,X,y,l):
    h = sigmoid(X.dot(theta))
    m,n = X.shape
    #print("theta shape")
    #print(theta.shape)
    theta = theta.reshape((n,1))
    thetaR = theta[1:n,:]
    cost = np.sum((np.multiply(-y,np.log(h)) - np.multiply((1-y),np.log(1-h))))/m
    reg = (l/(2*m))*np.sum(np.square(thetaR))
    J = cost + reg
    return J

def predict(theta,X):
    m,n = X.shape
    return np.round(sigmoid(X.dot(theta.reshape(n,1))))
data = np.loadtxt(open("ex2data2.txt","rb"),delimiter=",",skiprows=1)
nr,nc = data.shape
X=data[:,0:nc - 1]
#X=preprocessing.scale(X)
#X=np.insert(X,0,1,axis=1)
y= data[:,[nc - 1]]
pos = where(y == 1)
neg = where(y == 0)
scatter(X[pos, 0], X[pos, 1], marker='o', c='b')
scatter(X[neg, 0], X[neg, 1], marker='x', c='r')
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')
legend(['Passed', 'Failed'])
show()
storeX=X
poly = PolynomialFeatures(6)
X=poly.fit_transform(X)
#print(X.shape)
m , n = X.shape;
initial_theta = np.zeros((n,1));
#initial_theta = zeros(shape=(it.shape[1], 1))
l = 1
# Compute and display initial cost and gradient for regularized logistic
# regression
#cost, grad = cost_function_reg(initial_theta, X, y, l)
#def decorated_cost(theta):
# return cost_function_reg(theta, X, y, l)
#print fmin_bfgs(decorated_cost, initial_theta, maxfun=400)
print("Calling optimization")
Result = op.minimize(fun = CostFunc,
x0 = initial_theta,
args = (X, y,l),
method = 'TNC',
jac = Gradient);
optimal_theta = Result.x;
print(Result.x.shape)
print("optimal theta")
print(optimal_theta)
p=predict(optimal_theta,X)
accuracy = np.mean(np.double(p==y))
print("accuracy")
print(accuracy)
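For reference on the iteration questions above: scipy.optimize.minimize takes an options dict, and the relevant option name depends on the method (e.g. 'maxiter' for BFGS, 'maxfun' for TNC). A minimal sketch reusing the CostFunc and Gradient defined above (the numbers are arbitrary, not recommendations):

Result = op.minimize(fun=CostFunc, x0=initial_theta, args=(X, y, l),
                     method='TNC', jac=Gradient,
                     options={'maxfun': 500, 'disp': True})   # cap function evaluations and print progress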
I have a function of multiple arguments. I want to optimize it with respect to a single variable while holding the others constant. For that I want to use minimize_scalar from scipy.optimize. I read the documentation, but I am still confused about how to tell minimize_scalar that I want to minimize with respect to the variable w1. Below is a minimal working example.
import numpy as np
from scipy.optimize import minimize_scalar
def error(w0,w1,x,y_actual):
    y_pred = w0 + w1*x
    mse = ((y_actual - y_pred)**2).mean()
    return mse
w0=50
x = np.array([1,2,3])
y = np.array([52,54,56])
minimize_scalar(error,args=(w0,x,y),bounds=(-5,5))
You can use a lambda function
minimize_scalar(lambda w1: error(w0,w1,x,y),bounds=(-5,5))
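A caveat on minimize_scalar itself: depending on the SciPy version, the bounds argument is only honoured by the 'bounded' method (the default Brent method works from a bracket instead), so it can be safer to request it explicitly and read the estimate off the result object; a small usage sketch:

res = minimize_scalar(lambda w1: error(w0, w1, x, y), bounds=(-5, 5), method='bounded')
print(res.x)   # estimated w1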
You can also use a partial function.
from functools import partial
error_partial = partial(error, w0=w0, x=x, y_actual=y)
minimize_scalar(error_partial, bounds=(-5, 5))
In case you are wondering about the performance ... it is the same as with lambdas.
import time
from functools import partial
import numpy as np
from scipy.optimize import minimize_scalar
def error(w1, w0, x, y_actual):
    y_pred = w0 + w1 * x
    mse = ((y_actual - y_pred) ** 2).mean()
    return mse
w0 = 50
x = np.arange(int(1e5))
y = np.arange(int(1e5)) + 52
error_partial = partial(error, w0=w0, x=x, y_actual=y)
p_time = []
for _ in range(100):
    p_time_ = time.time()
    p = minimize_scalar(error_partial, bounds=(-5, 5))
    p_time_ = time.time() - p_time_
    p_time.append(p_time_ / p.nfev)

l_time = []
for _ in range(100):
    l_time_ = time.time()
    l = minimize_scalar(lambda w1: error(w1, w0, x, y), bounds=(-5, 5))
    l_time_ = time.time() - l_time_
    l_time.append(l_time_ / l.nfev)
print(f'Same performance? {np.median(p_time) == np.median(l_time)}')
# Same performance? True
The marked correct answer is actually minimizing with respect to W0. It should be:
minimize_scalar(lambda w1: error(w1,w0,x,y),bounds=(-5,5))