curve_fit multivariable arrays non-linear regression - python

I am trying to fit the coefficients of a multivariate function with curve_fit. All variables are arrays of shape (1000,). Manually I can fit the curves as follows. First I define my function, where the variables are [dphi1,dphi2,phi1,phi2,M] and the coefficients are [c1,c2,c3,c4,c5,c6,c7,c8,c9]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import math
# Function definition
def ddphi1(dphi1,dphi2,phi1,phi2,M,c1,c2,c3,c4,c5,c6,c7,c8,c9):
    return (-(c1*np.sin(phi1-phi2)*np.cos(phi1-phi2)*dphi1**2)-(c2*np.sin(phi1-phi2)*dphi2**2)+(c3*np.cos(phi1-phi2)*np.sin(phi2))+(c4*np.cos(phi1-phi2)*(dphi2-dphi1))-(c5*np.sin(phi1))+c6*M-c7*dphi1)/(c8-(c9*np.cos(phi1-phi2)*np.cos(phi1-phi2)))
My first guess for the coefficients:
p = [0.5625, 0.375, 27.590625000000003, 0.09375, 55.18125, 62.5, 0.425, 1, 0.5625]
I calculate the values of the function iteratively. I take the length of any one of the variables, since they all have the same size:
n = len(time1)
y = np.empty(n)
for i in range(n):
    y[i] = ddphi1(dphi11[i],dphi22[i],phi11[i],phi22[i],M[i],p[0],p[1],p[2],p[3],p[4],p[5],p[6],p[7],p[8])
plt.plot(time1, ddphi11)
plt.plot(time1, y, 'r')
[Figure: predicted vs. real data]
Now the idea is to calculate the coefficients automatically with curve_fit as follows. Here ddphi1 is my callback function and ddphi11 is my data of shape (1000,), like the other variables:
from scipy.optimize import curve_fit
g = [0.56, 0.37, 27.63, 0.094, 55.18, 62.5, 0.625, 1, 0.56]
c,cov =curve_fit(ddphi1,(dphi1,dphi2,phi1,phi2,M),ddphi11,g)
print(c)
and I receive this error:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-158-e8e42e7b1216> in <module>()
1 from scipy.optimize import curve_fit
2
----> 3 c,cov =curve_fit(ddphi1,(dphi1,dphi2,phi1,phi2,M),ddphi11,g=[0.56, 0.37, 27.63, 0.094, 55.18, 62.5, 0.625, 1, 0.56])
4 print(c)
1 frames
/usr/local/lib/python3.7/dist-packages/scipy/optimize/minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
719 # non-array_like `xdata`.
720 if check_finite:
--> 721 xdata = np.asarray_chkfinite(xdata, float)
722 else:
723 xdata = np.asarray(xdata, float)
/usr/local/lib/python3.7/dist-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
484
485 """
--> 486 a = asarray(a, dtype=dtype, order=order)
487 if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
488 raise ValueError(
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.
I have seen that most of the data passed to curve_fit is in the form of lists. Maybe there is a solution when dealing with arrays? I hope you can help me, as I am new to Python.

I was finally able to solve it. The arrays just needed to be stacked into a single global array:
X=np.column_stack([dphi11,dphi22,phi11,phi22,M])
Then define the model in terms of that array:
def model(X,c1,c2,c3,c4,c5,c6,c7,c8,c9):
    dphi1 = X[:,0]
    dphi2 = X[:,1]
    phi1 = X[:,2]
    phi2 = X[:,3]
    M = X[:,4]
    f = (-(c1*np.sin(phi1-phi2)*np.cos(phi1-phi2)*dphi1**2)-(c2*np.sin(phi1-phi2)*dphi2**2)+(c3*np.cos(phi1-phi2)*np.sin(phi2))+(c4*np.cos(phi1-phi2)*(dphi2-dphi1))-(c5*np.sin(phi1))+c6*M-c7*dphi1)/(c8-(c9*np.cos(phi1-phi2)*np.cos(phi1-phi2)))
    return f
And the magic begins:
guesses = [0.56, 0.37, 27.63, 0.094, 55.18, 62.5, 0.625, 1, 0.56]
from scipy.optimize import curve_fit
popt, pcov = curve_fit(model, X, ddphi11, guesses)
print(popt)
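For anyone who wants to try the stacking approach without the original measurements, here is a minimal sketch on synthetic data. The three-parameter toy_model, the variable names u, v, w, and the random inputs are all illustrative assumptions and have nothing to do with the pendulum model above. The only point is that curve_fit hands the whole X array to the model as its first argument, so the model has to unpack the columns itself.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Hypothetical independent variables, each of shape (1000,)
u = rng.uniform(-1.0, 1.0, 1000)
v = rng.uniform(-1.0, 1.0, 1000)
w = rng.uniform(-1.0, 1.0, 1000)

# Stack them column-wise into one array of shape (1000, 3)
X = np.column_stack([u, v, w])

def toy_model(X, a, b, c):
    # curve_fit passes X through unchanged, so unpack the columns here
    u, v, w = X[:, 0], X[:, 1], X[:, 2]
    return a * np.sin(u) + b * v**2 + c * w

true_params = (1.5, -0.7, 3.0)
y = toy_model(X, *true_params)  # noise-free synthetic observations

popt, pcov = curve_fit(toy_model, X, y, p0=[1.0, 1.0, 1.0])
print(popt)
```

Because the data are noise-free, popt should land very close to true_params; with real measurements you would inspect pcov as well.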

Related

Curve_Fit returns error "Result from function Call is not a proper array of floats"

I am trying to call scipy curve_fit(), with the proper:
model function
xdata (float numpy 1D Array)
ydata (float numpy 1D Array)
p (float numpy 1D Array, initial values)
However I am getting the error:
ValueError: Object too deep for desired Array
Result from function Call is not a proper array of floats.
The model function I am computing is:
[Image: the mathematical expression evaluated by model_f, from which we are trying to find the optimal alpha and gamma]
The function model_f computes the mathematical expression shown in the picture.
import csv
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

with open("Data_case_3.csv", 'r') as i:  # open a file in the directory of this script for reading
    rawdata = list(csv.reader(i, delimiter=","))  # make a list of the rows in the file
exampledata = np.array(rawdata[1:], dtype=float)  # skip the first row, convert to a float array (np.float was removed in NumPy 1.24)
xdata = exampledata[:,0]
ydata = exampledata[:,1]
m = 0.5
omega0 = 34.15
k = np.square(omega0)*m
def model_f(x,a,g):
    zetaeq = (a*np.sqrt(np.pi)*(x**(g-1))*omega0*math.gamma(g/2))/(2*np.pi*k*math.gamma((3+g)/2))
    return zetaeq
#------------------------------------------------------------------------------
funcdata = model_f(xdata,0.3,0.1)
plt.plot(xdata,funcdata,label="Model")
plt.legend()
popt, pcov = curve_fit(model_f, xdata, ydata, p0=[0.3,0.1])
And I am attaching the data types of the variables mentioned:
[Image: variable types and shapes in the script]
Can you help me understand what I am doing wrong?
Compare these 2 calls to curve_fit:
In [217]: xdata, ydata = np.ones(5), np.ones(5)
In [218]: curve_fit(model_f, xdata, ydata, p0=[0.3, 0.1])
Out[218]:
(array([0.74436049, 0.02752099]),
array([[2.46401533e-16, 9.03501810e-18],
[9.03501810e-18, 3.31294823e-19]]))
and
In [219]: xdata, ydata = np.ones((5,1)), np.ones((5,1))
In [220]: curve_fit(model_f, xdata, ydata, p0=[0.3, 0.1])
ValueError: object too deep for desired array
Traceback (most recent call last):
Input In [220] in <module>
curve_fit(model_f, xdata, ydata, p0=[0.3, 0.1])
File /usr/local/lib/python3.8/dist-packages/scipy/optimize/_minpack_py.py:789 in curve_fit
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
File /usr/local/lib/python3.8/dist-packages/scipy/optimize/_minpack_py.py:423 in leastsq
retval = _minpack._lmdif(func, x0, args, full_output, ftol, xtol,
error: Result from function call is not a proper array of floats.
Which is closer to your experience?
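If the second call matches what you see, one likely fix is to flatten the arrays to 1-D before fitting. This is an assumption on my part, since the actual shapes in the question are only shown in a picture. The sketch below uses a hypothetical stand-in model (power_model) and synthetic column-shaped data rather than model_f and the CSV file.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_model(x, a, b):
    # Hypothetical stand-in model, not the model_f from the question
    return a * x**b

# Column-shaped data of shape (n, 1), as in the failing call above
x_col = np.linspace(1.0, 5.0, 20).reshape(-1, 1)
y_col = power_model(x_col, 2.0, 0.5)
print(x_col.shape, y_col.shape)  # (20, 1) (20, 1)

# Flatten both to shape (n,) before handing them to curve_fit
x_flat = x_col.ravel()
y_flat = y_col.ravel()

popt, pcov = curve_fit(power_model, x_flat, y_flat, p0=[1.0, 1.0])
print(x_flat.shape, popt)
```

ravel() returns a view when it can, so this costs essentially nothing; np.squeeze or reshape(-1) would serve the same purpose.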

Fitting zenithal equal area projection with astropy and fit_wcs_from_points

I'm trying to use astropy.wcs.utils.fit_wcs_from_points to fit points projected with zenithal equal area projection (WCS code ZEA). This projection is popular in all-sky cameras.
I have started by projecting a set of celestial coordinates with a known WCS, to see if I can recover it. Input values are obtained by:
x, y = w.world_to_pixel(lon * u.deg, lat * u.deg)
world_coords = SkyCoord(lon * u.deg, lat * u.deg)
and the projection is:
from astropy import wcs
w = wcs.WCS(naxis=2)
scale = 0.095
w.wcs.crpix = [1290, 1950]
w.wcs.cdelt = [scale, scale]
w.wcs.crval = [0, 90]
w.wcs.ctype = ["ALON-ZEA", "ALAT-ZEA"]
In all my tests I get the following exception when I perform the fitting:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-28-cfb0b01c0d18> in <module>
----> 1 astropy.wcs.utils.fit_wcs_from_points([x, y], world_coords,
2 proj_point=SkyCoord(0 * u.deg, 90 * u.deg),
3 projection='ZEA')
/usr/lib64/python3.9/site-packages/astropy/wcs/utils.py in fit_wcs_from_points(xy, world_coords, proj_point, projection, sip_degree)
1076 # and cd terms are way off.
1077 p0 = np.concatenate([wcs.wcs.cd.flatten(), wcs.wcs.crpix.flatten()])
-> 1078 fit = least_squares(_linear_wcs_fit, p0,
1079 args=(lon, lat, xp, yp, wcs))
1080 wcs.wcs.crpix = np.array(fit.x[4:6])
/usr/lib64/python3.9/site-packages/scipy/optimize/_lsq/least_squares.py in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
812
813 if not np.all(np.isfinite(f0)):
--> 814 raise ValueError("Residuals are not finite in the initial point.")
815
816 n = x0.size
ValueError: Residuals are not finite in the initial point.
Things I have tried, none of which changed the result:
pass the actual test WCS transformation in projection, instead of ZEA
set CRPIX to 0 in my test WCS, so all the points are around (0 ,0)
remove proj_point
try with astropy 4.2.1 (latest released)
This sample code reproduces the problem:
import numpy as np
import astropy.wcs.utils
from astropy.coordinates import SkyCoord
import astropy.units as u
x0 = np.array([702.4, 1480.4, 1223.5, 897, 1916.6])
y0 = np.array([1925.8, 2269.3, 2679.1, 1632.7, 1586.3])
zea_lon = [268.8, 145.6, 181.6, 305.4, 56.5]
zea_lat = [31.8, 54.0, 15.2, 40.6, 16.1]
world_coords0 = SkyCoord(zea_lon * u.deg, zea_lat * u.deg)
astropy.wcs.utils.fit_wcs_from_points([x0, y0], world_coords0,
                                      proj_point=SkyCoord(0 * u.deg, 90 * u.deg),
                                      projection='ZEA')

skcuda.linalg.PCA's fit_transform throws error

I am trying to run PCA (Principal component analysis) on GPU. I am using skcuda.linalg.PCA for that purpose, but it's not working. From their tutorial (https://scikit-cuda.readthedocs.io/en/latest/generated/skcuda.linalg.PCA.html):
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as np
import skcuda.linalg as linalg
from skcuda.linalg import PCA as cuPCA
pca = cuPCA(n_components=4) # map the data to 4 dimensions
X = np.random.rand(1000,100) # 1000 samples of 100-dimensional data vectors
X_gpu = gpuarray.GPUArray((1000,100), np.float64, order="F") # note that order="F" or a transpose is necessary. fit_transform requires row-major matrices, and column-major is the default
X_gpu.set(X) # copy data to gpu
T_gpu = pca.fit_transform(X_gpu) # calculate the principal components
When I run it I get the following error:
cublasInternalError Traceback (most recent call last)
<ipython-input-31-02aaf0fa19e4> in <module>
8 X_gpu = gpuarray.GPUArray((1000,100), np.float64, order="F") # note that order="F" or a transpose is necessary. fit_transform requires row-major matrices, and column-major is the default
9 X_gpu.set(X) # copy data to gpu
---> 10 T_gpu = pca.fit_transform(X_gpu)
/opt/conda/lib/python3.7/site-packages/skcuda/linalg.py in fit_transform(self, X_gpu)
204 cuGemv (self.h, 'n', p, k, -1.0, P_gpu.gpudata, p, U_gpu.gpudata, 1, 1.0, P_gpu[:,k].gpudata, 1)
205
--> 206 l2 = cuNrm2(self.h, p, P_gpu[:,k].gpudata, 1)
207 cuScal(self.h, p, 1.0/l2, P_gpu[:,k].gpudata, 1)
208 cuGemv(self.h, 'n', n, p, 1.0, R_gpu.gpudata, n, P_gpu[:,k].gpudata, 1, 0.0, T_gpu[:,k].gpudata, 1)
/opt/conda/lib/python3.7/site-packages/skcuda/cublas.py in cublasDnrm2(handle, n, x, incx)
1295 n, int(x), incx,
1296 ctypes.byref(result))
-> 1297 cublasCheckStatus(status)
1298 return np.float64(result.value)
1299
/opt/conda/lib/python3.7/site-packages/skcuda/cublas.py in cublasCheckStatus(status)
177 raise cublasError
178 else:
--> 179 raise e
180
181 # Helper functions:
cublasInternalError
Initially, I was running on my own data and I got this error. Then I decided to run the example and I got exactly the same error. Does anyone know what the problem is here? I am using a Kaggle notebook with a Tesla T4 GPU. Thanks.

hierarchical clustering in scipy - memory error

Here is my code:
import numpy as np
from scipy.cluster.hierarchy import fclusterdata
def mydist(p1,p2):
    return 1
Y = np.random.randn(100000,2)
fclust1 = fclusterdata(Y, 1.0, metric=mydist)
It produces the following error:
MemoryError Traceback (most recent call last)
<ipython-input-52-818db8791e96> in <module>()
----> 1 fclust1 = fclusterdata(Y, 1.0, metric=mydist)
C:\Anaconda3\lib\site-packages\scipy\cluster\hierarchy.py in fclusterdata(X, t, criterion, metric, depth, method, R)
1682 'array.')
1683
-> 1684 Y = distance.pdist(X, metric=metric)
1685 Z = linkage(Y, method=method)
1686 if R is None:
C:\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, p, w, V, VI)
1218
1219 m, n = s
-> 1220 dm = np.zeros((m * (m - 1)) // 2, dtype=np.double)
1221
1222 wmink_names = ['wminkowski', 'wmi', 'wm', 'wpnorm']
MemoryError:
So I am guessing my vector is too large. I am a bit surprised, since my distance function is trivial. What is the maximum size of vector that fclusterdata can accept?
Hierarchical clustering usually requires a pairwise distance matrix.
That means you need O(n^2) memory. And it does not 'see' that your distance is constant (and it doesn't make sense to optimize for this either).
It's not a very scalable algorithm.
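To put a number on it: the traceback shows pdist allocating m * (m - 1) // 2 entries of np.double. A quick back-of-the-envelope calculation for the 100000 points in the question (plain arithmetic, nothing SciPy-specific):

```python
n = 100_000                 # number of points in Y
n_pairs = n * (n - 1) // 2  # entries in the condensed distance matrix
bytes_needed = n_pairs * 8  # each entry is a float64 (np.double), 8 bytes

print(n_pairs)              # 4999950000
print(bytes_needed / 1e9)   # roughly 40 (GB)
```

So the distance array alone needs about 40 GB before linkage even starts, which is why the allocation fails regardless of how cheap the metric is.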

Fitting a curve to a power-law distribution with curve_fit does not work

I am trying to find a curve fitting my data that visually seem to have a power law distribution.
I hoped to utilize scipy.optimize.curve_fit, but no matter what function or data normalization I try, I am getting either a RuntimeError (parameters not found or overflow) or a curve that does not fit my data even remotely. Please help me to figure out what I am doing wrong here.
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
df = pd.DataFrame({
'x': [ 1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000 ],
'y': [ 1100, 500, 288, 200, 113, 67, 52, 44, 5 ]
})
df.plot(x='x', y='y', kind='line', style='--ro', figsize=(10, 5))
def func_powerlaw(x, m, c, c0):
    return c0 + x**m * c
target_func = func_powerlaw
X = df['x']
y = df['y']
popt, pcov = curve_fit(target_func, X, y)
plt.figure(figsize=(10, 5))
plt.plot(X, target_func(X, *popt), '--')
plt.plot(X, y, 'ro')
plt.legend()
plt.show()
Output
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-243-17421b6b0c14> in <module>()
18 y = df['y']
19
---> 20 popt, pcov = curve_fit(target_func, X, y)
21
22 plt.figure(figsize=(10, 5))
/Users/evgenyp/.virtualenvs/kindle-dev/lib/python2.7/site-packages/scipy/optimize/minpack.pyc in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, **kwargs)
653 cost = np.sum(infodict['fvec'] ** 2)
654 if ier not in [1, 2, 3, 4]:
--> 655 raise RuntimeError("Optimal parameters not found: " + errmsg)
656 else:
657 res = least_squares(func, p0, args=args, bounds=bounds, method=method,
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
As the traceback states, the maximum number of function evaluations was reached without finding a stationary point (to terminate the algorithm). You can increase the maximum number using the option maxfev. For this example, setting maxfev=2000 is large enough to successfully terminate the algorithm.
However, the solution is not satisfactory. This is due to the algorithm choosing a (default) initial estimate for the variables, which, for this example, is not good (the large number of iterations required is an indicator of this). Providing another initialization point (found by simple trial and error) results in a good fit, without the need to increase maxfev.
The two fits are computed as follows (the original answer also showed a plot comparing them with the data):
x = np.asarray([ 1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000 ])
y = np.asarray([ 1100, 500, 288, 200, 113, 67, 52, 44, 5 ])
sol1 = curve_fit(func_powerlaw, x, y, maxfev=2000 )
sol2 = curve_fit(func_powerlaw, x, y, p0 = np.asarray([-1,10**5,0]))
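Without the plot, one rough way to judge the second fit is to compare the sum of squared residuals at the initial guess and at the fitted parameters. The sse helper below is my own addition for illustration, not part of SciPy, and I have left out printed numbers because they may vary slightly between SciPy versions.

```python
import numpy as np
from scipy.optimize import curve_fit

def func_powerlaw(x, m, c, c0):
    return c0 + x**m * c

x = np.asarray([1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000], dtype=float)
y = np.asarray([1100, 500, 288, 200, 113, 67, 52, 44, 5], dtype=float)

def sse(params):
    # Hypothetical helper: sum of squared residuals for a parameter vector
    return float(np.sum((func_powerlaw(x, *params) - y) ** 2))

# Same starting point as sol2 above
p0 = np.asarray([-1, 10**5, 0], dtype=float)
popt, pcov = curve_fit(func_powerlaw, x, y, p0=p0)

print("SSE at initial guess:", sse(p0))
print("SSE after fitting:   ", sse(popt))
print("fitted m, c, c0:     ", popt)
```

A large drop in SSE from the initial guess to the fitted values is a sign that the optimizer actually moved to a better point rather than stalling.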
Your func_powerlaw is not strictly a power law, as it has an additive constant.
Generally speaking, if you want a quick visual appraisal of a power law relation, you would
plot(log(x),log(y))
or
loglog(x,y)
Both of them should give a straight line, although there are subtle differences between them (in particular, regarding curve fitting).
All this without the additive constant, which messes up the power law relation.
If you want to fit a power law that weighs data according to the log-log scale (typically desirable), you can use the code below.
import numpy as np
from scipy.optimize import curve_fit
def powlaw(x, a, b):
    return a * np.power(x, b)
def linlaw(x, a, b):
    return a + x * b
def curve_fit_log(xdata, ydata):
    """Fit data to a power law with weights according to a log scale"""
    # Weights according to a log scale
    # Apply fscalex
    xdata_log = np.log10(xdata)
    # Apply fscaley
    ydata_log = np.log10(ydata)
    # Fit linear
    popt_log, pcov_log = curve_fit(linlaw, xdata_log, ydata_log)
    #print(popt_log, pcov_log)
    # Apply fscaley^-1 to fitted data
    ydatafit_log = np.power(10, linlaw(xdata_log, *popt_log))
    # There is no need to apply fscalex^-1 as original data is already available
    return (popt_log, pcov_log, ydatafit_log)
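Note that the parameters returned this way live in log space: the first one is log10 of the power-law prefactor and the second is the exponent. A short sketch of the same idea applied to the data from the question, with the conversion back to y = amplitude * x**exponent spelled out (the names intercept, exponent and amplitude are mine):

```python
import numpy as np
from scipy.optimize import curve_fit

def linlaw(x, a, b):
    return a + x * b

x = np.asarray([1000, 3250, 5500, 10000, 32500, 55000, 77500, 100000, 200000], dtype=float)
y = np.asarray([1100, 500, 288, 200, 113, 67, 52, 44, 5], dtype=float)

# Fit a straight line to the log10-transformed data
popt_log, pcov_log = curve_fit(linlaw, np.log10(x), np.log10(y))

intercept, exponent = popt_log
amplitude = 10.0 ** intercept              # prefactor of the power law
y_fit = amplitude * np.power(x, exponent)  # fitted curve on the original scale

print("amplitude:", amplitude, "exponent:", exponent)
```

Since the log-space problem is linear, no initial guess is needed here, which sidesteps the maxfev issue from the question.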
