How to do curve_fit in Python

I need to curve-fit a set of data using y = x / (a + x), where a is the parameter I am required to get from this exercise.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = [1, 2, 7, 10, 20, 70, 200, 1000]
y = [0, 0, 15.3, 34.6, 49.3, 82.6, 100]
def fit(x, a):
    return x/(a+x)
par, con = curve_fit(fit, x, y)
plt.plot(x, fit(x, par[0]))
plt.show()
Using this I get some abomination of a fit. Doesn't even remotely fit.
If I try it like this:
def fit(x, a, b):
    return b*x/(a+x)
I get a fit, but it has no rounded corners; it's just straight lines. What am I doing wrong?

Notice that your x is a list of ints; in Python 2, dividing one int by another is integer division by default, which is not what you want here.
Therefore, a few changes will make it work. Use the second function as an example; your first function is not going to fit well anyway, since it has a limit of 1 as x → ∞:
def fit(x, a, b):
    return b*x*1./(a+x)

A, B = curve_fit(fit, x, y)[0]
plt.plot(x, fit(x, A, B))
plt.plot(x, y, 'r+')
plt.savefig('temp.png')
You get a set of straight line segments because you only evaluate y at your x values. To get a smooth curve, plot the fit on a dense grid instead: plt.plot(np.linspace(0, 200, 100), fit(np.linspace(0, 200, 100), A, B))
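Putting the pieces together, here is a minimal sketch of the corrected fit and plot. It assumes x and y are equal-length sequences (as posted in the question they differ in length, which curve_fit will reject):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fit(x, a, b):
    # b is the saturation level, a the half-saturation constant
    return b * x / (a + x)

x = np.asarray(x, dtype=float)  # float arrays avoid integer-division issues
y = np.asarray(y, dtype=float)

(A, B), _ = curve_fit(fit, x, y)

xs = np.linspace(x.min(), x.max(), 200)  # dense grid for a smooth curve
plt.plot(x, y, 'r+', label='data')
plt.plot(xs, fit(xs, A, B), label='fit')
plt.legend(loc='best')
plt.show()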


How to show standard error with curve_fit from scipy in python?

I am trying to fit multiple curve equations to my data to determine what kind of decay curve best represents it. I am using the curve_fit function within scipy in Python.
Here is my example data:
data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Y': [55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0]}
df = pd.DataFrame(data)
I then want to try fitting a linear, logarithmic, second-degree polynomial, and exponential decay curve to my data points and then plotting. I am trying out the following code:
import pandas as pd
import numpy as np
from numpy import array, exp
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# load the dataset
data = df.values
# choose the input and output variables
x, y = data[:, 0], data[:, 1]
def func1(x, a, b):
    return a * x + b

def func2(x, a, b):
    return a * np.log(x) + b

def func3(x, a, b, c):
    return a*x**2 + b*x + c

def func4(x, a, b, c):
    return a*exp(b*x) + c

params, _ = curve_fit(func1, x, y)
a, b = params[0], params[1]
yfit1 = a * x + b
print('Linear decay fit:')
print('y = %.5f * x + %.5f' % (a, b))
params, _ = curve_fit(func2, x, y)
a, b = params[0], params[1]
yfit2 = a * np.log(x) + b
print('Logarithmic decay fit:')
print('y = %.5f * ln(x)+ %.5f' % (a, b))
params, _ = curve_fit(func3, x, y)
a, b, c = params[0], params[1], params[2]
yfit3 = a*x**2+b*x+c
print('Polynomial decay fit:')
print('y = %.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
params, _ = curve_fit(func4, x, y)
a, b, c = params[0], params[1], params[2]
yfit4 = a*exp(x*b)+c
print('Exponential decay fit:')
print('y = %.5f * exp(x*%.5f)+%.5f' % (a, b, c))
plt.plot(x, y, 'bo', label="y-original")
plt.plot(x, yfit1, label="y=a * x + b")
plt.plot(x, yfit2, label="y=a * np.log(x) + b")
plt.plot(x, yfit3, label="y=a*x^2 + bx + c")
plt.plot(x, yfit4, label="y=a*exp(x*b)+c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='best', fancybox=True, shadow=True)
plt.grid(True)
plt.show()
This prints the fitted equation for each model and plots the data together with all four fitted curves.
I had originally tried to find a way to show the r-squared value for each curve fit, but I found out that for non-linear curves R-squared is not suitable, and that instead I should be identifying the standard error of each curve fit to judge which curve equation best describes the decay I am seeing in my data points.
My question is: how can I take my code and the fits of each equation I attempted, and output a standard-error reading for each curve fit, so that I can judge which equation most accurately represents my data? I am ultimately trying to make the judgement call of saying "I have discovered that as the x-axis value increases, the y-axis value decays ..." and filling in that blank with "linearly", "logarithmically", "second-degree quadratically", or "exponentially".
Visually I can see that the exponential decay equation fits my data best, but I do not have a quantitative basis for that call other than eyeballing it. That is why I am trying to find out how to print the standard error, so that I can judge the best-fitting equation as the one with the lowest standard error. I have been unable to work out from the documentation how to derive this from my calculations.
You can use Mean Absolute Scaled Error (MASE) for comparing the goodness of fit. Define a function for MASE as follows:
import numpy as np
def mase(actual: np.ndarray, predicted: np.ndarray):
    forecast_error = np.mean(np.abs(actual - predicted))
    naive_forecast = np.mean(np.abs(np.diff(actual)))
    return forecast_error / naive_forecast
Then simply compare all of the predicted values with the actual values (both must be NumPy arrays to use the function above). You can then select the model/equation with the lowest MASE value. For example:
>>> actual = np.array([1,2,3])
>>> predicted1 = np.array([1.1, 2.2, 3.3])
>>> mase(actual, predicted1)
0.20000000000000004
>>>
>>> predicted2 = np.array([1.1, 2.1, 3.1])
>>> mase(actual, predicted2)
0.10000000000000009
Here we can safely say that predicted2 has the lower MASE value and is therefore the better fit.
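Applied to the question's code, a short sketch like this (using the y and yfit1 … yfit4 arrays computed above) ranks the four candidate models:

# Rank the four fits from the question by MASE (lowest = best)
fits = {'linear': yfit1, 'logarithmic': yfit2,
        'polynomial': yfit3, 'exponential': yfit4}
for name, yfit in fits.items():
    print('%-12s MASE = %.4f' % (name, mase(y, yfit)))

Separately, curve_fit does expose per-parameter standard errors directly: np.sqrt(np.diag(pcov)) gives the one-standard-deviation errors of the fitted parameters, which answers the title question for individual coefficients.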

Intersection point between line and surface in 3D python

I want to be able to find the intersection between a line and a three-dimensional surface.
Mathematically, I have done this by taking the following steps:
Define the (x, y, z) coordinates of the line in a parametric manner. e.g. (x, y, z) = (1+t, 2+3t, 1-t)
Define the surface as a function. e.g. z = f(x, y)
Substitute the values of x, y, and z from the line into the surface function.
By solving, I would be able to get the intersection of the surface and the line
I want to know if there is a method for doing this in Python. I am also open to suggestions on simpler ways of solving for the intersection.
You can use the following code:
import numpy as np
import scipy as sc
import scipy.optimize
from matplotlib import pyplot as plt
def f(x, y):
    """Function of the surface."""
    # example equation
    z = x**2 + y**2 - 10
    return z

p0 = np.array([1, 2, 1])  # starting point for the line
direction = np.array([1, 3, -1])  # direction vector

def line_func(t):
    """Function of the straight line.
    :param t: curve parameter of the line
    :returns: xyz-value as array"""
    return p0 + t*direction

def target_func(t):
    """Function that will be minimized by fmin.
    :param t: curve parameter of the straight line
    :returns: (z_line(t) - z_surface(t))**2 – this is zero
              at intersection points"""
    p_line = line_func(t)
    z_surface = f(*p_line[:2])
    return np.sum((p_line[2] - z_surface)**2)

t_opt = sc.optimize.fmin(target_func, x0=-10)
intersection_point = line_func(t_opt)
The main idea is to reformulate the algebraic condition point_of_line = point_of_surface (the condition for an intersection) as a minimization problem: |point_of_line - point_of_surface| → min. Because the surface is represented as z_surface = f(x, y), it is convenient to measure the distance for a given t-value using only the z-values. This is done in target_func(t). The optimal t-value is then found by fmin.
The correctness and plausibility of the result can be checked with some plotting:
from mpl_toolkits.mplot3d import Axes3D

ax = plt.subplot(projection='3d')
X = np.linspace(-5, 5, 10)
Y = np.linspace(-5, 5, 10)
tt = np.linspace(-5, 5, 100)
XX, YY = np.meshgrid(X, Y)
ZZ = f(XX, YY)
ax.plot_wireframe(XX, YY, ZZ, zorder=0)
LL = np.array([line_func(t) for t in tt])
ax.plot(*LL.T, color="orange", zorder=10)
# mark the intersection point found above
xi, yi, zi = intersection_point
ax.plot([xi], [yi], [zi], "o", color="red", ms=10, zorder=20)
Note that this combination of wireframe and line plots does not handle well which parts of the orange line should be hidden behind the blue wireframe of the surface.
Also note that this type of problem can have anywhere from zero to infinitely many solutions, depending on the actual surface. fmin finds a local optimum; this may be a global optimum with target_func(t_opt) = 0, or it may not be. Changing the initial guess x0 may change which local optimum fmin finds.
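Because the result depends on x0, one simple extension (a sketch, not part of the original answer) is to run the minimization from a grid of starting values and keep the distinct parameters where the target is numerically zero:

# Scan several initial guesses and collect distinct intersection points.
# target_func and line_func are defined above; the grid and tolerances
# are arbitrary choices for this sketch.
roots = []
for t0 in np.linspace(-10, 10, 21):
    t = sc.optimize.fmin(target_func, x0=t0, disp=False)[0]
    if target_func(t) < 1e-8 and not any(abs(t - r) < 1e-4 for r in roots):
        roots.append(t)
intersections = [line_func(t) for t in roots]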

An error in matplotlib related to numpy.roots

I was trying to plot (the modulus of) the sum of the roots of a quadratic, and it returned the error shown below:
import numpy as np
import matplotlib.pyplot as plt
def rooting(a, b, c):
    y = [a, b, c]
    z = np.roots(y)
    return np.absolute(z[0] + z[1])

x = np.linspace(1, 10, 10)
plt.plot(x, rooting(x, 2, 3))
and the error was:
File "C:\Users\user\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 1570, in nonzero
res = nonzero()
SystemError: <built-in method nonzero of numpy.ndarray object at 0x000001422B9BFD00> returned a result with an error set
Can someone tell me what's going on?
The problem arises because you pass an array as the first coefficient while b and c are scalars; np.roots expects a one-dimensional sequence of scalar coefficients. Pass scalars instead, calling the function once per value of x, as in this solution based on your code:
def rooting(a, b, c):
    y = [a, b, c]
    z = np.roots(y)
    return np.absolute(z[0] + z[1])

x = np.linspace(1, 10, 10)
y = [rooting(xi, 2, 3) for xi in x]
plt.plot(x, y)
plt.show()
Using the quadratic formula, we know the roots are (-b ± √(b² - 4ac)) / (2a).
So the sum of the roots is -b/a, and the modulus of the sum is |b/a|.
With this simplification, we can compute the result in a vectorized way (no list comprehension, looping, or repeated calls of rooting necessary):
import numpy as np
import matplotlib.pyplot as plt
def rooting(a, b, c):
    # The roots are (-b ± √(b² - 4ac)) / (2a),
    # so the modulus of their sum is |b/a|
    return np.abs(b/a)

x = np.linspace(1, 10, 10)  # start at 1; a = 0 would divide by zero
plt.plot(x, rooting(x, 2, 3))
plt.show()
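As a quick sanity check (a sketch added here, not in the original answer), the closed form matches np.roots for sample coefficients:

a, b, c = 3.0, 2.0, 3.0
z = np.roots([a, b, c])
print(np.absolute(z[0] + z[1]))  # 0.666... = |-b/a|
print(np.abs(b/a))               # 0.666...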

Linear curve_fit always yields a slope and y-intercept of 1

I am trying to do a linear fit of some data, but I cannot get curve_fit in Python to give me anything but a slope and y-intercept of 1. Here is an example of my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b):
    return a*x + b
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
popt, pcov = curve_fit(func, x, y)
print(popt)
I have also tried giving curve_fit a "guess," but when I do that it gives me an overflow error, which I'm guessing is because the numbers are too large.
Another way of doing this without using curve_fit is to use numpy's polyfit.
import matplotlib.pyplot as plt
import numpy as np
# This is merely a sample of some of my actual data
x = [290., 300., 310.]
y = [1.87e+21, 2.07e+21, 2.29e+21]
xp = np.linspace(290, 310, 100)
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
print(z)
fig, ax = plt.subplots()
ax.plot(x, y, '.')
ax.plot(xp, p(xp), '-')
plt.show()
This prints the coefficients as [2.10000000e+19 -4.22333333e+21] and produces a plot of the data points with the fitted line.
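Another common workaround (a standard normalization trick, not from the answers here) is to rescale y to order 1, fit, and scale the coefficients back; for a model that is linear in its parameters the scaling is exact:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a*x + b

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

scale = 1e21                       # bring y down to order 1
popt, _ = curve_fit(func, x, y / scale)
a, b = popt * scale                # undo the scaling on both coefficients
print(a, b)                        # ≈ 2.1e+19 and ≈ -4.22e+21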
I got something in the same ballpark as Excel's linear fit by using scipy's basinhopping instead of curve_fit, with a large number of iterations. It takes a while to run the iterations and it also requires an error function, but it works without scaling the original data. See the basinhopping docs.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import basinhopping
def func(x0, x_data, y_data):
    # sum of squared residuals for the line y = x0[0]*x + x0[1]
    error = 0
    for x_val, y_val in zip(x_data, y_data):
        error += (y_val - (x0[0]*x_val + x0[1]))**2
    return error
x_data = [290., 300., 310.]
y_data = [1.87e+21, 2.07e+21, 2.29e+21]
a = 1
b = 1
x0 = [a, b]
minimizer_kwargs = { 'method': 'TNC', 'args': (x_data, y_data) }
res = basinhopping(func, x0, niter=1000000, minimizer_kwargs=minimizer_kwargs)
print(res)
This gives x: array([ 7.72723434e+18, -2.38554994e+20]), but if you run it again you'll see that the outcome is not unique, although the values will be in a similar ballpark.
Here's a comparison of the fit with the Excel solution (comparison plot omitted).
Confirmed correct results are returned using:
x = [290., 300., 310.]
y = [300., 301., 302.]
My guess is that magnitudes around 10²¹ are too large for the function to work well.
What you can try doing is taking the logarithm of both sides:
def func(x, a, b):
    # use np.log rather than math.log: curve_fit passes x as an ndarray;
    # might also need to check that a*x + b > 0
    return np.log(a*x + b)
# ... code omitted
y = [48.9802253837, 49.0818355602, 49.1828387704]
Then undo the transformation afterwards.
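A minimal sketch of the complete transformed fit (the starting guess p0 below is an assumption, chosen by rough order of magnitude so that a*x + b stays positive near the optimum):

import numpy as np
from scipy.optimize import curve_fit

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

def log_func(x, a, b):
    return np.log(a*x + b)

p0 = [2e19, -4e21]  # rough order-of-magnitude guess (assumption)
popt, _ = curve_fit(log_func, x, np.log(y), p0=p0)
a, b = popt
print(a, b)         # the model in the original space is y = a*x + b
print(a*x + b)      # undo the transformation; should be close to y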
Also, for a simple linear approximation there is an easy deterministic method: ordinary least squares has a closed-form solution.
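For instance, a sketch of the textbook closed-form least-squares line (slope = cov(x, y)/var(x)):

import numpy as np

x = np.array([290., 300., 310.])
y = np.array([1.87e+21, 2.07e+21, 2.29e+21])

# closed-form ordinary least squares for y = slope*x + intercept
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # ≈ 2.1e+19 and ≈ -4.22e+21, matching polyfit above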

Optimal parameters not found for my curve fitting

Hello, I have a problem fitting some data with Python. I am just beginning to do curve fitting in Python, so I am having some trouble... This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import *
from numpy import linalg as LA
def f(x, a, b, c):
    return a*np.power(x, b) + c
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79])
y = np.array([7200,7925,8050,8200,8000,7550,7500,6800,6400,8150,6566,6280,6105,5963,5673,5495,5395,4800,4550,4558,4228,4087,3951,3817,3721,3612,3498,3416,3359,3269,3163,3241,2984,4475,2757,2644,2555,2600,3163,2720,2630,2543,2454,2441,2389,2339,2293,2261,2212,2180,2143,2450,2065,2032,1994,1960,1930,1897,1870,1838,1821,1785,1763,1741,1718,1689,1676,1662,1635,1635,1667,1633,1617,1615,1599,1581,1565,1547,1547])
params, extras = curve_fit(f, x, y)
plt.plot(x,y, 'o')
plt.plot(x, f(x, params[0], params[1], params[2]))
plt.title('Fit')
plt.legend(['data','fit'],loc='best')
plt.show()
And actually I want to fit my data with the function f(x) = a*x^b + c, where I am looking for the best values of a, b and c.
Do you know what is going wrong?
Thank you for your help.
Three caveats:
- your model is not very good;
- it diverges at x = 0, so don't use the first points;
- you must give initial parameter estimates.
An example:
p0 = [50000, -1, 0]
x = x[10:]
y = y[10:]
params, cov = curve_fit(f, x, y, p0)  # params = [3.16e+04, -5.83e-01, -1.00e+03]
plt.plot(x, y, 'o')
plt.plot(x, f(x, *params))
plt.title('Fit')
plt.legend(['data', 'fit'], loc='best')
plt.show()
You can estimate the quality of the model from the relative uncertainty of the parameters:
In [178]: np.sqrt(np.diag(cov))/params
Out[178]: array([ 0.12066005, -0.12537714, -0.53450057])
which shows that the estimated relative error on the parameters is greater than 10%.
The problem is the function you use for fitting. Consider using something like:
def f(x, a, b, c):
    return a*x + b*np.power(x, 2) + c
EDIT: I accidentally posted the original function instead of the one I wanted to suggest.
