I have a problem: I have two distinct equations, one linear and one exponential. The two should not apply at the same time, so there are two distinct regimes.
Equation 1 (x < a): E*x
Equation 2 (x >=a): a+b*x+c*(1-np.exp(-d*np.array(x)))
That is, the first part of the data should be fit with the linear equation and the rest with equation 2.
The data I'm trying to fit looks like this (I have also added some sample data below, if anyone wants to have a go):
I have already tried several things, starting with a single fit function that uses a Heaviside function:
def fit_fun(x,a,b,c,d,E):
    funktion1=E*np.array(x)
    funktion2=a+b*x+c*(1-np.exp(-d*np.array(x)))
    return np.heaviside(x+a,0)*funktion2+(1-np.heaviside(x+a,0))*funktion1
defining a piecewise function:
def fit_fun(x,a,b,c,d,E):
    return np.piecewise(x, [x <= a, x > a], [lambda x: E*np.array(x), lambda x: a+b*x+c*(1-np.exp(-d*np.array(x)))])
and lastly this version (which unfortunately gives me some kind of function error):
def plast_fun(x,a,b,c,d,E):
    out = E*x
    out [np.where(x >= a)] = a+b*x+c*(1-np.exp(-d+x))
    return out
Don't get me wrong, I do get "some" fits, but they seem to use either one equation or the other and not really both. I also tried several bounds and initial guesses, but the result never changes.
Any input would be greatly appreciated!
Data:
0.000000 -1.570670
0.000434 83.292677
0.000867 108.909402
0.001301 124.121676
0.001734 138.187659
0.002168 151.278839
0.002601 163.160478
0.003035 174.255626
0.003468 185.035092
0.003902 195.629820
0.004336 205.887161
0.004769 215.611995
0.005203 224.752083
0.005636 233.436680
0.006070 241.897851
0.006503 250.352697
0.006937 258.915168
0.007370 267.569337
0.007804 276.199005
0.008237 284.646778
0.008671 292.772349
0.009105 300.489611
0.009538 307.776858
0.009972 314.666291
0.010405 321.224211
0.010839 327.531594
0.011272 333.669261
0.011706 339.706420
0.012139 345.689265
0.012573 351.628362
0.013007 357.488150
0.013440 363.185771
0.013874 368.606298
0.014307 373.635696
0.014741 378.203192
0.015174 382.315634
0.015608 386.064126
0.016041 389.592120
0.016475 393.033854
0.016908 396.454226
0.017342 399.831519
0.017776 403.107084
0.018209 406.277016
0.018643 409.441119
0.019076 412.710982
0.019510 415.987331
0.019943 418.873140
0.020377 421.178098
0.020810 423.756827
So far I have found these two questions, but I couldn't work out how to apply them:
Fit of two different functions with boarder as fit parameter
Fit a curve for data made up of two distinct regimes
I suspect you are making a mistake in the second equation, a+b*x+c*(1-np.exp(-d+x)), where a is the value of x at which you switch from one curve to the other. I think you should use the value of y at that point instead, which is a*E. It is also very important to give the fit initial parameters. I ran the following code with your data in a .txt file, and the fit looks pretty good, as you can see below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import optimize, stats
def fit_fun(x,a,b,c,d,E):
    return np.piecewise(x, [x <= a, x > a], [lambda x: E*x, lambda x: a*E+b*x+c*(1-np.exp(-d*x))])
# raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv('teste.txt', delimiter=r'\s+', header=None)
df.columns = ['x','y']
xdata = df['x']
ydata = df['y']
p0 = [0.001,1,1,1,100000]
popt, pcov = optimize.curve_fit(fit_fun, xdata.values, ydata.values, p0=p0, maxfev=10000, absolute_sigma=True, method='trf')
print(popt)
plt.plot(xdata, ydata,'*')
plt.plot(xdata, fit_fun(xdata.values, *popt), 'r')
plt.show()
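One thing to note about the model above: at x = a the two branches generally do not meet, because the second branch adds b*a + c*(1-exp(-d*a)) on top of a*E. If a continuous curve is wanted, one possible variant (an assumption on my part, not something the question asks for) is to measure the second branch from the breakpoint. The name continuous_fit_fun and the parameter values in the check are hypothetical:

```python
import numpy as np

def continuous_fit_fun(x, a, b, c, d, E):
    """Linear below the breakpoint a, linear plus saturating exponential above it."""
    x = np.asarray(x, dtype=float)
    return np.piecewise(
        x,
        [x <= a],
        [
            lambda t: E * t,
            # measured from the breakpoint, so this branch equals E*a at t == a
            lambda t: E * a + b * (t - a) + c * (1 - np.exp(-d * (t - a))),
        ],
    )

# Quick check with made-up parameter values: both sides of the breakpoint agree
a, b, c, d, E = 0.001, 1000.0, 100.0, 200.0, 1e5
left, right = continuous_fit_fun([a, a + 1e-9], a, b, c, d, E)
print(left, right)
```

It takes the same argument order as fit_fun, so it should be usable with optimize.curve_fit and the same p0 layout.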
I have the following dataset from a mechanical indentation test:
https://www.dropbox.com/s/jovjl55sjjyph3r/Test%20dataset.csv?dl=0
The graph shows the displacement of a spherical probe vs force recorded. I need to fit these data with specific equations (DMT model if you are familiar with it).
I have produced the code below, but I am unable to get a good fitting result. The code gives no errors or warnings, so I don't know whether the problem is in the plotting or in the actual fitting.
Did I write the fitting code correctly? Did I pass the variables Fad and R correctly into the function? Is the code that plots the fitting curve correct?
Also, in the code you can notice 2 different fitting functions. Function1 is based on 2 equations:
a = ((R/K)*(x+Fad))^(1/3)
y = ((a^2)/R)
Function2 is the same as function1 but the 2 equations are combined in a single equation. The funny thing is that they give 2 different plots!
Importantly, I'd like to use the 2 equations method because there are other more complex models that I should use to fit the same dataset. In these models the equations cannot be combined that easily like in this case.
Any help from the Community to solve this problem would be very much appreciated.
import pandas
from matplotlib import pyplot as plt
from scipy import optimize
import numpy as np
df = pandas.read_table("Test dataset.csv", sep = ',', header=0)
df = df.astype(float) #Change data from object to float
print(df.shape)
print(df)
df_Fad = -(df.iloc[0, 0])
print("Adhesion force = {} N".format(df_Fad))
R = 280*1e-6
print("Probe radius = {} m".format(R))
df_x = df["Corr Force A [N]"].to_list()
df_y = df["Corr Displacement [m]"].to_list()
#Define fitting function1
def DMT(x, R, K, Fad):
    a = ((R/K)*(x+Fad))**(1/3)
    return ((a**2)/R)
custom_DMT = lambda x, K: DMT(x, R, K, df_Fad) #Fix Fad value
pars_DMT, cov_DMT = optimize.curve_fit(f=custom_DMT, xdata=df_x, ydata=df_y)
print("K = ", round(pars_DMT[0],2))
print ("E = ", round(pars_DMT[0]*(4/3),2))
ax0 = df.plot(kind='scatter', x="Corr Force A [N]", y="Corr Displacement [m]", color='lightblue')
plt.plot(df_x, DMT(np.array(df_y), pars_DMT[0], R, df_Fad), "--", color='black')
ax0.set_title("DMT fitting")
ax0.set_xlabel("Force / N")
ax0.set_ylabel("Displacement / m")
ax0.legend(['Dataset'])
plt.tight_layout()
#Define fitting function2 => function2 = function1 in one line
def DMT2(x, Fad, R, K):
    return ((x+Fad)**(2/3))/((R**(1/3))*(K**(2/3)))
custom_DMT2 = lambda x, K: DMT2(x, df_Fad, R, K) #Fix Fad value
pars_DMT2, cov_DMT2 = optimize.curve_fit(f=custom_DMT2, xdata=df_x, ydata=df_y)
print("K = ", round(pars_DMT2[0],2))
print ("E = ", round(pars_DMT2[0]*(4/3),2))
ax1 = df.plot(kind='scatter', x="Corr Force A [N]", y="Corr Displacement [m]", color='lightblue')
plt.plot( df_x, DMT2(np.array(df_y), pars_DMT2[0], df_Fad, R), "--", color='black')
ax1.set_title("DMT fitting")
ax1.set_xlabel("Force / N")
ax1.set_ylabel("Displacement / m")
ax1.legend(['Dataset'])
plt.tight_layout()
plt.show()
After further attempts and research I have solved the problem even though a doubt on a line of the code above still remains. I decided to post my solution here hoping that this could be helpful to others.
I could see that the code:
def DMT(x, R, K, Fad):
    a = ((R/K)*(x+Fad))**(1/3)
    return ((a**2)/R)
works well in fitting the experimental data, meaning that several equations can easily be used and combined in Python, which is great. It is important, though, that the arguments are passed in the same order as in the function signature (x, R, K, Fad). Failing to do this gives wrong results.
The problem remains in this line of code:
plt.plot(df_x, DMT(np.array(df_y), pars_DMT[0], R, df_Fad), "--", color='black')
Initially I thought that the order of the parameters (x, R, K, Fad) was wrong. I tried:
plt.plot(df_x, DMT(np.array(df_y), R, pars_DMT[0], df_Fad), "--", color='black')
but this didn't solve the problem. Anyone who can tell me what's wrong with this line?
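A possible explanation, offered as a sketch rather than a confirmed diagnosis: both plotting calls evaluate DMT at df_y (the displacements) instead of df_x (the forces), so the curve drawn is not the fitted model, and the first call additionally swaps R and K. Evaluating the model at the forces, with the arguments in signature order, reproduces the fitted curve. The data below is synthetic, and Fad and K_true are made-up values used only for illustration:

```python
import numpy as np
from scipy import optimize

R = 280e-6      # probe radius from the question, in m
Fad = 1e-4      # hypothetical adhesion force, in N
K_true = 1e6    # hypothetical value, used only to generate synthetic data

def DMT(x, R, K, Fad):
    a = ((R / K) * (x + Fad)) ** (1 / 3)
    return (a ** 2) / R

# Synthetic force/displacement data standing in for df_x and df_y
force = np.linspace(0.0, 1e-3, 50)
displacement = DMT(force, R, K_true, Fad)

# Fit K only; R and Fad are fixed through the lambda
pars, cov = optimize.curve_fit(lambda x, K: DMT(x, R, K, Fad),
                               force, displacement, p0=[5e5])

# Evaluate the fitted model at the forces (x), not at the displacements (y),
# and pass the arguments in the order of the signature: x, R, K, Fad
fitted = DMT(force, R, pars[0], Fad)
print("K =", pars[0])
```

With the question's variables, the corresponding plotting call would be plt.plot(df_x, DMT(np.array(df_x), R, pars_DMT[0], df_Fad), "--", color='black').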
Anyway, my way around this problem was to directly calculate the y data of the fitting line from the calculated parameter K (R and df_Fad are fixed) using the following code:
df["Fitting Displacement [m]"] = ((df["Corr Force A [N]"]+df_Fad)**(2/3))/((R**(1/3))*(pars_DMT[0]**(2/3)))
This was anyway a necessary step to do in order to save the fitting results.
I am trying to fit some experimental data to a nonlinear function with one parameter. The function includes an arccosine, whose domain is limited to [-1, 1]. I use scipy's curve_fit to find the parameter, but it raises the following error:
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 400.
The function I want to fit is this one:
def fitfunc(x, a):
    y = np.rad2deg(np.arccos(x*np.cos(np.deg2rad(a))))
    return y
For the fit, I provide numpy arrays for x and y, both containing values in degrees (which is why the function converts to and from radians).
param, param_cov = curve_fit(fitfunc, xs, ys)
When I use other fit functions, for example a polynomial, curve_fit returns values; the error above only occurs with this function, which includes an arccosine.
I suspect that it cannot fit the data points because, depending on the parameter, some data points fall outside the domain of the arccosine. I have tried raising the maximum number of function evaluations (maxfev), but without success.
Sample data:
ys = np.array([113.46125, 129.4225, 140.88125, 145.80375, 145.4425,
146.97125, 97.8025, 112.91125, 114.4325, 119.16125,
130.13875, 134.63125, 129.4375, 141.99, 139.86,
138.77875, 137.91875, 140.71375])
xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248,
4.958273121, 2.679526725, 2.409388637, 2.606306639, 3.661558062,
4.569923009, 4.836843789, 3.377013596, 3.664550526, 4.335401233,
3.064199519, 3.97155254, 4.100567011])
As HS-nebula mentioned in the comments, you need to provide an initial value a0 for a as a starting guess for the fit. You also need to choose a0 carefully: np.arccos() is only defined on [-1, 1], and a poorly chosen a0 results in an error.
import numpy as np
from scipy.optimize import curve_fit
ys = np.array([113.46125, 129.4225, 140.88125, 145.80375, 145.4425, 146.97125,
97.8025, 112.91125, 114.4325, 119.16125, 130.13875, 134.63125,
129.4375, 141.99, 139.86, 138.77875, 137.91875, 140.71375])
xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248, 4.958273121,
2.679526725, 2.409388637, 2.606306639, 3.661558062, 4.569923009, 4.836843789,
3.377013596, 3.664550526, 4.335401233, 3.064199519, 3.97155254, 4.100567011])
def fit_func(x, a):
    a_in_rad = np.deg2rad(a)
    cos_a_in_rad = np.cos(a_in_rad)
    arcos_xa_product = np.arccos( x * cos_a_in_rad )
    return np.rad2deg(arcos_xa_product)
a0 = 80
param, param_cov = curve_fit(fit_func, xs, ys, a0, bounds = (0, 360))
print('Using curve we retrieve a value of a = ', param[0])
Output:
Using curve we retrieve a value of a = 100.05275506147824
However if you choose a0=60, you get the following error:
ValueError: Residuals are not finite in the initial point.
To be able to use the data with all possible values of a, a normalization as HS-nebula suggested is a good idea.
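To see why a0 = 80 works while a0 = 60 fails, it may help to compute which values of a keep x*cos(a) inside [-1, 1] for every data point. The helper below is a sketch; valid_a_range is a hypothetical name, and it only considers angles between 0 and 180 degrees:

```python
import numpy as np

xs = np.array([2.786427013, 3.325624466, 3.473013087, 3.598247534, 4.304280248, 4.958273121,
               2.679526725, 2.409388637, 2.606306639, 3.661558062, 4.569923009, 4.836843789,
               3.377013596, 3.664550526, 4.335401233, 3.064199519, 3.97155254, 4.100567011])

def valid_a_range(x):
    """Angles a in degrees (0..180) for which |x*cos(a)| <= 1 holds for all x."""
    limit = 1.0 / np.max(np.abs(x))   # largest |cos(a)| the data allows
    if limit >= 1.0:
        return 0.0, 180.0             # every angle is fine
    lo = np.rad2deg(np.arccos(limit))
    return lo, 180.0 - lo

lo, hi = valid_a_range(xs)
print(lo, hi)
```

For this data the interval is roughly 78 to 102 degrees, which contains a0 = 80 and the fitted value but not a0 = 60. Passing (lo, hi) as bounds to curve_fit instead of (0, 360) is one way to keep the optimizer inside the valid region, although I have not verified that it changes the result here.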
I have a set of data made up of (2-dimensional) observations of multiple objects. The observations can be described by a general function plus an offset that is unique to each object. I want to use curve_fit to simultaneously recover the general function and the offsets for each object (with associated errors). I do not know in advance how many objects the data-set will be made up of, only that there are likely to be multiple observations of each.
So a generalised data set of 7 observations might look like this:
[[x[0], y1[0], y2[0], lab='A'],
[x[1], y1[1], y2[1], lab='B'],
[x[2], y1[2], y2[2], lab='A'],
[x[3], y1[3], y2[3], lab='A'],
[x[4], y1[4], y2[4], lab='B'],
[x[5], y1[5], y2[5], lab='C'],
[x[6], y1[6], y2[6], lab='A']]
I could do the task by passing the parameters of the general function (say g = [g0, g1, g2]) and the object offsets offsets = n x [o1, o2] to fit_func and then using an object label to decide which of the n offsets needs to be added to the general function, except that I can't figure out how to pass the label.
def fit_func(x, g, offsets, lab):
    y1 = g[0] * cos(2*(x - g[1])) + offsets['lab',0] + g[2]
    y2 = g[0] * sin(2*(x - g[1])) + offsets['lab',1] + g[2]
    return [y1, y2]
The problem is that lab is not a float to be fit, so I can't figure out how to pass it. From reading some other threads I believe I will need a wrapper function, but I can't figure out what form it should take, and then how to call it in such a way that I can specify sigma and p0.
Can anyone point me in the right direction?
Edit: I managed to produce a function that I thought would work. It used a global parameter call to choose options within the function call. So, for example I interleaved the y1 and y2 arrays, and had the function call the second equation every second run with a global getEven() and setEven(bool) call. However curve_fit really didn't like that. The fit values were nonsensical.
At the moment I am fitting the equation for y1 and the equation for y2 separately and taking the rms to determine g0 and g1 (this also gives me offsets['A',0] and offsets['A',1] respectively). I could just do this multiple times, once for each object in the set, but I can't fit the g2 parameter this way, since in any given call to the y1 or y2 function it is degenerate with the corresponding offset.
Here is example code that fits two different equations with a shared parameter using 'A' or 'B' decoding. It appears to work as you need for decoding the lab type, but I personally have never done this before and while it appears to function per your post the "text-to-float" conversion inside the function seems klunky to me. But it works.
import numpy
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# single array with all "X" data to pass around
num = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
ids = numpy.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
xdata = numpy.array([num, ids]) # combine data, numpy auto-converts to 'text' type
# ydata is numeric single array
ydata = [9.0,8.0,7.0,6.0,4.0,3.0,2.0,1.0]
def fitFunction(data, commonParameter, pA, pB):
    numericDataAsText = data[0]
    textData = data[1]
    returnArray = []
    for i in range(len(textData)):
        x = float(numericDataAsText[i])
        if textData[i] == 'A':
            val = commonParameter + x * pA
        elif textData[i] == 'B':
            val = commonParameter + x * pB
        else:
            raise Exception('Error: must use A or B')
        returnArray.append(val)
    return returnArray
initialParameters = [1.0, 1.0, 1.0]
# fit both equations in a single call; commonParameter is shared between them
params, pcov = curve_fit(fitFunction, xdata, ydata, initialParameters)
# values for display of fitted function
commonParameter, pA, pB = params
# for plotting the fitting results
y_fit = fitFunction(xdata, commonParameter, pA, pB)
plt.plot(xdata[0], ydata, 'D') # plot the raw data as a scatterplot
plt.plot(xdata[0][:4], y_fit[:4])
plt.plot(xdata[0][4:], y_fit[4:])
plt.show()
print('fittedparameters:', params)
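An alternative that avoids the text-to-float conversion is to keep the labels out of xdata entirely and capture them in a closure, which also handles an unknown number of objects. This is a sketch under simplifying assumptions: the shared part is a plain straight line rather than the cos/sin model from the question, and make_model is a hypothetical helper name:

```python
import numpy as np
from scipy.optimize import curve_fit

def make_model(labels):
    """Build a curve_fit model with one shared slope and one offset per unique label."""
    uniq, idx = np.unique(labels, return_inverse=True)

    def model(x, slope, *offsets):
        # idx maps each observation to the offset of its object
        return slope * x + np.asarray(offsets)[idx]

    return model, uniq

# Synthetic data: shared slope 0.5, offsets A=1, B=-2, C=4 (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
labels = np.array(['A', 'A', 'B', 'A', 'B', 'C', 'C', 'B'])
true_offsets = {'A': 1.0, 'B': -2.0, 'C': 4.0}
y = 0.5 * x + np.array([true_offsets[lab] for lab in labels])

model, uniq = make_model(labels)
# p0 must be given: with *offsets, curve_fit cannot infer the number of parameters
p0 = np.ones(1 + len(uniq))
popt, pcov = curve_fit(model, x, y, p0=p0)
print(dict(zip(['slope', *uniq], popt)))
```

Because sigma and p0 are ordinary curve_fit arguments, they can be supplied in the same call; the offsets in popt follow the sorted order of the unique labels.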
It would be helpful to show a more complete example of what you are trying, including the call to scipy.optimize.curve_fit. But, if I understand the question correctly, you want to have an argument for your model function that is not treated as a variable in the fit. I believe that curve_fit cannot do this, and treats all arguments after the first as variables.
In fact, I think that your model function will not work for curve_fit because you expect g to be a sequence of values. With curve_fit, each argument after the first will get a single float value. So you probably want something like
def func(x, g0, g1, g2, offsets):
    y1 = g0 * cos(2*(x - g1)) + offsets['lab', 0] + g2
    ...
Anyway, I have two suggestions to work around this limitation of curve_fit:
First, you could overload x. Now, curve_fit will internally apply numpy.asarray() to the x you pass in, but it will otherwise just pass it along to your model function. So, if you turn x into a list containing your real x and your lab, you should be able to unpack this in your model function, say like
xhack = [x, offsets]
def func(x, g0, g1, g2):
    x, offsets = x
    ....
out = curve_fit(func, xhack, ...)
Personally, I think that's kind of ugly, but it might work.
Second, you could use lmfit (https://lmfit.github.io/lmfit-py/), which provides a higher level interface to curve fitting and fixes many of the shortcomings of curve_fit. For your question in particular, lmfit's Model class for curve fitting examines the model function more carefully to turn function arguments into fitting parameters. Specifically:
keyword arguments with non-numerical defaults will not be turned into fit parameters.
you can specify more than 1 "independent variable", and they do not have to be the first argument of the function.
That is, you could either write:
from lmfit import Model
def func(x, g0, g1, g2, offsets=None):
    y1 = g0 * cos(2*(x - g1)) + offsets['lab', 0] + g2
mymodel = Model(func)
or explicitly tell Model what the independent variables are:
from lmfit import Model
def func(x, g0, g1, g2, offsets):
    y1 = g0 * cos(2*(x - g1)) + offsets['lab', 0] + g2
mymodel = Model(func, independent_vars=['x', 'offsets'])
Either way, offsets can be any complex objects, and you would use this mymodel for curve fitting with:
# create parameter objects for this model, with initial values:
params = mymodel.make_params(g0=0, g1=0.5, g2=2.0)
# run the fit
result = mymodel.fit(ydata, params, x=x, offsets=offsets)
There are lots of other conveniences we added to lmfit (I am one of the developers) for building curve fitting models and working with parameters as high-level objects, but this might be enough to get you started.
I'm looking for a way to generate a plot similar to how ezplot works in MATLAB in that I can type:
ezplot('x^2 + y^2 = y + 5')
and get a graph ready to go for any arbitrary function. I'm only worrying about the case where I have both a x and a y.
I only have the function, and I'd really rather not go about trying to calculate all the y values for some given x range if I didn't have to.
The few solutions I've seen suggested are either about decision boundaries (which this is not. There is no test data or anything, just an arbitrary function) or are all for functions already defined as y = some x equation which doesn't really help me.
I would somewhat accept if there was a good way to mimic Wolfram|Alpha in their solve functionality("solve x^2 + y^2 = y + 5 for y" will give me two functions I could then graph separately), but rather prefer the ezplot as that's more or less instant within MATLAB.
I think you could use sympy plotting and parse_expr for this. For your example, this would work as follows:
from sympy.plotting import plot_implicit
from sympy.parsing.sympy_parser import parse_expr
def ezplot(s):
    #Parse doesn't parse = sign so split
    lhs, rhs = s.replace("^","**").split("=")
    eqn_lhs = parse_expr(lhs)
    eqn_rhs = parse_expr(rhs)
    plot_implicit(eqn_lhs-eqn_rhs)
ezplot('x^2 + y^2 = y + 5')
This can be made as general as needed
You could use sympy to solve the equation and then use the resulting functions for plotting y over x:
import numpy
import sympy
x=sympy.Symbol('x')
y=sympy.Symbol('y')
f = sympy.solve(x**2 + y**2 - y - 5, [y])
print(f)
xpts = (numpy.arange(10.)-5)/10
ypts = sympy.lambdify(x, f, 'numpy')(xpts)
# then e.g.: pylab.scatter(xpts, ypts)
@EdSmith's solution works fine. Nevertheless, I have another suggestion: you can plot a contour. Rewrite your equation in the form f(x, y) = 0, and then use this code:
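from numpy import mgrid, pi
import matplotlib.pyplot as plt
def ezplot(f):
    # 51j asks mgrid for 51 points per axis (a real 51 would be read as a step size)
    x, y = mgrid[-2*pi:2*pi:51j, -2*pi:2*pi:51j]
    z = f(x, y)
    # draw only the zero level, i.e. the curve f(x, y) = 0
    ezplt = plt.contour(x, y, z, [0], colors='k')
    return ezplt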
That's the main idea. Of course, you can generalize it as the function in MATLAB, like general intervals of x and y, passing the function as a string, etc.
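As a sketch of the generalization to arbitrary intervals, applied to the equation from the question rewritten as x^2 + y^2 - y - 5 = 0. The xlim, ylim, and n parameters are my own additions, not part of MATLAB's interface:

```python
import numpy as np
import matplotlib.pyplot as plt

def ezplot(f, xlim=(-2 * np.pi, 2 * np.pi), ylim=(-2 * np.pi, 2 * np.pi), n=201):
    """Draw the curve f(x, y) = 0 on an n-by-n grid over the given intervals."""
    x, y = np.meshgrid(np.linspace(*xlim, n), np.linspace(*ylim, n))
    # only the zero level is drawn, so the contour is the implicit curve
    return plt.contour(x, y, f(x, y), levels=[0], colors='k')

cs = ezplot(lambda x, y: x**2 + y**2 - y - 5, xlim=(-4, 4), ylim=(-4, 4))
plt.gca().set_aspect('equal')
```

Call plt.show() afterwards to display the figure. A finer grid (larger n) gives a smoother curve at the cost of more function evaluations.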
I am trying to write a function that returns the x value of some data where the y value is approximately zero. The function is given two lists, for example [1,4,5] for the x values and [-3,5,9] for the y values. I wrote it using interpolation, then indexing: first I find the index where the interpolated y value is closest to zero, and then I use that index to look up the x value. (Please note: I added the graph and the y = 0 line for illustrative purposes only.)
def root(xs, ys):
    xfine = np.linspace(min(xs), max(xs), 10000)
    y0 = inter.interp1d(xs, ys, kind = 'linear')
    f1 = xfine, y0(xfine)
    x2fine = np.linspace(min(xs), max(xs), 10000)
    y2 = np.linspace(0,0, 10000)
    f2 = x2fine, y2
    pl.plot(xfine, y0(xfine))
    pl.plot(x2fine, y2)
    pl.show()
    closest = min(abs(y0(xfine)))
    xindex = np.searchsorted(y0(xfine), closest)
    print(round(xfine[xindex], 3))
This appears to give me the right answers, but I am told I should use brentq in my function. However, I am only given data like that mentioned above, and I thought brentq needs two continuous functions as input, doesn't it? How can I make this work with brentq when I only have a few numbers instead of a function?
Although you can use brentq on the interpolated function, since you are already using interpolation, just use it to invert the function:
finv = inter.interp1d(ys, xs)
print(finv(0))
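If brentq is required, it can be applied to the interpolating function itself: brentq takes one continuous function plus an interval [a, b] on which the function changes sign, not two functions. A minimal sketch, assuming the data brackets exactly one sign change between min(xs) and max(xs):

```python
from scipy.interpolate import interp1d
from scipy.optimize import brentq

def root(xs, ys):
    """Return the x at which the linear interpolant of (xs, ys) crosses zero."""
    f = interp1d(xs, ys, kind='linear')
    # brentq needs f(a) and f(b) to have opposite signs; float() unwraps
    # the 0-d array that interp1d returns for a scalar input
    return brentq(lambda x: float(f(x)), min(xs), max(xs))

x0 = root([1, 4, 5], [-3, 5, 9])
print(round(x0, 3))
```

Unlike the inverse-interpolation trick, this does not require the y values to be monotonic, only that the end points of the interval have opposite signs.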