How to fit a math function to a dataset in Python [closed]

I have three sets of data as shown below:
I wonder what function they follow and how to fit these curves in Python.
I guess the first function is something like:
y = a*x^b + c*x + d
I tried some arbitrary parameters:
import numpy
from matplotlib.pyplot import scatter, show

x = numpy.arange(1, 10000, 2.)
a = 100.
b = -0.003
c = 50.
d = 0.1
y = -a/x**d + b*x + c
scatter(x, y)
show()
The figure looks like this:
Could anyone help with the other two?

This is how I would solve this, as a total beginner with NumPy and SciPy:
First, create a Python function that we think calculates y for every x. (This equation is in the question above.)
def fx(x, a, b, c, d):
    return -a / x**d + b*x + c
Then we need some data from the graph, which was not included in the question, so I guessed a bit, using the green graph in the question:
x = [0.01, 1000, 2000, 4000, 6000, 8000, 10000]
y = [1, 1.67, 1.75, 1.67, 1.6, 1.5, 1.4]
(x = 0 is an invalid point, since 0**d is 0 and we can't divide by zero. That's why I use 0.01 instead.)
Then we let SciPy calculate what the constants should be:
from scipy.optimize import curve_fit
popt, pcov = curve_fit(fx, x, y)
print(popt)
This results in "RuntimeError: Optimal parameters not found". We can help by manually guessing a value for d:
def fx(x, a, b, c):
    d = -0.1
    return -a / x**d + b*x + c
After running curve_fit again with this three-parameter function, the popt variable will contain values for a, b, and c:
[-5.64063556e-01 -6.55610681e-05 6.41890483e-01]
It's helpful to use a graphing calculator like Desmos when experimenting like this and trying out different values.
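Putting those pieces together, a minimal runnable sketch of this answer's approach (same guessed data points, same hand-picked d = -0.1) might look like this:

import numpy as np
from scipy.optimize import curve_fit

# Data points guessed from the green graph in the question
x = np.array([0.01, 1000, 2000, 4000, 6000, 8000, 10000])
y = np.array([1, 1.67, 1.75, 1.67, 1.6, 1.5, 1.4])

def fx(x, a, b, c):
    d = -0.1   # fixed by hand, as described above
    return -a / x**d + b*x + c

popt, pcov = curve_fit(fx, x, y)
print(popt)   # roughly [-0.564, -6.56e-05, 0.642], as listed above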

There is a function in scipy that can be used to fit a function to data: scipy.optimize.curve_fit.
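For example, a minimal sketch of how it is typically called, using synthetic data built from the made-up parameters in the first question (the noise level and starting values here are arbitrary):

import numpy as np
from scipy.optimize import curve_fit

def fx(x, a, b, c, d):
    return -a / x**d + b*x + c

# Synthetic data from the question's arbitrary parameters, plus some noise
rng = np.random.default_rng(0)
x = np.linspace(1, 10000, 200)
y = fx(x, 100.0, -0.003, 50.0, 0.1) + rng.normal(0, 0.5, x.size)

# p0 supplies starting values; without a reasonable guess, curve_fit can
# stop with "RuntimeError: Optimal parameters not found" for models like this
popt, pcov = curve_fit(fx, x, y, p0=[100.0, -0.003, 50.0, 0.1], maxfev=20000)
print(popt)
# Note: a, c and d are strongly correlated in this model, so the fitted
# values can drift from the generating ones even when the curve matches.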

Thanks for all your help. I got a solution from my friends.
y = -a/x**d+b*x+c
Since d makes the fit complicated, it is easier to step d through fixed values from 0.1 to 1.0, use curve_fit to fit the remaining parameters for each value, and then keep the best parameter set, as in the sketch below.
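A rough sketch of that idea, with the guessed points from the first answer standing in for the real measurements (the step size for d and the maxfev value are arbitrary choices):

import numpy as np
from scipy.optimize import curve_fit

# Stand-in data: the guessed points from the first answer
x = np.array([0.01, 1000, 2000, 4000, 6000, 8000, 10000])
y = np.array([1, 1.67, 1.75, 1.67, 1.6, 1.5, 1.4])

def make_fx(d):
    # return a model with d frozen, so curve_fit only fits a, b and c
    def fx(x, a, b, c):
        return -a / x**d + b*x + c
    return fx

best = None
for d in np.arange(0.1, 1.05, 0.1):          # candidate values of d
    fx = make_fx(d)
    try:
        popt, _ = curve_fit(fx, x, y, maxfev=10000)
    except RuntimeError:                     # this d did not converge
        continue
    resid = np.sum((fx(x, *popt) - y) ** 2)  # sum of squared residuals
    if best is None or resid < best[0]:
        best = (resid, d, popt)

if best is not None:
    resid, d, (a, b, c) = best
    print("best d = %.1f  a = %g  b = %g  c = %g" % (d, a, b, c))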

Related

How to convert the Matlab function code into Python code? [closed]

I have been coding in Matlab for a few years and recently switched to Python. How could I convert the Matlab function below into Python 3 code?
function [estimates, model] = curvefitting(x, y, numOfpoint)
    model = @expfun;
    estimates = point(model, numOfpoint);
    function [sse, FittedCurve] = expfun(params)
        A = params(1);
        B = params(2);
        C = params(3);
        FittedCurve = A*(x-B).^C;
        ErrorVector = FittedCurve - y;
        sse = sum(ErrorVector .^ 2);
    end
end
What is the Python equivalent of @expfun? How could I make model = @expfun work in Python?
Instead of doing a line-by-line translation, I'd recommend using the tools available in Python. In this case, if you are trying to perform a curve fit, first define your fit function:
import numpy as np

def func(x, a, b, c):
    return a * np.power(x - b, c)
Then you use scipy.optimize.curve_fit:
from scipy.optimize import curve_fit
# Assume these were already populated
x = np.array([...])
y = np.array([...])
# Perform curve fit
popt, pcov = curve_fit(func, x, y)
# Get fitted y-values at each x point
fit_y = func(x, *popt)
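If you also want the sum of squared errors that the Matlab code returns as sse, a short follow-up to the snippet above (reusing its fit_y, y, and np) could be:

# mirrors `sse = sum(ErrorVector .^ 2)` from the Matlab version
error_vector = fit_y - y
sse = np.sum(error_vector ** 2)
print(sse)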

Fitting parameter inside an integral using python (or another useful language)

I have a set of data, basically the values of f(x) as a function of x together with x itself. From the theory of the problem I'm working on, I know the form of f(x), which is given by the expression below (x appears in the upper integration limit):
f(x) = integral from 0 to x of dt / (a*sqrt(b*(1+t)^3 + c*(1+t)^4))
Essentially, I want to use this set of data to find the parameters a and b. My problem is: how can I do that? What library should I use? I would like an answer using Python, but R or Julia would be fine as well.
From everything I have done so far, I've read about a SciPy function called curve_fit, but I'm having trouble working out how to write the code, given that my x variable sits in one of the integration limits.
For better ways of working with the problem, I also have the following resources:
A sample set for which I know the parameters I'm looking for: for this set I know that a = 2 and b = 1 (and c = 3). Before anyone asks how I know these parameters: I know them because I created this sample set by integrating the equation above with exactly those values, so that I could use it as a reference while investigating how to recover them.
I also have this set, for which the only information I have is that c = 4, and I want to find a and b.
I would also like to point out that:
i) Right now I have no code to post here, because I don't have a clue how to write something that solves my problem. But I would be happy to edit and update the question after reading any answer or help you could provide.
ii) I'm looking first for a solution where I don't know a or b, but if that is too hard, I would be happy to see a solution that assumes either a or b is known.
EDIT 1: For anyone interested in this problem, I would also like to reference this question, as it is a parallel but important discussion of the problem faced here.
I would use a pure numeric approach, which you can use even when you cannot solve the integral directly. Here's a snippet for fitting only the a parameter:
import numpy as np
from scipy.optimize import curve_fit
import pandas as pd
import matplotlib.pyplot as plt

def integrand(x, a):
    b = 1
    c = 3
    return 1/(a*np.sqrt(b*(1+x)**3 + c*(1+x)**4))

def integral(x, a):
    dx = 0.001
    xx = np.arange(0, x, dx)
    arr = integrand(xx, a)
    return np.trapz(arr, dx=dx, axis=-1)

vec_integral = np.vectorize(integral)

df = pd.read_csv('data-with-known-coef-a2-b1-c3.csv')
x = df.domin.values
y = df.resultados2.values

out_mean, out_var = curve_fit(vec_integral, x, y, p0=[2])

plt.plot(x, y)
plt.plot(x, vec_integral(x, out_mean[0]))
plt.title(f'a = {out_mean[0]:.3f} +- {np.sqrt(out_var[0][0]):.3f}')
plt.show()
Of course, you can lower the value of dx to get the desired precision. This works for fitting just a; when you try to fit b as well, the fit does not converge properly (in my opinion because a and b are strongly correlated). Here's what you get:
def integrand(x, a, b):
    c = 3
    return 1/(a*np.sqrt(np.abs(b*(1+x)**3 + c*(1+x)**4)))

def integral(x, a, b):
    dx = 0.001
    xx = np.arange(0, x, dx)
    arr = integrand(xx, a, b)
    return np.trapz(arr, dx=dx, axis=-1)

vec_integral = np.vectorize(integral)

out_mean, out_var = curve_fit(vec_integral, x, y, p0=[2, 3])

plt.title(f'a = {out_mean[0]:.3f} +- {np.sqrt(out_var[0][0]):.3f}\nb = {out_mean[1]:.3f} +- {np.sqrt(out_var[1][1]):.3f}')
plt.plot(x, y, alpha=0.4)
plt.plot(x, vec_integral(x, out_mean[0], out_mean[1]), color='green', label='fitted solution')
plt.plot(x, vec_integral(x, 2, 1), '--', color='red', label='theoretical solution')
plt.legend()
plt.show()
As you can see, even though the resulting a and b parameters from the fit are "not good", the plot is very similar.
There are three variables a, b, c, which are not independent. One of them must be given if we want to compute the other two by regression. With c given, solving for a and b is simple:
The example of numerical calculation below is made with a small data set (n = 10) in order to make it easy to check.
Note that the regression is for the function t(y), which is not exactly the same as for y(x) when the data is scattered (the result is the same if there is no scatter).
If it is absolutely necessary to have the regression for y(x), a non-linear regression is required. This involves an iterative process starting from good enough initial guesses for a and b. The above calculation gives very good initial values.
IN ADDITION:
Meanwhile, Andrea posted a pertinent answer. Of course the fit with his method is better, because it is a non-linear regression instead of a linear one, as already pointed out in the note above.
Nevertheless, despite the different values (a = 1.881; b = 1.617) compared with (a = 2.346, b = -0.361), the respective curves drawn below are not far from one another:
Blue curve : from linear regression (above method)
Green curve : from non-linear regression ( Andrea's )
CASE OF THE SECOND SET OF DATA
https://mega.nz/#!echEjQyK!tUEx0gpFND7gucvsTONiB_wn-ewBq-5k-pZlfLxmfvw
The regression fails because the assumption c=3 is false.
In the case c = 0, the analytic calculation of the integral is different from the above:

Curve fitting of Hyperbola and finding its parameters associated [closed]

Assume the general formula of the hyperbola to be y = 1 / (a*x + b). We are provided with 100 data points, of which 99 fit the hyperbola exactly and one (unknown) data point does not. From this information we need to find approximate values of the parameters a and b for the hyperbola formed by the correct data points.
The approach I took was to use the scipy.optimize.curve_fit method, as parameters, _ = optimize.curve_fit(test, x_data, y_data),
where my test function was def test(x, a, b): return 1 / (a*x + b). This method gives me a perfect solution if my data points all lie in the first quadrant, but if the data is distributed over more than one quadrant the approach fails and I get wrong values of a and b.
Code:
import numpy as np
from scipy import optimize

x_data = np.linspace(-5, 1, num=99)
y_data = 1 / (5 * x_data + 4)   # original values: a = 5 and b = 4

np.random.seed(0)

# adding a wrong point at the 36th position
x_data = np.insert(x_data, 36, 7)
y_data = np.insert(y_data, 36, 5)

def test(x, a, b):
    return 1 / (a*x + b)

parameters, _ = optimize.curve_fit(test, x_data, y_data)
[a, b] = parameters
# a = 146.83956808191303
# b = 148.78257639384725
# which is far from the true values
A solution for the above would certainly be appreciated.
Your problem is easy if the given points "exactly fit the hyperbola," and you need only two data points.
Your equation y = 1 / (ax + b) can be transformed into
(x*y) * a + (y) * b = 1
That is a linear equation in a and b. Use two data points to find the corresponding values of x * y and y and you end up with two linear equations in two variables (though in a and b rather than x and y). Then solve those two linear equations. This can easily be automated. This also does not depend on the quadrants of your two given points.
This does not work if your given points only approximate a hyperbola, of course.
In your last edit, you added the statement that only 99 of the points fit the hyperbola and one does not. This can be handled by choosing three pairs of your points (six distinct points) and finding the hyperbola that goes through each pair. That means calculating three hyperbolas (equivalently, three values of a and b). If all three pairs of a and b agree to within a small tolerance, the non-matching point was not in the chosen sample and you have your hyperbola. If only two of them agree within the tolerance, use that hyperbola. If no pair of the hyperbolas agree, some mistake was made and you should raise an exception.
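A sketch of that procedure under the stated assumptions (all points exact except one outlier); which six points are sampled and the comparison tolerance are arbitrary choices for illustration:

import numpy as np

def hyperbola_from_two_points(p1, p2):
    # Solve (x*y)*a + y*b = 1 for a and b from two exact points
    (x1, y1), (x2, y2) = p1, p2
    A = np.array([[x1 * y1, y1],
                  [x2 * y2, y2]])
    return np.linalg.solve(A, np.array([1.0, 1.0]))   # returns [a, b]

def robust_hyperbola(points, tol=1e-6):
    # Fit three candidate hyperbolas from three disjoint pairs of points
    # and return the parameters on which at least two of them agree
    pairs = [(points[0], points[1]),
             (points[2], points[3]),
             (points[4], points[5])]
    candidates = [hyperbola_from_two_points(*p) for p in pairs]
    for i in range(3):
        for j in range(i + 1, 3):
            if np.allclose(candidates[i], candidates[j], atol=tol):
                return candidates[i]
    raise ValueError("no two candidate hyperbolas agree")

# Example with the data from the question (a = 5, b = 4, one bad point at index 36)
x = np.linspace(-5, 1, num=99)
y = 1 / (5 * x + 4)
x = np.insert(x, 36, 7)
y = np.insert(y, 36, 5)
pts = list(zip(x, y))
print(robust_hyperbola(pts[34:40]))   # sample includes the bad point; prints values close to [5, 4]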

python - how to find area under curve? [closed]

I would like to ask if it is possible to calculate the area under the curve for a fitted distribution curve.
The curve would look like this:
I've seen some posts online regarding the use of trapz, but I'm not sure whether it will work for a curve like that. Please enlighten me, and thank you for the help!
If your distribution, f, is discretized on a set of points, x, that you know about, then you can use scipy.integrate.trapz or scipy.integrate.simps directly (pass f, x as arguments in that order). For a quick check (e.g. that your distribution is normalized), just sum the values of f and multiply by the grid spacing:
import numpy as np
from scipy.integrate import trapz, simps
x, dx = np.linspace(-100, 250, 50, retstep=True)
mean, sigma = 90, 20
f = np.exp(-((x-mean)/sigma)**2/2) / sigma / np.sqrt(2 * np.pi)
print('{:18.16f}'.format(np.sum(f)*dx))
print('{:18.16f}'.format(trapz(f, x)))
print('{:18.16f}'.format(simps(f, x)))
Output:
1.0000000000000002
0.9999999999999992
1.0000000000000016
First, you have to find a function from the graph (you can check here). Then you can use integration in Python with SciPy (you can check here for integration).
It is just math stuff as Daniel Sanchez says.
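A hedged sketch of that two-step idea, assuming a simple Gaussian-shaped curve and made-up sample points: fit the curve with scipy.optimize.curve_fit, then integrate the fitted function with scipy.integrate.quad.

import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import quad

def gauss(x, a, mean, sigma):
    return a * np.exp(-((x - mean) / sigma) ** 2 / 2)

# Made-up sample points standing in for the data behind the plotted curve
x = np.linspace(0, 200, 40)
y = gauss(x, 1.0, 90, 20) + np.random.default_rng(1).normal(0, 0.01, x.size)

popt, _ = curve_fit(gauss, x, y, p0=[1, 100, 30])
area, err = quad(lambda t: gauss(t, *popt), x.min(), x.max())
print(area)   # area under the fitted curve over the sampled range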

curve_fit from scipy generates wrong values

I have several data points from measurements, and I wanted to fit my own function to this data. For that I wanted to use curve_fit, so I implemented it in the following way:
import math
import numpy
import scipy.special
from scipy.optimize import curve_fit

def func(self, w, H, T, L):
    B = (L + 0.25)*((2 * math.pi / T)**4)
    S = (B**L)*H*H / (4 * scipy.special.gamma(L)*(w**(4 * L + 1)))*numpy.exp(-B / (w*w*w*w))
    return S

def fit_this(self):
    self.popt, self.copt = curve_fit(self.func, self.xdata, self.ydata)
    print self.popt
    print self.copt
self.xdata and self.ydata contain my measured values. I already did the fit with the same function in another program (Matlab), so I already know working parameters. But when I start this program, popt contains definitely wrong parameters, and the covariance of the values is somewhere in the range of 10e+22 and therefore completely useless. Why is that happening (my guess: wrong starting values), and how can I improve the situation? If providing my data helps, I can provide it too.
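No answer is recorded here, but if the guess about wrong starting values is right, one common remedy is to pass an explicit initial guess through curve_fit's p0 argument. A sketch of a modified fit_this, with placeholder numbers standing in for the parameter values that already worked in Matlab:

def fit_this(self):
    # p0 gives curve_fit a starting point close to the expected solution;
    # the numbers below are placeholders for the values known from Matlab
    initial_guess = [1.0, 10.0, 1.0]   # H, T, L (placeholders)
    self.popt, self.copt = curve_fit(self.func, self.xdata, self.ydata,
                                     p0=initial_guess)
    print self.popt
    print self.copt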
