Lognormal Curve Fit - python

I have lognormally distributed data in x0 and y0 arrays:
x0.ravel() = array([19.8815 , 19.0141 , 18.1857 , 17.3943 , 16.6382 , 15.9158 ,
15.2254 , 14.5657 , 13.9352 , 13.3325 , 12.7564 , 12.2056 ,
11.679 , 11.1755 , 10.6941 , 10.2338 , 9.79353, 9.37249,
8.96979, 8.58462, 8.21619, 7.86376, 7.52662, 7.20409,
6.89552, 6.6003 , 6.31784, 6.04757, 5.78897, 5.54151,
5.30472, 5.07812, 4.86127, 4.65375, 4.45514, 4.26506,
4.08314, 3.90903, 3.74238, 3.58288, 3.4302 , 3.28407,
3.14419, 3.01029, 2.88212, 2.75943, 2.64198, 2.52955,
2.42192, 2.31889, 2.22026, 2.12583, 2.03543, 1.94889,
1.86604, 1.78671, 1.71077, 1.63807, 1.56845, 1.50181,
1.43801, 1.37691, 1.31842, 1.26242, 1.2088 , 1.15746,
1.10832, 1.06126, 1.01619])
y0.ravel() = array([1.01567e+03, 8.18397e+02, 7.31992e+02, 1.11397e+03, 2.39987e+03,
2.73762e+03, 4.65722e+03, 7.06308e+03, 9.67945e+03, 1.38983e+04,
1.98178e+04, 1.97461e+04, 3.28070e+04, 4.48814e+04, 5.80853e+04,
7.35511e+04, 8.94090e+04, 1.08274e+05, 1.28276e+05, 1.50281e+05,
1.69258e+05, 1.91944e+05, 2.16416e+05, 2.37259e+05, 2.57426e+05,
2.74818e+05, 2.90343e+05, 3.01369e+05, 3.09232e+05, 3.13713e+05,
3.17225e+05, 3.19177e+05, 3.17471e+05, 3.14415e+05, 3.08396e+05,
2.95692e+05, 2.76097e+05, 2.52075e+05, 2.29330e+05, 1.97843e+05,
1.74262e+05, 1.46360e+05, 1.20599e+05, 9.82223e+04, 7.80995e+04,
6.34618e+04, 4.77460e+04, 3.88737e+04, 3.23715e+04, 2.58129e+04,
2.15724e+04, 1.58737e+04, 1.13006e+04, 7.64983e+03, 4.64590e+03,
3.31463e+03, 2.40929e+03, 3.02183e+03, 1.47422e+03, 1.06046e+03,
1.34875e+03, 8.26674e+02, 9.53167e+02, 6.47428e+02, 9.83651e+02,
8.93673e+02, 1.23637e+03, 0.00000e+00, 8.36573e+01])
I want to use curve_fit to get a function that fits my data points, in order to obtain the mu (and then exp(mu) for the median) and the sigma of this distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def f(x, mu, sigma):
    return 1/(np.sqrt(2*np.pi)*sigma*x)*np.exp(-((np.log(x)-mu)**2)/(2*sigma**2))

params, extras = curve_fit(f, x0.ravel(), y0.ravel())
print("mu=%g, sigma=%g" % (params[0], params[1]))
plt.plot(x0, y0, "o")
plt.plot(x0, f(x0, params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
The result is the following:
mu=1.47897, sigma=0.0315236
[plot: the fitted curve against the data]
Obviously the function does not fit the data by any means.
When I multiply the fitting function by, let's say, 1.3*10^5 in the code:
plt.plot(x0, 1.3*10**5*f(x0, params[0], params[1]))
This is the result:
[plot: the manually scaled fitting curve]
The calculated mu value, which is the mean of the underlying normal distribution, seems right, because when I use
np.mean(np.log(x0))
I get 1.4968838412183132, which is quite close to the mu I obtain from curve_fit.
Calculating the median with
np.exp(np.mean(np.log(x0)))
gives 4.4677451525990675, which seems to be OK.
But until I see the fitting function going through my data points, I do not really trust these numbers. My problem is obviously that the fitting function has no information about the (big) y0 values. How can I change that?
Any help appreciated!

The problem is that your data are not(!) showing a lognormal pdf, since they are not normalized properly. Note that the integral over a pdf has to be 1. If you numerically integrate your data and normalize by that, e.g.
y1 = y0/np.abs(np.trapz(y0, x0))  # np.trapz takes (y, x); abs() because x0 is in descending order
your approach works fine:
params, extras = curve_fit(f, x0, y1)
plt.plot(x0, y1, "o")
plt.plot(x0, f(x0, params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
and
print("mu=%g, sigma=%g" % (params[0], params[1]))
resulting in
mu=1.80045, sigma=0.372185
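
An alternative to normalizing the data is to give the model a free amplitude, so the fit itself absorbs the overall scale; this also matches the original observation that multiplying f by roughly 1.3*10^5 made the curve line up. A minimal sketch, assuming the x0 and y0 arrays from the question (the scale parameter A and the starting values in p0 are my additions, not part of the original code):

import numpy as np
from scipy.optimize import curve_fit

# Lognormal pdf times a free amplitude A, so unnormalized data can be fitted directly.
def f_scaled(x, mu, sigma, A):
    return A / (np.sqrt(2*np.pi) * sigma * x) * np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))

# Rough starting values (assumed); A should start near the data's overall scale.
params, extras = curve_fit(f_scaled, x0.ravel(), y0.ravel(), p0=[1.5, 0.4, 1e6])
mu, sigma, A = params
print("mu=%g, sigma=%g, A=%g" % (mu, sigma, A))
print("median = exp(mu) =", np.exp(mu))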

Related

Fitting data in python using curve_fit

I'm trying to fit data in python to obtain the coefficients for the best fit.
The equation I need to fit is:
Vs = a*(qt^b)*(fs^c)*(ov^d)
where I have the data for qt, fs and ov, and need to obtain the values for a, b, c and d.
The code I'm using is:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

qt = np.array([10867.073, 8074.986, 2208.366, 3066.566, 2945.326, 4795.766, 2249.813, 2018.3])
fs = np.array([229.6, 17.4, 5.3, 0.1, 0.1, 0.1, 0.1, 0.1])
ov = np.array([19.159, 29.054, 37.620, 44.854, 51.721, 58.755, 65.622, 72.492])
Vs = np.array([149.787, 125.3962, 133.927, 110.047, 149.787, 137.809, 201.506, 154.925])
d = np.array([1.018, 1.518, 2.0179, 2.517, 3.017, 3.517, 4.018, 4.52])

def func(a, b, c, d):
    return a*qt**b*fs**c*ov**d

popt, pcov = curve_fit(func, Vs, d)
print(popt[0], popt[1], popt[2])
plt.plot(Vs, d, 'ro', label="Original Data")
plt.plot(Vs, func(Vs, *popt), label="Fitted Curve")
plt.gca().invert_yaxis()
plt.show()
Which produces the following output (Significant figures cut by me):
-0.333528 -0.1413381 -0.3553966
I was hoping to get something more like the example below, where the data is fitted, though not perfectly (note that the one below is just an example and is not correct). [example plot omitted]
The main hitch is the graphical representation of the result. What is drawn has no significance: even with perfect data and perfect fitting, the points will appear very scattered. This is misleading.
It is better to draw (Vs from computation) divided by (Vs from data) and compare it to 1, which would be the exact value if the fitting were perfect.
Note that your problem is a simple linear regression in logarithmic scale:
ln(Vs) = A + b*ln(qt) + c*ln(fs) + d*ln(ov)
Linear regression straight away gives A, b, c and d, and then a = exp(A).
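For completeness, a minimal sketch of that log-space regression, assuming the qt, fs, ov and Vs arrays from the question (I use d_exp for the exponent so it does not clash with the depth array d):

import numpy as np

# Design matrix for ln(Vs) = A + b*ln(qt) + c*ln(fs) + d*ln(ov)
X = np.column_stack([np.ones_like(qt), np.log(qt), np.log(fs), np.log(ov)])
coef, *_ = np.linalg.lstsq(X, np.log(Vs), rcond=None)
A, b, c, d_exp = coef
a = np.exp(A)
print("a=%g, b=%g, c=%g, d=%g" % (a, b, c, d_exp))

# Check the fit the way suggested above: predicted/observed should be close to 1.
Vs_pred = a * qt**b * fs**c * ov**d_exp
print(Vs_pred / Vs)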

scipy.odr fails in fitting exponential function

I'm working on an astrophysics project where I need to measure the density (ne) of the gas in the center of the galaxy by two methods (A and S). I made a plot of ne_s x ne_a and I want to try an exponential fit in this plot. The problems are the following:
the errors in the data are asymmetric and, apparently, scipy.odr does not accept this type of error. When the errors are included, 'ValueError: could not convert we to a suitable array' is raised.
even if I do not include the errors, the fit still does not work.
The code used (errors not included in the fit):
import numpy as np
import matplotlib.pyplot as plt
ne_s = np.array([ 134.70722125, 316.27850769, 403.37221974, 579.91067991,
1103.06258335, 1147.23685549, 115.00820933, 476.42659337,
667.61690967, 403.30988606, 282.08007264, 479.98058352,
897.64247885, 214.75999934, 213.22512064, 491.81749573,
743.68513419, 374.37957281, 362.136037 , 893.88595455])
dne_s_max = np.array([23.6619623 , 5.85802097, 12.02456923, 1.50211648, 5.15987014,
10.3830146 , 10.5274528 , 0.82928872, 2.18586603, 31.95014727,
6.53134179, 2.38392559, 32.2838402 , 5.43629034, 1.02316579,
6.60281602, 14.53943481, 9.16809221, 6.84052648, 12.87655997])
dne_s_min = np.array([21.94513608, 5.80578938, 11.8303456 , 1.49856527, 5.1265976 ,
10.2523836 , 10.12663739, 0.82824884, 2.17914616, 30.55846643,
6.45691351, 2.37446669, 30.87025015, 5.37271061, 1.02087355,
6.5358395 , 14.21332643, 9.0523711 , 6.77187898, 12.64596461])
ne_a = np.array([ 890.61498788, 2872.03715706, 10222.33463389, 1946.48193766,
6695.25304235, 2107.36471192, 891.72010662, 3988.87511761,
11328.9670489 , 1097.38904905, 2896.62668843, 4849.57809801,
5615.96780935, 1415.18564794, 1204.00022768, 3616.05423907,
15638.52683391, 3300.6039601 , 775.28841051, 12325.54379524])
dne_a_max = np.array([1082.33639266, 571.57094375, 2396.39839075, 458.32058555,
796.79916236, 665.95370946, 2262.73423374, 1006.65192577,
1761.9251987 , 1718.78400914, 579.65477159, 245.54811362,
1652.50314639, 401.37677822, 178.03620792, 725.26490794,
6625.62353545, 908.21490446, 719.01117673, 2098.24809312])
dne_a_min = np.array([ 865.33019015, 518.08880981, 1877.85283954, 412.91242092,
724.38681574, 582.52644162, 870.14392196, 866.63643893,
1478.1792513 , 1076.64135559, 521.08794554, 236.2457763 ,
1349.36104495, 362.72343267, 169.23314057, 646.39803115,
4139.5768453 , 789.04878324, 620.55523654, 1720.06369942])
dne_a = [dne_a_min, dne_a_max]
dne_s = [dne_s_min, dne_s_max]
fig, ax = plt.subplots(1,1)
ax.errorbar(ne_s, ne_a, xerr = dne_s, yerr = dne_a,
            linestyle = 'none', linewidth = 0.7, capsize = 5, color = 'crimson')
ax.scatter(ne_s, ne_a, s = 15, color = 'black')
ax.set_ylabel('$n_e(A)$'), ax.set_xlabel('$n_e(S)$')
from scipy.odr import Data, RealData, Model, ODR
def f(B, x):
    return B[0] + B[1] * np.exp(B[2] * x)
exponential = Model(f)
data = RealData(ne_s, ne_a)
odr = ODR(data, exponential, beta0=[1, 200, 3e-3])
out = odr.run()
ax.plot(ne_s, f(out.beta, ne_s), linewidth = 0.7)
Which results in: [fit plot omitted]
And the actual plot is: [data plot omitted]
So what am I missing here? Did I apply the odr routine erroneously? What should I do to make the fit work properly? And how can I make scipy.odr accept asymmetric errors?
It is important to add that I don't know much about scipy.odr; I just adapted the documentation example to an exponential function.
Appreciate the help. Let me know if more information is necessary.
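
A note on the ValueError: scipy.odr's RealData takes one standard deviation per point (sx, sy), so passing the 2xN [min, max] arrays is what triggers 'could not convert we to a suitable array'. A common workaround, sketched here as an approximation rather than a full solution, is to symmetrize the errors by averaging the lower and upper bounds:

import numpy as np
from scipy.odr import RealData, Model, ODR

# Symmetrize the asymmetric uncertainties (crude, but ODR only accepts one sigma per point).
sx = 0.5 * (dne_s_min + dne_s_max)
sy = 0.5 * (dne_a_min + dne_a_max)

def f(B, x):
    return B[0] + B[1] * np.exp(B[2] * x)

data = RealData(ne_s, ne_a, sx=sx, sy=sy)
odr = ODR(data, Model(f), beta0=[1, 200, 3e-3])
out = odr.run()
print(out.beta)

Whether the exponential model itself then converges is a separate question; inspecting out.stopreason and rescaling the data are the usual next steps.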

Plotting exponential curve by fitting to data in python

I have an x array and a y array. I need to predict points (points of Pareto fronts) on the left and right side (I don't know exactly how). That's why I first used extrapolation of the points, but it gave a linear result.
I tried to fit my data points to an exponential curve, but it gave me only a straight line, with coefficients A=1.0, K=1.0, C=232.49024883323932.
I also tried to fit my data to a polynomial function of degree 3; the result is better, but the tail is increasing.
x=[263.56789586, 263.56983885, 263.57178259, 263.57372634,
263.57567008, 263.57761383, 263.57955759, 263.58150134,
263.58344509, 263.58538884, 263.5873326 , 263.66699508,
263.5912201 , 263.59316385, 263.59510762, 263.59705135,
263.59899512, 263.60093885, 263.60288261, 263.60482637,
263.60677014, 263.60871386, 263.61065764, 263.61260141,
263.61454514, 263.6164889 , 263.61843266, 263.62037642,
263.62232019, 263.62426392, 263.62620767, 263.62815143,
263.63009519, 263.63203894, 263.63398269, 263.63592645,
263.63787021, 263.63981396, 263.64175772, 263.64370148,
263.64564523, 263.64758898, 263.64953274, 263.65147649,
263.65342024, 263.655364 , 263.65730775, 263.65925151,
263.66119527, 263.66313902, 263.66508278, 263.66702653,
263.66897028, 263.67091404, 263.67285779, 263.67480155,
263.67674531, 263.67868906, 263.68063281, 263.68257657,
263.68452032, 263.68646408, 263.68840783, 263.69035159,
263.69229534, 263.69423909, 263.69618285, 263.6981266 ,
263.70007036, 263.70201411, 263.70395787, 263.70590162,
263.70784537, 263.70978913, 263.71173288, 263.71367664,
263.71562039, 263.71756415, 263.7195079 , 263.72145166,
263.72339541, 263.72533917, 263.72728292, 263.72922667,
263.73117043, 263.73311418, 263.73505802, 263.73700169,
263.73894545, 263.74088929, 263.74283296, 263.74477671,
263.74672046, 263.74866422, 263.75060797, 263.75255173,
263.75449548, 263.75613889, 263.75617049, 263.75587478]
y= [232.99031933, 232.95558575, 232.93713544, 232.9214609 ,
232.9072364 , 232.8939496 , 232.88133917, 232.86925025,
232.85758305, 232.84626821, 232.8352564 , 232.59299633,
232.81389123, 232.80326395, 232.79262328, 232.7819675 ,
232.77129713, 232.76061533, 232.74992564, 232.739233 ,
232.72854327, 232.717862 , 232.70719578, 232.69655002,
232.68593193, 232.67534666, 232.66479994, 232.65429722,
232.64384269, 232.63344132, 232.62309663, 232.61281189,
232.60259005, 232.59243338, 232.5823437 , 232.57232231,
232.56237032, 232.55248837, 232.54267676, 232.5329356 ,
232.52326498, 232.51366471, 232.5041348 , 232.4946754 ,
232.48528698, 232.47597054, 232.46672774, 232.4575614 ,
232.44847565, 232.43947638, 232.43057166, 232.42177232,
232.41309216, 232.40454809, 232.39615988, 232.387949 ,
232.37993719, 232.37215305, 232.36467603, 232.35750509,
232.35062207, 232.34401067, 232.33765646, 232.33154649,
232.32566912, 232.32001373, 232.31457099, 232.30933224,
232.30428981, 232.29943675, 232.29476679, 232.29027415,
232.2859539 , 232.28180142, 232.2778127 , 232.27398435,
232.27031312, 232.26679651, 232.26343235, 232.26021888,
232.25715486, 232.25423943, 232.25147212, 232.24885312,
232.24638301, 232.24406282, 232.24189425, 232.23987999,
232.23802286, 232.23632689, 232.2347971 , 232.23343943,
232.23226138, 232.23127218, 232.2304829 , 232.22990751,
232.22956307, 232.22946832, 232.22947064, 232.22947062]
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c, d):
    #return a*x**3 + b*x**2 + c*x + d
    return a*np.exp(-b*x) + c

# In my script x and y come from a DataFrame:
# x = pareto_df['BSFC_bar_LET'].values
# y = pareto_df['BSFC_RatedP'].values
popt, pcov = curve_fit(func, x, y)
print(popt[0], popt[1], popt[2], popt[3])
min_x = min(np.exp(x)) - 0.5*(max(np.exp(x)) - min(np.exp(x)))
max_x = max(np.exp(x)) + 0.5*(max(np.exp(x)) - min(np.exp(x)))
xnew = np.linspace(min(x), max(x), 1000)
plt.plot(x, y, 'o')
plt.plot(xnew, func(xnew, *popt), label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
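One plausible reason the exponential fit collapses to a straight line: the x values all sit near 263, so np.exp(-b*x) underflows for almost any positive b, the gradient with respect to a and b vanishes, and the optimizer stays near its defaults (a=1, b=1, c fitted to roughly the mean of y). A sketch of the usual remedy, shifting x to start at zero before fitting; the shift and the starting values in p0 are my assumptions:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.asarray(x)
y = np.asarray(y)

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x_shift = x - x.min()                     # work near zero so exp() stays well-behaved
p0 = [y.max() - y.min(), 10.0, y.min()]   # rough starting values for a, b, c
popt, pcov = curve_fit(func, x_shift, y, p0=p0, maxfev=10000)
print(popt)

xnew = np.linspace(x.min(), x.max(), 1000)
plt.plot(x, y, 'o', label="data")
plt.plot(xnew, func(xnew - x.min(), *popt), label="shifted exponential fit")
plt.legend(loc='upper left')
plt.show()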

scipy curve_fit returns initial estimates

To fit a hyperbolic function I am trying to use the following code:
import numpy as np
from scipy.optimize import curve_fit
def hyperbola(x, s_1, s_2, o_x, o_y, c):
    # x   > input x values
    # s_1 > slope of line 1
    # s_2 > slope of line 2
    # o_x > x offset of crossing of asymptotes
    # o_y > y offset of crossing of asymptotes
    # c   > curvature of hyperbola
    b_2 = (s_1 + s_2) / 2
    b_1 = (s_2 - s_1) / 2
    return o_y + b_1 * (x - o_x) + b_2 * np.sqrt((x - o_x) ** 2 + c ** 2 / 4)

min_fit = np.array([-3.0, 0.0, -2.0, -10.0, 0.0])
max_fit = np.array([0.0, 3.0, 3.0, 0.0, 10.0])
guess = np.array([-2.5/3.0, 4/3.0, 1.0, -4.0, 0.5])
vars, covariance = curve_fit(f=hyperbola, xdata=n_step, ydata=n_mean, p0=guess, bounds=(min_fit, max_fit))
Where n_step and n_mean are measurement values generated earlier on. The code runs fine and gives no error message, but it only returns the initial guess with a very small change. Also, the covariance matrix contains only zeros. I tried to do the same fit with a better initial guess, but that does not have any influence.
Further, I plotted the exact same function with the initial guess as input, and that indeed gives a function close to the real values. Does anyone know where I am making a mistake here? Or am I using the wrong function for my fit?
The issue must lie with n_step and n_mean (which are not given in the question as currently stated); when trying to reproduce the issue with some arbitrarily chosen set of input parameters, the optimization works as expected. Let's try it out.
First, let's define some arbitrarily chosen input parameters in the given parameter space by
params = [-0.1, 2.95, -1, -5, 5]
Let's see what that looks like:
import matplotlib.pyplot as plt
xs = np.linspace(-30, 30, 100)
plt.plot(xs, hyperbola(xs, *params))
Based on this, let us define some rather crude inputs for xdata and ydata by
xdata = np.linspace(-30, 30, 10)
ydata = hyperbola(xdata, *params)
With these, let us run the optimization and see if we match our given parameters:
vars, covariance = curve_fit(f=hyperbola, xdata=xdata, ydata=ydata, p0=guess, bounds=(min_fit, max_fit))
print(vars) # [-0.1 2.95 -1. -5. 5. ]
That is, the fit is perfect even though our params are rather different from our guess. In other words, if we are free to choose n_step and n_mean, then the method works as expected.
In order to try to challenge the optimization slightly, we could also try to add a bit of noise:
np.random.seed(42)
xdata = np.linspace(-30, 30, 10)
ydata = hyperbola(xdata, *params) + np.random.normal(0, 10, size=len(xdata))
vars, covariance = curve_fit(f=hyperbola, xdata=xdata, ydata=ydata, p0=guess, bounds=(min_fit, max_fit))
print(vars) # [ -1.18173287e-01 2.84522636e+00 -1.57023215e+00 -6.90851334e-12 6.14480856e-08]
plt.plot(xdata, ydata, '.')
plt.plot(xs, hyperbola(xs, *vars))
Here we note that the optimum ends up being different from both our provided params and the guess, stays within the bounds provided by min_fit and max_fit, and still provides a good fit.
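On the original symptom (a covariance matrix containing only zeros): a quick sanity check after any curve_fit call is to convert the covariance into per-parameter standard errors; all-zero or infinite entries are a strong hint that the optimizer never moved or that the problem is degenerate. A small sketch, using the vars and covariance from the run above:

import numpy as np

perr = np.sqrt(np.diag(covariance))  # 1-sigma uncertainty of each fitted parameter
for name, value, err in zip(["s_1", "s_2", "o_x", "o_y", "c"], vars, perr):
    print("%s = %.4f +/- %.4f" % (name, value, err))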

Python line of best fit return value

I am trying to fit a straight line to my graph:
The x values of the red line (raw data) are:
array([ 0.03591733, 0.16728212, 0.49537727, 0.96912459,
1. , 1. , 1.11894521, 1.93042113,
2.94284656, 10.98699942])
and the y values are
array([ 0.0016241 , 0.00151784, 0.00155586, 0.00174498, 0.00194872,
0.00189413, 0.00208325, 0.00218074, 0.0021281 , 0.00243127])
My code for the line of best fit is:
LineFit = np.polyfit(x, y, 1)
p = np.poly1d(LineFit)
plt.plot(x, y, 'r-')
plt.plot(x, p(x), '--')
plt.show()
However, my LineFit returns
array([ 7.03475069e-05, 1.76565292e-03])
which is supposed to be intercept then gradient, according to the definition of polyfit (lowest- to highest-order coefficient):
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polynomial.polynomial.polyfit.html
but from the plot it seems to be the opposite (gradient then intercept).
Could someone explain this to me?
You are looking at a different doc. See https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polyfit.html#numpy.polyfit:
...Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y).
So in your example, it is p(x) = p[0] * x + p[1], which is exactly gradient then intercept.
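
A quick sketch contrasting the two conventions, since the two documentation pages describe different functions: np.polyfit returns the highest-order coefficient first, while np.polynomial.polynomial.polyfit (the page linked in the question) returns the lowest-order coefficient first:

import numpy as np
from numpy.polynomial import polynomial as P

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 5.0  # gradient 2, intercept 5

print(np.polyfit(x, y, 1))  # [2. 5.] -> highest order first: gradient, intercept
print(P.polyfit(x, y, 1))   # [5. 2.] -> lowest order first: intercept, gradient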
