I have an x array and a y array. I need to predict points (points of a Pareto front) to the left and right of my data, but I don't know exactly how. I first tried extrapolating the points, but that gave a straight line.
I then tried to fit my data points to an exponential curve, but the result was only a straight line with coefficients A=1.0, K=1.0, C=232.49024883323932.
I also tried fitting a polynomial of degree 3; the result is better, but the tail increases.
x=[263.56789586, 263.56983885, 263.57178259, 263.57372634,
263.57567008, 263.57761383, 263.57955759, 263.58150134,
263.58344509, 263.58538884, 263.5873326 , 263.66699508,
263.5912201 , 263.59316385, 263.59510762, 263.59705135,
263.59899512, 263.60093885, 263.60288261, 263.60482637,
263.60677014, 263.60871386, 263.61065764, 263.61260141,
263.61454514, 263.6164889 , 263.61843266, 263.62037642,
263.62232019, 263.62426392, 263.62620767, 263.62815143,
263.63009519, 263.63203894, 263.63398269, 263.63592645,
263.63787021, 263.63981396, 263.64175772, 263.64370148,
263.64564523, 263.64758898, 263.64953274, 263.65147649,
263.65342024, 263.655364 , 263.65730775, 263.65925151,
263.66119527, 263.66313902, 263.66508278, 263.66702653,
263.66897028, 263.67091404, 263.67285779, 263.67480155,
263.67674531, 263.67868906, 263.68063281, 263.68257657,
263.68452032, 263.68646408, 263.68840783, 263.69035159,
263.69229534, 263.69423909, 263.69618285, 263.6981266 ,
263.70007036, 263.70201411, 263.70395787, 263.70590162,
263.70784537, 263.70978913, 263.71173288, 263.71367664,
263.71562039, 263.71756415, 263.7195079 , 263.72145166,
263.72339541, 263.72533917, 263.72728292, 263.72922667,
263.73117043, 263.73311418, 263.73505802, 263.73700169,
263.73894545, 263.74088929, 263.74283296, 263.74477671,
263.74672046, 263.74866422, 263.75060797, 263.75255173,
263.75449548, 263.75613889, 263.75617049, 263.75587478]
y= [232.99031933, 232.95558575, 232.93713544, 232.9214609 ,
232.9072364 , 232.8939496 , 232.88133917, 232.86925025,
232.85758305, 232.84626821, 232.8352564 , 232.59299633,
232.81389123, 232.80326395, 232.79262328, 232.7819675 ,
232.77129713, 232.76061533, 232.74992564, 232.739233 ,
232.72854327, 232.717862 , 232.70719578, 232.69655002,
232.68593193, 232.67534666, 232.66479994, 232.65429722,
232.64384269, 232.63344132, 232.62309663, 232.61281189,
232.60259005, 232.59243338, 232.5823437 , 232.57232231,
232.56237032, 232.55248837, 232.54267676, 232.5329356 ,
232.52326498, 232.51366471, 232.5041348 , 232.4946754 ,
232.48528698, 232.47597054, 232.46672774, 232.4575614 ,
232.44847565, 232.43947638, 232.43057166, 232.42177232,
232.41309216, 232.40454809, 232.39615988, 232.387949 ,
232.37993719, 232.37215305, 232.36467603, 232.35750509,
232.35062207, 232.34401067, 232.33765646, 232.33154649,
232.32566912, 232.32001373, 232.31457099, 232.30933224,
232.30428981, 232.29943675, 232.29476679, 232.29027415,
232.2859539 , 232.28180142, 232.2778127 , 232.27398435,
232.27031312, 232.26679651, 232.26343235, 232.26021888,
232.25715486, 232.25423943, 232.25147212, 232.24885312,
232.24638301, 232.24406282, 232.24189425, 232.23987999,
232.23802286, 232.23632689, 232.2347971 , 232.23343943,
232.23226138, 232.23127218, 232.2304829 , 232.22990751,
232.22956307, 232.22946832, 232.22947064, 232.22947062]
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c, d):
    # return a*x**3 + b*x**2 + c*x + d   # cubic polynomial attempt
    return a*np.exp(-b*x) + c            # exponential attempt; d is unused by this form

x = pareto_df['BSFC_bar_LET'].values
y = pareto_df['BSFC_RatedP'].values
popt, pcov = curve_fit(func, x, y)
print(popt[0], popt[1], popt[2], popt[3])
min_x = min(np.exp(x)) - 0.5*(max(np.exp(x)) - min(np.exp(x)))
max_x = max(np.exp(x)) + 0.5*(max(np.exp(x)) - min(np.exp(x)))
xnew = np.linspace(min(x), max(x), 1000)
plt.plot(x, y, 'o')
plt.plot(xnew, func(xnew, *popt), label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
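One possible reason the exponential fit collapses to a constant, offered as an assumption rather than a diagnosis: with x near 263 and the default starting guess b=1, the term exp(-b*x) is on the order of 1e-114, so the optimizer sees almost no gradient with respect to a and b. A minimal sketch of a workaround is to shift x so that it starts at 0 and to pass an explicit starting guess. The data below are synthetic stand-ins, and the names exp_decay, x_shift, and p0 are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(t, a, b, c):
    # decaying exponential with offset c
    return a * np.exp(-b * t) + c

# Synthetic stand-in data on a similar x range (NOT the data from the question).
x = np.linspace(263.57, 263.76, 50)
y = 0.8 * np.exp(-15.0 * (x - x.min())) + 232.2

# Shift x so the exponential is evaluated on values near 0.
x_shift = x - x.min()

# Rough starting guess: amplitude ~ y range, rate ~ 1 / x range, offset ~ min(y).
p0 = (y.max() - y.min(), 1.0 / (x_shift.max() - x_shift.min()), y.min())
popt, pcov = curve_fit(exp_decay, x_shift, y, p0=p0, maxfev=10000)
print(popt)

# Evaluate slightly beyond both ends of the data (in shifted coordinates).
x_new = np.linspace(-0.05, 0.25, 200)
y_new = exp_decay(x_new, *popt)
```

Any extrapolation obtained this way is only as trustworthy as the assumed model shape, so it is worth comparing against the data near both ends before relying on it.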
I'm working on an astrophysics project where I need to measure the density (ne) of the gas in the center of the galaxy by two methods (A and S). I made a plot of ne_s vs. ne_a and I want to try an exponential fit to it. The problems are the following:
the errors in the data are asymmetrical and, apparently, scipy.odr does not accept this type of error. When the errors are included, 'ValueError: could not convert we to a suitable array' is raised.
even if I do not include the errors, the fit still does not work.
The code used (errors in the fit not included):
import numpy as np
import matplotlib.pyplot as plt
ne_s = np.array([ 134.70722125, 316.27850769, 403.37221974, 579.91067991,
1103.06258335, 1147.23685549, 115.00820933, 476.42659337,
667.61690967, 403.30988606, 282.08007264, 479.98058352,
897.64247885, 214.75999934, 213.22512064, 491.81749573,
743.68513419, 374.37957281, 362.136037 , 893.88595455])
dne_s_max = np.array([23.6619623 , 5.85802097, 12.02456923, 1.50211648, 5.15987014,
10.3830146 , 10.5274528 , 0.82928872, 2.18586603, 31.95014727,
6.53134179, 2.38392559, 32.2838402 , 5.43629034, 1.02316579,
6.60281602, 14.53943481, 9.16809221, 6.84052648, 12.87655997])
dne_s_min = np.array([21.94513608, 5.80578938, 11.8303456 , 1.49856527, 5.1265976 ,
10.2523836 , 10.12663739, 0.82824884, 2.17914616, 30.55846643,
6.45691351, 2.37446669, 30.87025015, 5.37271061, 1.02087355,
6.5358395 , 14.21332643, 9.0523711 , 6.77187898, 12.64596461])
ne_a = np.array([ 890.61498788, 2872.03715706, 10222.33463389, 1946.48193766,
6695.25304235, 2107.36471192, 891.72010662, 3988.87511761,
11328.9670489 , 1097.38904905, 2896.62668843, 4849.57809801,
5615.96780935, 1415.18564794, 1204.00022768, 3616.05423907,
15638.52683391, 3300.6039601 , 775.28841051, 12325.54379524])
dne_a_max = np.array([1082.33639266, 571.57094375, 2396.39839075, 458.32058555,
796.79916236, 665.95370946, 2262.73423374, 1006.65192577,
1761.9251987 , 1718.78400914, 579.65477159, 245.54811362,
1652.50314639, 401.37677822, 178.03620792, 725.26490794,
6625.62353545, 908.21490446, 719.01117673, 2098.24809312])
dne_a_min = np.array([ 865.33019015, 518.08880981, 1877.85283954, 412.91242092,
724.38681574, 582.52644162, 870.14392196, 866.63643893,
1478.1792513 , 1076.64135559, 521.08794554, 236.2457763 ,
1349.36104495, 362.72343267, 169.23314057, 646.39803115,
4139.5768453 , 789.04878324, 620.55523654, 1720.06369942])
dne_a = [dne_a_min, dne_a_max]
dne_s = [dne_s_min, dne_s_max]
fig, ax = plt.subplots(1,1)
ax.errorbar(ne_s, ne_a, xerr = dne_s, yerr = dne_a,
linestyle = 'none', linewidth = 0.7, capsize = 5, color = 'crimson')
ax.scatter(ne_s, ne_a, s = 15, color = 'black')
ax.set_ylabel('$n_e(A)$'), ax.set_xlabel('$n_e(S)$')
from scipy.odr import Data, RealData, Model, ODR
def f(B, x):
    return B[0] + B[1] * np.exp(B[2] * x)
exponential = Model(f)
data = RealData(ne_s, ne_a)
odr = ODR(data, exponential, beta0=[1, 200, 3e-3])
out = odr.run()
ax.plot(ne_s, f(out.beta, ne_s), linewidth = 0.7)
Which results in:
And the actual plot is:
So what am I missing here? Did I apply the odr routine incorrectly? What should I do to make the fit work properly? And how can I make scipy.odr accept asymmetrical errors?
I should add that I don't know much about scipy.odr; I just adapted the documentation example to an exponential function.
I appreciate the help. Let me know if more information is necessary.
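As far as I know, RealData takes a single standard deviation per point through its sx and sy arguments, so a two-row [min, max] list cannot be passed directly. One common approximation, assumed here and not the only option, is to average the lower and upper errors into one symmetric error per point. The arrays below are small synthetic stand-ins, and every name other than the scipy.odr classes is illustrative.

```python
import numpy as np
from scipy.odr import RealData, Model, ODR

def f(B, x):
    # same exponential model as in the question
    return B[0] + B[1] * np.exp(B[2] * x)

# Synthetic stand-in data (NOT the measurements from the question).
x = np.linspace(100.0, 1200.0, 20)
y = 500.0 + 300.0 * np.exp(3e-3 * x)

# Asymmetric errors: lower and upper bars per point.
dx_min, dx_max = np.full(20, 5.0), np.full(20, 7.0)
dy_min, dy_max = np.full(20, 80.0), np.full(20, 120.0)

# Assumption: approximate each asymmetric pair by its mean.
sx = 0.5 * (dx_min + dx_max)
sy = 0.5 * (dy_min + dy_max)

data = RealData(x, y, sx=sx, sy=sy)
out = ODR(data, Model(f), beta0=[1.0, 200.0, 3e-3]).run()
print(out.beta)
```

Averaging discards the asymmetry, so if the lower and upper errors differ strongly it may be worth checking how sensitive the fitted parameters are to using the larger of the two instead.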
I have data like this:
x = np.array([ 0. , 3. , 3.3 , 10. , 18. , 43. , 80. ,
120. , 165. , 210. , 260. , 310. , 360. , 410. ,
460. , 510. , 560. , 610. , 660. , 710. , 760. ,
809.5 , 859. , 908.5 , 958. , 1007.5 , 1057. , 1106.5 ,
1156. , 1205.5 , 1255. , 1304.5 , 1354. , 1403.5 , 1453. ,
1502.5 , 1552. , 1601.5 , 1651. , 1700.5 , 1750. , 1799.5 ,
1849. , 1898.5 , 1948. , 1997.5 , 2047. , 2096.5 , 2146. ,
2195.5 , 2245. , 2294.5 , 2344. , 2393.5 , 2443. , 2492.5 ,
2542. , 2591.5 , 2640. , 2690. , 2740. , 2789.67, 2839.33,
2891.5 ])
y = np.array([ 1.45 , 1.65 , 5.8 , 6.8 , 8.0355, 8.0379, 8.04 ,
8.0505, 8.175 , 8.3007, 8.4822, 8.665 , 8.8476, 9.0302,
9.528 , 9.6962, 9.864 , 10.032 , 10.2 , 10.9222, 11.0553,
11.1355, 11.2228, 11.3068, 11.3897, 11.4704, 11.5493, 11.6265,
11.702 , 11.7768, 11.8491, 11.9208, 11.9891, 12.0571, 12.1247,
12.1912, 12.2558, 12.3181, 12.3813, 12.4427, 12.503 , 12.5638,
12.6226, 12.6807, 12.7384, 12.7956, 12.8524, 12.9093, 12.9663,
13.0226, 13.0786, 13.1337, 13.1895, 13.2465, 13.3017, 13.3584,
13.4156, 13.4741, 13.5311, 13.5899, 13.6498, 13.6533, 13.657 ,
13.6601])
and it looks like this:
I need to fit a curve to this trend. I am using a moving average for smoothing, and it looks like this:
where the magenta line is the moving average. I am also using a 5th-order polynomial, and it looks like this:
where the blue line is the result of the polynomial. I have tried higher orders, but the result gets worse. How can I get a result where the first point is at (0,0) and the curve looks like this (like the black curve)?
This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

def movingaverage(interval, window_size):
    # simple moving average: convolve with a uniform window
    window = np.ones(int(window_size))/float(window_size)
    print(window)
    return np.convolve(interval, window, 'same')

y_av = movingaverage(y, 2)
X = np.arange(0, np.max(x), 30).ravel()
yinter = interpolate.interp1d(x, y_av)(X)
z = np.poly1d(np.polyfit(x, y_av, 5))  # 5th-order polynomial fit
Y = z(X)
plt.figure(1)
plt.plot(x, y, '*-r')     # raw data
plt.plot(x, y_av, '.-m')  # moving average
plt.plot(X, Y, '*-b')     # polynomial fit
To do this, you should use an analytical function (with parameters) based on some assumption about the data, not only polynomial functions. You can use curve_fit from scipy.optimize to find the unknown parameters of the analytical function that best fit your input data.
For example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# your analytical function (theoretical function) with parameters: a, b (or more)
def your_analytical_func(x, a, b):
    return a * np.log(x + b)  # this is just for example

# or using anonymous (lambda) function
# your_analytical_func = lambda x, a, b: a * np.log(x + b)

# Fit for the parameters a, b (or more) of the function your_analytical_func:
popt, pcov = curve_fit(your_analytical_func, x, y)
plt.plot(x, y, 'r.', label='incoming data')
plt.plot(x, your_analytical_func(x, *popt), '-', color="black", label='fit: your_analytical_func(x, a=%5.3f, b=%5.3f)' % tuple(popt))
plt.legend()
plt.show()
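If the curve has to pass through (0, 0), one option is to choose a model that is zero at x = 0 by construction. The sketch below uses a * log(1 + b*x) purely as an assumed example shape, fitted to synthetic data; origin_model, the starting guess, and the bounds are illustrative choices, not part of the answer above.

```python
import numpy as np
from scipy.optimize import curve_fit

def origin_model(x, a, b):
    # log1p(b*x) = log(1 + b*x), which is exactly 0 at x = 0
    return a * np.log1p(b * x)

# Synthetic stand-in data (NOT the data from the question).
x = np.linspace(0.0, 2900.0, 60)
y = 2.0 * np.log1p(0.05 * x)

# Keep a and b non-negative so that 1 + b*x stays positive.
popt, pcov = curve_fit(origin_model, x, y, p0=(1.0, 0.1), bounds=(0.0, np.inf))
print(popt, origin_model(0.0, *popt))
```

Whether a logarithmic shape suits the real data is a separate question; the point is only that the constraint at the origin comes from the form of the model rather than from the fit.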
I have a set of x, y and z points and am trying to fit a plane to this three-dimensional data so that z=f(x,y) can be calculated for any x and y.
I am hoping to get an equation for the plane and plot the graph in a Jupyter notebook for visualization.
This is the (working) code I've been using to plot my data:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import pandas as pd
x = np.arange(-12, 1)
y = np.arange(-40,-25)
Z = np.array([[402., 398., 395., 391., 387., 383., 379., 375., 371., 367., 363., 358., 354.],
[421., 417., 413., 409., 406., 402., 398., 393., 389., 385., 381.,
376., 372.],
[440., 436., 432., 429., 425., 421., 416., 412., 408., 404., 399.,
395., 391.],
[460., 456., 452., 448., 444., 440., 436., 432., 427., 423., 419.,
414., 410.],
[480., 477., 473., 469., 465., 460., 456., 452., 447., 443., 438.,
434., 429.],
[501., 498., 494., 490., 485., 481., 477., 472., 468., 463., 459.,
454., 449.],
[523., 519., 515., 511., 507., 502., 498., 494., 489., 484., 480.,
475., 470.],
[545., 541., 537., 533., 529., 524., 520., 515., 511., 506., 501.,
496., 492.],
[568., 564., 560., 556., 551., 547., 542., 538., 533., 528., 523.,
518., 513.],
[592., 588., 583., 579., 575., 570., 565., 561., 556., 551., 546.,
541., 536.],
[616., 612., 607., 603., 598., 594., 589., 584., 579., 575., 569.,
564., 559.],
[640., 636., 632., 627., 623., 618., 613., 609., 604., 599., 593.,
588., 583.],
[666., 662., 657., 653., 648., 643., 638., 633., 628., 623., 618.,
613., 607.],
[692., 688., 683., 679., 674., 669., 664., 659., 654., 649., 643.,
638., 632.],
[ nan, 714., 710., 705., 700., 695., 690., 685., 680., 675., 669.,
664., 658.]])
X, Y = np.meshgrid(x, y)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
print (X.shape, Y.shape, Z.shape)
ax.plot_surface(X, Y, Z)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
I have tried implementing these solutions:
https://gist.github.com/amroamroamro/1db8d69b4b65e8bc66a6
http://inversionlabs.com/2016/03/21/best-fit-surfaces-for-3-dimensional-data.html
However, since my x and y arrays don't have the same length, I get this error message:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
The data you shared worked for me when plotting: your X, Y, and Z all have the same shape. There is one nan value in your Z array; you can remove that point when estimating the equation of the plane.
You want to fit a plane to your data in 3D, which is a linear regression problem. You can use multivariate regression from the scikit-learn package to estimate the coefficients of the equation of the plane.
The equation of the plane is given by the following:
Z = a1 * X + a2 * Y + c
You can flatten your data as follows and use scikit-learn's linear_model to fit a plane to the data:
# your data is stored as X, Y, Z
print(X.shape, Y.shape, Z.shape)
x1, y1, z1 = X.flatten(), Y.flatten(), Z.flatten()
mask = ~np.isnan(z1)  # drop the nan point; LinearRegression does not accept nan
# column_stack keeps each (x, y) pair together; reshape((-1, 2)) would scramble them
X_data = np.column_stack([x1, y1])[mask]  # X_data shape: n, 2
Y_data = z1[mask]
from sklearn import linear_model
reg = linear_model.LinearRegression().fit(X_data, Y_data)
print("coefficients of equation of plane, (a1, a2): ", reg.coef_)
print("value of intercept, c:", reg.intercept_)
The above code fits a plane, which is linear in X and Y, to the given data.
To fit a second-degree surface, read further.
A second-degree surface has an equation of the following form:
Z = a1*X + a2*Y + a3*X*Y + a4*X*X + a5*Y*Y + c
To fit this surface using linear regression, modify the above code in the following manner:
# your data is stored as X, Y, Z
print(X.shape, Y.shape, Z.shape)
x1, y1, z1 = X.flatten(), Y.flatten(), Z.flatten()
mask = ~np.isnan(z1)  # drop the nan point; LinearRegression does not accept nan
x1, y1, z1 = x1[mask], y1[mask], z1[mask]
x1y1, x1x1, y1y1 = x1*y1, x1*x1, y1*y1
X_data = np.array([x1, y1, x1y1, x1x1, y1y1]).T # X_data shape: n, 5
Y_data = z1
from sklearn import linear_model
reg = linear_model.LinearRegression().fit(X_data, Y_data)
print("coefficients of equation of surface, (a1, a2, a3, a4, a5): ", reg.coef_)
print("value of intercept, c:", reg.intercept_)
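The plane equation Z = a1 * X + a2 * Y + c can also be fitted with plain NumPy least squares, which may be convenient when scikit-learn is not installed. This is a minimal sketch on a synthetic plane with one missing value; the helper plane and the variable names are illustrative, not taken from the answer above.

```python
import numpy as np

# Same grid layout as in the question, but with synthetic Z values.
x = np.arange(-12, 1)
y = np.arange(-40, -25)
X, Y = np.meshgrid(x, y)
Z = 2.0 * X + 3.0 * Y + 5.0   # known plane, so the fit can be checked
Z[-1, 0] = np.nan             # mimic the single missing value

# Keep only the grid points where Z is defined.
mask = ~np.isnan(Z)

# Design matrix with columns [x, y, 1] for Z = a1*x + a2*y + c.
A = np.column_stack([X[mask], Y[mask], np.ones(mask.sum())])
coeffs, *_ = np.linalg.lstsq(A, Z[mask], rcond=None)
a1, a2, c = coeffs

def plane(xq, yq):
    # evaluate the fitted plane at any (x, y)
    return a1 * xq + a2 * yq + c

print(a1, a2, c, plane(0.0, -30.0))
```

The second-degree surface follows the same pattern with extra columns for x*y, x*x, and y*y in the design matrix.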
I have lognormal distributed data in x0 and y0 arrays:
x0.ravel() = array([19.8815 , 19.0141 , 18.1857 , 17.3943 , 16.6382 , 15.9158 ,
15.2254 , 14.5657 , 13.9352 , 13.3325 , 12.7564 , 12.2056 ,
11.679 , 11.1755 , 10.6941 , 10.2338 , 9.79353, 9.37249,
8.96979, 8.58462, 8.21619, 7.86376, 7.52662, 7.20409,
6.89552, 6.6003 , 6.31784, 6.04757, 5.78897, 5.54151,
5.30472, 5.07812, 4.86127, 4.65375, 4.45514, 4.26506,
4.08314, 3.90903, 3.74238, 3.58288, 3.4302 , 3.28407,
3.14419, 3.01029, 2.88212, 2.75943, 2.64198, 2.52955,
2.42192, 2.31889, 2.22026, 2.12583, 2.03543, 1.94889,
1.86604, 1.78671, 1.71077, 1.63807, 1.56845, 1.50181,
1.43801, 1.37691, 1.31842, 1.26242, 1.2088 , 1.15746,
1.10832, 1.06126, 1.01619])
y0.ravel() =array([1.01567e+03, 8.18397e+02, 7.31992e+02, 1.11397e+03, 2.39987e+03,
2.73762e+03, 4.65722e+03, 7.06308e+03, 9.67945e+03, 1.38983e+04,
1.98178e+04, 1.97461e+04, 3.28070e+04, 4.48814e+04, 5.80853e+04,
7.35511e+04, 8.94090e+04, 1.08274e+05, 1.28276e+05, 1.50281e+05,
1.69258e+05, 1.91944e+05, 2.16416e+05, 2.37259e+05, 2.57426e+05,
2.74818e+05, 2.90343e+05, 3.01369e+05, 3.09232e+05, 3.13713e+05,
3.17225e+05, 3.19177e+05, 3.17471e+05, 3.14415e+05, 3.08396e+05,
2.95692e+05, 2.76097e+05, 2.52075e+05, 2.29330e+05, 1.97843e+05,
1.74262e+05, 1.46360e+05, 1.20599e+05, 9.82223e+04, 7.80995e+04,
6.34618e+04, 4.77460e+04, 3.88737e+04, 3.23715e+04, 2.58129e+04,
2.15724e+04, 1.58737e+04, 1.13006e+04, 7.64983e+03, 4.64590e+03,
3.31463e+03, 2.40929e+03, 3.02183e+03, 1.47422e+03, 1.06046e+03,
1.34875e+03, 8.26674e+02, 9.53167e+02, 6.47428e+02, 9.83651e+02,
8.93673e+02, 1.23637e+03, 0.00000e+00, 8.36573e+01])
I want to use curve_fit to get a function that fits my data points, in order to obtain mu (and then exp(mu) for the median) and sigma of this distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def f(x, mu, sigma):
    # lognormal probability density function
    return 1/(np.sqrt(2*np.pi)*sigma*x)*np.exp(-((np.log(x)-mu)**2)/(2*sigma**2))

params, extras = curve_fit(f, x0.ravel(), y0.ravel())
print("mu=%g, sigma=%g" % (params[0], params[1]))
plt.plot(x0, y0, "o")
plt.plot(x0, f(x0, params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
The result is the following:
mu=1.47897, sigma=0.0315236
[Figure: curve_fit result]
Obviously the function does not fit the data by any means.
When I multiply the fitting function by, let's say, 1.3*10^5 in the code:
plt.plot(x0, 1.3*10**5*f(x0 ,params[0], params[1]))
This is the result:
[Figure: manually rescaled fitting curve]
The calculated mu value, which is the mean of the related normal distribution, seems right, because when I use:
np.mean(np.log(x))
I get 1.4968838412183132, which is quite similar to the mu I obtain from curve_fit.
Calculating the median with
np.exp(np.mean(np.log(x)))
gives 4.4677451525990675, which seems to be OK.
But unless I see the fitting function going through my data points, I do not really trust these numbers. My problem is obviously that the fitting function has no information about the (big) y0 values. How can I change that?
Any help appreciated!
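The problem is that your data do not (!) represent a lognormal pdf, because they are not normalized properly. Note that the integral over a pdf has to be 1. If you numerically integrate your data and normalize by the result (np.trapz takes y first and x second, and abs() is needed because x0 is in descending order, which makes the integral negative), e.g.
y1 = y0/abs(np.trapz(y0, x0))
your approach works fine.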
params, extras = curve_fit(f, x0, y1)
plt.plot(x0, y1, "o")
plt.plot(x0, f(x0 ,params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
and
print("mu=%g, sigma=%g" % (params[0], params[1]))
resulting in
mu=1.80045, sigma=0.372185
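An alternative to normalizing the data first is to give the model a free amplitude, so that the optimizer can match the large y values itself. This is a sketch under that assumption, run on synthetic data; scaled_lognormal, the amplitude A, and the moment-based starting guess are illustrative names and choices, not part of the answer above.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaled_lognormal(x, A, mu, sigma):
    # lognormal pdf multiplied by a free amplitude A (the area under the curve)
    return A / (np.sqrt(2 * np.pi) * sigma * x) * np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2))

# Synthetic stand-in data (NOT the data from the question).
x = np.linspace(1.0, 20.0, 70)
y = scaled_lognormal(x, 1.6e6, 1.8, 0.37)

# Starting guess: weighted moments of log(x), and the trapezoid-rule area for A.
w = y / y.sum()
mu0 = np.sum(w * np.log(x))
sigma0 = np.sqrt(np.sum(w * (np.log(x) - mu0) ** 2))
A0 = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

popt, pcov = curve_fit(scaled_lognormal, x, y, p0=(A0, mu0, sigma0))
A_fit, mu_fit, sigma_fit = popt
print(A_fit, mu_fit, sigma_fit, np.exp(mu_fit))  # exp(mu) is the median
```

With a sensible starting guess the two approaches should give similar mu and sigma; the free amplitude simply avoids relying on the accuracy of the numerical integral.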
I am trying to fit a straight line to my graph:
the x values of the red line (raw data) are:
array([ 0.03591733, 0.16728212, 0.49537727, 0.96912459,
1. , 1. , 1.11894521, 1.93042113,
2.94284656, 10.98699942])
and the y values are
array([ 0.0016241 , 0.00151784, 0.00155586, 0.00174498, 0.00194872,
0.00189413, 0.00208325, 0.00218074, 0.0021281 , 0.00243127])
my code for the line of best fit is:
LineFit = np.polyfit(x, y, 1)
p = np.poly1d(LineFit)
plt.plot(x, y, 'r-')
plt.plot(x, p(x), '--')  # evaluate the fitted line at x, not at y
plt.show()
However, my LineFit returns
array([ 7.03475069e-05, 1.76565292e-03])
which is supposed to be the intercept and gradient according to the definition of polyfit (lower- to higher-order coefficients)
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polynomial.polynomial.polyfit.html
but from the plot it seems to be the opposite (gradient and intercept).
Could someone explain this to me?
You are looking at the documentation for a different function, numpy.polynomial.polynomial.polyfit. See https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polyfit.html#numpy.polyfit:
...Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y).
So in your example, it is p(x) = p[0] * x + p[1], which is exactly gradient and intercept...
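The difference in coefficient order between the two functions can be checked on a line with a known gradient and intercept. The data here are made up for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Points on the line y = 2*x + 1 (gradient 2, intercept 1).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

high_first = np.polyfit(x, y, 1)  # highest power first: [gradient, intercept]
low_first = P.polyfit(x, y, 1)    # lowest power first: [intercept, gradient]
print(high_first, low_first)
```

np.poly1d expects the highest-power-first order, so it pairs with np.polyfit rather than with numpy.polynomial.polynomial.polyfit.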