scipy RBFInterpolator doesn't reproduce data points - python

I'm trying to perform a 3D interpolation using a cubic kernel with SciPy's RBFInterpolator routine. I'm running into a problem where the interpolator won't reproduce the input data points, even with smoothing set to 0. I've tried other kernels, such as 'linear' and 'quintic', and varied the neighbors argument, but the interpolator still won't reproduce the input data.
My two questions are:
1. Why won't RBFInterpolator reproduce the input data?
2. Is there another Python module that can perform 3D cubic spline interpolation?
Here's a minimal example:
import numpy as np
from numpy import array
import scipy.interpolate as spint
import matplotlib.pyplot as plt
TMPenden=array([[[0.00033903, 0.00034132, 0.00034312, 0.00034442, 0.0003452 ,
0.00034547, 0.0003452 , 0.00034441, 0.00034311, 0.0003413 ],
[0.00041104, 0.0004126 , 0.00041383, 0.00041471, 0.00041524,
0.00041542, 0.00041524, 0.00041471, 0.00041382, 0.00041259],
[0.00044731, 0.00044845, 0.00044934, 0.00044999, 0.00045037,
0.0004505 , 0.00045037, 0.00044998, 0.00044934, 0.00044844],
[0.00046978, 0.00047068, 0.00047138, 0.00047189, 0.00047219,
0.00047229, 0.00047219, 0.00047188, 0.00047138, 0.00047067],
[0.00048431, 0.00048505, 0.00048563, 0.00048604, 0.00048629,
0.00048638, 0.00048629, 0.00048604, 0.00048563, 0.00048504],
[0.00049386, 0.00049449, 0.00049499, 0.00049534, 0.00049555,
0.00049562, 0.00049555, 0.00049534, 0.00049498, 0.00049449],
[0.00050013, 0.00050068, 0.00050111, 0.00050142, 0.0005016 ,
0.00050166, 0.0005016 , 0.00050142, 0.00050111, 0.00050068],
[0.00050414, 0.00050462, 0.000505 , 0.00050527, 0.00050544,
0.00050549, 0.00050544, 0.00050527, 0.000505 , 0.00050462],
[0.00050653, 0.00050696, 0.0005073 , 0.00050754, 0.00050769,
0.00050774, 0.00050769, 0.00050754, 0.0005073 , 0.00050696],
[0.00050773, 0.00050812, 0.00050843, 0.00050865, 0.00050878,
0.00050882, 0.00050878, 0.00050865, 0.00050843, 0.00050812]],
[[0.00047842, 0.00048166, 0.00048422, 0.00048606, 0.00048717,
0.00048755, 0.00048717, 0.00048605, 0.0004842 , 0.00048164],
[0.00058177, 0.00058398, 0.00058571, 0.00058696, 0.00058771,
0.00058796, 0.00058771, 0.00058695, 0.0005857 , 0.00058396],
[0.00063274, 0.00063436, 0.00063562, 0.00063653, 0.00063708,
0.00063726, 0.00063708, 0.00063653, 0.00063562, 0.00063435],
[0.00066297, 0.00066424, 0.00066524, 0.00066595, 0.00066638,
0.00066653, 0.00066638, 0.00066595, 0.00066523, 0.00066424],
[0.00068146, 0.00068251, 0.00068333, 0.00068392, 0.00068427,
0.00068439, 0.00068427, 0.00068392, 0.00068333, 0.00068251],
[0.00069271, 0.00069361, 0.0006943 , 0.0006948 , 0.0006951 ,
0.0006952 , 0.0006951 , 0.0006948 , 0.0006943 , 0.0006936 ],
[0.00069925, 0.00070002, 0.00070062, 0.00070106, 0.00070132,
0.00070141, 0.00070132, 0.00070106, 0.00070062, 0.00070001],
[0.00070256, 0.00070325, 0.00070378, 0.00070416, 0.00070439,
0.00070447, 0.00070439, 0.00070416, 0.00070378, 0.00070324],
[0.00070361, 0.00070422, 0.0007047 , 0.00070505, 0.00070525,
0.00070532, 0.00070525, 0.00070504, 0.0007047 , 0.00070422],
[0.00070302, 0.00070358, 0.00070401, 0.00070432, 0.0007045 ,
0.00070457, 0.0007045 , 0.00070432, 0.00070401, 0.00070357]],
[[0.00064677, 0.00065116, 0.00065462, 0.00065712, 0.00065863,
0.00065914, 0.00065863, 0.00065711, 0.00065461, 0.00065114],
[0.00078754, 0.00079052, 0.00079286, 0.00079455, 0.00079556,
0.0007959 , 0.00079556, 0.00079454, 0.00079285, 0.0007905 ],
[0.00085515, 0.00085733, 0.00085904, 0.00086027, 0.00086101,
0.00086126, 0.00086101, 0.00086026, 0.00085903, 0.00085732],
[0.00089318, 0.0008949 , 0.00089624, 0.0008972 , 0.00089778,
0.00089798, 0.00089778, 0.0008972 , 0.00089623, 0.00089489],
[0.00091474, 0.00091616, 0.00091726, 0.00091805, 0.00091853,
0.00091869, 0.00091853, 0.00091805, 0.00091726, 0.00091615],
[0.00092632, 0.00092752, 0.00092845, 0.00092913, 0.00092953,
0.00092967, 0.00092953, 0.00092912, 0.00092845, 0.00092751],
[0.00093148, 0.00093252, 0.00093333, 0.00093391, 0.00093426,
0.00093438, 0.00093426, 0.00093391, 0.00093332, 0.00093251],
[0.00093233, 0.00093325, 0.00093396, 0.00093448, 0.00093479,
0.00093489, 0.00093478, 0.00093447, 0.00093396, 0.00093324],
[0.0009302 , 0.00093102, 0.00093166, 0.00093212, 0.00093239,
0.00093248, 0.00093239, 0.00093211, 0.00093166, 0.00093102],
[0.00092595, 0.00092669, 0.00092726, 0.00092768, 0.00092792,
0.00092801, 0.00092792, 0.00092767, 0.00092726, 0.00092668]]])
TMPraddata=array([ 2.43597707, 3.43597707, 4.43597707, 5.43597707, 6.43597707,
7.43597707, 8.43597707, 9.43597707, 10.43597707, 11.43597707])
TMPthetadata=array([1.41381669, 1.44523262, 1.47664855, 1.50806447, 1.5394804 ,
1.57089633, 1.60231225, 1.63372818, 1.66514411, 1.69656003])
TMPalpha = np.array([0.06, 0.07, 0.08])
# one (alpha, r, theta) row per grid point, in the same order as np.ravel(TMPenden)
coords = np.asarray([[alpha, r, theta] for alpha in TMPalpha for r in TMPraddata for theta in TMPthetadata])
yvalues = np.ravel(TMPenden)
tmprbf = spint.RBFInterpolator(
    coords,
    yvalues,
    neighbors=1000,
    kernel='linear',
    smoothing=0,
    degree=2
)
# evaluate along r at fixed alpha and fixed theta (index 5)
tmpalpha = 0.6
pnts = [[tmpalpha, i, TMPthetadata[5]] for i in TMPraddata]
datpnts = tmprbf(pnts)
plt.scatter(TMPraddata, datpnts, label="Radial Basis Function", marker=".", linewidth=.5)
# input data for the first alpha value and the same theta index
plt.plot(TMPraddata, TMPenden[0, :, 5], label="Data")
plt.legend()
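One way to separate the interpolator from the evaluation points is to evaluate it at the training coordinates themselves. The snippet above evaluates at tmpalpha = 0.6, whereas TMPalpha holds 0.06, 0.07 and 0.08, and it compares against TMPenden[0, :, 5] (the 0.06 slice), so the plotted points are not among the input coordinates; whether 0.06 was intended is not stated in the question. A minimal sketch of such a check, assuming a small synthetic grid (alpha, r, theta and values below are made up, not the question's data):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical 3 x 5 x 5 grid standing in for (alpha, r, theta); not the question's data.
alpha = np.array([0.06, 0.07, 0.08])
r = np.linspace(2.0, 11.0, 5)
theta = np.linspace(1.4, 1.7, 5)
grid = np.meshgrid(alpha, r, theta, indexing="ij")
coords = np.column_stack([g.ravel() for g in grid])  # shape (75, 3), alpha varies slowest
values = coords[:, 0] * np.exp(-0.1 * coords[:, 1]) * np.sin(coords[:, 2])

interp = RBFInterpolator(coords, values, kernel="linear", smoothing=0, degree=1)

# Largest deviation when the interpolator is evaluated at its own input points.
max_err = np.max(np.abs(interp(coords) - values))
print(f"max |interp - data| at the input points: {max_err:.3e}")

# A point that is NOT an input point: alpha = 0.6 lies far outside [0.06, 0.08].
off_grid = np.array([[0.6, r[0], theta[2]]])
print("value at alpha = 0.6:", interp(off_grid)[0], "| data at alpha = 0.06:", values[2])
```

If the error at the input points is at rounding level but a plot still disagrees, the evaluation points are the first thing to compare against coords. Because the kernel acts on plain Euclidean distance, axes with very different ranges (here about 0.02 for alpha versus about 9 for r) contribute very unequally; rescaling each axis to a comparable range before building the interpolator is one option to try, not something the question confirms is needed. For data on a regular grid, scipy.interpolate.RegularGridInterpolator with method="cubic" (available in recent SciPy releases) may be an alternative, with the caveat that, to my understanding, cubic splines need at least four points per axis and TMPalpha has three.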

Related

Area under the histogram is not 1 when using density in plt.hist

Consider the following dataset with random data:
test_dataset = np.array([ -2.09601881, -4.26602684, 1.09105452, -4.59559669,
1.05865251, -0.93076762, -14.70398945, -18.01937129,
4.64126152, -10.34178822, -9.46058493, -5.66864965,
-3.17562022, 15.7030379 , 10.59675205, -5.80882413,
-24.00604149, -4.81518663, -1.94333927, 1.18142171,
12.72030312, 3.84917581, -0.4468796 , 11.91828567,
-17.99171774, 9.35108712, -5.57233376, 5.77547128,
5.49296099, -10.96132844, -18.75174336, 5.27843303,
25.73548956, -21.58043021, -14.24734733, 12.57886018,
-22.10002076, 1.72207555, -6.0411867 , -3.63568527,
7.26542117, -0.21449529, -6.64974714, -0.94574606,
-4.23339431, 16.76199734, -12.42195793, 18.965854 ,
-23.85336123, -15.55104466, 6.17215868, 7.34993316,
8.62461351, -16.30482638, -16.35601099, 1.96857833,
18.74440399, -22.48374434, -10.895831 , -10.14393648,
-17.62768751, 4.83388855, 20.1578181 , 6.04299626,
0.97198296, -3.40889754, -10.62734293, 1.70240472,
20.4203839 , 10.26751364, 15.47859675, -10.97940064,
1.82728251, 4.22894717, 8.31502887, -5.48502811,
-1.09244874, -11.32072796, -24.88520436, -7.42108403,
19.4200716 , 4.82704045, -12.46290135, -15.18466755,
6.37714692, -11.06825059, 5.10898588, -9.07485484,
1.63946084, -12.2270078 , 12.63776832, -25.03916909,
2.42972082, -14.22890171, 18.2199446 , 6.9819771 ,
-12.07795089, 2.59948596, -16.90206575, 6.35192719,
7.33823106, -23.69653447, -11.66091871, -19.40251179,
-12.64863792, 11.04004231, 13.7247356 , -16.36107329,
20.43227515, 17.97334692, 16.92675175, -5.62051239,
-8.66304184, -8.40848514, -23.20919855, 0.96808137,
-5.03287253, -3.13212582, 18.81155666, -8.27988284,
3.85708447, 12.43039322, 17.98003878, 18.11009997,
-3.74294421, -16.62276121, 9.4446743 , 2.2060981 ,
8.34853736, 14.79144713, -1.91113975, -5.17061419,
4.53451746, 8.19090358, 7.98343201, 11.44592322,
-16.9132677 , -25.92554857, 10.10638432, -8.09236786,
20.8878207 , 19.52368296, 0.85858125, 2.61760415,
9.21360649, -8.1192651 , -6.94829273, 2.73562447,
13.40981323, -9.05018331, -17.77563166, -21.03927199,
4.10415845, -1.31550732, 5.68284828, 15.08670773,
-19.78675315, 12.94697869, -11.51797637, 1.91485992,
16.69417993, -16.04271622, -1.14028558, 9.79830109,
-18.58386093, -7.52963269, -10.10059878, -25.2194216 ,
-0.10598426, -15.77641532, -14.15999125, 14.35011271,
11.15178588, -14.43856266, 15.84015226, -3.41221883,
11.90724469, 0.57782081, 18.82127466, -6.01068727,
-19.83684476, 2.20091942, -1.38707755, -8.62821053,
-11.89000913, -11.69539815, 5.70242019, -3.83781841,
5.35894135, -0.30995954, 21.76661212, 8.52974329,
-9.13065082, -11.06209 , -12.00654618, 2.769838 ,
-12.21579496, -27.2686534 , -4.58538197, -6.94388425])
I'd like to plot a normalized histogram of it, so I pass density=True to plt.hist:
import numpy as np
import matplotlib.pyplot as plt
data1, bins, _ = plt.hist(test_dataset, density=True);
print(np.trapz(data1))
print(sum(data1))
which draws the histogram [figure not preserved] and prints:
0.18206124014272715
0.18866449755723017
From the matplotlib documentation:
"The density parameter, which normalizes bin heights so that the integral of the histogram is 1. The resulting histogram is an approximation of the probability density function."
But my example clearly shows that the integral of the histogram is NOT 1 and depends strongly on the number of bins: if I set it to 40, for example, the sum increases:
data1, bins, _ = plt.hist(test_dataset, bins=40, density=True);
print(np.trapz(data1))
print(sum(data1))
0.7508847002777762
0.7546579902289207
Is the description in the documentation incorrect, or am I misunderstanding something here?
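You are not calculating the area: neither np.trapz(data1) nor sum(data1) takes the bin widths into account. The area is the sum of height times bin width, which in your example equals 1 (up to floating-point rounding):
sum(data1 * np.diff(bins)) == 1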
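The check in the answer can be reproduced without drawing anything. A minimal sketch, assuming np.histogram (which computes the same normalised heights that plt.hist plots) and a made-up sample named sample rather than the question's data set:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=10.0, size=250)  # hypothetical data

for n_bins in (10, 40):
    heights, edges = np.histogram(sample, bins=n_bins, density=True)
    widths = np.diff(edges)
    area = np.sum(heights * widths)  # integral of the step function
    # area is 1 up to floating-point rounding for any bin count;
    # the plain sum of the heights changes with the bin width.
    print(f"bins={n_bins:2d}  sum(heights)={heights.sum():.4f}  area={area:.4f}")
```

With n equal-width bins of width w = (max - min) / n, the heights sum to 1 / w. The question's data span about -27.269 to 25.735 (a range of about 53.004), so 10 bins give about 10 / 53.004 ≈ 0.18866 and 40 bins about 40 / 53.004 ≈ 0.75466, which matches the two values printed by sum(data1).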

scipy.odr fails in fitting exponential function

I'm working on an astrophysics project where I need to measure the density (ne) of the gas in the center of the galaxy by two methods (A and S). I made a plot of ne_a against ne_s and I want to try an exponential fit to it. The problems are the following:
1. The errors in the data are asymmetrical and, apparently, scipy.odr does not accept this type of error. When the errors are included, 'ValueError: could not convert we to a suitable array' is raised.
2. Even if I do not include the errors, the fit still does not work.
The code used (errors not included in the fit):
import numpy as np
import matplotlib.pyplot as plt
ne_s = np.array([ 134.70722125, 316.27850769, 403.37221974, 579.91067991,
1103.06258335, 1147.23685549, 115.00820933, 476.42659337,
667.61690967, 403.30988606, 282.08007264, 479.98058352,
897.64247885, 214.75999934, 213.22512064, 491.81749573,
743.68513419, 374.37957281, 362.136037 , 893.88595455])
dne_s_max = np.array([23.6619623 , 5.85802097, 12.02456923, 1.50211648, 5.15987014,
10.3830146 , 10.5274528 , 0.82928872, 2.18586603, 31.95014727,
6.53134179, 2.38392559, 32.2838402 , 5.43629034, 1.02316579,
6.60281602, 14.53943481, 9.16809221, 6.84052648, 12.87655997])
dne_s_min = np.array([21.94513608, 5.80578938, 11.8303456 , 1.49856527, 5.1265976 ,
10.2523836 , 10.12663739, 0.82824884, 2.17914616, 30.55846643,
6.45691351, 2.37446669, 30.87025015, 5.37271061, 1.02087355,
6.5358395 , 14.21332643, 9.0523711 , 6.77187898, 12.64596461])
ne_a = np.array([ 890.61498788, 2872.03715706, 10222.33463389, 1946.48193766,
6695.25304235, 2107.36471192, 891.72010662, 3988.87511761,
11328.9670489 , 1097.38904905, 2896.62668843, 4849.57809801,
5615.96780935, 1415.18564794, 1204.00022768, 3616.05423907,
15638.52683391, 3300.6039601 , 775.28841051, 12325.54379524])
dne_a_max = np.array([1082.33639266, 571.57094375, 2396.39839075, 458.32058555,
796.79916236, 665.95370946, 2262.73423374, 1006.65192577,
1761.9251987 , 1718.78400914, 579.65477159, 245.54811362,
1652.50314639, 401.37677822, 178.03620792, 725.26490794,
6625.62353545, 908.21490446, 719.01117673, 2098.24809312])
dne_a_min = np.array([ 865.33019015, 518.08880981, 1877.85283954, 412.91242092,
724.38681574, 582.52644162, 870.14392196, 866.63643893,
1478.1792513 , 1076.64135559, 521.08794554, 236.2457763 ,
1349.36104495, 362.72343267, 169.23314057, 646.39803115,
4139.5768453 , 789.04878324, 620.55523654, 1720.06369942])
# asymmetric errors as [lower, upper] pairs, the layout ax.errorbar accepts
dne_a = [dne_a_min, dne_a_max]
dne_s = [dne_s_min, dne_s_max]
fig, ax = plt.subplots(1, 1)
ax.errorbar(ne_s, ne_a, xerr=dne_s, yerr=dne_a,
            linestyle='none', linewidth=0.7, capsize=5, color='crimson')
ax.scatter(ne_s, ne_a, s=15, color='black')
ax.set_ylabel('$n_e(A)$')
ax.set_xlabel('$n_e(S)$')
from scipy.odr import Data, RealData, Model, ODR
def f(B, x):
    # exponential model: offset + amplitude * exp(rate * x)
    return B[0] + B[1] * np.exp(B[2] * x)
exponential = Model(f)
data = RealData(ne_s, ne_a)  # no sx/sy: the errors are not passed to the fit
odr = ODR(data, exponential, beta0=[1, 200, 3e-3])
out = odr.run()
ax.plot(ne_s, f(out.beta, ne_s), linewidth=0.7)
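Which results in: [output not preserved]
And the actual plot is: [figure not preserved; per the code above, n_e(S) is on the x-axis and n_e(A) on the y-axis]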
So what am I missing here? Did I apply the ODR routine incorrectly? What should I do to make the fit work properly? And how can I make scipy.odr accept asymmetrical errors?
I should add that I don't know much about scipy.odr; I just adapted the documentation example to an exponential function.
Appreciate the help. Let me know if more information is necessary.
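RealData takes one standard deviation per point through sx and sy, so a [lower, upper] pair as used for ax.errorbar has no direct slot there. One workaround, and it is an assumption rather than anything the question settles, is to symmetrise each error bar by averaging its lower and upper parts before passing it in. A minimal sketch on synthetic data (x, y and all error arrays below are made up):

```python
import numpy as np
from scipy.odr import ODR, Model, RealData


def exponential(beta, x):
    """y = beta[0] + beta[1] * exp(beta[2] * x)."""
    return beta[0] + beta[1] * np.exp(beta[2] * x)


rng = np.random.default_rng(1)
true_beta = np.array([1.0, 200.0, 3e-3])
x = np.linspace(100.0, 1100.0, 20)
y = exponential(true_beta, x) + rng.normal(0.0, 50.0, x.size)

# Hypothetical asymmetric errors: lower and upper part of every error bar.
dx_lower, dx_upper = np.full(x.size, 5.0), np.full(x.size, 9.0)
dy_lower, dy_upper = np.full(x.size, 40.0), np.full(x.size, 60.0)

# Assumption: replace each asymmetric bar by the mean of its two parts.
sx = 0.5 * (dx_lower + dx_upper)
sy = 0.5 * (dy_lower + dy_upper)

data = RealData(x, y, sx=sx, sy=sy)  # 1-D arrays with the same length as x and y
out = ODR(data, Model(exponential), beta0=[1.0, 200.0, 3e-3]).run()

print("beta   :", out.beta)
print("sd_beta:", out.sd_beta)
print("stop   :", out.stopreason)
```

Averaging discards the asymmetry, so it is only reasonable when the lower and upper parts are similar; for the question's arrays that would be sx = 0.5 * (dne_s_min + dne_s_max) and sy = 0.5 * (dne_a_min + dne_a_max). Printing out.stopreason is a quick way to see whether the run converged before trusting out.beta. This sketch does not establish why the fit in the question fails.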

Unique trend Curve Fitting

I have data like this:
x = np.array([ 0. , 3. , 3.3 , 10. , 18. , 43. , 80. ,
120. , 165. , 210. , 260. , 310. , 360. , 410. ,
460. , 510. , 560. , 610. , 660. , 710. , 760. ,
809.5 , 859. , 908.5 , 958. , 1007.5 , 1057. , 1106.5 ,
1156. , 1205.5 , 1255. , 1304.5 , 1354. , 1403.5 , 1453. ,
1502.5 , 1552. , 1601.5 , 1651. , 1700.5 , 1750. , 1799.5 ,
1849. , 1898.5 , 1948. , 1997.5 , 2047. , 2096.5 , 2146. ,
2195.5 , 2245. , 2294.5 , 2344. , 2393.5 , 2443. , 2492.5 ,
2542. , 2591.5 , 2640. , 2690. , 2740. , 2789.67, 2839.33,
2891.5 ])
y = np.array([ 1.45 , 1.65 , 5.8 , 6.8 , 8.0355, 8.0379, 8.04 ,
8.0505, 8.175 , 8.3007, 8.4822, 8.665 , 8.8476, 9.0302,
9.528 , 9.6962, 9.864 , 10.032 , 10.2 , 10.9222, 11.0553,
11.1355, 11.2228, 11.3068, 11.3897, 11.4704, 11.5493, 11.6265,
11.702 , 11.7768, 11.8491, 11.9208, 11.9891, 12.0571, 12.1247,
12.1912, 12.2558, 12.3181, 12.3813, 12.4427, 12.503 , 12.5638,
12.6226, 12.6807, 12.7384, 12.7956, 12.8524, 12.9093, 12.9663,
13.0226, 13.0786, 13.1337, 13.1895, 13.2465, 13.3017, 13.3584,
13.4156, 13.4741, 13.5311, 13.5899, 13.6498, 13.6533, 13.657 ,
13.6601])
and it looks like this: [figure not preserved: plot of the data]
I need to fit a curve to this trend. I am using a moving average for smoothing: [figure not preserved; the moving average is drawn in magenta]
I am also using a polynomial (5th order): [figure not preserved; the polynomial result is drawn in blue]
I have tried higher orders, but the result gets worse. How can I get a result whose first point is at (0, 0) and which looks like the black curve [figure not preserved]?
This is my code:
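import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
def movingaverage(interval, window_size):
    # box filter: every sample in the window gets the same weight
    window = np.ones(int(window_size)) / float(window_size)
    print(window)
    return np.convolve(interval, window, 'same')
y_av = movingaverage(y, 2)
X = np.arange(0, np.max(x), 30).ravel()
yinter = interpolate.interp1d(x, y_av)(X)
# 5th-order polynomial fitted to the smoothed data
z = np.poly1d(np.polyfit(x, y_av, 5))
Y = z(X)
xm, ym = x, y  # assumption: xm, ym (undefined in the post) are the raw data arrays
plt.figure(1)
plt.plot(xm, ym, '*-r')    # raw data (red)
plt.plot(xm, y_av, '.-m')  # moving average (magenta)
plt.plot(X, Y, '*-b')      # polynomial evaluated on the regular grid X (blue)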
To do this, you should use an analytical function (with parameters) chosen on the basis of some assumption about the data; it does not have to be a polynomial. You can use curve_fit from scipy.optimize to find the unknown parameters of your analytical function that best fit your input data.
For example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# your analytical function (theoretical function) with parameters: a, b (or more)
def your_analytical_func(x, a, b):
    return a * np.log(x + b)  # this is just for example
# or using anonymous (lambda) function
# your_analytical_func = lambda x, a, b: a * np.log(x + b)
# Fit for the parameters a, b (or more) of the function your_analytical_func:
popt, pcov = curve_fit(your_analytical_func, x, y)
plt.plot(x, y, 'r.', label='incoming data')
plt.plot(x, your_analytical_func(x, *popt), '-', color="black", label='fit: your_analytical_func(x, a=%5.3f, b=%5.3f)' % tuple(popt))
plt.legend()
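The question also asks for a curve whose first point is at (0, 0). One way to build that into the model, offered as an illustration rather than as the answer's method, is a function that is zero at x = 0 by construction, for example a * log(1 + x / b). A minimal sketch, assuming synthetic data and the hypothetical name log_through_origin:

```python
import numpy as np
from scipy.optimize import curve_fit


def log_through_origin(x, a, b):
    """a * log(1 + x / b); equals 0 at x = 0 for any a and any b > 0."""
    return a * np.log1p(x / b)


rng = np.random.default_rng(2)
x = np.linspace(0.0, 2900.0, 60)
y = log_through_origin(x, 2.0, 5.0) + rng.normal(0.0, 0.05, x.size)  # made-up data

# Bounds keep b positive so the logarithm stays defined during the search.
popt, pcov = curve_fit(log_through_origin, x, y, p0=[1.0, 1.0],
                       bounds=([0.0, 1e-6], [np.inf, np.inf]))
a_fit, b_fit = popt
print(f"a = {a_fit:.3f}, b = {b_fit:.3f}")
print("value at x = 0:", log_through_origin(0.0, a_fit, b_fit))  # 0.0 by construction
```

With the question's arrays the call would take the posted x and y in place of the synthetic ones. Note that the posted data start at (0, 1.45), so forcing the curve through the origin is a modelling decision; if the offset matters, adding a constant term c to the function is the obvious variation.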

How to find Inverse Cumulative Distribution Function of discrete functions in Python

I am trying to find the inverse CDF of a discrete probability distribution in Python and then plot it. My CDF is derived from the following numpy output:
array([ 0.228157, 0.440671, 0.588515, 0.683326, 0.740365, 0.783288,
0.81362 , 0.840518, 0.859213, 0.876764, 0.889355, 0.89813 ,
0.909194, 0.916443, 0.9256 , 0.930369, 0.938572, 0.942387,
0.946012, 0.951353, 0.954405, 0.956694, 0.965088, 0.966614,
0.96814 , 0.969475, 0.970047, 0.971001, 0.971573, 0.973099,
0.974816, 0.975388, 0.977105, 0.984163, 0.984354, 0.984736,
0.98569 , 0.985881, 0.986072, 0.986644, 0.990269, 0.990651,
0.990842, 0.993322, 0.993704, 0.994467, 0.995039, 0.995802,
0.996184, 0.996375, 0.996566, 0.996757, 0.997329, 0.99752 ,
0.997711, 0.997902, 0.998093, 0.998284, 0.998475, 0.998666,
0.998857, 0.999239, 0.999621, 0.999812, 1.00000])
I tried rv_discrete.ppf(q, *args, **kwds), but it works for random variables, which is not my case.
Since you have lots of points, perhaps you would find linear interpolation between adjacent points acceptable. First do a binary search to find the points adjacent to the probability you seek. Like this, with some tidying up:
import numpy as np
CDF = np.array([ 0.228157, 0.440671, 0.588515, 0.683326, 0.740365, 0.783288, 0.81362 , 0.840518, 0.859213, 0.876764, 0.889355, 0.89813 , 0.909194, 0.916443, 0.9256 , 0.930369, 0.938572, 0.942387, 0.946012, 0.951353, 0.954405, 0.956694, 0.965088, 0.966614, 0.96814 , 0.969475, 0.970047, 0.971001, 0.971573, 0.973099, 0.974816, 0.975388, 0.977105, 0.984163, 0.984354, 0.984736, 0.98569 , 0.985881, 0.986072, 0.986644, 0.990269, 0.990651, 0.990842, 0.993322, 0.993704, 0.994467, 0.995039, 0.995802, 0.996184, 0.996375, 0.996566, 0.996757, 0.997329, 0.99752 , 0.997711, 0.997902, 0.998093, 0.998284, 0.998475, 0.998666, 0.998857, 0.999239, 0.999621, 0.999812, 1.00000] )
# inverse of .3
index = np.searchsorted(CDF, .3)
print(index)
# fractional position of .3 between the two neighbouring CDF values
print((.3 - CDF[index - 1]) / (CDF[index] - CDF[index - 1]))
Output is this.
1
0.338062433534
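The two steps in the answer (binary search, then linear interpolation between neighbours) can be wrapped into one function and applied to many probabilities at once. A minimal sketch, assuming a short made-up CDF named cdf on the support 0, 1, 2, 3 and the hypothetical helper name inverse_cdf:

```python
import numpy as np


def inverse_cdf(q, cdf, support, interpolate=False):
    """Map probabilities q back to the support of a tabulated, increasing CDF."""
    q = np.asarray(q, dtype=float)
    if interpolate:
        # linear interpolation between the two neighbouring CDF points
        return np.interp(q, cdf, support)
    # step-function inverse: first index whose CDF value is >= q
    idx = np.searchsorted(cdf, q, side="left")
    return support[np.clip(idx, 0, len(cdf) - 1)]


cdf = np.array([0.2, 0.5, 0.9, 1.0])  # hypothetical CDF values
support = np.arange(cdf.size)          # 0, 1, 2, 3

q = np.array([0.1, 0.3, 0.5, 0.95])
print(inverse_cdf(q, cdf, support))                     # [0 1 1 3]
print(inverse_cdf(0.3, cdf, support, interpolate=True))  # between 0 and 1
```

With the question's CDF array and support = np.arange(CDF.size), the interpolated value for 0.3 is the answer's index minus one plus its printed fraction, i.e. about 0.338. For plotting, evaluating inverse_cdf on a grid such as np.linspace(0, 1, 200) and passing the result to plt.step or plt.plot is one straightforward option.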

Different values weibull pdf

I was wondering why the values of the Weibull pdf from the prebuilt function dweibull.pdf are more or less half of what they should be.
I did a test: for the same x I computed the Weibull pdf for A=10 and K=2 twice, once by writing the formula myself and once with the prebuilt dweibull function.
import numpy as np
from scipy.stats import dweibull
import matplotlib.pyplot as plt
K = 2.0
A = 10.0
x = np.arange(0., 20., 1)
# own function
def weib(data, a, k):
    return (k / a) * (data / a)**(k - 1) * np.exp(-(data / a)**k)
pdf1 = weib(x, A, K)
print(sum(pdf1))
# prebuilt function
dist = dweibull(K, 1, A)
pdf2 = dist.pdf(x)
print(sum(pdf2))
f = plt.figure()
suba = f.add_subplot(121)
suba.plot(x, pdf1)
suba.set_title('pdf own function')
subb = f.add_subplot(122)
subb.plot(x, pdf2)
subb.set_title('pdf dweibull')
plt.show()
It seems that with dweibull the pdf values are halved, but this is wrong, as the summation should be 1 in total and not around 0.5 as it is with dweibull. When I write the formula myself, the summation is around 1.
scipy.stats.dweibull implements the double Weibull distribution. Its support is the real line. Your function weib corresponds to the PDF of scipy's weibull_min distribution.
Compare your function weib to weibull_min.pdf:
In [128]: from scipy.stats import weibull_min
In [129]: x = np.arange(0, 20, 1.0)
In [130]: K = 2.0
In [131]: A = 10.0
Your implementation:
In [132]: weib(x, A, K)
Out[132]:
array([ 0. , 0.019801 , 0.03843158, 0.05483587, 0.0681715 ,
0.07788008, 0.08372116, 0.0857677 , 0.08436679, 0.08007445,
0.07357589, 0.0656034 , 0.05686266, 0.04797508, 0.03944036,
0.03161977, 0.02473752, 0.01889591, 0.014099 , 0.0102797 ])
scipy.stats.weibull_min.pdf:
In [133]: weibull_min.pdf(x, K, scale=A)
Out[133]:
array([ 0. , 0.019801 , 0.03843158, 0.05483587, 0.0681715 ,
0.07788008, 0.08372116, 0.0857677 , 0.08436679, 0.08007445,
0.07357589, 0.0656034 , 0.05686266, 0.04797508, 0.03944036,
0.03161977, 0.02473752, 0.01889591, 0.014099 , 0.0102797 ])
By the way, there is a mistake in this line of your code:
dist=dweibull(K,1,A)
The order of the parameters is shape, location, scale, so you are setting the location parameter to 1. That's why the values in your second plot are shifted by one. That line should have been
dist = dweibull(K, 0, A)
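The factor of one half follows from the double Weibull density being spread symmetrically over both signs of x, so for x >= 0 and a location of 0 it is half of the one-sided weibull_min density. A minimal sketch that checks this numerically, reusing the question's weib, K and A:

```python
import numpy as np
from scipy.stats import dweibull, weibull_min

K, A = 2.0, 10.0
x = np.arange(0.0, 20.0, 1.0)


def weib(data, a, k):
    """Hand-written one-sided Weibull pdf from the question."""
    return (k / a) * (data / a) ** (k - 1) * np.exp(-((data / a) ** k))


one_sided = weibull_min.pdf(x, K, loc=0, scale=A)
two_sided = dweibull.pdf(x, K, loc=0, scale=A)

print(np.allclose(weib(x, A, K), one_sided))    # True
print(np.allclose(two_sided, 0.5 * one_sided))  # True
```

Summing either pdf on the integer grid only approximates its integral over the plotted range, which is why the question's sums are near 1 and near 0.5 rather than exact.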
