Constructing a 2d interpolator given scattered input data - python

I have three lists as follows:
x = [100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0]
y = [300.0, 300.0, 300.0, 300.0, 500.0, 500.0, 500.0, 500.0, 700.0, 700.0, 700.0, 700.0, 1000.0, 1000.0, 1000.0, 1000.0, 1500.0, 1500.0, 1500.0, 1500.0, 2000.0, 2000.0, 2000.0, 2000.0, 3000.0, 3000.0, 3000.0, 3000.0, 5000.0, 5000.0, 5000.0, 5000.0, 7500.0, 7500.0, 7500.0, 75000.0, 10000.0, 10000.0, 10000.0, 10000.0]
z = [100.0, 95.0, 87.5, 77.5, 60.0, 57.0, 52.5, 46.5, 40.0, 38.0, 35.0, 31.0, 30.0, 28.5, 26.25, 23.25, 23.0, 21.85, 20.125, 17.825, 17.0, 16.15, 14.875, 13.175, 13.0, 12.35, 11.375, 10.075, 10.0, 9.5, 8.75, 7.75, 7.0, 6.65, 6.125, 5.425, 5.0, 4.75, 4.375, 3.875]
Each entry of each list is read as a point so point 0 is (100,300,100) point 1 is (75,300,95) and so on.
I am trying to do 2d interpolation, so that I can compute a z value for any given input (x0, y0) point.
I was reading that using meshgrid I can interpolate with RegularGridInterpolator from scipy but I am not sure how to set it up when I do:
x_,y_,z_ = np.meshgrid(x,y,z) # both indexing ij or xy
I don't get values for x_,y_,z_ that make sense and I am not sure how to go from there.
I am trying to use the data points I have above to find intermediate values so something similar to scipy's interp1d where
f = interp1d(x, y, kind='cubic')
where I can later call f(any (x,y) point within range) and get the corresponding z value.

You need 2d interpolation over scattered data. I'd default to using scipy.interpolate.griddata in this case, but you seem to want a callable interpolator, whereas griddata needs a given set of points onto which it will interpolate.
Not to worry: griddata with 2d cubic interpolation uses a CloughTocher2DInterpolator. So we can do exactly that:
import numpy as np
import scipy.interpolate as interp
x = [100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0]
y = [300.0, 300.0, 300.0, 300.0, 500.0, 500.0, 500.0, 500.0, 700.0, 700.0, 700.0, 700.0, 1000.0, 1000.0, 1000.0, 1000.0, 1500.0, 1500.0, 1500.0, 1500.0, 2000.0, 2000.0, 2000.0, 2000.0, 3000.0, 3000.0, 3000.0, 3000.0, 5000.0, 5000.0, 5000.0, 5000.0, 7500.0, 7500.0, 7500.0, 75000.0, 10000.0, 10000.0, 10000.0, 10000.0]
z = [100.0, 95.0, 87.5, 77.5, 60.0, 57.0, 52.5, 46.5, 40.0, 38.0, 35.0, 31.0, 30.0, 28.5, 26.25, 23.25, 23.0, 21.85, 20.125, 17.825, 17.0, 16.15, 14.875, 13.175, 13.0, 12.35, 11.375, 10.075, 10.0, 9.5, 8.75, 7.75, 7.0, 6.65, 6.125, 5.425, 5.0, 4.75, 4.375, 3.875]
interpolator = interp.CloughTocher2DInterpolator(np.array([x,y]).T, z)
Now you can call this interpolator with 2 coordinates to give you the corresponding interpolated data point:
>>> interpolator(x[10], y[10]) == z[10]
True
>>> interpolator(2, 300)
array(77.81343)
Note that you'll have to stay inside the convex hull of the input points, otherwise you'll get nan (or whatever is passed as the fill_value keyword to the interpolator):
>>> interpolator(2, 30)
array(nan)
Extrapolation is usually meaningless anyway, and your input points are scattered in a bit erratic way:
So even if extrapolation was possible I wouldn't believe it.
Just to demonstrate how the resulting interpolator is constrained to the convex hull of the input points, here's a surface plot of your data on a gridded mesh we create just for plotting:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# go linearly in the x grid
xline = np.linspace(min(x), max(x), 30)
# go logarithmically in the y grid (considering y distribution)
yline = np.logspace(np.log10(min(y)), np.log10(max(y)), 30)
# construct 2d grid from these
xgrid,ygrid = np.meshgrid(xline, yline)
# interpolate z data; same shape as xgrid and ygrid
z_interp = interpolator(xgrid, ygrid)
# create 3d Axes and plot surface and base points
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xgrid, ygrid, z_interp, cmap='viridis',
vmin=min(z), vmax=max(z))
ax.plot(x, y, z, 'ro')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Here's the output from two angles (it's better to rotate around interactively; such stills don't do the 3d representation justice):
There are two main features to note:
The surface nicely fits the red points, which is expected from interpolation. Fortunately the input points are nice and smooth so everything goes well with interpolation. (The fact that the red points are usually hidden by the surface is only due to how pyplot's renderer mishandles the relative position of complex 3d objects)
The surface is cut (due to nan values) along the convex hull of the input points, so even though our gridded arrays define a rectangular grid we only get a cut of the surface where interpolation makes sense.

Related

Generate probability distribution or smoothing plot from points containing probabilities

I have points which include the probability on the y-axis and values on the x-axis, like:
p1 =
[[0.0, 0.0001430560406790707],
[10.0, 6.2797052001508247e-13],
[15.0, 4.8114669550502021e-06],
[20.0, 0.0007443231772534647],
[25.0, 0.00061070912573869406],
[30.0, 0.48116582167944905],
[35.0, 0.24698643991977953],
[40.0, 0.016407283121225951],
[45.0, 0.2557158314329116],
[50.0, 1.1252231121357235e-05],
[55.0, 0.064666668633158647],
[60.0, 1.7631447655837744e-17],
[65.0, 1.1294722466816786e-14],
[70.0, 2.9419020411134367e-16],
[75.0, 3.0887653014525822e-17],
[80.0, 4.4973693062706866e-17],
[85.0, 9.0975358174005147e-15],
[90.0, 1.0758266454985257e-10],
[95.0, 7.2923752473657924e-08],
[100.0, 1.8065366882584036e-08]]
p2 =
[[0.0, 4.1652247577331996e-06],
[10.0, 1.2212829713673957e-06],
[15.0, 6.5906857192417344e-08],
[20.0, 0.00016745946587138236],
[25.0, 0.0054431111796765554],
[30.0, 0.0067575214586160616],
[35.0, 0.00011856110316632124],
[40.0, 0.00032181662132509944],
[45.0, 0.001397981055516994],
[50.0, 0.0027058954834684062],
[55.0, 2.553142406703067e-06],
[60.0, 1.1514033594755017e-08],
[65.0, 0.21961568282994792],
[70.0, 2.4658349829099807e-08],
[75.0, 0.0022850986575076743],
[80.0, 3.5603047823624507e-06],
[85.0, 0.99406392082894734],
[90.0, 0.24399923235645221],
[95.0, 0.0013470125217945798],
[100.0, 0.042582366972883985]]
Now I want to generate a probability distribution from the points, where the x-axis values are (0,10,15,20,...,100) and the y-axis values contain the probabilities (0.00014,....)
When using the plt.plot fuction I get:
plt.plot([item[0] for item in p1],[item[1] for item in p1])
And for p2:
plt.plot([item[0] for item in p2],[item[1] for item in p2])
I want to get a more smooth visualization, like a probability distribution:
And if a probability distribution is not possible, then a smoothing spline:
Scipy's gaussian_kde is often used to smoothly approximate a probability distribution. It sums a gaussian kernel for each input point. Usually individual measurements are used as inputs, but the weights parameter allows working with binned data. The function is normalized to have its integral equal to one.
This approach assumes the values of p1 and p2 are meant as a mean for the segment around each x-value, similar to a histogram. I.e. a step function where the x-values identify the end of each step.
from matplotlib import pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
p1 = np.array([[0.0, 0.0001430560406790707],
[10.0, 6.2797052001508247e-13],
[15.0, 4.8114669550502021e-06],
[20.0, 0.0007443231772534647],
[25.0, 0.00061070912573869406],
[30.0, 0.48116582167944905],
[35.0, 0.24698643991977953],
[40.0, 0.016407283121225951],
[45.0, 0.2557158314329116],
[50.0, 1.1252231121357235e-05],
[55.0, 0.064666668633158647],
[60.0, 1.7631447655837744e-17],
[65.0, 1.1294722466816786e-14],
[70.0, 2.9419020411134367e-16],
[75.0, 3.0887653014525822e-17],
[80.0, 4.4973693062706866e-17],
[85.0, 9.0975358174005147e-15],
[90.0, 1.0758266454985257e-10],
[95.0, 7.2923752473657924e-08],
[100.0, 1.8065366882584036e-08]])
p2 = np.array([[0.0, 4.1652247577331996e-06],
[10.0, 1.2212829713673957e-06],
[15.0, 6.5906857192417344e-08],
[20.0, 0.00016745946587138236],
[25.0, 0.0054431111796765554],
[30.0, 0.0067575214586160616],
[35.0, 0.00011856110316632124],
[40.0, 0.00032181662132509944],
[45.0, 0.001397981055516994],
[50.0, 0.0027058954834684062],
[55.0, 2.553142406703067e-06],
[60.0, 1.1514033594755017e-08],
[65.0, 0.21961568282994792],
[70.0, 2.4658349829099807e-08],
[75.0, 0.0022850986575076743],
[80.0, 3.5603047823624507e-06],
[85.0, 0.99406392082894734],
[90.0, 0.24399923235645221],
[95.0, 0.0013470125217945798],
[100.0, 0.042582366972883985]])
x = np.linspace(0, 100, 1000)
fig, axes = plt.subplots(ncols=2)
for ax, p in zip(axes, [p1, p2]):
p[0, 0] = 5.0 # let each x-value be the end of a segment
ax.step(p[:,0], p[:,1], color='dodgerblue', lw=1, ls=':', where='pre')
ax2 = ax.twinx()
kde = gaussian_kde(p[:,0]-2.5, bw_method=.25, weights=p[:,1])
ax2.plot(x, kde(x), color='crimson')
plt.show()

Histogram hide empty bins

I want to illustrate nicely how often (y-axis) a certain output (x-axis) occurs...
My code produces following plot:
It's not good, because the values are rounded to integers apparently, e.g., there are not over a 100 outputs with 100%, but actually most of them are 99% I think.
The code:
#!/usr/bin/env python3
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
trajectoryIds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0]
avgSolutionPercentages = [20.6256, 99.1448, 15.6764, 21.8231, 16.3733, 17.7502, 20.0055, 86.6873, 11.3105, 15.6693, 10.3449, 81.8921, 11.6745, 92.6031, 11.8787, 23.0229, 37.9636, 2.3903, 15.1727, 14.7088, 10.0426, 59.6758, 8.0042, 12.4174, 10.0585, 46.0567, 90.2376, 98.3273, 52.8645, 49.3027, 62.4136, 32.6199, 19.0642, 10.3319, 74.6157, 22.5771, 22.4118, 11.2017, 16.5053, 11.2021, 30.8376, 24.5255, 83.1072, 10.1529, 14.3991, 46.3459, 16.2137, 4.5773, 44.9549, 1.0719, 76.5605, 42.6589, 13.6209, 34.2856, 1.3574, 29.0465, 66.8146, 16.4796, 32.9564, 62.0732, 3.7047, 13.8828, 31.6088, 60.1141, 3.3247, 45.0796, 13.7862, 26.4498, 93.6806, 10.3245, 62.5157, 10.9833, 42.5908, 37.3208, 27.4115, 84.1648, 13.9058, 13.9065, 67.8918, 27.9075, 3.6116, 10.9091, 41.0988, 24.2177, 50.2762, 61.3869, 15.5915, 27.6536, 0.7993, 22.9483, 22.3393, 88.1832, 25.1604, 18.3625, 15.7212, 56.9646, 4.0434, 11.8431, 56.0613, 32.5472, 97.8757, 21.8233, 14.8162, 38.8259, 20.5676, 72.7201, 17.7987, 35.8117, 15.1699, 17.0359, 14.0621, 35.9655, 11.9095, 10.5691, 23.3259, 16.1746, 10.1936, 12.5084, 24.1494, 16.4727, 21.0687, 15.7495, 28.8929, 11.0135, 13.3133, 14.6639, 50.1304, 21.0346, 5.1604, 53.5107, 20.0712, 41.5111, 12.1633, 74.3263, 17.7904, 17.1684, 25.3977, 21.5871, 21.9332, 22.6674, 36.6634, 99.1179, 15.3213, 16.3999, 12.0147, 57.5163, 4.2062, 17.3874, 10.7132, 17.4919, 17.8457, 29.3538, 26.1468, 75.1234, 16.4368, 21.6191, 61.1394, 12.9972, 73.5746, 72.5788, 41.6835, 39.9912, 20.1648, 11.7097, 11.5203, 36.7387, 5.0694, 30.8129, 12.0922, 22.5419, 12.3569, 54.6776, 28.3561, 26.1219, 44.7455, 1.3281, 46.5064, 13.6016, 23.5483, 11.7151, 44.3669, 3.2577, 75.0943, 10.8634, 14.8226, 45.7661, 19.7319, 30.7981, 3.5965, 47.8161, 14.5996, 39.4484, 13.0693, 24.9947, 97.4253, 76.7901, 73.1183, 4.0922]
solutionPercentages = [99.2537, 99.8467, 96.4718, 99.6637, 99.6633, 97.1289, 9.7373, 99.5126, 97.3251, 96.0545, 99.6756, 75.6587, 61.1496, 96.7575, 97.1969, 96.5258, 99.7409, 99.8641, 99.8821, 98.5401, 99.7833, 99.6314, 99.7899, 99.9117, 99.5754, 99.5868, 99.7919, 99.9127, 0.0001, 99.7297, 40.8438, 99.8559, 99.6591, 99.8917, 99.3622, 0.0001, 0.0001, 99.4828, 0.0001, 99.8559, 0.0001, 0.0001, 99.6714, 9.9635, 99.8744, 93.8854, 67.3692, 96.3229, 98.4899, 66.9173, 98.2533, 99.8318, 73.9904, 99.8431, 6.2614, 97.2776, 96.0938, 71.9457, 99.9211, 96.1596, 99.8405, 99.6314, 95.4566, 98.4786, 99.8217, 96.1014, 99.0391, 94.6034, 99.8403, 99.9093, 9.8096, 97.8549, 98.7041, 19.9098, 86.3154, 21.5302, 99.2769, 99.0496, 99.7266, 99.8602, 86.7925, 96.3197, 99.9226, 9.4447, 97.9722, 50.4884, 92.2358, 87.4311, 74.2156, 97.8819, 93.2483, 96.3186, 77.9828, 80.2446, 47.1835, 40.8011, 90.5123, 85.7852, 9.8074, 95.9032, 98.5906, 12.5081, 97.0264, 9.9166, 73.6486, 97.8634, 8.4403, 97.7592, 97.9933, 95.8486, 49.7977, 95.1031, 76.1712, 96.1552, 89.0059, 79.6172, 96.7383, 90.8518, 95.8096, 98.2061, 96.3314, 97.5753, 97.9857, 9.0739, 66.9977, 86.5744, 76.8124, 8.6195, 81.3285, 91.0891, 87.3345, 65.3729, 86.7354, 89.9558, 3.1401, 83.4993, 75.1529, 83.5419, 78.3002, 89.8564, 82.2419, 19.3794, 88.2163, 87.9032, 97.8686, 95.0742, 12.3542, 84.7324, 99.4753, 76.1753, 99.5386, 99.8664, 85.7785, 9.9933, 99.7167, 99.9328, 74.4693, 99.7531, 99.0579, 99.5994, 99.7785, 19.2743, 54.7251, 91.7269, 99.5033, 98.9247, 97.6214, 0.0001, 97.7027, 98.6832, 98.4691, 98.9759, 99.7087, 99.9244, 99.4908, 82.1103, 67.6125, 78.2363, 93.5725, 91.5612, 99.8865, 68.5426, 79.0635, 76.8951, 99.3555, 98.9196, 6.1157, 75.8655, 83.8525, 86.1269, 83.3388, 96.1854, 87.1961, 81.7453, 9.2689, 95.2765, 9.0809, 99.8599]
avgSuccess = sum(avgSolutionPercentages)/len(trajectoryIds)
y = solutionPercentages
#Plot
fig, ax = plt.subplots()
ax.hist(y)
ax.set_ylabel('Number of Motions (Total: '+ str(len(trajectoryIds)) + ')')
ax.set_xlabel('Planning Solution (%)')
ax.set_title('Planning Success Rate (Avg: ' + str(round(avgSuccess,2)) + '%)')
plt.legend(loc='upper left')
plt.show()
So I found out how to make the values on the x axis more precise: I changed ax.hist(y) to ax.hist(y, bins = 1000). But that didn't really work out well either:
So now I need to:
get rid of the empty space between my bars (is there a way to get rid of these empty x values?)
while keeping all bars at the same width
change the precision anytime, e.g., from 1 to 0,01 step for each bar
Just any suggestions on how to make the plot (and code) look better are much appreciated :) Maybe it's not the .hist function that's best for this...but I don't know any better - failed doing this with a bar chart so far :(
How about something like
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import numpy as np
trajectoryIds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0]
avgSolutionPercentages = [20.6256, 99.1448, 15.6764, 21.8231, 16.3733, 17.7502, 20.0055, 86.6873, 11.3105, 15.6693, 10.3449, 81.8921, 11.6745, 92.6031, 11.8787, 23.0229, 37.9636, 2.3903, 15.1727, 14.7088, 10.0426, 59.6758, 8.0042, 12.4174, 10.0585, 46.0567, 90.2376, 98.3273, 52.8645, 49.3027, 62.4136, 32.6199, 19.0642, 10.3319, 74.6157, 22.5771, 22.4118, 11.2017, 16.5053, 11.2021, 30.8376, 24.5255, 83.1072, 10.1529, 14.3991, 46.3459, 16.2137, 4.5773, 44.9549, 1.0719, 76.5605, 42.6589, 13.6209, 34.2856, 1.3574, 29.0465, 66.8146, 16.4796, 32.9564, 62.0732, 3.7047, 13.8828, 31.6088, 60.1141, 3.3247, 45.0796, 13.7862, 26.4498, 93.6806, 10.3245, 62.5157, 10.9833, 42.5908, 37.3208, 27.4115, 84.1648, 13.9058, 13.9065, 67.8918, 27.9075, 3.6116, 10.9091, 41.0988, 24.2177, 50.2762, 61.3869, 15.5915, 27.6536, 0.7993, 22.9483, 22.3393, 88.1832, 25.1604, 18.3625, 15.7212, 56.9646, 4.0434, 11.8431, 56.0613, 32.5472, 97.8757, 21.8233, 14.8162, 38.8259, 20.5676, 72.7201, 17.7987, 35.8117, 15.1699, 17.0359, 14.0621, 35.9655, 11.9095, 10.5691, 23.3259, 16.1746, 10.1936, 12.5084, 24.1494, 16.4727, 21.0687, 15.7495, 28.8929, 11.0135, 13.3133, 14.6639, 50.1304, 21.0346, 5.1604, 53.5107, 20.0712, 41.5111, 12.1633, 74.3263, 17.7904, 17.1684, 25.3977, 21.5871, 21.9332, 22.6674, 36.6634, 99.1179, 15.3213, 16.3999, 12.0147, 57.5163, 4.2062, 17.3874, 10.7132, 17.4919, 17.8457, 29.3538, 26.1468, 75.1234, 16.4368, 21.6191, 61.1394, 12.9972, 73.5746, 72.5788, 41.6835, 39.9912, 20.1648, 11.7097, 11.5203, 36.7387, 5.0694, 30.8129, 12.0922, 22.5419, 12.3569, 54.6776, 28.3561, 26.1219, 44.7455, 1.3281, 46.5064, 13.6016, 23.5483, 11.7151, 44.3669, 3.2577, 75.0943, 10.8634, 14.8226, 45.7661, 19.7319, 30.7981, 3.5965, 47.8161, 14.5996, 39.4484, 13.0693, 24.9947, 97.4253, 76.7901, 73.1183, 4.0922]
solutionPercentages = [99.2537, 99.8467, 96.4718, 99.6637, 99.6633, 97.1289, 9.7373, 99.5126, 97.3251, 96.0545, 99.6756, 75.6587, 61.1496, 96.7575, 97.1969, 96.5258, 99.7409, 99.8641, 99.8821, 98.5401, 99.7833, 99.6314, 99.7899, 99.9117, 99.5754, 99.5868, 99.7919, 99.9127, 0.0001, 99.7297, 40.8438, 99.8559, 99.6591, 99.8917, 99.3622, 0.0001, 0.0001, 99.4828, 0.0001, 99.8559, 0.0001, 0.0001, 99.6714, 9.9635, 99.8744, 93.8854, 67.3692, 96.3229, 98.4899, 66.9173, 98.2533, 99.8318, 73.9904, 99.8431, 6.2614, 97.2776, 96.0938, 71.9457, 99.9211, 96.1596, 99.8405, 99.6314, 95.4566, 98.4786, 99.8217, 96.1014, 99.0391, 94.6034, 99.8403, 99.9093, 9.8096, 97.8549, 98.7041, 19.9098, 86.3154, 21.5302, 99.2769, 99.0496, 99.7266, 99.8602, 86.7925, 96.3197, 99.9226, 9.4447, 97.9722, 50.4884, 92.2358, 87.4311, 74.2156, 97.8819, 93.2483, 96.3186, 77.9828, 80.2446, 47.1835, 40.8011, 90.5123, 85.7852, 9.8074, 95.9032, 98.5906, 12.5081, 97.0264, 9.9166, 73.6486, 97.8634, 8.4403, 97.7592, 97.9933, 95.8486, 49.7977, 95.1031, 76.1712, 96.1552, 89.0059, 79.6172, 96.7383, 90.8518, 95.8096, 98.2061, 96.3314, 97.5753, 97.9857, 9.0739, 66.9977, 86.5744, 76.8124, 8.6195, 81.3285, 91.0891, 87.3345, 65.3729, 86.7354, 89.9558, 3.1401, 83.4993, 75.1529, 83.5419, 78.3002, 89.8564, 82.2419, 19.3794, 88.2163, 87.9032, 97.8686, 95.0742, 12.3542, 84.7324, 99.4753, 76.1753, 99.5386, 99.8664, 85.7785, 9.9933, 99.7167, 99.9328, 74.4693, 99.7531, 99.0579, 99.5994, 99.7785, 19.2743, 54.7251, 91.7269, 99.5033, 98.9247, 97.6214, 0.0001, 97.7027, 98.6832, 98.4691, 98.9759, 99.7087, 99.9244, 99.4908, 82.1103, 67.6125, 78.2363, 93.5725, 91.5612, 99.8865, 68.5426, 79.0635, 76.8951, 99.3555, 98.9196, 6.1157, 75.8655, 83.8525, 86.1269, 83.3388, 96.1854, 87.1961, 81.7453, 9.2689, 95.2765, 9.0809, 99.8599]
avgSuccess = sum(avgSolutionPercentages)/len(trajectoryIds)
y = solutionPercentages
BIN_COUNT = 15
BAR_WIDTH = 0.75
fig, ax = plt.subplots()
# use numpy histogram so we can perform filtering
hist, bin_edges = np.histogram(y, bins=BIN_COUNT)
# so we can remove bins with zero entries
non_zero = np.nonzero(hist)
# take only entries where bin is non-zero
hist = hist[non_zero]
bin_edges = bin_edges[non_zero]
# generate labels based on bin edge values (maybe use centers?)
x_ticks = [str(int(edge)) for edge in bin_edges]
indices = np.arange(len(bin_edges))
plt.bar(indices, hist, BAR_WIDTH, align='center')
plt.xticks(indices, x_ticks)
ax.set_ylabel('Number of Motions (Total: '+ str(len(trajectoryIds)) + ')')
ax.set_xlabel('Planning Solution (%)')
ax.set_title('Planning Success Rate (Avg: ' + str(round(avgSuccess,2)) + '%)')
plt.show()
which produces the plot
You may use some nonlinear dependence of the bin width, e.g.
b = 5
bins = (np.linspace(np.min(y)**b, np.max(y)**b))**(1/b)
fig, ax = plt.subplots()
ax.hist(y, bins=bins, edgecolor="k")
Or you may define the bins completely customized, e.g. use a bin width of 10 up to 60 and then use a bin width of 5 till 90, finally use a bin with of 1 till 100.
bins = np.concatenate((np.linspace(0,60,7),
np.linspace(60,90,7),
np.linspace(90,100,11)))
fig, ax = plt.subplots()
ax.hist(y, bins=bins, edgecolor="k")

Python: Linregress slope and y-intercept

I'm working on a program that can calculate the slope using the linregress native scipyy function, but I'm getting two errors (depending on how I try to fix it). The two lists should be two-dimensional, basically x and y values.
from __future__ import division
from scipy.stats import linregress
import matplotlib.pyplot as mplot
import numpy as np
xs=[[20.0, 80.0, 45.0, 42.0, 93.0, 98.0, 65.0, 43.0, 72.0, 36.0, 9.0, 60.0, 47.0, 84.0, 31.0, 46.0, 57.0, 76.0, 27.0, 85.0, 0.0, 39.0, 2.0, 56.0, 68.0, 6.0, 41.0, 28.0, 61.0, 12.0, 32.0, 1.0, 54.0, 77.0, 18.0, 86.0, 62.0, 23.0, 30.0, 69.0, 4.0, 71.0, 64.0, 92.0, 24.0, 79.0, 8.0, 35.0, 49.0, 53.0, 7.0, 59.0, 70.0, 37.0, 13.0, 15.0, 73.0, 89.0, 96.0, 83.0, 22.0, 95.0, 19.0, 67.0, 5.0, 88.0, 38.0, 50.0, 55.0, 52.0, 81.0, 58.0, 11.0, 51.0, 99.0, 78.0, 25.0, 33.0, 40.0, 75.0, 3.0, 91.0, 48.0, 90.0, 82.0, 26.0, 10.0, 16.0, 21.0, 66.0, 14.0, 87.0, 74.0, 97.0, 94.0, 44.0, 29.0, 17.0, 63.0, 34.0], [87.0, 17.0, 69.0, 72.0, 76.0, 62.0, 20.0, 77.0, 5.0, 49.0, 81.0, 3.0, 24.0, 36.0, 44.0, 91.0, 99.0, 35.0, 43.0, 50.0, 12.0, 54.0, 46.0, 30.0, 37.0, 45.0, 90.0, 85.0, 70.0, 83.0, 38.0, 22.0, 23.0, 0.0, 60.0, 47.0, 26.0, 1.0, 95.0, 73.0, 65.0, 94.0, 84.0, 8.0, 34.0, 56.0, 66.0, 13.0, 75.0, 52.0, 19.0, 55.0, 67.0, 39.0, 21.0, 80.0, 98.0, 33.0, 11.0, 68.0, 40.0, 32.0, 2.0, 79.0, 82.0, 93.0, 96.0, 88.0, 14.0, 92.0, 41.0, 89.0, 28.0, 29.0, 42.0, 6.0, 86.0, 74.0, 58.0, 16.0, 31.0, 64.0, 15.0, 53.0, 25.0, 59.0, 61.0, 78.0, 51.0, 7.0, 57.0, 9.0, 97.0, 63.0, 48.0, 71.0, 18.0, 10.0, 4.0, 27.0]]
ys=[[155.506, 50.592, 104.447, 111.318, 36.148, 36.87, 74.266, 106.413, 58.341, 122.563, 180.555, 85.202, 96.84, 50.726, 126.56, 100.686, 88.303, 54.797, 138.487, 44.946, 200.9, 116.524, 193.652, 82.8, 65.823, 184.436, 113.738, 133.458, 83.765, 167.408, 129.491, 200.469, 89.238, 51.799, 159.217, 49.382, 78.443, 146.051, 129.045, 63.805, 185.564, 65.614, 74.243, 43.408, 140.863, 53.446, 182.767, 127.373, 94.494, 91.079, 187.194, 81.254, 68.702, 121.368, 164.756, 169.696, 59.483, 45.978, 33.057, 47.12, 154.755, 33.872, 160.754, 70.256, 190.393, 38.398, 113.188, 100.493, 84.511, 88.635, 49.353, 81.821, 178.876, 95.307, 32.2, 54.715, 141.389, 132.337, 109.673, 57.611, 189.251, 39.283, 97.31, 41.173, 47.529, 140.03, 173.058, 160.288, 154.773, 67.903, 164.718, 42.032, 60.739, 28.656, 34.302, 107.022, 137.344, 160.195, 73.636, 123.797], [14.138, 100.87, 30.287, 28.675, 21.826, 42.445, 97.938, 29.574, 125.976, 59.404, 26.609, 125.743, 95.329, 75.467, 59.497, 15.342, 9.834, 77.402, 65.019, 54.468, 112.64, 45.466, 55.197, 79.992, 71.146, 55.39, 14.795, 15.971, 28.535, 25.862, 73.239, 92.455, 87.635, 137.6, 38.59, 53.718, 86.26, 130.567, 11.274, 33.867, 40.035, 11.07, 16.109, 114.732, 76.552, 45.85, 31.827, 110.877, 26.292, 55.738, 101.801, 48.601, 33.632, 66.647, 98.39, 23.904, 11.172, 78.215, 109.417, 31.653, 68.368, 79.593, 124.548, 21.513, 19.828, 13.48, 9.993, 22.043, 108.229, 16.904, 66.704, 12.262, 79.947, 85.012, 66.754, 124.114, 17.548, 25.872, 45.392, 101.775, 78.085, 36.358, 101.795, 52.045, 87.637, 42.784, 37.011, 26.036, 50.146, 119.666, 42.514, 113.313, 9.125, 42.394, 51.954, 26.898, 96.678, 112.108, 125.252, 86.296]]
slope, intercept, r_value, std_err = linregress(xs,ys)
print(slope)
My error is:
in linregress
ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
ValueError: too many values to unpack (expected 4)
I've tried changing my code to something like this:
slope, intercept, r_value, std_err = linregress(xs[:,0], ys[:,0])
But then my error becomes a TypeError:
TypeError: list indices must be integers or slices, not tuple
Does anyone have any suggestions? Perhaps there's something I don't understand about the use of the linregress function. I'm sure my first error has to do with my lists being 2D. For the second error, I'm lost.
You have two problems:
When interpreted as arrays, your variables xs and ys are two-dimensional with shape (2, 100). When linregress is given both arguments x and y, it expects them to be one-dimensional arrays.
As you can see in the "Returns" section of the docstring, linregress returns five values, not four.
You'll have to call linregress twice, and handle the five return values. For example,
In [144]: slope, intercept, rvalue, pvalue, stderr = linregress(xs[0], ys[0])
In [145]: slope, intercept, rvalue
Out[145]: (-1.7059670627062702, 187.5658196039604, -0.9912859597363385)
In [146]: slope, intercept, rvalue, pvalue, stderr = linregress(xs[1], ys[1])
In [147]: slope, intercept, rvalue
Out[147]: (-1.2455432103210327, 121.51968891089112, -0.9871123119133126)

Custom arrangement of NumPy array elements

I have a NumPy array 'data' as follows:
data = np.array([
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, -180.0, -150.0, -120.0, -90.0, -60.0, -30.0],
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, -180.0, -150.0, -120.0, -90.0, -60.0, -30.0],
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, -180.0, -150.0, -120.0, -90.0, -60.0, -30.0]])
I want to produce the array 'result' from the given array 'data'. Actually, in the required array, the zero column has to be put on the middle, then the values are increasing towards right direction, while the values are decreasing in the left direction as shown below:
result = np.array([
[-180.0, -150.0, -120.0, -90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0],
[-180.0, -150.0, -120.0, -90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0],
[-180.0, -150.0, -120.0, -90.0, -60.0, -30.0, 0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0]])
The resulted array should be based on index manipulation of the given array. What is the best way of doing it in NumPy?
I tried np.rot90, np.flipud, np.fliprl functions with no success.
However, I could not think how to start.
Take a look at np.roll:
np.roll(data, shift=data.shape[1]//2, axis=1)
Here the shift is saying how many elements to roll the data by (Positive values to the right, negative to the left). Given the specification, we want to roll it to the right by half the length of the array along the second dimension (axis=1). The // is integer division, and data.shape[1] gets the size of the dimension along the second dimension (based on zero indexing).
I think you're missing the +180.0 value that you have in result in data though (i.e your result has 13 columns, but your data has only 12).
In [9]: data = np.array([
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, -180.0, -150.0, -120.0, -90.0, -60.0, -30.0],
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, -180.0, -150.0, -120.0, -90.0, -60.0, -30.0],
[0.0, 30.0, 60.0, 90.0, 120.0, 150.0, -180.0, -150.0, -120.0, -90.0, -60.0, -30.0]])
In [10]: result = np.roll(data, data.shape[1]//2, axis=1)
In [11]: result
Out[11]:
array([[-180., -150., -120., -90., -60., -30., 0., 30., 60.,
90., 120., 150.],
[-180., -150., -120., -90., -60., -30., 0., 30., 60.,
90., 120., 150.],
[-180., -150., -120., -90., -60., -30., 0., 30., 60.,
90., 120., 150.]])
This isn't really sorting based on index, but given the other methods you tried, I'm guessing this is the type manipulation that you might want.
So, I am guessing that you're trying to sort this array from least to highest value? It isn't entirely clear from your question. Try this?
result = []
for row in data:
result.append(np.sort(row))
You can use the facilities in numpy to easily convert 'result' into a numpy array again. There is more information about np.sort available here

SciPy interpolation ValueError: x and y arrays must be equal in length along interpolation axis

I'm trying to work with interp1d of SciPy.interpolate. I "plugged in" two arrays (filtered_mass and integrated_column), of same size, but it still give me ValueError that the sizes of the arrays must be equal. How can it be?
This is the code I'm using in this part:
def interp_integrated_column(self, definition):
''' (string) -> interpolated_function(mass)
This functions output the interpolated value of the integrated columns
as function of the mass of the WIMP (mDM)
'''
print self.filtered_mass_array
print "len(filtered_mass)", len(self.filtered_mass_array) , "len(integrated_column)", len(self.integrated_columns_values[definition])
print self.integrated_columns_values[definition]
interpolated_values = interp1d(self.filtered_mass_array, self.integrated_columns_values[definition])
return interpolated_values
This is the error message:
[5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0, 280.0, 300.0, 330.0, 360.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1500.0, 1700.0, 2000.0, 2500.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0, 12000.0, 15000.0, 20000.0, 30000.0, 50000.0, 100000.0]
len(filtered_mass) 62 len(integrated_column) 62
[[2.8855960615102004e-05], [4.0701386519793902e-05], [6.6563800907013242e-05], [0.0001006393622421269], [0.00019862657113084296], [0.00032843266928887332], [0.00046438711039847576], [0.00060420820026262198], [0.00091858847275374405], [0.0012828446411529174], [0.0016307748004155418], [0.0020049092489578773], [0.0023859804990953733], [0.0027809435562397089], [0.0031914945950108709], [0.0036198713189993367], [0.004049356593219729], [0.058652386100581579], [0.080971818217450073], [0.10330986231789899], [0.13710341994459613], [0.20188314005754618], [0.2891914189026335], [0.37721295733783522], [0.47493929411417846], [0.57539389630897464], [0.70805980165022075], [0.85872215884312952], [1.0664252638663609], [1.2783399280844934], [1.564710616680836], [2.0375181832882485], [2.5037792909103884], [2.9693614352642328], [3.4461139299681416], [3.9753240755452568], [4.5112890074931942], [5.0575238552577968], [5.6116617190278557], [6.75034712149598], [7.9290625424458492], [9.1455816114675219], [10.393026346405367], [14.442148067840661], [18.539929482157905], [22.594593494117799], [28.852213268263831], [39.804824036584456], [51.348027754488449], [83.695041150108111], [118.92653801185628], [155.17895505284363], [192.83930746140334], [231.78928736553948], [271.95372644243321], [313.16712050353419], [398.50142684880342], [532.55760945531256], [768.84170621340957], [1276.9057251660611], [2387.368055624514], [5476.4080305101643]]
Traceback (most recent call last):
File "data_mining.py", line 8, in <module>
e_int = nu_e.interp_integrated_column('e')
File "/home/ohm/projects/mucalc/PPPC4DMID_Reader.py", line 121, in interp_integrated_column
interpolated_values = interp1d(self.filtered_mass_array, self.integrated_columns_values[definition])
File "/usr/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 278, in __init__
raise ValueError("x and y arrays must be equal in length along "
ValueError: x and y arrays must be equal in length along interpolation axis.
Your two lists both have length 62, but they have different shapes interpreted as numpy arrays:
>>> a = [5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0, 280.0, 300.0, 330.0, 360.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1500.0, 1700.0, 2000.0, 2500.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0, 12000.0, 15000.0, 20000.0, 30000.0, 50000.0, 100000.0]
>>> b = [[2.8855960615102004e-05], [4.0701386519793902e-05], [6.6563800907013242e-05], [0.0001006393622421269], [0.00019862657113084296], [0.00032843266928887332], [0.00046438711039847576], [0.00060420820026262198], [0.00091858847275374405], [0.0012828446411529174], [0.0016307748004155418], [0.0020049092489578773], [0.0023859804990953733], [0.0027809435562397089], [0.0031914945950108709], [0.0036198713189993367], [0.004049356593219729], [0.058652386100581579], [0.080971818217450073], [0.10330986231789899], [0.13710341994459613], [0.20188314005754618], [0.2891914189026335], [0.37721295733783522], [0.47493929411417846], [0.57539389630897464], [0.70805980165022075], [0.85872215884312952], [1.0664252638663609], [1.2783399280844934], [1.564710616680836], [2.0375181832882485], [2.5037792909103884], [2.9693614352642328], [3.4461139299681416], [3.9753240755452568], [4.5112890074931942], [5.0575238552577968], [5.6116617190278557], [6.75034712149598], [7.9290625424458492], [9.1455816114675219], [10.393026346405367], [14.442148067840661], [18.539929482157905], [22.594593494117799], [28.852213268263831], [39.804824036584456], [51.348027754488449], [83.695041150108111], [118.92653801185628], [155.17895505284363], [192.83930746140334], [231.78928736553948], [271.95372644243321], [313.16712050353419], [398.50142684880342], [532.55760945531256], [768.84170621340957], [1276.9057251660611], [2387.368055624514], [5476.4080305101643]]
>>> np.asarray(a).shape
(62,)
>>> np.asarray(b).shape
(62, 1)
You'll want to make your second array 1D, not 2D. There are roughly a quadrillion ways to do this in numpy, but one is to use .squeeze(), which removes single-dimensional axes:
>>> a = np.asarray(a)
>>> b = np.asarray(b).squeeze()
>>> b.shape
(62,)
after which:
>>> from scipy.interpolate import interp1d
>>> i = interp1d(a,b)
>>> i(2123)
array(31.546555517270704)

Categories

Resources