Histogram hide empty bins - python

I want to illustrate nicely how often (y-axis) a certain output (x-axis) occurs...
My code produces the following plot:
It's not good, because the values are apparently rounded to integers: e.g., there are not over 100 outputs at 100%; I think most of them are actually around 99%.
The code:
#!/usr/bin/env python3
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
trajectoryIds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0]
avgSolutionPercentages = [20.6256, 99.1448, 15.6764, 21.8231, 16.3733, 17.7502, 20.0055, 86.6873, 11.3105, 15.6693, 10.3449, 81.8921, 11.6745, 92.6031, 11.8787, 23.0229, 37.9636, 2.3903, 15.1727, 14.7088, 10.0426, 59.6758, 8.0042, 12.4174, 10.0585, 46.0567, 90.2376, 98.3273, 52.8645, 49.3027, 62.4136, 32.6199, 19.0642, 10.3319, 74.6157, 22.5771, 22.4118, 11.2017, 16.5053, 11.2021, 30.8376, 24.5255, 83.1072, 10.1529, 14.3991, 46.3459, 16.2137, 4.5773, 44.9549, 1.0719, 76.5605, 42.6589, 13.6209, 34.2856, 1.3574, 29.0465, 66.8146, 16.4796, 32.9564, 62.0732, 3.7047, 13.8828, 31.6088, 60.1141, 3.3247, 45.0796, 13.7862, 26.4498, 93.6806, 10.3245, 62.5157, 10.9833, 42.5908, 37.3208, 27.4115, 84.1648, 13.9058, 13.9065, 67.8918, 27.9075, 3.6116, 10.9091, 41.0988, 24.2177, 50.2762, 61.3869, 15.5915, 27.6536, 0.7993, 22.9483, 22.3393, 88.1832, 25.1604, 18.3625, 15.7212, 56.9646, 4.0434, 11.8431, 56.0613, 32.5472, 97.8757, 21.8233, 14.8162, 38.8259, 20.5676, 72.7201, 17.7987, 35.8117, 15.1699, 17.0359, 14.0621, 35.9655, 11.9095, 10.5691, 23.3259, 16.1746, 10.1936, 12.5084, 24.1494, 16.4727, 21.0687, 15.7495, 28.8929, 11.0135, 13.3133, 14.6639, 50.1304, 21.0346, 5.1604, 53.5107, 20.0712, 41.5111, 12.1633, 74.3263, 17.7904, 17.1684, 25.3977, 21.5871, 21.9332, 22.6674, 36.6634, 99.1179, 15.3213, 16.3999, 12.0147, 57.5163, 4.2062, 17.3874, 10.7132, 17.4919, 17.8457, 29.3538, 26.1468, 75.1234, 16.4368, 21.6191, 61.1394, 12.9972, 73.5746, 72.5788, 41.6835, 39.9912, 20.1648, 11.7097, 11.5203, 36.7387, 5.0694, 30.8129, 12.0922, 22.5419, 12.3569, 54.6776, 28.3561, 26.1219, 44.7455, 1.3281, 46.5064, 13.6016, 23.5483, 11.7151, 44.3669, 3.2577, 75.0943, 10.8634, 14.8226, 45.7661, 19.7319, 30.7981, 3.5965, 47.8161, 14.5996, 39.4484, 13.0693, 24.9947, 97.4253, 76.7901, 73.1183, 4.0922]
solutionPercentages = [99.2537, 99.8467, 96.4718, 99.6637, 99.6633, 97.1289, 9.7373, 99.5126, 97.3251, 96.0545, 99.6756, 75.6587, 61.1496, 96.7575, 97.1969, 96.5258, 99.7409, 99.8641, 99.8821, 98.5401, 99.7833, 99.6314, 99.7899, 99.9117, 99.5754, 99.5868, 99.7919, 99.9127, 0.0001, 99.7297, 40.8438, 99.8559, 99.6591, 99.8917, 99.3622, 0.0001, 0.0001, 99.4828, 0.0001, 99.8559, 0.0001, 0.0001, 99.6714, 9.9635, 99.8744, 93.8854, 67.3692, 96.3229, 98.4899, 66.9173, 98.2533, 99.8318, 73.9904, 99.8431, 6.2614, 97.2776, 96.0938, 71.9457, 99.9211, 96.1596, 99.8405, 99.6314, 95.4566, 98.4786, 99.8217, 96.1014, 99.0391, 94.6034, 99.8403, 99.9093, 9.8096, 97.8549, 98.7041, 19.9098, 86.3154, 21.5302, 99.2769, 99.0496, 99.7266, 99.8602, 86.7925, 96.3197, 99.9226, 9.4447, 97.9722, 50.4884, 92.2358, 87.4311, 74.2156, 97.8819, 93.2483, 96.3186, 77.9828, 80.2446, 47.1835, 40.8011, 90.5123, 85.7852, 9.8074, 95.9032, 98.5906, 12.5081, 97.0264, 9.9166, 73.6486, 97.8634, 8.4403, 97.7592, 97.9933, 95.8486, 49.7977, 95.1031, 76.1712, 96.1552, 89.0059, 79.6172, 96.7383, 90.8518, 95.8096, 98.2061, 96.3314, 97.5753, 97.9857, 9.0739, 66.9977, 86.5744, 76.8124, 8.6195, 81.3285, 91.0891, 87.3345, 65.3729, 86.7354, 89.9558, 3.1401, 83.4993, 75.1529, 83.5419, 78.3002, 89.8564, 82.2419, 19.3794, 88.2163, 87.9032, 97.8686, 95.0742, 12.3542, 84.7324, 99.4753, 76.1753, 99.5386, 99.8664, 85.7785, 9.9933, 99.7167, 99.9328, 74.4693, 99.7531, 99.0579, 99.5994, 99.7785, 19.2743, 54.7251, 91.7269, 99.5033, 98.9247, 97.6214, 0.0001, 97.7027, 98.6832, 98.4691, 98.9759, 99.7087, 99.9244, 99.4908, 82.1103, 67.6125, 78.2363, 93.5725, 91.5612, 99.8865, 68.5426, 79.0635, 76.8951, 99.3555, 98.9196, 6.1157, 75.8655, 83.8525, 86.1269, 83.3388, 96.1854, 87.1961, 81.7453, 9.2689, 95.2765, 9.0809, 99.8599]
avgSuccess = sum(avgSolutionPercentages)/len(trajectoryIds)
y = solutionPercentages
#Plot
fig, ax = plt.subplots()
ax.hist(y)
ax.set_ylabel('Number of Motions (Total: '+ str(len(trajectoryIds)) + ')')
ax.set_xlabel('Planning Solution (%)')
ax.set_title('Planning Success Rate (Avg: ' + str(round(avgSuccess,2)) + '%)')
plt.legend(loc='upper left')
plt.show()
So I found out how to make the values on the x-axis more precise: I changed ax.hist(y) to ax.hist(y, bins=1000). But that didn't work out well either:
So now I need to:
get rid of the empty space between my bars (is there a way to get rid of these empty x values?)
while keeping all bars at the same width
change the precision at any time, e.g., from a step of 1 to a step of 0.01 for each bar
Any suggestions on how to make the plot (and code) look better are much appreciated :) Maybe .hist isn't the best function for this, but I don't know a better one; I failed at doing this with a bar chart so far :(

How about something like this:
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import numpy as np
trajectoryIds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0]
avgSolutionPercentages = [20.6256, 99.1448, 15.6764, 21.8231, 16.3733, 17.7502, 20.0055, 86.6873, 11.3105, 15.6693, 10.3449, 81.8921, 11.6745, 92.6031, 11.8787, 23.0229, 37.9636, 2.3903, 15.1727, 14.7088, 10.0426, 59.6758, 8.0042, 12.4174, 10.0585, 46.0567, 90.2376, 98.3273, 52.8645, 49.3027, 62.4136, 32.6199, 19.0642, 10.3319, 74.6157, 22.5771, 22.4118, 11.2017, 16.5053, 11.2021, 30.8376, 24.5255, 83.1072, 10.1529, 14.3991, 46.3459, 16.2137, 4.5773, 44.9549, 1.0719, 76.5605, 42.6589, 13.6209, 34.2856, 1.3574, 29.0465, 66.8146, 16.4796, 32.9564, 62.0732, 3.7047, 13.8828, 31.6088, 60.1141, 3.3247, 45.0796, 13.7862, 26.4498, 93.6806, 10.3245, 62.5157, 10.9833, 42.5908, 37.3208, 27.4115, 84.1648, 13.9058, 13.9065, 67.8918, 27.9075, 3.6116, 10.9091, 41.0988, 24.2177, 50.2762, 61.3869, 15.5915, 27.6536, 0.7993, 22.9483, 22.3393, 88.1832, 25.1604, 18.3625, 15.7212, 56.9646, 4.0434, 11.8431, 56.0613, 32.5472, 97.8757, 21.8233, 14.8162, 38.8259, 20.5676, 72.7201, 17.7987, 35.8117, 15.1699, 17.0359, 14.0621, 35.9655, 11.9095, 10.5691, 23.3259, 16.1746, 10.1936, 12.5084, 24.1494, 16.4727, 21.0687, 15.7495, 28.8929, 11.0135, 13.3133, 14.6639, 50.1304, 21.0346, 5.1604, 53.5107, 20.0712, 41.5111, 12.1633, 74.3263, 17.7904, 17.1684, 25.3977, 21.5871, 21.9332, 22.6674, 36.6634, 99.1179, 15.3213, 16.3999, 12.0147, 57.5163, 4.2062, 17.3874, 10.7132, 17.4919, 17.8457, 29.3538, 26.1468, 75.1234, 16.4368, 21.6191, 61.1394, 12.9972, 73.5746, 72.5788, 41.6835, 39.9912, 20.1648, 11.7097, 11.5203, 36.7387, 5.0694, 30.8129, 12.0922, 22.5419, 12.3569, 54.6776, 28.3561, 26.1219, 44.7455, 1.3281, 46.5064, 13.6016, 23.5483, 11.7151, 44.3669, 3.2577, 75.0943, 10.8634, 14.8226, 45.7661, 19.7319, 30.7981, 3.5965, 47.8161, 14.5996, 39.4484, 13.0693, 24.9947, 97.4253, 76.7901, 73.1183, 4.0922]
solutionPercentages = [99.2537, 99.8467, 96.4718, 99.6637, 99.6633, 97.1289, 9.7373, 99.5126, 97.3251, 96.0545, 99.6756, 75.6587, 61.1496, 96.7575, 97.1969, 96.5258, 99.7409, 99.8641, 99.8821, 98.5401, 99.7833, 99.6314, 99.7899, 99.9117, 99.5754, 99.5868, 99.7919, 99.9127, 0.0001, 99.7297, 40.8438, 99.8559, 99.6591, 99.8917, 99.3622, 0.0001, 0.0001, 99.4828, 0.0001, 99.8559, 0.0001, 0.0001, 99.6714, 9.9635, 99.8744, 93.8854, 67.3692, 96.3229, 98.4899, 66.9173, 98.2533, 99.8318, 73.9904, 99.8431, 6.2614, 97.2776, 96.0938, 71.9457, 99.9211, 96.1596, 99.8405, 99.6314, 95.4566, 98.4786, 99.8217, 96.1014, 99.0391, 94.6034, 99.8403, 99.9093, 9.8096, 97.8549, 98.7041, 19.9098, 86.3154, 21.5302, 99.2769, 99.0496, 99.7266, 99.8602, 86.7925, 96.3197, 99.9226, 9.4447, 97.9722, 50.4884, 92.2358, 87.4311, 74.2156, 97.8819, 93.2483, 96.3186, 77.9828, 80.2446, 47.1835, 40.8011, 90.5123, 85.7852, 9.8074, 95.9032, 98.5906, 12.5081, 97.0264, 9.9166, 73.6486, 97.8634, 8.4403, 97.7592, 97.9933, 95.8486, 49.7977, 95.1031, 76.1712, 96.1552, 89.0059, 79.6172, 96.7383, 90.8518, 95.8096, 98.2061, 96.3314, 97.5753, 97.9857, 9.0739, 66.9977, 86.5744, 76.8124, 8.6195, 81.3285, 91.0891, 87.3345, 65.3729, 86.7354, 89.9558, 3.1401, 83.4993, 75.1529, 83.5419, 78.3002, 89.8564, 82.2419, 19.3794, 88.2163, 87.9032, 97.8686, 95.0742, 12.3542, 84.7324, 99.4753, 76.1753, 99.5386, 99.8664, 85.7785, 9.9933, 99.7167, 99.9328, 74.4693, 99.7531, 99.0579, 99.5994, 99.7785, 19.2743, 54.7251, 91.7269, 99.5033, 98.9247, 97.6214, 0.0001, 97.7027, 98.6832, 98.4691, 98.9759, 99.7087, 99.9244, 99.4908, 82.1103, 67.6125, 78.2363, 93.5725, 91.5612, 99.8865, 68.5426, 79.0635, 76.8951, 99.3555, 98.9196, 6.1157, 75.8655, 83.8525, 86.1269, 83.3388, 96.1854, 87.1961, 81.7453, 9.2689, 95.2765, 9.0809, 99.8599]
avgSuccess = sum(avgSolutionPercentages)/len(trajectoryIds)
y = solutionPercentages
BIN_COUNT = 15
BAR_WIDTH = 0.75
fig, ax = plt.subplots()
# use numpy histogram so we can perform filtering
hist, bin_edges = np.histogram(y, bins=BIN_COUNT)
# so we can remove bins with zero entries
non_zero = np.nonzero(hist)
# take only entries where bin is non-zero
hist = hist[non_zero]
bin_edges = bin_edges[non_zero]
# generate labels based on bin edge values (maybe use centers?)
x_ticks = [str(int(edge)) for edge in bin_edges]
indices = np.arange(len(bin_edges))
plt.bar(indices, hist, BAR_WIDTH, align='center')
plt.xticks(indices, x_ticks)
ax.set_ylabel('Number of Motions (Total: '+ str(len(trajectoryIds)) + ')')
ax.set_xlabel('Planning Solution (%)')
ax.set_title('Planning Success Rate (Avg: ' + str(round(avgSuccess,2)) + '%)')
plt.show()
which produces the plot
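The snippet above fixes BIN_COUNT. To address the wish to change the precision at any time, the same filtering can be wrapped in a function that takes a bin width instead of a bin count. This is a minimal sketch: the name nonempty_histogram and the lo/hi defaults (0 to 100, matching percentages) are assumptions, not part of the original answer. Each surviving bar is labelled with its bin range, so the x-axis stays readable once empty bins are dropped.

```python
import numpy as np

def nonempty_histogram(values, step, lo=0.0, hi=100.0):
    """Histogram `values` with bins of width `step` on [lo, hi], dropping empty bins.

    Returns (labels, counts): one "start-end" label and one count per non-empty bin.
    """
    n_bins = int(round((hi - lo) / step))
    edges = np.linspace(lo, hi, n_bins + 1)  # linspace avoids the float drift of arange
    counts, edges = np.histogram(values, bins=edges)
    keep = counts > 0  # boolean mask of bins that actually contain data
    # number of decimals needed to print the step (e.g. 1 -> 0, 0.5 -> 1, 0.01 -> 2)
    decimals = max(0, -int(np.floor(np.log10(step))))
    labels = [f"{a:.{decimals}f}-{b:.{decimals}f}"
              for a, b in zip(edges[:-1][keep], edges[1:][keep])]
    return labels, counts[keep]
```

To plot the result with equal-width bars and no gaps, one option is ax.bar(range(len(counts)), counts, 0.75), then ax.set_xticks(range(len(counts))) and ax.set_xticklabels(labels, rotation=90). Changing step (e.g. 1, 0.1, 0.01) changes the precision without touching the plotting code.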

You could make the bin width depend nonlinearly on the value, e.g.:
b = 5
bins = (np.linspace(np.min(y)**b, np.max(y)**b))**(1/b)
fig, ax = plt.subplots()
ax.hist(y, bins=bins, edgecolor="k")
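To see why this power transform helps, one can inspect the edges it produces: with b = 5 the edges are evenly spaced in value**5, so after taking the fifth root they crowd together near the maximum, where most of this data lies. A minimal sketch, assuming the same construction wrapped in a hypothetical helper power_bins and non-negative data such as percentages (np.linspace defaults to 50 points, i.e. 49 bins):

```python
import numpy as np

def power_bins(values, b=5, num=50):
    """Edges spaced evenly in value**b, mapped back with the b-th root.

    Assumes non-negative values (e.g. percentages). For b > 1 the edges
    crowd together toward max(values); num=50 is np.linspace's default.
    """
    lo, hi = np.min(values), np.max(values)
    return np.linspace(lo**b, hi**b, num) ** (1 / b)

edges = power_bins([0.0001, 50.0, 99.9])
widths = np.diff(edges)
print(widths[0], widths[-1])  # a wide first bin and a narrow last bin
```

Larger b pushes even more resolution toward the top of the range; b = 1 falls back to evenly spaced bins.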
Or you may define the bins completely by hand, e.g. use a bin width of 10 up to 60, then a bin width of 5 up to 90, and finally a bin width of 1 up to 100.
bins = np.concatenate((np.linspace(0, 60, 7),
                       np.linspace(60, 90, 7)[1:],    # [1:] drops the repeated edge at 60
                       np.linspace(90, 100, 11)[1:]))  # [1:] drops the repeated edge at 90
fig, ax = plt.subplots()
ax.hist(y, bins=bins, edgecolor="k")
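When segments like these are concatenated, each shared boundary (60 and 90 here) should appear only once, otherwise np.histogram receives zero-width bins. A small helper can generalise the layout to any list of (start, stop, width) triples; the name piecewise_edges and the triple format are illustrative assumptions, not a NumPy API.

```python
import numpy as np

def piecewise_edges(segments):
    """Build strictly increasing bin edges from (start, stop, width) segments.

    Consecutive segments are expected to share their boundary (stop of one ==
    start of the next); the shared edge is emitted only once.
    """
    pieces = []
    for i, (start, stop, width) in enumerate(segments):
        n = int(round((stop - start) / width))
        edges = np.linspace(start, stop, n + 1)
        pieces.append(edges if i == 0 else edges[1:])  # drop the repeated boundary
    return np.concatenate(pieces)

bins = piecewise_edges([(0, 60, 10), (60, 90, 5), (90, 100, 1)])
print(len(bins))  # 7 + 6 + 10 edges
```

The resulting array can be passed straight to ax.hist(y, bins=bins, edgecolor="k").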

Related

Constructing a 2d interpolator given scattered input data

I have three lists as follows:
x = [100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0]
y = [300.0, 300.0, 300.0, 300.0, 500.0, 500.0, 500.0, 500.0, 700.0, 700.0, 700.0, 700.0, 1000.0, 1000.0, 1000.0, 1000.0, 1500.0, 1500.0, 1500.0, 1500.0, 2000.0, 2000.0, 2000.0, 2000.0, 3000.0, 3000.0, 3000.0, 3000.0, 5000.0, 5000.0, 5000.0, 5000.0, 7500.0, 7500.0, 7500.0, 75000.0, 10000.0, 10000.0, 10000.0, 10000.0]
z = [100.0, 95.0, 87.5, 77.5, 60.0, 57.0, 52.5, 46.5, 40.0, 38.0, 35.0, 31.0, 30.0, 28.5, 26.25, 23.25, 23.0, 21.85, 20.125, 17.825, 17.0, 16.15, 14.875, 13.175, 13.0, 12.35, 11.375, 10.075, 10.0, 9.5, 8.75, 7.75, 7.0, 6.65, 6.125, 5.425, 5.0, 4.75, 4.375, 3.875]
Each entry of each list is read as a point, so point 0 is (100, 300, 100), point 1 is (75, 300, 95), and so on.
I am trying to do 2d interpolation, so that I can compute a z value for any given input (x0, y0) point.
I read that I can interpolate with scipy's RegularGridInterpolator using meshgrid, but I am not sure how to set it up. When I do:
x_,y_,z_ = np.meshgrid(x,y,z) # both indexing ij or xy
I don't get values for x_,y_,z_ that make sense and I am not sure how to go from there.
I am trying to use the data points above to find intermediate values, i.e. something similar to scipy's interp1d, where
f = interp1d(x, y, kind='cubic')
where I can later call f(any (x,y) point within range) and get the corresponding z value.
You need 2d interpolation over scattered data. I'd default to using scipy.interpolate.griddata in this case, but you seem to want a callable interpolator, whereas griddata needs a given set of points onto which it will interpolate.
Not to worry: griddata with 2d cubic interpolation uses a CloughTocher2DInterpolator. So we can do exactly that:
import numpy as np
import scipy.interpolate as interp
x = [100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0, 100.0, 75.0, 50.0, 0.0]
y = [300.0, 300.0, 300.0, 300.0, 500.0, 500.0, 500.0, 500.0, 700.0, 700.0, 700.0, 700.0, 1000.0, 1000.0, 1000.0, 1000.0, 1500.0, 1500.0, 1500.0, 1500.0, 2000.0, 2000.0, 2000.0, 2000.0, 3000.0, 3000.0, 3000.0, 3000.0, 5000.0, 5000.0, 5000.0, 5000.0, 7500.0, 7500.0, 7500.0, 75000.0, 10000.0, 10000.0, 10000.0, 10000.0]
z = [100.0, 95.0, 87.5, 77.5, 60.0, 57.0, 52.5, 46.5, 40.0, 38.0, 35.0, 31.0, 30.0, 28.5, 26.25, 23.25, 23.0, 21.85, 20.125, 17.825, 17.0, 16.15, 14.875, 13.175, 13.0, 12.35, 11.375, 10.075, 10.0, 9.5, 8.75, 7.75, 7.0, 6.65, 6.125, 5.425, 5.0, 4.75, 4.375, 3.875]
interpolator = interp.CloughTocher2DInterpolator(np.array([x,y]).T, z)
Now you can call this interpolator with 2 coordinates to give you the corresponding interpolated data point:
>>> interpolator(x[10], y[10]) == z[10]
True
>>> interpolator(2, 300)
array(77.81343)
Note that you'll have to stay inside the convex hull of the input points, otherwise you'll get nan (or whatever is passed as the fill_value keyword to the interpolator):
>>> interpolator(2, 30)
array(nan)
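The fill_value keyword is passed when the interpolator is constructed. A minimal sketch on made-up scattered points (not the question's data) showing the default nan outside the convex hull, a custom fill value, and that the interpolant passes through the input points:

```python
import numpy as np
import scipy.interpolate as interp

# Made-up scattered data: corners of a unit square plus its centre, z = x + 2y.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
values = points[:, 0] + 2 * points[:, 1]

default = interp.CloughTocher2DInterpolator(points, values)                  # nan outside hull
padded = interp.CloughTocher2DInterpolator(points, values, fill_value=-1.0)  # -1 outside hull

inside = default(0.25, 0.5)        # inside the convex hull -> finite value
outside = default(2.0, 2.0)        # outside the hull -> nan
outside_padded = padded(2.0, 2.0)  # outside the hull -> fill_value
print(inside, outside, outside_padded)
```

A fill value only hides the problem visually; it does not make values outside the hull meaningful.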
Extrapolation is usually meaningless anyway, and your input points are scattered in a somewhat erratic way:
So even if extrapolation were possible, I wouldn't trust it.
Just to demonstrate how the resulting interpolator is constrained to the convex hull of the input points, here's a surface plot of your data on a gridded mesh we create just for plotting:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# go linearly in the x grid
xline = np.linspace(min(x), max(x), 30)
# go logarithmically in the y grid (considering y distribution)
yline = np.logspace(np.log10(min(y)), np.log10(max(y)), 30)
# construct 2d grid from these
xgrid,ygrid = np.meshgrid(xline, yline)
# interpolate z data; same shape as xgrid and ygrid
z_interp = interpolator(xgrid, ygrid)
# create 3d Axes and plot surface and base points
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xgrid, ygrid, z_interp, cmap='viridis',
                vmin=min(z), vmax=max(z))
ax.plot(x, y, z, 'ro')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
Here's the output from two angles (it's better to rotate around interactively; such stills don't do the 3d representation justice):
There are two main features to note:
The surface nicely fits the red points, which is expected from interpolation. Fortunately the input points are nice and smooth, so everything goes well with interpolation. (The fact that the red points are usually hidden by the surface is only due to how matplotlib's 3d renderer mishandles the relative depth of complex 3d objects.)
The surface is cut (due to nan values) along the convex hull of the input points, so even though our gridded arrays define a rectangular grid we only get a cut of the surface where interpolation makes sense.

Python: Linregress slope and y-intercept

I'm working on a program that calculates the slope using SciPy's linregress function, but I'm getting one of two errors depending on how I try to fix it. The two lists should be two-dimensional, basically x and y values.
from __future__ import division
from scipy.stats import linregress
import matplotlib.pyplot as mplot
import numpy as np
xs=[[20.0, 80.0, 45.0, 42.0, 93.0, 98.0, 65.0, 43.0, 72.0, 36.0, 9.0, 60.0, 47.0, 84.0, 31.0, 46.0, 57.0, 76.0, 27.0, 85.0, 0.0, 39.0, 2.0, 56.0, 68.0, 6.0, 41.0, 28.0, 61.0, 12.0, 32.0, 1.0, 54.0, 77.0, 18.0, 86.0, 62.0, 23.0, 30.0, 69.0, 4.0, 71.0, 64.0, 92.0, 24.0, 79.0, 8.0, 35.0, 49.0, 53.0, 7.0, 59.0, 70.0, 37.0, 13.0, 15.0, 73.0, 89.0, 96.0, 83.0, 22.0, 95.0, 19.0, 67.0, 5.0, 88.0, 38.0, 50.0, 55.0, 52.0, 81.0, 58.0, 11.0, 51.0, 99.0, 78.0, 25.0, 33.0, 40.0, 75.0, 3.0, 91.0, 48.0, 90.0, 82.0, 26.0, 10.0, 16.0, 21.0, 66.0, 14.0, 87.0, 74.0, 97.0, 94.0, 44.0, 29.0, 17.0, 63.0, 34.0], [87.0, 17.0, 69.0, 72.0, 76.0, 62.0, 20.0, 77.0, 5.0, 49.0, 81.0, 3.0, 24.0, 36.0, 44.0, 91.0, 99.0, 35.0, 43.0, 50.0, 12.0, 54.0, 46.0, 30.0, 37.0, 45.0, 90.0, 85.0, 70.0, 83.0, 38.0, 22.0, 23.0, 0.0, 60.0, 47.0, 26.0, 1.0, 95.0, 73.0, 65.0, 94.0, 84.0, 8.0, 34.0, 56.0, 66.0, 13.0, 75.0, 52.0, 19.0, 55.0, 67.0, 39.0, 21.0, 80.0, 98.0, 33.0, 11.0, 68.0, 40.0, 32.0, 2.0, 79.0, 82.0, 93.0, 96.0, 88.0, 14.0, 92.0, 41.0, 89.0, 28.0, 29.0, 42.0, 6.0, 86.0, 74.0, 58.0, 16.0, 31.0, 64.0, 15.0, 53.0, 25.0, 59.0, 61.0, 78.0, 51.0, 7.0, 57.0, 9.0, 97.0, 63.0, 48.0, 71.0, 18.0, 10.0, 4.0, 27.0]]
ys=[[155.506, 50.592, 104.447, 111.318, 36.148, 36.87, 74.266, 106.413, 58.341, 122.563, 180.555, 85.202, 96.84, 50.726, 126.56, 100.686, 88.303, 54.797, 138.487, 44.946, 200.9, 116.524, 193.652, 82.8, 65.823, 184.436, 113.738, 133.458, 83.765, 167.408, 129.491, 200.469, 89.238, 51.799, 159.217, 49.382, 78.443, 146.051, 129.045, 63.805, 185.564, 65.614, 74.243, 43.408, 140.863, 53.446, 182.767, 127.373, 94.494, 91.079, 187.194, 81.254, 68.702, 121.368, 164.756, 169.696, 59.483, 45.978, 33.057, 47.12, 154.755, 33.872, 160.754, 70.256, 190.393, 38.398, 113.188, 100.493, 84.511, 88.635, 49.353, 81.821, 178.876, 95.307, 32.2, 54.715, 141.389, 132.337, 109.673, 57.611, 189.251, 39.283, 97.31, 41.173, 47.529, 140.03, 173.058, 160.288, 154.773, 67.903, 164.718, 42.032, 60.739, 28.656, 34.302, 107.022, 137.344, 160.195, 73.636, 123.797], [14.138, 100.87, 30.287, 28.675, 21.826, 42.445, 97.938, 29.574, 125.976, 59.404, 26.609, 125.743, 95.329, 75.467, 59.497, 15.342, 9.834, 77.402, 65.019, 54.468, 112.64, 45.466, 55.197, 79.992, 71.146, 55.39, 14.795, 15.971, 28.535, 25.862, 73.239, 92.455, 87.635, 137.6, 38.59, 53.718, 86.26, 130.567, 11.274, 33.867, 40.035, 11.07, 16.109, 114.732, 76.552, 45.85, 31.827, 110.877, 26.292, 55.738, 101.801, 48.601, 33.632, 66.647, 98.39, 23.904, 11.172, 78.215, 109.417, 31.653, 68.368, 79.593, 124.548, 21.513, 19.828, 13.48, 9.993, 22.043, 108.229, 16.904, 66.704, 12.262, 79.947, 85.012, 66.754, 124.114, 17.548, 25.872, 45.392, 101.775, 78.085, 36.358, 101.795, 52.045, 87.637, 42.784, 37.011, 26.036, 50.146, 119.666, 42.514, 113.313, 9.125, 42.394, 51.954, 26.898, 96.678, 112.108, 125.252, 86.296]]
slope, intercept, r_value, std_err = linregress(xs,ys)
print(slope)
My error is:
in linregress
ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat
ValueError: too many values to unpack (expected 4)
I've tried changing my code to something like this:
slope, intercept, r_value, std_err = linregress(xs[:,0], ys[:,0])
But then my error becomes a TypeError:
TypeError: list indices must be integers or slices, not tuple
Does anyone have any suggestions? Perhaps there's something I don't understand about the use of the linregress function. I'm sure my first error has to do with my lists being 2D. For the second error, I'm lost.
You have two problems:
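When interpreted as arrays, your variables xs and ys are two-dimensional with shape (2, 100). When linregress is given both arguments x and y, it expects them to be one-dimensional arrays. (Your second attempt fails for a different reason: xs and ys are plain Python lists, and lists do not support NumPy-style tuple indexing such as xs[:,0]. Even on arrays, xs[:,0] would select the first column, i.e. one value from each row, rather than one data series.)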
As you can see in the "Returns" section of the docstring, linregress returns five values, not four.
You'll have to call linregress twice, and handle the five return values. For example,
In [144]: slope, intercept, rvalue, pvalue, stderr = linregress(xs[0], ys[0])
In [145]: slope, intercept, rvalue
Out[145]: (-1.7059670627062702, 187.5658196039604, -0.9912859597363385)
In [146]: slope, intercept, rvalue, pvalue, stderr = linregress(xs[1], ys[1])
In [147]: slope, intercept, rvalue
Out[147]: (-1.2455432103210327, 121.51968891089112, -0.9871123119133126)
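Rather than calling linregress once per row by hand, a loop over the rows works for any number of series. A minimal sketch with a hypothetical helper fit_rows and made-up data (two exact lines); the returned result still unpacks into five values, but it also exposes named attributes (slope, intercept, rvalue, pvalue, stderr), which avoids miscounting the return values.

```python
import numpy as np
from scipy.stats import linregress

def fit_rows(xs, ys):
    """Run linregress separately on each (x row, y row) pair and return the results."""
    xs, ys = np.asarray(xs), np.asarray(ys)  # shape (n_series, n_points)
    return [linregress(x_row, y_row) for x_row, y_row in zip(xs, ys)]

# Made-up data: y = 2x + 1 and y = -0.5x + 3.
xs = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]
ys = [[1.0, 3.0, 5.0, 7.0], [3.0, 2.0, 1.0, 0.0]]
results = fit_rows(xs, ys)
for res in results:
    print(res.slope, res.intercept, res.rvalue)
```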

How to take value from one cell and add to list over multiple excel files

I'm trying to select the same cell from multiple Excel files and add the values to a list, but I keep getting each number twice. How do I solve this?
I'm using xlrd, os, and numpy libraries to do this.
import os
import xlrd

for root, dirs, files in os.walk("/Users/Isaac/Experiment"):
    xlsfiles = [_ for _ in files if _.endswith('xlsx')]
    my_matrix = []
    my_matrix_2 = []
    for xlsfile in xlsfiles:
        workbook = xlrd.open_workbook(os.path.join(root,xlsfile))
        worksheet = workbook.sheet_by_index(0)
        for col in range(worksheet.ncols):
            my_matrix_2.append(worksheet.cell_value(4,1))
print my_matrix_2
What I get as a result is:
[4.0, 4.0, 40.0, 40.0, 44.0, 44.0, 48.0, 48.0, 52.0, 52.0, 56.0, 56.0, 60.0, 60.0, 64.0, 64.0, 68.0, 68.0, 72.0, 72.0, 76.0, 76.0, 8.0, 8.0, 80.0, 80.0, 84.0, 84.0, 88.0, 88.0, 92.0, 92.0, 96.0, 96.0, 100.0, 100.0, 12.0, 12.0, 16.0, 16.0, 20.0, 20.0, 24.0, 24.0, 28.0, 28.0, 32.0, 32.0, 36.0, 36.0]
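No answer is included here. Judging from the loop, the doubling most likely comes from the inner for col in range(worksheet.ncols) loop: it appends the same cell (4, 1) once per column, so a sheet with two columns would yield each value twice. A minimal sketch that reads the cell exactly once per file, assuming a caller-supplied read_cell function (hypothetical; with xlrd it could be lambda path: xlrd.open_workbook(path).sheet_by_index(0).cell_value(4, 1), though xlrd 2.0 and later no longer read .xlsx files):

```python
import os

def collect_cell_values(directory, read_cell, suffix=".xlsx"):
    """Return one value per workbook under `directory`, in sorted path order.

    `read_cell(path)` is a caller-supplied function that opens one workbook and
    returns the single cell of interest.
    """
    values = []
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # make the walk order deterministic
        for name in sorted(files):
            if name.endswith(suffix):
                # Read the cell exactly once per file. The original inner loop
                # over worksheet.ncols appended the same cell once per column.
                values.append(read_cell(os.path.join(root, name)))
    return values
```

Sorting the file names is a design choice here: os.walk returns entries in arbitrary order, which is why the values above appear shuffled.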

Is there a simpler way to find a number?

I'm writing a python script.
I have a list of numbers:
b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 50.0, 49.0, 48.0, 47.0,
45.0, 45.0, 44.0, 43.0, 41.0, 40.0, 39.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0, 32.0, 31.0, 30.0, 28.0, 27.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 22.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 11.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
I need to check whether the list contains 50. If it does not, I have to search for the next lower number, 49; if that is not there, I have to look for 48, and so on down to 47.
In Python, is there a one-liner that does this, or can I use a lambda for it?
You could use min() and abs():
>>> b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 50.0, 49.0, 48.0, 47.0, 45.0, 45.0, 44.0, 43.0, 41.0, 40.0, 39.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0, 32.0, 31.0, 30.0, 28.0, 27.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 22.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 11.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
>>> min(b, key=lambda x:abs(x-50))
50.0
>>> min(b, key=lambda x:abs(x-20.1))
20.0
max(i for i in b if i <= 50)
It will raise a ValueError if there are no elements that match the condition.
max(filter(lambda i: i<=50, b))
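or, to handle a list whose elements are all above 50 (Python 3.4+; the older max(filter(...) or [None]) idiom only works in Python 2, because in Python 3 filter returns an iterator that is always truthy):
max(filter(lambda i: i <= 50, b), default=None)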
You can do this with a generator expression and max.
max(n for n in b if 47 <= n <= 50)
highestValue = max(b)
lowestValue = min(b)
if 50 in b:
    pass
Three different things: the highest value, the lowest value, and whether 50 is in the list.
And if you need to check whether multiple numbers are in your huge list, say whether 50, 30 and 40 are all in there:
set(b).issuperset(set([50, 40, 30]))
One-liner without any lambda (raises ValueError if no value is found):
max((x for x in b if 46 < x <= 50))
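or a version that returns None in this case (Python 3.4+; the older trick of chaining a trailing None onto the generator with itertools.chain only works in Python 2, since Python 3 cannot compare None with floats):
max((x for x in b if 46 < x <= 50), default=None)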
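Note that the min/abs answer returns the closest value, which can lie above 50: if 50 is missing and 51 comes before 49 in the list, it returns 51. A literal translation of the stepwise search described in the question (try 50, then 49, down to 47) could look like this; first_present is a hypothetical name, and the approach assumes the list holds whole-number values.

```python
def first_present(values, start, stop):
    """Try start, start - 1, ..., stop in turn; return the first one found in values, else None."""
    present = set(values)  # O(1) membership tests; 50 and 50.0 compare and hash equal
    return next((n for n in range(start, stop - 1, -1) if n in present), None)

b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 49.0, 48.0, 47.0]  # made-up list without 50
print(first_present(b, 50, 47))
```

For integer-valued lists this gives the same result as the bounded max(...) one-liners; it simply mirrors the question's wording step by step.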

SciPy interpolation ValueError: x and y arrays must be equal in length along interpolation axis

I'm trying to use interp1d from scipy.interpolate. I "plugged in" two arrays (filtered_mass and integrated_column) of the same size, but it still gives me a ValueError saying the arrays must be equal in length. How can that be?
This is the code I'm using in this part:
def interp_integrated_column(self, definition):
    ''' (string) -> interpolated_function(mass)
    This function outputs the interpolated value of the integrated columns
    as a function of the mass of the WIMP (mDM)
    '''
    print self.filtered_mass_array
    print "len(filtered_mass)", len(self.filtered_mass_array) , "len(integrated_column)", len(self.integrated_columns_values[definition])
    print self.integrated_columns_values[definition]
    interpolated_values = interp1d(self.filtered_mass_array, self.integrated_columns_values[definition])
    return interpolated_values
This is the output, including the error message:
[5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0, 280.0, 300.0, 330.0, 360.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1500.0, 1700.0, 2000.0, 2500.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0, 12000.0, 15000.0, 20000.0, 30000.0, 50000.0, 100000.0]
len(filtered_mass) 62 len(integrated_column) 62
[[2.8855960615102004e-05], [4.0701386519793902e-05], [6.6563800907013242e-05], [0.0001006393622421269], [0.00019862657113084296], [0.00032843266928887332], [0.00046438711039847576], [0.00060420820026262198], [0.00091858847275374405], [0.0012828446411529174], [0.0016307748004155418], [0.0020049092489578773], [0.0023859804990953733], [0.0027809435562397089], [0.0031914945950108709], [0.0036198713189993367], [0.004049356593219729], [0.058652386100581579], [0.080971818217450073], [0.10330986231789899], [0.13710341994459613], [0.20188314005754618], [0.2891914189026335], [0.37721295733783522], [0.47493929411417846], [0.57539389630897464], [0.70805980165022075], [0.85872215884312952], [1.0664252638663609], [1.2783399280844934], [1.564710616680836], [2.0375181832882485], [2.5037792909103884], [2.9693614352642328], [3.4461139299681416], [3.9753240755452568], [4.5112890074931942], [5.0575238552577968], [5.6116617190278557], [6.75034712149598], [7.9290625424458492], [9.1455816114675219], [10.393026346405367], [14.442148067840661], [18.539929482157905], [22.594593494117799], [28.852213268263831], [39.804824036584456], [51.348027754488449], [83.695041150108111], [118.92653801185628], [155.17895505284363], [192.83930746140334], [231.78928736553948], [271.95372644243321], [313.16712050353419], [398.50142684880342], [532.55760945531256], [768.84170621340957], [1276.9057251660611], [2387.368055624514], [5476.4080305101643]]
Traceback (most recent call last):
File "data_mining.py", line 8, in <module>
e_int = nu_e.interp_integrated_column('e')
File "/home/ohm/projects/mucalc/PPPC4DMID_Reader.py", line 121, in interp_integrated_column
interpolated_values = interp1d(self.filtered_mass_array, self.integrated_columns_values[definition])
File "/usr/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 278, in __init__
raise ValueError("x and y arrays must be equal in length along "
ValueError: x and y arrays must be equal in length along interpolation axis.
Your two lists both have length 62, but they have different shapes interpreted as numpy arrays:
>>> a = [5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0, 280.0, 300.0, 330.0, 360.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1500.0, 1700.0, 2000.0, 2500.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0, 12000.0, 15000.0, 20000.0, 30000.0, 50000.0, 100000.0]
>>> b = [[2.8855960615102004e-05], [4.0701386519793902e-05], [6.6563800907013242e-05], [0.0001006393622421269], [0.00019862657113084296], [0.00032843266928887332], [0.00046438711039847576], [0.00060420820026262198], [0.00091858847275374405], [0.0012828446411529174], [0.0016307748004155418], [0.0020049092489578773], [0.0023859804990953733], [0.0027809435562397089], [0.0031914945950108709], [0.0036198713189993367], [0.004049356593219729], [0.058652386100581579], [0.080971818217450073], [0.10330986231789899], [0.13710341994459613], [0.20188314005754618], [0.2891914189026335], [0.37721295733783522], [0.47493929411417846], [0.57539389630897464], [0.70805980165022075], [0.85872215884312952], [1.0664252638663609], [1.2783399280844934], [1.564710616680836], [2.0375181832882485], [2.5037792909103884], [2.9693614352642328], [3.4461139299681416], [3.9753240755452568], [4.5112890074931942], [5.0575238552577968], [5.6116617190278557], [6.75034712149598], [7.9290625424458492], [9.1455816114675219], [10.393026346405367], [14.442148067840661], [18.539929482157905], [22.594593494117799], [28.852213268263831], [39.804824036584456], [51.348027754488449], [83.695041150108111], [118.92653801185628], [155.17895505284363], [192.83930746140334], [231.78928736553948], [271.95372644243321], [313.16712050353419], [398.50142684880342], [532.55760945531256], [768.84170621340957], [1276.9057251660611], [2387.368055624514], [5476.4080305101643]]
>>> np.asarray(a).shape
(62,)
>>> np.asarray(b).shape
(62, 1)
You'll want to make your second array 1D, not 2D. There are roughly a quadrillion ways to do this in numpy, but one is to use .squeeze(), which removes axes of length one:
>>> a = np.asarray(a)
>>> b = np.asarray(b).squeeze()
>>> b.shape
(62,)
after which:
>>> from scipy.interpolate import interp1d
>>> i = interp1d(a,b)
>>> i(2123)
array(31.546555517270704)
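Alternatively, interp1d has an axis argument whose default is -1 (the last axis). With y of shape (62, 1) the last axis has length 1, which is exactly what the error message complains about. A minimal sketch on made-up data showing both the axis=0 route and the flattening route; with axis=0 each result keeps the trailing length-1 axis.

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up example: x has shape (4,), y has shape (4, 1) like a list of one-element lists.
x = np.array([0.0, 1.0, 2.0, 3.0])
y_column = np.array([[0.0], [10.0], [20.0], [30.0]])

# Interpolate along axis 0, the length-4 axis, without reshaping y.
f_axis0 = interp1d(x, y_column, axis=0)
# Or flatten y to 1D first, as in the answer above.
f_flat = interp1d(x, y_column.ravel())

print(f_axis0(1.5))  # one value per trailing column
print(f_flat(1.5))
```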
