I am making a grouped bar chart of proficiency levels on a standardized test. Here is my code:
bush_prof_boy = bush.groupby(['BOY Prof'])['BOY Prof'].count()
bush_prof_pct_boy = bush_prof_boy/bush['BOY Prof'].count() * 100
bush_prof_eoy = bush.groupby(['EOY Prof'])['EOY Prof'].count()
bush_prof_pct_eoy = bush_prof_eoy/bush['EOY Prof'].count() * 100
labels = ['Remedial', 'Below Proficient', 'Proficient', 'Advanced']
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, bush_prof_pct_boy, width, label='BOY',
color='mediumorchid')
rects2 = ax.bar(x + width/2, bush_prof_pct_eoy, width, label='EOY', color='teal')
ax.set_ylabel('% of Students at Proficiency Level', fontsize=18)
ax.set_title('Bushwick Middle Change in Proficiency Levels', fontsize=25)
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=25)
ax.legend(fontsize=25)
plt.yticks(fontsize=15)
plt.figure(figsize=(5,15))
plt.show()
"BOY" stands for "Beginning of Year" and "EOY" "End of Year" so the bar graph is intended to show percent of students who fell into each proficiency level at the beginning and end of the year. The graph looks alright but when I drill into the numbers, I can see that the labels for EOY are incorrect. Here is my graph:
The percentages for BOY are graphed correctly, but the EOY ones are with the wrong labels. Here are the actual percentages, which I am certain are correct:
BOY %
Advanced 14.0
Below Proficient 38.0
Proficient 34.0
Remedial 14.0
EOY %
Advanced 39.0
Below Proficient 18.0
Proficient 32.0
Remedial 11.0
Using data from Kaggle: Brooklyn NY Schools
Calculating the bar groups separately can be problematic. It is better to make the calculations within one dataframe, shape the dataframe, and then plot, because this will ensure the bars are plotted in the correct groups.
Since no data is provided, this begins with wide form numeric data and then cleans and shapes the dataframe.
Numeric values are converted to categorical with .cut
Dataframe is converted to long form with .melt, and then use .groupby to calculate percentage within the 'x of Year'
Reshaped with .pivot, and plot with pandas.DataFrame.plot
Tested in python 3.8, pandas 1.3.1, and matplotlib 3.4.2
Imports, Load and Clean the DataFrame
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
# data
data = {'BOY': [11.0, 11.0, 11.0, 11.0, 11.0, 8.0, 11.0, 14.0, 12.0, 13.0, 11.0, 14.0, 10.0, 9.0, 10.0, 10.0, 10.0, 12.0, 12.0, 13.0, 12.0, 11.0, 9.0, 12.0, 16.0, 12.0, 12.0, 12.0, 15.0, 10.0, 10.0, 10.0, 8.0, 11.0, 12.0, 14.0, 10.0, 8.0, 11.0, 12.0, 14.0, 12.0, 13.0, 15.0, 13.0, 8.0, 8.0, 11.0, 10.0, 11.0, 13.0, 11.0, 13.0, 15.0, 10.0, 8.0, 10.0, 9.0, 8.0, 11.0, 13.0, 11.0, 8.0, 11.0, 15.0, 11.0, 12.0, 17.0, 12.0, 11.0, 18.0, 14.0, 15.0, 16.0, 7.0, 11.0, 15.0, 16.0, 13.0, 13.0, 13.0, 0.0, 11.0, 15.0, 14.0, 11.0, 13.0, 16.0, 14.0, 12.0, 8.0, 13.0, 13.0, 14.0, 7.0, 10.0, 16.0, 10.0, 13.0, 10.0, 14.0, 8.0, 16.0, 13.0, 12.0, 14.0, 12.0, 14.0, 16.0, 15.0, 13.0, 13.0, 10.0, 14.0, 8.0, 10.0, 10.0, 11.0, 12.0, 10.0, 12.0, 14.0, 17.0, 13.0, 14.0, 16.0, 15.0, 13.0, 16.0, 9.0, 16.0, 15.0, 11.0, 11.0, 15.0, 14.0, 12.0, 15.0, 11.0, 16.0, 14.0, 14.0, 15.0, 14.0, 14.0, 14.0, 16.0, 15.0, 12.0, 12.0, 14.0, 15.0, 13.0, 14.0, 13.0, 17.0, 14.0, 13.0, 14.0, 13.0, 13.0, 12.0, 10.0, 15.0, 14.0, 12.0, 12.0, 14.0, 12.0, 14.0, 13.0, 15.0, 13.0, 14.0, 14.0, 12.0, 11.0, 15.0, 14.0, 14.0, 10.0], 'EOY': [16.0, 16.0, 16.0, 14.0, 10.0, 14.0, 16.0, 14.0, 15.0, 15.0, 15.0, 11.0, 11.0, 15.0, 10.0, 14.0, 17.0, 14.0, 9.0, 15.0, 14.0, 16.0, 14.0, 13.0, 11.0, 13.0, 12.0, 14.0, 15.0, 13.0, 14.0, 15.0, 12.0, 19.0, 9.0, 13.0, 11.0, 14.0, 17.0, 17.0, 14.0, 13.0, 14.0, 10.0, 16.0, 15.0, 12.0, 11.0, 12.0, 14.0, 15.0, 10.0, 15.0, 14.0, 14.0, 15.0, 18.0, 15.0, 10.0, 10.0, 15.0, 15.0, 13.0, 15.0, 19.0, 13.0, 18.0, 20.0, 21.0, 17.0, 18.0, 17.0, 18.0, 17.0, 12.0, 16.0, 15.0, 18.0, 19.0, 17.0, 20.0, 11.0, 18.0, 19.0, 11.0, 12.0, 17.0, 20.0, 17.0, 15.0, 13.0, 18.0, 14.0, 17.0, 12.0, 12.0, 16.0, 12.0, 14.0, 15.0, 14.0, 10.0, 20.0, 13.0, 18.0, 20.0, 11.0, 20.0, 17.0, 20.0, 13.0, 17.0, 15.0, 18.0, 14.0, 13.0, 13.0, 18.0, 10.0, 13.0, 12.0, 18.0, 20.0, 20.0, 16.0, 18.0, 15.0, 20.0, 22.0, 18.0, 21.0, 18.0, 18.0, 18.0, 17.0, 16.0, 19.0, 16.0, 20.0, 19.0, 19.0, 20.0, 20.0, 14.0, 18.0, 20.0, 20.0, 18.0, 16.0, 21.0, 20.0, 18.0, 15.0, 14.0, 17.0, 19.0, 21.0, 14.0, 18.0, 15.0, 18.0, 21.0, 19.0, 17.0, 16.0, 16.0, 15.0, 20.0, 19.0, 16.0, 21.0, 17.0, 19.0, 15.0, 18.0, 20.0, 18.0, 20.0, 18.0, 16.0, 16.0]}
df = pd.DataFrame(data)
# replace numbers with categorical labels; could also create new columns
labels = ['Remedial', 'Below Proficient', 'Proficient', 'Advanced']
bins = [1, 11, 13, 15, np.inf]
df['BOY'] = pd.cut(x=df.BOY, labels=labels, bins=bins, right=True)
df['EOY'] = pd.cut(x=df.EOY, labels=labels, bins=bins, right=True)
# melt the relevant columns into a long form
dfm = df.melt(var_name='Tested', value_name='Proficiency')
# set the categorical label order, which makes the xaxis labels print in the specific order
dfm['Proficiency'] = pd.Categorical(dfm['Proficiency'], labels, ordered=True)
Groupby, Percent Calculation, and Shape for Plotting
# groupby and get the value counts
dfg = dfm.groupby('Tested')['Proficiency'].value_counts().reset_index(level=1, name='Size').rename({'level_1': 'Proficiency'}, axis=1)
# divide by the Tested value counts to get the percent
dfg['percent'] = dfg['Size'].div(dfm.Tested.value_counts()).mul(100).round(1)
# reshape to plot
dfp = dfg.reset_index().pivot(index='Proficiency', columns='Tested', values='percent')
# display(dfp)
Tested BOY EOY
Proficiency
Remedial 34.8 9.9
Below Proficient 28.7 12.7
Proficient 27.1 25.4
Advanced 8.8 51.9
Plot
ax = dfp.plot(kind='bar', figsize=(15, 5), rot=0, color=['orchid', 'teal'])
# formatting
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
ax.set_ylabel('Students at Proficiency Level', fontsize=18)
ax.set_xlabel('')
ax.set_title('Bushwick Middle Change in Proficiency Levels', fontsize=25)
ax.set_xticklabels(ax.get_xticklabels(), fontsize=25)
ax.legend(fontsize=25)
_ = plt.yticks(fontsize=15)
# add bar labels
for p in ax.containers:
ax.bar_label(p, fmt='%.1f%%', label_type='edge', fontsize=12)
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.2)
See the bar labels match dfp
I want to illustrate nicely how often (y-axis) a certain output (x-axis) occurs...
My code produces following plot:
It's not good, because the values are rounded to integers apparently, e.g., there are not over a 100 outputs with 100%, but actually most of them are 99% I think.
The code:
#!/usr/bin/env python3
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
trajectoryIds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0]
avgSolutionPercentages = [20.6256, 99.1448, 15.6764, 21.8231, 16.3733, 17.7502, 20.0055, 86.6873, 11.3105, 15.6693, 10.3449, 81.8921, 11.6745, 92.6031, 11.8787, 23.0229, 37.9636, 2.3903, 15.1727, 14.7088, 10.0426, 59.6758, 8.0042, 12.4174, 10.0585, 46.0567, 90.2376, 98.3273, 52.8645, 49.3027, 62.4136, 32.6199, 19.0642, 10.3319, 74.6157, 22.5771, 22.4118, 11.2017, 16.5053, 11.2021, 30.8376, 24.5255, 83.1072, 10.1529, 14.3991, 46.3459, 16.2137, 4.5773, 44.9549, 1.0719, 76.5605, 42.6589, 13.6209, 34.2856, 1.3574, 29.0465, 66.8146, 16.4796, 32.9564, 62.0732, 3.7047, 13.8828, 31.6088, 60.1141, 3.3247, 45.0796, 13.7862, 26.4498, 93.6806, 10.3245, 62.5157, 10.9833, 42.5908, 37.3208, 27.4115, 84.1648, 13.9058, 13.9065, 67.8918, 27.9075, 3.6116, 10.9091, 41.0988, 24.2177, 50.2762, 61.3869, 15.5915, 27.6536, 0.7993, 22.9483, 22.3393, 88.1832, 25.1604, 18.3625, 15.7212, 56.9646, 4.0434, 11.8431, 56.0613, 32.5472, 97.8757, 21.8233, 14.8162, 38.8259, 20.5676, 72.7201, 17.7987, 35.8117, 15.1699, 17.0359, 14.0621, 35.9655, 11.9095, 10.5691, 23.3259, 16.1746, 10.1936, 12.5084, 24.1494, 16.4727, 21.0687, 15.7495, 28.8929, 11.0135, 13.3133, 14.6639, 50.1304, 21.0346, 5.1604, 53.5107, 20.0712, 41.5111, 12.1633, 74.3263, 17.7904, 17.1684, 25.3977, 21.5871, 21.9332, 22.6674, 36.6634, 99.1179, 15.3213, 16.3999, 12.0147, 57.5163, 4.2062, 17.3874, 10.7132, 17.4919, 17.8457, 29.3538, 26.1468, 75.1234, 16.4368, 21.6191, 61.1394, 12.9972, 73.5746, 72.5788, 41.6835, 39.9912, 20.1648, 11.7097, 11.5203, 36.7387, 5.0694, 30.8129, 12.0922, 22.5419, 12.3569, 54.6776, 28.3561, 26.1219, 44.7455, 1.3281, 46.5064, 13.6016, 23.5483, 11.7151, 44.3669, 3.2577, 75.0943, 10.8634, 14.8226, 45.7661, 19.7319, 30.7981, 3.5965, 47.8161, 14.5996, 39.4484, 13.0693, 24.9947, 97.4253, 76.7901, 73.1183, 4.0922]
solutionPercentages = [99.2537, 99.8467, 96.4718, 99.6637, 99.6633, 97.1289, 9.7373, 99.5126, 97.3251, 96.0545, 99.6756, 75.6587, 61.1496, 96.7575, 97.1969, 96.5258, 99.7409, 99.8641, 99.8821, 98.5401, 99.7833, 99.6314, 99.7899, 99.9117, 99.5754, 99.5868, 99.7919, 99.9127, 0.0001, 99.7297, 40.8438, 99.8559, 99.6591, 99.8917, 99.3622, 0.0001, 0.0001, 99.4828, 0.0001, 99.8559, 0.0001, 0.0001, 99.6714, 9.9635, 99.8744, 93.8854, 67.3692, 96.3229, 98.4899, 66.9173, 98.2533, 99.8318, 73.9904, 99.8431, 6.2614, 97.2776, 96.0938, 71.9457, 99.9211, 96.1596, 99.8405, 99.6314, 95.4566, 98.4786, 99.8217, 96.1014, 99.0391, 94.6034, 99.8403, 99.9093, 9.8096, 97.8549, 98.7041, 19.9098, 86.3154, 21.5302, 99.2769, 99.0496, 99.7266, 99.8602, 86.7925, 96.3197, 99.9226, 9.4447, 97.9722, 50.4884, 92.2358, 87.4311, 74.2156, 97.8819, 93.2483, 96.3186, 77.9828, 80.2446, 47.1835, 40.8011, 90.5123, 85.7852, 9.8074, 95.9032, 98.5906, 12.5081, 97.0264, 9.9166, 73.6486, 97.8634, 8.4403, 97.7592, 97.9933, 95.8486, 49.7977, 95.1031, 76.1712, 96.1552, 89.0059, 79.6172, 96.7383, 90.8518, 95.8096, 98.2061, 96.3314, 97.5753, 97.9857, 9.0739, 66.9977, 86.5744, 76.8124, 8.6195, 81.3285, 91.0891, 87.3345, 65.3729, 86.7354, 89.9558, 3.1401, 83.4993, 75.1529, 83.5419, 78.3002, 89.8564, 82.2419, 19.3794, 88.2163, 87.9032, 97.8686, 95.0742, 12.3542, 84.7324, 99.4753, 76.1753, 99.5386, 99.8664, 85.7785, 9.9933, 99.7167, 99.9328, 74.4693, 99.7531, 99.0579, 99.5994, 99.7785, 19.2743, 54.7251, 91.7269, 99.5033, 98.9247, 97.6214, 0.0001, 97.7027, 98.6832, 98.4691, 98.9759, 99.7087, 99.9244, 99.4908, 82.1103, 67.6125, 78.2363, 93.5725, 91.5612, 99.8865, 68.5426, 79.0635, 76.8951, 99.3555, 98.9196, 6.1157, 75.8655, 83.8525, 86.1269, 83.3388, 96.1854, 87.1961, 81.7453, 9.2689, 95.2765, 9.0809, 99.8599]
avgSuccess = sum(avgSolutionPercentages)/len(trajectoryIds)
y = solutionPercentages
#Plot
fig, ax = plt.subplots()
ax.hist(y)
ax.set_ylabel('Number of Motions (Total: '+ str(len(trajectoryIds)) + ')')
ax.set_xlabel('Planning Solution (%)')
ax.set_title('Planning Success Rate (Avg: ' + str(round(avgSuccess,2)) + '%)')
plt.legend(loc='upper left')
plt.show()
So I found out how to make the values on the x axis more precise: I changed ax.hist(y) to ax.hist(y, bins = 1000). But that didn't really work out well either:
So now I need to:
get rid of the empty space between my bars (is there a way to get rid of these empty x values?)
while keeping all bars at the same width
change the precision anytime, e.g., from 1 to 0,01 step for each bar
Just any suggestions on how to make the plot (and code) look better are much appreciated :) Maybe it's not the .hist function that's best for this...but I don't know any better - failed doing this with a bar chart so far :(
How about something like
#!/usr/bin/env python3
import matplotlib.pyplot as plt
import numpy as np
trajectoryIds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0, 99.0, 100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 113.0, 114.0, 115.0, 116.0, 117.0, 118.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0, 128.0, 129.0, 130.0, 131.0, 132.0, 133.0, 134.0, 135.0, 136.0, 137.0, 138.0, 139.0, 140.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0, 153.0, 154.0, 155.0, 156.0, 157.0, 158.0, 159.0, 160.0, 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0, 169.0, 170.0, 171.0, 172.0, 173.0, 174.0, 175.0, 176.0, 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0, 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0, 193.0, 194.0, 195.0, 196.0, 197.0, 198.0]
avgSolutionPercentages = [20.6256, 99.1448, 15.6764, 21.8231, 16.3733, 17.7502, 20.0055, 86.6873, 11.3105, 15.6693, 10.3449, 81.8921, 11.6745, 92.6031, 11.8787, 23.0229, 37.9636, 2.3903, 15.1727, 14.7088, 10.0426, 59.6758, 8.0042, 12.4174, 10.0585, 46.0567, 90.2376, 98.3273, 52.8645, 49.3027, 62.4136, 32.6199, 19.0642, 10.3319, 74.6157, 22.5771, 22.4118, 11.2017, 16.5053, 11.2021, 30.8376, 24.5255, 83.1072, 10.1529, 14.3991, 46.3459, 16.2137, 4.5773, 44.9549, 1.0719, 76.5605, 42.6589, 13.6209, 34.2856, 1.3574, 29.0465, 66.8146, 16.4796, 32.9564, 62.0732, 3.7047, 13.8828, 31.6088, 60.1141, 3.3247, 45.0796, 13.7862, 26.4498, 93.6806, 10.3245, 62.5157, 10.9833, 42.5908, 37.3208, 27.4115, 84.1648, 13.9058, 13.9065, 67.8918, 27.9075, 3.6116, 10.9091, 41.0988, 24.2177, 50.2762, 61.3869, 15.5915, 27.6536, 0.7993, 22.9483, 22.3393, 88.1832, 25.1604, 18.3625, 15.7212, 56.9646, 4.0434, 11.8431, 56.0613, 32.5472, 97.8757, 21.8233, 14.8162, 38.8259, 20.5676, 72.7201, 17.7987, 35.8117, 15.1699, 17.0359, 14.0621, 35.9655, 11.9095, 10.5691, 23.3259, 16.1746, 10.1936, 12.5084, 24.1494, 16.4727, 21.0687, 15.7495, 28.8929, 11.0135, 13.3133, 14.6639, 50.1304, 21.0346, 5.1604, 53.5107, 20.0712, 41.5111, 12.1633, 74.3263, 17.7904, 17.1684, 25.3977, 21.5871, 21.9332, 22.6674, 36.6634, 99.1179, 15.3213, 16.3999, 12.0147, 57.5163, 4.2062, 17.3874, 10.7132, 17.4919, 17.8457, 29.3538, 26.1468, 75.1234, 16.4368, 21.6191, 61.1394, 12.9972, 73.5746, 72.5788, 41.6835, 39.9912, 20.1648, 11.7097, 11.5203, 36.7387, 5.0694, 30.8129, 12.0922, 22.5419, 12.3569, 54.6776, 28.3561, 26.1219, 44.7455, 1.3281, 46.5064, 13.6016, 23.5483, 11.7151, 44.3669, 3.2577, 75.0943, 10.8634, 14.8226, 45.7661, 19.7319, 30.7981, 3.5965, 47.8161, 14.5996, 39.4484, 13.0693, 24.9947, 97.4253, 76.7901, 73.1183, 4.0922]
solutionPercentages = [99.2537, 99.8467, 96.4718, 99.6637, 99.6633, 97.1289, 9.7373, 99.5126, 97.3251, 96.0545, 99.6756, 75.6587, 61.1496, 96.7575, 97.1969, 96.5258, 99.7409, 99.8641, 99.8821, 98.5401, 99.7833, 99.6314, 99.7899, 99.9117, 99.5754, 99.5868, 99.7919, 99.9127, 0.0001, 99.7297, 40.8438, 99.8559, 99.6591, 99.8917, 99.3622, 0.0001, 0.0001, 99.4828, 0.0001, 99.8559, 0.0001, 0.0001, 99.6714, 9.9635, 99.8744, 93.8854, 67.3692, 96.3229, 98.4899, 66.9173, 98.2533, 99.8318, 73.9904, 99.8431, 6.2614, 97.2776, 96.0938, 71.9457, 99.9211, 96.1596, 99.8405, 99.6314, 95.4566, 98.4786, 99.8217, 96.1014, 99.0391, 94.6034, 99.8403, 99.9093, 9.8096, 97.8549, 98.7041, 19.9098, 86.3154, 21.5302, 99.2769, 99.0496, 99.7266, 99.8602, 86.7925, 96.3197, 99.9226, 9.4447, 97.9722, 50.4884, 92.2358, 87.4311, 74.2156, 97.8819, 93.2483, 96.3186, 77.9828, 80.2446, 47.1835, 40.8011, 90.5123, 85.7852, 9.8074, 95.9032, 98.5906, 12.5081, 97.0264, 9.9166, 73.6486, 97.8634, 8.4403, 97.7592, 97.9933, 95.8486, 49.7977, 95.1031, 76.1712, 96.1552, 89.0059, 79.6172, 96.7383, 90.8518, 95.8096, 98.2061, 96.3314, 97.5753, 97.9857, 9.0739, 66.9977, 86.5744, 76.8124, 8.6195, 81.3285, 91.0891, 87.3345, 65.3729, 86.7354, 89.9558, 3.1401, 83.4993, 75.1529, 83.5419, 78.3002, 89.8564, 82.2419, 19.3794, 88.2163, 87.9032, 97.8686, 95.0742, 12.3542, 84.7324, 99.4753, 76.1753, 99.5386, 99.8664, 85.7785, 9.9933, 99.7167, 99.9328, 74.4693, 99.7531, 99.0579, 99.5994, 99.7785, 19.2743, 54.7251, 91.7269, 99.5033, 98.9247, 97.6214, 0.0001, 97.7027, 98.6832, 98.4691, 98.9759, 99.7087, 99.9244, 99.4908, 82.1103, 67.6125, 78.2363, 93.5725, 91.5612, 99.8865, 68.5426, 79.0635, 76.8951, 99.3555, 98.9196, 6.1157, 75.8655, 83.8525, 86.1269, 83.3388, 96.1854, 87.1961, 81.7453, 9.2689, 95.2765, 9.0809, 99.8599]
avgSuccess = sum(avgSolutionPercentages)/len(trajectoryIds)
y = solutionPercentages
BIN_COUNT = 15
BAR_WIDTH = 0.75
fig, ax = plt.subplots()
# use numpy histogram so we can perform filtering
hist, bin_edges = np.histogram(y, bins=BIN_COUNT)
# so we can remove bins with zero entries
non_zero = np.nonzero(hist)
# take only entries where bin is non-zero
hist = hist[non_zero]
bin_edges = bin_edges[non_zero]
# generate labels based on bin edge values (maybe use centers?)
x_ticks = [str(int(edge)) for edge in bin_edges]
indices = np.arange(len(bin_edges))
plt.bar(indices, hist, BAR_WIDTH, align='center')
plt.xticks(indices, x_ticks)
ax.set_ylabel('Number of Motions (Total: '+ str(len(trajectoryIds)) + ')')
ax.set_xlabel('Planning Solution (%)')
ax.set_title('Planning Success Rate (Avg: ' + str(round(avgSuccess,2)) + '%)')
plt.show()
which produces the plot
You may use some nonlinear dependence of the bin width, e.g.
b = 5
bins = (np.linspace(np.min(y)**b, np.max(y)**b))**(1/b)
fig, ax = plt.subplots()
ax.hist(y, bins=bins, edgecolor="k")
Or you may define the bins completely customized, e.g. use a bin width of 10 up to 60 and then use a bin width of 5 till 90, finally use a bin with of 1 till 100.
bins = np.concatenate((np.linspace(0,60,7),
np.linspace(60,90,7),
np.linspace(90,100,11)))
fig, ax = plt.subplots()
ax.hist(y, bins=bins, edgecolor="k")
I'm trying to work with interp1d of SciPy.interpolate. I "plugged in" two arrays (filtered_mass and integrated_column), of same size, but it still give me ValueError that the sizes of the arrays must be equal. How can it be?
This is the code I'm using in this part:
def interp_integrated_column(self, definition):
''' (string) -> interpolated_function(mass)
This functions output the interpolated value of the integrated columns
as function of the mass of the WIMP (mDM)
'''
print self.filtered_mass_array
print "len(filtered_mass)", len(self.filtered_mass_array) , "len(integrated_column)", len(self.integrated_columns_values[definition])
print self.integrated_columns_values[definition]
interpolated_values = interp1d(self.filtered_mass_array, self.integrated_columns_values[definition])
return interpolated_values
This is the error message:
[5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0, 280.0, 300.0, 330.0, 360.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1500.0, 1700.0, 2000.0, 2500.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0, 12000.0, 15000.0, 20000.0, 30000.0, 50000.0, 100000.0]
len(filtered_mass) 62 len(integrated_column) 62
[[2.8855960615102004e-05], [4.0701386519793902e-05], [6.6563800907013242e-05], [0.0001006393622421269], [0.00019862657113084296], [0.00032843266928887332], [0.00046438711039847576], [0.00060420820026262198], [0.00091858847275374405], [0.0012828446411529174], [0.0016307748004155418], [0.0020049092489578773], [0.0023859804990953733], [0.0027809435562397089], [0.0031914945950108709], [0.0036198713189993367], [0.004049356593219729], [0.058652386100581579], [0.080971818217450073], [0.10330986231789899], [0.13710341994459613], [0.20188314005754618], [0.2891914189026335], [0.37721295733783522], [0.47493929411417846], [0.57539389630897464], [0.70805980165022075], [0.85872215884312952], [1.0664252638663609], [1.2783399280844934], [1.564710616680836], [2.0375181832882485], [2.5037792909103884], [2.9693614352642328], [3.4461139299681416], [3.9753240755452568], [4.5112890074931942], [5.0575238552577968], [5.6116617190278557], [6.75034712149598], [7.9290625424458492], [9.1455816114675219], [10.393026346405367], [14.442148067840661], [18.539929482157905], [22.594593494117799], [28.852213268263831], [39.804824036584456], [51.348027754488449], [83.695041150108111], [118.92653801185628], [155.17895505284363], [192.83930746140334], [231.78928736553948], [271.95372644243321], [313.16712050353419], [398.50142684880342], [532.55760945531256], [768.84170621340957], [1276.9057251660611], [2387.368055624514], [5476.4080305101643]]
Traceback (most recent call last):
File "data_mining.py", line 8, in <module>
e_int = nu_e.interp_integrated_column('e')
File "/home/ohm/projects/mucalc/PPPC4DMID_Reader.py", line 121, in interp_integrated_column
interpolated_values = interp1d(self.filtered_mass_array, self.integrated_columns_values[definition])
File "/usr/lib/python2.7/dist-packages/scipy/interpolate/interpolate.py", line 278, in __init__
raise ValueError("x and y arrays must be equal in length along "
ValueError: x and y arrays must be equal in length along interpolation axis.
Your two lists both have length 62, but they have different shapes interpreted as numpy arrays:
>>> a = [5.0, 6.0, 8.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 180.0, 200.0, 220.0, 240.0, 260.0, 280.0, 300.0, 330.0, 360.0, 400.0, 450.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0, 800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0, 1500.0, 1700.0, 2000.0, 2500.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0, 12000.0, 15000.0, 20000.0, 30000.0, 50000.0, 100000.0]
>>> b = [[2.8855960615102004e-05], [4.0701386519793902e-05], [6.6563800907013242e-05], [0.0001006393622421269], [0.00019862657113084296], [0.00032843266928887332], [0.00046438711039847576], [0.00060420820026262198], [0.00091858847275374405], [0.0012828446411529174], [0.0016307748004155418], [0.0020049092489578773], [0.0023859804990953733], [0.0027809435562397089], [0.0031914945950108709], [0.0036198713189993367], [0.004049356593219729], [0.058652386100581579], [0.080971818217450073], [0.10330986231789899], [0.13710341994459613], [0.20188314005754618], [0.2891914189026335], [0.37721295733783522], [0.47493929411417846], [0.57539389630897464], [0.70805980165022075], [0.85872215884312952], [1.0664252638663609], [1.2783399280844934], [1.564710616680836], [2.0375181832882485], [2.5037792909103884], [2.9693614352642328], [3.4461139299681416], [3.9753240755452568], [4.5112890074931942], [5.0575238552577968], [5.6116617190278557], [6.75034712149598], [7.9290625424458492], [9.1455816114675219], [10.393026346405367], [14.442148067840661], [18.539929482157905], [22.594593494117799], [28.852213268263831], [39.804824036584456], [51.348027754488449], [83.695041150108111], [118.92653801185628], [155.17895505284363], [192.83930746140334], [231.78928736553948], [271.95372644243321], [313.16712050353419], [398.50142684880342], [532.55760945531256], [768.84170621340957], [1276.9057251660611], [2387.368055624514], [5476.4080305101643]]
>>> np.asarray(a).shape
(62,)
>>> np.asarray(b).shape
(62, 1)
You'll want to make your second array 1D, not 2D. There are roughly a quadrillion ways to do this in numpy, but one is to use .squeeze(), which removes single-dimensional axes:
>>> a = np.asarray(a)
>>> b = np.asarray(b).squeeze()
>>> b.shape
(62,)
after which:
>>> from scipy.interpolate import interp1d
>>> i = interp1d(a,b)
>>> i(2123)
array(31.546555517270704)