Matplotlib's bar chart displays uneven bars

If we look at this code and x,y data,
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],[1, 2, 4, 10, 5, 9, 1,4, 9, 9],edgecolor='black')
plt.xlabel('Sample Mean')
plt.ylabel('Probability')
this displays the following graph
I cannot understand how the x values go beyond 1 and even take negative values. Also, why do the bars have different widths?

The problem is that your x-values are spaced 0.1 apart while the default bar width is 0.8, so the bars overlap. The solution is to set the bar width explicitly. In your case, any width smaller than 0.1 works. For instance, with width=0.05 you get the following graph.
Why negative?: The bars are centered on their x-values by default (align='center'). So the first bar in your question was centered at 0 with a width of 0.8, which is why it spanned from -0.4 to +0.4. Likewise, the last bar, centered at 1, extended to 1.4.
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],
[1, 2, 4, 10, 5, 9, 1,4, 9, 9], width=0.05, edgecolor='black')
plt.xlabel('Sample Mean')
plt.ylabel('Probability')
If you don't want bars at x<0: You can align the left edge of each bar with its x-value by passing the argument align='edge'.
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],
[1, 2, 4, 10, 5, 9, 1,4, 9, 9], width=0.05, align='edge',
edgecolor='black')
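To see where the negative x-range comes from, the bar edges can be computed by hand. The following is a minimal sketch, not Matplotlib's own code: bar_extents and safe_width are hypothetical helper names, and 0.8 is assumed as the default width of plt.bar.

```python
def bar_extents(x, width=0.8, align="center"):
    """Return (left, right) edges of each bar, mimicking how plt.bar places them."""
    if align == "center":
        return [(xi - width / 2, xi + width / 2) for xi in x]
    # align="edge": the left edge of each bar sits at its x-value
    return [(xi, xi + width) for xi in x]


def safe_width(x, fraction=0.8):
    """Pick a width that is a fraction of the smallest gap between x-values."""
    xs = sorted(x)
    smallest_gap = min(b - a for a, b in zip(xs, xs[1:]))
    return fraction * smallest_gap


x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1]

extents = bar_extents(x)                      # default width and alignment
lowest = min(left for left, _ in extents)     # leftmost bar edge
highest = max(right for _, right in extents)  # rightmost bar edge
print(lowest, highest)                        # overall x-range covered by the bars

width = safe_width(x)                         # a width that avoids overlap for this data
print(width)
```

The computed value could then be passed on as plt.bar(x, y, width=width).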

Related

is it possible to add x_ticks to pywaffle

I was wondering if and how I can add x-axis labels to pywaffle.
value1 = new_df['value1'].tolist()
new_list = [i+1 for i in range(len(value1))]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1), # Either rows or columns could be omitted
values=value1,
title = {"label": name, "loc": "left"},
)
plt.savefig("plot.png", bbox_inches="tight")
my value1 values are [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
I would like every column to be labeled.

Yes, it is possible to add ticks etc.
A waffle chart with limited number of columns
But it is a bit unclear what your final goal is. By default, a waffle chart draws as many squares as each of the values indicates. So, if the values are [1, 2, 3, 4, 5, 6] and the colors are ['red', 'orange', 'blue', 'gold', 'green', 'purple'], there would be 1 red square, 2 orange, 3 blue, 4 yellow, 5 green and 6 purple.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
#columns=sum(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
If you set the number of rows and columns so that their product is smaller than 21 (the sum of the values), each value is scaled down more or less proportionally, but stays an integer. In the current example, the red one is suppressed, the orange, blue, yellow and green ones are reduced to 1 square each, and the purple one is reduced to 2 squares. This makes it unclear which label you want to put where.
value1 = [1, 2, 3, 4, 5, 6]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
colors=['red','orange','blue','gold','green','purple']
)
Adding x ticks
To add ticks to a waffle chart, you can turn the axes on. To position the ticks, you need to know that the squares have a width of 1, and a default distance of 0.2. So, the first tick comes at 0.5, the next one at 1+0.2+0.5, etc. Optionally, you can remove spines and the dummy y ticks.
import matplotlib.pyplot as plt
from pywaffle import Waffle
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
fig = plt.figure(
FigureClass=Waffle,
rows=1,
columns=len(value1),
values=value1,
title={"label": 'title', "loc": "left"},
figsize=(15,3),
)
plt.axis('on')
plt.yticks([])
plt.xticks([i * 1.2 + 0.5 for i in range(len(value1))], value1)
for sp in ['left', 'right', 'top']:
plt.gca().spines[sp].set_visible(False)
plt.show()
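The tick arithmetic above generalizes to other gap sizes. A small sketch, where waffle_tick_positions is a hypothetical helper and the block width of 1 and gap of 0.2 are taken from the answer rather than read from pywaffle itself:

```python
def waffle_tick_positions(n_columns, block_width=1.0, gap=0.2):
    """Return the x-coordinate of the center of each column of squares."""
    step = block_width + gap  # distance between the left edges of two neighbors
    return [i * step + block_width / 2 for i in range(n_columns)]


positions = waffle_tick_positions(31)
print(positions[:3])  # centers of the first three columns
```

The list can be used as the first argument to plt.xticks, as in the answer's code.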
A Seaborn heatmap
Instead of a waffle chart, you could create a heatmap. Then, each square will get a color corresponding to the given values. Optionally, these values (or another string) can be shown as annotation or as x tick label.
import matplotlib.pyplot as plt
import seaborn as sns
value1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
plt.figure(figsize=(15, 3))
ax = sns.heatmap(data=[value1], xticklabels=value1, yticklabels=False,
annot=True, square=True, linewidths=1.5, cbar=False)
ax.set_title('title', loc='left')
plt.tight_layout()
plt.show()

# Remove borders, ticks, etc.
ax.axis("off")
I saw this in pywaffle.py, so I don't think adding an axis is possible.

How to plot scatter graph with SCATTER fill_between in Python?

I am a manufacturing engineer, very new to Python and Matplotlib. Currently, I am trying to plot a scatter time graph, where for every single record, I have the data (read from a sensor) and upper and lower limits for that data that will stop the tool if data is not between them.
So for a simple set of data like this:
time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]
When the tool is not working, nothing will be recorded, hence a gap b/w 3 & 7 in time records.
The desired graph would look like this:
A few rules that I am trying to stick to:
All three graphs (data, upper_limit, and lower_limit) are required to be scattered points and not lines, with the x-axis (time) being shared among them. - required.
A green highlight that fills between upper and lower limits, considering only the two points with the same time for each highlight. - highly recommended.
(I tried Matplotlib's fill_between, but it creates a polygon between the trend lines rather than straight vertical lines between matching pairs of L.L. and U.L. dots. Therefore, it won't be accurate, and it will fill the gap between times 3 and 7, which is not desired. I also tried to use Matplotlib's bar for the limits alongside the scatter plot for the 'data', but I was not able to set a minimum = lower_limit for the bars.)
When the value of data is not equal to or between the limits, the representing dot should appear in red, rather than the original color. -highly recommended.
So, with all of that in mind, and thousands of records per day, a regular graph, for a 24hr time span, should look like the following: (notice the gap due to possible lack of records in a time span, as well as vertical green lines, for the limits.)
Thanks for your time and help!

This is a version using NumPy's masking and Matplotlib's errorbar:
import matplotlib.pyplot as plt
import numpy as np
time = np.array( [0, 1, 2, 3, 7, 8, 9, 10] )
data = np.array([2, 5, 6, 5, 5, 6, 7, 8] )
lower = np.array([4, 4, 4, 5, 5, 5, 5, 5] )
upper = np.array([6, 6, 6, 6, 7, 7, 7, 7] )
nn = len( lower )
delta = upper - lower
### creating masks
inside = ( ( upper - data ) >= 0 ) & ( ( data - lower ) >= 0 )
outside = np.logical_not( inside )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.errorbar( time, lower, yerr=( nn*[0], delta), ls='', ecolor="#00C023" )
ax.scatter( time[ inside ], data[ inside ], c='k' )
ax.scatter( time[ outside ], data[ outside ], c='r' )
plt.show()

Something like this should work, plotting each component separately:
import pandas as pd
import matplotlib.pyplot as plt
time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]
# put data into dataframe and identify which points are out of range (not between the lower and upper limit)
df = pd.DataFrame({'time': time, 'data': data, 'll': lower_limit, 'ul': upper_limit})
df.loc[:, 'in_range'] = 0
df.loc[((df['data'] >= df['ll']) & (df['data'] <= df['ul'])), 'in_range'] = 1
# make the plot
fig, ax = plt.subplots()
# plot lower-limit and upper-limit points
plt.scatter(df['time'], df['ll'], c='green')
plt.scatter(df['time'], df['ul'], c='green')
# plot data points in range
plt.scatter(df.loc[df['in_range']==1, :]['time'], df.loc[df['in_range']==1, :]['data'], c='black')
# plot data points out of range (in red)
plt.scatter(df.loc[df['in_range']==0, :]['time'], df.loc[df['in_range']==0, :]['data'], c='red')
# plot lines between lower limit and upper limit
plt.plot((df['time'],df['time']),([i for i in df['ll']], [j for j in df['ul']]), c='lightgreen')
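The question mentions not finding a way to start bars at the lower limit. plt.bar accepts a bottom argument for this, so thin bars from the lower to the upper limit are another option. A minimal sketch: limit_segments and plot_limits are hypothetical helper names, and the bar width of 0.1 is an arbitrary choice.

```python
def limit_segments(time, data, lower, upper):
    """Pair each timestamp with its limits and flag whether the reading is in range."""
    rows = []
    for t, d, lo, hi in zip(time, data, lower, upper):
        rows.append({
            "time": t,
            "bottom": lo,               # where the green bar starts
            "height": hi - lo,          # how far it extends upward
            "in_range": lo <= d <= hi,  # limits themselves count as in range
        })
    return rows


def plot_limits(rows, data):
    """Draw one thin green bar per record plus the readings (red if out of range)."""
    # Imported here so limit_segments can be used without Matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    times = [r["time"] for r in rows]
    ax.bar(times, [r["height"] for r in rows],
           bottom=[r["bottom"] for r in rows], width=0.1, color="lightgreen")
    colors = ["black" if r["in_range"] else "red" for r in rows]
    ax.scatter(times, data, c=colors, zorder=3)
    return fig, ax


time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]

rows = limit_segments(time, data, lower_limit, upper_limit)
print([r["in_range"] for r in rows])
```

Calling plot_limits(rows, data) followed by plt.show() would draw the figure. Because each bar belongs to a single timestamp, the gap between times 3 and 7 stays empty.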

Histogram Bars not Centred over xticks in pyplot.hist

I guess I just didn't use the right keywords, because this probably has been asked before, but I didn't find a solution. Anyway, I have a problem where the bars of a histogram do not line up with the xticks. I want the bars to be centred over the xticks they correspond to, but they get placed between ticks to fill the space in-between evenly.
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, rwidth = .3)
plt.xticks(bins)
plt.show()

Note that what you are plotting here is not a histogram. A histogram would be
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, edgecolor="k", alpha=1)
plt.xticks(bins)
plt.show()
Here, the bars range between the bins as expected. E.g. you have 3 values in the interval 1 <= x < 1.5.
Conceptually what you want to do here is get a bar plot of the counts of data values. This would not require any bins at all and could be done as follows:
import numpy as np
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
u, inv = np.unique(data, return_inverse=True)
counts = np.bincount(inv)
plt.bar(u, counts, width=0.3)
plt.xticks(np.arange(1,10,0.5))
plt.show()
Of course you can "misuse" a histogram plot to get a similar result. This requires moving the center of each bar to the left bin edge with plt.hist(.., align="left").
import matplotlib.pyplot as plt
data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9,9, 9.5]
bins = [x+n for n in range(1, 10) for x in [0.0, 0.5]]+[10.0]
plt.hist(data, bins, align="left", rwidth = .6)
plt.xticks(bins)
plt.show()
This results in the same plot as above.
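Another way to center histogram bars on the ticks, not taken from the answer above, is to shift the bin edges by half a bin so that every tick sits in the middle of its bin. A sketch under that assumption, where centered_bin_edges is a hypothetical helper:

```python
from collections import Counter


def centered_bin_edges(first_tick, last_tick, step):
    """Return bin edges such that each tick lies in the middle of one bin."""
    n_ticks = round((last_tick - first_tick) / step) + 1
    # One more edge than there are bins, starting half a step before the first tick.
    return [first_tick - step / 2 + i * step for i in range(n_ticks + 1)]


data = [1, 1, 1, 1.5, 2, 4, 4, 4, 4, 4.5, 5, 6, 6.5, 7, 9, 9, 9.5]

edges = centered_bin_edges(1.0, 9.5, 0.5)  # ticks at 1, 1.5, ..., 9.5
counts = Counter(data)                     # occurrences of each distinct value
print(edges[0], edges[-1])
print(counts[1], counts[4])
```

Passing edges as the bins argument of plt.hist should then give bars centered on the ticks. Counter yields the same counts that the np.unique and np.bincount approach computes.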

matplotlib, pyplot : custom color for a specific data value

I am generating a heat map for my data.
Everything works fine, but I have a little problem. My data (numbers) are from 0 to 10.000.
0 means nothing (no data), and at the moment the fields with 0 just take the lowest color of my color scale. My problem is how to give the data with 0 a totally different color (e.g. black or white).
Just see the picture to better understand what I mean:
My code (snippet) looks like this:
matplotlib.pyplot.imshow(results, interpolation='none')
matplotlib.pyplot.colorbar();
matplotlib.pyplot.xticks([0, 1, 2, 3, 4, 5, 6, 7, 8], [10, 15, 20, 25, 30, 35, 40, 45, 50]);
matplotlib.pyplot.xlabel('Population')
matplotlib.pyplot.yticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 'serial']);
matplotlib.pyplot.ylabel('Communication Step');
axis.xaxis.tick_top();
matplotlib.pyplot.savefig('./results_' + optimisationProblem + '_dim' + str(numberOfDimensions) + '_' + statisticType + '.png');
matplotlib.pyplot.close();

If you are not interested in a smooth transition between the values 0 and 0.0001, you can just set every value that equals 0 to NaN. This will result in a white color whereas 0.0001 will still be deep blue-ish.
The following code is an example. I generate the data randomly, then select a single element of the array and set it to NaN, which makes that cell white. I also included a commented-out line that sets every data point equal to 0 to NaN.
import numpy
import matplotlib.pyplot as plt
#Random data
data = numpy.random.random((10, 10))
#Set all data points equal to zero to NaN
#data[data == 0.] = float("NaN")
#Set single data value to nan
data[2][2] = float("NaN")
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.imshow(data, interpolation = "nearest")
plt.show()
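If white does not stand out enough, the color used for NaN cells can be changed through the colormap's set_bad method. A minimal sketch, assuming the data is a nested list and that the 'viridis' colormap is acceptable. The names zeros_to_nan and show_heatmap are hypothetical helpers.

```python
import copy
import math


def zeros_to_nan(grid):
    """Return a copy of a nested list in which every 0 is replaced by NaN."""
    return [[math.nan if value == 0 else float(value) for value in row]
            for row in grid]


def show_heatmap(grid, missing_color="black"):
    """Draw the grid with imshow, painting former zeros in missing_color."""
    # Imported here so zeros_to_nan can be used without Matplotlib installed.
    import matplotlib.pyplot as plt

    cmap = copy.copy(plt.get_cmap("viridis"))  # copy, so the shared colormap is untouched
    cmap.set_bad(missing_color)                # color for NaN (masked) cells
    fig, ax = plt.subplots()
    image = ax.imshow(zeros_to_nan(grid), interpolation="nearest", cmap=cmap)
    fig.colorbar(image, ax=ax)
    return fig, ax


cleaned = zeros_to_nan([[0, 5], [10000, 0]])
print(cleaned)
```

Calling show_heatmap(results) followed by plt.show() would display the figure. Since NaN cells are ignored when the color range is determined, the smallest nonzero value gets the lowest color of the scale.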

Matplotlib many subplots xtick labels intercepting

I'm plotting many subplots in the same figure. I encounter the problem that the xtick labels of neighbouring subplots overlap one another. I do not want any space between the subplots.
Here is an example:
In particular I would like xtick labels not to be above/below the green lines, just like it happens at the points indicated with red squares.
One idea I had so far was, in a case where my max=4 and min=0, I'd draw tick labels for 1 2 and 3 at their respective locations, e.g 1,2,3. Then I'd draw 4 at the position 3.8 and 0 at the position 0.2. Any ideas?
thanks!

Not exactly what you asked for, but a quick solution is to set the alignment parameter:
pylab.xticks(..., horizontalalignment='left')
pylab.yticks(..., verticalalignment='bottom')
This will apply to all ticks.

This is how I would do it:
axScatter.set_xticks([0, 1, 2, 3, 4 ,5 ,6])
axScatter.set_yticks([-8, -6, -4, -2, 0, 2, 4, 6])
And you can use:
axScatter.yaxis.set_major_formatter(nullfmt)
To make the y axis labels disappear for the top right and bottom right plots. Here nullfmt is presumably a matplotlib.ticker.NullFormatter() instance.

The whole plt.figure routine should look something like this:
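import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
nullfmt = NullFormatter()  # formatter that draws no tick labels
fig = plt.figure()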
axplot_topleft = fig.add_subplot(2,2,1)
axplot_topleft.xaxis.set_major_formatter(nullfmt)
axplot_topleft.set_yticks([-8, -6, -4, -2, 0, 2, 4, 6])
axplot_topright = fig.add_subplot(2,2,2)
axplot_topright.xaxis.set_major_formatter(nullfmt)
axplot_topright.yaxis.set_major_formatter(nullfmt)
axplot_bottomleft = fig.add_subplot(2,2,3)
axplot_bottomleft.set_xticks([0, 1, 2, 3, 4 ,5 ,6])
axplot_bottomleft.set_yticks([-8, -6, -4, -2, 0, 2, 4, 6])
axplot_bottomright = fig.add_subplot(2,2,4)
axplot_bottomright.yaxis.set_major_formatter(nullfmt)
axplot_bottomright.set_xticks([0, 1, 2, 3, 4 ,5 ,6])
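The idea from the question, moving only the outermost tick labels slightly inward, can also be sketched directly. inset_end_ticks is a hypothetical helper, and the inset of 0.2 is the value suggested in the question.

```python
def inset_end_ticks(ticks, inset=0.2):
    """Return (positions, labels) with the first and last tick moved inward."""
    positions = [float(t) for t in ticks]
    positions[0] += inset              # first tick moves right
    positions[-1] -= inset             # last tick moves left
    labels = [str(t) for t in ticks]   # labels keep the original values
    return positions, labels


positions, labels = inset_end_ticks([0, 1, 2, 3, 4])
print(positions, labels)
```

The result can be applied with ax.set_xticks(positions) followed by ax.set_xticklabels(labels). Note that the tick marks move together with their labels, so the outermost marks no longer sit exactly at the data values.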
