setting spacing between grouped bar plots in matplotlib - python

I'm trying to make a grouped bar plot in matplotlib, following the example in the gallery. I use the following:
import matplotlib.pyplot as plt
from numpy import arange
plt.figure(figsize=(7,7), dpi=300)
xticks = [0.1, 1.1]
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
ind = arange(num_items)
width = 0.1
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    group_len = len(vals)
    gene_rects = plt.bar(ind, vals, width,
                         align="center")
    ind = ind + width
num_groups = len(group_labels)
# Make label centered with respect to group of bars
# Is there a less complicated way?
offset = (num_groups / 2.) * width
xticks = arange(num_groups) + offset
s.set_xticks(xticks)
print("xticks: ", xticks)
plt.xlim([0 - width, max(xticks) + (num_groups * width)])
s.set_xticklabels(group_labels)
My questions are:
How can I control the space between the groups of bars? Right now the spacing is huge and it looks silly. Note that I do not want to make the bars wider - I want them to have the same width, but be closer together.
How can I get the labels to be centered below the groups of bars? I tried to come up with some arithmetic calculations to position the xlabels in the right place (see code above) but it's still slightly off... it feels a bit like writing a plotting library rather than using one. How can this be fixed? (Is there a wrapper or built in utility for matplotlib where this is default behavior?)
EDIT: Reply to @mlgill: thank you for your answer. Your code is certainly much more elegant but still has the same issue, namely that the width of the bars and the spacing between the groups are not controlled separately. Your graph looks correct but the bars are far too wide -- it looks like an Excel graph -- and I wanted to make the bars thinner.
Width and margin are now linked, so if I try:
margin = 0.60
width = (1.-2.*margin)/num_items
It makes the bar skinnier, but brings the group far apart, so the plot again does not look right.
How can I make a grouped bar plot function that takes two parameters: the width of each bar, and the spacing between the bar groups, and plots it correctly like your code did, i.e. with the x-axis labels centered below the groups?
I think that since the user has to compute specific low-level layout quantities like margin and width, we are still basically writing a plotting library :)

Actually I think this problem is best solved by adjusting figsize and width; here is my output with figsize=(2,7) and width=0.3:
By the way, this type of thing becomes a lot simpler if you use the pandas wrappers (I've also imported seaborn, which is not necessary for the solution but makes the plot a lot prettier and more modern looking, in my opinion):
import pandas as pd
import seaborn
seaborn.set()
df = pd.DataFrame(groups, index=group_labels)
df.plot(kind='bar', legend=False, width=0.8, figsize=(2,5))
plt.show()

The trick to both of your questions is understanding that bar graphs in Matplotlib expect each series (G1, G2) to have a total width of "1.0", counting margins on either side. Thus, it's probably easiest to set margins up and then calculate the width of each bar depending on how many of them there are per series. In your case, there are two bars per series.
Assuming you left align each bar, instead of center aligning them as you had done, this setup will result in series which span from 0.0 to 1.0, 1.0 to 2.0, and so forth on the x-axis. Thus, the exact center of each series, which is where you want your labels to appear, will be at 0.5, 1.5, etc.
I've cleaned up your code as there were a lot of extraneous variables. See comments within.
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(7,7), dpi=300)
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
# This needs to be a numpy range for xdata calculations
# to work.
ind = np.arange(num_items)
# Bar graphs expect a total width of "1.0" per group
# Thus, you should make the sum of the two margins
# plus the sum of the width for each entry equal 1.0.
# One way of doing that is shown below. You can make
# the margins smaller if they're still too big.
margin = 0.05
width = (1.-2.*margin)/num_items
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    # The position of the xdata must be calculated for each of the two data series
    xdata = ind+margin+(num*width)
    # This method of calculating positions assumes left-aligned bars.
    # Since Matplotlib 2.0 the default is align="center", so left
    # alignment is requested explicitly with align="edge".
    gene_rects = plt.bar(xdata, vals, width, align="edge")
# You should no longer need to manually set the plot limit since everything
# is scaled to one.
# Also the ticks should be much simpler now that each group of bars extends from
# 0.0 to 1.0, 1.0 to 2.0, and so forth and, thus, are centered at 0.5, 1.5, etc.
s.set_xticks(ind+0.5)
s.set_xticklabels(group_labels)
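The edit to the question asks for a function that takes the bar width and the spacing between groups as two independent parameters. Here is a minimal sketch of one way that could look. The names grouped_bar_layout, grouped_bar, bar_width and group_gap are made up for illustration, and the data is laid out as in the question (one inner list per series, one value per group). It is not a built-in matplotlib utility.

```python
import numpy as np
import matplotlib.pyplot as plt


def grouped_bar_layout(n_groups, n_series, bar_width, group_gap):
    """Return (left_edges, centers) in x data units.

    left_edges[j][g] is the left edge of series j's bar in group g;
    centers[g] is the midpoint of group g, where its tick label goes.
    """
    group_width = n_series * bar_width
    starts = np.arange(n_groups) * (group_width + group_gap)
    left_edges = starts[None, :] + np.arange(n_series)[:, None] * bar_width
    centers = starts + group_width / 2.0
    return left_edges, centers


def grouped_bar(ax, series, labels, bar_width=0.1, group_gap=0.1):
    """Draw grouped bars with independent bar width and group gap."""
    series = np.asarray(series, dtype=float)
    n_series, n_groups = series.shape
    left_edges, centers = grouped_bar_layout(
        n_groups, n_series, bar_width, group_gap)
    for j in range(n_series):
        # align="edge" makes the x positions the left edges of the bars
        ax.bar(left_edges[j], series[j], bar_width, align="edge")
    ax.set_xticks(centers)
    ax.set_xticklabels(labels)
    # Half a gap of padding outside the first and last group
    right_edge = left_edges[-1][-1] + bar_width
    ax.set_xlim(-group_gap / 2.0, right_edge + group_gap / 2.0)
    return left_edges, centers


fig, ax = plt.subplots(figsize=(4, 4))
edges, centers = grouped_bar(
    ax, [[1.04, 0.96], [1.69, 4.02]], ["G1", "G2"],
    bar_width=0.1, group_gap=0.15)
```

Call plt.show() afterwards to display the figure. Because the x-limits are derived from the same two parameters, only the ratio of group_gap to bar_width changes how the plot looks; the absolute on-screen bar width still depends on figsize.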

I read an answer that Paul Ivanov posted on Nabble that might solve this problem with less complexity. Just set the index as below. This will increase the spacing between grouped columns.
ind = np.arange(0,12,2)

Related

Setting the same x-scale but different x-limits for adjacent subplots matplotlib

I am trying to create a figure with three bar plots side by side. These bar plots have different yscales, but the data is fundamentally similar so I'd like all the bars to have the same width.
The only way I was able to get the bars to have the exact same width was by using sharex when creating the subplots, in order to keep the same x scale.
import matplotlib.pyplot as plt
BigData = [[100,300],[400,200]]
MediumData = [[40, 30],[50,20],[60,50],[30,30]]
SmallData = [[3,2],[11,3],[7,5]]
data = [BigData, MediumData, SmallData]
colors = ['#FC766A','#5B84B1']
fig, axs = plt.subplots(1, 3, figsize=(30,5), sharex=True)
subplot = 0
for scale in data:
    for type in range(2):
        bar_x = [x + type*0.2 for x in range(len(scale))]
        bar_y = [d[type] for d in scale]
        axs[subplot].bar(bar_x,bar_y, width = 0.2, color = colors[type])
    subplot += 1
plt.show()
This creates this figure:
The problem with this is that the x-limits of the plot are also shared, leading to unwanted whitespace. I've tried setting the x-bounds after the fact, but it doesn't seem to override sharex. Is there a way to make the bars have the same width, without each subplot also being the same width?
Additionally, is there a way to create such a plot (one with different y scales depending on the size of the data) without having to sort the data manually beforehand, as shown in my code?
Thanks!
Thanks to Jody Klymak for help finding this solution! I thought I should document it for future users.
We can make use of the 'width_ratios' GridSpec parameter. Unfortunately there's no way to specify these ratios after we've already drawn a graph, so the best way I found to implement this is to write a function that creates a dummy graph, and measures the x-limits from that graph:
def getXRatios(data, size):
    phig, aks = plt.subplots(1, 3, figsize=size)
    subplot = 0
    for scale in data:
        for type in range(2):
            bar_x = [x + type*0.2 for x in range(len(scale))]
            bar_y = [d[type] for d in scale]
            aks[subplot].bar(bar_x,bar_y, width = 0.2)
        subplot += 1
    ratios = [aks[i].get_xlim()[1] for i in range(3)]
    plt.close(phig)
    return ratios
This is essentially identical to the code that creates the actual figure, with the cosmetic aspects removed, as all we want from this dummy figure is the x-limits of the graph (something we can't get from our actual figure as we need to define those limits before we start in order to solve the problem).
Now all you need to do is call this function when you're creating your subplots:
fig, axs = plt.subplots(1, 3, figsize=(40,5), gridspec_kw = {'width_ratios':getXRatios(data,(40,5))})
As long as your getXRatios function creates its graph in the same way your actual graph is created, everything should work! Here's my output using this solution.
To save space you could re-purpose the getXRatios function to also construct your final graph, by calling itself in the arguments and giving an option to return either the ratios or the final figure. I couldn't be bothered.
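A variation that avoids the dummy figure, offered only as a sketch: because the bar positions are known in advance, the x-limits can be computed directly, used as the width ratios, and then set explicitly on each subplot. This is a different technique from the one described above. The helper x_limits and the constant PAD are hypothetical names, and PAD is an arbitrary padding in data units.

```python
import numpy as np
import matplotlib.pyplot as plt

BAR_WIDTH = 0.2
PAD = 0.2  # hypothetical padding, in x data units, on each side


def x_limits(n_groups, n_types=2):
    """x-limits enclosing n_groups clusters of n_types centered bars."""
    left = -BAR_WIDTH / 2 - PAD
    right = (n_groups - 1) + (n_types - 1) * BAR_WIDTH + BAR_WIDTH / 2 + PAD
    return left, right


big = [[100, 300], [400, 200]]
medium = [[40, 30], [50, 20], [60, 50], [30, 30]]
small = [[3, 2], [11, 3], [7, 5]]
data = [big, medium, small]
colors = ['#FC766A', '#5B84B1']

# One (left, right) pair per subplot; its span becomes the width ratio
limits = [x_limits(len(scale)) for scale in data]
ratios = [right - left for left, right in limits]

fig, axs = plt.subplots(1, 3, figsize=(12, 4),
                        gridspec_kw={'width_ratios': ratios})
for ax, scale, lim in zip(axs, data, limits):
    for kind in range(2):
        bar_x = [x + kind * BAR_WIDTH for x in range(len(scale))]
        bar_y = [d[kind] for d in scale]
        ax.bar(bar_x, bar_y, width=BAR_WIDTH, color=colors[kind])
    # Fix the limits so one data unit is equally wide in every subplot
    ax.set_xlim(*lim)
```

Since each subplot's width is proportional to the span of its x-limits, the bars should come out with the same on-screen width without sharex. Call plt.show() to display the result.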

How to adjust height of individual subplots in seaborn heatmap

I have a heatmap using seaborn and am trying to adjust the height of the 4th plot below. You will see that it only has 2 rows of data vs the others that have more:
I have used the following code to create the plot:
f, ax = plt.subplots(nrows=4,figsize=(20,10))
cmap = plt.cm.GnBu_r
sns.heatmap(df,cbar=False,cmap=cmap,ax=ax[0])
sns.heatmap(df2,cbar=False,cmap=cmap,ax=ax[1])
sns.heatmap(df3,cbar=False,cmap=cmap,ax=ax[2])
sns.heatmap(df4,cbar=False,cmap=cmap,ax=ax[3])
Does anyone know the next step to make the 4th plot smaller in height, thus stretching out the other 3? The 4th plot will generally have 2-3 rows, whereas the others will usually have 6-7. Thanks very much!
As normal, it is pretty funky/tedious with matplotlib. But here it is!
f = plt.figure(constrained_layout = True)
specs = f.add_gridspec(ncols = 1, nrows = 4, height_ratios = [1,1,1,.5])
for spec, df in zip(specs, (df, df2, df3, df4)):
    ax = sns.heatmap(df,cbar=False,cmap=cmap, ax=f.add_subplot(spec))
You can change the heights relative to each other using height_ratios. You could likewise pass a width_ratios parameter if you wanted to change the relative widths. As shown, a for loop can iterate over the plotting calls.
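Since the question says the row counts vary, one option is to derive height_ratios from the number of rows in each table rather than hard-coding them. The sketch below makes several assumptions: the row counts and random arrays are invented stand-ins for df, df2, df3 and df4, and plain matplotlib pcolormesh is used in place of sns.heatmap so that the example has no seaborn dependency.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
row_counts = [6, 7, 6, 2]  # hypothetical number of rows per table
tables = [rng.random((n, 5)) for n in row_counts]

# Each subplot's height is proportional to its table's row count
fig, axes = plt.subplots(nrows=len(tables), figsize=(8, 8),
                         gridspec_kw={"height_ratios": row_counts})
for ax, table in zip(axes, tables):
    # With seaborn this line would be: sns.heatmap(table, cbar=False, ax=ax)
    ax.pcolormesh(table, cmap="GnBu_r")
```

With real data frames, row_counts could be built as [len(d) for d in (df, df2, df3, df4)], so that every heatmap row ends up roughly the same height.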

Horizontal bar chart that does not start at zero / displaying ranges

I am trying to visualize a set of frequency ranges for around 20 samples I have. What I want to do is a horizontal bar chart where each row represents one sample. The sample name is supposed to go on the left and on the right I want an x-axis with limits 0 and 150 kHz.
Now the ranges I have are something like (70.5, 95.5). Can I realize this with a horizontal bar chart or am I looking for a different type of chart?
Sorry that I can't provide an example, because I just got nothing so far. A bar chart just doesn't do what I want.
Edit: I basically want something like in this example but without the actual bars and with being able to enter my data for the error bars. As far as I know error bars can only work with errors relative to the "main data".
If I understand you correctly, you can do this with a simple errorbar plot (though it's a bit of a hack):
import numpy as np
import matplotlib.pyplot as plt
# 20 random samples
nsamples = 20
xmin, xmax = 0, 150
samples = np.random.random_sample((nsamples,2)) * (xmax-xmin) + xmin
samples.sort(axis=1)
means = np.mean(samples, axis=1)
# Find the length of the errorbar each side of the mean
half_range = samples[:,1] - means
# Plot without markers and customize the errorbar
_, caps, _ = plt.errorbar(means, np.arange(nsamples)+1, xerr=half_range, ls='',
                          elinewidth=3, capsize=5)
for cap in caps:
    cap.set_markeredgewidth(3)
# Set the y-range so we can see all the errorbars clearly
plt.ylim(0, nsamples+1)
plt.show()
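If actual bars are acceptable after all, a horizontal bar chart can also be made to start away from zero through the left parameter of barh. This is a sketch, not the approach used above. Only the (70.5, 95.5) range comes from the question; the other ranges and the sample names are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

names = ["sample 1", "sample 2", "sample 3"]  # hypothetical sample names
ranges_khz = np.array([[70.5, 95.5],
                       [20.0, 60.0],
                       [100.0, 140.0]])
lows = ranges_khz[:, 0]
spans = ranges_khz[:, 1] - lows
y = np.arange(len(names))

fig, ax = plt.subplots()
# Each bar starts at `left` and extends `spans` units to the right
bars = ax.barh(y, spans, left=lows, height=0.5)
ax.set_yticks(y)
ax.set_yticklabels(names)
ax.set_xlim(0, 150)
ax.set_xlabel("Frequency (kHz)")
```

Call plt.show() to display it. Each row then shows a bar covering exactly the sample's range, with the sample name on the left.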

Add padding between bars and Y-Axis

I am building a bar chart using matplotlib using the code below. When my first or last column of data is 0, my first column is wedged against the Y-axis.
An example of this. Note that the first column is ON the x=0 point.
If I have data in this column, I get a huge padding between the Y-Axis and the first column as seen here. Note the additional bar, now at X=0. This effect is repeated if I have data in my last column as well.
My code is as follows:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator
binVals = [0,5531608,6475325,1311915,223000,609638,291151,449434,1398731,2516755,3035532,2976924,2695079,1822865,1347155,304911,3562,157,5,0,0,0,0,0,0,0,0]
binTot = sum(binVals)
binNorm = []
for v in range(len(binVals)):
    binNorm.append(float(binVals[v])/binTot)
fig = plt.figure(figsize=(6,4))
ax1 = fig.add_subplot(1,1,1)
ax1.bar(range(len(binNorm)),binNorm,align='center', label='Values')
plt.legend(loc=1)
plt.title("Demo Histogram")
plt.xlabel("Value")
plt.xticks(range(len(binLabels)),binLabels,rotation='vertical')
plt.grid(b=True, which='major', color='grey', linestyle='--', alpha=0.35)
ax1.xaxis.grid(False)
plt.ylabel("% of Count")
plt.subplots_adjust(bottom=0.15)
plt.tight_layout()
plt.show()
How can I set a constant margin between the Y-axis and my first/last bar?
Additionally, I realize it's labeled "Demo Histogram"; that is because I missed it when correcting problems discussed here.
I can't run the code snippet you gave, and even with some modification I couldn't replicate the big space. Aside from that, if you need to enforce a border in matplotlib, you can do something like this:
ax.set_xlim( min(your_data) - 10, None )
The first term tells the axis to put the border 10 units below the minimum of your data; the None parameter tells it to keep the present value.
To put it into context:
from collections import Counter
from pylab import *
data = randint(20,size=1000)
res = Counter(data)
vals = arange(20)
ax = gca()
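# align='edge' treats vals-0.4 as each bar's left edge (the default is 'center' since Matplotlib 2.0)
ax.bar(vals-0.4, [ res[i] for i in vals ], width=0.8, align='edge')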
ax.set_xlim( min(data)-1, None )
show()
Searching around Stack Overflow, I just learned a new trick: you can call
ax.margins( margin_you_desire )
to have matplotlib automatically put that amount of space around your plot. It can also be configured differently for x and y.
In your case the best solution would be something like
ax.margins(0.01, None)
The little catch is that the margin is a fraction of the plot's data range rather than a value in data units, so a margin of 1 adds padding on each side as wide as your present plot.
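To make that fraction concrete, here is a small sketch with invented data whose first and last values are zero, as in the question. With five centered bars of width 0.8, the bars span x = -0.4 to 4.4, a range of 4.8, so a margin of 0.1 should pad each side by 0.48.

```python
import matplotlib.pyplot as plt

values = [0, 3, 5, 2, 0]  # hypothetical data with zeros at both ends
fig, ax = plt.subplots()
ax.bar(range(len(values)), values, width=0.8, align='center')

# Pad the x-axis by 10% of the data range on each side; y is left alone
ax.margins(x=0.1)
x_low, x_high = ax.get_xlim()
print(x_low, x_high)
```

The padding is the same on both sides whether or not the outer bars have zero height, which is the constant margin the question asks for.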
The problem is align='center'. Remove it.

blank space in the top of the plot matplotlib django

I have a question about matplotlib bar charts.
I've already made some bar charts, but for some reason this one leaves a huge blank space at the top.
The code is similar to other charts I've made, and they don't have this problem.
If anyone has any idea, I'd appreciate the help.
x = matplotlib.numpy.arange(0, max(total))
ind = matplotlib.numpy.arange(len(age_list))
ax.barh(ind, total)
ax.set_yticks(ind)
ax.set_yticklabels(age_list)
By "blank space in the top" do you mean that the y-limits are set too large?
By default, matplotlib will choose the x and y axis limits so that they're rounded to the closest "even" number (e.g. 1, 2, 12, 5, 50, -0.5 etc...).
If you want the axis limits to be set so that they're "tight" around the plot (i.e. the min and max of the data) use ax.axis('tight') (or equivalently, plt.axis('tight') which will use the current axis).
Another very useful method is plt.margins(...)/ax.margins(). It acts similarly to axis('tight'), but leaves a bit of padding around the limits.
As an example of your problem:
import numpy as np
import matplotlib.pyplot as plt
# Make some data...
age_list = range(10,31)
total = np.random.random(len(age_list))
ind = np.arange(len(age_list))
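# align='edge' anchors each bar's lower edge at its y-position
# (the default has been align='center' since Matplotlib 2.0)
plt.barh(ind, total, align='edge')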
# Set the y-ticks centered on each bar
# The default height (thickness) of each bar is 0.8
# Therefore, adding 0.4 to the tick positions will
# center the ticks on the bars...
plt.yticks(ind + 0.4, age_list)
plt.show()
If I wanted the limits to be tighter, I could call plt.axis('tight') after the call to plt.barh, which would give:
However, you might not want things to be too tight, so you could use plt.margins(0.05) to add 5% padding in all directions. You can then set the left-hand limit back to 0 with plt.xlim(xmin=0):
import numpy as np
import matplotlib.pyplot as plt
# Make some data...
age_list = range(10,31)
total = np.random.random(len(age_list))
ind = np.arange(len(age_list))
height = 0.8
# align='edge' so that ind + height / 2.0 is the center of each bar
plt.barh(ind, total, height=height, align='edge')
plt.yticks(ind + height / 2.0, age_list)
plt.margins(0.05)
plt.xlim(xmin=0)
plt.show()
This produces a somewhat nicer plot:
Hopefully that points you in the right direction, at any rate!
