Python Matplotlib polar Labeling - python

Hi, I'm trying to label my polar bar chart so that each label is rotated by a different amount and can be read easily, much like the numbers on a clock. I know plt.xlabel accepts a rotation, but that only rotates by a single amount; I have many values and would like to avoid having them all cross my graph.
This is roughly what my graph looks like now, with every label oriented the same way; I would like something akin to this instead. I need to do this using only matplotlib and pandas if possible. Thanks in advance for the help!
Some example names might be farming, generalists, food and drink; if these are not rotated correctly they will overlap the graph and be difficult to read.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('/.../data.csv')
N = len(data)
data1 = data['X'].values
plt.figure(figsize=(8,8))
ax = plt.subplot(projection='polar')
plt.xlabel("AAs",fontsize=24)
ax.set_theta_zero_location("N")
# theta, width and colours are not shown in this snippet
bars = ax.bar(theta, data1,width=width, bottom=0.0,color=colours)
I would then like to label the bars with their names, which I can obtain as a list. However, there are a number of values and I would like to be able to read the names.

The very meager beginnings of an answer for you (I was doing something similar, so I just threw a quick hack to go in the right direction):
import numpy as np
import matplotlib.pyplot as plt
# The number of labels you'd like
N = 5
# Where on the circle each label will show up
theta = np.linspace(0., 2 * np.pi, N + 1, endpoint=True)
theta = theta[1:]
# Create the figure
fig = plt.figure(figsize=(6, 6), facecolor='white', edgecolor=None)
# Create the axis, notice polar=True
ax = plt.subplot2grid((1, 1), (0, 0), polar=True)
# Create white bars so you're really just focusing on the labels
ax.bar(theta, np.ones_like(theta), align='center',
       color='white', edgecolor='white')
# Create the text you're looking to add; here I just use the numbers 1 to N
for counter, t in enumerate(theta, start=1):
    # rotation is given in degrees, so convert the angle from radians
    ax.text(t, 1 - .1, str(counter), horizontalalignment='center',
            verticalalignment='center', rotation=np.degrees(t))
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.grid(False)
plt.show()
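To get from this quick hack to clock-like labels, each label needs a rotation in degrees derived from its angle, flipped on the left half of the circle so the text is never upside down. Here is a minimal sketch of that calculation, assuming the default polar orientation (zero at the east, angles increasing counter-clockwise). The helper name radial_label_rotation is hypothetical, not a matplotlib function.

```python
import math

def radial_label_rotation(theta):
    """Return (rotation_in_degrees, horizontal_alignment) for a label at angle theta (radians).

    Assumes the default polar orientation: zero at the east, counter-clockwise.
    """
    deg = math.degrees(theta) % 360.0
    if 90.0 < deg < 270.0:
        # left half of the circle: flip by 180 degrees so the text stays upright
        return deg - 180.0, "right"
    return deg, "left"

# Example names taken from the question, spread evenly around the circle
names = ["farming", "generalists", "food and drink"]
angles = [2 * math.pi * k / len(names) for k in range(len(names))]
for name, theta in zip(names, angles):
    rotation, ha = radial_label_rotation(theta)
    print(f"{name}: rotation={rotation:.1f}, ha={ha}")
```

Each pair can then be passed on to something like ax.text(theta, r, name, rotation=rotation, ha=ha, va='center', rotation_mode='anchor'). If the axes use ax.set_theta_zero_location("N") or a reversed direction, the angle has to be adjusted to match before converting.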

Related

Creating subplots with equal axis scale, Python, matplotlib

I am plotting seismological data and am creating a figure featuring 16 subplots of different depth slices. Each subplot displays the lat/lon of the epicenter and the color is scaled to its magnitude. I am trying to do two things:
Adjust the scale of all plots to the same x and y min and max for the selected area, so the plots can be compared easily (all plots would range from xmin to xmax, etc.).
Adjust the magnitude colors so they share one scale as well (i.e. the colors represent all available points, not just the points on that specific subplot).
I have seen this accomplished a number of ways but am struggling to apply them to the loop in my code. The data I am using is here: Data.
I posted my code and what the current output looks like below.
import matplotlib.pyplot as plt
import pandas as pd
eq_df = pd.read_csv(eq_csv)
eq_data = eq_df[['LON', 'LAT', 'DEPTH', 'MAG']]
nbound = max(eq_data.LAT)
sbound = min(eq_data.LAT)
ebound = max(eq_data.LON)
wbound = min(eq_data.LON)
xlimit = (wbound, ebound)
ylimit = (sbound, nbound)
magmin = min(eq_data.MAG)
magmax = max(eq_data.MAG)
for n in list(range(1,17)):
    km = eq_data[(eq_data.DEPTH > n - 1) & (eq_data.DEPTH <= n)]
    plt.subplot(4, 4, n)
    plt.scatter(km["LON"], km['LAT'], s = 10, c = km['MAG'], vmin = magmin, vmax = magmax) #added vmin/vmax to scale my magnitude data
    plt.ylim(sbound, nbound) # set y limits of plot
    plt.xlim(wbound, ebound) # set x limits of plot
    plt.tick_params(axis='both', which='major', labelsize= 6)
    plt.subplots_adjust(hspace = 1)
    plt.gca().set_title('Depth = ' + str(n - 1) +'km to ' + str(n) + 'km', size = 8) #set title of subplots
    plt.suptitle('Magnitude of Events at Different Depth Slices, 1950 to Today')
plt.show()
ETA: new code to resolve my issue
In response to this comment on the other answer, here is a demonstration of the use of sharex=True and sharey=True for this use case:
import matplotlib.pyplot as plt
import numpy as np
# Supply the limits since random data will be plotted
wbound = -0.1
ebound = 1.1
sbound = -0.1
nbound = 1.1
fig, axs = plt.subplots(nrows=4, ncols=4, figsize=(16,12), sharex=True, sharey=True)
# the axes are shared, so setting the limits once applies to every subplot
plt.xlim(wbound, ebound)
plt.ylim(sbound, nbound)
for n, ax in enumerate(axs.flatten()):
    ax.scatter(np.random.random(20), np.random.random(20),
               c = np.random.random(20), marker = '.')
    # n runs from 0 to 15: the left column is n % 4 == 0, the bottom row is n >= 12
    ticks = [n % 4 == 0, n >= 12]
    ax.tick_params(left=ticks[0], bottom=ticks[1])
    ax.set_title('Depth = ' + str(n) +'km to ' + str(n + 1) + 'km', size = 12)
plt.suptitle('Magnitude of Events at Different Depth Slices, 1950 to Today', y = 0.95)
plt.subplots_adjust(wspace=0.05)
plt.show()
Explanation of a couple of things:
I have reduced the horizontal spacing between subplots with subplots_adjust(wspace=0.05)
plt.suptitle does not need to be (and should not be) in the loop.
ticks = [n % 4 == 0, n >= 12] creates a pair of bools for each axis, which is then used to control which tick marks are drawn. Since enumerate counts from 0, the bottom row of a 4x4 grid is n >= 12.
Left and bottom tick marks are controlled for each axis with ax.tick_params(left=ticks[0], bottom=ticks[1])
plt.xlim() and plt.ylim() need only be called once, before the loop
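The row and column tests for the tick marks are easy to get off by one, so it can help to check them without drawing anything. Here is a small sketch, assuming the subplot index n counts from 0 in row-major order (as enumerate(axs.flatten()) does). The helper outer_edges is hypothetical.

```python
def outer_edges(n, nrows=4, ncols=4):
    """Return (is_left_column, is_bottom_row) for the n-th subplot, counting from 0."""
    is_left = n % ncols == 0
    is_bottom = n >= (nrows - 1) * ncols
    return is_left, is_bottom

left_column = [n for n in range(16) if outer_edges(n)[0]]
bottom_row = [n for n in range(16) if outer_edges(n)[1]]
print(left_column)  # [0, 4, 8, 12]
print(bottom_row)   # [12, 13, 14, 15]
```

The two booleans can be passed straight to ax.tick_params(left=..., bottom=...). Note that the bottom row of a 4x4 grid starts at index 12 when counting from 0.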
Finally got it thanks to some help above and some extended googling.
I have updated my code above with notes indicating where code was added.
To adjust the limits of my plot axes I used:
plt.ylim(sbound, nbound)
plt.xlim(wbound, ebound)
To scale my magnitude data across all plots I added vmin, vmax to the following line:
plt.scatter(km["LON"], km['LAT'], s = 10, c = km['MAG'], vmin = magmin, vmax = magmax)
And here is the resulting figure:
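Why vmin and vmax matter: a linear colour scale first maps each value into the range 0 to 1, and when no limits are given the range is taken from the data of that one scatter call. Here is a rough sketch of that scaling in plain Python. The magnitudes are made up; it shows the same event landing on a different colour in each subplot unless the limits are shared.

```python
def normalise(value, vmin, vmax):
    """Linearly map value from [vmin, vmax] to [0, 1], as a linear colour scale does."""
    return (value - vmin) / (vmax - vmin)

# Made-up magnitudes for two depth slices (illustrative only)
slice_a = [2.0, 3.0, 4.0]
slice_b = [3.0, 5.0, 6.0]

# Per-subplot scaling: each slice uses its own min and max
own_a = normalise(3.0, min(slice_a), max(slice_a))  # 0.5
own_b = normalise(3.0, min(slice_b), max(slice_b))  # 0.0

# Shared scaling: one vmin/vmax computed over all slices
magmin = min(slice_a + slice_b)
magmax = max(slice_a + slice_b)
shared = normalise(3.0, magmin, magmax)             # 0.25
print(own_a, own_b, shared)
```

With shared limits a magnitude of 3.0 gets the same position on the colour map in every subplot, which is what passing vmin=magmin and vmax=magmax to each scatter call achieves.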

Heatmap with circles indicating size of population

I would like to produce a heatmap in Python, similar to the one shown, where the size of the circle indicates the size of the sample in that cell. I looked in seaborn's gallery and couldn't find anything, and I don't think I can do this with matplotlib.
It's the inverse. While matplotlib can do pretty much everything, seaborn only provides a small subset of options.
So using matplotlib, you can plot a PatchCollection of circles as shown below.
Note: You could equally use a scatter plot, but since scatter dot sizes are in absolute units it would be rather hard to scale them into the grid.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
N = 10
M = 11
ylabels = ["".join(np.random.choice(list("PQRSTUVXYZ"), size=7)) for _ in range(N)]
xlabels = ["".join(np.random.choice(list("ABCDE"), size=3)) for _ in range(M)]
x, y = np.meshgrid(np.arange(M), np.arange(N))
s = np.random.randint(0, 180, size=(N,M))
c = np.random.rand(N, M)-0.5
fig, ax = plt.subplots()
R = s/s.max()/2
circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]
col = PatchCollection(circles, array=c.flatten(), cmap="RdYlGn")
ax.add_collection(col)
ax.set(xticks=np.arange(M), yticks=np.arange(N),
xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(M+1)-0.5, minor=True)
ax.set_yticks(np.arange(N+1)-0.5, minor=True)
ax.grid(which='minor')
fig.colorbar(col)
plt.show()
Here's a possible solution using Bokeh Plots:
import pandas as pd
from bokeh.palettes import RdBu
from bokeh.models import LinearColorMapper, ColumnDataSource, ColorBar
from bokeh.models.ranges import FactorRange
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import numpy as np
output_notebook()
d = dict(x = ['A','A','A', 'B','B','B','C','C','C','D','D','D'],
y = ['B','C','D', 'A','C','D','B','D','A','A','B','C'],
corr = np.random.uniform(low=-1, high=1, size=(12,)).tolist())
df = pd.DataFrame(d)
df['size'] = np.where(df['corr']<0, np.abs(df['corr']), df['corr'])*50
#added a new column to make the plot size
colors = list(reversed(RdBu[9]))
exp_cmap = LinearColorMapper(palette=colors,
low = -1,
high = 1)
p = figure(x_range = FactorRange(), y_range = FactorRange(), plot_width=700,
plot_height=450, title="Correlation",
toolbar_location=None, tools="hover")
p.scatter("x","y",source=df, fill_alpha=1, line_width=0, size="size",
fill_color={"field":"corr", "transform":exp_cmap})
p.x_range.factors = sorted(df['x'].unique().tolist())
p.y_range.factors = sorted(df['y'].unique().tolist(), reverse = True)
p.xaxis.axis_label = 'Values'
p.yaxis.axis_label = 'Values'
bar = ColorBar(color_mapper=exp_cmap, location=(0,0))
p.add_layout(bar, "right")
show(p)
One option is to use matplotlib's scatter plot together with legends and a grid. You can set the size of each circle through the s argument and its color through c. You would need to choose the x and y values so that the circles sit on the grid lines. This is an example I got from here:
import numpy as np
import matplotlib.pyplot as plt
volume = np.random.rayleigh(27, size=40)
amount = np.random.poisson(10, size=40)
ranking = np.random.normal(size=40)
price = np.random.uniform(1, 10, size=40)
fig, ax = plt.subplots()
# Because the price is much too small when being provided as size for ``s``,
# we normalize it to some useful point sizes, s=0.3*(price*3)**2
scatter = ax.scatter(volume, amount, c=ranking, s=0.3*(price*3)**2,
vmin=-3, vmax=3, cmap="Spectral")
# Produce a legend for the ranking (colors). Even though there are 40 different
# rankings, we only want to show 5 of them in the legend.
legend1 = ax.legend(*scatter.legend_elements(num=5),
loc="upper left", title="Ranking")
ax.add_artist(legend1)
# Produce a legend for the price (sizes). Because we want to show the prices
# in dollars, we use the *func* argument to supply the inverse of the function
# used to calculate the sizes from above. The *fmt* ensures to show the price
# in dollars. Note how we target at 5 elements here, but obtain only 4 in the
# created legend due to the automatic round prices that are chosen for us.
kw = dict(prop="sizes", num=5, color=scatter.cmap(0.7), fmt="$ {x:.2f}",
func=lambda s: np.sqrt(s/.3)/3)
legend2 = ax.legend(*scatter.legend_elements(**kw),
loc="lower right", title="Price")
plt.show()
Output:
I don't have enough reputation to comment on Delenges' excellent answer, so I'll leave my comment as an answer instead:
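A note on ordering: x and y come from np.meshgrid(np.arange(M), np.arange(N)), so they have the same (N, M) shape as R, with x holding the column index j and y holding the row index i. R.flat, x.flat and y.flat therefore iterate in the same row-major order, and the zip in that answer already pairs each radius with its own cell. If you prefer to index R explicitly, the row index has to come first (R[j][i] would run out of bounds on a non-square grid):
circles = [plt.Circle((j,i), radius=R[i][j]) for j, i in zip(x.flat, y.flat)]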
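The pairing between radii and grid positions can be checked on a tiny grid without plotting. The sketch below uses plain nested lists that mimic what np.meshgrid(np.arange(M), np.arange(N)) returns, that is, two arrays of shape (N, M). The values in R are made up so that each one encodes its own row and column.

```python
N, M = 2, 3  # rows, columns (deliberately not square)

# R[i][j] = 10*i + j, so every value reveals its own (row, column)
R = [[10 * i + j for j in range(M)] for i in range(N)]
# Same layout as x, y = np.meshgrid(np.arange(M), np.arange(N))
x = [[j for j in range(M)] for i in range(N)]  # column index
y = [[i for j in range(M)] for i in range(N)]  # row index

def flat(grid):
    """Row-major flattening, the order used by ndarray.flat."""
    return [value for row in grid for value in row]

pairs = list(zip(flat(R), flat(x), flat(y)))
print(pairs)
# every radius r is paired with column j and row i such that r == R[i][j]
print(all(r == R[i][j] for r, j, i in pairs))  # True
```

On a non-square grid like this one, indexing with the column first (R[j][i]) would run past the last row, which is a quick way to spot a swapped index.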
Here is an easy example to plot circle_heatmap.
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine as load_data
from psynlig import plot_correlation_heatmap
plt.style.use('seaborn-talk')
data_set = load_data()
data = pd.DataFrame(data_set['data'], columns=data_set['feature_names'])
#data = df_corr_selected
kwargs = {
'heatmap': {
'vmin': -1,
'vmax': 1,
'cmap': 'viridis',
},
'figure': {
'figsize': (14, 10),
},
}
plot_correlation_heatmap(data, bubble=True, annotate=False, **kwargs)
plt.show()

How can I add different hatch colors in a matplotlib barplot?

I would like to change my hatch color based on the group my data belongs to.
I found that I can change the color of the entire plot, and somehow people seem to be able to do what I want, but I really don't understand how to implement this in my bar plot. Ultimately, I want to have different mixes and matches of colored patches and colored hatches, but that's something I can figure out for myself.
This is an example stolen from another answer here on SO:
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
# Juwairia's data:
a = [4,-6,9]
b = [2,-7,1]
c = [3,3,1]
d = [4,0,-3]
data = np.array([a, b, c, d])
data_shape = np.shape(data)
# Take negative and positive data apart and cumulate
def get_cumulated_array(data, **kwargs):
    cum = data.clip(**kwargs)
    cum = np.cumsum(cum, axis=0)
    d = np.zeros(np.shape(data))
    d[1:] = cum[:-1]
    return d
cumulated_data = get_cumulated_array(data, min=0)
cumulated_data_neg = get_cumulated_array(data, max=0)
# Re-merge negative and positive data.
row_mask = (data<0)
cumulated_data[row_mask] = cumulated_data_neg[row_mask]
data_stack = cumulated_data
cols = [ i for i in px.colors.qualitative.Light24]
fig = plt.figure()
ax = plt.subplot(111)
bars=[]
for i in np.arange(0, data_shape[0]):
    bars.append(ax.bar(np.arange(data_shape[1]), data[i], bottom=data_stack[i], color=cols[i],))
bars[0][1].set_hatch('\\')# make this dark red - HOW?
bars[1][2].set_hatch('\\')#make this dark green - HOW?
plt.show()
Working with matplotlib 2.2.4 and Python 2.7
Many Thanks!
According to this question the hatch color changes with the edge color. The answer then solves that by plotting twice: once for the hatch color and one for the edge color.
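The two-pass idea might look like the sketch below. It assumes, as the linked answer states, that the hatch is drawn in each bar's edge colour. The data and colours are made up, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption, for headless runs)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [0, 1, 2]
heights = [4, 6, 9]  # made-up values

# Pass 1: filled bars with a hatch; the hatch takes each bar's edge colour
hatched = ax.bar(x, heights, color="lightgrey", hatch="\\")
for patch, hatch_colour in zip(hatched, ["darkred", "darkgreen", "navy"]):
    patch.set_edgecolor(hatch_colour)

# Pass 2: the same bars again, unfilled, to put a black outline back on top
outline = ax.bar(x, heights, fill=False, edgecolor="black")
```

Because each bar is its own patch, the edge colour (and with it the hatch colour) can be chosen per bar or per group. The second pass is only needed if the outline should differ from the hatch colour.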

Horizontal stacked bar plot and add labels to each section

I am trying to replicate the following image in matplotlib and it seems barh is my only option, though it appears that you can't stack barh graphs, so I don't know what to do.
If you know of a better python library to draw this kind of thing, please let me know.
This is all I could come up with as a start:
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
y_pos = np.arange(len(people))
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.barh(y_pos, bottomdata,color='r',align='center')
ax.barh(y_pos, topdata,color='g',align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
I would then have to add labels individually using ax.text which would be tedious. Ideally I would like to just specify the width of the part to be inserted then it updates the center of that section with a string of my choosing. The labels on the outside (e.g. 3800) I can add myself later, it is mainly the labeling over the bar section itself and creating this stacked method in a nice way I'm having problems with. Can you even specify a 'distance' i.e. span of color in any way?
Edit 2: for more heterogeneous data. (I've left the above method since I find it more usual to work with the same number of records per series)
Answering the two parts of the question:
a) barh returns a container of handles to all the patches that it drew. You can use the coordinates of the patches to aid the text positions.
b) Following these two answers to the question that I noted before (see Horizontal stacked bar chart in Matplotlib), you can stack bar graphs horizontally by setting the 'left' input.
and additionally c) handling data that is less uniform in shape.
Below is one way you could handle data that is less uniform in shape: simply process each segment independently.
import numpy as np
import matplotlib.pyplot as plt
# some labels for each row
people = ('A','B','C','D','E','F','G','H')
r = len(people)
# how many data points overall (average of 3 per person)
n = r * 3
# which person does each segment belong to?
rows = np.random.randint(0, r, (n,))
# how wide is the segment?
widths = np.random.randint(3,12, n,)
# what label to put on the segment
labels = range(n)
colors ='rgbwmc'
patch_handles = []
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
left = np.zeros(r,)
row_counts = np.zeros(r,)
# the loop variable is called row so that it does not shadow r (the number of people)
for (row, w, l) in zip(rows, widths, labels):
    print(row, w, l)
    patch_handles.append(ax.barh(row, w, align='center', left=left[row],
        color=colors[int(row_counts[row]) % len(colors)]))
    left[row] += w
    row_counts[row] += 1
    # we know there is only one patch but could enumerate if expanded
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y, "%d%%" % (l), ha='center',va='center')
y_pos = np.arange(r)
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
Which produces a graph like this, with a different number of segments present in each series.
Note that this is not particularly efficient since each segment uses an individual call to ax.barh. There may be more efficient methods (e.g. by padding a matrix with zero-width segments or nan values) but this is likely to be problem-specific and is a distinct question.
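One way to read the padding idea: collect each row's segment widths, pad the shorter rows with zero-width segments so that every row has the same length, and then draw one barh call per column instead of one per segment. Here is a minimal sketch of the bookkeeping in plain Python. The widths are made up and the helper names are hypothetical.

```python
def pad_rows(rows, fill=0):
    """Pad ragged lists of segment widths to equal length with zero-width segments."""
    longest = max(len(row) for row in rows)
    return [list(row) + [fill] * (longest - len(row)) for row in rows]

def left_offsets(padded):
    """For each segment, the running total of the widths before it in the same row."""
    offsets = []
    for row in padded:
        total, row_offsets = 0, []
        for width in row:
            row_offsets.append(total)
            total += width
        offsets.append(row_offsets)
    return offsets

# Made-up segment widths for three people with 3, 1 and 2 segments
widths = [[5, 3, 4], [7], [2, 6]]
padded = pad_rows(widths)
lefts = left_offsets(padded)
print(padded)  # [[5, 3, 4], [7, 0, 0], [2, 6, 0]]
print(lefts)   # [[0, 5, 8], [0, 7, 7], [0, 2, 8]]

# Column k of `padded` and `lefts` would feed one call such as
# ax.barh(y_pos, [row[k] for row in padded], left=[row[k] for row in lefts])
```

The centre of each label is then left + width / 2, and segments with zero width can simply be skipped when annotating.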
Edit: updated to answer both parts of the question.
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
segments = 4
# generate some multi-dimensional data & arbitrary labels
data = 3 + 10* np.random.rand(segments, len(people))
percentages = (np.random.randint(5,20, (len(people), segments)))
y_pos = np.arange(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
colors ='rgbwmc'
patch_handles = []
left = np.zeros(len(people)) # left alignment of data starts at zero
for i, d in enumerate(data):
    patch_handles.append(ax.barh(y_pos, d,
        color=colors[i%len(colors)], align='center',
        left=left))
    # accumulate the left-hand offsets
    left += d
# go through all of the bar segments and annotate
for j in range(len(patch_handles)):
    for i, patch in enumerate(patch_handles[j].get_children()):
        bl = patch.get_xy()
        x = 0.5*patch.get_width() + bl[0]
        y = 0.5*patch.get_height() + bl[1]
        ax.text(x,y, "%d%%" % (percentages[i,j]), ha='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
You can achieve a result along these lines (note: the percentages I used have nothing to do with the bar widths, as the relationship in the example seems unclear):
See Horizontal stacked bar chart in Matplotlib for some ideas on stacking horizontal bar plots.
Imports and Test DataFrame
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
For vertical stacked bars see Stacked Bar Chart with Centered Labels
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# create sample data as shown in the OP
np.random.seed(365)
people = ('A','B','C','D','E','F','G','H')
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
# create the dataframe
df = pd.DataFrame({'Female': bottomdata, 'Male': topdata}, index=people)
# display(df)
   Female   Male
A   12.41   7.42
B    9.42   4.10
C    9.85   7.38
D    8.89  10.53
E    8.44   5.92
F    6.68  11.86
G   10.67  12.97
H    6.05   7.87
Updated with matplotlib v3.4.2
Use matplotlib.pyplot.bar_label
See How to add value labels on a bar chart for additional details and examples with .bar_label.
For python < 3.8, which lacks the assignment expression (:=), use labels = [f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c ] instead.
Plotted using pandas.DataFrame.plot with kind='barh'
ax = df.plot(kind='barh', stacked=True, figsize=(8, 6))
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
    # uncomment and use the next line if there are no nan or 0 length sections; just use fmt to add a % (the previous two lines of code are not needed, in this case)
    # ax.bar_label(c, fmt='%.2f%%', label_type='center')
# move the legend
ax.legend(bbox_to_anchor=(1.025, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Using seaborn
sns.barplot does not have an option for stacked bar plots, however, sns.histplot and sns.displot can be used to create horizontal stacked bars.
seaborn typically requires the dataframe to be in a long, instead of wide, format, so use pandas.DataFrame.melt to reshape the dataframe.
Reshape dataframe
# convert the dataframe to a long form
df = df.reset_index()
df = df.rename(columns={'index': 'People'})
dfm = df.melt(id_vars='People', var_name='Gender', value_name='Percent')
# display(dfm)
    People  Gender    Percent
 0       A  Female  12.414557
 1       B  Female   9.416027
 2       C  Female   9.846105
 3       D  Female   8.885621
 4       E  Female   8.438872
 5       F  Female   6.680709
 6       G  Female  10.666258
 7       H  Female   6.050124
 8       A    Male   7.420860
 9       B    Male   4.104433
10       C    Male   7.383738
11       D    Male  10.526158
12       E    Male   5.916262
13       F    Male  11.857227
14       G    Male  12.966913
15       H    Male   7.865684
sns.histplot: axes-level plot
fig, axe = plt.subplots(figsize=(8, 6))
sns.histplot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', ax=axe)
# iterate through each set of containers
for c in axe.containers:
    # add bar annotations
    axe.bar_label(c, fmt='%.2f%%', label_type='center')
axe.set_xlabel('Percent')
plt.show()
sns.displot: figure-level plot
g = sns.displot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', height=6)
# iterate through each facet / subplot
for axe in g.axes.flat:
    # iterate through each set of containers
    for c in axe.containers:
        # add the bar annotations
        axe.bar_label(c, fmt='%.2f%%', label_type='center')
    axe.set_xlabel('Percent')
plt.show()
Original Answer - before matplotlib v3.4.2
The easiest way to plot a horizontal or vertical stacked bar, is to load the data into a pandas.DataFrame
This will plot, and annotate correctly, even when all categories ('People'), don't have all segments (e.g. some value is 0 or NaN)
Once the data is in the dataframe:
It's easier to manipulate and analyze
It can be plotted with the matplotlib engine, using:
pandas.DataFrame.plot.barh
label_text = f'{width}' for annotations
pandas.DataFrame.plot.bar
label_text = f'{height}' for annotations
SO: Vertical Stacked Bar Chart with Centered Labels
These methods return a matplotlib.axes.Axes or a numpy.ndarray of them.
The .patches attribute holds a list of matplotlib.patches.Rectangle objects, one for each of the sections of the stacked bar.
Each .Rectangle has methods for extracting the various values that define the rectangle.
The .Rectangle objects are ordered from left to right, and bottom to top, so all the .Rectangle objects for each level appear in order when iterating through .patches.
The labels are made using an f-string, label_text = f'{width:.2f}%', so any additional text can be added as needed.
Plot and Annotate
Plotting the bar, is 1 line, the remainder is annotating the rectangles
# plot the dataframe with 1 line
ax = df.plot.barh(stacked=True, figsize=(8, 6))
# .patches is everything inside of the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # For a horizontal bar, the width of the rectangle is the data value and can be used as the label
    label_text = f'{width:.2f}%'  # f'{width:.2f}' to format decimal values
    # ax.text(x, y, text)
    label_x = x + width / 2
    label_y = y + height / 2
    # only plot labels greater than given width
    if width > 0:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
# move the legend
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Example with Missing Segment
# set one of the dataframe values to 0
df.iloc[4, 1] = 0
Note the annotations are all in the correct location from df.
For this case, the above answers work perfectly. The issue I had, and didn't find a plug-and-play solution online, was that I often have to plot stacked bars in multi-subplot figures, with many values, which tend to have very non-homogenous amplitudes.
(Note: I usually work with pandas dataframes and matplotlib. I couldn't get matplotlib's bar_label() method to work in every case.)
So, I just give a kind of ad-hoc, but easily generalizable solution. In this example, I was working with single-row dataframes (for power-exchange monitoring purposes per hour), so my dataframe (df) had just one row.
(I provide an example figure to show how this can be useful in very densely-packed plots: https://i.stack.imgur.com/9akd8.png)
'''
This implementation produces a stacked, horizontal bar plot.
df --> pandas dataframe. Columns are used as the iterator, and only the first value of each column is used.
waterfall --> bool: if True, apart from the stack-direction, also a perpendicular offset is added.
cyclic_offset_x --> list (of any length) or None: loop through these values to use as x-offset pixels.
cyclic_offset_y --> list (of any length) or None: loop through these values to use as y-offset pixels.
ax --> matplotlib Axes, or None: if None, creates a new axis and figure.
'''
def magic_stacked_bar(df, waterfall=False, cyclic_offset_x=None, cyclic_offset_y=None, ax=None):
    if cyclic_offset_x is None:
        cyclic_offset_x = [0, 0]
    if cyclic_offset_y is None:
        cyclic_offset_y = [0, 0]
    ax0 = ax
    if ax is None:
        fig, ax = plt.subplots()
        fig.set_size_inches(19, 10)
    cycler = 0
    prev = 0  # summation variable to make it stacked
    for c in df.columns:
        value = df[c].values[0]  # only the first value of each column is used
        if waterfall:
            y = c; label = ""  # bidirectional stack
        else:
            y = 0; label = c  # unidirectional stack
        ax.barh(y=y, width=value, height=1, left=prev, label=label)
        prev += value  # add to sum-stack
        # cycle through the offset lists
        offset_x = cyclic_offset_x[cycler % len(cyclic_offset_x)]
        offset_y = cyclic_offset_y[cycler % len(cyclic_offset_y)]
        # annotate the middle of the segment that was just drawn
        ax.annotate(text="{}".format(int(value)), xy=(prev - value / 2, y),
                    xytext=(offset_x, offset_y), textcoords='offset pixels',
                    ha='center', va='top', fontsize=8,
                    arrowprops=dict(facecolor='black', shrink=0.01, width=0.3, headwidth=0.3),
                    bbox=dict(boxstyle='round', facecolor='grey', alpha=0.5))
        cycler += 1
    if not waterfall:
        ax.legend()  # if waterfall, the index annotates the columns. If
                     # waterfall == False, the legend annotates the columns
    if ax0 is None:
        ax.set_title("Voi la")
        ax.set_xlabel("UltraWatts")
        plt.show()
    else:
        return ax
'''
(Sometimes, it is more tedious and requires some custom functions to make the labels look alright.)
'''
A, B = 80,80
n_units = df.shape[1]
cyclic_offset_x = -A*np.cos(2*np.pi / (2*n_units) *np.arange(n_units))
cyclic_offset_y = B*np.sin(2*np.pi / (2*n_units) * np.arange(n_units)) + B/2

Histogram bars overlapping matplotlib

I am able to build the histogram I need. However, the bars overlap over one another.
As you can see, I changed the width of the bars to 0.2 but they still overlap. What mistake am I making?
from matplotlib import pyplot as plt
import numpy as np
from matplotlib.font_manager import FontProperties
from random import randrange
color = ['r', 'b', 'g','c','m','y','k','darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred']
label = ['2','6','10','14','18','22','26','30','34','38','42','46']
file_names = ['a','b','c']
diff = [[randrange(10) for a in range(0, len(label))] for a in range(0, len(file_names))]
print(diff)
x = diff
name = file_names
y = zip(*x)
pos = np.arange(len(x))
width = 1. / (1 + len(x))
fig, ax = plt.subplots()
for idx, (serie, color,label) in enumerate(zip(y, color,label)):
    ax.bar(pos + idx * width, serie, width, color=color, label=label)
ax.set_xticks(pos + width)
plt.xlabel('foo')
plt.ylabel('bar')
ax.set_xticklabels(name)
ax.legend()
plt.savefig("final" + '.eps', bbox_inches='tight', pad_inches=0.5,dpi=100,format="eps")
plt.clf()
Here is the graph:
As you can see in the example below, you can easily get non-overlapping bars using a heavily simplified version of your plotting code. I'd suggest you take a closer look at whether x and y really are what you expect them to be. (And try to simplify your code as much as possible when you are looking for an error in it.)
Also have a look at the computation of the width of the bars. You appear to use the number of subjects for this, while it should be the number of bars per subject instead.
Have a look at this example:
import numpy as np
import matplotlib.pyplot as plt
subjects = ('Tom', 'Dick', 'Harry', 'Sally', 'Sue')
# number of bars per subject
n = 5
# y-data per subject
y = np.random.rand(n, len(subjects))
# x-positions for the bars
x = np.arange(len(subjects))
# plot bars
width = 1./(1+n) # <-- n.b., use number of bars, not number of subjects
for i, yi in enumerate(y):
    plt.bar(x+i*width, yi, width)
# add labels
plt.xticks(x+n/2.*width, subjects)
plt.show()
This is the result image:
For reference:
http://matplotlib.org/examples/api/barchart_demo.html
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar
The problem is that the width of your bars is calculated from the three subjects, not the twelve bars per subject. That means you're placing multiple bars at each x-position. Try swapping in these lines where appropriate to fix that:
n = len(x[0]) # New variable with the right length to calculate bar width
width = 1. / (1 + n)
ax.set_xticks(pos + n/2. * width)
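The arithmetic behind the fix can be checked directly. With n bars per group and width = 1/(1 + n), the bars of one group span n/(n + 1) of the unit distance between groups, leaving a gap of one bar width. Here is a small sketch in plain Python, using 3 groups of 12 bars as in the question. The function name is hypothetical.

```python
def bar_positions(n_groups, n_bars):
    """x-positions of every bar when each group holds n_bars bars of width 1/(1 + n_bars)."""
    width = 1.0 / (1 + n_bars)
    positions = [[g + i * width for i in range(n_bars)] for g in range(n_groups)]
    return positions, width

positions, width = bar_positions(n_groups=3, n_bars=12)

# the last bar of each group must end before the first bar of the next group starts
for g in range(len(positions) - 1):
    end_of_group = positions[g][-1] + width
    start_of_next = positions[g + 1][0]
    print(g, round(end_of_group, 3), start_of_next)

# with the original width, 1/(1 + 3) = 0.25, twelve bars span 3.0 units,
# so each group runs into the following groups
print(12 * (1.0 / (1 + 3)))
```

The same check applies for any number of bars per group: as long as the width is derived from the number of bars per group, neighbouring groups cannot overlap.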
