Double loop to populate subplots in matplotlib

I have a dict of dataframes that I want to use to populate subplots.
Each dataframe has two columns of data for the x and y axes, and two categorical columns for hue.
Pseudo code:
for df in dict of dataframes:
    for cat in categories:
        plot(x=col_1, y=col_2, hue=cat)
Data for example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dict_dfs = dict()
for i in range(5):
    dict_dfs['df_{}'.format(i)] = pd.DataFrame({
        'col_1': np.random.randn(10),  # first column with data = x axis
        'col_2': np.random.randn(10),  # second column with data = y axis
        'cat_0': ('Home '*5 + 'Car '*5).split(),  # first category = hue of plots on the left
        'cat_1': ('kitchen '*3 + 'Bedroom '*2 + 'Care '*5).split()  # second category = hue of plots on the right
    })
IN:
fig, axes = plt.subplots(len(dict_dfs.keys()), 2, figsize=(15,10*len(dict_dfs.keys())))
for i, (name, df) in enumerate(dict_dfs.items()):
    for j, cat in enumerate(['cat_0', 'cat_1']):
        sns.scatterplot(
            x="col_1", y="col_2", hue=cat, data=df, ax=axes[i,j], alpha=0.6)
        axes[i,j].set_title('df: {}, cat: {}'.format(name, cat), fontsize = 25, pad = 35, fontweight = 'bold')
        axes[i,j].set_xlabel('col_1', fontsize = 26, fontweight = 'bold')
        axes[i,j].set_ylabel('col_2', fontsize = 26, fontweight = 'bold')
        plt.show()
OUT:
The 10 subplots are created correctly (5 dfs * 2 categories), but only the first one (axes[0, 0]) gets populated. I am used to creating subplots with one loop, but this is the first time I have used two. I have checked the code without finding the issue. Can anyone help?

The plt.show() call is inside the for-loops, so the figure is shown right after the first subplot has been drawn. Move it out of the loops (un-indent it to the beginning of the line) and the figure will be shown with all subplots populated.
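A minimal sketch of the corrected loop structure, assuming a headless "Agg" backend so it runs anywhere. To keep it dependency-light, a plain matplotlib scatter per category level stands in for sns.scatterplot(hue=...); that substitution, the seeded random data, the smaller figure size and the `populated` check are illustrative assumptions, not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dict_dfs = {
    f"df_{i}": pd.DataFrame({
        "col_1": rng.standard_normal(10),
        "col_2": rng.standard_normal(10),
        "cat_0": ["Home"] * 5 + ["Car"] * 5,
        "cat_1": ["kitchen"] * 3 + ["Bedroom"] * 2 + ["Care"] * 5,
    })
    for i in range(5)
}

fig, axes = plt.subplots(len(dict_dfs), 2, figsize=(10, 3 * len(dict_dfs)))
for i, (name, df) in enumerate(dict_dfs.items()):
    for j, cat in enumerate(["cat_0", "cat_1"]):
        # one scatter call per category level stands in for seaborn's hue
        for level, group in df.groupby(cat):
            axes[i, j].scatter(group["col_1"], group["col_2"], label=level, alpha=0.6)
        axes[i, j].set_title(f"df: {name}, cat: {cat}")
        axes[i, j].legend(title=cat)

# called exactly once, after both loops have finished
plt.show()

# every subplot should now hold at least one scatter collection
populated = [len(ax.collections) > 0 for ax in axes.flat]
```

With seaborn installed, the inner groupby loop can be swapped back for the sns.scatterplot call from the question; the only change that matters is where plt.show() sits.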

Related

Remove for loops when plotting matplotlib subplots

I have a large subplot-based figure to produce in Python using matplotlib. In total the figure has in excess of 500 individual plots, each with 1000s of data points. This can be plotted using a for-loop-based approach modelled on the minimal example given below:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# define main plot names and subplot names
mains = ['A','B','C','D']
subs = list(range(9))
# generate mimic data in pd dataframe
col = [letter+str(number) for letter in mains for number in subs]
col.insert(0,'Time')
df = pd.DataFrame(columns=col)
for title in df.columns:
    df[title] = [i for i in range(100)]
# although alphabet and mains are the same in this minimal example this may not always be true
alphabet = ['A', 'B', 'C', 'D']
column_names = [column for column in df.columns if column != 'Time']
# define figure size and main gridshape
fig = plt.figure(figsize=(15, 15))
outer = gridspec.GridSpec(2, 2, wspace=0.2, hspace=0.2)
for i, letter in enumerate(alphabet):
    # define inner grid size and shape
    inner = gridspec.GridSpecFromSubplotSpec(3, 3,
                    subplot_spec=outer[i], wspace=0.1, hspace=0.1)
    # select only columns with correct letter
    plot_array = [col for col in column_names if col.startswith(letter)]
    # set title for each letter plot
    ax = plt.Subplot(fig, outer[i])
    ax.set_title(f'Letter {letter}')
    ax.axis('off')
    fig.add_subplot(ax)
    # create each subplot
    for j, col in enumerate(plot_array):
        ax = plt.Subplot(fig, inner[j])
        X = df['Time']
        Y = df[col]
        # plot waveform
        ax.plot(X, Y)
        # hide all axis ticks
        ax.axis('off')
        # set y_axis limits so all plots share same y_axis
        ax.set_ylim(df[column_names].min().min(),df[column_names].max().max())
        fig.add_subplot(ax)
However, this is slow, requiring minutes to plot the figure. Is there a more efficient (potentially for-loop-free) method to achieve the same result?

The bottleneck in the loop is not the plotting itself but setting the axis limits with df[column_names].min().min() and df[column_names].max().max(), which rescans the whole dataframe on every iteration.
Testing with 6 main plots, 64 subplots and 375,000 data points, the plotting section of the example takes approx 360s to complete when the axis limits are set by searching df for the min and max values on each loop. The fix is to move the search for min and max outside the loops, e.g.
# set y_lims
y_upper = df[column_names].max().max()
y_lower = df[column_names].min().min()
and changing
ax.set_ylim(df[column_names].min().min(),df[column_names].max().max())
to
ax.set_ylim(y_lower,y_upper)
the plotting time is reduced to approx 24 seconds.
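A small self-contained sketch of the same idea: compute the shared limits once, then reuse them inside the loop. The deterministic dummy data, the 6x6 grid and the "Agg" backend are assumptions made so the example is quick to run; they are not the sizes used in the timing test above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a quick, display-free run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 36 hypothetical data columns (A0..D8), each a scaled ramp, plus a Time column
column_names = [f"{letter}{n}" for letter in "ABCD" for n in range(9)]
df = pd.DataFrame({name: np.arange(100) * (k + 1) for k, name in enumerate(column_names)})
df.insert(0, "Time", np.arange(100))

# scan the data once, before any plotting
values = df[column_names].to_numpy()
y_lower, y_upper = values.min(), values.max()

fig, axes = plt.subplots(6, 6, figsize=(9, 9))
for ax, name in zip(axes.flat, column_names):
    ax.plot(df["Time"], df[name])
    ax.axis("off")
    # reuse the precomputed limits instead of rescanning df on every iteration
    ax.set_ylim(y_lower, y_upper)
```

If the subplots live on a regular grid, plt.subplots(..., sharey=True) is another way to keep a common y axis, though it does not fit the nested GridSpec layout used in the question as directly.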

Matplotlib segmented Plot

I have the following dataset:
df = pd.DataFrame ({"a": [1,2,3,4,5,6,7,8,9,1,11,12,13,14,15,16,17,18,19,20],
'b':[1,2,3,4,50,60,70,8,9,10,110,120,130,140,150,16,17,18,19,20],
'c':[np.nan,2.2,3.4,np.nan,40.9,60.2,np.nan,8.2,8.9,10.1,np.nan,120.2,
130.07,140.23,np.nan,16.054,17.20,18.1,np.nan,20.1],
'd': [100, np.nan,np.nan, 500,np.nan, np.nan,500,
np.nan,np.nan,np.nan,100, np.nan,np.nan, np.nan,500,
np.nan,np.nan, np.nan,100,np.nan ]}
)
I am trying to plot the data based on the following conditions:
Between one 100 and the next 100 in column 'd' I want one plot with column 'a' on the x axis, and a scatterplot of column 'b' and a line plot of 'c' on the y axis.
That is, I will have 3 different plots: the first from index 0 to 10, the second from index 10 to 18, and the third from index 18 to 20. (I can generate these using a for loop.)
Within each plot I want a segmented line plot based on the location of the 500 values in column 'd', i.e., for the first plot one line from index 0-3, another from index 3-6 and another from index 6-10. (I can't make the segmented line plot.)
I am using the following code:
index = index + [len(df)]
index1 = index1 + [len(df)]
for k in range (len(index)-1):
    x = df['a'][index[k] + 1:index[k+1]]
    y = df['c'][index[k]+ 1:index[k+1]]
    y1 = df['b'][index[k]+ 1:index[k+1]]
    plt.scatter(x, y)
    plt.plot(x, y1)
    plt.savefig('plot'+ str(k+1000) +'.png')
    plt.clf()
My first plot looks like this, but I want three segmented line plots rather than one continuous line (that is, the line from index 0-3 should not be connected to the one from 3-6, and so on).
Sorry for the rather long question, and thanks :)

The expected output is unclear, but here is a general strategy to split your dataset in groups with help of groupby:
option 1: independent figures
group = df['d'].eq(100).cumsum()
for name, g in df.groupby(group):
    f,ax = plt.subplots()
    ax.scatter(g['a'], g['c'])
    ax.plot(g['a'], g['b'])
    f.savefig(f'figure_{name}.png')
option 2
ax = plt.subplot()
group = df['d'].eq(100).cumsum()
for name, g in df.groupby(group):
    ax.scatter(g['a'], g['c'])
    ax.plot(g['a'], g['b'], label=name)
ax.legend()
option 3
ax = plt.subplot()
group = df['d'].eq(100).cumsum()
for name, g in df.groupby(group):
    g = g.reset_index()
    ax.scatter(g.index+1, g['c'])
    ax.plot(g.index+1, g['b'])
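The groupby strategy can be pushed one level further to get the disconnected line segments the question asks for. The sketch below is one possible reading of the requirement, not the answerer's code: it assumes that every non-NaN value in 'd' (100 or 500) starts a new line segment and that every 100 starts a new figure. The data is a simplified copy of the question's frame ('a' regularised to 1..20 and 'c' left out), and the "plot" and "segment" column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "a": range(1, 21),
    "b": [1, 2, 3, 4, 50, 60, 70, 8, 9, 10, 110, 120, 130, 140, 150, 16, 17, 18, 19, 20],
    "d": [100, nan, nan, 500, nan, nan, 500, nan, nan, nan,
          100, nan, nan, nan, 500, nan, nan, nan, 100, nan],
})

# a new figure starts at every 100, a new line segment at every marker (100 or 500)
df["plot"] = df["d"].eq(100).cumsum()
df["segment"] = df["d"].notna().cumsum()

segments_per_plot = {}
for plot_name, plot_df in df.groupby("plot"):
    fig, ax = plt.subplots()
    for _, seg in plot_df.groupby("segment"):
        # each segment is its own plot call, so consecutive segments are not joined
        ax.plot(seg["a"], seg["b"], marker="o")
    segments_per_plot[plot_name] = len(ax.lines)
    plt.close(fig)
```

Replacing plt.close(fig) with fig.savefig(...) would write one image per plot, as in option 1 above.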

How to generate labelled barplots using seaborn?

I am a bit new to Python. And I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for #1 and #2 using countplot() (images #1 and #2 are not reproduced here).
For #1, instead of a stacked barplot, countplot() only gives me a clustered barplot, like the one below, and the annotation snippet feels more like a workaround than elegant Python.
# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # a calculation for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For #2, I tried creating a new data frame holding the % values, but that felt tedious and error-prone.
I feel options like stacked = True, proportion = True, axis = 1, annotate = True would have been very useful for countplot() to have.
Are there any other libraries that would be more straightforward and less code-intensive? Any comments or suggestions are welcome.

In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want more flexibility to adjust your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart for percentages. If you are interested, you can try it:
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count the ReversedPayment values within each category of col
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # normalise each ReversedPayment column so that it sums to 1
    df_temp = df_temp / df_temp.sum(axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # patches are stored column by column, so flatten in column-major order to match
    labels = df_temp.values.flatten(order='F')
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
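For the percentage case (#2), a shorter route may be pd.crosstab with normalize="index", which makes every bar sum to 100%. This is a sketch under assumptions rather than the answerer's method: it uses only the Married and ReversedPayment columns from the question's data, the `share` name is hypothetical, and Axes.bar_label needs a reasonably recent matplotlib (the answer further down this page dates it to v3.4.2).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

CarWash = pd.DataFrame({
    "Married": [0] * 17 + [1] * 8,
    "ReversedPayment": [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
                        0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
})

# rows: Married (0/1), columns: ReversedPayment (0/1), each row normalised to sum to 1
share = pd.crosstab(CarWash["Married"], CarWash["ReversedPayment"], normalize="index")

ax = share.plot.bar(stacked=True, rot=0)
# pandas draws one bar container per column, in column order
for container, column in zip(ax.containers, share.columns):
    labels = [f"{v:.1%}" for v in share[column]]
    ax.bar_label(container, labels=labels, label_type="center")
ax.set_ylabel("Share of customers")
ax.legend(title="Reversed\nPayment", bbox_to_anchor=(1.05, 1), loc="upper left")
```

Dropping normalize="index" gives the absolute counts for #1 with the same plotting code.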

Plotting subplots using dataframes

I have a dictionary containing 9 dataframes. I want to create a 3x3 grid of subplots and plot a bar chart for each dataframe.
To plot a single chart I would do this (just a single plot, not considering subplots):
%matplotlib inline
with plt.style.context('bmh'):
    famd = satFAMD_obj['MODIS.NDVI']
    df_norm = satFAMD_dfNorm['MODIS.NDVI']
    df_cor =famd.column_correlations(df_norm)
    df_cor.columns = ['component 1','component 2', 'component 3']
    df_cor.plot(kind = 'bar',cmap = 'Set1', figsize = (10,6))
    plt.show()
where satFAMD_obj & satFAMD_dfNorm are two dictionaries containing the trained factor analysis objects and the dataframes, respectively. I then create a new dataframe called df_cor and plot it with df_cor.plot(kind = 'bar',cmap = 'Set1', figsize = (10,6)).
Now my problem is: how do I do this with multiple subplots?
I cannot simply do this,
fig,ax = plt.subplots(3,3, figsize = (12,8))
ax[0,0].df_cor.plot(kind = 'bar',cmap = 'Set1')
Any ideas?

I'm assuming that every key in your two dictionaries needs to be plotted.
You will:
declare subplots,
iterate over the dictionaries and the axes objects together,
plot each dataframe to its own set of axes.
Using code like the below example:
fig,ax = plt.subplots(3,3, figsize = (12,8))
# zip pairs each key with one Axes, so every dataframe lands on its own subplot
for k1, k2, axes in zip(satFAMD_obj.keys(), satFAMD_dfNorm.keys(), ax.flatten()):
    famd = satFAMD_obj[k1]
    df_norm = satFAMD_dfNorm[k2]
    df_cor = famd.column_correlations(df_norm)
    df_cor.columns = ['component 1','component 2', 'component 3']
    df_cor.plot(kind = 'bar',cmap = 'Set1',ax=axes)
    #                                      ^^^^^^^
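Since satFAMD_obj and satFAMD_dfNorm are not available here, the following sketch shows the same pattern on hypothetical stand-in data: a dict of nine small random dataframes (the `frames` name, keys and index labels are made up), each drawn on its own Axes by zipping the dict items with the flattened axes array.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# hypothetical stand-in for the nine correlation dataframes
frames = {
    f"product_{k}": pd.DataFrame(
        rng.uniform(-1, 1, size=(4, 3)),
        index=["var_a", "var_b", "var_c", "var_d"],
        columns=["component 1", "component 2", "component 3"],
    )
    for k in range(9)
}

fig, ax = plt.subplots(3, 3, figsize=(12, 8))
# zip stops at the shorter input, so each dataframe gets exactly one Axes
for (name, df_cor), axes in zip(frames.items(), ax.flatten()):
    df_cor.plot(kind="bar", colormap="Set1", ax=axes, legend=False)
    axes.set_title(name)
fig.tight_layout()

# 4 rows x 3 columns = 12 bars drawn on every subplot
bars_per_axes = [len(a.patches) for a in ax.flatten()]
```

Passing ax=axes is the key step: without it, pandas creates a new figure for every call instead of drawing into the grid.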

Horizontal stacked bar plot and add labels to each section

I am trying to replicate the following image in matplotlib and it seems barh is my only option. Though it appears that you can't stack barh graphs so I don't know what to do
If you know of a better python library to draw this kind of thing, please let me know.
This is all I could come up with as a start:
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
y_pos = np.arange(len(people))
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.barh(y_pos, bottomdata,color='r',align='center')
ax.barh(y_pos, topdata,color='g',align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
I would then have to add labels individually using ax.text which would be tedious. Ideally I would like to just specify the width of the part to be inserted then it updates the center of that section with a string of my choosing. The labels on the outside (e.g. 3800) I can add myself later, it is mainly the labeling over the bar section itself and creating this stacked method in a nice way I'm having problems with. Can you even specify a 'distance' i.e. span of color in any way?

Edit 2: for more heterogeneous data. (I've left the above method since I find it more usual to work with the same number of records per series)
Answering the two parts of the question:
a) barh returns a container of handles to all the patches that it drew. You can use the coordinates of the patches to aid the text positions.
b) Following these two answers to the question that I noted before (see Horizontal stacked bar chart in Matplotlib), you can stack bar graphs horizontally by setting the 'left' input.
and additionally c) handling data that is less uniform in shape.
One way to handle data that is less uniform in shape is simply to process each segment independently.
import numpy as np
import matplotlib.pyplot as plt
# some labels for each row
people = ('A','B','C','D','E','F','G','H')
r = len(people)
# how many data points overall (average of 3 per person)
n = r * 3
# which person does each segment belong to?
rows = np.random.randint(0, r, (n,))
# how wide is the segment?
widths = np.random.randint(3,12, n,)
# what label to put on the segment (xrange in py2.7, range for py3)
labels = range(n)
colors ='rgbwmc'
patch_handles = []
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
left = np.zeros(r,)
row_counts = np.zeros(r,)
for (r, w, l) in zip(rows, widths, labels):
    print(r, w, l)
    patch_handles.append(ax.barh(r, w, align='center', left=left[r],
        color=colors[int(row_counts[r]) % len(colors)]))
    left[r] += w
    row_counts[r] += 1
    # we know there is only one patch but could enumerate if expanded
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y, "%d%%" % (l), ha='center',va='center')
y_pos = np.arange(8)
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
This produces a graph with a different number of segments present in each series.
Note that this is not particularly efficient, since each segment uses an individual call to ax.barh. There may be more efficient methods (e.g. by padding a matrix with zero-width segments or nan values), but this is likely to be problem-specific and is a distinct question.
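As a rough illustration of the padding idea mentioned in the note, the sketch below pads ragged per-person segment lists with zero-width entries, so that one ax.barh call per column draws a segment for every row at once. The `segments` data is hypothetical, and Axes.bar_label (a newer matplotlib feature) is swapped in for the manual ax.text placement used above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

people = ("A", "B", "C", "D")
# hypothetical ragged data: a different number of segment widths per person
segments = [[4, 7], [5], [3, 6, 9], [8, 2]]

# pad every row with zeros up to the length of the longest row
n_cols = max(len(row) for row in segments)
widths = np.zeros((len(segments), n_cols))
for i, row in enumerate(segments):
    widths[i, :len(row)] = row

fig, ax = plt.subplots()
y_pos = np.arange(len(people))
left = np.zeros(len(people))
for j in range(n_cols):
    # one barh call per column draws the j-th segment of every row
    bars = ax.barh(y_pos, widths[:, j], left=left, align="center")
    # label only the real segments; padded (zero-width) ones get an empty string
    ax.bar_label(bars, labels=[f"{w:g}" if w > 0 else "" for w in widths[:, j]],
                 label_type="center")
    left += widths[:, j]

ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel("Distance")
```

The number of barh calls now equals the longest row length rather than the total number of segments, which is where any saving would come from.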
Edit: updated to answer both parts of the question.
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
segments = 4
# generate some multi-dimensional data & arbitrary labels
data = 3 + 10* np.random.rand(segments, len(people))
percentages = (np.random.randint(5,20, (len(people), segments)))
y_pos = np.arange(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
colors ='rgbwmc'
patch_handles = []
left = np.zeros(len(people)) # left alignment of data starts at zero
for i, d in enumerate(data):
    patch_handles.append(ax.barh(y_pos, d,
        color=colors[i%len(colors)], align='center',
        left=left))
    # accumulate the left-hand offsets
    left += d
# go through all of the bar segments and annotate
for j in range(len(patch_handles)):
    for i, patch in enumerate(patch_handles[j].get_children()):
        bl = patch.get_xy()
        x = 0.5*patch.get_width() + bl[0]
        y = 0.5*patch.get_height() + bl[1]
        ax.text(x,y, "%d%%" % (percentages[i,j]), ha='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
You can achieve a result along these lines (note: the percentages I used have nothing to do with the bar widths, as the relationship in the example seems unclear):
See Horizontal stacked bar chart in Matplotlib for some ideas on stacking horizontal bar plots.

Imports and Test DataFrame
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
For vertical stacked bars see Stacked Bar Chart with Centered Labels
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# create sample data as shown in the OP
np.random.seed(365)
people = ('A','B','C','D','E','F','G','H')
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
# create the dataframe
df = pd.DataFrame({'Female': bottomdata, 'Male': topdata}, index=people)
# display(df)
   Female   Male
A   12.41   7.42
B    9.42   4.10
C    9.85   7.38
D    8.89  10.53
E    8.44   5.92
F    6.68  11.86
G   10.67  12.97
H    6.05   7.87
Updated with matplotlib v3.4.2
Use matplotlib.pyplot.bar_label
See How to add value labels on a bar chart for additional details and examples with .bar_label.
labels = [f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c ] for python < 3.8, without the assignment expression (:=).
Plotted using pandas.DataFrame.plot with kind='barh'
ax = df.plot(kind='barh', stacked=True, figsize=(8, 6))
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
    # uncomment and use the next line if there are no nan or 0 length sections; just use fmt to add a % (the previous two lines of code are not needed, in this case)
    # ax.bar_label(c, fmt='%.2f%%', label_type='center')
# move the legend
ax.legend(bbox_to_anchor=(1.025, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Using seaborn
sns.barplot does not have an option for stacked bar plots, however, sns.histplot and sns.displot can be used to create horizontal stacked bars.
seaborn typically requires the dataframe to be in a long, instead of wide, format, so use pandas.DataFrame.melt to reshape the dataframe.
Reshape dataframe
# convert the dataframe to a long form
df = df.reset_index()
df = df.rename(columns={'index': 'People'})
dfm = df.melt(id_vars='People', var_name='Gender', value_name='Percent')
# display(dfm)
   People  Gender    Percent
0       A  Female  12.414557
1       B  Female   9.416027
2       C  Female   9.846105
3       D  Female   8.885621
4       E  Female   8.438872
5       F  Female   6.680709
6       G  Female  10.666258
7       H  Female   6.050124
8       A    Male   7.420860
9       B    Male   4.104433
10      C    Male   7.383738
11      D    Male  10.526158
12      E    Male   5.916262
13      F    Male  11.857227
14      G    Male  12.966913
15      H    Male   7.865684
sns.histplot: axes-level plot
fig, axe = plt.subplots(figsize=(8, 6))
sns.histplot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', ax=axe)
# iterate through each set of containers
for c in axe.containers:
    # add bar annotations
    axe.bar_label(c, fmt='%.2f%%', label_type='center')
axe.set_xlabel('Percent')
plt.show()
sns.displot: figure-level plot
g = sns.displot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', height=6)
# iterate through each facet / subplot
for axe in g.axes.flat:
    # iterate through each set of containers
    for c in axe.containers:
        # add the bar annotations
        axe.bar_label(c, fmt='%.2f%%', label_type='center')
    axe.set_xlabel('Percent')
plt.show()
Original Answer - before matplotlib v3.4.2
The easiest way to plot a horizontal or vertical stacked bar, is to load the data into a pandas.DataFrame
This will plot, and annotate correctly, even when all categories ('People'), don't have all segments (e.g. some value is 0 or NaN)
Once the data is in the dataframe:
It's easier to manipulate and analyze
It can be plotted with the matplotlib engine, using:
pandas.DataFrame.plot.barh
label_text = f'{width}' for annotations
pandas.DataFrame.plot.bar
label_text = f'{height}' for annotations
SO: Vertical Stacked Bar Chart with Centered Labels
These methods return a matplotlib.axes.Axes or a numpy.ndarray of them.
The .patches attribute holds a list of matplotlib.patches.Rectangle objects, one for each section of the stacked bar.
Each .Rectangle has methods for extracting the various values that define the rectangle.
The .Rectangle objects are ordered from left to right and bottom to top, so all the .Rectangle objects for each level appear in order when iterating through .patches.
The labels are made using an f-string, label_text = f'{width:.2f}%', so any additional text can be added as needed.
Plot and Annotate
Plotting the bar, is 1 line, the remainder is annotating the rectangles
# plot the dataframe with 1 line
ax = df.plot.barh(stacked=True, figsize=(8, 6))
# .patches is everything inside of the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # For horizontal bars, the width of the bar is the data value and can be used as the label
    label_text = f'{width:.2f}%'  # f'{width:.2f}' to format decimal values
    # ax.text(x, y, text)
    label_x = x + width / 2
    label_y = y + height / 2
    # only plot labels greater than given width
    if width > 0:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
# move the legend
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Example with Missing Segment
# set one of the dataframe values to 0
df.iloc[4, 1] = 0
Note the annotations are all in the correct location from df.
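To make the missing-segment behaviour concrete, here is a small self-contained check using the bar_label approach from the updated part of this answer. The three-row frame is hypothetical (person "B" has no "Male" segment), and collecting the generated label strings into `label_texts` is only there so the result can be inspected without looking at the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# small hypothetical frame; person "B" has no "Male" segment
df = pd.DataFrame({"Female": [12.4, 9.4, 9.8], "Male": [7.4, 0.0, 7.3]},
                  index=["A", "B", "C"])

ax = df.plot(kind="barh", stacked=True, figsize=(8, 4))
label_texts = []
for c in ax.containers:
    # an empty string suppresses the label for a zero-width section
    labels = [f"{v.get_width():.2f}%" if v.get_width() > 0 else "" for v in c]
    texts = ax.bar_label(c, labels=labels, label_type="center")
    label_texts.append([t.get_text() for t in texts])
```

Each inner list of `label_texts` corresponds to one column of df, in the same row order as the index, so the empty label sits exactly at the position of the missing segment.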

For this case, the above answers work perfectly. The issue I had, and for which I didn't find a plug-and-play solution online, was that I often have to plot stacked bars in multi-subplot figures with many values, which tend to have very non-homogeneous amplitudes.
(Note: I usually work with pandas dataframes and matplotlib. I couldn't get matplotlib's bar_label() method to work every time.)
So, I just give a kind of ad-hoc, but easily generalizable, solution. In this example, I was working with single-row dataframes (for per-hour power-exchange monitoring), so my dataframe (df) had just one row.
(I provide an example figure to show how this can be useful in very densely packed plots.)
[enter image description here][1]
[1]: https://i.stack.imgur.com/9akd8.png
def magic_stacked_bar(df, waterfall=False, cyclic_offset_x=None, cyclic_offset_y=None, ax=None):
    '''
    This implementation produces a stacked, horizontal bar plot.
    df --> pandas dataframe. Columns are used as the iterator, and only the first value of each column is used.
    waterfall --> bool: if True, apart from the stack-direction, also a perpendicular offset is added.
    cyclic_offset_x --> list (of any length) or None: loop through these values to use as x-offset pixels.
    cyclic_offset_y --> list (of any length) or None: loop through these values to use as y-offset pixels.
    ax --> matplotlib Axes, or None: if None, creates a new axis and figure.
    '''
    if cyclic_offset_x is None:
        cyclic_offset_x = [0, 0]
    if cyclic_offset_y is None:
        cyclic_offset_y = [0, 0]
    ax0 = ax
    if ax is None:
        fig, ax = plt.subplots()
        fig.set_size_inches(19, 10)
    cycler = 0
    prev = 0  # summation variable to make it stacked
    for c in df.columns:
        value = df[c].values[0]  # only the first row of each column is used
        if waterfall:
            y = c; label = ""  # bidirectional stack
        else:
            y = 0; label = c  # unidirectional stack
        ax.barh(y=y, width=value, height=1, left=prev, label=label)
        prev += value  # add to sum-stack
        # cycle through the offset lists
        offset_x = cyclic_offset_x[cycler % len(cyclic_offset_x)]
        offset_y = cyclic_offset_y[cycler % len(cyclic_offset_y)]
        # annotate at the horizontal centre of the segment that was just drawn
        ax.annotate(text="{}".format(int(value)), xy=(prev - value / 2, y),
                    xytext=(offset_x, offset_y), textcoords='offset pixels',
                    ha='center', va='top', fontsize=8,
                    arrowprops=dict(facecolor='black', shrink=0.01, width=0.3, headwidth=0.3),
                    bbox=dict(boxstyle='round', facecolor='grey', alpha=0.5))
        cycler += 1
    if not waterfall:
        ax.legend()  # if waterfall, the index annotates the columns. If
                     # waterfall == False, the legend annotates the columns
    if ax0 is None:
        ax.set_title("Voi la")
        ax.set_xlabel("UltraWatts")
        plt.show()
    else:
        return ax
# Sometimes it is more tedious and requires some custom functions to make the labels look alright.
A, B = 80,80
n_units = df.shape[1]
cyclic_offset_x = -A*np.cos(2*np.pi / (2*n_units) *np.arange(n_units))
cyclic_offset_y = B*np.sin(2*np.pi / (2*n_units) * np.arange(n_units)) + B/2
