matplotlib visualization- positive negative proportion chart


I'm trying to reproduce the chart below and wonder whether matplotlib can make something similar.
The chart below is the result of an STM topic model from the R package.
I have probs values from DMR in Python:
array([[0.07204196, 0.04238116],
[0.04518877, 0.30546978],
[0.0587892 , 0.19870868],
[0.16710107, 0.07182639],
[0.128209 , 0.02422131],
[0.15264449, 0.07237352],
[0.2250081 , 0.06986096],
[0.1337716 , 0.10750801],
[0.01197221, 0.06736039],
[0.00527367, 0.04028973]], dtype=float32)
In these results, the left column is for negative words and the right column is for positive words.
Example of negative positive proportion chart:

It is possible to create something quite close to the image you included. I understood that the left column should be negative while the right column should be positive?
First make the data negative:
import numpy as np
arr = np.array([[0.07204196, 0.04238116],
[0.04518877, 0.30546978],
[0.0587892 , 0.19870868],
[0.16710107, 0.07182639],
[0.128209 , 0.02422131],
[0.15264449, 0.07237352],
[0.2250081 , 0.06986096],
[0.1337716 , 0.10750801],
[0.01197221, 0.06736039],
[0.00527367, 0.04028973]], dtype="float32")
# Make the left col negative
arr[:, 0] *= -1
Then we can plot like so:
from string import ascii_lowercase
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for y, x in enumerate(arr.flatten()):
    # Get a label from the alphabet
    label = ascii_lowercase[y]
    # Plot the point
    ax.plot(x, y, "o", color="black")
    # Annotate the point with the label
    ax.annotate(label, xy=(x, y), xytext=(x - 0.036, y), verticalalignment="center")
# Add the vertical line at zero
ax.axvline(0, ls="--", color="black", lw=1.25)
# Make the x axis equal
xlim = abs(max(ax.get_xlim(), key=abs))
ax.set_xlim((-xlim, xlim))
# Remove y axis
ax.yaxis.set_visible(False)
# Add two text labels for the x axis
for text, x in zip(["Negative", "Positive"], ax.get_xlim()):
    ax.text(x / 2, -3.75, f"{text} Reviews", horizontalalignment="center")
Which outputs:
You can tweak the values in the calls to ax.annotate and ax.text if you need to change the locations of the text on the plot or x-axis.

I'm not sure what the key part of the question is. That is, are you more interested in labeling the individual points by category, or in the distinctive circle with a line through it? With the array provided, it's a little unclear what the data represents.
I've assumed each sublist represents a single category. With that in mind, I made a separate column (delta) for the difference between the two values and plotted it against the index.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Assumption: probs is the array from the question (column 0 = negative, column 1 = positive)
df = pd.DataFrame(probs)
# New column (delta) with styling
df['delta'] = df[0]-df[1]
col = np.where(df.delta>0,'g',np.where(df.index<0,'b','r'))
fig, ax = plt.subplots(figsize =(10,7))
# Plot first, so that the saved file is not empty
plt.scatter(df['delta'], df.index, s=None, c=col, marker=None, linewidth=2)
plt.axvline(x = 0, color = 'b', label = 'axvline - full height', linestyle="--" )
# Style it up a bit
plt.title('Difference in Topic Proportion (Negative vs Positive)')
plt.xlabel('Net Review Score')
plt.ylabel('Index Number')
plt.tight_layout()
plt.savefig("Evolution of rapport of polarisation - (Aluminium).png")
That gives output like this:

Related

How to generate labelled barplots using seaborn?

I am a bit new to Python. And I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for # 1 and # 2 using countplot().
#1
#2
For # 1, instead of a stacked barplot, with countplot() I am able to make a clustered barplot, like below, and also the annotation snippet feels more like a workaround rather than being Python elegant.
# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # a calculation for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For # 2, I tried creating a new data frame housing % values but that felt tedious and error-prone.
I feel an option like stacked = True, proportion = True, axis = 1, annotate = True could have been so useful for countplot() to have.
Are there any other libraries that would be straightforward and less code-intensive? Any comments or suggestions are welcome.

In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want more flexibility to adjust your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart for percentages. If you are interested, you can try it.
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count the ReversedPayment values within each level of the column
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    df_temp = df_temp / df_temp.sum(axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # The patches are ordered column by column, so flatten the values the same way
    labels = df_temp.values.T.flatten()
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
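As a shorter route to the percentage version (#2), here is a minimal sketch that assumes it is acceptable to let pandas.crosstab do the normalising. The toy frame is a hypothetical stand-in for two CarWash columns, and bar_label needs matplotlib 3.4 or later.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small hypothetical stand-in for two of the CarWash columns
toy = pd.DataFrame({
    "Married":         [0, 0, 0, 0, 1, 1, 1, 1],
    "ReversedPayment": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Row-normalised cross-tabulation: each row (one stacked bar) sums to 1
shares = pd.crosstab(toy["Married"], toy["ReversedPayment"], normalize="index")

ax = shares.plot.bar(stacked=True)
for container in ax.containers:
    # Label every segment with its share of the bar
    ax.bar_label(container,
                 labels=[f"{v:.0%}" for v in container.datavalues],
                 label_type="center")
ax.set_ylabel("Share of customers")
```

Call plt.show() (or fig.savefig) afterwards to see the result; dropping normalize="index" gives the absolute-count version (#1) with the same plotting code.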

Setting legend labels to dates in Python

In short, I have (a couple of days worth of) glucose values plotted against their timestamps. I have written a function which then layers the glucose values on the same x-axis so I can look for glucose trends. Ultimately, that means that glucose data from a couple of days is plotted with different lines, resulting in the graph below:
Currently, the label says 'Glucose reading' for every color. I would like each line's label to show its date (2019-11-21, 2019-11-22, and so on) when the data is plotted. I'm really not sure how to do it since I've never dealt with matplotlib legends before and I couldn't find any useful solutions.
Any guidance would be much appreciated!
EDIT 1:
I am using pandas dataframe. Minimal code example - My legend is positioned in a plotting function like so:
def plotting_function(x, y, isoverlay = False):
    years_fmt = mdates.DateFormatter(' %H:%M:%S')
    ax = plt.axes()
    ax.xaxis.set_major_formatter(years_fmt)
    dates = [date.to_pydatetime() for date in x]
    if isoverlay:
        plt.plot(x, y, label= "Glucose reading" )
    else:
        plt.plot(x, y, 'rs:', label="Glucose reading")
    plt.xlabel("Time of readings")
    plt.ylabel("Glucose readings in mmol/L")
    plt.legend(ncol=2)
    plt.title("Glucose readings plotted against their timestamps")

In the plotting function you could add a list of the labels for the legend as an extra parameter and give that to plt.legend().
Here is a minimal example to show how it could work:
import numpy as np
import matplotlib.pyplot as plt
def plotting_function(x, y, labels):
    plt.plot(x, y)
    plt.legend(labels, ncol=2)
N = 100
K = 9
x = np.arange(N)
y = np.random.normal(.05, .2, (N, K)).cumsum(axis=0) + np.random.uniform(1, 10, K)
labels = [f'label {i + 1}' for i in range(K)] # as a test: ['label 1', 'label 2' ,...]
# labels = ['2019-11-21', '2019-11-22', ...] # this is another example, how dates could be used
plotting_function(x, y, labels)
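If the readings live in a pandas Series with a DatetimeIndex, the date labels can also be derived from the data itself. The following is only a sketch under that assumption; the synthetic readings and the names stamps and readings are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical readings: one value every 30 minutes over three days
stamps = pd.date_range("2019-11-21", periods=3 * 48, freq="30min")
rng = np.random.default_rng(0)
readings = pd.Series(6 + 0.1 * rng.normal(0, 1, len(stamps)).cumsum(),
                     index=stamps)

fig, ax = plt.subplots()
# One line per calendar day, labelled with that day's date
for day, chunk in readings.groupby(readings.index.date):
    # Hours since midnight put every day on the same x-axis
    hours = chunk.index.hour + chunk.index.minute / 60
    ax.plot(hours, chunk.values, label=str(day))

ax.set_xlabel("Hour of day")
ax.set_ylabel("Glucose readings in mmol/L")
ax.legend(ncol=2)
```

Passing label= to each plot call keeps the label attached to its line, so the legend stays correct even if the plotting order changes.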

Matplotlib: creating a scatter plot where each point is colored (weighted) based on its count of instances in the dataset

I have a dataset of N=910 probabilities, and the probabilities are represented as integers between 5 and 90 that are divisible by 5. This constitutes my x input. Each probability has a boolean response associated with it, the booleans being encoded using a 0 for false and a 1 for true. Some code to recreate this:
x_inpt = np.random.choice(np.arange(5, 91, 5), 910)
y_inpt = np.random.choice([0, 1], 910)
A lot of the scatter plots for my actual data look like this.
(and for curiosity sake, here's the original code used for this plot)
plt.scatter(x_inpt, y_inpt)
plt.ylabel("Decisions On Adminstering Experimental Treatment")
plt.xlabel("Harm probabilities")
plt.xticks(range(0, 101, 10))
plt.yticks([0.0, 1.0], labels=["No", "Yes"])
title_str = "Pilot Data From " + str(exp_count) + " Experiments / " + str(num_trials) + " trials"
plt.title(title_str)
plt.tight_layout()
plt.show()
Even though this image has 910 data points, they all get placed on top of one another. My data contains multiple instances of the same x, y coordinate.
I wanted to find a way to make data points that have the most instances be darker (or lighter) just to make this graph more clearly informative.
But I'm not really sure how to, and my code is stuck looking like the code sample I posted for the above plot. I seem to be having a rough time parsing matplotlib documentation and figuring out how to implement this.

A perhaps silly solution to this would be something like hashing each point based on (x,y) so it always is unique and counting this up:
# hash (x_inpt, y_inpt)
def hash(x,y):
    # Dummy sum since we have two nice integer arrays
    return x+y
hashed_output = hash(x_inpt, y_inpt)
x_y_weights = np.bincount(hashed_output)
color_for_each_sample = x_y_weights[hashed_output]
...
plt.scatter(x_inpt, y_inpt, c=color_for_each_sample)
plt.colorbar()
...
I'm working on a more elegant version now
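One way to avoid the ad hoc hash, sketched here on the assumption that NumPy alone is preferred, is to let np.unique count the distinct (x, y) rows directly. The data is regenerated with a seeded generator purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x_inpt = rng.choice(np.arange(5, 91, 5), 910)
y_inpt = rng.choice([0, 1], 910)

# Stack into an (N, 2) array and count each distinct (x, y) row
pairs = np.column_stack([x_inpt, y_inpt])
unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)

fig, ax = plt.subplots()
# One dot per distinct pair, coloured by how often it occurs
sc = ax.scatter(unique_pairs[:, 0], unique_pairs[:, 1], c=counts, cmap="viridis")
fig.colorbar(sc, ax=ax, label="Number of points")
ax.set_yticks([0, 1])
ax.set_yticklabels(["No", "Yes"])
```

Unlike the sum-based hash, this does not rely on x + y happening to be unique for every pair.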

If you don't mind pandas, you could use something like this
import pandas as pd
df = pd.DataFrame({'x':x_inpt, 'y':y_inpt})
grp = df.groupby(['x','y']).size().reset_index()
a = plt.scatter(grp['x'], grp['y'], c=grp[0], cmap='cool')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Number of points', rotation=-90, va="bottom")
plt.ylabel("Decisions On Adminstering Experimental Treatment")
plt.xlabel("Harm probabilities")
plt.xticks(range(0, 101, 10))
plt.yticks([0.0, 1.0], labels=["No", "Yes"])
title_str = "Pilot Data"
plt.title(title_str)
plt.tight_layout()
plt.show()

Here is a solution using a counter to count each x,y pair. And then use scatter to either change the color or the size of the dots. Or even a number in text form. The size is proportional to the area of the dot, so I squared it in the demo below.
Just to show the possibilities, the three ways are combined in the experimental code. In practise, you'd probably only use one of the methods.
from matplotlib import pyplot as plt
import numpy as np
from collections import Counter
num_trials = 910
x_inpt = np.random.choice(np.arange(5, 91, 5), num_trials)
y_inpt = np.random.choice([0, 1], num_trials)
count = Counter(zip(x_inpt, y_inpt))
xs = np.array([x for (x, y), c in count.items()])
ys = np.array([y for (x, y), c in count.items()])
cs = np.array([c for (x, y), c in count.items()])
cmin = cs.min()
cmax = cs.max()
cmid = (cmin + cmax) / 2
fig, ax = plt.subplots(figsize=(12, 3))
plt.scatter(xs, ys, c=cs, cmap='plasma', s=1200*cs*cs/(cmax * cmax))
for (x, y), c in count.items():
    # the maximum fontsize is set to 22
    # the color is either white or black, to contrast with the color of the scatter dot
    ax.text(x, y, c, color='w' if c<cmid else 'k', fontsize=22*c/cmax, ha='center', va='center')
cbar = plt.colorbar()
cbar.ax.set_title('Counts')
plt.ylabel("Decisions On Adminstering\nExperimental Treatment")
plt.xlabel("Harm probabilities")
plt.xticks(range(0, 91, 10))
plt.ylim(-0.5, 1.5)
plt.yticks([0, 1], labels=["No", "Yes"])
title_str = f"Pilot Data From {20} Experiments / {num_trials} trials"
plt.title(title_str)
plt.tight_layout()
plt.show()
Here is another example, supposing the data has a binomial distribution and using the reversed colormap without the numbers.
y_inpt = np.random.choice([0, 1], num_trials)
x_inpt = np.where(y_inpt == 0,
np.random.binomial(20, 0.5, num_trials),
np.random.binomial(20, 0.3, num_trials)) * 5

Plotting and color coding multiple y-axes

This is my first attempt using Matplotlib and I am in need of some guidance. I am trying to generate plot with 4 y-axes, two on the left and two on the right with shared x axis. Here's my dataset on shared dropbox folder
import pandas as pd
%matplotlib inline
url ='http://dropproxy.com/f/D34'
df= pd.read_csv(url, index_col=0, parse_dates=[0])
df.plot()
This is what the simple pandas plot looks like:
I would like to plot this similar to the example below, with TMAX and TMIN on primary y-axis (on same scale).
My attempt:
There's one example I found on the matplotlib listserv. I am trying to adapt it to my data but something is not working right. Here's the script.
# multiple_yaxes_with_spines.py
# This is a template Python program for creating plots (line graphs) with 2, 3,
# or 4 y-axes. (A template program is one that you can readily modify to meet
# your needs). Almost all user-modifiable code is in Section 2. For most
# purposes, it should not be necessary to modify anything else.
# Dr. Phillip M. Feldman, 27 Oct, 2009
# Acknowledgment: This program is based on code written by Jae-Joon Lee,
# URL= http://matplotlib.svn.sourceforge.net/viewvc/matplotlib/trunk/matplotlib/
# examples/pylab_examples/multiple_yaxis_with_spines.py?revision=7908&view=markup
# Section 1: Import modules, define functions, and allocate storage.
import matplotlib.pyplot as plt
from numpy import *
def make_patch_spines_invisible(ax):
    ax.set_frame_on(True)
    ax.patch.set_visible(False)
    # .values() replaces the Python 2 .itervalues()
    for sp in ax.spines.values():
        sp.set_visible(False)
def make_spine_invisible(ax, direction):
    if direction in ["right", "left"]:
        ax.yaxis.set_ticks_position(direction)
        ax.yaxis.set_label_position(direction)
    elif direction in ["top", "bottom"]:
        ax.xaxis.set_ticks_position(direction)
        ax.xaxis.set_label_position(direction)
    else:
        raise ValueError("Unknown Direction : %s" % (direction,))
    ax.spines[direction].set_visible(True)
# Create list to store dependent variable data:
y= [0, 0, 0, 0, 0]
# Section 2: Define names of variables and the data to be plotted.
# `labels` stores the names of the independent and dependent variables). The
# first (zeroth) item in the list is the x-axis label; remaining labels are the
# first y-axis label, second y-axis label, and so on. There must be at least
# two dependent variables and not more than four.
labels= ['Date', 'Maximum Temperature', 'Solar Radiation',
'Rainfall', 'Minimum Temperature']
# Plug in your data here, or code equations to generate the data if you wish to
# plot mathematical functions. x stores values of the independent variable;
# y[1], y[2], ... store values of the dependent variable. (y[0] is not used).
# All of these objects should be NumPy arrays.
# If you are plotting mathematical functions, you will probably want an array of
# uniformly spaced values of x; such an array can be created using the
# `linspace` function. For example, to define x as an array of 51 values
# uniformly spaced between 0 and 2, use the following command:
# x= linspace(0., 2., 51)
# Here is an example of 6 experimentally measured y1-values:
# y[1]= array( [3, 2.5, 7.3e4, 4, 8, 3] )
# Note that the above statement requires both parentheses and square brackets.
# With a bit of work, one could make this program read the data from a text file
# or Excel worksheet.
# Independent variable:
x = df.index
# First dependent variable:
y[1]= df['TMAX']
# Second dependent variable:
y[2]= df['RAD']
y[3]= df['RAIN']
y[4]= df['TMIN']
# Set line colors here; each color can be specified using a single-letter color
# identifier ('b'= blue, 'r'= red, 'g'= green, 'k'= black, 'y'= yellow,
# 'm'= magenta, 'y'= yellow), an RGB tuple, or almost any standard English color
# name written without spaces, e.g., 'darkred'. The first element of this list
# is not used.
colors= [' ', '#C82121', '#E48E3C', '#4F88BE', '#CF5ADC']
# Set the line width here. linewidth=2 is recommended.
linewidth= 2
# Section 3: Generate the plot.
N_dependents= len(labels) - 1
if N_dependents > 4:
    raise Exception('This code currently handles a maximum of four dependent variables.')
# Open a new figure window, setting the size to 10-by-7 inches and the facecolor
# to white:
fig= plt.figure(figsize=(16,9), dpi=120, facecolor=[1,1,1])
host= fig.add_subplot(111)
host.set_xlabel(labels[0])
# Use twinx() to create extra axes for all dependent variables except the first
# (we get the first as part of the host axes). The first element of y_axis is
# not used.
y_axis= (N_dependents+2) * [0]
y_axis[1]= host
for i in range(2,len(labels)+1): y_axis[i]= host.twinx()
if N_dependents >= 3:
    # The following statement positions the third y-axis to the right of the
    # frame, with the space between the frame and the axis controlled by the
    # numerical argument to set_position; this value should be between 1.10 and
    # 1.2.
    y_axis[3].spines["right"].set_position(("axes", 1.15))
    make_patch_spines_invisible(y_axis[3])
    make_spine_invisible(y_axis[3], "right")
    plt.subplots_adjust(left=0.0, right=0.8)
if N_dependents >= 4:
    # The following statement positions the fourth y-axis to the left of the
    # frame, with the space between the frame and the axis controlled by the
    # numerical argument to set_position; this value should be between 1.10 and
    # 1.2.
    y_axis[4].spines["left"].set_position(("axes", -0.15))
    make_patch_spines_invisible(y_axis[4])
    make_spine_invisible(y_axis[4], "left")
    plt.subplots_adjust(left=0.2, right=0.8)
p= (N_dependents+1) * [0]
# Plot the curves:
for i in range(1,N_dependents+1):
    p[i], = y_axis[i].plot(x, y[i], colors[i],
                           linewidth=linewidth, label=labels[i])
# Set axis limits. Use ceil() to force upper y-axis limits to be round numbers.
host.set_xlim(x.min(), x.max())
host.set_xlabel(labels[0], size=16)
for i in range(1,N_dependents+1):
    y_axis[i].set_ylim(0.0, ceil(y[i].max()))
    y_axis[i].set_ylabel(labels[i], size=16)
    y_axis[i].yaxis.label.set_color(colors[i])
    # .values() replaces the Python 2 .itervalues()
    for sp in y_axis[i].spines.values():
        sp.set_color(colors[i])
    for obj in y_axis[i].yaxis.get_ticklines():
        # `obj` is a matplotlib.lines.Line2D instance
        obj.set_color(colors[i])
        obj.set_markeredgewidth(3)
    for obj in y_axis[i].yaxis.get_ticklabels():
        obj.set_color(colors[i])
        obj.set_size(12)
        obj.set_weight(600)
# To enable the legend, uncomment the following two lines:
lines= p[1:]
host.legend(lines, [l.get_label() for l in lines])
plt.draw(); plt.show()
And the output
How can I put the scale on max and min temp on a same scale? Also, how can I get rid of second y-axis with black color, scaled from 0 to 10?
Is there a simpler way to achieve this?

How can I put the scale on max and min temp on a same scale?
Plot them in the same axes.
Also, how can I get rid of second y-axis with black color, scaled from 0 to 10?
Do not create that axes.
You want to plot four variables, two of them can go in the same subplot so you only need three subplots. But you are creating five of them?
Step by step
Keep in mind: different y scales <-> different subplots sharing x-axis.
Two variables with a common scale (left), two variables with independent scales (right).
Create the primary subplot, let's call it ax1. Plot everything you want in it, in this case TMIN and TMAX as stated in your question.
Create a twin subplot sharing x axis twinx(ax=ax1). Plot the third variable, say RAIN.
Create another twin subplot twinx(ax=ax1). Plot the fourth variable 'RAD'.
Adjust colors, labels, spine positions... to your heart's content.
Unsolicited advice: do not try to fix code you don't understand.
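The steps above can be sketched roughly as follows. This is an illustration, not the asker's data: the dataset link is not reproduced here, so the frame is synthetic and only the column names (TMAX, TMIN, RAIN, RAD) are taken from the question. The 1.12 offset is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the weather data (column names from the question)
idx = pd.date_range("2000-01-01", periods=365, freq="D")
rng = np.random.default_rng(1)
season = np.sin(np.linspace(0, 2 * np.pi, len(idx)))
df = pd.DataFrame({
    "TMAX": 20 + 10 * season + rng.normal(0, 2, len(idx)),
    "TMIN": 8 + 10 * season + rng.normal(0, 2, len(idx)),
    "RAIN": rng.gamma(0.5, 8.0, len(idx)),
    "RAD": 18 + 10 * season + rng.normal(0, 3, len(idx)),
}, index=idx)

fig, ax1 = plt.subplots(figsize=(10, 5))
fig.subplots_adjust(right=0.8)  # leave room for the offset axis

# 1. Both temperatures share the primary (left) axis, hence the same scale
ax1.plot(df.index, df["TMAX"], color="tab:red", label="TMAX")
ax1.plot(df.index, df["TMIN"], color="tab:purple", label="TMIN")
ax1.set_ylabel("Temperature")

# 2. First twin axis on the right
ax2 = ax1.twinx()
ax2.plot(df.index, df["RAIN"], color="tab:blue", label="RAIN")
ax2.set_ylabel("Rainfall", color="tab:blue")

# 3. Second twin axis, pushed outwards so the two right spines don't overlap
ax3 = ax1.twinx()
ax3.spines["right"].set_position(("axes", 1.12))
ax3.plot(df.index, df["RAD"], color="tab:orange", label="RAD")
ax3.set_ylabel("Solar radiation", color="tab:orange")

# One legend collecting the lines from all three axes
handles = [line for a in (ax1, ax2, ax3) for line in a.get_lines()]
ax1.legend(handles, [h.get_label() for h in handles], loc="upper left")
```

Only three axes are created for four variables, so there is no leftover black axis scaled from 0 to 1 or 0 to 10.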

Variation of the original plot showing how you can plot variables on multiple axes
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
url ='http://dropproxy.com/f/D34'
df= pd.read_csv(url, index_col=0, parse_dates=[0])
fig = plt.figure()
ax = fig.add_subplot(111) # Primary y
ax2 = ax.twinx() # Secondary y
# Plot variables
ax.plot(df.index, df['TMAX'], color='red')
ax.plot(df.index, df['TMIN'], color='green')
ax2.plot(df.index, df['RAIN'], color='orange')
ax2.plot(df.index, df['RAD'], color='yellow')
# Custom ylimit
ax.set_ylim(0,50)
# Custom x axis date formats
import matplotlib.dates as mdates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))

I modified @bishopo's suggestions to generate what I wanted; however, the plot still needs some tweaking of the font sizes for the axis labels.
Here's what I have done so far.
import pandas as pd
%matplotlib inline
url ='http://dropproxy.com/f/D34'
df= pd.read_csv(url, index_col=0, parse_dates=[0])
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA
import matplotlib.pyplot as plt
# Set the figure size, dpi, and background color
fig = plt.figure(1, (16,9),dpi =300, facecolor = 'W',edgecolor ='k')
# Update the tick label size to 12
plt.rcParams.update({'font.size': 12})
host = host_subplot(111, axes_class=AA.Axes)
plt.subplots_adjust(right=0.75)
par1 = host.twinx()
par2 = host.twinx()
par3 = host.twinx()
offset = 60
new_fixed_axis = par2.get_grid_helper().new_fixed_axis
new_fixed_axis1 = host.get_grid_helper().new_fixed_axis
par2.axis["right"] = new_fixed_axis(loc="right",
axes=par2,
offset=(offset, 0))
par3.axis["left"] = new_fixed_axis1(loc="left",
axes=par3,
offset=(-offset, 0))
par2.axis["right"].toggle(all=True)
par3.axis["left"].toggle(all=True)
par3.axis["right"].set_visible(False)
# Set limit on both y-axes
host.set_ylim(-30, 50)
par3.set_ylim(-30,50)
host.set_xlabel("Date")
host.set_ylabel(r"Minimum Temperature ($^\circ$C)")
par1.set_ylabel("Solar Radiation (W$m^{-2}$)")
par2.set_ylabel("Rainfall (mm)")
par3.set_ylabel(r'Maximum Temperature ($^\circ$C)')
p1, = host.plot(df.index,df['TMIN'], 'm,')
p2, = par1.plot(df.index, df.RAD, color ='#EF9600', linestyle ='--')
p3, = par2.plot(df.index, df.RAIN, '#09BEEF')
p4, = par3.plot(df.index, df['TMAX'], '#FF8284')
par1.set_ylim(0, 36)
par2.set_ylim(0, 360)
host.legend()
host.axis["left"].label.set_color(p1.get_color())
par1.axis["right"].label.set_color(p2.get_color())
par2.axis["right"].label.set_color(p3.get_color())
par3.axis["left"].label.set_color(p4.get_color())
tkw = dict(size=5, width=1.5)
host.tick_params(axis='y', colors=p1.get_color(), **tkw)
par1.tick_params(axis='y', colors=p2.get_color(), **tkw)
par2.tick_params(axis='y', colors=p3.get_color(), **tkw)
par3.tick_params(axis='y', colors=p4.get_color(), **tkw)
host.tick_params(axis='x', **tkw)
par1.axis["right"].label.set_fontsize(16)
par2.axis["right"].label.set_fontsize(16)
par3.axis["left"].label.set_fontsize(16)
host.axis["bottom"].label.set_fontsize(16)
host.axis["left"].label.set_fontsize(16)
plt.figtext(.5,.92,'Weather Data', fontsize=22, ha='center')
plt.draw()
plt.show()
fig.savefig("Test1.png")
The output

Horizontal stacked bar plot and add labels to each section

I am trying to replicate the following image in matplotlib and it seems barh is my only option. Though it appears that you can't stack barh graphs so I don't know what to do
If you know of a better python library to draw this kind of thing, please let me know.
This is all I could come up with as a start:
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
y_pos = np.arange(len(people))
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.barh(y_pos, bottomdata,color='r',align='center')
ax.barh(y_pos, topdata,color='g',align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
I would then have to add labels individually using ax.text which would be tedious. Ideally I would like to just specify the width of the part to be inserted then it updates the center of that section with a string of my choosing. The labels on the outside (e.g. 3800) I can add myself later, it is mainly the labeling over the bar section itself and creating this stacked method in a nice way I'm having problems with. Can you even specify a 'distance' i.e. span of color in any way?

Edit 2: for more heterogeneous data. (I've left the above method since I find it more usual to work with the same number of records per series)
Answering the two parts of the question:
a) barh returns a container of handles to all the patches that it drew. You can use the coordinates of the patches to aid the text positions.
b) Following these two answers to the question that I noted before (see Horizontal stacked bar chart in Matplotlib), you can stack bar graphs horizontally by setting the 'left' input.
and additionally c) handling data that is less uniform in shape.
One way to handle data that is less uniform in shape is simply to process each segment independently, as below.
import numpy as np
import matplotlib.pyplot as plt
# some labels for each row
people = ('A','B','C','D','E','F','G','H')
r = len(people)
# how many data points overall (average of 3 per person)
n = r * 3
# which person does each segment belong to?
rows = np.random.randint(0, r, (n,))
# how wide is the segment?
widths = np.random.randint(3,12, n,)
# what label to put on the segment (xrange in py2.7, range for py3)
labels = range(n)
colors ='rgbwmc'
patch_handles = []
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
left = np.zeros(r,)
row_counts = np.zeros(r,)
for (r, w, l) in zip(rows, widths, labels):
    print(r, w, l)
    patch_handles.append(ax.barh(r, w, align='center', left=left[r],
                                 color=colors[int(row_counts[r]) % len(colors)]))
    left[r] += w
    row_counts[r] += 1
    # we know there is only one patch but could enumerate if expanded
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y, "%d%%" % (l), ha='center',va='center')
y_pos = np.arange(8)
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
This produces a graph like the one below, with a different number of segments in each series.
Note that this is not particularly efficient, since each segment uses an individual call to ax.barh. There may be more efficient methods (e.g. padding a matrix with zero-width segments or nan values), but this is likely to be problem-specific and is a distinct question.
Edit: updated to answer both parts of the question.
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
segments = 4
# generate some multi-dimensional data & arbitrary labels
data = 3 + 10* np.random.rand(segments, len(people))
percentages = (np.random.randint(5,20, (len(people), segments)))
y_pos = np.arange(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
colors ='rgbwmc'
patch_handles = []
left = np.zeros(len(people)) # left alignment of data starts at zero
for i, d in enumerate(data):
    patch_handles.append(ax.barh(y_pos, d,
                                 color=colors[i%len(colors)], align='center',
                                 left=left))
    # accumulate the left-hand offsets
    left += d
# go through all of the bar segments and annotate
for j in range(len(patch_handles)):
    for i, patch in enumerate(patch_handles[j].get_children()):
        bl = patch.get_xy()
        x = 0.5*patch.get_width() + bl[0]
        y = 0.5*patch.get_height() + bl[1]
        ax.text(x,y, "%d%%" % (percentages[i,j]), ha='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
You can achieve a result along these lines (note: the percentages I used have nothing to do with the bar widths, as the relationship in the example seems unclear):
See Horizontal stacked bar chart in Matplotlib for some ideas on stacking horizontal bar plots.
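Stripped to its core, the stacking idea is just a running left offset. The following is a minimal sketch with made-up segment widths (the segments array is hypothetical), labelling each section at its centre.

```python
import numpy as np
import matplotlib.pyplot as plt

people = ["A", "B", "C"]
# Hypothetical segment widths: one row per segment, one column per person
segments = np.array([[3.0, 5.0, 2.0],
                     [4.0, 1.0, 6.0]])

fig, ax = plt.subplots()
left = np.zeros(len(people))
for widths in segments:
    bars = ax.barh(people, widths, left=left)
    for bar, w in zip(bars, widths):
        # Centre of the segment in data coordinates
        ax.text(bar.get_x() + bar.get_width() / 2,
                bar.get_y() + bar.get_height() / 2,
                f"{w:g}", ha="center", va="center")
    # The next segment starts where this one ended
    left = left + widths
```

The width passed to barh is the "distance" spanned by a section, and left is where it starts, so any label string can be placed at left + width / 2.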

Imports and Test DataFrame
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
For vertical stacked bars see Stacked Bar Chart with Centered Labels
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# create sample data as shown in the OP
np.random.seed(365)
people = ('A','B','C','D','E','F','G','H')
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
# create the dataframe
df = pd.DataFrame({'Female': bottomdata, 'Male': topdata}, index=people)
# display(df)
   Female   Male
A   12.41   7.42
B    9.42   4.10
C    9.85   7.38
D    8.89  10.53
E    8.44   5.92
F    6.68  11.86
G   10.67  12.97
H    6.05   7.87
Updated with matplotlib v3.4.2
Use matplotlib.pyplot.bar_label
See How to add value labels on a bar chart for additional details and examples with .bar_label.
For python < 3.8, without the assignment expression (:=), use labels = [f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c].
Plotted using pandas.DataFrame.plot with kind='barh'
ax = df.plot(kind='barh', stacked=True, figsize=(8, 6))
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
    # uncomment and use the next line if there are no nan or 0 length sections; just use fmt to add a % (the previous two lines of code are not needed, in this case)
    # ax.bar_label(c, fmt='%.2f%%', label_type='center')
# move the legend
ax.legend(bbox_to_anchor=(1.025, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Using seaborn
sns.barplot does not have an option for stacked bar plots, however, sns.histplot and sns.displot can be used to create horizontal stacked bars.
seaborn typically requires the dataframe to be in a long, instead of wide, format, so use pandas.DataFrame.melt to reshape the dataframe.
Reshape dataframe
# convert the dataframe to a long form
df = df.reset_index()
df = df.rename(columns={'index': 'People'})
dfm = df.melt(id_vars='People', var_name='Gender', value_name='Percent')
# display(dfm)
   People  Gender    Percent
0       A  Female  12.414557
1       B  Female   9.416027
2       C  Female   9.846105
3       D  Female   8.885621
4       E  Female   8.438872
5       F  Female   6.680709
6       G  Female  10.666258
7       H  Female   6.050124
8       A    Male   7.420860
9       B    Male   4.104433
10      C    Male   7.383738
11      D    Male  10.526158
12      E    Male   5.916262
13      F    Male  11.857227
14      G    Male  12.966913
15      H    Male   7.865684
sns.histplot: axes-level plot
fig, axe = plt.subplots(figsize=(8, 6))
sns.histplot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', ax=axe)
# iterate through each set of containers
for c in axe.containers:
    # add bar annotations
    axe.bar_label(c, fmt='%.2f%%', label_type='center')
axe.set_xlabel('Percent')
plt.show()
sns.displot: figure-level plot
g = sns.displot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', height=6)
# iterate through each facet / subplot
for axe in g.axes.flat:
    # iterate through each set of containers
    for c in axe.containers:
        # add the bar annotations
        axe.bar_label(c, fmt='%.2f%%', label_type='center')
    axe.set_xlabel('Percent')
plt.show()
Original Answer - before matplotlib v3.4.2
The easiest way to plot a horizontal or vertical stacked bar is to load the data into a pandas.DataFrame.
This will plot and annotate correctly even when some categories ('People') don't have all segments (e.g. a value is 0 or NaN).
Once the data is in the dataframe:
It's easier to manipulate and analyze.
It can be plotted with the matplotlib engine, using:
pandas.DataFrame.plot.barh, with label_text = f'{width}' for annotations
pandas.DataFrame.plot.bar, with label_text = f'{height}' for annotations (see SO: Vertical Stacked Bar Chart with Centered Labels)
These methods return a matplotlib.axes.Axes or a numpy.ndarray of them.
The .patches attribute holds a list of matplotlib.patches.Rectangle objects, one for each section of the stacked bar.
Each Rectangle has methods for extracting the values that define it.
The Rectangle objects are ordered from left to right and bottom to top, so all the Rectangle objects for each level appear in order when iterating through .patches.
The labels are made using an f-string, label_text = f'{width:.2f}%', so any additional text can be added as needed.
Plot and Annotate
Plotting the bars takes one line; the remainder annotates the rectangles.
# plot the dataframe with 1 line
ax = df.plot.barh(stacked=True, figsize=(8, 6))
# .patches holds every bar section in the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # For a horizontal bar, the width is the data value and can be used as the label
    label_text = f'{width:.2f}%'  # f'{width:.2f}' to format decimal values
    # ax.text(x, y, text)
    label_x = x + width / 2
    label_y = y + height / 2
    # only plot labels for sections with a width greater than 0
    if width > 0:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
# move the legend
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Example with Missing Segment
# set one of the dataframe values to 0
df.iloc[4, 1] = 0
Note the annotations are all in the correct location from df.
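The missing-segment behaviour can be checked without looking at the figure. The sketch below uses a hypothetical two-person dataframe (df_demo is an assumption, not the data above) with one value set to 0, applies the same .patches approach, and records where each label is placed. The zero-width section gets no label, and the remaining labels sit at the centre of their sections.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: person B has no 'Female' segment
df_demo = pd.DataFrame({"Female": [10.0, 0.0], "Male": [5.0, 8.0]}, index=["A", "B"])

ax = df_demo.plot.barh(stacked=True, figsize=(6, 3))

label_positions = []
for rect in ax.patches:
    width = rect.get_width()
    if width > 0:  # skip zero-width sections
        label_x = rect.get_x() + width / 2
        label_y = rect.get_y() + rect.get_height() / 2
        ax.text(label_x, label_y, f"{width:.2f}%", ha="center", va="center", fontsize=8)
        label_positions.append((label_x, label_y))
```

Here the 'Male' section for B starts at x = 0, because nothing is stacked to its left.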

For this case, the above answers work perfectly. The issue I had, and for which I didn't find a plug-and-play solution online, was that I often have to plot stacked bars in multi-subplot figures with many values, which tend to have very non-homogeneous amplitudes.
(Note: I usually work with pandas dataframes and matplotlib. I couldn't get matplotlib's bar_label() method to work every time.)
So I just give a kind of ad hoc but easily generalizable solution. In this example, I was working with single-row dataframes (for per-hour power-exchange monitoring purposes), so my dataframe (df) had just one row.
(I provide an example figure to show how this can be useful in very densely packed plots.)
[enter image description here][1]
[1]: https://i.stack.imgur.com/9akd8.png
import matplotlib.pyplot as plt
import numpy as np


def magic_stacked_bar(df, waterfall=False, cyclic_offset_x=None, cyclic_offset_y=None, ax=None):
    '''
    This implementation produces a stacked, horizontal bar plot.
    df --> pandas dataframe. Columns are used as the iterator, and only the first value of each column is used.
    waterfall --> bool: if True, apart from the stack direction, a perpendicular offset is also added.
    cyclic_offset_x --> list (of any length) or None: loop through these values to use as x-offset pixels.
    cyclic_offset_y --> list (of any length) or None: loop through these values to use as y-offset pixels.
    ax --> matplotlib Axes, or None: if None, creates a new axis and figure.
    '''
    if cyclic_offset_x is None:
        cyclic_offset_x = [0, 0]
    if cyclic_offset_y is None:
        cyclic_offset_y = [0, 0]
    ax0 = ax
    if ax is None:
        fig, ax = plt.subplots()
        fig.set_size_inches(19, 10)
    cycler = 0
    prev = 0  # summation variable to make it stacked
    for c in df.columns:
        value = df[c].values[0]  # only the first row of each column is plotted
        if waterfall:
            y, label = c, ""  # bidirectional stack
        else:
            y, label = 0, c  # unidirectional stack
        ax.barh(y=y, width=value, height=1, left=prev, label=label)
        prev += value  # add to sum-stack
        # cycle through the offset lists
        offset_x = cyclic_offset_x[cycler % len(cyclic_offset_x)]
        offset_y = cyclic_offset_y[cycler % len(cyclic_offset_y)]
        # annotate the centre of the section that was just drawn
        ax.annotate(text="{}".format(int(value)), xy=(prev - value / 2, y),
                    xytext=(offset_x, offset_y), textcoords='offset pixels',
                    ha='center', va='top', fontsize=8,
                    arrowprops=dict(facecolor='black', shrink=0.01, width=0.3, headwidth=0.3),
                    bbox=dict(boxstyle='round', facecolor='grey', alpha=0.5))
        cycler += 1
    if not waterfall:
        # if waterfall, the index annotates the columns;
        # if waterfall == False, the legend annotates the columns
        ax.legend()
    if ax0 is None:
        ax.set_title("Voi la")
        ax.set_xlabel("UltraWatts")
        plt.show()
    else:
        return ax


# Sometimes it is more tedious and requires some custom functions to make the labels look alright.
A, B = 80, 80
n_units = df.shape[1]
cyclic_offset_x = -A * np.cos(2 * np.pi / (2 * n_units) * np.arange(n_units))
cyclic_offset_y = B * np.sin(2 * np.pi / (2 * n_units) * np.arange(n_units)) + B / 2
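To see what these offsets look like, the sketch below evaluates the same formulas for a hypothetical n_units = 4 (an assumption; the answer takes it from df.shape[1]). The offsets sweep half an ellipse, from the left at angle 0 towards the right, so neighbouring labels fan out instead of overlapping. Labels beyond the list length reuse the offsets in order, since the function indexes the offset lists modulo their length.

```python
import numpy as np

A, B = 80, 80   # horizontal / vertical radii in pixels, as in the answer
n_units = 4     # hypothetical column count (df.shape[1] in the answer)

# angles 0, pi/4, pi/2, 3*pi/4: half a turn split into n_units steps
angles = 2 * np.pi / (2 * n_units) * np.arange(n_units)
cyclic_offset_x = -A * np.cos(angles)
cyclic_offset_y = B * np.sin(angles) + B / 2

# offset used by the i-th label; index wraps around after n_units labels
offsets = [
    (cyclic_offset_x[i % n_units], cyclic_offset_y[i % n_units])
    for i in range(6)
]

for i, (dx, dy) in enumerate(offsets):
    print(f"label {i}: dx={dx:7.2f} px, dy={dy:7.2f} px")
```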
